
A new transfer learning framework with application to model-agnostic multi-task learning

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve performance is to exploit knowledge from other related tasks. Multi-task learning (MTL) is one such paradigm, aiming to improve performance by jointly modeling multiple related tasks. Although numerous classification and regression models exist in the machine learning literature, most MTL models are built around ridge or logistic regression. A limited body of work proposes multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms, and no single MTL algorithm can be used at a meta level with any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner’s choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, owing to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an underestimation of the task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of the task parameters. This is achieved by appropriate sharing of data across tasks. We provide detailed theoretical underpinnings of the algorithm. Through experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, k-nearest neighbor, random forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to, or better than, many state-of-the-art MTL and transfer learning baselines.
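
The high-level recipe described in the abstract (fit a single-task model per task, estimate pairwise task relatedness from the fitted parameters, then improve each task by sharing data with related tasks) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' exact algorithm: the cosine-similarity relatedness measure, the sharing threshold, and the helper names (cosine_relatedness, model_agnostic_mtl) are all hypothetical, and for simplicity the sketch assumes a linear base learner exposing coef_, whereas the paper's framework accepts any classifier or regressor.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def cosine_relatedness(w_s, w_t):
    """Illustrative relatedness measure: cosine similarity of task parameters."""
    return float(w_s @ w_t / (np.linalg.norm(w_s) * np.linalg.norm(w_t) + 1e-12))

def model_agnostic_mtl(base_model, tasks, threshold=0.5):
    """Hypothetical sketch: fit a single-task model per task, estimate pairwise
    relatedness from the fitted parameters, then refit each task on its own
    data pooled with the data of sufficiently related tasks."""
    # Step 1: single-task fits give (noisy) parameter estimates.
    params = [clone(base_model).fit(X, y).coef_.ravel() for X, y in tasks]
    # Step 2: share data across related tasks and refit.
    models = []
    for t, (X_t, y_t) in enumerate(tasks):
        Xs, ys = [X_t], [y_t]
        for s, (X_s, y_s) in enumerate(tasks):
            if s != t and cosine_relatedness(params[t], params[s]) > threshold:
                Xs.append(X_s)
                ys.append(y_s)
        models.append(clone(base_model).fit(np.vstack(Xs), np.concatenate(ys)))
    return models

# Toy usage: two related binary tasks, each with only a handful of examples.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
tasks = [
    (X, (X @ w_true + 0.5 * rng.normal(size=15) > 0).astype(int))
    for X in (rng.normal(size=(15, 5)) for _ in range(2))
]
mtl_models = model_agnostic_mtl(LogisticRegression(max_iter=1000), tasks)
```

The hard sharing threshold is the simplest possible choice; the framework's point is that the wrapper only needs the base learner's fit and predict interface, so the relatedness estimate and sharing rule can be refined without touching the underlying algorithm.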

Notes

  1. In the case of a general nonlinear model, \(d\le N_{t'}\). For linear models, assuming a linearly independent set of data in task \(t'\), \(d=\min \left( M,N_{t'}\right) \); for example, with \(M=20\) features but only \(N_{t'}=8\) examples, \(d=8\).

  2. http://au.mathworks.com/matlabcentral/fileexchange/31036-random-forest.

  3. Underestimation occurs in the magnitude of relatedness, irrespective of its sign: positive relatedness values tend to be estimated as smaller positive values, and negative relatedness values as negative values of smaller magnitude (a numerical illustration of this shrinkage is sketched after these notes).

  4. Ethics approval was obtained through the University and the hospital (12/83).

  5. http://www.who.int/classifications/icd10/.
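
The shrinkage described in note 3 is easy to reproduce numerically. The following is a minimal simulation under our own assumptions, not the paper's experiment: single-task parameter estimates are modeled as true parameters plus independent Gaussian noise (standing in for the estimation error caused by small samples), and relatedness is measured as the cosine similarity of parameter vectors. The noise inflates the norms of the estimates, which biases the measured relatedness toward zero in magnitude.

```python
import numpy as np

rng = np.random.default_rng(42)

def cosine(u, v):
    """Cosine similarity, used here as an illustrative relatedness measure."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

M = 50            # parameter dimension
noise_std = 1.0   # estimation noise; large when per-task samples are few
trials = 2000

# Two tasks whose true parameters are strongly positively related.
w1 = rng.normal(size=M)
w2 = 0.8 * w1 + 0.2 * rng.normal(size=M)

# Noisy single-task estimates: true parameters plus independent noise.
estimates = [
    cosine(w1 + noise_std * rng.normal(size=M),
           w2 + noise_std * rng.normal(size=M))
    for _ in range(trials)
]

print(f"true relatedness:           {cosine(w1, w2):+.3f}")
print(f"mean estimated relatedness: {np.mean(estimates):+.3f}")  # smaller in magnitude
```

Flipping the sign of w2 shows the same attenuation on the negative side, matching the note's point that the effect is about magnitude rather than sign.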

Author information

Correspondence to Sunil Gupta.

About this article

Cite this article

Gupta, S., Rana, S., Saha, B. et al. A new transfer learning framework with application to model-agnostic multi-task learning. Knowl Inf Syst 49, 933–973 (2016). https://doi.org/10.1007/s10115-016-0926-z
