
Multiple task transfer learning with small sample sizes

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a classification framework for data with small sample sizes. Conceptually, our solution is a hybrid of multi-task and transfer learning: it employs data samples from source tasks, as in transfer learning, but considers all tasks together, as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting its data with data from those tasks. The degree of augmentation depends on task relatedness, which is estimated directly from the data. We apply the model to three diverse real-world data sets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We also extend the model to online multi-task learning, where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model that exploits sharing across tasks at the data level together with joint parameter learning.
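The full method is described in the article itself; the following is only a minimal sketch of the data-level sharing idea. The function names (`estimate_relatedness`, `fit_augmented_models`) and the relatedness estimator (cosine similarity of single-task logistic regression weights) are hypothetical stand-ins, not the paper's actual formulation, which estimates relatedness directly from the data and learns parameters jointly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_relatedness(tasks):
    """Estimate pairwise task relatedness (hypothetical proxy).

    Fits a single-task logistic regression per task and uses the
    cosine similarity of the learned weight vectors as relatedness.
    """
    weights = []
    for X, y in tasks:
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        weights.append(clf.coef_.ravel())
    W = np.array(weights)
    W = W / np.clip(np.linalg.norm(W, axis=1, keepdims=True), 1e-12, None)
    # Keep only non-negative relatedness; rho[t, t] == 1 by construction.
    return np.clip(W @ W.T, 0.0, 1.0)

def fit_augmented_models(tasks, rho):
    """Fit one classifier per task on the pooled data of all tasks,
    weighting each source task's samples by its relatedness."""
    X_aug = np.vstack([X for X, _ in tasks])
    y_aug = np.concatenate([y for _, y in tasks])
    models = []
    for t in range(len(tasks)):
        # Samples from task s get weight rho[t, s]; the target task's
        # own samples get full weight rho[t, t] = 1.
        w_aug = np.concatenate(
            [np.full(len(y), rho[t, s]) for s, (_, y) in enumerate(tasks)]
        )
        models.append(
            LogisticRegression(max_iter=1000).fit(
                X_aug, y_aug, sample_weight=w_aug
            )
        )
    return models
```

Here `tasks` is a list of `(X, y)` pairs, one per task. A sample whose weight is near zero contributes almost nothing to the fit, so data from unrelated tasks is effectively ignored, while closely related tasks enlarge the effective sample size of the target task.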

Notes

  1. A detailed description of optimization methods can be found in [25].

  2. Ethics approval was obtained through the University and the hospital (12/83).

  3. http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html.

References

  1. Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task feature learning. Mach Learn 73(3):243–272

  2. Argyriou A, Pontil M, Ying Y, Micchelli CA (2007) A spectral regularization framework for multi-task structure learning. In: Advances in neural information processing systems, pp 25–32

  3. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York

  4. Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Proceedings of the eleventh annual conference on computational learning theory. ACM, pp 92–100

  5. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learning 3(1):1–122

  6. Chelba C, Acero A (2006) Adaptation of maximum entropy capitalizer: little data can help a lot. Comput Speech Lang 20(4):382–399

  7. Chen M, Weinberger KQ, Blitzer J (2011) Co-training for domain adaptation. In: NIPS, pp 2456–2464

  8. Cover TM, Thomas JA (2012) Elements of information theory. Wiley, New York

  9. Daumé III H (2009) Bayesian multitask learning with latent hierarchies. In: Proceedings of the 25th conference on uncertainty in artificial intelligence, pp 135–142

  10. Duan L, Xu D, Tsang IW (2012) Domain adaptation from multiple sources: a domain-dependent regularization approach. IEEE Trans Neural Netw Learn Syst 23(3):504–518

  11. Eaton E, Ruvolo PL (2013) ELLA: an efficient lifelong learning algorithm. In: Proceedings of the 30th international conference on machine learning (ICML-13), pp 507–515

  12. Argyriou A, Evgeniou T, Pontil M (2007) Multi-task feature learning. In: Advances in neural information processing systems: proceedings of the 2006 conference, vol 19. The MIT Press, p 41

  13. Evgeniou T, Pontil M (2004) Regularized multi-task learning. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 109–117

  14. Gao J, Fan W, Jiang J, Han J (2008) Knowledge transfer via multiple model local structure mapping. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 283–291

  15. Gong P, Ye J, Zhang C (2012) Robust multi-task feature learning. In: Proceedings of the 18th ACM SIGKDD. ACM, pp 895–903

  16. Gupta S, Phung D, Venkatesh S (2012) A Bayesian nonparametric joint factor model for learning shared and individual subspaces from multiple data sources. In: Proceedings of the SDM, pp 200–211

  17. Gupta S, Phung D, Venkatesh S (2013) Factorial multi-task learning: a Bayesian nonparametric approach. In: Proceedings of international conference on machine learning, pp 657–665

  18. Hastie T, Tibshirani R, Friedman JH (2001) The elements of statistical learning. Springer, New York

  19. Jalali A, Sanghavi S, Ruan C, Ravikumar PK (2010) A dirty model for multi-task learning. In: Neural information processing systems, pp 964–972

  20. Jebara T (2004) Multi-task feature and kernel selection for SVMs. In: Proceedings of the twenty-first international conference on machine learning. ACM, p 55

  21. Ji S, Ye J (2009) An accelerated gradient method for trace norm minimization. In: Proceedings of the 26th annual international conference on machine learning. ACM, pp 457–464

  22. Kang Z, Grauman K, Sha F (2011) Learning with whom to share in multi-task feature learning. In: Proceedings of the 28th international conference on machine learning, pp 521–528

  23. Lawrence ND, Platt JC (2004) Learning to learn with the informative vector machine. In: Proceedings of the twenty-first international conference on machine learning. ACM, p 65

  24. Liu J, Ji S, Ye J (2009) Multi-task feature learning via efficient ℓ2,1-norm minimization. In: Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, pp 339–348

  25. Liu J, Chen J, Ye J (2009) Large-scale sparse logistic regression. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 547–556

  26. Nemirovski A (2005) Efficient methods in convex programming. Lecture Notes. http://www2.isye.gatech.edu/~nemirovs/

  27. Nesterov Y (2004) Introductory lectures on convex optimization: a basic course, vol 87. Springer, Berlin

  28. Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359

  29. Rai P, Daumé III H (2010) Infinite predictor subspace models for multitask learning. In: International conference on artificial intelligence and statistics, pp 613–620

  30. Raudys SJ, Jain AK (1991) Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Trans Pattern Anal Mach Intell 13(3):252–264

  31. Schmidt M (2010) Graphical model structure learning with l1-regularization. PhD thesis, The University of British Columbia

  32. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58:267–288

  33. Xue Y, Liao X, Carin L, Krishnapuram B (2007) Multi-task learning for classification with Dirichlet process priors. J Mach Learn Res 8:35–63

  34. Yang H, Lyu MR, King I (2013) Efficient online learning for multitask feature selection. ACM Trans Knowl Discov Data 7(2):6:1–6:27

  35. Zhang Y, Yeung D-Y (2010) A convex formulation for learning task relationships in multi-task learning. In: UAI, pp 733–742

Author information

Corresponding author

Correspondence to Budhaditya Saha.

About this article

Cite this article

Saha, B., Gupta, S., Phung, D. et al. Multiple task transfer learning with small sample sizes. Knowl Inf Syst 46, 315–342 (2016). https://doi.org/10.1007/s10115-015-0821-z
