Abstract
Data scientists, with access to fast-growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. But more often than not, their algorithms return too many outputs that are either highly speculative or simply confirm what domain experts already know. To escape this dilemma, we need algorithms that move beyond obvious association analyses and leverage domain analytic objectives (a.k.a. KPIs) to look for higher-order connections. We propose a new technique, Exceptional Contrast Set Mining, which first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination. It then discovers exceptional contrast sets that contradict the affirmative contrast sets. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening.
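The abstract names the two stages of the method but not their exact definitions. The following is a minimal Python sketch built on our own assumptions, not the authors' algorithm. We assume a binary KPI. A contrast set here is a single attribute-value pair whose support differs between KPI = 1 and KPI = 0 records, tested with a 2×2 chi-square test. "Redundant information elimination" is approximated by skipping any set whose record coverage equals, or is the complement of, an already kept set. An exceptional contrast set is taken to be a context (another attribute-value pair) inside which an affirmative set's support difference reverses sign. All function names, the thresholds `min_diff` and `alpha`, and the synthetic counts are hypothetical. No multiple-testing correction is applied.

```python
import math
from itertools import product


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square p-value (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; with 1 degree of freedom, p = erfc(sqrt(chi2 / 2))."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return math.erfc(math.sqrt(chi2 / 2))


def support_diff(records, item, kpi):
    """Return (diff, p), where diff = P(item | kpi=1) - P(item | kpi=0)."""
    attr, val = item
    a = sum(1 for r in records if r[kpi] == 1 and r[attr] == val)
    b = sum(1 for r in records if r[kpi] == 1 and r[attr] != val)
    c = sum(1 for r in records if r[kpi] == 0 and r[attr] == val)
    d = sum(1 for r in records if r[kpi] == 0 and r[attr] != val)
    if a + b == 0 or c + d == 0:
        return 0.0, 1.0  # one KPI group is empty: nothing to contrast
    return a / (a + b) - c / (c + d), chi2_2x2_pvalue(a, b, c, d)


def affirmative_sets(records, kpi, min_diff=0.1, alpha=0.05):
    """Stage 1 (hypothetical): significant single-item contrast sets."""
    items = sorted({(k, v) for r in records for k, v in r.items() if k != kpi})
    all_idx = frozenset(range(len(records)))
    kept, seen = [], set()
    for item in items:
        extent = frozenset(i for i, r in enumerate(records) if r[item[0]] == item[1])
        # Assumed redundancy rule: same coverage (or its complement) adds no new information.
        if extent in seen or (all_idx - extent) in seen:
            continue
        diff, p = support_diff(records, item, kpi)
        if abs(diff) >= min_diff and p < alpha:
            kept.append((item, diff))
            seen.add(extent)
    return kept


def exceptional_sets(records, kpi, affirmative, min_diff=0.1, alpha=0.05):
    """Stage 2 (hypothetical): contexts where an affirmative set's sign reverses."""
    contexts = sorted({(k, v) for r in records for k, v in r.items() if k != kpi})
    found = []
    for (item, global_diff), ctx in product(affirmative, contexts):
        if ctx[0] == item[0]:
            continue  # a context on the same attribute is trivial
        subset = [r for r in records if r[ctx[0]] == ctx[1]]
        local_diff, p = support_diff(subset, item, kpi)
        if local_diff * global_diff < 0 and abs(local_diff) >= min_diff and p < alpha:
            found.append((ctx, item, round(local_diff, 3)))
    return found


def make_records():
    """Invented toy data: (region, age_band, screened, count) cells."""
    cells = [("urban", "50-69", 1, 80), ("urban", "50-69", 0, 20),
             ("urban", "other", 1, 20), ("urban", "other", 0, 80),
             ("rural", "50-69", 1, 10), ("rural", "50-69", 0, 40),
             ("rural", "other", 1, 40), ("rural", "other", 0, 10)]
    return [{"region": reg, "age_band": age, "screened": s}
            for reg, age, s, n in cells for _ in range(n)]
```

On the invented data, `affirmative_sets(make_records(), "screened")` keeps only `("age_band", "50-69")`, with a support difference of about 0.2. Its complement `("age_band", "other")` is dropped as redundant, and neither region value differs between the KPI groups. Passing that result to `exceptional_sets` returns `[(("region", "rural"), ("age_band", "50-69"), -0.6)]`: the overall positive association reverses within the rural context. That is the kind of contradiction the abstract describes, although the real method's measures and search are not specified here.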
Acknowledgment
This work is partially supported by the Telstra-Deakin Centre of Excellence in Big Data and Machine Learning.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Nguyen, D., Luo, W., Phung, D., Venkatesh, S. (2016). Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_39
DOI: https://doi.org/10.1007/978-3-319-50127-7_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50126-0
Online ISBN: 978-3-319-50127-7
eBook Packages: Computer Science, Computer Science (R0)