Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious

Nguyen, Dang; Luo, Wei; Phung, Dinh; Venkatesh, Svetha

doi:10.1007/978-3-319-50127-7_39

Dang Nguyen, Wei Luo, Dinh Phung & Svetha Venkatesh

Conference paper
First Online: 29 November 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9992)

Abstract

Data scientists, with access to fast-growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. But more often than not, their algorithms give them too many outputs that are either highly speculative or simply confirm what the domain experts already know. To escape this dilemma, we need algorithms that move beyond the obvious association analyses and leverage domain analytic objectives (a.k.a. KPIs) to look for higher-order connections. We propose a new technique, Exceptional Contrast Set Mining, that first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination. It then discovers exceptional contrast sets that contradict the affirmative contrast sets. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening.
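
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a contrast set is a conjunction of attribute-value pairs whose support differs between two groups by at least a threshold, and it reads "contradict" as "a more specific set whose support difference has the opposite sign". The toy data, the threshold `delta`, and the names `make_rows`, `support_diff`, and `find_exceptions` are all hypothetical, and the redundancy elimination and statistical testing that a real method would need are omitted.

```python
from itertools import combinations


def make_rows(group, counts):
    """Expand {(age, region): n} into n labelled records for one group."""
    return [(group, frozenset({("age", a), ("region", r)}))
            for (a, r), n in counts.items() for _ in range(n)]


# Hypothetical toy data: two groups of ten records each.
ROWS = (
    make_rows("A", {("old", "north"): 6, ("old", "south"): 1,
                    ("young", "north"): 1, ("young", "south"): 2})
    + make_rows("B", {("old", "south"): 4, ("young", "north"): 4,
                      ("young", "south"): 2})
)


def support_diff(itemset, rows, g1="A", g2="B"):
    """Support of `itemset` in group g1 minus its support in group g2."""
    def supp(group):
        members = [rec for grp, rec in rows if grp == group]
        return sum(itemset <= rec for rec in members) / len(members)
    return supp(g1) - supp(g2)


def find_exceptions(rows, delta=0.25):
    """Return (affirmative, exceptional) itemset pairs.

    Assumption: single items with |support difference| >= delta play the
    role of affirmative contrast sets; a two-item superset is reported as
    exceptional when its difference also reaches delta but flips sign.
    """
    items = sorted({item for _, rec in rows for item in rec})
    affirmative = {}
    for item in items:
        single = frozenset({item})
        d = support_diff(single, rows)
        if abs(d) >= delta:
            affirmative[single] = d

    pairs = []
    for a, b in combinations(items, 2):
        if a[0] == b[0]:  # two values of the same attribute never co-occur
            continue
        child = frozenset({a, b})
        d_child = support_diff(child, rows)
        if abs(d_child) < delta:
            continue
        for parent, d_parent in affirmative.items():
            if parent < child and d_parent * d_child < 0:
                pairs.append((parent, child))
    return pairs


if __name__ == "__main__":
    for parent, child in find_exceptions(ROWS):
        print(sorted(parent), "is contradicted by", sorted(child))
```

On this toy data, age=old is more common in group A overall, yet the combination of age=old and region=south is more common in group B, so the sketch reports that pair as an exception. It likewise flags age=young with region=north as an exception to region=north.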

Acknowledgment

This work is partially supported by the Telstra-Deakin Centre of Excellence in Big Data and Machine Learning.

Author information

Authors and Affiliations

School of Information Technology, Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, Australia
Dang Nguyen, Wei Luo, Dinh Phung & Svetha Venkatesh

Corresponding author

Correspondence to Dang Nguyen.

Editor information

Editors and Affiliations

University of Tasmania, Hobart, Australia
Byeong Ho Kang
Auckland University of Technology, Auckland, New Zealand
Quan Bai

About this paper

Cite this paper

Nguyen, D., Luo, W., Phung, D., Venkatesh, S. (2016). Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_39

DOI: https://doi.org/10.1007/978-3-319-50127-7_39
Published: 29 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50126-0
Online ISBN: 978-3-319-50127-7
eBook Packages: Computer Science, Computer Science (R0)
