Mining permission patterns for contrasting clean and malicious Android applications

https://doi.org/10.1016/j.future.2013.09.014

Highlights

  • A pattern mining algorithm is proposed to identify contrast permission patterns.

  • We collected a new dataset with 1227 clean Android applications.

  • We considered both required and used permissions.

  • A biclustering method was employed to provide visualization.

  • The permission patterns show big contrasts between clean apps and malware.

Abstract

An Android application uses a permission system to regulate access to system resources and users’ privacy-relevant information. Existing works have demonstrated several techniques to study the required permissions declared by developers, but little attention has been paid to used permissions. Besides, no specific permission combination has been identified as effective for malware detection. To fill these gaps, we propose a novel pattern mining algorithm to identify a set of contrast permission patterns that capture the difference between clean and malicious applications. A benchmark malware dataset and a dataset of 1227 clean applications that we collected are used to evaluate the performance of the proposed algorithm. Valuable findings are obtained by analyzing the returned contrast permission patterns.

Introduction

The term smartphone describes a mobile device equipped with enhanced computing capability and connectivity [1], such as Nexus by Google [2], iPhone by Apple [3], Blackberry by RIM [4] and Windows Phone by Microsoft [5]. In the past few years, the global telephony industry has witnessed an upsurge in the sales of smartphones. A smartphone is usually sold with a built-in mobile operating system (OS) together with a number of pre-installed “applications” packaged by the device manufacturer. An application, the software running on a smartphone, enhances the smartphone’s functionality and supports interaction with end users to accomplish their tasks. Calendar, address book, alarm clock, media player and web browser are common applications provided by device manufacturers, but one important application exists on every smartphone: the “application store”, which allows end users to access online application markets to browse and download additional applications of their choice.

Every device manufacturer hosts an application market for its own OS platform, such as Apple’s App Store [6], Blackberry’s App World [7] and Google Play [8]. However, long before Apple introduced the first official application market in 2008, smartphone application distribution depended heavily on third-party sources, where individual application developers were free to upload their products. Because a huge number of low-price applications are available there, a large group of end users still prefer third-party application markets, but not all the applications from these markets are “safe”. Software that is specially designed to harm a device, its OS or other software is called “malware”, which stands for malicious software [9]. The increasing sales of smartphones have driven the rapid growth of mobile malware.

As pointed out by Zhou and Jiang [10], malware or malicious applications might cause a series of operations that the user does not expect, for example, stealing the user’s personal information, making calls or sending SMS messages without the user’s knowledge. Such malicious behaviors not only cost users extra data usage, but also potentially raise privacy issues. Furthermore, users may not be aware of malware running on their smartphones because in many cases the malware is downloaded and/or installed without authorization. Accordingly, an efficient and effective malware detection technique is in high demand to protect smartphone users from the potential spread of malware.

To effectively detect malware among the millions of applications available on official and third-party markets, much effort has gone into studying the nature of smartphone platforms and their applications in the past decade. As the most popular mobile platform, Google’s Android has overtaken others to become the top platform for mobile malware. The Android platform employs a permission system to restrict applications’ privileges and so secure the users’ privacy-relevant resources [11]. An application needs the user’s approval for the requested permissions before it can access the privacy-relevant resources. Thus, the permission system was designed to protect users from applications with invasive behaviors, but its effectiveness depends heavily on the user’s comprehension of permission approval. We refer to the permissions that are requested during application installation as required permissions. Unfortunately, not all users read or understand the warnings about required permissions shown during installation. To improve this situation, many researchers have tried to interpret Android permissions and their combinations [12], [13], [14], [15]. Frank et al. [11] proposed a probabilistic model to identify the common required permission patterns for all Android applications. Zhou and Jiang [10] listed the top required permissions for both clean and malicious applications, but only individual permissions were considered, by frequency counting. It remains an open question whether patterns of permission combinations can provide better performance for malware detection. Furthermore, the existing literature considers only the required permissions in permission pattern mining; no work has incorporated the used permissions, which are extracted through static analysis by the Andrubis system (http://anubis.iseclab.org) [16]. We are therefore the first group to explore both the required and used permissions. Accordingly, our aim is to propose an efficient pattern mining method to identify a set of contrast permission patterns that effectively distinguish malware from clean applications.

To identify the desired permission patterns with a pattern mining technique, we need two datasets: one containing only clean Android applications and the other containing only malicious ones. In 2012, Zhou and Jiang [10] published the first benchmark dataset of malicious applications, covering 49 malware families, which was collected from third-party markets between August 2010 and October 2011. This is an ideal malware dataset for our experiments. On the other hand, because no dataset of clean applications from the same time period as Zhou and Jiang’s had been published, we collected our own clean dataset. The clean applications were collected from two popular third-party Android application markets: SlideME (http://slideme.org) and Pandaapp (http://android.pandaapp.com). We sorted the collected applications by their number of downloads and the ratings given by users, and picked only the top ones. Each application was scanned by forty-three antivirus engines on VirusTotal (https://www.virustotal.com) [17], and only those that passed all virus tests were considered “clean” and kept to form the clean dataset. These clean applications do not impede the smooth execution of the OS. Like Zhou and Jiang, we represent each application in the collected clean dataset as a vector of 130 binary values, each of which is associated with one of the 130 official Android permissions. A value of 1 is assigned to a permission only if it is required or used by the application; otherwise, 0 is assigned.
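The binary representation described above can be illustrated with a minimal sketch. The five-entry permission list and the names `PERMISSIONS` and `to_binary_vector` are assumptions made for illustration only; the paper uses the full list of 130 official Android permissions, and keeping one vector per permission type (required and used) is likewise an assumption of this sketch.

```python
from typing import Iterable, List, Sequence

# Hypothetical, shortened vocabulary for illustration; the paper uses the
# 130 official Android permissions.
PERMISSIONS: List[str] = [
    "INTERNET",
    "READ_PHONE_STATE",
    "SEND_SMS",
    "ACCESS_FINE_LOCATION",
    "READ_CONTACTS",
]


def to_binary_vector(app_permissions: Iterable[str],
                     vocabulary: Sequence[str] = PERMISSIONS) -> List[int]:
    """Return a 0/1 vector with one entry per permission in `vocabulary`."""
    present = set(app_permissions)
    return [1 if p in present else 0 for p in vocabulary]


# Toy application: one permission set declared by the developer (required)
# and one observed through static analysis (used).
required = {"INTERNET", "SEND_SMS"}
used = {"INTERNET", "READ_PHONE_STATE"}

required_vector = to_binary_vector(required)
used_vector = to_binary_vector(used)
```

Stacking such vectors row by row gives one binary matrix per dataset and permission type, which is a convenient input for frequency counting or pattern mining.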

The novelty and contributions of this work can be summarized as follows:

  • We collected a new dataset that contains 1227 clean applications that were uploaded to third-party markets from August 2010 to October 2011.

  • Beyond the current studies that focused on required permissions only, we also considered the used permissions.

  • We utilized a hierarchical biclustering method for an initial analysis of both the clean and malware datasets. The resulting figures provided a straightforward preview of the data distribution, from which we built our model of mining sets of permissions rather than using individual permissions as the patterns.

  • We proposed a contrast permission pattern mining algorithm to identify the interesting permission sets that can be used to distinguish malicious applications from clean ones.

  • Our demonstration of the proposed Contrast Permission Pattern Mining showed that both required and used permissions should be considered in later malware detection tasks.
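The idea of a contrast permission pattern can be made concrete with a small sketch. This is not the authors' algorithm, whose details are not given in this excerpt; it is a brute-force illustration that assumes a pattern is a small set of permissions and that "contrast" means a large difference in support between the malware and clean datasets. The function names, the thresholds `min_support` and `min_diff`, and the toy data are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def support(pattern: FrozenSet[str], apps: List[Set[str]]) -> float:
    """Fraction of apps whose permission set contains every permission in `pattern`."""
    if not apps:
        return 0.0
    return sum(1 for app in apps if pattern <= app) / len(apps)


def contrast_patterns(malware: List[Set[str]],
                      clean: List[Set[str]],
                      max_size: int = 2,
                      min_support: float = 0.1,
                      min_diff: float = 0.3) -> Dict[FrozenSet[str], float]:
    """Brute-force sketch: keep permission sets whose support differs strongly.

    The returned value is support(malware) - support(clean), so positive
    values mark patterns that are more typical of malware.
    """
    vocabulary = sorted(set().union(*malware, *clean))
    results: Dict[FrozenSet[str], float] = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocabulary, size):
            pattern = frozenset(combo)
            s_mal = support(pattern, malware)
            s_clean = support(pattern, clean)
            frequent = max(s_mal, s_clean) >= min_support
            if frequent and abs(s_mal - s_clean) >= min_diff:
                results[pattern] = s_mal - s_clean
    return results


# Toy data, invented for illustration only.
malware = [
    {"INTERNET", "SEND_SMS", "READ_PHONE_STATE"},
    {"INTERNET", "SEND_SMS"},
    {"INTERNET", "READ_PHONE_STATE", "SEND_SMS"},
    {"INTERNET"},
]
clean = [
    {"INTERNET"},
    {"INTERNET", "ACCESS_FINE_LOCATION"},
    {"INTERNET"},
    {"ACCESS_FINE_LOCATION"},
]

patterns = contrast_patterns(malware, clean)
```

In this toy data, `INTERNET` alone is common in both datasets and is therefore filtered out, while the pair `{INTERNET, SEND_SMS}` is kept. Exhaustive enumeration grows combinatorially with pattern size, so a practical miner would prune candidates, for example by extending only frequent sets.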

The rest of the paper is organized as follows: Section  2 briefly reviews the concepts of the Android platform, its applications, the permission system and the current research work in malware detection. In Section  3, we present our initial analysis on the collected datasets using a statistical method and biclustering followed by the proposed contrast pattern mining algorithm. The experiments and the obtained results are then reported in Section  4 followed by a further discussion on findings. Finally, Section  5 concludes the entire paper together with our future work.

Section snippets

Android

Android is a Linux-based OS which was designed and developed by the Open Handset Alliance in 2007  [18]. The Android platform is made up of multiple layers consisting of the OS, the Java libraries and the basic built-in applications  [19]. Additional applications can be downloaded and installed from either official or third-party markets.

Google provides the application developer community with a Software Development Kit (SDK)  [20] to build Android applications and it includes a collection of

Mining permission patterns

The methods most widely used to analyze Android permissions are statistical ones, such as frequency counting by Zhou and Jiang  [10], and the probabilistic model by Frank et al.  [35]. Thus, we started our work with an initial analysis of the clean and malware datasets using frequency counting, following Zhou and Jiang’s work, and extended it to explore used permissions. Inspired by the work of Barrera et al.  [12], who utilized SOM for application clustering and visualization, we would like to

Experiment settings

According to the statistical analysis and the biclustering figures, not all the permissions are required or used. In the experiment to evaluate the proposed Contrast Permission Pattern Mining algorithm, we ignore the permissions that are not required or used in each of the sub-datasets. Table 3 gives more details of the four new sub-datasets.
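Dropping the permissions that never occur in a sub-dataset amounts to removing all-zero columns from its binary matrix. The following sketch assumes each sub-dataset is stored as a list of 0/1 rows with a parallel list of permission names; the function name `drop_unused_permissions` and the toy matrix are hypothetical.

```python
from typing import List, Tuple


def drop_unused_permissions(names: List[str],
                            matrix: List[List[int]]) -> Tuple[List[str], List[List[int]]]:
    """Remove every permission column that is 0 for all applications in `matrix`."""
    keep = [j for j in range(len(names)) if any(row[j] for row in matrix)]
    kept_names = [names[j] for j in keep]
    kept_matrix = [[row[j] for j in keep] for row in matrix]
    return kept_names, kept_matrix


# Toy sub-dataset: three applications, three permissions, one never requested.
names = ["INTERNET", "SEND_SMS", "BRICK"]
clean_required = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]

kept_names, kept_matrix = drop_unused_permissions(names, clean_required)
```

Because the filter is applied to each sub-dataset separately, the sub-datasets can end up with different permission columns.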

The statistical analysis results also show that only a small set of permissions have support greater than 0.1 (10%), so we follow the

Conclusion

In this paper, we studied the Android permission system as the smartphone platform makes use of permissions to regulate access to system resources and users’ private information. In order to understand and identify permission patterns, the existing work considers only those permissions that are declared in the AndroidManifest.xml files. We refer to those permissions as ‘required’ permissions. However, there is another permission check that takes place after an application has been installed and

Veelasha Moonsamy is a current Ph.D. candidate at Deakin University, Australia. Her research thesis focuses on security and privacy in smartphone applications. She received her Bachelor (Hons) in Information Technology, majoring in IT Security and Mathematical Modelling from Deakin University in 2011. Her research interests include mobile technology, malicious software, machine learning algorithms and security protocols. Veelasha is also a member of the Australian Computer Society and IEEE.

References (48)

  • J. Rong et al., A behavioral analysis of web sharers and browsers in Hong Kong using targeted association rule mining, Tourism Management (2012)
  • R. Law et al., Identifying changes and trends in Hong Kong outbound tourism, Tourism Management (2011)
  • PC Magazine, Encyclopedia. http://www.pcmag.com/encyclopedia_term/0,2542,t=Smartphone&i=51537,00.asp  (accessed in...
  • Google, Nexus. http://www.google.com/nexus  (accessed in March...
  • Apple Inc., iphone. http://www.apple.com/iphone  (accessed in March...
  • Research in Motion Ltd., Blackberry. http://au.blackberry.com  (accessed in March...
  • Microsoft, Windows phone. http://www.windowsphone.com/en-gb  (accessed in March...
  • Apple Inc., Welcome to Apple store. http://store.apple.com/au  (accessed in March...
  • BlackBerry, Blackberry world. http://appworld.blackberry.com/webstore  (accessed in March...
  • Google, Google play. https://play.google.com/store  (accessed in March...
  • Google, Malware—what’s the policy?...
  • Y. Zhou, X. Jiang, Dissecting Android malware: characterization and evolution, in: Proceedings of the IEEE Symposium on...
  • M. Frank, B. Dong, A.P. Felt, D. Song, Mining permission request patterns from Android and Facebook applications, in:...
  • D. Barrera, H.G. Kayacik, P.C. van Oorschot, A. Somayaji, A methodology for empirical analysis of permission-based...
  • A.P. Felt, K. Greenwood, D. Wagner, The effectiveness of application permissions, in: Proceedings of the USENIX...
  • A.P. Felt, E. Ha, S. Egelman, A. Haney, E. Chin, D. Wagner, Android permissions: user attention, comprehension and...
  • P.H. Chia, Y. Yamamoto, N. Asokan, Is this app safe? A large scale study on application permissions and risk signals,...
  • International Secure Systems Lab. Andrubis: analyzing Android binaries. http://anubis.iseclab.org/?action=home...
  • VirusTotal, Credits & acknowledgements. https://www.virustotal.com/en/about/credits  (accessed in March...
  • Open Handset Alliance, Android. http://www.openhandsetalliance.com/android_overview.html  (accessed in November...
  • F. Ableson, Introduction to Android development. http://www.ibm.com/developerworks/library/os-android-devel  (accessed...
  • Google, Android SDK. http://developer.android.com/sdk/index.html  (accessed in December...
  • The Eclipse Foundation, Eclipse ide for java developers....
  • Android Open Source Project, Bytecode for the Dalvik virtual machine....
    Jia Rong, Ph.D. is a research associate at the School of Information Technology, Deakin University, Australia. Her research interests are data mining, multimedia data analysis, and technological applications to tourism and hospitality. She was awarded the Professor of Information Technology Award (2010) for the most academically outstanding Ph.D. student, School of IT, Deakin University, Australia.

    Shaowu Liu is a current Ph.D. candidate at Deakin University, Australia. He received the Bachelor of Computer Science with Honors degree from Deakin University in 2012. His research interests include data mining and machine learning.
