1 Introduction

Over the last decade, a significant amount of research has investigated the effectiveness of mobile learning apps in school education [1]. Over the same period the number of apps has increased exponentially, and millions of educational apps are now available to educators and students. However, with this overload of choice and the increasing speed of technological development, much of which is driven by corporate markets rather than education [2], it has become challenging for teachers to efficiently select an app that best supports appropriate learning activity types, assessment strategies, and associated pedagogical preferences [3]. To add to these challenges, the majority of apps in repositories such as the iTunes Store are ‘drill and practice’ or ‘instructive’ in nature [4] and are underpinned by behaviourist principles, essentially replicating traditional transmissionist approaches to learning [5,6,7]. As mobile apps develop and proliferate, the challenge for educators is to move beyond the hype and rhetoric [8] and focus on new mobile pedagogical opportunities with apps.

App stores often provide a facility for user feedback (comments and ratings), which helps teachers select apps and helps app developers improve their designs. These customer ratings and reviews play a critical role in the mobile app market and directly influence app downloads. User feedback has already been used by practitioners and app developers as a source of information for activities such as app selection, gauging customer satisfaction, versioning, and bug reporting [9,10,11]. However, the main challenge is processing and synthesising this feedback into useful information. Given the volume of reviews, analysing every review manually is laborious and time consuming.

Sentiment analysis is an automated approach that aims to determine the polarity of sentiments and emotions within large textual datasets [12]. This approach is used to develop tools for calculating and monitoring the attitudes and behaviour of app users from their feedback, comments and reviews in online social media and app review sites [10]. Sentiment analysis tools are powerful aids for app ranking and selection; however, they have so far been underutilized in the education domain.

In this paper, we present the results of our preliminary investigation exploring the utility of a new technique for evaluating the pedagogical affordances of educational apps. The feedback and comments of app users are assessed for their alignment against evaluation criteria from a well-accepted, rigorous mobile pedagogical framework [13]. This framework focuses on three distinctive mobile pedagogies: personalization, authenticity, and collaboration. The objective of our research is to explore the utility of our novel technique incorporating sentiment analysis and informed by the m-learning pedagogical framework [13].

The main contributions of this research are: (1) Feature based sentiment analysis using the three mobile pedagogical constructs i.e. personalization, authenticity, and collaboration; and (2) initial confirmation of the usefulness of sentiment analysis for evaluating apps in education.

2 Background

2.1 Mobile Learning

Mobile learning (or m-learning) is described in numerous ways, but these descriptions all consider the nexus between working with mobile devices and the occurrence of learning: the process of learning mediated by a mobile device. Numerous characteristics of m-learning have been identified in the literature [14].

Over the last decade, a significant number of initiatives have been launched that aim to fully utilize and exploit mobile technologies and apps for educational purposes [15]. There is evidence that m-learning environments enhance students’ performance [16, 17]. However, increased use of mobile devices does not imply their effective incorporation into educational policies. In practice, mobile devices are not effectively utilised in formal education [18, 19], for a variety of reasons [17, 20].

Various lists of recommended educational apps are available online [21, 22]. These lists are limited because they do not give teachers and students a pedagogical understanding of how an app could be used to support teaching and learning. Therefore, they are not sufficiently practical to facilitate strong instructional planning and implementation [21]. This has resulted in a pressing need for an evaluation framework or rubric that facilitates the analysis of the pedagogical affordances of educational apps.

2.2 A Pedagogical Framework for m-Learning: iPAC

Numerous frameworks have been proposed in the literature, ranging from complex multi-level models (e.g. [23]) to smaller frameworks that often omit important socio-cultural characteristics of learning or of pedagogy. Common themes include portability of m-learning devices and mobility of learners; interactivity; control and communication. The theoretical underpinning for the work described in this paper is a robust and validated mobile pedagogical framework [13]. Informed by sociocultural theory [24], it highlights three central and distinctive pedagogical features of m-learning: personalisation, authenticity and collaboration (or ‘PAC’). The critical influence of context is signalled by the central location of ‘time-space’ at the core of the ‘iPAC’ framework, as depicted in Fig. 1.

Fig. 1. The mobile pedagogical framework (iPAC) comprising three distinctive features of mobile learning experiences. Adapted from ([13], p. 8)

The personalisation construct consists of the sub-constructs of ‘agency’ and ‘customisation’. High levels of personalisation would mean the learner is able to enjoy an enhanced degree of agency [25] and the flexibility to tailor both tools and activities, interacting with a strong sense of ownership of both the device and the learning process. The authenticity construct privileges opportunities for in-situ, participatory learning [26]. The sub-constructs of ‘task’, ‘tool’ and ‘setting’ focus on learners’ involvement in rich, contextualised tasks, making use of tools in a realistic way, and driven by relevant real-life practices and processes [27]. The collaboration construct captures the conversational, networked features of m-learning. It consists of ‘conversation’ and ‘data sharing’ sub-constructs, as learners engage in negotiated meaning-making, forging connections and interactions with peers, experts and the environment [28]. This iPAC framework provides a useful lens to analyse mobile apps and how use of their features might leverage mobile pedagogies in a range of learning environments.

The iPAC framework has recently been used to inform research on m-learning in school education [29], teacher education [30, 35], indigenous education [31] and other areas of higher education [32]. For example, Viberg and Grönlund [33] used the framework to develop a survey instrument for eliciting students’ attitudes toward mobile technology use in and for second and foreign language learning in higher education.

2.3 Feature Based Sentiment Analysis

Sentiment analysis is used to analyse human opinions, sentiments, judgements, reviews and behaviours about many aspects of life such as products, business, people, problems, subjects and their features [12]. It aims to calculate the polarity of emotions in textual data by identifying the positivity or negativity of a statement. Sentiment analysis is one of the most widely used evaluation techniques around the world, helping companies to improve their products based on customers’ feedback. App stores provide users with facilities to submit their feedback and rate apps with stars [10]. Companies use this data to monitor app users’ behaviours and sentiments.

Feature-based sentiment analysis is a specific type of sentiment analysis which aims to capture nuances about objects of interest. Different features of a product can generate different sentiments; for example, a mobile phone can have a user-friendly interface but very poor battery life. Handling this scenario requires identifying relevant entities of interest, extracting these features from the data, and determining whether the opinion expressed on each feature is positive, negative or neutral.
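The three steps named above (identify features, extract them, assign a polarity to each) can be illustrated with a minimal sketch. The feature list and opinion lexicons below are hypothetical toy examples chosen for the mobile phone scenario; they are not taken from any tool used in this study, and a production system would use far richer lexicons or a trained model.

```python
import re

# Hypothetical toy lexicons, for illustration only.
FEATURES = {"interface", "battery"}
POSITIVE = {"friendly", "great", "good", "excellent"}
NEGATIVE = {"low", "poor", "bad", "short"}


def feature_sentiments(text):
    """Return {feature: 'positive' | 'negative' | 'neutral'} for one review."""
    results = {}
    # Split into clauses so each opinion attaches to the feature in its clause.
    for clause in re.split(r"\bbut\b|[.;,]", text.lower()):
        words = re.findall(r"[a-z]+", clause)
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        for w in words:
            if w in FEATURES:
                if score > 0:
                    results[w] = "positive"
                elif score < 0:
                    results[w] = "negative"
                else:
                    results[w] = "neutral"
    return results


review = "The phone has a user-friendly interface but the battery life is very low."
print(feature_sentiments(review))
```

Splitting on "but" is what lets a single review yield one positive and one negative judgement; a whole-review polarity score would average the two and lose that nuance.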

3 Study Design

The investigation was carried out in the context of school-based mathematics and science education for two main reasons. Firstly, we are currently conducting a larger research project on the effectiveness of mobile apps for science and mathematics in school education, driven by the strong ‘political will’ in many countries to improve maths and science learning and to build the capability of the workforce for future job markets. Secondly, there is currently a burgeoning interest in STEM education (see, for example, the major recent reviews of m-learning research in both science [17] and mathematics [16] education). The investigation was quantitative in nature and addressed the key question: What is the utility of feature based sentiment analysis for evaluating the mobile pedagogical affordances of educational apps?

The steps in this investigation were as follows:

  1. We selected ten popular discipline-specific education apps (5 science and 5 mathematics) suitable for school students, as described in Table 1. The apps were chosen based on popularity in various forums and blogs.

     Table 1. Selected apps for investigation (based on popularity in various forums and blogs)

  2. We used a commercial sentiment analysis tool, Appbot (https://appbot.co/), that extracts user reviews and ratings from the app stores and supports both qualitative and quantitative analysis. Other similar sentiment analysis tools could provide the same functionality, but we chose Appbot because it provides functions to search within reviews for specific words (i.e. feature extraction) and also allows filtering for relevant reviews that match particular concepts or words.

  3. We developed a word bank based on words in the literature associated with the three main constructs of the iPAC framework. Figure 2 shows sample words from these word banks.

     Fig. 2. Word clouds relating to the three constructs of the iPAC framework [13]

  4. After using the word bank for feature extraction on all ten selected apps, we used Appbot to analyse the extracted reviews for the polarity of the sentiments. To limit the scope of our investigation, we collected reviews covering a period of one year, i.e. from January 2016 to January 2017.
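The word-bank filtering in steps 3 and 4 can be sketched as follows. This is only an illustration of the idea, not Appbot's implementation: the word bank entries are hypothetical stand-ins (the actual words in Fig. 2 are not reproduced here), the sample reviews are invented, and the exact start and end days of the collection window are assumptions.

```python
import re
from datetime import date

# Hypothetical word bank; the study's actual words (Fig. 2) are not reproduced here.
WORD_BANK = {
    "personalisation": {"customise", "choice", "control", "own"},
    "authenticity": {"realistic", "real", "practical"},
    "collaboration": {"share", "together", "friends"},
}

# Assumed window boundaries for "January 2016 to January 2017".
START, END = date(2016, 1, 1), date(2017, 1, 31)


def match_constructs(review_text):
    """Return the set of iPAC constructs whose word bank overlaps the review."""
    words = set(re.findall(r"[a-z]+", review_text.lower()))
    return {c for c, bank in WORD_BANK.items() if words & bank}


def extract(reviews):
    """Group in-window reviews by every construct they match."""
    grouped = {c: [] for c in WORD_BANK}
    for r in reviews:
        if START <= r["date"] <= END:
            for c in match_constructs(r["text"]):
                grouped[c].append(r)
    return grouped


# Invented sample reviews, for illustration only.
reviews = [
    {"date": date(2016, 3, 2), "text": "I can share my results with friends"},
    {"date": date(2016, 7, 15), "text": "Love that I control the pace, feels like a real lab"},
    {"date": date(2015, 12, 1), "text": "Great to share"},  # outside the window
]
grouped = extract(reviews)
print({c: len(rs) for c, rs in grouped.items()})
```

A single review can match more than one construct, so the per-construct counts need not sum to the number of reviews. The reviews grouped this way would then be passed to the sentiment step.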

4 Results

Table 2 shows, for each app, the total number of reviews, the percentage of positive sentiments, the average star rating and the overall score. These scores range from D− to A+ and are calculated based on the trends in the review sentiments, review volume, and star ratings as expressed by the app users.

Table 2. Results from sentiment analysis of selected maths and science apps

In the domain of mathematics, MalMath and Myscript Calculator are at the top of the list, whereas for science, Little Alchemy and NASA have received the highest number of reviews. MalMath and Little Alchemy have received a significant number of reviews from users, and above 90% of these reviews contained positive sentiments. Next, we extracted the data based on the aforementioned word bank. Table 3 shows the number of reviews that matched the iPAC word bank. Table 4 shows the breakdown of the sentiments for the extracted reviews for the apps.

Table 3. Feature based sentiment analysis
Table 4. Breakdown of feature based sentiment analysis (P = positive, Nt = Neutral, Ng = Negative)
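A breakdown of the kind summarised in Table 4 could be tallied as in the sketch below, assuming each extracted review has already been labelled with a construct and a polarity (P, Nt or Ng). The function name and the sample labels are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def breakdown(labelled):
    """Summarise (construct, polarity) pairs as percentages per construct.

    Polarity codes follow the table legend: 'P', 'Nt', 'Ng'.
    """
    counts = {}
    for construct, polarity in labelled:
        counts.setdefault(construct, Counter())[polarity] += 1
    table = {}
    for construct, c in counts.items():
        total = sum(c.values())
        row = {p: round(100 * c[p] / total, 1) for p in ("P", "Nt", "Ng")}
        row["n"] = total  # keep the review count alongside the percentages
        table[construct] = row
    return table


# Invented labels, for illustration only.
sample = (
    [("Personalisation", "P")] * 3
    + [("Personalisation", "Ng")]
    + [("Collaboration", "Ng")] * 2
)
print(breakdown(sample))
```

Reporting the count `n` next to the percentages matters here, because a construct with only a handful of matching reviews can show 100% negative sentiment without that figure being meaningful.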

Overall, a nuanced picture of pedagogical affordances emerges for these ten sample apps. MalMath produced considerably more positive sentiments for Personalisation and Authenticity; however, few reviews of the app related to Collaboration. Myscript Calculator generated positive sentiments for the Personalisation and Authenticity constructs, but received only negative reviews for Collaboration. Little Alchemy generated predominantly positive reviews for app features relating to Personalisation and some positive reviews for Collaboration, but received a low volume of reviews with mixed sentiments for Authenticity. NASA received positive sentiments for Personalisation but less favourable sentiments for Collaboration and Authenticity (Fig. 3).

Fig. 3. Graphical representation of the feature based sentiment analysis results.

5 Discussion

The results provide evidence of which mobile pedagogical features of apps users are choosing to comment on in their reviews, without any prompting from a rubric or other more formal evaluation instrument. There was a trend in our results showing higher frequencies of positive sentiments relating to the Personalisation aspect of the selected apps. Past work has revealed that, of the three iPAC constructs, Personalisation is the least exploited by teachers in their mobile learning task designs [29], with teachers evidently struggling to give learners opportunities to control their learning (e.g. the pace of lessons and how m-learning tasks are undertaken). So, in some ways the result from the present study is surprising. However, assuming that our results comprised comments mostly from students and teachers, and given that our study only used recent app reviews (1 year), perhaps the ‘struggle’ discussed in [29] is the very reason that users ‘noticed’ these pedagogical affordances (relating to personalisation), i.e. app reviewers were mindful of their previous mobile learning experiences, which quite likely lacked a sense of learner control. This claim is entirely speculative and future research will need to triangulate these findings. This triangulation can be performed in two ways: (1) interviews and surveys of app users, and (2) qualitative analysis of the text of the reviews and feedback from app stores and social media (e.g. Facebook or Twitter).

It is too early to draw conclusions about ‘low volumes’ of comments, other than that users were not choosing to comment on such features. For example, just because users were evidently not frequently commenting on (or not ‘noticing’) app features relating to Collaboration does not mean that these features are absent. Further research is needed to clarify the exact implications of low and high frequencies of sentiments. Informed by these results, we posit that the sentiment analysis technique is a novel and effective augmentation of other more traditional app evaluation procedures. This type of innovative, two-tiered evaluation procedure will ultimately help educators (and app designers) to more accurately evaluate the pedagogical potential and value of education apps.

We also recognise the risk of deterministic views of emerging technologies [34] such as mobile apps, and we are certainly not advocating a ‘one-size-fits-all’ approach to selecting and using educational apps. There are many other factors (beyond pedagogical approaches) that contribute to the effective use of apps for learning, such as teacher expertise, student characteristics and the provision of technical support. However, there is value in teachers using procedures such as the one outlined in this paper to critically examine app features and their potential for leveraging transformational pedagogies [4] in and beyond the classroom.

6 Conclusion and Future Directions

In this paper, we have presented a preliminary investigation exploring the utility of a new technique for evaluating the pedagogical affordances of educational apps. This technique uses a feature based sentiment analysis approach to extract the feedback and comments of app users. We have used the results of the sentiment analysis to assess their alignment against evaluation criteria from a rigorous mobile pedagogical framework (iPAC) [13]. The main objective of our research was to explore the utility of our novel technique incorporating sentiment analysis and informed by the iPAC pedagogical framework. The preliminary results of our investigation have provided empirical evidence that sentiment analysis is an effective way of incorporating the opinions of past users of educational mobile apps alongside the iPAC framework. These forms of feedback are very useful for the authors who developed the iPAC framework [13] in their ongoing studies of the usefulness and utility of this framework.

In relation to threats to validity, we concede that the precision of our results is affected by the accuracy of the word bank and the limitations of the sentiment analysis tool (Appbot): the words in the word bank may not have matched the words used in the reviews, even where these were synonyms. We plan to extend our study to include deeper semantic analysis of the textual content of the reviews by using cutting edge Natural Language Processing technologies as well as newly emerged algorithms for opinion mining. Another future direction is the integration of the sentiment analysis technique with the recently developed rubric instrument for app evaluation emerging from the iPAC framework. It is recommended that teachers should ideally use this instrument after thoroughly exploring an app and, if possible, after using the app in their teaching. We plan to design a software tool that would seamlessly extract the sentiments of past users of an app to provide additional information about the app within this type of rubric instrument.
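The word-matching limitation noted above can be made concrete with a small sketch. It compares exact word matching with a crude suffix-stripping match; the stemmer is a deliberately simple, hypothetical stand-in for a real one, the word bank and review are invented, and the approach only handles inflected forms, not true synonyms, which would need a thesaurus or embedding-based method.

```python
import re


def tokens(text):
    """Lowercase alphabetic tokens of a review."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Crude suffix stripping; an illustrative stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def exact_match(review, bank):
    """True if any review word appears verbatim in the word bank."""
    return bool(set(tokens(review)) & bank)


def stem_match(review, bank):
    """True if any review word shares a stem with a word-bank entry."""
    return bool({stem(w) for w in tokens(review)} & {stem(b) for b in bank})


# Invented example, for illustration only.
bank = {"share", "collaborate"}
review = "We shared our answers and collaborated on the last puzzle"
print(exact_match(review, bank), stem_match(review, bank))
```

In this example the exact match misses a review that is clearly about collaboration, while the stem-based match catches it, which suggests how word-bank counts could understate the volume of relevant reviews.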