
Abstract

In this project, I analyze customer needs through text mining of healthcare app reviews and, based on the results, propose a design strategy for healthcare apps. I collected 34,320 reviews from 10 healthcare apps in the Google Play Store and performed LDA topic modeling to analyze customer needs in depth.

Dataset

1. User reviews

Collected 34,320 reviews from 10 healthcare apps in the Google Play Store

Data collection method: crawling the reviews on the Google Play Store

2. Preprocessing

Two auxiliary lists are used for preprocessing:

  1. List for word substitution

Since LDA topic modeling builds its results from the most frequent vocabulary, unifying words with the same meaning into a single word is an effective way to perform semantic analysis on text. For example, ‘iPhone’ and ‘galaxy s8’ both refer to a smartphone. A human can tell that these words share a meaning, but the computer treats them as different words. As a result, a frequent concept can be missed as a keyword because its occurrences are split across several surface forms, each counted separately. Therefore, in text mining techniques where word frequency matters, such as LDA topic modeling, replacing words beforehand is one way to make the analysis more effective.
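
To make the effect on frequency counts concrete, the following minimal sketch counts words before and after unification. The three sample reviews and the `SYNONYMS` mapping are made up for illustration and are not taken from the project's replacement list.

```python
from collections import Counter

# Hypothetical mapping from device names to one unified word
SYNONYMS = {"iphone": "phone", "galaxy": "phone", "smartphone": "phone"}

# Made-up sample reviews, already lowercased
reviews = [
    "sync fails on my iphone",
    "sync fails on my galaxy",
    "sync fails on my smartphone",
]

def count_words(texts, mapping=None):
    """Count word frequencies, optionally unifying words through `mapping`."""
    mapping = mapping or {}
    counts = Counter()
    for text in texts:
        counts.update(mapping.get(word, word) for word in text.split())
    return counts

before = count_words(reviews)
after = count_words(reviews, SYNONYMS)

print(before["iphone"], before["galaxy"], before["smartphone"])  # 1 1 1
print(after["phone"])  # 3
```

Without unification, each device name is counted once and none stands out; after unification, the shared concept is counted three times and can surface as a keyword.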

  2. List for stopwords

A stopword is a word that appears frequently in the text but, like a function word, carries little information about the user’s reactions or opinions. Such words have nothing to do with user experience, so it is necessary to organize them carefully in the preprocessing stage.

LDA topic modeling concepts

Topic modeling is a text mining methodology that finds key topics in text-based document data. Latent Dirichlet Allocation (LDA) is the most representative topic modeling algorithm. Specifically, LDA is a probabilistic modeling technique that analyzes a large collection of documents to estimate which topics each document contains and in what proportion (Blei et al., 2003). Because it also reports which keywords make up each topic, it is well suited to deriving insights from keyword combinations. Recently, it has been actively applied in various fields, for example to automatically classify similar topics on social networking services (SNS) or to derive customer needs from airline online reviews (Lu et al., 2013; Kwon et al., 2021).
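
The probabilistic idea behind LDA can be illustrated with its generative story: each document draws a topic mixture from a Dirichlet prior, and each word is then drawn from one of the topics in that mixture. The sketch below is a toy illustration using only the standard library. The two topics, their word probabilities, and the `alpha` value are invented for the example, and the code generates a document rather than fitting a model.

```python
import random

# Invented topic-word distributions (each inner dict sums to 1)
TOPICS = {
    "billing": {"charged": 0.5, "refund": 0.3, "trial": 0.2},
    "tracking": {"steps": 0.5, "gps": 0.3, "count": 0.2},
}

def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(num_words, alpha, rng):
    """Generate a document: pick a topic per word, then a word from that topic."""
    names = list(TOPICS)
    mixture = sample_dirichlet(alpha, len(names), rng)  # per-document topic ratio
    words = []
    for _ in range(num_words):
        topic = rng.choices(names, weights=mixture)[0]
        vocab = TOPICS[topic]
        word = rng.choices(list(vocab), weights=list(vocab.values()))[0]
        words.append(word)
    return mixture, words

rng = random.Random(0)
mixture, words = generate_document(num_words=8, alpha=0.5, rng=rng)
print([round(p, 2) for p in mixture], words)
```

Fitting LDA runs this story in reverse: given only the words, it estimates the topic mixture of each document and the word distribution of each topic.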

LDA topic modeling visualization

In this project, since reviews are rated on a 5-point scale, I classified ratings of 4-5 as positive reviews and ratings of 1-2 as negative reviews. LDA topic modeling is performed and visualized separately for the positive and negative review groups, as shown in Figures 1 and 2 below.

image image

Code

1. Google drive mount

from google.colab import drive

drive.mount('/content/gdrive')
Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).

2. Package install and import

!pip install pyLDAvis==2.1.2
import numpy as np
import pandas as pd
import warnings # ignore warning msg
warnings.filterwarnings(action='ignore')
# Use NLTK
import nltk
import pickle
nltk.download('all')

from tqdm import tqdm # work process visualization
import re # Regular expression package for string
from gensim import corpora # word frequency counting package
import gensim #LDA
import pyLDAvis
import pyLDAvis.gensim
from collections import Counter

3. Load dataset

dataset_raw = pd.read_excel('/content/gdrive/MyDrive/NLP-final-project/dataset_raw.xlsx')
dataset_raw.head()
app review rating
0 FitCoach: Fitness Coach & Diet Its a nice app but so many features don't actu... 3
1 FitCoach: Fitness Coach & Diet Deceptive. Not at all as advertised. Annoying ... 1
2 FitCoach: Fitness Coach & Diet Not the app shown in the Facebook ads. I signe... 1
3 FitCoach: Fitness Coach & Diet Updated review. The app is good, the range of ... 4
4 FitCoach: Fitness Coach & Diet Was alright in the beginning. You can't change... 2

4. Data exploration

dataset_raw.info
  <bound method DataFrame.info of                                   app  \
0      FitCoach: Fitness Coach & Diet   
1      FitCoach: Fitness Coach & Diet   
2      FitCoach: Fitness Coach & Diet   
3      FitCoach: Fitness Coach & Diet   
4      FitCoach: Fitness Coach & Diet   
...                               ...   
34315    8fit Workouts & Meal Planner   
34316    8fit Workouts & Meal Planner   
34317    8fit Workouts & Meal Planner   
34318    8fit Workouts & Meal Planner   
34319    8fit Workouts & Meal Planner   

                                                  review  rating  
0      Its a nice app but so many features don't actu...       3  
1      Deceptive. Not at all as advertised. Annoying ...       1  
2      Not the app shown in the Facebook ads. I signe...       1  
3      Updated review. The app is good, the range of ...       4  
4      Was alright in the beginning. You can't change...       2  
...                                                  ...     ...  
34315        It's the perfect app for a perfect workout.       5  
34316      good app. i like the meal plans and workouts.       5  
34317  Easy and practical to use. Love the variety of...       5  
34318                            It helps ME and MY body       5  
34319           Good app so far with the first exercise.       5  

[34320 rows x 3 columns]>

5. Data preprocessing

1. Check for missing

dataset_raw.isnull().sum()
app       0
review    2
rating    0
dtype: int64

2. Remove missing values

# axis = 0: remove missing value's row
dataset = dataset_raw.dropna(axis = 0)
dataset.isnull().sum()
app       0
review    0
rating    0
dtype: int64

3. Load dictionary for preprocessing

stopword_list = pd.read_excel('/content/gdrive/MyDrive/NLP-final-project/stopword_list.xlsx')
stopword_list.tail()
stopword
147 really
148 great
149 nice
150 like
151 love
replace_list = pd.read_excel('/content/gdrive/MyDrive/NLP-final-project/replace_list.xlsx')
replace_list.head()
before_replacement after_replacement
0 cell phone phone
1 smartphone phone
2 iphone phone
3 galaxy phone
4 ipad phone

4. Word substitution

def replace_word(review):
    for i in range(len(replace_list['before_replacement'])):
        try:
            # Perform data replacement only when there is a word to be replaced
            if replace_list['before_replacement'][i] in review:
                review = review.replace(replace_list['before_replacement'][i], replace_list['after_replacement'][i])
        except Exception as e:
            print(f"Error: {e}")
    return review
dataset['review_prep'] = ''
review_replaced_list = []
for review in tqdm(dataset['review']):
    review_replaced = replace_word(str(review)).lower() #lower case
    review_replaced_list.append(review_replaced)
dataset['review_prep'] = review_replaced_list
dataset.head()
app review rating review_prep
0 FitCoach: Fitness Coach & Diet Its a nice app but so many features don't actu... 3 its a nice application but so many features do...
1 FitCoach: Fitness Coach & Diet Deceptive. Not at all as advertised. Annoying ... 1 deceptive. not at all as advertised. annoying ...
2 FitCoach: Fitness Coach & Diet Not the app shown in the Facebook ads. I signe... 1 not the application shown in the facebook ads....
3 FitCoach: Fitness Coach & Diet Updated review. The app is good, the range of ... 4 updated review. the application is good, the r...
4 FitCoach: Fitness Coach & Diet Was alright in the beginning. You can't change... 2 was alright in the beginning. you can't change...
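
One caveat with plain substring replacement is that a short term can also match inside a longer word, and a term written in a different case can be missed when replacement runs before lowercasing. The following sketch shows a whole-word, case-insensitive variant. `replace_whole_words` and the sample pairs are hypothetical and are not part of the original pipeline.

```python
import re

# Hypothetical replacement pairs, mirroring the two-column layout of replace_list
SAMPLE_PAIRS = {"app": "application", "iphone": "phone", "cell phone": "phone"}

def replace_whole_words(review, pairs):
    """Lowercase the review, then replace each term only where it is a whole word."""
    review = review.lower()
    # Longer terms first so multi-word terms are handled before their parts
    for before in sorted(pairs, key=len, reverse=True):
        pattern = r"\b" + re.escape(before) + r"\b"
        review = re.sub(pattern, pairs[before], review)
    return review

text = "Happy with the app on my iPhone"
print(text.lower().replace("app", "application"))  # substring match also hits "happy"
print(replace_whole_words(text, SAMPLE_PAIRS))
```

The word-boundary anchors (`\b`) leave "happy" untouched while still replacing the standalone "app".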

5. Remove non-English characters

review_removed = list(map(lambda review: re.sub('[^a-zA-Z ]', '', review), dataset['review_prep']))
dataset['review_prep'] = review_removed
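
As a quick illustration of what this pattern keeps and drops, the sketch below applies the same regular expression to a made-up review. Digits and punctuation are removed along with non-Latin characters, so a contraction such as "can't" becomes "cant".

```python
import re

def keep_letters_and_spaces(text):
    """Drop every character that is not an ASCII letter or a space."""
    return re.sub('[^a-zA-Z ]', '', text)

sample = "can't log in since v2.0!!"  # made-up review text
print(keep_letters_and_spaces(sample))  # cant log in since v
```

This is why tokens without apostrophes, such as "dont", can show up among the topic keywords.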

6. Separation of data based on rating

Google Play Store ratings range from 1 to 5. Therefore, in this project, ratings of 4-5 were classified as positive reviews and ratings of 1-2 as negative reviews, while 3-star reviews were left out. This separates positive from negative reviews related to the experience of using the service.

# Positive review (4, 5 out of 5)
review_pos = dataset[(4 == dataset['rating']) | (dataset['rating'] == 5)]['review_prep']
# Negative review (1, 2 out of 5)
review_neg = dataset[(1 == dataset['rating']) | (dataset['rating'] == 2)]['review_prep']
review_pos
3        updated review the application is good the ran...
13       great way to stay on track with lots of variet...
18       despite the negative reviews i read i find the...
20       update after posting review my next workout di...
21       i loved this application while using it it kic...
                               ...                        
34315    its the perfect application for a perfect workout
34316    good application i like the meal plans and wor...
34317    easy and practical to use love the variety of ...
34318                              it helps me and my body
34319      good application so far with the first exercise
Name: review_prep, Length: 21017, dtype: object

7. Tokenization

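Each review is split into word tokens with NLTK's word_tokenize. Nouns are key to understanding the context of a sentence and make frequent words easy to identify. Note, however, that the code below keeps every token and does not filter by part of speech, so the later steps operate on all remaining words rather than on nouns only.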

review_tokenized_pos = list(map(lambda review: nltk.word_tokenize(review), review_pos))
review_tokenized_neg = list(map(lambda review: nltk.word_tokenize(review), review_neg))

8. Remove stopwords

# Build the stopword set once so that each membership test is fast
stopword_set = set(stopword_list['stopword'])

def remove_stopword(tokens):
    review_removed_stopword = []
    for token in tokens:
        # Keep only tokens that have 2 or more characters and are not stopwords
        if 1 < len(token) and token not in stopword_set:
            review_removed_stopword.append(token)
    return review_removed_stopword
review_removed_stopword_pos = list(map(lambda tokens : remove_stopword(tokens), review_tokenized_pos))
review_removed_stopword_neg = list(map(lambda tokens : remove_stopword(tokens), review_tokenized_neg))
9. Select a specific range of reviews

In general, the longer the review, the more likely it is to contain user feedback, such as user experience or technical issues. However, very long reviews can make it difficult to identify topics or extract features from combinations of words in the review (Vasa et al., 2012). Therefore, in this project, only reviews with at least 3 and at most 15 tokens remaining after stopword removal were used for analysis.

MIN_TOKEN_NUMBER = 3 # Min
MAX_TOKEN_NUMBER = 15 # Max

def select_review(review_removed_stopword):
    review_prep = []
    for tokens in review_removed_stopword:
        if MIN_TOKEN_NUMBER <= len(tokens) <= MAX_TOKEN_NUMBER:
            review_prep.append(tokens)
    return review_prep

review_prep_pos = select_review(review_removed_stopword_pos)
review_prep_neg = select_review(review_removed_stopword_neg)

10. Check the preprocessing result

review_num_pos = len(review_prep_pos)
review_num_neg = len(review_prep_neg)
review_num_tot = review_num_pos + review_num_neg

print(f"Total: {review_num_tot}")
print(f"Positive Reviews: {review_num_pos}({(review_num_pos/review_num_tot)*100:.2f}%)")
print(f"Negative Reviews: {review_num_neg}({(review_num_neg/review_num_tot)*100:.2f}%)")
Total: 13167
Positive Reviews: 10306(78.27%)
Negative Reviews: 2861(21.73%)

6. LDA Topic Modeling

1. Hyperparameter settings

NUM_TOPICS = 10
# passes:  The same concept as the epoch, determining the number of times to train the model with the entire corpus
PASSES = 15

2. Model training

def lda_modeling(review_prep):
    # Word encoding and frequency counting
    dictionary = corpora.Dictionary(review_prep)
    corpus = [dictionary.doc2bow(review) for review in review_prep]
    # LDA model training
    model = gensim.models.ldamodel.LdaModel(corpus,
                                            num_topics = NUM_TOPICS,
                                            id2word = dictionary,
                                            passes = PASSES)
    return model, corpus, dictionary
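
For readers unfamiliar with the bag-of-words encoding, the sketch below mimics in plain Python what the dictionary and corpus steps produce: every distinct token gets an integer id, and each review becomes a list of `(token_id, count)` pairs. `build_vocabulary` and `to_bow` are hypothetical helpers written for illustration, and gensim may assign ids in a different order.

```python
from collections import Counter

def build_vocabulary(documents):
    """Assign an integer id to each distinct token, in order of first appearance."""
    token2id = {}
    for tokens in documents:
        for token in tokens:
            token2id.setdefault(token, len(token2id))
    return token2id

def to_bow(tokens, token2id):
    """Encode one tokenized document as sorted (token_id, count) pairs."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())

# Made-up tokenized reviews
docs = [["steps", "count", "steps"], ["count", "gps"]]
vocab = build_vocabulary(docs)
corpus_sketch = [to_bow(doc, vocab) for doc in docs]
print(vocab)          # {'steps': 0, 'count': 1, 'gps': 2}
print(corpus_sketch)  # [[(0, 2), (1, 1)], [(1, 1), (2, 1)]]
```

Word order is discarded in this representation, which is why LDA depends so heavily on the word frequencies prepared during preprocessing.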

3. Word composition output function by topic

def print_topic_prop(topics, RATING):
    topic_values = []
    for topic in topics:
        topic_value = topic[1]
        topic_values.append(topic_value)
    topic_prop = pd.DataFrame({"topic_num" : list(range(1, NUM_TOPICS + 1)), "word_prop": topic_values})
    topic_prop.to_excel('/content/gdrive/MyDrive/NLP-final-project/result/topic_prop_' + RATING +  '.xlsx')
    display(topic_prop)

4. Visualization function

def lda_visualize(model, corpus, dictionary, RATING):
    pyLDAvis.enable_notebook()
    result_visualized = pyLDAvis.gensim.prepare(model, corpus, dictionary)
    # pyLDAvis.display() only returns an HTML object; pass it to display() so it renders from inside a function
    display(pyLDAvis.display(result_visualized))
    # Save result
    RESULT_FILE = '/content/gdrive/MyDrive/NLP-final-project/result/lda_result_' + RATING + '.html'
    pyLDAvis.save_html(result_visualized, RESULT_FILE)

5. Modeling positive review topics

Using the model training function, the topic-specific word composition output function, and the visualization function defined above, I train and visualize one topic model for the positive reviews and another for the negative reviews. Each topic is printed with its top 10 constituent words (= NUM_WORDS).

model, corpus, dictionary = lda_modeling(review_prep_pos)
NUM_WORDS = 10
RATING = 'pos'
topics = model.print_topics(num_words = NUM_WORDS)
print_topic_prop(topics, RATING)
topic_num word_prop
0 1 0.094*"workout" + 0.066*"best" + 0.026*"fitnes...
1 2 0.122*"use" + 0.114*"easy" + 0.028*"simple" + ...
2 3 0.044*"weight" + 0.039*"add" + 0.020*"training...
3 4 0.056*"easy" + 0.046*"workouts" + 0.024*"amazi...
4 5 0.045*"workouts" + 0.028*"meal" + 0.025*"time"...
5 6 0.045*"workouts" + 0.043*"free" + 0.029*"worko...
6 7 0.022*"workout" + 0.022*"exercises" + 0.020*"g...
7 8 0.019*"get" + 0.013*"recipes" + 0.013*"music" ...
8 9 0.021*"fit" + 0.020*"update" + 0.016*"wish" + ...
9 10 0.051*"track" + 0.038*"keep" + 0.023*"keeps" +...
lda_visualize(model, corpus, dictionary, RATING)

6. Modeling negative review topics

model, corpus, dictionary = lda_modeling(review_prep_neg)
NUM_WORDS = 10
RATING = 'neg'
topics = model.print_topics(num_words = NUM_WORDS)
print_topic_prop(topics, RATING)
topic_num word_prop
0 1 0.020*"year" + 0.013*"lost" + 0.011*"app" + 0....
1 2 0.026*"sign" + 0.026*"google" + 0.015*"keeps" ...
2 3 0.028*"charged" + 0.027*"subscription" + 0.022...
3 4 0.020*"steps" + 0.016*"working" + 0.016*"data"...
4 5 0.036*"steps" + 0.015*"track" + 0.014*"count" ...
5 6 0.023*"use" + 0.017*"account" + 0.016*"dont" +...
6 7 0.039*"pay" + 0.022*"free" + 0.019*"money" + 0...
7 8 0.018*"keeps" + 0.013*"plan" + 0.013*"used" + ...
8 9 0.024*"update" + 0.019*"use" + 0.018*"time" + ...
9 10 0.035*"free" + 0.026*"download" + 0.022*"get" ...

6. How to interpret the results?

LDA topic modeling provides information on which keywords make up each topic and in what ratio. In other words, the user has to infer the specific content of a topic from its keywords. For example, a topic consisting of keywords such as ‘workouts’, ‘progress’, ‘easy’, and ‘plans’ is most likely related to an ‘exercise record’ feature. It is therefore important to identify which keywords a topic contains and in what ratio. With this in mind, I discuss how to interpret the data visualized through pyLDAvis effectively.
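
To work with these keyword ratios programmatically, the printed topic strings can be split into word and weight pairs. The sketch below assumes the `weight*"word" + ...` format seen in the topic tables of the previous section. `parse_topic` is a hypothetical helper, and the sample string is shortened from the first positive topic.

```python
import re

# Matches one term such as 0.094*"workout"
TERM_PATTERN = re.compile(r'([0-9.]+)\*"([^"]+)"')

def parse_topic(topic_string):
    """Split a printed topic string into (word, weight) pairs."""
    return [(word, float(weight)) for weight, word in TERM_PATTERN.findall(topic_string)]

sample_topic = '0.094*"workout" + 0.066*"best"'
pairs = parse_topic(sample_topic)
print(pairs)  # [('workout', 0.094), ('best', 0.066)]
```

With the pairs in this form, the keywords of a topic can be sorted, filtered, or compared across topics before interpreting them.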

1. Relevance

Relevance (λ) can be adjusted with the slider on the upper right of the figure below. Relevance is a parameter that balances how frequently a word occurs within a topic against how frequently it occurs in the entire corpus. That is, when a word appears frequently in a specific topic, relevance helps to distinguish whether it does so because it is a keyword that sets the topic apart from other topics, or simply because it is a word used widely across all documents.

image

Figure 1. Topic modeling visualization results for positive reviews.

The Relevance value lies between 0 and 1. The closer it is to 0, the more the ranking favors words that distinguish the topic from other topics, even if they occur rarely in the entire corpus. Conversely, the closer it is to 1, the more the ranking follows how often a word occurs within the topic, so words that are common throughout the corpus tend to rise to the top. For example, in healthcare app review data, the word ‘exercise’ is very likely to appear in many reviews, so it is difficult to distinguish one topic from another by this word alone. In such cases, setting the Relevance close to 0 penalizes ‘exercise’ because it is used widely throughout the corpus rather than concentrated in that topic. In the study by Sievert & Shirley (2014), a Relevance value of about 0.6 was found to be the most effective. However, this value is not always the best choice, because the optimal Relevance value may differ depending on the research domain, dataset, and so on.
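
Sievert & Shirley (2014) define relevance as λ · log p(w | topic) + (1 − λ) · log(p(w | topic) / p(w)). The sketch below applies that formula to invented probabilities to show how the ranking changes with λ. The numbers are for illustration only and do not come from the review data.

```python
import math

def relevance(p_word_given_topic, p_word, lam):
    """Relevance of a word to a topic for a weight `lam` between 0 and 1."""
    lift = p_word_given_topic / p_word
    return lam * math.log(p_word_given_topic) + (1 - lam) * math.log(lift)

# Invented probabilities: word -> (p(word | topic), p(word) in the whole corpus)
words = {
    "exercise": (0.10, 0.09),  # frequent everywhere
    "refund": (0.04, 0.005),   # rarer overall, concentrated in this topic
}

def rank(lam):
    """Return the words ordered from most to least relevant."""
    return sorted(words, key=lambda w: relevance(*words[w], lam), reverse=True)

print(rank(1.0))  # ['exercise', 'refund']
print(rank(0.0))  # ['refund', 'exercise']
```

At λ = 1 the globally common word ranks first, while at λ = 0 the word concentrated in the topic takes over, which matches the slider behavior described above.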

2. Topics and keywords

Each circle on the left of the figure below represents a topic. The distance between circles indicates how similar the topics are to each other. A larger circle means that the topic accounts for more of the words (= tokens) in the corpus. If you hover your mouse over a circle, the topic’s keywords are displayed on the right, together with the share of all tokens that belongs to the topic. By identifying which words make up each topic and in what ratio, you can infer the subject of each topic, and by comparing the topics’ shares you can see which topics make up the corpus and in what ratio (= importance).

image

Figure 2. Topic modeling visualization results for negative reviews.

7. Insights

In this section, I analyze user needs based on the visualization results.

1. Positive review analysis

First, the results of topic modeling of positive reviews are as follows.

image

  • Exercise action description function

The words ‘useful’, ‘accurate’, ‘easy’, ‘simple’, ‘steps’, and ‘follow’ appear frequently in topic 2. This suggests that content explaining exercise movements step by step received positive reviews. Therefore, when planning a healthcare app service, video-based exercise instruction content is worth considering.

  • Meal record feature

In topic 5, words such as ‘meal’, ‘track’, ‘schedule’, and ‘plans’ appeared frequently. This suggests that a meal record function, such as taking a picture of a meal and saving it, had a positive effect on managing diet. Computer vision technology can analyze what and how much was eaten from food photos. Such technology is expected to provide a positive user experience in terms of convenience by reducing the hassle of recording dietary information item by item.

  • Exercise record feature

In topic 10, words such as ‘workouts’, ‘progress’, ‘motivated’, ‘steps’, ‘track’, and ‘helps’ appeared frequently. This suggests that planning exercise through an exercise log and checking the amount of exercise through measurement had a positive effect on users’ interest in and motivation for exercise. Adding an exercise log feature that helps users record the amount and type of their exercise is therefore expected to promote both regular exercise and continued use of the app.

2. Negative review analysis

image

  • Automatic paid subscription complaint

The words ‘charged’, ‘refund’, ‘subscription’, ‘trial’, ‘cancel’, and ‘free’ appear frequently in topic 3. These complaints stem from the operating model of some healthcare apps, which provide free services for the first 1 to 3 months and then switch to a paid subscription after the trial period without asking for the user’s additional consent. Many reviews about this policy request immediate cancellation of the subscription and a refund. Therefore, a healthcare app should ask the user again before switching to a paid subscription, and should charge for and provide the paid service only when the user agrees.

  • Exercise tracking accuracy issue

The words ‘steps’, ‘track’, ‘count’, ‘accurate’, ‘gps’, ‘distance’, ‘mile’, ‘work’, and ‘wearable’ appear frequently in topic 5. There is a lot of negative feedback about accuracy issues in exercise tracking, such as step counting. For example, the app reports 10,000 steps while the user’s smartwatch counts only 5,000. Because of such inaccurate workout tracking, users may lose trust in the service as a whole. Therefore, when designing a healthcare service, it is necessary to continuously improve the accuracy of exercise tracking not only in the app but also on wearable devices.
