As Q.Lab interns, we joined the Qrious data science team and focused on a variety of interesting and innovative AI projects, in areas such as natural language processing (NLP), machine learning and deep learning (e.g. computer vision). The interns were divided into teams, and each team picked two or three projects to start with. With a mentor per project, we worked to bring each to life.
Our team’s first project sought to analyse how people felt about a major brand online. The client wanted to understand the overall sentiment towards their brand and which aspects of it people discussed on social media. This knowledge could help the brand make business and marketing decisions that served customers better.
Businesses are constantly looking for ways to gain a competitive advantage in the market, and the ability to listen deeply to customers and make decisions based on their feedback is a step towards this. Customer feedback on social media has great potential as a source of useful, nuanced business insights, but it is large in volume and, as unstructured data (i.e. data that’s not in a traditional row-column database), complex to analyse.
We knew gathering social media insights would be a challenge. We first had to find and grapple with large amounts of unstructured data, then find a standardised way of probing it for insights. With a blank slate and guidance from our mentor, our team came up with a rough process for this analysis: we’d gather comments about the brand from different online platforms and summarise what they said using a combination of AI and human interpretation.
Fortunately, a subset of AI called Natural Language Processing (NLP) exists for this purpose. NLP teaches computers to recognise and interpret human language, and its tools and techniques were what we needed to start. It helps businesses understand customers better and arrive at insights more quickly and efficiently.
We first sought to understand the major topics or themes that emerged from the discussions about the brand online. Doing this would give us context for framing further analyses. We used an NLP technique called Topic Modelling for this task, which extracts important topics from a sea of textual data.
Speaking of textual data, we collected it in large volumes for this project. We gathered tens of thousands of customer comments about the brand, posted on various online platforms over the previous year, to make our analyses more comprehensive. Machine learning algorithms need numerical inputs, so textual comments can’t be fed into them directly; we had to process and convert the comments accordingly.
In our processing pipeline, we followed NLP techniques that involved breaking paragraphs down into sentences, then those into words, and removing duplicated, unwanted, or filler words. The cleaned dataset was then converted into numbers and run through a topic modelling algorithm, which highlighted the most important and most frequently discussed topics.
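To make the pipeline a little more concrete, here is a minimal sketch of the cleaning and "convert to numbers" steps using only the Python standard library. It is an illustration rather than the code we ran: the stop-word list, the function names, and the sample comments are all made up for this example, duplicates are dropped at the comment level as a simplifying assumption, and the numeric form shown is a plain bag-of-words count matrix.

```python
import re
from collections import Counter

# Illustrative filler-word list (an assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "to", "of", "i", "my", "was", "so", "this"}


def split_sentences(text):
    """Split a comment into sentences on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lower-case a sentence and keep only alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def preprocess(comment):
    """Comment -> sentences -> words, with filler (stop) words removed."""
    tokens = []
    for sentence in split_sentences(comment):
        tokens.extend(t for t in tokenize(sentence) if t not in STOP_WORDS)
    return tokens


def bag_of_words(comments):
    """Return (vocabulary, matrix); each matrix row counts vocabulary terms in one comment."""
    unique_comments = list(dict.fromkeys(comments))  # drop exact duplicate comments, keep order
    docs = [preprocess(c) for c in unique_comments]
    vocabulary = sorted({t for doc in docs for t in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical sample comments, not taken from the client data.
comments = [
    "The delivery was fast. I love the new app!",
    "The delivery was fast. I love the new app!",  # exact duplicate, removed
    "My delivery was late and the app crashed.",
]
vocabulary, matrix = bag_of_words(comments)
print(vocabulary)
print(matrix)
```

A matrix like this, with one row per comment and one column per word, is the kind of numerical input a topic modelling algorithm can then work on.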
Once we had extracted the major topics about the brand, the next step was to interpret them by asking how people felt about them. We found the answer by running the topics through a standard but useful NLP technique called Sentiment Analysis. As the name suggests, this technique summarises the sentiment (positive, negative, or neutral) of a collection of textual data. The results often come as the percentage of the data that falls under each sentiment. For example, a discussion about a product going on sale might show a higher percentage of positive sentiment, while a discussion about a product’s price increasing might show a higher percentage of negative sentiment.
This analysis has limitations, however. Although it gives insight into what customers generally think about the brand, teasing out specific customer emotions (e.g. worry, fear, happiness, disgust) about the brand’s topics would make the insights more nuanced and valuable. To take this further, we ran emotional sentiment analysis on our data and found, for example, that customers felt high levels of trust and anticipation towards the topics we had extracted earlier.
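As a rough illustration of how a sentiment percentage can be produced, the sketch below labels each comment with a tiny word list and reports the share of each label. This is a deliberately simple lexicon-based approach chosen for the example, not necessarily the method used in the project; the word lists, function names, and comments are all hypothetical.

```python
from collections import Counter

# Tiny illustrative lexicons (assumptions, not the ones used in the project).
POSITIVE = {"love", "great", "fast", "cheap", "sale", "happy"}
NEGATIVE = {"hate", "slow", "expensive", "late", "crashed", "increase"}


def classify(comment):
    """Label a comment by comparing its counts of positive and negative words."""
    words = [w.strip(".,!?") for w in comment.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_breakdown(comments):
    """Return the percentage of comments carrying each sentiment label."""
    labels = Counter(classify(c) for c in comments)
    total = len(comments)
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    return {label: 100 * labels[label] / total for label in ("positive", "negative", "neutral")}


# Hypothetical sample comments.
comments = [
    "Love the sale, great prices!",
    "The price increase is too expensive.",
    "I ordered it on Tuesday.",
    "Fast delivery, happy customer.",
]
breakdown = sentiment_breakdown(comments)
print(breakdown)  # {'positive': 50.0, 'negative': 25.0, 'neutral': 25.0}
```

A word-counting approach like this misses sarcasm and negation ("not great"), which is one reason trained models are usually preferred for real data.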
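One simple way to picture emotional sentiment analysis is as a lookup from words to emotions, tallied per topic. The sketch below assumes a small, made-up word-to-emotion lexicon and hypothetical topics and comments; it shows the idea only and is not the approach or the data from the project.

```python
from collections import Counter

# Hypothetical word-to-emotion lexicon, for illustration only.
EMOTION_LEXICON = {
    "reliable": "trust",
    "secure": "trust",
    "excited": "anticipation",
    "upcoming": "anticipation",
    "worried": "fear",
    "awful": "disgust",
    "delighted": "joy",
}


def emotion_profile(comments_by_topic):
    """Count emotion-bearing words for each topic."""
    profile = {topic: Counter() for topic in comments_by_topic}
    for topic, comments in comments_by_topic.items():
        for comment in comments:
            for word in comment.lower().split():
                emotion = EMOTION_LEXICON.get(word.strip(".,!?"))
                if emotion:
                    profile[topic][emotion] += 1
    return {topic: dict(counts) for topic, counts in profile.items()}


# Hypothetical topics and comments.
comments_by_topic = {
    "pricing": ["Worried about the upcoming price change.", "Their billing is reliable."],
    "app": ["Excited for the upcoming release!", "The app feels secure and reliable."],
}
profile = emotion_profile(comments_by_topic)
print(profile)
```

Grouping the counts by topic is what lets a statement like "customers feel trust towards this topic" be read straight off the result.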
In this project, we explored the rapidly advancing field of NLP and, looking up from the mass of academic research, implemented NLP techniques to solve a real-world problem. The learning curve was steep and we had to grasp the tools faster than usual, but the personal growth that came out of it was massive.
Although we focused on social media comments for this project, we realised the same approach applies to other sources of customer feedback as well, such as:
Including these would not only improve the accuracy of the algorithms used to analyse our data, but also make the interpreted results more comprehensive.
Finally, we had to ‘deploy’ our project, and our team did this in multiple ways. One was through frequent sprint meetings, where we relayed our progress to the wider team. Another was through a web dashboard we built to visualise our results, which the client can eventually access. A blog like this serves as a third medium, reaching anyone who may be interested in this space.
This ties into our final learning, which concerns the importance of collaboration and communication in the field of AI and Data Science. Our team spanned two locations (Wellington and Auckland) and brought together people with different backgrounds (software engineering, analytics, and biomedical science) and levels of familiarity with NLP concepts. We felt our differences helped us ideate better, fill in gaps that came up, and learn from each other more effectively.
This is how technology advances: through the joint effort of a lot of people who care about the same cause. And of course, we know our success was in large part due to the generous coworkers and mentors who did the same for us!
Dhilip, Nicole and Will are Qrious data science interns and members of Q.Lab, our innovation hub.