SFU student tracks chocolate with Instagram data

Saif Charaniya (pictured) was inspired by his sweet tooth to look at chocolate for his study. - Photo by Lisa Dimyadi

Saif Charaniya, an SFU student, is using his big-data skills to find out how people post about chocolate on Instagram, analyzing the content of posts and how it relates to geographical data.

Charaniya began the project as part of his CMPT 732 course on “Big Data.” He collected 9.4 million posts using the hashtag “#chocolate” over a one-year period. It took him around five weeks to gather all his data.

Charaniya used an application programming interface (API), provided by Instagram, to mine data from the app. An API, according to Charaniya, is “a set of commands and tools that someone designs for a specific task.” The interface lets the user enter a simple command and get the data back without having to carry out the complicated steps themselves.

Why look at chocolate? “I have a bit of a sweet tooth,” he said with a laugh. But Charaniya’s interest in chocolate extends beyond his tastebuds: “Chocolate is used everywhere. In every festive location, religious or secular, there is gifting or receiving of chocolate or of some kind of sweet or confectionery and it’s a global phenomenon.”

Some of the most significant findings came from narrowing the 9.4 million posts in the data set to the roughly 1.2 million that carried geolocation data through location tags. The city with the most posts about chocolate in the world was not Brussels or Los Angeles, as Charaniya had predicted, but Üsküdar, a municipality of Istanbul. São Paulo came second and New York third.
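The narrowing-and-ranking step described above can be sketched in a few lines of Python. This is only an illustration, not Charaniya's actual code: the record layout (a dictionary with an optional `city` field), the `top_cities` helper, and the sample posts are all assumptions made up for the example.

```python
from collections import Counter


def top_cities(posts, n=3):
    """Rank cities by post count, using only posts that carry a city tag.

    `posts` is assumed to be a list of dicts with an optional "city" key;
    this layout is hypothetical, not the format Instagram returns.
    """
    # Keep only posts with a non-empty location tag.
    geotagged = [p for p in posts if p.get("city")]
    # Count posts per city and return the n most frequent.
    return Counter(p["city"] for p in geotagged).most_common(n)


# Made-up sample records, not data from the study.
sample_posts = [
    {"caption": "#chocolate", "city": "Üsküdar"},
    {"caption": "#chocolate #nutella", "city": "Üsküdar"},
    {"caption": "#chocolate", "city": "New York"},
    {"caption": "#chocolate", "city": None},  # no location tag
    {"caption": "#chocolate"},                # field missing entirely
]

print(top_cities(sample_posts))
```

On the real data set, the same idea would cut 9.4 million posts down to the roughly 1.2 million with location tags before any ranking takes place.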

His study also returned interesting results about which brands were talked about most. “By far, Nutella is the most talked-about chocolate in the world,” said Charaniya. Mars and Oreos also made the top three, but these brands are talked about much less than Nutella, which Charaniya estimated holds a third of the “market share.” Of the data, he said, “[It] only tells you which chocolate people like to talk about. It doesn’t tell you which chocolates are inherently more popular.”

Charaniya used word clouds along with geodata to analyze how social media posts about chocolate differ by location. He noted that in North America and Australia city tags are very popular, while in Brazil the hashtag often names the type of chocolate being posted about. The approach has its challenges, however: many non-English-speaking countries use non-English hashtags, which were not included in Charaniya’s data.

The research on chocolate could have significant real-world implications. Charaniya said that knowing which brands people talk about the most, and where those people are located, can help companies better reach consumers in those locations. He also mentioned that by analyzing the performance of social media posts about chocolate, chocolate makers can find out how to better “engage their users.” Charaniya used the example of Snickers, which, despite a relatively small market share, averages more comments per post than its competitors.
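The difference between how often a brand is mentioned and how much engagement each mention draws can be shown with a short Python sketch. It is a rough illustration under stated assumptions: the `brand` and `comments` fields, the `brand_stats` helper, and the numbers are invented for the example and do not come from the study.

```python
from collections import defaultdict


def brand_stats(posts):
    """Return each brand's share of posts and its average comments per post.

    `posts` is assumed to be a list of dicts with "brand" and "comments"
    keys; both field names are hypothetical.
    """
    totals = defaultdict(lambda: [0, 0])  # brand -> [post count, comment count]
    for post in posts:
        totals[post["brand"]][0] += 1
        totals[post["brand"]][1] += post["comments"]
    n = len(posts)
    return {
        brand: {"share": count / n, "avg_comments": comments / count}
        for brand, (count, comments) in totals.items()
    }


# Made-up sample records, not data from the study.
sample_posts = [
    {"brand": "nutella", "comments": 2},
    {"brand": "nutella", "comments": 4},
    {"brand": "nutella", "comments": 0},
    {"brand": "snickers", "comments": 9},
]

stats = brand_stats(sample_posts)
print(stats)
```

In this toy sample one brand accounts for most of the posts while the other draws more comments per post, which is the kind of contrast Charaniya described for Snickers.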

Said Charaniya, “For someone who is using Instagram for the purpose of monetization, they need to understand when the best time and [what] the best use of words is to get their post to become optimal.”