Most people think of social media outlets (Twitter, Instagram, Vine, and so on) as an "in the moment" type of media. What does that mean? Well, think about this: You walk into your favorite store and encounter one of the most unprofessional sales assistants you've ever come across. What do you do if you're active in social media outlets like Twitter? You open your smartphone and tweet something like this:
Worst customer service I’ve ever encountered shame on you #company
There are times when we feel compelled to express our thoughts and feelings in this fashion, and more often than not, someone listening to our rants responds with a sense of sympathy or consolation. By "in the moment," we're referring to being mindfully aware of what is going on right here and now.
This ability to immediately share our thoughts is one of the great powers of social media. This power, in turn, can be leveraged to derive business value. For example, we can monitor the pulse of a community of individuals pertaining to a specific event that is occurring right now. Trending topics are those being discussed more than others; these topics are being talked about (on Twitter or any other social media venue, for example) more right now than they were previously. This is the reason that Twitter lists the trending topics directly on its site. This list allows users to see what a large group of individuals is talking about. In addition to identifying trends, we are able to perform additional analytics to derive insights from such conversations.
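To make this concrete, here is a minimal Python sketch of one way to surface trending topics: count mentions in a recent window and compare them against an earlier baseline window. The post data, the smoothing constant, and the trending_topics helper are all hypothetical illustrations, not Twitter's actual trending algorithm.

```python
from collections import Counter

def trending_topics(recent_posts, older_posts, min_mentions=2):
    """Rank topics mentioned more in a recent window than in a baseline window.

    Each argument is a list of posts, where a post is a list of topic
    strings (hypothetical, pre-extracted hashtags or keywords).
    """
    recent = Counter(topic for post in recent_posts for topic in post)
    older = Counter(topic for post in older_posts for topic in post)

    scores = {}
    for topic, count in recent.items():
        if count < min_mentions:
            continue  # skip topics with too little current volume
        baseline = older.get(topic, 0) + 1  # +1 smoothing for unseen topics
        scores[topic] = count / baseline    # growth ratio: now versus before

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# "#cloud" surges in the recent window relative to the baseline window.
recent = [["#cloud", "#ai"], ["#cloud"], ["#cloud", "#security"]]
older = [["#ai"], ["#security"], ["#ai", "#security"]]
print(trending_topics(recent, older))  # [('#cloud', 3.0)]
```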
So far, we have talked about communications happening right now, but what about communications that have already occurred at some point in the past? For example:
- Have people talked about that particular topic in the past?
- Has the sentiment surrounding the topic changed over time?
- Are there different themes surrounding the topic now as opposed to, say, last year?
“You can’t go back to how things were. How you thought they were. All you really have is now.”
Luckily for us, this isn’t the case in social media analytics. As a matter of fact, looking back in time is often just as important as (if not more so than) looking at the present.
Predictive Versus Descriptive
Most of what we think of when we talk about business analytics is what we would call "descriptive analytics." Descriptive analytics looks at data and analyzes past events for insight into how to approach the future. Here, we look at past performance and, by understanding that performance, attempt to find the reasons behind past successes or failures. Descriptive statistics are used to describe the basic features of the data in a sample. They provide simple summaries about the data and what was measured during the time it was collected. Descriptive analytics usually serves as the foundation for further advanced analytics.
Descriptive statistics are useful in summarizing large amounts of data into a manageable subset. Each statistic reduces larger “chunks” of data into a simpler, summary form. For example, consider the analysis of a large block of social media data surrounding an industry trade show. If we break down the topics seen in all of the social media conversations from the show, we could produce a set of descriptive statistics similar to those shown in Table 4.1.
In this case, the metric "Percentage of Overall Discussion" describes what the conversations are generally about. We could have just as easily reported on the number of males or females who made comments, or the number of comments made by people claiming to be from North America, Europe, or Asia Pacific (for example). All these metrics help us understand (or describe) the sample of data; thus, they are descriptive statistics. The purpose of descriptive analytics is simply to summarize a dataset in an effort to describe what happened. In the case of social media, think about the number of posts by any given individual, the number of mentions by Twitter handle, a count of the number of fans or followers of a page, and so on. The best way to think of these metrics is as simple event counters. We should also remember that the ultimate business goal of any analysis project will drive the data sources we are interested in, as well as the aspects of the data we need to focus on. The dimension of time helps us collect a suitably large set of data, depending on the goal of the project.
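As a small illustration, the following Python sketch produces a "Percentage of Overall Discussion" summary of the kind shown in Table 4.1. The topic labels are hypothetical stand-ins for whatever topic-extraction step produced them.

```python
from collections import Counter

# Hypothetical topic labels, one per social media post from the trade show.
post_topics = ["keynote", "keynote", "product demo", "networking",
               "keynote", "product demo", "venue", "keynote"]

counts = Counter(post_topics)
total = sum(counts.values())

# Summarize the raw posts into a simple descriptive table.
print(f"{'Topic':<15}{'Mentions':>10}{'% of Discussion':>18}")
for topic, n in counts.most_common():
    print(f"{topic:<15}{n:>10}{100 * n / total:>17.1f}%")
```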
Predictive Analytics
Predictive analytics can be thought of as an effort to extract information from existing datasets to determine patterns of behavior or to predict future outcomes and trends. It's important to understand that predictive analytics does not (and cannot) tell us what will happen in the future. At best, predictive analytics can help analysts forecast what might happen in the future with an acceptable level of reliability or confidence.
The supposition was that the longer we continued to refine a data source, the more valuable the residual, or resulting, dataset would become. Perhaps that same diagram can be drawn with respect to time, as shown in Figure 4.1. The more history the data accumulates, the wider the view we obtain, and therefore the greater the understanding we can derive. If we are focusing on a given data source, we can improve our understanding of that dataset not only by gaining a wider view, but also by modifying our perception of the data as it evolves over time. For example, if we are monitoring conversations in a community about a new email system that has been rolled out to all employees, over a period of time we get an idea of which features are being talked about more than others, which types of users are having difficulties with a certain set of features, and who is most vocal about their experience. This is possible because, over time, we are able to refine our filters to look at the most relevant data, and then we are able to revise and refine our analytics model to expose the key combinations of parameters we are interested in.
Often, when we are looking at trends or predictive behavior, we are looking at a series of descriptive statistics over time. Think of a trend line, which is typically a time series model (or any predictive model) that summarizes the past pattern of a particular metric. While these models can be used to summarize existing data, what makes them more powerful is that they act as a model we can use to extrapolate to a future time for which data doesn't yet exist. This extrapolation is what we might call "forecasting."
Trend forecasting is a method of quantitative forecasting in which we make predictions about future events based on tangible (real) data from the past. Here, we use time series data: data for which the value of a particular metric is known at different points in time. As shown in Figure 4.2, some numerical data (or descriptive statistic) is plotted on a graph. Here, the horizontal x-axis represents time, and the y-axis represents some specific value or concept we are trying to predict, such as volume of mentions or, in this case, consumer sentiment over time. Several different types of patterns tend to appear on a time-series graph.
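Before any pattern can be read from a graph like Figure 4.2, the raw posts must first be aggregated into a time series. Here is a minimal sketch, assuming each post has already been tagged with a date and a sentiment label (both made up for illustration):

```python
from collections import defaultdict

# Hypothetical records: (date, sentiment label), one per post.
posts = [
    ("2014-03-01", "neutral"), ("2014-03-01", "positive"),
    ("2014-03-02", "negative"), ("2014-03-02", "neutral"),
    ("2014-03-02", "neutral"), ("2014-03-03", "positive"),
]

# Aggregate into one count per day and sentiment: the raw time series
# behind a graph like Figure 4.2.
series = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
for day, sentiment in posts:
    series[day][sentiment] += 1

for day in sorted(series):
    print(day, series[day])
```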
What's interesting in this graph isn't the size (or amount) of positive, negative, or neutral sentiment over the month of March, but the pattern that seems to emerge when looking at this data over time. This analysis of industrywide mentions of topics in and around cloud computing was done by Shara LY Wong of IBM Singapore (under the direction of Matt Ganis, as part of a mentoring program inside IBM). A close inspection of this temporal representation of the data shows that every 10 days or so, there is a peak of activity around the topic. All three dimensions of sentiment (positive, negative, and neutral) seem to peak at the same time in a fairly regular manner. Given that the line representing neutral sentiment leads the other metrics in every case, we were able to quickly determine that, on a very systematic schedule, a number of vendor advertisements appear around the topic, most with a slant toward positive sentiment. One anomaly appears to be the March 21 discussions, which generated not only the largest spike in discussion, but also the largest negative reaction by far.
The essence of predictive analytics, in general, is that we use existing data to build a model and then use that model to make predictions about information that may not yet exist. So predictive analytics is all about using data we have to predict data that we don't have. Consider the view of consumer sentiment about a particular food brand over time (in this case, a 20-month period). We've plotted the values obtained for positive sentiment with respect to time and done a least-squares fit to obtain the trend line (see Figure 4.3). The solid line in this figure is drawn through each observation over the 20-month period. The broken line is derived using a well-known statistical technique called linear regression analysis, which uses a method called least-squares fitting. In this simplistic example, it is easy to see that the trend line can be used to predict the value of positive sentiment in future months (months 21, 22, and so on). We can have confidence in this prediction because the trend line tracks the observed values quite closely over the past 20 months.
From high school math, this derived trend line is nothing more than the familiar equation for a straight line:

y = mx + b

where m (the slope of the line) and b (the y-intercept, or the value of y where x, in this case time, is zero) are constants. From this, we can examine any time in the future by substituting a value for x (here, a value of x greater than 20 to look into the future, since anything less than 20 is already understood) and, with a fairly good level of confidence, predict the amount of positive consumer sentiment (the value of y). While simple in nature, this is a perfect example of a predictive model.
If we want to know what the sentiment will be around this brand over the next 24 to 36 months (assuming conditions don’t change), this simple relationship can be a “predictor” for us. As we said previously, it’s not a guarantee, but a prediction based on prior knowledge and trending of the data.
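As a sketch of how such a predictor might be built, the following Python code fits the y = mx + b trend line by least squares (via NumPy's polyfit) and extrapolates it to future months. The monthly sentiment values are synthetic stand-ins, not the actual Figure 4.3 data.

```python
import numpy as np

# Synthetic monthly positive-sentiment values for months 1 through 20.
months = np.arange(1, 21)
positive = 40 + 0.8 * months + np.random.default_rng(0).normal(0, 2, 20)

# Least-squares fit of the trend line y = m*x + b (a degree-1 polynomial).
m, b = np.polyfit(months, positive, 1)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")

# Extrapolate the trend line to future months (values of x greater than 20).
for x in (21, 22, 24):
    print(f"month {x}: predicted positive sentiment = {m * x + b:.1f}")
```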
Descriptive Analytics
When we use the term descriptive analytics, what we should think about is this: What attributes would we use to describe what is contained in this specific sample of data? Or rather, how can we summarize the dataset?
To further illustrate the concept of descriptive analytics, we use the results from a system called Simple Social Metrics that we developed at IBM. It’s nothing more than a system that “follows” a filtered set of Twitter traffic and attempts to provide some kind of quantitative description of the data that was collected.
In this example, we use a dataset of tweets made by IBMers who are members of IBM’s Academy of Technology. The IBM Academy of Technology is a society of IBM technical leaders organized to advance the understanding of key technical areas, to improve communications in and development of IBM’s global technical community, and to engage its clients in technical pursuits of common value. These are some of IBM’s top technical minds, so an analysis of their conversation could be quite useful.
One of the first questions we want to answer is this: "Who is contributing the most?" or rather, "Who is tweeting the most or being tweeted about?" One way to do this is to analyze the number of contributions. We simply call this the "top authors," and for the month of November, a breakdown of the top contributors looked something like that shown in Figure 4.4. Even though this diagram tells us which author had the largest number of tweets, we need to go beyond the machine-based analysis and leverage human analysis to determine who really "contributed" the most.
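A "top authors" breakdown like the one behind Figure 4.4 is, at its core, a simple event counter. Here is a minimal Python sketch, using made-up handles and tweets:

```python
from collections import Counter

# Hypothetical (author, tweet) pairs from a filtered Twitter stream.
tweets = [
    ("alice", "Great session on cloud at the conference"),
    ("bob", "New whitepaper is out"),
    ("alice", "Slides from my talk are posted"),
    ("carol", "Interesting keynote this morning"),
    ("alice", "Q&A thread for today's talk"),
]

# "Top authors" is just an event counter over who posted the most.
for author, n in Counter(a for a, _ in tweets).most_common(3):
    print(f"{author}: {n} tweets")
```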
While this data is interesting, we need to remember that these types of descriptive metrics represent just a summary over a given window of time. The view could be quite different if we look at the data and take the time frame into consideration. For example, consider the same data, but a view of the whole month versus the last half of the month (see Figure 4.5).
An important fact that comes across here is that one of the users, kmarzantowicz, came on strong during the last half of the month with a heavy amount of tweeting to move into the top five of all individuals. Perhaps this person was attending a conference and tweeting about various presentations or speeches; or perhaps this person said something intriguing and there was a flurry of activity around him or her. From an analyst’s perspective, it would be interesting to pull the conversation that was generated by that user for the last 15 days of the month to understand why there was such a large upsurge in traffic.
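One way to investigate such a shift is to recompute the author counts over different date windows. The sketch below is hypothetical (the handles and dates are illustrative, not the actual Academy data), but it shows how a ranking can look quite different for the whole month versus the last half:

```python
from collections import Counter
from datetime import date

# Hypothetical (author, date) pairs for one month of tweets.
tweets = [
    ("dsmith", date(2014, 11, 3)), ("dsmith", date(2014, 11, 5)),
    ("dsmith", date(2014, 11, 8)), ("dsmith", date(2014, 11, 10)),
    ("dsmith", date(2014, 11, 12)), ("jchan", date(2014, 11, 10)),
    ("kmarzantowicz", date(2014, 11, 20)), ("kmarzantowicz", date(2014, 11, 22)),
    ("kmarzantowicz", date(2014, 11, 25)), ("kmarzantowicz", date(2014, 11, 28)),
]

def top_authors(tweets, start, end):
    """Rank authors by tweet count within a date window."""
    return Counter(a for a, d in tweets if start <= d <= end).most_common()

# Whole month versus the last half of the month: the ranking shifts.
print(top_authors(tweets, date(2014, 11, 1), date(2014, 11, 30)))
print(top_authors(tweets, date(2014, 11, 16), date(2014, 11, 30)))
```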