Domain of social media data analysis


The domain of social media data analysis is broadly classified into the following two categories:

External social media

In this post, we have spent quite a bit of time describing and focusing on external social media. When people use the term social media, they usually mean external social media. This includes content generated on popular social media sites such as Twitter, Facebook, and LinkedIn.

Internal social media

For a number of years now, many companies have been investing in enterprise social networks as a way to open communication channels between employees.

An enterprise social network (ESN) is an internal, private social network used to assist communication within a business [5]. As the number of companies investing in ESNs grows, employees are discovering more and more ways to conduct business in a more social and “open” way. According to an IDC study in June 2013, about 79% of the companies surveyed had enterprise social networks [6]. Internal social media, also known as enterprise social media, refers to the variety of contributions (blogs, forums, communities, and so on) that company employees share with each other, using ESNs as the platform for communication.


External Social Media

Let’s first focus on the external social media domain. There are two broad analysis types, based on whether the data is at rest or in motion. In each of these cases, we consider use cases for simple social metrics, ad hoc analysis, and deep analysis.

Data in Motion

Earlier we described the velocity of data, or the rate at which new data arrives. The term velocity implies something in motion (such as the arrival of new entries in a social media data stream). Consider the buildup to a major sporting event, say the final of a soccer World Cup. Well before the event, there may be only a small amount of discussion about the match, but as the day approaches, the amount of conversation around the topic grows, sometimes to a fever pitch. This increased rate of arrival of messages is what we refer to as the velocity of the data, or its increased rate of motion.

Simple Social Metrics (SSM)

There are several use cases in which we may want to understand what is happening in real time. For example, consider the host of a large conference or trade show that is attended by customers, press, and industry insiders. The success or failure of such an event can be critical to the hosting organization. During the conference, a dedicated team of customer service professionals may watch a live Twitter feed to stay on the lookout for any tweets that point to a customer service problem or a dissatisfied attendee. In another case at this same conference, a technology consulting firm that specializes in fraud prevention in the financial industry may be looking for leads by watching for tweets that mention the terms fraud, financial, bank, and so on. In this use case, the focus is on timely processing of information as it streams through. Both of these examples require real-time alerts or real-time analysis of the social data.
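The lead-generation example can be sketched as a simple keyword filter over incoming tweets. This is only an illustration: the sample tweets are made up, the function name `matching_tweets` is hypothetical, and a real deployment would read from a streaming API rather than a list.

```python
import re
from typing import Iterable, Iterator

# Terms taken from the consulting-firm example in the text.
LEAD_TERMS = {"fraud", "financial", "bank"}

def matching_tweets(tweets: Iterable[str], terms: set[str]) -> Iterator[tuple[str, set[str]]]:
    """Yield (tweet, matched_terms) for each tweet that mentions at least one term."""
    for text in tweets:
        # Split the lowercased tweet into whole words so "bank" does not match "banker".
        words = set(re.findall(r"[a-z]+", text.lower()))
        hits = words & terms
        if hits:
            yield text, hits

# Hypothetical sample data standing in for a live feed.
sample_feed = [
    "Great keynote this morning!",
    "Any vendors here working on bank fraud detection?",
    "Lunch line is way too long",
]

alerts = list(matching_tweets(sample_feed, LEAD_TERMS))
for text, hits in alerts:
    print(sorted(hits), "->", text)
```

Because the filter looks at each tweet once and keeps no history, it fits the "process it as it streams through" requirement; anything heavier than this starts to compete for CPU time.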

Given the real-time nature of the data, the network bandwidth and the CPU capacity required for this type of analysis can be quite high. By network bandwidth, we mean how fast a network connection must be to keep up with the data rates of a high-velocity feed. For example, a typical T1 network connection (about 187,500 characters per second) would be fully used if the velocity of a Twitter feed exceeded approximately 1,340 tweets per second, assuming roughly 140 characters per tweet.
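The T1 figure can be checked with a quick calculation. The sketch below assumes each tweet is a full 140 characters, that each character takes one byte, and that no metadata travels with the tweet; the text does not state these assumptions, but they are what the quoted numbers imply.

```python
def max_tweets_per_second(link_chars_per_second: float, chars_per_tweet: int) -> float:
    """Return how many tweets per second a link can carry before it is saturated."""
    return link_chars_per_second / chars_per_tweet

T1_CHARS_PER_SECOND = 187_500  # figure quoted in the text
CHARS_PER_TWEET = 140          # assumption: maximum-length tweet, one byte per character

limit = max_tweets_per_second(T1_CHARS_PER_SECOND, CHARS_PER_TWEET)
print(f"{limit:.0f} tweets per second")
```

The result is just over 1,339 tweets per second, in line with the approximate figure of 1,340. Real tweets carry additional metadata, so the practical limit would be lower.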

CPU bandwidth, on the other hand, is the amount of compute time needed to perform the text analytics on the tweets received from a feed. If the CPU can’t “keep up” with the arrival rate of the data, the ability to analyze tweets in real time suffers as the data “queues” up, waiting for the CPU to become free for the next analysis.
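To see why a CPU that cannot keep up is a problem, here is a minimal sketch of the backlog that builds up second by second. The arrival and service rates are hypothetical, chosen only for illustration, and the model ignores bursts and variable processing times.

```python
def backlog_over_time(arrival_rate: int, service_rate: int, seconds: int) -> list[int]:
    """Return the number of tweets still waiting at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += arrival_rate                # new tweets arrive
        backlog -= min(backlog, service_rate)  # the CPU analyzes as many as it can
        history.append(backlog)
    return history

# Hypothetical rates, in tweets per second.
falling_behind = backlog_over_time(arrival_rate=1000, service_rate=800, seconds=5)
keeping_up = backlog_over_time(arrival_rate=1000, service_rate=1200, seconds=5)

print(falling_behind)  # [200, 400, 600, 800, 1000]
print(keeping_up)      # [0, 0, 0, 0, 0]
```

When the arrival rate exceeds the service rate, the queue grows without bound, so the delay before a tweet is analyzed keeps increasing for as long as the peak lasts.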

If, instead of real time, we are allowed to have a delay of one to five minutes (near real time), the network and CPU requirements become more moderate. The reason is that we are allowed to process the tweets we’ve already received in whatever time we have available.

Ad Hoc Analysis and Deep Analysis

Because of the short amount of time available for processing, deep analysis or ad hoc analysis usually is not possible for data in motion. In our taxonomy, we use the term ad hoc analysis to describe an analysis that is produced one time to answer a single specific business question. This type of analysis deals with situations as they occur rather than ones that are repeated on a regular basis. The assumption is that there is a data store or collection of raw data that these queries or analyses can “run over.”

Data at Rest

Data at rest refers to use cases in which data has already been accumulated. This can include data from the past day, week, month, or year. This also includes custom windows of time—for example, social media data around a “water day” event in South Africa several months back for a duration of one month.

Simple Social Metrics (SSM)

SSM analysis is characterized by simple metrics, computations, and analytics. The focus is on generating some quick results. Following are some sample use cases with the SSM type of analysis:

Duration of analysis—1 day
During the IBM Insight 2014 conference in Las Vegas, at the end of the day, we wanted to identify the top hashtags, top mentions, and top authors in discussions around the topics cloud, analytics, mobile, social, and security.

Machine capacity—The network bandwidth and the CPU capacity required for this type of analysis are low.
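An end-of-day summary like the one described above comes down to counting. The following sketch uses made-up records and hypothetical field names (`author`, `text`); a real run would load one day of collected tweets from wherever they are stored.

```python
import re
from collections import Counter

# Hypothetical records; the field names are assumptions for this example.
tweets = [
    {"author": "alice", "text": "Loving the #cloud sessions at #analytics day with @bob"},
    {"author": "bob", "text": "#cloud and #security go hand in hand, right @alice?"},
    {"author": "alice", "text": "More #cloud talk from @bob"},
]

hashtags = Counter()
mentions = Counter()
authors = Counter()

for tweet in tweets:
    text = tweet["text"].lower()
    hashtags.update(re.findall(r"#(\w+)", text))  # words following '#'
    mentions.update(re.findall(r"@(\w+)", text))  # words following '@'
    authors[tweet["author"]] += 1

print(hashtags.most_common(3))
print(mentions.most_common(3))
print(authors.most_common(3))
```

A single pass over the data with a few counters is all that is needed, which is why this kind of analysis needs so little network and CPU capacity.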

Ad Hoc Analysis

Duration of analysis—1 month
During the first month after the release of a new version of an IBM Software product, we wanted to understand the trend of “volume of conversations.”

Duration of analysis—3 months
For a given IBM Software product, during any continuous time period, we may want to understand the “volume of conversations” around IBM. This analysis is often called the “share of voice” in a conversation, or how much of the conversation includes mentions of, say, IBM, and how much contains mentions of its competitors.

Machine capacity—The network bandwidth required is quite low, but the CPU capacity is typically low to moderate, depending on the amount of total data that we will be processing.
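One simple way to approximate share of voice is to count how many posts mention each brand and divide by the total number of brand mentions. The sketch below is only one possible definition: the posts are made up, `CompetitorX` is a placeholder name, and plain substring matching is a deliberately crude stand-in for real entity detection.

```python
from collections import Counter

def share_of_voice(posts: list[str], brands: list[str]) -> dict[str, float]:
    """Return each brand's fraction of all brand mentions across the posts."""
    counts = Counter()
    for text in posts:
        lowered = text.lower()
        for brand in brands:
            # Count a post once per brand it mentions (case-insensitive substring match).
            if brand.lower() in lowered:
                counts[brand] += 1
    total = sum(counts.values())
    return {brand: (counts[brand] / total if total else 0.0) for brand in brands}

# Hypothetical posts and a placeholder competitor name.
posts = [
    "IBM announced a new analytics release",
    "Comparing IBM and CompetitorX on security",
    "CompetitorX pricing is confusing",
    "Nothing relevant here",
]

shares = share_of_voice(posts, ["IBM", "CompetitorX"])
print(shares)
```

The CPU cost grows with the number of posts and the number of brands tracked, which matches the low-to-moderate estimate for longer time windows.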


