Tuesday, February 14, 2012

Unstructured data is a myth

Couldn't resist that headline! But seriously, if you peel the proverbial onion enough, you will see that the lack of tools to discover / analyze the structure of that data is the truth behind the opaqueness that is implied by calling the data "unstructured".

The need to take a deeper look at this? See this graph:
A lot of data growth is happening around these so-called unstructured data types. Enterprises that manage to automate the collection, organization and analysis of these data types will derive competitive advantage.

Every data element does mean something, though what it means may not always be relevant for you. Let me explain with common data sets which are currently labeled "unstructured".

  • Text: Let's start with the subsets in here.
    • Machine generated data (sensors, etc) definitely can be deciphered once you get the meta data structures / templates that the machine uses to generate the data. Of course, some of the fields in the stream will need more advanced analysis/discovery capabilities to automate the analysis.
    • Interaction Data: This is the case for social media data where a lot of business value lies in the long open text fields where people express sentiment about other people and products. To automate the analysis of these, entity recognition and semantic analysis provide the ability to understand the data better. In other words, if you can represent the text data as a collection of entities, relationships between them and relationship attributes like sentiment, you are much closer to analyzing the data than you might think!
  • Images: Image recognition algorithms have almost become mainstream (though not very well-received as seen in the reservations against Google and Facebook deploying these at scale). Again, these techniques yield entities though deriving relationships and sentiment are much more challenging.
  • Audio: Again a lot of research is yielding technology which can decipher the content of audio streams and even annotate the resultant content with mood of the speaker! You could then leverage the text analysis techniques to get closer to the analyzable data.
  • Video: Unarguably, this is the most challenging data type due to the sheer volume of data that needs to be handled. Image recognition techniques can be applied per frame or a series of frames to extract entities. Of course, deciphering the action (the video content) is further out in the future. Audio recognition can be applied to understand part of the "action" content.
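To make the "entities, relationships and relationship attributes" idea from the interaction data bullet concrete, here is a minimal sketch in Python. The class names, the entity kinds and the -1.0 to 1.0 sentiment scale are my own assumptions for illustration, and the example post is annotated by hand rather than by a real entity recognition engine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "person" or "product" (hypothetical taxonomy labels)


@dataclass(frozen=True)
class Relationship:
    source: Entity
    target: Entity
    verb: str
    sentiment: float  # assumed scale: -1.0 (negative) to 1.0 (positive)


@dataclass
class AnnotatedText:
    raw: str
    entities: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def sentiment_toward(self, target_name: str) -> float:
        """Average sentiment of all relationships pointing at the named entity."""
        scores = [r.sentiment for r in self.relationships
                  if r.target.name == target_name]
        return sum(scores) / len(scores) if scores else 0.0


# Hand-annotated example; a real system would fill these in with
# entity recognition and semantic analysis.
alice = Entity("Alice", "person")
phone = Entity("PhoneX", "product")
post = AnnotatedText(
    raw="Alice: loving my new PhoneX, the camera is great!",
    entities=[alice, phone],
    relationships=[Relationship(alice, phone, "likes", 0.8)],
)
```

Once the text is in this shape, a question like "how do people feel about PhoneX?" becomes a simple query (`post.sentiment_toward("PhoneX")`) instead of a free-text search.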
Based on the above, some new data handling and analysis capabilities are required to extract more value out of these new data types.
  • Dynamic Meta data discovery: This is mainly for text data. This includes the ability to
    • Dynamically derive meta data out of sample result sets e.g. new REST end points
    • Maintain / Master metadata on an ongoing basis
    • At run time, choose the appropriate / best matching metadata set out of several possible options
  • Taxonomy Setup: You need to be able to capture / represent your business and its entities for other analysis layers to reference and annotate incoming data. As your business evolves, this taxonomy will get richer.
  • Entity Extraction and Semantic Analysis: This provides the ability to apply the taxonomy to any text data stream and derive entities and relationships expressed in that stream. This analysis can then be stored either in a relational database or as a graph.
  • Multimedia Recognition Techniques: As described earlier, various techniques for deciphering the content of images, audio and video are required to analyze these data types.
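The dynamic metadata discovery capability in the list above could look roughly like the following Python sketch. It assumes the sample result sets arrive as flat JSON-like dictionaries; the function names and the overlap-based scoring are hypothetical choices of mine, not a description of any particular product.

```python
def derive_metadata(samples):
    """Infer a field -> type-name mapping from sample records (dicts)."""
    metadata = {}
    for record in samples:
        for key, value in record.items():
            type_name = type(value).__name__
            # If a field shows up with conflicting types, fall back to "mixed".
            if metadata.get(key, type_name) != type_name:
                metadata[key] = "mixed"
            else:
                metadata[key] = type_name
    return metadata


def match_score(record, metadata):
    """Overlap of (field, type) pairs between a record and a metadata set."""
    record_fields = {(k, type(v).__name__) for k, v in record.items()}
    known_fields = set(metadata.items())
    union = record_fields | known_fields
    return len(record_fields & known_fields) / len(union) if union else 0.0


def best_matching_metadata(record, registry):
    """Pick the name of the registered metadata set that best fits the record."""
    return max(registry, key=lambda name: match_score(record, registry[name]))


# Metadata derived from one sample per (hypothetical) source.
registry = {
    "tweet": derive_metadata([{"id": 1, "text": "hi", "user": "a"}]),
    "sensor": derive_metadata([{"id": 7, "temp": 21.5, "ts": 1329177600}]),
}
incoming = {"id": 42, "temp": 19.0, "ts": 1329181200}
chosen = best_matching_metadata(incoming, registry)
```

The registry stands in for the "maintain / master metadata" step, and `best_matching_metadata` is the run-time choice among several possible metadata sets. Nested structures and schema drift would need more care than this sketch gives them.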
The layering is along the following lines:

A lot of action is still on the top layers but eventually it will encompass audio and video as well.

Do you still believe all of this data deserves the opaque sounding "unstructured" tag? Are you building the capabilities to put the structure back into this data?

_______________________________________________________________________________
Ram Subramanyam Gopalan - Product Management at Informatica
My LinkedIn profile | Follow me on Twitter
Views expressed here are personal and do not necessarily represent those of Informatica.
_______________________________________________________________________________

Friday, February 10, 2012

Critical path capabilities on your social integration journey

Combining the Social Integration Journey and the basic building blocks of your solution, let us look at what could be the capabilities that figure in the critical path of this journey.

The objective is to help you decide where to invest your efforts if you are building a solution on your own for your enterprise. If you are looking at buying a solution instead, it should help you identify the critical capabilities depending on where you are on your social journey.

Aggregating the functionality across the basic building blocks, the key capabilities are:

  • Wide Social Data Source Coverage: For Listening and Monitoring, it is essential to "cast the net wide". I would go as far as to say that you should in fact include search engine results as a key component of discovering the hot spots of relevant activity on social media! You should look for support for both API-based collection as well as Web content extraction (which has definitely become way more involved than what used to be brute-force scraping techniques). Remember that the APIs are still evolving fairly rapidly and the solution should be able to evolve at the same pace too. You might also need historical data for certain use cases.
  • High Data Volumes: As a corollary of the wide coverage, you will also need the ability to handle large raw data sets. You might also have to handle real-time streaming sources (which are being recommended by the social networks more) for large data sets. Aggregators like Gnip and DataSift also provide streaming for large result sets.
  • Data Quality/Cleansing: To improve the signal-to-noise ratio in the raw data set, you should be able to apply tough data cleansing/filtering rules. These could be in the form of entity recognition and matching thresholds. This could also involve use case based relevance rules, e.g. if you are looking to build the network profile of a customer, you might not be interested in the details / sentiment of their activity stream. You should be able to leverage a library of DQ rules if possible.
  • Text Analytics: You will need powerful semantic and sentiment analysis capabilities to infer key signals from all the data flowing through the system. If you operate in multiple geographies, you will need the ability to do this analysis across multiple languages.
  • Enterprise Data Access: A lot of value lies untapped in the intersection of the social and enterprise data domains. You should be able to seamlessly work with CRM, ERP, PLM and MDM system data as you add the social dimension to the data. 
  • Collaboration: As you move further on your social journey, it is important to facilitate collaboration both among employees and between employees and customers. At the minimum, your solution should be able to interface with existing / established collaboration systems so that end-users do not need to switch between multiple screens to share/consume data.
  • Publishing: Content is king in social media. Community building is the queen probably! You need well-integrated Content Publishing capabilities or at least the ability to reference/identify content items in your overall solution for end-to-end analysis of results. You will also need community platforms where you can engage and innovate with highly influential customers and influencers. 
Here is my take on the critical path of capabilities:


What do you think are the capabilities on the critical path of your social enterprise journey?




Monday, January 30, 2012

What would your enterprise's social graph look like?

Imagine having a Facebook-like rich social graph tailor-made for your company with just the right information about relevant entities and their activities being captured and maintained. You can derive accurate insights about your customers; excel in marketing with highly segmented campaigns using personalized content; get better returns on marketing by focusing on influencers; convert more of the resulting leads with personalized offers and strong referrals; get the best return on support by focusing on the key (vociferous?) customer segments; improve key product dimensions by analyzing feedback; keep a tab on competitors in the context of your most profitable products; ... If you are getting ideas, read on!

Before you do anything else, convince yourself that you might be unknowingly taking undue risk by hinging your enterprise wagon entirely on data from third-party social networks - here are a few factors you might want to consider.

Assuming you are at least ready to weigh the benefits and costs of building your own enterprise social graph, here are some seeds for your thought process. 

What would an Enterprise Social Graph (ESG) look like?
The ESG is a highly interrelated graph that comprises 
  1. Entities like people (customers, employees), companies (the enterprise itself, competitors, partners, suppliers), products (those owned by the enterprise and its competitors)
  2. Defined Relationships among these entities
  3. Activities with one or more entities as actors and/or subjects - Documents can represent these activities
Though the ESG does not really have layers or strata, it would be useful to visualize it as a layered graph with connections running between nodes in the same layer as well as across layers.
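As a rough illustration of the three ingredients above (entities, relationships and activities) and the layered view, here is a minimal in-memory sketch in Python. The class, its method names and the layer labels are assumptions made for this example; a production ESG would sit on a proper graph store.

```python
class EnterpriseSocialGraph:
    def __init__(self):
        self.nodes = {}  # node id -> {"kind": ..., "layer": ...}
        self.edges = {}  # node id -> [(relationship, other node id)]

    def add_node(self, node_id, kind, layer):
        self.nodes[node_id] = {"kind": kind, "layer": layer}

    def relate(self, source, relationship, target):
        # Store both directions so neighbours can be found from either end.
        self.edges.setdefault(source, []).append((relationship, target))
        self.edges.setdefault(target, []).append((relationship, source))

    def neighbours(self, node_id, layer=None):
        """Ids connected to node_id, optionally limited to one layer."""
        return sorted(
            other for _, other in self.edges.get(node_id, [])
            if layer is None or self.nodes[other]["layer"] == layer
        )


esg = EnterpriseSocialGraph()
esg.add_node("Customer A", "person", "people")
esg.add_node("Product P", "product", "enterprise entities")
esg.add_node("Complaint 17", "document", "activities")  # activity as a document
esg.relate("Customer A", "owns", "Product P")
esg.relate("Customer A", "filed", "Complaint 17")
esg.relate("Complaint 17", "is about", "Product P")
```

Here `esg.neighbours("Product P", layer="people")` walks across layers from a product to the people connected to it, which is the kind of cross-layer connection the layered picture is meant to convey.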

Starting with the Enterprise Graph
If you represented the relationships between entities that can be gleaned from enterprise systems (not knowledge hiding in employees' minds!) as a graph, it might look something like this:



You might have noted the following:

  • Enterprise Entities: The first layer is illustrative of the kinds of entities that are typically seen within the enterprise. The taxonomy of the entities will be industry-specific and the canvas is as large as it can get, with technologies, partners/suppliers and competitors all coming into play.
  • Activities: The kinds of activities that are captured correspond to those in CRM, HR, PLM and ERP systems. You can see that this already catches relationships like "Customer A owns Product P", "Customer B complained about product Q", "Employee E manages Employee F", "Employee G knows Influencer I", etc.
  • People: Several links between people are missing in this and that is deliberate, e.g. relationships between influencers and employees/competitors. You might argue that someone in the company "knows" about this, but the counter to that is that if you ran an algorithm to weigh the influence of that person, it would miss this fact! In general, as you move further away from your employees and customer entities, you have less information on people and activities available for analysis.
  • Mostly the relationships captured in this version of the graph (no social dimension to the entities) are less dynamic (e.g. something more defined like a Product-Technology hierarchy). Many enterprises do not have the capabilities to process data such as Survey free-form fields, chat transcripts, etc which are captured and stored in the enterprise systems.

Adding the Social to the Enterprise Graph
Let's add the social dimension, which means we incorporate the following additional information that we can glean from social media monitoring:
  • People's likes and dislikes; skills, preferences
  • People's personal and professional connections
  • Social Media activities of people and companies around technologies and products of your interest
  • Competitive moves and directions
  • Motives, Intents and Drivers for people's buying behavior
  • and so on ...

Adding just a few illustrative examples from the above list, the ESG might now look like this:

As you can see, this graph holds information for use within the enterprise as well as the external world. You can start to see missing links in your corporate puzzle that might explain several business trends that you see in your traditional BI systems but can't find reasonable causes and remedies for!

How to build and use the ESG?
You can imagine this growing immensely dense for large enterprises, or even for small enterprises which want to include more information in this graph. Beyond a point, the scale of this graph will necessitate Big Data capable solutions. You can of course start small and build this graph over time, but eventually you should plan for large scale graphs, especially if you want to (and you should) do temporal analysis of trends.

Looking at the building blocks for a social data integration solution, the ones that will be key for building the ESG are the Text Analytics modules and the infrastructure components to reflect the "graph" nature of the information. Topic for a joint follow-up post from our dev team and me!

As a parting thought, there are several vendors with patents around this area, but not very many published success stories of enterprises building powerful social graphs. (If you know of any, please do leave a link or two in the comments or tweet them to @ramsgopa).

There are two ways to react to that fact - either wait for other enterprises to taste success (and then follow them) or be one of them. How will you react?



Thursday, January 26, 2012

Do not depend on Facebook (alone) for customer insights

Diving deeper into one of the key areas where all enterprises can find value from integrating social media, customer profile enrichment with the social dimension can help several business functions.

I have long thought that companies need to maintain their own customer profiles. This white paper (Regn./PDF) from @gxsoftware, on the related subject of avoiding over-dependence on external social networks like Facebook for deeper customer insights, emboldened me to take a stronger stance.

Though my headline has Facebook, this applies to all external social media sources.

Factors to consider
Here are the factors you should consider in your customer profile building strategy:
  1. Your data needs may not be easily satisfied: Though the social networks do try and expose as much data as possible through APIs and Aggregators, you might not get the data in the most efficient manner for your specific needs. There are myriad restrictions on APIs (data volumes, number of calls at various levels like IP Address, App, API and User) which need to be negotiated through to get the data. Also, not every type of activity on the social network might be of interest to you.
  2. Privacy policies unclear and likely to toughen up: As Ray Wang lays out demands in his recent post, users are likely to press for better privacy controls on their favorite destinations like Facebook. In fact, there are startups offering fine grained user-driven privacy settings e.g. Diaspora. So you should realistically expect unpleasant surprises on non-availability of data that you assumed will be available forever!
  3. Data Ownership is also unclear: Even if you had the data available, the social media that hosts the data and/or the end-users may assert uncomfortable-sounding rights on the data and its permitted usage/storage etc. This may lead to expensive workarounds/solutions to adhere to these rules if you haven't taken care to design it upfront.
  4. Data access may become pricey: Finally, even as you work through these issues, you might end up staring at steep fees to access the data. After all, Facebook and the like will monetize the data in as many avenues as they can (advertising, data access, etc). 
The WP lists a few other factors like losing competitive advantage since everyone can get the same data access - these are valid factors too but they are almost a given in the social world.
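To give a feel for factor 1, here is a small Python sketch of what tracking API call limits at several levels at once might involve on the client side. The level names, the limits and the `CallBudget` class are all hypothetical; every social network publishes its own rules, and they change often.

```python
import time


class CallBudget:
    """Track calls against several sliding-window limits at once."""

    def __init__(self, limits, window_seconds=3600, clock=time.monotonic):
        self.limits = limits    # level name -> max calls per window (assumed)
        self.window = window_seconds
        self.clock = clock
        self.calls = {}         # (level, key) -> list of call timestamps

    def try_call(self, **keys):
        """Record a call if every level still has room; return True or False."""
        now = self.clock()
        buckets = []
        for level, key in keys.items():
            stamps = self.calls.setdefault((level, key), [])
            # Drop timestamps that have aged out of the window.
            stamps[:] = [t for t in stamps if now - t < self.window]
            if len(stamps) >= self.limits[level]:
                return False
            buckets.append(stamps)
        for stamps in buckets:
            stamps.append(now)
        return True


# A fixed clock keeps the example deterministic.
budget = CallBudget({"app": 3, "user": 2}, window_seconds=60, clock=lambda: 0.0)
results = [budget.try_call(app="my-app", user="u1") for _ in range(3)]
other_user = budget.try_call(app="my-app", user="u2")
app_exhausted = budget.try_call(app="my-app", user="u3")
```

Even in this toy version, a call can be blocked by the user-level limit while the app-level limit still has room, and vice versa. Multiply that by every source you listen to and the "myriad restrictions" become a real engineering concern.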

What can you do?
As you might have realized, many of the factors are beyond the enterprise's control. So the enterprise should have a plan of action to mitigate the downsides. Some of the elements in that plan could be:
  1. Start Listening now: Even as the privacy and data ownership rules continue to toughen up (and they should), you should be collecting as much data as possible (while of course still adhering to the T&C's of the social media!) by engaging with customers and listening to their activity streams. Start your social media integration journey now.
  2. Look at a policy-driven enterprise social platform: As you start listening, remember the value of an enterprise-wide platform leveraged by multiple business functions for multiple use cases. One of the first issues you will have to solve is to gain connectivity to the social media (see factor #1). In a subsequent post, I will describe the challenges in this area and what to look for in a possible solution. The platform should ideally have policy-driven (and automated) models pertaining to data ownership, access rights and storage restrictions, e.g. you should be able to dynamically model a source as having members-only access with data storage restrictions on certain fields in the incoming data. The platform should be able to enforce the data access and data archival based on these settings. At scale, policy-driven automation is key.
  3. Create your own Enterprise Social Graph: We shouldn't expect others to do the heavy lifting for our business needs. You should evaluate how you can mash up enterprise and social dimensions for your key entities like Customers, Products and Suppliers. If you already have an MDM-based Customer/Product Hub, it would provide an ideal platform for this. What you might end up with is a logical graph which captures the dense relationships between and among these entities. Ideally, all your business cases should be able to query this graph to get the required information. This is truly Big Data territory - a salivating possibility for another post!
  4. Invest in community building: Now that your customers see enough value in engaging with you across multiple channels (mainly social media since it is ubiquitous), you should be able to invite them to participate in your own communities or at least on forums where you have access and maybe even ownership of all the data. The data access terms are now more favorable to you.
  5. Leverage this graph in CEM and PLM: The Enterprise Social Graph can now be even more relevant and enriched with very detailed customer profile information. The rich profiles that this graph provides for your customers and products are ripe for leveraging in improving the customer experience and your products.
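The policy-driven model from step 2 (a source with members-only access and storage restrictions on certain fields) can be sketched in a few lines of Python. The `SourcePolicy` class, the field names and the source name are hypothetical and only meant to show the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class SourcePolicy:
    source: str
    members_only: bool = False
    no_store_fields: set = field(default_factory=set)


def can_access(policy, user_is_member):
    """Members-only sources are readable only by members."""
    return user_is_member or not policy.members_only


def prepare_for_storage(policy, record):
    """Return a copy of the record without fields the policy forbids storing."""
    return {k: v for k, v in record.items() if k not in policy.no_store_fields}


# Hypothetical source: members-only, with two fields that must not be stored.
forum_policy = SourcePolicy(
    source="partner-forum",
    members_only=True,
    no_store_fields={"email", "location"},
)
incoming = {"handle": "@sam", "email": "sam@example.com", "text": "Great release!"}
stored = prepare_for_storage(forum_policy, incoming)
```

The point of keeping the policy as data rather than code is that a new source, or a change in a network's terms, becomes a configuration change that the platform can enforce automatically.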
What is your take on how the data access situation will play out in the medium-term? Where are you storing / planning to store the social dimension of your customers and products?



Monday, January 23, 2012

Building blocks for your Social Data Integration solution

In the previous two posts, I presented a basic approach for enterprises to embark on the Social Media integration journey and took a stab at how to view the wide array of use cases through the lens of data domains and departmental functions. Ideally, a long-term solution - what we might lengthily call an Enterprise Social Data Integration Platform - should support all four categories of use cases and also all (as many as possible!) departments in the enterprise.

Venturing into system building territory, a basic functional flow will help in slotting the basic building blocks of such a solution. In essence, such a flow comprises:

  • Setting up listening and publishing posts on various social media: This is a continuous process of identifying hot spots of relevant activity in the social world and setting up a process to listen and participate in the most effective manner. It is interesting to note the trend of such connectivity needing to go beyond the top-3 or even top-10 social networks to very niche, highly-specialized discussion forums and industry blogs. Do you have a well-defined and maintained list of the top social media hot-spots for your company or even industry?
  • Collecting and optionally staging raw data: As the raw data comes in from various sources, it needs to be collected and fed into further analysis steps to start extracting value from it. Each source comes with its own standards and formats for the data as well as myriad other considerations like security credentials, API rate limits, automated agent limitations, etc. There are emerging standards like Activity Streams and OpenSocial but there is no broad mainstream convergence on any of these yet. So for the foreseeable future, the solution needs to work with all the low-level complexities of social data.
  • Cleanse and prepare the raw data for analysis: Once the raw data is collected, it is imperative to improve the signal-to-noise ratio before doing heavy-duty analysis on the data. Traditional Data Quality methods will have to be adapted to look at Entity extraction based relevance scores and removing data sets falling below a threshold value of relevance. 
  • Generate Insights out of the raw data: Depending on the use case, specific text analytics/algorithms and business analytics routines will have to be applied to the cleansed data. Basic routines would include Semantic Analysis, Entity Extraction, Sentiment Analysis and Influence Analysis which would apply to individual "records" (an "activity" in the social world). These routines add annotations, if you will, to these activity records. These records and their annotations can then be summarized and sliced/diced depending on the end business use case. For some use cases, complex event processing will be required to quickly correlate events in discrete systems to find emerging patterns.
  • Most importantly, Evoke Actions: Of course, all the analysis in the world is useless, if you do not act on it (which does include getting to the conclusion that no action is required!). In the social world, action can manifest in traditional enterprise systems and channels in various ways like sharing content, designing more refined customer experiences, building communities and collaborating both internally and externally.
  • Enterprise Data Integration: Enterprise data will need to be integrated either as an input or as a target for insights/enriched data. As you might recall from the data domains discussion, the most valuable use cases are in the intersection of enterprise and social data. As an example, for people related information, identity matching between CRM records and social handles provides very powerful capabilities for understanding them better. As is evident from this example, seamless integration with enterprise systems and social media is required for such mashup analyses.

Much of this flow applies to all use cases. Of course, you should close the loop eventually by feeding back inputs from each downstream stage to the listening stage!
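The cleanse and generate-insights stages of this flow can be sketched as a tiny Python pipeline. This is a deliberately naive illustration: relevance is computed as the share of taxonomy terms found in the text, the threshold value is an assumption, and the function names are mine. Real entity extraction is far more sophisticated.

```python
def words_in(text):
    """Lowercased words with trailing punctuation stripped (naive tokenizer)."""
    return {w.strip(".,!?") for w in text.lower().split()}


def relevance(activity, taxonomy):
    """Share of taxonomy terms mentioned in the activity text."""
    if not taxonomy:
        return 0.0
    return len(words_in(activity["text"]) & taxonomy) / len(taxonomy)


def cleanse(activities, taxonomy, threshold):
    """Keep only activities whose relevance score meets the threshold."""
    return [a for a in activities if relevance(a, taxonomy) >= threshold]


def annotate(activity, taxonomy):
    """Attach the matched taxonomy terms as an 'entities' annotation."""
    matched = sorted(words_in(activity["text"]) & taxonomy)
    return {**activity, "entities": matched}


def run_flow(raw_activities, taxonomy, threshold=0.5):
    """Collect -> cleanse -> annotate; acting on the result is left to the caller."""
    relevant = cleanse(raw_activities, taxonomy, threshold)
    return [annotate(a, taxonomy) for a in relevant]


taxonomy = {"phonex", "camera"}  # hypothetical business terms
raw = [
    {"source": "forum", "text": "The PhoneX camera is superb!"},
    {"source": "blog", "text": "Lunch was good today."},
]
insights = run_flow(raw, taxonomy)
```

Only the forum post survives the cleansing step, and it comes out annotated with the entities it mentions. The annotations, not the raw text, are what the downstream slicing and dicing would work on.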

The basic building blocks of such a solution can then be derived from this functional flow:

There is an interesting component - Information Lifecycle Management (ILM) - which is often an afterthought in such solutions, but shouldn't be! ILM becomes vitally important in Big Data scenarios (of which the social data is definitely one). ILM helps in policy-based automated management of data lifecycle - creation, validation, integration, archival, deletion.
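As a minimal sketch of what policy-based lifecycle management could mean in practice, the following Python function decides whether a record should be kept, archived or deleted based on its age. The day thresholds and the function name are assumptions for illustration; real ILM policies would also cover creation, validation and integration rules.

```python
from datetime import date


def lifecycle_action(created_on, today, archive_after_days, delete_after_days):
    """Return "keep", "archive" or "delete" based on the record's age in days."""
    age = (today - created_on).days
    if age >= delete_after_days:
        return "delete"
    if age >= archive_after_days:
        return "archive"
    return "keep"


# Hypothetical policy: archive after 90 days, delete after a year.
policy = {"archive_after_days": 90, "delete_after_days": 365}
today = date(2012, 1, 23)

recent = lifecycle_action(date(2012, 1, 1), today, **policy)
older = lifecycle_action(date(2011, 10, 1), today, **policy)
ancient = lifecycle_action(date(2010, 12, 1), today, **policy)
```

Running a rule like this over every incoming activity record on a schedule is what turns data retention from an afterthought into an automated part of the solution.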

Do you see any important pieces missing in this list? What are the key issues / success factors that you see with these components?

In subsequent posts, we will look at some interesting issues in some of these components.



Monday, January 16, 2012

Value is in the intersection of social and enterprise data

Continuing on from the previous post, where a phased approach to social media was talked about, I wanted to share my thoughts on the business use cases that enterprises can realize through this journey.

Before jumping into that, I do want to set the context for these posts. I believe every participant in any market needs to have their own perspective on it. So with social media integration - customers, industry analysts, business consultants, systems integrators, enterprises and solution providers all have a point of view on it. As you might have noticed, this is an attempt at a deliberate first principles based thought process which I believe is key for any enterprise software vendor to succeed in any segment especially one which is this dynamic.

That out of the way, let's take a look at the spread of cross-functional use cases across these stages. This list of use cases is mainly attributed to the March 2010 post by Jeremiah Owyang and Ray Wang and a follow-up from Ray in August 2011 (thanks to both!).

This is not exhaustive by any means but a good starting point for enterprises. These use cases span the continuum of two key data domains:
  1. Enterprise data (Transaction, Structured)
  2. Social data (Interaction, Less Structured)
The expectation is that the value of integrating these two domains increases as the overlap increases and is bolstered by collaboration (internal and external). The value of using social data is incremental to what is already achieved using Traditional BI.


There are additional synergies that can be gained by sharing the underlying analytics infrastructure across multiple functions.

Stretching this thought a bit further, in order for enterprises to prioritize their data integration efforts for these use cases, there are four categories that seem to emerge:
  1. Traditional BI - Only focuses on internal enterprise data
  2. "Inside-Out" - Starts with enterprise data and enriches it with social data
  3. "Outside-In" - Starts with an event/trend of interest on the social side and then prods action from the enterprise side
  4. "Outside-only" - Focuses on only social data to glean insights
As one would expect, the majority of high-value use cases are those that move from (1) to (2) and (3) - the social dimension gives new insights into enterprise trends. That said, it would be ideal for enterprises to leverage the same solution to cater to all four categories of use cases.
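One way to make the four categories actionable when triaging a backlog of use cases is a small classification helper, sketched here in Python. The `"enterprise"` and `"social"` labels and the function signature are my own assumptions; the category names come from the list above.

```python
from enum import Enum


class Category(Enum):
    TRADITIONAL_BI = "Traditional BI"
    INSIDE_OUT = "Inside-Out"
    OUTSIDE_IN = "Outside-In"
    OUTSIDE_ONLY = "Outside-only"


def categorize(starts_with, also_uses=None):
    """Map where a use case starts, and what it pulls in next, to a category."""
    if starts_with == "enterprise":
        if also_uses == "social":
            return Category.INSIDE_OUT
        return Category.TRADITIONAL_BI
    if starts_with == "social":
        if also_uses == "enterprise":
            return Category.OUTSIDE_IN
        return Category.OUTSIDE_ONLY
    raise ValueError(f"unknown data domain: {starts_with!r}")


# Example: enriching CRM profiles with social data starts inside the enterprise.
profile_enrichment = categorize("enterprise", also_uses="social")
```

Tagging each candidate use case this way gives a quick view of how much of the backlog sits in categories (2) and (3), where most of the high-value use cases are expected.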


Interestingly, there is another (super)-dimension to this data domain discussion which is the structured to unstructured continuum that exists in both the social and enterprise data domains. Ideally any solution that you work with should be able to handle this whole spectrum of data - tall order indeed but a worthy objective.

This kind of framework does help us (vendors) to know our strengths and opportunities better. Hopefully this will add a few dimensions to the enterprises' thought process on what categories of use cases they want to go after.

It would be interesting to hear your insights and thoughts on this. Do leave your comments here or ping me on Twitter (@ramsgopa).

Now that we have given some thought to the business side of things, I will venture into the solution components (Informatica as a software vendor does need to build a solution finally!) in my next post.




Wednesday, January 11, 2012

Social Data Integration journey - Where are you on it?

There are several "Stages of Evolution" thoughts around how enterprises can go about ingesting social data into their business DNA. Adding to these, from my experience creating CRM and data integration products, here is an outside-in approach for enterprises towards social media.

Companies must first map the customer journey through various social media as they interact with the company. The following depicts such a typical journey (which, for the most part, applies to both B2B and B2C businesses):

Note that some of these "Events / Triggers" are actually a cumulative experience for your customers.

Once this map is created, companies can decide on how to start their own journey. A phased approach is prudent especially in the social media arena where missteps can get amplified quickly. The following illustrates this:


Importantly, this is an additive approach in that companies will have to continue to listen and monitor on a broad set of social media while participating and innovating on the most effective among these social media streams.

How are you approaching this massive opportunity? Where is your company on this journey?

In subsequent posts, I'll share my thoughts on how enterprises can get the most out of their investments in social media.

