(See presentation on this topic at http://www.slideshare.net/RamSangireddy/data-analyticsproduct-practices)
Once the business case for a data analytics platform product has been constructed, and the product sponsor and other primary stakeholders have given the go-ahead based on estimated market potential, ROI, and other economic factors, the wheels are set in motion for the actual product development and its commercialization. In my experience, here are the steps that effective product managers follow through the development life cycle:
1. Gather Use Case Scenarios: While this is a continuous and iterative process throughout the life cycle, the product manager (PM) needs an anchor to initiate product development. The PM identifies the key use cases of the product (some may already have been identified in the business case) that will define its essential features and functionality. A good practice is to have at least 3-5 such business applications before starting development. Rolling wave planning can be adopted as development progresses and more clarity on the end product applications is obtained. As an example product for discussion here, if the overall intent of the data analytics platform being conceived is personalization of content and/or contextualized search, a few use cases could be “conversion of customer intent to transaction”, “personalized product/service recommendations”, “delivering competency-based adaptive content”, etc., depending on the client’s industry/vertical application.
2. Identify Key Features/Functions: This is the stage in which the product scope and roadmap are defined, with the essential features and functions identified from the primary use cases and other stakeholder requirements. The PM works closely with the architect and the stakeholders, and if necessary the development team, to depict the product scope effectively at the business and engineering architecture levels. For the example analytics platform discussed above, the capabilities can include product/service recommendations to a customer based on her location or shopping/purchase history, recommendations based on interests/hobbies/habits, content access based on specific skills, etc.
3. Gather Data Requirements, Specifications, and Formats: The applications and functionality of the product lead the PM and team to identify the required data (e.g., customer transactions, inventory, price logs, resource utilization data, customer traffic, etc.), its specifications (volume/size, variety, velocity/streaming, structured/unstructured, etc.), and its formats (numerical, text, voice, video, image, etc.). Through this process, the PM also has to identify internal and external sources for the required data, the barriers and costs to acquire it, the potential challenges in integrating the source APIs with the product platform's input APIs, and any other related issues.
4. Develop Data Warehousing Methods: The PM should have a good understanding of the capabilities already available to warehouse, ingest, and manage the data, as well as the extended capabilities that must be built for the purpose. Warehousing includes the extraction, transformation, and loading (ETL) of the data for subsequent analyses. Business/structured data is usually ingested into a traditional enterprise data warehouse (EDW), while unstructured data, such as customer activity (on the web store, social media, etc.) written to log files, is stored in Hadoop. A good knowledge of the star schema used in the traditional EDW helps the PM handle issues better during the architecture, design, and development phases. The best practice starts with domain modeling, contextual modeling, and data modeling at the logical and physical layers. Success of the platform also depends on choosing the right EDW tool for the right job: the PM and team can usually choose among IBM’s DB2, IBM’s Netezza appliance (or both combined), an MPP system such as Teradata, or a columnar database such as Vertica.
In addition, a good grasp of the requirements for building a suitable Hadoop cluster (capacity, number of nodes, block replication, number of users, etc.) will help the PM work with the architect and the product development teams to build appropriate data warehouse systems. The PM’s knowledge of the capabilities required to analyze data across the traditional EDW and Hadoop will further augment the team’s expertise. For example, cross-analyzing customer segment data in the EDW with customer activity in the Hadoop cluster, to derive a behavior pattern for a particular customer category, requires such a capability and is a common requirement nowadays.
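To make the cluster sizing discussion in step 4 a little more concrete, here is a minimal back-of-the-envelope sketch in Python. It only covers storage capacity, node count, and block replication. The function name, the 24 TB of disk per node, and the 75% usable fraction are illustrative assumptions of mine, not figures from any particular deployment; the replication factor of 3 is the HDFS default. A real sizing exercise would also account for data growth, compression, compute needs, and the number of users.

```python
import math


def estimate_data_nodes(raw_tb, replication=3, disk_per_node_tb=24.0,
                        usable_fraction=0.75):
    """Estimate how many data nodes are needed to store raw_tb of data.

    raw_tb           -- size of the data before replication, in TB
    replication      -- HDFS block replication factor (HDFS default is 3)
    disk_per_node_tb -- raw disk per node in TB (assumed value)
    usable_fraction  -- share of disk left for HDFS data after reserving
                        space for the OS, logs, and temp files (assumed)
    """
    if raw_tb < 0 or replication < 1 or disk_per_node_tb <= 0:
        raise ValueError("sizes must be positive and replication >= 1")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")

    stored_tb = raw_tb * replication              # every block is kept N times
    usable_per_node_tb = disk_per_node_tb * usable_fraction
    return math.ceil(stored_tb / usable_per_node_tb)


if __name__ == "__main__":
    # 100 TB of raw log data, replicated 3 times, on 24 TB nodes
    print(estimate_data_nodes(100))
```

Even a rough estimate like this gives the PM a concrete starting point for the capacity conversation with the architect.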
5. Develop Data Mining Techniques: Once the data warehousing systems and methods are established, the PM and the team of data scientists can identify and develop the necessary data mining techniques. For the example analytics platform discussed above, these techniques include frequency analysis, collaborative filtering, causal analysis, matrix factorization, association rule mining, time series analysis, k-means or hierarchical clustering, regression analysis, Bayesian networks, etc.
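As one small illustration of the techniques listed in step 5, the sketch below shows the core idea of association rule mining, restricted to rules between pairs of items (for example, "customers who buy a lantern also buy a tent"). The function name, the thresholds, and the sample baskets are hypothetical and chosen only for the example. A production system would more likely rely on an established algorithm such as Apriori or FP-Growth over much larger data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between item pairs.

    support    = share of transactions containing both items
    confidence = share of transactions containing lhs that also contain rhs
    """
    n = len(transactions)
    if n == 0:
        return []

    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # ignore duplicates in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Hypothetical purchase histories, one list per transaction.
    baskets = [
        ["tent", "stove", "lantern"],
        ["tent", "stove"],
        ["tent", "lantern"],
        ["stove", "fuel"],
    ]
    for lhs, rhs, support, confidence in pair_rules(baskets, 0.5, 0.7):
        print(f"{lhs} -> {rhs}: support={support:.2f}, "
              f"confidence={confidence:.2f}")
```

Rules that clear both thresholds are natural candidates for the "personalized product/service recommendations" use case mentioned earlier.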
6. Develop Reporting Layer: Once the platform and its analytics engines are built, the insights from the analytics have to be displayed/reported, and the interface for display depends on the users. If the users are data scientists internal to the organization, delivery usually ends with providing the platform, the data, reporting and visualization tools (Tableau, Cognos, Datameer, etc.), and plugins/connectors to statistical analysis tools such as R and S, which are highly popular with data scientists. Data curation is also vital to deliver the right data and insights in the right format/structure at the right time to the right stakeholders, so that timely business action is taken. On the other hand, if the end users are individual consumers, appropriate effort has to go into building a sleek interface that delivers user-specific content.
The product development life cycle does not end here. Once the product is launched, information on its adoption, feature usage, and user experiences is gathered and fed back to evolve the product further. And so the Plan-Do-Check-Act cycle continues.
Thanks for reading. Comments to improve the content and enrich the discussion are most welcome.