More often than not, analytics users and developers focus on answering “what” questions: “what happened?” or “what is happening?”, depending on whether the data being handled is historical or real-time streaming. Regardless of the temporal aspect, the “what” analysis tells only a small part of a bigger story. Don’t get me wrong; the “what” analysis is a necessary foundation for deeper analytics. But as a business analytics professional, if you provide insights to your key stakeholders (executives, operations managers, etc.) from the “what” analysis alone, you are shooting yourself in the foot!

So, what would complete the analytics story?

A primary question that analytics has to address to create greater business value is: “How much impact is there on my key business outcomes?” The business outcomes for this impact analysis can be financial metrics such as revenue and costs, as well as qualitative yet valuable metrics such as customer experience and brand value.

For example, if your job is in product analytics, it is not enough to measure the “what has been happening” or “what happened” trends of product awareness, trial rate, conversion, purchase rate (first or repeat), and so on. Nor is it sufficient to perform quantitative analytics on the various stages of the consider-evaluate-buy-experience-advocate-bond-buy(repeat) consumer journey. It is a must to mine deeper insights by measuring “how much impact” those trends have on the business itself. If your job is in web analytics or any other domain that monitors and analyzes various KPIs, it is not enough to measure “what is happening (in real time)”, “what has been happening”, or “what happened”. The value of such “what” analysis is akin to skin-deep beauty.

The same applies even if your job is to develop analytics tools/platforms that help users perform the above variety of analytics. If the product (tool or platform) you are building enables users to conduct only the “what” analytics, you are doing a huge disservice to your customers (assuming the customers are still buying that stuff!). And if you are a customer evaluating an analytics tool to buy, you have the right to demand more than “what analysis” capabilities from your potential vendor. Remember, the right to remain silent does not apply here!

Now, if it is the degree of impact on the business outcomes that matters most, how can one connect those business outcomes to the metrics actually being measured and analyzed? That’s where business domain knowledge comes into play. For example, see the DuPont model for the Retail industry, which establishes relationships between various value drivers and the financial levers, which in turn are connected to the business outcomes. On the other end, the measurable KPIs (for example: page visits, visitors, sessions, duration, bounce rate, etc. in web analytics) can be (and dare I also say “must be”!) quantitatively connected to the value drivers. That can be done through various statistical and/or machine learning techniques applied to the historical data.

Such “How much?” analytics, with established relationships that assess the degree of impact on the business outcomes, is what truly generates business value.

In fact, the process of establishing the relationships among various metrics and outcomes through “How much?” analytics becomes a valuable foundation for “Advanced What Analytics” such as predictive (“What might happen?”) and prescriptive (“What should happen for the optimal outcome?”) analytics, yielding more actionable insights.

That is a nice circle of “What – How Much – What” analytics to complete.

Your thoughts please!

Acknowledgment: DuPont model for Retail developed with inputs from my ex-colleagues (Bernadette, Kim, Robyn, and Victoria), the true industry experts! Any errors are mine.

Biggest Weak Link in Personalization? Missing Customer-Channel Bonding

We all, as consumers, “channel surf” at will. We quickly sort through the best available promotions via one channel, infer the popularity and consumability of a product from crowd reviews/ratings via one or more other channels, compare the price and cost of ownership on another channel, perhaps determine availability on yet another, and then may eventually buy the product on an entirely different channel.

Depending on who the individual is, which age demographic he/she belongs to, and many other factors, the channels leveraged for the above purposes can be any or all of physical store, web, mobile, social media, customer phone service (or call center), catalog, etc. While the services of the high-touch channels among these are typically leveraged during the awareness, consideration, and preference evaluation stages of the purchasing process, the consumer might buy the product on another, cheaper channel.

Enterprises traditionally designed channels based on market segmentation studies, whether it was about delivering the right products to the right segments, or creating awareness of products and influencing the appropriate segment, or for any other purpose among the various stages of the typical purchasing process (awareness, consideration, preference, purchase, and post-sale service). A key underlying assumption for such an approach was that consumers with similar demographic characteristics tend to shop and buy in the same way, through the same few channels (for a deeper analysis on the subject, read the beautiful article: The Customer Has Escaped by Paul F. Nunes and Frank V. Cespedes, Harvard Business Review, November 2003).

However, as companies started to understand more about how consumers shop and not just buy (the “channel surfing” behavior), and to realize the resultant losses (or avoidable costs) from stranded assets supporting underutilized channels, they have been striving to design channels that cater to consumer behavior rather than trying to hold consumers captive. Instead of forcing consumers into predetermined channels, the strategy is to let the consumer navigate across channels seamlessly as it suits him/her, and sometimes to go the extra mile by encouraging/directing the consumer to alternate channels for greater user empowerment. For example, companies have become more open to directing consumers to a third-party source to compare their product with that of the competitors (though this is not a new concept; it was shown beautifully and humorously in the 1947 movie Miracle on 34th Street)!

In this context, I have been questioning whether we, the Data Science and Analytics professionals, have been providing effective support to organizations in the execution of this strategy.

For instance, Personalization is one of the key initiatives at most companies for leveraging data toward better consumer engagement. No doubt, much advancement has been achieved in personalization, be it in product/service/promotion recommendations for creating awareness, in customized configurations for purchase, or in the delivery of post-purchase service. However, are we leveraging the customer-channel bonding in the most appropriate way to deliver the best value to the customer, such that the conversion also generates higher value for the organization?

If we look at the most common approach to recommender systems for personalization, it involves a user-item matrix (an item can be a product, service, movie, etc.), where each element of the matrix is an affinity score between user i and item j. The affinity score can simply be 1 or empty (watched/not-watched, purchased/not-purchased, liked/not-liked, etc.), or it can be a numeric score (or empty) computed as a weighted average of multiple actions (liked, watched/browsed, disliked, shared on social media, added-to-cart, purchased, reviewed, recommended to others, etc.). This score is usually computed from information aggregated from various channels, as these multiple actions do not necessarily happen on the same channel. Such aggregated scoring is bound to lose valuable information on how the consumer interacted with the product on each individual channel; this “lost in translation” results in missed opportunities for enterprises to better understand the customer-channel bonding, and leads to recommending the same information and services/products/promotions to a user on all or multiple channels.
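To make the aggregation step concrete, here is a minimal sketch in Python with pandas; the action weights, column names, and sample events are illustrative assumptions, not from the original article. It computes a user-item affinity matrix as a weighted average of actions collapsed across channels:

```python
import pandas as pd

# Illustrative action weights for the weighted-average affinity score.
ACTION_WEIGHTS = {"viewed": 1.0, "liked": 2.0, "added_to_cart": 3.0, "purchased": 5.0}

# Hypothetical multi-channel action log.
events = pd.DataFrame([
    {"user": "u1", "item": "i1", "channel": "web",    "action": "viewed"},
    {"user": "u1", "item": "i1", "channel": "mobile", "action": "purchased"},
    {"user": "u2", "item": "i1", "channel": "store",  "action": "added_to_cart"},
    {"user": "u2", "item": "i2", "channel": "web",    "action": "liked"},
])

events["score"] = events["action"].map(ACTION_WEIGHTS)

# Averaging across all channels collapses the channel dimension; this is the
# "lost in translation" step discussed above.
user_item = events.pivot_table(index="user", columns="item",
                               values="score", aggfunc="mean")
print(user_item)
```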


In other words, with the two-dimensional approach to user-item personalization, user-channel bonding is ignored (or rather, not extracted properly), at the cost of forgoing a deeper three-dimensional user-item-channel personalization.

How do we address this gap? One quick win (not necessarily the most effective solution) is to construct a three-dimensional matrix of user-item-channel affinity scores. The user-item matrix is itself largely sparse (even after aggregating information from across channels), and this three-dimensional matrix will be even sparser, since it maintains the user-item affinity separately for each channel. So yes, it is a challenge to expect meaningful user-item insights from a sparser matrix, but that should not deter analysts. One way to overcome this is to include more information for each channel, the information that would have been discarded during aggregation: for example, how much time has the customer spent browsing the product/service on this channel? At what time of day does the user interact with the channel (a user may use mobile and web at different times)? Did the customer search directly for this product on this channel, or was he/she led to it in hops? There is no limit to the level of information that can be gleaned from the user-channel interaction.
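As a rough illustration of keeping the channel dimension separate, the sketch below (Python/pandas; the column names, dwell-time weighting, and sample data are hypothetical) builds one user-item affinity matrix per channel, folding in a channel-specific signal such as browsing duration:

```python
import pandas as pd

# Hypothetical per-channel interaction log, with a channel-specific signal
# (seconds spent browsing the item) that plain aggregation would discard.
events = pd.DataFrame([
    {"user": "u1", "item": "i1", "channel": "web",    "action_score": 1.0, "dwell_sec": 240},
    {"user": "u1", "item": "i1", "channel": "mobile", "action_score": 5.0, "dwell_sec": 30},
    {"user": "u2", "item": "i2", "channel": "web",    "action_score": 2.0, "dwell_sec": 90},
])

# Illustrative blend: action score boosted by (capped) dwell time.
events["affinity"] = events["action_score"] * (1 + events["dwell_sec"].clip(upper=300) / 300)

# One sparse user-item matrix per channel, i.e. the third dimension.
per_channel = {
    channel: grp.pivot_table(index="user", columns="item", values="affinity")
    for channel, grp in events.groupby("channel")
}
print(per_channel["web"])
```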

Armed with the three-dimensional matrix, we can compute the recommended items/information for a given user separately for each channel (the popular Alternating Least Squares (ALS) matrix factorization algorithm can be handy for the purpose); that is only the first step. The compiled user-item affinity can then be cross-referenced among multiple channels, and a set of rules can guide the recommender toward the best action to take in context.

For coding enthusiasts: the ALS matrix factorization algorithm is part of the Spark MLlib machine learning library; the advantage of using Spark for this problem (in contrast to Apache Mahout, R, or other tools) is that Spark operations are performed on RDDs (Resilient Distributed Datasets, in-memory structures extracted from large volumes of HDFS-stored data), and an RDD can be a wrapped structure holding several objects within it. While a 2D data frame of a matrix as consumed by R or other tools is a single object, data scientists can explore whether an RDD can simultaneously hold multiple data frames. If that is possible, the RDD can hold multiple user-item matrices, and the 3-D user-item-channel matrix can then be operated upon by the Spark MLlib algorithm for optimal outcomes.
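A minimal sketch of the per-channel route with Spark MLlib (PySpark) follows; rather than wrapping multiple matrices inside one RDD as speculated above, it simply trains one ALS model per channel’s RDD. The HDFS paths, hyperparameters, and user ID are placeholders:

```python
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="per-channel-als")

def load_affinities(path):
    """Parse lines of 'user_id,item_id,affinity' into an RDD of Ratings."""
    return (sc.textFile(path)
              .map(lambda line: line.split(","))
              .map(lambda f: Rating(int(f[0]), int(f[1]), float(f[2]))))

# Hypothetical per-channel affinity files extracted from HDFS-stored logs.
channels = {
    "web":    "hdfs:///affinity/web.csv",
    "mobile": "hdfs:///affinity/mobile.csv",
    "store":  "hdfs:///affinity/store.csv",
}

# Train one implicit-feedback ALS model per channel (placeholder hyperparameters).
models = {name: ALS.trainImplicit(load_affinities(path), rank=10,
                                  iterations=10, alpha=40.0)
          for name, path in channels.items()}

# Top-5 recommendations for a given user, separately per channel.
user_id = 42
per_channel_recs = {name: m.recommendProducts(user_id, 5)
                    for name, m in models.items()}
```

The per-channel recommendation lists can then be cross-referenced by the rule layer described above.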

(A detailed version of this article appeared as a multi-part blog series at part-1, part-2, part-3, part-4).

Happy recovery from the holidays! I am sure you were one of the numerous gleeful shoppers this holiday season, shopping via web, mobile, physical, and other channels, while also wishing that retailers could do more to enhance your shopping experience!

No doubt, retailers continuously strive for innovative solutions that can enrich customer engagement; particularly engagement that is seamless across channels, engagement with advance (predictive) knowledge of consumer preferences, and engagement that delivers an offer to the customer at the right moment (in real time). Now, how about a solution that can achieve all three dimensions of engagement (i.e., cross-channel, real-time, and personalized) simultaneously?

For example, consider the situation where a consumer is in the vicinity of a physical store, then at the store entrance, and then inside the store shopping through the aisles. There have been various advancements in technologies that retailers can deploy to detect customers in the vicinity of a store location and then track their movements within the store. Some of these technologies include Wi-Fi or Bluetooth-based sensors for geo-positioning, RFID chips embedded/integrated in customer loyalty cards, and so on. Similarly, tracking and recording the activity of online customers across webpages has long been in vogue via clickstream technologies. (A detailed discussion of tracking technologies is beyond the scope here; the focus is on technologies that help in real-time monitoring of each individual customer to enable instantaneous engagement, as well as in discovering the pattern of customer activity in a zone. We start with the assumption that, regardless of the sensor technology deployed, data from such sources will be available for ingestion as a continuous stream of event data.)

Given this, how can retailers combine insights from real-time analysis of dynamic tracking information with predicted preferences (based on historical data about what customers have viewed/liked/added to cart/purchased) in order to recommend the right promotions/products in the very short window before the customer walks out of the store? Or how can retailers influence a consumer in the vicinity to come into the store, through a personalized message, and thus improve conversion in real time?

For retailers, engaging consumers at the “right moment,” armed with “insights in real time, to the second or minute,” and supported by “accurately predicted” individual consumer preferences, is no longer a pipe dream. This article discusses the relevant technologies, which not only have to solve individual, disparate problems but also have to come together and act in unison to translate this complex business case into reality:

  1. The dynamic tracking data is streamed continuously into a streaming analytics engine, which continuously analyzes the location data to obtain insights about individual consumer activity in real time (how this is achieved via the “event processing network” (EPN) and the data ingestion layer is discussed below);
  2. Data science has to deliver each individual consumer’s predicted (personalized) purchase preferences as accurately as possible for greater conversion; while recommender systems are not new, what is more significant is that the data science, while predicting preferences, has to take into account the cross-channel correlated data from web, mobile, store POS, and other channels;
  3. Most importantly, the comprehensive solution has to ensure that these disparate pieces of technology (i.e., streaming analytics and predictive analytics) are able to communicate and exchange relevant information, possibly via a Business Process Management System (BPMS), to deliver the right recommendation at the right time via the right channel.

Dynamic Tracking and Streaming Analytics: In order to build a model for real-time monitoring of activity, one approach is to first create an Activity Flow Diagram for all of the possible scenarios in which activity can be defined when a customer is in a particular zone. For example, say a zone is divided into five sub-zones: Z0 (in the vicinity outside the store), Z1 (the entrance/checkout area), Z2 (clothing and apparel aisles), Z3 (electronics aisles), and Z4 (food and grocery aisles). See Figure. One can construct the Activity Flow Diagram across these states (Z0, Z1, Z2, Z3, Z4) by leveraging the Activity Tracker application within the Streaming Analytics engine; the model enables the system to monitor the customer's movement from zone to zone in real time. The Activity Tracker application can also be leveraged to monitor various KPIs in real time, such as the duration the customer spends in each zone. The Duration KPI for each zone can be assigned thresholds (e.g., poor, fair, good) to trigger operational actions, such as texting promotional messages; for example, the less time a consumer spends in Zone 0 and Zone 1 the better, and the more time a consumer spends in Zone 2, Zone 3, or Zone 4 the better. Other KPIs can be the total number of customers in each zone in a given time window, the number of customers who move from Za to Zb as opposed to Za to Zc, and so on.

For the Streaming Analytics engine to monitor the customer flow and provide the necessary insights, a source connector or a stream processing component can first ingest the data coming from various sources (e.g., sensors); the output of the ingestion can become the source stream to the Streaming Analytics engine. Using an event-driven architecture (EDA), the stream processing engine tackles large volumes of raw events in real-time to uncover valuable insights. It does this by correlating events from diverse data sources and by aggregating low-level events into business-level events so as to detect meaningful patterns and trends.
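As a toy illustration of the kind of logic such an engine applies, here is a minimal, library-free Python sketch (the zone labels, thresholds, and event shape are assumptions) that tracks each customer’s current zone, zone occupancy counts, and dwell-time KPIs from a stream of (customer_id, zone, timestamp) events:

```python
from collections import defaultdict

# Hypothetical dwell-time thresholds (seconds) that trigger an action;
# e.g. lingering near the entrance (Z1) prompts a promotional text.
DURATION_THRESHOLDS = {"Z0": 120, "Z1": 60}

current_zone = {}                 # customer_id -> (zone, entry_timestamp)
zone_counts = defaultdict(int)    # zone -> number of customers currently in it

def on_event(customer_id, zone, ts, notify):
    """Process one location event; call notify(...) when a dwell threshold is crossed."""
    prev = current_zone.get(customer_id)
    if prev is not None and prev[0] != zone:
        prev_zone, entered_at = prev
        dwell = ts - entered_at
        zone_counts[prev_zone] -= 1
        if dwell > DURATION_THRESHOLDS.get(prev_zone, float("inf")):
            notify(customer_id, prev_zone, dwell)
    if prev is None or prev[0] != zone:
        current_zone[customer_id] = (zone, ts)
        zone_counts[zone] += 1

# Example: customer 7 enters the vicinity, then moves to the entrance after 150 seconds.
on_event(7, "Z0", 0, print)
on_event(7, "Z1", 150, print)     # prints (7, 'Z0', 150): the Z0 threshold was exceeded
```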

As the real-time monitoring of a customer in a zone is set in place, the second piece of the puzzle is the predictive analytics for recommendations/messages, personalized at each individual customer level to be delivered when the customer is in the zone.

Predictive Analytics for recommended promotions/products: Some of the popular approaches to product/item recommendation are based on collaborative filtering techniques. The fundamental assumption in these techniques (hence the name “collaborative”) is that an individual consumer tends to view/like/purchase the same items that other consumers with similar patterns of views/likes/purchases have viewed/liked/purchased. The various algorithms/models in the realm of collaborative filtering vary in how efficiently they extract the product/item-level similarity between two consumers. For example, matrix factorization, particularly the Alternating Least Squares (ALS) method, has been a preferred choice among collaborative filtering techniques. For a more detailed discussion of this method, and of how cross-channel information about individual consumers can be captured with it, see here and here.

Regardless of the approach, the recommender system will produce for each individual user an ordered set of recommended items based on the items not yet “touched” by the user but “touched” by other similar users. It is important to note that collaborative filtering techniques work well for user-item affinity where the items (consumer products, movies, etc.) have sufficient longevity to be “touched” by a considerable number of users. But what if the items are promotions, coupons, or other things with a life-cycle too short to map relevant promotions to users? One approach is to leverage the triangulation of the “user <=> products” and “promotions <=> products” affinities to derive the “user <=> promotions” affinity.
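The triangulation can be expressed as a simple matrix product: a user’s affinity to a promotion is the sum of his/her affinities to the products that promotion covers. A small numeric sketch (the matrices below are made-up examples):

```python
import numpy as np

# 2 users x 3 products: user-product affinity scores.
user_product = np.array([[1.0, 0.0, 3.0],
                         [0.0, 2.0, 0.0]])

# 2 promotions x 3 products: which products each promotion applies to.
promo_product = np.array([[1, 0, 0],
                          [0, 1, 1]])

# 2 users x 2 promotions: derived user-promotion affinity.
user_promo = user_product @ promo_product.T
print(user_promo)   # [[1. 3.]
                    #  [0. 2.]]
```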

With the ordered set of predicted products/promotions ready, and the EPN tool deployed to monitor the activity of an individual consumer in the zone in real time, how do we combine the two to deliver the right message at the right time via the right medium? We will discuss this in the following section.

Combining Streaming Analytics with Predictive Recommendations: The data ingestion layer in a Streaming Analytics platform can ingest and fuse data from multiple sources. For this solution to be successfully deployed, as the EPN tool tracks the consumer in a store, the business process management system (BPMS) within the Operational Intelligence (OI) platform performs a few steps to achieve the lock-step (see Figure for functional architecture for the solution):

  • Consume the output from the EPN tool with information such as the presence of the consumer in a specific zone, the duration of a consumer in a specific zone, etc.
  • Match that data against the ordered set of predicted recommendations for the particular consumer
  • Formulate appropriate messages, in context, through various built-in decision rules

For example, when the EPN tool determines that a particular consumer is in Z2 (clothing and apparel), the BPMS tool can extract the relevant recommendation (product or promotion) for the items relevant to that zone.
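A minimal sketch of such a decision rule (the zone-to-category mapping, data shapes, and message text are all hypothetical) might look like:

```python
# Hypothetical mapping from store zones to merchandise categories.
ZONE_CATEGORIES = {"Z2": "apparel", "Z3": "electronics", "Z4": "grocery"}

def message_for(zone, ranked_recommendations):
    """Pick the highest-ranked recommendation whose category matches the zone.

    ranked_recommendations: list of (item_id, category, score), best first,
    as produced by the recommender for this particular consumer.
    """
    category = ZONE_CATEGORIES.get(zone)
    if category is None:
        return None   # Z0/Z1 would be handled by different rules (e.g. entry promotions)
    for item_id, item_category, _score in ranked_recommendations:
        if item_category == category:
            return f"Offer item {item_id} to the consumer while in {zone}"
    return None

recs = [(501, "electronics", 0.91), (207, "apparel", 0.84), (330, "grocery", 0.77)]
print(message_for("Z2", recs))   # -> offer for item 207
```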

While the “real-time” and “predictive” dimensions are combined through this process, the third dimension, the “channel,” can also be achieved at the data-usage level. That is, by leveraging insights about a particular consumer’s affinity to items, gained from online activity information, the right message is delivered to the consumer when he/she is in a specific zone inside a physical store. The inverse can also be achieved with the same mix of technologies: with the help of insights gained from activity in various zones of a physical store, a consumer can be engaged with a personalized message via mobile and online channels.

SUMMARY: Overall, the article discussed how a mix of technologies performs the individual tasks of a complex business use case and acts in lock-step within an OI platform to deliver a unified message for enriched real-time, cross-channel customer engagement:

  • Ingest high velocity data from various sensor sources for real-time analysis
  • Glean insights from the real-time monitoring of a consumer in a zone
  • Produce personalized recommendations for consumers based on historically aggregated data from both in-store and online/mobile activity
  • Combine insights from real-time activity with predicted recommendations for more relevant messages, in context
  • Deliver messages to the consumer instantaneously for greater conversion
  • Trigger enhanced engagement (e.g., sending a store representative for interactive engagement) that can dynamically adapt based on the situation at-hand.

I attended the Kellogg Alumni event on Tuesday evening at the Facebook campus here in the Bay Area. While I got the opportunity to meet some old friends and make some new acquaintances, I was thrilled to watch an expert panel discuss an exciting subject entitled “Big Data does not Make Decisions – Leaders Do”; thrilled simply because the entire discussion was about what I live every day in my work, and what I strive to achieve with my team and my organization.

The panel was composed of highly accomplished Kellogg alumni (Facebook’s CMO, Marketo’s CMO, and a senior managing partner at a leading venture firm) plus two world-renowned Kellogg faculty members. They all agreed, and in fact strongly promoted, that managers as well as executives (CIOs, CTOs, and even CEOs) have to have some “working knowledge of data science” to be able to ask the right questions and steer the data science and data analytics teams in the right direction so as to solve the most relevant business problems; that the right culture and mindset have to be developed, recognizing that Big Data (however you define it) by itself, or the software and/or analytics tools by themselves, are not the end solutions; and that leaders must recognize the importance of domain knowledge combined with the ability to connect the market problems of a given vertical with the right set of data.

Another key point of agreement, closest to what I live and breathe every day, is that “Product Management” is a critical role in bridging the data science world with market/business requirements; that a good product manager is essential to connect business problems with data science findings. It was observed that, even while there is a shortage of good data scientists, there is an even more acute shortage of good product managers who understand both the business domain and the data science well enough to make that connection successfully.

As I often say, “Data Science is about Discovery; Product Management is about Innovation”: the primary focus of a product manager should be market-centric innovation. Data scientists (and engineers) are often driven by precision (and perfection), while good product managers know the level of accuracy that is sufficient to take the product to market at the right time. The three things I recommend you do to be a successful product manager in the data analytics space are:

  1. Take time to learn the science: if you are looking for quick learning, many resources are out there (webinars, MOOC classes, etc.) to teach you the subject; even paid training classes (2-day or 3-day), such as those from Cloudera, are worth the time and money. If you can, and are looking for longer-term learning, many schools have undergraduate- and graduate-level programs in data science and analytics. A product manager with a strong grasp of the algorithms, models, techniques, and tools of data science will not only enjoy the support of the data science teams he/she works with, but will also be able to steer the product roadmap by applying the right models/techniques, combined with the right data sets, to the right business problems and market requirements.
  2. Learn by spreading the knowledge: it is a given that the product manager has to play a highly proactive role in building the product with the help of data scientists, data engineers, data architects, and other stakeholders on all sides, and also in taking it to market. But go beyond that expectation and play a highly interactive role in educating the stakeholders and managers about the data science behind the analytics. Invite managers, executives, engineers, and others to a 30-minute seminar once in a while on a new data science topic; that in itself will be an enormous learning experience for you. After you present the data science model and technique, facilitate a discussion among the audience on the model’s possible application to various business use cases. As part of the lunch seminar series I have been doing at my company, I recently presented on the topic of “Hidden Markov Models,” about which I did not have much prior knowledge; as I took up the task, I did some reading, found the necessary R packages and relevant data sets to build the model and make predictions for a real-world use case. Though the preparation took many hours over two weeks, when I presented I did not have all the answers to the questions from the intelligent people in the room (our CTO, solution architects, engineers, and other product managers) about nuances of the algorithm and the model; but that’s fine, because the key outcome is that I learned something new, and the dialogue made me richer as to how this model can be leveraged for a particular use case!
  3. Engage the cross-functional teams to stay ahead in innovation: the product marketing (for B2B) or consumer marketing (for B2C) and sales people may not attend your seminars, for lack of interest and/or background, or because they are often in the field; but they are also key stakeholders in your ecosystem. Marketing and sales people are the eyes and ears of the company, as they provide the ground-level intelligence that is critical for formulating the company’s strategies and tactics; engage them to tap into their field knowledge about what customers are really looking for. Often customers and clients understand only the tip of the data science, and what they really need may be different from what they ask for; so have the empathy and patience to help stakeholders understand the difference between descriptive analytics (BI tools and the like) and more advanced analytics (predictive and prescriptive analytics with data science); make efforts to collaborate with the marketing teams to generate content, neither too technical nor too superficial, that appeals to and can be understood by a wider audience. All these efforts will help you garner deeper market knowledge to stay ahead, help you avoid drowning in the necessary day-to-day agile/scrum activities of near-term product development, and actually help you build your mid-term and long-term product roadmap.

What are your thoughts and recommendations as a successful product manager in this space? Share with us….

Not to be confused with Real-Time Analytics. It is different.

As a product manager in the domain of predictive analytics, I own the responsibility to build predictive analytics capabilities for consumer-facing and/or enterprise platforms; the business applications vary among item recommendations for consumers, prediction of event outcomes based on classification models, demand forecasting for supply optimization, and so on. We usually see applications where a predictive model built using machine learning technique(s) is leveraged to score a new set of data, and that new data is most often fed to the model on demand as a batch.

However, the more exciting aspect of my recent work has been in the realm of real-time predictive analytics, where each single observation (raw data point) has to be used to compute a predicted outcome; note that this is a continuous process, as a stream of new observations continuously arrives and the business decisions based on the predicted outcomes have to be made in real time. A classic use case is credit card fraud detection: when a credit card swipe occurs, all the data relevant to the nature of the transaction is fed to a pre-built predictive model to classify whether the transaction is fraudulent, and if so to deny it; all this has to happen in a split second, at scale (millions of transactions each second), in real time. Another exciting use case is preventive maintenance in the Internet of Things (IoT), where continuous streaming data from thousands or millions of smart devices has to be leveraged to predict possible failures in advance and prevent or reduce downtime.

Let me address some of the common questions that I often receive in the context of real-time predictive analytics.

What exactly is real-time predictive analytics – does that mean we can build the predictive model in real time? A data scientist requires an aggregated mass of data, which forms the historical basis on which the predictive model can be built. The model building exercise is a deep subject by itself and deserves a separate discussion; the main point to note is that building a model for good predictive performance involves rigorous experimentation, requires sufficient historical data, and is a time-consuming process. So, a predictive model cannot be built in “real time” in the true sense.

Can the predictive model be updated in real time? Again, model building is an iterative process with rigorous experimentation. So, if the premise is to update the model on each new observation arriving in real time, it is not practical from multiple perspectives. First, retraining the model involves feeding in the base data set including the new observation (choosing either to drop older data points to keep the data set the same size, or to keep growing the data set), and so requires rebuilding the model. There is no practical way of “incrementally updating the model” with each new observation, unless the model is a simple rule-based one; for example: predict “fail” if the observation falls outside two standard deviations from the sample mean. In such a simple model it is possible to recompute and update the mean and standard deviation of the sample by including the new observation, even while the outcome for the current observation is being predicted. But for our discussion of predictive analytics here, we are considering more complex machine learning or statistical techniques.
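For the simple rule-based case just described, the running mean and standard deviation can indeed be updated incrementally with each observation (Welford’s method), in contrast to retraining a full model. A minimal sketch (the readings and the two-standard-deviation rule are illustrative):

```python
import math

n, mean, m2 = 0, 0.0, 0.0   # running count, mean, and sum of squared deviations

def score_and_update(x):
    """Predict 'fail' if x lies outside mean +/- 2*std, then fold x into the stats."""
    global n, mean, m2
    std = math.sqrt(m2 / (n - 1)) if n > 1 else float("inf")
    prediction = "fail" if abs(x - mean) > 2 * std else "ok"
    # Welford's incremental update: no need to revisit earlier observations.
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
    return prediction

for reading in [10.1, 9.8, 10.3, 10.0, 15.7]:
    print(reading, score_and_update(reading))
```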

Second, even if technologies make it possible to feed the large volume of data, including the new observation, each time to rebuild the model in a split second, there is no tangible benefit in doing so. The model does not change much with just one more data point. Drawing an analogy: if one wants to measure how much weight has been lost from an intensive running program, it is common sense that the needle does not move much if measured after every mile run. One has to accumulate a considerable number of miles before experiencing any tangible change in weight! The same is true in Data Science: rebuild the model only after aggregating a considerable volume of new data, to see a tangible difference in the model.

(Even recent developments such as Cloudera Oryx, which aim to move beyond Apache Mahout and similar tools (limited to batch processing for both model building and prediction), focus on real-time prediction while, rightly so, keeping model building batch-based. For example, Oryx has a computational layer and a serving layer, where the former performs model building/updating periodically on aggregated data at the batch level in the back-end, and the latter serves queries to the model in real time via an HTTP REST API.)

Then, what is real-time predictive analytics? It is when a predictive model (built/fitted on a set of aggregated data) is deployed to perform run-time prediction on a continuous stream of event data, enabling decision making in real time. There are two aspects involved in achieving this. First, the predictive model built by a data scientist in a stand-alone tool (R, SAS, SPSS, etc.) has to be exported in a consumable format (PMML is a preferred method across machine learning environments these days; we have done this, and also via other formats). Second, a streaming operational analytics platform has to consume the model (PMML or another format) and translate it into the necessary predictive function (via the open-source jPMML, Cascading Pattern, Zementis’ commercially licensed UPPI, or other interfaces), and also feed it the processed streaming event data (via a stream processing component in a CEP engine or similar) to compute the predicted outcome.
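As an illustration of the export step only (the workflow above uses R/SAS/SPSS; this sketch uses Python with the sklearn2pmml package, which requires a Java runtime, and a synthetic data set standing in for the aggregated historical data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Synthetic stand-in for the aggregated historical training data.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Fit the model inside a PMML-exportable pipeline.
pipeline = PMMLPipeline([("classifier", LogisticRegression(max_iter=1000))])
pipeline.fit(X, y)

# Export to PMML; the file can then be loaded by jPMML (or a similar engine)
# inside the streaming platform for run-time scoring of each event.
sklearn2pmml(pipeline, "model.pmml")
```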

This deployment of a complex predictive model, from its parent machine learning environment to an operational analytics environment, is one possible route in order to successfully achieve a continuous run-time prediction on streaming event data in real-time.

Six of the eight games in the round of 16 of the ongoing FIFA Soccer World Cup were decided by a margin of one goal or less. Of the six, two were decided by penalty kicks after the 30 minutes of extra time, and three were decided in extra time (including the heartbreaking loss in the USA-Belgium game!). In the quarter-finals, all four games were decided either by a margin of one goal or by penalty kicks after extra time. That means all these games in the elimination rounds were fought hard until the last second and were certainly nail-biting finishes.

As the teams advance to the later stages, the quality of the teams increases and so does the quality of the contests. As demonstrated in the elimination rounds, more than in the group stage, the teams have to bring much more than skill and expertise to win. They have to have the stomach and stamina to fight to the last, the constant focus not to let their guard down even for a second, the perseverance to keep attacking, and the nerve to hold on until the final whistle is blown.

This is akin to what we experience in our careers as we advance to higher levels. What certainly helps us outperform our peers and grow in the initial stages of a career is our IQ (here, for argument’s sake, I am including one’s breadth and depth of knowledge, subject matter expertise, and intellectual capabilities in “IQ”). However, once we are at the higher stages, which similarly capable individuals have also reached, what gives us an edge over others is emotional intelligence (EI, popularly referred to as EQ).

As proposed by Daniel Goleman in his book “Leadership: The Power of Emotional Intelligence”, and as widely researched for its influence on leadership abilities, EQ is what matters more in the later stages of a career. There is no substitute for the ability to hold one’s nerve under pressure, stand steady against the odds, persevere in spite of failures, and stay focused through to completion for greater success.

Location can often play a significant role in a consumer’s lifestyle and her purchasing decisions; so it forms a vital input to how well we can recommend products or services to her. Perhaps not so much when it comes to recommending movies or mass consumer products; but for companies that sell home furnishings, custom-design goods, design merchandise for homes, or goods for outdoor and related activities, the location of the consumer matters. For instance, homes in New York, the San Francisco Bay Area, Seattle, and Texas differ in their style and space; these factors, and also the weather, certainly influence the moods and tastes for furnishing a home. Even in a country like India, with its diverse cultures and lifestyles, the everyday dresses and jewelry that women wear differ vastly from state to state and location to location. At the eCommerce company that I have been part of, our analyses showed that geographical location determined not only the number of users, number of orders, and amount of revenue, but also the type of merchandise bought (color, style, size, etc.). In this context, collaborative filtering that recommends products or services also considered (viewed/liked/bought) by other users in a nearby location has an upside.

So, how can we build recommender systems like “people similar to you and in your vicinity also viewed these” or “people similar to you and in your vicinity also liked these“, that take spatial similarity (location proximity) between two users into account?

The location of a user can be obtained from GeoIP technologies, where the user’s IP address is used to obtain the location with a precision that depends on the SLA with the service provider (external link). That is, the user’s precise latitude-longitude (lat, long) coordinates can be used to build greater accuracy into the recommender system, or one can choose to use the lat-long of the center of the user’s city/zipcode, which is an approximation of the user’s actual location.

To keep the discussion on point, we assume that you already have the user-item preference matrix built for existing recommender systems (for starters, the user-item matrix can be written out from all observation data, with each observation being a tuple <user_id, item_id, preference_score>; the data file format depends on the tool you use to build recommender systems: for example, Apache Mahout consumes a CSV file). The matrix is of size n x m, with n users and m items. The preference score can be built either from a user’s implicit actions (view, like, add-to-cart, etc.) or from explicit feedback (ratings, reviews, etc.); most often the former matrix is less sparse than the latter.
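For concreteness, the tuple layout can be written out as plain CSV in the form Mahout-style recommenders consume (the IDs and scores below are made up):

```python
import csv

# <user_id, item_id, preference_score> tuples derived from implicit actions
# and/or explicit ratings (values are illustrative).
observations = [
    (101, 5001, 3.0),
    (101, 5002, 1.0),
    (102, 5001, 5.0),
]

with open("user_item_prefs.csv", "w", newline="") as f:
    csv.writer(f).writerows(observations)
```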

Given the user-item matrix and each user’s location (lat, long), we discuss here two possible approaches to building a recommendation model with spatial similarity. The approaches are discussed using a matrix factorization method; Alternating Least Squares (ALS) is a preferred choice for matrix factorization (external link) among the latent-factor-model-based collaborative filtering techniques, especially for the less sparse implicit-data matrix.

1. Using the ALS matrix factorization method, the n x m user-item matrix U can first be factorized into two factor matrices, an n x k user matrix P and an m x k item matrix Q, where k is the number of latent dimensions, such that U ≈ P * Transpose(Q). The user matrix P has n rows, each row describing a user with k attributes/dimensions. In the conventional method, for a given user, the most similar user among the remaining n-1 users is the one with the nearest values of the k attributes (for simplicity, this can be the Euclidean distance between the two sets of k values). Now, we can weight this k-value distance by the geographical distance between the two users (computed from the lat-long coordinates of the two users). The larger the geographical distance between the two users, the more the k-value distance is inflated, pushing them farther apart in similarity; a smaller geographical distance has the opposite effect. Thus, the weighted value ingests the spatial similarity between two users.
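A rough sketch of this weighting (the haversine distance and the scaling function are my own illustrative choices; the ALS user factors would come from the factorization above):

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, long) points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * np.arcsin(np.sqrt(a))

def geo_weighted_distance(p_i, p_j, loc_i, loc_j, scale_km=500.0):
    """Distance between two users' k-dimensional ALS factors, inflated by how
    far apart the users are geographically (scale_km is a tuning parameter)."""
    factor_dist = np.linalg.norm(p_i - p_j)
    geo_dist = haversine_km(loc_i[0], loc_i[1], loc_j[0], loc_j[1])
    return factor_dist * (1.0 + geo_dist / scale_km)

# Two users with near-identical factors: the nearby one stays "closer" than the far one.
p = np.array([0.2, 0.9, 0.1])
print(geo_weighted_distance(p, p + 0.05, (40.71, -74.00), (40.73, -73.99)))   # NYC vs NYC
print(geo_weighted_distance(p, p + 0.05, (40.71, -74.00), (37.77, -122.42)))  # NYC vs SF
```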

The above method can be challenging for one main reason: the same scalability reason for which user-user collaborative filtering models are less preferred than item-item collaborative filtering techniques (in addition to the fact that user properties are less static than item properties). That is, for most organizations the number of users will be on the order of millions while the number of items will be on the order of tens or hundreds of thousands, so user-user computations scale less well than item-item computations.

2. To overcome this computational scalability problem, another approach is to create clusters of users based on the lat-long data. Once the user base is divided into C clusters (using k-means or another preferred technique), one can create a user-item matrix separately for each cluster. Then, for each matrix, perform the matrix factorization to derive the user factors and the item factors. The user factors can now be used to find the most similar user for any given user. This approach alleviates the scalability problem, as the user-user computations are performed on matrices of reduced size. Of course, the size of the matrices depends on the number of clusters chosen and the actual geographical distribution of the user base.
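A sketch of the clustering step (scikit-learn k-means on made-up coordinates; the cluster count C and the downstream factorization call are placeholders to be tuned as discussed):

```python
import numpy as np
from sklearn.cluster import KMeans

user_ids = np.array([101, 102, 103, 104, 105, 106])
locations = np.array([
    [40.71, -74.00], [40.73, -73.99], [40.69, -74.02],    # New York area
    [37.77, -122.42], [37.79, -122.40], [37.80, -122.27], # Bay Area
])

C = 2
labels = KMeans(n_clusters=C, n_init=10, random_state=0).fit_predict(locations)

# Users grouped by spatial cluster; each group gets its own user-item matrix,
# which is then factorized (e.g. with ALS, as earlier) independently.
clusters = {c: user_ids[labels == c].tolist() for c in range(C)}
print(clusters)
```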

The former approach first derives the k attributes and then applies geographical distance as a weighting factor. The latter approach first applies spatial proximity by clustering all the users that are geographically close to each other, and then derives the k attributes for each group separately.

As with any problem in data science, building a good recommender model with spatial similarity will depend on a lot of experimentation, rigorous offline testing, tuning the model, conducting suitable A/B testing, and then repeating for continuous improvement. The approaches above are only some of the possible methodologies for building the model; the final model will depend on the actual data and on experimentation that involves tuning various parameters, including the number of dimensions k in both approaches and the number of clusters C in the second approach. One has to define metrics such as precision, recall, and RMSE to measure the performance of the model, and have the patience to keep experimenting.

