Building a DS Pipeline To Solve a Business Problem



22 Feb 2021

In today’s world, data is abundant, but time is short: a business has to work out quickly how to use the data at hand to support its decision-making. At Ninjacart, we put our effort into turning data into actionable insights.

To extract any value from data, it needs to be gathered, scrubbed, and explored, with business domain knowledge serving as the backbone. The available data can then be used either to predict future outcomes or to find unknown trends and patterns. This is where Data Science comes into the picture: a Data Science pipeline is the sequence of steps that need to be followed to obtain valuable results for the end-user. Adding this structure to the solution not only helps optimize the process but also improves the outcome at each step.

Model building is often considered the heart of a Data Science pipeline. While that may be true, the far more important part is posing the right questions to the data. This will become clear over the course of the article, where modelling the data forms just 20% of the pipeline. With that in mind, let’s discuss what goes into solving a business problem at Ninjacart using Data Science.

. . .

The DS Pipeline: An Overview

First and foremost, we need to define the business problem and ascertain the KPI that solving it will improve. Next, we hunt for the data and transform it, then run an exploratory analysis to find patterns in it. This step also helps identify the most important features, or attributes, of the data for this particular business problem. After feature selection comes creating the model, evaluating it, and refining it. Once we have our model, it is very important to be able to interpret the findings and communicate them effectively to the concerned stakeholders.

. . .

The Ninjacart Ecosystem

Having defined the above structure, let’s see how it functions in a real-world application. The Ninjacart ecosystem supplies local businesses and Kirana stores with farm-fresh produce and has a customer base that is growing each day. As such, we have access to tonnes of data at every leg of the supply chain. For instance, we have data gathered from customers’ historical purchases and can use it to ask questions like:

  • Who are our most valuable customers?
  • What can we do to retain our customers?
  • What can we do to increase our sales?
  • What is the optimal level of stock we should procure?
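
As a small illustration of the first question, here is a minimal sketch that ranks customers by total spend. The order records, the `rank_customers_by_value` helper, and the choice of total spend as the measure of "value" are all assumptions made for this example; they are not a description of how Ninjacart actually scores its customers.

```python
from collections import defaultdict

# Hypothetical order records: (customer_id, order_value).
orders = [
    ("c1", 1200.0),
    ("c2", 300.0),
    ("c1", 800.0),
    ("c3", 450.0),
    ("c2", 350.0),
]

def rank_customers_by_value(orders):
    """Return (customer_id, total_spend, order_count), highest spend first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, value in orders:
        totals[customer_id] += value
        counts[customer_id] += 1
    # Sort by descending spend; break ties by customer id for a stable result.
    ranked = sorted(totals, key=lambda c: (-totals[c], c))
    return [(c, totals[c], counts[c]) for c in ranked]

for customer_id, total, n_orders in rank_customers_by_value(orders):
    print(customer_id, total, n_orders)
```

In practice, "value" could just as well account for recency, order frequency, or margin; total spend is only the simplest starting point.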

To answer these and similar questions, one needs to understand the hidden patterns in the data. For instance, early analysis shows that the products a customer buys are correlated both with that customer’s own historical purchases and with the purchases of other customers who are similar in some way. We can use this information to define a business problem: recommend the products the customer is most likely to buy at the time of their next purchase, thus improving conversion rates.
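
To make the idea concrete, here is a minimal sketch of a next basket recommender that combines the two signals mentioned above: a customer's own purchase history and the purchases of similar customers. Everything in it is an assumption chosen for illustration (the toy `histories` data, the Jaccard similarity measure, the `neighbour_weight` parameter, and the scoring formula); it is not the model used in production.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> list of baskets (sets of products).
histories = {
    "c1": [{"tomato", "onion"}, {"tomato", "potato"}],
    "c2": [{"tomato", "onion"}, {"onion", "chilli"}],
    "c3": [{"banana", "apple"}],
}

def jaccard(a, b):
    """Overlap between two product sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(customer, histories, top_n=3, neighbour_weight=0.5):
    """Rank products by personal purchase rate plus similarity-weighted
    purchase rates of other customers."""
    own_baskets = histories[customer]
    own_counts = Counter(p for basket in own_baskets for p in basket)
    own_products = set(own_counts)

    # Personal signal: share of the customer's baskets containing each product.
    scores = {p: c / len(own_baskets) for p, c in own_counts.items()}

    # Neighbour signal: other customers' purchase rates, scaled by similarity.
    for other, baskets in histories.items():
        if other == customer:
            continue
        other_counts = Counter(p for basket in baskets for p in basket)
        similarity = jaccard(own_products, set(other_counts))
        if similarity == 0:
            continue  # dissimilar customers contribute nothing
        for p, c in other_counts.items():
            rate = c / len(baskets)
            scores[p] = scores.get(p, 0.0) + neighbour_weight * similarity * rate

    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:top_n]]

print(recommend("c1", histories, top_n=4))
```

With this toy data, products that `c1` has never bought (such as `chilli`) can still surface because a similar customer buys them, while products bought only by a dissimilar customer are never suggested.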

Let’s map the above-defined structure of a DS pipeline to this particular problem.

The above pipeline for next basket recommendation to a customer resonates with the key ideas highlighted for solving a business problem using Data Science.

. . .


We, at Ninjacart, believe in instilling intelligence into our business decisions on the basis of data, on both the demand and supply sides of the ecosystem. To that end, we have built several customer-centric solutions focusing on problems like growth, retention hacks, churn, sales multi-level cohort prediction, pricing intelligence, relevance, freshness, serendipity and inbound optimizations.

Written by
Jagrati Gogia
Data Scientist 2
Tech Team — Ninjacart