By Deepak Sasikumar

At GO-JEK, ‘Data-driven decisions’ and the ‘Democratization’ of data are core principles. DAGGERS, our real-time data aggregation platform powered by Apache Flink, grew out of these principles in early 2017.

The Data Engineering (DE) suite comprises 22 products & tools catering to our users: Engineers, Analysts & Product Managers at GO-JEK. Throughout this article, ‘users’ refers not to end consumers, but to the internal employees who rely on the DE team.

Daggers, in keeping with our core principles, are completely DIY in nature; any DE user can create a Dagger by selecting a few drop-downs & writing a SQL query. Handing this power back has sparked the imagination of our users and unlocked solutions to various problems at GO-JEK, ranging from concise supervision dashboards & demand-based pricing changes for the Driver Management team, to fraud prevention, health monitoring systems for engineering teams, and event-based triggers based on user intent for growth initiatives.

Our data architecture is founded on multiple Apache Kafka clusters, segregated by the type of data they carry. We provide self-serve tools to different teams to publish data (Fronting) and consume data (Firehose, Daggers, Bedrock). For Daggers, we run multiple clusters based on criticality and use case. These are Apache Flink clusters hosted on Kubernetes, since we need to scale their capacity frequently.

Daggers provide the capability to read messages from Kafka topics and aggregate them based on the business logic defined by the SQL query written by the user. The resulting aggregates can be piped to:

  • Databases (InfluxDB, PostgreSQL) for monitoring
  • Kafka for applications to consume
  • Elasticsearch for fast lookups through our in-house segment store
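
Under the hood, a Dagger job boils down to a windowed Flink SQL query over a Kafka-backed stream. Here is a minimal sketch; the table and field names (booking_log, status, service_type) are illustrative stand-ins, not our actual schema:

    -- Minimal sketch of a Dagger-style aggregation in Flink SQL.
    -- Each Kafka topic is exposed to the query as a table; rowtime
    -- is the event-time attribute used for windowing.
    SELECT
      service_type,
      COUNT(*) AS completed_bookings,
      TUMBLE_END(rowtime, INTERVAL '1' MINUTE) AS window_end
    FROM booking_log                 -- illustrative topic/table name
    WHERE status = 'COMPLETED'       -- illustrative field and value
    GROUP BY
      service_type,
      TUMBLE(rowtime, INTERVAL '1' MINUTE)

The platform takes care of wiring the query to the selected source topics and sink; the user supplies only the SQL and a few drop-down choices.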

Elaborating on a few of the many use cases of the Daggers platform:

Allocation metrics

The Driver Management team is responsible for around 25 key metrics (number of bookings, conversion rate, availability etc.), for each service (bike, car, food delivery, logistics) in their respective cities. They are empowered to pull multiple levers, including incentivising drivers and reaching out to them through text messages and app notifications.

Daggers enable them to create jobs that compute these metrics from Kafka topics (booking, bids, driver heart-beats etc.) every minute, push the results to InfluxDB and visualise them on Grafana dashboards. One can compare each metric with the previous day (D-1) and the previous week (D-7). This data gives unparalleled insights at a micro-level, which helps us react to market changes and keep our drivers, customers and the org happy.
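
As a sketch of such a job (the field names city, service_type and status are assumptions for illustration), a per-minute bookings count and conversion rate could be expressed as:

    -- Illustrative: per-minute bookings and conversion rate per city
    -- and service type, written to InfluxDB as a time series.
    SELECT
      city,
      service_type,
      COUNT(*) AS bookings,
      SUM(CASE WHEN status = 'COMPLETED' THEN 1 ELSE 0 END) * 1.0
        / COUNT(*) AS conversion_rate,
      TUMBLE_END(rowtime, INTERVAL '1' MINUTE) AS event_time
    FROM booking_log
    GROUP BY
      city,
      service_type,
      TUMBLE(rowtime, INTERVAL '1' MINUTE)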

Dynamic surge pricing

During peak hours & bad weather, there’s an obvious supply-demand mismatch. To improve the experience for riders and provide better income for drivers, we introduced ‘Dynamic surge pricing’, which is granular & real-time. (More on how we went about this later.) For this, we needed to compute demand & supply in real time, every minute, for 44k S2ID Level 13 blocks; this, in turn, powers Surbo, the Machine Learning model which computes the Surge Factor.

These Daggers crunch roughly 1 million data points a minute during peak hours and about 600 million every day.
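
As a rough sketch of the demand side (the topic and field names, booking_request_log and s2id_13, are assumptions), a per-block, per-minute count looks like this; the supply side can be computed analogously from driver heart-beats:

    -- Illustrative: demand signal per S2ID Level 13 block, every minute.
    SELECT
      s2id_13,
      COUNT(*) AS booking_requests,
      TUMBLE_END(rowtime, INTERVAL '1' MINUTE) AS window_end
    FROM booking_request_log
    GROUP BY
      s2id_13,
      TUMBLE(rowtime, INTERVAL '1' MINUTE)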

Fraud control

Handling fraudulent activity is of utmost importance. Historical patterns suggest that preventive action needs to be taken quickly once we get signals like fake bookings (farmed during riots), device theft, social engineering etc.

Daggers help fraud prevention analysts quickly create an aggregation job, push data to a database and monitor patterns on visualisation tools (Grafana, or our in-house tool Atlas). They validate the patterns and can take preventive action manually. After fine-tuning the queries, the aggregated data is published back to Kafka, enabling applications to consume it and take the required actions. Thus, we are able to automate various fraud prevention scenarios, eliminating grunt work.
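
A fraud-pattern Dagger can be as simple as a rate check. The hypothetical sketch below flags customers creating an unusual number of bookings in a short window; the table name, fields and threshold are all made-up examples:

    -- Illustrative: surface customers with abnormally many bookings
    -- in a 10-minute window; the threshold of 20 is an example value.
    SELECT
      customer_id,
      COUNT(*) AS bookings_in_window,
      TUMBLE_END(rowtime, INTERVAL '10' MINUTE) AS window_end
    FROM booking_log
    GROUP BY
      customer_id,
      TUMBLE(rowtime, INTERVAL '10' MINUTE)
    HAVING COUNT(*) > 20

During exploration, this output goes to a database for manual review; once the pattern is validated, the same query can publish to Kafka so downstream services act on it automatically.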

Growth

For the most efficient growth campaign, you need the right segmentation, and the right segmentation is always a segment of one. Using customer signals and intents, we can create real-time triggers to target customers effectively, ensuring higher conversion and less burn compared to conventional segmentation. The Flash pipeline, created by the Promotions team, looks at customer actions, pages visited, search keywords and location on our food delivery service. Based on these data points, we can identify a user’s intent and push vouchers & discounts to the customer’s device in a matter of seconds.
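
An intent trigger of this kind could be sketched as follows (the topic name food_search_log, the fields and the thresholds are all assumptions):

    -- Illustrative: customers who searched the same keyword several
    -- times within 5 minutes; the result is published back to Kafka
    -- for the promotions service to act on.
    SELECT
      customer_id,
      search_keyword,
      COUNT(*) AS search_count,
      TUMBLE_END(rowtime, INTERVAL '5' MINUTE) AS window_end
    FROM food_search_log
    GROUP BY
      customer_id,
      search_keyword,
      TUMBLE(rowtime, INTERVAL '5' MINUTE)
    HAVING COUNT(*) >= 3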

System uptime

GO-JEK is supported by 300+ microservices, and each one being up is critical for the system. At this scale, a consolidated API monitoring dashboard and the related alerting mechanism are essential. To enable this, we have Kafka topics generated from Kong (our API gateway for public APIs). This data stream handles close to 5 billion messages daily, peaking at 2.5 million requests per minute. Daggers consume this data and group the requests by HTTP request path and HTTP status code. Aggregating this data helps us understand the health of each API. By integrating Kapacitor, alerts were set up for each API so issues are detected at the earliest possible instant.
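
A sketch of such an aggregation (the Kong log table and field names here are assumptions):

    -- Illustrative: request counts per API path and HTTP status code,
    -- per minute, from the Kong access-log topic.
    SELECT
      http_path,
      http_status_code,
      COUNT(*) AS request_count,
      TUMBLE_END(rowtime, INTERVAL '1' MINUTE) AS window_end
    FROM kong_access_log
    GROUP BY
      http_path,
      http_status_code,
      TUMBLE(rowtime, INTERVAL '1' MINUTE)

Alerts in Kapacitor can then key off the resulting per-path, per-status time series.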

In all of the above cases, sets of dashboards with graphs, drop-down filters & alerts were set up by users without requiring help from engineering counterparts. This, in fact, was the IKEA effect at its best; DE users came up with solutions to existing problems we had not even imagined. DIY → Do It Yourself & Data Is Yours.

Needless to say, we have challenges: good, solid analytical ones that require thinking, ideating and building. We’re expanding, and this is your chance to work with some of the brightest minds out there. We’re hiring! Join us. Check out gojek.jobs for more.