How we deal with Data Quality using Circuit Breakers

Sandeep Uttamchandani
7 min read · Oct 8, 2018

Imagine a business metric showing a sudden spike: is the spike real, or is it a data quality problem? Analysts and data engineers today will spend hours, days, or even weeks determining whether a given metric is correct. In other words, Time-to-Reliable-Insights today is unbounded, and this is a widespread pain point across the industry. At Intuit, we are working on addressing the data quality problem at scale and presented our platform (called QuickData SuperGlue) at the Strata Conference in New York, 2018.

Analogous to the circuit breaker pattern in microservices architectures, we are designing circuit breakers for data pipelines. In the presence of data quality issues, the circuit opens, preventing low-quality data from propagating to downstream processes. The result is that data will be missing from reports for periods of low quality, but whenever data is present, it is guaranteed to be correct. This proactive approach bounds Time-to-Reliable-Insights to minutes by making data availability directly proportional to data quality. It also eliminates the unsustainable fire-fighting of verifying and fixing metrics and reports on a case-by-case basis. The rest of the blog describes how to implement and deploy circuit breakers and is divided into three sections:

  • Data Pipeline Ground Realities
  • Circuit Breaker Pattern for Data Pipelines
  • Implementing Circuit Breakers in Production
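
To make the gating idea concrete, here is a minimal Python sketch. This is not the SuperGlue implementation; the CircuitBreaker class, run_stage, and the example checks are hypothetical names used purely for illustration. A pipeline stage publishes a data partition downstream only when every quality check passes; if any check fails, the circuit opens and the partition is withheld, so the corresponding report shows a gap rather than an incorrect number.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch of a data-quality circuit breaker; names are illustrative.

@dataclass
class CircuitBreaker:
    """Gates a pipeline stage on a set of data-quality checks."""
    checks: List[Callable[[list], bool]] = field(default_factory=list)

    def evaluate(self, partition: list) -> bool:
        """Circuit stays closed only if every check passes on this partition."""
        return all(check(partition) for check in self.checks)


def run_stage(partition: list, breaker: CircuitBreaker,
              publish: Callable[[list], None]) -> bool:
    """Publish the partition downstream only when the circuit is closed."""
    if breaker.evaluate(partition):
        publish(partition)  # data flows on to downstream reports
        return True
    # Circuit is open: withhold the partition so reports show a gap
    # for this period instead of a wrong number.
    return False


# Example checks on a daily partition of order records (list of dicts).
def non_empty(rows: list) -> bool:
    return len(rows) > 0

def no_null_amounts(rows: list) -> bool:
    return all(r.get("amount") is not None for r in rows)

def within_expected_volume(rows: list, lo: int = 1_000, hi: int = 1_000_000) -> bool:
    return lo <= len(rows) <= hi


if __name__ == "__main__":
    breaker = CircuitBreaker(checks=[non_empty, no_null_amounts, within_expected_volume])
    todays_rows = [{"order_id": 1, "amount": 25.0}]  # toy partition, far below expected volume
    published = run_stage(todays_rows, breaker,
                          publish=lambda rows: print(f"published {len(rows)} rows"))
    print("circuit closed" if published else "circuit open: partition withheld")
```

In this toy run the volume check trips, so the circuit opens and the partition never reaches the report; availability is tied directly to quality rather than patched up after the fact.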

