My experiences building Data+AI Products & Platforms

Image Credit: Unsplash

What happens when you put your hand in the fire? There are two ways to find out:

  • Option #1: Put your own hand in the fire
  • Option #2: Learn from others who have already tried putting their hand in the fire

My motivation with these blogs is to help you with Option #2 in the context of real-world Data & AI fires.

A little about myself: I have a Ph.D. in AI/Expert Systems and over two decades of experience building software, data, and AI systems. Over my career, I have built numerous Data+AI products and platforms, hold 40+ issued patents, and am an O’Reilly book author.

Are any of these landmines hiding in your real-world ML initiative?

87% of ML projects fail today!

These numbers should be taken with a grain of salt. Irrespective of the actual number, it does reflect reality — I have seen a significant percentage of ML-based projects never get into production!

GIF via giphy

The goal of this blog is to share my experiences on things that can go wrong in an ML project (they added up to 98!). My motivation with this post is to help you avoid these landmines in your role as a data engineer, data scientist, ML engineer, or data-business leader driving an ML initiative.

Experiences divided into 6 phases of any ML project. Depending on your role, feel free to read the respective sections in this blog (Image by author)

This is a…

Data Labeling for ML Models

Image credit: Unsplash

Need for Data Labeling Tools

The key to ML is the availability of “right” data. “Right” data is a combination of right features/metrics, right distribution (IID) in the raw data, and the right labeling of the data samples.

The need for labeled data depends on the type of ML algorithm; for example, supervised learning requires labeled samples for training models. In 2020, the image/video segment accounted for over 35% of the global revenue for data collection and labeling. …

How to fail fast on AI projects

Image credit: Unsplash

AI teams invest a lot of rigor in defining new project guidelines. But the same is not true for killing existing projects. In the absence of clear guidelines, teams let infeasible projects drag on for months.

They put up a dog and pony show during project review meetings for fear of becoming the messengers of bad news. By streamlining the process to fail fast on infeasible projects, teams can significantly increase their overall success with AI initiatives.

AI projects are different from traditional software projects. They have a lot more unknowns: availability of right datasets, model training to meet required…

Transforming AI Geniuses into Genius Makers

Image credit: Unsplash

Data+AI is evolving rapidly, with frequent advancements in data, ML, and AI technologies. To succeed, teams must keep up with new technologies and also draw on experience in delivering business value faster. The best way for teams to learn and grow is peer-to-peer (P2P) mentoring.

P2P mentoring is not a new concept. In the past, I have applied it within software, data engineering, and data science teams. In this blog, I want to share details on applying P2P mentoring within a product-focused AI team. …

The 7 skill personas of a well-performing AI team

GIF by giphy

Article originally published in VentureBeat

How do you start assembling an AI team? Well, hire unicorns who can understand the business problem, can translate it into the “right” AI building blocks, and can deliver on the implementation and production deployment. Sounds easy! Except that sightings of such unicorns are extremely rare. Even if you find a unicorn, chances are you won’t be able to afford it!

In my experience leading Data+AI products and platforms over the past two decades, a more effective strategy is to focus on recruiting solid performers who cumulatively support seven specific skill personas…

Avoid endless pain in model debugging by focussing on datasets upfront

Let’s start with an obvious fact: ML models can only be as good as the datasets that were used to build them! While there is a lot of emphasis on ML model building and algorithm selection, teams often do not pay enough attention to dataset selection!

Unsplash Image

In my experience, investing time upfront in dataset selection saves endless hours later during model debugging and production rollout.

Nine Deadly Sins of ML Dataset Selection

1. Not handling outliers in datasets properly

Depending on the ML model being built, outliers can be either noise to ignore or important signals to take into account. Outliers arising from collection errors are the ones that need to be ignored. Machine…
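To make the outlier point concrete, here is a minimal sketch of one common way to flag candidate outliers before deciding what to do with them: the interquartile-range (IQR) rule. This is an illustration, not the method used in the original post. The function name `iqr_outlier_mask`, the fence multiplier `k=1.5`, and the sample readings are all assumptions chosen for the example.

```python
from statistics import quantiles


def iqr_outlier_mask(values, k=1.5):
    """Return a list of booleans, True where a value falls outside the IQR fences.

    Hypothetical helper for illustration; k=1.5 is the conventional default,
    not a value taken from the article.
    """
    q1, _, q3 = quantiles(values, n=4)  # first and third quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Made-up sensor readings with one suspicious value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0]
mask = iqr_outlier_mask(readings)
flagged = [v for v, is_outlier in zip(readings, mask) if is_outlier]
print(flagged)
```

Flagging is only the first step. Whether a flagged value is a collection error to drop or a rare but real event to keep is a judgment call that depends on the model being built.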

These mistakes are easy to overlook but costly to remedy

two stacks of colored coffee mugs with the names of different cities on them
Photo by Frank Vessia on Unsplash

ML model training is the most time-consuming and resource-expensive part of the overall model-building journey. Training by definition is iterative, but somewhere during the iterations, mistakes seep into the mix. In this article, I share the ten deadly sins during ML model training — these are the most common as well as the easiest to overlook.

Ten Deadly Sins of ML Model Training

1. Blindly increasing the number of epochs when the model is not converging

During model training, there are scenarios when the loss-epoch graph keeps bouncing around and does not seem to converge irrespective of the number of epochs. There is no silver bullet as there are multiple root causes to investigate — bad training examples, missing truths…
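Before adding more epochs, it can help to check whether the loss is still improving or merely bouncing. The sketch below is one possible heuristic, not the author's procedure: it compares the mean loss over the most recent window of epochs with the window before it and counts how often the loss changes direction. The function name `loss_plateau_report`, the window size, and the 1% improvement threshold are assumptions, and the heuristic assumes strictly positive loss values.

```python
def loss_plateau_report(losses, window=4, min_rel_improvement=0.01):
    """Summarize whether recent training loss is still improving.

    Hypothetical helper for illustration. Assumes positive loss values and
    at least 2 * window recorded epochs.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    previous = losses[-2 * window:-window]
    recent = losses[-window:]
    prev_mean = sum(previous) / window
    recent_mean = sum(recent) / window
    rel_improvement = (prev_mean - recent_mean) / prev_mean
    # Count direction changes (up then down, or down then up) in the recent window.
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    flips = sum(1 for d1, d2 in zip(deltas, deltas[1:]) if d1 * d2 < 0)
    return {
        "rel_improvement": rel_improvement,
        "direction_flips": flips,
        "stalled": rel_improvement < min_rel_improvement,
    }


# Made-up loss curves: one bouncing, one steadily decreasing.
bouncing = [0.90, 0.70, 0.80, 0.66, 0.80, 0.66, 0.80, 0.66, 0.80, 0.66]
healthy = [1.0, 0.8, 0.64, 0.51, 0.41, 0.33, 0.26, 0.21]
bouncing_report = loss_plateau_report(bouncing)
healthy_report = loss_plateau_report(healthy)
print(bouncing_report["stalled"], healthy_report["stalled"])
```

A stalled report with many direction flips suggests looking at the data and training setup rather than extending the run.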

Understanding under-the-hood details of modern NoSQL systems

What distinguishes a good data engineer from a great one? An understanding of distributed systems concepts can help you make the right Big Data technology choices as well as write better data apps and pipelines.

To illustrate, imagine buying a car. You will typically shortlist a few different car models and then compare them on price and performance, i.e., engine, transmission, braking, etc. …

Blog series on DataOps for effective AI/ML

GIF by giphy

Let’s start with a real-world example from one of my past ML projects: We were building a customer churn model. “We urgently need an additional feature related to sentiment analysis of the customer support calls.” Creating the data pipeline to extract this dataset took about 4 months! Preparing, building, and scaling the Spark MLlib code took about 1.5-2 months! Later we realized that “an additional feature related to the time spent by the customer in accomplishing certain tasks in our app would further improve the model accuracy” — another 5 months gone in the data pipeline! …

Sandeep Uttamchandani

Democratize Data + AI/ML: real-world battle scars to help w/ your journey. Product builder (Engg VP) & Data/ML leader (CDO). O’Reilly Author
