When Ford released the Edsel, its brand-new, mid-range model, in 1957, the company was confident it would blow the competition out of the water. After all, Ford had spent ten years and $250 million developing this car and had conducted exhaustive market research. The data they'd painstakingly accumulated suggested the Edsel would easily outperform sales of Chrysler's popular Dodge and General Motors' Pontiac. Despite the massive undertaking, though, the launch was a disaster. Sales were dismal, and within two years Ford had to discontinue the model.
So what went wrong? It wasn't that Ford's findings were inaccurate; the data was simply outdated. The Edsel was exactly the car that people wanted in 1952, but by 1957, tastes had changed. Ford had missed their moment.
Today, there's no excuse for a business to make a mistake like this. Not only do you have access to more and better data than ever before, but you also have automation tools at your disposal to derive insights and build models from that data at high speed.
The key is to get the right data pipeline tools, infrastructure, and strategy in place to feed your machine learning models with quality data, all the way from design to development to deployment. Then you can ensure your models are always based on the most relevant, up-to-date information.
Here are four steps to take to get your model ready for deployment.
Step 1: Get your data pipeline ready
Long before you reach the predictive model deployment stage, you need to make sure that your data pipelines are structured effectively and are giving you high-quality, relevant data.
One of the most important considerations is what happens when you move from the proof-of-concept (POC) stage, where you may be using a relatively small data sample, to the production stage, where you will need far larger volumes of data drawn from a wide variety of datasets. It's vital to figure out now how you will scale your models and your data pipelines once they are deployed.
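As a rough illustration, here is one way to parameterize a pipeline so the same code path serves both a small POC sample and full production volumes. The file name, column names, and cleaning rules below are hypothetical placeholders, not a prescribed design.

```python
# Sketch: one pipeline entry point that serves both POC and production.
# File path, column names, and cleaning rules are hypothetical placeholders.
from typing import Optional

import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same cleaning rules at every stage, so POC and production data match."""
    df = df.dropna(subset=["customer_id", "event_time"])
    df["event_time"] = pd.to_datetime(df["event_time"], errors="coerce")
    return df[df["event_time"].notna()]

def load_events(path: str, sample_frac: Optional[float] = None, chunksize: int = 100_000) -> pd.DataFrame:
    """Stream the source in chunks so the same code scales from a sample to full volume."""
    chunks = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        chunk = clean(chunk)
        if sample_frac is not None:
            chunk = chunk.sample(frac=sample_frac, random_state=42)  # POC-sized slice
        chunks.append(chunk)
    return pd.concat(chunks, ignore_index=True)

# POC: a 1% sample; production: the same call without sampling.
poc_df = load_events("events.csv", sample_frac=0.01)
prod_df = load_events("events.csv")
```

Because cleaning happens in one place, the sample the model sees during the POC is prepared in exactly the same way as the production feed, which removes one common source of surprises at scale.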
Step 2: Access the right external data
When you are building a predictive model for production, you need to be sure that you are working with the best possible data, from the most relevant sources, right up until the moment you launch. If that data is already stale, your carefully crafted models won't be much use.
Part of the challenge is getting hold of enough historical data to form a complete picture. Few organizations can collect all the data they need internally. For full context and perspective, you will probably need to start incorporating external data sources. Types of external datasets could include company data, geospatial data, people data such as internet behavior or spending activity, and time-based data, which includes everything from weather patterns to financial trends.
Using an augmented data discovery platform allows you to connect to thousands of external data sources seamlessly, safe in the knowledge that these have already been vetted for quality and legal issues — and that they are compatible with one another. Some providers will also allow you to set up custom signals, meaning you can make the most of your domain expertise to better interpret the data and tease out the insights you need.
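To make the idea concrete, here is a minimal sketch of enriching internal records with an external, time-based dataset, assuming a vetted external feed is already available as a file. The file names and columns are invented for illustration.

```python
# Sketch: enriching internal sales records with an external, time-based dataset.
# File names and columns are hypothetical; in practice the external feed would come
# from a vetted provider or data discovery platform.
import pandas as pd

sales = pd.read_csv("internal_sales.csv", parse_dates=["date"])      # internal source
weather = pd.read_csv("external_weather.csv", parse_dates=["date"])  # external source

# Align the two sources on date and region before feature engineering.
enriched = sales.merge(
    weather[["date", "region", "avg_temp_c", "precipitation_mm"]],
    on=["date", "region"],
    how="left",
    validate="many_to_one",  # fail fast if the external feed has duplicate rows
)

# Flag missing external values rather than silently dropping rows.
enriched["weather_missing"] = enriched["avg_temp_c"].isna()
```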
Step 3: Build strong training and testing automation tools
Rigorous training and testing are essential before you can progress to the predictive model deployment stage, but they can be time-consuming. To avoid getting slowed down, you need to automate as much as you can.
This doesn't mean simply adopting a few time-saving tools or tricks. The goal is to create models that can ultimately run without any action on your part. With the right technology in place, you can automate everything from data collection and feature engineering to training your models. This will also make your models truly scalable without massively increasing your workload.
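One common pattern, sketched below with scikit-learn, is to bundle feature engineering and model training into a single pipeline object that a scheduled job can re-run without manual steps. The column names, toy data, and choice of model are assumptions for illustration, not a recommended configuration.

```python
# Sketch: bundling feature engineering and training into one repeatable, automatable step.
# Column names, toy data, and the model choice are illustrative assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["avg_temp_c", "precipitation_mm", "order_value"]
categorical_cols = ["region", "channel"]

# All feature engineering lives inside the pipeline, so retraining is a single call.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("features", preprocess),
    ("clf", GradientBoostingClassifier(random_state=42)),
])

# Tiny stand-in dataset so the example runs end to end.
X = pd.DataFrame({
    "avg_temp_c": [21.0, 18.5, 30.2, 12.0],
    "precipitation_mm": [0.0, 5.1, 2.3, 12.4],
    "order_value": [120.0, 89.5, 230.0, 45.0],
    "region": ["north", "south", "north", "east"],
    "channel": ["web", "store", "web", "web"],
})
y = [1, 0, 1, 0]

model.fit(X, y)  # the same call can run unattended on a schedule
```

Because the preprocessing and the estimator travel together as one object, a scheduled retraining job only ever needs fresh data, not fresh code.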
Step 4: Design robust auditing, monitoring and retraining protocols
Before you can deploy your predictive model, you need to know that it's actually delivering the kind of results you're looking for, that those results are accurate, and that the data you're feeding into the model will keep it relevant over time. Relying on the same stale data while conditions change leads to model drift and increasingly inaccurate results.
This means you need to build training pipelines and processes that bring in new data, audit your internal data sources, and tell you which features are still giving you valuable insights. You can’t afford to get complacent about this, or your models may be leading your business in unhelpful directions. It’s important to have processes in place for monitoring your results, too, ensuring you’re not just putting more and more of the wrong type of data into the predictive model.
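As one example of what such monitoring might look like, the sketch below compares a feature's distribution at training time with recent live data using the population stability index (PSI). The 0.2 threshold and the simulated data are illustrative assumptions, not a fixed standard.

```python
# Sketch: a simple drift check comparing a feature's training distribution to live data.
# The 0.2 PSI threshold is a common rule of thumb, not a universal standard.
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a recent sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip empty bins to avoid division by zero and log(0).
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Illustrative data: a baseline captured at training time and "live" values that have shifted.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=50, scale=10, size=10_000)
live = rng.normal(loc=55, scale=12, size=2_000)

psi = population_stability_index(baseline, live)
if psi > 0.2:  # drift detected: flag the feature for review or trigger retraining
    print(f"Feature drift detected (PSI={psi:.2f}); schedule retraining.")
```

A check like this can run on the same schedule as your scoring jobs, so drifting features are flagged before they quietly erode your results.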
Final thoughts: streamlining the process
As we’ve seen, the journey to predictive model deployment can be fraught with delays and scalability challenges, all of which threaten to make your model less relevant by the time you go to production.
The key is to streamline and automate the process wherever you can to reduce time to deployment, and to make sure you're always working with the most recent, relevant, high-quality data. Without that, like Ford, you risk missing your moment.