By now, we’re all getting used to the new (less than ideal) normal in sports — watching our favorite athletes play their games in empty arenas as the coronavirus pandemic continues. However, it’s not hard to think of a time in the recent past when we were lining up for tickets to our favorite arenas and teams’ games. While this was great, it wasn’t cheap.
The price of sporting event tickets has risen consistently for years. From inflation to the massive cost of the stadiums and arenas where sports teams play, tickets today are increasingly pricing out a wide range of consumers who would love to watch games live. This has led to an unsurprising drop in live attendance, since most regular customers don’t have the disposable income to spend $400 on taking their family to a single game, and that’s before paying for parking, snacks, souvenirs, and more.
For teams, this presents both a major problem and an opportunity. On the one hand, sports organizations run nearly billion-dollar budgets that depend on revenues that largely come from ticket sales and concessions sold at their venues, so losing sales to untenable prices is bad business, plain and simple. On the other hand, there’s a clear opening to find a better, more dynamic way to price tickets. But that involves too many variables to sort out, right? You can’t realistically find all the different factors and include them in your existing models if you’re doing this by hand. Let’s tackle this problem using Explorium to see how enriching your data could offer a quick and easy solution.
The question: finding the features that are most important for predictive pricing models
The question here is about finding the optimal price, but not just based on average ticket prices around the league (let’s assume, for this exercise, that we’re focusing on the NBA). It’s about understanding how much customers are willing and able to pay and finding a price point that meets those conditions and still gives your organization profits off each ticket sold. It’s about using data to boost your ROI.
The goal: find the features that result in the best predictive pricing models
Using Explorium’s data enrichments for better answers
Before going further, let’s take stock of the data we had to begin with (going by the columns in the initial database):
- The date and time of the game being played
- The visiting and home teams
- The number of tickets sold (our target variable)
- Whether the teams are playoff contenders
- Location data on state, zip code, street address, city, and country
- The arena where the game is taking place
Based on this, you could assume you already have a fairly robust internal dataset. However, this data can’t really account for several key variables that might give your pricing model a much-needed uplift.
So before we even build a model, let’s simply focus on getting the data you need into your dataset. This enrichment takes a few minutes on the Explorium platform and requires your internal data and a single click.
Let’s see what new data Explorium added to enrich your initial dataset.
- Information about the teams playing, which can lead to features such as whether the starting players for each team are in the lineup
- Search engine results that can show us what potential customers are searching for and whether those searches are likely to lead to purchases
- Income data by ZIP code, which can help us determine things like the average income in the city where the game is being played and median family incomes
- US rental statistics, which can help us find the right price by telling us about disposable income. For instance, one feature generated with this data is whether rent came to more than X% of a potential customer’s income over the previous 12 months
- Weather data that can indicate whether customers are likelier to go out and enjoy outside activities
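To make the enrichment idea concrete, here is a minimal sketch of joining external ZIP-level data onto internal game rows and deriving a rent-burden flag like the one described above. Everything in it is an assumption for illustration: the field names, the sample values, the `enrich` helper, and the 30% threshold standing in for "X%". It is not how Explorium performs the match.

```python
# Hypothetical internal rows: one per game, keyed by the arena's ZIP code.
games = [
    {"game_id": 1, "zip_code": "10001", "tickets_sold": 18500},
    {"game_id": 2, "zip_code": "94158", "tickets_sold": 17200},
    {"game_id": 3, "zip_code": "00000", "tickets_sold": 9000},  # no external match
]

# Hypothetical external table: median rent and income by ZIP code (made-up values).
zip_stats = {
    "10001": {"median_monthly_rent": 3000.0, "median_annual_income": 96000.0},
    "94158": {"median_monthly_rent": 3600.0, "median_annual_income": 180000.0},
}


def enrich(game, stats_by_zip, rent_share_threshold=0.30):
    """Left-join one game row to ZIP-level stats and derive a rent-burden flag.

    rent_share_threshold plays the role of "X%" and is an assumed value.
    """
    enriched = dict(game)
    stats = stats_by_zip.get(game["zip_code"])
    if stats is None:
        # Keep the row, but mark every external field as missing.
        enriched.update(
            median_monthly_rent=None,
            median_annual_income=None,
            rent_share=None,
            rent_burdened=None,
        )
        return enriched

    # Share of annual income spent on rent over 12 months.
    rent_share = 12 * stats["median_monthly_rent"] / stats["median_annual_income"]
    enriched.update(stats)
    enriched["rent_share"] = rent_share
    enriched["rent_burdened"] = rent_share > rent_share_threshold
    return enriched


enriched_games = [enrich(game, zip_stats) for game in games]
for row in enriched_games:
    print(row["game_id"], row["rent_share"], row["rent_burdened"])
```

Rows without an external match are kept with empty fields rather than dropped, so the enriched dataset still lines up with the original one row for row.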
From enrichment to feature engineering and generation
Once we’ve enriched your data, Explorium starts doing the hard work of actively boosting your ML models.
The first part of this goes beyond adding sources. Once the dataset is matched with as many sources as possible, Explorium proceeds to find those most relevant to you and rank them before continuing to the feature engineering process. On the platform, you’ll be able to see the most relevant sources available, which in this case include:
- Squad information (including things like who’s playing and which stars are in the lineup)
- Industry sales statistics
- Search engine results
- Country statistics
- Company registry data
- US income by ZIP code
- Housing Units Rent statistics
From here, you can choose which sources to include or exclude, or even let Explorium choose for you. In the case of your hypothetical NBA franchise, here are some of the most relevant features generated from those sources:
- Whether the team’s biggest player is in the lineup
- Whether the second top player is playing
- The number of reviews for a given arena
- If the team has an official Instagram profile
- The number of ads
- Whether the visiting team’s top player is playing
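As a rough illustration of what ranking candidate features can look like, the sketch below scores each feature by the absolute Pearson correlation with the target and sorts the results. This is a deliberately simple stand-in, not Explorium's ranking method, and the feature names and values are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(features, target):
    """Return (name, score) pairs sorted by absolute correlation, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample: six games, with tickets sold as the target variable.
tickets_sold = [18000, 15000, 17500, 12000, 16000, 11000]
candidate_features = {
    "home_star_in_lineup": [1, 0, 1, 0, 1, 0],
    "arena_review_count": [900, 850, 1000, 400, 700, 500],
    "has_instagram_profile": [1, 1, 1, 1, 1, 1],  # constant, so it should rank last
}

ranking = rank_features(candidate_features, tickets_sold)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Correlation only captures linear, one-feature-at-a-time relationships, which is part of why tree-based models can surface features a ranking like this would miss.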
Testing models and seeing uplift
Once a feature list is built, optimized, and ranked, it’s time to train your models to see how well the dataset performs. In our example, the data was able to provide an R2 score of 68.21.
Compared with your internal dataset alone, which scored just 4.77, Explorium gave you an impressive 1330% uplift (it’s worth noting that, while this is undoubtedly a great result, uplifts are usually more modest).
More importantly, however, you can see how a variety of models perform instead of testing each one at a time. The best results come from a standard XGBoost model, which used a few unique features, including:
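For readers who want to check the arithmetic, the uplift is the relative improvement of the enriched score over the baseline. The sketch below also includes a plain R2 (coefficient of determination) helper. Two assumptions to flag: the helper names are ours, and the scores quoted here look like R2 multiplied by 100, which the original figures don't state explicitly.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def relative_uplift_percent(baseline, enriched):
    """Relative improvement of the enriched score over the baseline, in percent."""
    return (enriched - baseline) / baseline * 100.0


baseline_score = 4.77   # internal data only, as quoted above
enriched_score = 68.21  # after enrichment, as quoted above

uplift = relative_uplift_percent(baseline_score, enriched_score)
print(f"Uplift: {uplift:.0f}%")  # prints "Uplift: 1330%"
```

Because the baseline is so small, even a moderate absolute score produces a very large relative uplift, which is one reason this result is unusually high.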
- Men not enrolled in school, not high school graduates
- Median number of rooms
- Whether a stadium lacks complete plumbing facilities
- Percentile percentage of civilians (16+) unemployed
Next, a random forest model (with eight iterations) scored an R2 of 66.23, using unique features that include:
- The percentage of people with at least a high school degree
- Whether a team’s star is playing
- Median number of rooms per renter
- The male median age
Gaining insights, deploying your model, and beyond
From here, the next step is to see the insights you can glean from the training and test models. Explorium lets you view an insight tree to see which combinations of features lead to the most accurate predictions, as well as see which features contribute the most to your predictive pricing models. Additionally, you can identify meaningful combinations of variables, and even compare different models to determine which is best for your specific need.
Once you’re satisfied with your insights, you can continue to run predictions based on the models you choose, and can even set scheduled predictions. Using your XGBoost model from above, let’s try to run a prediction.
The results are even better than we could have imagined, with an R2 score of 68.21. You can also check your model’s performance in terms of absolute percentage errors (MAPE, MdAPE, and SMAPE). From there, all that’s left is to deploy your model and start making predictions.
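If you want to sanity-check the percentage-error metrics yourself, here is a minimal sketch of MAPE, MdAPE, and SMAPE in plain Python. The sample numbers are made up, and SMAPE has several definitions in circulation; the one below (ranging from 0% to 200%) is an assumption, not necessarily the variant the platform reports. It also assumes no actual value is zero.

```python
from statistics import mean, median


def absolute_percentage_errors(actual, predicted):
    """Per-row absolute percentage error; assumes every actual value is non-zero."""
    return [abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error."""
    return mean(absolute_percentage_errors(actual, predicted))


def mdape(actual, predicted):
    """Median absolute percentage error, less sensitive to a few large misses."""
    return median(absolute_percentage_errors(actual, predicted))


def smape(actual, predicted):
    """Symmetric MAPE, using the 0-200% variant: 2|p - a| / (|a| + |p|)."""
    return mean(
        2.0 * abs(p - a) / (abs(a) + abs(p)) * 100.0
        for a, p in zip(actual, predicted)
    )


# Made-up example values for three games.
actual_values = [100.0, 200.0, 400.0]
predicted_values = [110.0, 180.0, 400.0]

print(f"MAPE:  {mape(actual_values, predicted_values):.2f}%")
print(f"MdAPE: {mdape(actual_values, predicted_values):.2f}%")
print(f"SMAPE: {smape(actual_values, predicted_values):.2f}%")
```

Looking at MdAPE next to MAPE is a quick way to tell whether a handful of badly mispredicted games is dragging the average up.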
Build the models you need now, not in weeks
If you’re keeping score (pun intended), Explorium just let you build a model to predict ticket prices in under 15 minutes, including connecting you to thousands of external data sources, refining your datasets, and even building you hundreds of possible features to use in a variety of models (which it also tested in parallel).
Even better, this is only a small sample of what Explorium can do. Thanks to our powerful AI-driven data enrichment engine and feature generation tools, Explorium can build models for a variety of predictive questions and use cases.