Machine Learning
Diving Deeper into Machine Learning: Exploring Updates, Analysis, and Practical Insights on Our Blog
Feature Engineering – A Complete Introduction
What is Feature Engineering? Feature engineering is the process of improving a model’s accuracy by using domain knowledge to select the most relevant variables in raw data and transform them into features that better represent the underlying problem to a predictive model. Feature engineering and selection aim to improve the way statistical models and machine learning (ML) algorithms […]
5 Reasons Why Feature Engineering is Challenging
Feature engineering is an important part of leveraging big datasets. Even with the right technical skills and domain knowledge, it can still be a time-consuming process. This blog article will go over feature engineering, the five biggest challenges associated with it, and how automated feature engineering can help. What is feature engineering? Turning data […]
Foundations of Machine Learning: From Classification to Evaluation
Machine Learning and its evaluation are foundational to modern technological advancements. As we delve into the intricate world of algorithms and data, understanding core concepts is paramount. Whether it’s the nuanced art of classification or the precise science of validation, each element plays a pivotal role in crafting robust and efficient models. The journey through […]
What is Explainable Artificial Intelligence, and why is it important for predictive models?
An introduction to AI explainability; why should you trust your model? Explainable artificial intelligence (XAI) is the process of understanding how and why a machine learning model makes its predictions. It can also help machine learning (ML) developers and data scientists to better understand and interpret a model’s behavior. A major challenge for machine learning […]
Data Matching
Data Matching Definition Data matching, also known as record linkage, refers to the process of comparing two sets of collected data, typically via advanced machine learning algorithms or by programmed loops. The processes sequentially compare each individual data point in a set to each individual data point in another set, or compare each data string […]
Data Roles & Governance: Orchestrating the Symphony of Data-Driven Decisions
In an era where data drives decisions, understanding its intricacies and the roles that harness its power is paramount. This journey begins with grasping the overarching discipline of data science, delves into the principles that ensure its effective use, and then explores the specific roles that turn data into actionable insights. Data Science At the […]
How to dramatically increase your Elasticsearch throughput and concurrency capacity
As published on Medium. Every data engineer who uses Elasticsearch as a document store knows that many parameters affect query latency, throughput, and ultimately the queries per second (QPS). In one of our projects at Explorium, we have an Elasticsearch cluster hosted in AWS with 14 nodes of type m5.4xlarge.elasticsearch. […]
How To Power Your BI and ML with External Data, Pain Free
Companies today understand the value of external data and the need to look beyond their four walls to get the data required for accurate predictive models. This is especially true after the events of the past year, where the COVID-19 pandemic rendered internal data inadequate in predicting future trends. McKinsey & Co. noted: “In a […]
Decisions, Decisions: A Quick Guide to Classification Algorithms and How to Choose the Right One
To decision-tree or not to decision-tree, that is the question. Or to cluster, for that matter. Or to linear regress. Classification is a key part of machine learning (ML), helping you to define factors and variables and/or train your model to recognize items and patterns. Sometimes that might mean teaching the model to classify and […]
Supervised Learning – A Complete Introduction
What is Supervised Learning in Artificial Intelligence? Supervised learning, also called supervised machine learning, is a subset of artificial intelligence (AI) and machine learning. The goal of supervised learning is to understand data within the context of a particular question. Supervised learning involves using labeled datasets to train computer algorithms for a particular output. As the […]
Bias or Variance? How Each Affects Your Model and Why You Should Care
We all want our models to be as accurate as possible. While you can’t control every factor that interferes with accuracy, there are two types of “reducible error” that you can address. These are machine learning bias and variance. The trouble is, these two issues sit on opposite sides of a data problem. As […]
A Brief Introduction to Predictive Model Deployment
You’ve built your model, you’ve located your data sources, and you’ve done all the initial processing and ETL to get your data how you want it. Now you’re ready — or almost ready — to deploy your predictive models in the real world. But wait! Before you go further, you need to appreciate that this […]
2021 Will Be an Inflection Point for Data Science and Fintech
It’s getting crowded in fintech. That’s not a new revelation or anything, but it’s a trend that’s had a lot of serious consequences — especially as we start 2021 and look ahead to the next twelve months (or more). The fintech field — which covers a wide range of sectors and services — has exploded […]
ECommerce is Pushing Forward in 2021, and ML is The Way
ECommerce is no stranger to data. Brands that sell online rely heavily on data for targeting customers, creating unique offerings, and setting themselves apart from the pack. Particularly in 2020, when lockdowns worldwide sent demand for online shopping soaring, companies needed to find ways to provide unique experiences in an increasingly crowded market. In some […]
What is Feature Engineering?
Your data is teeming with potential insights, ready to be teased out by predictive models. But doing that isn’t only about knowing what questions to ask or how to translate them into the right kinds of algorithms. It’s also about identifying and creating the most fruitful, predictive indicators — the features — in the dataset. […]
The Data Science Buzzwords and Acronyms That Defined 2020
Another year gone, and another round of retrospectives looking back at what was and ahead to what is coming. Data science had quite the 2020, but like most tech fields, it was also full of terms, buzzwords, and acronyms. Buzzwords come and go, and sometimes they take on a life of their own. Others get used so […]
How to Approach Data Preparation Using Python
As they say, the proof is in the pudding, and data preparation is where the pudding is put together. Any mistakes you make here will be baked into your dataset and later into your model, becoming harder to fix as you approach deployment. Fortunately, using machine learning (ML) tools like Python can help you avoid […]
How To Include Data Science Platforms in Your 2021 Budget
A good data science platform alleviates a ton of your data-related headaches. It connects to multiple data sources, including external data sources, and offers ETL tools. Augmented data discovery tools help you fill in gaps in your data. Feature engineering boosts accuracy and insights. It supports your machine learning modeling, and may even suggest the […]