Data science is an empirical approach that relies on data to answer problems. The first component of this approach is therefore the data itself. Surprisingly, data that is useful for analysis is not always easy to obtain. Sometimes the data is simply unavailable, the manipulation techniques it requires (joins, etc.) are not feasible, or access is forbidden for confidentiality reasons. In other cases, the IT department cannot extract the data for technical reasons.
In machine learning, more data is usually better. Data science often relies on long-established data mining algorithms, many of them developed before the 2000s. Their performance has often been limited by a lack of available data, or by IT teams being unable to handle large volumes of data.
Today, growing storage capacity and the profusion of digital data, coupled with more powerful computers, are bringing these algorithms to center stage. Large volumes of data are therefore not a problem; on the contrary, they are a great opportunity to uncover valuable information!
Data science is an interdisciplinary mix of data inference, algorithm development, and technology used to solve complex analytical problems such as prediction, clustering, and classification. At the heart of this mix is the data, typically stored in the company’s data warehouses.