Data Science is a field of study that stands out for its ability to
support the discovery of useful information in large or complex
databases, as well as data-driven decision making. It can be defined as a
set of strategies, tools and techniques for collecting, transforming
and analyzing data, carried out by multidisciplinary teams formed by
researchers with substantive knowledge of the problem under analysis
(in our case, public health), statisticians, mathematicians and computer
scientists (data-driven analysis).
It combines traditional analysis methods with sophisticated algorithms to process large volumes of data in various formats: structured, semi-structured and unstructured. The analysis process in Data Science involves four phases: (i) collection and ingestion: extraction, transformation and loading (better known as ETL); (ii) pre-processing: record selection, dimensionality reduction, normalization and creation of data subsets; (iii) exploratory analysis and data mining: mainly analyses aimed at classification, association, clustering, anomaly detection and prediction; and (iv) post-processing: pattern interpretation, filtering, visualization and integration into decision support systems and online visualization platforms.
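
The four phases can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline used in this work: the records are made-up toy values, the function names (`ingest`, `preprocess`, `normalize`, `detect_anomalies`, `report`) are hypothetical, and the z-score rule with a threshold of 1.5 is just one simple stand-in for the anomaly detection step.

```python
import csv
import io
import statistics

# Made-up toy records, used only to illustrate the flow (not real data).
RAW_CSV = """region,week,cases
A,1,10
A,2,12
A,3,11
A,4,
A,5,13
A,6,60
"""


def ingest(text):
    """(i) Collection and ingestion: extract rows and load them as dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """(ii) Pre-processing: drop incomplete records and convert types."""
    complete = [r for r in rows if r["cases"].strip()]
    return [
        {"region": r["region"], "week": int(r["week"]), "cases": float(r["cases"])}
        for r in complete
    ]


def normalize(values):
    """(ii) Pre-processing: min-max normalization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def detect_anomalies(rows, threshold=1.5):
    """(iii) Analysis: flag records whose absolute z-score exceeds the threshold.

    The threshold is an arbitrary choice made for this sketch.
    """
    values = [r["cases"] for r in rows]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [r for r in rows if abs(r["cases"] - mean) / sd > threshold]


def report(flagged):
    """(iv) Post-processing: turn flagged records into readable lines."""
    return [
        f"region {r['region']}, week {r['week']}: {r['cases']:.0f} cases flagged"
        for r in flagged
    ]


if __name__ == "__main__":
    records = preprocess(ingest(RAW_CSV))
    for line in report(detect_anomalies(records)):
        print(line)
```

In a real setting each function would be replaced by a much heavier component (an ETL job, a feature-engineering step, a trained model, a dashboard), but the order of the phases and the hand-off of data between them stay the same.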