Data Science is a field of study that stands out for its ability to
support the discovery of useful information in large or complex
databases and to enable data-driven decision making. It can be defined
as a set of strategies, tools and techniques for collecting, transforming
and analyzing data, carried out by multidisciplinary teams formed by
researchers with substantive knowledge of the problem under analysis
(in our case, public health), statisticians, mathematicians and computer
scientists (data-driven analysis).
It combines traditional
analysis methods with sophisticated algorithms to process large volumes
of data in various formats: structured, semi-structured and
unstructured. In Data Science, the analysis process
involves the following phases: (i) collection and ingestion, i.e.,
extraction, transformation and loading (better known as ETL);
(ii) pre-processing: selection of records, dimensionality reduction,
normalization and creation of data subsets; (iii) exploratory analysis
and data mining, mainly analyses aimed at classification, association,
clustering, anomaly detection and prediction; and (iv) post-processing:
pattern interpretation, filtering, visualization and integration into
decision support systems and online visualization platforms.
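The four phases can be illustrated with a minimal Python sketch. It assumes a small, invented table of weekly case counts per municipality; the data, the column names, the per-100,000 normalization and the z-score threshold of 1.5 are all hypothetical choices made for illustration, not a prescribed method. Each function maps to one phase, only the standard library is used, and a simple z-score rule stands in for the broader range of mining techniques listed above.

```python
import csv
import io
import statistics

# Hypothetical, invented weekly case counts per municipality (not real data).
RAW_CSV = """municipality,week,cases,population
A,1,12,10000
A,2,15,10000
B,1,3,2000
B,2,,2000
C,1,40,50000
C,2,200,50000
"""


def extract_and_load(text):
    """(i) Collection and ingestion: read the CSV source into in-memory records.
    In this sketch, type conversion is deferred to pre-processing."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """(ii) Pre-processing: select complete records and normalize counts to
    incidence per 100,000 inhabitants so municipalities are comparable."""
    records = []
    for row in rows:
        if not row["cases"]:  # record selection: drop rows with a missing count
            continue
        cases = int(row["cases"])
        population = int(row["population"])
        records.append({
            "municipality": row["municipality"],
            "week": int(row["week"]),
            "incidence": cases * 100_000 / population,
        })
    return records


def detect_anomalies(records, z_threshold=1.5):
    """(iii) Data mining: flag records whose incidence z-score exceeds
    z_threshold (a simple anomaly-detection rule assumed for illustration)."""
    values = [r["incidence"] for r in records]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    flagged = []
    for r in records:
        z = (r["incidence"] - mean) / sd if sd else 0.0
        if z > z_threshold:
            flagged.append({**r, "z": z})
    return flagged


def report(flagged):
    """(iv) Post-processing: format the filtered results for a
    decision-support view."""
    return [
        f"{r['municipality']} week {r['week']}: "
        f"{r['incidence']:.1f} per 100k (z={r['z']:.2f})"
        for r in flagged
    ]


if __name__ == "__main__":
    records = preprocess(extract_and_load(RAW_CSV))
    for line in report(detect_anomalies(records)):
        print(line)
```

Running the script prints a single line flagging municipality C in week 2, whose incidence (400 per 100,000) lies well above the other records. In practice, libraries such as pandas or scikit-learn would usually replace these hand-written steps; the standard-library version only keeps each phase visible.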