SAVIME: An Array DBMS for Simulation Analysis and ML Models Prediction

  • 14/02/2021

Hermano L. S. Lustosa, Anderson C. Silva, Daniel N. R. da Silva, Patrick Valduriez, Fabio Porto

Limitations in current DBMSs prevent their wide adoption in scientific applications. To let these applications benefit from DBMS support, enabling declarative data analysis and visualization over scientific data, we present an in-memory array DBMS called SAVIME. In this work we describe SAVIME along with its data model. Our preliminary evaluation shows how SAVIME, by using a simple storage definition language (SDL), can outperform the state-of-the-art array database system, SciDB, during data ingestion. We also show that it is possible to use SAVIME as a storage alternative for a numerical solver without affecting its scalability, making it useful for modern ML-based applications.

Tactful Networking: Humans in the Communication Loop

  • 07/12/2020

Rafael Lima Costa, Aline Carneiro Viana, Artur Ziviani, Leobino Nascimento Sampaio

This survey discusses the human perspective in networking through the Tactful Networking paradigm, whose goal is to add perceptive senses to the network by endowing it with human-like capabilities of observation, interpretation, and reaction to daily-life features and associated entities. To achieve this, knowledge extracted from inherent human behavior in terms of routines, personality, interactions, and others is leveraged, empowering the learning and prediction of user needs to improve QoE and system performance while respecting privacy and fostering new applications and services. Tactful Networking groups solutions from literature and innovative interdisciplinary human aspects studied in other areas. The paradigm is motivated by mobile devices’ pervasiveness and increasing presence as sensors in our daily social activities. With the human element in the foreground, it is essential: (i) to center big data analytics around individuals; (ii) to create suitable incentive mechanisms for user participation; (iii) to design and evaluate both human-aware and system-aware networking solutions; and (iv) to apply prior and innovative techniques to deal with human-behavior sensing and learning. This survey reviews the human aspect in networking solutions over more than a decade, then discusses the impact of tactful networking through literature in behavior analysis and representative examples. This paper also discusses a framework comprising data management, analytics, and privacy for enhancing raw human data to assist Tactful Networking solutions. Finally, challenges and opportunities for future research are presented.

STConvS2S: Spatiotemporal Convolutional Sequence to Sequence Network for Weather Forecasting

  • 10/11/2020

Rafaela Castro, Yania M. Souto, Eduardo Ogasawara, Fabio Porto, Eduardo Bezerra

Applying machine learning models to meteorological data brings many opportunities to the Geosciences field, such as predicting future weather conditions more accurately. In recent years, modeling meteorological data with deep neural networks has become a relevant area of investigation. These works apply either recurrent neural networks (RNN) or some hybrid approach mixing RNN and convolutional neural networks (CNN). In this work, we propose STConvS2S (Spatiotemporal Convolutional Sequence to Sequence Network), a deep learning architecture built for learning both spatial and temporal data dependencies using only convolutional layers. Our proposed architecture resolves two limitations of convolutional networks to predict sequences using historical data: (1) they violate the temporal order during the learning process and (2) they require the lengths of the input and output sequences to be equal. Computational experiments using air temperature and rainfall data from South America show that our architecture captures spatiotemporal context and that it outperforms or matches the results of state-of-the-art architectures for forecasting tasks. In particular, one of the variants of our proposed architecture is 23% better at predicting future sequences and five times faster at training than the RNN-based model used as a baseline.

A Survey of Biodiversity Informatics: Concepts, Practices and Challenges

  • 29/09/2020

Luiz M. R. Gadelha Jr., Pedro C. de Siracusa, Artur Ziviani, Eduardo Couto Dalcin, Helen Michelle Affe, Marinez Ferreira de Siqueira, Luís Alexandre Estevão da Silva, Douglas A. Augusto, Eduardo Krempser, Marcia Chame, Raquel Lopes Costa, Pedro Milet Meirelles, Fabiano Thompson

The unprecedented size of the human population, along with its associated economic activities, has an ever-increasing impact on global environments. Across the world, countries are concerned about growing resource consumption and the capacity of ecosystems to provide those resources. To effectively conserve biodiversity, it is essential to make indicators and knowledge openly available to decision-makers in ways that they can effectively use. The development and deployment of mechanisms to produce these indicators depend on having access to trustworthy data from field surveys and automated sensors, biological collections, molecular data, and historic academic literature. The transformation of this raw data into synthesized information that is fit for use requires going through many refinement steps. The methodologies and techniques used to manage and analyze this data comprise an area often called biodiversity informatics (or e-Biodiversity). Biodiversity data follows a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these steps. This article explores each stage of the life cycle of biodiversity data, discussing its methodologies, tools, and challenges.

You Shall not Pass: Avoiding Spurious Paths in Shortest-Path Based Centralities in Multidimensional Complex Networks

  • 28/09/2020

Klaus Wehmuth, Artur Ziviani, Leonardo Chinelate Costa, Ana Paula Couto da Silva, Alex Borges Vieira

In complex network analysis, centralities based on shortest paths, such as betweenness and closeness, are widely used. More recently, many complex systems are being represented by time-varying, multilayer, and time-varying multilayer networks, i.e. multidimensional (or high order) networks. Nevertheless, it is well-known that the aggregation process may create spurious paths on the aggregated view of such multidimensional (high order) networks. Consequently, these spurious paths may then cause shortest-path based centrality metrics to produce incorrect results, thus undermining the network centrality analysis. In this context, we propose a method able to avoid taking into account spurious paths when computing centralities based on shortest paths in multidimensional (or high order) networks. Our method is based on MultiAspect Graphs (MAG) to represent the multidimensional networks and we show that well-known centrality algorithms can be straightforwardly adapted to the MAG environment. Moreover, we show that, by using this MAG representation, pitfalls usually associated with spurious paths resulting from aggregation in multidimensional networks can be avoided at the time of the aggregation process. As a result, shortest-path based centralities are assured to be computed correctly for multidimensional networks, without taking into account spurious paths that could otherwise lead to incorrect results. We also present a case study that shows the impact of spurious paths in the computing of shortest paths and consequently of shortest-path based centralities, thus illustrating the importance of this contribution.

An analysis of malaria in the Brazilian Legal Amazon using divergent association rules

  • 03/08/2020

Lais Baroni, Rebecca Salles, Samella Salles, Gustavo Guedes, Fabio Porto, Eduardo Bezerra, Christovam Barcellos, Marcel Pedroso, Eduardo Ogasawara

In data analysis, the mining of frequent patterns plays an important role in the discovery of associations and correlations between data. During this process, it is common to produce thousands of association rules (ARs), making the study of each one arduous. This problem weakens the process of finding useful information. There is a scientific effort to develop approaches capable of filtering interesting patterns, balancing the number of ARs produced against the goal of avoiding rules that are trivial or already known to specialists. However, even when such approaches are adopted, the number of produced ARs can still be high. This work contributes by presenting the Divergent Association Rules Approach (DARA), a novel approach for obtaining ARs that diverge from the data distribution. DARA is applied right after traditional approaches to filtering interesting patterns. To validate our approach, we studied a dataset related to the occurrence of malaria in the Brazilian Legal Amazon. The discovered patterns show that the ARs brought relevant insights from the data. This article contributes to both the medical and computer science fields, since this novel computational approach enabled new findings regarding malaria in Brazil.

CoVeC: Coarse-Grained Vertex Clustering for Efficient Community Detection in Sparse Complex Networks

  • 01/06/2020

G. S. Carnivali, A. B. Vieira, P. A. A. Esquef

BioinfoPortal: A scientific gateway for integrating bioinformatics applications on the Brazilian national high-performance computing network

  • 01/06/2020

Kary A. C. S. Ocaña, Marcelo Galheigo, Carla Osthoff, Luiz M. R. Gadelha Jr., Fabio Porto, Antônio Tadeu A. Gomes, Daniel de Oliveira, Ana Tereza Vasconcelos

SUQ2: Uncertainty Quantification Queries over Large Spatio-temporal Simulations

  • 03/04/2020

Noel Moreno Lemus, Fabio Porto, Yania M. Souto, Rafael S. Pereira, Ji Liu, Esther Pacitti, Patrick Valduriez

Efficient Network Seeding under Variable Node Cost and Limited Budget for Social Networks

  • 01/04/2020

R. C. Souza, D. R. Figueiredo, A. A. A. Rocha

Parallel computation of PDFs on big spatial data using Spark

  • 12/03/2020

Ji Liu, Noel Moreno Lemus, Esther Pacitti

Distributed and Parallel Databases

New perspectives on analysing data from biological collections based on social network analytics

  • 01/02/2020

P. C. Siracusa

Towards Optimizing the Execution of Spark Scientific Workflows Using Machine Learning based Parameter Tuning

  • 27/02/2019

Douglas de Oliveira, Fábio Porto, Cristina Boeres, Daniel de Oliveira


In the last few years, Apache Spark has become the de facto standard big data framework in both industry and academic projects. Especially in the scientific domain, it is already used to execute compute- and data-intensive workflows from biology to astronomy. Although Spark is an easy-to-install framework, it has more than one hundred parameters to be set, besides specific application design parameters. Thus, to execute Spark-based workflows efficiently, the user has to fine-tune a myriad of Spark and workflow parameters (even the partitioning strategy, for instance). This configuration task cannot be performed manually by trial and error, since it is tedious and error-prone. This article proposes an approach that focuses on generating predictive machine learning models (i.e. decision trees) and then extracting useful rules (i.e. patterns) from these models that can be applied to configure parameters of future executions of the workflow and Spark for non-expert users. In the experiments presented in this article, the proposed parameter configuration approach led to better performance in processing Spark workflows. Finally, the methodology introduced here reduced the number of parameters to be configured by identifying the ones most relevant to workflow performance in the predictive model.

Parallel computation of PDFs on big spatial data using Spark

  • 21/02/2019

Ji Liu, Noel Lemus, Esther Pacitti, Patrick Valduriez

Distributed and Parallel Databases

Graph-Based Skill Acquisition for Reinforcement Learning

  • 12/02/2019

M. R. F. Mendonça, A. M. S. Barreto

ACM Computing Surveys (CSUR), ISSN: 0360-0300, vol. 52, issue 1, article no. 6

Understanding Human Mobility and Workload Dynamics Due To Different Large-Scale Events Using Mobile Phone Data

  • 16/10/2018

H. T. Marques-Neto, F. H. Z. Xavier, W. Z. Xavier, L. M. Silveira, J. M. de Almeida, C. H. S. Malab

Journal of Network and Systems Management (JONS), Springer, ISSN: 1064-7570, vol. 26, no. 4, pp. 1079-1100

BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments

  • 14/06/2018

Magalhães, T., Loss, G., Wilde, M., Foster, I., Mattoso, M.

PeerJ, 6, e5551.

GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis

  • 13/06/2017

Ribeiro-Alves, M.

PeerJ, 5, e3509.

MobHet: Predicting Human Mobility Using Heterogeneous Data Sources

  • 12/12/2016

L. M. Silveira, J. M. Almeida, H. T. Marques-Neto, C. Sarraute

Computer Communications, Special issue on Mobile Traffic Analysis, Elsevier Science, ISSN: 0140-3664, vol. 95, pp. 54-68

On MultiAspect Graphs

  • 12/10/2016

E. Fleury

Theoretical Computer Science (TCS), Elsevier Science, ISSN: 0304-3975, vol. 651, pp. 50-61

A note on the complexity of the causal ordering problem

  • 04/07/2016

Bernardo Gonçalves, Fabio Porto

Managing Scientific Hypotheses as Data with Support for Predictive Analytics

  • 18/08/2015

Bernardo Gonçalves

Computing in Science and Engineering 17(5): 35-43 (2015)

BaMBa: towards the integrated management of Brazilian marine environmental data

  • 16/06/2015

Meirelles, P. M., Francini-Filho, R. B., Leão, R. de M., Amado-Filho, G. M., Bastos, A. C., … Thompson, F. L.


Υ-DB: Managing scientific hypotheses as uncertain data

  • 13/08/2014

Bernardo Gonçalves

PVLDB 7(11): 959-962

Applying Provenance to Protect Attribution in Distributed Computational Scientific Experiments

  • 11/06/2014

Mattoso, M.

In Provenance and Annotation of Data and Processes, IPAW 2014. Lecture Notes in Computer Science, vol. 8628, pp. 139–151. Springer.

MTCProv: a practical provenance query framework for many-task scientific computing

  • 13/06/2012

Wilde, M., Mattoso, M., & Foster, I.

Distributed and Parallel Databases, 30(5–6), 351–370.

A Conceptual View on Trajectories

  • 02/01/2008

Stefano Spaccapietra, Christine Parent, Maria Luiza Damiani, José Antônio F. Macedo, Christelle Vangenot

Journal of Data and Knowledge Engineering, vol. 65, pp. 126-146, ISSN: 0169-023X