An Algorithmic Information Distortion in Multidimensional Networks

  • 2020

Felipe S. Abrahão, Klaus Wehmuth, Hector Zenil, Artur Ziviani

Network complexity, network information content analysis, and lossless compressibility of graph representations have played an important role in network analysis and network modeling. As multidimensional networks, such as time-varying, multilayer, or dynamic multilayer networks, become more relevant in network science, it becomes crucial to investigate in which situations universal algorithmic methods based on algorithmic information theory applied to graphs cannot be straightforwardly imported in...

Machine Learning and Knowledge Graph Inference

  • 2019

Daniel N. R. da Silva, Artur Ziviani, Fabio Porto

The increasing production and availability of massive and heterogeneous data bring forward challenging opportunities. Among them is the development of computing systems capable of learning, reasoning, and inferring facts based on prior knowledge. In this scenario, knowledge bases are valuable assets for the knowledge representation and automated reasoning of diverse application domains. In particular, inference tasks on knowledge graphs (knowledge bases' graphical representations) are increasingly im...

SAVIME: A Database Management System for Simulation Data Analysis and Visualization

  • 2019

Hermano Lustosa, Patrick Valduriez, Fabio Porto

Limitations in current DBMSs prevent their wide adoption in scientific applications. To let scientific applications benefit from DBMS support, enabling declarative data analysis and visualization over scientific data, we present an in-memory array DBMS called SAVIME. In this work, we describe SAVIME along with its data model. Our preliminary evaluation shows how SAVIME, by using a simple storage definition language (SDL), can outperform the state-of-the-art array database sy...

A conceptual vision toward the management of Machine Learning models

  • 2019

Daniel N. R. da Silva, Yania Souto, Adolfo Simões, Carlos Cardoso, João N. Rittmeyer, Hermano Lustosa, Luciana E. G. Vignoli, Rebecca Salles, Eduardo Ogasawara, Flavia C. Delicato, Paulo de F. Pires, Artur Ziviani and Fabio Porto

To turn big data into actionable knowledge, the adoption of machine learning (ML) methods has proven to be one of the de facto approaches. When elaborating an appropriate ML model for a given task, one typically builds many models and generates several data artifacts. Given the amount of information associated with the developed models' performance, selecting the appropriate one is often difficult. Therefore, appropriately comparing a set of competitive ML models and choosing one according to an arbi...

Deep Learning Application for Plant Classification on Unbalanced Training Set

  • 2019

Deep learning models require a reasonable number of training instances to improve prediction quality. Moreover, in classification problems, an unbalanced class distribution may lead to a biased model. In this paper, we investigate the problem of species classification from plant images, where some species have very few image samples. We explore reduced versions of ImageNet-winning neural network architectures to filter the space of candidate matches under a target accuracy level. We sh...

Dealing with categorical missing data using CleanerR

  • 2019

Missing data is a common problem in data analysis. Missing values appear in datasets for a multitude of reasons, from data integration to poor data input. When faced with this problem, the analyst must decide what to do with the missing data, since it is not always advisable to discard these values from the analysis. In this paper, we discuss a method that takes into account information theory and functional dependencies to best impute missing values. Keywords: categorical data, data imput...

SDN-Based Architecture for Providing Quality of Service to High Performance Distributed Applications

  • 2019

A. T. Oliveira, B. J. C. A. Martins, M. F. Moreno, A. T. A. Gomes

Constellation Queries over Big Data

  • 2018

Fabio Porto, Amir Khatibi, Joao N. Rittmeyer, Eduardo Ogasawara, Patrick Valduriez, Dennis Shasha

A geometrical pattern is a set of points with all pairwise distances (or, more generally, relative distances) specified. Finding matches to such patterns has applications to spatial data in seismic, astronomical, and transportation contexts. Finding geometric patterns is a challenging problem as the potential number of sets of elements that compose shapes is exponentially large in the size of the dataset and the pattern. In this paper, we propose algorithms to find patterns in large da...

Point pattern search in big data

  • 2018

Eduardo S. Ogasawara, Alberto Krone-Martins, Patrick Valduriez, Dennis E. Shasha

SSDBM 2018: 21:1-21:12...

TARS: An Array Model with Rich Semantics for Multidimensional Data

  • 2017

Noel Moreno Lemus, Patrick Valduriez

ER Forum/Demos 2017: 114-127...

Database System Support of Simulation Data

  • 2016

Pablo Blanco, Patrick Valduriez

PVLDB 9(13): 1329-1340 (2016)...

A Unifying Model for Representing Time-Varying Graphs

  • 2015

E. Fleury

IEEE International Conference on Data Science and Advanced Analytics - IEEE DSAA 2015, Paris, France