Limitations in current DBMSs prevent their wide adoption in scientific applications. To let these applications benefit from DBMS support, enabling declarative data analysis and visualization over scientific data, we present an in-memory array DBMS called SAVIME. In this work we describe SAVIME along with its data model. Our preliminary evaluation shows how SAVIME, by using a simple storage definition language (SDL), can outperform the state-of-the-art array database system, SciDB, during data ingestion. We also show that it is possible to use SAVIME as a storage alternative for a numerical solver without affecting its scalability, making it useful for modern ML-based applications.
This survey discusses the human perspective in networking through the Tactful Networking paradigm, whose goal is to add perceptive senses to the network by assigning it human-like capabilities of observation, interpretation, and reaction to daily-life features and associated entities. To achieve this, knowledge extracted from inherent human behavior (routines, personality, interactions, and others) is leveraged, empowering the learning and prediction of user needs to improve QoE and system performance while respecting privacy and fostering new applications and services. Tactful Networking groups solutions from the literature and innovative interdisciplinary human aspects studied in other areas. The paradigm is motivated by mobile devices’ pervasiveness and increasing presence as sensors in our daily social activities. With the human element in the foreground, it is essential: (i) to center big data analytics around individuals; (ii) to create suitable incentive mechanisms for user participation; (iii) to design and evaluate both human-aware and system-aware networking solutions; and (iv) to apply prior and innovative techniques to deal with human-behavior sensing and learning. This survey reviews the human aspect in networking solutions over more than a decade, then discusses the impact of Tactful Networking through literature in behavior analysis and representative examples. This paper also discusses a framework comprising data management, analytics, and privacy for enhancing raw human data to assist Tactful Networking solutions. Finally, challenges and opportunities for future research are presented.
Applying machine learning models to meteorological data brings many opportunities to the Geosciences field, such as predicting future weather conditions more accurately. In recent years, modeling meteorological data with deep neural networks has become a relevant area of investigation. These works apply either recurrent neural networks (RNN) or some hybrid approach mixing RNN and convolutional neural networks (CNN). In this work, we propose STConvS2S (Spatiotemporal Convolutional Sequence to Sequence Network), a deep learning architecture built for learning both spatial and temporal data dependencies using only convolutional layers. Our proposed architecture resolves two limitations of convolutional networks in predicting sequences from historical data: (1) they violate the temporal order during the learning process and (2) they require the lengths of the input and output sequences to be equal. Computational experiments using air temperature and rainfall data from South America show that our architecture captures spatiotemporal context and that it outperforms or matches the results of state-of-the-art architectures for forecasting tasks. In particular, one of the variants of our proposed architecture is 23% better at predicting future sequences and five times faster at training than the RNN-based model used as a baseline.
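To make limitation (1) concrete, the following is a minimal sketch of a one-dimensional causal convolution, a common way to keep a convolutional layer from looking at future time steps. It is an illustration only: the function name causal_conv1d, the left zero-padding scheme, and the toy values are assumptions made here, not the mechanism used in STConvS2S.

```python
def causal_conv1d(series, kernel):
    """Convolve so that output[t] depends only on series[0..t].

    Hypothetical helper for illustration: the series is left-padded with
    zeros so the last kernel weight always aligns with the current step.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(series)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(series))
    ]


# Toy values (made up): changing the last observation must not change
# any earlier output, which is what respecting temporal order means here.
original = causal_conv1d([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
perturbed = causal_conv1d([1.0, 2.0, 3.0, 100.0], [0.5, 0.5])
```

Because the padding is applied only on the left, the output has the same length as the input and the first three entries of original and perturbed are identical.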
The unprecedented size of the human population, along with its associated economic activities, has an ever-increasing impact on global environments. Across the world, countries are concerned about growing resource consumption and the capacity of ecosystems to provide these resources. To effectively conserve biodiversity, it is essential to make indicators and knowledge openly available to decision-makers in ways they can use effectively. The development and deployment of mechanisms to produce these indicators depend on having access to trustworthy data from field surveys and automated sensors, biological collections, molecular data, and historic academic literature. The transformation of this raw data into synthesized information that is fit for use requires going through many refinement steps. The methodologies and techniques used to manage and analyze this data comprise an area often called biodiversity informatics (or e-Biodiversity). Biodiversity data follows a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these steps. This article explores each stage of the life cycle of biodiversity data, discussing its methodologies, tools, and challenges.
In data analysis, the mining of frequent patterns plays an important role in the discovery of associations and correlations between data. During this process, it is common to produce thousands of association rules (ARs), making the study of each one arduous. This problem weakens the process of finding useful information. There is a scientific effort to develop approaches capable of filtering interesting patterns, balancing the number of ARs produced against the goal of keeping rules that are neither trivial nor already known to specialists. However, even when such approaches are adopted, the number of produced ARs can still be high. This work contributes by presenting the Divergent Association Rules Approach (DARA), a novel approach for obtaining ARs that diverge from the data distribution. DARA is applied right after traditional approaches to filtering interesting patterns. To validate our approach, we studied a dataset on the occurrence of malaria in the Brazilian Legal Amazon. The discovered patterns show that the resulting ARs bring relevant insights from the data. This article contributes to both the medical and computer science fields, since this novel computational approach enabled new findings regarding malaria in Brazil.
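As background for the filtering step mentioned above, the sketch below computes the three standard interestingness measures of an association rule (support, confidence, and lift) over a small transaction set. It does not implement DARA; the function name rule_metrics and the toy transactions are hypothetical and chosen only for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Made-up transactions, not taken from any dataset in the article.
toy = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
support, confidence, lift = rule_metrics(toy, {"a"}, {"b"})
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than independence would predict, which is one of the usual criteria for keeping a rule before any further filtering.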
In the last few years, Apache Spark has become the de facto standard big data framework in both industry and academic projects. Especially in the scientific domain, it is already used to execute compute- and data-intensive workflows from biology to astronomy. Although Spark is an easy-to-install framework, it has more than one hundred parameters to be set, besides specific application design parameters. Thus, to execute Spark-based workflows efficiently, the user has to fine-tune a myriad of Spark and workflow parameters (even the partitioning strategy, for instance). Performing this configuration manually by trial and error is impractical, since it is tedious and error-prone. This article proposes an approach that generates predictive machine learning models (i.e. decision trees) and then extracts useful rules (i.e. patterns) from these models that can be applied to configure the parameters of future executions of the workflow and Spark for non-expert users. In the experiments presented in this article, the proposed parameter configuration approach led to better performance in processing Spark workflows. Finally, the methodology introduced here reduced the number of parameters to be configured by identifying, in the predictive model, the ones most relevant to workflow performance.
Distributed and Parallel Databases
ACM Computing Surveys (CSUR), ISSN: 0360-0300, vol. 52, issue 1, article no. 6
Journal of Network and Systems Management (JONS), Springer, ISSN: 1064-7570, vol. 26, no. 4, pp. 1079-1100
PeerJ, 6, e5551.
Computer Communications, Special issue on Mobile Traffic Analysis, Elsevier Science, ISSN: 0140-3664, vol. 95, pp. 54-68
Theoretical Computer Science (TCS), Elsevier Science, ISSN: 0304-3975, vol. 651, pp. 50-61
PVLDB 7(11): 959-962
IPAW 2014. Lecture Notes in Computer Science, vol. 8628, pp. 139–151. Springer.