Advanced Database Techniques for Processing Scientific Multi-Dimensional Data and Stochastic Gradient Descent on Highly-Parallel Architectures: Multi-core CPU or GPU? Synchronous or Asynchronous?

Event Location: LNCC - Auditório B
Date: 22/05/2019

Presentation 1: Advanced Database Techniques for Processing Scientific Multi-Dimensional Data

Abstract

Scientific applications are generating an ever-increasing volume of multi-dimensional data that require fast analytics to extract meaningful results. The database community has developed distributed array databases to alleviate this problem. In this talk, we introduce four classical techniques that we extend to array databases. The first is a novel distributed similarity join operator for multi-dimensional arrays that minimizes the overall data transfer and network congestion while providing load balancing, without completely repartitioning and replicating the input arrays. The second technique is materialized array views and incremental view maintenance under batch updates. We give a three-stage heuristic that finds effective update plans and, as a side effect of view maintenance, continuously repartitions the array and the view based on a window of past updates. The third technique is User-Defined Functions (UDF) for structural locality operations on arrays. We propose an in-situ UDF mechanism, called ArrayUDF, that allows users to define computations on adjacent array cells without the use of join operations and executes the UDF directly on arrays stored in data files. The fourth technique is a distributed framework for cost-based caching of multi-dimensional arrays in native format. We design cache eviction and placement heuristics that consider the historical query workload. These techniques are motivated by the Palomar Transient Factory (PTF) astronomical project and are implemented in its real-time transient detection pipeline. They played a pivotal role in the first-ever observation of a neutron star merger, which produced gravitational waves and turned out to be the origin of heavy elements, including gold. This led to a Science magazine article that received extensive media coverage on ACM TechNews, Slashdot, FiveThirtyEight, and Quanta Magazine, among others.
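
To make the structural-locality idea behind ArrayUDF concrete, the following is a minimal Python/NumPy sketch, not the ArrayUDF API itself; the function and parameter names (apply_structural_udf, radius) and the edge-padding choice are illustrative assumptions. It applies a user-defined function to each cell of a 2-D array together with its adjacent neighbors, without expressing the neighborhood access as a join:

import numpy as np

def apply_structural_udf(array, udf, radius=1):
    """Apply a user-defined function to every cell and its neighborhood.

    The UDF receives the (2*radius+1) x (2*radius+1) window centered on a
    cell and returns one value; edges are handled by replication padding so
    the window is always full. This mimics structural-locality operations
    (e.g., stencils) without join-based neighbor lookups.
    """
    padded = np.pad(array, radius, mode="edge")
    out = np.empty(array.shape, dtype=float)
    rows, cols = array.shape
    for i in range(rows):
        for j in range(cols):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            out[i, j] = udf(window)
    return out

# Example: smooth each cell with the mean of its 3x3 neighborhood.
chunk = np.random.rand(8, 8)              # stand-in for one array chunk
smoothed = apply_structural_udf(chunk, lambda w: w.mean())

A real in-situ implementation would stream chunks of the array directly from the data files rather than materializing the whole array in memory, but the neighborhood-access pattern is the same.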


Presentation 2: Stochastic Gradient Descent on Highly-Parallel Architectures: Multi-core CPU or GPU? Synchronous or Asynchronous?


Abstract

There is an increased interest, both in industry and academia, in building data analytics frameworks with advanced algebraic capabilities. Many of these frameworks, e.g., TensorFlow, implement their compute-intensive primitives in two flavors: as multi-threaded routines for multi-core CPUs and as highly-parallel kernels executed on GPU. Stochastic gradient descent (SGD) is the most popular optimization method for model training and is implemented extensively on modern data analytics platforms. While the data-intensive properties of SGD are well known, there is an intense debate on which of the many SGD variants is better in practice. In this work, we perform a comprehensive experimental study of parallel SGD for training machine learning models. We consider the impact of three factors (computing architecture: multi-core CPU or GPU; synchronous or asynchronous model updates; and data sparsity) on three measures: hardware efficiency, statistical efficiency, and time to convergence. We draw several interesting findings from our experiments with logistic regression (LR), support vector machines (SVM), and deep neural nets (MLP) on five real datasets. As expected, the GPU always outperforms the parallel CPU for synchronous SGD. The gap is, however, only 2-5X for simple models and below 7X even for fully-connected deep nets. For asynchronous SGD, the CPU is undoubtedly the optimal solution, outperforming the GPU in time to convergence even when the GPU has a speedup of 10X or more. The choice between synchronous GPU and asynchronous CPU is not straightforward and depends on the task and the characteristics of the data. Thus, the CPU should not be easily discarded for machine learning workloads. We hope that our insights provide a useful guide for applying parallel SGD in practice and, more importantly, for choosing the appropriate computing architecture.
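
As a rough illustration of the synchronous-versus-asynchronous distinction studied in the talk, the following Python/NumPy sketch contrasts the two update disciplines for logistic regression. It is not the code used in the experiments; the dataset, step size, batch size, and thread count are placeholders:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def grad(w, X, y):
    """Logistic-regression gradient on a mini-batch (labels in {-1, +1})."""
    margins = y * (X @ w)
    return -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)

def sgd_synchronous(X, y, epochs=10, batch=64, lr=0.1):
    """Synchronous SGD: one gradient per mini-batch, applied to a single model copy."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for start in range(0, len(y), batch):
            sl = slice(start, start + batch)
            w -= lr * grad(w, X[sl], y[sl])
    return w

def sgd_asynchronous(X, y, epochs=10, batch=64, lr=0.1, threads=4):
    """Asynchronous (Hogwild-style) SGD: workers update the shared model
    without locks, so their updates may interleave."""
    w = np.zeros(X.shape[1])

    def worker(seed):
        nonlocal w
        rng = np.random.default_rng(seed)
        steps = max(1, epochs * (len(y) // batch) // threads)
        for _ in range(steps):
            idx = rng.integers(0, len(y), size=batch)
            w -= lr * grad(w, X[idx], y[idx])   # lock-free shared update

    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(worker, range(threads)))
    return w

# Tiny synthetic example (placeholder data).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = np.sign(X @ rng.normal(size=20) + 0.1 * rng.normal(size=1000))
w_sync = sgd_synchronous(X, y)
w_async = sgd_asynchronous(X, y)

The synchronous mini-batch pattern is the one that maps naturally onto GPU kernels, while the asynchronous variant only mimics lock-free CPU updates; Python's global interpreter lock limits true parallelism here, so a faithful implementation would use native threads.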

Speakers

Florin Rusu

Bio

Florin Rusu is an associate professor in the School of Engineering at the University of California, Merced, and a faculty scientist in the Scientific Data Management Group at Lawrence Berkeley National Lab. He is the recipient of a Hellman Faculty Fellowship in 2013 and a DOE Early Career Award in 2014. Florin regularly publishes in and serves on the program committees of database conferences such as SIGMOD, VLDB, ICDE, and SSDBM. A notable accomplishment is the use of his work on array databases in astronomy for identifying transient celestial objects. This led to the first-ever observation of a merger of two neutron stars, which turned out to be the origin of gold in the Universe. This work was featured in Science magazine.