Eric Neumann
Biomedical data generation is growing continuously in both size and complexity. Clinical study data is further complicated by the fact that new forms of associated data are continuously created as technologies emerge, including biomarkers, pathway (mechanistic) knowledge, assay platforms, and model systems. W3C semantic standards such as RDF and OWL have been around for several years, but most informatics specialists are unsure where they can be applied effectively. Semantically Linked Data (SLD) can significantly change the organization and re-use of data without requiring a concomitant investment in data systems. SLD is especially useful in dealing with changing data descriptions and the relationships they may have to other data elements, even if those exist externally in other data systems. Applications of SLD to clinical data management and analysis will also be presented.
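To make the idea concrete, here is a minimal sketch, in Python with the rdflib library, of how a single clinical observation can be expressed as RDF triples and linked to an externally defined biomarker term. All namespaces, URIs, and property names below are hypothetical placeholders invented for illustration, not part of any actual study vocabulary.

```python
# A minimal sketch of Semantically Linked Data with rdflib.
# All URIs and property names are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, RDF, XSD

CLIN = Namespace("http://example.org/clinical/")   # hypothetical study namespace
EXT = Namespace("http://example.org/biomarkers/")  # hypothetical external vocabulary

g = Graph()
g.bind("clin", CLIN)
g.bind("ext", EXT)

# Describe one observation and link it to an externally defined biomarker term.
obs = CLIN["observation/0001"]
g.add((obs, RDF.type, CLIN.Observation))
g.add((obs, CLIN.subject, CLIN["patient/42"]))
g.add((obs, CLIN.measures, EXT["CRP"]))  # link to an external data element
g.add((obs, CLIN.value, Literal(3.1, datatype=XSD.double)))

print(g.serialize(format="turtle"))
```

Because the links are just additional triples, a new kind of associated data (say, a new assay platform) can be attached later without any schema migration, which is the point the abstract makes about changing data descriptions.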
Philippe Cudre-Mauroux
In this talk I will introduce SciDB, a new open-source and massively parallel platform for array data storage, processing, and analysis. I will review a number of scientific use cases and describe how they have determined the features and functionality of the system. I will introduce SciDB's data model and describe some of the key architectural features of the system, including columnar storage, parallel user-defined functions, and overlapping array partitioning. Finally, I will introduce SSDB, a new benchmark for scientific data management systems, and explain why SciDB is up to two orders of magnitude faster than traditional database systems on common large-scale array processing tasks.
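As an illustration of one of the features named above, the following sketch (plain NumPy, not SciDB itself) mimics overlapping array partitioning: each chunk carries a halo of cells copied from its neighbors, so a windowed operation can run on every chunk independently with no communication between partitions. The chunk size and overlap are illustrative choices, not SciDB defaults.

```python
# A minimal NumPy sketch of overlapping array partitioning:
# each chunk is padded with a halo so a window operation
# (here a 1-D moving average) needs no neighboring chunks.
import numpy as np

def overlapping_chunks(a, chunk, overlap):
    """Split 1-D array `a` into chunks of size `chunk`, each padded
    with up to `overlap` extra cells from its neighbors. Yields the
    padded chunk plus the offset and length of its core region."""
    for start in range(0, len(a), chunk):
        lo = max(0, start - overlap)
        hi = min(len(a), start + chunk + overlap)
        yield a[lo:hi], start - lo, min(chunk, len(a) - start)

a = np.arange(20, dtype=float)
result = []
for part, core_start, core_len in overlapping_chunks(a, chunk=5, overlap=1):
    smoothed = np.convolve(part, np.ones(3) / 3, mode="same")  # per-chunk window op
    result.append(smoothed[core_start:core_start + core_len])  # keep core cells only

# Matches a single global convolution, chunk boundaries notwithstanding.
print(np.concatenate(result))
```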
Pedro Leite da Silva Dias
Weather forecasters face uncertainty that is inherent to the nonlinear nature of the governing equations of the atmospheric state. Significant progress has been achieved in the past two decades in estimating uncertainty and predicting “predictability”. More recently, ten operational weather forecasting centers that produce daily global ensemble forecasts out to 1-2 weeks ahead have agreed to deliver in near real time a selection of forecast data to the TIGGE (THORPEX Interactive Grand Global Ensemble) data archives at CMA, ECMWF, and NCAR. This is offered to the scientific community as a new resource for research and education. The objective of TIGGE is to establish closer cooperation between the academic and operational worlds by encouraging greater use of operational products for research, and to actively explore the concept and benefits of multi-model probabilistic weather forecasts, with a particular focus on severe weather prediction. The data policy, the current status of the archives, the exchange procedures, and the complexity of the network will be presented. Examples of the use of multi-model super-ensembles in South America, relying on extensive use of the internet, will also be shown.
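As a rough illustration of the multi-model idea, the sketch below pools ensemble members from several hypothetical centers and turns them into a probabilistic forecast of exceeding a severe-rainfall threshold. All numbers are synthetic; real TIGGE data would instead be retrieved from the CMA, ECMWF, or NCAR archives.

```python
# A minimal sketch of a multi-model probabilistic forecast:
# pool ensemble members and estimate an exceedance probability.
# All member values below are synthetic, not TIGGE data.
import numpy as np

rng = np.random.default_rng(0)

# 24-h accumulated precipitation (mm) at one grid point, per center.
ensembles = {
    "center_a": rng.gamma(2.0, 5.0, size=50),  # 50 members
    "center_b": rng.gamma(2.5, 4.0, size=20),  # 20 members
    "center_c": rng.gamma(1.8, 6.0, size=30),  # 30 members
}

pooled = np.concatenate(list(ensembles.values()))
threshold = 25.0  # mm; an illustrative severe-rainfall threshold

# Probability of exceedance = fraction of pooled members above the threshold.
prob = float(np.mean(pooled > threshold))
print(f"P(precip > {threshold} mm) = {prob:.2f} from {pooled.size} members")
```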
Geoffrey Fox
We analyze the different tradeoffs and goals of Grid, Cloud, and parallel (cluster/supercomputer) computing. These paradigms trade off performance, fault tolerance, ease of use (elasticity), cost, and interoperability. Different application classes (characteristics) fit different architectures, and we describe a hybrid model with Grids for data, traditional supercomputers for large-scale simulations, and clouds for broad-based "capacity computing", including many data-intensive problems. We discuss the impressive features of cloud computing platforms and compare MapReduce and MPI. We take most of our examples from the life science area.
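For readers unfamiliar with the programming model being compared with MPI, here is a minimal single-process Python sketch of the MapReduce pattern applied to a toy life-science task, counting k-mers in DNA reads. The reads and k are invented; a real framework such as Hadoop would distribute the map, shuffle, and reduce phases across machines, whereas an MPI version would express the same task with explicit message passing.

```python
# A single-process sketch of the MapReduce pattern on a toy
# life-science task: counting k-mers in DNA reads.
from collections import defaultdict

reads = ["ACGTACGT", "CGTACGTA", "TTACGTAC"]  # illustrative reads
K = 3

# Map: emit (k-mer, 1) pairs from each read independently.
def map_read(read):
    for i in range(len(read) - K + 1):
        yield read[i:i + K], 1

# Shuffle: group values by key (the framework does this in real MapReduce).
groups = defaultdict(list)
for read in reads:
    for kmer, count in map_read(read):
        groups[kmer].append(count)

# Reduce: sum the counts for each k-mer.
counts = {kmer: sum(vals) for kmer, vals in groups.items()}
print(sorted(counts.items(), key=lambda kv: -kv[1])[:5])
```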
08:30 - 08:45 - Opening
08:50 - 09:30 - Paper Session 1
09:35 - 10:30 - Invited Talk Geoffrey Fox
10:30 - 11:00 - Break (Poster Presentations)
11:00 - 12:00 - SBAC Keynote
12:00 - 12:40 - Invited Talk Pedro Leite da Silva Dias
12:40 - 14:30 - SBAC Lunch
14:30 - 15:30 - Paper Session 2
15:40 - 16:30 - Invited Talk Philippe Cudre-Mauroux
16:30 - 17:00 - Break (Poster Presentations)
17:00 - 17:50 - Invited Talk Eric Neumann
18:00 - 19:00 - Paper Session 3
- Fabio Porto
- Bruno Schulze
- Simone Santana