We are living in the Big Data era, and we are witnessing a shift in the role of data management systems: rather than “just” being the systems of record at the heart of traditional enterprises, modern Big Data management systems must model, capture, track, and react to the current state of the world. Doing so requires ingesting event data arriving from a variety of devices, as well as providing query access to the history of captured data over time. These requirements span a variety of scientific disciplines, including the handling of data produced by a variety of sensors in health care, environmental monitoring applications, traffic monitoring, dynamic social network data, and many other domains.
AsterixDB is an open source Big Data Management System (BDMS) with a feature set that’s very different from those of other platforms in today's Big Data ecosystem. The system was initially co-developed by UC Irvine and UC Riverside, starting in 2009 and leading eventually to its first beta release in mid-2013. It has recently moved to Apache, where AsterixDB is now an active incubating project. Many of the system’s key design decisions relate to the aforementioned shift. This talk will first briefly review AsterixDB’s data model, query language, and scale-out architecture. It will then examine a number of counter-cultural aspects of the AsterixDB system, including where its data lives, its runtime architecture, its approach to streaming data, its view of transactions, and its features for handling time-based data.
Michael J. Carey is a Bren Professor of Information and Computer Sciences at UC Irvine. Before joining UCI in 2008, Carey worked at BEA Systems for seven years and led the development of BEA's AquaLogic Data Services Platform product for virtual data integration. He also spent a dozen years teaching at the University of Wisconsin-Madison, five years at the IBM Almaden Research Center working on object-relational databases, and a year and a half at e-commerce platform startup Propel Software during the infamous 2000-2001 Internet bubble. Carey is an ACM Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E.F. Codd Innovations Award. His current interests all center around data-intensive computing and scalable data management (a.k.a. Big Data).
Graph databases have been an active research topic of late, owing to their applications in areas such as the Semantic Web and Social Network Analysis. In this talk, we present features that are used by several real-world query languages for graph databases. These include navigation, pattern matching, path variables, ungrouping, and data comparisons. We analyze their properties with two main objectives in mind. First, we would like to provide simple yet general definitions of key features of graph query languages, to enable analyses of their different uses in practice and to establish a common ground for their theoretical study. Second, we would like to use them to study the tradeoff between expressiveness and efficiency in existing query languages for graph databases.
Pablo Barceló is an Associate Professor in the Department of Computer Science at the University of Chile. He received his PhD from the University of Toronto in 2006. His main research interests are in the areas of databases and logic in computer science. He has written over 30 technical papers and served on the program committees of major conferences in his area (ACM PODS, SIGMOD, ICDT, STACS, ACM/IEEE LICS). He was also an invited tutorial speaker at ACM PODS 2013. He is a member of the editorial board of Logical Methods in Computer Science and the editor of the Database Principles Column of the SIGMOD Record.
The Semantic Web, as envisaged by Tim Berners-Lee in the early years of this century, foresaw a universal medium for exchanging data, where resources could be published and interconnected with others, providing interoperation between users and machines in order to facilitate their computational tasks. Since then, a flood of semantic technologies has been created, revolutionizing the way digital information is stored, accessed, and communicated. Among these, Linked Data (LD) emerged as an innovative technology for realizing the Semantic Web vision of making the Web a global, distributed, semantics-based information system. Although LD is a powerful strategy for linking data resources, these links are not always created adequately: frequently, resources are not semantically associated with other resources, which hinders interoperability between them. To minimize this problem, some LD technologies have recently been developed. This presentation aims to give an evolutionary overview of the state of the art, focusing on the main Semantic Web technologies that have been responsible for: (i) describing semantically enriched data; (ii) linking resources on the Web of data according to their semantic meaning; and (iii) enabling the efficient consumption of these resources. In this context, I will present the main work I have carried out in this domain over the last 15 years.
Ana Maria de C. Moura is a research collaborator at the Extreme Data Laboratory (DEXL) at LNCC, where she has been working for 5 years on applying semantic technologies to improve database integration, data retrieval, and data publishing. She graduated in Computer Science from the UTC-Université de Technologie de Compiègne (Compiègne, France) in 1984 and holds an M.Sc. in Computer Science from COPPE/Federal University of Rio de Janeiro (UFRJ), 1979. She worked for more than 20 years as professor and coordinator of the Database research area at the Computing Engineering Department of IME-Military Institute of Engineering, Rio de Janeiro, where she still contributes as a collaborating professor. She has advised more than 50 completed Master's theses and has more than 190 national and international publications. Her main areas of interest are databases, semantic data integration, conceptual modeling, metadata management, ontologies, and linked open data.