PVLDB 9(13): 1329-1340 (2016)
Supported by increasingly efficient HPC infrastructure, numerical simulations are rapidly expanding to fields such as oil and gas, medicine, and meteorology. As simulations become more precise and cover longer periods of time, they may produce files with terabytes of data that need to be efficiently analyzed. In this paper, we investigate techniques for managing such data using an array DBMS. We take advantage of multidimensional arrays, which nicely model the dimensions and variables used in numerical simulations. However, a naive approach to mapping simulation data files may lead to sparse arrays, impacting query response time, in particular when the simulation uses irregular meshes to model its physical domain. We propose efficient techniques to map coordinate values in numerical simulations to evenly distributed cells in array chunks with the use of equi-depth histograms and space-filling curves. We implemented our techniques in SciDB and, through experiments over real-world data, compared them with two other approaches: row-store and column-store DBMSs. The results indicate that multidimensional arrays and column-stores are much faster than a traditional row-store system for queries over larger amounts of simulation data. They also help identify the scenarios where array DBMSs are most efficient, and those where they are outperformed by column-stores.
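
To make the mapping idea concrete, the following is a minimal Python sketch, not the SciDB implementation described in the paper. It assumes that equi-depth bin boundaries are taken from the sorted coordinate values and that a Z-order (Morton) curve serves as the space-filling curve; the abstract does not say which curve is used, so that choice is an assumption. The function names `equi_depth_edges`, `to_index`, and `morton_2d` are hypothetical.

```python
from bisect import bisect_right


def equi_depth_edges(values, n_bins):
    """Return n_bins - 1 interior boundaries chosen so that each bin
    holds roughly the same number of values (equi-depth histogram)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def to_index(value, edges):
    """Map a coordinate value to an integer bin index in 0..len(edges)."""
    return bisect_right(edges, value)


def morton_2d(ix, iy, bits=16):
    """Interleave the low `bits` bits of ix and iy into a Z-order key.
    Assumption: Z-order stands in for whichever curve the paper uses."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


# Irregular mesh coordinates: dense near zero, sparse further out.
xs = [0.0, 0.1, 0.2, 0.3, 10.0, 20.0, 30.0, 1000.0]
edges = equi_depth_edges(xs, n_bins=4)
indices = [to_index(x, edges) for x in xs]
```

With these skewed coordinates, using the raw values as array indices would leave most cells empty, whereas the equi-depth indices place two points in each of the four bins. The Morton key then linearizes a pair of such indices so that nearby cells tend to receive nearby keys.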