Abstract:
Science gateways have gained increasing attention in recent years from
diverse communities. They are software solutions that
integrate reusable data and specialized techniques via Web
servers while hiding the complexity of the underlying high-performance
computing resources. Several projects and initiatives have been started
worldwide to develop frameworks that support the broad range of key
scientific domains. The biological sciences are undergoing a revolution
as novel technologies, such as next-generation sequencing, enable data
generation at exascale. Bioinformatics covers a wide range
of important applications in health, diversity, and life sciences, and
requires an understanding of high-performance computing culture to
accelerate the transition to computational simulations of biological
systems at all scales. This article introduces the BioinfoPortal gateway,
its architecture, its functionalities, and its integration with the CSGrid
middleware used to manage the high-performance computing environment of
the Brazilian National High-Performance Computing System, SINAPAD,
including the Santos Dumont supercomputer. We present a discussion about
the challenges of integrating BioinfoPortal and CSGrid
framework, which considers the general process of the installation,
configuration, and deployment. Finally, we present the findings of a
performance analysis of high-performance computing applications and
show how machine learning was applied to optimize the
functionality of BioinfoPortal by recommending predictive models for the
efficient allocation of resources, which achieved over 75% performance
efficiency.