Middleware for Large-Scale Data-Intensive Applications

The current project under NSF/ITR Award #0219267, with PI Dr. Peter Pulay and co-PI Dr. Amy Apon, is the development of the Array Files interface for large out-of-core chemistry applications. The motivating applications for Array Files are the computational chemistry applications developed by Pulay, as described below. Internally, these applications manipulate enormous sparse arrays consisting of tens of thousands of moderately sized arrays that do not all fit into the collective memory of the cluster. Our continuing research further develops this interface by designing and implementing an efficient middle layer between the API and the storage access layer. Supercomputing resources serve as a testbed for validating our work experimentally.
In funded work with Acxiom Corporation, we are researching techniques for dynamically allocating the nodes in a grid computing system, and the data assigned to those nodes for particular applications, in a way that balances the priority of individual applications against the overall throughput of the system. One of our research goals is to allow the grid management system to respond and adjust automatically to changing loads and component failures.
View more information at Dr. Apon's web site.
View more information at Dr. Pulay's research web site.
Computational Condensed Matter Physics Group (CCMP)
This research focuses on one of the frontiers of Physics and Materials Science: predicting, understanding, and ultimately realizing novel types of nanomaterials by design. The main computational approach is state-of-the-art first-principles density-functional theory (DFT). This theory is known to be very accurate and to have great predictive power. However, applying it to nanomaterials is exceptionally difficult, since meaningful modeling of nano-sized materials requires the consideration of several hundred (or even thousands of) atoms, which far exceeds the size range (~a few hundred atoms) that normal DFT can handle.

For this reason, many existing theoretical studies of nano-dots rely on semi-empirical or model approaches. By contrast, the first-principles approach has much greater predictive power and is able to handle realistic atomic-scale environments. However, it requires a powerful computing facility. The supercomputing facility is expected to be heavily used to study novel nanostructured materials that possess unusual properties of technological importance.
View more information about the Computational Condensed Matter Physics Group.
DNA and Biomolecular Computing
DNA is increasingly used for non-biological applications. For example, representations of non-biological problems can be encoded in DNA sequences and then manipulated by enzymatic and laboratory techniques to compute a solution. In addition, DNA can be used for the directed self-assembly of nanostructures. A key operation for these applications is the template-matching hybridization reaction between DNA oligonucleotides.

To implement successful computations or nanostructures, the hybridizations should occur as designed. Otherwise, unplanned hybridizations (i.e., crosshybridizations) can occur, with several negative consequences, including false positives and false negatives for computations and structural defects for nanostructures. Therefore, a first step, which has been termed the DNA word design problem, is to find sequences that minimize crosshybridizations. In NSF-supported research by Deaton and others, non-crosshybridizing libraries of DNA sequences are designed with software tools and in vitro protocols.
The objective is to produce as large a library as possible, as this represents the raw material for DNA-based nanotechnology and computers, and determines both the scaling properties and the reliability of the computation or structure. For example, for DNA sequences of length 20, there are 4^20 (roughly 1.1 × 10^12) sequences from which to choose. It is unclear how many of these sequences are non-crosshybridizing. In order to successfully search this large space for non-crosshybridizing sequences, large computer systems are necessary. University of Arkansas supercomputing resources allow the search to be parallelized.
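To give a feel for the size of this search, the short Python sketch below computes the number of length-20 sequences and runs a toy greedy library search on very short words. It is only an illustration under simplifying assumptions, not the software or protocol used in the research: the function names (`reverse_complement`, `hamming`, `greedy_library`) and the `min_distance` parameter are hypothetical, and Hamming distance to each accepted word and to its reverse complement stands in as a crude proxy for crosshybridization, which real design tools would estimate with more realistic models.

```python
from itertools import product

# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the strand that would hybridize perfectly with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_library(length, min_distance):
    """Greedily collect words of the given length.

    A word is accepted only if it differs in at least min_distance
    positions from its own reverse complement, from every accepted
    word, and from the reverse complement of every accepted word.
    This mismatch count is an assumed, simplified stand-in for a
    crosshybridization test.
    """
    library = []
    for bases in product("ACGT", repeat=length):
        word = "".join(bases)
        # Reject words that are too close to their own reverse complement.
        if hamming(word, reverse_complement(word)) < min_distance:
            continue
        if all(
            hamming(word, other) >= min_distance
            and hamming(word, reverse_complement(other)) >= min_distance
            for other in library
        ):
            library.append(word)
    return library


# Size of the search space for words of length 20.
search_space = 4 ** 20  # 1099511627776

# Exhaustive enumeration is only practical for very short words.
toy_library = greedy_library(length=4, min_distance=2)
```

Even this toy version compares each candidate against every word already accepted, so the cost grows with both the size of the search space and the size of the library. Because candidates in different parts of the space can be screened independently before being merged, the search lends itself to parallel execution.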
Read an article featuring Dr. Deaton's DNA research.
Last updated: January 29, 2007