Developing a Grand Challenge
Materials Application Code:
An Interdisciplinary Approach
By William A. Shelton and G. Malcolm Stocks
Malcolm Stocks and Bill Shelton view a computer image of the direction and magnitude of magnetic moments calculated for a nickel-iron alloy. Photograph by Tom Cerniglio.
The combination of massively parallel processors and newly developed computation methods at ORNL is enabling scientists to simulate magnetic and other properties of metallic alloys based on the electronic structure of thousands of atoms. Such computer simulations are less costly and time-consuming than laboratory experiments and could help U.S. aerospace, biotechnology, electronics, and power generation industries become more competitive in products based on advanced materials.
Fig. 1. Advanced materials are showing up in everyday products: (a) A variety of materials make automobiles tougher and lighter. (b) Newer materials enhance the performance of catalytic converters. (c) Quasi-crystals are used in the electronics industry. (d) Carbon-fiber-reinforced composites provide strength for car bumpers and tires. (e) Bioskin is used for burn victims. (f) Turbine blades are used in the aerospace and power generation industries.
Throughout history, materials science has played an extremely important role in society. Todays significant developments in materials science include lightweight, corrosion-resistant, ductile alloys used to make turbine blades for the aerospace and power generation industries and artificial skin used in medicine (see Fig. 1).
Clearly, materials science has a profound effect on our economy. For example, in the early 1970s, the birth and growth of the microelectronics industry sparked a surge in the Japanese economy and a large shift in the global marketplace. Therefore, to improve its competitiveness in the global economy, the United States must be at the forefront of materials research to better position itself for development, commercialization, and use of the next new technology.
A key factor for any industry is the research and development (R&D) costs associated with bringing a product to market. Unfortunately, at the present time the laboratory development of new, technologically advanced materials is largely performed using phenomeno-nological parameters (based on human experience). This type of approach, while successful, is an expensive and time-consuming process.
For successful competition in the global marketplace, R&D resources must be used efficiently to enable timely release of a product at the lowest possible price. Reducing R&D costs and the time needed to bring a product to market requires strong collaborations among materials theorists, experimentalists, and computational scientists. Computational scientists are essential because the cost of performing computer simulations, or computer experiments, is far lower than the cost of conducting laboratory experiments. In addition, results can often be obtained faster from simulations than from laboratory experiments.
Fig. 2. Crystalline materials are made up of atoms in a periodic arrangement.
A key difficulty in simulating the properties of materials such as metallic alloys is that physical mechanisms responsible for particular properties exist on varying length scales ranging from the microscopic to the continuum. While theoretical models do exist across these length scales, these models often are based on phenomeno-nology, requiring some form of experimental input that diminishes their usefulness. On the other hand, on the microscopic scale there are so-called first principles, or ab initio, methods that incorporate the underlying physics and therefore do not require any adjustable parameters. These methods calculate the physical properties of crystalline materials at the microscopic level, where the crystal is made up of atoms in a periodic arrangement (see Fig. 2). In each atom, electrons orbit a nucleus composed of protons and neutrons. The outermost electrons, or valence electrons, hold the crystal together (cohesion). They are also responsible for most of a materials properties, ranging from its phase stability to its ability to conduct heat and electricity.
These atomistic methods calculate the energetics of a system of electrons in the field of atomic nuclei using quantum mechanics. In addition, it is possible to use the output from calculations using ab initio methods to determine parameters used as the input for models that operate at much larger length scales. This type of approach connects the different modeling efforts on various length scales, enabling calculations of materials properties without performing a laboratory experiment. This approach can aid in the design of technologically advanced materials at a fraction of current development costs.
Unfortunately, contributions arising from theoretical investigations have been limited. A major hurdle in implementing the above strategy is overcoming poor J(N3) scaling (where N is the number of atoms making up the system) inherent in traditional ab initio methods. J(N3) scaling arises because of either the necessity of solving an eigenvalue-eigenvector problem or the need to calculate the inverse of a matrix. This limitation reduces both the types of properties that can be realistically simulated and the types of parameters that can be reliably extracted for use in other large-scale simulations. Solving this problem requires the development of new ab initio methods that offer significantly improved scaling.
New Tools for Materials Theorists
The introduction of massively parallel processors (MPPs) in the 1980s has provided materials theorists with the tools to begin development of new ab initio methods for uncovering the physical mechanisms responsible for a materials intrinsic properties. As part of DOEs High Performance Communication Computation Program (DOE-HPCCP) in materials science, we have developed the first fully parallel, local-density-approximation (LDA)-based ab initio electronic structure method that scales linearly [J(N)] as the number of atoms is increased. This new method, referred to as the locally self-consistent multiple scattering (LSMS) method, is a real-space multiple scattering, Green-function-based method. For the first time, this new technique using ab initio methods makes possible an accurate simulation of the properties of materials whose behavior depends on the electronic structure of systems comprising hundreds to thousands of atoms.
The development of such a large-scale application requires interdisciplinary work involving quantum physics, materials science, applied mathematics, and computer science. This type of approach—formulated by materials theorists at ORNL—has been extremely successful, resulting in a highly efficient algorithm that can be used for investigations of materials properties of systems that were originally considered untenable because of the number of particles necessary to accurately perform the simulations.
Fig. 3. An example of one scattering path an electron can take within a four-atom cluster.
At the heart of multiple scattering theory is the calculation of the single-site Green function. Determining the single-site Green function requires calculation of the scattering path operator. The scattering path operator describes the scattering of an electron beginning at site i and ending at site j and includes all possible scattering paths. The total scattering path of an electron consists of both a single scattering path, by which an electron scatters from a single site, as well as a multiple scattering path, by which an electron scatters from multiple sites and returns to a site multiple times after being rescattered (see Fig. 3). For condensed systems, multiple scattering theory can be viewed as a succession of scattering events with the added advantage of expressing the scattering properties of the entire system in terms of the scattering properties of the individual atoms. Using this type of method along with standard approaches employed in traditional electronic structure methods leads to a J(N3) algorithm. Calculation of the scattering path operator requires the inverse of a matrix whose size is proportional to the number of atoms, N, making up the system and hence J(N3) scaling.
Fig.4. This plot is the log of the execution time vs the log of the number of atoms making up the crystal. The plot demonstrates the linear scaling of the LSMS method.
On the other hand, by making use of MPP machines and formulating this approach in real space with an approximation to the calculation of the scattering path operator leads to a J(N) method. In this new real-space approach, every atom must calculate its own scattering path operator and single-site Green function. Formally, to calculate the scattering path operator for site i requires inclusion of all scattering events with all the other sites, leading to the inversion of a matrix whose size is proportional to N. However, truncating the number of atoms in the calculation of the scattering path operator to m reduces the size of the matrix to be inverted. Because each atom must perform this inversion, which scales as J(m3), the total scaling for the entire N atom system is J(m3N). Now the J(N) scaling can be realized by noting that once m is determined, adding more atoms outside of the m atom cluster does not affect the calculation of the scattering path operator. In other words, once m is fixed, the algorithm naturally scales linearly as N is increased because every atom must perform the matrix inversion, whose size is the same. Therefore, each atom requires the same amount of time to perform this operation (see Fig. 4).
A key component of this approach is the decomposition of the atoms making up the crystal onto computer nodes on a large-scale MPP machine. To simplify this discussion, we will use a strategy in which we assign an atom to a computer node (generally one can assign multiple atoms to a computer node). This type of decomposition requires communication between an atom and the other atoms within its m atom cluster. The information that is sent is the single-site transfer matrix that contains information about the sending atoms scattering properties. The single-site transfer matrix is used to construct the scattering path operator and single-site Green function for the receiving atoms site.
A key contribution made by ORNL computer scientists (including the authors) was the design and development of a computationally efficient message-passing algorithm to handle communication between atoms. On a distributed message-passing machine, two pieces of information are crucial to ensuring that a particular atom receives the correct message: the node number of the computer node on which an atom resides and the message tag. The node number is used in a way similar to a postal address; like anyone who gets mail, an atom can receive messages from more than one source. The message tag is used to separate these messages so that the atom can pick up the messages in a specific order.
Fig. 5. A schematic of the LSMS algorithm highlighting the information that is exchanged within a nearest-neighbor LIZ.
Our algorithm is similar to a molecular dynamics algorithm in which two integer lists are constructed for each atom. One list contains the node numbers of the atoms that are to receive information from this atom. The second list contains the message tags. These tags are numbered using the row block index of the matrix. A row block consists of several rows of a matrix. This numbering scheme is necessary because each row block of an atoms scattering path operator describes a part of the total scattering process associated with the atom sending the message (see Fig. 5). The algorithm is computationally fast and memory-efficient, requiring the storage of only two single-site transfer matrices at a time, one for the local atoms matrix and the other for the sending atoms matrix. Once it has been used in the construction of a row block of the atoms scattering path operator, its memory location is reused for the next incoming message.
As previously mentioned, the most time-consuming part of this method is taking the inverse of a matrix. Even though this is a linear method, the time goes up drastically for increasing cluster size, m, because the method involves inverting a matrix whose size is proportional to m. The scattering path operator is a double-precision, dense, complex, nonsymmetric matrix. Because the matrix is nonsymmetric, there is no symmetry that can be exploited to speed up the calculation of the inverse.
A major contribution made by ORNLs applied mathematicians was to develop a hybrid method for solving for the inverse of the scattering path operator. A key feature of this method is that only a small portion of the inverse is needed. Using direct methods such as lower upper (LU) triangular-based methods to obtain this part of the inverse still results in a J(m3). So instead, an iterative method known as the quasi-minimal residual method (QMR) was employed. This method can be used for both symmetric and nonsymmetric problems, and equally important, it is extremely efficient in its use of computer memory. Iterative methods typically work well for sparse matrices (matrices containing lots of zeros) but not so well on dense matrices (matrices containing very few zeros). Normally, this is true because the full inverse is usually being calculated. However, because we are interested only in a small portion of the inverse, an iterative approach might just work, and indeed it works extremely well. There are certain situations where it is known not to work as well as a direct method. Therefore, the final algorithm is a hybrid one that switches between the QMR and a direct method, providing maximum computational efficiency.
Recently, materials theorists at ORNL have formulated a new version of the LSMS method. This new approach results in a sparse scattering path matrix. Again, the applied mathematicians are working with the application scientists on providing a direct sparse solver. This solver takes advantage of the sparsity pattern to develop a computationally fast algorithm that uses significantly less computer memory than the previous version of the LSMS computer code.
Simulating Materials Properties by New Computer Methods
Accurately simulating particular materials properties such as magnetic interactions may require more computational resources than can be provided by a single large MPP machine, requiring instead several large machines that are networked together. These machines can consist of several different architectures and can reside on different networks. This type of computing is referred to as heterogeneous computing. Efficient use of these resources requires fault tolerance. Fault tolerance provides the user with the ability to restart an application after a machine failure, provided either that the remaining computational resources can supply the necessary capabilities or that additional computational resources become available to run the application.
Currently, the ORNL team is involved in a networking project with DOEs Sandia National Laboratories (SNL) to link together the two largest computers in the world to solve an important scientific problem. The two machines that are networked together are the 1024-node Intel Paragon XP/S 150 at ORNLs Center for Computational Sciences and the 1800-node Intel Paragon at SNL. The LSMS computer code is being used for this project to study the magnetic structure of technologically important magnetic alloys. This project requires a strong collaboration with computer scientists. A key objective is to demonstrate the effectiveness of linking together large-scale MPP machines to solve important scientific and technological problems. The major obstacle to overcome is the time it takes to communicate information between machines.
The parallel virtual machine (PVM) software library developed in ORNLs Computer Science and Mathematics Division is used to handle the communication between the machines. PVM supports heterogeneous computing and dynamic configuration, permitting a user to add or delete computational resources dynamically during a PVM session. Also, PVM takes advantage of high-performance network interfaces such as the asynchronous transmission mode (ATM), which currently provides the lowest latency and highest bandwidths and, thereby, the maximum message-passing performance.
ORNL and Others Recognized for Metacomputing Approach
At the Supercomputing 96 conference held in November 1996 in Pittsburgh, Pennsylvania, a team of researchers from ORNL, Sandia National Laboratories, and the Pittsburgh Supercomputing Center (PSC) received a gold medal for performing a complex pacesetting calculation on a high-performance, multisite, multicomputer system linked by high-speed networks. Their metacomputing approach connected ORNLs Paragon supercomputers with Sandias Paragon and Pittsburghs Cray T3D supercomputers using high-speed networks provided by ESnet and vBNS.
The first code being implemented is a first-principles materials science electronic structure method developed at ORNL. In this project the code is being used to model the magnetic properties of metallic alloys. The algorithm decomposes the atoms of the material of interest across the nodes of a massively parallel computer by assigning each atom in the structure to an individual processor of the parallel machine. In this way, very complex models consisting of large numbers of atoms can be simulated.
The purpose of this simulation was to investigate the underlying magnetic interactions that give rise to the complex magnetic structure observed in neutron-scattering experiments performed on disordered copper-nickel alloys. These neutron-scattering data, obtained by ORNL scientists at the High Flux Isotope Reactor, have existed for over two decades without a satisfactory intepretation. Previous simulations performed on the ORNL XP/S-150 Paragon that have involved a few hundred atoms had accurately described the structure in the neutron-scattering experiments, attributing it to relatively short-range interactions between magnetic atoms.
The purpose of the computer experiment demonstrated at Pittsburgh was to use the power of the metacomputer to simulate the thousands of atoms required to study the nature of the long-range interactions, thereby providing a more complete understanding of the neutron-scattering data. The three-lab team used the high-speed network for a 1000+ processor (atom) run between Paragons at Oak Ridge and Sandia. The code was also used to test the connection between ORNLs machines and PSCs Cray. The results demonstrate that tying geographically distributed heterogeneous supercomputers together for a single application is a viable way to produce otherwise unobtainable scientific results. This distributed approach is expected to be immensely beneficial to the national research community.
The three laboratories metacomputing approach—which will ultimately use all of ORNLs Paragons, most of Sandias Paragon, and all of PSCs Cray, for a total of nearly 3000 processors—received the gold medal award in the concurrency category of the High-Performance Computing Challenge competition at the Supercomputing 96 conference. The award, which was the only gold medal given in this competition, recognized the recipients for demonstrating excellence in large-scale heterogeneous supercomputing. ORNL researchers from the Center for Computational Sciences and the Metals and Ceramics, Computer Science and Mathematics, and Computing, Information, and Networking divisions participated in the effort.
ORNL computer scientists, led by Al Geist, are also developing a parallel application development system called CUMULVS (see Kohls article High-Performance Computing: Innovative Assistant to Science). CUMULVS supports fault tolerance, interactive visualization, and computational steering so that parameters of a simulation can be viewed and changed by participants during the simulation to steer it toward a desired outcome. Because it is based on PVM, it is heterogeneous and supports the use of high-speed network interfaces. CUMULVS supports fault tolerance through user-directed checkpointing (saving) of data and heterogeneous task migration. That is, the user can specify the essential data that must be saved to restart the application. CUMULVS writes these data out on each machine so that all machines have a coherent view of the data. CUMULVS can restart the application either on the remaining computational resources or by dynamically adding new computational resources via PVM and using the checkpointed data as the input. The interactive visualization feature supports multiple viewers connected to a running application. In this way, scientists collaborating on the same problem can view the data at the same time on their own machines at their own respective institutions.
We are currently incorporating CUMULVS into the LSMS code because performing some of our scientific investigations requires a large number of computer resources obtainable only by linking together multiple platforms. The possibility of a machine failure increases with the number of machines networked together. Fault tolerance assures us that a failure within the environment is recoverable, allowing our simulation to continue. In addition, the ability to view a simulation provides us with a powerful tool to monitor it; if something goes wrong, we can stop the job, minimizing the loss of computer cycles.
Modeling Importance to Magnetic Alloys
To demonstrate the wide applicability of the LSMS method, we have applied it to the investigation of two magnetism problems affecting materials that contain both chemical and magnetic disorder. Chemical disorder is substitutional disorder in which an atomic site in the crystal is randomly occupied by atoms making up the system. Because we are assuming complete random disorder, the probability of occupation of an atomic site by an atom of a particular element is equal to that elements concentration in the alloy. Consider a copper-nickel alloy, for example. For collinear magnetic disorder (where the moments point either parallel or antiparallel to the spin axis of quantization) the size (magnitude) of the moments are randomly distributed on the magnetic atomic sites (i.e., nickel sites are magnetic whereas copper sites are not). For noncollinear magnetism, both the size (magnitude) and the directions are randomly distributed on the magnetic atomic sites. A major advantage in using the LSMS method is that it naturally accounts for any local fluctuations. Specifically, LSMS naturally incorporates the effects of short-range order on the energetics as a result of either chemical or magnetic disorder. An example of short-range correlation is short-range order of a particular atomic site with its surrounding neighbors. If a random disordered alloy is truly random, then there is no correlation between a site and its neighbors; in the case of the nickel-copper alloy, the nickel site is not affected by whether its neighboring atoms are nickel or copper. In many alloys, short-range order is extremely important and must be accounted for in the model if one hopes to calculate a materials properties accurately. LSMS incorporates short-range order in the construction of the system by using short-range order parameters, for example, to construct a crystal lattice with atomic sites occupied by atoms with appropriate neighboring atoms. Traditional band structure methods either cannot or can only approximately treat these types of effects.
Magnetic materials represent a multibillion-dollar industry. A fundamental understanding of magnetism in alloys has the potential to influence the design of magnetic materials for applications ranging from power generation to data storage. Magnetism has a profound effect on many alloy properties such as phase stability, thermal expansion, and electrical conductivity.
Magnetism is a consequence of electron spin. In metals the same electrons that give rise to cohesion (the energy that holds the crystal together) can, if they are reasonably well localized about atomic sites (e.g., d-electrons in cobalt, nickel, and iron), also give rise to magnetism. Magnetism occurs when it is energetically favorable on the atomic sites to have an excess of electrons of one spin; this spin imbalance gives rise to magnetic moments associated with individual atoms. In a ferromagnet the local magnetic moments point in the same direction (collinear parallel), resulting in a macroscopic magnetic moment (there are more electrons with spins that point parallel to the spin axis of quantization than electrons whose spins point antiparallel); in an antiferromagnet, an equal number of moments point up and down (collinear antiparallel) in an ordered arrangement, resulting in no net macroscopic magnetic moment.
In our first investigation, we have been studying the nature and effect of magnetic inhomogeneities in nickel-copper (Ni-Cu) alloys. This investigation uses the standard approach of assuming that the electron spin points either parallel or antiparallel to the spin axis of quantization (we assume that the z-axis is the spin axis of quantization in spin space). The second investigation involves the use of a new theory concerning noncollinear magnetism. In noncollinear magnetism the electron spin in the global frame of reference (the laboratory frame) is not restricted to any direction, but in its local frame of reference (on the atom), it points along the traditional spin axis of quantization (z-axis). As will be seen, this extra degree of freedom allows for a very rich magnetic structure that has not previously been simulated using an ab initio approach.
Our second investigation is applied to the disordered nickel-iron (Fe) Invar alloy, Ni.35Fe.65. Invar alloys are interesting for both scientific and technological reasons. In 1920 the Nobel Prize in physics was awarded to the Swiss-born French scientist Charles E. Guillaume for discovering this unique magnetic alloy. Invar alloys exhibit a negligible coefficient of thermal expansion (hence INVAR-iable), called the Invar effect. These alloys are used in many industries that need a material that does not expand in a particular temperature range. Such a material is needed for shadow masks for televisions and computer monitors, the surrounding tubing for fiberoptic cables, and high-precision laboratory equipment.
The purpose of our investigation of the Ni.80Cu.20 alloy is to uncover the nature of the magnetic correlations found in neutron-scattering experiments performed more than two decades ago by Joe Cable and colleagues at ORNLs High Flux Isotope Reactor. The onset of magnetism and the nature of the ferromagnetic state when nickel is alloyed with nonmagnetic copper to form a weakly ferromagnetic disordered Ni-Cu alloy have been matters of scientific interest. The results of the LSMS calculation for a ferromagnetic disordered Ni.80Cu.20 alloy show that the local magnetic moments associated with individual sites are inhomogeneously distributed.
Fig. 6. A schematic of the 256-atom Ni-Cu unit cell. Fig. 7. Comparison of the experimental neutron-scattering cross sections with those obtained from the LSMS method. Fig. 8. A schematic of the 256-atom Ni-Fe spin canting unit cell.
In the LSMS calculation the random alloy is modeled by randomly occupying the sites of the 256-atom unit cell by atoms of nickel and copper (shown as large blue spheres and small red spheres, respectively, in Fig. 6). In the illustration, the arrows emanating from the nickel sites represent the magnitudes of the calculated local magnetic moments. (The magnetic moments associated with copper sites, which are antiparallel to the nickel moments, are too small to be seen.) The magnitudes of the nickel moments are encoded both in the length and in the color of the arrows. The local nickel-site magnetic moment varies from a minimum of approximately 0.29 Bohr magnetons (blue arrows) to a maximum of approximately 0.6 Bohr magnetons (red arrows). Interestingly, the magnetic moment on a nickel site correlates with the total magnetic moment on the nearest-neighbor shell of atoms surrounding it: large red arrows tend to be surrounded by other reddish arrows, while small blue arrows are surrounded by either copper sites having no moment or other blue arrows. So far, our calculations show excellent agreement with the measured neutron-scattering cross sections and also provide an atom-by-atom picture of magnetism (see Fig. 7). From this it can be seen that the magnetic moments are higher for nickel sites whose neighbors are nickel atoms having strong magnetic moments than for nickel sites surrounded by either nickel atoms having weak magnetic moments or by virtually nonmagnetic copper atoms. The results of these simulations are being used to reinterpret results of neutron-scattering measurements of magnetic correlations in these alloys.
This past year, 1996, marked the hundredth anniversary of Guillaumes discovery of Invar. Despite the venerability of the subject, the mechanisms responsible for the Invar effect, though thought to be related to Invars complex magnetic behavior, are still deeply mysterious. Our preliminary investigations of the ground state (T = 0 K) magnetic structure of large-cell (256-atom) models of disordered Ni.35Fe.65 alloys (the classic Invar system) are pointing to unusual magnetic behavior involving noncollinear arrangements of the local magnetic moments associated with iron-rich clusters within the otherwise disordered alloy (see Fig. 8). In particular, there is a nickel atom whose nearest neighbors are all nickel atoms and whose local moment, peculiarly enough, is exactly antiparallel to its surrounding nickel moments (the local magnetic structure is antiferromagnetic). If these magnetic structures remain stable, they will have important implications for our understanding of magnetism in this material, implications that could be investigated experimentally using neutron diffraction.
Our interdisciplinary team made several other important contributions to computational modeling of materials, in addition to the major ones covered here. The strong interaction among various ORNL scientists is a key to the success of this large-scale scientific project which can play an important role in sustaining our economic competitiveness.
BILL SHELTON is a research staff member in the Mathematical Sciences Section of ORNLs Computer Science and Mathematics Division. He holds a Ph.D. degree in theoretical condensed matter physics from the University of Cincinnati. He also held a National Research Council (NRC) fellowship at the Naval Research Laboratory, working with Dr. Warren Pickett. In 1992 he joined the ORNL staff. His research interests include high-performance heterogeneous scientific computing, the development of scalable first-principles electronic structure algorithms, and the development of applied mathematical techniques and numerical algorithms to scientific problems, software tools, and parallel algorithms. He has published over 60 papers in materials science, high-performance computing, and parallel processing. He has also won several high-performance computing awards, including the Gordon Bell Prize, the Cray Research Award, and several High-Performance Computing Challenge Awards.
MALCOLM STOCKS is a Lockheed Martin Corporate Fellow in the Theory Group of ORNLs Metals and Ceramics Division. He holds a Ph.D. degree in condensed matter physics from Sheffield University. Prior to joining ORNL in 1976, he was a lecturer at the University of Bristol, U.K., from 1972 to 1976. His research interests include materials science, high-performance heterogeneous scientific computing, and the development of scalable first-principles electronic structure algorithms and parallel algorithms. He has published over 200 papers in materials science, high-performance computing, and parallel processing. He has also won several high-performance computing awards, including the Gordon Bell Prize, the Cray Research Award, and a High-Performance Computing Challenge Award.
Next article | Contents | Search | Mail | Review Home Page | ORNL Home Page