Scientific Computing in Java (Part 2): Writing Scientific Programs in Java
by Ken Ritley
We live in a technological world, at the heart of which are scientists and engineers. They make the next important discoveries and bring the next generation of technology to market.
Part 1 of this article discussed how scientists can benefit from Java. We said that, despite a few pitfalls which scientists should watch out for, the future of Java as a scientific programming language looks bright. Here in Part 2 we examine the structure of a scientific program more closely. We'll define a few scientific OOP design patterns in Java, and we'll give you a short style guide that can help scientists write good Java programs (please see the sidebar "A Style Guide for Scientific Programs in Java").
A Style Guide for Scientific Programs in Java

The Golden Rule of Programming states that computer programs should be understandable by the people who write and maintain them. But what's good programming practice for business is not necessarily what's good for science. Here are some ideas scientists can use for making Java programs more readable and easier to maintain.

Translating Formulas to Source Code

1. Use short variable names like e, m and c, to mirror the variables which appear in the original equations. Clearly e = m * c * c; is much easier to compare with the original equation and debug than energy = mass * speedOfLight * speedOfLight;.

2. Hungarian notation in Java can be helpful. It's not necessary to precede every variable with a letter which indicates its type: d = double, i = integer, etc. But because Java convention is to start variable names with lowercase letters, sometimes Hungarian notation can be more aesthetically pleasing, especially for loop control variables; for example iNumberOfElectrons vs. numberOfElectrons. It also helps make clear when a seemingly integer-like variable has been converted to a different type, as in double dNumberOfElectrons = (double) iNumberOfElectrons;.

3. Use descriptive variable names for controlling loops and all other variables. Remember, when looking through source code, loops are what the eye sees first. It's important that the physical meaning of the loop be clear. For example,

    // Eq. (5) Loop over all electrons
    for (i = 0; i < iNumberOfElectrons; i++)
        E[i] = m[i] * c * c;

4. When coding an equation which appears in a journal article, always cite both the article and the equation number. For example, at the beginning of the method you may type

    // Einstein, Ann. Phys. 17, 891-921 (1905)

Later on in the code, it only takes a few extra seconds to type

    // Eq. (5)

Frequently overlooked, this simple rule will save enormous time and grief, both for you and for someone trying to understand your program.

5. Break up complicated equations to enhance their readability, but make your intermediate, temporary variable names clear. The variable names numer and denom make it instantly clear that you've broken the formula up into a temporary numerator and denominator:

    // Eq. (5) Loop over all electrons
    numer = 1.0 - v * v / c / c;
    denom = 1.0 - v / c;
    vp = Math.sqrt(numer / denom);

Java-Specific Advice

1. It's easier to share numerical methods than numerical classes or packages. Especially if you are new to Java, it may be fun to implement a Java-intensive, object-oriented solution to a purely numerical problem - something involving a package with inherited classes and Java-specific features such as Hashtables. For truly massive projects or scientific libraries, this may be an ideal solution. Your source code may be elegant and efficient. But for future scientists who wish to borrow only small pieces of your numerical ideas, it may be a nightmare to dissect object-oriented numerical packages. And what's the sense of importing a massive library when only a single method may be needed? Try to write portable, static methods first, then classes if necessary, and finally packages if absolutely necessary after that.

2. If dedicated classes store global variables, preface their names with Global. If dedicated classes are used as libraries of easily-shared methods, preface their names with Utils. It's a clever idea because it not only reminds you of what's in the class - it also ensures that the HTML files created by javadoc will all be grouped together. Some examples might include GlobalEnergyVariables, UtilsFile, GlobalAntimatter or UtilsImageProcessing.

3. When necessary, port Fortran source code rather than C source code to Java. Java's syntax most nearly resembles C, but Java does not support pointers - and finding scientific subroutines in C without pointers is nearly impossible.

4. Interfaces are great for numeric constants.
With interfaces you'll have to type in each of your constants (like pi, pi / 2, pi * pi), but since interfaces are easy to recycle, you'll only need to type them once!

5. Don't forget System.out.println and the console. It's tempting to invoke the Java Virtual Machine with javaw, but do not forget that the standard console is a handy place to see error messages and liberally-sprinkled System.out.println comments. Since Java protects itself so well against errors which would be fatal in Fortran or C, it's important to see possible error messages when they are generated, or else you might not realize they are there! If you want to get fancy, there are even good ways to implement your own custom console.

- KR

Don't OOP? Don't Worry!

Since the real world is composed of objects related in complex ways, object-oriented strategies are often the best solution to programming problems. One reason for Java's success is that it simplifies OOP development. However, many scientific problems are not amenable to full-blown object-oriented treatments. The problem may be too simple, such as evaluating an equation. Or sometimes the relationship between objects is just too complex. For example, a very simple equation describes how the moon orbits the earth, but add a third object to the problem (such as the sun or a comet or an asteroid) and the resulting equations can be so intertwined and complex that entirely different calculational techniques become necessary. Nevertheless, OOP principles can be valuable for scientific programming, and increasing numbers of scientific programmers are now learning to think in OOP terms. For the scientist new to Java, the first lesson to learn is this: classes are where it's at.

Classes: The Heart of Scientific Programs

Coming from a traditional Fortran or C background, it's easy to look at methods in Java and assume they are like functions or subroutines. They're not. For scientific programming, Java methods are weaker
than their Fortran or C counterparts. A scientist expects to invoke a subroutine with a long argument list containing lots of variables, then to have those variables updated and changed upon return. In Java, variables are passed into methods by value, so at most one of them can be changed, via the return value of a function: x = myMethod(a, b, c, x);. Of course, groups of variables can be stored in arrays, and methods can change the contents of arrays (the method receives a reference to the same array). But cheating with variable-arrays is rarely an elegant approach.

This is not a shortcoming of Java. In fact, it's an improvement over Fortran and C! The Java solution is elegant: construct a class and feed it the necessary variables a, b, and c, then provide either public variables or else public methods which return x, like this:

    MyCalc mc = new MyCalc(a, b, c);
    x = mc.x;
    // or, alternatively
    x = mc.getX();

Java ends endless parameter lists, in which it is never clear which variables stay fixed and which are changed! And Java prohibits pointers - no comment necessary. And as this example shows, the Java code for such classes is clean and concise, and above all the programmer's intent is crystal clear (see "Converting Scientific Subroutines to Scientific Classes").

Converting Scientific Subroutines to Scientific Classes

Subroutines are the building blocks of scientific programs in Fortran and C. They're portable and easy to re-use. But they're not without problems, because it's frequently unclear which of the variables stay fixed and which are changed. And as the needs of the programmer change, and as more features become necessary, subroutines quickly become unreadable spaghetti-style code. A typical Fortran subroutine might look like this:

          SUBROUTINE root (a, b, c, root1, root2)
    c     Which input parameters are changed?
          REAL*8 a, b, c, root1, root2, disc
          disc = b * b - 4. * a * c
          root1 = (-b + dsqrt(disc)) / 2. / a
          root2 = (-b - dsqrt(disc)) / 2. / a
          RETURN
          END

Java classes provide a better way. The input and output parameters are easy to see. Variables can be protected, to prevent inadvertently changing their values. And best of all, a finished class is a file in its own right, complete with useful HTML comments (courtesy of javadoc) - ready to be shared with colleagues and recycled in many different programs! It's easy to convert this Fortran subroutine to a Java class:

    /** Solve a quadratic equation, a * x^2 + b * x + c = 0 */
    public class QuadraticEquationSolver {
        private double root1, root2;

        /** Initialize the parameters */
        public void setup(double a, double b, double c) {
            double disc = b * b - 4. * a * c;
            root1 = (-b + Math.sqrt(disc)) / 2. / a;
            root2 = (-b - Math.sqrt(disc)) / 2. / a;
        }

        /** Returns the "+" root */
        public double getPosRoot() { return root1; }

        /** Returns the "-" root */
        public double getNegRoot() { return root2; }
    }

A few extra lines of code may be necessary, but the programmer's intent is crystal clear, and useful HTML documentation is automatic. The use of the setup method ensures the class can be reused many times (for example, within a loop) without multiple instantiation. And new methods can be added (such as for mimicking Fortran functions) without changing the code which performs the calculation. Further, the code can be easily modified, perhaps adding an isThereASolution method - or possibly, by following the convention of some Java library methods (String.indexOf, for example) of returning a -1 when the desired operation could not be performed.

- KR

Handling Global Variables in Java

A scientific program is about numbers, not (necessarily) about interrelationships between inherited classes or objects. So numerical variables - sometimes many dozens of them -
must be easy to group and share between sections of the program. Fortran originally provided a pre-object-oriented tool for this task, the named common statement: global variables were easy to share, but it required significant programmer overhead to ensure appropriate typing and dimensioning. Java provides better ways. The easiest way is simply to create a separate class for each collection of global variables to be grouped, and to declare these variables as static members of the class. The static keyword ensures that no matter how many instances of the class exist, the variables will all share the same address space - which means there is only one copy of each variable in the class. A more clever approach is known as the Singleton pattern, a technique which ensures that one - and only one - instance of a class can be created. Each desired collection of variables can be declared in its own Singleton class. Here is an example of these strategies (please see "Interfaces: Where Scientific Bakers Bake their Pi").

Interfaces: Where Scientific Bakers Bake their Pi

One advantage of global variables stored in classes is that, to use these variables, their names must contain the class instance in which they appear (gt.x vs. x) - perhaps awkward at first but ultimately a much-needed bookkeeping mechanism. But for scientific constants which never change, such as pi (3.141...) or e (2.718...), it is useful to define them in an interface which can be implemented by classes which need them. In fact, for scientists this is usually not enough: they want to store all variations (such as pi / 2, 2 * pi, pi * pi, etc.) which might show up in formulas. The advantage of interfaces is that such constants need to be typed in one time only - thereafter, it's easy to recycle the interface in many different programs!

Scientists Like to Think Global

Java offers advantages over traditional Fortran 77 for storing and managing global variables in scientific programs. By defining, dimensioning
and storing variables in a class, the need for repetitive declarations in each subroutine is eliminated! Here's an example program, GlobalVariableTest.java, which demonstrates both static and Singleton-style global variables:

    public class GlobalVariableTest {
        public static void main(String args[]) {
            GlobalVariableTest.init();
            GlobalVariableTest.calc();
        }

        public static void init() {
            GlobalSingletonTest gt = GlobalSingletonTest.getInstance();
            gt.x = 1.0;
            GlobalStaticTest gst = new GlobalStaticTest();
            gst.y = 2.0;
            System.out.println("x, y: " + gt.x + ", " + gst.y);
        }

        public static void calc() {
            GlobalSingletonTest gt = GlobalSingletonTest.getInstance();
            GlobalStaticTest gst = new GlobalStaticTest();
            System.out.println("x, y: " + gt.x + ", " + gst.y);
        }
    }

Here's the accompanying file, GlobalSingletonTest.java, defining global variables using the Singleton class:

    /** Demonstrates Singleton-style global variables */
    public class GlobalSingletonTest {
        /** Manages the Singleton class */
        private GlobalSingletonTest() {}
        static private GlobalSingletonTest _instance;
        static public GlobalSingletonTest getInstance() {
            if (_instance == null)
                _instance = new GlobalSingletonTest();
            return _instance;
        }

        // List of global variables
        public double x;
    }

And here's the accompanying file, GlobalStaticTest.java, defining the global variables using static members:

    /** Global variables via static members */
    public class GlobalStaticTest {
        // List of global variables
        static public double y;
    }

Better for Scientific Programs: Public Access not Accessor Methods

In these examples, the programmer handles global variables directly (e.g., gt.x = 1.0;) rather than by using methods (e.g., gt.setX(1.0)). This latter approach is not always optimal in a scientific program which contains dozens of variables which appear in complicated equations! But scientists take note: methods are a useful programming tool to make it obvious when setting or changing important control variables.

- KR
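Returning briefly to the numeric constants discussed in the sidebar "Interfaces: Where Scientific Bakers Bake their Pi", the idea can be sketched roughly as follows. This is only a minimal illustration: the names ScientificConstants and CircleDemo are hypothetical and do not come from the article's own listings.

```java
/**
 * Hypothetical interface holding numeric constants.
 * Fields declared in an interface are implicitly public, static and final.
 */
interface ScientificConstants {
    double PI = Math.PI;
    double HALF_PI = PI / 2.0;
    double TWO_PI = 2.0 * PI;
    double PI_SQUARED = PI * PI;
}

/** Hypothetical class that implements the interface and can use the names unqualified. */
class CircleDemo implements ScientificConstants {
    /** Circumference of a circle of radius r, written with the shared constant. */
    static double circumference(double r) {
        return TWO_PI * r;
    }

    public static void main(String[] args) {
        System.out.println("Circumference for r = 1: " + circumference(1.0));
    }
}
```

A design note: some Java style guides discourage implementing an interface only to inherit its constants, and suggest a final class of static constants (optionally with a static import) instead. Either way, the constants are typed in once and recycled.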
A useful convention is to preface the name of such global variable classes with the word Global, as in GlobalEnergy or GlobalMaterial. This not only describes the class, but also ensures that all these classes will be grouped together in any javadoc HTML files which are produced.

Design Patterns for Scientific Programs

Modern programming languages are like children's Lego-type block toys, comprising myriads of small components which can build structures of great complexity. There are a few basic substructures used frequently, such as boxes to build houses or chassis to build cars. By clearly defining these structures, known as design patterns, and by recycling them in programs, the programmer can quickly assemble elegant programs of great complexity. The Singleton pattern for global variables is one such design pattern. Here we present three design patterns which are useful for scientific programming in Java.

The DataTransceiver Pattern

As the sidebar on antimatter illustrates, some number-crunching programs take hours or even days to run. What happens when, ten minutes before such a program is finished executing, the system crashes or the computer is accidentally switched off? Such problems are not just frustrating, they are also expensive. The scientist's time costs money, and on supercomputers each CPU second may be recorded and billed.

When Matter and Antimatter Collide

Any Star Trek fan knows what happens when matter and antimatter collide: annihilation! But fortunately for us, the annihilation is not total - the only things to annihilate are the minuscule bits of matter and antimatter which have collided. The equally-minuscule light waves emitted during annihilation can be measured easily, and they make a sensitive fingerprint of the matter that's been annihilated. Medical diagnostic techniques called PET scans are based on this process, and they help save countless lives every year. Prof. Kelvin Lynn and Dr.
Marc Weber are two physicists at Washington State University who work at the forefront of antimatter annihilation technology. They build complicated machines, such as the one shown above, which use precisely-controlled beams of antimatter (called positrons) to study new materials and new electronic devices. But to interpret their results, they rely on the results of computer simulations. These so-called Monte Carlo computer programs use probability theory to simulate the complicated scattering processes which occur just before annihilation, as the antimatter beam enters the material and is randomly scattered by the atoms. These calculations predict what happens by simulating and averaging thousands and thousands of discrete scattering events. Even on the fastest computers, the calculation may take hours or even days, depending on the level of accuracy that the researchers need. Some of Dr. Lynn's programs use the DataTransceiver design pattern (see separate sidebar below). As the program runs, the results of the simulation are periodically written to ASCII files. This strategy allows the programs to be stopped at any time to look at the intermediate results, and to be restarted again when more accuracy is needed. Computer time is not free, especially on high-speed supercomputers, so this strategy also helps save money and protect the results in case of a computer crash.

- KR

The DataTransceiver pattern is a solution to this problem. Like a radio transceiver, which both transmits and receives, the DataTransceiver pattern regularly reads and writes all the essential information for a calculation to data files. It allows the calculation to be interrupted at any time, either accidentally (such as a system crash), or intentionally (such as to check the intermediate results) - and then to be restarted again.
And by giving a little thought to how the data files are structured, interspersing them liberally with variable descriptions and user comments, they make a useful archival record which stores the calculation results together with the initial parameters used to generate them.

The DataTransceiver Design Pattern

The Problem
A numerical calculation without user interaction requires lengthy execution time, perhaps hours or even days. If the computer system crashes, considerable time and resources will be lost. The user may periodically wish to interrupt the calculation, to verify the intermediate results.

The Solution
At periodic times the calculation is halted and such intermediate results as are required to restart the calculation are written to data files.

Implementation Details
A single class can be used to perform both the data input and the data output operations.

Advantages
By use of a single class for input and output, the programmer is automatically reminded that any new variables introduced into the calculation during program development must be both initialized and saved.
By appropriately labelling the output variables in the data file, and by providing space in the data file for user-written commentary, the data file makes a useful archival record which can store the results of the calculation together with the parameters used to generate them.
By creating a sequence of temporary data files, rather than a single data file which is continually overwritten, the programmer obtains a record of results during intermediate stages in the calculation.

Disadvantages
Depending on the calculation (e.g., one involving dozens of large, multidimensional arrays), intermediate data files may be large and unwieldy.

Responsibilities for the Programmer
The programmer must analyze a calculation for appropriate breakpoints for writing data files.
The programmer must implement a calculation in such a way that it can be restarted using parameters read from a data file.
The programmer must ensure the DataTransceiver class has access to the appropriate variables. This may require the use of global variable techniques such as Singleton classes.

Example
The DataTransceiver design pattern is implemented in many scientific calculations, although scientists may not call it by this name!
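As one illustration of the pattern, here is a minimal sketch of a restartable calculation. It is not taken from any of the programs mentioned in this article: the class names SumTransceiver and RestartableSum, the file name, the file layout, and the toy calculation (a partial sum of 1/k^2) are all assumptions chosen to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical transceiver: the same class both writes and reads the calculation state. */
class SumTransceiver {
    long iterationsDone = 0;   // last completed iteration
    double runningSum = 0.0;   // partial result so far

    /** Writes the state as labelled ASCII lines; lines starting with '#' are commentary. */
    void save(Path file) {
        try {
            Files.write(file, Arrays.asList(
                "# partial sum of 1/k^2, written by SumTransceiver",
                "iterationsDone=" + iterationsDone,
                "runningSum=" + runningSum));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the state back if a data file exists; otherwise keeps the initial values. */
    void load(Path file) {
        if (!Files.exists(file)) return;
        try {
            for (String line : Files.readAllLines(file)) {
                if (line.startsWith("#")) continue;
                String[] kv = line.split("=", 2);
                if (kv.length < 2) continue;
                if (kv[0].equals("iterationsDone")) iterationsDone = Long.parseLong(kv[1]);
                if (kv[0].equals("runningSum")) runningSum = Double.parseDouble(kv[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical restartable calculation that uses the transceiver. */
class RestartableSum {
    /** Adds terms 1/k^2 up to k = total, saving the state every saveEvery iterations. */
    static double run(Path file, long total, long saveEvery) {
        SumTransceiver state = new SumTransceiver();
        state.load(file);                        // resume from a previous run, if any
        for (long k = state.iterationsDone + 1; k <= total; k++) {
            state.runningSum += 1.0 / ((double) k * (double) k);
            state.iterationsDone = k;
            if (k % saveEvery == 0) state.save(file);
        }
        state.save(file);                        // record the final state as well
        return state.runningSum;
    }

    public static void main(String[] args) {
        Path file = Paths.get("sum-checkpoint.txt");
        System.out.println("Sum after 1000000 terms: " + run(file, 1_000_000, 100_000));
    }
}
```

Because one class owns both save and load, adding a new state variable means touching both methods in the same file, which is the reminder the sidebar describes. Running the program again with a larger total picks up from the saved iteration rather than starting over.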
A good reference to some publications about programs which use this pattern is the journal article in the Journal of Applied Physics, volume 74, pages 3479-3496 (1993). The scientific programmer will often not know ahead of time how many iterations are required to obtain an accurate answer. The DataTransceiver pattern lets the scientist start and stop the calculation as often as necessary to obtain the desired accuracy.

- KR

The ScientificLibrary Pattern

We've said in Part 1 that numerical methods are the heart and soul of scientific programs, and that many scientific programs are nothing but well-tested legacy methods patched together in new ways. So scientists need good schemes for archiving and maintaining these methods. The ScientificLibrary pattern provides a good technique. A scientific library is a class which contains numerous independent, static public methods and internal classes, with no global variables. Every scientific programmer has a favorite collection of tools which he/she reuses often, such as for solving an equation, calculating a histogram, writing data to an ASCII file, etc. A scientific library is the ideal repository for these methods. Declaring the methods as static is the ultimate in programmer-friendly strategy: the methods can be used without needing to instantiate the class!

The ScientificLibrary Design Pattern

The Problem
A scientist needs to have easy access to a collection of many small, functionally independent numerical methods. These may be favorite methods the scientist uses often, such as for solving an equation, making histograms or writing columns of data to ASCII files.

The Solution
A class is used as a repository for a "library" of recyclable, independent methods.

Implementation Details
The methods are declared as static whenever possible, so they can be used without the need for class instantiation.

Advantages
Speeds program development by keeping often-used methods at fingertip distance.
Results in cleaner source code, especially for programs which may use dozens of "library style" numerical methods.
Easy to modify and maintain - new methods are simply appended to the class.
HTML documentation (javadoc) is clean and adequately describes method usage.
Because Java allows defining multiple methods with different parameter lists, the library can easily contain several variants of the same method.

Disadvantages
Not well-suited for interdependent methods.

Responsibilities for the Programmer
Ensure the library methods are functionally independent.
Try to use static methods whenever possible.

Example
There are several useful ScientificLibrary classes in DataScan, a Java-based software application for data analysis in x-ray diffraction and microscopy experiments.

- KR

As with global variable classes described above, a useful convention is to name the scientific library classes with the word Utils, as in UtilsMath or UtilsFile. This not only describes the class, but also ensures that all these classes will be grouped together in any javadoc HTML files which are produced.

The NoOOP Pattern

For simple scientific programs, such as a calculation requiring little user interaction, the NoOOP pattern (pronounced nope) can be useful. It's a way to implement a Fortran-like procedural program. It may contain methods to initialize the starting data, methods to perform the calculation, and methods to output the results. This pattern is not the optimal use of Java's powerful OOP resources, as the name cleverly suggests. But traditional (non-scientific) software developers should take note:
a scientific programmer may have years of successful software design experience, though none of it object-oriented! The NoOOP pattern provides a perfect place for a scientist to begin with Java.

The NoOOP Design Pattern

The Problem
A scientist - possibly a scientist new to Java and without experience with classes and inheritance and other OOP techniques - needs to quickly write a short program, perhaps to evaluate an equation.

The Solution
A Fortran- or C-style procedural program can be easily developed using Java.

Implementation Details
A single class can contain all variables and methods to perform all required tasks.

Advantages
Extremely quick program development for simple, procedural programs.
Only superficial, not detailed (esp. OOP) knowledge of Java is required.

Disadvantages
Not well-suited for longer programs with many methods or global variables.
Difficult to modify and maintain.

Responsibilities for the Programmer
Must be prepared to completely redesign the program when the calculation outgrows the procedural framework.

Example
This example shows a simple procedural program to find when a user-specified function is equal to zero. For this example, the function is the sine of x (sin(x)), and useful "test" parameters are 0.1, 5, and 1e-7. If the program runs correctly, it will report that the function is zero for the value 3.1415... Some Java features this program demonstrates:

Since the main method is static, it is especially useful to invoke or encapsulate the procedural program within a different method, to simplify how the methods are invoked.

How to perform basic input and output from the console.
    public class Procedural {
        // This function just starts the program
        public static void main(String[] args) {
            Procedural myProc = new Procedural();
        }

        // The main procedural program goes here
        public Procedural() {
            double d, dStart, dStop, dStepSize;
            dStart = getDoubleInput("Enter start point: ");
            dStop = getDoubleInput("Enter end point: ");
            dStepSize = getDoubleInput("Enter step size: ");
            // Step through the interval until the function changes sign
            for (d = dStart; d <= dStop; d += dStepSize) {
                if (myFunction(d) * myFunction(d + dStepSize) < 0) break;
            }
            if (d > dStop) System.out.println("No zero found.");
            else System.out.println("The function is zero when x = " + d);
        }

        // Example of how to read double numbers from the keyboard
        // (DataInputStream.readLine() is deprecated in current Java but still available)
        double getDoubleInput(String sMessage) {
            double d = 0.0;
            System.out.print(sMessage);
            try {
                d = Double.valueOf(new java.io.DataInputStream(System.in).readLine()).doubleValue();
            } catch (java.io.IOException e) {
                // On a read error, fall through and return 0.0
            }
            return d;
        }

        // Example of a user-defined function
        double myFunction(double x) {
            return Math.sin(x);
        }
    }

- KR
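One way to act on the NoOOP sidebar's advice about outgrowing the procedural framework is to move the scanning loop into a static method of a library class, in the ScientificLibrary style. The following is a minimal sketch under that assumption; the class name UtilsRoot, the method findSignChange, and the default of 1000 steps in the second variant are hypothetical choices, not part of the original program.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical ScientificLibrary-style class: static methods only, no stored state. */
final class UtilsRoot {
    private UtilsRoot() {}   // never instantiated

    /**
     * Scans [start, stop] in increments of step and returns the left edge of the
     * first step-wide interval over which f changes sign, or NaN if there is none.
     */
    static double findSignChange(DoubleUnaryOperator f, double start, double stop, double step) {
        if (!(step > 0.0)) throw new IllegalArgumentException("step must be positive");
        for (long i = 0; ; i++) {
            double x = start + i * step;        // recomputed from i to limit rounding drift
            if (x + step > stop) return Double.NaN;
            if (f.applyAsDouble(x) * f.applyAsDouble(x + step) < 0.0) return x;
        }
    }

    /** Variant with a different parameter list: divides the interval into 1000 steps. */
    static double findSignChange(DoubleUnaryOperator f, double start, double stop) {
        return findSignChange(f, start, stop, (stop - start) / 1000.0);
    }
}

/** Small driver showing the call without any console input. */
class UtilsRootDemo {
    public static void main(String[] args) {
        double x = UtilsRoot.findSignChange(Math::sin, 0.1, 5.0, 1e-3);
        System.out.println("sin(x) changes sign just after x = " + x);
    }
}
```

The method returns NaN rather than printing a message, so the caller decides how to report a missing zero; the two variants illustrate how one library class can offer the same operation with different parameter lists.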