J2EE Journal: Article

Distributed Parallel Computing with Web Services

A pivotal role on the back end

Web services technology has become the ubiquitous connectivity fabric amongst diverse business domains and technical camps. At the same time, distributed parallel computing is becoming the de facto architecture for managing the performance of computationally intensive, long-running programs.

So, is it counterintuitive to consider Web services when pursuing performance improvement of compute-intense, long-running applications? It may seem that way, but in fact Web services play a critical role not in one but in two areas of High Performance Computing (HPC) and distributed parallel computing:

  • Communications/deployment
  • Classifications/discovery services of resources
In other words, Web services play a role in the application adaptation layer and the infrastructure layer, respectively. Sensibly enough, Web services deliver again on the promise of semantic and syntactic universal collaboration and wide acceptance in yet another coming-of-age technology.

This article looks at how the Web services scenario is unfolding in the distributed parallel computing space. First it discusses the developing infrastructure standards, followed by some definitions, and then delves into the real grid opportunity: improving application response time while balancing workload. I'll conclude with a recent project implementation.

Grid Provides the Infrastructure for Parallel Distributed Computing
The Globus project and its proposed Open Grid Services Architecture (OGSA) specification describe how Web services can facilitate the creation, life-cycle management, and security requirements of a reliable, industrial-strength Grid Services Architecture (GSA). However, it only seems to address the infrastructure and resources associated with the grid. So what about the application, you might ask? After all, the grid is only as valuable as the applications you run on it.

The most common approach to enabling an application for a grid environment is to decompose the application into smaller, independent subtasks or job "chunks," submit the jobs to the grid (schedule and deploy), and hope for the best. This paradigm provides a good solution for batch cycle applications and is well suited for coarse-grain parallelizable applications.

But what about standards providing best practices, support, and tools for distributing more complex applications, sometimes referred to as "fine-grain parallelizable"?

Furthermore, many have argued that such services are orthogonal to what OGSA addresses. Yes, I agree that the specification is about the infrastructure, but only for now. Remember the debate about middleware, EAI, application servers, and the J2EE stack? I foresee that the same line of argument regarding distributed parallel computing will soon be coming to a grid near you. In fact the two technologies - J2EE and grid - have very similar characteristics, including services, specifications, best-practice patterns, and supporting technology: middleware, runtime containers, and desirable qualities of service. And in both cases Web services play a key role. Perhaps the overriding similarity is n-tiered distributed computing.

Parallel Distributed Computing and the Grid
First, let's talk about grid and distributed parallel computing - or are they the same? Some say it's a matter of definition again. This is especially true for uncharted technology territories: there seem to be as many definitions as opinions. However, some commonly used terms and concepts have been emerging. This is not surprising, as parallel computing has been around for quite a while, for example as parallel execution on symmetric multiprocessing (SMP) machines or massively parallel computers. Programming to leverage these configurations required executing multiple parallel threads and sharing common memory - fast memory, that is. Silicon Graphics machines and a number of mid-range to high-range Sun and AIX multiprocessing machines are fine examples of parallel computing machines - the next best thing to Cray supercomputers!
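The shared-memory programming model these SMP machines reward can be sketched in a few lines of Java. This is an illustrative sketch only - the class and method names are mine, not from any product mentioned here: several threads run in parallel against one array in common memory, each summing its own slice.

```java
import java.util.concurrent.atomic.DoubleAdder;

// Minimal sketch of the SMP model: threads sharing one array in common memory.
public class SmpSum {
    // Sums `data` using `nThreads` threads over disjoint slices.
    public static double sum(double[] data, int nThreads) {
        DoubleAdder total = new DoubleAdder();            // shared accumulator
        Thread[] workers = new Thread[nThreads];
        int chunk = (data.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int lo = Math.min(data.length, t * chunk);
            final int hi = Math.min(data.length, lo + chunk);
            workers[t] = new Thread(() -> {
                double local = 0;
                for (int i = lo; i < hi; i++) local += data[i]; // no locking on the hot path
                total.add(local);                               // one synchronized update per thread
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return total.sum();
    }
}
```

Because the slices are disjoint, the only synchronization needed is the single `total.add` per thread - exactly the kind of cheap coordination that fast shared memory makes possible.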

These SMP configurations provide horizontal scaling by adding CPUs (typically up to 64). You add more processors until you can't add any more. But what happens when your application isn't completing fast enough and you need a 65th CPU? The only choices that don't require reprogramming are to buy a system with more powerful CPUs or a system that can support more CPUs. If this doesn't help, the outlook is either chunking and distributing - or bleak.

So the real distinction between distributed parallel processing and parallel processing is the access to the data that would have been in the shared memory in the parallel configuration. This could require a minor change, with some data selection and movement logic added, or it could be a major consideration, because moving the data among the processors executing the instances of the application could take more time than the additional CPUs save. Typically, distributed parallel computing involves dozens or hundreds of computers - the grid or compute farm - and concurrently running components of an application.

I saved for last the notion of running an application in parallel. In a nutshell, parallel computing exploits concurrency of execution, so no arguments here: it is parallel, concurrent computing as opposed to serial computing. There is only one thing missing from the alphabet soup - the main character: the application.

Applications: Making Fast Faster
High Performance Computing is about doing more with less - or making programs that run fast run faster! Consider an HPC application currently running as fast as possible using parallel computing (multiple system-resident CPUs). Moving this application to a multiple-system configuration typically requires reengineering the program. This translates to diverting developer focus away from new projects, and may require recertifying the application and hiring engineers conversant in distributed computing and data considerations. But most programs were designed with sequential flows in mind, and programmers (those who are mere mortals) typically think in terms of sequential flows; they are most effective at designing, writing, and debugging sequential programs, although some are fluent in multitasking techniques. So where is the magic?

So the options to move programs from sequential to SMP machines are:

  • Use special parallel compilers that leverage the multiple CPUs, which at best offer less-than-optimal improvements without major tuning
  • Use an advanced language that supports threads and write multithreaded programs. This is certainly no piece of cake to develop - not to mention what happens to the application when "the thread guru" moves on to bigger and better things or leaves to pursue an alternative lifestyle.
Developers and organizations recognize that ever-increasing business volumes will drive their already loaded SMP-based applications beyond the affordable expansion point, and they are looking for ways to avoid this problem. They are looking at ways to distribute the individual program components across less expensive systems as part of a cluster and grid strategy.

So the question remains: how do you improve the response time of an application by distributing it to a compute farm? If the application has a high-volume, sub-second-response, transactional characteristic, we are in luck: BEA WebLogic's robust clustering technology can effectively distribute a highly transactional application across a cluster. But there is a large class of applications, often legacy of some sort - finance quant apps, engineering numerical-analysis solvers, life-science genome applications, computational statistics - that must process large amounts of data and are compute-intense. These still need a solution.

The Compute-Intensive Application
To discuss the distributed parallelization options, it is helpful to further classify applications by the best-practice parallel distributed computing design patterns they fit. I've called them Stage I and Stage II variants of distributed applications.

Stage I distributed application candidates can be described as "same instructions, small different data" design patterns. Consider the SETI@home search for extraterrestrial intelligence or the search for large Mersenne primes - both arguably on the far right of compute-intensive. In both cases the same application needs to be executed again and again using small amounts of different data; hence the name "same instructions, different data." The only requirement to distribute the calculation is a job scheduler that distributes the calculation segments across computers. Sounds like a classic mainframe batch scheduler could do the job. No wonder IBM calls grid computing the next big thing - they have been providing the underlying services for years. From an application programming point of view, the grid is a vast number of transactions and jobs distributed by a scheduler.

These applications have one entry point and one exit point, and can have their work computed in parallel without sequential or data dependencies on the results of each. An application with these characteristics is referred to as "embarrassingly parallel."

A variant of the embarrassingly parallel application is a job or application that for historical reasons is being run as a sequential string of steps. These can easily be decomposed into smaller jobs and fall under the Stage I distribution umbrella. So all you need to distribute Stage I applications is to identify them, chunk them into parallel executable steps, and get a scheduler that deploys the "chunks" or steps. Several vendors, including Platform and Sun, provide such scheduling services.
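A Stage I distribution can be sketched with a thread pool standing in for the grid scheduler. All names here are illustrative, not from any vendor's product: every job runs the identical code - the "same instructions" - over its own chunk of the data, and the master simply gathers the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Stage I sketch: identical instructions, different data chunks,
// handed to a scheduler (a thread pool standing in for a grid scheduler).
public class StageOne {
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) if (n % d == 0) return false;
        return true;
    }

    // Counts primes in [lo, hi) by splitting the range into `chunks` independent jobs.
    public static long countPrimes(long lo, long hi, int chunks) {
        ExecutorService scheduler = Executors.newFixedThreadPool(chunks);
        try {
            long span = (hi - lo + chunks - 1) / chunks;
            List<Future<Long>> results = new ArrayList<>();
            for (int c = 0; c < chunks; c++) {
                final long from = lo + c * span;
                final long to = Math.min(hi, from + span);
                // Same instructions, different data: each job runs the identical loop.
                results.add(scheduler.submit(() -> {
                    long count = 0;
                    for (long n = from; n < to; n++) if (isPrime(n)) count++;
                    return count;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) total += f.get(); // gather the chunk results
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            scheduler.shutdown();
        }
    }
}
```

Because no chunk depends on another's result, the "scheduler" needs no intelligence beyond deploy-and-collect - which is why a classic batch scheduler handles Stage I so well.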

While many applications fit into the Stage I model, many others have input data and, usually, intermediate-result interdependencies. These are classified as Stage II distribution applications. You will recognize them by having the sources of their resource consumption embedded in loops with complex data-dependency requirements. In other words, you can't just chunk the application and expect it to run faster with the same results.

A few different algorithmic patterns immediately come to mind as Stage II distribution application candidates: serial nondependent, master-slave, binary non-recombining tree, simplex optimization algorithms, and so on.

Consider a simple binary search algorithm for finding the maximum of a sequence of numbers, as used in a sorting problem. Although it's a simple, commonly used algorithm, it exhibits the structural characteristics of Stage II distribution applications. You can devise a simple Web service that receives two numbers and returns the maximum. A simple control master program could spawn slave/worker services executing the comparisons on a compute farm and returning the final result back to the parent/root node of the tree. Commercial tools for building such solutions include low-level parallel middleware (e.g., MPI and PVM) and a few higher-level paradigms that provide virtual shared memory or shared-process abstractions, such as JavaSpaces (a technology transfer of Linda), GemFire from GemStone, and GigaSpaces. These offer generic supporting middleware services, with which the programmer must be conversant in order to enable the distribution communications and data transfer.
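A minimal sketch of that tree, using Java's fork/join machinery in place of remote worker services (the class name is mine, not a product's): every node does nothing but the two-argument comparison - the "max service" - and results flow back up to the root.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Binary-tree max: each node's only job is the two-number "max service".
public class TreeMax extends RecursiveTask<Double> {
    private final double[] data;
    private final int lo, hi;   // half-open range [lo, hi), assumed non-empty

    TreeMax(double[] data, int lo, int hi) { this.data = data; this.lo = lo; this.hi = hi; }

    @Override protected Double compute() {
        if (hi - lo == 1) return data[lo];               // leaf: a single number
        int mid = (lo + hi) / 2;
        TreeMax left = new TreeMax(data, lo, mid);
        left.fork();                                      // dispatch the left subtree to a worker
        double right = new TreeMax(data, mid, hi).compute();
        return Math.max(left.join(), right);              // the two-argument comparison
    }

    public static double max(double[] data) {
        return ForkJoinPool.commonPool().invoke(new TreeMax(data, 0, data.length));
    }
}
```

In the article's scenario, each `fork` would be a slave service invocation on the farm rather than a local worker thread, but the control structure - master generating comparisons, results recombining at the root - is the same.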

A recent entry that approaches such challenges differently and focuses on the application rather than the infrastructure is ACCELERANT from ASPEED Software. It offers the developer a high-level, algorithmic- and computational-pattern-aware interface that can be inserted into existing or new applications. This approach shields the application developer from middleware and distribution expertise while giving the resulting application the runtime services required to optimally manage execution across all of its instances.

An even more challenging variant of Stage II applications is one characterized as "impossible" to parallelize. Examples of this variant are step-wise iterative algorithms, where each step requires the result of the previous one. A simple example is the common summation technique for adding a sequence of numbers: you iterate through the sequence and at each loop add the next number to a tally. It turns out that even these Stage II variant algorithms can be recast to be handled like the easier Stage II algorithms; e.g., a binary tree master-slave algorithm can be applied to the above summation problem. This greatly simplifies the programming effort but obviously requires reverification, since the algorithm has been altered. Other advanced techniques such as genetic algorithms are also available, but their discussion does not belong in Web Services Journal - not until there is a Web services solution for them!
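The recast can be shown side by side (illustrative code, not ASPEED's): the sequential tally depends on the previous step at every iteration, while the tree version combines independent pairs level by level - each level's additions could be farmed out to separate workers. Note that the summation order changes, which is exactly why reverification is needed, especially for floating-point data.

```java
// Sequential tally vs. its binary-tree recast.
public class TreeSum {
    // "Impossible" form: each step needs the previous tally.
    public static double serial(double[] xs) {
        double tally = 0;
        for (double x : xs) tally += x;
        return tally;
    }

    // Tree recast: combine independent pairs level by level.
    public static double tree(double[] xs) {
        double[] level = xs.clone();
        int n = level.length;
        while (n > 1) {
            int half = (n + 1) / 2;
            for (int i = 0; i < n / 2; i++)              // these pairs are independent:
                level[i] = level[2 * i] + level[2 * i + 1]; // a level could run in parallel
            if (n % 2 == 1) level[half - 1] = level[n - 1]; // odd element carries up
            n = half;
        }
        return n == 1 ? level[0] : 0;                    // empty input sums to 0
    }
}
```

A sequence of eight numbers takes seven additions either way, but the tree finishes in three parallel steps instead of seven sequential ones - the source of the speedup.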

Now let's move on to how Web services facilitate grid-enabling a complex, compute-intense application and harnessing the computing power of an HPC center for fast pricing of a portfolio of bond options. Callable bond portfolio pricing was selected because it is representative of a class of particularly complex Monte Carlo simulations that yield greater accuracy.

Now is as good a time as ever to shed light on the celebrated Monte Carlo techniques. First, they're not just jargon to confuse the unwary; they're a real scientific tool. Monte Carlo techniques are counterintuitive in the sense that they use probabilities and random numbers to solve some very concrete real-life problems. Buffon's Needle is one of the oldest problems in geometrical probability tackled with Monte Carlo. A needle is thrown onto a lined sheet where the distance between the lines is the same as the length of the needle. Repeating the experiment many times computes the number π (pi) - with great accuracy, I must say. You can design a Web service that executes ranges of millions of throws on a compute farm. The master control program (server-side Web service) aggregates the experiments and serves you back the value of pi!
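A sketch of that experiment in Java (the class name, seeds, and chunk sizes are illustrative; note the simulation samples the needle's angle using Math.PI, so this demonstrates the technique rather than deriving pi from first principles). The chunked ranges of throws mirror how the work would be split across a compute farm, with the master aggregating the crossings:

```java
import java.util.Random;

// Buffon's Needle: needle length equals the line spacing, so the
// crossing probability is 2/pi and pi is approximately 2N / crossings.
public class Buffon {
    // Runs `throwsCount` needle drops and returns how many crossed a line.
    // Seeded for reproducibility; each chunk gets its own seed.
    public static long crossings(long throwsCount, long seed) {
        Random rnd = new Random(seed);
        long crossed = 0;
        for (long i = 0; i < throwsCount; i++) {
            double y = rnd.nextDouble() * 0.5;              // center-to-line distance in [0, 1/2)
            double theta = rnd.nextDouble() * Math.PI / 2;  // needle angle in [0, pi/2)
            if (0.5 * Math.sin(theta) >= y) crossed++;      // half-needle reaches the line
        }
        return crossed;
    }

    // The master aggregates the chunk results and serves back pi.
    public static double estimatePi(long perChunk, int chunks) {
        long crossed = 0;
        for (int c = 0; c < chunks; c++)                    // each chunk could be a remote service call
            crossed += crossings(perChunk, 42 + c);
        return 2.0 * perChunk * chunks / crossed;
    }
}
```

In the distributed version described in the text, each `crossings` call would be a slave Web service running its range of throws on a farm node; only a single long travels back per chunk, so the data-movement cost discussed earlier is negligible here.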

The Use Case Requirements
So let's talk about the use case designed and implemented at a buy-side financial services boutique. The business problem was to price a portfolio of callable bonds using Monte Carlo (MC) techniques. Think of a bond as a series of cash flows. You pay a price to buy it on issue day. Then every so often, say six months, it pays a coupon back. At maturity, the bond pays back its principal. Pricing a bond means assessing its value at any given time. One popular way of pricing is to run complex computational statistical techniques called Monte Carlo simulations. Callable bonds have the added complexity that they can be, well, called on any day before maturity. In order to anticipate the value of a hypothetical call, you must run additional "what if" scenarios. This happens to be one of the easier examples of a huge set of analysis, modeling, pricing, and risk-assessment applications in use and in need of distribution in order to run within very stringent time constraints at an affordable price.
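The "series of cash flows" view reduces, in the simplest deterministic case, to discounting each payment. This is an illustrative sketch only - the class, flat-rate convention, and numbers are mine, a far cry from the project's Monte Carlo engine - but it makes the structure concrete:

```java
// Deterministic bond pricing: discount each cash flow back to today.
public class BondPrice {
    // Present value of a bond paying `coupon` per period for `periods`
    // periods plus `principal` at maturity, at flat per-period rate `r`.
    public static double price(double coupon, double principal, int periods, double r) {
        double pv = 0;
        for (int t = 1; t <= periods; t++)
            pv += coupon / Math.pow(1 + r, t);       // discount each coupon
        pv += principal / Math.pow(1 + r, periods);  // discount the principal repayment
        return pv;
    }
}
```

At a flat per-period rate equal to the coupon rate, the bond prices at par. The Monte Carlo machinery - and the compute burden - enters when rates follow thousands of simulated random paths and the issuer's call option must be evaluated along each one.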

The company had existing C++ legacy code implementing the pricing model. Some of the bond portfolio calculations could take over an hour on the existing hardware. The new business requirement was to make the response time less than 30 seconds. Given the needs and the current implementation, something had to give, and adding expensive cycles and reengineering the application was very risky - pardon the pun. The front-end GUI presented yet another challenge: the trading desk uses a new Java-based front-end trading system, but the sales desk primarily uses Excel spreadsheets for pricing, for simplicity and ease of use.

Web Services for Robust Shared Business Services
After assessing the business objectives and the technical constraints, the desirable architecture was in place (see Figure 1). A strategic decision was made to outsource the IT infrastructure to a commercial-strength High Performance Computing center. By lowering the cost of ownership and increasing availability, the client was able to harness on-demand computing using state-of-the-art equipment. With the appropriate SLA in place, incremental scalability turned out to be predictable and affordable.

A Web service interface provided the single API to the pricing engine and facilitated the Shared Business Service for the two LOBs. Furthermore, it provided an elegant solution to the technology interoperability gap. An unforeseen benefit was the salesman's ability to execute the Web service remotely from his laptop at a third-party office or at the coffee shop nearby (see Figure 2).

The server side of the Web service is the master/control program starting the computational/slave units on the grid configuration using ASPEED's ACCELERANT on-demand application servers and the HPC center's middleware fabric. Each slave component encapsulates the computational aspect of the portfolio pricing - the C++ legacy code (see Figure 3).

Every time the Web service is called, a number of slave calculations are fired on the grid. ACCELERANT's on-demand server provides quality-of-service features such as dynamic load balancing, failover, and managed optimal response time.

Building the Client Web Service
BEA's 8.1 Platform technology provided the ideal environment for building the client Web service call.

1.  The first step was to create a Web service control. This was achieved simply by pointing at the published Web service URL followed by ?WSDL.


2.  The WSDL file received was saved at a local project directory. Figure 4 shows the input definition, a segment of the WSDL file.

3.  Browse to the project and directory where the saved WSDL is located.

4.  Right-click the WSDL file and select Generate JCX from WSDL. The resulting JCX file is a Web service control, which can be used from the Java client trading system.

Figure 4 shows a simple XML input file that is sent to the Web service.

In this article, I demonstrated how Web services play a critical role in two fundamental areas of distributed parallel computing: infrastructure middleware and parallelization. I then defined the stages of application distribution: Stage I (embarrassingly parallel) and two variants of Stage II (complex interdependent, and "impossible" to parallelize). I concluded with a case study of parallelizing a Stage II application using BEA's 8.1 platform and ACCELERANT from ASPEED.

The computational grid and distributed parallel computing deliver substantial performance improvement today. While the standards are still evolving, practitioners design and implement mission-critical applications, doing more at a faster rate in diverse commercial areas, and enjoy great competitive advantage. Financial services professionals can execute complex financial models and provide exotic products to their clients for higher profits while they meet more stringent regulatory risk requirements and improve the bottom line through more efficient capital allocation. Pharmaceutical companies speed up preclinical and early clinical trials by a factor of five or more and gain FDA approvals faster. Manufacturing uses fluid dynamics, executing on powerful compute farms and connecting designers via Web services, to deliver faster simulation and to shorten new-product life cycles while delivering better, cheaper, stronger products.

Web services play a pivotal role not only in the infrastructure back-end space, but also closer to the "final mile." I predict in the next 18 to 24 months, as the product stack matures and bandwidth increases, Web services, dynamic business process choreography, and informal on-demand networks will be able to tap the idle power of powerful compute farms, or even commuters' sleepy laptops, and deliver content on pervasive devices like never before. But remember, the grid is only as profitable as the applications you run on it.

Until then, get the grids crunching.

More Stories By Labro Dimitriou

Labro Dimitriou is a BPMS subject matter expert and grid computing advisor. He has been in the field of distributed computing, applied mathematics, and operations research for over 20 years, and has developed commercial software for trading, engineering, and geoscience. Labro has spent the last five years designing BPM-based business solutions.
