This article aims to provide an introduction to PVM and MPI, two widely used software systems for implementing parallel message passing programs. They enable us to use a group of heterogeneous UNIX/LINUX computers connected by a network as a single machine for solving a large problem.
Parallel processing is a form of computing in which a number of activities are carried out concurrently so that the effective time required to solve the problem is reduced. In the previous days, parallel processing was used for such thing as large scale simulations (e.g. molecular simulations, simulation of the explosion of an atomic bomb etc), solving large number crunching and data processing problems (e.g. compiling the census data) etc. However, as the cost of hardware is decreasing rapidly, parallel processing is being uses more and more in routine tasks. Multiple processor servers have been in existence for a long time. Parallel processing is also used in your own PC too. For example, a graphics processor working along with the main processor to render graphics on your monitor is also a form of parallel processing.
However, apart from the hardware facilities for parallel processing, some software support too is required so that we can run the programs in parallel and coordinate their execution. Such a coordination is necessary due to the dependencies of the parallel programs on one other. This will become clearer when we work through an example. The most widely used method to achieve such coordination is message passing in which the programs coordinate their execution and in general communicate with each other by passing message's to one other. So, for example, a program may tell another program, ``Ok! Here is the intermediate result you need to proceed.'' If all this sounds too abstract, lets proceed with a very simple example.
In this section, we will consider a very simple problem and consider how we
can use parallel processing to speed up its execution. The problem is to find
the sum of a list of integers stored in an array. Let us say that there are
100 integers stored in an array say items
. Now, how do we parallelize
this program? That is, we must first find out a way in which this problem can
be solved by a number of programs working concurrently. Many a times, due to
data dependencies, parallelization becomes a difficult problem. For
example, if you want to evaluate (a + b) * c, which involves two
operations, we cannot do them concurrently, the addition must be done before
the multiplication. Fortunately, for the problem that we have chosen,
parallelization is easy. Suppose that 4 program or processors will be working
simultaneously to solve the addition problem. Then the simplest strategy would
be to break the array items
into 4 parts and have each program process
one part. Thus the parallelization of the problem is as follows:
items[0]
to
items[24]
. Similarly, P1 will find the sum of items[25]
to items[49]
, P2 items[50]
to items[74]
and
P3 items[75]
to items[99]
.items
are not known to the programs
P0 to P3 and hence some program must tell these programs the values
of the elements. Thus, apart from P0 to P3, we will require one more program
that distributes data, collects results and coordinates execution.
We call such a program as master and the programs P0 to P3 as
slaves and this organization as the master - slave paradigm.With this organization in mind, let us write the algorithms for the master and the slave programs.
/* Algorithm for the master program */ initialize the array `items'. /* send data to the slaves */ for i = 0 to 3 Send items[25*i] to items[25*(i+1)-1] to slave Pi end for /* collect the results from the slaves */ for i = 0 to 3 Receive the result from slave Pi in result[i] end for /* calculate the final result */ sum = 0 for i = 0 to 3 sum = sum + result[i] end for print sum
/* Algorithm for the slave program */ Receive 25 elements from the master in some array say `items' /* calculate intermediate result */ sum = 0 for i = 0 to 24 sum = sum + items[i] end for send `sum' as the intermediate result to the master
Now that the basic algorithm has been designed, let us now consider how we can implement it. What hardware shall we run this program on? Clearly, very few of us have access to special machines designed to run parallel programs. However, no special hardware requirements are there in order to implement this program. A single computer or a group of interconnected computers will do, thanks to PVM, a software system that enables us to use interconnected computers for parallel program execution. PVM stands for Parallel Virtual Machine. It enables you to create number of programs or processes that run concurrently on same or different machines and provided functions with which you can pass messages among the processes for coordination. Even if you have a single computer, PVM will work on it, although there will be no ``real'' parallel processing as such. However, for learning purpose, that should be fine. Later on I will describe how to do ``real'' parallel processing using the PVM.
In order to use the PVM system, you need to install the PVM software on your
Linux system. In case you are using Red Hat Linux, then the RPM package for
PVM is included on the CD, so that you can install it as you normally install
other packages. Assuming that you have installed PVM system on your machine,
create the following directories(s) in your home directory:
~/pvm3/bin/LINUX/
. Why ? Because PVM requires that some of the
executables you create be copied in this directory. Once you have done this,
your setup is ready. Test this by giving the command pvm
on the
prompt. This will start the PVM Console from which you can give
commands to the PVM system and query status information. If everything is set
OK, you will see the pvm>
prompt. Here enter the command conf
.
The output should look something like this.
pvm> conf conf 1 host, 1 data format HOST DTID ARCH SPEED DSIG joshicomp 40000 LINUX 1000 0x00408841
What does this mean? The PVM System allows you to consider a group of
interconnected LINUX system to be viewed as a ``virtual'' computer having much
higher computing capacity than the individual machines. Thus, PVM will
distribute the processes among a number of computers. However, by default, PVM
considers that only the host that you are working on is to be included in the
PVM machine, i.e. all processes you create will be scheduled to run on the
same host. The conf
command shows what hosts or nodes are in the
PVM. Currently, there is only one. Later on, we will see how to add more hosts.
Presently, exit the PVM console by giving the command halt
Now that you are ensured that the PVM system has been properly installed,
let us see how to write the programs. Programs for the PVM system can be
written in both FORTRAN and C. We will be using the C language. To use the
PVM system, you include some calls to the PVM functions in your C program
along with the other statements and link the PVM library with your programs.
To get you started with PVM, let us write a simple program in which there
will be a master and a slave. The master will send the slave some string,
which the slave will convert to upper case and send back to the master. The
master and the slave programs are given as follows. To compile the programs,
give the command make -f makefile.demo
.
[Click here for a tar file containing the program listings.]
1 /* -------------------------------------------------------------------- * 2 * master_pvm.c * 3 * * 4 * This is the master program for the simple PVM demonstration. * 5 * -------------------------------------------------------------------- */ 6 #include <stdio.h> 7 #include <stdlib.h> 8 #include <pvm3.h> /* declares PVM constants and functions */ 9 #include <string.h> 10 int main() 11 { 12 int mytid; /* our task ID */ 13 int slave_tid; /* task ID of the slave */ 14 int result; 15 char message[] = "hello pvm"; 16 17 /* enroll ourselves into the PVM system and get our ID */ 18 mytid = pvm_mytid(); 19 /* spawn the slave */ 20 result = pvm_spawn("slave_pvm", (char**)0, PvmTaskDefault, 21 "", 1, &slave_tid); 22 /* check if the slave was spawned successfully */ 23 if(result != 1) 24 { 25 fprintf(stderr, "Error: Cannot spawn slave.\n"); 26 /* clean up and exit from the PVM system */ 27 pvm_exit(); 28 exit(EXIT_FAILURE); 29 } 30 /* initialize the data buffer to send data to slave */ 31 pvm_initsend(PvmDataDefault); 32 /* ``pack'' the string into the data buffer */ 33 pvm_pkstr(message); 34 /* send the string to the slave with a message tag of 0 */ 35 pvm_send(slave_tid, 0); 36 /* wait and receive the result string from the slave */ 37 pvm_recv(slave_tid, 0); 38 39 /* ``unpack'' the result from the slave */ 40 pvm_upkstr(message); 41 /* show the result from the slave */ 42 printf("Data from the slave : %s\n", message); 43 /* clean up and exit from the PVM system */ 44 pvm_exit(); 45 46 exit(EXIT_SUCCESS); 47 } /* end main() */ 48 /* end master_pvm.c */
1 /* -------------------------------------------------------------------- * 2 * slave_pvm.c * 3 * * 4 * This is the slave program for the simple PVM demonstration * 5 * -------------------------------------------------------------------- */ 6 #include <stdio.h> 7 #include <ctype.h> 8 #include <stdlib.h> 9 #include <pvm3.h> 10 #define MSG_LEN 20 11 void convert_to_upper(char*); 12 int main() 13 { 14 int mytid; 15 int parent_tid; 16 char message[MSG_LEN]; 17 /* enroll ourselves into the PVM system */ 18 mytid = pvm_mytid(); 19 /* get the task ID of the master */ 20 parent_tid = pvm_parent(); 21 /* receive the original string from master */ 22 pvm_recv(parent_tid, 0); 23 pvm_upkstr(message); 24 /* convert the string to upper case */ 25 convert_to_upper(message); 26 /* send the converted string to the master */ 27 pvm_initsend(PvmDataDefault); 28 pvm_pkstr(message); 29 pvm_send(parent_tid, 0); 30 /* clean up and exit from the PVM system */ 31 pvm_exit(); 32 33 exit(EXIT_SUCCESS); 34 } /* end main() */ 35 /* function to convert the given string into upper case */ 36 void convert_to_upper(char* str) 37 { 38 while(*str != '\0') 39 { 40 *str = toupper(*str); 41 str++; 42 } 43 } /* end convert_to_upper() */ 44 /* end slave_pvm.c */
1 # Make file for the demo PVM program 2 .SILENT : 3 # paths fro PVM include files and libraries 4 INCDIR=-I/usr/share/pvm3/include 5 LIBDIR=-L/usr/share/pvm3/lib/LINUX 6 # link the PVM library 7 LIBS=-lpvm3 8 CFLAGS=-Wall 9 CC=gcc 10 TARGET=all 11 # this is where the PVM executables go 12 PVM_HOME=$(HOME)/pvm3/bin/LINUX 13 all : $(PVM_HOME)/master_pvm $(PVM_HOME)/slave_pvm 14 $(PVM_HOME)/master_pvm : master_pvm.c 15 $(CC) -o $(PVM_HOME)/master_pvm master_pvm.c $(CFLAGS) $(LIBS) \ 16 $(INCDIR) $(LIBDIR) 17 $(PVM_HOME)/slave_pvm : slave_pvm.c 18 $(CC) -o $(PVM_HOME)/slave_pvm slave_pvm.c $(CFLAGS) $(LIBS) \ 19 $(INCDIR) $(LIBDIR)
Once your programs have been
compiled, you must copy them into the ~/pvm3/bin/LINUX
directory.
(The makefile does it by default). Now to run the programs, you must first
start the PVM system. To do this give the command pvm
to start the
PVM Console. Now at the pvm>
prompt, type quit
. The output
will be as follows:
pvm> quit quit Console: exit handler called pvmd still running.Notice the last line, indicating that the PVM daemon (
pvmd
) is still
running. To run the PVM programs, you need to run the PVM daemon which manages
the exchange of messages and that what we are doing here. Once the PVM daemon
is running, you can run the program by the following commands:
[rahul@joshicomp rahul]$ cd ~/pvm3/bin/LINUX/ [rahul@joshicomp LINUX]$ ./master_pvm Data from the slave : HELLO PVM [rahul@joshicomp LINUX]$
Notice that the string is now in upper case as expected.
In this section, we will see exactly how this program works. First of all
to use PVM function, you need to include a header file pvm3.h
in
your programs. This is done in line 8 of master_pvm.c
and in
line 9 of slave_pvm.c
. Also when compiling the programs, you need
to link it with the PVM library. This is done by specifying the -lpvm3
option to the compiler, as done in line 7 of makefile.demo
. Also, you
need to specify to the compiler the paths of header and library files, as
is done on lines 4 and 5 of the makefile.
In the master program, we first get the task ID of the master by
calling the PVM function pvm_mytid()
. The
PVM system assigns each process a unique 32 bit integer called as its task
ID in the same way as Linux assigns each process a process ID. The task
ID helps us identify the process with which we need to communicate. However,
the master does not uses its task ID (stored in mytid
) ever. Our
intention here is just to call the function pvm_mytid()
. This
function enrolls the process into the PVM system and generates a unique task
ID for the process. If we do not explicitly enroll the process, PVM
automatically enrolls our process on the first call to any PVM function. Next
we use pvm_spawn()
to create the slave process. The first parameter,
"slave_pvm"
is the name of the executable for the slave. The second
parameter is the arguments that you wish to the pass to the slaves (similar to
argv
in normal C). Since we do not want to send any arguments, we
set this value to 0. The third parameter is a flag with which we can control
how and where PVM starts the slave. Since we have only a single machine, we
set this flag to PvmTaskDefault
, specifying PVM to use default
criteria while spawning the slave. The fourth parameter is the name of the
host or the architecture on which we wish to run the program and here it is
kept empty. It is used to specify the host or the architecture when the flag
is other than PvmTaskDefault
.The fifth parameter specifies the number
of slaves to spawn and the sixth parameter is a pointer to an array in which
the IDs of the slaves will be returned. This function returns the number of
slaves actually spawned which we check for correctness.
A message in PVM consists of basically two parts, the data and a tag
that identifies the type of the message. The tag helps us distinguish between
different messages. For example, in the addition example, which we are going
to implement, suppose that you are expecting that each slave will send to the
master an integer which is the sum of the elements it added. It is also
quite possible that some slave may encounter some error and may want to send
the master an integer which indicates the error code. How does the master
distinguish whether an integer it received from the slave is an intermediate
result or an error code? This is where tags come in picture. You may assign
the message for intermediate result a tag say MSG_RESULT
which you
will #define
in some header file and a tag say MSG_ERROR
for
the message indicating error. The master will then look at the message tags
to decide whether the message contains intermediate result or error.
To send a message, you first need to ``initialize'' the send buffer. This is
done by calling the pvm_initsend()
function. The parameter to
this function specifies the ``encoding'' scheme to be used. When we want to
exchange data between machines with different architectures (like say between
a Pentium machine and a SPARC Workstation) then we need to encode the data at
the sending end and decode at the receiving end so that data is properly
delivered. The parameter to pvm_initsend()
specifies the encoding
scheme to be used. The value PvmDataDefault
specifies an encoding
scheme which enables data to be safely exchanged between heterogeneous
architectures. Once the buffer has been initialized, we need to put data into
the buffer and encode it. In our case, the data is a string, so we use the
function pvm_pkstr()
to ``pack'' i.e. encode and put the data into
the buffer. If we had to send an integer, there is a different function
pvm_pkint()
. Similarly, there are functions for other data types.
Once the data is packed, we call pvm_send()
to send the message.
The first argument is the ID of the process to which the message is to be sent
and the second argument is the message tag. Since there is only one type of
message here, we set the tag to 0.
Once the data is sent to the slave, the slave will process it and return it
to the master as we shall see. So we now call pvm_recv()
to receive
the data from the slave. Again, the parameters are the task ID from which
the message is expected and the tag of the expected message. If the desired
message has not yet been sent, this function waits and does not return. Thus,
in effect, the master is now waiting for the slave to process the data. Once
the message arrives, the data is still in the receive buffer. It needs to be
``unpacked'' i.e decoded to get the original message. This decoding is done
by the pvm_upkstr()
function. We then display the processes string.
Before the PVM program exits, it must tell the PVM system that it is leaving
the PVM system so that resources occupied by the process can be released. This
is done by calling the pvm_exit()
function. After that, the master
exits.
The slave program is easy to understand. First it finds the task ID of the
master (which is also its parent as the master spawned the slave) by calling
the function pvm_parent()
. It then receives the message string from
the master, converts it to uppercase and send the resulting string to the
master.
Now that you know some basics of a PVM program, let us implement the addition
algorithm we developed using PVM. There will be one master and 4 slaves. The
master will first spawn 4 slaves and send each one their part of data. The
slaves will add the data and send the results to the master. Thus, two
types of messages are exchanged, one when the master send data to slaves, for
which we will use the tag MSG_DATA
and the other when the slaves
send results to master, for which we will use the tag MSG_RESULT
.
The rest is simple. The master and the slave programs are given below.
1 /* -------------------------------------------------------------------- * 2 * common.h * 3 * * 4 * This header file defines some common constants. * 5 * -------------------------------------------------------------------- */ 6 #ifndef COMMON_H 7 #define COMMON_H 8 #define NUM_SLAVES 4 /* number of slaves */ 9 #define SIZE 100 /* size of total data */ 10 #define DATA_SIZE (SIZE/NUM_SLAVES) /* size for each slave */ 11 #endif 12 /* end common.h */
1 /* -------------------------------------------------------------------- * 2 * tags.h * 3 * * 4 * This header file defines the tags that will be used for messages. * 5 * -------------------------------------------------------------------- */ 6 #ifndef TAGS_H 7 #define TAGS_H 8 #define MSG_DATA 101 /* data from master to slave */ 9 #define MSG_RESULT 102 /* result from slave to master */ 10 #endif 11 /* end tags.h */
1 /* -------------------------------------------------------------------- * 2 * master_add.c * 3 * * 4 * Master program for adding the elements of an array by using PVM * 5 * -------------------------------------------------------------------- */ 6 #include <stdio.h> 7 #include <stdlib.h> 8 #include <pvm3.h> /* PVM constants and declarations */ 9 #include "tags.h" /* tags for messages */ 10 #include "common.h" /* common constants */ 11 int get_slave_no(int*, int); 12 int main() 13 { 14 int mytid; 15 int slaves[NUM_SLAVES]; /* array to store the task IDs of slaves */ 16 int items[SIZE]; /* data to be processes */ 17 int result, i, sum; 18 int results[NUM_SLAVES]; /* results from the slaves */ 19 /* enroll into the PVM system */ 20 mytid = pvm_mytid(); 21 /* initialize the array `items' */ 22 for(i = 0; i < SIZE; i++) 23 items[i] = i; 24 /* spawn the slaves */ 25 result = pvm_spawn("slave_add", (char**)0, PvmTaskDefault, 26 "", NUM_SLAVES, slaves); 27 /* check if proper number of slaves are spawned */ 28 if(result != NUM_SLAVES) 29 { 30 fprintf(stderr, "Error: Cannot spawn slaves.\n"); 31 pvm_exit(); 32 exit(EXIT_FAILURE); 33 } 34 /* distribute the data among the slaves */ 35 for(i = 0; i < NUM_SLAVES; i++) 36 { 37 pvm_initsend(PvmDataDefault); 38 pvm_pkint(items + i*DATA_SIZE, DATA_SIZE, 1); 39 pvm_send(slaves[i], MSG_DATA); 40 } 41 /* receive the results from the slaves */ 42 for(i = 0; i < NUM_SLAVES; i++) 43 { 44 int bufid, bytes, type, source; 45 int slave_no; 46 47 /* receive message from any of the slaves */ 48 bufid = pvm_recv(-1, MSG_RESULT); 49 /* get information about the message */ 50 pvm_bufinfo(bufid, &bytes, &type, &source); 51 52 /* get the slave number that sent the message */ 53 slave_no = get_slave_no(slaves, source); 54 /* unpack the results at appropriate position */ 55 pvm_upkint(results + slave_no, 1, 1); 56 } 57 /* find the final result */ 58 sum = 0; 59 for(i = 0; i < NUM_SLAVES; i++) 60 sum += results[i]; 61 printf("The sum is %d\n", sum); 62 /* clean up and exit from the PVM system */ 63 pvm_exit(); 64 exit(EXIT_SUCCESS); 65 } /* end main() */ 66 67 /* function to return the slave number of a slave given its task ID */ 68 int get_slave_no(int* slaves, int task_id) 69 { 70 int i; 71 for(i = 0; i < NUM_SLAVES; i++) 72 if(slaves[i] == task_id) 73 return i; 74 return -1; 75 } /* end get_slave_no() */ 76 /* end master_add.c */
1 /* -------------------------------------------------------------------- * 2 * slave_add.c * 3 * * 4 * Slave program for adding elements of an array using PVM * 5 * -------------------------------------------------------------------- */ 6 #include <stdlib.h> 7 #include <pvm3.h> 8 #include "tags.h" 9 #include "common.h" 10 int main() 11 { 12 int mytid, parent_tid; 13 int items[DATA_SIZE]; /* data sent by the master */ 14 int sum, i; 15 16 /* enroll into the PVM system */ 17 mytid = pvm_mytid(); 18 /* get the task ID of the master */ 19 parent_tid = pvm_parent(); 20 /* receive the data from the master */ 21 pvm_recv(parent_tid, MSG_DATA); 22 pvm_upkint(items, DATA_SIZE, 1); 23 /* find the sum of the elements */ 24 sum = 0; 25 for(i = 0; i < DATA_SIZE; i++) 26 sum = sum + items[i]; 27 /* send the result to the master */ 28 pvm_initsend(PvmDataDefault); 29 pvm_pkint(&sum, 1, 1); 30 pvm_send(parent_tid, MSG_RESULT); 31 /* clean up and exit from PVM */ 32 pvm_exit(); 33 34 exit(EXIT_SUCCESS); 35 } /* end main() */
1 # Make file for the PVM program for addition - makefile.add 2 .SILENT : 3 # paths fro PVM include files and libraries 4 INCDIR=-I/usr/share/pvm3/include 5 LIBDIR=-L/usr/share/pvm3/lib/LINUX 6 # link the PVM library 7 LIBS=-lpvm3 8 CFLAGS=-Wall 9 CC=gcc 10 TARGET=all 11 # this is where the PVM executables go 12 PVM_HOME=$(HOME)/pvm3/bin/LINUX 13 all : $(PVM_HOME)/master_add $(PVM_HOME)/slave_add 14 $(PVM_HOME)/master_add : master_add.c common.h tags.h 15 $(CC) -o $(PVM_HOME)/master_add master_add.c $(CFLAGS) $(LIBS) \ 16 $(INCDIR) $(LIBDIR) 17 18 $(PVM_HOME)/slave_add : slave_add.c common.h tags.h 19 $(CC) -o $(PVM_HOME)/slave_add slave_add.c $(CFLAGS) $(LIBS) \ 20 $(INCDIR) $(LIBDIR)
Let us consider the slave program first, because it is simple. The slave
receives the 25 array elements from the master in the array items
,
finds their sum and sends the result to the master with the message tag
as MSG_RESULT
. Now consider the master. We define an array
slaves
of size NUM_SLAVES
which will store the task ID's of
the slaves spawned by the parent. There is another array results
in
which the results from the slaves are stored. The master first initializes
the array items
and then spawns the slaves. After that it distributes
the data among the slaves. In the call to pvm_pkint()
on line 38,
the first parameter is the pointer to the array in which the integers are
stored, the second is the number of integers to pack and the third is the
``stride.'' Stride means how many elements to skip when packing. When it is 1,
consecutive elements are packed. When it is 2, PVM will skip 2 elements when
packing with the result that all even numbered elements (0, 2, 4 ...) will
be packed. Here we keep its value as 1.
Once the data has been distributed among the slaves, the master has to wait
till the slaves return the intermediate results. One possibility when
accepting the results is that the master will first collect the result from
slave 0 (i.e slave whose task ID is stored in slave[0]
), then from
slave 1 and so on. However, this may not be an efficient approach. For example,
it may be that slave 0 is working on a slower machine than slaves 1, 2 and 3.
In that case, since the master is waiting from slave 0, the results from
slaves 1, 2 and 3 are yet to be collected even though the calculations are
completed. In this case it may be fine, but consider the situation in which
the slave, when finished doing one job is given another job. In that case, we
would like to give a slave its next job immediately after it has completed its
current job. Thus, the master must be in a position to respond messages from
any of the slaves. This is what is being done here.
In the call to pvm_recv()
on line 48, we know that the first
parameter is the task ID of the message source. If this value is kept -1, it
signifies a wild card i.e. messages from any process with message tag
MSG_RESULT
will be received by the master. The received message
along with some control information is stored in a buffer called as active
receive buffer. The call returns a unique ID for this buffer. Now, we
want to know who is the sender of the message so that we can assign the message
data to the appropriate element of the array results
. The function
pvm_bufinfo()
returns information about the message in the buffer,
such as the message tag, the number of bytes and the senders task ID. Once we
have the senders task ID, we set the appropriate element of the results
array to the integer sent by the slave. The rest of the program should be
easy to understand.
In case you are interested, you can think of some problems for which you can
write parallel programs. Many a times, due to bugs etc., you may need to clean
up the state of the things before starting. The PVM Console provides with
the command halt
that kills the PVM daemon. Then all the PVM processes
will halt or you can halt them with the Linux kill
command. In case
you have a network of Linux machines interconnected by say a LAN, then you
can also do ``real'' parallel processing. First of all, install PVM on all the
hosts you wish to use and then use the add
command in the PVM Console
to add hosts to the virtual machine. Then PVM will schedule some of the
processes to run on these hosts, so that real parallel processing is achieved.
We have seen in the previous section the implementation of the addition program using the PVM. Now let us consider another approach that can be used in developing parallel programs. This approach is using the MPI library. MPI stands for Message Passing Interface. It is a standard developed to enable us to write portable message passing applications. It provides functions for exchanging messages and many other activities as well. It must be noted that unlike PVM which is a software system, MPI is a standard, so that many implementations of the MPI standard exist. We will use an implementation of MPI called LAM which stands for Local Area Multicomputer. It is also available on the Red Hat Linux CD as an RPM package, so installation may not be a problem.
After you have installed the RPM package, go to the /usr/boot
directory and create a file named conf.lam
and type in a single line
in it: lamd $inet_topo
. The same directory will also have a file
named bhost.def
else create it and type in a single line in it:
localhost
. Now to test whether everything is working correctly,
type at the prompt, lamboot
. You will get the following response:
[rahul@joshicomp boot]$ lamboot LAM 6.3.1/MPI 2 C++/ROMIO - University of Notre Dame [rahul@joshicomp boot]$
If the output indicates an error, then there is some problem with the installation, either follow the above steps or see the lamboot(1) manual page for troubleshooting.
Assuming that LAM/MPI is properly installed on your system, let us again write a small demonstration program for MPI.
We will again write a simple master - slave program in which we are supposed to evaluate the expression (a + b) * (c - d). The master will read the values of a, b, c, and d from the user and one slave will calculate (a + b) and the other one will calculate (c - d). The program is as follows.
1 /* -------------------------------------------------------------------- * 2 * mpi_demo.c * 3 * * 4 * A simple MPI demonstration program to evaluate an expression. * 5 * -------------------------------------------------------------------- */ 6 #include <stdio.h> 7 #include <stdlib.h> 8 #include <lam/mpi.h> /* for MPI constants and functions */ 9 #define MSG_DATA 100 /* message from master to slaves */ 10 #define MSG_RESULT 101 /* message from slave to master */ 11 #define MASTER 0 /* rank of master */ 12 #define SLAVE_1 1 /* rank of first slave */ 13 #define SLAVE_2 2 /* rank of second slave */ 14 /* functions to handle the tasks of master, and the two slaves */ 15 void master(void); 16 void slave_1(void); 17 void slave_2(void); 18 int main(int argc, char** argv) 19 { 20 int myrank, size; 21 22 /* initialize the MPI system */ 23 MPI_Init(&argc, &argv); 24 /* get the size of the communicator i.e. number of processes */ 25 MPI_Comm_size(MPI_COMM_WORLD, &size); 26 /* check for proper number of processes */ 27 if(size != 3) 28 { 29 fprintf(stderr, "Error: Three copies of the program should be run.\n"); 30 MPI_Finalize(); 31 exit(EXIT_FAILURE); 32 } 33 34 /* get the rank of the process */ 35 MPI_Comm_rank(MPI_COMM_WORLD, &myrank); 36 /* perform the tasks according to the rank */ 37 if(myrank == MASTER) 38 master(); 39 else if(myrank == SLAVE_1) 40 slave_1(); 41 else 42 slave_2(); 43 /* clean up and exit from the MPI system */ 44 MPI_Finalize(); 45 exit(EXIT_SUCCESS); 46 } /* end main() */ 47 /* function to carry out the masters tasks */ 48 void master(void) 49 { 50 int a, b, c, d; 51 int buf[2]; 52 int result1, result2; 53 MPI_Status status; 54 printf("Enter the values of a, b, c, and d: "); 55 scanf("%d %d %d %d", &a, &b, &c, &d); 56 /* send a and b to the first slave */ 57 buf[0] = a; 58 buf[1] = b; 59 MPI_Send(buf, 2, MPI_INT, SLAVE_1, MSG_DATA, MPI_COMM_WORLD); 60 /* send c and d to the secons slave */ 61 buf[0] = c; 62 buf[1] = d; 63 MPI_Send(buf, 2, MPI_INT, SLAVE_2, MSG_DATA, MPI_COMM_WORLD); 64 /* receive results from the slaves */ 65 MPI_Recv(&result1, 1, MPI_INT, SLAVE_1, MSG_RESULT, 66 MPI_COMM_WORLD, &status); 67 MPI_Recv(&result2, 1, MPI_INT, SLAVE_2, MSG_RESULT, 68 MPI_COMM_WORLD, &status); 69 /* final result */ 70 printf("Value of (a + b) * (c - d) is %d\n", result1 * result2); 71 } /* end master() */ 72 /* function to carry out the tasks of the first slave */ 73 void slave_1(void) 74 { 75 int buf[2]; 76 int result; 77 MPI_Status status; 78 79 /* receive the two values from the master */ 80 MPI_Recv(buf, 2, MPI_INT, MASTER, MSG_DATA, MPI_COMM_WORLD, &status); 81 82 /* find a + b */ 83 result = buf[0] + buf[1]; 84 /* send result to the master */ 85 MPI_Send(&result, 1, MPI_INT, MASTER, MSG_RESULT, MPI_COMM_WORLD); 86 } /* end slave_1() */ 87 /* function to carry out the tasks of the second slave */ 88 void slave_2(void) 89 { 90 int buf[2]; 91 int result; 92 MPI_Status status; 93 94 /* receive the two values from the master */ 95 MPI_Recv(buf, 2, MPI_INT, MASTER, MSG_DATA, MPI_COMM_WORLD, &status); 96 97 /* find c - d */ 98 result = buf[0] - buf[1]; 99 /* send result to master */ 100 MPI_Send(&result, 1, MPI_INT, MASTER, MSG_RESULT, MPI_COMM_WORLD); 101 } /* end slave_2() */ 102 /* end mpi_demo.c */
1 # Makefile for MPI demo program - makefile.mpidemo 2 .SILENT: 3 CFLAGS=-I/usr/include/lam -L/usr/lib/lam 4 CC=mpicc 5 mpi_demo : mpi_demo.c 6 $(CC) $(CFLAGS) mpi_demo.c -o mpi_demo
To compile this program, give the command make -f makefile.mpidemo
.
Once you have compiled the program, to run the program you first need to
``start'' or ``boot'' the Local Area Multicomputer system. This is done with
the lamboot
command. After that, to run the program by giving the
following command: mpirun -np 3 mpi_demo
.
[rahul@joshicomp parallel]$ lamboot LAM 6.3.1/MPI 2 C++/ROMIO - University of Notre Dame [rahul@joshicomp parallel]$ mpirun -np 3 mpi_demo Enter the values of a, b, c, and d: 1 2 3 4 Value of (a + b) * (c - d) is -3 [rahul@joshicomp parallel]$
To use the MPI system and functions, you first need to include the header
file mpi.h
as is done in line 8. In case of PVM, different processes
are identified with their task ID's. In case of MPI, the MPI system assigns
each process a unique integer called as its rank beginning with 0.
The rank is used to identify a process and communicate with it. Secondly,
each process is a member of some communicator. A communicator can
be thought of as a group of processes that may exchange messages with each
other. By default, every process is a member of the communicator called
MPI_COMM_WORLD
. Although we can create new communicators, this leads
to an unnecessary increase in complexity, so we suffice ourselves by using the
MPI_COMM_WORLD
communicator.
Any MPI program must first call the MPI_Init()
function. This function
is used by the process to enter the MPI system and also do any specific
initialization required by the system. Next, we get the size of the
MPI_COMM_WORLD
communicator i.e. the number of processes in it using the MPI_Comm_size()
function. The
first parameter is the communicator and the second is a pointer to an integer
in which the size will be returned. Here, we need exactly 3 processes, one
master and two slaves. After that, we get the rank by calling
MPI_Comm_rank()
. The three processes will have ranks 0, 1 and 2. All
these processes are essentially identical i.e. there is no inherent
master - slave relationship between them. So it is up to us to decide who will
be the master and who will be the slaves. We choose rank 0 as master and ranks
1 and 2 as slaves. It can also be seen that we have included the code for both
the master and the two slaves in the same program. Depending upon the rank,
we choose to execute the appropriate function. Note that there is no spawning
of processes as in PVM, and as we shall see, we choose to decide the number
of process to be spawned from a command line argument rather than the program
spawning slaves. Once the execution is finished,
we must call the MPI_Finalize()
function to perform final clean up.
Let us now consider the master function. After reading the values of a, b, c,
and d from the user, the master must send a and b to slave 1 and c and d to
slave 2. Instead of sending the variables individually, we choose to pack them
up in an array and send the array of 2 integers instead. It is always better
to pack up the data you want to send into a single message rather than to send
a number of messages for individual data items, this saves the communication
overhead involved in passing the messages. Once the buffer is ready, unlike PVM,
we do not need to pack or encode the data, MPI will manage these details
internally. So we can directly call the MPI_Send()
function to send
the data. The first parameter (line 59) is the address of the buffer, the
second one the number of elements in the message, the third is a specification
of the data type of the buffer, which here is MPI_INT
specifying that
the buffer is an array of integers. Next comes the rank of the process to which
we want to send the message. Here it is SLAVE_1
(#defined as 1).
Next is the message tag similar to that in case of PVM. Final parameter
is the communicator of which the receiver is a member, which in this case, is
MPI_COMM_WORLD
.
Once the data is distributed among the slaves, the master must wait for the
slaves to send the results. For simplicity, we first collect the message from
the slave 1 and then from slave 2. To receive a message, we use the
MPI_Recv()
function. Again, packing and decoding is handled by MPI
internally. The first argument (line 65) is the address of the buffer in which
to receive the data. The second is the size of the buffer in terms of the
number of elements, which in this case is 1. Next is the data type, which is
MPI_INT
here. Next three parameters specify the rank of the source of
the message, the tag of the expected message and the communicator of which the
source is the member. The final argument is a pointer to a structure of type
MPI_Status
in which some status information will be returned (however,
we ignore this information). Now that you know about the basic MPI terms,
the slave_1()
and slave_2()
functions should be clear.
In this program, the code for the master as well as the slaves was in the same
executable file. Later on we will see how we can execute multiple executables.
From the makefile, we see that to compile the MPI program, a wrapper program
mpicc
is provided which links the required libraries automatically.
To run the program, use the mpirun -np 3 mpi_demo
command after
booting the LAM. Here we specify LAM to create 3 processes, one master and two
slaves.
Let us now re implement the addition program that we designed before using MPI. Here we will also show you how to execute separate programs in MPI. When we use a single executable in the MPI program, we call it Single Program Multiple Data (SPMD) application. When two or more executables are involved, we call it Multiple Program Multiple Data (MPMD) application. With LAM, MPMD programs are executed with the help of an application schema. But before that, let us see the source of the master and the slave programs.
1 /* -------------------------------------------------------------------- * 2 * master_mpi.c * 3 * * 4 * Master program for adding the elements of an array using MPI * 5 * -------------------------------------------------------------------- */ 6 #include <stdio.h> 7 #include <stdlib.h> 8 #include <lam/mpi.h> /* MPI constants and functions */ 9 #include "tags.h" /* tags for different messages */ 10 #include "common.h" /* common constants */ 11 int main(int argc, char** argv) 12 { 13 int size, i, sum; 14 int items[SIZE]; 15 int results[NUM_SLAVES]; 16 MPI_Status status; 17 /* initlalize the MPI System */ 18 MPI_Init(&argc, &argv); 19 /* check for proper number of processes */ 20 MPI_Comm_size(MPI_COMM_WORLD, &size); 21 if(size != 5) 22 { 23 fprintf(stderr, "Error: Need exactly five processes.\n"); 24 MPI_Finalize(); 25 exit(EXIT_FAILURE); 26 } 27 /* initialize the `items' array */ 28 for(i = 0; i < SIZE; i++) 29 items[i] = i; 30 /* distribute the data among the slaves */ 31 for(i = 0; i < NUM_SLAVES; i++) 32 MPI_Send(items + i*DATA_SIZE, DATA_SIZE, MPI_INT, i + 1, 33 MSG_DATA, MPI_COMM_WORLD); 34 /* collect the results from the slaves */ 35 for(i = 0; i < NUM_SLAVES; i++) 36 { 37 int result; 38 39 MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MSG_RESULT, 40 MPI_COMM_WORLD, &status); 41 results[status.MPI_SOURCE - 1] = result; 42 } 43 /* find the final answer */ 44 sum = 0; 45 for(i = 0; i < NUM_SLAVES; i++) 46 sum = sum + results[i]; 47 printf("The sum is %d\n", sum); 48 /* clean up and exit the MPI system */ 49 MPI_Finalize(); 50 exit(EXIT_SUCCESS); 51 } /* and main() */ 52 /* end master_mpi.c */
1 /* -------------------------------------------------------------------- * 2 * slave_mpi.c * 3 * * 4 * Slave program for adding array elements using MPI. * 5 * -------------------------------------------------------------------- */ 6 #include <stdio.h> 7 #include <stdlib.h> 8 #include <lam/mpi.h> /* MPI functions and constants */ 9 #include "tags.h" /* message tags */ 10 #include "common.h" /* common constants */ 11 #define MASTER 0 /* rank of the master */ 12 int main(int argc, char** argv) 13 { 14 int items[DATA_SIZE]; 15 int size, sum, i; 16 MPI_Status status; 17 /* initialize the MPI system */ 18 MPI_Init(&argc, &argv); 19 /* check for proper number of processes */ 20 MPI_Comm_size(MPI_COMM_WORLD, &size); 21 if(size != 5) 22 { 23 fprintf(stderr, "Error: Need exactly five processes.\n"); 24 MPI_Finalize(); 25 exit(EXIT_FAILURE); 26 } 27 /* receive data from the master */ 28 MPI_Recv(items, DATA_SIZE, MPI_INT, MASTER, MSG_DATA, 29 MPI_COMM_WORLD, &status); 30 /* find the sum */ 31 sum = 0; 32 for(i = 0; i < DATA_SIZE; i++) 33 sum = sum + items[i]; 34 /* send the result to the master */ 35 MPI_Send(&sum, 1, MPI_INT, MASTER, MSG_RESULT, MPI_COMM_WORLD); 36 /* clean up and exit MPI system */ 37 MPI_Finalize(); 38 exit(EXIT_SUCCESS); 39 } /* end main() */ 40 /* end slave_mpi.c */
1 # Makefile for MPI addition program - makefile.mpiadd 2 .SILENT: 3 CFLAGS=-I/usr/include/lam -L/usr/lib/lam 4 CC=mpicc 5 all : master_mpi slave_mpi 6 master_mpi : master_mpi.c common.h tags.h 7 $(CC) $(CFLAGS) master_mpi.c -o master_mpi 8 slave_mpi : slave_mpi.c common.h tags.h 9 $(CC) $(CFLAGS) slave_mpi.c -o slave_mpi
To compile the programs, type make -f makefile.mpiadd
. (The
files common.h
and tags.h
are the same as used for the PVM program.)
This will create the master_mpi
and slave_mpi
executables. Now how do
we tell MPI to run both these executables. This is where application
schema file comes in. The application schema file specifies the executables
to be run, the nodes on which to run and the number of copies of the executable
to run. Create a new file add.schema
and type in it the following
lines:
# Application schema for the addition program using MPI n0 master_mpi n0 -np 4 slave_mpi
This file specifies that MPI should start 1 copy of the master (which will have rank 0) and 4 copies of slaves on the node n0, i.e. the local node. You can specify many more parameters in this schema file like command line arguments etc., see the manual page appschema(1). Once the schema file is ready, you can run the programs as follows:
[rahul@joshicomp parallel]$ lamboot LAM 6.3.1/MPI 2 C++/ROMIO - University of Notre Dame [rahul@joshicomp parallel]$ mpirun add.schema The sum is 4950 [rahul@joshicomp parallel]$
Much of the program should be easy to understand. On line 39, when receiving
intermediate results from the slaves, we specify the source as
MPI_ANY_SOURCE
, since we want to respond to slaves in the order in
which they complete the calculations, as discussed earlier. In this case, the
status
structure contains the actual source in the field
MPI_SOURCE
. We use this information to set the appropriate element from
the results
array to the intermediate result received.
In case you have a network of interconnected computers, you can make programs run on many computers by suitably modifying the application schema file. Instead of specifying n0 as the host, specify the name of the host and the number of processes you wish to schedule on that host. For more information about this, see the manual pages and the references.
We have seen how to write parallel programs using the PVM and MPI libraries. Since there libraries are available on many platforms and these are the defacto standards used for implementing parallel programs, programs written with PVM or MPI will run with little or no modification on large scale machines, if the need arises. What we have basically concentrated on in this article is the point to point communication functions provides by these libraries and their use in message passing. Apart from these facilities, both PVM and MPI provide a number of advanced features such as collective communication (broadcasting or multicasting), process groups and group management, reduction functions etc. You are welcome to explore these advanced features. These public domain softwares enable us to use a network of computers as a single large computer, so in case you have some such large problem to solve, you may consider using a network at your college or office. You will have to refer to the books given below for the exact details of how such a setup may be established. Many tutorials as well as books are available to help you. Below is a list of the material I referred.
I would like to thank my project guide Dr. Uday Khedker for his encouragement and help. I would like to thank the Center for Developement of Advanced Computing for allowing me to run the MPI and PVM programs on the PARAM Supercomputer and Dr. Anabarsu for guiding me during the implementation.