Parallel Programming with MPI in C or C++

$30-250 USD

En curso

Publicado

hace casi 11 años

$30-250 USD

Pagado a la entrega

In this project you are going to implement a parallel algorithm with C/C++ language usinMPI library. MapReduce Everyday we create tremendous amounts of data in our every activity. Tweeter adds 400million tweets to its database in every day. The sensors in the Large Hadron Collider at Cern records petabytes of data each year. We can find many more examples from astronomy, biology, internet activities, sensor networks, etc. This makes big-data processing, today's one of the biggest computer science and engineering problems. Distributed processing is the general approach for handling large volume data, but designing an efficient distributed system is a challenging task. There are some general distributed pro- gramming frameworks in order to simplify the implementation. One of them is the MapReduce model. There are many libraries for MapReduce, so that the programmer does not care about the distribution of the data, instead he or she supplies the necessary map and reduce functions. However, we are not going to use any library; instead, we borrow the idea from this model, and implement our solution using MPI library in C or C++. In this project, you are going to demonstrate a small distributed data processing solution using the MapReduce programming model. This model consists of map and reduce steps. In the map step, master node takes the input, divides it and distributes to worker nodes and each worker node works on its own data independently. In the reduce step, the master node collects the answers from the workers, and combines them to generate the final result. (This programming model can be implemented in a multi-level way, i.e a worker node can map its input to other idle workers and collect the results, but we are not going to implement this.) Problem Definition You are going to extract records and calculate statistics from a large gene expression database. The data set consists of 2467 genes. Each gene can belong to one of the following 6 classes: tricarboxylic acid cycle (TCA), respiration (Resp), cytoplasmic ribosomes (Ribo), proteasome (Proteas), histones (Hist) and helix-turn-helix proteins(HTH). There are also 79 expressions for each gene, corresponding to different measurements. The data is stored in a tab separated file where the first column is the unique identifier of the gene (ORF=open reading frame), the second column is the name, next 6 columns are the class labels and the remaining 79 columns are the measurements (Table 1). When your program starts, the master node should load the data, divide and distribute it among the worker processors. Then, the master node should wait for the user to input a query. 1 ORF NAME TCA Resp Ribo Proteas Hist HTH alpha 0 alpha 7 . . . YMR056C AAC1 TRAN... -1 -1 -1 -1 -1 -1 -0.18 -0.58 . . . YBR085W AAC3 TRAN... -1 -1 -1 -1 -1 -1 -0.01 -0.42 . . . YNL141W AAH1 PURI... -1 -1 -1 -1 -1 -1 0.46 -0.71 . . . ... ... ... ... ... ... ... ... ... ... . . . Table 1: Gene Expressions. If a gene is labeled with a class its corresponding value is 1, otherwise it is -1. There will be 2 types of queries as listed below. When your program answers a query, it should not terminate, instead wait for the next query. Your program should terminate when the user enters: quit 1. Finding a record The user may want to see the data about a single gene. For example, if the user wants to see the gene YMR056C, he or she will enter: gene YMR056C Your output should contain all information about the gene. The output format is as follows: YMR056C Name: AAC1 TRANSPORT MITOCHONDRIAL ADP/ATP TRANSLOCATOR TCA: -1 Resp: -1 Ribo: -1 Proteas: -1 Hist: -1 HTH: -1 alpha 0: -0.18 alpha 7: -0.58 ... ... 2. Calculating Statistics The user may wonder the mean and the standard deviation of the measurements of genes belonging to a specific class. For example, if the user wants to list the statistics for the TCA class, he or she enters: class TCA You have to output the mean and standard deviation of the 79 measurements of the

Parallel Programming with MPI in C or C++

$30-250 USD

$30-250 USD

Información sobre el proyecto

¿Buscas ganar dinero?

Beneficios de presentar ofertas en Freelancer

Sobre este cliente

Verificación del cliente

Otros trabajos de este cliente

Trabajos similares