Parallel Programming with MPI in C or C++

$30-250 USD

In progress
Posted almost 11 years ago

Paid on delivery
In this project you are going to implement a parallel algorithm in C or C++ using the MPI library.

MapReduce

Every day we create tremendous amounts of data in everything we do. Twitter adds 400 million tweets to its database every day. The sensors in the Large Hadron Collider at CERN record petabytes of data each year. Many more examples can be found in astronomy, biology, internet activity, sensor networks, etc. This makes big-data processing one of today's biggest computer science and engineering problems.

Distributed processing is the general approach for handling large volumes of data, but designing an efficient distributed system is a challenging task. There are some general distributed programming frameworks that simplify the implementation. One of them is the MapReduce model. There are many libraries for MapReduce, so the programmer does not need to care about the distribution of the data; instead, he or she supplies the necessary map and reduce functions. However, we are not going to use any library; instead, we borrow the idea from this model and implement our solution using the MPI library in C or C++.

In this project, you are going to demonstrate a small distributed data processing solution using the MapReduce programming model. This model consists of map and reduce steps. In the map step, the master node takes the input, divides it and distributes it to the worker nodes, and each worker node works on its own data independently. In the reduce step, the master node collects the answers from the workers and combines them to generate the final result. (This programming model can be implemented in a multi-level way, i.e., a worker node can map its input to other idle workers and collect the results, but we are not going to implement this.)

Problem Definition

You are going to extract records and calculate statistics from a large gene expression database. The data set consists of 2467 genes.
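The map step above has the master divide the input among the workers. The posting does not prescribe how to divide it, so the following is only a minimal sketch that assumes a contiguous block split of the rows. The names `RowRange` and `partition_rows` are hypothetical and do not come from the assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [begin, end) of row indices assigned to one worker.
// (Hypothetical helper type, not part of the assignment text.)
struct RowRange {
    std::size_t begin;
    std::size_t end;
};

// Split `rows` records into `workers` contiguous blocks whose sizes differ
// by at most one. The first (rows % workers) workers get one extra row.
std::vector<RowRange> partition_rows(std::size_t rows, std::size_t workers) {
    assert(workers > 0);  // at least one worker is required
    std::vector<RowRange> ranges;
    ranges.reserve(workers);
    const std::size_t base = rows / workers;
    const std::size_t extra = rows % workers;
    std::size_t start = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        ranges.push_back({start, start + len});
        start += len;
    }
    return ranges;
}
```

With the 2467 genes mentioned above and, say, 4 workers, this gives three blocks of 617 rows and one of 616. In an MPI program the block lengths and start offsets could serve as the counts and displacements for a call such as `MPI_Scatterv`, though other distribution schemes would work equally well.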
Each gene can belong to one of the following 6 classes: tricarboxylic acid cycle (TCA), respiration (Resp), cytoplasmic ribosomes (Ribo), proteasome (Proteas), histones (Hist) and helix-turn-helix proteins (HTH). There are also 79 expression values for each gene, corresponding to different measurements. The data is stored in a tab-separated file where the first column is the unique identifier of the gene (ORF = open reading frame), the second column is the name, the next 6 columns are the class labels and the remaining 79 columns are the measurements (Table 1).

When your program starts, the master node should load the data, then divide and distribute it among the worker processors. Then, the master node should wait for the user to input a query.

ORF      NAME          TCA  Resp  Ribo  Proteas  Hist  HTH  alpha 0  alpha 7  ...
YMR056C  AAC1 TRAN...  -1   -1    -1    -1       -1    -1   -0.18    -0.58    ...
YBR085W  AAC3 TRAN...  -1   -1    -1    -1       -1    -1   -0.01    -0.42    ...
YNL141W  AAH1 PURI...  -1   -1    -1    -1       -1    -1    0.46    -0.71    ...
...      ...           ...  ...   ...   ...      ...   ...  ...      ...      ...

Table 1: Gene Expressions. If a gene is labeled with a class, its corresponding value is 1; otherwise it is -1.

There will be 2 types of queries, as listed below. When your program answers a query, it should not terminate; instead, it should wait for the next query. Your program should terminate when the user enters:

    quit

1. Finding a record

The user may want to see the data about a single gene. For example, if the user wants to see the gene YMR056C, he or she will enter:

    gene YMR056C

Your output should contain all information about the gene. The output format is as follows:

    YMR056C Name: AAC1 TRANSPORT MITOCHONDRIAL ADP/ATP TRANSLOCATOR TCA: -1 Resp: -1 Ribo: -1 Proteas: -1 Hist: -1 HTH: -1 alpha 0: -0.18 alpha 7: -0.58 ... ...

2. Calculating Statistics

The user may want to know the mean and the standard deviation of the measurements of genes belonging to a specific class.
For example, if the user wants to list the statistics for the TCA class, he or she enters:

    class TCA

You have to output the mean and standard deviation of the 79 measurements of the
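The statistics query fits the reduce step described earlier: each worker can summarize its own rows, and the master can combine the summaries. The description is cut off before it specifies the exact output, so the sketch below makes several assumptions: it handles one list of measurement values at a time, it uses the population standard deviation (dividing by N), and the names `Partial`, `summarize`, `merge`, `mean` and `stddev` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-worker summary of a set of measurement values (hypothetical type).
struct Partial {
    std::size_t count = 0;
    double sum = 0.0;
    double sum_sq = 0.0;
};

// Worker side: summarize the values held locally.
Partial summarize(const std::vector<double>& values) {
    Partial p;
    for (double v : values) {
        ++p.count;
        p.sum += v;
        p.sum_sq += v * v;
    }
    return p;
}

// Master side: combine two summaries; all three fields simply add.
Partial merge(const Partial& a, const Partial& b) {
    Partial r;
    r.count = a.count + b.count;
    r.sum = a.sum + b.sum;
    r.sum_sq = a.sum_sq + b.sum_sq;
    return r;
}

double mean(const Partial& p) {
    assert(p.count > 0);  // undefined for an empty class
    return p.sum / static_cast<double>(p.count);
}

// Population standard deviation (divides by N). This is an assumption;
// the assignment text does not say which definition is expected.
double stddev(const Partial& p) {
    assert(p.count > 0);
    const double m = mean(p);
    const double var = p.sum_sq / static_cast<double>(p.count) - m * m;
    return std::sqrt(var > 0.0 ? var : 0.0);  // clamp tiny negative round-off
}
```

Because the three fields are plain sums, an MPI version could combine them with a sum reduction such as `MPI_Reduce` with `MPI_SUM`. The sum-of-squares formula is simple but can lose precision when values are large relative to their spread, in which case a more numerically stable merge may be preferable.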
Project ID: 4561145

About the project

3 proposals
Remote project
Active 11 years ago

Awarded to:
I have extensive experience with MPI; I have done several projects running on our university's super cluster. Also see my PM for more info.
$150 USD in 3 days
4.3 (2 reviews)
3.2
3 freelancers are bidding an average of $293 USD for this job
Hello, I will implement this program in C++ and MPI. Thanks, Paul
$500 USD in 7 days
4.9 (48 reviews)
5.4
Hi, expert in parallel programming here, I'll use C/C++/MPI for querying your gene database. Thank you, Danny
$250 USD in 5 days
5.0 (11 reviews)
2.4
Hi, please see my PM
$230 USD in 10 days
5.0 (5 reviews)
2.2

About this client

Istanbul, Turkey
5.0
1
Payment method verified
Member since May 27, 2013
