
Realtime Hadoop data ingest with Kafka, Storm, HBase (OpenTSDB), and Postgres on AWS

$1500-3000 USD

Completed
Posted about 8 years ago


Paid on delivery
SKILLS REQUIRED

This project requires a data engineer with development experience setting up a realtime data-ingestion demo using the open-source Apache tools Kafka and Storm. It also requires someone who is familiar with setting up a Hadoop environment using the configuration tool Puppet. If you are familiar with these tools, this will be a fairly easy but well-paying project!

BASIC DESCRIPTION

I need to ingest realtime time-series data (from an API such as Twitter's) into four different data stores: HDFS, OpenTSDB (a time-series database built on HBase), a PostgreSQL database, and Amazon Web Services S3. First, Kafka will ingest from the API and send the raw messages (data) to both HDFS and Storm. You will need to write a very simple data transformation in a Storm bolt, and then another Storm bolt will send the transformed data to those four data stores. Overall, this is a fairly simple development process; most of the work is in installing and configuring all of the different software components of this architecture.

I will provide an AWS EC2 cluster for it all to run on. You will first need to set up a Puppet environment on top of the AWS cluster. Within that Puppet environment you will need to install, configure, and connect together all the various software components: Hadoop (HDFS), ZooKeeper, YARN, Kafka, Storm, OpenTSDB and HBase, and PostgreSQL, as well as any associated subcomponents or other open-source packages these require to run. I have included a software architecture diagram.

PROJECT DELIVERABLES

The final goal of the project is a working system that actually ingests data from an external API such as Twitter's in realtime and stores that data in these four data stores with low latency. You will need to create a simple demo that continually demonstrates all of this happening using data from the API. You should also be able to walk me through all of the code, and you will be required to provide very simple documentation of all the steps you took to set this up. In addition, any advice you can provide along the way on proper cluster setup and sizing, hardware configuration, or parallelism counts (for Storm and Kafka) would be appreciated!

ADDITIONAL INFO

Once complete, I will take this project, add my own code to it, and expand it. For example, I will be ingesting stock market data (financial time series) through this platform and writing complex transformations and realtime processing for the data in Storm before it is stored. So the platform you set up should be designed and developed in a way that can easily be expanded and added to. I can provide many more details to the chosen bidder when they are ready, and will help them get started and assist with everything (such as accessing the AWS cluster) as needed. If you do a good job, I will give you a perfect rating, great feedback, and a lot of optional paid follow-on work.
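The "very simple data transformation" step described above could be as small as a few lines. A minimal sketch, as plain Java rather than an actual Storm bolt (in a real topology this logic would live inside a bolt's execute() method), with a made-up CSV message format and metric name, since the actual feed and schema aren't specified here. It turns a raw message into an OpenTSDB telnet-protocol "put" line:

```java
public class TransformSketch {
    // Hypothetical transform: a raw "symbol,epochSeconds,price" message
    // becomes an OpenTSDB telnet-style put command. The metric name
    // "tick.price" and the CSV layout are illustrative assumptions.
    static String toTsdbPut(String raw) {
        String[] fields = raw.split(",");
        String symbol = fields[0];
        long timestamp = Long.parseLong(fields[1]);
        double value = Double.parseDouble(fields[2]);
        // OpenTSDB telnet protocol: put <metric> <timestamp> <value> <tag>=<value>
        return String.format("put tick.price %d %s symbol=%s",
                timestamp, Double.toString(value), symbol);
    }

    public static void main(String[] args) {
        System.out.println(toTsdbPut("AAPL,1444000000,110.25"));
    }
}
```

The same parsed fields would then be handed to the persistence bolt, which fans them out to HDFS, OpenTSDB, PostgreSQL, and S3.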
Project ID: 9629976

About the project

17 proposals
Remote project
Active 8 years ago

Awarded to:
Hi, we are a team of 3 with 3+ years of experience in Java and Scala, and in big data technologies like Hadoop, Spark, Hive, Mongo, Cassandra, Elasticsearch, etc. Since we work in product development, we know how to build end-to-end applications. Our ultimate aim is to do projects at the best quality and earn our rating. Since you don't yet know our domain and technology skills, we are also ready to provide a proposal with sample working code per your requirements before you accept the bid. We also have experience in Android: we created a sample Android app called "SETTLE EASY", available in the Google Play Store. Link: [login to view URL] Thanks, Ubfapps Team
$2,222 USD in 5 days
4.8 (7 reviews)
4.2
I HAVE CREATED A NEARLY IDENTICAL ARCHITECTURE IN MY PREVIOUS JOB. In that project the processing pipeline consisted of Flume (for data ingestion from Twitter), Kafka, Storm, Cassandra, HDFS, and MySQL. I set up the clusters, installed and configured all components, and wrote the processing software (in Java), everything from beginning to end. I have worked with all the technologies you mentioned except Ansible (which I will gladly learn) and OpenTSDB (but I have used HBase).

About me: please look at my oDesk profile for reviews and past projects: [login to view URL]~0164a5dba7c5432999/ I am an experienced Big Data engineer and architect (over 6 years of professional experience). My key area of expertise is the design and development of scalable distributed systems, encompassing acquisition, processing, storage, search, and analysis of data (especially Big Data technologies from the Hadoop ecosystem). I lead a small team of highly skilled and motivated Big Data developers.

My qualifications:
1. For over 4 years I have been working constantly with Big Data technologies from the Hadoop ecosystem (MapReduce, Hive, Pig, Nutch, ZooKeeper, Storm, and others), NoSQL storage (Cassandra, HBase, MongoDB), and search engines (Solr).
2. Nearly 8 years of coding experience in Java and 4 in Python.
3. Fluent spoken English - I lived in the US for over 5 years.

I think it would be best to discuss all the details over Skype. Looking forward to hearing from you!
$2,500 USD in 30 days
5.0 (1 review)
4.8
17 freelancers are bidding an average of $2,864 USD for this job
Hi. I read the project description and got interested. Even though I don't have experience with a few of the tools you mention, I'm sure I can handle it. A few questions so far:
1. Who is responsible for getting the data from Twitter and sending it to the system we want to build?
2. "time-series data" - I assume this has some kind of format? I need more details about this.
3. Why do you also need to upload data to S3, and what should the format of the S3 files be?
4. "You will need to write a very simple data transformation in a Storm" - can you elaborate on this data transformation?
Thanks, waiting for details. Hope we will collaborate.
$2.500 USD en 30 días
4,9 (72 comentarios)
6,8
6,8
================== Amazon MWS API Experts ================== We are Amazon MWS API experts and have completed many projects using its APIs. I have ready-to-use APIs for: 1) Amazon seller orders, 2) Amazon Product API, 3) Amazon Price API, 4) Amazon repricer, 5) Amazon SES API, 6) Amazon SQS API, 7) Amazon Product Advertising API. I have done many complex projects based on the Amazon MWS API, and I am sure your project would be very easy for me. I have demos ready; ping me so I can share the links with you.
$3,092 USD in 50 days
4.6 (2 reviews)
4.4
Hi, I have decent experience implementing MapReduce tasks. I have also worked on streaming data apps with Storm and Apache Spark. The project seems very interesting; I would like to work with you.
$2,500 USD in 30 days
5.0 (3 reviews)
2.6
Hi, I'm a systems engineer, experienced in Linux, virtualisation, and cloud computing. I use git, Ansible, and AWS in my daily work, and I'm familiar with both GitHub and Bitbucket. I can help you create the infrastructure you requested. The deliverable will be an Ansible playbook that you can reuse to recreate the cluster in the future. Please contact me so we can discuss this project further. Looking forward to hearing from you.
$3,333 USD in 30 days
5.0 (2 reviews)
1.4
DEAR HIRING MANAGER, PLEASE ASK US ABOUT OUR PROMOTIONAL DISCOUNT TODAY. WE ARE EXPERTS IN THE REALM OF BUILDING AND CONSTRUCTING FIRST CLASS SOFTWARE SOLUTIONS, TOP OF THE LINE WEBSITES, MOBILE APPS, AS WELL AS CONSTRUCTING COMPLEX DATABASES. WE ARE THE PREMIERE DEVELOPMENT TEAM WITHIN THE WEB/SOFTWARE INDUSTRY. Here's our first review and PORTFOLIO: https://www.freelancer.com/u/MilesChino.html I love the idea of your project! PLEASE MESSAGE ME. Regards, Laguna Hills, CA 92653
$2,368 USD in 30 days
0.0 (0 reviews)
0.0
Hi, I am a data warehousing resource with 4.5 years of experience in Informatica, ODI, UNIX, and databases like Oracle, MSSQL, and Netezza, and I have intermediate expertise in AWS and HDFS. I have done some minor projects transferring files to S3 and then to Redshift, moving data from local storage to HDFS, transferring data from Netezza to Hive, etc. Your project description sounds interesting to me. You can trust that I will deliver what you are looking for, and it would be a great learning opportunity for me as well. Thanks.
$1,666 USD in 30 days
0.0 (0 reviews)
0.0
Hi, how are you? I don't know how to work with Ansible or Puppet, but I think that's no problem; I'll just need to read up on them. I have experience with these technologies (no experience with HBase or Amazon S3), but I know that's not a problem either, as they are easy to learn and implement. One question: why not use Spark? I think it might be a good idea. In any case, I love this project. If you haven't chosen a freelancer yet, please let me know so I can do this "very simple data transformation in a Storm bolt and then another Storm bolt will send the transformed data to those four data stores".
$3,125 USD in 45 days
0.0 (0 reviews)
0.0
Hi, we are a UK IT company specialising in data architecture. We are currently working on two large UK contracts to implement big data solutions: one for a utility company (Market Reform Programme) and one for the Royal Air Force. We have the expertise to deliver and the resources to deliver quickly. Setting up the Hadoop demo is fairly straightforward; your solution architecture is clear and we can't see anything we would change. We can provide all technical infrastructure if required, and are happy to work with your AWS EC2 cluster if that is your preferred option. Ansible is fine: we have corporate Ansible accounts or can work with yours. For this project we will assign myself (Data and Hadoop Architect, for solution design and project management) as well as a UK-based Hadoop engineer. Depending on your final specification, this should be completed in less than 15 working days. I'm happy to set up a Skype call to discuss in more detail; we're ready to start any time you require. Regards, Nathanael Ward (CEO and Data Architect)
$3,333 USD in 15 days
0.0 (0 reviews)
0.0

About this client

United States
5.0
1
Payment method verified
Member since Oct 28, 2015

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)