
End-to-End Big Data Project

$30-250 USD

Closed
Posted about 3 years ago


Paid on delivery
Problem Statement: Imagine you are part of a data team that wants to bring in daily data for COVID-19 tests occurring in New York State for analysis. Your team has to design a daily workflow that runs at 9:00 AM and ingests the data into the system.

API: [login to view URL]

Following the ETL process, extract the data for each county in New York State from the above API and load it into individual tables in the database. Each county table should contain the following columns:
❖ Test Date
❖ New Positives
❖ Cumulative Number of Positives
❖ Total Number of Tests Performed
❖ Cumulative Number of Tests Performed
❖ Load Date

Implementation options:
1. Python scripts to run a daily cron job
   a. Utilize an SQLite in-memory database for data storage
   b. You should have one main standalone script for the daily cron job that orchestrates all the other ETL processes (see the first sketch below)
   c. Use a multi-threaded approach to fetch and load data for multiple counties concurrently
2. Airflow to create a daily scheduled DAG
   a. Utilize Docker to run Airflow and the Postgres database locally
   b. There should be one DAG containing all the tasks needed to perform the end-to-end ETL process
   c. Create and execute concurrent tasks in Airflow dynamically for each county, based on the number of counties available in the response (see the second sketch below)

Implement unit and/or integration tests for your application.
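A minimal sketch of what option 1 could look like: one standalone script that a cron entry runs daily at 9:00 AM, fetching counties concurrently with a thread pool and loading them into an in-memory SQLite database. The endpoint URL, query parameter, and JSON field names are assumptions (the real API URL is hidden behind "[login to view URL]"), so they would need to be adapted to the actual payload.

```python
"""Sketch of option 1: cron-driven, multi-threaded ETL into in-memory SQLite.

Assumptions (not from the posting): COVID_API_URL is a placeholder for the
elided API URL, and the response is JSON rows with keys such as 'test_date',
'new_positives', etc. Adjust to the real schema.
"""
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from datetime import date

import requests

COVID_API_URL = "https://example.invalid/covid-testing"  # placeholder for the real API


def extract(county: str) -> list[dict]:
    """Fetch all rows for one county (hypothetical 'county' query parameter)."""
    resp = requests.get(COVID_API_URL, params={"county": county}, timeout=30)
    resp.raise_for_status()
    return resp.json()


def load(conn: sqlite3.Connection, county: str, rows: list[dict]) -> None:
    """Create the per-county table if needed and insert the rows with a load date."""
    table = county.lower().replace(" ", "_")  # county names come from a known list
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "test_date TEXT, new_positives INTEGER, cumulative_positives INTEGER, "
        "total_tests INTEGER, cumulative_tests INTEGER, load_date TEXT)"
    )
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?)",
        [
            (
                r["test_date"],
                r["new_positives"],
                r["cumulative_number_of_positives"],
                r["total_number_of_tests"],
                r["cumulative_number_of_tests"],
                date.today().isoformat(),
            )
            for r in rows
        ],
    )
    conn.commit()


def main(counties: list[str]) -> None:
    conn = sqlite3.connect(":memory:")
    # Fetch concurrently, but load serially so the single SQLite connection
    # is only touched from the main thread.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(extract, counties))
    for county, rows in zip(counties, results):
        load(conn, county, rows)


if __name__ == "__main__":
    main(["Albany", "Erie", "New York"])  # cron runs this script daily at 9:00 AM
```

A corresponding crontab entry would be something like `0 9 * * * python /path/to/etl.py`.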
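For option 2, a hedged sketch of a single daily DAG that discovers the counties in the API response and fans out one mapped task per county. It assumes Airflow 2.4+ (dynamic task mapping via `.expand`), a Postgres connection id `covid_db` configured in the dockerized Airflow instance, and the same placeholder URL and field names as above; none of these come from the posting itself.

```python
"""Sketch of option 2: one daily Airflow DAG with dynamic per-county tasks."""
from datetime import datetime

import requests
from airflow.decorators import dag, task

COVID_API_URL = "https://example.invalid/covid-testing"  # placeholder for the real API


@dag(schedule="0 9 * * *", start_date=datetime(2021, 1, 1), catchup=False)
def covid_etl():
    @task
    def list_counties() -> list[str]:
        """Discover the counties present in today's API response."""
        rows = requests.get(COVID_API_URL, timeout=30).json()
        return sorted({r["county"] for r in rows})

    @task
    def etl_county(county: str) -> None:
        """Extract, transform, and load one county into its own Postgres table."""
        from airflow.providers.postgres.hooks.postgres import PostgresHook

        rows = requests.get(COVID_API_URL, params={"county": county}, timeout=30).json()
        hook = PostgresHook(postgres_conn_id="covid_db")  # assumed connection id
        table = county.lower().replace(" ", "_")
        hook.run(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "test_date DATE, new_positives INT, cumulative_positives INT, "
            "total_tests INT, cumulative_tests INT, load_date DATE)"
        )
        hook.insert_rows(
            table=table,
            rows=[
                (
                    r["test_date"],
                    r["new_positives"],
                    r["cumulative_number_of_positives"],
                    r["total_number_of_tests"],
                    r["cumulative_number_of_tests"],
                    datetime.utcnow().date(),
                )
                for r in rows
            ],
        )

    # One mapped task instance per county, created at runtime from the API response.
    etl_county.expand(county=list_counties())


covid_etl()
```

The number of `etl_county` task instances is decided at runtime, which satisfies the "dynamic concurrent task creation" requirement without hard-coding the county list.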
Project ID: 29026380

About the project

5 proposals
Remote project
Active 3 years ago

5 freelancers are bidding an average of $231 USD for this job
Hi, I am a certified big data developer and have used PySpark in many of my applications. I feel you should use PySpark for multithreaded workloads, as Spark distributes the load across different nodes and executors. If you have a Spark environment ready, you should use it; otherwise, this can also be done with the threading mechanism in pure Python. Please connect with me to discuss your requirements further. Thanks, Naresh.
$244 USD in 7 days
5.0 (6 reviews)
4.2
Hi, I am Ashish. I work as a Software Engineer III - Data at Walmart; previously I was with Deutsche Bank. I have three years of total experience in big data, Java Spring, and competitive programming. I am just trying out this platform and can complete your project in 7 days. Please contact me in chat if you are interested.
$200 USD in 7 days
0.0 (0 reviews)
Hello, I am a full-stack Ruby on Rails developer with 3 years of experience. I am also working on analyzing government COVID data with big data tools, and I have a team with extensive experience in Python, Spark, Hadoop, Hive, and Kafka.
$222 USD in 2 days
0.0 (0 reviews)
I have worked extensively with Python, ETL, databases, and Airflow in both linear and distributed environments.
$240 USD in 5 days
0.0 (0 reviews)

About this client

Los Angeles, United States
0.0
0
Member since Jan 21, 2021

Client verification
