
End-to-End Big Data Project

$30-250 USD

Closed
Posted about 3 years ago


Paid on delivery
Problem Statement: Imagine you are part of a data team that wants to bring in daily data for COVID-19 tests occurring in New York State for analysis. Your team has to design a daily workflow that runs at 9:00 AM and ingests the data into the system.

API: [login to view URL]

Following the ETL process, extract the data for each county in New York State from the above API and load it into individual tables in the database. Each county table should contain the following columns:
❖ Test Date
❖ New Positives
❖ Cumulative Number of Positives
❖ Total Number of Tests Performed
❖ Cumulative Number of Tests Performed
❖ Load Date

Implementation options:
1. Python scripts to run a daily cron job
   a. Utilize an SQLite in-memory database for data storage
   b. You should have one main standalone script for the daily cron job that orchestrates all the other ETL processes (see the first sketch below)
   c. Use a multi-threaded approach to fetch and load data for multiple counties concurrently
2. Airflow to create a daily scheduled DAG
   a. Utilize Docker to run Airflow and the Postgres database locally
   b. There should be one DAG containing all the tasks needed to perform the end-to-end ETL process
   c. Create and execute concurrent tasks in Airflow dynamically for each county, based on the number of counties available in the response (see the second sketch below)

Implement unit and/or integration tests for your application.
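A minimal sketch of what option 1 could look like: one standalone script that a cron entry runs daily at 9:00 AM, fetching counties concurrently with a thread pool and loading them into an in-memory SQLite database. The endpoint URL, query parameter, and JSON field names are assumptions (the real API URL is hidden behind "[login to view URL]"), so they would need to be adapted to the actual payload.

```python
"""Sketch of option 1: cron-driven, multi-threaded ETL into in-memory SQLite.

Assumptions (not from the posting): COVID_API_URL is a placeholder for the
elided API URL, and the response is JSON rows with keys such as 'test_date',
'new_positives', etc. Adjust to the real schema.
"""
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from datetime import date

import requests

COVID_API_URL = "https://example.invalid/covid-testing"  # placeholder for the real API


def extract(county: str) -> list[dict]:
    """Fetch all rows for one county (hypothetical 'county' query parameter)."""
    resp = requests.get(COVID_API_URL, params={"county": county}, timeout=30)
    resp.raise_for_status()
    return resp.json()


def load(conn: sqlite3.Connection, county: str, rows: list[dict]) -> None:
    """Create the per-county table if needed and insert the rows with a load date."""
    table = county.lower().replace(" ", "_")  # county names come from a known list
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "test_date TEXT, new_positives INTEGER, cumulative_positives INTEGER, "
        "total_tests INTEGER, cumulative_tests INTEGER, load_date TEXT)"
    )
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?)",
        [
            (
                r["test_date"],
                r["new_positives"],
                r["cumulative_number_of_positives"],
                r["total_number_of_tests"],
                r["cumulative_number_of_tests"],
                date.today().isoformat(),
            )
            for r in rows
        ],
    )
    conn.commit()


def main(counties: list[str]) -> None:
    conn = sqlite3.connect(":memory:")
    # Fetch concurrently, but load serially so the single SQLite connection
    # is only touched from the main thread.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(extract, counties))
    for county, rows in zip(counties, results):
        load(conn, county, rows)


if __name__ == "__main__":
    main(["Albany", "Erie", "New York"])  # cron runs this script daily at 9:00 AM
```

A corresponding crontab entry would be something like `0 9 * * * python /path/to/etl.py`.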
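For option 2, a hedged sketch of a single daily DAG that discovers the counties in the API response and fans out one mapped task per county. It assumes Airflow 2.4+ (dynamic task mapping via `.expand`), a Postgres connection id `covid_db` configured in the dockerized Airflow instance, and the same placeholder URL and field names as above; none of these come from the posting itself.

```python
"""Sketch of option 2: one daily Airflow DAG with dynamic per-county tasks."""
from datetime import datetime

import requests
from airflow.decorators import dag, task

COVID_API_URL = "https://example.invalid/covid-testing"  # placeholder for the real API


@dag(schedule="0 9 * * *", start_date=datetime(2021, 1, 1), catchup=False)
def covid_etl():
    @task
    def list_counties() -> list[str]:
        """Discover the counties present in today's API response."""
        rows = requests.get(COVID_API_URL, timeout=30).json()
        return sorted({r["county"] for r in rows})

    @task
    def etl_county(county: str) -> None:
        """Extract, transform, and load one county into its own Postgres table."""
        from airflow.providers.postgres.hooks.postgres import PostgresHook

        rows = requests.get(COVID_API_URL, params={"county": county}, timeout=30).json()
        hook = PostgresHook(postgres_conn_id="covid_db")  # assumed connection id
        table = county.lower().replace(" ", "_")
        hook.run(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "test_date DATE, new_positives INT, cumulative_positives INT, "
            "total_tests INT, cumulative_tests INT, load_date DATE)"
        )
        hook.insert_rows(
            table=table,
            rows=[
                (
                    r["test_date"],
                    r["new_positives"],
                    r["cumulative_number_of_positives"],
                    r["total_number_of_tests"],
                    r["cumulative_number_of_tests"],
                    datetime.utcnow().date(),
                )
                for r in rows
            ],
        )

    # One mapped task instance per county, created at runtime from the API response.
    etl_county.expand(county=list_counties())


covid_etl()
```

The number of `etl_county` task instances is decided at runtime, which satisfies the "dynamic concurrent task creation" requirement without hard-coding the county list.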
Project ID: 29026380

About the project

5 proposals
Remote project
Active 3 years ago

5 freelancers are bidding an average of $231 USD for this job
Hi, I am a certified big data developer and have used PySpark in many of my applications. I feel you should use PySpark for multithreaded workloads, as Spark distributes the load across different nodes and executors. If you have a Spark environment ready, you should use it; otherwise, this can also be done with the threading mechanism in pure Python. Please connect with me to discuss your requirements further. Thanks, Naresh.
$244 USD in 7 days
5.0 (6 reviews)
4.2
Hi, I am Ashish. I work as a Software Engineer III - Data at Walmart; previously I was with Deutsche Bank. I have three years of total experience in big data, Java Spring, and competitive programming. I am just trying out this platform and can complete your project in 7 days. Please contact me in chat if you are interested.
$200 USD in 7 days
0.0 (0 reviews)
Hello, I am a full-stack Ruby on Rails developer with 3 years of experience. I am also working on analyzing government COVID data with big data tools, and I have a team with extensive experience in Python, Spark, Hadoop, Hive, and Kafka.
$222 USD in 2 days
0.0 (0 reviews)
I have worked extensively with Python, ETL, databases, and Airflow in both linear and distributed environments.
$240 USD in 5 days
0.0 (0 reviews)

About this client

Los Angeles, United States
0.0
0
Member since Jan 21, 2021

Client verification
