Closed

Python power programmer needed to create a function to ingest and process data as a stream

Looking for a Python developer / data engineer.

Should have experience ingesting and processing data as a stream.

Demonstrable experience handling 2-3 GB of source data.

Knowledge of object-oriented programming concepts, professional documentation methods, and Python lambda functions is a must.

Environment: Oracle VM VirtualBox, Linux Ubuntu 14.04.5 LTS, PyCharm, Anaconda.

Data is available as TSV extracts from multiple sources in CDL. The data engineer should merge the TSV extracts using the correct join techniques. Because the data is delivered in compressed form, the engineer should read it as a stream rather than decompressing it in full, since the uncompressed data might not fit into memory. Memory-efficient code is therefore expected. The merged data will be transformed and stored in a PostgreSQL database.
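A minimal sketch of the streaming approach described above. It assumes the extracts are gzip-compressed (the posting does not name the compression format) and that one side of the join is small enough to hold in memory as a dictionary. The function names `stream_rows`, `load_lookup`, and `left_join` are illustrative only.

```python
import csv
import gzip
from typing import Dict, Iterator, List


def stream_rows(path: str) -> Iterator[List[str]]:
    """Yield one TSV row at a time from a gzip-compressed file."""
    # gzip.open in text mode decompresses incrementally, so memory use stays
    # small regardless of the uncompressed file size.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters in the extract as plain data.
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def load_lookup(path: str) -> Dict[str, str]:
    """Load a small two-column extract (key, value) fully into memory."""
    return {row[0]: row[1] for row in stream_rows(path)}


def left_join(main_path: str, lookup_path: str, key_index: int) -> Iterator[List[str]]:
    """Stream the large extract and append the matching lookup value to each row."""
    lookup = load_lookup(lookup_path)
    for row in stream_rows(main_path):
        # Rows without a match get an empty string (left-join semantics).
        yield row + [lookup.get(row[key_index], "")]
```

Because `left_join` is a generator, the caller can consume the merged rows one at a time or collect them into mini-batches, and the large extract is never held in memory as a whole.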

The function should follow the object-oriented paradigm, with continuous integration and deployment in mind. Version control is also expected.

Some remarks:

- Each data snapshot can contain multiple headerless main data files in TSV format, each up to 2 GB in size. The engineer should read the files as a stream while decompressing them, because they usually do not fit into RAM.

- In addition to the main data files, each snapshot has a file with the header names and multiple lookup files that map the numeric IDs in the main data to strings, comparable to foreign keys in an SQL database.

- Data should be read and transformed on a record-by-record basis (stream or mini-batch processing).

- Each combined and transformed record should be prepared for multiple data sinks, e.g. as SQL query strings that write the record into PostgreSQL or MS SQL. The engineer will write an adapter for each data sink, all sharing a common interface, so that the same function call can be used to write to any of the specified sinks.

- Code should be modular, reusable, and well documented. The engineer needs to know how to build Python modules with classes, using OOP decomposition practices and inheritance (e.g. abstract classes).

- Code should have unit tests where appropriate.

- Code will be deployed as a Python AWS Lambda function. The engineer should be familiar with building Lambda functions and should ideally have a local development environment set up for building and uploading them.
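The write-adapter remark above could be sketched roughly as follows, using an abstract base class for the common interface. All names here (`SinkAdapter`, `PostgresAdapter`, `MsSqlAdapter`, `to_record`, the `hits` table) are hypothetical. The sketch emits parameterized statements rather than fully interpolated SQL strings, which is a design choice made here and not something the posting specifies. The placeholder styles assume the psycopg2 (`%s`) and pyodbc (`?`) drivers.

```python
from abc import ABC, abstractmethod
from typing import Dict, Sequence, Tuple

Statement = Tuple[str, Tuple[str, ...]]


def to_record(header: Sequence[str], row: Sequence[str]) -> Dict[str, str]:
    """Pair the names from the header file with one headerless data row."""
    return dict(zip(header, row))


class SinkAdapter(ABC):
    """Common interface: each sink turns a record into an INSERT statement."""

    def __init__(self, table: str) -> None:
        self.table = table

    @abstractmethod
    def quote(self, identifier: str) -> str:
        """Quote a table or column name in the sink's SQL dialect."""

    @abstractmethod
    def placeholder(self) -> str:
        """Return the parameter marker expected by the sink's driver."""

    def build_insert(self, record: Dict[str, str]) -> Statement:
        """Build (sql, params) for one record; identical call for every sink."""
        columns = ", ".join(self.quote(name) for name in record)
        marks = ", ".join(self.placeholder() for _ in record)
        sql = f"INSERT INTO {self.quote(self.table)} ({columns}) VALUES ({marks})"
        return sql, tuple(record.values())


class PostgresAdapter(SinkAdapter):
    def quote(self, identifier: str) -> str:
        return '"' + identifier.replace('"', '""') + '"'

    def placeholder(self) -> str:
        return "%s"  # assumption: psycopg2-style parameters


class MsSqlAdapter(SinkAdapter):
    def quote(self, identifier: str) -> str:
        return "[" + identifier.replace("]", "]]") + "]"

    def placeholder(self) -> str:
        return "?"  # assumption: pyodbc-style parameters


record = to_record(["id", "browser"], ["1", "Chrome"])
for adapter in (PostgresAdapter("hits"), MsSqlAdapter("hits")):
    print(adapter.build_insert(record))
```

Keeping the shared logic in `build_insert` and only the dialect details in the subclasses means a further sink can be added by implementing two small methods, and each adapter can be unit tested without a database connection.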

Skills: AWS Lambda, Data Processing, Linux, Object Oriented Programming (OOP), Python


Employer information:
( 35 reviews ) Faridabad, India

Project ID: #18026938

2 freelancers are bidding an average of $194 for this job

$166 NZD in 3 days
(2 reviews)
2.8
HelloLeon

Experienced Python Developer here! Would like to give it a try~ Looking forward to a chat about the project details!

$222 NZD in 8 days
(0 reviews)
0.0