
Build an Online Store

min $50000 USD

Closed
Posted about 7 years ago


Paid on delivery
Large Scale Crawler

Looking for a developer (or company) to build a robust web crawler system. There are more than 20,000 websites that we want to crawl and extract data from. We want to be able to extract this data within 3-6 months.

1. Design the architecture of the crawler, or use an existing open source crawler as a template. Because we're dealing with a large volume of data, the architecture needs to:
• Be robust and scalable
• Be efficient and fast
• Support proxies (to bypass anti-scraping systems)

2. Create an Admin dashboard where the Admin can:
a. Add, edit, view, delete, stop, and search crawlers
b. Input the URL to crawl
c. Specify the data that needs to be extracted (e.g. Title, Title URL, etc.)
d. View, edit, and delete extracted data
e. Download the data in JSON, XML, or CSV
f. Access the data through an API (either via authorization tokens or other means) for upload and integration
h. Manage users with an ACL (Access Control List): create, edit, view, and delete users

3. Data normalization and clean-up. The incoming data is unformatted and unstructured. Location or city is one example: some sites list it as Houston, TX, while others list it as Houston, Texas or USA-TX-Houston. The location or city data therefore needs to be formatted consistently; we use Google Location.

4. Because the data changes daily on these 20,000+ websites, notifications need to be put in place to inform the system of the changes (i.e. what's been added and what's been removed) and update the data automatically.

5. Once the data is verified and cleansed, it will be made available for search via Solr, ElasticSearch, or any other recommended engine.
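Items 3 and 4 above can be illustrated with a minimal Python sketch. It is not part of the posting: the function names, the small state table, and the choice of "City, ST" as the canonical form are all assumptions made for illustration. The posting says location formatting relies on Google Location; the local lookup table here is only a stand-in for that service.

```python
from __future__ import annotations

import re

# Hypothetical lookup table (assumption); a real system would cover every
# state or delegate to a geocoding service instead.
STATE_ABBREVIATIONS = {"texas": "TX", "california": "CA", "new york": "NY"}
KNOWN_CODES = set(STATE_ABBREVIATIONS.values())


def normalize_location(raw: str) -> str:
    """Map 'Houston, TX', 'Houston, Texas' and 'USA-TX-Houston' to 'Houston, TX'."""
    text = raw.strip()
    # Form 'USA-TX-Houston': country, state code, city separated by hyphens.
    match = re.fullmatch(r"USA-([A-Za-z]{2})-(.+)", text)
    if match:
        state, city = match.group(1), match.group(2).strip()
    else:
        # Form 'City, State', where State is a two-letter code or a full name.
        city, _, state = (part.strip() for part in text.partition(","))
    state = STATE_ABBREVIATIONS.get(state.lower(), state.upper())
    if not city or state not in KNOWN_CODES:
        # Unrecognized input is rejected so it can be flagged for manual review.
        raise ValueError(f"cannot normalize location: {raw!r}")
    return f"{city.title()}, {state}"


def diff_snapshots(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) record keys between two crawls of the same site."""
    return current - previous, previous - current
```

Calling `normalize_location` on each of the three spellings from the posting yields the same string, and `diff_snapshots` reports which record keys appeared or disappeared between two crawls, which is the information a change notification would carry.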
Some of the technical challenges that need to be addressed from the beginning:
• Make sure that the crawler compresses the data before fetching it; otherwise it will use a huge amount of storage
• There is no need to re-crawl a website in full every 1-2 days, because that would be a waste of resources; however, we do want fresh data every 1-2 days
• Ways to protect the crawler from DoS (Denial of Service)
• Ways to prevent the system from crashing or overloading because so many crawlers are running
• The system should be scalable enough to handle crawling 100,000 to 200,000 websites
• Queuing: does the crawler start right away, or does it run in batches at a certain time? How does it scale when we start adding more sites to crawl?

Example:
Day 1: Admin adds 100 sites to crawl
Day 2: Admin adds 200 sites to crawl
Day 3: Admin adds 500 sites to crawl
Day 4: etc.
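One possible answer to the queuing question is a scheduler that hands out sites in bounded batches. The sketch below is an assumption, not the poster's design: `CrawlScheduler`, `ScheduledSite`, and their parameters are hypothetical names. It keeps sites in a priority queue ordered by due time, so newly added sites are picked up as capacity allows and each site comes due again after a fixed re-crawl interval.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ScheduledSite:
    next_due: float                   # timestamp (seconds) of the next crawl
    url: str = field(compare=False)   # excluded from ordering comparisons


class CrawlScheduler:
    """Hands out due sites in bounded batches so crawl load stays capped."""

    def __init__(self, recrawl_interval: float, batch_size: int) -> None:
        self.recrawl_interval = recrawl_interval
        self.batch_size = batch_size
        self._heap: list[ScheduledSite] = []

    def add_site(self, url: str, now: float) -> None:
        # New sites are due immediately; they wait their turn if the
        # current batch is already full.
        heapq.heappush(self._heap, ScheduledSite(now, url))

    def next_batch(self, now: float) -> list[str]:
        """Pop up to batch_size due sites and reschedule each one."""
        batch: list[str] = []
        while (
            self._heap
            and len(batch) < self.batch_size
            and self._heap[0].next_due <= now
        ):
            site = heapq.heappop(self._heap)
            batch.append(site.url)
            # Re-queue the site for its next crawl after the interval.
            heapq.heappush(
                self._heap, ScheduledSite(now + self.recrawl_interval, site.url)
            )
        return batch
```

Under this scheme, adding 100, 200, then 500 sites on successive days only lengthens the queue; the number of sites crawled at once is still limited by `batch_size`, which is one way to keep the system from overloading. It assumes a positive `recrawl_interval`, and it does not address per-host rate limits or avoiding downloads of unchanged pages.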
Project ID: 13528239

About the project

12 proposals
Remote project
Active 7 years ago

12 freelancers are bidding an average of $53,931 USD for this job
Hello sir, I hope you are doing well. I have read your requirements carefully and I am confident that I can execute them successfully. I am an expert in PHP, Laravel Framework, Magento, WordPress & WooCommerce, Drupal, Joomla, and website design. I have 6+ years of experience in website design and development. Please have a look at my profile; I have successfully completed many projects. I work round the clock and am available for discussions anytime. I am available now and ready to start the project immediately. I can provide work samples in private chat. Message me for further discussion. Many thanks for the opportunity to bid on the project. Thanks & Regards, Gamdur Singh
$50,000 USD in 10 days
4.9 (164 reviews)
7.2
Hello, I would like to show you relevant demos and designs from previously completed projects similar to yours. To confirm the requirements and customizations, I would like to discuss this project with you further in private chat. Let me know the most suitable time for you to schedule a meeting. Feel free to message me at any time; I am online 24x7 on Freelancer, so you will probably get a quick response from me. My areas of expertise are:
1) PHP with CodeIgniter and Laravel Framework
2) Node JS
3) Angular JS
4) Mobile App Development
Thanks
$51,546 USD in 40 days
5.0 (20 reviews)
6.7
Hi mate, I'd be glad to assist with web development. I have read the description carefully, understand the requirements, and have planned how to proceed. I am excited about this opportunity and I have a strong feeling that I could be the best fit for this job. I have 5+ years of experience with web development, proven experience in MySQL, HTML5, CSS3, JavaScript, and Ajax, and strong jQuery skills. Excellent command of MVC frameworks. Good experience working on large projects. We can discuss the work further on chat. Thanks, Vishal
$50,000 USD in 60 days
4.8 (68 reviews)
5.9

About this client

Netherlands
0.0
0
Member since Mar 26, 2017

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)