
Simple web crawler

$10-30 USD

Closed
Posted over 7 years ago

$10-30 USD

Paid on delivery
Create a PHP/MySQL application that crawls a list of URLs, ordered by a priority number, using a master/slave system. The master and slaves will most likely run on Ubuntu/Debian EC2 instances with a LAMP stack and php5-curl installed (to make the requests). The code can be developed on Windows, but it has to work on a Linux filesystem.

The main server/database (let's call it MAIN) will have a MySQL database with a few tables:

Urls - (Url, Priority, SlaveId)
Slaves - (SlaveId, ServerIP, QueueSize, State)

State options: Online, Offline. Priorities will be 1-5.

Each slave reports its state to MAIN every 5 minutes, confirming it is Online. If MAIN doesn't hear from a slave within 5 minutes, it marks that slave's state as Offline. URLs are removed once a slave completes them (the slave issues a SQL DELETE against MAIN). New URLs are added to the Urls table and can be assigned to the slaves in round-robin fashion (it doesn't need to be perfectly balanced, but if there are 5 new URLs they should go to slave1, slave2, slave3, etc.).

The rebalancing algorithm needs to run immediately when a slave goes offline or comes online, and every 1 minute otherwise. MAIN's job is to assign slaves to the URLs and balance the workload across all slaves as evenly as possible. If a slave gets marked Offline, or a new slave comes online, all queued URLs get redistributed appropriately, making sure not only that the number of URLs assigned to each slave is even but also that the average priority is about the same.

Each SLAVE's job is to process its assigned URLs in priority order (5 is the highest priority). The slave will use php5-curl to make a request to the URL and save the contents of the response to a file on the hard drive. Then it will report to MAIN that its queue is one shorter, and it will delete the URL record it just processed.
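The two tables named above could be sketched in MySQL roughly as follows. The posting only lists the column names, so all types, sizes, and defaults here are assumptions:

```sql
-- Sketch of the MAIN database schema; column types are guesses.
CREATE TABLE Slaves (
    SlaveId   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    ServerIP  VARCHAR(45)  NOT NULL,               -- long enough for IPv6
    QueueSize INT UNSIGNED NOT NULL DEFAULT 0,
    State     ENUM('Online','Offline') NOT NULL DEFAULT 'Offline'
);

CREATE TABLE Urls (
    Url      VARCHAR(2048) NOT NULL,
    Priority TINYINT UNSIGNED NOT NULL,            -- 1-5, 5 is highest
    SlaveId  INT UNSIGNED NULL,                    -- NULL until assigned
    FOREIGN KEY (SlaveId) REFERENCES Slaves (SlaveId)
);
```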
Project ID: 13075992

About the project

7 proposals
Remote project
Active 7 years ago

7 freelancers are bidding an average of $125 USD for this job
Message me before you gonna project to me
$15 USD in 1 day
5.0 (59 reviews)
5.5
Hello, I am available for your job and can start right now. I will provide you good-quality work with a fast turnaround. Please hire me for this project. Waiting for your kind reply for further discussion. Humfi
$250 USD in 1 day
5.0 (8 reviews)
3.3
Hello, hope you are doing well. I read your project description. Let's have a technical discussion so we can understand the scope, negotiate cost and timeline, and then proceed. I will also show my past work when we talk. Thanks!
$100 USD in 2 days
4.7 (10 reviews)
3.4
Hello, I have read your requirement and can help you finish this work. Can you provide more information about the project? I can use Python to scrape. Thank you
$25 USD in 1 day
5.0 (1 review)
1.5
Hey, I've bid, but I'd also like to say that based upon what you are trying to achieve (central list of URLs, worker servers visit them and store the contents somewhere) I would approach it differently. I'd need to clarify what exactly you are trying to do, but if you just want a prioritised list of URLs fetched and stored, with some form of rate limiting, here's how I would approach it:

Create a MySQL database on a central node and write a small Laravel/Lumen PHP app which, when queried, responds with a URL that needs to be fetched, according to priorities (sort by priority, then sort by age). Once the worker has fetched and stored the page, it tells the central app, which then marks it as completed. The worker nodes/servers would themselves be small Laravel/Lumen PHP apps (although it could also be done in Python, possibly even Bash) which run on a crontab. This way you can add new workers, run several workers on one server, do whatever you want, and there's no need for complicated offline/online scenarios: workers simply take work according to their configured time interval.

I can see your budget is low for this, so apologies for the wall of text if you're not looking to spend much. If you're looking to do something properly and pay higher, get in touch. If you're looking for something cheap that'll do the job, use one of the other bidders, but I hope maybe this was helpful!

Thanks, Dan
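The selection rule this bidder describes (highest priority first, oldest first within the same priority) can be sketched as a small function. Python is used here only for illustration; the function name and record fields are assumptions, not part of the bidder's design:

```python
from datetime import datetime, timedelta

def next_url(queue):
    """Pick the next URL to fetch: highest priority wins (5 is highest),
    and within the same priority the oldest entry wins.
    `queue` is a list of dicts with 'url', 'priority', and 'added' keys."""
    if not queue:
        return None
    # Negating the timestamp makes older entries sort higher on ties.
    return max(queue, key=lambda row: (row["priority"],
                                       -row["added"].timestamp()))
```

A worker would call this against the central app's pending list, fetch the returned URL, then report completion so the row is marked done.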
$222 USD in 10 days
0.0 (0 reviews)
0.0

About this client

Adrian, United States
5.0
1
Payment method verified
Member since May 2, 2012

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)