Expired domain scraper script or Outgoing Links Scraper

Closed. Posted Jul 28, 2015. Paid on delivery.

Project Description:

I am looking to have custom software or a script built that will scrape the outgoing links from a particular website, which we call the "seed site"; in other words, the backlinks that this website gives to other sites.

The project has two parts:

Part 1: SCRAPER

Example: Let's consider [url removed, login to view] as the seed site. I want to scrape all the domains that [url removed, login to view] links out to, that is, every domain it gives a backlink to. For example, a post about the domain "[url removed, login to view]" is published on BBC and contains a backlink to that domain. BBC links out to thousands of sites, and I want to extract all of them.

This should not be limited to BBC: the software must work for any seed site I enter.
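A minimal sketch of the core of Part 1, assuming the page HTML has already been downloaded: collect every `href`, resolve it against the page URL with `java.net.URI`, and keep the hosts that differ from the seed page's host. The class and method names are hypothetical, and the regex stands in for a real HTML parser only to keep the example dependency-free.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts the external hosts a page links out to.
 * A regex is used only to avoid dependencies; an HTML parser is more robust.
 */
final class OutgoingLinkExtractor {
    // Matches href="..." or href='...' (case-insensitive).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    private OutgoingLinkExtractor() {}

    /** Returns hosts other than the page's own host, in first-seen order, without duplicates. */
    static Set<String> externalHosts(String html, String pageUrl) {
        Set<String> hosts = new LinkedHashSet<>();
        URI base = URI.create(pageUrl);
        String seedHost = base.getHost().toLowerCase(Locale.ROOT);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative links ("/about", "post.html") against the page URL.
                URI link = base.resolve(new URI(m.group(1).trim()));
                String scheme = link.getScheme();
                String host = link.getHost();
                // Skip mailto:, javascript: and other non-HTTP links (they have no host).
                if (scheme == null || host == null
                        || !scheme.toLowerCase(Locale.ROOT).startsWith("http")) {
                    continue;
                }
                host = host.toLowerCase(Locale.ROOT);
                if (!host.equals(seedHost)) {
                    hosts.add(host);
                }
            } catch (URISyntaxException | IllegalArgumentException e) {
                // Malformed href: skip it rather than abort the whole page.
            }
        }
        return hosts;
    }
}
```

In this sketch a different subdomain of the seed site (for example `news.` under a `www.` seed) is reported as external; whether that is wanted depends on how the "work with subdomains" requirement is interpreted.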

Part 2: Check domain metrics via API integration

After scraping, I want to check metrics for the extracted domains, such as PA (Page Authority), DA (Domain Authority), TF (Trust Flow), etc. This means the software should integrate with the APIs of the [url removed, login to view], [url removed, login to view] and [url removed, login to view] services. It should also check whether each domain is available for registration.
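One way to keep the scraper independent of any single vendor is to put the API calls behind a small interface. The sketch below is an assumption, not the Moz or Majestic API: `MetricsProvider`, `DomainMetrics`, and the shortlist filter are hypothetical names, and a real provider would have to implement each vendor's documented request format plus a registrar or WHOIS availability check.

```java
import java.util.ArrayList;
import java.util.List;

/** Metrics for one domain; field names are illustrative, not any vendor's schema. */
record DomainMetrics(String domain, int domainAuthority, int pageAuthority,
                     int trustFlow, int citationFlow, boolean available) {}

/**
 * Hypothetical abstraction over the third-party services. A real implementation
 * would call the metric APIs and an availability check; none of that is modeled here.
 */
interface MetricsProvider {
    DomainMetrics lookup(String domain);
}

final class MetricsFilter {
    private MetricsFilter() {}

    /** Keeps domains that are available for registration and meet the DA and TF minimums. */
    static List<DomainMetrics> shortlist(List<String> domains, MetricsProvider provider,
                                         int minDa, int minTf) {
        List<DomainMetrics> keep = new ArrayList<>();
        for (String d : domains) {
            DomainMetrics m = provider.lookup(d);
            // A null result (e.g. a failed lookup) is simply skipped.
            if (m != null && m.available()
                    && m.domainAuthority() >= minDa && m.trustFlow() >= minTf) {
                keep.add(m);
            }
        }
        return keep;
    }
}
```

Because `MetricsProvider` has a single method, a stub can be supplied as a lambda during testing, and the real Moz, Majestic, and availability clients can be swapped in later without touching the scraper.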

I am aware that many similar scripts have been built successfully on Freelancer, and I would be glad to award this project to someone who has built one.

__________________________________________________________________________

Inputs to the tool

------------------

* Mandatory - 1 or more seed urls

* Optional - Crawl depth (Default value = 0, max value = 10)

* Optional - TLD list (Default values = [.org, .net, .com, .info, .biz]). If the user enters TLDs, append them to the default ones.

* Optional - Number of parallel threads to use. (Default value = 6)

* Optional - Proxy server configuration
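A possible container for these inputs, using the defaults listed above: depth 0 (maximum 10), the five default TLDs with user TLDs appended, 6 threads, and an optional proxy. This is a sketch; the class name is hypothetical, and applying the TLD list as a filter on scraped hosts (`acceptsHost`) is an assumption about how the list is meant to be used. The `java.net.Proxy` it builds can be passed to `URL.openConnection(Proxy)`.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical holder for the tool's inputs, with the defaults from the spec. */
final class CrawlerConfig {
    static final List<String> DEFAULT_TLDS = List.of(".org", ".net", ".com", ".info", ".biz");
    static final int MAX_DEPTH = 10;
    static final int DEFAULT_THREADS = 6;

    final List<String> seedUrls;
    final int depth;
    final Set<String> tlds;
    final int threads;
    final Proxy proxy; // Proxy.NO_PROXY when no proxy is configured

    /** Null arguments mean "use the default"; proxyHost == null means no proxy. */
    CrawlerConfig(List<String> seedUrls, Integer depth, List<String> extraTlds,
                  Integer threads, String proxyHost, int proxyPort) {
        if (seedUrls == null || seedUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one seed URL is required");
        }
        this.seedUrls = List.copyOf(seedUrls);
        this.depth = depth == null ? 0 : depth;
        if (this.depth < 0 || this.depth > MAX_DEPTH) {
            throw new IllegalArgumentException("depth must be between 0 and " + MAX_DEPTH);
        }
        // User-supplied TLDs are appended to the defaults, normalized to ".xyz" form.
        Set<String> t = new LinkedHashSet<>(DEFAULT_TLDS);
        if (extraTlds != null) {
            for (String tld : extraTlds) {
                String n = tld.trim().toLowerCase(Locale.ROOT);
                t.add(n.startsWith(".") ? n : "." + n);
            }
        }
        this.tlds = t;
        this.threads = threads == null ? DEFAULT_THREADS : threads;
        if (this.threads < 1) {
            throw new IllegalArgumentException("threads must be at least 1");
        }
        // The proxy host name is not resolved here; only the address is recorded.
        this.proxy = proxyHost == null
                ? Proxy.NO_PROXY
                : new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }

    /** True if the host ends with one of the accepted TLDs (case-insensitive). */
    boolean acceptsHost(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String tld : tlds) {
            if (h.endsWith(tld)) return true;
        }
        return false;
    }
}
```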

Output from the tool

--------------------

* CSV file with list of domain names scraped
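Domain names rarely need quoting, but URLs or messages written next to them can contain commas or quotes, so a writer that applies RFC 4180 quoting is safer than joining strings with commas. A minimal sketch with a hypothetical class name; the same helper could also produce the separate file of invalid links, for example with `url,status` columns.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

/** Hypothetical CSV helper following RFC 4180 quoting rules. */
final class CsvOutput {
    private CsvOutput() {}

    /** Quotes a field when it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Writes a header row and then one CRLF-terminated row per record. */
    static void write(Writer out, List<String> header, List<List<String>> rows) throws IOException {
        writeRow(out, header);
        for (List<String> row : rows) {
            writeRow(out, row);
        }
        out.flush();
    }

    /** Convenience wrapper that renders the CSV into a String. */
    static String toCsv(List<String> header, List<List<String>> rows) {
        StringWriter sw = new StringWriter();
        try {
            write(sw, header, rows);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringWriter does not actually throw
        }
        return sw.toString();
    }

    private static void writeRow(Writer out, List<String> cells) throws IOException {
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) out.write(',');
            out.write(escape(cells.get(i)));
        }
        out.write("\r\n");
    }
}
```

To write an actual file, `write` can be given a `Writer` from `java.nio.file.Files.newBufferedWriter(path)`.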

Requirements

-------------

SCRAPER :

1) Take 1 or more seed URLs as input via a UI field or from a file

2) Take the crawl/scrape depth (e.g., 1, 2, 3, and so forth), to be set in a parameter field

3) Take TLDs from a list, to be set in a parameter field (.org, .net, .com, .info, .biz; the customer must be able to add more of their preferred TLDs)

4) It also needs to work with subdomains

5) Crawl the URLs for backlinks, showing progress (for example, a count of the processed URLs) so the customer can see that the tool is working

6) If the backlink is invalid (e.g., HTTP 404 not found), write it to a separate file

7) If the depth is 0, crawl only the seed URL and domain. If the depth is 1, crawl the backlink domain [url removed, login to view] depth is 3, count the backlinks of the backlinks, and so forth.

8) Option to use proxies (configured in a parameter field)

9) Use multiple threads to scrape

10) Save the invalid links to a CSV file

11) Build a web application using JSP that runs on Tomcat. The WordPress site / pop-up window:

a) should display the status of the scraping

b) should work in all browsers
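Requirements 5 to 9 fit a level-by-level (breadth-first) crawl: fetch every URL at the current depth in parallel on a fixed thread pool, record failures separately, then continue with the links found on that level. The sketch assumes one reading of requirement 7 (depth 0 fetches only the seeds, depth 1 also fetches the pages they link to, and so on). `PageFetcher` and `FetchResult` are hypothetical; a real fetcher might use `HttpURLConnection` opened through the configured proxy, and `processedCount()` is a value a JSP status page could poll to show progress.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** HTTP status plus the absolute URLs found on a page. */
record FetchResult(int status, List<String> links) {}

/** Hypothetical fetch hook, so the crawl logic can be tested without a network. */
interface PageFetcher {
    FetchResult fetch(String url) throws Exception;
}

final class DepthCrawler {
    private final PageFetcher fetcher;
    private final int threads;
    private final AtomicInteger processed = new AtomicInteger();
    /** URL -> HTTP status (-1 for a network error); destined for the invalid-links CSV. */
    final Map<String, Integer> invalid = new ConcurrentHashMap<>();

    DepthCrawler(PageFetcher fetcher, int threads) {
        this.fetcher = fetcher;
        this.threads = threads;
    }

    /** Number of pages fetched so far (successful or not). */
    int processedCount() {
        return processed.get();
    }

    /** Fetches levels 0..depth and returns every URL discovered, in first-seen order. */
    Set<String> crawl(List<String> seeds, int depth) {
        Set<String> visited = new LinkedHashSet<>(seeds);
        Set<String> discovered = new LinkedHashSet<>();
        List<String> frontier = new ArrayList<>(visited);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int level = 0; level <= depth && !frontier.isEmpty(); level++) {
                List<Callable<List<String>>> tasks = new ArrayList<>();
                for (String url : frontier) {
                    tasks.add(() -> visit(url));
                }
                List<String> next = new ArrayList<>();
                // invokeAll waits for the whole level and returns futures in task order.
                for (Future<List<String>> f : pool.invokeAll(tasks)) {
                    for (String link : f.get()) {
                        discovered.add(link);
                        if (visited.add(link)) next.add(link);
                    }
                }
                frontier = next;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop early, keep what was found
        } catch (ExecutionException e) {
            // visit() catches its own exceptions, so reaching this indicates a bug.
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return discovered;
    }

    private List<String> visit(String url) {
        try {
            FetchResult r = fetcher.fetch(url);
            if (r.status() >= 400) { // e.g. 404 Not Found
                invalid.put(url, r.status());
                return List.of();
            }
            return r.links();
        } catch (Exception e) {
            invalid.put(url, -1);
            return List.of();
        } finally {
            processed.incrementAndGet();
        }
    }
}
```

Links found on the last level are reported but never fetched, so their validity is unknown; if every reported link must be verified, an extra status check on that final level would be needed.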

DOMAIN METRICS CHECKER

1) Upload all the domains via the UI or a text file

2) Check Moz DA and PA, Majestic Trust Flow and Citation Flow, and domain availability
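Before any metrics API is called, an uploaded list usually needs cleaning so the same domain is not looked up twice. A minimal sketch, assuming one entry per line: it lowercases each entry, drops schemes, paths, and ports, strips a leading `www.`, and removes duplicates. Treating `#` lines as comments and stripping only `www.` (other subdomains are kept) are assumptions; the class name is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical normalizer for an uploaded domain list, one entry per line. */
final class DomainListNormalizer {
    private DomainListNormalizer() {}

    static Set<String> normalize(List<String> lines) {
        Set<String> domains = new LinkedHashSet<>();
        for (String raw : lines) {
            String s = raw.trim().toLowerCase(Locale.ROOT);
            if (s.isEmpty() || s.startsWith("#")) continue; // skip blanks and comment lines
            int schemeEnd = s.indexOf("://");
            if (schemeEnd >= 0) s = s.substring(schemeEnd + 3); // drop "http://", "https://"
            s = s.split("[/?#:]", 2)[0]; // drop path, query, fragment, and port
            if (s.startsWith("www.")) s = s.substring(4);
            if (s.endsWith(".")) s = s.substring(0, s.length() - 1); // trailing root dot
            if (s.contains(".")) domains.add(s); // ignore bare names such as "localhost"
        }
        return domains;
    }
}
```

For a text file, the lines could come from `Files.readAllLines(Path.of("domains.txt"))`, where the file name is only an example.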

Deliverables & Scopes

---------------------

The developer will provide the employer with the following deliverables:

1) A standalone Java program that performs the scraping

2) A web page to enter the inputs mentioned above

Example of an existing, working domain scraper of this kind:

[url removed, login to view]

Skills: Java, PHP, Software Architecture

Project ID: #8159095

About the project

10 proposals. Remote project. Active Oct 6, 2015.

10 freelancers are bidding an average of $494 for this job

malviyamanish

A proposal has not yet been provided

$631 USD in 20 days
(92 reviews)
6.2
Truemanhardy

Hello Hiring Manager, I have read your job description very carefully and I’m very energised to provide my solutions for your job. According to your requirements, I have done many projects, for example backlinking. I a…

$473 USD in 5 days
(4 reviews)
3.2
dk2k

Hi! I have to edit my bid after discussion in the chat. I hope we understood each other :) _some_text_

$277 USD in 6 days
(4 reviews)
2.6
nagarajdvdg2009

Hi, I have 5 years of experience in Java J2EE technologies. I have extensive working knowledge of Spring and front-end technologies. I can do this job for a lower price. Hope we can have a conversation.

$555 USD in 10 days
(2 reviews)
1.9
gminfotechindia

This distinct approach is physically embodied in our Information technology design centers – the only facilities of their kind – where clients create value and tackle specific challenges through design immersion.

$555 USD in 10 days
(0 reviews)
0.0