C# desktop app to crawl websites and save HTML as JSON
$250-750 USD
Closed
Posted about 9 years ago
Paid on delivery
I would like a C# desktop app that will crawl a list of URLs/domains and save the HTML pages to a local folder. Here is how it should work:
I have a TXT file of all the domains I need crawled. The TXT file will contain rows like
ID | Name | MetaValue | Domain
The C# app will read some settings from a local config file:
---number of concurrent threads
---pages per second per domain (so we don't DoS a site)
---pages per second total (for the entire app, so I don't kill my machine)
---maximum number of pages to crawl in a single domain
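For illustration only, the settings could live in a small JSON file like the one below. The key names and values are placeholders I made up, not requirements, and any config format is fine as long as these four limits are adjustable.

```json
{
  "concurrentThreads": 8,
  "pagesPerSecondPerDomain": 1,
  "pagesPerSecondTotal": 50,
  "maxPagesPerDomain": 500
}
```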
Output will be one folder per row of the input file.
Each saved page will be a JSON document containing the info from that row plus a blob of the HTML from the site.
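A rough sketch of the row parsing and JSON output, in case it helps when you describe your approach. Everything here is an assumption for illustration: the type and member names (SiteRow, PageDocument, CrawlOutput) are hypothetical, the JSON field layout is not fixed, and it uses System.Text.Json and C# records, which need a recent .NET version. Fetching, throttling, and folder layout are left out.

```csharp
using System;
using System.Text.Json;

// Hypothetical shape of one input row: ID | Name | MetaValue | Domain
public record SiteRow(string Id, string Name, string MetaValue, string Domain);

// Hypothetical shape of one saved page: the row info plus the raw HTML.
public record PageDocument(SiteRow Row, string Url, string Html);

public static class CrawlOutput
{
    // Parses one pipe-delimited line; returns null if it does not have 4 fields.
    public static SiteRow? ParseRow(string line)
    {
        var parts = line.Split('|');
        if (parts.Length != 4) return null;
        return new SiteRow(parts[0].Trim(), parts[1].Trim(),
                           parts[2].Trim(), parts[3].Trim());
    }

    // Serializes the row info and the page HTML into one JSON document.
    public static string ToJson(SiteRow row, string url, string html) =>
        JsonSerializer.Serialize(new PageDocument(row, url, html));
}
```

The string returned by ToJson would then be written to a file inside the folder for that row. How the files and folders are named is up to you.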
I have found some libraries you might be able to leverage, but the approach is up to you; let me know what you plan to use when you bid. This needs to be very high performance, as we will be crawling millions of pages.
[login to view URL]
[login to view URL]
[login to view URL]