The Job
Scrape deal information from 4 websites and store it in an Excel file.
My Work
I have developed, in Java, a generic web-scraper tool called OpenMana Web Information Miner (OmanaWIM) that can be configured to scrape any information from any website. It can log in, process JavaScript / AJAX call results, follow multi-level links, submit search forms and handle pagination. It can accept and process XML responses, download images and files, and use proxies. It is multi-threaded in a configurable way and supports user-specifiable filters. Scraped information can be delivered as JSON or XML, or written to a database or an Excel/CSV file.
THERE WILL BE NO NEED TO WRITE SITE-SPECIFIC CODE. THE TOOL CAN ALSO SERVE FUTURE NEEDS.
When the page or navigation structure of a website changes, there is no need to write new code; just tweak the configuration.
The tool can also work directly with sites that expose HTTP-based APIs or web services.
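To illustrate the configuration-driven idea described above, here is a minimal sketch in plain Java. It is not OmanaWIM code, and the proposal does not show the tool's actual configuration format. The class name ConfigDrivenExtractor, the field names and the regex-based rules are all hypothetical, chosen only to show how extraction rules can live in configuration rather than in site-specific code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of OmanaWIM.
public class ConfigDrivenExtractor {

    // Field name -> compiled regex with one capturing group.
    // In a real setup these rules would be loaded from a per-site config file.
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    public ConfigDrivenExtractor(Map<String, String> fieldRegexes) {
        for (Map.Entry<String, String> e : fieldRegexes.entrySet()) {
            rules.put(e.getKey(), Pattern.compile(e.getValue(), Pattern.DOTALL));
        }
    }

    // Applies every configured rule to the page; a field with no match is left empty.
    public Map<String, String> extract(String html) {
        Map<String, String> record = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : rules.entrySet()) {
            Matcher m = e.getValue().matcher(html);
            record.put(e.getKey(), m.find() ? m.group(1).trim() : "");
        }
        return record;
    }

    public static void main(String[] args) {
        // Assumed "site configuration": if the site's markup changes,
        // only these strings need to change, not the Java code.
        Map<String, String> siteConfig = new LinkedHashMap<>();
        siteConfig.put("title", "<h2 class=\"deal-title\">(.*?)</h2>");
        siteConfig.put("price", "<span class=\"price\">(.*?)</span>");

        String html = "<div><h2 class=\"deal-title\"> Spa Day </h2>"
                    + "<span class=\"price\">$49</span></div>";

        // prints {title=Spa Day, price=$49}
        System.out.println(new ConfigDrivenExtractor(siteConfig).extract(html));
    }
}
```

Regular expressions are used here only to keep the sketch free of dependencies; a production scraper would more likely rely on a DOM-based approach such as the libraries listed under My Solution.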
My Solution
I propose a solution in Java, built on top of my OmanaWIM tool.
The solution will use the following open-source libraries:
1. Selenium WebDriver with Firefox
2. HtmlUnit
3. Castor XML
Deliverables
1. Perpetual, non-exclusive, non-transferable, node-bound use licence for the OmanaWIM tool, with an executable Java application.
2. Custom Java classes for populating Excel.
3. Configuration files for the 4 websites; additional sites at an extra $30 per site.
4. Installation Guide
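As a rough illustration of deliverable 2, the sketch below shows one way scraped rows could be turned into a file that Excel opens. It is an assumption-laden stand-in, not the delivered classes: it writes CSV using only the Java standard library, whereas producing a native Excel workbook would need an additional library that this proposal does not name. The class name DealCsvWriter and the column names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; the delivered classes may look different.
public class DealCsvWriter {

    // Quotes a field when it contains a comma, a double quote or a line break,
    // doubling any embedded double quotes (RFC 4180 style).
    public static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Builds the whole CSV document: one header line followed by one line per row.
    public static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        appendRow(sb, header);
        for (List<String> row : rows) {
            appendRow(sb, row);
        }
        return sb.toString();
    }

    private static void appendRow(StringBuilder sb, List<String> cells) {
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(cells.get(i)));
        }
        sb.append("\r\n");
    }

    public static void main(String[] args) {
        // Assumed column layout and sample data, for demonstration only.
        List<String> header = Arrays.asList("Site", "Deal", "Price");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("example.com", "Dinner for two, drinks included", "$35"));
        System.out.print(toCsv(header, rows));
    }
}
```

The returned string can be saved with java.nio.file.Files.write and opened in Excel directly.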
Me
15+ years of rich experience in software development.