Crawl a university website and generate a database of books from it
$30-100 USD
Completed
Posted more than 13 years ago
Paid on delivery
The goal is to generate a database of university books by crawling a specific university website. Crawling means that starting from the root you will traverse and parse the pages to grab the required data.
From the university page you will get the list of studies (law, computer science...). Each study has a list of topics (civil rights, compiler design, etc.), and each topic has a list of books required for study.
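The traversal described above could be sketched roughly as follows. This is only an illustration: the URLs, the page contents and the `fetch` callable are made up, and the real site may need different link filtering.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(root_url, fetch, max_depth=3):
    """Breadth-first walk from root_url; returns {url: depth}.

    `fetch` is any callable mapping a URL to its HTML text (an assumption;
    in practice it would wrap an HTTP client).
    """
    seen = {root_url: 0}
    queue = deque([root_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # do not follow links beyond the book level
        parser = LinkCollector()
        parser.feed(fetch(url))
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen


# Made-up pages standing in for the real site: root -> studies -> topics.
PAGES = {
    "http://example.edu/": '<a href="/law">Law</a><a href="/cs">Computer science</a>',
    "http://example.edu/law": '<a href="/law/civil-rights">Civil rights</a>',
    "http://example.edu/cs": '<a href="/cs/compiler-design">Compiler design</a>',
    "http://example.edu/law/civil-rights": "",
    "http://example.edu/cs/compiler-design": "",
}

visited = crawl("http://example.edu/", PAGES.__getitem__)
```

Here the depth of each URL tells you what kind of page it is: 1 for a study, 2 for a topic, and 3 would be a book page if the topic pages linked to any.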
We need the complete list of studies, topics and books. For each book we will need the title, author, editor, ISBN number, picture, sizes, pages and price (all the information found on the page).
You can write a program to get this information in any language you want, as long as you are able to deliver the database with all the data (we prefer MySQL but will accept any other database as long as we can export the data later).
This is throwaway code (the important thing is the data): there are no expectations about code quality, except that it must do what has been described here.
The HTML to be parsed is itself generated dynamically from a database, so you should only need to write parsing code for a few types of pages, each with the same structure.
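Because the pages come from templates, one small parser per page type should be enough. The sketch below handles a book page, assuming (purely for illustration) that each field sits in an element whose `class` attribute names the field. The real markup is unknown and will almost certainly differ, and the sample values are placeholders.

```python
from html.parser import HTMLParser

# Hypothetical field names; the real pages may use different markup.
BOOK_FIELDS = {"title", "author", "editor", "isbn", "pages", "price"}


class BookPageParser(HTMLParser):
    """Reads one book page whose fields are tagged with class="<field>"."""

    def __init__(self):
        super().__init__()
        self.book = {}
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in BOOK_FIELDS:
            self._field = css_class

    def handle_data(self, data):
        if self._field and data.strip():
            self.book[self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


def parse_book(html):
    parser = BookPageParser()
    parser.feed(html)
    return parser.book


# Placeholder page content, not taken from any real site.
SAMPLE = """
<div class="book">
  <span class="title">Example Book</span>
  <span class="author">A. Author</span>
  <span class="isbn">000-0-00-000000-0</span>
  <span class="pages">123</span>
</div>
"""

book = parse_book(SAMPLE)
```

Fields that are missing from a page are simply absent from the resulting dictionary, which fits the request to capture whatever information the page happens to contain.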
Note that one topic may have more than one book, and that one book can be used in more than one topic (N:N relationship).
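One way to model the N:N relationship is a junction table between topics and books. The sketch below uses Python's built-in `sqlite3` only so that it runs without a server. The posting prefers MySQL, and all table names, column names and sample rows here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE topic (
    id INTEGER PRIMARY KEY,
    study_id INTEGER NOT NULL REFERENCES study(id),
    name TEXT NOT NULL
);
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    isbn TEXT UNIQUE,
    title TEXT NOT NULL,
    author TEXT, editor TEXT, picture_url TEXT,
    sizes TEXT, pages INTEGER, price TEXT
);
-- Junction table: one row per (topic, book) pair.
CREATE TABLE topic_book (
    topic_id INTEGER NOT NULL REFERENCES topic(id),
    book_id INTEGER NOT NULL REFERENCES book(id),
    PRIMARY KEY (topic_id, book_id)
);
""")

# Placeholder rows: one book required by two topics.
conn.execute("INSERT INTO study (id, name) VALUES (1, 'Computer science')")
conn.executemany("INSERT INTO topic (id, study_id, name) VALUES (?, 1, ?)",
                 [(1, "Compiler design"), (2, "Formal languages")])
conn.execute("INSERT INTO book (id, isbn, title) VALUES (1, '000-0', 'Example Book')")
conn.executemany("INSERT INTO topic_book VALUES (?, ?)", [(1, 1), (2, 1)])

topics_for_book = [row[0] for row in conn.execute(
    "SELECT t.name FROM topic t JOIN topic_book tb ON tb.topic_id = t.id "
    "WHERE tb.book_id = 1 ORDER BY t.name")]
```

The composite primary key on `topic_book` keeps the crawler from storing the same topic and book pair twice if a page is visited more than once.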