Need HTML spider to catalog and convert video gallery pages into CSV/ZIP (repost)
$100-500 USD
Paid on delivery
Web application with a simple one-page interface. I'm open to this being created in Perl or PHP 5.2.
Input: A list of URLs of thumbnail/movie gallery HTML pages which have textual, graphical, image, and video content in various formats and layouts.
Output: A CSV dump file which includes the Title, Description, Video thumbnail filename, Video filename (H.264 or FLV format), Video duration (minutes:seconds), Video height (pixels), Video width (pixels) AND a ZIP file including the video and thumbnail files.
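As a rough illustration of the Output described above, here is a minimal sketch in Python (used only for illustration; the project itself is to be built in Perl or PHP). The column names, the `format_duration`, `write_csv`, and `write_zip` helpers, and the choice to store files uncompressed in the ZIP are all assumptions, not part of the specification.

```python
import csv
import io
import zipfile

# Assumed column names, derived from the Output description.
CSV_COLUMNS = ["Title", "Description", "Thumbnail filename", "Video filename",
               "Duration", "Height", "Width"]


def format_duration(total_seconds):
    """Render a duration in seconds as minutes:seconds."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"


def write_csv(records, out):
    """Write a header row, then one row per video record (a dict keyed by CSV_COLUMNS)."""
    writer = csv.DictWriter(out, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)


def write_zip(files, out):
    """Pack {archive_name: bytes} into a ZIP archive.

    Video and JPEG data are already compressed, so this sketch stores
    them without further compression (an assumption, not a requirement).
    """
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
```

In this sketch `out` can be any file-like object, so the same functions could write to disk or to an in-memory buffer that the web interface then offers for download.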
Magic:
1) Spider will need to navigate each page and catalog all direct links to video files, as well as video files hosted on the page.
1a) Spider will need to forge the referring page (the HTTP Referer header) when requesting images and videos to get around webserver leeching restrictions.
2) If the video file is in WMV format, convert it to H.264 MP4 and save a local copy for inclusion in ZIP file. If the video file is already in H.264 MP4 or Flash Video (FLV) format, just save a local copy.
3) If the video file is linked directly from the gallery page via a linked image, save that image as the thumbnail for that video.
4) Determine the height and width of the video file and resize the image thumbnail to the same dimensions.
5) If there is no image thumbnail (see #3), create a thumbnail (with the same height/width as the video; JPEG format) from a random frame of the video file.
6) Determine the duration of the video clip and save this for export in the resulting CSV file.
7) Save the title of the gallery page as the title of the video clip. If there are multiple videos on a single gallery page, append "#1", "#2", etc. to the end of the title.
7a) Rename the video file and video thumbnail file to correspond to this same title format (to prevent filename collisions with past or future videos).
8) Catalog all text passages at least 2 sentences long for export as the description of the video clips on the page.
9) Use a predefined list of 'stop' words to disqualify certain sentences from being included in Description collation.
10) Export a CSV file with one line per video clip and all other information described in the Output section above. There may be multiple entries if there are multiple videos hosted on or linked from the gallery page.
11) Export a ZIP file which includes all of the thumbnails and MP4/FLV video files.
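The link-cataloging parts of the list above (steps 1, 1a, 3, and 7) could be sketched roughly as follows. This is an illustration in Python using only the standard library, not the required Perl/PHP implementation. The `GalleryParser` class, the `numbered_titles` and `request_with_referer` helpers, the extension list, and the space before "#1" are all assumptions. The sketch only covers videos reached through direct `<a href>` links; embedded players would need extra handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request

# Assumed set of extensions that identify a direct video link.
VIDEO_EXTENSIONS = (".wmv", ".mp4", ".flv")


class GalleryParser(HTMLParser):
    """Collect the page title, direct video links, and any image wrapped by each link."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.title = ""
        self.videos = []       # one dict per video: {"url": ..., "thumbnail": ...}
        self._in_title = False
        self._current = None   # video link currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            url = urljoin(self.page_url, attrs["href"])
            # Ignore any query string when checking the extension.
            if url.lower().split("?")[0].endswith(VIDEO_EXTENSIONS):
                self._current = {"url": url, "thumbnail": None}
                self.videos.append(self._current)
        elif tag == "img" and self._current is not None and attrs.get("src"):
            # Step 3: an image inside a video link becomes that video's thumbnail.
            self._current["thumbnail"] = urljoin(self.page_url, attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._current = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def numbered_titles(title, count):
    """Step 7: return one title per video, appending ' #1', ' #2', ... when there are several."""
    title = title.strip()
    if count <= 1:
        return [title] * count
    return [f"{title} #{i}" for i in range(1, count + 1)]


def request_with_referer(url, page_url):
    """Step 1a: build a request that presents the gallery page as the referring page."""
    return Request(url, headers={"Referer": page_url})
```

A caller would feed each gallery page's HTML to `GalleryParser.feed()`, then pass each request built by `request_with_referer` to `urllib.request.urlopen` to download the video and thumbnail files.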
## Deliverables
You will be required to develop and test this software with pages which contain adult content. You must accept this provision to be selected for this project.
Gallery pages will be provided to use during development and test phases.
Here are some examples of the gallery pages:
[login to view URL]
[login to view URL]
[login to view URL]:revtc:bflc,0
[login to view URL]
[login to view URL]|Uniformed
[login to view URL],1456,2,1,0
[login to view URL]
http://www.sapphic-erotica.com/VGMwMDc3fDc2Mjk=/MjQwMy4zLjEuMS4wLjAuMC4wLjA
[login to view URL]
[login to view URL]
Project No.: #3624338