I need a ~500MB WARC file extracted.
I'm not sure which skills this requires, but from what I've read, Python or Java may be useful.
See [login to view URL]
I need the individual files contained in the archive delivered in their standard formats (HTML, ASP, PDF, JPG, GIF, etc.) so that I can open them with ordinary tools (a browser, Photoshop, Adobe Reader, etc.).
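
For anyone sizing up the job, here is a rough sketch of what the extraction might look like in Python, using only the standard library. It is an illustration under assumptions, not a finished tool: it assumes the WARC follows the usual record layout (a `WARC/` version line, headers, a blank line, then `Content-Length` bytes), that the wanted files are the bodies of `response` records, and that those bodies are not chunked or content-encoded. The function names (`iter_warc_records`, `http_body`, `safe_name`, `extract_warc`) and the numbered output filenames are made up for this example. A dedicated library such as warcio would probably handle the edge cases better.

```python
import gzip
import os
import posixpath
import re
from urllib.parse import urlsplit


def iter_warc_records(stream):
    """Yield (headers, block) for each record in a WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # skip the blank lines that separate records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the WARC header section
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        yield headers, stream.read(length)


def http_body(block):
    """Return the part of an HTTP response block after the header section."""
    _head, sep, body = block.partition(b"\r\n\r\n")
    return body if sep else b""


def safe_name(url, index):
    """Build a filesystem-safe, collision-free name from a URL (hypothetical scheme)."""
    base = posixpath.basename(urlsplit(url).path) or "index.html"
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    return f"{index:06d}_{base}"


def extract_warc(warc_path, out_dir):
    """Write the body of every 'response' record to out_dir; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    # gzip.open reads multi-member .warc.gz files transparently.
    opener = gzip.open if warc_path.endswith(".gz") else open
    written = []
    with opener(warc_path, "rb") as stream:
        for index, (headers, block) in enumerate(iter_warc_records(stream)):
            if headers.get("warc-type") != "response":
                continue  # ignore warcinfo, request, metadata records
            name = safe_name(headers.get("warc-target-uri", ""), index)
            target = os.path.join(out_dir, name)
            with open(target, "wb") as out:
                out.write(http_body(block))
            written.append(target)
    return written
```

Calling `extract_warc("archive.warc.gz", "extracted")` (both names are placeholders) would leave one file per captured response in the `extracted` folder, each prefixed with its record number so that pages sharing a name do not overwrite each other.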
The WARC file is contained in an archive which is located at [login to view URL]