You will be given a PDF file that contains a list of classes for a university. Use this PDF in any way of your choosing to extract the text and create a CSV file.
Here is an example of what's in the PDF:
WMNS 4996 Experiential Education Directed Study (1 to 4 SH)
Draws upon the student's approved experiential activity and integrates it with study in the academic major. Restricted to those students who are using the course to fulfill their experiential education requirement. • Prerequisite: Permission of instructor. • Repeatability: May be repeated without limit.
This should go into the CSV file as follows:
WMNS,4996,"Experiential Education Directed Study","Draws upon..."
We only need this information; everything else in the PDF can be discarded. For each class we need the capital-letter subject code, the numeric code, the name (in quotes), and the description (in quotes).
The description should include the bullet points, though the actual bullet point symbol may be discarded for simplicity. New lines within the strings should be converted to: <br/> when appropriate.
A suggested method is to copy-paste the PDF's text into a text editor and save it, then write a script that performs some regex operations on the text to produce the CSV file.
Use any method you wish; the only deliverable is the CSV file. It should contain roughly 6,000 classes.
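For illustration, the regex approach could be sketched roughly as follows in Python. This is only a minimal sketch built from the single example above, not a tested solution for the whole PDF. It assumes that every class starts with a header line of the form "CODE NUMBER Title (... SH)", that subject codes are 2 to 5 capital letters, that class numbers are 4 digits, and that each bullet item should begin after a <br/>. The function names (parse_courses, clean_description, quote, to_csv_line) are made up for this example.

```python
import re

# Assumed header shape: subject code, 4-digit number, title, "(... SH)".
HEADER_RE = re.compile(r"^([A-Z]{2,5})\s+(\d{4})\s+(.+?)\s*\([^()]*\bSH\)\s*$")
# Matches a real bullet or the garbled form a bad copy-paste may produce.
BULLET_RE = re.compile(r"\s*(?:•|â\?¢)\s*")


def clean_description(lines):
    """Join wrapped lines with spaces and separate bullet items with <br/>."""
    text = " ".join(line.strip() for line in lines if line.strip())
    parts = [part for part in BULLET_RE.split(text) if part]
    return "<br/>".join(parts)


def parse_courses(text):
    """Return (subject, number, title, description) tuples found in text."""
    courses = []
    header, body = None, []
    for line in text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            # A new header closes the previous class, if there was one.
            if header:
                courses.append(header + (clean_description(body),))
            header, body = match.groups(), []
        elif header is not None:
            body.append(line)
    if header:
        courses.append(header + (clean_description(body),))
    return courses


def quote(field):
    """Wrap a field in double quotes, doubling any embedded quotes."""
    return '"' + field.replace('"', '""') + '"'


def to_csv_line(course):
    """Format one class as: CODE,NUMBER,"title","description"."""
    subject, number, title, description = course
    return f"{subject},{number},{quote(title)},{quote(description)}"


if __name__ == "__main__":
    sample = (
        "WMNS 4996 Experiential Education Directed Study (1 to 4 SH)\n"
        "Draws upon the student's approved experiential activity.\n"
        "• Prerequisite: Permission of instructor.\n"
    )
    for course in parse_courses(sample):
        print(to_csv_line(course))
```

In practice the saved text file would be read in, passed to parse_courses, and each result written out with to_csv_line. Any text before the first header is ignored, so page headers and footers that fall between classes would still need separate handling.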
Hi, I can do it. Please send some sample pages of the PDF so I can check its text structure and select a suitable tool for the conversion in advance.
Looking forward to hearing from you,
Cuong NH