The PyDocX Python module converts .docx files to HTML. Currently, it handles images by embedding them in the HTML as Data URIs, which causes problems when a file contains large images or many images.
I need a mixin that intercepts the images, uploads them to S3, and replaces each Data URI with the URL of the uploaded file, so that the resulting HTML loads the images remotely.
The tricky part is that the S3 signature and upload destination need to be provided on a per-document basis with the request; they can't come from the environment or be hardcoded in the library.
The reason is that each document being processed comes from a different customer, and that customer's S3 information should be used so the images are stored in their bucket.
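To make the requirement concrete, here is a minimal sketch of such a mixin. The hook name `image`, the constructor signature, and the `s3_options` keys are all assumptions (check the PyDocX source for the real override point), and the upload itself is stubbed out rather than calling S3:

```python
import base64
import hashlib
import posixpath


class S3ImageUploadMixin:
    """Sketch of a mixin that intercepts Data URI images, uploads the
    decoded bytes to S3, and emits the remote URL instead.

    The hook name `image` and the constructor signature are assumptions;
    the actual PyDocX exporter hook may differ.
    """

    def __init__(self, *args, s3_options=None, **kwargs):
        # Per-document S3 options supplied with each request, never from
        # the environment or hardcoded values.
        self.s3_options = s3_options or {}
        super().__init__(*args, **kwargs)

    def upload_to_s3(self, data, content_type):
        # Placeholder: a real implementation would POST `data` to the
        # customer's bucket using the per-document signature found in
        # self.s3_options (e.g. via boto3 or a presigned POST), tagging
        # the object with `content_type`.
        key = hashlib.sha1(data).hexdigest()
        return posixpath.join(self.s3_options['bucket_url'], key)

    def image(self, data_uri):
        # Assumed hook: receives the Data URI the base exporter would
        # otherwise embed, and returns the replacement <img> tag.
        header, _, payload = data_uri.partition(',')
        content_type = header[len('data:'):].split(';')[0]
        url = self.upload_to_s3(base64.b64decode(payload), content_type)
        return '<img src="{0}" />'.format(url)
```

The mixin would be combined with the library's HTML exporter class so only the image handling changes.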
Maybe something like:
exporter = CustomHTMLExporter('[url removed, login to view]', s3_options)
[url removed, login to view]
Where s3_options is:
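The original example of `s3_options` appears to be cut off here. Purely as a hypothetical illustration, assuming a presigned-POST style upload, it might look something like this (every key below is an assumption, not part of the original spec):

```python
# Hypothetical shape for s3_options -- all keys are illustrative
# assumptions matching a presigned-POST upload flow.
s3_options = {
    'bucket_url': 'https://customer-bucket.s3.amazonaws.com',  # upload destination
    'key_prefix': 'documents/abc123/images/',                  # per-document path
    'fields': {                                                # presigned POST fields
        'AWSAccessKeyId': '...',
        'policy': '...',
        'signature': '...',
    },
}
```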