Create a Tesseract configuration to split documents at patchcode T pages

Closed, posted 6 years ago, paid on delivery

We have two types of documents:

- multipage PDF files (which may already contain OCR-detected text)

- multipage TIFF files

These documents contain the standardized patchcode T separator pages.

Samples of patchcode T pages:

- [login to view URL] on page 11

- [login to view URL] on page 75

Your job is to provide us with a shell script that:

- takes either a PDF file or a TIFF file as input (selectable by parameter)

- parses the file and splits it at each patchcode T page into multiple files (of the same file type)

- performs OCR on the content (switchable on/off by parameter, to decide whether OCR is done at all)
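The parameter handling asked for above could be sketched roughly as follows. This is only an illustration: the option names `--type` and `--ocr`, the script name `split.sh`, and the default of OCR being off are assumptions, not part of the requirements.

```shell
#!/usr/bin/env bash
# Sketch of the parameter handling only; all option names are hypothetical.

usage() {
  echo "usage: split.sh --type pdf|tiff [--ocr on|off] INPUT_FILE" >&2
}

# Parses the arguments into the globals FILE_TYPE, OCR and INPUT.
# Returns non-zero (instead of exiting) on invalid input, so the caller decides.
parse_args() {
  FILE_TYPE="" OCR="off" INPUT=""   # assumption: OCR is off unless requested
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --type)
        [ "$#" -ge 2 ] || { usage; return 2; }
        FILE_TYPE="$2"; shift 2 ;;
      --ocr)
        [ "$#" -ge 2 ] || { usage; return 2; }
        OCR="$2"; shift 2 ;;
      -*)
        usage; return 2 ;;
      *)
        INPUT="$1"; shift ;;
    esac
  done
  # Validate the collected values.
  case "$FILE_TYPE" in pdf|tiff) ;; *) usage; return 2 ;; esac
  case "$OCR" in on|off) ;; *) usage; return 2 ;; esac
  [ -n "$INPUT" ] || { usage; return 2; }
}

# Example call; a real script would pass "$@" instead.
parse_args --type pdf --ocr on scan.pdf &&
  echo "type=$FILE_TYPE ocr=$OCR input=$INPUT"
```

Returning an error code rather than calling `exit` inside the function keeps it reusable, for instance if the splitting and OCR steps are later wrapped by another script.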

Ensure that the patchcode page can have arbitrary content between the code lines (as in the samples).
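However the separator pages end up being detected, the split itself only needs their page numbers. The following sketch turns a page count and a list of detected separator pages into page ranges, one per output file. The function name `page_ranges` is hypothetical, and dropping the separator pages from the output is an assumption, since the description does not say whether they should be kept.

```shell
# Prints one "FIRST-LAST" page range per output document.
# $1 = total page count; remaining args = separator page numbers
# (1-based, ascending), as reported by some detection step not shown here.
# Assumption: separator pages are dropped, and empty ranges are skipped.
page_ranges() {
  total="$1"; shift
  start=1
  for sep in "$@"; do
    # Emit the pages before this separator, if there are any.
    if [ "$sep" -gt "$start" ]; then
      echo "$start-$((sep - 1))"
    fi
    start=$((sep + 1))
  done
  # Emit whatever follows the last separator.
  if [ "$start" -le "$total" ]; then
    echo "$start-$total"
  fi
}

# A 12-page file with separators on pages 5 and 9:
page_ranges 12 5 9
# prints:
# 1-4
# 6-8
# 10-12
```

Each printed range could then be handed to whichever PDF or TIFF tool performs the actual page extraction, so that the output files keep the input file type.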


A Java implementation is also acceptable as an alternative to a shell script.

Skills: Java, OCR, Shell Script

Project ID: #15704623

About the project

4 proposals, remote project, active 6 years ago

4 freelancers are bidding an average of €90 for this job

iitmshanker

A proposal has not yet been provided

€200 EUR in 2 days
(1 review)
3.2
ranzhie07

I have an existing project ready that is similar to your needs; I use enhanced Tesseract OCR. Stay tuned, I'm still working on this proposal.

€61 EUR in 0 days
(1 review)
1.1
sreejith1993

Hi, I hope you are doing fine. I have relevant experience in parsing PDFs and reading TIFF images using Java. I have also worked with the Tesseract OCR engine, and I assure you I am the best fit. Relevant Skills and ...

€45 EUR in 10 days
(0 reviews)
0.0
livezingy

Thanks very much for your invitation! I'm not the right person, because I'm not familiar with shell scripting or Java. I came here to say thanks and sorry. Best wishes to you! Relevant Skills and Experience

€55 EUR in 10 days
(0 reviews)
0.0