Resource ID: #5049
Subject: 
Source: Jacob Fenton
Affiliation: 
Date: 1905-07-09

Description

So now you know how to get data from basic PDFs and from large batches. What about mixed formats, ongoing jobs, and so on?

This repo covers how to process PDFs in large batches, use Amazon Mechanical Turk, schedule OCR jobs to run while you aren't around, and build a data pipeline. If you’re comfortable with the command line and Tesseract, this is for you.

https://github.com/jsfenfen/pdf17

109 Lee Hills Hall, Missouri School of Journalism   |   221 S. Eighth St., Columbia, MO 65201   |   573-882-2042   |   info@ire.org   |   Privacy Policy