
Resource ID: #5049

Subject:

Source: Jacob Fenton

Affiliation:

Date: 1905-07-09

$0.00

Description

So now you know how to get data from basic PDFs and large batches. What about mixed formats, ongoing jobs, etc.?

This repo covers how to process PDFs in large batches, use Amazon Mechanical Turk, schedule OCR jobs to run while you aren't around, and build a data pipeline. If you're comfortable with the command line and Tesseract, this is for you.

https://github.com/jsfenfen/pdf17
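
To give a feel for what batch OCR with Tesseract can look like, here is a minimal sketch. It is not taken from the repo above. It assumes the PDF pages have already been converted to PNG images (for example with a tool such as pdftoppm) and that the `tesseract` command is on your PATH. The function names, directory layout, and worker count are all illustrative assumptions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(image: Path, out_dir: Path, lang: str = "eng") -> list[str]:
    """Return the tesseract invocation for one page image.

    Tesseract takes an output base name and appends ".txt" itself.
    """
    out_base = out_dir / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def pending_images(image_dir: Path, out_dir: Path) -> list[Path]:
    """List page images with no text file yet, so an interrupted job can resume."""
    return [
        img
        for img in sorted(image_dir.glob("*.png"))
        if not (out_dir / f"{img.stem}.txt").exists()
    ]


def ocr_batch(image_dir: Path, out_dir: Path, workers: int = 4, run=subprocess.run) -> int:
    """OCR every pending image in parallel; return how many were submitted.

    `run` is injectable (hypothetical design choice) so the batch logic can be
    exercised without Tesseract installed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = [build_command(img, out_dir) for img in pending_images(image_dir, out_dir)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # check=True makes subprocess.run raise if tesseract exits non-zero.
        list(pool.map(lambda cmd: run(cmd, check=True), commands))
    return len(commands)
```

Calling `ocr_batch(Path("pages"), Path("text"))` would OCR each image in a hypothetical `pages` directory that lacks a matching file in `text`. Because finished pages are skipped, a script like this could be rerun on a schedule (for instance from cron) to pick up new files while you are away.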

