jpegdoc2pdf

Convert smartphone JPGs of typewritten English documents into searchable OCRed PDFs using parallel batch processing.

Prerequisites

Install the following tools:

# macOS
brew install tesseract img2pdf ocrmypdf

# Linux (Debian/Ubuntu)
sudo apt-get install tesseract-ocr img2pdf ocrmypdf
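
To confirm the install worked, a small check along these lines may help. It is only a sketch: `missing_tools` is a hypothetical helper, not part of convert.sh, and it assumes each package installs a command of the same name on your PATH (on Debian/Ubuntu the `tesseract-ocr` package provides the `tesseract` command).

```shell
#!/bin/sh
# Hypothetical helper: print the name of each given command
# that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report any of the three prerequisites that are missing.
missing=$(missing_tools tesseract img2pdf ocrmypdf)
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing" >&2
fi
```

If nothing is printed, all three commands were found.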

Usage

./convert.sh ROOT_DIR [OUT_DIR] [-P N] [--recursive]

Process subdirectories in ROOT with default settings:

./convert.sh ./ROOT

Specify custom output directory:

./convert.sh ./ROOT ./my_output

Use 4 parallel processes:

./convert.sh ./ROOT ./out_pdfs -P 4

Process nested subdirectories recursively:

./convert.sh ./ROOT ./out_pdfs -P 4 --recursive
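
For intuition about what `-P N` means, the sketch below runs one job per subdirectory with at most N jobs at a time. It assumes the common `find ... -print0 | xargs -0 -P` pattern, which may differ from how convert.sh schedules work internally. `run_parallel` is a hypothetical name, and the job only prints a message instead of converting anything.

```shell
#!/bin/sh
# Hypothetical sketch: $1 = root directory, $2 = maximum concurrent jobs.
# Each immediate subdirectory of the root becomes one job.
run_parallel() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -0 -n 1 -P "$2" sh -c '
      # Skip the empty invocation some xargs versions make on empty input.
      if [ -n "$1" ]; then
        echo "would convert: $(basename "$1")"
      fi
    ' _
}
```

Calling `run_parallel ./ROOT 4` prints one line per subdirectory. Because jobs run concurrently, the order of the lines is not guaranteed.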

Input layout

Organize your images with one subdirectory per PDF:

ROOT/
  CaseA/
    001.jpg
    002.jpg
  CaseB/
    page1.jpg
    page2.jpg
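
As a rough illustration of what "one subdirectory per PDF" could mean in terms of the prerequisite tools, the sketch below combines the pages of one directory with `img2pdf` and then adds a text layer with `ocrmypdf`. This is an assumption about the general approach, not a copy of convert.sh. The function name `convert_dir`, the output name `<subdirectory>.pdf`, the intermediate `.raw.pdf` file, and the restriction to `*.jpg` are all illustrative choices.

```shell
#!/bin/sh
# Hypothetical sketch: turn one directory of page images into one
# searchable PDF.
# $1 = input directory (e.g. ROOT/CaseA), $2 = output directory.
convert_dir() {
  name=$(basename "$1")
  mkdir -p "$2" || return 1
  raw="$2/$name.raw.pdf"
  # Assumption: page order is filename order, which is the order in
  # which the shell expands the glob.
  img2pdf "$1"/*.jpg -o "$raw" || return 1
  # -l eng selects Tesseract's English language data.
  ocrmypdf -l eng "$raw" "$2/$name.pdf" || return 1
  rm -f "$raw"
}
```

With the layout above, `convert_dir ROOT/CaseA out_pdfs` would be expected to write `out_pdfs/CaseA.pdf`. Zero-padded filenames such as `001.jpg` keep the pages in the intended order.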

Supported image extensions: jpg, jpeg, png, tif, tiff (case-insensitive)
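
To preview which files in a directory match these extensions, something like the following can be used. It is a sketch that relies on the `-iname` test of `find` (available in GNU and BSD find) for case-insensitive matching. `list_images` is a hypothetical helper, and convert.sh may select files differently.

```shell
#!/bin/sh
# Hypothetical helper: list the files directly inside directory $1
# whose extension is jpg, jpeg, png, tif, or tiff in any letter case,
# sorted by name.
list_images() {
  find "$1" -maxdepth 1 -type f \( \
      -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \
      -o -iname '*.tif' -o -iname '*.tiff' \) | sort
}
```

For example, `list_images ROOT/CaseA` would list `001.jpg` and `002.jpg`, and would also pick up a file named `003.JPG`.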