initial commit

2025-11-01 18:04:28 -04:00
commit 4eb7ddfd99
5 changed files with 703 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,84 @@
+# jpegdoc2pdf
+
+Convert smartphone JPGs of typewritten English documents into searchable **OCRed PDFs** using parallel batch processing.
+
+## Prerequisites
+
+Install the following tools:
+- **Tesseract OCR**: the OCR engine (ensure `tesseract` is on your `PATH`)
+- **img2pdf**: lossless image-to-PDF converter
+- **ocrmypdf**: adds a searchable OCR text layer to PDFs
+
+```bash
+# macOS
+brew install tesseract img2pdf ocrmypdf
+
+# Linux (Debian/Ubuntu)
+sudo apt-get install tesseract-ocr img2pdf ocrmypdf
+```
+
+## Usage
+
+### Basic Usage
+
+```bash
+./convert.sh ROOT_DIR [OUT_DIR] [-P N] [-r|--recursive]
+```
+
+### Examples
+
+**Process subdirectories in ROOT with default settings:**
+```bash
+./convert.sh ./ROOT
+```
+
+**Specify custom output directory:**
+```bash
+./convert.sh ./ROOT ./my_output
+```
+
+**Use 4 parallel processes:**
+```bash
+./convert.sh ./ROOT ./out_pdfs -P 4
+```
+
+**Process nested subdirectories recursively:**
+```bash
+./convert.sh ./ROOT ./out_pdfs -P 4 --recursive
+```
+
+## Folder Structure
+
+Organize your images with one subdirectory per PDF:
+
+```
+ROOT/
+  CaseA/
+    001.jpg
+    002.jpg
+  CaseB/
+    page1.jpg
+    page2.jpg
+```
+
+- Each subdirectory under `ROOT` becomes a single PDF
+- With `--recursive`, each nested subfolder also becomes its own PDF, named like `Parent__Child.pdf`
+- Output PDFs are saved to `out_pdfs/` (or your specified output directory)
+
+## Options
+
+- **ROOT_DIR** (required): Root directory containing subdirectories of images
+- **OUT_DIR** (optional): Output directory (default: `out_pdfs`)
+- **-P N** (optional): Number of parallel processes (default: CPU core count)
+- **--recursive** or **-r** (optional): Process nested subdirectories recursively
+
+## Supported Image Formats
+
+jpg, jpeg, png, tif, tiff (case-insensitive)
+
+## OCR Settings
+
+- Language: English (`eng`)
+- Tesseract OCR engine mode (OEM): 1 (LSTM neural network only)
+- Tesseract page segmentation mode (PSM): 6 (assume a single uniform block of text)
+- ocrmypdf optimization level: 1 (lossless optimizations only)
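
The folder-naming rule, supported formats, and OCR settings described in the README can be sketched as a few POSIX shell helpers. This is an illustrative sketch under stated assumptions, not the contents of `convert.sh`: the function names, the `OCRMYPDF` override variable, and the name-sorted page order are all hypothetical.

```shell
# Hypothetical helpers; convert.sh itself may be structured differently.

# Map a folder path relative to ROOT to an output PDF name by joining
# nested path components with "__" (Parent/Child -> Parent__Child.pdf).
pdf_name_for() {
  # Strip one trailing slash, then replace every "/" with "__".
  printf '%s.pdf\n' "$(printf '%s' "${1%/}" | sed 's|/|__|g')"
}

# List supported images directly inside a folder, matching extensions
# case-insensitively. Sorting by file name is an assumed page order.
list_images() {
  find "$1" -maxdepth 1 -type f \
    \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \
       -o -iname '*.tif' -o -iname '*.tiff' \) | LC_ALL=C sort
}

# Run ocrmypdf with the settings listed under "OCR Settings".
# OCRMYPDF is a hypothetical override, e.g. OCRMYPDF=echo for a dry run.
ocr_pdf() {
  ${OCRMYPDF:-ocrmypdf} -l eng \
    --tesseract-oem 1 --tesseract-pagesegmode 6 \
    --optimize 1 "$1" "$2"
}

# Example flow for one folder (requires img2pdf and ocrmypdf):
#   img2pdf ROOT/CaseA/001.jpg ROOT/CaseA/002.jpg -o /tmp/CaseA.raw.pdf
#   ocr_pdf /tmp/CaseA.raw.pdf "out_pdfs/$(pdf_name_for CaseA)"
```

Setting `OCRMYPDF=echo` prints the arguments instead of running OCR, which is a convenient way to check the flags before processing a large batch.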