What it does

Converts uploaded PDF documents into markdown text that the assistant can read and discuss. Uses Marker, an open-source PDF-to-markdown converter that handles complex layouts, tables, and OCR.

Requirements

Install Marker separately (it’s not bundled with Spaceduck):
pip install marker-pdf   # requires Python 3.10+, PyTorch
When marker_single is on your PATH, the marker_scan tool is automatically registered at gateway startup.
Marker is licensed under GPL-3.0, and its model weights carry Open Rail restrictions. Spaceduck never bundles Marker; it calls marker_single as an external process.
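
The PATH check described above can be sketched in a few lines. This is only an illustration of the idea, not Spaceduck's actual startup code: the function names find_marker and should_register_marker_scan are hypothetical, and the gateway may perform the lookup differently.

```python
import shutil
from typing import Optional


def find_marker(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path to marker_single, or None if it cannot be found.

    search_path overrides the PATH environment variable (useful for testing);
    None means "use the current PATH".
    """
    return shutil.which("marker_single", path=search_path)


def should_register_marker_scan(search_path: Optional[str] = None) -> bool:
    """Hypothetical startup check: register the tool only if the binary exists."""
    return find_marker(search_path) is not None


if __name__ == "__main__":
    location = find_marker()
    if location is None:
        print("marker_single not found; marker_scan would not be registered")
    else:
        print(f"marker_single found at {location}")
```

Running the script from the same shell that launches the gateway is a quick way to confirm that the installation is visible on PATH.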

How to use it

  1. Click the paperclip button in the chat UI (or drag and drop a PDF)
  2. The file is uploaded to the gateway
  3. Ask the assistant about the document — it automatically invokes marker_scan
  4. The PDF is converted to markdown and the assistant can read and discuss it

Configuration

| Setting | Default | Description |
| --- | --- | --- |
| timeoutMs | 120,000 ms (2 min) | Subprocess timeout; large PDFs can be slow |
| maxOutputChars | 100,000 | Output truncation limit |
| pageRange | All pages | Convert specific pages only |
| forceOcr | false | Force OCR even on text PDFs |

Set MARKER_USE_LLM=true as an environment variable to enable LLM-assisted conversion for higher-quality output on complex documents.
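
To make the settings concrete, here is a rough sketch of how they could translate into a marker_single invocation. It is an assumption about the mapping, not Spaceduck's implementation: MarkerSettings, build_command, truncate, and run_marker are hypothetical names, and the flags (--output_dir, --output_format, --page_range, --force_ocr, --use_llm) should be checked against marker_single --help for your installed version.

```python
import os
import subprocess
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class MarkerSettings:
    """Hypothetical mirror of the documented settings and their defaults."""
    timeout_ms: int = 120_000          # timeoutMs
    max_output_chars: int = 100_000    # maxOutputChars
    page_range: Optional[str] = None   # pageRange; None means all pages
    force_ocr: bool = False            # forceOcr


def build_command(pdf_path: str, output_dir: str, settings: MarkerSettings,
                  env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Assemble the argument list for the external marker_single process."""
    env = os.environ if env is None else env
    cmd = ["marker_single", pdf_path,
           "--output_dir", output_dir,
           "--output_format", "markdown"]
    if settings.page_range:
        cmd += ["--page_range", settings.page_range]
    if settings.force_ocr:
        cmd.append("--force_ocr")
    # Assumed mapping from the MARKER_USE_LLM variable to Marker's LLM flag.
    if env.get("MARKER_USE_LLM", "").lower() == "true":
        cmd.append("--use_llm")
    return cmd


def truncate(text: str, limit: int) -> str:
    """Cut converted markdown down to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]


def run_marker(cmd: List[str], settings: MarkerSettings) -> None:
    """Run the converter; raises subprocess.TimeoutExpired on timeout."""
    subprocess.run(cmd, check=True, timeout=settings.timeout_ms / 1000)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.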

Troubleshooting

| Problem | Cause | Fix |
| --- | --- | --- |
| Tool not registered | marker_single not on PATH | Run pip install marker-pdf and restart the gateway |
| Timeout on large PDFs | Default 2-minute timeout too short | Increase timeoutMs or use pageRange to convert specific pages |
| Poor OCR quality | Default OCR settings | Try forceOcr: true or enable LLM-assisted conversion with MARKER_USE_LLM=true |
| Upload rejected | File isn’t a valid PDF | Upload a genuine PDF; Spaceduck validates magic bytes, so only real PDFs are accepted |
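
For readers unfamiliar with magic-byte validation, the sketch below shows the general technique. PDF files begin with the signature %PDF-, so a file can be checked by inspecting its first bytes instead of trusting its extension. The names looks_like_pdf and file_looks_like_pdf are hypothetical, and Spaceduck's actual check may be stricter or more lenient than this one.

```python
PDF_MAGIC = b"%PDF-"  # signature at the start of a PDF file


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the byte string starts with the PDF signature."""
    return data.startswith(PDF_MAGIC)


def file_looks_like_pdf(path: str) -> bool:
    """Read only the first few bytes of a file and check the signature."""
    with open(path, "rb") as f:
        return looks_like_pdf(f.read(len(PDF_MAGIC)))
```

A file that fails this check locally, for example an image or Office document renamed to end in .pdf, is a likely candidate for rejection at upload.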