Last updated: June 20, 2025 at 08:12 AM
Summary of Reddit Comments on "PDF to Markdown":
Docling:
- Docling is described as a tool with good quality output and compatibility with HTML.
- One user mentioned, "I've been using Docling for about a month or so. The output quality is the best of all the open-source solutions."
- Pros: Good output quality, compatible with HTML.
- Cons: Processing speed could be improved.
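For readers who want to try Docling on their own files, the following is a minimal sketch rather than a method described in the thread. It assumes the `docling` Python package is installed. The helper names `markdown_path_for` and `convert_pdf_to_markdown` are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def markdown_path_for(pdf_path):
    """Return the output path: same location and name, with a .md suffix."""
    return Path(pdf_path).with_suffix(".md")


def convert_pdf_to_markdown(pdf_path):
    """Convert one PDF to Markdown with Docling and write it next to the source.

    Assumption: the docling package is installed. It is imported lazily so
    that the path helper above still works without it.
    """
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(str(pdf_path))
    markdown = result.document.export_to_markdown()

    out_path = markdown_path_for(pdf_path)
    out_path.write_text(markdown, encoding="utf-8")
    return out_path
```

Calling `convert_pdf_to_markdown("report.pdf")` would write `report.md` beside the input file. As the thread notes, conversion can be slow on large documents.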
PyMuPDF:
- Users who compared the two tools reported that Docling works better than PyMuPDF for their needs.
- One user stated, "Seems to be able to do tables better than PyMuPDF4LLM, but suffers with code."
MistralOCR for Markdown Conversion:
- A user recommended using MistralOCR for PDF to Markdown conversion due to its speed and accuracy.
- The user mentioned using PDF OCR Obsidian for efficient conversion at a lower cost.
Gemini API:
- Some users found the Gemini API to be useful for PDF conversions, particularly for dealing with technical diagrams and images.
Comparison with Other Tools:
- Users compared Gemini with Docling and Marker, stating that Gemini is better for PDF conversions.
- One user suggested, "Just use Gemini 2.5 Pro directly! It’s better at pdf conversions than most dedicated software."
Other Tools and Suggestions:
- Users recommended tools like Tesseract for OCR and mentioned Google DocumentAI ASYNC for text extraction.
- Pandoc was suggested for converting Markdown to other formats, while MkDocs with the mkdocs-with-pdf plugin was praised for converting HTML to PDF effectively.
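The Pandoc suggestion can be sketched as a small wrapper around the command-line tool. This is an illustrative sketch, not something posted in the thread. It assumes the `pandoc` executable is installed and on the PATH, and the function names `pandoc_command` and `convert_markdown` are hypothetical.

```python
import shutil
import subprocess


def pandoc_command(markdown_file, output_file):
    """Build the argument list for a basic Pandoc conversion.

    Pandoc infers the input and output formats from the file extensions.
    """
    return ["pandoc", markdown_file, "-o", output_file]


def convert_markdown(markdown_file, output_file):
    """Run Pandoc, failing early with a clear message if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(markdown_file, output_file), check=True)
```

For example, `convert_markdown("notes.md", "notes.docx")` would run `pandoc notes.md -o notes.docx`.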
Limitations and User Feedback:
- Some users found issues with certain tools' pricing, link expiry, and output quality.
- Concerns were raised about the necessity of using advanced AI tools for OCR tasks instead of traditional OCR software.
Requests for Features:
- Users asked about support for tables, image extraction, and providing CLI options.
- Some users inquired about exporting page numbers and implementing API or watch folder functionality.
Additional Tools and Resources:
- Modern Markdown Editor, Quarto.org, and the mkdocs-with-pdf plugin were recommended for Markdown to PDF conversion.
- MisterMD was suggested for its clean and simple UI for Markdown to PDF/PNG conversion.
Overall, Docling was praised for its output quality and MistralOCR for its speed and accuracy, while the Gemini API was recommended for its performance in PDF conversions. Users highlighted the importance of speed, accuracy, pricing, and support for elements such as tables and images when choosing a PDF to Markdown tool.