PDF Tools
PDF tools for Flow-Like workflows
About
PDF Utils
Build PDF workflows directly in Flow-Like. PDF Utils provides a broad set of nodes for inspecting, editing, splitting, merging, cleaning, and extracting data from PDF files without leaving your flow.
The package is designed around FlowPath: every PDF input is passed as a
FlowPath, and every generated PDF or extracted file is written to a FlowPath
destination. The filename is already part of the FlowPath, so nodes do not ask
for separate filename fields.
What You Can Do
Inspect Documents
Read core PDF details such as page count, PDF version, file size, object count, encryption state, metadata presence, page boxes, fonts, images, bookmarks, annotations, links, attachments, form fields, and blank-page candidates.
Work With Pages
Create new PDFs from selected pages, delete pages, merge files, rotate pages, crop or set page boxes, remove blank pages, reorder pages, reverse page order, duplicate pages, insert pages from another PDF, replace page ranges, and split documents by ranges, chunk size, or bookmark sections.
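As a rough illustration of splitting by chunk size, the sketch below computes which pages would land in each output file. It assumes 1-based, inclusive page ranges and a shorter final chunk when the page count is not an exact multiple; the function name `chunk_page_ranges` is hypothetical and is not part of PDF Utils, whose actual node inputs may use different conventions.

```python
def chunk_page_ranges(page_count: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return inclusive, 1-based (first, last) ranges of at most chunk_size pages.

    Hypothetical helper for illustration only; not a PDF Utils API.
    """
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        # The last chunk is clamped to the final page of the document.
        (first, min(first + chunk_size - 1, page_count))
        for first in range(1, page_count + 1, chunk_size)
    ]


if __name__ == "__main__":
    # A 10-page document split into chunks of 4 pages.
    print(chunk_page_ranges(10, 4))
```

Each resulting range would correspond to one destination FlowPath, so a flow needs as many output paths as there are ranges.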
Extract and Search Text
Extract text from full documents or page ranges, return per-page text records, count words and characters, search with literal text or regular expressions, and export extracted text to Markdown, JSON, or CSV.
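The difference between literal and regular-expression search can be sketched on per-page text that has already been extracted. This is a minimal example under stated assumptions: pages are plain strings in page order, page numbers are 1-based, and `search_pages` with its result fields is a hypothetical helper, not the output shape of the PDF Utils search nodes.

```python
import re


def search_pages(pages: list[str], query: str, use_regex: bool = False) -> list[dict]:
    """Find every occurrence of query in per-page text.

    Hypothetical helper for illustration only; not a PDF Utils API.
    A literal query is escaped so characters such as "." match themselves.
    """
    pattern = re.compile(query if use_regex else re.escape(query))
    hits = []
    for page_number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            hits.append(
                {"page": page_number, "start": match.start(), "match": match.group(0)}
            )
    return hits


if __name__ == "__main__":
    pages = ["Invoice 1001 due", "No match here", "Invoice 2002 paid"]
    print(search_pages(pages, "Invoice"))
    print(search_pages(pages, r"\d{4}", use_regex=True))
```

Escaping matters for queries that contain pattern characters: a literal search for `1.0` should not match `100`, while the same string used as a regular expression would.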
Manage Metadata and Cleanup
Read, write, or remove document information metadata. Read and remove XMP metadata. Remove JavaScript, automatic actions, annotations, links, bookmarks, attachments, and form field definitions. Use the sanitize node for common cleanup of active content and identifying metadata.
Security Basics
Check whether a PDF is encrypted, encrypt with standard password security, decrypt with a password, and inspect permission flags such as printing, copying, annotation, form filling, and assembly permissions.
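For readers who want to interpret permission flags themselves, the sketch below decodes the integer permission value used by the standard security handler, following the bit positions in the PDF specification's user access permissions table (bits are numbered from 1). The key names and the `decode_permissions` function are hypothetical labels chosen for this example; the source does not state how the PDF Utils node names or returns these flags.

```python
# Bit positions (1-based) from the PDF standard security handler.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p_value: int) -> dict[str, bool]:
    """Map a signed 32-bit permission value to named boolean flags.

    Hypothetical helper for illustration only; not a PDF Utils API.
    """
    # The value is stored as a signed integer, so view it as unsigned first.
    unsigned = p_value & 0xFFFFFFFF
    return {
        name: bool(unsigned & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    # -1852 sets only the two printing bits plus the reserved bits.
    for name, allowed in decode_permissions(-1852).items():
        print(f"{name}: {allowed}")
```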
Extract Embedded Content
List and extract embedded file attachments. List image XObjects and extract their encoded image streams to destination FlowPaths for downstream processing.
Good Fits
- Automating document intake checks
- Splitting large PDFs into workflow-ready chunks
- Merging report packets or document bundles
- Removing metadata, JavaScript, links, forms, annotations, or attachments
- Extracting text for search, routing, summaries, or indexing
- Reading PDF structure for validation and audit workflows
- Preparing selected pages as new PDFs
What To Expect
PDF Utils works best with parseable, standards-aligned PDFs. Text extraction depends on the text data actually present in the PDF; scanned image-only PDFs usually need OCR before useful text can be extracted. Text replacement is best-effort because PDF text is often stored in encoded, fragmented content streams.
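One way to act on this in a flow is to check the per-page extraction result and route pages with little or no text to an OCR step outside this package. The sketch below is a simple heuristic under stated assumptions: pages arrive as plain strings, whitespace does not count as text, and both `pages_needing_ocr` and its `min_chars` threshold are hypothetical, not part of PDF Utils.

```python
def pages_needing_ocr(pages: list[str], min_chars: int = 1) -> list[int]:
    """Return 1-based numbers of pages with fewer than min_chars non-whitespace characters.

    Hypothetical heuristic for illustration only; not a PDF Utils API.
    """
    flagged = []
    for page_number, text in enumerate(pages, start=1):
        # Drop all whitespace so pages holding only spaces or newlines count as empty.
        visible = "".join(text.split())
        if len(visible) < min_chars:
            flagged.append(page_number)
    return flagged


if __name__ == "__main__":
    extracted = ["Quarterly report", "   \n", "", "Page 4 of 4"]
    print(pages_needing_ocr(extracted))
```

A page flagged this way may be a scanned image, but it may also be genuinely blank, so the result is a candidate list rather than a definitive answer.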
Image extraction returns the embedded image stream bytes as stored in the PDF. It does not render pages or convert images to a normalized output format.
Current Limits
This package does not currently include page rendering, thumbnails, OCR, HTML/Office-to-PDF conversion, PDF/A conversion, digital signing, accessibility validation, redaction, watermarking, or form filling. Those workflows require a renderer, an OCR engine, a conversion stack, a signing stack, or a fuller PDF appearance-editing layer.
Included Node Areas
- Inspect
- Pages
- Text
- Metadata
- Annotations and links
- Images
- Attachments
- Forms
- Security and cleanup
- Optimization
Publisher and License
Published by Rheosoph GmbH. Licensed under either Apache-2.0 or MIT, at your option.
Use Case
PDF Utilities
Provided Nodes
55 nodes included in this package.
PDF / Annotations
Lists annotations on selected pages
Lists link annotations on selected pages
Removes page annotation arrays from selected pages and writes a new PDF
Removes link annotations from selected pages and writes a new PDF
PDF / Attachments
Writes selected embedded file attachments to destination FlowPaths
Lists embedded file attachments from name trees and file attachment annotations
Removes embedded file references and file attachment annotations from a PDF
PDF / Forms
Lists AcroForm field definitions and values
Removes the AcroForm catalog entry and writes a new PDF
PDF / Images
Writes encoded image XObject streams from selected pages to destination FlowPaths
PDF / Inspect
Detects pages with no content stream bytes or no extractable text
Lists outline bookmarks and their destination pages when available
Lists font resources used by selected pages
Lists image XObjects referenced by selected pages
Reads PDF version, page count, object count, encryption presence, and file details
Counts pages in a PDF
Reads MediaBox, CropBox, BleedBox, TrimBox, and ArtBox for selected pages
Checks whether a FlowPath points to a parseable PDF and returns diagnostics
PDF / Metadata
Reads the document information dictionary from a PDF
Reads the catalog XMP metadata stream from a PDF
Removes document information metadata and optionally XMP metadata
Removes catalog XMP metadata and writes a new PDF
Writes document information metadata to a new PDF FlowPath
PDF / Optimize
Compresses PDF streams and writes the optimized PDF to a FlowPath
PDF / Pages
Writes a new PDF with CropBox set on selected pages
Writes a new PDF with selected pages removed
Duplicates one page after its original position and writes a new PDF
Writes a new PDF containing only selected pages
Inserts pages from one PDF into another and writes a new PDF
Merges multiple PDF FlowPaths into one destination PDF
Removes detected blank pages and writes a new PDF
Removes outline bookmarks and writes a new PDF
Writes a new PDF with pages in the requested order
Replaces a contiguous page range with pages from another PDF
Writes a new PDF with all pages in reverse order
Writes a new PDF with selected pages rotated in 90-degree increments
Writes a new PDF with a selected page box updated on selected pages
Writes one output PDF per bookmark section at the selected outline level
Writes one output PDF per requested page range
Writes output PDFs containing fixed-size page chunks
PDF / Security
Decrypts a password-protected PDF and writes an unencrypted copy
Encrypts a PDF with standard password security and writes a new PDF
Checks whether a PDF appears to contain an encryption dictionary
Reads available permission flags for PDFs loaded with an optional password
Removes common document and object JavaScript action entries
Removes metadata, XMP, annotations, and common JavaScript entries from a PDF
PDF / Text
Counts words and characters in extracted PDF text
Extracts text from all pages or a page range
Extracts text per page from a PDF
Extracts PDF text into a CSV string with one row per page
Extracts PDF text and returns both typed page records and a JSON string
Extracts PDF text and formats it as page-section Markdown
Searches extracted PDF text with a regular expression
Replaces exact encoded text occurrences on selected pages and writes a new PDF
Searches extracted PDF text for a literal query