PDF Tools

PDF tools for Flow-Like workflows

Free
v0.1.0 2,044 downloads ✓ Verified MIT public

About

PDF Utils

Build PDF workflows directly in Flow-Like. PDF Utils provides a broad set of nodes for inspecting, editing, splitting, merging, cleaning, and extracting data from PDF files without leaving your flow.

The package is designed around FlowPath: every PDF input is passed as a FlowPath, and every generated PDF or extracted file is written to a FlowPath destination. The filename is already part of the FlowPath, so nodes do not ask for separate filename fields.

What You Can Do

Inspect Documents

Read core PDF details such as page count, PDF version, file size, object count, encryption state, metadata presence, page boxes, fonts, images, bookmarks, annotations, links, attachments, form fields, and blank-page candidates.

Work With Pages

Create new PDFs from selected pages, delete pages, merge files, rotate pages, crop or set page boxes, remove blank pages, reorder pages, reverse page order, duplicate pages, insert pages from another PDF, replace page ranges, and split documents by ranges, chunk size, or bookmark sections.
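As an illustration of chunk-size splitting, the sketch below computes the page ranges that a fixed-size split would cover. It is not the package's implementation: the function name `chunk_ranges` and the 1-based, inclusive page numbering are assumptions made for the example.

```python
def chunk_ranges(page_count: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) ranges of at most chunk_size pages.

    Hypothetical helper: the numbering convention is an assumption,
    not something the package listing specifies.
    """
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (first, min(first + chunk_size - 1, page_count))
        for first in range(1, page_count + 1, chunk_size)
    ]


if __name__ == "__main__":
    # A 10-page document split every 4 pages leaves a shorter final chunk.
    print(chunk_ranges(10, 4))
```

Each range would map to one output PDF, so a flow needs one destination FlowPath per range.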

Extract and Search Text

Extract text from full documents or page ranges, return per-page text records, count words and characters, search with literal text or regular expressions, and export extracted text to Markdown, JSON, or CSV.
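To make the per-page model concrete, here is a minimal sketch of a regular-expression search over per-page text and a CSV export with one row per page. The record shape (a list of page strings), the function names, and the output columns are assumptions for illustration and may differ from what the nodes return.

```python
import csv
import io
import re


def search_pages(pages: list[str], pattern: str) -> list[dict]:
    """Find every regex match, tagged with its 1-based page number (assumed convention)."""
    regex = re.compile(pattern)
    hits = []
    for page_number, text in enumerate(pages, start=1):
        for match in regex.finditer(text):
            hits.append(
                {
                    "page": page_number,
                    "start": match.start(),
                    "end": match.end(),
                    "text": match.group(0),
                }
            )
    return hits


def pages_to_csv(pages: list[str]) -> str:
    """Serialize page text as CSV with a header and one row per page."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["page", "text"])  # hypothetical column names
    for page_number, text in enumerate(pages, start=1):
        writer.writerow([page_number, text])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = ["Invoice INV-1001 due", "No match here", "Refs: INV-2002, INV-2003"]
    for hit in search_pages(sample, r"INV-\d{4}"):
        print(hit)
    print(pages_to_csv(sample))
```

A literal search can reuse the same function by passing `re.escape(query)` as the pattern.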

Manage Metadata and Cleanup

Read, write, or remove document information metadata. Read and remove XMP metadata. Remove JavaScript, automatic actions, annotations, links, bookmarks, attachments, and form field definitions. Use the sanitize node for common cleanup of active content and identifying metadata.

Security Basics

Check whether a PDF is encrypted, encrypt with standard password security, decrypt with a password, and inspect permission flags such as printing, copying, annotation, form filling, and assembly permissions.
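For readers unfamiliar with PDF permission flags: under the standard security handler they are stored as bits of a single integer (the `/P` entry of the encryption dictionary). The sketch below decodes that integer using the bit positions from the PDF specification. The function name and the dictionary keys are invented for this example, and the node's actual output shape is not documented in this listing.

```python
# Bit positions (1-based, counted from the low-order bit) as defined for the
# standard security handler in the PDF specification. Key names are hypothetical.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p_value: int) -> dict[str, bool]:
    """Map the signed integer /P value to named boolean flags."""
    return {
        name: bool(p_value & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    # -1852 sets only the two printing bits (plus reserved bits).
    print(decode_permissions(-1852))
```

These flags are advisory: they tell a conforming viewer what to allow, so a flow should treat them as information about the document rather than as enforcement.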

Extract Embedded Content

List and extract embedded file attachments. List image XObjects and extract their encoded image streams to destination FlowPaths for downstream processing.

Good Fits

  • Automating document intake checks
  • Splitting large PDFs into workflow-ready chunks
  • Merging report packets or document bundles
  • Removing metadata, JavaScript, links, forms, annotations, or attachments
  • Extracting text for search, routing, summaries, or indexing
  • Reading PDF structure for validation and audit workflows
  • Preparing selected pages as new PDFs

What To Expect

PDF Utils works best with parseable, standards-aligned PDFs. Text extraction depends on the text data actually present in the PDF; scanned image-only PDFs usually need OCR before useful text can be extracted. Text replacement is best-effort because PDF text is often stored in encoded, fragmented content streams.
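One way a flow might decide when to route a document to OCR is to check how much text each page actually yields. The sketch below is a simple heuristic, not part of the package: the function name and the `min_chars` threshold of 20 are arbitrary assumptions that would need tuning for real documents.

```python
def pages_needing_ocr(pages: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based numbers of pages whose extracted text looks too sparse.

    Whitespace is ignored when counting, so pages containing only
    spaces or line breaks are flagged as well.
    """
    flagged = []
    for page_number, text in enumerate(pages, start=1):
        visible_chars = len("".join(text.split()))
        if visible_chars < min_chars:
            flagged.append(page_number)
    return flagged


if __name__ == "__main__":
    extracted = ["A full paragraph of real extracted text.", "  \n ", "12"]
    print(pages_needing_ocr(extracted))
```

A page flagged here is only a candidate: it may be genuinely blank, or it may be a scan that needs OCR outside this package.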

Image extraction returns the embedded image stream bytes as stored in the PDF. It does not render pages or convert images to a normalized output format.
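Because the extracted bytes keep their original encoding, a downstream step may need to work out what it received. The sketch below guesses the encoding from well-known leading bytes. The function name and return labels are assumptions for illustration, and the check covers only a few common cases.

```python
JP2_SIGNATURE_BOX = b"\x00\x00\x00\x0cjP  \r\n\x87\n"  # JPEG 2000 file signature
J2K_CODESTREAM = b"\xff\x4f\xff\x51"  # raw JPEG 2000 codestream start


def sniff_image_stream(data: bytes) -> str:
    """Guess the encoding of an extracted image stream from its first bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"  # typically usable as a .jpg file as-is
    if data.startswith(JP2_SIGNATURE_BOX) or data.startswith(J2K_CODESTREAM):
        return "jpeg2000"
    # zlib header: deflate method in the low nibble, 16-bit header divisible by 31.
    if len(data) >= 2 and data[0] & 0x0F == 8 and int.from_bytes(data[:2], "big") % 31 == 0:
        return "zlib"  # compressed sample data, not a standalone image file
    return "unknown"


if __name__ == "__main__":
    print(sniff_image_stream(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
```

A `zlib` result means the stream holds compressed pixel samples. Turning those into a viewable image also requires the width, height, color space, and bit depth from the image dictionary, which this sketch does not handle.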

Current Limits

This package does not currently include page rendering, thumbnails, OCR, HTML/Office-to-PDF conversion, PDF/A conversion, digital signing, accessibility validation, redaction, watermarking, or form filling. Those workflows require a renderer, OCR engine, conversion stack, signing stack, or a fuller PDF appearance editing layer.

Included Node Areas

  • Inspect
  • Pages
  • Text
  • Metadata
  • Annotations and links
  • Images
  • Attachments
  • Forms
  • Security and cleanup
  • Optimization

Publisher and License

Published by Rheosoph GmbH. Licensed under either Apache-2.0 or MIT, at your option.

Use Case

PDF Utilities

Provided Nodes

55 nodes included in this package.

PDF / Annotations

List PDF Annotations

Lists annotations on selected pages

storage:read

List PDF Links

Lists link annotations on selected pages

storage:read

Remove PDF Annotations

Removes page annotation arrays from selected pages and writes a new PDF

storage:read storage:write

Remove PDF Links

Removes link annotations from selected pages and writes a new PDF

storage:read storage:write

PDF / Attachments

Extract PDF Attachments

Writes selected embedded file attachments to destination FlowPaths

storage:read storage:write

List PDF Attachments

Lists embedded file attachments from name trees and file attachment annotations

storage:read

Remove PDF Attachments

Removes embedded file references and file attachment annotations from a PDF

storage:read storage:write

PDF / Forms

List PDF Form Fields

Lists AcroForm field definitions and values

storage:read

Remove PDF Form Fields

Removes the AcroForm catalog entry and writes a new PDF

storage:read storage:write

PDF / Images

Extract PDF Images

Writes encoded image XObject streams from selected pages to destination FlowPaths

storage:read storage:write

PDF / Inspect

Detect Blank Pages

Detects pages with no content stream bytes or no extractable text

storage:read

List PDF Bookmarks

Lists outline bookmarks and their destination pages when available

storage:read

List PDF Fonts

Lists font resources used by selected pages

storage:read

List PDF Images

Lists image XObjects referenced by selected pages

storage:read

PDF Info

Reads PDF version, page count, object count, encryption presence, and file details

storage:read

PDF Page Count

Counts pages in a PDF

storage:read

Read PDF Page Boxes

Reads MediaBox, CropBox, BleedBox, TrimBox, and ArtBox for selected pages

storage:read

Validate PDF

Checks whether a FlowPath points to a parseable PDF and returns diagnostics

storage:read

PDF / Metadata

Read PDF Metadata

Reads the document information dictionary from a PDF

storage:read

Read XMP Metadata

Reads the catalog XMP metadata stream from a PDF

storage:read

Remove PDF Metadata

Removes document information metadata and optionally XMP metadata

storage:read storage:write

Remove XMP Metadata

Removes catalog XMP metadata and writes a new PDF

storage:read storage:write

Set PDF Metadata

Writes document information metadata to a new PDF FlowPath

storage:read storage:write

PDF / Optimize

Compress PDF

Compresses PDF streams and writes the optimized PDF to a FlowPath

storage:read storage:write

PDF / Pages

Crop PDF Pages

Writes a new PDF with CropBox set on selected pages

storage:read storage:write

Delete PDF Pages

Writes a new PDF with selected pages removed

storage:read storage:write

Duplicate PDF Page

Duplicates one page after its original position and writes a new PDF

storage:read storage:write

Extract PDF Pages

Writes a new PDF containing only selected pages

storage:read storage:write

Insert PDF Pages

Inserts pages from one PDF into another and writes a new PDF

storage:read storage:write

Merge PDFs

Merges multiple PDF FlowPaths into one destination PDF

storage:read storage:write

Remove Blank Pages

Removes detected blank pages and writes a new PDF

storage:read storage:write

Remove PDF Bookmarks

Removes outline bookmarks and writes a new PDF

storage:read storage:write

Reorder PDF Pages

Writes a new PDF with pages in the requested order

storage:read storage:write

Replace PDF Pages

Replaces a contiguous page range with pages from another PDF

storage:read storage:write

Reverse PDF Pages

Writes a new PDF with all pages in reverse order

storage:read storage:write

Rotate PDF Pages

Writes a new PDF with selected pages rotated in 90-degree increments

storage:read storage:write

Set PDF Page Box

Writes a new PDF with a selected page box updated on selected pages

storage:read storage:write

Split PDF by Bookmarks

Writes one output PDF per bookmark section at the selected outline level

storage:read storage:write

Split PDF by Ranges

Writes one output PDF per requested page range

storage:read storage:write

Split PDF Every N Pages

Writes output PDFs containing fixed-size page chunks

storage:read storage:write
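
Splitting by bookmarks amounts to turning bookmark start pages into page ranges. The sketch below shows one plausible way to do that and is not the package's implementation. It assumes 1-based page numbers, keeps the first bookmark when several point at the same page, and leaves out pages that precede the first bookmark. The function name and tuple layout are hypothetical.

```python
def bookmark_sections(
    bookmarks: list[tuple[str, int]], page_count: int
) -> list[tuple[str, int, int]]:
    """Turn (title, first_page) bookmarks into (title, first_page, last_page) sections.

    Each section runs until the page before the next bookmark; the final
    section runs to the end of the document.
    """
    seen_pages = set()
    starts = []
    # sorted() is stable, so the first-listed bookmark wins for a shared page.
    for title, page in sorted(bookmarks, key=lambda item: item[1]):
        if 1 <= page <= page_count and page not in seen_pages:
            seen_pages.add(page)
            starts.append((title, page))

    sections = []
    for index, (title, first) in enumerate(starts):
        if index + 1 < len(starts):
            last = starts[index + 1][1] - 1
        else:
            last = page_count
        sections.append((title, first, last))
    return sections


if __name__ == "__main__":
    outline = [("Intro", 1), ("Methods", 4), ("Results", 9)]
    print(bookmark_sections(outline, 12))
```

Each resulting section would correspond to one output PDF, with titles available for building destination filenames.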

PDF / Security

Decrypt PDF

Decrypts a password-protected PDF and writes an unencrypted copy

storage:read storage:write

Encrypt PDF

Encrypts a PDF with standard password security and writes a new PDF

storage:read storage:write

Is PDF Encrypted

Checks whether a PDF appears to contain an encryption dictionary

storage:read

Read PDF Permissions

Reads available permission flags for PDFs loaded with an optional password

storage:read

Remove PDF JavaScript

Removes common document and object JavaScript action entries

storage:read storage:write

Sanitize PDF

Removes metadata, XMP, annotations, and common JavaScript entries from a PDF

storage:read storage:write

PDF / Text

Count PDF Words

Counts words and characters in extracted PDF text

storage:read

Extract PDF Text

Extracts text from all pages or a page range

storage:read

Extract PDF Text By Page

Extracts text per page from a PDF

storage:read

PDF Text To CSV

Extracts PDF text into a CSV string with one row per page

storage:read

PDF Text To JSON

Extracts PDF text and returns both typed page records and a JSON string

storage:read

PDF Text To Markdown

Extracts PDF text and formats it as page-section Markdown

storage:read

Regex Search PDF Text

Searches extracted PDF text with a regular expression

storage:read

Replace PDF Text

Replaces exact encoded text occurrences on selected pages and writes a new PDF

storage:read storage:write

Search PDF Text

Searches extracted PDF text for a literal query

storage:read

Versions

v0.1.0 May 30, 2026
1547 KB
