PDF Tools

PDF tools for Flow-Like workflows

Free

v0.1.0 2,044 downloads ✓ Verified MIT public


About

PDF Utils

Build PDF workflows directly in Flow-Like. PDF Utils provides a broad set of nodes for inspecting, editing, splitting, merging, cleaning, and extracting data from PDF files without leaving your flow.

The package is designed around FlowPath: every PDF input is passed as a FlowPath, and every generated PDF or extracted file is written to a FlowPath destination. The filename is already part of the FlowPath, so nodes do not ask for separate filename fields.

What You Can Do

Inspect Documents

Read core PDF details such as page count, PDF version, file size, object count, encryption state, metadata presence, page boxes, fonts, images, bookmarks, annotations, links, attachments, form fields, and blank-page candidates.

Work With Pages

Create new PDFs from selected pages, delete pages, merge files, rotate pages, crop or set page boxes, remove blank pages, reorder pages, reverse page order, duplicate pages, insert pages from another PDF, replace page ranges, and split documents by ranges, chunk size, or bookmark sections.
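As a rough illustration of what splitting by chunk size involves, the sketch below computes the page ranges that fixed-size chunks would cover. It is not the package's implementation: the function name `chunk_page_ranges` and the choice of inclusive, 1-based ranges are assumptions made for this example.

```python
from typing import List, Tuple


def chunk_page_ranges(page_count: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (first, last) ranges of at most chunk_size pages.

    Hypothetical helper for illustration; the real node's inputs and
    outputs may be shaped differently.
    """
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    ranges = []
    # Step through the document chunk_size pages at a time.
    for first in range(1, page_count + 1, chunk_size):
        # The final chunk may be shorter than chunk_size.
        last = min(first + chunk_size - 1, page_count)
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    # A 10-page document split every 4 pages.
    print(chunk_page_ranges(10, 4))
```

Each resulting range would then need its own destination FlowPath, since the filename is carried by the FlowPath rather than by a separate field.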

Extract and Search Text

Extract text from full documents or page ranges, return per-page text records, count words and characters, search with literal text or regular expressions, and export extracted text to Markdown, JSON, or CSV.
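The relationship between per-page text records, literal search, and regular-expression search can be sketched as follows. This is a minimal illustration using Python's standard `re` module, not the package's code: the `PageText` and `Match` record shapes and both function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PageText:
    """Hypothetical per-page text record: 1-based page number plus its text."""
    page: int
    text: str


@dataclass
class Match:
    """Hypothetical search hit with character offsets within the page text."""
    page: int
    start: int
    end: int
    matched: str


def regex_search(pages: Iterable[PageText], pattern: str,
                 ignore_case: bool = False) -> List[Match]:
    """Find every non-overlapping match of pattern on every page."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)
    hits = []
    for record in pages:
        for m in compiled.finditer(record.text):
            hits.append(Match(record.page, m.start(), m.end(), m.group(0)))
    return hits


def literal_search(pages: Iterable[PageText], query: str,
                   ignore_case: bool = False) -> List[Match]:
    """Literal search is a regex search with the query escaped."""
    return regex_search(pages, re.escape(query), ignore_case)


if __name__ == "__main__":
    pages = [PageText(1, "Invoice INV-001 total"), PageText(2, "See INV-042")]
    for hit in regex_search(pages, r"INV-\d+"):
        print(hit.page, hit.matched)
```

Because the search runs over extracted text, it can only find what text extraction returns; a scanned page with no text layer yields no matches.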

Manage Metadata and Cleanup

Read, write, or remove document information metadata. Read and remove XMP metadata. Remove JavaScript, automatic actions, annotations, links, bookmarks, attachments, and form field definitions. Use the Sanitize PDF node for common cleanup of active content and identifying metadata.

Security Basics

Check whether a PDF is encrypted, encrypt with standard password security, decrypt with a password, and inspect permission flags such as printing, copying, annotation, form filling, and assembly permissions.
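For readers unfamiliar with how these permissions are stored, the sketch below decodes the integer `/P` entry of a PDF encryption dictionary using the bit positions defined for the standard security handler in the PDF 1.7 specification. The function name and the dictionary keys are assumptions for this example; the field names the node actually returns are not documented here.

```python
from typing import Dict

# Bit positions are 1-based in the PDF specification,
# so bit n corresponds to the value 1 << (n - 1).
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> Dict[str, bool]:
    """Map the signed 32-bit /P value to named permission flags.

    Python integers behave like infinite two's complement under &,
    so negative /P values can be tested directly.
    """
    return {
        name: bool(p & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    # -4 has every permission bit set.
    for name, allowed in decode_permissions(-4).items():
        print(f"{name}: {allowed}")
```

These flags express the document author's intent and are honored by conforming viewers; they are separate from whether a password is needed to open the file.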

Extract Embedded Content

List and extract embedded file attachments. List image XObjects and extract their encoded image streams to destination FlowPaths for downstream processing.

Good Fits

Automating document intake checks

Splitting large PDFs into workflow-ready chunks

Merging report packets or document bundles

Removing metadata, JavaScript, links, forms, annotations, or attachments

Extracting text for search, routing, summaries, or indexing

Reading PDF structure for validation and audit workflows

Preparing selected pages as new PDFs

What To Expect

PDF Utils works best with parseable, standards-aligned PDFs. Text extraction depends on the text data actually present in the PDF; scanned image-only PDFs usually need OCR before useful text can be extracted. Text replacement is best-effort because PDF text is often stored in encoded, fragmented content streams.

Image extraction returns the embedded image stream bytes as stored in the PDF. It does not render pages or convert images to a normalized output format.
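Because the extracted bytes are the encoded stream rather than a normalized image, a downstream step may need to work out what it received. The sketch below guesses the encoding from leading magic bytes. It assumes the bytes are written unchanged, and the function name and return labels are hypothetical. In PDF terms, DCTDecode streams are JPEG data, JPXDecode streams are JPEG 2000 data, and FlateDecode streams are zlib-compressed raw samples that are not a standalone image file.

```python
def guess_stream_kind(data: bytes) -> str:
    """Guess the encoding of an extracted image stream from its first bytes."""
    # JPEG (DCTDecode): start-of-image marker followed by another marker.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # JPEG 2000 (JPXDecode) wrapped in the JP2 file format signature box.
    if data.startswith(b"\x00\x00\x00\x0cjP  \r\n\x87\n"):
        return "jpeg2000"
    # Bare JPEG 2000 codestream: SOC marker followed by SIZ marker.
    if data.startswith(b"\xff\x4f\xff\x51"):
        return "jpeg2000-codestream"
    # zlib (FlateDecode): deflate method in the low nibble of the first byte
    # and a two-byte header that is a multiple of 31.
    if (len(data) >= 2 and data[0] & 0x0F == 8
            and int.from_bytes(data[:2], "big") % 31 == 0):
        return "zlib"
    return "unknown"


if __name__ == "__main__":
    print(guess_stream_kind(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
```

A stream identified as `zlib` still needs the image's width, height, color space, and bit depth from the PDF before it can be turned into a viewable image, which is outside what this package does.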

Current Limits

This package does not currently include page rendering, thumbnails, OCR, HTML/Office-to-PDF conversion, PDF/A conversion, digital signing, accessibility validation, redaction, watermarking, or form filling. Those workflows require a renderer, OCR engine, conversion stack, signing stack, or a fuller PDF appearance editing layer.

Included Node Areas

Inspect

Pages

Text

Metadata

Annotations and links

Images

Attachments

Forms

Security and cleanup

Optimization

Publisher and License

Published by Rheosoph GmbH. Licensed under either Apache-2.0 or MIT, at your option.


Use Case

PDF Utilities

Provided Nodes

55 nodes included in this package.

PDF / Annotations

List PDF Annotations

Lists annotations on selected pages

storage:read

List PDF Links

Lists link annotations on selected pages

storage:read

Remove PDF Annotations

Removes page annotation arrays from selected pages and writes a new PDF

storage:read storage:write

Remove PDF Links

Removes link annotations from selected pages and writes a new PDF

storage:read storage:write

PDF / Attachments

Extract PDF Attachments

Writes selected embedded file attachments to destination FlowPaths

storage:read storage:write

List PDF Attachments

Lists embedded file attachments from name trees and file attachment annotations

storage:read

Remove PDF Attachments

Removes embedded file references and file attachment annotations from a PDF

storage:read storage:write

PDF / Forms

List PDF Form Fields

Lists AcroForm field definitions and values

storage:read

Remove PDF Form Fields

Removes the AcroForm catalog entry and writes a new PDF

storage:read storage:write

PDF / Images

Extract PDF Images

Writes encoded image XObject streams from selected pages to destination FlowPaths

storage:read storage:write

PDF / Inspect

Detect Blank Pages

Detects pages with no content stream bytes or no extractable text

storage:read

List PDF Bookmarks

Lists outline bookmarks and their destination pages when available

storage:read

List PDF Fonts

Lists font resources used by selected pages

storage:read

List PDF Images

Lists image XObjects referenced by selected pages

storage:read

PDF Info

Reads PDF version, page count, object count, encryption presence, and file details

storage:read

PDF Page Count

Counts pages in a PDF

storage:read

Read PDF Page Boxes

Reads MediaBox, CropBox, BleedBox, TrimBox, and ArtBox for selected pages

storage:read

Validate PDF

Checks whether a FlowPath points to a parseable PDF and returns diagnostics

storage:read

PDF / Metadata

Read PDF Metadata

Reads the document information dictionary from a PDF

storage:read

Read XMP Metadata

Reads the catalog XMP metadata stream from a PDF

storage:read

Remove PDF Metadata

Removes document information metadata and optionally XMP metadata

storage:read storage:write

Remove XMP Metadata

Removes catalog XMP metadata and writes a new PDF

storage:read storage:write

Set PDF Metadata

Writes document information metadata to a new PDF FlowPath

storage:read storage:write

PDF / Optimize

Compress PDF

Compresses PDF streams and writes the optimized PDF to a FlowPath

storage:read storage:write

PDF / Pages

Crop PDF Pages

Writes a new PDF with CropBox set on selected pages

storage:read storage:write

Delete PDF Pages

Writes a new PDF with selected pages removed

storage:read storage:write

Duplicate PDF Page

Duplicates one page after its original position and writes a new PDF

storage:read storage:write

Extract PDF Pages

Writes a new PDF containing only selected pages

storage:read storage:write

Insert PDF Pages

Inserts pages from one PDF into another and writes a new PDF

storage:read storage:write

Merge PDFs

Merges multiple PDF FlowPaths into one destination PDF

storage:read storage:write

Remove Blank Pages

Removes detected blank pages and writes a new PDF

storage:read storage:write

Remove PDF Bookmarks

Removes outline bookmarks and writes a new PDF

storage:read storage:write

Reorder PDF Pages

Writes a new PDF with pages in the requested order

storage:read storage:write

Replace PDF Pages

Replaces a contiguous page range with pages from another PDF

storage:read storage:write

Reverse PDF Pages

Writes a new PDF with all pages in reverse order

storage:read storage:write

Rotate PDF Pages

Writes a new PDF with selected pages rotated in 90-degree increments

storage:read storage:write

Set PDF Page Box

Writes a new PDF with a selected page box updated on selected pages

storage:read storage:write

Split PDF by Bookmarks

Writes one output PDF per bookmark section at the selected outline level

storage:read storage:write

Split PDF by Ranges

Writes one output PDF per requested page range

storage:read storage:write

Split PDF Every N Pages

Writes output PDFs containing fixed-size page chunks

storage:read storage:write

PDF / Security

Decrypt PDF

Decrypts a password-protected PDF and writes an unencrypted copy

storage:read storage:write

Encrypt PDF

Encrypts a PDF with standard password security and writes a new PDF

storage:read storage:write

Is PDF Encrypted

Checks whether a PDF appears to contain an encryption dictionary

storage:read

Read PDF Permissions

Reads available permission flags for PDFs loaded with an optional password

storage:read

Remove PDF JavaScript

Removes common document and object JavaScript action entries

storage:read storage:write

Sanitize PDF

Removes metadata, XMP, annotations, and common JavaScript entries from a PDF

storage:read storage:write

PDF / Text

Count PDF Words

Counts words and characters in extracted PDF text

storage:read

Extract PDF Text

Extracts text from all pages or a page range

storage:read

Extract PDF Text By Page

Extracts text per page from a PDF

storage:read

PDF Text To CSV

Extracts PDF text into a CSV string with one row per page

storage:read

PDF Text To JSON

Extracts PDF text and returns both typed page records and a JSON string

storage:read

PDF Text To Markdown

Extracts PDF text and formats it as page-section Markdown

storage:read

Regex Search PDF Text

Searches extracted PDF text with a regular expression

storage:read

Replace PDF Text

Replaces exact encoded text occurrences on selected pages and writes a new PDF

storage:read storage:write

Search PDF Text

Searches extracted PDF text for a literal query

storage:read

Versions

v0.1.0 May 30, 2026

1547 KB
