This documentation is currently in preview and is therefore subject to change.

Extract PDF Data

The Extract PDF Data action extracts interactive form-field values and XMP/document metadata from a PDF and returns each as a JSON object.

Parameters

Name | Type | Required | Description
Document | File content | Yes | PDF file content (e.g. output of a “Get file content” action from SharePoint or OneDrive).

Returns

This action returns two distinct JSON objects:

Name | Type | Description
Metadata | Object | Key/value mapping of metadata extracted from the PDF. Each property name is the metadata key.
Fields | Object | Key/value mapping of form field values extracted from the PDF. Each property name is the field’s name.

Example

Consider a simple fillable PDF form with these fields:

Field Label | Field Name | Type
First Name | first_name | Textbox
Last Name | last_name | Textbox
Email Address | email | Textbox
Subscribe to Newsletter | subscribe_newsletter | Checkbox

The PDF also contains these XMP metadata entries:

Metadata Key | Value
xmp:Department | Sales
xmp:Created | 2021-04-15T14:42:15Z
dc:creator | Alice Johnson
pdf:Producer | Adobe Acrobat
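
The source does not show the action's output for this example, so the following is only a sketch of the shape the two returned objects might take and how a downstream step could read them. The metadata entries and field names come from the tables above. The field values, the "Yes" representation of the ticked checkbox, and the variable names are assumptions made for illustration.

```python
import json

# Hypothetical "Metadata" output; keys and values mirror the metadata table above.
metadata_json = """
{
  "xmp:Department": "Sales",
  "xmp:Created": "2021-04-15T14:42:15Z",
  "dc:creator": "Alice Johnson",
  "pdf:Producer": "Adobe Acrobat"
}
"""

# Hypothetical "Fields" output; field names mirror the form table above,
# but every value here is invented for illustration.
fields_json = """
{
  "first_name": "Alice",
  "last_name": "Johnson",
  "email": "alice.johnson@example.com",
  "subscribe_newsletter": "Yes"
}
"""

metadata = json.loads(metadata_json)
fields = json.loads(fields_json)

# Each property name is the metadata key or the field's name.
department = metadata.get("xmp:Department")
full_name = f'{fields["first_name"]} {fields["last_name"]}'
```

Using `.get()` for optional keys avoids a failure when a PDF lacks a particular metadata entry.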

Troubleshooting

Document Missing, Truncated or Invalid

Cause:
PDF input is empty, truncated, or not a valid PDF file.

Fix:

  • Provide the complete PDF binary/base64 payload.
  • Verify the file opens successfully in a PDF reader to confirm it is not corrupted.
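
A quick pre-check can catch empty or truncated payloads before the action runs. The sketch below is a rough heuristic, not a full PDF validator: it only looks for the `%PDF-` header near the start and the `%%EOF` marker near the end. The function name `check_pdf_payload` and the 1024-byte search windows are assumptions chosen for illustration.

```python
import base64
import binascii
from typing import List, Union


def check_pdf_payload(payload: Union[bytes, str]) -> List[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    if isinstance(payload, str):
        # Treat strings as base64 text and decode strictly.
        try:
            data = base64.b64decode(payload, validate=True)
        except (binascii.Error, ValueError):
            return ["payload is not valid base64"]
    else:
        data = payload

    if not data:
        return ["payload is empty"]

    problems = []
    # A PDF starts with a "%PDF-" header (possibly after a few junk bytes).
    if b"%PDF-" not in data[:1024]:
        problems.append("missing %PDF- header")
    # A complete PDF ends with an "%%EOF" marker; its absence suggests truncation.
    if b"%%EOF" not in data[-1024:]:
        problems.append("missing %%EOF marker (file may be truncated)")
    return problems
```

A file that passes these checks can still be corrupt internally, so opening it in a PDF reader remains the more reliable test.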

No Form Fields or Metadata Were Returned

Cause:
The PDF contains no interactive form fields (fields may have been flattened into static content) and/or contains no embedded XMP or document metadata.

Fix:

  • Confirm the source PDF contains interactive form fields and that fields are not flattened.
  • Confirm metadata is embedded as XMP or document-level properties.
  • If the PDF has been exported from another system, re-export with forms and metadata intact.

Form Fields Present but Values Are Empty or Default

Cause:
Fields exist but have not been filled, or the PDF stores default values instead of submitted values.

Fix:

  • Verify the PDF was saved after field entry (some viewers keep values in memory until the file is saved).
  • Check whether values are stored as defaults versus current values and save the PDF with current values prior to extraction.

Field Names Duplicated or Ambiguous Keys

Cause:
Multiple fields share the same name or use characters that cause collisions in the returned JSON structure.

Fix:

  • Ensure field names are unique within the form.
  • If duplicates are required, rename fields or add an index or namespace to make keys unique before extraction.
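
One possible indexing scheme for renaming duplicates is sketched below. It works on a plain list of field names, which you would obtain from whatever PDF editing tool you use; the function names and the `_1`, `_2` suffix convention are assumptions, and the renaming itself still has to be applied to the PDF before extraction.

```python
from collections import Counter


def find_duplicates(names):
    """Return the field names that occur more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def make_unique(names):
    """Suffix repeated names with _1, _2, ... and leave unique names unchanged."""
    totals = Counter(names)
    seen = Counter()
    # Names that are already unique are reserved so a suffix never collides with them.
    taken = {name for name, count in totals.items() if count == 1}
    result = []
    for name in names:
        if totals[name] == 1:
            result.append(name)
            continue
        # Advance the index until the candidate is not already in use.
        while True:
            seen[name] += 1
            candidate = f"{name}_{seen[name]}"
            if candidate not in taken:
                break
        taken.add(candidate)
        result.append(candidate)
    return result
```

For example, `make_unique(["email", "phone", "email"])` yields `["email_1", "phone", "email_2"]`.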

XMP / Metadata Keys Missing, Malformed or Encoded Incorrectly

Cause:
Metadata is absent, stored in a non-standard location, or uses non-standard character encodings.

Fix:

  • Inspect the PDF’s metadata with a PDF tool to confirm presence and encoding.
  • Re-embed metadata using standard XMP or document metadata practices and UTF-8 where possible.

Flattened Forms (Fields Rendered as Static Content)

Cause:
Fields have been flattened during export, making them part of the page content rather than interactive fields. Extraction cannot recover flattened values.

Fix:

  • Use a copy of the PDF where fields are preserved (unflattened).
  • If only flattened files exist, recover the values by other means, such as text extraction or OCR of the page content.

Encoding or Special-Character Corruption in Field Values

Cause:
Field values or metadata use character encodings that are not preserved, leading to distorted output.

Fix:

  • Ensure field values and metadata use a standard Unicode encoding such as UTF-8.
  • Test extraction with sample input containing representative characters to verify behaviour.
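
When testing with representative characters, a small check over the returned values can flag likely encoding damage. This sketch assumes the extracted values arrive as a flat mapping of strings, and it only detects two common symptoms: the Unicode replacement character (U+FFFD) and unexpected control characters. The function name `find_suspect_values` is hypothetical.

```python
def find_suspect_values(mapping):
    """Return the keys whose string values show signs of encoding damage."""
    suspects = []
    for key, value in mapping.items():
        if not isinstance(value, str):
            continue
        # U+FFFD is inserted by decoders when bytes could not be interpreted.
        has_replacement = "\ufffd" in value
        # Control characters other than tab/newline/carriage return are unusual in form data.
        has_control = any(ord(ch) < 32 and ch not in "\t\n\r" for ch in value)
        if has_replacement or has_control:
            suspects.append(key)
    return suspects
```

Values that decode to plausible but wrong characters will not be caught this way, so comparing against the known sample input is still worthwhile.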

Permission, Encryption or Password Protection Prevents Extraction

Cause:
PDF is encrypted, password-protected, or has permissions that block metadata or form access.

Fix:

  • Provide an unencrypted copy with extraction permissions enabled.
  • Remove password protection before attempting extraction.
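
A crude way to spot encrypted files before calling the action is to look for the `/Encrypt` entry that encrypted PDFs carry in their trailer. The sketch below is a byte-level heuristic only: it can report a false positive if that byte sequence happens to appear elsewhere in the file, and it says nothing about which permissions are set. The function name `looks_encrypted` is an assumption.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristically report whether the raw PDF bytes reference an encryption dictionary."""
    # Encrypted PDFs reference their encryption dictionary via an /Encrypt key.
    return b"/Encrypt" in pdf_bytes
```

A PDF library that parses the trailer properly gives a more reliable answer where one is available.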

Generic Runtime or Transient Error

Cause:
Malformed PDF internals, intermittent extraction engine failure, or unexpected internal state.

Fix:

  • Reproduce the issue with a minimal sample PDF containing a single field and a simple metadata entry.
  • Validate inputs and retry to rule out transient problems.
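
Retrying to rule out transient problems can be sketched as a small wrapper with exponential backoff. Here `operation` is a hypothetical stand-in for however your code invokes the action; the attempt count and delays are arbitrary defaults, not values documented for this action.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:  # broad on purpose: the failure type is unknown here
            last_error = exc
            # Wait before the next try, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

If every attempt fails in the same way, the cause is more likely a malformed PDF than a transient fault, and the minimal-sample test is the better next step.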

Quick Checklist

  • Document is a complete PDF binary/base64 payload (not a file path or URL).
  • PDF contains interactive form fields (not flattened) and/or embedded XMP or document metadata.
  • Field names are unique and use safe characters for JSON keys.
  • Checkbox or radio values are standardised or normalised downstream.
  • Metadata uses standard XMP or document properties and UTF-8-friendly encodings.
  • PDF is not password protected or encrypted in a way that blocks access.
  • Reproduce with a minimal sample (one field plus one metadata key) to isolate issues before testing larger documents.
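
For the checklist item on checkbox and radio values, downstream normalisation might look like the sketch below. This page does not state how the action represents a ticked checkbox, so the accepted spellings ("Yes", "On", "Off" and so on) are assumptions based on values commonly seen in PDF forms; adjust the sets to match what your documents actually return.

```python
# Assumed spellings; extend these sets to match the values your PDFs produce.
TRUE_VALUES = {"yes", "on", "true", "1", "checked"}
FALSE_VALUES = {"off", "no", "false", "0", "", "unchecked"}


def normalise_checkbox(value):
    """Map a raw checkbox value to True/False, raising on unrecognised input."""
    if isinstance(value, bool):
        return value
    if value is None:
        return False
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"unrecognised checkbox value: {value!r}")
```

Raising on unknown values, rather than silently treating them as unchecked, makes unexpected export values visible early.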