Extract PDF Data

The Extract PDF Data action extracts interactive form-field values and XMP/document metadata from a PDF and returns each as a separate JSON object.

Parameters

Name	Type	Required	Description
Document	`File content`	Yes	PDF file content (e.g. output of a “Get file content” action from SharePoint or OneDrive).

Returns

This action returns two distinct JSON objects:

Name	Type	Description
Metadata	Object	Key/value mapping of metadata extracted from the PDF. Each property name is the metadata key.
Fields	Object	Key/value mapping of form field values extracted from the PDF. Each property name is the field’s name.

Example

A simple fillable PDF form with these fields:

Field Label	Field Name	Type
First Name	`first_name`	Textbox
Last Name	`last_name`	Textbox
Email Address	`email`	Textbox
Subscribe to Newsletter	`subscribe_newsletter`	Checkbox

The PDF also contains these XMP metadata entries:

Metadata Key	Value
`xmp:Department`	`Sales`
`xmp:Created`	`2021-04-15T14:42:15Z`
`dc:creator`	`Alice Johnson`
`pdf:Producer`	`Adobe Acrobat`

For this PDF, the action returns the following Fields object:

{
  "first_name": "John",
  "last_name": "Smith",
  "email": "john.smith@example.com",
  "subscribe_newsletter": true
}

It also returns the following Metadata object:

{
  "xmp:Department": "Sales",
  "xmp:Created": "2021-04-15T14:42:15Z",
  "dc:creator": "Alice Johnson",
  "pdf:Producer": "Adobe Acrobat"
}
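As an illustration of consuming the two objects downstream, the following Python sketch parses the example output and combines a few values. The `summarise` helper and the shape of its result are hypothetical and not part of the action; only the keys come from the example.

```python
import json

# Example output copied from the sample PDF described in this section.
fields_json = """{
  "first_name": "John",
  "last_name": "Smith",
  "email": "john.smith@example.com",
  "subscribe_newsletter": true
}"""
metadata_json = """{
  "xmp:Department": "Sales",
  "xmp:Created": "2021-04-15T14:42:15Z",
  "dc:creator": "Alice Johnson",
  "pdf:Producer": "Adobe Acrobat"
}"""

fields = json.loads(fields_json)
metadata = json.loads(metadata_json)


def summarise(fields, metadata):
    """Hypothetical helper: merge selected field and metadata values."""
    first = fields.get("first_name", "")
    last = fields.get("last_name", "")
    return {
        "full_name": f"{first} {last}".strip(),
        "email": fields.get("email"),
        "subscribed": bool(fields.get("subscribe_newsletter", False)),
        # Metadata keys contain a colon, so they are read as plain strings.
        "department": metadata.get("xmp:Department"),
    }


summary = summarise(fields, metadata)
```

Using `.get` rather than indexing keeps the sketch tolerant of PDFs where a field or metadata key is absent.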

Troubleshooting


Document Missing, Truncated or Invalid

Cause:
PDF input is empty, truncated, or not a valid PDF file.

Fix:

Provide the complete PDF binary/base64 payload.
Verify the file opens successfully in a PDF reader to confirm it is not corrupted.
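A rough pre-check for this case can be sketched in Python before the payload is passed to the action. It relies on the fact that PDF files start with a `%PDF-` header and end with an `%%EOF` marker. The function names are hypothetical, and the check is only a heuristic: it can catch empty or truncated payloads but does not prove the file is a valid PDF.

```python
import base64
import binascii


def decode_base64_pdf(text: str) -> bytes:
    """Decode a base64 payload, rejecting characters outside the alphabet."""
    try:
        return base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("payload is not valid base64") from exc


def looks_like_complete_pdf(payload: bytes) -> bool:
    """Heuristic: header near the start and %%EOF near the end."""
    if not payload:
        return False
    has_header = b"%PDF-" in payload[:1024]
    has_trailer = b"%%EOF" in payload[-1024:]
    return has_header and has_trailer
```

A payload that fails this check is likely empty, truncated, or not a PDF at all. A payload that passes may still be corrupted internally, so opening it in a PDF reader remains the more reliable test.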

No Form Fields or Metadata Were Returned

Cause:
The PDF contains no interactive form fields (fields may have been flattened into static content) and/or contains no embedded XMP or document metadata.

Fix:

Confirm the source PDF contains interactive form fields and that fields are not flattened.
Confirm metadata is embedded as XMP or document-level properties.
If the PDF has been exported from another system, re-export with forms and metadata intact.

Form Fields Present but Values Are Empty or Default

Cause:
Fields exist but have not been filled, or the PDF stores default values instead of submitted values.

Fix:

Verify the PDF was saved after field entry (some viewers keep values in memory until the file is saved).
Check whether values are stored as defaults versus current values and save the PDF with current values prior to extraction.

Field Names Duplicated or Ambiguous Keys

Cause:
Multiple fields share the same name or use characters that cause collisions in the returned JSON structure.

Fix:

Ensure field names are unique within the form.
If duplicates are required, rename fields or add an index or namespace to make keys unique before extraction.
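One possible indexing scheme for making names unique is sketched below in Python. It only computes the new names; the renaming itself would still be done in a PDF editor or form-authoring tool. The function name and the `_2`, `_3` suffix convention are assumptions chosen for illustration.

```python
def make_unique_names(names):
    """Return names in order, appending _2, _3, ... to repeated ones."""
    used = set()
    result = []
    for name in names:
        candidate = name
        suffix = 2
        # Keep incrementing until the candidate does not clash with
        # any name already assigned (including pre-existing suffixed names).
        while candidate in used:
            candidate = f"{name}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result
```

For example, the input `["email", "email", "phone", "email"]` yields `["email", "email_2", "phone", "email_3"]`.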

XMP / Metadata Keys Missing, Malformed or Encoded Incorrectly

Cause:
Metadata is absent, stored in a non-standard location, or uses non-standard character encodings.

Fix:

Inspect the PDF’s metadata with a PDF tool to confirm presence and encoding.
Re-embed metadata using standard XMP or document metadata practices and UTF-8 where possible.

Flattened Forms (Fields Rendered as Static Content)

Cause:
Fields have been flattened during export, making them part of the page content rather than interactive fields. Extraction cannot recover flattened values.

Fix:

Use a copy of the PDF where fields are preserved (unflattened).
If only flattened files exist, recover the values by other means, such as text extraction or OCR of the page content.

Encoding or Special-Character Corruption in Field Values

Cause:
Field values or metadata use character encodings that are not preserved, leading to distorted output.

Fix:

Ensure field values and metadata use a standard Unicode encoding such as UTF-8.
Test extraction with sample input containing representative characters to verify behaviour.
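Such a test can be sketched in Python as a comparison between the values typed into a sample PDF and the values that come back in the Fields object. The function name and the sample data are hypothetical. Strings are compared after Unicode NFC normalisation so that equivalent composed and decomposed forms are not reported as corruption.

```python
import unicodedata


def find_mismatches(expected, extracted):
    """Return {key: (expected, extracted)} for values that differ."""
    mismatches = {}
    for key, want in expected.items():
        got = extracted.get(key)
        same = isinstance(got, str) and (
            unicodedata.normalize("NFC", got)
            == unicodedata.normalize("NFC", want)
        )
        if not same:
            mismatches[key] = (want, got)
    return mismatches


# Hypothetical test data: what was typed into the sample form...
expected = {"name": "Zoë", "city": "München"}
# ...and what an extraction might return if one value were mis-decoded.
extracted = {"name": "Zoe\u0308", "city": "MÃ¼nchen"}

problems = find_mismatches(expected, extracted)
```

Here `name` passes because the decomposed form normalises to the same string, while `city` is reported as a mismatch.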

Permission, Encryption or Password Protection Prevents Extraction

Cause:
PDF is encrypted, password-protected, or has permissions that block metadata or form access.

Fix:

Provide an unencrypted copy with extraction permissions enabled.
Remove password protection before attempting extraction.
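As a quick screening step, encrypted PDFs reference an `/Encrypt` dictionary in the file trailer, so a byte search can flag likely candidates before extraction. The Python sketch below is a heuristic only, and the function name is hypothetical. It can give false positives if the same bytes appear elsewhere in the file, and it says nothing about which permissions are set.

```python
def may_be_encrypted(payload: bytes) -> bool:
    """Heuristic: True if the raw bytes mention an /Encrypt entry."""
    return b"/Encrypt" in payload
```

A positive result is a prompt to open the file in a PDF tool and check its security settings, not a definitive diagnosis.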

Generic Runtime or Transient Error

Cause:
Malformed PDF internals, intermittent extraction engine failure, or unexpected internal state.

Fix:

Reproduce the issue with a minimal sample PDF containing a single field and a simple metadata entry.
Validate inputs and retry to rule out transient problems.
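The retry step can be sketched in Python as a small wrapper with exponential backoff. The `extract` argument stands for whatever callable invokes the action in a given environment; it is a placeholder, not a real API. The sketch catches every exception for brevity, whereas production code would normally retry only on errors known to be transient.

```python
import time


def extract_with_retry(extract, document, attempts=3, base_delay=1.0):
    """Call extract(document), retrying failed calls with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return extract(document)
        except Exception as exc:  # broad on purpose in this sketch
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, 4x, ... between attempts.
                time.sleep(base_delay * (2 ** attempt))
    raise last_error
```

If the call still fails after the final attempt, the last error is re-raised, which points to a persistent problem with the document rather than a transient one.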

Quick Checklist

Document is a complete PDF binary/base64 payload (not a file path or URL).
PDF contains interactive form fields (not flattened) and/or embedded XMP or document metadata.
Field names are unique and use safe characters for JSON keys.
Checkbox or radio values are standardised or normalised downstream.
Metadata uses standard XMP or document properties and UTF-8-friendly encodings.
PDF is not password protected or encrypted in a way that blocks access.
Reproduce with a minimal sample (one field plus one metadata key) to isolate issues before testing larger documents.
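The checkbox and radio item in the checklist can be illustrated with a short Python sketch that maps common on/off spellings to booleans. The accepted spellings below are assumptions; the values a particular PDF actually stores depend on how the form was authored, so the sets should be adjusted to match real output.

```python
# Assumed spellings; extend these to match the forms being processed.
TRUE_VALUES = {"true", "yes", "on", "1", "checked"}
FALSE_VALUES = {"false", "no", "off", "0", "unchecked", ""}


def normalise_checkbox(value):
    """Map a checkbox-style value to a bool, or raise if unrecognised."""
    if isinstance(value, bool):
        return value
    if value is None:
        return False
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"unrecognised checkbox value: {value!r}")
```

Raising on unknown values, rather than defaulting to false, makes unexpected export values visible early.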