Extract PDF Data
The Extract PDF Data action extracts interactive form-field values and XMP/document metadata from a PDF and returns each as a separate JSON object.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| Document | File content | Yes | PDF file content (e.g. output of a “Get file content” action from SharePoint or OneDrive). |
Returns
This action returns two distinct JSON objects:
| Name | Type | Description |
|---|---|---|
| Metadata | Object | Key/value mapping of metadata extracted from the PDF. Each property name is the metadata key. |
| Fields | Object | Key/value mapping of form field values extracted from the PDF. Each property name is the field’s name. |
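As a rough illustration of how the two outputs might be consumed downstream, the sketch below parses them in Python. It assumes the Fields and Metadata objects arrive as JSON text; the sample values and variable names are hypothetical and not part of the action itself.

```python
import json

# Hypothetical raw outputs of the action, assumed to arrive as JSON text.
fields_json = '{"first_name": "John", "subscribe_newsletter": true}'
metadata_json = '{"dc:creator": "Alice Johnson"}'

fields = json.loads(fields_json)      # property name = form field name
metadata = json.loads(metadata_json)  # property name = metadata key

# Use .get() so an absent key yields a default instead of raising KeyError.
first_name = fields.get("first_name")
creator = metadata.get("dc:creator")
subscribed = fields.get("subscribe_newsletter", False)

print(first_name, creator, subscribed)
```

Reading keys with `.get()` keeps the consumer tolerant of PDFs that lack a given field or metadata entry.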
Example
A simple fillable PDF form with these fields:
| Field Label | Field Name | Type |
|---|---|---|
| First Name | first_name | Textbox |
| Last Name | last_name | Textbox |
| Email Address | email | Textbox |
| Subscribe to Newsletter | subscribe_newsletter | Checkbox |
The PDF also contains these XMP metadata entries:
| Metadata Key | Value |
|---|---|
| xmp:Department | Sales |
| xmp:Created | 2021-04-15T14:42:15Z |
| dc:creator | Alice Johnson |
| pdf:Producer | Adobe Acrobat |
Fields:
{
  "first_name": "John",
  "last_name": "Smith",
  "email": "john.smith@example.com",
  "subscribe_newsletter": true
}
Metadata:
{
  "xmp:Department": "Sales",
  "xmp:Created": "2021-04-15T14:42:15Z",
  "dc:creator": "Alice Johnson",
  "pdf:Producer": "Adobe Acrobat"
}
Troubleshooting
Document Missing, Truncated or Invalid
Cause:
PDF input is empty, truncated, or not a valid PDF file.
Fix:
- Provide the complete PDF binary/base64 payload.
- Verify the file opens successfully in a PDF reader to confirm it is not corrupted.
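A quick pre-check can catch empty or truncated payloads before the action runs. The sketch below is a heuristic only, assuming the payload is a base64 string: it looks for the `%PDF-` header and the `%%EOF` end-of-file marker. The function name is hypothetical, and passing this check does not guarantee the file is a valid PDF.

```python
import base64
import binascii


def looks_like_complete_pdf(payload_b64: str) -> bool:
    """Heuristic check: PDF header and end-of-file marker are both present."""
    try:
        data = base64.b64decode(payload_b64, validate=True)
    except (binascii.Error, ValueError):
        return False  # not valid base64 at all
    if not data.startswith(b"%PDF-"):
        return False  # empty payload or not a PDF
    # A complete file ends with %%EOF, possibly followed by whitespace.
    return b"%%EOF" in data[-1024:]
```

A file that fails this check is worth opening in a PDF reader before investigating further.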
No Form Fields or Metadata Were Returned
Cause:
The PDF contains no interactive form fields (fields may have been flattened into static content) and/or contains no embedded XMP or document metadata.
Fix:
- Confirm the source PDF contains interactive form fields and that fields are not flattened.
- Confirm metadata is embedded as XMP or document-level properties.
- If the PDF has been exported from another system, re-export with forms and metadata intact.
Form Fields Present but Values Are Empty or Default
Cause:
Fields exist but have not been filled, or the PDF stores default values instead of submitted values.
Fix:
- Verify the PDF was saved after field entry (some viewers keep values in memory until the file is saved).
- Check whether the values are stored as default values rather than current values; if so, save the PDF with the current values before extraction.
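To spot unfilled fields in the returned Fields object, something like the following sketch could be used. It assumes Fields has already been parsed into a Python dict; the function name is hypothetical, and treating `None` and whitespace-only strings as blank is an assumption about how empty values appear.

```python
def blank_fields(fields: dict) -> list[str]:
    """Names of fields whose value is None or an empty/whitespace-only string."""
    blanks = []
    for name, value in fields.items():
        if value is None or (isinstance(value, str) and not value.strip()):
            blanks.append(name)
    return blanks
```

An unchecked checkbox returned as `false` is deliberately not counted as blank, since it is a real value.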
Field Names Duplicated or Ambiguous Keys
Cause:
Multiple fields share the same name or use characters that cause collisions in the returned JSON structure.
Fix:
- Ensure field names are unique within the form.
- If duplicates are required, rename fields or add an index or namespace to make keys unique before extraction.
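One possible indexing scheme for duplicate names is sketched below. It only computes the new names from a list of existing field names; applying the renames to the PDF itself requires a PDF editing tool and is not shown. The function name and the `_1`, `_2` suffix format are assumptions.

```python
from collections import Counter


def make_unique(names: list[str]) -> list[str]:
    """Append _1, _2, ... to every name that occurs more than once."""
    totals = Counter(names)
    seen = Counter()
    unique = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            unique.append(f"{name}_{seen[name]}")
        else:
            unique.append(name)
    return unique
```

For example, `make_unique(["email", "phone", "email"])` yields `["email_1", "phone", "email_2"]`. The sketch does not guard against a form that already contains a field literally named `email_1`.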
XMP / Metadata Keys Missing, Malformed or Encoded Incorrectly
Cause:
Metadata is absent, stored in a non-standard location, or uses non-standard character encodings.
Fix:
- Inspect the PDF’s metadata with a PDF tool to confirm presence and encoding.
- Re-embed metadata using standard XMP or document metadata practices and UTF-8 where possible.
Flattened Forms (Fields Rendered as Static Content)
Cause:
Fields have been flattened during export, making them part of the page content rather than interactive fields. Extraction cannot recover flattened values.
Fix:
- Use a copy of the PDF where fields are preserved (unflattened).
- If only flattened files exist, recover the values from the page content instead, for example with text extraction or OCR.
Encoding or Special-Character Corruption in Field Values
Cause:
Field values or metadata use character encodings that are not preserved, leading to distorted output.
Fix:
- Ensure field values and metadata use a standard Unicode encoding (such as UTF-8).
- Test extraction with sample input containing representative characters to verify behaviour.
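A test of this kind can be sketched as a comparison between the values entered into a sample PDF and the values that come back. The helper names below are hypothetical. Both sides are normalised to Unicode NFC first, on the assumption that composed and decomposed forms of the same character should count as equal.

```python
import unicodedata


def same_text(expected: str, extracted: str) -> bool:
    """Compare after NFC normalisation so composed and decomposed forms match."""
    return (unicodedata.normalize("NFC", expected)
            == unicodedata.normalize("NFC", extracted))


def mismatches(expected: dict, extracted: dict) -> list[str]:
    """Field names whose extracted value differs from what was entered."""
    return [name for name, value in expected.items()
            if not same_text(value, str(extracted.get(name, "")))]
```

Any field name returned by `mismatches` points to a value that was altered somewhere between entry and extraction.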
Permission, Encryption or Password Protection Prevents Extraction
Cause:
PDF is encrypted, password-protected, or has permissions that block metadata or form access.
Fix:
- Provide an unencrypted copy with extraction permissions enabled.
- Remove password protection before attempting extraction.
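Encrypted PDFs reference an `/Encrypt` dictionary in the file trailer, so a simple byte search can flag likely candidates before extraction. This is a heuristic sketch with a hypothetical function name: it can give a false positive if the same bytes happen to appear elsewhere in the file, and it does not distinguish password protection from permission restrictions.

```python
def appears_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: look for the /Encrypt key that encrypted PDFs carry."""
    return b"/Encrypt" in pdf_bytes
```

A positive result is a prompt to check the document's security settings in a PDF tool, not a definitive diagnosis.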
Generic Runtime or Transient Error
Cause:
Malformed PDF internals, intermittent extraction engine failure, or unexpected internal state.
Fix:
- Reproduce the issue with a minimal sample PDF containing a single field and a simple metadata entry.
- Validate inputs and retry to rule out transient problems.
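Retrying can be sketched as a small wrapper with exponential backoff. Here `action` is a hypothetical callable standing in for however the extraction is invoked; the attempt count and delays are arbitrary defaults, not recommended values.

```python
import time


def call_with_retry(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action(); on failure wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

If the same input fails on every attempt, the problem is more likely the PDF itself than a transient fault.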
Quick Checklist
- Document is a complete PDF binary/base64 payload (not a file path or URL).
- PDF contains interactive form fields (not flattened) and/or embedded XMP or document metadata.
- Field names are unique and use safe characters for JSON keys.
- Checkbox or radio values are standardised or normalised downstream.
- Metadata uses standard XMP or document properties and UTF-8-friendly encodings.
- PDF is not password protected or encrypted in a way that blocks access.
- Reproduce with a minimal sample (one field plus one metadata key) to isolate issues before testing larger documents.
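For the checkbox and radio item in the checklist, downstream normalisation might look like the sketch below. The token sets are assumptions about values a form could use (for example `Yes`, `Off`, or a name with a leading slash) and should be adjusted to match what a given PDF actually returns; the function name is hypothetical.

```python
TRUTHY = {"true", "yes", "on", "1", "checked"}
FALSY = {"false", "no", "off", "0", "unchecked", ""}


def to_bool(value):
    """Map common checkbox/radio representations to True or False."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lstrip("/").lower()  # "/Yes" -> "yes"
    if text in TRUTHY:
        return True
    if text in FALSY:
        return False
    raise ValueError(f"Unrecognised checkbox value: {value!r}")
```

Raising on unknown values, rather than guessing, makes an unexpected export value visible early.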