Convert PDF Document
The Convert PDF Document action converts a PDF file to another format and returns the resulting file content.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| Document | File content | Yes | PDF file content (e.g. the output of a Get file content action from SharePoint or OneDrive). |
| Output Format | String | Yes | Desired target format. One of: .doc, .docx, .pptx, .html, SVG. |
Optional Parameters by Output Format
DOC
| Parameter | Type | Description | Options |
|---|---|---|---|
| Add Return To Line End | Boolean | Append a line break at the end of each extracted paragraph. | Yes, No |
| Image Resolution X | Integer | Horizontal resolution for any images extracted from the PDF. | Any integer |
| Image Resolution Y | Integer | Vertical resolution for any images extracted from the PDF. | Any integer |
| Recognition Mode | Enum | OCR layout strategy for text recognition. | Textbox, Enhanced flow, Flow |
| Recognise Bullets | Boolean | Convert bullet characters into actual list items. | Yes, No |
DOCX
| Parameter | Type | Description | Options |
|---|---|---|---|
| Add Return To Line End | Boolean | Append a line break at the end of each extracted paragraph. | Yes, No |
| Image Resolution X | Integer | Horizontal resolution for any images extracted from the PDF. | Any integer |
| Image Resolution Y | Integer | Vertical resolution for any images extracted from the PDF. | Any integer |
| Recognition Mode | Enum | OCR layout strategy for text recognition. | Textbox, Enhanced flow, Flow |
| Recognise Bullets | Boolean | Convert bullet characters into actual list items. | Yes, No |
HTML
| Parameter | Type | Description | Options |
|---|---|---|---|
| Scale To Pixels | Boolean | Render vector elements at pixel-aligned dimensions. | Yes, No |
| Explicit List Of Saved Pages | Integer Array | One-based list of page numbers to convert; only these pages will be rendered. | e.g. [1,3,5] |
| Fixed Layout | Boolean | Preserve original PDF layout exactly, without any reflow. | Yes, No |
| Flow Layout Paragraph Full Width | Boolean | Allow paragraphs to span the full available width when reflowing text. | Yes, No |
| Image Resolution | Integer | Resolution for any images rendered within the HTML output. | Any integer |
| Minimal Line Width | Float | Minimum stroke width (in pixels) for lines. | Any positive number |
PPTX
| Parameter | Type | Description | Options |
|---|---|---|---|
| Image Resolution | Integer | Resolution for any images embedded in the generated slides. | Any integer |
| Optimise Text Boxes | Boolean | Merge and simplify text boxes for cleaner PowerPoint editing. | Yes, No |
SVG
| Parameter | Type | Description | Options |
|---|---|---|---|
| Scale To Pixels | Boolean | Render SVG elements with pixel-aligned dimensions. | Yes, No |
Returns
| Name | Type | Description |
|---|---|---|
| File content | String (base64‑encoded) | Base64‑encoded bytes of the converted file, suitable for downstream actions (e.g. “Create file”). |
Troubleshooting
Document Missing, Truncated or Invalid
Cause:
PDF input is empty, corrupted, or not valid PDF bytes.
Fix:
- Provide the full PDF base64 and verify it decodes to a valid PDF.
- Test conversion with a sample PDF to confirm baseline behavior.
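The first fix can be checked outside the flow before the PDF is passed to the action. The following Python function is a minimal sketch, not part of the connector: `check_pdf_base64` is a hypothetical helper name, and the checks rely only on the standard PDF header (`%PDF-`) and end-of-file marker (`%%EOF`). It assumes the payload is a single base64 string without line breaks.

```python
import base64
import binascii


def check_pdf_base64(payload):
    """Decode a base64 string and run basic sanity checks for a PDF."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        data = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"payload is not valid base64: {exc}") from exc
    if not data:
        raise ValueError("payload decodes to zero bytes")
    # A PDF begins with a "%PDF-" header (some producers emit a few bytes before it)
    if b"%PDF-" not in data[:1024]:
        raise ValueError("missing %PDF- header; this is probably not a PDF")
    # A complete PDF ends with an %%EOF marker; its absence suggests truncation
    if b"%%EOF" not in data[-1024:]:
        raise ValueError("missing %%EOF marker; the file may be truncated")
    return data
```

Passing these checks does not guarantee that the file converts; it only rules out the empty, truncated, and wrong-file-type cases described above.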
Unsupported or Misspelled Output Format
Cause:
Output format string is incorrect or not supported.
Fix:
- Verify the format value exactly matches one of the supported strings (.doc, .docx, .pptx, .html, SVG).
- Correct typos and retry.
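When the format value comes from user input or an upstream step, it can be normalised before the action runs. This is a sketch under stated assumptions: `match_output_format` is a hypothetical helper, and the spellings in `SUPPORTED_FORMATS` are copied from the Output Format row of the parameters table. Whether the connector itself tolerates other casing or a missing leading dot is not documented here.

```python
# Spellings as listed for the Output Format parameter
SUPPORTED_FORMATS = (".doc", ".docx", ".pptx", ".html", "SVG")


def match_output_format(value):
    """Return the documented spelling for value, or raise ValueError."""
    # Compare without surrounding whitespace, leading dot, or case differences
    key = value.strip().lstrip(".").lower()
    for fmt in SUPPORTED_FORMATS:
        if fmt.lstrip(".").lower() == key:
            return fmt
    raise ValueError(
        f"unsupported output format {value!r}; expected one of {', '.join(SUPPORTED_FORMATS)}"
    )
```

For example, `match_output_format("docx")` yields `.docx`, while `match_output_format("pdf")` raises a `ValueError` that lists the accepted values.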
Password-Protected or Encrypted PDF
Cause:
PDF is encrypted or requires a password to open.
Fix:
- Supply an unencrypted copy.
- Remove password protection prior to conversion.
OCR / Recognition Errors (Text Missing, Garbled, or Layout Incorrect)
Cause:
Source PDF is scanned or contains images of text and OCR settings are inappropriate (wrong recognition mode, low resolution).
Fix:
- Select an OCR/recognition mode appropriate to the content (e.g., Enhanced flow for complex layouts).
- Increase image resolution parameters if text is small or low-quality.
- For consistently poor OCR, provide a higher-quality source or pre-OCR the PDF with a specialised OCR tool.
Bullets, Lists or Paragraph Breaks Lost or Malformed
Cause:
Recognition settings or paragraph handling options (e.g., “Add Return To Line End”) do not match the PDF’s layout semantics.
Fix:
- Toggle paragraph/line-end options to preserve expected line breaks.
- Enable “Recognise Bullets” if list detection is required.
- Test small samples to confirm behavior.
Images Missing or Poor-Quality in Output
Cause:
Image extraction/resolution settings too low, or compression removed image fidelity.
Fix:
- Increase image resolution (Image Resolution X/Y for DOC and DOCX, Image Resolution for HTML and PPTX) and set a higher image quality/compression setting.
- Use lossless or less aggressive compression if detail is important.
Output Corrupted or Cannot Be Opened
Cause:
Returned payload truncated, incorrectly encoded, or not matching expected type.
Fix:
- Verify the full base64 string is returned and decodes without error.
- Re-run with a minimal PDF to confirm whether truncation occurs consistently.
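One way to check that the returned payload matches the expected type is to decode it and inspect the first bytes. The sketch below is a heuristic, not a connector feature: `looks_like` is a hypothetical helper, and it assumes the action emits standard files (ZIP-based .docx and .pptx, a binary compound-file .doc, and text-based HTML and SVG).

```python
import base64

ZIP_MAGIC = b"PK\x03\x04"  # .docx and .pptx are ZIP containers
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .doc (compound file)


def looks_like(output_format, file_content_b64):
    """Return True if the decoded bytes look like the requested format."""
    # Raises binascii.Error (a ValueError) if the string is not valid base64
    data = base64.b64decode(file_content_b64, validate=True)
    kind = output_format.strip().lstrip(".").lower()
    if kind in ("docx", "pptx"):
        return data.startswith(ZIP_MAGIC)
    if kind == "doc":
        return data.startswith(OLE_MAGIC)
    if kind in ("html", "svg"):
        # Text formats have no magic number; look for the root tag near the start
        marker = b"<html" if kind == "html" else b"<svg"
        return marker in data[:2048].lower()
    raise ValueError(f"unsupported output format: {output_format!r}")
```

A `False` result alongside a payload that decodes cleanly points to a type mismatch rather than truncation, which narrows down which of the causes above applies.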
Page Selection or Partial Conversion Wrong
Cause:
Options specifying page ranges, explicit pages, or page indices are invalid or out of range.
Fix:
- Validate page indices/arrays and ensure values are within the PDF’s page count.
- Use default full-range behavior if unsure.
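The page check can be sketched as a small function. This assumes the page count is already known from another source, since the action described here does not return it; `check_pages` is a hypothetical helper, and the one-based numbering follows the Explicit List Of Saved Pages description in the HTML options.

```python
def check_pages(pages, page_count):
    """Return a list of problems found in a one-based page list."""
    problems = []
    for page in pages:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(page, bool) or not isinstance(page, int):
            problems.append(f"{page!r} is not an integer")
        elif page < 1 or page > page_count:
            problems.append(f"page {page} is outside 1..{page_count}")
    return problems
```

An empty result means every entry is a valid page number for a document of that length.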
Performance, Timeouts or Resource Limits on Large PDFs
Cause:
Very large PDFs or extremely high DPI/image settings exceed processing limits.
Fix:
- Reduce DPI/resolution or convert fewer pages at a time.
- Test with representative smaller files to determine safe limits.
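Converting fewer pages at a time can be done by splitting the page range into batches and running one conversion per batch. The helper below is a hypothetical sketch; it only produces the one-based page lists. In this document, a page list option (Explicit List Of Saved Pages) is documented for HTML output only, so applying the same approach to other formats would require splitting the PDF beforehand.

```python
def page_batches(page_count, batch_size):
    """Split pages 1..page_count into consecutive one-based batches."""
    if page_count < 0 or batch_size < 1:
        raise ValueError("page_count must be >= 0 and batch_size >= 1")
    return [
        list(range(start, min(start + batch_size, page_count + 1)))
        for start in range(1, page_count + 1, batch_size)
    ]
```

For a 7-page document and a batch size of 3, this gives `[[1, 2, 3], [4, 5, 6], [7]]`.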
Invalid Format-Specific Option Values
Cause:
Out-of-range integers, invalid enum values, or conflicting options for the chosen format.
Fix:
- Validate numeric ranges and use documented enum values.
- Remove conflicting parameters and retry.
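The option tables above can be turned into a pre-flight check. The following is a minimal sketch, assuming options are held as a Python dictionary with native `bool`, `int`, and `float` values (the action's UI shows Booleans as Yes/No). `validate_options` and the check names are hypothetical; the option names and enum values are taken from the tables in this document, and no numeric upper limits are enforced because none are documented.

```python
RECOGNITION_MODES = {"Textbox", "Enhanced flow", "Flow"}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


CHECKS = {
    "bool": lambda v: isinstance(v, bool),
    "int": _is_int,
    "positive": lambda v: (_is_int(v) or isinstance(v, float)) and v > 0,
    "pages": lambda v: isinstance(v, list) and all(_is_int(p) and p >= 1 for p in v),
    "mode": lambda v: isinstance(v, str) and v in RECOGNITION_MODES,
}

WORD_OPTIONS = {
    "Add Return To Line End": "bool",
    "Image Resolution X": "int",
    "Image Resolution Y": "int",
    "Recognition Mode": "mode",
    "Recognise Bullets": "bool",
}

OPTIONS_BY_FORMAT = {
    "doc": WORD_OPTIONS,
    "docx": WORD_OPTIONS,
    "html": {
        "Scale To Pixels": "bool",
        "Explicit List Of Saved Pages": "pages",
        "Fixed Layout": "bool",
        "Flow Layout Paragraph Full Width": "bool",
        "Image Resolution": "int",
        "Minimal Line Width": "positive",
    },
    "pptx": {"Image Resolution": "int", "Optimise Text Boxes": "bool"},
    "svg": {"Scale To Pixels": "bool"},
}


def validate_options(output_format, options):
    """Return a list of problems; an empty list means the options look valid."""
    kind = output_format.strip().lstrip(".").lower()
    if kind not in OPTIONS_BY_FORMAT:
        return [f"unsupported output format: {output_format!r}"]
    # Match option names case-insensitively
    spec = {name.lower(): check for name, check in OPTIONS_BY_FORMAT[kind].items()}
    problems = []
    for name, value in options.items():
        check = spec.get(name.lower())
        if check is None:
            problems.append(f"{name!r} is not documented for {output_format}")
        elif not CHECKS[check](value):
            problems.append(f"{name!r} has an invalid value: {value!r}")
    return problems
```

Options that belong to a different format, such as Fixed Layout with SVG output, are reported rather than silently dropped, which makes them easier to spot.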
Generic Runtime or Transient Failure
Cause:
Malformed inputs, intermittent conversion engine error, or unexpected internal state.
Fix:
- Reproduce the failure with a minimal PDF and a minimal option set.
- Validate inputs and retry to rule out transients.
Quick Checklist
- Document contains the complete PDF binary/base64 payload (not a path/URL).
- Output Format is exactly one of the supported values and correctly spelled.
- If PDF pages are targeted, page indices/ranges are valid and within the document’s page count.
- OCR/recognition mode and image resolution are appropriate for scanned PDFs.
- Image resolution and compression settings are tuned for required fidelity.
- For editable outputs (.doc/.docx/.pptx), expect reflow and validate on a small sample.
- If output is corrupted, verify base64 decodes correctly and reproduce with a minimal test file.
- For persistent failures, reproduce with the exact PDF bytes (or a small redacted sample), the chosen Output Format, and the full set of format options. That input set is required to isolate the root cause.