Skip to main content
Parsium Documentation

Extraction & Files

How to use Parsium to extract data from documents and populate Salesforce records, plus every supported file type.

Extraction Workflow

Make sure you have completed all setup steps and have at least one field mapping configured before running an extraction.

1

Open a record

Navigate to any Salesforce record that has field mappings configured (for example, an Account, Contact, or Opportunity).

2

Launch the Extraction Wizard

Click the Parsium action button or open the Parsium component on the record page.

3

Upload your file(s)

Drag and drop or click to upload one or more files. You can mix file types. The combined file size must stay under the request limit.

4

Start the extraction

Click the Extract button. Parsium reads your files, builds a prompt using your field mapping instructions, and sends everything to the AI model.

5

Review extracted values

The AI returns structured data for each mapped field. Review the extracted values on screen before applying them.

6

Apply to record

Click Apply to save the extracted values to the Salesforce record. Parsium handles type conversion, resolves lookup fields, and updates the record in a single transaction.

What Happens Behind the Scenes

  1. File processing: Each file is converted to a format the AI can understand. Images become data URLs. DOCX files are decompressed and the text is extracted from the XML. XLSX spreadsheets are parsed into readable text. PDFs are sent with a parser plugin.
  2. Prompt assembly: Parsium builds a structured prompt. The system message tells the AI to be precise. The user message lists each field with its name, type, and your custom instruction.
  3. API call: The prompt and files are sent to OpenRouter, which routes them to your selected AI model. Parsium retries automatically up to 3 times with increasing delays on temporary errors.
  4. Response parsing: The AI returns a JSON object. Parsium extracts this using multiple fallback strategies (direct parse, markdown code block extraction, substring matching).
  5. Value application (3-pass): Pass 1 resolves lookup field values to record IDs. Pass 2 updates related lookup records. Pass 3 updates the main record. All changes happen in a single transaction with rollback on error.
  6. Usage logging: A Usage Log record is created with the model used, token counts, estimated cost, processing time, and success/error status.

Tips for Best Results

  • Use clear, high-resolution files. Blurry images or scanned PDFs with poor quality will reduce extraction accuracy.
  • Write specific instructions in your field mappings. Instead of 'extract the date', write 'extract the invoice date in YYYY-MM-DD format from the top-right corner'.
  • Start with a small number of fields (3 to 5) and add more once you confirm the extraction works well.
  • Use the Precise (0) temperature setting for data extraction. Creative mode adds randomness that is not helpful for structured data.
  • If extraction is slow, try a faster model. Smaller models like GPT-4o Mini or Claude Haiku are faster and cheaper for simple documents.

File Types & Limits

Parsium processes files differently depending on their type. Here is every supported format and how it is handled.

CategoryFormatsHow It's Processed
ImagesPNG, JPG, JPEG, WebP, GIFConverted to base64 data URL and sent as an image for the AI to see and read visually.
PDFPDFSent with a built-in parser plugin that extracts text and layout.
Word DocumentsDOCXDecompressed (DOCX is a ZIP file), then the XML content is parsed to extract all text.
SpreadsheetsXLSXDecompressed, shared string table is read, then cell values are mapped to create a readable text table.
CSVCSVDecoded from base64 and sent as plain text that the AI can read directly.
MarkdownMDDecoded from base64 and sent as plain text with file markers.
AudioMP3, WAVSent as binary audio data. The AI model must support audio input.
VideoMP4, MOV, MPEG, WebMConverted to base64 data URL and sent as video. The AI model must support video input.

The maximum combined request payload is 4 MB. If your files are too large, compress them or upload fewer files at a time. The Salesforce heap size limit is 6 MB, which includes file data plus processing overhead.