Extraction & Files
How to use Parsium to extract data from documents and populate Salesforce records, plus every supported file type.
Extraction Workflow
Make sure you have completed all setup steps and have at least one field mapping configured before running an extraction.
Open a record
Navigate to any Salesforce record that has field mappings configured (for example, an Account, Contact, or Opportunity).
Launch the Extraction Wizard
Click the Parsium action button or open the Parsium component on the record page.
Upload your file(s)
Drag and drop or click to upload one or more files. You can mix file types. The combined file size must stay under the request limit.
Start the extraction
Click the Extract button. Parsium reads the record's current field values and existing child records for context, builds a prompt using your field mapping instructions, and sends everything to the AI model.
Review extracted values
The AI returns structured data for each mapped field. Review the extracted values on screen before applying them.
Apply to record
Click Apply to save the extracted values to the Salesforce record. Parsium handles type conversion, resolves lookup fields, and updates the record in a single transaction.
What Happens Behind the Scenes
- File processing: Each file is converted to a format the AI can understand. Images become data URLs. DOCX files are decompressed and the text is extracted from the XML. XLSX spreadsheets are parsed into readable text. PDFs are sent with a parser plugin.
- Prompt assembly: Parsium builds a structured prompt. The system message tells the AI to be precise. The user message lists each field with its name, type, and your custom instruction.
- API call: The prompt and files are sent to OpenRouter, which routes them to your selected AI model. Parsium retries automatically up to 3 times with increasing delays on temporary errors.
- Response parsing: The AI returns a JSON object. Parsium extracts this using multiple fallback strategies (direct parse, markdown code block extraction, substring matching).
- Value application (4-pass): Pass 1 resolves lookup field values to record IDs (batched by target object). Pass 2 updates related records via batch DML. Pass 3 updates the main record (reads the latest version, applies accepted values, single DML). Pass 4 inserts child records in bulk DML. All wrapped in a database savepoint with rollback on error.
- Usage logging: A Usage Log record is created with the model used, token counts, estimated cost, processing time, and success/error status.
Tips for Best Results
- •Use clear, high-resolution files. Blurry images or scanned PDFs with poor quality will reduce extraction accuracy.
- •Write specific instructions in your field mappings. Instead of 'extract the date', write 'extract the invoice date in YYYY-MM-DD format from the top-right corner'.
- •Start with a small number of fields (3 to 5) and add more once you confirm the extraction works well.
- •If extraction is slow, try a faster model. Smaller models are faster and cheaper for simple documents.
File Types & Limits
Parsium processes files differently depending on their type. Here is every supported format and how it is handled.
| Category | Formats | How It's Processed |
|---|---|---|
| Images | PNG, JPG, JPEG, GIF, WebP, BMP | Converted to base64 data URL and sent as an image for the AI to see and read visually. |
| Sent with a built-in parser plugin that extracts text and layout. | ||
| Word Documents | DOCX | Decompressed (DOCX is a ZIP file), then the XML content is parsed to extract all text. |
| Spreadsheets | XLSX | Decompressed, shared string table is read, then cell values are mapped to create a readable text table. |
| CSV | CSV | Decoded from base64 and sent as plain text that the AI can read directly. |
| Text Files | TXT, MD, JSON | Decoded from base64 and sent as plain text. Useful for structured data, markdown notes, or JSON payloads. |
| Audio | MP3, WAV, OGG, M4A, AAC | Sent as binary audio data. The AI model must support audio input. |
Client-side compression: Images are compressed via Canvas API and audio is re-encoded at a lower bitrate via Web Audio API. Documents (DOCX, XLSX, CSV) are text-extracted server-side without compression.
The maximum combined request payload is 4 MB. If your files are too large, compress them or upload fewer files at a time. The Salesforce heap size limit is 6 MB, which includes file data plus processing overhead.