Optical Character Recognition (OCR)

Overview

Optical Character Recognition (OCR) is available as a flexible, standalone endpoint for extracting text from documents in multiple formats, including PDF, JPG, and PNG. Tesouro OCR returns a structured JSON response containing all detected text, which supports use cases such as receipt parsing, contract analysis, and custom form data extraction. The extracted content is returned without post-processing, so you can integrate and manipulate the data according to your own requirements.

This is a general-purpose OCR endpoint. It is not meant to replace the existing OCR for Accounts Payable (AP).

Considerations and limitations

OCR processing is asynchronous, and the processing time varies with the document size.
The minimum image file size is 100 KB.
The maximum image file size is 20 MB.
Multipage PDF files can have up to 100 pages.
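
The limits above can be checked on the client before a file is submitted. The following is a minimal sketch, not part of the Tesouro API: the function name check_file is hypothetical, KB and MB are assumed to be binary units (1024-based), the size limits are assumed to apply to PDFs as well as images, and the .jpeg extension is assumed to be accepted alongside .jpg. The 100-page PDF limit is not checked here because that would require a PDF parser.

```python
from pathlib import Path

# Limits taken from the list above. Binary units are an assumption.
MIN_BYTES = 100 * 1024
MAX_BYTES = 20 * 1024 * 1024
# .jpeg is an assumption; the documentation names PDF, JPG, and PNG.
ALLOWED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png"}


def check_file(path, size_bytes):
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    if Path(path).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("unsupported format")
    if size_bytes < MIN_BYTES:
        problems.append("file smaller than 100 KB")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 20 MB")
    return problems
```

For a file on disk, the size can be read with Path(path).stat().st_size and passed as size_bytes.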

Covered languages

While this is not an exhaustive list, Tesouro OCR guarantees tested coverage for the following languages:

Arabic
Bulgarian
Catalan
Chinese
Czech
Danish
Dutch
English
Estonian
Finnish
French
Georgian
German
Greek
Hebrew
Hungarian
Italian
Japanese
Latvian
Lithuanian
Norwegian
Polish
Portuguese
Romanian
Russian
Serbian
Slovak
Slovenian
Spanish
Swedish
Thai
Turkish
Ukrainian

Create an OCR task

There are two ways to submit a document for OCR:

Process file via URL
Upload from file

OCR scanning can take 10 to 15 seconds, so we recommend polling the OCR task every 5 seconds. While the file is being processed, the task's status field is set to processing and the recognized_data field is not yet populated. Once the OCR has finished processing the file and stored the extracted data, the ocr_task.finished webhook is triggered. The webhook is triggered only if the x-organization-id header is provided.
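
The polling recommendation above could be implemented along these lines. This is a sketch under stated assumptions rather than an official client: wait_for_ocr_task and its fetch_task argument are hypothetical names, the 60-second timeout is an arbitrary choice, and any status other than processing is treated as final.

```python
import time


def wait_for_ocr_task(fetch_task, interval_seconds=5.0, timeout_seconds=60.0,
                      sleep=time.sleep):
    """Poll until the task leaves the `processing` status.

    fetch_task is a caller-supplied function (hypothetical) that returns the
    current task as a dict, for example by calling GET /ocr-tasks/{task_id}.
    """
    waited = 0.0
    while True:
        task = fetch_task()
        if task.get("status") != "processing":
            return task  # finished: recognized_data should now be populated
        if waited >= timeout_seconds:
            raise TimeoutError("OCR task is still processing after the timeout")
        sleep(interval_seconds)
        waited += interval_seconds
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays. If webhooks are configured, the ocr_task.finished event can replace polling altogether.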

Process file via URL

To process a file by URL and create an OCR task, call POST /ocr-tasks with the file's location in the file_url field. The document_type field in the request body is optional and its possible values are invoice, credit_note, and receipt. If document_type is not specified, the system will attempt to determine it automatically:

curl -X POST 'https://api.sandbox.tesouro.com/v1/ocr-tasks' \
  -H 'accept: application/json' \
  -H 'x-tesouro-version: 2025-06-23' \
  -H 'x-organization-id: aad60980-e2d0-436d-aa68-b9463f39870c' \
  -H 'Content-Type: application/json' \
  -d '{
        "document_type": "invoice",
        "file_url": "https://www.file.com/invoice.pdf"
  }'

The 202 Accepted response contains the information found in the file:

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "created_at": "2025-02-11T14:18:01.013Z",
  "updated_at": "2025-02-11T14:18:01.013Z",
  "status": "success",
  "document_type": "invoice",
  "recognized_data": {
    "type": "invoice",
    "currency": "USD",
    "due_date": "2025-02-25",
    "issue_date": "2025-02-11",
    "document_number": "INV-2287",
    "sender": {
      "vat_number": "12-3456789",
      "tax_number": "47-1234567",
      "email": "billing@acme-corp.com",
      "name": "Acme Corporation",
      "address": {
        "street_and_number": "123 Congress Ave",
        "city": "Austin",
        "postal_code": "78701",
        "country": "US",
        "state": "TX"
      },
      "bank_account": {
        "bank_account_number": "1234567890",
        "bic": "ACMEUSAAXXX"
      }
    },
    "recipient": {
      "vat_number": "98-7654321",
      "tax_number": "98-7654321",
      "email": "finance@xyz-enterprise.com",
      "name": "XYZ Enterprise",
      "address": {
        "street_and_number": "456 Market Street",
        "city": "New York",
        "postal_code": "10001",
        "country": "US",
        "state": "NY"
      },
      "bank_account": {
        "bank_account_number": "0987654321",
        "bic": "XYZUUS33XXX"
      }
    },
    "subtotal": 2000.0,
    "total_amount": 2380.0,
    "tax_amount": 380.0,
    "tax_rate": 19.0,
    "amount_paid": 0.0,
    "payment_terms": "Payment due within 14 days",
    "line_items": [
      {
        "line_reference": "ITEM001",
        "name": "Consulting Services",
        "description": "IT consulting services for January 2025",
        "quantity": 10,
        "unit_price": 200.0,
        "unit": "hour",
        "subtotal": 2000.0,
        "tax_rate": 19.0,
        "tax_amount": 380.0,
        "total_amount": 2380.0
      }
    ]
  }
}

If the file cannot be recognized, the system returns an error.
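
Because the extracted content is returned without post-processing, it can be useful to sanity-check the amounts before using them. The sketch below uses the field names from the sample response above; the function name totals_are_consistent and the 0.01 tolerance are assumptions, and the check itself is illustrative rather than something the API performs.

```python
import math


def totals_are_consistent(recognized_data, tolerance=0.01):
    """Check that subtotal + tax equals the total, and that line items add up."""
    subtotal = recognized_data.get("subtotal")
    tax = recognized_data.get("tax_amount")
    total = recognized_data.get("total_amount")
    if subtotal is None or tax is None or total is None:
        return False  # a value is missing, so nothing can be confirmed

    header_ok = math.isclose(subtotal + tax, total, abs_tol=tolerance)

    # Line items are compared only when the document has any.
    items = recognized_data.get("line_items") or []
    line_sum = sum((item.get("total_amount") or 0.0) for item in items)
    lines_ok = not items or math.isclose(line_sum, total, abs_tol=tolerance)

    return header_ok and lines_ok
```

For the sample invoice, the subtotal of 2000.0 plus the tax amount of 380.0 matches the total of 2380.0, and the single line item also sums to 2380.0, so the check passes.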

Upload from file

You can upload files in PDF, PNG, or JPG format by calling POST /ocr-tasks/upload-from-file. The query parameter document_type is optional and its possible values are invoice, credit_note, and receipt. If document_type is not specified, the system will attempt to determine it automatically:

curl -X POST 'https://api.sandbox.tesouro.com/v1/ocr-tasks/upload-from-file?document_type=invoice' \
  -H 'accept: application/json' \
  -H 'x-tesouro-version: 2025-06-23' \
  -H 'x-organization-id: 3b28a315-1e10-46bf-8d83-37f64147ae9f' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@Invoice-01-25.pdf;type=application/pdf'

The successful 202 Accepted response contains the id and other parameters of the file:

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "created_at": "2025-02-11T14:18:01.013Z",
  "updated_at": "2025-02-11T14:18:01.013Z",
  "status": "success",
  "document_type": "invoice",
  "recognized_data": {
    "type": "invoice",
    "currency": "USD",
    "due_date": "2025-02-25",
    "issue_date": "2025-02-11",
    "document_number": "INV-2287",
    "sender": {
      "vat_number": "12-3456789",
      "tax_number": "47-1234567",
      "email": "billing@acme-corp.com",
      "name": "Acme Corporation",
      "address": {
        "street_and_number": "123 Congress Ave",
        "city": "Austin",
        "postal_code": "78701",
        "country": "US",
        "state": "TX"
      },
      "bank_account": {
        "bank_account_number": "1234567890",
        "bic": "ACMEUSAAXXX"
      }
    },
    "recipient": {
      "vat_number": "98-7654321",
      "tax_number": "98-7654321",
      "email": "finance@xyz-enterprise.com",
      "name": "XYZ Enterprise",
      "address": {
        "street_and_number": "456 Market Street",
        "city": "New York",
        "postal_code": "10001",
        "country": "US",
        "state": "NY"
      },
      "bank_account": {
        "bank_account_number": "0987654321",
        "bic": "XYZUUS33XXX"
      }
    },
    "subtotal": 2000.0,
    "total_amount": 2380.0,
    "tax_amount": 380.0,
    "tax_rate": 19.0,
    "amount_paid": 0.0,
    "payment_terms": "Payment due within 14 days",
    "line_items": [
      {
        "line_reference": "ITEM001",
        "name": "Consulting Services",
        "description": "IT consulting services for January 2025",
        "quantity": 10,
        "unit_price": 200.0,
        "unit": "hour",
        "subtotal": 2000.0,
        "tax_rate": 19.0,
        "tax_amount": 380.0,
        "total_amount": 2380.0
      }
    ]
  }
}

If the file cannot be recognized, the system returns an error.

List all OCR tasks

To obtain a list of all OCR tasks, call GET /ocr-tasks.

Retrieve a specific OCR task

To obtain information about a specific OCR task, call GET /ocr-tasks/{task_id}.
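
Both GET endpoints can be called with the same headers used in the curl examples earlier in this section. The following sketch uses only the Python standard library. The helper names (build_request, list_tasks_request, get_task_request, send) are hypothetical, the sandbox host and version header are copied from the curl examples, and any authentication headers your environment requires are omitted, as they are in those examples. The shape of the list response is not assumed.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.sandbox.tesouro.com/v1"  # sandbox host from the curl examples


def build_request(path, organization_id, version="2025-06-23"):
    """Build a GET request with the headers the curl examples send."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "accept": "application/json",
            "x-tesouro-version": version,
            "x-organization-id": organization_id,
        },
        method="GET",
    )


def list_tasks_request(organization_id):
    """Request for GET /ocr-tasks."""
    return build_request("/ocr-tasks", organization_id)


def get_task_request(task_id, organization_id):
    """Request for GET /ocr-tasks/{task_id}."""
    # Quote the ID so unexpected characters cannot alter the path.
    return build_request("/ocr-tasks/" + urllib.parse.quote(task_id, safe=""),
                         organization_id)


def send(request):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, send(get_task_request(task_id, organization_id)) would return the task as a dictionary, which makes it a natural building block for a polling loop.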
