Jobs API

Overview

Jobs are individual executions of your scraping recipes. Each job tracks the progress, results, and any errors that occur during the scraping process.

Endpoints

Get Job Results

Retrieve the results of a specific job.

GET /recipes/{id}/jobs/{jobId}/results

Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Recipe ID |
| jobId | string | Job ID |

Query Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| page | integer | Page number for paginated results |
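
For illustration, here is a minimal sketch of paging through a job's results. The base URL, the x-api-key header, and the assumption that each page returns a plain array of rows are placeholders rather than documented behavior; adapt them to your host, authentication scheme, and actual response shape.

// Minimal sketch: collect every result page for a job.
// BASE_URL and the x-api-key header are illustrative assumptions.
const BASE_URL = "https://api.example.com";

async function fetchAllResults(apiKey, recipeId, jobId) {
  const rows = [];
  let page = 1;

  while (true) {
    const res = await fetch(
      `${BASE_URL}/recipes/${recipeId}/jobs/${jobId}/results?page=${page}`,
      { headers: { "x-api-key": apiKey } }
    );
    if (!res.ok) throw new Error(`Request failed with status ${res.status}`);

    const data = await res.json();
    // Assumes each page is a plain array of rows; stop on an empty page.
    if (!Array.isArray(data) || data.length === 0) break;

    rows.push(...data);
    page += 1;
  }

  return rows;
}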

Job Lifecycle

Status Flow

graph LR
A[Created] --> B[Pending]
B --> C[Running]
C --> D[Completed]
C --> E[Failed]

Status Definitions

| Status | Description | Next States |
| --- | --- | --- |
| pending | Job is queued and waiting to start | running |
| running | Job is currently executing | completed, failed |
| completed | Job has finished successfully | - |
| failed | Job encountered an error | - |

Job Object

{
  "id": "string",
  "status": "pending | running | completed | failed",
  "startDate": "2024-01-01T00:00:00Z",
  "endDate": "2024-01-01T00:00:00Z",
  "totalPaginationPages": 0,
  "totalPagesScraped": 0,
  "totalRows": 0,
  "error": "string"
}
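
A common pattern is to poll the job until it reaches a terminal state. The sketch below assumes a job detail endpoint of the form GET /recipes/{id}/jobs/{jobId} that returns the job object above; that path, BASE_URL, and the x-api-key header are assumptions for illustration, so verify them against your API reference.

// Minimal polling sketch: wait until the job completes or fails.
// The job detail endpoint path is an assumption, not documented here.
async function waitForJob(apiKey, recipeId, jobId, intervalMs = 5000) {
  while (true) {
    const res = await fetch(`${BASE_URL}/recipes/${recipeId}/jobs/${jobId}`, {
      headers: { "x-api-key": apiKey },
    });
    if (!res.ok) throw new Error(`Request failed with status ${res.status}`);

    const job = await res.json();
    if (job.status === "completed") return job;
    if (job.status === "failed") throw new Error(job.error);

    // Still pending or running: wait before checking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}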

Progress Monitoring

Track your job's progress using these metrics:

| Metric | Description | Example |
| --- | --- | --- |
| totalPaginationPages | Total number of pages to process | 10 |
| totalPagesScraped | Number of pages processed so far | 7 |
| totalRows | Number of data rows extracted | 350 |

Progress Calculation

const progress = (totalPagesScraped / totalPaginationPages) * 100;
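
A slightly fuller sketch that guards against division by zero while the total page count is still unknown (field names match the job object above):

// Returns completion as a percentage, or null when totalPaginationPages
// has not been determined yet (still 0).
function jobProgress(job) {
  if (!job.totalPaginationPages) return null;
  return (job.totalPagesScraped / job.totalPaginationPages) * 100;
}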

Error Handling

When a job fails, the error message is stored in the job object's error field. Common error scenarios include:

  • Network timeouts
  • Rate limiting
  • Invalid selectors
  • Website structure changes

Example Error Response

{
  "id": "job_123",
  "status": "failed",
  "error": "Rate limit exceeded: Too many requests to target website"
}

Best Practices

  1. Monitoring

    • Regularly check job status
    • Set up notifications for job completion/failure
    • Monitor progress for long-running jobs
  2. Error Recovery

    • Implement retry logic for failed jobs
    • Use exponential backoff for rate limits (see the sketch after this list)
    • Log detailed error information
  3. Resource Management

    • Limit concurrent jobs
    • Set appropriate timeouts
    • Clean up completed job data
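
As referenced in the error recovery item above, here is a minimal sketch of retrying with exponential backoff. The startJob(recipeId) helper that launches a job is hypothetical; substitute whatever call your integration actually uses to submit a job, and tune the attempt and delay limits to your workload.

// Retries a hypothetical startJob(recipeId) call with exponential backoff.
// startJob, maxAttempts, and baseDelayMs are illustrative placeholders.
async function runWithBackoff(recipeId, maxAttempts = 5, baseDelayMs = 1000) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await startJob(recipeId);
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err;

      // Double the delay on every attempt: 1s, 2s, 4s, 8s, ...
      const delay = baseDelayMs * 2 ** attempt;
      console.error(
        `Attempt ${attempt + 1} failed (${err.message}); retrying in ${delay} ms`
      );
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}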