Jobs API
Overview
Jobs are individual executions of your scraping recipes. Each job tracks the progress, results, and any errors that occur during the scraping process.
Endpoints
Get Job Results
Retrieve the results of a specific job.
GET /recipes/{id}/jobs/{jobId}/results
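As a rough sketch of how this endpoint might be called from JavaScript: the base URL `https://api.example.com`, the bearer-token header, and the JSON response body are all assumptions for illustration, since this page does not specify them. Only the path template comes from the endpoint above. The helper names `buildResultsUrl` and `getJobResults` are hypothetical.

```javascript
// Assumption: placeholder host; substitute your real API base URL.
const BASE_URL = "https://api.example.com";

// Build the results URL for a recipe and job, escaping both IDs.
function buildResultsUrl(recipeId, jobId, baseUrl = BASE_URL) {
  const recipe = encodeURIComponent(recipeId);
  const job = encodeURIComponent(jobId);
  return `${baseUrl}/recipes/${recipe}/jobs/${job}/results`;
}

// Fetch the results of one job.
// Assumption: bearer-token auth and a JSON body; neither is documented here.
async function getJobResults(recipeId, jobId, apiKey, fetchImpl = fetch) {
  const response = await fetchImpl(buildResultsUrl(recipeId, jobId), {
    method: "GET",
    headers: { Authorization: `Bearer ${apiKey}`, Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (inside an async function):
//   const results = await getJobResults("recipe_1", "job_123", myApiKey);
```

Passing `fetchImpl` as a parameter keeps the function easy to test with a stub and lets you swap in another HTTP client.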
Job Lifecycle
Status Flow
graph LR
A[Created] --> B[Pending]
B --> C[Running]
C --> D[Completed]
C --> E[Failed]
Status Definitions
Status | Description | Next States |
---|---|---|
pending | Job is queued and waiting to start | running |
running | Job is currently executing | completed, failed |
completed | Job has finished successfully | - |
failed | Job encountered an error | - |
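The table above can be encoded as a small lookup, which is handy for deciding when to stop polling. This is a minimal sketch: `NEXT_STATES`, `isTerminal`, and `canTransition` are illustrative names, not part of the API, and the map covers only the four statuses listed in the table.

```javascript
// Allowed transitions, copied from the Status Definitions table.
const NEXT_STATES = new Map([
  ["pending", ["running"]],
  ["running", ["completed", "failed"]],
  ["completed", []],
  ["failed", []],
]);

// True when the status has no next states, so the job will not change again.
function isTerminal(status) {
  const next = NEXT_STATES.get(status);
  if (next === undefined) {
    throw new Error(`Unknown job status: ${status}`);
  }
  return next.length === 0;
}

// True when the table allows moving directly from one status to another.
function canTransition(from, to) {
  const next = NEXT_STATES.get(from) ?? [];
  return next.includes(to);
}
```

For example, `isTerminal("running")` is `false`, while `isTerminal("completed")` and `isTerminal("failed")` are both `true`.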
Job Object
Schema
{
  "id": "string",
  "status": "pending | running | completed | failed",
  "startDate": "2024-01-01T00:00:00Z",
  "endDate": "2024-01-01T00:00:00Z",
  "totalPaginationPages": 0,
  "totalPagesScraped": 0,
  "totalRows": 0,
  "error": "string"
}
Example
{
  "id": "job_123",
  "status": "completed",
  "startDate": "2024-01-01T10:00:00Z",
  "endDate": "2024-01-01T10:05:00Z",
  "totalPaginationPages": 5,
  "totalPagesScraped": 5,
  "totalRows": 100,
  "error": null
}
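Because `startDate` and `endDate` are ISO 8601 timestamps, a job's run time can be derived on the client. The sketch below uses a hypothetical helper, `jobDurationMs`, and assumes that a job which has not finished may have a missing or null `endDate`; this page does not confirm that behaviour.

```javascript
// Return the job's run time in milliseconds, or null if it cannot be computed.
// Assumption: unfinished jobs may lack startDate or endDate.
function jobDurationMs(job) {
  if (!job.startDate || !job.endDate) return null;
  const start = Date.parse(job.startDate);
  const end = Date.parse(job.endDate);
  if (Number.isNaN(start) || Number.isNaN(end)) return null;
  return end - start;
}

// The example job object shown above.
const exampleJob = {
  id: "job_123",
  status: "completed",
  startDate: "2024-01-01T10:00:00Z",
  endDate: "2024-01-01T10:05:00Z",
};

console.log(jobDurationMs(exampleJob)); // 300000 (five minutes)
```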
Progress Monitoring
Track your job's progress using these metrics:
Metric | Description | Example |
---|---|---|
totalPaginationPages | Total number of pages to process | 10 |
totalPagesScraped | Number of pages processed so far | 7 |
totalRows | Number of data rows extracted | 350 |
Progress Calculation
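// Guard against division by zero while totalPaginationPages is still 0.
const progress = totalPaginationPages > 0 ? (totalPagesScraped / totalPaginationPages) * 100 : 0;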
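For display purposes, the three metrics can be combined into a single status line. This is a sketch only: `progressPercent` and `describeProgress` are hypothetical helpers, and rounding and clamping the percentage to the 0 to 100 range is a presentation choice rather than anything the API prescribes.

```javascript
// Percentage of pages scraped, rounded and clamped to the range 0..100.
// Returns 0 when the page counts are missing or the total is not positive.
function progressPercent(job) {
  const done = job.totalPagesScraped;
  const total = job.totalPaginationPages;
  if (!Number.isFinite(done) || !Number.isFinite(total) || total <= 0) {
    return 0;
  }
  const percent = Math.round((done / total) * 100);
  return Math.min(100, Math.max(0, percent));
}

// One-line, human-readable summary of a job's progress metrics.
function describeProgress(job) {
  const pages = `${job.totalPagesScraped}/${job.totalPaginationPages} pages`;
  return `${pages} (${progressPercent(job)}%), ${job.totalRows} rows`;
}

// Using the example values from the metrics table:
const sample = { totalPaginationPages: 10, totalPagesScraped: 7, totalRows: 350 };
console.log(describeProgress(sample)); // "7/10 pages (70%), 350 rows"
```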
Error Handling
When a job fails, its status is set to failed and the error message is stored in the error field of the job object. Common causes of failure include:
- Network timeouts
- Rate limiting
- Invalid selectors
- Website structure changes
Example Error Response
{
  "id": "job_123",
  "status": "failed",
  "error": "Rate limit exceeded: Too many requests to target website"
}
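One way to act on a failed job is to sort its `error` string into a coarse category and decide whether a retry is worthwhile. The sketch below is illustrative only. The only message wording documented on this page is the "Rate limit exceeded" example, so the other patterns, the category names, and the `retryable` flags are assumptions, as is the helper name `classifyJobError`.

```javascript
// Assumption: illustrative patterns; only "Rate limit exceeded" is documented.
const ERROR_PATTERNS = [
  { category: "rate_limit", pattern: /rate limit/i, retryable: true },
  { category: "timeout", pattern: /time(d)?\s?out/i, retryable: true },
  { category: "selector", pattern: /selector/i, retryable: false },
];

// Return { category, retryable } for a failed job, or null if it did not fail.
function classifyJobError(job) {
  if (job.status !== "failed" || !job.error) return null;
  const match = ERROR_PATTERNS.find(({ pattern }) => pattern.test(job.error));
  if (!match) return { category: "unknown", retryable: false };
  return { category: match.category, retryable: match.retryable };
}

// The example error response shown above.
const failedJob = {
  id: "job_123",
  status: "failed",
  error: "Rate limit exceeded: Too many requests to target website",
};

console.log(classifyJobError(failedJob)); // { category: 'rate_limit', retryable: true }
```

Treating unrecognised messages as non-retryable is a deliberately conservative default; adjust it once you know which messages your jobs actually produce.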
Best Practices
- Monitoring
  - Regularly check job status
  - Set up notifications for job completion/failure
  - Monitor progress for long-running jobs
- Error Recovery
  - Implement retry logic for failed jobs
  - Use exponential backoff for rate limits
  - Log detailed error information
- Resource Management
  - Limit concurrent jobs
  - Set appropriate timeouts
  - Clean up completed job data
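The retry and backoff advice above could be sketched as follows. This is one possible approach, not a documented client: `runJob` is a hypothetical caller-supplied function that executes the job and resolves with its final job object, and the one-second base delay, 60-second cap, and five-attempt limit are arbitrary starting values.

```javascript
// Delay before the next retry: the base delay doubles on each attempt, up to a cap.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Promise-based sleep used between attempts.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run a job, retrying while it ends in the failed status.
// Assumption: runJob() is supplied by the caller and resolves with a job object.
async function retryWithBackoff(runJob, maxAttempts = 5, sleep = defaultSleep) {
  let job;
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    job = await runJob();
    if (job.status !== "failed") return job;
    // Log the failure, then wait before the next attempt (no wait after the last).
    console.warn(`Attempt ${attempt + 1} failed: ${job.error}`);
    if (attempt < maxAttempts - 1) {
      await sleep(backoffDelayMs(attempt));
    }
  }
  return job; // Still failed after the final attempt.
}
```

With the defaults, the waits between attempts are 1 s, 2 s, 4 s, and 8 s. Adding random jitter to each delay is a common refinement when many clients retry at once.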