Recipes API
Overview
Recipes are the foundation of Scrape Loop, defining the structure and rules for data extraction. Each recipe contains:
- Target URL and selectors
- Data extraction rules
- Pagination configuration
- Execution settings
Endpoints
List Recipes
Retrieve a list of all your scraping recipes.
GET /recipes
Query Parameters
Parameter | Type | Description |
---|---|---|
statuses | string | Comma-separated list of statuses (active, paused, deleted) |
limit | integer | Number of items to return |
skip | integer | Number of items to skip |
Response
{
  "success": true,
  "recipes": [
    {
      "id": "string",
      "name": "string",
      "url": "string",
      "status": "active | paused | deleted",
      "listSelector": "string",
      "properties": [
        {
          "name": "string",
          "description": "string",
          "elementSelector": "string",
          "type": "text | link | html | src"
        }
      ],
      "pagination": {
        "type": "load-more | scroll-down | next-button | none",
        "nextPageSelector": "string",
        "maxPages": 0
      },
      "extractorType": "selector | llm",
      "lastRun": "2024-01-01T00:00:00Z"
    }
  ]
}
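As a rough illustration of the query parameters above, the following Python sketch builds the request URL and reads the `recipes` array from the response. The base URL and the `Authorization: Bearer` header are assumptions; this reference does not state the host or the authentication scheme.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical host: this reference does not give the API base URL.
BASE_URL = "https://api.example.com"


def list_recipes_url(base_url, statuses=None, limit=None, skip=None):
    """Build the GET /recipes URL, omitting parameters that were not supplied."""
    params = {}
    if statuses:
        # The API expects one comma-separated string, e.g. "active,paused".
        params["statuses"] = ",".join(statuses)
    if limit is not None:
        params["limit"] = limit
    if skip is not None:
        params["skip"] = skip
    # safe="," keeps the comma readable instead of encoding it as %2C.
    query = urlencode(params, safe=",")
    url = base_url.rstrip("/") + "/recipes"
    return f"{url}?{query}" if query else url


def list_recipes(base_url, api_key, **filters):
    """Fetch recipes and return the "recipes" array from the response."""
    request = Request(
        list_recipes_url(base_url, **filters),
        # Assumed auth scheme; replace with whatever your account requires.
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(request) as response:
        payload = json.load(response)
    return payload["recipes"]
```

For example, `list_recipes(BASE_URL, "my-key", statuses=["active"], limit=10)` would request the first ten active recipes, assuming the host and header above match your setup.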
Get Recipe
Retrieve a single recipe by its ID.
GET /recipes/{id}
Path Parameters
Parameter | Type | Description |
---|---|---|
id | string | Recipe ID |
Response
{
  "success": true,
  "scrapeRecipe": {
    "id": "string",
    "name": "string",
    "url": "string",
    "status": "active | paused | deleted",
    "listSelector": "string",
    "properties": [
      {
        "name": "string",
        "description": "string",
        "elementSelector": "string",
        "type": "text | link | html | src"
      }
    ],
    "pagination": {
      "type": "load-more | scroll-down | next-button | none",
      "nextPageSelector": "string",
      "maxPages": 0
    },
    "extractorType": "selector | llm",
    "lastRun": "2024-01-01T00:00:00Z"
  }
}
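Note that this endpoint wraps the recipe in a `scrapeRecipe` key. The Python sketch below shows one way to build the path and unwrap the payload. The helper names are hypothetical, and treating `"success": false` as an error is an assumption, since the failure response is not documented here.

```python
from urllib.parse import quote


def recipe_url(base_url, recipe_id):
    """Build the GET /recipes/{id} URL, percent-encoding the ID."""
    return f"{base_url.rstrip('/')}/recipes/{quote(recipe_id, safe='')}"


def unwrap_recipe(payload):
    """Return the recipe object from a Get Recipe response."""
    # Assumption: a failed request reports "success": false.
    if not payload.get("success"):
        raise ValueError("request did not succeed")
    return payload["scrapeRecipe"]


def property_selectors(recipe):
    """Map each property name to its element selector."""
    return {prop["name"]: prop["elementSelector"] for prop in recipe["properties"]}
```

`property_selectors` gives a quick overview of what a recipe extracts, which can help when reviewing selectors after a site change.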
Get Recipe Jobs
Retrieve all jobs associated with a recipe.
GET /recipes/{id}/jobs
Path Parameters
Parameter | Type | Description |
---|---|---|
id | string | Recipe ID |
Query Parameters
Parameter | Type | Description |
---|---|---|
status | string | Filter by job status (pending, running, completed, failed) |
limit | integer | Number of items to return |
skip | integer | Number of items to skip |
Response
{
  "success": true,
  "jobs": [
    {
      "id": "string",
      "status": "pending | running | completed | failed",
      "startDate": "2024-01-01T00:00:00Z",
      "endDate": "2024-01-01T00:00:00Z",
      "totalPaginationPages": 0,
      "totalPagesScraped": 0,
      "totalRows": 0,
      "error": "string"
    }
  ],
  "count": 0
}
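The `limit` and `skip` parameters can be combined to walk through a long job history. The Python sketch below is one possible approach: `fetch_page` is a hypothetical callable that performs the HTTP request and returns the decoded response. The loop stops on a short page and does not use `count`, because this reference does not say whether `count` is the page size or the overall total.

```python
def iter_jobs(fetch_page, page_size=50):
    """Yield every job for a recipe by advancing `skip` page by page.

    `fetch_page(limit=..., skip=...)` is a caller-supplied function that
    calls GET /recipes/{id}/jobs and returns the decoded JSON response.
    """
    skip = 0
    while True:
        payload = fetch_page(limit=page_size, skip=skip)
        jobs = payload["jobs"]
        yield from jobs
        # A page shorter than the requested size means there is nothing left.
        if len(jobs) < page_size:
            break
        skip += len(jobs)
```

Passing the request function in keeps the paging logic independent of the HTTP client and makes it easy to exercise with canned responses.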
Recipe Components
Property Types
Type | Description | Example |
---|---|---|
text | Extract text content | Product title, description |
link | Extract href attribute | Product URL, next page link |
html | Extract raw HTML | Rich text content |
src | Extract source URL | Image URL, video source |
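To make the four property types concrete, the Python sketch below applies each one to a small, well-formed snippet using the standard library's `xml.etree.ElementTree`. It only illustrates the semantics in the table and is not how Scrape Loop itself extracts data. In particular, returning the element's outer markup for `html` is an assumption, since the table does not say whether inner or outer HTML is returned.

```python
import xml.etree.ElementTree as ET


def extract_value(element, property_type):
    """Illustrate what each property type would pull from one element."""
    if property_type == "text":
        # All text inside the element, including nested tags.
        return "".join(element.itertext()).strip()
    if property_type == "link":
        # None when the attribute is missing.
        return element.get("href")
    if property_type == "src":
        return element.get("src")
    if property_type == "html":
        # Assumption: "raw HTML" is taken to mean the element's outer markup.
        return ET.tostring(element, encoding="unicode")
    raise ValueError(f"unknown property type: {property_type!r}")


anchor = ET.fromstring('<a href="/products/1">Blue <b>Mug</b></a>')
image = ET.fromstring('<img src="/img/mug.png" />')

title = extract_value(anchor, "text")  # text content of the link
url = extract_value(anchor, "link")    # value of href
picture = extract_value(image, "src")  # value of src
```

Asking for `link` on an element without an `href` yields `None` here, which is the kind of missing value a consumer of scraped rows should expect to handle.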
Pagination Types
Type | Description | Configuration |
---|---|---|
load-more | Click button to load more items | Requires nextPageSelector |
scroll-down | Infinite scroll pagination | Optional maxPages |
next-button | Traditional pagination with next button | Requires nextPageSelector |
none | Single page scraping | No additional config |
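The configuration column above can be checked on the client before a recipe is saved. The following Python sketch is a hypothetical validator based only on the table: `load-more` and `next-button` must carry a `nextPageSelector`, and the other two types need nothing further.

```python
PAGINATION_TYPES = {"load-more", "scroll-down", "next-button", "none"}
# Types that the table marks as requiring nextPageSelector.
REQUIRES_SELECTOR = {"load-more", "next-button"}


def pagination_errors(pagination):
    """Return a list of problems with a recipe's pagination object."""
    kind = pagination.get("type")
    if kind not in PAGINATION_TYPES:
        return [f"unknown pagination type: {kind!r}"]
    errors = []
    if kind in REQUIRES_SELECTOR and not pagination.get("nextPageSelector"):
        errors.append(f"{kind} requires nextPageSelector")
    return errors
```

An empty list means the object is consistent with the table. The server may enforce additional rules that are not listed here.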
Extractor Types
Type | Description | Best For |
---|---|---|
selector | Uses CSS selectors | Structured content |
llm | AI-powered extraction | Dynamic/complex content |
Recipe Status
graph LR
A[Created] --> B[Active]
B --> C[Paused]
C --> B
B --> D[Deleted]
Status | Description |
---|---|
active | Recipe is enabled and can be run |
paused | Recipe is temporarily disabled |
deleted | Recipe is permanently disabled |
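The status diagram can be expressed as a small transition table. The Python sketch below encodes only the arrows shown in the diagram. `created` appears there as a starting point but is not listed as a status value in the API responses, so it is included purely for illustration.

```python
# Allowed moves, copied from the status diagram.
ALLOWED_TRANSITIONS = {
    "created": {"active"},
    "active": {"paused", "deleted"},
    "paused": {"active"},
    "deleted": set(),  # no arrows leave Deleted
}


def can_transition(current, target):
    """Return True if the diagram has an arrow from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Under this reading, a paused recipe has to be reactivated before it can be deleted, and a deleted recipe cannot be restored.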
Best Practices
- Selectors
  - Use specific CSS selectors
  - Test selectors across different pages
  - Handle missing data gracefully
- Pagination
  - Set reasonable page limits
  - Handle rate limiting
  - Test edge cases (last page, empty results)
- Maintenance
  - Monitor recipe success rate
  - Update selectors when sites change
  - Archive unused recipes
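One way to monitor a recipe's success rate is to compute it from the `jobs` array returned by the Get Recipe Jobs endpoint. The Python sketch below uses one possible definition, completed jobs divided by all finished jobs, and ignores jobs that are still pending or running. That definition is an assumption, not something the API prescribes.

```python
def success_rate(jobs):
    """Return completed / (completed + failed), or None if nothing has finished."""
    finished = [job for job in jobs if job["status"] in ("completed", "failed")]
    if not finished:
        return None
    completed = sum(1 for job in finished if job["status"] == "completed")
    return completed / len(finished)
```

A sudden drop in this ratio is often a sign that the target site has changed and the recipe's selectors need updating.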