Overview
OpenSteer provides powerful data extraction capabilities that combine AI vision models with persistent element paths. Define a schema, and OpenSteer extracts matching data with automatic caching for fast, deterministic replay.
Take extraction snapshot - Get data-oriented HTML representation
Define schema - Specify the structure of data you want
Extract data - OpenSteer uses AI to map page elements to schema
Cache paths - Element paths are saved for instant replay
Define a Schema
Schemas define the structure of data you want to extract:
const schema = {
  title: '',
  price: '',
  description: ''
}

import { Opensteer } from 'opensteer'

const opensteer = new Opensteer({ name: 'product-scraper' })
try {
  await opensteer.launch()
  await opensteer.goto('https://example.com/product')

  // Take extraction snapshot
  await opensteer.snapshot({ mode: 'extraction' })

  // Extract with schema
  const data = await opensteer.extract({
    description: 'product details',
    schema: {
      title: '',
      price: '',
      imageUrl: ''
    }
  })
  console.log(data)
  // { title: 'Product Name', price: '$99.99', imageUrl: 'https://...' }
} finally {
  await opensteer.close()
}
The first extraction uses AI vision to locate elements. Subsequent runs use cached paths for instant extraction.
Schema Field Types
String Fields
Extract text content:
const schema = {
  name: '',
  description: '',
  category: ''
}
Number Fields
Extract numeric values:
const schema = {
  price: 0,
  rating: 0,
  reviewCount: 0
}
Boolean Fields
Extract boolean values:
const schema = {
  inStock: true,
  onSale: false
}
Null Fields
Extract nullable values:
const schema = {
  salePrice: null, // May or may not exist
  badge: null
}
Nested Structures
Object Fields
Extract nested objects:
const schema = {
  product: {
    name: '',
    price: '',
    specs: {
      weight: '',
      dimensions: ''
    }
  }
}
const data = await opensteer.extract({
  description: 'product with specs',
  schema
})
// Result:
// {
//   product: {
//     name: 'Widget',
//     price: '$50',
//     specs: { weight: '1kg', dimensions: '10x10cm' }
//   }
// }
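Nested results like the one above are sometimes easier to log or export as flat key/value pairs. The sketch below is plain JavaScript with no OpenSteer calls; `flattenResult` is a hypothetical helper name, not part of the library, and the sample data is copied from the result shown above.

```javascript
// Hypothetical helper (not part of OpenSteer): flattens a nested extraction
// result into dot-separated keys, e.g. for CSV export or logging.
function flattenResult(value, prefix = '', out = {}) {
  for (const [key, child] of Object.entries(value)) {
    const path = prefix ? `${prefix}.${key}` : key
    if (child !== null && typeof child === 'object' && !Array.isArray(child)) {
      flattenResult(child, path, out) // recurse into nested objects
    } else {
      out[path] = child // primitives, null, and arrays are kept as-is
    }
  }
  return out
}

// Sample data copied from the nested result shown above.
const nested = {
  product: {
    name: 'Widget',
    price: '$50',
    specs: { weight: '1kg', dimensions: '10x10cm' }
  }
}

const flat = flattenResult(nested)
console.log(flat['product.specs.weight']) // 1kg
```

Arrays are deliberately left intact here; list results usually map better to one row per item than to flattened keys.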
Array Fields
Extract lists of items:
const schema = {
  products: [
    {
      title: '',
      price: '',
      imageUrl: ''
    }
  ]
}
const data = await opensteer.extract({
  description: 'product listing',
  schema
})
// Result:
// {
//   products: [
//     { title: 'Product 1', price: '$10', imageUrl: 'https://...' },
//     { title: 'Product 2', price: '$20', imageUrl: 'https://...' },
//     { title: 'Product 3', price: '$30', imageUrl: 'https://...' }
//   ]
// }
For arrays, OpenSteer automatically finds all matching items and extracts their fields.
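String fields in a list come back as display text such as '$10', so any arithmetic needs a conversion step first. The following is a minimal post-processing sketch in plain JavaScript; `parsePrice` is a hypothetical helper, not an OpenSteer function, and it assumes US-style formatting (optional currency symbol, commas as thousands separators).

```javascript
// Hypothetical post-processing helper (not part of OpenSteer).
// Assumes US-style prices such as '$1,299.99'; returns null when no number is found.
function parsePrice(text) {
  const match = String(text).replace(/,/g, '').match(/-?\d+(\.\d+)?/)
  return match ? Number(match[0]) : null
}

// Sample data shaped like the array result shown above.
const listing = {
  products: [
    { title: 'Product 1', price: '$10' },
    { title: 'Product 2', price: '$20' },
    { title: 'Product 3', price: '$30' }
  ]
}

const total = listing.products
  .map((product) => parsePrice(product.price))
  .filter((price) => price !== null)
  .reduce((sum, price) => sum + price, 0)

console.log(total) // 60
```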
Advanced Field Options
Extract HTML attributes instead of text:
const schema = {
  imageUrl: { element: 0, attribute: 'src' },
  linkUrl: { element: 0, attribute: 'href' },
  productId: { element: 0, attribute: 'data-id' }
}
Include the current page URL in extraction:
const schema = {
  title: '',
  price: '',
  sourceUrl: { source: 'current_url' }
}
const data = await opensteer.extract({
  description: 'product with url',
  schema
})
// Result:
// {
//   title: 'Product',
//   price: '$99',
//   sourceUrl: 'https://example.com/product/123'
// }
Explicit Element Selectors
Manually specify elements from snapshots:
const schema = {
  title: { element: 5 },
  price: { element: 8 },
  image: { element: 3, attribute: 'src' }
}
CSS Selectors
Use explicit CSS selectors:
const schema = {
  title: { selector: 'h1.product-title' },
  price: { selector: '.price-value' }
}
Real-World Example
Here’s a complete extraction script from the OpenSteer source:
import { Opensteer } from 'opensteer'

async function run() {
  const opensteer = new Opensteer({
    name: 'product-extraction',
    model: 'gpt-5.1',
  })
  await opensteer.launch({ headless: false })
  try {
    await opensteer.goto(
      'https://kbdfans.com/search?type=product&q=tactile+switches'
    )
    console.log('Starting extraction...')
    const data = await opensteer.extract({
      description: 'Extract product cards with title, price, image, and url',
      schema: {
        products: [
          {
            title: '',
            price: '',
            imageUrl: '',
            url: '',
          },
        ],
      },
    })
    console.log(data)
  } finally {
    await opensteer.close()
  }
}

run().catch((err) => {
  console.error(err)
  process.exit(1)
})
For complex extractions, use extractFromPlan() to separate planning from execution.
Phase 1: Generate Plan
First extraction generates an extraction plan:
const plan = await opensteer.extract({
  description: 'product listing',
  schema: {
    products: [{ title: '', price: '' }]
  }
})
// Plan contains:
// - fields: Element counter mappings
// - paths: Cached element paths
// - data: Initial extracted data
Phase 2: Execute Plan
Reuse the plan for fast extraction:
const data = await opensteer.extractFromPlan({
  description: 'product listing',
  schema: {
    products: [{ title: '', price: '' }]
  },
  plan: plan
})
extractFromPlan() skips AI inference and uses cached paths directly. This is significantly faster for repeated extractions.
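The two phases can be wrapped so that the plan is generated once and every later run replays it. The sketch below assumes only the two calls shown above (`extract()` and `extractFromPlan()` with `description`, `schema`, and `plan`); `createPlannedExtractor` and the stub client are hypothetical names introduced for illustration, and the stub stands in for a launched Opensteer instance so the example runs without a browser. The plan shape returned by the stub is a placeholder, not the real one.

```javascript
// Hypothetical wrapper (not part of OpenSteer): generates the plan on first
// use, then routes every run through extractFromPlan().
// `client` is assumed to expose extract() and extractFromPlan() as shown above.
function createPlannedExtractor(client, request) {
  let plan = null
  return async function run() {
    if (plan === null) {
      plan = await client.extract(request) // Phase 1: AI-backed planning
    }
    return client.extractFromPlan({ ...request, plan }) // Phase 2: replay
  }
}

// Stub client so the sketch runs without a browser; swap in a launched
// Opensteer instance in real code.
const calls = { extract: 0, extractFromPlan: 0 }
const stubClient = {
  async extract() {
    calls.extract += 1
    return { paths: ['stub-path'] } // placeholder plan shape (assumption)
  },
  async extractFromPlan({ plan }) {
    calls.extractFromPlan += 1
    return { products: [], usedPaths: plan.paths.length }
  }
}

async function main() {
  const run = createPlannedExtractor(stubClient, {
    description: 'product listing',
    schema: { products: [{ title: '', price: '' }] }
  })
  await run()
  await run()
  console.log(calls) // { extract: 1, extractFromPlan: 2 }
}

main().catch((err) => {
  console.error(err)
  process.exitCode = 1
})
```

Keeping the plan in a closure is the simplest option for a single process; two overlapping first calls would each trigger planning, so callers that run concurrently may want to await one warm-up run first.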
Custom Snapshot
Provide snapshot options:
const data = await opensteer.extract({
  description: 'product data',
  schema: { title: '', price: '' },
  snapshot: {
    mode: 'extraction',
    withCounters: true
  }
})
Custom Prompt
Add instructions for the AI:
const data = await opensteer.extract({
  description: 'product prices',
  schema: { prices: [''] },
  prompt: 'Extract only regular prices, ignore sale prices'
})
1. Snapshot Before Extraction
Always take a snapshot before extraction:
// Take snapshot
await opensteer.snapshot({ mode: 'extraction' })

// Then extract
const data = await opensteer.extract({
  description: 'product data',
  schema: { title: '', price: '' }
})
2. Use Descriptive Names
Provide clear descriptions for caching:
// Good - descriptive
await opensteer.extract({
  description: 'product listing with name, price, and image',
  schema: { /* ... */ }
})

// Bad - vague
await opensteer.extract({
  description: 'data',
  schema: { /* ... */ }
})
3. Cache All Page Types
During CLI exploration, cache extraction for every page type your scraper will visit:
# List page
opensteer snapshot extraction
opensteer extract '{"products":[{"name":"","price":""}]}' \
--description "product listing"
# Detail page
opensteer click 1 --description "first product"
opensteer snapshot extraction
opensteer extract '{"title":"","description":"","specs":[""]}' \
--description "product detail page"
4. Handle Missing Data
Some fields may not exist on all pages:
const schema = {
  title: '',
  price: '',
  salePrice: null, // May not exist
  badge: null // May not exist
}
const data = await opensteer.extract({
  description: 'product',
  schema
})

// Check for null values
if (data.salePrice !== null) {
  console.log('On sale:', data.salePrice)
}
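When several fields are nullable, checking each one by hand gets repetitive. A small helper can list the fields that came back as null; `findMissingFields` is a hypothetical name, not an OpenSteer function, and the sketch assumes missing fields are returned as null, as in the check above.

```javascript
// Hypothetical helper (not part of OpenSteer): lists top-level fields that
// came back as null so callers can log or skip incomplete records.
function findMissingFields(data) {
  return Object.keys(data).filter((key) => data[key] === null)
}

// Sample result for a product that is not on sale.
const product = { title: 'Widget', price: '$50', salePrice: null, badge: null }

console.log(findMissingFields(product)) // [ 'salePrice', 'badge' ]
```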
5. Structure Arrays Properly
For arrays, include representative items in the schema:
// Good - shows all fields
const schema = {
  products: [
    {
      title: '',
      price: '',
      imageUrl: ''
    }
  ]
}
// OpenSteer caches the pattern and finds all matching items
6. Use Type Hints
Use appropriate primitive types as defaults:
const schema = {
  name: '', // String
  price: 0, // Number
  inStock: true, // Boolean
  badge: null, // Nullable
  specs: [''], // String array
  metadata: {} // Object
}
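Because each default value signals the expected type, a schema can be inspected with plain JavaScript before it is sent for extraction. The sketch below only mirrors the convention in the example above; `describeSchema` is a hypothetical helper and says nothing about how OpenSteer interprets the defaults internally.

```javascript
// Hypothetical helper (not part of OpenSteer): reports the type each schema
// default implies, following the convention in the example above.
function describeSchema(schema) {
  const summary = {}
  for (const [field, value] of Object.entries(schema)) {
    if (value === null) {
      summary[field] = 'nullable'
    } else if (Array.isArray(value)) {
      summary[field] = 'array'
    } else {
      summary[field] = typeof value // 'string', 'number', 'boolean', or 'object'
    }
  }
  return summary
}

const hints = describeSchema({
  name: '',
  price: 0,
  inStock: true,
  badge: null,
  specs: [''],
  metadata: {}
})

console.log(hints)
```

A check like this can catch a schema where a price was declared as `''` but later code expects a number.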
When extraction produces wrong or missing data:
Check timing
Ensure SPA content has loaded:
await opensteer.waitForText('Products loaded')
await opensteer.snapshot({ mode: 'extraction' })
const data = await opensteer.extract({ /* ... */ })
Verify cache exists
Make sure you cached the extraction during CLI exploration for this page type.
Handle obstacles
Remove cookie banners, modals, or login walls before extraction:
await opensteer.click({ description: 'close cookie banner' })
await opensteer.snapshot({ mode: 'extraction' })
Check for missing data
Some pages genuinely lack certain fields. Use null defaults and handle missing data:
const schema = { optional: null }
const data = await opensteer.extract({ schema })
if (data.optional === null) {
  console.log('Field not found on page')
}
Do NOT replace opensteer.extract() with page.evaluate() + querySelectorAll when debugging. Fix timing, caching, or obstacles instead.
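For timing problems where no reliable text marker exists to wait for, one option is to retry the snapshot-and-extract step until the result looks complete. This is a generic sketch, not an OpenSteer feature: `extractWhenReady`, its options, and the stub extractor are hypothetical, and the stub imitates a slow page so the example runs without a browser.

```javascript
// Hypothetical retry wrapper (not part of OpenSteer): re-runs an extraction
// callback until the result passes a completeness check or attempts run out.
async function extractWhenReady(extractOnce, isComplete, { attempts = 3, delayMs = 500 } = {}) {
  let result = null
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    result = await extractOnce()
    if (isComplete(result)) return result
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs)) // let the page settle
    }
  }
  return result // last attempt, possibly still incomplete
}

// Stub that returns an empty list first, then data, to imitate a slow SPA.
let attemptsMade = 0
async function fakeExtract() {
  attemptsMade += 1
  return { products: attemptsMade < 2 ? [] : [{ title: 'Widget' }] }
}

extractWhenReady(fakeExtract, (data) => data.products.length > 0, { delayMs: 10 })
  .then((data) => console.log(attemptsMade, data.products.length)) // 2 1
  .catch((err) => {
    console.error(err)
    process.exitCode = 1
  })
```

In real code the callback would take a fresh snapshot and then call `opensteer.extract()`, so each attempt still goes through OpenSteer rather than manual DOM queries.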
OpenSteer Extraction
AI-powered element detection
Automatic path caching
Works across page structure changes
Deterministic replay
Type-safe schemas
Manual Parsing
Brittle CSS selectors
No caching
Breaks on DOM changes
Requires maintenance
Error-prone
Next Steps
Browser Automation: Learn core automation features and navigation
AI Agents: Integrate extraction with AI agent workflows
Cloud Integration: Scale extraction with cloud mode
Skills: Install OpenSteer skills for AI assistants