Signature
extract<TData = unknown>(options: ExtractOptions<TSchema>): Promise<TData>
Extracts structured data from the current page. Each schema field is targeted by an element counter, a CSS selector, or a special source, or is located through AI-driven extraction; the method resolves these targets from the schema and reads the values from the DOM.
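As a rough illustration of the final step, reading a value once a field's target element is known, the sketch below assumes that a field with an `attribute` returns that attribute's value and a field without one returns the element's trimmed text. `ElementLike` and `readValue` are hypothetical names, not part of the Opensteer API, and the trimming behavior is an assumption.

```typescript
// Structural subset of the DOM Element interface, so the sketch also runs
// outside a browser. A real DOM Element satisfies it.
interface ElementLike {
  textContent: string | null
  getAttribute(name: string): string | null
}

// Hypothetical per-field reader: the named attribute when one is given,
// otherwise the element's text with surrounding whitespace trimmed.
function readValue(el: ElementLike, attribute?: string): string | null {
  if (attribute !== undefined) return el.getAttribute(attribute)
  return el.textContent?.trim() ?? null
}

// Usage with a stand-in element
const priceEl: ElementLike = {
  textContent: '  $29.99  ',
  getAttribute: (name) => (name === 'data-price' ? '29.99' : null),
}
console.log(readValue(priceEl, 'data-price')) // 29.99
console.log(readValue(priceEl)) // $29.99
```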
Parameters
options
ExtractOptions<TSchema>
required
Extraction configuration object.
schema: Schema defining the data structure to extract. Fields can reference DOM elements by element counter, by CSS selector string, or through special sources such as current_url. Nested objects and arrays are supported.
description: Optional description used as the cache key for extraction paths. When provided, resolved element paths are persisted to disk for deterministic replay on subsequent runs.
prompt: Additional prompt text to guide AI extraction when the schema alone is insufficient.
Options for the HTML snapshot used during AI extraction planning. Defaults to { mode: 'extraction', withCounters: true }.
Counter value from a previous snapshot. Overrides persisted paths and schema resolution.
CSS selector to locate the extraction root. Overrides persisted paths.
wait
false | ActionWaitOptions
Post-action wait configuration. Not typically used for extraction, but available for consistency.
Returns
The extracted data matching the schema structure. Nested objects and arrays are fully resolved.
Examples
import { Opensteer } from 'opensteer'

const opensteer = new Opensteer()
await opensteer.launch()
await opensteer.goto('https://example.com/product')

const html = await opensteer.snapshot({ mode: 'extraction' })
// Review HTML to find element counters (c="3", c="5", etc.)

const product = await opensteer.extract({
  schema: {
    title: { element: 3 },
    price: { element: 5, attribute: 'data-price' },
    url: { source: 'current_url' },
  },
})
console.log(product)
// { title: 'Premium Widget', price: '29.99', url: 'https://example.com/product' }
const article = await opensteer.extract({
  schema: {
    headline: { selector: 'h1.article-title' },
    author: { selector: '.author-name' },
    publishDate: { selector: 'time', attribute: 'datetime' },
    body: { selector: '.article-content' },
  },
})
const listing = await opensteer.extract({
  description: 'property-listing',
  schema: {
    address: {
      street: { element: 10 },
      city: { element: 11 },
      zip: { element: 12 },
    },
    pricing: {
      current: { element: 20 },
      original: { element: 21 },
    },
    features: [
      { name: { element: 30 }, value: { element: 31 } },
    ],
  },
})
// First run: resolves paths and persists them to .opensteer/selectors/
const data = await opensteer.extract({
  description: 'product-info',
  schema: {
    title: { selector: 'h1.product-title' },
    price: { selector: '.price' },
    stock: { selector: '.stock-status' },
  },
})

// Subsequent runs: load cached paths from disk
// Works even if element counters or the DOM structure change slightly
const updatedData = await opensteer.extract({
  description: 'product-info',
  schema: {
    title: { selector: 'h1.product-title' },
    price: { selector: '.price' },
    stock: { selector: '.stock-status' },
  },
})
// AI extracts data based on the page content and the prompt
const summary = await opensteer.extract({
  prompt: 'Extract the main product details including name, price, and availability',
})
console.log(summary)
// AI returns structured data matching the prompt
const results = await opensteer.extract({
  description: 'search-results',
  schema: {
    items: [
      {
        title: { selector: 'h2.result-title' },
        link: { selector: 'a.result-link', attribute: 'href' },
        snippet: { selector: '.result-snippet' },
      },
    ],
  },
})
console.log(results.items)
// Array of objects with title, link, and snippet
Schema Structure
The ExtractSchema supports multiple field types:
Element counter field
{ element: 3, attribute?: 'href' }
CSS selector field
{ selector: '.price', attribute?: 'data-value' }
Special source field
{ source: 'current_url' }
Nested object
{
  user: {
    name: { element: 5 },
    email: { element: 6 },
  },
}
Array of objects
{
  results: [
    {
      title: { selector: '.title' },
      link: { selector: 'a', attribute: 'href' },
    },
  ],
}
Literal values
{
  type: 'product',
  version: 2,
  available: true,
}
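The field types above can be told apart purely by their shape. The sketch below models them as a TypeScript union and walks a schema to list every field that needs a value from the page. `SchemaNode`, `classifyNode`, and `collectFields` are illustrative names, not the library's own `ExtractSchema` types, and treating a one-item array as a template applied to every match is an assumption drawn from the array examples.

```typescript
// Hypothetical model of the documented field shapes (not Opensteer's types).
type ElementField = { element: number; attribute?: string }
type SelectorField = { selector: string; attribute?: string }
type SourceField = { source: 'current_url' }
type Literal = string | number | boolean
type SchemaNode =
  | ElementField
  | SelectorField
  | SourceField
  | Literal
  | SchemaNode[]
  | { [key: string]: SchemaNode }

type NodeKind = 'element' | 'selector' | 'source' | 'literal' | 'array' | 'object'

// Decide which documented field type a schema node is.
function classifyNode(node: SchemaNode): NodeKind {
  if (typeof node !== 'object') return 'literal'
  if (Array.isArray(node)) return 'array'
  if ('element' in node && typeof node.element === 'number') return 'element'
  if ('selector' in node && typeof node.selector === 'string') return 'selector'
  if ('source' in node) return 'source'
  return 'object'
}

// List the dotted path of every field read from the page; literals are
// returned as-is by the schema, so they contribute no paths.
function collectFields(node: SchemaNode, path = ''): string[] {
  const kind = classifyNode(node)
  if (kind === 'element' || kind === 'selector' || kind === 'source') return [path]
  if (kind === 'literal') return []
  if (Array.isArray(node)) {
    // The array holds an item template that applies to each matched item.
    return node.flatMap((item) => collectFields(item, `${path}[]`))
  }
  return Object.entries(node as { [key: string]: SchemaNode }).flatMap(([key, child]) =>
    collectFields(child, path ? `${path}.${key}` : key),
  )
}

// Usage with the property-listing schema from the examples, plus a literal
const listingSchema: SchemaNode = {
  address: { street: { element: 10 }, city: { element: 11 }, zip: { element: 12 } },
  pricing: { current: { element: 20 }, original: { element: 21 } },
  features: [{ name: { element: 30 }, value: { element: 31 } }],
  type: 'listing',
}
console.log(collectFields(listingSchema))
// [ 'address.street', 'address.city', 'address.zip', 'pricing.current',
//   'pricing.original', 'features[].name', 'features[].value' ]
```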
Resolution Chain
The extraction method follows this resolution order:
1. Persisted paths: if description is provided and matching paths exist in .opensteer/selectors/, those paths are used.
2. Schema hints: element counters, selectors, and sources in the schema are resolved directly.
3. AI planning: if no deterministic targets are found, the AI analyzes the page and generates an extraction plan.
4. Field extraction: the resolved targets are used to extract values from the DOM.
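The first three steps of this order can be sketched as a simple fallback chain. This is a minimal model under stated assumptions: `ResolveInput`, `resolveTargets`, and the string targets are hypothetical, `planWithAI` stands in for the AI planning step, and the explicit element and selector overrides listed under Parameters are not modeled.

```typescript
// Hypothetical inputs; the method's real internals are not documented here.
interface ResolveInput {
  description?: string
  persisted: Map<string, string[]> // description -> cached element paths
  schemaTargets: string[] // targets found directly in schema hints
  planWithAI: () => string[] // stand-in for AI extraction planning
}

type Resolution = { stage: 'persisted' | 'schema' | 'ai'; targets: string[] }

function resolveTargets(input: ResolveInput): Resolution {
  // 1. Persisted paths win when the description matches a cached entry.
  if (input.description !== undefined) {
    const cached = input.persisted.get(input.description)
    if (cached) return { stage: 'persisted', targets: cached }
  }
  // 2. Element counters, selectors, and sources from the schema.
  if (input.schemaTargets.length > 0) {
    return { stage: 'schema', targets: input.schemaTargets }
  }
  // 3. AI planning runs only when nothing deterministic was found.
  return { stage: 'ai', targets: input.planWithAI() }
}

// Usage
const cache = new Map([['product-info', ['h1.product-title']]])
const first = resolveTargets({
  description: 'product-info',
  persisted: cache,
  schemaTargets: [],
  planWithAI: () => [],
})
console.log(first.stage) // persisted
```

Passing the planner as a function keeps the AI step lazy, so it costs nothing when an earlier stage already produced targets.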
Caching and Persistence
When description is provided:
Element paths are persisted to .opensteer/selectors/{namespace}/{description}.json
Schema hash is stored to detect changes
Subsequent runs with matching description and schema load cached paths
Delete the cached file to force re-extraction
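A stored schema hash lets a later run detect that the schema has changed since the paths were cached. The sketch below shows the idea assuming SHA-256 over a key-sorted JSON serialization; the hashing scheme Opensteer actually uses is not documented here. `cachePath`, `stableStringify`, `schemaHash`, and `canReuse` are hypothetical helpers, and the `'default'` namespace in the usage line is a made-up value.

```typescript
import { createHash } from 'node:crypto'
import { posix } from 'node:path'

// Path layout from the list above; the namespace is supplied by the caller.
function cachePath(namespace: string, description: string): string {
  return posix.join('.opensteer', 'selectors', namespace, `${description}.json`)
}

// Serialize with sorted object keys so key order does not affect the hash.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

// Assumed scheme: SHA-256 (hex) of the stable serialization.
function schemaHash(schema: unknown): string {
  return createHash('sha256').update(stableStringify(schema)).digest('hex')
}

// Cached paths are reusable only when the stored hash matches the schema.
function canReuse(storedHash: string, schema: unknown): boolean {
  return storedHash === schemaHash(schema)
}

console.log(cachePath('default', 'product-info'))
// .opensteer/selectors/default/product-info.json
```

With sorted keys, reordering fields in the schema object keeps the cache valid while changing a selector invalidates it; whether the real implementation treats key order the same way is not specified.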
Type Safety
interface Product {
  title: string
  price: string
  url: string
}

const product = await opensteer.extract<Product>({
  schema: {
    title: { element: 3 },
    price: { element: 5 },
    url: { source: 'current_url' },
  },
})
// product is typed as Product
console.log(product.title)
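Because `TData` appears only in the return type of `extract`, the type argument is a compile-time annotation: the signature itself does not check the extracted values against `Product`. Where the result must be trusted at runtime, a small type guard can verify it first. `isProduct` is an illustrative helper, not part of the library.

```typescript
interface Product {
  title: string
  price: string
  url: string
}

// Runtime check that an unknown value has the shape the generic promises.
function isProduct(value: unknown): value is Product {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return typeof v.title === 'string' && typeof v.price === 'string' && typeof v.url === 'string'
}

// Usage: narrow the value before relying on its fields
const raw: unknown = { title: 'Premium Widget', price: '29.99', url: 'https://example.com/product' }
if (isProduct(raw)) {
  console.log(raw.title) // Premium Widget
}
```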
Error Handling
try {
  const data = await opensteer.extract({
    description: 'product-data',
    schema: {
      title: { selector: '.missing-element' },
    },
  })
  console.log(data)
} catch (error) {
  // Extraction may fail if:
  // - Required elements are not found
  // - Selectors are invalid
  // - AI extraction planning fails
  // - The cached selector is incompatible with the current schema
  // `error` is `unknown` under strict TypeScript, so narrow it before reading .message
  console.error('Extraction failed:', error instanceof Error ? error.message : error)
}
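One way to handle the incompatible-cache failure, combined with deleting the cached file to force re-extraction, is to clear the cache and retry once. This is a hypothetical helper: `run` stands in for a call such as `() => opensteer.extract({ ... })`, and `clearCache` for deleting the relevant file under `.opensteer/selectors/`; neither is an Opensteer API.

```typescript
// Hypothetical wrapper: retry an extraction once after clearing its cache.
async function extractWithCacheReset<T>(
  run: () => Promise<T>,
  clearCache: () => Promise<void>,
): Promise<T> {
  try {
    return await run()
  } catch (error) {
    // Catch variables are `unknown` in strict TypeScript, so narrow first.
    const message = error instanceof Error ? error.message : String(error)
    console.warn(`Extraction failed (${message}); clearing cache and retrying once`)
    await clearCache()
    // A second failure is not caught here and propagates to the caller.
    return run()
  }
}
```

Retrying after any error is deliberately blunt: it also retries failures that a fresh cache cannot fix, such as an invalid selector, but because it retries only once, the second error still reaches the caller.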
See Schema Types for detailed schema field options and ExtractionPlan for two-phase extraction.