ExtractSchema
The ExtractSchema defines the structure of data to extract from a page. Schemas support nested objects, arrays, and multiple field types for flexible data extraction.
Schema Field Types
ExtractSchemaField
A field descriptor that references a DOM element or special source.
Element Counter Field
References an element by its counter (the c attribute) from a snapshot. For example, a field can extract the textContent of the element with c="3", or the href attribute of element c="5".
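As a rough illustration of counter lookup, the sketch below resolves a counter reference against an in-memory list standing in for snapshot output. The SnapElement and CounterField types and resolveCounterField() are hypothetical names chosen for this example, not the library's API; the only assumption taken from the text is that counters appear as a c attribute.

```typescript
// Hypothetical stand-in for one element of an annotated snapshot.
// The counter lives in the `c` attribute, as in c="3".
interface SnapElement {
  attrs: Record<string, string>;
  text: string;
}

// Hypothetical field descriptor: select by counter, optionally read an attribute.
interface CounterField {
  counter: number;
  attr?: string; // omitted -> the element's text content
}

function resolveCounterField(snapshot: SnapElement[], field: CounterField): string | null {
  // Counters are compared as strings because attribute values are strings.
  const el = snapshot.find((e) => e.attrs["c"] === String(field.counter));
  if (el === undefined) return null;
  return field.attr === undefined ? el.text : el.attrs[field.attr] ?? null;
}

const snapshot: SnapElement[] = [
  { attrs: { c: "3" }, text: "Wireless Mouse" },
  { attrs: { c: "5", href: "/products/42" }, text: "Details" },
];

const productTitle = resolveCounterField(snapshot, { counter: 3 }); // text of c="3"
const detailsHref = resolveCounterField(snapshot, { counter: 5, attr: "href" }); // href of c="5"
```

Returning null for an unknown counter, rather than throwing, is a choice made for this sketch only.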
CSS Selector Field
References an element by CSS selector, for example to extract an element's src attribute.
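A selector-based field reduces to a querySelector call followed by an attribute or text read. In the sketch below, ElementLike, QueryRoot, SelectorField and resolveSelectorField() are hypothetical names; only querySelector, getAttribute and textContent are standard DOM members, and a fake root is used so the example runs outside a browser.

```typescript
// Minimal structural types for the DOM members used below.
interface ElementLike {
  textContent: string | null;
  getAttribute(name: string): string | null;
}
interface QueryRoot {
  querySelector(selector: string): ElementLike | null;
}

// Hypothetical selector-based field descriptor (illustrative, not the library's type).
interface SelectorField {
  selector: string;
  attr?: string; // omitted -> textContent
}

function resolveSelectorField(root: QueryRoot, field: SelectorField): string | null {
  const el = root.querySelector(field.selector);
  if (el === null) return null;
  return field.attr === undefined ? el.textContent : el.getAttribute(field.attr);
}

// Fake root so the sketch runs without a DOM; it only knows "img.hero".
const fakeRoot: QueryRoot = {
  querySelector: (selector) =>
    selector === "img.hero"
      ? { textContent: "", getAttribute: (name) => (name === "src" ? "/images/hero.png" : null) }
      : null,
};

const src = resolveSelectorField(fakeRoot, { selector: "img.hero", attr: "src" });
```

In a browser context, a real document should fit the QueryRoot shape, since it exposes the same three members.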
Current URL Field
Extracts the current page URL, the same value returned by page.url().
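Because the URL field reads a special source rather than an element, a resolver can branch on the field kind. This sketch models the three kinds above as a hypothetical discriminated union; the Field and PageLike types and the byCounter/bySelector lookups are illustrative assumptions, while url() only mirrors the shape of page.url().

```typescript
// Hypothetical element stand-in: text content plus attributes.
interface El {
  text: string;
  attrs: Record<string, string>;
}

// Hypothetical union of the three field kinds (not the library's actual type).
type Field =
  | { kind: "counter"; counter: number; attr?: string }
  | { kind: "selector"; selector: string; attr?: string }
  | { kind: "url" };

// Minimal page stand-in; byCounter and bySelector are hypothetical lookups.
interface PageLike {
  url(): string; // same shape as page.url()
  byCounter(counter: number): El | undefined;
  bySelector(selector: string): El | undefined;
}

function read(el: El | undefined, attr?: string): string | null {
  if (el === undefined) return null;
  return attr === undefined ? el.text : el.attrs[attr] ?? null;
}

function resolveField(page: PageLike, field: Field): string | null {
  switch (field.kind) {
    case "counter":
      return read(page.byCounter(field.counter), field.attr);
    case "selector":
      return read(page.bySelector(field.selector), field.attr);
    case "url":
      // Special source: no element lookup, only the current URL.
      return page.url();
  }
}

const page: PageLike = {
  url: () => "https://shop.example/products?page=2",
  byCounter: (counter) => (counter === 3 ? { text: "Wireless Mouse", attrs: { c: "3" } } : undefined),
  bySelector: () => undefined,
};

const current = resolveField(page, { kind: "url" });
```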
Nested Objects
Schemas support arbitrary nesting.
Arrays
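Arbitrary nesting falls out naturally from recursion: leaves are resolved to values and object nodes are rebuilt key by key. The sketch assumes a hypothetical leaf shape ({ counter, attr? }) and a Map from counters to elements; Leaf, SchemaNode and extractNode() are illustrative names, not the library's types.

```typescript
// Hypothetical leaf: a counter reference, optionally reading an attribute.
type Leaf = { counter: number; attr?: string };
// A schema node is a leaf or an object of further nodes, nested to any depth.
type SchemaNode = Leaf | { [key: string]: SchemaNode };
type Extracted = string | null | { [key: string]: Extracted };

interface El {
  text: string;
  attrs: Record<string, string>;
}

function isLeaf(node: SchemaNode): node is Leaf {
  return typeof (node as Leaf).counter === "number";
}

function extractNode(node: SchemaNode, byCounter: Map<number, El>): Extracted {
  if (isLeaf(node)) {
    const el = byCounter.get(node.counter);
    if (el === undefined) return null;
    return node.attr === undefined ? el.text : el.attrs[node.attr] ?? null;
  }
  // Object node: recurse into every key, preserving the schema's shape.
  const out: { [key: string]: Extracted } = {};
  for (const [key, child] of Object.entries(node)) {
    out[key] = extractNode(child, byCounter);
  }
  return out;
}

const elements = new Map<number, El>([
  [3, { text: "Wireless Mouse", attrs: {} }],
  [7, { text: "$24.99", attrs: {} }],
  [8, { text: "Acme", attrs: {} }],
]);

const schema: SchemaNode = {
  title: { counter: 3 },
  details: { price: { counter: 7 }, seller: { name: { counter: 8 } } },
};

const result = extractNode(schema, elements);
```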
Arrays of objects are supported for extracting repeated structures. For an array field, extraction:
- Finds all matching parent elements
- Extracts each field relative to each parent
- Returns an array of objects
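The three steps above can be sketched with a small in-memory tree. To keep it runnable without a browser, MockNode, ArrayField, matches(), findAll() and extractArray() are hypothetical, and matches() understands only "tag", ".class" and "tag.class" rather than full CSS selectors.

```typescript
// Hypothetical in-memory element standing in for a DOM node.
interface MockNode {
  tag: string;
  classes: string[];
  text: string;
  attrs: Record<string, string>;
  children: MockNode[];
}

// Hypothetical array field: a parent selector plus per-item fields.
interface ArrayField {
  selector: string;
  fields: Record<string, { selector: string; attr?: string }>;
}

// Tiny matcher for "tag", ".class" or "tag.class" only (not full CSS).
function matches(node: MockNode, selector: string): boolean {
  const [tag, cls] = selector.split(".");
  return (tag === "" || node.tag === tag) && (cls === undefined || node.classes.includes(cls));
}

// All descendants of root (excluding root itself) that match, in document order.
function findAll(root: MockNode, selector: string): MockNode[] {
  const found: MockNode[] = [];
  const visit = (node: MockNode): void => {
    if (matches(node, selector)) found.push(node);
    node.children.forEach(visit);
  };
  root.children.forEach(visit);
  return found;
}

function extractArray(root: MockNode, field: ArrayField): Record<string, string | null>[] {
  // 1. Find all matching parent elements.
  return findAll(root, field.selector).map((parent) => {
    // 2. Extract each field relative to this parent (first match wins).
    const item: Record<string, string | null> = {};
    for (const [key, f] of Object.entries(field.fields)) {
      const match = findAll(parent, f.selector)[0];
      item[key] = match === undefined ? null : f.attr === undefined ? match.text : match.attrs[f.attr] ?? null;
    }
    return item;
  }); // 3. The result is an array of objects, one per parent.
}

const el = (tag: string, classes: string[], text = "", attrs: Record<string, string> = {}, children: MockNode[] = []): MockNode =>
  ({ tag, classes, text, attrs, children });

const list = el("ul", [], "", {}, [
  el("li", ["item"], "", {}, [el("span", ["name"], "Mouse"), el("a", [], "", { href: "/p/1" })]),
  el("li", ["item"], "", {}, [el("span", ["name"], "Keyboard"), el("a", [], "", { href: "/p/2" })]),
]);

const rows = extractArray(list, {
  selector: "li.item",
  fields: { name: { selector: ".name" }, url: { selector: "a", attr: "href" } },
});
```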
Literal Values
Schemas can include literal values.
Complete Example
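Putting the pieces together, the sketch below resolves a schema that mixes literal values, counter references, the page URL and a nested object. It assumes, hypothetically, that literals are plain strings or numbers copied through unchanged and that any object with a numeric counter key is an element reference; ElementRef, UrlRef, SchemaValue and resolve() are illustrative names, and the library's actual literal syntax may differ.

```typescript
// Hypothetical schema building blocks (illustrative, not the library's types).
type ElementRef = { counter: number; attr?: string };
type UrlRef = { source: "url" };
type SchemaValue = ElementRef | UrlRef | string | number | { [key: string]: SchemaValue };
type Out = string | number | null | { [key: string]: Out };

interface El {
  text: string;
  attrs: Record<string, string>;
}
interface Ctx {
  url: string; // what page.url() would return
  byCounter: Map<number, El>; // stand-in for snapshot counters
}

const isElementRef = (v: SchemaValue): v is ElementRef =>
  typeof v === "object" && typeof (v as ElementRef).counter === "number";
const isUrlRef = (v: SchemaValue): v is UrlRef =>
  typeof v === "object" && (v as UrlRef).source === "url";

function resolve(v: SchemaValue, ctx: Ctx): Out {
  // Literal values are copied into the result unchanged.
  if (typeof v === "string" || typeof v === "number") return v;
  if (isElementRef(v)) {
    const el = ctx.byCounter.get(v.counter);
    if (el === undefined) return null;
    return v.attr === undefined ? el.text : el.attrs[v.attr] ?? null;
  }
  if (isUrlRef(v)) return ctx.url;
  // Anything else is a nested object: resolve each key recursively.
  const out: { [key: string]: Out } = {};
  for (const [key, child] of Object.entries(v)) out[key] = resolve(child, ctx);
  return out;
}

const ctx: Ctx = {
  url: "https://shop.example/products/42",
  byCounter: new Map<number, El>([
    [3, { text: "Wireless Mouse", attrs: {} }],
    [5, { text: "Specs", attrs: { href: "/products/42/specs" } }],
  ]),
};

const schema: SchemaValue = {
  kind: "product", // literal value
  title: { counter: 3 }, // text of element c="3"
  specsUrl: { counter: 5, attr: "href" }, // href of element c="5"
  meta: { pageUrl: { source: "url" }, schemaVersion: 1 }, // nested: URL field plus a literal
};

const result = resolve(schema, ctx);
```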
ExtractionPlan
An ExtractionPlan is an intermediate representation returned by AI extraction, or used for two-phase extraction with extractFromPlan().
ExtractionFieldPlan
Similar to ExtractSchemaField, but used in plans generated by AI or built programmatically.
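Since a plan carries explicit field mappings and element paths, one way to build it programmatically is to swap each snapshot counter for the path of the element it referred to. The path encoding (child indexes from the root) and the PlanSketch, FieldPlanSketch, CounterFieldSketch and buildPlan() names are assumptions for illustration; the ElementPath types in the library source define the real encoding.

```typescript
// Hypothetical shapes, not the library's ExtractionPlan/ExtractionFieldPlan types.
type ElementPath = number[]; // assumed encoding: child indexes from the root

interface FieldPlanSketch {
  path: ElementPath;
  attr?: string; // omitted -> text content
}
interface PlanSketch {
  fields: Record<string, FieldPlanSketch>;
}
interface CounterFieldSketch {
  counter: number;
  attr?: string;
}

// Replace each snapshot counter with the element path it pointed to, so the
// plan no longer depends on one snapshot's numbering.
function buildPlan(schema: Record<string, CounterFieldSketch>, counterToPath: Map<number, ElementPath>): PlanSketch {
  const fields: Record<string, FieldPlanSketch> = {};
  for (const [field, ref] of Object.entries(schema)) {
    const path = counterToPath.get(ref.counter);
    if (path === undefined) throw new Error(`field "${field}": no element with counter ${ref.counter}`);
    fields[field] = ref.attr === undefined ? { path } : { path, attr: ref.attr };
  }
  return { fields };
}

const plan = buildPlan(
  { title: { counter: 3 }, link: { counter: 5, attr: "href" } },
  new Map<number, ElementPath>([[3, [0, 1, 0]], [5, [0, 1, 2]]]),
);
```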
extractFromPlan()
Extract data using a pre-built extraction plan with explicit field mappings and element paths.
Signature
Parameters
Returns
Example: Two-Phase Extraction
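Two-phase extraction separates an expensive planning step from cheap, repeatable application of the plan. In this sketch, planExtraction() is a hard-coded stand-in for the AI planning call and applyPlan() plays a role loosely analogous to extractFromPlan(); both names, the PlanSketch shape and the selector-to-text page model are hypothetical.

```typescript
// Hypothetical plan shape: each output field maps to a selector.
interface PlanSketch {
  fields: Record<string, { selector: string }>;
}
// Hypothetical page model: selector -> text, standing in for real DOM queries.
type PageSketch = Map<string, string>;

let planningCalls = 0;

// Phase 1 stand-in: a real flow would ask an AI model to map descriptions to
// elements; here the mapping is hard-coded and calls are only counted.
function planExtraction(descriptions: Record<string, string>): PlanSketch {
  planningCalls++;
  const known: Record<string, string> = { "product title": "h1", "listed price": ".price" };
  const fields: PlanSketch["fields"] = {};
  for (const [field, description] of Object.entries(descriptions)) {
    const selector = known[description];
    if (selector === undefined) throw new Error(`no selector for "${description}"`);
    fields[field] = { selector };
  }
  return { fields };
}

// Phase 2: apply the plan deterministically; no planning call happens here.
function applyPlan(plan: PlanSketch, page: PageSketch): Record<string, string | null> {
  const out: Record<string, string | null> = {};
  for (const [field, spec] of Object.entries(plan.fields)) {
    out[field] = page.get(spec.selector) ?? null;
  }
  return out;
}

// Plan once; plans are plain data, so they can be stored as JSON and reloaded.
const stored = JSON.stringify(planExtraction({ title: "product title", price: "listed price" }));
const plan: PlanSketch = JSON.parse(stored);

// Apply the same plan to many pages.
const pages: PageSketch[] = [
  new Map([["h1", "Wireless Mouse"], [".price", "$24.99"]]),
  new Map([["h1", "Mechanical Keyboard"], [".price", "$89.00"]]),
];
const results = pages.map((p) => applyPlan(plan, p));
```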
Example: Using Pre-Resolved Paths
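With pre-resolved paths, the extractor can walk straight to an element instead of matching selectors again. The sketch assumes, hypothetically, that a path is a list of child indexes from the root; TreeNode, resolvePath() and readAtPath() are illustrative names, not the library's ElementPath API.

```typescript
// Hypothetical tree node standing in for a DOM element.
interface TreeNode {
  text: string;
  attrs: Record<string, string>;
  children: TreeNode[];
}

// Walk the child-index path from the root; no selector matching is needed.
function resolvePath(root: TreeNode, path: readonly number[]): TreeNode | undefined {
  let node: TreeNode | undefined = root;
  for (const index of path) {
    node = node.children[index];
    if (node === undefined) return undefined; // the page no longer matches the path
  }
  return node;
}

// Read text content, or an attribute when one is given.
function readAtPath(root: TreeNode, path: readonly number[], attr?: string): string | null {
  const node = resolvePath(root, path);
  if (node === undefined) return null;
  return attr === undefined ? node.text : node.attrs[attr] ?? null;
}

const n = (text: string, attrs: Record<string, string> = {}, ...children: TreeNode[]): TreeNode =>
  ({ text, attrs, children });

const doc = n("", {},
  n("", {},
    n("Header"),
    n("", {}, n("Wireless Mouse"), n("$24.99"), n("Details", { href: "/products/42" })),
  ),
);

const heading = readAtPath(doc, [0, 1, 0]);
const href = readAtPath(doc, [0, 1, 2], "href");
```

A stale path yields null here instead of silently matching a different element; how the library handles stale paths is not covered by this sketch.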
Type Definitions
Complete TypeScript types.
Best Practices
Use element counters for dynamic content
Use selectors for stable structures
Cache with descriptions
Type your results
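For the last practice, one way to type results, assuming extraction hands back untyped data, is to declare the expected shape and check it at runtime before use. Product and isProduct() are hypothetical names for this sketch.

```typescript
// Hypothetical result shape for a product page; adapt it to your own schema.
interface Product {
  title: string;
  price: number;
  tags: string[];
}

// Runtime guard: narrows an untyped extraction result to Product before use.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    typeof v.price === "number" &&
    Array.isArray(v.tags) &&
    v.tags.every((t) => typeof t === "string")
  );
}

const raw: unknown = JSON.parse('{"title":"Wireless Mouse","price":24.99,"tags":["usb","wireless"]}');
if (isProduct(raw)) {
  // Inside this branch, raw is typed as Product.
  console.log(raw.title.toUpperCase());
}
```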
See Also
- extract() - Main extraction method
- snapshot() - Generate HTML with element counters
- ElementPath types - Source code reference