Skip to main content

Method Signature

agent.execute(instructionOrOptions: string | OpensteerAgentExecuteOptions): Promise<OpensteerAgentResult>
Execute an agent task with a natural language instruction. The agent will analyze the browser state, decide on actions, and execute them until the task is complete or max steps is reached.

Parameters

You can pass either a string instruction or an options object:

String Parameter

await agent.execute('Search for OpenSteer documentation')

Options Object

instruction
string
required
Natural language instruction describing the task to complete. Must be a non-empty string.
maxSteps
number
default:20
Maximum number of actions the agent can take. Must be a positive integer. Prevents infinite loops and controls execution cost.
highlightCursor
boolean
default:false
When true, displays a red cursor overlay at the action coordinates during execution. Useful for debugging and demonstrations.

Return Type

Returns a Promise<OpensteerAgentResult> with the following fields:
success
boolean
required
Whether the agent execution completed without errors. false indicates a failure (API error, invalid action, etc.).
completed
boolean
required
Whether the agent finished the task successfully. false when max steps reached before completion.
message
string
required
Human-readable message describing the result. Contains completion message or error details.
actions
OpensteerAgentAction[]
required
Array of actions taken by the agent during execution. Each action includes type, coordinates, reasoning, and metadata.
usage
OpensteerAgentUsage | undefined
Token and time usage statistics for the execution.
provider
OpensteerAgentProvider
required
The provider used for execution: 'openai', 'anthropic', or 'google'.
model
string
required
Full model name in provider/model format (e.g., 'openai/computer-use-preview').

Action Types

Each action in the actions array includes:
type
string
required
Action type: 'click', 'type', 'scroll', 'navigate', 'wait', 'screenshot', 'finish'
reasoning
string | undefined
Agent’s reasoning for taking this action.
x
number | undefined
X coordinate for click actions.
y
number | undefined
Y coordinate for click actions.
text
string | undefined
Text input for type actions.
scrollX
number | undefined
Horizontal scroll delta.
scrollY
number | undefined
Vertical scroll delta.
url
string | undefined
URL for navigate actions.
timeMs
number | undefined
Wait duration in milliseconds.

Examples

Basic Usage

import { Opensteer } from 'opensteer'

const browser = new Opensteer()
await browser.launch()
await browser.goto('https://example.com')

const agent = browser.agent({
  mode: 'cua',
  model: 'openai/computer-use-preview'
})

const result = await agent.execute('Click the login button')

if (result.success && result.completed) {
  console.log('✓ Task completed:', result.message)
  console.log('Actions taken:', result.actions.length)
} else if (result.success && !result.completed) {
  console.warn('⚠ Max steps reached before completion')
} else {
  console.error('✗ Execution failed:', result.message)
}

await browser.close()

With Options

const result = await agent.execute({
  instruction: 'Fill out the contact form with test data',
  maxSteps: 30,
  highlightCursor: true
})

console.log('Provider:', result.provider)
console.log('Model:', result.model)

if (result.usage) {
  console.log('Input tokens:', result.usage.inputTokens)
  console.log('Output tokens:', result.usage.outputTokens)
  console.log('Inference time:', result.usage.inferenceTimeMs, 'ms')
}

for (const action of result.actions) {
  console.log(`${action.type}: ${action.reasoning || 'N/A'}`)
}

Error Handling

try {
  const result = await agent.execute('Complete checkout process')
  
  if (!result.success) {
    throw new Error(`Agent failed: ${result.message}`)
  }
  
  if (!result.completed) {
    console.warn('Task incomplete - may need more steps')
  }
  
  return result
} catch (error) {
  console.error('Execution error:', error)
  throw error
}

Analyzing Actions

const result = await agent.execute('Search for products')

const clickActions = result.actions.filter(a => a.type === 'click')
const typeActions = result.actions.filter(a => a.type === 'type')
const navigateActions = result.actions.filter(a => a.type === 'navigate')

console.log(`Clicks: ${clickActions.length}`)
console.log(`Text inputs: ${typeActions.length}`)
console.log(`Navigations: ${navigateActions.length}`)

// Print reasoning for each action
for (const action of result.actions) {
  if (action.reasoning) {
    console.log(`[${action.type}] ${action.reasoning}`)
  }
}

Notes

  • Agent execution is slower and more expensive than direct actions
  • Always set appropriate maxSteps to prevent runaway execution
  • Check both success and completed flags in the result
  • The usage field may be undefined depending on the provider
  • Actions are executed sequentially with waitBetweenActionsMs delay between them
  • The agent cannot be executed concurrently - only one execution at a time per agent instance