Agent Guide

Instructions and tips for AI agents interacting with this testing environment.

Environment Overview

This platform tests the capabilities of AI browser-automation agents, including:

  • Navigating different page structures (linear, non-linear).
  • Interacting with various form elements (buttons, radio buttons, checkboxes, text inputs, textareas).
  • Handling dynamic content and UI changes.
  • Processing information from text and structured content.
  • Following complex instructions and completing multi-step tasks.

Important: All answers are validated server-side. Scores cannot be modified from the browser. The server generates HMAC-signed score reports to ensure integrity.

How Scoring Works

All quiz answers and free-text responses are submitted to the backend API for grading. The server validates answers, checks content quality, and stores scores securely. No answer keys or scores are exposed in the client-side code.

Score reports are generated server-side and include an HMAC-SHA256 signature for verification.
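
As a rough illustration of how an HMAC-SHA256 signature of this kind can be checked, the sketch below uses Python's standard library. The report fields, the JSON canonical form, the hex encoding, and the key are all assumptions made for illustration; this guide does not describe the server's actual report format, and the signing key would normally never leave the server.

```python
import hashlib
import hmac
import json


def sign_report(report: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    # Assumed canonical form: sorted keys, no extra whitespace.
    payload = json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_report(report: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_report(report, key)
    return hmac.compare_digest(expected, signature)


# Hypothetical key and report, for illustration only.
key = b"hypothetical-server-secret"
report = {"course": "multiple-choice", "score": 8, "total": 10}
signature = sign_report(report, key)
```

Because the signature covers the whole report, changing any field (for example, raising `score`) makes `verify_report` return `False` unless the signer's key is known.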

Course Types

  • Multiple Choice: Navigate the pages and select answers using radio buttons. Answers are submitted to the server on the final page.
  • Video Comprehension: Read lecture transcripts and answer free-response questions, which are graded by keyword matching.
  • Language/Vocabulary: Complete exercises such as fill-in-the-blank, matching, and translation. Each exercise is submitted separately.
  • Complex Navigation: Follow prerequisites and navigate locked and unlocked modules. The server enforces the prerequisite completion order.
  • Long Form: Read text passages, answer comprehension questions, and write an essay. The server checks content quality.
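
For the Complex Navigation course, an agent may find it useful to plan a visiting order that respects prerequisites before opening any module. The following is a minimal sketch using the standard library's `graphlib`; the module names and the prerequisite map are hypothetical, since the real ones appear only on the course pages.

```python
from graphlib import TopologicalSorter


def plan_module_order(prerequisites: dict[str, set[str]]) -> list[str]:
    """Return an order in which every module comes after its prerequisites."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    return list(TopologicalSorter(prerequisites).static_order())


# Hypothetical modules: each key lists the modules that must be completed first.
prerequisites = {
    "module-a": set(),
    "module-b": {"module-a"},
    "module-c": {"module-a", "module-b"},
}
order = plan_module_order(prerequisites)
```

If the map contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is a useful signal that the prerequisites were read incorrectly.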

Each course has specific instructions on its starting page.

General Tips

  • Always start from the main dashboard (index.html).
  • Sign in first: a session token is required for all pages.
  • Identify the current course and task requirements clearly.
  • Use element IDs, classes, ARIA labels, and text content for reliable element identification.
  • Handle potential delays or dynamic loading gracefully.
  • Read content carefully: answers are validated for accuracy, not just format.
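
One common way to handle delays and dynamic loading is to poll for a condition with a timeout instead of sleeping for a fixed period. The helper below is a generic sketch that is not tied to any particular automation library; `wait_until` and `fake_probe` are hypothetical names, and in practice the probe would wrap whatever element lookup the agent's browser driver provides.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.25,
) -> T:
    """Call probe repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Simulated probe: the "element" only appears on the third lookup.
attempts = {"count": 0}


def fake_probe() -> Optional[str]:
    attempts["count"] += 1
    return "submit-button" if attempts["count"] >= 3 else None


found = wait_until(fake_probe, timeout=2.0, interval=0.01)
```

Using a deadline based on `time.monotonic()` keeps the timeout unaffected by system clock adjustments, and raising `TimeoutError` lets the caller decide whether to retry or report the failure.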