THE ORCHESTRATOR (#600)

* feat(background-agent): add ConcurrencyManager for model-based limits

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* fix(background-agent): set default concurrency to 5

🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(background-agent): support 0 as unlimited concurrency

Setting concurrency to 0 means unlimited (Infinity).
Works for defaultConcurrency, providerConcurrency, and modelConcurrency.
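
A minimal sketch of how such a limit could be resolved. Only the three setting names, the default of 5, and the 0-means-unlimited rule come from the commits above; the `ConcurrencyConfigSketch` shape, the `resolveLimit` helper, and the model > provider > default precedence are illustrative assumptions, not the plugin's actual code.

```typescript
// Hypothetical config shape; only the three field names come from the commit log.
interface ConcurrencyConfigSketch {
  defaultConcurrency?: number;
  providerConcurrency?: Record<string, number>;
  modelConcurrency?: Record<string, number>;
}

const FALLBACK_LIMIT = 5; // default stated in the fix commit above

// Resolve the limit for a "provider/model" key.
// Assumed precedence: model setting, then provider setting, then default.
function resolveLimit(config: ConcurrencyConfigSketch, model: string): number {
  const provider = model.split("/")[0];
  const configured =
    config.modelConcurrency?.[model] ??
    config.providerConcurrency?.[provider] ??
    config.defaultConcurrency ??
    FALLBACK_LIMIT;
  // A configured 0 means "no limit".
  return configured === 0 ? Infinity : configured;
}
```

Because `??` only skips `null` and `undefined`, an explicit 0 at the model level is honored rather than falling through to the provider or default value.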

🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(hooks): use auto flag for session resumption after compaction

- executor.ts: Added `auto: true` to summarize body, removed subsequent prompt_async call
- preemptive-compaction/index.ts: Added `auto: true` to summarize body, removed subsequent promptAsync call
- executor.test.ts: Updated test expectation to include `auto: true`

Instead of sending 'Continue' prompt after compaction, use SessionCompaction's `auto: true` feature which auto-resumes the session.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor(agents): update sisyphus orchestrator

Update Sisyphus agent orchestrator with latest changes.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor(features): update background agent manager

Update background agent manager with latest configuration changes.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor(features): update init-deep template

Update initialization template with latest configuration.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor(hooks): update hook constants and configuration

Update hook constants and configuration across agent-usage-reminder, keyword-detector, and claude-code-hooks.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor(tools): remove background-task tool

Remove background-task tool module completely:
- src/tools/background-task/constants.ts
- src/tools/background-task/index.ts
- src/tools/background-task/tools.ts
- src/tools/background-task/types.ts

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor(tools): update tool exports and main plugin entry

Update tool index exports and main plugin entry point after background-task tool removal.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(auth): update constants to match CLIProxyAPI (50min buffer, 2 endpoints)

- Changed ANTIGRAVITY_TOKEN_REFRESH_BUFFER_MS from 60,000ms (1min) to 3,000,000ms (50min)
- Removed autopush endpoint from ANTIGRAVITY_ENDPOINT_FALLBACKS (now 2 endpoints: daily → prod)
- Added comprehensive test suite with 6 tests covering all updated constants
- Updated comments to reflect CLIProxyAPI parity
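
To illustrate what a refresh buffer does, here is a small sketch. The 50-minute value comes from the bullet above; the `isTokenExpired` helper and its parameter names are hypothetical and only show the usual comparison, not the plugin's actual function.

```typescript
// 50 minutes in milliseconds (3,000,000 ms), the value set in this commit.
const REFRESH_BUFFER_MS = 50 * 60 * 1000;

// Hypothetical helper: a token counts as expired once the current time
// is within `bufferMs` of its expiry timestamp.
function isTokenExpired(
  expiresAtMs: number,
  nowMs: number,
  bufferMs: number = REFRESH_BUFFER_MS,
): boolean {
  return nowMs >= expiresAtMs - bufferMs;
}
```

With a larger buffer, a token is refreshed much earlier in its lifetime, which trades extra refresh calls for a lower chance of using a token that expires mid-request.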

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(auth): remove PKCE to match CLIProxyAPI

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* feat(auth): implement port 51121 with OS fallback

Add port fallback logic to OAuth callback server:
- Try port 51121 (ANTIGRAVITY_CALLBACK_PORT) first
- Fall back to an OS-assigned port on EADDRINUSE
- Add redirectUri property to CallbackServerHandle
- Return actual bound port in handle.port
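
The fallback steps above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: it uses a bare `net` server instead of a real OAuth callback HTTP server, and the `CallbackServerSketch` type, the `startCallbackServer` name, and the `/oauth-callback` path are hypothetical. Only port 51121, the EADDRINUSE fallback, and the `port` / `redirectUri` properties come from the commit.

```typescript
import * as net from "node:net";

const PREFERRED_CALLBACK_PORT = 51121; // preferred port named in the commit

// Hypothetical handle shape, loosely modeled on the bullets above.
interface CallbackServerSketch {
  port: number; // the port that was actually bound
  redirectUri: string; // built from the bound port, not the preferred one
  close: () => Promise<void>;
}

// Promise wrapper around server.listen that rejects on a bind error.
function listen(server: net.Server, port: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const onError = (err: Error) => {
      server.off("listening", onListening);
      reject(err);
    };
    const onListening = () => {
      server.off("error", onError);
      resolve();
    };
    server.once("error", onError);
    server.once("listening", onListening);
    server.listen(port, "127.0.0.1");
  });
}

async function startCallbackServer(
  preferred: number = PREFERRED_CALLBACK_PORT,
): Promise<CallbackServerSketch> {
  let server = net.createServer();
  try {
    await listen(server, preferred);
  } catch (err) {
    // Only fall back when the port is taken; rethrow anything else.
    if ((err as { code?: string }).code !== "EADDRINUSE") throw err;
    server = net.createServer();
    await listen(server, 0); // port 0 asks the OS for any free port
  }
  const active = server;
  const bound = (active.address() as net.AddressInfo).port;
  return {
    port: bound,
    redirectUri: `http://localhost:${bound}/oauth-callback`, // path is a placeholder
    close: () =>
      new Promise<void>((resolve, reject) =>
        active.close((err) => (err ? reject(err) : resolve())),
      ),
  };
}
```

Building `redirectUri` from the bound port rather than the constant matters because the OAuth redirect must point at wherever the server actually ended up listening.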

Add comprehensive port handling tests (5 new tests):
- Should prefer port 51121
- Should return actual bound port
- Should fallback when port occupied
- Should cleanup and release port on close
- Should provide redirect URI with actual port

All 16 tests passing (11 existing + 5 new).

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* test(auth): add token expiry tests for 50-min buffer

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* feat(agents): add Prometheus system prompt and planner methodology

Add prometheus-prompt.ts with comprehensive planner agent system prompt.
Update plan-prompt.ts with streamlined Prometheus workflow including:
- Context gathering via explore/librarian agents
- Metis integration for AI slop guardrails
- Structured plan output format

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(agents): add Metis plan consultant agent

Add Metis agent for pre-planning analysis that identifies:
- Hidden requirements and implicit constraints
- AI failure points and common mistakes
- Clarifying questions before planning begins

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(agents): add Momus plan reviewer agent

Add Momus agent for rigorous plan review against:
- Clarity and verifiability standards
- Completeness checks
- AI slop detection

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(agents): add Sisyphus-Junior focused executor agent

Add Sisyphus-Junior agent for focused task execution:
- Same discipline as Sisyphus, no delegation capability
- Used for category-based task spawning via sisyphus_task tool

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(agents): add orchestrator-sisyphus agent

Add orchestrator-sisyphus agent for complex workflow orchestration:
- Manages multi-agent workflows
- Coordinates between specialized agents
- Handles start-work command execution

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(skill-loader): add skill-content resolver for agent skills

Add resolveMultipleSkills() for resolving skill content to prepend to agent prompts.
Includes test coverage for resolution logic.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(agents): add category and skills support to buildAgent

Extend buildAgent() to support:
- category: inherit model/temperature from DEFAULT_CATEGORIES
- skills: prepend resolved skill content to agent prompt

Includes comprehensive test coverage for new functionality.
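
A rough sketch of the two options, assuming an explicit `model` or `temperature` on the agent overrides the category default. The types, the `buildAgentSketch` name, the injected `resolveSkill` callback, and the placeholder category entry are all hypothetical; the real `DEFAULT_CATEGORIES` contents and `buildAgent()` signature are not shown in this log.

```typescript
interface CategoryDefaultsSketch {
  model: string;
  temperature: number;
}

// Placeholder values only; not the plugin's real category table.
const DEFAULT_CATEGORIES_SKETCH: Record<string, CategoryDefaultsSketch> = {
  "example-category": { model: "example-provider/example-model", temperature: 0.7 },
};

interface AgentSourceSketch {
  prompt: string;
  model?: string;
  temperature?: number;
  category?: string;
  skills?: string[];
}

interface BuiltAgentSketch {
  prompt: string;
  model?: string;
  temperature?: number;
}

function buildAgentSketch(
  source: AgentSourceSketch,
  resolveSkill: (name: string) => string | undefined,
): BuiltAgentSketch {
  // category: inherit model/temperature unless the agent sets its own (assumed).
  const defaults = source.category
    ? DEFAULT_CATEGORIES_SKETCH[source.category]
    : undefined;
  // skills: resolve each name, dropping skills that cannot be found.
  const skillTexts = (source.skills ?? [])
    .map((name) => resolveSkill(name))
    .filter((text): text is string => text !== undefined);
  return {
    model: source.model ?? defaults?.model,
    temperature: source.temperature ?? defaults?.temperature,
    // Resolved skill content is prepended to the agent's own prompt.
    prompt: [...skillTexts, source.prompt].join("\n\n"),
  };
}
```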

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(agents): register new agents in index and types

- Export Metis, Momus, orchestrator-sisyphus in builtinAgents
- Add new agent names to BuiltinAgentName type
- Update AGENTS.md documentation with new agents

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(features): add boulder-state persistence

Add boulder-state feature for persisting workflow state:
- storage.ts: File I/O operations for state persistence
- types.ts: State interfaces
- Includes test coverage

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(skills): add frontend-ui-ux builtin skill

Add frontend-ui-ux skill for designer-turned-developer UI work:
- SKILL.md with comprehensive design principles
- skills.ts updated with skill template

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(tools): add sisyphus_task tool for category-based delegation

Add sisyphus_task tool supporting:
- Category-based task delegation (visual, business-logic, etc.)
- Direct agent targeting
- Background execution with resume capability
- DEFAULT_CATEGORIES configuration

Includes test coverage.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(background-agent): add resume capability and model field

- Add resume() method for continuing existing agent sessions
- Add model field to BackgroundTask and LaunchInput types
- Update launch() to pass model to session.prompt()
- Comprehensive test coverage for resume functionality

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(hooks): add task-resume-info hook

Add hook for injecting task resume information into tool outputs.
Enables seamless continuation of background agent sessions.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(hooks): add prometheus-md-only write restriction hook

Add hook that restricts Prometheus planner to writing only .md files
in the .sisyphus/ directory. Prevents planners from implementing.
Includes test coverage.
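
The restriction described above could be checked roughly like this. The `isAllowedPlannerWrite` helper and its parameters are hypothetical; the sketch only encodes the stated rule (a `.md` file inside `.sisyphus/`) and is not the hook's actual implementation.

```typescript
import * as path from "node:path";

// Hypothetical check: allow a write only for .md files under <root>/.sisyphus/.
function isAllowedPlannerWrite(filePath: string, workspaceRoot: string): boolean {
  const resolved = path.resolve(workspaceRoot, filePath);
  const relative = path.relative(path.join(workspaceRoot, ".sisyphus"), resolved);
  // A relative path that climbs out ("..") or is absolute lies outside .sisyphus/.
  const inside =
    relative !== "" &&
    relative !== ".." &&
    !relative.startsWith(".." + path.sep) &&
    !path.isAbsolute(relative);
  return inside && path.extname(resolved).toLowerCase() === ".md";
}
```

Resolving the path before comparing means inputs such as `.sisyphus/../src/index.md` are judged by where they really point, not by their prefix.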

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(hooks): add start-work hook for Sisyphus workflow

Add hook for detecting /start-work command and triggering
orchestrator-sisyphus agent for plan execution.
Includes test coverage.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(hooks): add sisyphus-orchestrator hook

Add hook for orchestrating Sisyphus agent workflows:
- Coordinates task execution between agents
- Manages workflow state persistence
- Handles agent handoffs

Includes comprehensive test coverage.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(hooks): export new hooks in index

Export new hooks:
- createPrometheusMdOnlyHook
- createTaskResumeInfoHook
- createStartWorkHook
- createSisyphusOrchestratorHook

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(todo-enforcer): add skipAgents option and improve permission check

- Add skipAgents option to skip continuation for specified agents
- Default skip: Prometheus (Planner)
- Improve tool permission check to handle 'allow'/'deny' string values
- Add agent name detection from session messages

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(config): add categories, new agents and hooks to schema

Update Zod schema with:
- CategoryConfigSchema for task delegation categories
- CategoriesConfigSchema for user category overrides
- New agents: Metis (Plan Consultant)
- New hooks: prometheus-md-only, start-work, sisyphus-orchestrator
- New commands: start-work
- Agent category and skills fields

Includes schema test coverage.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(commands): add start-work command

Add /start-work command for executing Prometheus plans:
- start-work.ts: Command template for orchestrator-sisyphus
- commands.ts: Register command with agent binding
- types.ts: Add command name to type union

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor(migration): add backup creation and category migration

- Create timestamped backup before migration writes
- Add migrateAgentConfigToCategory() for model→category migration
- Add shouldDeleteAgentConfig() for cleanup when matching defaults
- Add Prometheus and Metis to agent name map
- Comprehensive test coverage for new functionality

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(config-handler): add Sisyphus-Junior and orchestrator support

- Add Sisyphus-Junior agent creation
- Add orchestrator-sisyphus tool restrictions
- Rename Planner-Sisyphus to Prometheus (Planner)
- Use PROMETHEUS_SYSTEM_PROMPT and PROMETHEUS_PERMISSION

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(cli): add categories config for Antigravity auth

Add category model overrides for Gemini Antigravity authentication:
- visual: gemini-3-pro-high
- artistry: gemini-3-pro-high
- writing: gemini-3-pro-high

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor(sisyphus): update to use sisyphus_task and add resume docs

- Update example code from background_task to sisyphus_task
- Add 'Resume Previous Agent' documentation section
- Remove model name from Oracle section heading
- Disable call_omo_agent tool for Sisyphus

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor: update tool references from background_task to sisyphus_task

Update all references across:
- agent-usage-reminder: Update AGENT_TOOLS and REMINDER_MESSAGE
- claude-code-hooks: Update comment
- call-omo-agent: Update constants and tool restrictions
- init-deep template: Update example code
- tools/index.ts: Export sisyphus_task, remove background_task

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(hook-message-injector): add ToolPermission type support

Add ToolPermission type union: boolean | 'allow' | 'deny' | 'ask'
Update StoredMessage and related interfaces for new permission format.
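
As a sketch of how callers might consume the union, the helper below collapses it to a yes/no answer. The `ToolPermission` union is taken from the line above; the `isToolAllowed` name and the choice to treat `'ask'` and a missing entry as not allowed are assumptions made for illustration.

```typescript
type ToolPermission = boolean | "allow" | "deny" | "ask";

// Hypothetical helper: true only for an explicit grant.
// Assumption: "ask" and undefined are treated as not (yet) allowed.
function isToolAllowed(permission: ToolPermission | undefined): boolean {
  return permission === true || permission === "allow";
}
```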

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(main): wire up new tools, hooks and agents

Wire up in main plugin entry:
- Import and create sisyphus_task tool
- Import and wire taskResumeInfo, startWork, sisyphusOrchestrator hooks
- Update tool restrictions from background_task to sisyphus_task
- Pass userCategories to createSisyphusTask

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* docs: update documentation for Prometheus and new features

Update documentation across all language versions:
- Rename Planner-Sisyphus to Prometheus (Planner)
- Add Metis (Plan Consultant) agent documentation
- Add Categories section with usage examples
- Add sisyphus_task tool documentation
- Update AGENTS.md with new structure and complexity hotspots
- Update src/tools/AGENTS.md with sisyphus_task category

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* build: regenerate schema.json with new types

Update JSON schema with:
- New agents: Prometheus (Planner), Metis (Plan Consultant)
- New hooks: prometheus-md-only, start-work, sisyphus-orchestrator
- New commands: start-work
- New skills: frontend-ui-ux
- CategoryConfigSchema for task delegation
- Agent category and skills fields

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* skill

* feat: add toast notifications for task execution

- Display toast when background task starts in BackgroundManager
- Display toast when sisyphus_task sync task starts
- Wire up prometheus-md-only hook initialization in main plugin

This provides user feedback in OpenCode TUI where task TUI is not visible.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(hooks): add read-only warning injection for Prometheus task delegation

When Prometheus (Planner) spawns subagents via task tools (sisyphus_task, task, call_omo_agent), a system directive is injected into the prompt to ensure subagents understand they are in a planning consultation context and must NOT modify files.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(hooks): add mandatory hands-on verification enforcement for orchestrated tasks

- sisyphus-orchestrator: Add verification reminder with tool matrix (playwright/interactive_bash/curl)

- start-work: Inject detailed verification workflow with deliverable-specific guidance

🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode) assistance

* docs(agents): clarify oracle and metis agent descriptions emphasizing read-only consultation roles

- Oracle: high-IQ reasoning specialist for debugging and architecture (read-only)
- Metis: updated description to align with oracle's consultation-only model
- Updated AGENTS.md with clarified agent responsibilities

* docs(orchestrator): emphasize oracle as read-only consultation agent

- Updated orchestrator-sisyphus agent descriptions
- Updated sisyphus-prompt-builder to highlight oracle's read-only consultation role
- Clarified that oracle provides high-IQ reasoning without write operations

* docs(refactor,root): update oracle consultation model in feature templates and root docs

- Updated refactor command template to emphasize oracle's read-only role
- Updated root AGENTS.md with oracle agent description emphasizing high-IQ debugging and architecture consultation
- Clarified oracle as non-write agent for design and debugging support

* feat(features): add TaskToastManager for consolidated task notifications

- Create task-toast-manager feature with singleton pattern

- Show running task list (newest first) when new task starts

- Track queued tasks status from ConcurrencyManager

- Integrate with BackgroundManager and sisyphus-task tool

🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode) assistance

* feat(hooks): add resume session_id to verification reminders for orchestrator subagent work

When subagent work fails verification, show exact sisyphus_task(resume="...")
command with session_id for immediate retry. Consolidates verification workflow
across boulder and standalone modes.

* refactor(hooks): remove duplicate verification enforcement from start-work hook

Verification reminders are now centralized in sisyphus-orchestrator hook,
eliminating redundant code in start-work. The orchestrator hook handles all
verification messaging across both boulder and standalone modes.

* test(hooks): update prometheus-md-only test assertions and formatting

Updated test structure and assertions to match current output format.
Improved test clarity while maintaining complete coverage of markdown
validation and write restriction behavior.

* orchestrator

* feat(skills): add git-master skill for atomic commits and history management

- Add comprehensive git-master skill for commit, rebase, and history operations
- Implements atomic commit strategy with multi-file splitting rules
- Includes style detection, branch analysis, and history search capabilities
- Provides three modes: COMMIT, REBASE, HISTORY_SEARCH

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* docs(agents): add pre-delegation planning section to Sisyphus prompt

- Add SISYPHUS_PRE_DELEGATION_PLANNING section with mandatory declaration rules
- Implements 3-step decision tree: Identify → Select → Declare
- Forces explicit category/agent/skill declaration before every sisyphus_task call
- Includes mandatory 4-part format: Category/Agent, Reason, Skills, Expected Outcome
- Provides examples (CORRECT vs WRONG) and enforcement rules
- Follows prompt engineering best practices: Clear, CoT, Structured, Examples

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor(tools): rename agent parameter to subagent_type in sisyphus_task

- Update parameter name from 'agent' to 'subagent_type' for consistency with call_omo_agent
- Update all references and error messages
- Remove deprecated 'agent' field from SisyphusTaskArgs interface
- Update git-master skill documentation to reflect parameter name change

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(agents): change orchestrator-sisyphus default model to claude-sonnet-4-5

- Update orchestrator-sisyphus model from opus-4-5 to sonnet-4-5 for better cost efficiency
- Keep Prometheus using opus-4-5 for planning tasks

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* refactor(config): make Prometheus model independent from plan agent config

- Prometheus no longer inherits model from plan agent configuration
- Fallback chain: session default model -> claude-opus-4-5
- Removes coupling between Prometheus and legacy plan agent settings

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* fix(momus): allow system directives in input validation

System directives (XML tags like <system-reminder>) are automatically
injected and should be ignored during input validation. Only reject
when there's actual user text besides the file path.

🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(prometheus): enhance high accuracy mode with mandatory Momus loop

When user requests high accuracy:
- Momus review loop is now mandatory until 'OKAY'
- No excuses allowed - must fix ALL issues
- No maximum retry limit - keep looping until approved
- Added clear explanation of what 'OKAY' means

🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(prometheus): enhance reference section with detailed guidance

References now include:
- Pattern references (existing code to follow)
- API/Type references (contracts to implement)
- Test references (testing patterns)
- Documentation references (specs and requirements)
- External references (libraries and frameworks)
- Explanation of WHY each reference matters

The executor has no interview context, so references are its only guide.

🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* feat(git-master): add configurable commit footer and co-author options

Add git_master config with commit_footer and include_co_authored_by flags.
Users can disable Sisyphus attribution in commits via oh-my-opencode.json.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* feat(hooks): add single-task directive and system-reminder tags to orchestrator

Inject SINGLE_TASK_DIRECTIVE when orchestrator calls sisyphus_task to enforce
atomic task delegation. Wrap verification reminders in <system-reminder> tags
for better LLM attention.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* refactor: use ContextCollector for hook injection and remove unused background tools

Split changes:
- Replace injectHookMessage with ContextCollector.register() pattern for improved hook content injection
- Remove unused background task tools infrastructure (createBackgroundOutput, createBackgroundCancel)

🤖 Generated with assistance of OhMyOpenCode (https://github.com/code-yeongyu/oh-my-opencode)

* chore(context-injector): add debug logging for context injection tracing

Add DEBUG log statements to trace context injection flow:
- Log message transform hook invocations
- Log sessionID extraction from message info
- Log hasPending checks for context collector
- Log hook content registration to contextCollector

🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode) assistance

* fix(context-injector): prepend to user message instead of separate synthetic message

- Change from creating separate synthetic user message to prepending context
  directly to last user message's text part
- Separate synthetic messages were ignored by the model (treated as a previous turn)
- Prepending to clone ensures: UI shows original, model receives prepended content
- Update tests to reflect new behavior
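
The prepend-to-clone idea above can be sketched as below. The message and part types and the `withPrependedContext` name are simplified stand-ins, not the SDK's real types, and picking the first text part of the last user message is an assumption.

```typescript
interface TextPartSketch {
  type: "text";
  text: string;
}
interface FilePartSketch {
  type: "file";
  url: string;
}
type PartSketch = TextPartSketch | FilePartSketch;

interface MessageSketch {
  role: "user" | "assistant";
  parts: PartSketch[];
}

// Returns a new array in which the last user message is replaced by a clone
// whose first text part starts with `context`. Inputs are never mutated.
function withPrependedContext(messages: MessageSketch[], context: string): MessageSketch[] {
  let lastUser = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      lastUser = i;
      break;
    }
  }
  if (lastUser === -1) return messages;

  const target = messages[lastUser];
  const textIndex = target.parts.findIndex((part) => part.type === "text");
  if (textIndex === -1) return messages;

  const parts: PartSketch[] = target.parts.map((part, i) =>
    i === textIndex && part.type === "text"
      ? { ...part, text: `${context}\n\n${part.text}` }
      : part,
  );
  const cloned: MessageSketch = { ...target, parts };
  return messages.map((message, i) => (i === lastUser ? cloned : message));
}
```

Since only the copy sent to the model is changed, whatever renders the original message list keeps showing the user's text unmodified.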

* feat(prometheus): enforce mandatory todo registration on plan generation trigger

* fix(sisyphus-task): add proper error handling for sync mode and implement BackgroundManager.resume()

- Add try-catch for session.prompt() in sync mode with detailed error messages
- Sort assistant messages by time to get the most recent response
- Add 'No assistant response found' error handling
- Implement BackgroundManager.resume() method for task resumption
- Fix ConcurrencyManager type mismatch (model → concurrencyKey)

* docs(sisyphus-task): clarify resume usage with session_id and add when-to-use guidance

- Fix terminology: 'Task ID' → 'Session ID' in resume parameter docs
- Add clear 'WHEN TO USE resume' section with concrete scenarios
- Add example usage pattern in Sisyphus agent prompt
- Emphasize token savings and context preservation benefits

* fix(agents): block task/sisyphus_task/call_omo_agent from explore and librarian

Exploration agents should not spawn other agents - they are leaf nodes
in the agent hierarchy for codebase search only.

* refactor(oracle): change default model from GPT-5.2 to Claude Opus 4.5

* feat(oracle): change default model to claude-opus-4-5

* fix(sisyphus-orchestrator): check boulder session_ids before filtering sessions

Bug: continuation was not triggered even when boulder.json existed with
session_ids because the session filter ran BEFORE reading boulder state.

Fix: Read boulder state first, then include boulder sessions in the
allowed sessions for continuation.

* feat(task-toast): display skills and concurrency info in toast

- Add skills field to TrackedTask and LaunchInput types
- Show skills in task list message as [skill1, skill2]
- Add concurrency slot info [running/limit] in Running header
- Pass skills from sisyphus_task to toast manager (sync & background)
- Add unit tests for new toast features

* refactor(categories): rename high-iq to ultrabrain

* feat(sisyphus-task): add skillContent support to background agent launching

- Add optional skillContent field to LaunchInput type
- Implement buildSystemContent utility to combine skill and category prompts
- Update BackgroundManager to pass skillContent as system parameter
- Add comprehensive tests for skillContent optionality and buildSystemContent logic

🤖 Generated with assistance of oh-my-opencode

* Revert "refactor(tools): remove background-task tool"

This reverts commit 6dbc4c095badd400e024510554a42a0dc018ae42.

* refactor(sisyphus-task): rename background to run_in_background

* fix(oracle): use gpt-5.2 as default model

* test(sisyphus-task): add resume with background parameter tests

* feat(start-work): auto-select single incomplete plan and use system-reminder format

- Auto-select when only one incomplete plan exists among multiple
- Wrap multiple plans message in <system-reminder> tag
- Change prompt to 'ask user' style for agent guidance
- Add 'All Plans Complete' state handling
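
The selection rules above might look like this in outline. The `PlanSketch` fields, the `PlanChoice` union, and the `choosePlan` name are hypothetical; how the real hook detects plan progress is not described in this log.

```typescript
// Hypothetical plan summary: progress is modeled as completed/total task counts.
interface PlanSketch {
  name: string;
  total: number;
  completed: number;
}

type PlanChoice =
  | { kind: "none" }
  | { kind: "all-complete" }
  | { kind: "auto"; plan: PlanSketch }
  | { kind: "ask"; candidates: PlanSketch[] };

function choosePlan(plans: PlanSketch[]): PlanChoice {
  if (plans.length === 0) return { kind: "none" };
  const incomplete = plans.filter((plan) => plan.completed < plan.total);
  if (incomplete.length === 0) return { kind: "all-complete" };
  // Exactly one unfinished plan: pick it without asking, even if other
  // (finished) plans exist alongside it.
  if (incomplete.length === 1) return { kind: "auto", plan: incomplete[0] };
  // Several unfinished plans: the agent is prompted to ask the user.
  return { kind: "ask", candidates: incomplete };
}
```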

* feat(sisyphus-task): make skills parameter required

- Add validation for skills parameter (must be provided, use [] if empty)
- Update schema to remove .optional()
- Update type definition to make skills non-optional
- Fix existing tests to include skills parameter

* fix: prevent session model change when sending notifications

- background-agent: use only parentModel, remove prevMessage fallback
- todo-continuation: don't pass model to preserve session's lastModel
- Remove unused imports (findNearestMessageWithFields, fs, path)

Root cause: session.prompt with model param changes session's lastModel

* fix(sisyphus-orchestrator): register handler in event loop for boulder continuation

* fix(sisyphus_task): use promptAsync for sync mode to preserve main session

- session.prompt() changes the active session, causing UI model switch
- Switch to promptAsync + polling to avoid main session state change
- Matches background-agent pattern for consistency

* fix(sisyphus-orchestrator): only trigger boulder continuation for orchestrator-sisyphus agent

* feat(background-agent): add parentAgent tracking to preserve agent context in background tasks

- Add parentAgent field to BackgroundTask, LaunchInput, and ResumeInput interfaces
- Pass parentAgent through background task manager to preserve agent identity
- Update sisyphus-orchestrator to set orchestrator-sisyphus agent context
- Add session tracking for background agents to prevent context loss
- Propagate agent context in background-task and sisyphus-task tools

This ensures background/subagent spawned tasks maintain proper agent context for notifications and continuity.

🤖 Generated with assistance of oh-my-opencode

* fix(antigravity): sync plugin.ts with PKCE-removed oauth.ts API

Remove decodeState import and update OAuth flow to use simple state
string comparison for CSRF protection instead of PKCE verifier.
Update exchangeCode calls to match new signature (code, redirectUri,
clientId, clientSecret).

* fix(hook-message-injector): preserve agent info with two-pass message lookup

findNearestMessageWithFields now has a fallback pass that returns
messages with ANY useful field (agent OR model) instead of requiring
ALL fields. This prevents parentAgent from being lost when stored
messages don't have complete model info.
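
A simplified sketch of the two-pass idea, assuming the stored messages are already ordered newest first. The `StoredFieldsSketch` type and the `findNearestWithFieldsSketch` name are stand-ins for illustration; the real function reads from message storage and its signature is not shown here.

```typescript
interface StoredFieldsSketch {
  agent?: string;
  model?: { providerID: string; modelID: string };
}

// `messages` is assumed to be ordered newest first.
function findNearestWithFieldsSketch(
  messages: StoredFieldsSketch[],
): StoredFieldsSketch | undefined {
  // Pass 1: prefer the nearest message that has every field.
  const complete = messages.find(
    (message) => message.agent !== undefined && message.model !== undefined,
  );
  if (complete) return complete;
  // Pass 2: otherwise accept the nearest message with any useful field.
  return messages.find(
    (message) => message.agent !== undefined || message.model !== undefined,
  );
}
```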

* fix(sisyphus-task): use SDK session.messages API for parent agent lookup

Background task notifications were showing 'build' agent instead of the
actual parent agent (e.g., 'Sisyphus'). The hook-injected message storage
only contains limited info; the actual agent name is in the SDK session.

Changes:
- Add getParentAgentFromSdk() to query SDK messages API
- Look up agent from SDK first, fall back to hook-injected messages
- Ensures background tasks correctly preserve parent agent context

* fix(sisyphus-task): use ctx.agent directly for parentAgent

The tool context already provides the agent name via ctx.agent.
The previous SDK session.messages lookup was completely wrong -
SDK messages don't store agent info per message.

Removes useless getParentAgentFromSdk function.

* feat(prometheus-md-only): allow .md files anywhere, only block code files

Prometheus (Planner) can now write .md files anywhere, not just .sisyphus/.
Still blocks non-.md files (code) to enforce read-only planning for code.

This allows planners to write commentary and analysis in markdown format.

* Revert "feat(prometheus-md-only): allow .md files anywhere, only block code files"

This reverts commit c600111597591e1862696ee0b92051e587aa1a6b.

* fix(momus): accept bracket-style system directives in input validation

Momus was rejecting inputs with bracket-style directives like [analyze-mode]
and [SYSTEM DIRECTIVE...] because it only recognized XML-style tags.

Now accepts:
- XML tags: <system-reminder>, <context>, etc.
- Bracket blocks: [analyze-mode], [SYSTEM DIRECTIVE...], [SYSTEM REMINDER...], etc.
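
One way to picture the validation is to strip both directive styles and then look at what is left. The regular expressions, the helper names, and the rule that the remainder must be a single whitespace-free path are assumptions for illustration; Momus itself is prompt-driven, and its exact rules are not shown in this log.

```typescript
// Paired XML-style tags such as <system-reminder>...</system-reminder>.
const XML_DIRECTIVE = /<([a-zA-Z][\w-]*)\b[^>]*>[\s\S]*?<\/\1>/g;
// Single-line bracket blocks such as [analyze-mode] or [SYSTEM DIRECTIVE ...].
const BRACKET_DIRECTIVE = /\[[^\]\n]*\]/g;

// Remove injected directives and return whatever the user actually typed.
function extractUserText(input: string): string {
  return input.replace(XML_DIRECTIVE, "").replace(BRACKET_DIRECTIVE, "").trim();
}

// Hypothetical rule: valid when the remaining text is one token (the plan path).
function isValidReviewInput(input: string): boolean {
  const rest = extractUserText(input);
  return rest.length > 0 && !/\s/.test(rest);
}
```

A known limitation of this sketch is that a file path containing square brackets would be partially stripped, so a real implementation would need a stricter bracket pattern.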

* fix(sisyphus-orchestrator): inject delegation warning before Write/Edit outside .sisyphus

- Add ORCHESTRATOR_DELEGATION_REQUIRED strong warning in tool.execute.before
- Fix tool.execute.after filePath detection using pendingFilePaths Map
- before stores filePath by callID, after retrieves and deletes it
- Fixes bug where output.metadata.filePath was undefined
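
The before/after handoff described above can be sketched like this. The `pendingFilePaths` map and the keying by callID come from the bullets; the function names and argument shapes are simplified assumptions rather than the hook's real signatures.

```typescript
// Shared between the two hook phases, keyed by the tool call's ID.
const pendingFilePaths = new Map<string, string>();

// Runs before the tool executes: remember which file this call targets.
function onToolExecuteBefore(callID: string, args: { filePath?: string }): void {
  if (args.filePath) pendingFilePaths.set(callID, args.filePath);
}

// Runs after the tool executes: retrieve the stored path and drop the entry
// so the map does not grow with finished calls.
function onToolExecuteAfter(callID: string): string | undefined {
  const filePath = pendingFilePaths.get(callID);
  pendingFilePaths.delete(callID);
  return filePath;
}
```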

* docs: add orchestration, category-skill, and CLI guides

* fix(cli): correct category names in Antigravity migration (visual → visual-engineering)

* fix(sisyphus-task): prevent infinite polling when session removed from status

* fix(tests): update outdated test expectations

- constants.test.ts: Update endpoint count (2→3) and token buffer (50min→60sec)
- token.test.ts: Update expiry tests to use 60-second buffer
- sisyphus-orchestrator: Add fallback to output.metadata.filePath when callID missing

---------

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
YeonGyu-Kim
2026-01-09 02:24:43 +09:00
committed by GitHub
parent 8394926fe1
commit 768ecd928b
92 changed files with 13771 additions and 672 deletions

@@ -2,26 +2,33 @@
## OVERVIEW
Custom tools: 11 LSP tools, AST-aware search/replace, file ops with timeouts, background task management, session navigation.
Custom tools extending agent capabilities: LSP integration (11 tools), AST-aware code search/replace, file operations with timeouts, background task management.
## STRUCTURE
```
tools/
├── ast-grep/ # AST-aware code search/replace (25 languages)
│ ├── napi.ts # @ast-grep/napi binding (preferred)
│ ├── cli.ts # @ast-grep/cli fallback
│ ├── cli.ts # @ast-grep/cli subprocess
│ ├── napi.ts # @ast-grep/napi native binding (preferred)
│ ├── constants.ts, types.ts, tools.ts, utils.ts
├── background-task/ # Async agent task management
├── call-omo-agent/ # Spawn explore/librarian agents
├── glob/ # File pattern matching (60s timeout)
├── grep/ # Content search (60s timeout)
├── glob/ # File pattern matching (timeout-safe)
├── grep/ # Content search (timeout-safe)
├── interactive-bash/ # Tmux session management
├── look-at/ # Multimodal analysis (PDF, images)
├── lsp/ # 11 LSP tools (611 lines client.ts)
├── lsp/ # 11 LSP tools
│ ├── client.ts # LSP connection lifecycle
│ ├── config.ts # Server configurations
│ ├── tools.ts # Tool implementations
│ └── types.ts
├── session-manager/ # OpenCode session file management
│ ├── constants.ts # Storage paths, descriptions
│ ├── types.ts # Session data interfaces
│ ├── storage.ts # File I/O operations
│ ├── utils.ts # Formatting, filtering
│ └── tools.ts # Tool implementations
├── session-manager/ # OpenCode session file ops
├── skill/ # Skill loading and execution
├── skill-mcp/ # Skill-embedded MCP invocation
├── slashcommand/ # Slash command execution
@@ -30,39 +37,47 @@ tools/
## TOOL CATEGORIES
| Category | Tools |
|----------|-------|
| LSP | lsp_hover, lsp_goto_definition, lsp_find_references, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_servers, lsp_prepare_rename, lsp_rename, lsp_code_actions, lsp_code_action_resolve |
| AST | ast_grep_search, ast_grep_replace |
| File Search | grep, glob |
| Session | session_list, session_read, session_search, session_info |
| Background | background_task, background_output, background_cancel |
| Multimodal | look_at |
| Terminal | interactive_bash |
| Skills | skill, skill_mcp |
| Agents | call_omo_agent |
| Category | Tools | Purpose |
|----------|-------|---------|
| LSP | lsp_hover, lsp_goto_definition, lsp_find_references, lsp_document_symbols, lsp_workspace_symbols, lsp_diagnostics, lsp_servers, lsp_prepare_rename, lsp_rename, lsp_code_actions, lsp_code_action_resolve | IDE-like code intelligence |
| AST | ast_grep_search, ast_grep_replace | Pattern-based code search/replace |
| File Search | grep, glob | Content and file pattern matching |
| Session | session_list, session_read, session_search, session_info | OpenCode session file management |
| Background | sisyphus_task, background_output, background_cancel | Async agent orchestration |
| Multimodal | look_at | PDF/image analysis via Gemini |
| Terminal | interactive_bash | Tmux session control |
| Commands | slashcommand | Execute slash commands |
| Skills | skill, skill_mcp | Load skills, invoke skill-embedded MCPs |
| Agents | call_omo_agent | Spawn explore/librarian |
## HOW TO ADD
## HOW TO ADD A TOOL
1. Create `src/tools/my-tool/`
2. Files: `constants.ts`, `types.ts`, `tools.ts`, `index.ts`
1. Create directory: `src/tools/my-tool/`
2. Create files:
- `constants.ts`: `TOOL_NAME`, `TOOL_DESCRIPTION`
- `types.ts`: Parameter/result interfaces
- `tools.ts`: Tool implementation (returns OpenCode tool object)
- `index.ts`: Barrel export
- `utils.ts`: Helpers (optional)
3. Add to `builtinTools` in `src/tools/index.ts`
## LSP SPECIFICS
- Lazy init on first use, auto-shutdown on idle
- Config priority: opencode.json > oh-my-opencode.json > defaults
- Servers: typescript-language-server, pylsp, gopls, rust-analyzer
- **Client lifecycle**: Lazy init on first use, auto-shutdown on idle
- **Config priority**: opencode.json > oh-my-opencode.json > defaults
- **Supported servers**: typescript-language-server, pylsp, gopls, rust-analyzer, etc.
- **Custom servers**: Add via `lsp` config in oh-my-opencode.json
## AST-GREP SPECIFICS
- Meta-variables: `$VAR` (single), `$$$` (multiple)
- Pattern must be valid AST node, not fragment
- Prefers napi binding for performance
- **Meta-variables**: `$VAR` (single node), `$$$` (multiple nodes)
- **Languages**: 25 supported (typescript, tsx, python, rust, go, etc.)
- **Binding**: Prefers @ast-grep/napi (native), falls back to @ast-grep/cli
- **Pattern must be valid AST**: `export async function $NAME($$$) { $$$ }` not fragments
## ANTI-PATTERNS
## ANTI-PATTERNS (TOOLS)
- No timeout on file ops (always use, default 60s)
- Sync file operations (use async/await)
- Ignoring LSP errors (graceful handling required)
- Raw subprocess for ast-grep (prefer napi)
- **No timeout**: Always use timeout for file operations (default 60s)
- **Blocking main thread**: Use async/await, never sync file ops
- **Ignoring LSP errors**: Gracefully handle server not found/crashed
- **Raw subprocess for ast-grep**: Prefer napi binding for performance

View File

@@ -1,5 +1,4 @@
export {
createBackgroundTask,
createBackgroundOutput,
createBackgroundCancel,
} from "./tools"

View File

@@ -74,6 +74,7 @@ export function createBackgroundTask(manager: BackgroundManager): ToolDefinition
parentSessionID: ctx.sessionID,
parentMessageID: ctx.messageID,
parentModel,
parentAgent: prevMessage?.agent,
})
ctx.metadata?.({

View File

@@ -4,4 +4,4 @@ export const CALL_OMO_AGENT_DESCRIPTION = `Spawn explore/librarian agent. run_in
Available: {agents}
Prompts MUST be in English. Use \`background_output\` for async results.`
Pass \`resume=session_id\` to continue previous agent with full context. Prompts MUST be in English. Use \`background_output\` for async results.`

View File

@@ -142,7 +142,7 @@ async function executeSync(
tools: {
task: false,
call_omo_agent: false,
background_task: false,
sisyphus_task: false,
},
parts: [{ type: "text", text: args.prompt }],
},

View File

@@ -36,7 +36,6 @@ export { getTmuxPath } from "./interactive-bash/utils"
export { createSkillMcpTool } from "./skill-mcp"
import {
createBackgroundTask,
createBackgroundOutput,
createBackgroundCancel,
} from "./background-task"
@@ -48,10 +47,10 @@ type OpencodeClient = PluginInput["client"]
export { createCallOmoAgent } from "./call-omo-agent"
export { createLookAt } from "./look-at"
export { createSisyphusTask, type SisyphusTaskToolOptions, DEFAULT_CATEGORIES, CATEGORY_PROMPT_APPENDS } from "./sisyphus-task"
export function createBackgroundTools(manager: BackgroundManager, client: OpencodeClient): Record<string, ToolDefinition> {
return {
background_task: createBackgroundTask(manager),
background_output: createBackgroundOutput(manager, client),
background_cancel: createBackgroundCancel(manager, client),
}

View File

@@ -0,0 +1,254 @@
import type { CategoryConfig } from "../../config/schema"
export const VISUAL_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on VISUAL/UI tasks.
Design-first mindset:
- Bold aesthetic choices over safe defaults
- Unexpected layouts, asymmetry, grid-breaking elements
- Distinctive typography (avoid: Arial, Inter, Roboto, Space Grotesk)
- Cohesive color palettes with sharp accents
- High-impact animations with staggered reveals
- Atmosphere: gradient meshes, noise textures, layered transparencies
AVOID: Generic fonts, purple gradients on white, predictable layouts, cookie-cutter patterns.
</Category_Context>`
export const STRATEGIC_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on BUSINESS LOGIC / ARCHITECTURE tasks.
Strategic advisor mindset:
- Bias toward simplicity: least complex solution that fulfills requirements
- Leverage existing code/patterns over new components
- Prioritize developer experience and maintainability
- One clear recommendation with effort estimate (Quick/Short/Medium/Large)
- Signal when advanced approach warranted
Response format:
- Bottom line (2-3 sentences)
- Action plan (numbered steps)
- Risks and mitigations (if relevant)
</Category_Context>`
export const ARTISTRY_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on HIGHLY CREATIVE / ARTISTIC tasks.
Artistic genius mindset:
- Push far beyond conventional boundaries
- Explore radical, unconventional directions
- Surprise and delight: unexpected twists, novel combinations
- Rich detail and vivid expression
- Break patterns deliberately when it serves the creative vision
Approach:
- Generate diverse, bold options first
- Embrace ambiguity and wild experimentation
- Balance novelty with coherence
- This is for tasks requiring exceptional creativity
</Category_Context>`
export const QUICK_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on SMALL / QUICK tasks.
Efficient execution mindset:
- Fast, focused, minimal overhead
- Get to the point immediately
- No over-engineering
- Simple solutions for simple problems
Approach:
- Minimal viable implementation
- Skip unnecessary abstractions
- Direct and concise
</Category_Context>
<Caller_Warning>
⚠️ THIS CATEGORY USES A LESS CAPABLE MODEL (claude-haiku-4-5).
The model executing this task has LIMITED reasoning capacity. Your prompt MUST be:
**EXHAUSTIVELY EXPLICIT** - Leave NOTHING to interpretation:
1. MUST DO: List every required action as atomic, numbered steps
2. MUST NOT DO: Explicitly forbid likely mistakes and deviations
3. EXPECTED OUTPUT: Describe exact success criteria with concrete examples
**WHY THIS MATTERS:**
- Less capable models WILL deviate without explicit guardrails
- Vague instructions → unpredictable results
- Implicit expectations → missed requirements
**PROMPT STRUCTURE (MANDATORY):**
\`\`\`
TASK: [One-sentence goal]
MUST DO:
1. [Specific action with exact details]
2. [Another specific action]
...
MUST NOT DO:
- [Forbidden action + why]
- [Another forbidden action]
...
EXPECTED OUTPUT:
- [Exact deliverable description]
- [Success criteria / verification method]
\`\`\`
If your prompt lacks this structure, REWRITE IT before delegating.
</Caller_Warning>`
export const MOST_CAPABLE_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on COMPLEX / MOST-CAPABLE tasks.
Maximum capability mindset:
- Bring full reasoning power to bear
- Consider all edge cases and implications
- Deep analysis before action
- Quality over speed
Approach:
- Thorough understanding first
- Comprehensive solution design
- Meticulous execution
- This is for the most challenging problems
</Category_Context>`
export const WRITING_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on WRITING / PROSE tasks.
Wordsmith mindset:
- Clear, flowing prose
- Appropriate tone and voice
- Engaging and readable
- Proper structure and organization
Approach:
- Understand the audience
- Draft with care
- Polish for clarity and impact
- Documentation, READMEs, articles, technical writing
</Category_Context>`
export const GENERAL_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on GENERAL tasks.
Balanced execution mindset:
- Practical, straightforward approach
- Good enough is good enough
- Focus on getting things done
Approach:
- Standard best practices
- Reasonable trade-offs
- Efficient completion
</Category_Context>
<Caller_Warning>
⚠️ THIS CATEGORY USES A MID-TIER MODEL (claude-sonnet-4-5).
While capable, this model benefits significantly from EXPLICIT instructions.
**PROVIDE CLEAR STRUCTURE:**
1. MUST DO: Enumerate required actions explicitly - don't assume inference
2. MUST NOT DO: State forbidden actions to prevent scope creep or wrong approaches
3. EXPECTED OUTPUT: Define concrete success criteria and deliverables
**COMMON PITFALLS WITHOUT EXPLICIT INSTRUCTIONS:**
- Model may take shortcuts that miss edge cases
- Implicit requirements get overlooked
- Output format may not match expectations
- Scope may expand beyond intended boundaries
**RECOMMENDED PROMPT PATTERN:**
\`\`\`
TASK: [Clear, single-purpose goal]
CONTEXT: [Relevant background the model needs]
MUST DO:
- [Explicit requirement 1]
- [Explicit requirement 2]
MUST NOT DO:
- [Boundary/constraint 1]
- [Boundary/constraint 2]
EXPECTED OUTPUT:
- [What success looks like]
- [How to verify completion]
\`\`\`
The more explicit your prompt, the better the results.
</Caller_Warning>`
export const DEFAULT_CATEGORIES: Record<string, CategoryConfig> = {
"visual-engineering": {
model: "google/gemini-3-pro-preview",
temperature: 0.7,
},
ultrabrain: {
model: "openai/gpt-5.2",
temperature: 0.1,
},
artistry: {
model: "google/gemini-3-pro-preview",
temperature: 0.9,
},
quick: {
model: "anthropic/claude-haiku-4-5",
temperature: 0.3,
},
"most-capable": {
model: "anthropic/claude-opus-4-5",
temperature: 0.1,
},
writing: {
model: "google/gemini-3-flash-preview",
temperature: 0.5,
},
general: {
model: "anthropic/claude-sonnet-4-5",
temperature: 0.3,
},
}
export const CATEGORY_PROMPT_APPENDS: Record<string, string> = {
"visual-engineering": VISUAL_CATEGORY_PROMPT_APPEND,
ultrabrain: STRATEGIC_CATEGORY_PROMPT_APPEND,
artistry: ARTISTRY_CATEGORY_PROMPT_APPEND,
quick: QUICK_CATEGORY_PROMPT_APPEND,
"most-capable": MOST_CAPABLE_CATEGORY_PROMPT_APPEND,
writing: WRITING_CATEGORY_PROMPT_APPEND,
general: GENERAL_CATEGORY_PROMPT_APPEND,
}
export const CATEGORY_DESCRIPTIONS: Record<string, string> = {
"visual-engineering": "Frontend, UI/UX, design, styling, animation",
ultrabrain: "Strict architecture design, very complex business logic",
artistry: "Highly creative/artistic tasks, novel ideas",
quick: "Cheap & fast - small tasks with minimal overhead, budget-friendly",
"most-capable": "Complex tasks requiring maximum capability",
writing: "Documentation, prose, technical writing",
general: "General purpose tasks",
}
const BUILTIN_CATEGORIES = Object.keys(DEFAULT_CATEGORIES).join(", ")
export const SISYPHUS_TASK_DESCRIPTION = `Spawn agent task with category-based or direct agent selection.
MUTUALLY EXCLUSIVE: Provide EITHER category OR agent, not both (unless resuming).
- category: Use predefined category (${BUILTIN_CATEGORIES}) → Spawns Sisyphus-Junior with category config
- agent: Use specific agent directly (e.g., "oracle", "explore")
- background: true=async (returns task_id), false=sync (waits for result). Default: false. Use background=true ONLY for parallel exploration with 5+ independent queries.
- resume: Session ID to resume (from previous task output). Continues agent with FULL CONTEXT PRESERVED - saves tokens, maintains continuity.
- skills: Array of skill names to prepend to prompt (e.g., ["playwright", "frontend-ui-ux"]). Skills will be resolved and their content prepended with a separator. Empty array = no prepending.
**WHEN TO USE resume:**
- Task failed/incomplete → resume with "fix: [specific issue]"
- Need follow-up on previous result → resume with additional question
- Multi-turn conversation with same agent → always resume instead of new task
Prompts MUST be in English.`

View File

@@ -0,0 +1,3 @@
export { createSisyphusTask, type SisyphusTaskToolOptions } from "./tools"
export type * from "./types"
export * from "./constants"

View File

@@ -0,0 +1,430 @@
import { describe, test, expect } from "bun:test"
import { DEFAULT_CATEGORIES, CATEGORY_PROMPT_APPENDS, CATEGORY_DESCRIPTIONS, SISYPHUS_TASK_DESCRIPTION } from "./constants"
import type { CategoryConfig } from "../../config/schema"
function resolveCategoryConfig(
categoryName: string,
userCategories?: Record<string, CategoryConfig>
): { config: CategoryConfig; promptAppend: string } | null {
const defaultConfig = DEFAULT_CATEGORIES[categoryName]
const userConfig = userCategories?.[categoryName]
const defaultPromptAppend = CATEGORY_PROMPT_APPENDS[categoryName] ?? ""
if (!defaultConfig && !userConfig) {
return null
}
const config: CategoryConfig = {
...defaultConfig,
...userConfig,
model: userConfig?.model ?? defaultConfig?.model ?? "anthropic/claude-sonnet-4-5",
}
let promptAppend = defaultPromptAppend
if (userConfig?.prompt_append) {
promptAppend = defaultPromptAppend
? defaultPromptAppend + "\n\n" + userConfig.prompt_append
: userConfig.prompt_append
}
return { config, promptAppend }
}
describe("sisyphus-task", () => {
describe("DEFAULT_CATEGORIES", () => {
test("visual-engineering category has gemini model", () => {
// #given
const category = DEFAULT_CATEGORIES["visual-engineering"]
// #when / #then
expect(category).toBeDefined()
expect(category.model).toBe("google/gemini-3-pro-preview")
expect(category.temperature).toBe(0.7)
})
test("ultrabrain category has gpt model", () => {
// #given
const category = DEFAULT_CATEGORIES["ultrabrain"]
// #when / #then
expect(category).toBeDefined()
expect(category.model).toBe("openai/gpt-5.2")
expect(category.temperature).toBe(0.1)
})
})
describe("CATEGORY_PROMPT_APPENDS", () => {
test("visual-engineering category has design-focused prompt", () => {
// #given
const promptAppend = CATEGORY_PROMPT_APPENDS["visual-engineering"]
// #when / #then
expect(promptAppend).toContain("VISUAL/UI")
expect(promptAppend).toContain("Design-first")
})
test("ultrabrain category has strategic prompt", () => {
// #given
const promptAppend = CATEGORY_PROMPT_APPENDS["ultrabrain"]
// #when / #then
expect(promptAppend).toContain("BUSINESS LOGIC")
expect(promptAppend).toContain("Strategic advisor")
})
})
describe("CATEGORY_DESCRIPTIONS", () => {
test("has description for all default categories", () => {
// #given
const defaultCategoryNames = Object.keys(DEFAULT_CATEGORIES)
// #when / #then
for (const name of defaultCategoryNames) {
expect(CATEGORY_DESCRIPTIONS[name]).toBeDefined()
expect(CATEGORY_DESCRIPTIONS[name].length).toBeGreaterThan(0)
}
})
test("most-capable category exists and has description", () => {
// #given / #when
const description = CATEGORY_DESCRIPTIONS["most-capable"]
// #then
expect(description).toBeDefined()
expect(description).toContain("Complex")
})
})
describe("SISYPHUS_TASK_DESCRIPTION", () => {
test("documents background parameter as required with default false", () => {
// #given / #when / #then
expect(SISYPHUS_TASK_DESCRIPTION).toContain("background")
expect(SISYPHUS_TASK_DESCRIPTION).toContain("Default: false")
})
test("warns about parallel exploration usage", () => {
// #given / #when / #then
expect(SISYPHUS_TASK_DESCRIPTION).toContain("5+")
})
})
describe("resolveCategoryConfig", () => {
test("returns null for unknown category without user config", () => {
// #given
const categoryName = "unknown-category"
// #when
const result = resolveCategoryConfig(categoryName)
// #then
expect(result).toBeNull()
})
test("returns default config for builtin category", () => {
// #given
const categoryName = "visual-engineering"
// #when
const result = resolveCategoryConfig(categoryName)
// #then
expect(result).not.toBeNull()
expect(result!.config.model).toBe("google/gemini-3-pro-preview")
expect(result!.promptAppend).toContain("VISUAL/UI")
})
test("user config overrides default model", () => {
// #given
const categoryName = "visual-engineering"
const userCategories = {
"visual-engineering": { model: "anthropic/claude-opus-4-5" },
}
// #when
const result = resolveCategoryConfig(categoryName, userCategories)
// #then
expect(result).not.toBeNull()
expect(result!.config.model).toBe("anthropic/claude-opus-4-5")
})
test("user prompt_append is appended to default", () => {
// #given
const categoryName = "visual-engineering"
const userCategories = {
"visual-engineering": {
model: "google/gemini-3-pro-preview",
prompt_append: "Custom instructions here",
},
}
// #when
const result = resolveCategoryConfig(categoryName, userCategories)
// #then
expect(result).not.toBeNull()
expect(result!.promptAppend).toContain("VISUAL/UI")
expect(result!.promptAppend).toContain("Custom instructions here")
})
test("user can define custom category", () => {
// #given
const categoryName = "my-custom"
const userCategories = {
"my-custom": {
model: "openai/gpt-5.2",
temperature: 0.5,
prompt_append: "You are a custom agent",
},
}
// #when
const result = resolveCategoryConfig(categoryName, userCategories)
// #then
expect(result).not.toBeNull()
expect(result!.config.model).toBe("openai/gpt-5.2")
expect(result!.config.temperature).toBe(0.5)
expect(result!.promptAppend).toBe("You are a custom agent")
})
test("user category overrides temperature", () => {
// #given
const categoryName = "visual-engineering"
const userCategories = {
"visual-engineering": {
model: "google/gemini-3-pro-preview",
temperature: 0.3,
},
}
// #when
const result = resolveCategoryConfig(categoryName, userCategories)
// #then
expect(result).not.toBeNull()
expect(result!.config.temperature).toBe(0.3)
})
})
describe("skills parameter", () => {
test("SISYPHUS_TASK_DESCRIPTION documents skills parameter", () => {
// #given / #when / #then
expect(SISYPHUS_TASK_DESCRIPTION).toContain("skills")
expect(SISYPHUS_TASK_DESCRIPTION).toContain("Array of skill names")
})
test("skills parameter is required - returns error when not provided", async () => {
// #given
const { createSisyphusTask } = require("./tools")
const mockManager = { launch: async () => ({}) }
const mockClient = {
app: { agents: async () => ({ data: [] }) },
session: {
create: async () => ({ data: { id: "test-session" } }),
prompt: async () => ({ data: {} }),
messages: async () => ({ data: [] }),
},
}
const tool = createSisyphusTask({
manager: mockManager,
client: mockClient,
})
const toolContext = {
sessionID: "parent-session",
messageID: "parent-message",
agent: "Sisyphus",
abort: new AbortController().signal,
}
// #when - skills not provided (undefined)
const result = await tool.execute(
{
description: "Test task",
prompt: "Do something",
category: "ultrabrain",
run_in_background: false,
},
toolContext
)
// #then - should return error about missing skills
expect(result).toContain("skills")
expect(result).toContain("REQUIRED")
})
})
describe("resume with background parameter", () => {
test("resume with background=false should wait for result and return content", async () => {
// #given
const { createSisyphusTask } = require("./tools")
const mockTask = {
id: "task-123",
sessionID: "ses_resume_test",
description: "Resumed task",
agent: "explore",
status: "running",
}
const mockManager = {
resume: async () => mockTask,
launch: async () => mockTask,
}
const mockClient = {
session: {
prompt: async () => ({ data: {} }),
messages: async () => ({
data: [
{
info: { role: "assistant", time: { created: Date.now() } },
parts: [{ type: "text", text: "This is the resumed task result" }],
},
],
}),
},
app: {
agents: async () => ({ data: [] }),
},
}
const tool = createSisyphusTask({
manager: mockManager,
client: mockClient,
})
const toolContext = {
sessionID: "parent-session",
messageID: "parent-message",
agent: "Sisyphus",
abort: new AbortController().signal,
}
// #when
const result = await tool.execute(
{
description: "Resume test",
prompt: "Continue the task",
resume: "ses_resume_test",
run_in_background: false,
skills: [],
},
toolContext
)
// #then - should contain actual result, not just "Background task resumed"
expect(result).toContain("This is the resumed task result")
expect(result).not.toContain("Background task resumed")
})
test("resume with background=true should return immediately without waiting", async () => {
// #given
const { createSisyphusTask } = require("./tools")
const mockTask = {
id: "task-456",
sessionID: "ses_bg_resume",
description: "Background resumed task",
agent: "explore",
status: "running",
}
const mockManager = {
resume: async () => mockTask,
}
const mockClient = {
session: {
prompt: async () => ({ data: {} }),
messages: async () => ({
data: [],
}),
},
}
const tool = createSisyphusTask({
manager: mockManager,
client: mockClient,
})
const toolContext = {
sessionID: "parent-session",
messageID: "parent-message",
agent: "Sisyphus",
abort: new AbortController().signal,
}
// #when
const result = await tool.execute(
{
description: "Resume bg test",
prompt: "Continue in background",
resume: "ses_bg_resume",
run_in_background: true,
skills: [],
},
toolContext
)
// #then - should return background message
expect(result).toContain("Background task resumed")
expect(result).toContain("task-456")
})
})
describe("buildSystemContent", () => {
test("returns undefined when no skills and no category promptAppend", () => {
// #given
const { buildSystemContent } = require("./tools")
// #when
const result = buildSystemContent({ skillContent: undefined, categoryPromptAppend: undefined })
// #then
expect(result).toBeUndefined()
})
test("returns skill content only when skills provided without category", () => {
// #given
const { buildSystemContent } = require("./tools")
const skillContent = "You are a playwright expert"
// #when
const result = buildSystemContent({ skillContent, categoryPromptAppend: undefined })
// #then
expect(result).toBe(skillContent)
})
test("returns category promptAppend only when no skills", () => {
// #given
const { buildSystemContent } = require("./tools")
const categoryPromptAppend = "Focus on visual design"
// #when
const result = buildSystemContent({ skillContent: undefined, categoryPromptAppend })
// #then
expect(result).toBe(categoryPromptAppend)
})
test("combines skill content and category promptAppend with separator", () => {
// #given
const { buildSystemContent } = require("./tools")
const skillContent = "You are a playwright expert"
const categoryPromptAppend = "Focus on visual design"
// #when
const result = buildSystemContent({ skillContent, categoryPromptAppend })
// #then
expect(result).toContain(skillContent)
expect(result).toContain(categoryPromptAppend)
expect(result).toContain("\n\n")
})
})
})

View File

@@ -0,0 +1,493 @@
import { tool, type PluginInput, type ToolDefinition } from "@opencode-ai/plugin"
import { existsSync, readdirSync } from "node:fs"
import { join } from "node:path"
import type { BackgroundManager } from "../../features/background-agent"
import type { SisyphusTaskArgs } from "./types"
import type { CategoryConfig, CategoriesConfig } from "../../config/schema"
import { SISYPHUS_TASK_DESCRIPTION, DEFAULT_CATEGORIES, CATEGORY_PROMPT_APPENDS } from "./constants"
import { findNearestMessageWithFields, MESSAGE_STORAGE } from "../../features/hook-message-injector"
import { resolveMultipleSkills } from "../../features/opencode-skill-loader/skill-content"
import { createBuiltinSkills } from "../../features/builtin-skills/skills"
import { getTaskToastManager } from "../../features/task-toast-manager"
import { subagentSessions } from "../../features/claude-code-session-state"
type OpencodeClient = PluginInput["client"]
const SISYPHUS_JUNIOR_AGENT = "Sisyphus-Junior"
const CATEGORY_EXAMPLES = Object.keys(DEFAULT_CATEGORIES).map(k => `'${k}'`).join(", ")
// Splits "provider/model" at the first "/"; any further "/" stays in modelID. Returns undefined when there is no "/".
function parseModelString(model: string): { providerID: string; modelID: string } | undefined {
const parts = model.split("/")
if (parts.length >= 2) {
return { providerID: parts[0], modelID: parts.slice(1).join("/") }
}
return undefined
}
// Locates a session's message directory: first directly under MESSAGE_STORAGE, then one level down in each of its entries.
function getMessageDir(sessionID: string): string | null {
if (!existsSync(MESSAGE_STORAGE)) return null
const directPath = join(MESSAGE_STORAGE, sessionID)
if (existsSync(directPath)) return directPath
for (const dir of readdirSync(MESSAGE_STORAGE)) {
const sessionPath = join(MESSAGE_STORAGE, dir, sessionID)
if (existsSync(sessionPath)) return sessionPath
}
return null
}
// Formats the elapsed time from start to end (default: now) as "Xh Ym Zs", "Ym Zs", or "Zs".
function formatDuration(start: Date, end?: Date): string {
const duration = (end ?? new Date()).getTime() - start.getTime()
const seconds = Math.floor(duration / 1000)
const minutes = Math.floor(seconds / 60)
const hours = Math.floor(minutes / 60)
if (hours > 0) return `${hours}h ${minutes % 60}m ${seconds % 60}s`
if (minutes > 0) return `${minutes}m ${seconds % 60}s`
return `${seconds}s`
}
type ToolContextWithMetadata = {
sessionID: string
messageID: string
agent: string
abort: AbortSignal
metadata?: (input: { title?: string; metadata?: Record<string, unknown> }) => void
}
// Merges built-in and user category config (user values win) and joins prompt appends, default first.
// Returns null when the category is neither built in nor user-defined.
function resolveCategoryConfig(
categoryName: string,
userCategories?: CategoriesConfig
): { config: CategoryConfig; promptAppend: string } | null {
const defaultConfig = DEFAULT_CATEGORIES[categoryName]
const userConfig = userCategories?.[categoryName]
const defaultPromptAppend = CATEGORY_PROMPT_APPENDS[categoryName] ?? ""
if (!defaultConfig && !userConfig) {
return null
}
const config: CategoryConfig = {
...defaultConfig,
...userConfig,
model: userConfig?.model ?? defaultConfig?.model ?? "anthropic/claude-sonnet-4-5",
}
let promptAppend = defaultPromptAppend
if (userConfig?.prompt_append) {
promptAppend = defaultPromptAppend
? defaultPromptAppend + "\n\n" + userConfig.prompt_append
: userConfig.prompt_append
}
return { config, promptAppend }
}
export interface SisyphusTaskToolOptions {
manager: BackgroundManager
client: OpencodeClient
userCategories?: CategoriesConfig
}
export interface BuildSystemContentInput {
skillContent?: string
categoryPromptAppend?: string
}
// Joins skill content and category prompt append with a blank line; returns whichever one is present, or undefined when both are empty.
export function buildSystemContent(input: BuildSystemContentInput): string | undefined {
const { skillContent, categoryPromptAppend } = input
if (!skillContent && !categoryPromptAppend) {
return undefined
}
if (skillContent && categoryPromptAppend) {
return `${skillContent}\n\n${categoryPromptAppend}`
}
return skillContent || categoryPromptAppend
}
export function createSisyphusTask(options: SisyphusTaskToolOptions): ToolDefinition {
const { manager, client, userCategories } = options
return tool({
description: SISYPHUS_TASK_DESCRIPTION,
args: {
description: tool.schema.string().describe("Short task description"),
prompt: tool.schema.string().describe("Full detailed prompt for the agent"),
category: tool.schema.string().optional().describe(`Category name (e.g., ${CATEGORY_EXAMPLES}). Mutually exclusive with subagent_type.`),
subagent_type: tool.schema.string().optional().describe("Agent name directly (e.g., 'oracle', 'explore'). Mutually exclusive with category."),
run_in_background: tool.schema.boolean().describe("Run in background. MUST be explicitly set. Use false for task delegation, true only for parallel exploration."),
resume: tool.schema.string().optional().describe("Session ID to resume - continues previous agent session with full context"),
skills: tool.schema.array(tool.schema.string()).describe("Array of skill names to prepend to the prompt. Use [] if no skills needed."),
},
async execute(args: SisyphusTaskArgs, toolContext) {
const ctx = toolContext as ToolContextWithMetadata
if (args.run_in_background === undefined) {
return `❌ Invalid arguments: 'run_in_background' parameter is REQUIRED. Use run_in_background=false for task delegation, run_in_background=true only for parallel exploration.`
}
if (args.skills === undefined) {
return `❌ Invalid arguments: 'skills' parameter is REQUIRED. Use skills=[] if no skills needed.`
}
const runInBackground = args.run_in_background === true
let skillContent: string | undefined
if (args.skills.length > 0) {
const { resolved, notFound } = resolveMultipleSkills(args.skills)
if (notFound.length > 0) {
const available = createBuiltinSkills().map(s => s.name).join(", ")
return `❌ Skills not found: ${notFound.join(", ")}. Available: ${available}`
}
skillContent = Array.from(resolved.values()).join("\n\n")
}
const messageDir = getMessageDir(ctx.sessionID)
const prevMessage = messageDir ? findNearestMessageWithFields(messageDir) : null
const parentAgent = ctx.agent ?? prevMessage?.agent
const parentModel = prevMessage?.model?.providerID && prevMessage?.model?.modelID
? { providerID: prevMessage.model.providerID, modelID: prevMessage.model.modelID }
: undefined
if (args.resume) {
if (runInBackground) {
try {
const task = await manager.resume({
sessionId: args.resume,
prompt: args.prompt,
parentSessionID: ctx.sessionID,
parentMessageID: ctx.messageID,
parentModel,
parentAgent,
})
ctx.metadata?.({
title: `Resume: ${task.description}`,
metadata: { sessionId: task.sessionID },
})
return `Background task resumed.
Task ID: ${task.id}
Session ID: ${task.sessionID}
Description: ${task.description}
Agent: ${task.agent}
Status: ${task.status}
Agent continues with full previous context preserved.
Use \`background_output\` with task_id="${task.id}" to check progress.`
} catch (error) {
const message = error instanceof Error ? error.message : String(error)
return `❌ Failed to resume task: ${message}`
}
}
const toastManager = getTaskToastManager()
const taskId = `resume_sync_${args.resume.slice(0, 8)}`
const startTime = new Date()
if (toastManager) {
toastManager.addTask({
id: taskId,
description: args.description,
agent: "resume",
isBackground: false,
})
}
ctx.metadata?.({
title: `Resume: ${args.description}`,
metadata: { sessionId: args.resume, sync: true },
})
try {
await client.session.prompt({
path: { id: args.resume },
body: {
tools: {
task: false,
sisyphus_task: false,
},
parts: [{ type: "text", text: args.prompt }],
},
})
} catch (promptError) {
if (toastManager) {
toastManager.removeTask(taskId)
}
const errorMessage = promptError instanceof Error ? promptError.message : String(promptError)
return `❌ Failed to send resume prompt: ${errorMessage}\n\nSession ID: ${args.resume}`
}
const messagesResult = await client.session.messages({
path: { id: args.resume },
})
if (messagesResult.error) {
if (toastManager) {
toastManager.removeTask(taskId)
}
return `❌ Error fetching result: ${messagesResult.error}\n\nSession ID: ${args.resume}`
}
const messages = ((messagesResult as { data?: unknown }).data ?? messagesResult) as Array<{
info?: { role?: string; time?: { created?: number } }
parts?: Array<{ type?: string; text?: string }>
}>
const assistantMessages = messages
.filter((m) => m.info?.role === "assistant")
.sort((a, b) => (b.info?.time?.created ?? 0) - (a.info?.time?.created ?? 0))
const lastMessage = assistantMessages[0]
if (toastManager) {
toastManager.removeTask(taskId)
}
if (!lastMessage) {
return `❌ No assistant response found.\n\nSession ID: ${args.resume}`
}
const textParts = lastMessage?.parts?.filter((p) => p.type === "text") ?? []
const textContent = textParts.map((p) => p.text ?? "").filter(Boolean).join("\n")
const duration = formatDuration(startTime)
return `Task resumed and completed in ${duration}.
Session ID: ${args.resume}
---
${textContent || "(No text output)"}`
}
if (args.category && args.subagent_type) {
return `❌ Invalid arguments: Provide EITHER category OR subagent_type, not both.`
}
if (!args.category && !args.subagent_type) {
return `❌ Invalid arguments: Must provide either category or subagent_type.`
}
let agentToUse: string
let categoryModel: { providerID: string; modelID: string } | undefined
let categoryPromptAppend: string | undefined
if (args.category) {
const resolved = resolveCategoryConfig(args.category, userCategories)
if (!resolved) {
return `❌ Unknown category: "${args.category}". Available: ${Object.keys({ ...DEFAULT_CATEGORIES, ...userCategories }).join(", ")}`
}
agentToUse = SISYPHUS_JUNIOR_AGENT
categoryModel = parseModelString(resolved.config.model)
categoryPromptAppend = resolved.promptAppend || undefined
} else {
agentToUse = args.subagent_type!.trim()
if (!agentToUse) {
return `❌ Agent name cannot be empty.`
}
// Validate agent exists and is callable (not a primary agent)
try {
const agentsResult = await client.app.agents()
type AgentInfo = { name: string; mode?: "subagent" | "primary" | "all" }
const agents = (agentsResult as { data?: AgentInfo[] }).data ?? agentsResult as unknown as AgentInfo[]
const callableAgents = agents.filter((a) => a.mode !== "primary")
const callableNames = callableAgents.map((a) => a.name)
if (!callableNames.includes(agentToUse)) {
const isPrimaryAgent = agents.some((a) => a.name === agentToUse && a.mode === "primary")
if (isPrimaryAgent) {
return `❌ Cannot call primary agent "${agentToUse}" via sisyphus_task. Primary agents are top-level orchestrators.`
}
const availableAgents = callableNames
.sort()
.join(", ")
return `❌ Unknown agent: "${agentToUse}". Available agents: ${availableAgents}`
}
} catch {
// If we can't fetch agents, proceed anyway - the session.prompt will fail with a clearer error
}
}
const systemContent = buildSystemContent({ skillContent, categoryPromptAppend })
if (runInBackground) {
try {
const task = await manager.launch({
description: args.description,
prompt: args.prompt,
agent: agentToUse,
parentSessionID: ctx.sessionID,
parentMessageID: ctx.messageID,
parentModel,
parentAgent,
model: categoryModel,
skills: args.skills,
skillContent: systemContent,
})
ctx.metadata?.({
title: args.description,
metadata: { sessionId: task.sessionID, category: args.category },
})
return `Background task launched.
Task ID: ${task.id}
Session ID: ${task.sessionID}
Description: ${task.description}
Agent: ${task.agent}${args.category ? ` (category: ${args.category})` : ""}
Status: ${task.status}
System notifies on completion. Use \`background_output\` with task_id="${task.id}" to check.`
} catch (error) {
const message = error instanceof Error ? error.message : String(error)
return `❌ Failed to launch task: ${message}`
}
}
const toastManager = getTaskToastManager()
let taskId: string | undefined
let syncSessionID: string | undefined
try {
const createResult = await client.session.create({
body: {
parentID: ctx.sessionID,
title: `Task: ${args.description}`,
},
})
if (createResult.error) {
return `❌ Failed to create session: ${createResult.error}`
}
const sessionID = createResult.data.id
syncSessionID = sessionID
subagentSessions.add(sessionID)
taskId = `sync_${sessionID.slice(0, 8)}`
const startTime = new Date()
if (toastManager) {
toastManager.addTask({
id: taskId,
description: args.description,
agent: agentToUse,
isBackground: false,
skills: args.skills,
})
}
ctx.metadata?.({
title: args.description,
metadata: { sessionId: sessionID, category: args.category, sync: true },
})
// Use promptAsync to avoid changing main session's active state
let promptError: Error | undefined
await client.session.promptAsync({
path: { id: sessionID },
body: {
agent: agentToUse,
model: categoryModel,
system: systemContent,
tools: {
task: false,
sisyphus_task: false,
},
parts: [{ type: "text", text: args.prompt }],
},
}).catch((error) => {
promptError = error instanceof Error ? error : new Error(String(error))
})
if (promptError) {
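if (toastManager && taskId !== undefined) {
toastManager.removeTask(taskId)
}
// Also untrack the session, otherwise this early return leaves it in subagentSessions
subagentSessions.delete(sessionID)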
const errorMessage = promptError.message
if (errorMessage.includes("agent.name") || errorMessage.includes("undefined")) {
return `❌ Agent "${agentToUse}" not found. Make sure the agent is registered in your opencode.json or provided by a plugin.\n\nSession ID: ${sessionID}`
}
return `❌ Failed to send prompt: ${errorMessage}\n\nSession ID: ${sessionID}`
}
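// Poll for session completion every 500 ms, for at most 10 minutes.
// On timeout the loop simply ends and whatever the session has produced so far is returned.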
const POLL_INTERVAL_MS = 500
const MAX_POLL_TIME_MS = 10 * 60 * 1000
const pollStart = Date.now()
while (Date.now() - pollStart < MAX_POLL_TIME_MS) {
await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS))
const statusResult = await client.session.status()
const allStatuses = (statusResult.data ?? {}) as Record<string, { type: string }>
const sessionStatus = allStatuses[sessionID]
// Break if session is idle OR no longer in status (completed and removed)
if (!sessionStatus || sessionStatus.type === "idle") {
break
}
}
const messagesResult = await client.session.messages({
path: { id: sessionID },
})
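if (messagesResult.error) {
// Clean up before the early return so the toast and session tracking do not leak
if (toastManager && taskId !== undefined) {
toastManager.removeTask(taskId)
}
subagentSessions.delete(sessionID)
return `❌ Error fetching result: ${messagesResult.error}\n\nSession ID: ${sessionID}`
}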
const messages = ((messagesResult as { data?: unknown }).data ?? messagesResult) as Array<{
info?: { role?: string; time?: { created?: number } }
parts?: Array<{ type?: string; text?: string }>
}>
const assistantMessages = messages
.filter((m) => m.info?.role === "assistant")
.sort((a, b) => (b.info?.time?.created ?? 0) - (a.info?.time?.created ?? 0))
const lastMessage = assistantMessages[0]
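if (!lastMessage) {
// Clean up before the early return so the toast and session tracking do not leak
if (toastManager && taskId !== undefined) {
toastManager.removeTask(taskId)
}
subagentSessions.delete(sessionID)
return `❌ No assistant response found.\n\nSession ID: ${sessionID}`
}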
const textParts = lastMessage?.parts?.filter((p) => p.type === "text") ?? []
const textContent = textParts.map((p) => p.text ?? "").filter(Boolean).join("\n")
const duration = formatDuration(startTime)
if (toastManager) {
toastManager.removeTask(taskId)
}
subagentSessions.delete(sessionID)
return `Task completed in ${duration}.
Agent: ${agentToUse}${args.category ? ` (category: ${args.category})` : ""}
Session ID: ${sessionID}
---
${textContent || "(No text output)"}`
} catch (error) {
if (toastManager && taskId !== undefined) {
toastManager.removeTask(taskId)
}
if (syncSessionID) {
subagentSessions.delete(syncSessionID)
}
const message = error instanceof Error ? error.message : String(error)
return `❌ Task failed: ${message}`
}
},
})
}
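
The sync and resume paths above both pick the newest assistant message and join its text parts. A minimal sketch of how that shared step could be factored out, assuming a hypothetical helper name `extractLastAssistantText` and a `SessionMessage` type that mirrors the inline cast used above (neither is part of this PR):

```typescript
// Hypothetical type: mirrors the inline cast applied to the session.messages() result.
type SessionMessage = {
  info?: { role?: string; time?: { created?: number } }
  parts?: Array<{ type?: string; text?: string }>
}

// Hypothetical helper: returns the joined text of the newest assistant message,
// or undefined when the session contains no assistant message at all.
function extractLastAssistantText(messages: SessionMessage[]): string | undefined {
  const newest = messages
    .filter((m) => m.info?.role === "assistant")
    // Newest first; messages without a timestamp sort as if created at 0.
    .sort((a, b) => (b.info?.time?.created ?? 0) - (a.info?.time?.created ?? 0))[0]
  if (!newest) return undefined
  return (newest.parts ?? [])
    .filter((p) => p.type === "text")
    .map((p) => p.text ?? "")
    .filter(Boolean)
    .join("\n")
}
```

Returning `undefined` for "no assistant message" and an empty string for "assistant message without text" keeps the two cases distinguishable, which matches the separate `No assistant response found` and `(No text output)` branches in the tool.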

@@ -0,0 +1,9 @@
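export interface SisyphusTaskArgs {
  description: string
  prompt: string
  // Provide exactly one of category or subagent_type (neither is checked when resuming).
  category?: string
  subagent_type?: string
  run_in_background: boolean
  // Session ID of an earlier task to continue.
  resume?: string
  // Required; pass [] when no skills are needed.
  skills: string[]
}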
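
The tool enforces a few rules on these arguments at runtime: `skills` must be supplied, and a non-resume call needs exactly one of `category` or `subagent_type`. A minimal sketch of those checks as a standalone function follows. The function name `validateSisyphusTaskArgs`, the `Array.isArray` test for the required `skills` field, and the `"quick"` category in the usage line are assumptions for illustration; the error strings are taken from the tool itself.

```typescript
// Redeclared here so the sketch is self-contained.
interface SisyphusTaskArgs {
  description: string
  prompt: string
  category?: string
  subagent_type?: string
  run_in_background: boolean
  resume?: string
  skills: string[]
}

// Hypothetical helper: returns an error message, or null when the arguments look valid.
function validateSisyphusTaskArgs(args: Partial<SisyphusTaskArgs>): string | null {
  // Assumption: "required" means an array must be present, even if it is empty.
  if (!Array.isArray(args.skills)) {
    return "Invalid arguments: 'skills' parameter is REQUIRED. Use skills=[] if no skills needed."
  }
  // A resume call continues an existing session, so no agent selection is checked.
  if (args.resume) return null
  if (args.category && args.subagent_type) {
    return "Invalid arguments: Provide EITHER category OR subagent_type, not both."
  }
  if (!args.category && !args.subagent_type) {
    return "Invalid arguments: Must provide either category or subagent_type."
  }
  return null
}

// Usage: "quick" is a made-up category name.
console.log(validateSisyphusTaskArgs({ skills: [], category: "quick" }))
```

Taking `Partial<SisyphusTaskArgs>` reflects that tool arguments arrive from a model and may omit fields the type marks as required.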