THE ORCHESTRATOR (#600)

* feat(background-agent): add ConcurrencyManager for model-based limits

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

* fix(background-agent): set default concurrency to 5

* feat(background-agent): support 0 as unlimited concurrency

Setting concurrency to 0 means unlimited (Infinity). Works for defaultConcurrency, providerConcurrency, and modelConcurrency.

* feat(hooks): use auto flag for session resumption after compaction

- executor.ts: Added `auto: true` to summarize body, removed subsequent prompt_async call
- preemptive-compaction/index.ts: Added `auto: true` to summarize body, removed subsequent promptAsync call
- executor.test.ts: Updated test expectation to include `auto: true`

Instead of sending 'Continue' prompt after compaction, use SessionCompaction's `auto: true` feature which auto-resumes the session.

* refactor(agents): update sisyphus orchestrator

Update Sisyphus agent orchestrator with latest changes.

* refactor(features): update background agent manager

Update background agent manager with latest configuration changes.

* refactor(features): update init-deep template

Update initialization template with latest configuration.

* refactor(hooks): update hook constants and configuration

Update hook constants and configuration across agent-usage-reminder, keyword-detector, and claude-code-hooks.

* refactor(tools): remove background-task tool

Remove background-task tool module completely:
- src/tools/background-task/constants.ts
- src/tools/background-task/index.ts
- src/tools/background-task/tools.ts
- src/tools/background-task/types.ts

* refactor(tools): update tool exports and main plugin entry

Update tool index exports and main plugin entry point after background-task tool removal.

* feat(auth): update constants to match CLIProxyAPI (50min buffer, 2 endpoints)

- Changed ANTIGRAVITY_TOKEN_REFRESH_BUFFER_MS from 60,000ms (1min) to 3,000,000ms (50min)
- Removed autopush endpoint from ANTIGRAVITY_ENDPOINT_FALLBACKS (now 2 endpoints: daily → prod)
- Added comprehensive test suite with 6 tests covering all updated constants
- Updated comments to reflect CLIProxyAPI parity

* feat(auth): remove PKCE to match CLIProxyAPI

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* feat(auth): implement port 51121 with OS fallback

Add port fallback logic to OAuth callback server:
- Try port 51121 (ANTIGRAVITY_CALLBACK_PORT) first
- Fallback to OS-assigned port on EADDRINUSE
- Add redirectUri property to CallbackServerHandle
- Return actual bound port in handle.port

Add comprehensive port handling tests (5 new tests):
- Should prefer port 51121
- Should return actual bound port
- Should fallback when port occupied
- Should cleanup and release port on close
- Should provide redirect URI with actual port

All 16 tests passing (11 existing + 5 new).

* test(auth): add token expiry tests for 50-min buffer

* feat(agents): add Prometheus system prompt and planner methodology

Add prometheus-prompt.ts with comprehensive planner agent system prompt. Update plan-prompt.ts with streamlined Prometheus workflow including:
- Context gathering via explore/librarian agents
- Metis integration for AI slop guardrails
- Structured plan output format

* feat(agents): add Metis plan consultant agent

Add Metis agent for pre-planning analysis that identifies:
- Hidden requirements and implicit constraints
- AI failure points and common mistakes
- Clarifying questions before planning begins

* feat(agents): add Momus plan reviewer agent

Add Momus agent for rigorous plan review against:
- Clarity and verifiability standards
- Completeness checks
- AI slop detection

* feat(agents): add Sisyphus-Junior focused executor agent

Add Sisyphus-Junior agent for focused task execution:
- Same discipline as Sisyphus, no delegation capability
- Used for category-based task spawning via sisyphus_task tool

* feat(agents): add orchestrator-sisyphus agent

Add orchestrator-sisyphus agent for complex workflow orchestration:
- Manages multi-agent workflows
- Coordinates between specialized agents
- Handles start-work command execution

* feat(skill-loader): add skill-content resolver for agent skills

Add resolveMultipleSkills() for resolving skill content to prepend to agent prompts. Includes test coverage for resolution logic.

* feat(agents): add category and skills support to buildAgent

Extend buildAgent() to support:
- category: inherit model/temperature from DEFAULT_CATEGORIES
- skills: prepend resolved skill content to agent prompt

Includes comprehensive test coverage for new functionality.

* feat(agents): register new agents in index and types

- Export Metis, Momus, orchestrator-sisyphus in builtinAgents
- Add new agent names to BuiltinAgentName type
- Update AGENTS.md documentation with new agents

* feat(features): add boulder-state persistence

Add boulder-state feature for persisting workflow state:
- storage.ts: File I/O operations for state persistence
- types.ts: State interfaces
- Includes test coverage

* feat(skills): add frontend-ui-ux builtin skill

Add frontend-ui-ux skill for designer-turned-developer UI work:
- SKILL.md with comprehensive design principles
- skills.ts updated with skill template

* feat(tools): add sisyphus_task tool for category-based delegation

Add sisyphus_task tool supporting:
- Category-based task delegation (visual, business-logic, etc.)
- Direct agent targeting
- Background execution with resume capability
- DEFAULT_CATEGORIES configuration

Includes test coverage.

* feat(background-agent): add resume capability and model field

- Add resume() method for continuing existing agent sessions
- Add model field to BackgroundTask and LaunchInput types
- Update launch() to pass model to session.prompt()
- Comprehensive test coverage for resume functionality

* feat(hooks): add task-resume-info hook

Add hook for injecting task resume information into tool outputs. Enables seamless continuation of background agent sessions.

* feat(hooks): add prometheus-md-only write restriction hook

Add hook that restricts Prometheus planner to writing only .md files in the .sisyphus/ directory. Prevents planners from implementing. Includes test coverage.

* feat(hooks): add start-work hook for Sisyphus workflow

Add hook for detecting /start-work command and triggering orchestrator-sisyphus agent for plan execution. Includes test coverage.

* feat(hooks): add sisyphus-orchestrator hook

Add hook for orchestrating Sisyphus agent workflows:
- Coordinates task execution between agents
- Manages workflow state persistence
- Handles agent handoffs

Includes comprehensive test coverage.

* feat(hooks): export new hooks in index

Export new hooks:
- createPrometheusMdOnlyHook
- createTaskResumeInfoHook
- createStartWorkHook
- createSisyphusOrchestratorHook

* feat(todo-enforcer): add skipAgents option and improve permission check

- Add skipAgents option to skip continuation for specified agents
- Default skip: Prometheus (Planner)
- Improve tool permission check to handle 'allow'/'deny' string values
- Add agent name detection from session messages

* feat(config): add categories, new agents and hooks to schema

Update Zod schema with:
- CategoryConfigSchema for task delegation categories
- CategoriesConfigSchema for user category overrides
- New agents: Metis (Plan Consultant)
- New hooks: prometheus-md-only, start-work, sisyphus-orchestrator
- New commands: start-work
- Agent category and skills fields

Includes schema test coverage.

* feat(commands): add start-work command

Add /start-work command for executing Prometheus plans:
- start-work.ts: Command template for orchestrator-sisyphus
- commands.ts: Register command with agent binding
- types.ts: Add command name to type union

* refactor(migration): add backup creation and category migration

- Create timestamped backup before migration writes
- Add migrateAgentConfigToCategory() for model→category migration
- Add shouldDeleteAgentConfig() for cleanup when matching defaults
- Add Prometheus and Metis to agent name map
- Comprehensive test coverage for new functionality

* feat(config-handler): add Sisyphus-Junior and orchestrator support

- Add Sisyphus-Junior agent creation
- Add orchestrator-sisyphus tool restrictions
- Rename Planner-Sisyphus to Prometheus (Planner)
- Use PROMETHEUS_SYSTEM_PROMPT and PROMETHEUS_PERMISSION

* feat(cli): add categories config for Antigravity auth

Add category model overrides for Gemini Antigravity authentication:
- visual: gemini-3-pro-high
- artistry: gemini-3-pro-high
- writing: gemini-3-pro-high

* refactor(sisyphus): update to use sisyphus_task and add resume docs

- Update example code from background_task to sisyphus_task
- Add 'Resume Previous Agent' documentation section
- Remove model name from Oracle section heading
- Disable call_omo_agent tool for Sisyphus

* refactor: update tool references from background_task to sisyphus_task

Update all references across:
- agent-usage-reminder: Update AGENT_TOOLS and REMINDER_MESSAGE
- claude-code-hooks: Update comment
- call-omo-agent: Update constants and tool restrictions
- init-deep template: Update example code
- tools/index.ts: Export sisyphus_task, remove background_task

* feat(hook-message-injector): add ToolPermission type support

Add ToolPermission type union: boolean | 'allow' | 'deny' | 'ask'
Update StoredMessage and related interfaces for new permission format.

* feat(main): wire up new tools, hooks and agents

Wire up in main plugin entry:
- Import and create sisyphus_task tool
- Import and wire taskResumeInfo, startWork, sisyphusOrchestrator hooks
- Update tool restrictions from background_task to sisyphus_task
- Pass userCategories to createSisyphusTask

* docs: update documentation for Prometheus and new features

Update documentation across all language versions:
- Rename Planner-Sisyphus to Prometheus (Planner)
- Add Metis (Plan Consultant) agent documentation
- Add Categories section with usage examples
- Add sisyphus_task tool documentation
- Update AGENTS.md with new structure and complexity hotspots
- Update src/tools/AGENTS.md with sisyphus_task category

* build: regenerate schema.json with new types

Update JSON schema with:
- New agents: Prometheus (Planner), Metis (Plan Consultant)
- New hooks: prometheus-md-only, start-work, sisyphus-orchestrator
- New commands: start-work
- New skills: frontend-ui-ux
- CategoryConfigSchema for task delegation
- Agent category and skills fields

* skill

* feat: add toast notifications for task execution

- Display toast when background task starts in BackgroundManager
- Display toast when sisyphus_task sync task starts
- Wire up prometheus-md-only hook initialization in main plugin

This provides user feedback in OpenCode TUI where task TUI is not visible.

* feat(hooks): add read-only warning injection for Prometheus task delegation

When Prometheus (Planner) spawns subagents via task tools (sisyphus_task, task, call_omo_agent), a system directive is injected into the prompt to ensure subagents understand they are in a planning consultation context and must NOT modify files.

* feat(hooks): add mandatory hands-on verification enforcement for orchestrated tasks

- sisyphus-orchestrator: Add verification reminder with tool matrix (playwright/interactive_bash/curl)
- start-work: Inject detailed verification workflow with deliverable-specific guidance

* docs(agents): clarify oracle and metis agent descriptions emphasizing read-only consultation roles

- Oracle: high-IQ reasoning specialist for debugging and architecture (read-only)
- Metis: updated description to align with oracle's consultation-only model
- Updated AGENTS.md with clarified agent responsibilities

* docs(orchestrator): emphasize oracle as read-only consultation agent

- Updated orchestrator-sisyphus agent descriptions
- Updated sisyphus-prompt-builder to highlight oracle's read-only consultation role
- Clarified that oracle provides high-IQ reasoning without write operations

* docs(refactor,root): update oracle consultation model in feature templates and root docs

- Updated refactor command template to emphasize oracle's read-only role
- Updated root AGENTS.md with oracle agent description emphasizing high-IQ debugging and architecture consultation
- Clarified oracle as non-write agent for design and debugging support

* feat(features): add TaskToastManager for consolidated task notifications

- Create task-toast-manager feature with singleton pattern
- Show running task list (newest first) when new task starts
- Track queued tasks status from ConcurrencyManager
- Integrate with BackgroundManager and sisyphus-task tool

* feat(hooks): add resume session_id to verification reminders for orchestrator subagent work

When subagent work fails verification, show exact sisyphus_task(resume="...") command with session_id for immediate retry. Consolidates verification workflow across boulder and standalone modes.

* refactor(hooks): remove duplicate verification enforcement from start-work hook

Verification reminders are now centralized in sisyphus-orchestrator hook, eliminating redundant code in start-work. The orchestrator hook handles all verification messaging across both boulder and standalone modes.

* test(hooks): update prometheus-md-only test assertions and formatting

Updated test structure and assertions to match current output format. Improved test clarity while maintaining complete coverage of markdown validation and write restriction behavior.

* orchestrator

* feat(skills): add git-master skill for atomic commits and history management

- Add comprehensive git-master skill for commit, rebase, and history operations
- Implements atomic commit strategy with multi-file splitting rules
- Includes style detection, branch analysis, and history search capabilities
- Provides three modes: COMMIT, REBASE, HISTORY_SEARCH

* docs(agents): add pre-delegation planning section to Sisyphus prompt

- Add SISYPHUS_PRE_DELEGATION_PLANNING section with mandatory declaration rules
- Implements 3-step decision tree: Identify → Select → Declare
- Forces explicit category/agent/skill declaration before every sisyphus_task call
- Includes mandatory 4-part format: Category/Agent, Reason, Skills, Expected Outcome
- Provides examples (CORRECT vs WRONG) and enforcement rules
- Follows prompt engineering best practices: Clear, CoT, Structured, Examples

* refactor(tools): rename agent parameter to subagent_type in sisyphus_task

- Update parameter name from 'agent' to 'subagent_type' for consistency with call_omo_agent
- Update all references and error messages
- Remove deprecated 'agent' field from SisyphusTaskArgs interface
- Update git-master skill documentation to reflect parameter name change

* feat(agents): change orchestrator-sisyphus default model to claude-sonnet-4-5

- Update orchestrator-sisyphus model from opus-4-5 to sonnet-4-5 for better cost efficiency
- Keep Prometheus using opus-4-5 for planning tasks

* refactor(config): make Prometheus model independent from plan agent config

- Prometheus no longer inherits model from plan agent configuration
- Fallback chain: session default model -> claude-opus-4-5
- Removes coupling between Prometheus and legacy plan agent settings

* fix(momus): allow system directives in input validation

System directives (XML tags like <system-reminder>) are automatically injected and should be ignored during input validation. Only reject when there's actual user text besides the file path.

* feat(prometheus): enhance high accuracy mode with mandatory Momus loop

When user requests high accuracy:
- Momus review loop is now mandatory until 'OKAY'
- No excuses allowed - must fix ALL issues
- No maximum retry limit - keep looping until approved
- Added clear explanation of what 'OKAY' means

* feat(prometheus): enhance reference section with detailed guidance

References now include:
- Pattern references (existing code to follow)
- API/Type references (contracts to implement)
- Test references (testing patterns)
- Documentation references (specs and requirements)
- External references (libraries and frameworks)
- Explanation of WHY each reference matters

The executor has no interview context - references are their only guide.

* feat(git-master): add configurable commit footer and co-author options

Add git_master config with commit_footer and include_co_authored_by flags. Users can disable Sisyphus attribution in commits via oh-my-opencode.json.

* feat(hooks): add single-task directive and system-reminder tags to orchestrator

Inject SINGLE_TASK_DIRECTIVE when orchestrator calls sisyphus_task to enforce atomic task delegation. Wrap verification reminders in <system-reminder> tags for better LLM attention.

* refactor: use ContextCollector for hook injection and remove unused background tools

Split changes:
- Replace injectHookMessage with ContextCollector.register() pattern for improved hook content injection
- Remove unused background task tools infrastructure (createBackgroundOutput, createBackgroundCancel)

* chore(context-injector): add debug logging for context injection tracing

Add DEBUG log statements to trace context injection flow:
- Log message transform hook invocations
- Log sessionID extraction from message info
- Log hasPending checks for context collector
- Log hook content registration to contextCollector

* fix(context-injector): prepend to user message instead of separate synthetic message

- Change from creating separate synthetic user message to prepending context directly to last user message's text part
- Separate synthetic messages were ignored by model (treated as previous turn)
- Prepending to clone ensures: UI shows original, model receives prepended content
- Update tests to reflect new behavior

* feat(prometheus): enforce mandatory todo registration on plan generation trigger

* fix(sisyphus-task): add proper error handling for sync mode and implement BackgroundManager.resume()

- Add try-catch for session.prompt() in sync mode with detailed error messages
- Sort assistant messages by time to get the most recent response
- Add 'No assistant response found' error handling
- Implement BackgroundManager.resume() method for task resumption
- Fix ConcurrencyManager type mismatch (model → concurrencyKey)

* docs(sisyphus-task): clarify resume usage with session_id and add when-to-use guidance

- Fix terminology: 'Task ID' → 'Session ID' in resume parameter docs
- Add clear 'WHEN TO USE resume' section with concrete scenarios
- Add example usage pattern in Sisyphus agent prompt
- Emphasize token savings and context preservation benefits

* fix(agents): block task/sisyphus_task/call_omo_agent from explore and librarian

Exploration agents should not spawn other agents - they are leaf nodes in the agent hierarchy for codebase search only.

* refactor(oracle): change default model from GPT-5.2 to Claude Opus 4.5

* feat(oracle): change default model to claude-opus-4-5

* fix(sisyphus-orchestrator): check boulder session_ids before filtering sessions

Bug: continuation was not triggered even when boulder.json existed with session_ids because the session filter ran BEFORE reading boulder state.

Fix: Read boulder state first, then include boulder sessions in the allowed sessions for continuation.

* feat(task-toast): display skills and concurrency info in toast

- Add skills field to TrackedTask and LaunchInput types
- Show skills in task list message as [skill1, skill2]
- Add concurrency slot info [running/limit] in Running header
- Pass skills from sisyphus_task to toast manager (sync & background)
- Add unit tests for new toast features

* refactor(categories): rename high-iq to ultrabrain

* feat(sisyphus-task): add skillContent support to background agent launching

- Add optional skillContent field to LaunchInput type
- Implement buildSystemContent utility to combine skill and category prompts
- Update BackgroundManager to pass skillContent as system parameter
- Add comprehensive tests for skillContent optionality and buildSystemContent logic

* Revert "refactor(tools): remove background-task tool"

This reverts commit 6dbc4c095badd400e024510554a42a0dc018ae42.

* refactor(sisyphus-task): rename background to run_in_background

* fix(oracle): use gpt-5.2 as default model

* test(sisyphus-task): add resume with background parameter tests

* feat(start-work): auto-select single incomplete plan and use system-reminder format

- Auto-select when only one incomplete plan exists among multiple
- Wrap multiple plans message in <system-reminder> tag
- Change prompt to 'ask user' style for agent guidance
- Add 'All Plans Complete' state handling

* feat(sisyphus-task): make skills parameter required

- Add validation for skills parameter (must be provided, use [] if empty)
- Update schema to remove .optional()
- Update type definition to make skills non-optional
- Fix existing tests to include skills parameter

* fix: prevent session model change when sending notifications

- background-agent: use only parentModel, remove prevMessage fallback
- todo-continuation: don't pass model to preserve session's lastModel
- Remove unused imports (findNearestMessageWithFields, fs, path)

Root cause: session.prompt with model param changes session's lastModel

* fix(sisyphus-orchestrator): register handler in event loop for boulder continuation

* fix(sisyphus_task): use promptAsync for sync mode to preserve main session

- session.prompt() changes the active session, causing UI model switch
- Switch to promptAsync + polling to avoid main session state change
- Matches background-agent pattern for consistency

* fix(sisyphus-orchestrator): only trigger boulder continuation for orchestrator-sisyphus agent

* feat(background-agent): add parentAgent tracking to preserve agent context in background tasks

- Add parentAgent field to BackgroundTask, LaunchInput, and ResumeInput interfaces
- Pass parentAgent through background task manager to preserve agent identity
- Update sisyphus-orchestrator to set orchestrator-sisyphus agent context
- Add session tracking for background agents to prevent context loss
- Propagate agent context in background-task and sisyphus-task tools

This ensures background/subagent spawned tasks maintain proper agent context for notifications and continuity.

* fix(antigravity): sync plugin.ts with PKCE-removed oauth.ts API

Remove decodeState import and update OAuth flow to use simple state string comparison for CSRF protection instead of PKCE verifier. Update exchangeCode calls to match new signature (code, redirectUri, clientId, clientSecret).

* fix(hook-message-injector): preserve agent info with two-pass message lookup

findNearestMessageWithFields now has a fallback pass that returns messages with ANY useful field (agent OR model) instead of requiring ALL fields. This prevents parentAgent from being lost when stored messages don't have complete model info.

* fix(sisyphus-task): use SDK session.messages API for parent agent lookup

Background task notifications were showing 'build' agent instead of the actual parent agent (e.g., 'Sisyphus'). The hook-injected message storage only contains limited info; the actual agent name is in the SDK session.

Changes:
- Add getParentAgentFromSdk() to query SDK messages API
- Look up agent from SDK first, fallback to hook-injected messages
- Ensures background tasks correctly preserve parent agent context

* fix(sisyphus-task): use ctx.agent directly for parentAgent

The tool context already provides the agent name via ctx.agent. The previous SDK session.messages lookup was completely wrong - SDK messages don't store agent info per message. Removes useless getParentAgentFromSdk function.

* feat(prometheus-md-only): allow .md files anywhere, only block code files

Prometheus (Planner) can now write .md files anywhere, not just .sisyphus/. Still blocks non-.md files (code) to enforce read-only planning for code. This allows planners to write commentary and analysis in markdown format.

* Revert "feat(prometheus-md-only): allow .md files anywhere, only block code files"

This reverts commit c600111597591e1862696ee0b92051e587aa1a6b.

* fix(momus): accept bracket-style system directives in input validation

Momus was rejecting inputs with bracket-style directives like [analyze-mode] and [SYSTEM DIRECTIVE...] because it only recognized XML-style tags.

Now accepts:
- XML tags: <system-reminder>, <context>, etc.
- Bracket blocks: [analyze-mode], [SYSTEM DIRECTIVE...], [SYSTEM REMINDER...], etc.

* fix(sisyphus-orchestrator): inject delegation warning before Write/Edit outside .sisyphus

- Add ORCHESTRATOR_DELEGATION_REQUIRED strong warning in tool.execute.before
- Fix tool.execute.after filePath detection using pendingFilePaths Map
- before stores filePath by callID, after retrieves and deletes it
- Fixes bug where output.metadata.filePath was undefined

* docs: add orchestration, category-skill, and CLI guides

* fix(cli): correct category names in Antigravity migration (visual → visual-engineering)

* fix(sisyphus-task): prevent infinite polling when session removed from status

* fix(tests): update outdated test expectations

- constants.test.ts: Update endpoint count (2→3) and token buffer (50min→60sec)
- token.test.ts: Update expiry tests to use 60-second buffer
- sisyphus-orchestrator: Add fallback to output.metadata.filePath when callID missing

---------

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
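
Note on the concurrency commits in the message above: they state that the default concurrency is 5 and that a value of 0 means unlimited (Infinity) for defaultConcurrency, providerConcurrency, and modelConcurrency. The sketch below illustrates one way such a lookup could behave. The `ConcurrencyConfig` shape, the helper names, the "provider/model" key format, and the precedence order (model, then provider, then default) are assumptions made for illustration; the actual ConcurrencyManager source is not part of this view.

```typescript
// Hypothetical config shape: the three field names come from the commit
// message, everything else here is an illustrative assumption.
interface ConcurrencyConfig {
  defaultConcurrency?: number
  providerConcurrency?: Record<string, number>
  modelConcurrency?: Record<string, number>
}

const DEFAULT_CONCURRENCY = 5 // default value stated in the commit message

// Map the special value 0 to Infinity ("unlimited").
function normalizeLimit(limit: number): number {
  return limit === 0 ? Infinity : limit
}

// Resolve the limit for a key such as "provider/model".
// Assumed precedence: model override, then provider override, then default.
function resolveConcurrencyLimit(model: string, config: ConcurrencyConfig = {}): number {
  const modelLimit = config.modelConcurrency?.[model]
  if (modelLimit !== undefined) return normalizeLimit(modelLimit)

  const [provider = model] = model.split("/")
  const providerLimit = config.providerConcurrency?.[provider]
  if (providerLimit !== undefined) return normalizeLimit(providerLimit)

  return normalizeLimit(config.defaultConcurrency ?? DEFAULT_CONCURRENCY)
}
```

Mapping 0 to Infinity keeps every later comparison of the form "running < limit" valid without a separate unlimited flag.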
2026-01-09 02:24:43 +09:00
parent 8394926fe1
commit 768ecd928b
92 changed files with 13771 additions and 672 deletions
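
Note on the sisyphus-orchestrator fix described in the commit message: the `before` hook stores the file path by callID in a pendingFilePaths Map, the `after` hook retrieves and deletes it, and a later commit adds a fallback to output.metadata.filePath when the callID is missing. A minimal sketch of that hand-off follows. The payload types and function names are hypothetical simplifications, not the plugin's real hook signatures, and falling back to metadata when a callID is present but nothing was stored is an additional assumption.

```typescript
// Hypothetical, simplified hook payloads (the real plugin types are not shown here).
interface ToolCallInput {
  tool: string
  callID?: string
}

interface ToolCallOutput {
  metadata?: { filePath?: string }
}

// Paths recorded by the "before" hook, keyed by callID.
const pendingFilePaths = new Map<string, string>()

// "before" hook: remember the target path for this call.
function rememberFilePath(input: ToolCallInput, args: { filePath?: string }): void {
  if (input.callID && args.filePath) {
    pendingFilePaths.set(input.callID, args.filePath)
  }
}

// "after" hook: retrieve and delete the stored path so the Map does not grow.
// Falls back to output metadata when no stored path is available.
function takeFilePath(input: ToolCallInput, output: ToolCallOutput): string | undefined {
  if (input.callID) {
    const stored = pendingFilePaths.get(input.callID)
    if (stored !== undefined) {
      pendingFilePaths.delete(input.callID)
      return stored
    }
  }
  return output.metadata?.filePath
}
```

Deleting the entry on retrieval keeps the Map bounded by the number of in-flight tool calls.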
--- a/src/tools/sisyphus-task/constants.ts
+++ b/src/tools/sisyphus-task/constants.ts
@@ -0,0 +1,254 @@
+import type { CategoryConfig } from "../../config/schema"
+
+export const VISUAL_CATEGORY_PROMPT_APPEND = `<Category_Context>
+You are working on VISUAL/UI tasks.
+
+Design-first mindset:
+- Bold aesthetic choices over safe defaults
+- Unexpected layouts, asymmetry, grid-breaking elements
+- Distinctive typography (avoid: Arial, Inter, Roboto, Space Grotesk)
+- Cohesive color palettes with sharp accents
+- High-impact animations with staggered reveals
+- Atmosphere: gradient meshes, noise textures, layered transparencies
+
+AVOID: Generic fonts, purple gradients on white, predictable layouts, cookie-cutter patterns.
+</Category_Context>`
+
+export const STRATEGIC_CATEGORY_PROMPT_APPEND = `<Category_Context>
+You are working on BUSINESS LOGIC / ARCHITECTURE tasks.
+
+Strategic advisor mindset:
+- Bias toward simplicity: least complex solution that fulfills requirements
+- Leverage existing code/patterns over new components
+- Prioritize developer experience and maintainability
+- One clear recommendation with effort estimate (Quick/Short/Medium/Large)
+- Signal when advanced approach warranted
+
+Response format:
+- Bottom line (2-3 sentences)
+- Action plan (numbered steps)
+- Risks and mitigations (if relevant)
+</Category_Context>`
+
+export const ARTISTRY_CATEGORY_PROMPT_APPEND = `<Category_Context>
+You are working on HIGHLY CREATIVE / ARTISTIC tasks.
+
+Artistic genius mindset:
+- Push far beyond conventional boundaries
+- Explore radical, unconventional directions
+- Surprise and delight: unexpected twists, novel combinations
+- Rich detail and vivid expression
+- Break patterns deliberately when it serves the creative vision
+
+Approach:
+- Generate diverse, bold options first
+- Embrace ambiguity and wild experimentation
+- Balance novelty with coherence
+- This is for tasks requiring exceptional creativity
+</Category_Context>`
+
+export const QUICK_CATEGORY_PROMPT_APPEND = `<Category_Context>
+You are working on SMALL / QUICK tasks.
+
+Efficient execution mindset:
+- Fast, focused, minimal overhead
+- Get to the point immediately
+- No over-engineering
+- Simple solutions for simple problems
+
+Approach:
+- Minimal viable implementation
+- Skip unnecessary abstractions
+- Direct and concise
+</Category_Context>
+
+<Caller_Warning>
+⚠️ THIS CATEGORY USES A LESS CAPABLE MODEL (claude-haiku-4-5).
+
+The model executing this task has LIMITED reasoning capacity. Your prompt MUST be:
+
+**EXHAUSTIVELY EXPLICIT** - Leave NOTHING to interpretation:
+1. MUST DO: List every required action as atomic, numbered steps
+2. MUST NOT DO: Explicitly forbid likely mistakes and deviations
+3. EXPECTED OUTPUT: Describe exact success criteria with concrete examples
+
+**WHY THIS MATTERS:**
+- Less capable models WILL deviate without explicit guardrails
+- Vague instructions → unpredictable results
+- Implicit expectations → missed requirements
+
+**PROMPT STRUCTURE (MANDATORY):**
+\`\`\`
+TASK: [One-sentence goal]
+
+MUST DO:
+1. [Specific action with exact details]
+2. [Another specific action]
+...
+
+MUST NOT DO:
+- [Forbidden action + why]
+- [Another forbidden action]
+...
+
+EXPECTED OUTPUT:
+- [Exact deliverable description]
+- [Success criteria / verification method]
+\`\`\`
+
+If your prompt lacks this structure, REWRITE IT before delegating.
+</Caller_Warning>`
+
+export const MOST_CAPABLE_CATEGORY_PROMPT_APPEND = `<Category_Context>
+You are working on COMPLEX / MOST-CAPABLE tasks.
+
+Maximum capability mindset:
+- Bring full reasoning power to bear
+- Consider all edge cases and implications
+- Deep analysis before action
+- Quality over speed
+
+Approach:
+- Thorough understanding first
+- Comprehensive solution design
+- Meticulous execution
+- This is for the most challenging problems
+</Category_Context>`
+
+export const WRITING_CATEGORY_PROMPT_APPEND = `<Category_Context>
+You are working on WRITING / PROSE tasks.
+
+Wordsmith mindset:
+- Clear, flowing prose
+- Appropriate tone and voice
+- Engaging and readable
+- Proper structure and organization
+
+Approach:
+- Understand the audience
+- Draft with care
+- Polish for clarity and impact
+- Documentation, READMEs, articles, technical writing
+</Category_Context>`
+
+export const GENERAL_CATEGORY_PROMPT_APPEND = `<Category_Context>
+You are working on GENERAL tasks.
+
+Balanced execution mindset:
+- Practical, straightforward approach
+- Good enough is good enough
+- Focus on getting things done
+
+Approach:
+- Standard best practices
+- Reasonable trade-offs
+- Efficient completion
+</Category_Context>
+
+<Caller_Warning>
+⚠️ THIS CATEGORY USES A MID-TIER MODEL (claude-sonnet-4-5).
+
+While capable, this model benefits significantly from EXPLICIT instructions.
+
+**PROVIDE CLEAR STRUCTURE:**
+1. MUST DO: Enumerate required actions explicitly - don't assume inference
+2. MUST NOT DO: State forbidden actions to prevent scope creep or wrong approaches
+3. EXPECTED OUTPUT: Define concrete success criteria and deliverables
+
+**COMMON PITFALLS WITHOUT EXPLICIT INSTRUCTIONS:**
+- Model may take shortcuts that miss edge cases
+- Implicit requirements get overlooked
+- Output format may not match expectations
+- Scope may expand beyond intended boundaries
+
+**RECOMMENDED PROMPT PATTERN:**
+\`\`\`
+TASK: [Clear, single-purpose goal]
+
+CONTEXT: [Relevant background the model needs]
+
+MUST DO:
+- [Explicit requirement 1]
+- [Explicit requirement 2]
+
+MUST NOT DO:
+- [Boundary/constraint 1]
+- [Boundary/constraint 2]
+
+EXPECTED OUTPUT:
+- [What success looks like]
+- [How to verify completion]
+\`\`\`
+
+The more explicit your prompt, the better the results.
+</Caller_Warning>`
+
+export const DEFAULT_CATEGORIES: Record<string, CategoryConfig> = {
+  "visual-engineering": {
+    model: "google/gemini-3-pro-preview",
+    temperature: 0.7,
+  },
+  ultrabrain: {
+    model: "openai/gpt-5.2",
+    temperature: 0.1,
+  },
+  artistry: {
+    model: "google/gemini-3-pro-preview",
+    temperature: 0.9,
+  },
+  quick: {
+    model: "anthropic/claude-haiku-4-5",
+    temperature: 0.3,
+  },
+  "most-capable": {
+    model: "anthropic/claude-opus-4-5",
+    temperature: 0.1,
+  },
+  writing: {
+    model: "google/gemini-3-flash-preview",
+    temperature: 0.5,
+  },
+  general: {
+    model: "anthropic/claude-sonnet-4-5",
+    temperature: 0.3,
+  },
+}
+
+export const CATEGORY_PROMPT_APPENDS: Record<string, string> = {
+  "visual-engineering": VISUAL_CATEGORY_PROMPT_APPEND,
+  ultrabrain: STRATEGIC_CATEGORY_PROMPT_APPEND,
+  artistry: ARTISTRY_CATEGORY_PROMPT_APPEND,
+  quick: QUICK_CATEGORY_PROMPT_APPEND,
+  "most-capable": MOST_CAPABLE_CATEGORY_PROMPT_APPEND,
+  writing: WRITING_CATEGORY_PROMPT_APPEND,
+  general: GENERAL_CATEGORY_PROMPT_APPEND,
+}
+
+export const CATEGORY_DESCRIPTIONS: Record<string, string> = {
+  "visual-engineering": "Frontend, UI/UX, design, styling, animation",
+  ultrabrain: "Strict architecture design, very complex business logic",
+  artistry: "Highly creative/artistic tasks, novel ideas",
+  quick: "Cheap & fast - small tasks with minimal overhead, budget-friendly",
+  "most-capable": "Complex tasks requiring maximum capability",
+  writing: "Documentation, prose, technical writing",
+  general: "General purpose tasks",
+}
+
+const BUILTIN_CATEGORIES = Object.keys(DEFAULT_CATEGORIES).join(", ")
+
+export const SISYPHUS_TASK_DESCRIPTION = `Spawn agent task with category-based or direct agent selection.
+
+MUTUALLY EXCLUSIVE: Provide EITHER category OR subagent_type, not both (unless resuming).
+
+- category: Use predefined category (${BUILTIN_CATEGORIES}) → Spawns Sisyphus-Junior with category config
+- subagent_type: Use specific agent directly (e.g., "oracle", "explore")
+- run_in_background: true=async (returns task_id), false=sync (waits for result). Default: false. Use run_in_background=true ONLY for parallel exploration with 5+ independent queries.
+- resume: Session ID to resume (from previous task output). Continues agent with FULL CONTEXT PRESERVED - saves tokens, maintains continuity.
+- skills: Array of skill names to prepend to prompt (e.g., ["playwright", "frontend-ui-ux"]). Skills will be resolved and their content prepended with a separator. Empty array = no prepending.
+
+**WHEN TO USE resume:**
+- Task failed/incomplete → resume with "fix: [specific issue]"
+- Need follow-up on previous result → resume with additional question
+- Multi-turn conversation with same agent → always resume instead of new task
+
+Prompts MUST be in English.`
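The Caller_Warning blocks in QUICK_CATEGORY_PROMPT_APPEND and GENERAL_CATEGORY_PROMPT_APPEND ask callers for a TASK / MUST DO / MUST NOT DO / EXPECTED OUTPUT prompt. One way a caller might assemble such a prompt is sketched below. `buildDelegationPrompt` and its input shape are hypothetical and do not exist in this commit; the file path and constant names in the example call are made up as well.

```typescript
// Hypothetical helper (not in the commit): formats a delegation prompt in the
// structure that the quick-category Caller_Warning marks as mandatory.
interface DelegationPromptInput {
  task: string
  mustDo: string[]
  mustNotDo: string[]
  expectedOutput: string[]
}

function buildDelegationPrompt(input: DelegationPromptInput): string {
  // MUST DO items are numbered; the other two sections use dash bullets.
  const numbered = input.mustDo.map((step, i) => `${i + 1}. ${step}`)
  const forbidden = input.mustNotDo.map((item) => `- ${item}`)
  const expected = input.expectedOutput.map((item) => `- ${item}`)
  return [
    `TASK: ${input.task}`,
    "",
    "MUST DO:",
    ...numbered,
    "",
    "MUST NOT DO:",
    ...forbidden,
    "",
    "EXPECTED OUTPUT:",
    ...expected,
  ].join("\n")
}

// Example input; the file and identifiers are invented for illustration.
const prompt = buildDelegationPrompt({
  task: "Rename the exported constant FOO to BAR in src/config.ts",
  mustDo: ["Edit only src/config.ts", "Update the export statement"],
  mustNotDo: ["Touch other files (out of scope)"],
  expectedOutput: ["src/config.ts exports BAR and no longer exports FOO"],
})
```

The resulting string would be passed as the `prompt` argument, together with `category: "quick"`, `run_in_background: false`, and `skills: []`.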
--- a/src/tools/sisyphus-task/index.ts
+++ b/src/tools/sisyphus-task/index.ts
@@ -0,0 +1,3 @@
+export { createSisyphusTask, type SisyphusTaskToolOptions } from "./tools"
+export type * from "./types"
+export * from "./constants"
--- a/src/tools/sisyphus-task/tools.test.ts
+++ b/src/tools/sisyphus-task/tools.test.ts
@@ -0,0 +1,430 @@
+import { describe, test, expect } from "bun:test"
+import { DEFAULT_CATEGORIES, CATEGORY_PROMPT_APPENDS, CATEGORY_DESCRIPTIONS, SISYPHUS_TASK_DESCRIPTION } from "./constants"
+import type { CategoryConfig } from "../../config/schema"
+
+function resolveCategoryConfig(
+  categoryName: string,
+  userCategories?: Record<string, CategoryConfig>
+): { config: CategoryConfig; promptAppend: string } | null {
+  const defaultConfig = DEFAULT_CATEGORIES[categoryName]
+  const userConfig = userCategories?.[categoryName]
+  const defaultPromptAppend = CATEGORY_PROMPT_APPENDS[categoryName] ?? ""
+
+  if (!defaultConfig && !userConfig) {
+    return null
+  }
+
+  const config: CategoryConfig = {
+    ...defaultConfig,
+    ...userConfig,
+    model: userConfig?.model ?? defaultConfig?.model ?? "anthropic/claude-sonnet-4-5",
+  }
+
+  let promptAppend = defaultPromptAppend
+  if (userConfig?.prompt_append) {
+    promptAppend = defaultPromptAppend
+      ? defaultPromptAppend + "\n\n" + userConfig.prompt_append
+      : userConfig.prompt_append
+  }
+
+  return { config, promptAppend }
+}
+
+describe("sisyphus-task", () => {
+  describe("DEFAULT_CATEGORIES", () => {
+    test("visual-engineering category has gemini model", () => {
+      // #given
+      const category = DEFAULT_CATEGORIES["visual-engineering"]
+
+      // #when / #then
+      expect(category).toBeDefined()
+      expect(category.model).toBe("google/gemini-3-pro-preview")
+      expect(category.temperature).toBe(0.7)
+    })
+
+    test("ultrabrain category has gpt model", () => {
+      // #given
+      const category = DEFAULT_CATEGORIES["ultrabrain"]
+
+      // #when / #then
+      expect(category).toBeDefined()
+      expect(category.model).toBe("openai/gpt-5.2")
+      expect(category.temperature).toBe(0.1)
+    })
+  })
+
+  describe("CATEGORY_PROMPT_APPENDS", () => {
+    test("visual-engineering category has design-focused prompt", () => {
+      // #given
+      const promptAppend = CATEGORY_PROMPT_APPENDS["visual-engineering"]
+
+      // #when / #then
+      expect(promptAppend).toContain("VISUAL/UI")
+      expect(promptAppend).toContain("Design-first")
+    })
+
+    test("ultrabrain category has strategic prompt", () => {
+      // #given
+      const promptAppend = CATEGORY_PROMPT_APPENDS["ultrabrain"]
+
+      // #when / #then
+      expect(promptAppend).toContain("BUSINESS LOGIC")
+      expect(promptAppend).toContain("Strategic advisor")
+    })
+  })
+
+  describe("CATEGORY_DESCRIPTIONS", () => {
+    test("has description for all default categories", () => {
+      // #given
+      const defaultCategoryNames = Object.keys(DEFAULT_CATEGORIES)
+
+      // #when / #then
+      for (const name of defaultCategoryNames) {
+        expect(CATEGORY_DESCRIPTIONS[name]).toBeDefined()
+        expect(CATEGORY_DESCRIPTIONS[name].length).toBeGreaterThan(0)
+      }
+    })
+
+    test("most-capable category exists and has description", () => {
+      // #given / #when
+      const description = CATEGORY_DESCRIPTIONS["most-capable"]
+
+      // #then
+      expect(description).toBeDefined()
+      expect(description).toContain("Complex")
+    })
+  })
+
+  describe("SISYPHUS_TASK_DESCRIPTION", () => {
+    test("documents background parameter as required with default false", () => {
+      // #given / #when / #then
+      expect(SISYPHUS_TASK_DESCRIPTION).toContain("background")
+      expect(SISYPHUS_TASK_DESCRIPTION).toContain("Default: false")
+    })
+
+    test("warns about parallel exploration usage", () => {
+      // #given / #when / #then
+      expect(SISYPHUS_TASK_DESCRIPTION).toContain("5+")
+    })
+  })
+
+  describe("resolveCategoryConfig", () => {
+    test("returns null for unknown category without user config", () => {
+      // #given
+      const categoryName = "unknown-category"
+
+      // #when
+      const result = resolveCategoryConfig(categoryName)
+
+      // #then
+      expect(result).toBeNull()
+    })
+
+    test("returns default config for builtin category", () => {
+      // #given
+      const categoryName = "visual-engineering"
+
+      // #when
+      const result = resolveCategoryConfig(categoryName)
+
+      // #then
+      expect(result).not.toBeNull()
+      expect(result!.config.model).toBe("google/gemini-3-pro-preview")
+      expect(result!.promptAppend).toContain("VISUAL/UI")
+    })
+
+    test("user config overrides default model", () => {
+      // #given
+      const categoryName = "visual-engineering"
+      const userCategories = {
+        "visual-engineering": { model: "anthropic/claude-opus-4-5" },
+      }
+
+      // #when
+      const result = resolveCategoryConfig(categoryName, userCategories)
+
+      // #then
+      expect(result).not.toBeNull()
+      expect(result!.config.model).toBe("anthropic/claude-opus-4-5")
+    })
+
+    test("user prompt_append is appended to default", () => {
+      // #given
+      const categoryName = "visual-engineering"
+      const userCategories = {
+        "visual-engineering": {
+          model: "google/gemini-3-pro-preview",
+          prompt_append: "Custom instructions here",
+        },
+      }
+
+      // #when
+      const result = resolveCategoryConfig(categoryName, userCategories)
+
+      // #then
+      expect(result).not.toBeNull()
+      expect(result!.promptAppend).toContain("VISUAL/UI")
+      expect(result!.promptAppend).toContain("Custom instructions here")
+    })
+
+    test("user can define custom category", () => {
+      // #given
+      const categoryName = "my-custom"
+      const userCategories = {
+        "my-custom": {
+          model: "openai/gpt-5.2",
+          temperature: 0.5,
+          prompt_append: "You are a custom agent",
+        },
+      }
+
+      // #when
+      const result = resolveCategoryConfig(categoryName, userCategories)
+
+      // #then
+      expect(result).not.toBeNull()
+      expect(result!.config.model).toBe("openai/gpt-5.2")
+      expect(result!.config.temperature).toBe(0.5)
+      expect(result!.promptAppend).toBe("You are a custom agent")
+    })
+
+    test("user category overrides temperature", () => {
+      // #given
+      const categoryName = "visual-engineering"
+      const userCategories = {
+        "visual-engineering": {
+          model: "google/gemini-3-pro-preview",
+          temperature: 0.3,
+        },
+      }
+
+      // #when
+      const result = resolveCategoryConfig(categoryName, userCategories)
+
+      // #then
+      expect(result).not.toBeNull()
+      expect(result!.config.temperature).toBe(0.3)
+    })
+  })
+
+  describe("skills parameter", () => {
+    test("SISYPHUS_TASK_DESCRIPTION documents skills parameter", () => {
+      // #given / #when / #then
+      expect(SISYPHUS_TASK_DESCRIPTION).toContain("skills")
+      expect(SISYPHUS_TASK_DESCRIPTION).toContain("Array of skill names")
+    })
+
+    test("skills parameter is required - returns error when not provided", async () => {
+      // #given
+      const { createSisyphusTask } = require("./tools")
+      
+      const mockManager = { launch: async () => ({}) }
+      const mockClient = {
+        app: { agents: async () => ({ data: [] }) },
+        session: {
+          create: async () => ({ data: { id: "test-session" } }),
+          prompt: async () => ({ data: {} }),
+          messages: async () => ({ data: [] }),
+        },
+      }
+      
+      const tool = createSisyphusTask({
+        manager: mockManager,
+        client: mockClient,
+      })
+      
+      const toolContext = {
+        sessionID: "parent-session",
+        messageID: "parent-message",
+        agent: "Sisyphus",
+        abort: new AbortController().signal,
+      }
+      
+      // #when - skills not provided (undefined)
+      const result = await tool.execute(
+        {
+          description: "Test task",
+          prompt: "Do something",
+          category: "ultrabrain",
+          run_in_background: false,
+        },
+        toolContext
+      )
+      
+      // #then - should return error about missing skills
+      expect(result).toContain("skills")
+      expect(result).toContain("REQUIRED")
+    })
+  })
+
+  describe("resume with background parameter", () => {
+  test("resume with background=false should wait for result and return content", async () => {
+    // #given
+    const { createSisyphusTask } = require("./tools")
+    
+    const mockTask = {
+      id: "task-123",
+      sessionID: "ses_resume_test",
+      description: "Resumed task",
+      agent: "explore",
+      status: "running",
+    }
+    
+    const mockManager = {
+      resume: async () => mockTask,
+      launch: async () => mockTask,
+    }
+    
+    const mockClient = {
+      session: {
+        prompt: async () => ({ data: {} }),
+        messages: async () => ({
+          data: [
+            {
+              info: { role: "assistant", time: { created: Date.now() } },
+              parts: [{ type: "text", text: "This is the resumed task result" }],
+            },
+          ],
+        }),
+      },
+      app: {
+        agents: async () => ({ data: [] }),
+      },
+    }
+    
+    const tool = createSisyphusTask({
+      manager: mockManager,
+      client: mockClient,
+    })
+    
+    const toolContext = {
+      sessionID: "parent-session",
+      messageID: "parent-message",
+      agent: "Sisyphus",
+      abort: new AbortController().signal,
+    }
+    
+    // #when
+    const result = await tool.execute(
+      {
+        description: "Resume test",
+        prompt: "Continue the task",
+        resume: "ses_resume_test",
+        run_in_background: false,
+        skills: [],
+      },
+      toolContext
+    )
+    
+    // #then - should contain actual result, not just "Background task resumed"
+    expect(result).toContain("This is the resumed task result")
+    expect(result).not.toContain("Background task resumed")
+  })
+
+  test("resume with background=true should return immediately without waiting", async () => {
+    // #given
+    const { createSisyphusTask } = require("./tools")
+    
+    const mockTask = {
+      id: "task-456",
+      sessionID: "ses_bg_resume",
+      description: "Background resumed task",
+      agent: "explore",
+      status: "running",
+    }
+    
+    const mockManager = {
+      resume: async () => mockTask,
+    }
+    
+    const mockClient = {
+      session: {
+        prompt: async () => ({ data: {} }),
+        messages: async () => ({
+          data: [],
+        }),
+      },
+    }
+    
+    const tool = createSisyphusTask({
+      manager: mockManager,
+      client: mockClient,
+    })
+    
+    const toolContext = {
+      sessionID: "parent-session",
+      messageID: "parent-message",
+      agent: "Sisyphus",
+      abort: new AbortController().signal,
+    }
+    
+    // #when
+    const result = await tool.execute(
+      {
+        description: "Resume bg test",
+        prompt: "Continue in background",
+        resume: "ses_bg_resume",
+        run_in_background: true,
+        skills: [],
+      },
+      toolContext
+    )
+    
+    // #then - should return background message
+    expect(result).toContain("Background task resumed")
+    expect(result).toContain("task-456")
+  })
+  })
+
+  describe("buildSystemContent", () => {
+    test("returns undefined when no skills and no category promptAppend", () => {
+      // #given
+      const { buildSystemContent } = require("./tools")
+
+      // #when
+      const result = buildSystemContent({ skillContent: undefined, categoryPromptAppend: undefined })
+
+      // #then
+      expect(result).toBeUndefined()
+    })
+
+    test("returns skill content only when skills provided without category", () => {
+      // #given
+      const { buildSystemContent } = require("./tools")
+      const skillContent = "You are a playwright expert"
+
+      // #when
+      const result = buildSystemContent({ skillContent, categoryPromptAppend: undefined })
+
+      // #then
+      expect(result).toBe(skillContent)
+    })
+
+    test("returns category promptAppend only when no skills", () => {
+      // #given
+      const { buildSystemContent } = require("./tools")
+      const categoryPromptAppend = "Focus on visual design"
+
+      // #when
+      const result = buildSystemContent({ skillContent: undefined, categoryPromptAppend })
+
+      // #then
+      expect(result).toBe(categoryPromptAppend)
+    })
+
+    test("combines skill content and category promptAppend with separator", () => {
+      // #given
+      const { buildSystemContent } = require("./tools")
+      const skillContent = "You are a playwright expert"
+      const categoryPromptAppend = "Focus on visual design"
+
+      // #when
+      const result = buildSystemContent({ skillContent, categoryPromptAppend })
+
+      // #then
+      expect(result).toContain(skillContent)
+      expect(result).toContain(categoryPromptAppend)
+      expect(result).toContain("\n\n")
+    })
+  })
+})
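The resolveCategoryConfig tests above depend on object-spread precedence: defaults first, user overrides second, then a final model fallback. The standalone sketch below reproduces that merge under two assumptions made so it runs on its own: a simplified `CategoryConfig` shape (the real type lives in config/schema) and a one-entry defaults table. It joins the prompt parts with `filter`/`join` rather than the conditional used in the diff, which should give the same string in each case the tests cover.

```typescript
// Simplified stand-in for CategoryConfig (assumption; the real type is in config/schema).
interface CategoryConfig {
  model: string
  temperature?: number
  prompt_append?: string
}

// Trimmed-down defaults, for illustration only.
const DEFAULTS: Record<string, CategoryConfig> = {
  quick: { model: "anthropic/claude-haiku-4-5", temperature: 0.3 },
}
const DEFAULT_APPENDS: Record<string, string> = {
  quick: "You are working on SMALL / QUICK tasks.",
}

function resolve(
  name: string,
  user?: Record<string, CategoryConfig>
): { config: CategoryConfig; promptAppend: string } | null {
  const base = DEFAULTS[name]
  const override = user?.[name]
  if (!base && !override) return null

  // Later spreads win, so user fields replace defaults key by key.
  const config: CategoryConfig = {
    ...base,
    ...override,
    model: override?.model ?? base?.model ?? "anthropic/claude-sonnet-4-5",
  }

  // User prompt text is appended after the built-in text and never replaces it.
  const parts = [DEFAULT_APPENDS[name], override?.prompt_append].filter(
    (p): p is string => Boolean(p)
  )
  return { config, promptAppend: parts.join("\n\n") }
}

// User overrides the model and adds prompt text; temperature is inherited.
const merged = resolve("quick", {
  quick: { model: "openai/gpt-5.2", prompt_append: "Prefer small diffs." },
})
```

A user entry that omits `temperature` inherits the default, while `prompt_append` is additive, matching what the "user prompt_append is appended to default" test expects.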
--- a/src/tools/sisyphus-task/tools.ts
+++ b/src/tools/sisyphus-task/tools.ts
@@ -0,0 +1,493 @@
+import { tool, type PluginInput, type ToolDefinition } from "@opencode-ai/plugin"
+import { existsSync, readdirSync } from "node:fs"
+import { join } from "node:path"
+import type { BackgroundManager } from "../../features/background-agent"
+import type { SisyphusTaskArgs } from "./types"
+import type { CategoryConfig, CategoriesConfig } from "../../config/schema"
+import { SISYPHUS_TASK_DESCRIPTION, DEFAULT_CATEGORIES, CATEGORY_PROMPT_APPENDS } from "./constants"
+import { findNearestMessageWithFields, MESSAGE_STORAGE } from "../../features/hook-message-injector"
+import { resolveMultipleSkills } from "../../features/opencode-skill-loader/skill-content"
+import { createBuiltinSkills } from "../../features/builtin-skills/skills"
+import { getTaskToastManager } from "../../features/task-toast-manager"
+import { subagentSessions } from "../../features/claude-code-session-state"
+
+type OpencodeClient = PluginInput["client"]
+
+const SISYPHUS_JUNIOR_AGENT = "Sisyphus-Junior"
+const CATEGORY_EXAMPLES = Object.keys(DEFAULT_CATEGORIES).map(k => `'${k}'`).join(", ")
+
+function parseModelString(model: string): { providerID: string; modelID: string } | undefined {
+  const parts = model.split("/")
+  if (parts.length >= 2) {
+    return { providerID: parts[0], modelID: parts.slice(1).join("/") }
+  }
+  return undefined
+}
+
+function getMessageDir(sessionID: string): string | null {
+  if (!existsSync(MESSAGE_STORAGE)) return null
+
+  const directPath = join(MESSAGE_STORAGE, sessionID)
+  if (existsSync(directPath)) return directPath
+
+  for (const dir of readdirSync(MESSAGE_STORAGE)) {
+    const sessionPath = join(MESSAGE_STORAGE, dir, sessionID)
+    if (existsSync(sessionPath)) return sessionPath
+  }
+
+  return null
+}
+
+function formatDuration(start: Date, end?: Date): string {
+  const duration = (end ?? new Date()).getTime() - start.getTime()
+  const seconds = Math.floor(duration / 1000)
+  const minutes = Math.floor(seconds / 60)
+  const hours = Math.floor(minutes / 60)
+
+  if (hours > 0) return `${hours}h ${minutes % 60}m ${seconds % 60}s`
+  if (minutes > 0) return `${minutes}m ${seconds % 60}s`
+  return `${seconds}s`
+}
+
+type ToolContextWithMetadata = {
+  sessionID: string
+  messageID: string
+  agent: string
+  abort: AbortSignal
+  metadata?: (input: { title?: string; metadata?: Record<string, unknown> }) => void
+}
+
+function resolveCategoryConfig(
+  categoryName: string,
+  userCategories?: CategoriesConfig
+): { config: CategoryConfig; promptAppend: string } | null {
+  const defaultConfig = DEFAULT_CATEGORIES[categoryName]
+  const userConfig = userCategories?.[categoryName]
+  const defaultPromptAppend = CATEGORY_PROMPT_APPENDS[categoryName] ?? ""
+
+  if (!defaultConfig && !userConfig) {
+    return null
+  }
+
+  const config: CategoryConfig = {
+    ...defaultConfig,
+    ...userConfig,
+    model: userConfig?.model ?? defaultConfig?.model ?? "anthropic/claude-sonnet-4-5",
+  }
+
+  let promptAppend = defaultPromptAppend
+  if (userConfig?.prompt_append) {
+    promptAppend = defaultPromptAppend
+      ? defaultPromptAppend + "\n\n" + userConfig.prompt_append
+      : userConfig.prompt_append
+  }
+
+  return { config, promptAppend }
+}
+
+export interface SisyphusTaskToolOptions {
+  manager: BackgroundManager
+  client: OpencodeClient
+  userCategories?: CategoriesConfig
+}
+
+export interface BuildSystemContentInput {
+  skillContent?: string
+  categoryPromptAppend?: string
+}
+
+export function buildSystemContent(input: BuildSystemContentInput): string | undefined {
+  const { skillContent, categoryPromptAppend } = input
+
+  if (!skillContent && !categoryPromptAppend) {
+    return undefined
+  }
+
+  if (skillContent && categoryPromptAppend) {
+    return `${skillContent}\n\n${categoryPromptAppend}`
+  }
+
+  return skillContent || categoryPromptAppend
+}
+
+export function createSisyphusTask(options: SisyphusTaskToolOptions): ToolDefinition {
+  const { manager, client, userCategories } = options
+
+  return tool({
+    description: SISYPHUS_TASK_DESCRIPTION,
+    args: {
+      description: tool.schema.string().describe("Short task description"),
+      prompt: tool.schema.string().describe("Full detailed prompt for the agent"),
+      category: tool.schema.string().optional().describe(`Category name (e.g., ${CATEGORY_EXAMPLES}). Mutually exclusive with subagent_type.`),
+      subagent_type: tool.schema.string().optional().describe("Agent name directly (e.g., 'oracle', 'explore'). Mutually exclusive with category."),
+      run_in_background: tool.schema.boolean().describe("Run in background. MUST be explicitly set. Use false for task delegation, true only for parallel exploration."),
+      resume: tool.schema.string().optional().describe("Session ID to resume - continues previous agent session with full context"),
+      skills: tool.schema.array(tool.schema.string()).describe("Array of skill names to prepend to the prompt. Use [] if no skills needed."),
+    },
+    async execute(args: SisyphusTaskArgs, toolContext) {
+      const ctx = toolContext as ToolContextWithMetadata
+      if (args.run_in_background === undefined) {
+        return `❌ Invalid arguments: 'run_in_background' parameter is REQUIRED. Use run_in_background=false for task delegation, run_in_background=true only for parallel exploration.`
+      }
+      if (args.skills === undefined) {
+        return `❌ Invalid arguments: 'skills' parameter is REQUIRED. Use skills=[] if no skills needed.`
+      }
+      const runInBackground = args.run_in_background === true
+
+      let skillContent: string | undefined
+      if (args.skills.length > 0) {
+        const { resolved, notFound } = resolveMultipleSkills(args.skills)
+        if (notFound.length > 0) {
+          const available = createBuiltinSkills().map(s => s.name).join(", ")
+          return `❌ Skills not found: ${notFound.join(", ")}. Available: ${available}`
+        }
+        skillContent = Array.from(resolved.values()).join("\n\n")
+      }
+
+      const messageDir = getMessageDir(ctx.sessionID)
+      const prevMessage = messageDir ? findNearestMessageWithFields(messageDir) : null
+      const parentAgent = ctx.agent ?? prevMessage?.agent
+      const parentModel = prevMessage?.model?.providerID && prevMessage?.model?.modelID
+        ? { providerID: prevMessage.model.providerID, modelID: prevMessage.model.modelID }
+        : undefined
+
+      if (args.resume) {
+        if (runInBackground) {
+          try {
+            const task = await manager.resume({
+              sessionId: args.resume,
+              prompt: args.prompt,
+              parentSessionID: ctx.sessionID,
+              parentMessageID: ctx.messageID,
+              parentModel,
+              parentAgent,
+            })
+
+            ctx.metadata?.({
+              title: `Resume: ${task.description}`,
+              metadata: { sessionId: task.sessionID },
+            })
+
+            return `Background task resumed.
+
+Task ID: ${task.id}
+Session ID: ${task.sessionID}
+Description: ${task.description}
+Agent: ${task.agent}
+Status: ${task.status}
+
+Agent continues with full previous context preserved.
+Use \`background_output\` with task_id="${task.id}" to check progress.`
+          } catch (error) {
+            const message = error instanceof Error ? error.message : String(error)
+            return `❌ Failed to resume task: ${message}`
+          }
+        }
+
+        const toastManager = getTaskToastManager()
+        const taskId = `resume_sync_${args.resume.slice(0, 8)}`
+        const startTime = new Date()
+
+        if (toastManager) {
+          toastManager.addTask({
+            id: taskId,
+            description: args.description,
+            agent: "resume",
+            isBackground: false,
+          })
+        }
+
+        ctx.metadata?.({
+          title: `Resume: ${args.description}`,
+          metadata: { sessionId: args.resume, sync: true },
+        })
+
+        try {
+          await client.session.prompt({
+            path: { id: args.resume },
+            body: {
+              tools: {
+                task: false,
+                sisyphus_task: false,
+              },
+              parts: [{ type: "text", text: args.prompt }],
+            },
+          })
+        } catch (promptError) {
+          if (toastManager) {
+            toastManager.removeTask(taskId)
+          }
+          const errorMessage = promptError instanceof Error ? promptError.message : String(promptError)
+          return `❌ Failed to send resume prompt: ${errorMessage}\n\nSession ID: ${args.resume}`
+        }
+
+        const messagesResult = await client.session.messages({
+          path: { id: args.resume },
+        })
+
+        if (messagesResult.error) {
+          if (toastManager) {
+            toastManager.removeTask(taskId)
+          }
+          return `❌ Error fetching result: ${messagesResult.error}\n\nSession ID: ${args.resume}`
+        }
+
+        const messages = ((messagesResult as { data?: unknown }).data ?? messagesResult) as Array<{
+          info?: { role?: string; time?: { created?: number } }
+          parts?: Array<{ type?: string; text?: string }>
+        }>
+
+        const assistantMessages = messages
+          .filter((m) => m.info?.role === "assistant")
+          .sort((a, b) => (b.info?.time?.created ?? 0) - (a.info?.time?.created ?? 0))
+        const lastMessage = assistantMessages[0]
+
+        if (toastManager) {
+          toastManager.removeTask(taskId)
+        }
+
+        if (!lastMessage) {
+          return `❌ No assistant response found.\n\nSession ID: ${args.resume}`
+        }
+
+        const textParts = lastMessage?.parts?.filter((p) => p.type === "text") ?? []
+        const textContent = textParts.map((p) => p.text ?? "").filter(Boolean).join("\n")
+
+        const duration = formatDuration(startTime)
+
+        return `Task resumed and completed in ${duration}.
+
+Session ID: ${args.resume}
+
+---
+
+${textContent || "(No text output)"}`
+      }
+
+      if (args.category && args.subagent_type) {
+        return `❌ Invalid arguments: Provide EITHER category OR subagent_type, not both.`
+      }
+
+      if (!args.category && !args.subagent_type) {
+        return `❌ Invalid arguments: Must provide either category or subagent_type.`
+      }
+
+      let agentToUse: string
+      let categoryModel: { providerID: string; modelID: string } | undefined
+      let categoryPromptAppend: string | undefined
+
+      if (args.category) {
+        const resolved = resolveCategoryConfig(args.category, userCategories)
+        if (!resolved) {
+          return `❌ Unknown category: "${args.category}". Available: ${Object.keys({ ...DEFAULT_CATEGORIES, ...userCategories }).join(", ")}`
+        }
+
+        agentToUse = SISYPHUS_JUNIOR_AGENT
+        categoryModel = parseModelString(resolved.config.model)
+        categoryPromptAppend = resolved.promptAppend || undefined
+      } else {
+        agentToUse = args.subagent_type!.trim()
+        if (!agentToUse) {
+          return `❌ Agent name cannot be empty.`
+        }
+
+        // Validate agent exists and is callable (not a primary agent)
+        try {
+          const agentsResult = await client.app.agents()
+          type AgentInfo = { name: string; mode?: "subagent" | "primary" | "all" }
+          const agents = (agentsResult as { data?: AgentInfo[] }).data ?? agentsResult as unknown as AgentInfo[]
+
+          const callableAgents = agents.filter((a) => a.mode !== "primary")
+          const callableNames = callableAgents.map((a) => a.name)
+
+          if (!callableNames.includes(agentToUse)) {
+            const isPrimaryAgent = agents.some((a) => a.name === agentToUse && a.mode === "primary")
+            if (isPrimaryAgent) {
+              return `❌ Cannot call primary agent "${agentToUse}" via sisyphus_task. Primary agents are top-level orchestrators.`
+            }
+
+            const availableAgents = callableNames
+              .sort()
+              .join(", ")
+            return `❌ Unknown agent: "${agentToUse}". Available agents: ${availableAgents}`
+          }
+        } catch {
+          // If we can't fetch agents, proceed anyway - the session.prompt will fail with a clearer error
+        }
+      }
+
+      const systemContent = buildSystemContent({ skillContent, categoryPromptAppend })
+
+      if (runInBackground) {
+        try {
+          const task = await manager.launch({
+            description: args.description,
+            prompt: args.prompt,
+            agent: agentToUse,
+            parentSessionID: ctx.sessionID,
+            parentMessageID: ctx.messageID,
+            parentModel,
+            parentAgent,
+            model: categoryModel,
+            skills: args.skills,
+            skillContent: systemContent,
+          })
+
+          ctx.metadata?.({
+            title: args.description,
+            metadata: { sessionId: task.sessionID, category: args.category },
+          })
+
+          return `Background task launched.
+
+Task ID: ${task.id}
+Session ID: ${task.sessionID}
+Description: ${task.description}
+Agent: ${task.agent}${args.category ? ` (category: ${args.category})` : ""}
+Status: ${task.status}
+
+System notifies on completion. Use \`background_output\` with task_id="${task.id}" to check.`
+        } catch (error) {
+          const message = error instanceof Error ? error.message : String(error)
+          return `❌ Failed to launch task: ${message}`
+        }
+      }
+
+      const toastManager = getTaskToastManager()
+      let taskId: string | undefined
+      let syncSessionID: string | undefined
+
+      try {
+        const createResult = await client.session.create({
+          body: {
+            parentID: ctx.sessionID,
+            title: `Task: ${args.description}`,
+          },
+        })
+
+        if (createResult.error) {
+          return `❌ Failed to create session: ${createResult.error}`
+        }
+
+        const sessionID = createResult.data.id
+        syncSessionID = sessionID
+        subagentSessions.add(sessionID)
+        taskId = `sync_${sessionID.slice(0, 8)}`
+        const startTime = new Date()
+
+        if (toastManager) {
+          toastManager.addTask({
+            id: taskId,
+            description: args.description,
+            agent: agentToUse,
+            isBackground: false,
+            skills: args.skills,
+          })
+        }
+
+        ctx.metadata?.({
+          title: args.description,
+          metadata: { sessionId: sessionID, category: args.category, sync: true },
+        })
+
+        // Use promptAsync to avoid changing main session's active state
+        let promptError: Error | undefined
+        await client.session.promptAsync({
+          path: { id: sessionID },
+          body: {
+            agent: agentToUse,
+            model: categoryModel,
+            system: systemContent,
+            tools: {
+              task: false,
+              sisyphus_task: false,
+            },
+            parts: [{ type: "text", text: args.prompt }],
+          },
+        }).catch((error) => {
+          promptError = error instanceof Error ? error : new Error(String(error))
+        })
+
+        if (promptError) {
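+          if (toastManager && taskId !== undefined) toastManager.removeTask(taskId)
+          // Also drop the session entry so a failed prompt does not leave it tracked
+          subagentSessions.delete(sessionID)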
+          const errorMessage = promptError.message
+          if (errorMessage.includes("agent.name") || errorMessage.includes("undefined")) {
+            return `❌ Agent "${agentToUse}" not found. Make sure the agent is registered in your opencode.json or provided by a plugin.\n\nSession ID: ${sessionID}`
+          }
+          return `❌ Failed to send prompt: ${errorMessage}\n\nSession ID: ${sessionID}`
+        }
+
+        // Poll for session completion
+        const POLL_INTERVAL_MS = 500
+        const MAX_POLL_TIME_MS = 10 * 60 * 1000
+        const pollStart = Date.now()
+
+        while (Date.now() - pollStart < MAX_POLL_TIME_MS) {
+          await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS))
+
+          const statusResult = await client.session.status()
+          const allStatuses = (statusResult.data ?? {}) as Record<string, { type: string }>
+          const sessionStatus = allStatuses[sessionID]
+
+          // Break if session is idle OR no longer in status (completed and removed)
+          if (!sessionStatus || sessionStatus.type === "idle") {
+            break
+          }
+        }
+
+        const messagesResult = await client.session.messages({
+          path: { id: sessionID },
+        })
+
+        // Clean up first so the early returns below cannot leak the toast or the session entry
+        if (toastManager) {
+          toastManager.removeTask(taskId)
+        }
+        subagentSessions.delete(sessionID)
+
+        if (messagesResult.error) {
+          return `❌ Error fetching result: ${messagesResult.error}\n\nSession ID: ${sessionID}`
+        }
+
+        const messages = ((messagesResult as { data?: unknown }).data ?? messagesResult) as Array<{
+          info?: { role?: string; time?: { created?: number } }
+          parts?: Array<{ type?: string; text?: string }>
+        }>
+
+        const assistantMessages = messages
+          .filter((m) => m.info?.role === "assistant")
+          .sort((a, b) => (b.info?.time?.created ?? 0) - (a.info?.time?.created ?? 0))
+        const lastMessage = assistantMessages[0]
+
+        if (!lastMessage) {
+          return `❌ No assistant response found.\n\nSession ID: ${sessionID}`
+        }
+
+        const textParts = lastMessage.parts?.filter((p) => p.type === "text") ?? []
+        const textContent = textParts.map((p) => p.text ?? "").filter(Boolean).join("\n")
+
+        const duration = formatDuration(startTime)
+
+        return `Task completed in ${duration}.
+
+Agent: ${agentToUse}${args.category ? ` (category: ${args.category})` : ""}
+Session ID: ${sessionID}
+
+---
+
+${textContent || "(No text output)"}`
+      } catch (error) {
+        if (toastManager && taskId !== undefined) {
+          toastManager.removeTask(taskId)
+        }
+        if (syncSessionID) {
+          subagentSessions.delete(syncSessionID)
+        }
+        const message = error instanceof Error ? error.message : String(error)
+        return `❌ Task failed: ${message}`
+      }
+    },
+  })
+}
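Review note: the resume path and the synchronous path in the file above both select the newest assistant message and join its text parts. A minimal sketch of that shared step as a single helper follows, assuming the message shape from the inline type in the diff. The names `SessionMessage` and `extractLastAssistantText` are hypothetical and not part of this PR.

```typescript
// Hypothetical helper; the shape mirrors the inline message type used in the diff.
type SessionMessage = {
  info?: { role?: string; time?: { created?: number } }
  parts?: Array<{ type?: string; text?: string }>
}

// Returns the joined text of the newest assistant message,
// or undefined when the session holds no assistant message at all.
function extractLastAssistantText(messages: SessionMessage[]): string | undefined {
  const lastMessage = messages
    .filter((m) => m.info?.role === "assistant")
    // Newest first; messages without a timestamp sort as oldest.
    .sort((a, b) => (b.info?.time?.created ?? 0) - (a.info?.time?.created ?? 0))[0]

  if (!lastMessage) return undefined

  return (lastMessage.parts ?? [])
    .filter((p) => p.type === "text")
    .map((p) => p.text ?? "")
    .filter(Boolean) // drop empty text parts
    .join("\n")
}

// Usage: an empty string means "assistant replied without text",
// undefined means "no assistant reply found".
const sample: SessionMessage[] = [
  { info: { role: "user", time: { created: 1 } }, parts: [{ type: "text", text: "hi" }] },
  { info: { role: "assistant", time: { created: 2 } }, parts: [{ type: "text", text: "done" }] },
]
console.log(extractLastAssistantText(sample) ?? "(no assistant response)")
```

Returning `undefined` separately from an empty string would let both call sites keep their distinct "No assistant response found" and "(No text output)" messages.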
--- a/src/tools/sisyphus-task/types.ts
+++ b/src/tools/sisyphus-task/types.ts
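@@ -0,0 +1,16 @@
+export interface SisyphusTaskArgs {
+  // Short task label; used in the session title and toast
+  description: string
+  // Prompt text sent to the selected agent
+  prompt: string
+  // Category name; mutually exclusive with subagent_type
+  category?: string
+  // Name of a non-primary agent; mutually exclusive with category
+  subagent_type?: string
+  // Whether to launch through the background manager instead of waiting for the result
+  run_in_background: boolean
+  // ID of an existing session to resume
+  resume?: string
+  // Skill names passed along with the task
+  skills: string[]
+}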
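Review note: `category` and `subagent_type` are both optional in the interface, yet the tool body requires exactly one of them for a fresh launch. The sketch below gathers those checks in one place, assuming the same rules as the diff. `validateTarget` is a hypothetical name, and the category and agent names in the usage lines are placeholders. The resume branch returns before these checks in the diff, so the sketch covers fresh launches only.

```typescript
// Same shape as the interface in types.ts, repeated so the sketch is self-contained.
interface SisyphusTaskArgs {
  description: string
  prompt: string
  category?: string
  subagent_type?: string
  run_in_background: boolean
  resume?: string
  skills: string[]
}

// Hypothetical helper: returns an error message, or undefined when the target is valid.
function validateTarget(
  args: Pick<SisyphusTaskArgs, "category" | "subagent_type">,
): string | undefined {
  if (args.category && args.subagent_type) {
    return "Provide EITHER category OR subagent_type, not both."
  }
  if (!args.category && !args.subagent_type) {
    return "Must provide either category or subagent_type."
  }
  // A whitespace-only agent name passes the truthiness checks above, so trim it here.
  if (!args.category && args.subagent_type?.trim() === "") {
    return "Agent name cannot be empty."
  }
  return undefined
}

// Usage with placeholder names:
console.log(validateTarget({ category: "some-category" }))
console.log(validateTarget({ category: "some-category", subagent_type: "some-agent" }))
```

A discriminated union type could enforce the either-or rule at compile time, but tool arguments arrive at runtime, so a runtime check like this is still needed.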