Desktop Automation
Desktop Automation in Witsy allows the AI to move beyond the chat box and interact directly with your operating system. Whether it's managing files, executing terminal commands, or using "Computer Use" capabilities to control your mouse and keyboard, Witsy transforms from a chatbot into a hands-on assistant.
Enabling Computer Use via MCP
The most powerful way to automate your desktop is by using Model Context Protocol (MCP) servers. For example, you can use the Anthropic "Computer Use" implementation to allow the AI to see your screen and interact with UI elements.
How to set up Computer Use
- Install an MCP Server: Download or configure an MCP server designed for desktop interaction (like the computer-use server).
- Add to Witsy: Navigate to Settings > MCP and add the server configuration.
- Grant Permissions: Ensure Witsy has "Accessibility" and "Screen Recording" permissions in macOS System Settings, or the equivalent access on Windows.
- Select a Capable Model: Use a model that supports vision and tool use, such as claude-3-5-sonnet.
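As a rough illustration of the "server configuration" step, many MCP clients describe a locally launched server as a command plus arguments, often grouped under an mcpServers key. The sketch below builds such an entry and prints it as JSON. Whether Witsy's Settings > MCP screen accepts this exact shape is an assumption, and the server name, command, and package name are placeholders rather than a real published server.

```python
import json

# Hypothetical entry: "computer-use", "npx", and "example-computer-use-mcp"
# are placeholders for illustration, not a real published MCP server.
server_config = {
    "mcpServers": {
        "computer-use": {
            "command": "npx",
            "args": ["-y", "example-computer-use-mcp"],
        }
    }
}

# Print the entry as JSON so it can be compared with what the settings screen asks for.
print(json.dumps(server_config, indent=2))
```

Check the documentation of the server you actually install for its real launch command and arguments.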
Managing Files via the CLI Plugin
Witsy includes a built-in CLI and Filesystem plugin that allows the AI to read, write, and organize files within a specific directory. This is safer than full system access as it restricts the AI to a "workspace."
How to automate file operations
When starting a conversation or using the API, you can define a workDir.
- Prompt Example: "Look at the CSV files in my current folder, calculate the total revenue, and create a new summary.txt file with the results."
- How it works: Witsy translates this into terminal commands and filesystem calls, executing them in your designated project folder.
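To make the prompt example concrete, here is a minimal Python sketch of the kind of work the AI would end up doing inside the workDir: reading each CSV, summing a column, and writing summary.txt. It is not Witsy's implementation. The function name summarize_revenue and the column name "revenue" are assumptions made for illustration.

```python
import csv
from pathlib import Path


def summarize_revenue(work_dir: str, column: str = "revenue") -> float:
    """Sum one numeric column across every CSV in work_dir and write summary.txt."""
    folder = Path(work_dir)
    total = 0.0
    for csv_path in sorted(folder.glob("*.csv")):
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                value = (row.get(column) or "").strip()
                if value:  # skip blank cells and files without the column
                    total += float(value)
    # The summary lands in the same folder, so nothing is written outside it.
    (folder / "summary.txt").write_text(f"Total revenue: {total:.2f}\n")
    return total
```

Calling summarize_revenue(".") from a project folder would produce a summary.txt next to the CSV files.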
Recipe: Automating Web Browser Tasks
You can use Witsy to perform repetitive tasks in your browser by combining Desktop Automation with its vision capabilities.
Goal: Automatically extract data from a local legacy application and paste it into a web form.
- Open the App: Have your target application visible on the screen.
- The Prompt: "Take a screenshot of the 'Inventory Manager' app. Extract the SKU and Price for the first item, then open Chrome to https://internal-portal.com and fill out the replenishment form with those details."
- Execution: The AI will capture the screen, parse the text using vision, move the cursor to your browser, and type the data.
Programmatic Automation via HTTP API
Witsy exposes an HTTP API that allows you to trigger desktop automations from external scripts (Python, Bash, or JS).
How to trigger a task from the terminal
First, ensure "Enable HTTP Endpoints" is toggled on in Settings > General.
You can then send a request to the local Witsy server to run a completion that includes desktop tools:
curl -X POST http://localhost:4321/api/complete \
  -H "Content-Type: application/json" \
  -d '{
    "engine": "anthropic",
    "model": "claude-3-5-sonnet",
    "thread": [{
      "role": "user",
      "content": "Create a new folder named \"Invoices\" and move all PDFs from Downloads into it."
    }]
  }'
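The same request can be sent from a Python script using only the standard library. The sketch below reuses the URL, engine, model, and thread fields from the curl example. The helper names build_request and run_task are illustrative, and the response is returned as raw text because its format is not described here.

```python
import json
import urllib.request

# Endpoint copied from the curl example; adjust if your local setup differs.
WITSY_URL = "http://localhost:4321/api/complete"


def build_request(prompt: str, engine: str = "anthropic",
                  model: str = "claude-3-5-sonnet") -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    payload = {
        "engine": engine,
        "model": model,
        "thread": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        WITSY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_task(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the raw response body (format not assumed)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With HTTP endpoints enabled, run_task("Create a new folder named 'Invoices' and move all PDFs from Downloads into it.") would submit the same task as the curl command. Building the body with json.dumps also avoids shell quoting problems in the prompt text.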
Recipe: Automated Software Testing
Use Witsy to perform "smoke tests" on your own software projects.
- Setup: Point Witsy's workDir to your source code.
- The Prompt: "Run the command npm run dev. Once the server is up, use the browser tool to navigate to localhost:3000. Verify the login button is visible. If not, check src/components/Login.vue for errors."
- Result: Witsy will attempt to start your app, visually verify the UI, and investigate the code if the UI check fails.
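The "once the server is up" step is the part most worth scripting yourself, for example to confirm the dev server is reachable before handing the visual check to Witsy. The sketch below polls a URL until it answers. The function name wait_for_server and the default timings are assumptions, and it covers only reachability, not the visual check of the login button.

```python
import time
import urllib.error
import urllib.request

# Bypass any system proxy, since the target is a local dev server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll url until it answers with any HTTP response, or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server is up even if it returned 4xx/5xx
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not listening yet; retry
    return False
```

For the recipe above, wait_for_server("http://localhost:3000") would return True once the app started by npm run dev begins accepting connections.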
Security and Safety Best Practices
Desktop automation is powerful but requires caution. Follow these guidelines to stay safe:
- Use WorkDirs: Always restrict filesystem access to a specific folder rather than your entire home directory.
- Human-in-the-Loop: For "Computer Use" tasks involving mouse clicks or deletions, keep the Witsy window visible so you can monitor the AI's actions in real time.
- API Security: Only enable HTTP endpoints on trusted networks, because any client that can reach them can submit prompts that run on your machine.
- Read-Only Mode: When exploring new MCP servers, check if they offer a "read-only" configuration to prevent accidental data modification.
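To illustrate what restricting access to a workDir means in practice, the sketch below shows a typical path containment check: resolve the requested path and confirm it still sits under the workspace root. It is not how Witsy enforces workDir. The function name is_inside_work_dir is an assumption, useful mainly if you write your own scripts or MCP tools around the same idea.

```python
from pathlib import Path


def is_inside_work_dir(work_dir: str, candidate: str) -> bool:
    """Return True only if candidate resolves to a location under work_dir."""
    root = Path(work_dir).resolve()
    # Joining with an absolute candidate yields the candidate itself;
    # resolve() then collapses ".." segments and follows symlinks.
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

A request for "reports/summary.txt" passes this check, while "../outside.txt" or an absolute path elsewhere on disk does not.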