Using Vision and File Attachments
Witsy's support for multi-modal interactions allows you to go beyond simple text. You can analyze UI mockups, summarize long PDF documents, or extract data from spreadsheets by simply attaching them to your chat.
Analyzing Images (Vision)
Vision-capable models (like GPT-4o, Claude 3.5 Sonnet, or Gemini 1.5 Pro) can "see" and describe the contents of images.
How to analyze an image
- Select a Vision Model: Ensure your current chat is using a model that supports vision.
- Add the Image:
  - Drag and Drop: Drag an image file directly into the Witsy chat window.
  - Clipboard: Copy an image or take a screenshot and paste it (Cmd+V or Ctrl+V) directly into the input field.
  - Attachment Icon: Click the paperclip icon in the chat input area to browse your files.
- Enter your Prompt: Ask a question about the image (e.g., "What is written on this sign?" or "Convert this hand-drawn chart into a Markdown table").
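To make the steps above concrete, here is a minimal sketch of what a combined "prompt plus image" request can look like once it reaches a vision model. It assumes the OpenAI-style chat message format, where an image travels as a base64 data URL next to the text. The type names and the `buildVisionMessage` helper are hypothetical and are not taken from Witsy's code. Other providers use different payload shapes.

```typescript
// Hypothetical types modeled on the OpenAI-style chat message format.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// MIME types for the image formats this guide lists (PNG, JPG, WebP).
type ImageMime = "image/png" | "image/jpeg" | "image/webp";

// Build one user message that carries both the prompt and the image.
// `base64Image` is the already-encoded file content, without any prefix.
function buildVisionMessage(
  prompt: string,
  base64Image: string,
  mime: ImageMime
): UserMessage {
  if (prompt.trim() === "") throw new Error("prompt must not be empty");
  if (base64Image === "") throw new Error("image data must not be empty");
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "image_url", image_url: { url: `data:${mime};base64,${base64Image}` } },
    ],
  };
}

// Example with placeholder image data (only the first bytes of a PNG header).
const signQuestion = buildVisionMessage(
  "What is written on this sign?",
  "iVBORw0KGgo=",
  "image/png"
);
```

The prompt and the image sit in the same message, which is why the model can answer the question about that specific image.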
Recipe: UI to Code
One of the most powerful uses of Vision is converting designs to code.
- Action: Take a screenshot of a website or app UI.
- Attachment: Paste the screenshot into Witsy.
- Prompt: "Write the React and Tailwind CSS code to recreate this navigation bar exactly as it appears in the image."
Working with File Attachments
Witsy supports a wide range of file types, including PDFs, text files, and source code files. Depending on the model and the file type, Witsy will either extract the text content or provide the file as context for the LLM.
How to attach files
- Click the Attachment icon (paperclip) in the chat input bar.
- Select one or multiple files from your computer.
- The files will appear as "chips" above the input area. You can click the X on any chip to remove it before sending.
Recipe: Summarizing long documents
Instead of reading a 50-page PDF, you can have Witsy extract the key points.
- Attachment: Upload a PDF document.
- Prompt: "Give me a bulleted summary of the executive summary and the financial projections found in this document."
Recipe: Debugging code files
You can upload entire source files to give the AI full context of your logic.
- Attachment: Attach api_service.ts.
- Prompt: "I'm getting a timeout error in the connectWithTimeout function. Based on the attached code, what is the most likely cause?"
Supported File Types
| Type | Action |
| :--- | :--- |
| Images (PNG, JPG, WebP) | Analyzed via Vision (requires a Vision-capable model). |
| Documents (PDF, DOCX) | Text is extracted and sent as context. |
| Data (CSV, JSON) | Sent as structured text for analysis or transformation. |
| Code (.ts, .py, .js, etc.) | Sent as raw text, allowing the AI to read and debug. |
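The table can be read as a simple lookup from file extension to handling. The sketch below illustrates that mapping only. The `Handling` labels and the `handlingFor` function are hypothetical names, not Witsy's actual implementation, and treating `.jpeg` the same as `.jpg` is an assumption.

```typescript
// Hypothetical labels for the four rows of the table, plus a fallback.
type Handling = "vision" | "extract-text" | "structured-text" | "raw-text" | "unknown";

// Extension-to-handling lookup mirroring the table.
// A Map avoids accidental hits on inherited object properties.
const HANDLING_BY_EXTENSION = new Map<string, Handling>([
  ["png", "vision"],
  ["jpg", "vision"],
  ["jpeg", "vision"], // assumption: same format as JPG
  ["webp", "vision"],
  ["pdf", "extract-text"],
  ["docx", "extract-text"],
  ["csv", "structured-text"],
  ["json", "structured-text"],
  ["ts", "raw-text"],
  ["py", "raw-text"],
  ["js", "raw-text"],
]);

// Classify a filename by its (case-insensitive) extension.
function handlingFor(filename: string): Handling {
  const dot = filename.lastIndexOf(".");
  // No extension, a leading-dot name, or a trailing dot: nothing to look up.
  if (dot <= 0 || dot === filename.length - 1) return "unknown";
  const ext = filename.slice(dot + 1).toLowerCase();
  return HANDLING_BY_EXTENSION.get(ext) ?? "unknown";
}

const screenshotHandling = handlingFor("mockup.PNG");
const sourceHandling = handlingFor("api_service.ts");
```

Anything that lands in "vision" still needs a Vision-capable model selected in the chat.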
Tips for Best Results
Use "Describe an Image" for Complex Visuals
If an image is very complex, start by asking: "Describe this image in detail." Once the model has generated a text description, subsequent questions about the image often become more accurate because the model can refer back to its own detailed description.
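The describe-first approach works because the description becomes part of the conversation history that the model reads on later turns. Here is a minimal sketch of that history, using a simplified, hypothetical `Turn` type (the image itself would be attached to the first turn and is omitted here).

```typescript
type Role = "user" | "assistant";
interface Turn {
  role: Role;
  content: string;
}

const DESCRIBE_PROMPT = "Describe this image in detail.";

// First turn: ask only for a description.
function startDescribeFirst(): Turn[] {
  return [{ role: "user", content: DESCRIBE_PROMPT }];
}

// Later turn: keep the model's description in the history, then ask the
// real question so the model can refer back to its own description.
function askFollowUp(history: Turn[], modelDescription: string, question: string): Turn[] {
  return [
    ...history,
    { role: "assistant", content: modelDescription },
    { role: "user", content: question },
  ];
}

const conversation = askFollowUp(
  startDescribeFirst(),
  "(the model's detailed description goes here)",
  "Which label sits next to the largest bar in the chart?"
);
```

In Witsy you get the same effect by staying in the same chat for your follow-up questions.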
Watch the Context Window
Large files (like long books or huge log files) consume many tokens, the units in which a model measures text. If you attach multiple large files, you might hit the model's context limit. If the model seems to "forget" the beginning of the conversation, try starting a new chat with only the most relevant files.
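If you want a rough idea of whether a set of files will fit before attaching them, a back-of-the-envelope estimate is usually enough. The sketch below assumes the common rule of thumb of about 4 characters per token for English text. Real tokenizers vary by model and language, and the function names, the `reserve` parameter, and the limits used here are all hypothetical.

```typescript
// Rule-of-thumb assumption: roughly 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

interface Attachment {
  name: string;
  text: string;
}

interface BudgetReport {
  perFile: Array<{ name: string; tokens: number }>;
  total: number;
  fits: boolean;
}

// Approximate token count from character length (not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Sum the estimates and compare against the model's context limit,
// keeping `reserve` tokens free for your prompt and the model's reply.
function checkBudget(
  attachments: Attachment[],
  contextLimit: number,
  reserve: number = 1000
): BudgetReport {
  const perFile = attachments.map((a) => ({ name: a.name, tokens: estimateTokens(a.text) }));
  const total = perFile.reduce((sum, f) => sum + f.tokens, 0);
  return { perFile, total, fits: total + reserve <= contextLimit };
}

// Example: a 40,000-character log is estimated at 10,000 tokens.
const report = checkBudget([{ name: "server.log", text: "x".repeat(40000) }], 8000);
```

When `fits` is false, the per-file numbers show which attachment to drop or trim first.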
OCR (Optical Character Recognition)
If you have a PDF that is just a scan (images of text), use a Vision-capable model. Witsy can use the model's vision capabilities to perform OCR and extract text that standard PDF text extraction would miss.
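One way to tell whether a PDF is a scan is to look at how much text ordinary extraction recovers from it. The heuristic below is only a sketch: the `looksScanned` name and the 50-characters-per-page threshold are assumptions for illustration, and it does not describe how Witsy itself decides.

```typescript
// Heuristic: a PDF with almost no extractable text per page is probably
// a scan (pages stored as images), so OCR via a vision model is needed.
function looksScanned(
  extractedText: string,
  pageCount: number,
  minCharsPerPage: number = 50 // assumed threshold, tune for your documents
): boolean {
  if (pageCount <= 0) throw new Error("pageCount must be positive");
  // Ignore whitespace so blank lines and page breaks do not count as content.
  const visibleChars = extractedText.replace(/\s+/g, "").length;
  return visibleChars / pageCount < minCharsPerPage;
}

// A 10-page file that yields only whitespace is flagged as a likely scan.
const needsVision = looksScanned("  \n\n  ", 10);
```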