Generating Images and Video

Setting Up Image and Video Providers

Witsy follows a "Bring Your Own Key" (BYOK) model. Before generating visual content, you must configure at least one supported provider in the settings.

1. Open Settings in Witsy.
2. Navigate to the AI Providers tab.
3. Enter the API key for each service you want to use:
- OpenAI: For DALL-E 3 (Images) and Sora (Video).
- Replicate / fal.ai: Well suited to specialized models like Flux, Stable Diffusion XL, or CogVideo.
- Google: For Imagen models.
- Stable Diffusion WebUI: For local generation if you have a powerful GPU.
4. Ensure the "Image Creation" or "Video Creation" capability is enabled for that provider.
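
Before pasting a key into Witsy, it can help to confirm that the key itself is valid. The sketch below is not part of Witsy; it assumes an OpenAI key and uses OpenAI's model-listing endpoint (GET /v1/models) with only the Python standard library. The function names build_key_check_request and key_is_accepted are illustrative.

```python
import urllib.error
import urllib.request

OPENAI_MODELS_URL = "https://api.openai.com/v1/models"


def build_key_check_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists the models a key can access."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    return urllib.request.Request(
        OPENAI_MODELS_URL,
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )


def key_is_accepted(api_key: str) -> bool:
    """Send the request: True on HTTP 200, False if the key is rejected (HTTP 401)."""
    try:
        with urllib.request.urlopen(build_key_check_request(api_key), timeout=30) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return False
        raise  # other errors (rate limits, outages) are not a verdict on the key
```

Calling key_is_accepted("sk-...") with your own key performs one network request; other providers use different endpoints and would need their own check.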

How to Generate Images from Text

Generating images in Witsy is as simple as asking the assistant.

Basic Image Generation

1. Start a new chat.
2. Ensure an image-capable engine is selected (or a model with the Image Plugin enabled).
3. Type your prompt directly:

"Generate an image of a futuristic library built inside a giant oak tree, digital art style."

4. Witsy processes the request and displays the generated image directly in the chat window.
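
For readers curious what a text-to-image request looks like outside the chat window, here is a minimal sketch against OpenAI's Images API (POST /v1/images/generations) with the dall-e-3 model. It does not describe how Witsy is implemented internally; the helper names are illustrative, and the size list reflects the sizes DALL-E 3 accepts.

```python
import json
import urllib.request

IMAGES_URL = "https://api.openai.com/v1/images/generations"
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}


def build_image_request(api_key: str, prompt: str, size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) a DALL-E 3 generation request."""
    if size not in DALLE3_SIZES:
        raise ValueError(f"size must be one of {sorted(DALLE3_SIZES)}")
    body = {"model": "dall-e-3", "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        IMAGES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image_url(api_key: str, prompt: str, size: str = "1024x1024") -> str:
    """Send the request and return the URL of the first generated image."""
    request = build_image_request(api_key, prompt, size)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)["data"][0]["url"]
```

Splitting request construction from sending keeps the payload easy to inspect before any API credit is spent.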

Tips for Better Prompts

Be Descriptive: Instead of "a cat," try "a fluffy Maine Coon cat wearing a space helmet, cinematic lighting, 8k resolution."
Specify Style: Mention styles like "oil painting," "vector art," "photography," or "3D render."
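Aspect Ratios: If your provider supports it (like fal.ai or Replicate), you can request a ratio such as 16:9. The --ar 16:9 flag is Midjourney syntax, so a given model may not honor it; stating the ratio in plain words is a safer fallback.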
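
The tips above can be turned into a small helper if you build prompts programmatically. This is only a sketch: build_prompt and size_for_ratio are hypothetical names, and the choice of a 1024-pixel long side snapped to multiples of 8 is an assumption that suits many diffusion models but not all.

```python
from typing import Iterable, Optional, Tuple


def build_prompt(subject: str, style: Optional[str] = None, details: Iterable[str] = ()) -> str:
    """Join a subject, extra descriptive details, and an optional style into one prompt."""
    parts = [subject.strip()]
    parts.extend(d.strip() for d in details if d.strip())
    if style and style.strip():
        parts.append(f"{style.strip()} style")
    return ", ".join(parts)


def size_for_ratio(ratio: str, long_side: int = 1024, multiple: int = 8) -> Tuple[int, int]:
    """Convert a ratio such as '16:9' into (width, height) with the longer side = long_side."""
    w_part, h_part = (int(p) for p in ratio.split(":"))
    if w_part <= 0 or h_part <= 0:
        raise ValueError("ratio parts must be positive")
    scale = long_side / max(w_part, h_part)

    def snap(value: int) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(w_part), snap(h_part)


print(build_prompt("a fluffy Maine Coon cat wearing a space helmet",
                   style="oil painting", details=["cinematic lighting"]))
print(size_for_ratio("16:9"))  # (1024, 576)
```

Explicit pixel dimensions are useful with providers that accept width and height parameters instead of a ratio.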

How to Edit Images (Image-to-Image)

Witsy supports image-to-image workflows, allowing you to modify existing visuals.

1. Upload the Source: Drag and drop an image into the chat input or click the attachment icon.
2. Provide Instructions: Tell the AI what to change.
- Example: "Change the background of this photo to a snowy mountain range."
- Example: "Turn this sketch into a realistic professional logo."
3. The assistant uses the uploaded image as a reference to generate the new version.
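
To make the image-to-image idea concrete, the following sketch builds the equivalent request for a local Stable Diffusion WebUI (AUTOMATIC1111) started with --api, using its /sdapi/v1/img2img endpoint. Whether Witsy sends exactly this payload is not something this guide states; the function name and the default denoising_strength of 0.6 are assumptions for illustration.

```python
import base64
import json
import urllib.request


def build_img2img_request(
    image_bytes: bytes,
    instruction: str,
    base_url: str = "http://127.0.0.1:7860",
    denoising_strength: float = 0.6,
) -> urllib.request.Request:
    """Build (but do not send) an img2img request for the WebUI API."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    body = {
        # The API expects source images as base64-encoded strings.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": instruction,
        # Lower values stay closer to the source; higher values change more.
        "denoising_strength": denoising_strength,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/sdapi/v1/img2img",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A typical call reads the source file with open("photo.png", "rb").read() and passes the instruction text as the prompt.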

How to Generate Video

You can create short cinematic clips using text or existing images.

Text-to-Video

1. Select a provider that supports video (e.g., OpenAI, Replicate, or fal.ai).
2. Use a prompt describing the movement:

"Create a 5-second video of waves crashing against a lighthouse during a thunderstorm, slow motion."

3. Witsy returns a video file that you can play or save locally.
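
Video generation usually takes longer than image generation, and many hosted video models run as asynchronous jobs that a client has to poll. The sketch below shows that generic submit-then-poll pattern. Everything provider-specific is hypothetical: fetch_status stands in for whatever call returns the job state, and the status values "succeeded" and "failed" and the video_url field are placeholders, not a particular provider's schema.

```python
import time
from typing import Callable, Dict


def wait_for_video(
    fetch_status: Callable[[], Dict],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a job until it finishes and return the URL of the resulting video."""
    deadline = clock() + timeout_seconds
    while True:
        job = fetch_status()  # hypothetical: one status lookup per call
        status = job.get("status")
        if status == "succeeded":
            return job["video_url"]
        if status == "failed":
            raise RuntimeError(job.get("error", "video generation failed"))
        if clock() >= deadline:
            raise TimeoutError("video job did not finish in time")
        sleep(poll_seconds)
```

Passing sleep and clock as parameters keeps the loop testable without real waiting.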

Image-to-Video (Animating Images)

1. Attach an image to the chat.
2. Ask the assistant to animate it:

"Animate this portrait so the person is smiling and nodding."
"Make the clouds in this landscape move slowly to the left."

Using Local Stable Diffusion

If you prefer to run generations locally to save on API costs or for privacy:

1. Start Stable Diffusion WebUI (AUTOMATIC1111) with the --api flag.
2. In Witsy Settings > AI Providers, select Stable Diffusion WebUI.
3. Enter the local URL (usually http://127.0.0.1:7860).
4. From then on, when you prompt for images, Witsy routes the request to your local machine instead of a cloud provider.
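
To check that the local server is reachable before pointing Witsy at it, you can call the WebUI API yourself. The sketch below targets the /sdapi/v1/txt2img endpoint that the --api flag exposes and decodes the base64 images in the response. The helper names, the 512x512 size, and the 20 steps are illustrative defaults, not Witsy settings.

```python
import base64
import json
import urllib.request
from typing import Dict, List


def build_txt2img_request(
    prompt: str,
    base_url: str = "http://127.0.0.1:7860",
    width: int = 512,
    height: int = 512,
    steps: int = 20,
) -> urllib.request.Request:
    """Build (but do not send) a txt2img request for the WebUI API."""
    body = {"prompt": prompt, "width": width, "height": height, "steps": steps}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/sdapi/v1/txt2img",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response_json: Dict) -> List[bytes]:
    """Turn the base64 strings in the response's 'images' list into raw image bytes."""
    # Tolerate an optional 'data:image/png;base64,' prefix by keeping what follows the comma.
    return [base64.b64decode(img.split(",", 1)[-1]) for img in response_json.get("images", [])]


def generate_locally(prompt: str, out_path: str = "output.png") -> str:
    """Send the request to the local server and save the first image to out_path."""
    with urllib.request.urlopen(build_txt2img_request(prompt), timeout=300) as resp:
        images = decode_images(json.load(resp))
    if not images:
        raise RuntimeError("server returned no images")
    with open(out_path, "wb") as fh:
        fh.write(images[0])
    return out_path
```

If generate_locally("a lighthouse at dusk") writes a file, the URL is correct and the API is enabled.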

Common Recipes

Recipe: Generating Consistent Characters

To keep a character consistent across multiple images:

1. Generate the first image.
2. Copy the "Seed" number (if the engine provides it in the message details).
3. In your next prompt, refer to the previous image and include the seed:

"Using the same character from the previous image, show them now sitting in a coffee shop, same style, seed: [12345]."

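When working against the Stable Diffusion WebUI API directly, the seed can be read from the response and reused. This is a sketch under the assumption that the response carries an "info" field holding a JSON-encoded string with a "seed" key, which is how the AUTOMATIC1111 API reports generation details; the function names are illustrative, and other providers report seeds differently or not at all.

```python
import json
from typing import Dict


def extract_seed(response_json: Dict) -> int:
    """Read the seed used for a generation from the response's 'info' field."""
    info = response_json["info"]
    if isinstance(info, str):  # the WebUI returns 'info' as a JSON string
        info = json.loads(info)
    return int(info["seed"])


def follow_up_payload(prompt: str, seed: int) -> Dict:
    """Build a txt2img payload that pins the seed from an earlier generation."""
    return {"prompt": prompt, "seed": seed}


first_response = {"images": [], "info": json.dumps({"seed": 12345})}
payload = follow_up_payload(
    "the same character sitting in a coffee shop, same style",
    extract_seed(first_response),
)
print(payload["seed"])  # 12345
```

A pinned seed makes a generation reproducible when every other setting is identical. With a changed prompt it only tends to preserve overall look, so expect to iterate.
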
Recipe: Creating Assets for Presentations

1. Ask for a specific style: "Create a set of 4 flat-design icons for 'Cloud Computing', 'Security', 'Database', and 'Global Network'."
2. Once the images are generated, right-click them to Save As or Copy to Clipboard for use in your slides.