
# Abilities

An **Ability** is a preconfigured AI task that analyzes visual media and returns structured output.

Abilities allow developers to add visual intelligence to applications without training or deploying their own models.

Typical workflow:

* Create your Ability on EyePop.ai
* Image/Video/Live Stream → Ability → Structured Output

**For examples of Abilities and their output, see the** [**Abilities Hub**](https://eyepop.ai/abilities)

***

## What is an Ability?

An Ability is a specific visual analysis capability such as:

* Detecting objects
* Classifying scenes
* Extracting structured information from images
* Understanding events in video

Developers call an Ability using the EyePop API and receive structured results such as bounding boxes, classifications, or extracted text.

### Anatomy of an Ability

An Ability is defined by a small set of configuration parameters that control how visual media is analyzed. These parameters determine how the model interprets the task, how much compute is used, and how frequently media is analyzed.

#### Ability Components

| Field       | Description                                                                                                                           |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| name        | Unique identifier for the Ability. Used when calling the API.                                                                         |
| description | Human-readable explanation of the task the Ability performs. This is used by the Prompt Creation Agent to generate or refine prompts. |
| image\_size | The resolution images are resized to before being analyzed. Smaller sizes reduce compute cost and increase speed.                     |
| prompt      | The instructions given to the vision-language model describing what it should detect or classify.                                     |
| model       | The underlying AI model used to perform the analysis.                                                                                 |
| fps         | Frames per second to analyze when processing video or livestreams. Controls how frequently frames are sampled.                        |
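The six fields can be pictured as a single configuration record. The sketch below is a hypothetical Python representation for illustration only. It is not the EyePop API schema or SDK; the class name `AbilityConfig`, the validation rules, and the example name `my-namespace.ppe.helmet-check` are assumptions, while the other example values are taken from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbilityConfig:
    """Hypothetical in-memory representation of an Ability's settings."""

    name: str          # unique identifier used when calling the API
    description: str   # task summary, used by the Prompt Creation Agent
    image_size: int    # resolution (px) media is resized to before inference
    prompt: str        # instructions for the vision-language model
    model: str         # underlying model, e.g. "qwen3-instruct"
    fps: float         # frames analyzed per second of video or livestream

    def __post_init__(self) -> None:
        # Basic sanity checks for this sketch; the platform may enforce other rules.
        if not self.name:
            raise ValueError("name must not be empty")
        if self.image_size <= 0:
            raise ValueError("image_size must be positive")
        if self.fps <= 0:
            raise ValueError("fps must be positive")


helmet_check = AbilityConfig(
    name="my-namespace.ppe.helmet-check",  # hypothetical name
    description="Detect whether a person in the image is wearing a construction helmet.",
    image_size=640,
    prompt=(
        "Determine whether a person in the frame is wearing a safety helmet.\n"
        'Return exactly one label from: ["helmet", "no_helmet", "NO"].'
    ),
    model="qwen3-instruct",
    fps=5,
)
```

Making the record frozen reflects that an Ability is a fixed, preconfigured task: changing a setting means defining a new configuration rather than mutating one in place.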

#### Name

The name uniquely identifies the Ability.

Example:

```
<your namespace>.structured-ocr.read-drivers-license
```

Names are used in API calls and should clearly reflect the task being performed.

#### Description

The description explains the purpose of the Ability.

This field is especially important because it is used by the Prompt Creation Agent to help generate or improve prompts.

Example:

```
Detect whether a person in the image is wearing a construction helmet.
```

Good descriptions are:

* clear
* specific
* task-oriented

Avoid vague descriptions such as:

```
Analyze the image
```

#### Image Size

The image\_size parameter controls how images are resized before inference.

Example:

```
image_size: 640
```

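Reducing image size:

* decreases compute usage
* increases inference speed
* often improves detection consistency

The tradeoff is detail: accuracy typically increases with larger image sizes, so tasks that depend on fine detail, such as document analysis, need larger values.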

Typical production sizes:

| Use Case            | Image Size (px) |
| ------------------- | --------------- |
| Object detection    | 512–640         |
| Find event in video | 512–640         |
| Document analysis   | 768–1024        |

#### Prompt

The prompt defines what the model should analyze in the image or video frame. For examples of Ability prompts, see the [Abilities Hub](https://eyepop.ai/abilities).

Example:

```
Determine whether a person in the frame is wearing a safety helmet.
Return exactly one label from: ["helmet", "no_helmet", "NO"].
```

Prompts should:

* clearly define the task
* restrict possible outputs
* avoid unnecessary complexity
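Restricting outputs also pays off on the application side: a fixed label set can be validated with a simple membership check. The sketch below assumes the model's reply arrives as a plain string containing the label; the function name `parse_label` and the normalization (stripping whitespace and surrounding quotes) are illustrative assumptions, not part of the EyePop SDK.

```python
ALLOWED_LABELS = ("helmet", "no_helmet", "NO")


def parse_label(raw_output: str, allowed: tuple[str, ...] = ALLOWED_LABELS) -> str:
    """Return the label if the model output is exactly one allowed label.

    Raises ValueError for anything else, so verbose or malformed replies
    are caught instead of silently flowing into downstream logic.
    """
    # Strip whitespace and any quotes the model may have wrapped around the label.
    label = raw_output.strip().strip('"').strip("'")
    if label not in allowed:
        raise ValueError(f"unexpected model output: {raw_output!r}")
    return label
```

The check is deliberately strict and case-sensitive: if the model starts answering with full sentences, the error surfaces immediately, which is usually a sign the prompt needs tightening.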

#### Model

The model specifies which AI model runs the Ability.

Example:

```
qwen3-instruct
```

Different models may vary in:

* reasoning ability
* speed
* compute cost

Abilities typically use a model optimized for vision-language tasks.

#### FPS (Frames Per Second)

The fps parameter determines how frequently frames are analyzed when processing video or livestreams.

Example:

```
fps: 5
```

This means the Ability will analyze 5 frames per second of video.

Choosing the correct FPS helps balance detection accuracy and compute cost.

| Use Case              | Recommended FPS |
| --------------------- | --------------- |
| Security monitoring   | 2–5             |
| Sports analytics      | 5–10            |
| Industrial monitoring | 1–3             |

Lower FPS reduces compute usage while still capturing most events, as long as each event lasts longer than the interval between analyzed frames (1/fps seconds).
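The arithmetic behind fps is simple: a clip of `duration_s` seconds sampled at `fps` yields about `duration_s × fps` analyzed frames, spaced `1 / fps` seconds apart, and an event shorter than that spacing can fall between two samples. The sketch below assumes frames are sampled at evenly spaced timestamps starting at 0 s; the function names are hypothetical, and EyePop's actual sampling schedule may differ.

```python
def sample_timestamps(duration_s: float, fps: float) -> list[float]:
    """Timestamps (seconds) of frames analyzed when sampling at `fps`.

    Assumes evenly spaced sampling starting at t = 0.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    interval = 1.0 / fps
    count = int(duration_s * fps)  # whole frames that fit in the clip
    return [round(i * interval, 6) for i in range(count)]


def can_miss_event(event_duration_s: float, fps: float) -> bool:
    """True if an event this short could fall entirely between two samples."""
    return event_duration_s < 1.0 / fps
```

For example, at 5 fps a 60-second clip yields 300 analyzed frames, and an event shorter than 0.2 s could be missed entirely.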

#### Helpful Mental Model

You can think of an Ability as three main parts:

```
Ability = Model + Prompt + Media Sampling
```

Where:

* Model determines reasoning capability
* Prompt defines the task
* Sampling (fps + image\_size) controls performance and compute usage

***

## What Tasks Are Abilities Good For?

Abilities are designed for real-world visual intelligence workloads.

Common use cases include:

#### Security and Surveillance

* Person detection
* Intrusion alerts
* PPE detection

#### Retail and Commerce

* Product recognition
* Shelf monitoring
* Customer analytics

#### Sports Analytics

* Player detection
* Action classification
* Event segmentation

#### Document Processing

* Driver's license extraction
* Receipt parsing
* Title or invoice extraction

#### Industrial Automation

* Quality inspection
* Object counting
* Safety compliance monitoring

***

## What is a Compute Unit?

A **Compute Unit (CU)** is the unit used to measure the cost of running AI inference. A single inference can consume a fraction of a CU or several CUs, depending on the Ability's settings (see the estimates below).

The resolution of the source image or video does not affect the compute units used, because media is resized to the image\_size defined in your Ability. The larger you set image\_size, the more compute units each image or frame consumes. Accuracy typically increases with larger image sizes.

Each time an Ability processes an image or frame, compute resources are consumed.

Video workloads consume compute based on the number of frames analyzed.

### Estimating Compute Unit Usage

| Model | Media | Task       | Resolution / Spec     | Est. Compute Units (CU) | Est. Cost ($) |
| ----- | ----- | ---------- | --------------------- | ----------------------- | ------------- |
| QWEN3 | Image | Classify   | 512 x 512             | 0.16                    | $0.008        |
| QWEN3 | Image | Classify   | 640 x 640             | 0.25                    | $0.013        |
| QWEN3 | Image | Classify   | 1000 x 1000           | 0.61                    | $0.030        |
| QWEN3 | Image | Describe   | 512 x 512             | 0.18                    | $0.009        |
| QWEN3 | Image | Describe   | 640 x 640             | 0.27                    | $0.014        |
| QWEN3 | Image | Describe   | 1000 x 1000           | 0.63                    | $0.031        |
| QWEN3 | Image | OCR        | 1000 x 1000           | 0.71                    | $0.035        |
| QWEN3 | Video | Find Event | 640 x 640 (1s @ 1fps) | 0.25                    | $0.013        |
| QWEN3 | Video | Find Event | 640 x 640 (1s @ 5fps) | 1.25                    | $0.063        |
| QWEN3 | Video | Describe   | 640 x 640 (1s @ 1fps) | 0.27                    | $0.014        |
| QWEN3 | Video | Describe   | 640 x 640 (1s @ 5fps) | 1.35                    | $0.068        |
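The table suggests two rules of thumb. Per-frame cost depends on the task and the image size, growing roughly with pixel count (640×640 has about 1.56× the pixels of 512×512, and 0.25 CU is about 1.56× 0.16 CU). Video cost is the per-frame cost multiplied by the number of analyzed frames (1.25 CU at 5 fps is 5 × 0.25 CU). The sketch below encodes the table's estimates in hypothetical Python helpers; the function names and the $0.05-per-CU rate (implied by the Est. Cost column and the overage example) are assumptions, and actual billing may differ by plan.

```python
# Per-frame CU estimates copied from the table above (QWEN3), keyed by (task, image_size).
CU_PER_FRAME = {
    ("classify", 512): 0.16,
    ("classify", 640): 0.25,
    ("classify", 1000): 0.61,
    ("describe", 512): 0.18,
    ("describe", 640): 0.27,
    ("describe", 1000): 0.63,
    ("ocr", 1000): 0.71,
    ("find_event", 640): 0.25,
}

# Assumed rate: the Est. Cost column works out to roughly $0.05 per CU.
USD_PER_CU = 0.05


def estimate_cu(task: str, image_size: int, frames: int = 1) -> float:
    """Estimated CU for analyzing `frames` frames (1 for a single image)."""
    try:
        per_frame = CU_PER_FRAME[(task, image_size)]
    except KeyError:
        raise ValueError(f"no estimate for task={task!r}, image_size={image_size}") from None
    return per_frame * frames


def estimate_video_cu(task: str, image_size: int, duration_s: float, fps: float) -> float:
    """Video cost scales with the number of analyzed frames: duration x fps."""
    return estimate_cu(task, image_size, frames=round(duration_s * fps))


def estimate_cost_usd(cu: float) -> float:
    """Convert CU to dollars at the assumed per-CU rate."""
    return cu * USD_PER_CU
```

Under these assumptions, one minute of video analyzed for events at 640 × 640 and 5 fps comes to roughly 75 CU, or about $3.75; dropping to 1 fps cuts that by a factor of five.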

### Overage Costs

Each EyePop plan includes a monthly allocation of compute units.

If usage exceeds the included amount, additional compute is billed automatically.

Example:

| Included CU | Used CU | Overage                                           |
| ----------- | ------- | ------------------------------------------------- |
| 4,000       | 5,200   | 1,200 units billed at the end of the month (+$60) |

This allows applications to scale without interruption.
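The overage arithmetic in the example is: overage CU = max(0, used − included), billed at a per-CU rate. The sketch below is a hypothetical helper that assumes $0.05 per CU, the rate implied by the example (1,200 CU for $60); check your plan for the actual rate.

```python
def overage(included_cu: float, used_cu: float, usd_per_cu: float = 0.05) -> tuple[float, float]:
    """Return (overage CU, overage charge in USD) for one billing month.

    usd_per_cu defaults to the rate implied by the example above;
    actual rates may vary by plan.
    """
    extra_cu = max(0.0, used_cu - included_cu)  # no charge while under the allocation
    return extra_cu, extra_cu * usd_per_cu
```

With the example's numbers, `overage(4000, 5200)` gives 1,200 CU and a charge of about $60, while any month under the allocation returns zero for both.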

### Optimizing Compute Unit Usage

You can reduce compute costs and improve performance by optimizing inputs.

#### Reduce Image Size

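Because media is resized to the Ability's image\_size before inference, compute cost depends on image\_size rather than on the resolution of the file you send.

Recommended image width:

640px–1280px

Larger image\_size values increase compute cost.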

#### Reduce Video FPS

Typical production settings:

| Use Case              | FPS  |
| --------------------- | ---- |
| Security monitoring   | 2–5  |
| Sports analytics      | 5–10 |
| Industrial monitoring | 1–3  |

#### Crop Regions of Interest

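Instead of analyzing the entire frame, crop the relevant region before sending it to the Ability.

Because media is resized to the Ability's image\_size, cropping devotes that resolution to the region that matters. This improves accuracy and can let you use a smaller image\_size, which reduces compute. Region of interest (ROI) is supported through the SDK.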
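A region of interest can be described as a pixel box. The sketch below is a hypothetical, dependency-free helper (`Box`, `clamp_roi`, and `downscale_factor` are illustrative names, not EyePop SDK functions) that clips an ROI to the frame and compares how much the full frame and the crop are shrunk when resized to image\_size. It assumes the longer side is scaled to image\_size with the aspect ratio preserved; this page does not specify how EyePop resizes.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def clamp_roi(roi: Box, width: int, height: int) -> Box:
    """Clip a region of interest to the image bounds."""
    box = Box(
        left=max(0, min(roi.left, width)),
        top=max(0, min(roi.top, height)),
        right=max(0, min(roi.right, width)),
        bottom=max(0, min(roi.bottom, height)),
    )
    if box.right <= box.left or box.bottom <= box.top:
        raise ValueError("ROI does not overlap the image")
    return box


def downscale_factor(width: int, height: int, image_size: int) -> float:
    """How much the longer side shrinks when resized to image_size.

    Assumes an aspect-ratio-preserving resize of the longer side;
    a result of 1.0 or less means no detail is lost to downscaling.
    """
    return max(width, height) / image_size
```

For a 1920×1080 frame at image\_size 640, the full frame is shrunk 3×, while a 600×400 crop is not shrunk at all, so the model sees the region in full detail. Because `Box` is a tuple, a clamped box can be passed directly to Pillow's `Image.crop()`, which takes a `(left, upper, right, lower)` tuple.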

***

## Ability Prompt Creation Agent

The **Ability Prompt Creation Agent** helps developers generate reliable prompts for visual AI tasks.

Prompt design is one of the hardest parts of working with vision-language models. Small wording changes can significantly affect accuracy, consistency, and cost.

The Prompt Creation Agent analyzes the task you want to perform and generates a production-ready prompt optimized for EyePop Abilities.

### Why It Exists

Vision models are extremely sensitive to how instructions are written.

Poor prompts can cause issues such as:

* inconsistent classifications
* overly verbose outputs
* hallucinated results
* unpredictable formatting
* increased compute cost

The Prompt Creation Agent helps avoid these problems by generating prompts that follow tested patterns.

### How to run the Ability Prompt Creation Agent

*Coming soon to the Dashboard*

