Visual Intelligence
Overview & Key Concept
Traditional computer vision gives you bounding boxes and labels like "person" or "car." EyePop's Visual Intelligence Agent goes beyond detection—it understands and describes what it sees using natural language prompts.
Instead of just knowing "there's a person here," you can ask:
"What's their age range and fashion style?"
"Are they wearing glasses?"
"What activity are they doing?"
"Describe their outfit in detail"
The Power of Visual Intelligence
Visual Intelligence combines object detection with vision-language understanding, enabling you to:
Ask any question about detected objects using natural language
Get structured responses with confidence scores
Build custom business logic around visual understanding
Create dynamic analysis that adapts to your specific needs
Key Component Distinction
Understanding when to use each component is crucial:
eyepop.localize-objects:latest
What it does: Find and label objects with bounding boxes
Use it when: You need object locations and want custom bounding box labels
eyepop.image-contents:latest
What it does: Analyze and describe visual content with prompts
Use it when: You need detailed analysis or descriptions of detected regions
Core Architecture Pattern
Visual Intelligence follows a simple but powerful pattern:
Detect → Crop → Analyze
How Component Chaining Works
Detection Component finds objects and creates bounding boxes
Forward Operator crops each detected region from the original image
Target Component receives cropped images and analyzes them with your custom prompts
Results combine detection data with natural language analysis
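A minimal sketch of this pattern as a Pop definition, assuming the @eyepop.ai/eyepop Node SDK (the PopComponentType enum and the params/prompts shape are assumptions; the forward operators are covered below):

```typescript
import { PopComponentType, ForwardOperatorType } from '@eyepop.ai/eyepop'

// Detect -> Crop -> Analyze: the detector finds people, the CROP forward
// operator cuts out each detection, and image-contents analyzes every crop.
const pop = {
  components: [{
    type: PopComponentType.INFERENCE,              // assumed component type enum
    ability: 'eyepop.person:latest',               // 1. detection component
    forward: {
      operator: { type: ForwardOperatorType.CROP },  // 2. crop each detected region
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',   // 3. target component with your prompt
        params: {
          // the 'prompts' key is an assumption; check your SDK reference for the exact shape
          prompts: [{
            prompt: "Describe their outfit in detail. If you cannot tell, return null."
          }]
        }
      }]
    }
  }]
}
```

The analysis for each crop is attached to the detection that produced it, which is how the final result combines detection data with the natural-language answers (step 4).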
Essential Components
Detection Components
These components find objects and create regions for analysis:
eyepop.person:latest - Detect people
eyepop.person.2d-body-points - Detect people with 2D body points
eyepop.vehicle:latest - Detect vehicles
eyepop.text:latest - Detect text
eyepop.common-objects - Detect common objects
For the complete list of all available models and abilities, see the Composable Pops Documentation.
Visual Intelligence Component
eyepop.image-contents:latest - The core Visual Intelligence model that analyzes images based on your prompts
Forward Operators
ForwardOperatorType.CROP - Extract detected regions and send them to target components
ForwardOperatorType.FULL - Send the entire image to target components
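For instance, switching between the two operators only changes the forward configuration (same assumptions as the sketch above):

```typescript
import { ForwardOperatorType } from '@eyepop.ai/eyepop'

// Send each detected region, cropped from the original image, to the targets:
const cropForward = { operator: { type: ForwardOperatorType.CROP }, targets: [/* ... */] }

// Send the entire original image to the targets instead:
const fullForward = { operator: { type: ForwardOperatorType.FULL }, targets: [/* ... */] }
```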
Prompt Engineering for Vision
The Foundation Pattern
Always include this safety instruction in your prompts:
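A clause along these lines works (the exact wording is flexible):

```
If you cannot confidently determine the answer, return null.
```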
This ensures the model returns classLabel: null when uncertain, preventing hallucinations.
Effective Prompt Structure
✅ Good Prompts:
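For example:

"Determine the person's age range: 'child', 'teen', 'adult', or 'senior'. If you cannot tell, return null."
"Classify the clothing style as 'casual', 'business', 'athletic', or 'formal'. Return null if uncertain."
"Describe the person's outfit in one short sentence. Return null if the person is not clearly visible."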
❌ Avoid These Patterns:
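For example:

"Is the person happy?" (leading question)
"Describe their age, mood, job, personality, and intentions" (too long, invites guessing)
"Is the outfit stylish?" (ambiguous term with no definition, and no null fallback)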
Understanding the Response Format
Visual Intelligence returns structured data:
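The exact payload depends on your Pop, but a chained person-analysis result looks roughly like this (simplified and illustrative; inspect your own output for the exact field names and nesting):

```json
{
  "objects": [
    {
      "classLabel": "person",
      "confidence": 0.96,
      "x": 118, "y": 42, "width": 260, "height": 540,
      "classes": [
        {
          "classLabel": "business casual",
          "category": "fashion style",
          "confidence": 0.88
        }
      ]
    }
  ]
}
```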
When uncertain, you'll get:
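Something like this instead (again illustrative):

```json
{
  "classLabel": null,
  "confidence": 0.31
}
```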
Complete Examples
Example 1: Person Style Analysis
Analyze fashion and demographics of detected people:
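A sketch of such a Pop, under the same SDK assumptions as earlier (the confidenceThreshold field name is also an assumption):

```typescript
import { PopComponentType, ForwardOperatorType } from '@eyepop.ai/eyepop'

const personStylePop = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: 'eyepop.person:latest',
    confidenceThreshold: 0.9,   // only keep detections at >= 90% confidence (field name assumed)
    forward: {
      operator: { type: ForwardOperatorType.CROP },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "Report the person's age range, apparent gender, fashion style " +
                    "('casual', 'business', 'athletic', or 'formal'), and a short outfit description. " +
                    "Return null for any value you cannot determine."
          }]
        }
      }]
    }
  }]
}
```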
What this does:
Detects people with a 90% confidence threshold
Crops each detected person
Analyzes age, gender, fashion style, and outfit description
Returns structured data for each person
Example 2: Multi-Question Object Analysis
Ask multiple specific questions about detected objects:
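One way to do this is to give eyepop.image-contents:latest several prompts, assuming the prompts parameter accepts multiple entries (same SDK assumptions as earlier):

```typescript
import { PopComponentType, ForwardOperatorType } from '@eyepop.ai/eyepop'

const multiQuestionPop = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: 'eyepop.person:latest',
    forward: {
      operator: { type: ForwardOperatorType.CROP },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [
            { prompt: "Are they wearing glasses? Answer 'yes' or 'no', or return null if not visible." },
            { prompt: "What is their age range: 'child', 'teen', 'adult', or 'senior'? Return null if uncertain." },
            { prompt: "What activity are they doing? Return null if unclear." }
          ]
        }
      }]
    }
  }]
}
```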
Example 3: Activity Recognition
Understand what people are doing:
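A sketch (same assumptions as earlier) that constrains the answer to a fixed set of activities:

```typescript
import { PopComponentType, ForwardOperatorType } from '@eyepop.ai/eyepop'

const activityPop = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: 'eyepop.person:latest',
    forward: {
      operator: { type: ForwardOperatorType.CROP },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "What activity is this person doing? Choose one of 'walking', 'sitting', " +
                    "'running', 'working', or 'other'. If you cannot tell, return null."
          }]
        }
      }]
    }
  }]
}
```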
Custom Class Labels with Object Localization
When you need to find objects and display custom labels on bounding boxes, use eyepop.localize-objects:latest:
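A sketch under the same SDK assumptions, using the prompt/label pairing described below:

```typescript
import { PopComponentType } from '@eyepop.ai/eyepop'

// Find dogs and cats, but show custom labels on the resulting bounding boxes.
const customLabelPop = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: 'eyepop.localize-objects:latest',
    params: {
      prompts: [
        { prompt: 'dog', label: 'Best Friend' },
        { prompt: 'cat', label: 'Just a Cat' }
      ]
    }
  }]
}
```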

What this does:
prompt tells the model what to find ("dog", "cat")
label sets what appears on the bounding box ("Best Friend", "Just a Cat")
Returns bounding boxes with your custom labels instead of generic "dog" or "cat"
Custom Business Logic Examples
Retail Analytics:
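For example: "Which product category is the shopper interacting with: 'apparel', 'electronics', 'grocery', or 'other'? Return null if unclear."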
Security & Safety:
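For example: "Is the person wearing a hard hat and a high-visibility vest? Answer 'yes' or 'no', or return null if not visible."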
Healthcare:
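For example: "Is the person wearing a face mask? Answer 'yes' or 'no', or return null if uncertain."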
Common Pitfalls & Troubleshooting
When to Use Which Component
Use eyepop.localize-objects:latest when:
You need bounding boxes with custom labels
You're building object detection with specific labeling needs
You want to find specific objects by description
Use eyepop.image-contents:latest when:
You need detailed analysis or descriptions
You want to ask questions about visual content
You need structured responses to custom prompts
Prompt Design Mistakes
❌ Don't:
Ask leading questions ("Is the person happy?")
Make prompts too complex or long
Forget the null safety instruction
Use ambiguous terms without definition
✅ Do:
Ask open-ended analytical questions
Define categories clearly
Include the null safety pattern
Break complex analysis into multiple prompts
Confidence Considerations
Visual Intelligence returns confidence scores
Low confidence often correlates with null responses
Use confidence thresholds in your application logic
Consider multiple prompts for critical decisions
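As an example of application-side thresholding (a sketch that assumes results shaped like the illustrative JSON above):

```typescript
// Treat low-confidence answers the same as null so downstream logic stays conservative.
const MIN_CONFIDENCE = 0.7  // tune for your use case

interface AnalysisClass {
  classLabel: string | null
  confidence?: number
}

function trustedLabel(result: AnalysisClass): string | null {
  if (result.classLabel === null) return null
  if ((result.confidence ?? 0) < MIN_CONFIDENCE) return null
  return result.classLabel
}
```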
Getting Started
Ready to build your own Visual Intelligence Pop? Start with this template:
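A starting template, under the same Pop-shape assumptions as the earlier sketches:

```typescript
import { PopComponentType, ForwardOperatorType } from '@eyepop.ai/eyepop'

const myFirstVisualIntelligencePop = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: '[YOUR_DETECTOR]',              // e.g. 'eyepop.person:latest'
    forward: {
      operator: { type: ForwardOperatorType.CROP },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: '[YOUR QUESTION HERE] If you cannot confidently determine the answer, return null.'
          }]
        }
      }]
    }
  }]
}
```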
Replace [YOUR_DETECTOR] with your chosen detection model and [YOUR QUESTION HERE] with your custom prompt, and you're ready to start analyzing!