Visual Intelligence
Overview & Key Concept
Traditional computer vision gives you bounding boxes and labels like "person" or "car." EyePop's Visual Intelligence Agent goes beyond detection—it understands and describes what it sees using natural language prompts.
Instead of just knowing "there's a person here," you can ask:
"What's their age range and fashion style?"
"Are they wearing glasses?"
"What activity are they doing?"
"Describe their outfit in detail"
The Power of Visual Intelligence
Visual Intelligence combines object detection with vision-language understanding, enabling you to:
Ask any question about detected objects using natural language
Get structured responses with confidence scores
Build custom business logic around visual understanding
Create dynamic analysis that adapts to your specific needs
Key Component Distinction
Understanding when to use each component is crucial:
eyepop.localize-objects:latest
Purpose: Find and label objects with bounding boxes
Use when: You need object locations and want custom bounding box labels
eyepop.image-contents:latest
Purpose: Analyze and describe visual content with prompts
Use when: You need detailed analysis or descriptions of detected regions
Core Architecture Pattern
Visual Intelligence follows a simple but powerful pattern:
Detect → Crop → Analyze
{
  components: [{
    // 1. DETECT: Find objects in the image
    type: PopComponentType.INFERENCE,
    ability: "eyepop.person:latest",
    // 2. CROP: Extract detected regions
    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      // 3. ANALYZE: Send crops to Visual Intelligence
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "What is the person's age range and fashion style?"
          }]
        }
      }]
    }
  }]
}
How Component Chaining Works
Detection Component finds objects and creates bounding boxes
Forward Operator crops each detected region from the original image
Target Component receives cropped images and analyzes them with your custom prompts
Results combine detection data with natural language analysis
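The four steps above map directly onto the nested structure of a Pop definition. As a minimal sketch, the hypothetical helper buildDetectCropAnalyzePop below assembles that structure from a detector ability and a prompt. The PopComponentType and ForwardOperatorType objects here are local stand-ins so the snippet runs without the SDK; their string values are assumptions, and in a real project you would use the enums exported by the EyePop SDK instead.

```javascript
// Local stand-ins for the SDK enums (string values are assumed, not confirmed).
const PopComponentType = { INFERENCE: "inference" };
const ForwardOperatorType = { CROP: "crop", FULL: "full" };

// Hypothetical helper: builds a Detect -> Crop -> Analyze Pop definition.
function buildDetectCropAnalyzePop(
  detectorAbility,
  prompt,
  operatorType = ForwardOperatorType.CROP
) {
  return {
    components: [{
      // 1. DETECT: the detection component finds objects
      type: PopComponentType.INFERENCE,
      ability: detectorAbility,
      forward: {
        // 2. CROP: the forward operator decides what the targets receive
        operator: { type: operatorType },
        // 3. ANALYZE: the target component answers the prompt
        targets: [{
          type: PopComponentType.INFERENCE,
          ability: "eyepop.image-contents:latest",
          params: { prompts: [{ prompt }] },
        }],
      },
    }],
  };
}

const pop = buildDetectCropAnalyzePop(
  "eyepop.person:latest",
  "What is the person's age range and fashion style?"
);
console.log(JSON.stringify(pop, null, 2));
```

Keeping the detector, operator, and prompt as parameters makes it easy to reuse the same chain for different detectors without copying the nested structure.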
Essential Components
Detection Components
These components find objects and create regions for analysis:
eyepop.person:latest - Detect people
eyepop.person.2d-body-points - Detect people with 2D body points
eyepop.vehicle:latest - Detect vehicles
eyepop.text:latest - Detect text
eyepop.common-objects - Detect common objects
For the complete list of all available models and abilities, see the Composable Pops Documentation.
Visual Intelligence Component
eyepop.image-contents:latest - The core Visual Intelligence model that analyzes images based on your prompts
Forward Operators
ForwardOperatorType.CROP - Extract detected regions and send them to target components
ForwardOperatorType.FULL - Send the entire image to target components
Prompt Engineering for Vision
The Foundation Pattern
Always include this safety instruction in your prompts:
params: {
  prompts: [{
    prompt: "Analyze the image and determine [your question]. " +
      "If you are unable to provide a category with a value then set its classLabel to null"
  }]
}
This instructs the model to return classLabel: null when it is uncertain, which reduces hallucinated answers.
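Because the same instruction has to appear in every prompt, it can help to append it in one place. The following is a minimal sketch; withNullSafety and NULL_SAFETY_INSTRUCTION are hypothetical names introduced for illustration and are not part of the EyePop SDK.

```javascript
// Null-safety sentence taken from the foundation pattern.
const NULL_SAFETY_INSTRUCTION =
  "If you are unable to provide a category with a value then set its classLabel to null";

// Hypothetical helper: appends the instruction unless the prompt already has one.
function withNullSafety(question) {
  const trimmed = question.trim();
  if (/classLabel to null/i.test(trimmed)) {
    return trimmed; // a null-safety instruction is already present
  }
  // Close the question with a period if it has no end punctuation.
  const separator = /[.?!]$/.test(trimmed) ? " " : ". ";
  return trimmed + separator + NULL_SAFETY_INSTRUCTION;
}

const params = {
  prompts: [{
    prompt: withNullSafety("Analyze the image and determine the person's age range"),
  }],
};
console.log(params.prompts[0].prompt);
```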
Effective Prompt Structure
✅ Good Prompts:
// Clear, specific questions
"What is the person's age range (report as 20s, 30s, etc.) and gender?"
// Multiple categories with clear formatting
"Determine the categories of: Age (report as range), Gender (Male/Female), Fashion style (Casual/Formal/Sporty)"
// Specific instructions
"Describe the person's outfit including colors and style. Be specific about clothing items."
❌ Avoid These Patterns:
// Too vague
"Tell me about this person"
// Too complex in single prompt
"What's their age, gender, emotion, clothing, activity, background, and any objects they're holding?"
// Leading questions
"Is this person wearing a red shirt?" // Better to ask "What color is their shirt?"
Understanding the Response Format
Visual Intelligence returns structured data:
{
  "category": "Age and Fashion Style", // Your prompt/question
  "classLabel": "20s, Casual",         // The AI's answer
  "confidence": 0.85,                  // Confidence score
  "id": "unique-id"                    // Result identifier
}
When uncertain, you'll get:
{
  "category": "Age and Fashion Style",
  "classLabel": null,   // Indicates uncertainty
  "confidence": 0.12,   // Low confidence
  "id": "unique-id"
}
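In application code it is often convenient to turn these entries into a simple lookup and drop the uncertain ones. The sketch below assumes you have already collected the entries into an array; where that array sits inside the full result object is not covered here, and toAnswerMap is a hypothetical helper name. The sample values are illustrative only.

```javascript
// Hypothetical helper: turns class entries into a { category: classLabel } lookup,
// skipping entries the model marked as uncertain (classLabel is null or missing).
function toAnswerMap(entries) {
  const answers = {};
  for (const entry of entries) {
    if (entry.classLabel === null || entry.classLabel === undefined) continue;
    answers[entry.category] = entry.classLabel;
  }
  return answers;
}

// Illustrative entries in the response format shown above.
const entries = [
  { category: "Age", classLabel: "20s", confidence: 0.85, id: "a" },
  { category: "Fashion style", classLabel: null, confidence: 0.12, id: "b" },
];
console.log(toAnswerMap(entries)); // { Age: '20s' }
```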
Complete Examples
Example 1: Person Style Analysis
Analyze fashion and demographics of detected people:
const PersonVisualIntelligence = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.person:latest",
    categoryName: "person",
    confidenceThreshold: 0.9,
    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "Analyze the image provided and determine the categories of: " +
              ["Age (report as range, ex. 20s)",
               "Gender (Male/Female)",
               "Fashion style (Casual, Formal, Bohemian, Streetwear, Vintage, Chic, Sporty, Edgy)",
               "Describe their outfit"].join(", ") +
              ". Report the values of the categories as classLabels. If you are unable to provide a category with a value then set its classLabel to null"
          }]
        }
      }]
    }
  }]
}
What this does:
Detects people using a confidence threshold of 0.9
Crops each detected person
Analyzes age, gender, fashion style, and outfit description
Returns structured data for each person
Example 2: Multi-Question Object Analysis
Ask multiple specific questions about detected objects:
const DetailedObjectAnalysis = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.common-objects",
    confidenceThreshold: 0.8,
    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "What color is this object? What is the object's condition (new, used, damaged)? What material is this object made of? If uncertain, set classLabel to null"
          }]
        }
      }]
    }
  }]
}
Example 3: Activity Recognition
Understand what people are doing:
const ActivityAnalysis = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.person:latest",
    confidenceThreshold: 0.85,
    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "What activity is this person doing? Choose from: walking, running, sitting, standing, exercising, eating, talking, working, playing, or other. If you cannot determine the activity clearly, set classLabel to null"
          }]
        }
      }]
    }
  }]
}
Custom Class Labels with Object Localization
When you need to find objects and display custom labels on bounding boxes, use eyepop.localize-objects:latest:
const CustomLabels = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: 'eyepop.localize-objects:latest',
    params: {
      prompts: [
        {prompt: 'dog', label: 'Best Friend'},
        {prompt: 'cat', label: 'Just a Cat'}
      ]
    }
  }]
}
What this does:
prompt tells the model what to find ("dog", "cat")
label sets what appears on the bounding box ("Best Friend", "Just a Cat")
Returns bounding boxes with your custom labels instead of generic "dog" or "cat"
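If your labels live in a configuration object, the prompts array can be generated rather than written by hand. This is a small sketch; buildLocalizePrompts is a hypothetical helper, and the only structure it relies on is the prompt/label pair shown in the example above.

```javascript
// Hypothetical helper: converts { searchTerm: displayLabel } into the prompts array
// used by eyepop.localize-objects:latest in the example above.
function buildLocalizePrompts(labelsByPrompt) {
  return Object.entries(labelsByPrompt).map(([prompt, label]) => ({ prompt, label }));
}

const customLabelParams = {
  prompts: buildLocalizePrompts({ dog: "Best Friend", cat: "Just a Cat" }),
};
console.log(customLabelParams.prompts);
```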
Custom Business Logic Examples
Retail Analytics:
// Analyze customer demographics and shopping behavior
prompt: "Determine: Age range (teens/20s/30s/40s/50s+), Gender, Shopping bag count (0/1/2/3+), Engagement level (browsing/interested/deciding). If any category is unclear, set its classLabel to null"
Security & Safety:
// Detect safety compliance
prompt: "Is this person wearing required safety equipment: Hard hat (yes/no), Safety vest (yes/no), Safety glasses (yes/no). For each item, if you cannot clearly see it, set classLabel to null"
Healthcare:
// Patient positioning analysis
prompt: "Describe the patient's position: Sitting/Standing/Lying down, Posture (upright/slouched/leaning), Mobility aid visible (none/wheelchair/walker/cane). If uncertain about any aspect, set classLabel to null"
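The three prompts above share one shape: a list of categories, each with its allowed values, followed by the null instruction. As a sketch of how that shape could be generated from data, the hypothetical helper buildCategoryPrompt below joins category definitions into a single prompt. The exact wording it produces is modeled on Example 1 and is one reasonable choice, not a required format.

```javascript
// Hypothetical helper: builds one prompt from categories and their allowed values.
function buildCategoryPrompt(categories) {
  const parts = categories.map(({ name, options }) =>
    options && options.length > 0 ? `${name} (${options.join("/")})` : name
  );
  return (
    "Determine the categories of: " + parts.join(", ") +
    ". Report the values of the categories as classLabels. " +
    "If you are unable to provide a category with a value then set its classLabel to null"
  );
}

const safetyPrompt = buildCategoryPrompt([
  { name: "Hard hat", options: ["yes", "no"] },
  { name: "Safety vest", options: ["yes", "no"] },
  { name: "Safety glasses", options: ["yes", "no"] },
]);
console.log(safetyPrompt);
```

Defining categories as data keeps the allowed values in one place, so the same list can later be used to validate the classLabel values that come back.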
Common Pitfalls & Troubleshooting
When to Use Which Component
Use eyepop.localize-objects:latest when:
You need bounding boxes with custom labels
You're building object detection with specific labeling needs
You want to find specific objects by description
Use eyepop.image-contents:latest when:
You need detailed analysis or descriptions
You want to ask questions about visual content
You need structured responses to custom prompts
Prompt Design Mistakes
❌ Don't:
Ask leading questions ("Is the person happy?")
Make prompts too complex or long
Forget the null safety instruction
Use ambiguous terms without definition
✅ Do:
Ask open-ended analytical questions
Define categories clearly
Include the null safety pattern
Break complex analysis into multiple prompts
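Some of these checks can be automated before a prompt is sent. The sketch below is a hypothetical lint helper: the function name, the length and question-count limits, and the yes/no heuristic are all illustrative assumptions rather than EyePop rules, and the warnings are hints for review, not errors.

```javascript
// Hypothetical lint helper; thresholds are illustrative assumptions, not EyePop limits.
function lintPrompt(prompt, { maxLength = 400, maxQuestions = 3 } = {}) {
  const warnings = [];
  if (!/classLabel to null/i.test(prompt)) {
    warnings.push("missing null safety instruction");
  }
  if (prompt.length > maxLength) {
    warnings.push("prompt is long; consider splitting it");
  }
  const questionCount = (prompt.match(/\?/g) || []).length;
  if (questionCount > maxQuestions) {
    warnings.push("too many questions in one prompt");
  }
  // Prompts that open with is/are/does/do are usually yes/no questions.
  if (/^\s*(is|are|does|do)\b/i.test(prompt)) {
    warnings.push("possible leading yes/no question");
  }
  return warnings;
}

console.log(lintPrompt("Is the person happy?"));
console.log(lintPrompt("What color is their shirt? If uncertain, set classLabel to null"));
```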
Confidence Considerations
Visual Intelligence returns confidence scores
Low confidence often correlates with null responses
Use confidence thresholds in your application logic
Consider multiple prompts for critical decisions
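The last two points can be combined in application logic: accept an answer only above a threshold, and for critical decisions require several independently phrased prompts to agree. The sketch below is one possible approach; acceptAnswer, agreedLabel, the 0.6 threshold, and the two-vote minimum are assumptions to tune for your use case, and the sample entries are illustrative.

```javascript
// Hypothetical helper: accepts an answer only when it is non-null and confident enough.
function acceptAnswer(entry, minConfidence = 0.6) {
  return entry.classLabel !== null && entry.confidence >= minConfidence;
}

// Hypothetical helper for critical decisions: given answers to several prompts that
// ask the same thing, returns the label only if at least minVotes answers were
// accepted and all accepted answers agree (case-insensitively); otherwise null.
function agreedLabel(entries, minConfidence = 0.6, minVotes = 2) {
  const accepted = entries.filter((e) => acceptAnswer(e, minConfidence));
  if (accepted.length < minVotes) return null;
  const labels = new Set(accepted.map((e) => String(e.classLabel).toLowerCase()));
  return labels.size === 1 ? accepted[0].classLabel : null;
}

// Illustrative answers from three differently worded prompts.
const hardHatAnswers = [
  { category: "Hard hat", classLabel: "yes", confidence: 0.91 },
  { category: "Head protection", classLabel: "Yes", confidence: 0.78 },
  { category: "Helmet", classLabel: null, confidence: 0.1 },
];
console.log(agreedLabel(hardHatAnswers)); // yes
```

Returning null when answers disagree keeps the decision conservative: the caller can route those cases to manual review instead of acting on a guess.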
Getting Started
Ready to build your own Visual Intelligence Pop? Start with this template:
// Assumed import path for the EyePop JavaScript SDK; confirm against your installed version
import { PopComponentType, ForwardOperatorType } from '@eyepop.ai/eyepop';

const MyVisualIntelligence = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.[YOUR_DETECTOR]:latest", // Choose your detector
    confidenceThreshold: 0.8, // Adjust as needed
    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "[YOUR QUESTION HERE]. If you are unable to provide an answer, set classLabel to null"
          }]
        }
      }]
    }
  }]
}
Replace [YOUR_DETECTOR] with your chosen detection model and [YOUR QUESTION HERE] with your custom prompt, and you're ready to start analyzing!