Visual Intelligence

Overview & Key Concept

Traditional computer vision gives you bounding boxes and labels like "person" or "car." EyePop's Visual Intelligence Agent goes beyond detection: it describes what it sees and answers questions about it, driven by natural language prompts.

Instead of just knowing "there's a person here," you can ask:

"What's their age range and fashion style?"
"Are they wearing glasses?"
"What activity are they doing?"
"Describe their outfit in detail"

The Power of Visual Intelligence

Visual Intelligence combines object detection with vision-language understanding, enabling you to:

Ask any question about detected objects using natural language
Get structured responses with confidence scores
Build custom business logic around visual understanding
Create dynamic analysis that adapts to your specific needs

Key Component Distinction

Understanding when to use each component is crucial:

| Component | Purpose | Use When |
| --- | --- | --- |
| eyepop.localize-objects:latest | Find and label objects with bounding boxes | You need object locations and want custom bounding box labels |
| eyepop.image-contents:latest | Analyze and describe visual content with prompts | You need detailed analysis or descriptions of detected regions |

Core Architecture Pattern

Visual Intelligence follows a simple but powerful pattern:

Detect → Crop → Analyze

{
  components: [{
    // 1. DETECT: Find objects in the image
    type: PopComponentType.INFERENCE,
    ability: "eyepop.person:latest",
    
    // 2. CROP: Extract detected regions  
    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      
      // 3. ANALYZE: Send crops to Visual Intelligence
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "What is the person's age range and fashion style?"
          }]
        }
      }]
    }
  }]
}

How Component Chaining Works

Detection Component finds objects and creates bounding boxes
Forward Operator crops each detected region from the original image
Target Component receives cropped images and analyzes them with your custom prompts
Results combine detection data with natural language analysis
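To make the data flow concrete, here is a minimal sketch of the four steps using made-up stand-ins for the detector and the Visual Intelligence model. The types `Detection`, `Answer`, `CombinedResult` and the function `detectCropAnalyze` are hypothetical, not EyePop SDK APIs, and the sketch assumes detections below `confidenceThreshold` are dropped before cropping.

```typescript
// Hypothetical shapes for illustration; these are not EyePop SDK types.
interface Box { x: number; y: number; width: number; height: number; }
interface Detection { label: string; confidence: number; box: Box; }
interface Answer { category: string; classLabel: string | null; confidence: number; }
interface CombinedResult extends Detection { analysis: Answer; }

type Detector = (image: string) => Detection[];
type Analyzer = (region: Box, prompt: string) => Answer;

function detectCropAnalyze(
  image: string,
  detect: Detector,
  analyze: Analyzer,
  prompt: string,
  confidenceThreshold: number,
): CombinedResult[] {
  return detect(image)
    // 1. DETECT: keep detections that meet the threshold (assumed semantics)
    .filter((d) => d.confidence >= confidenceThreshold)
    .map((d) => {
      // 2. CROP: the detection's box is the region handed to the target
      const region = d.box;
      // 3. ANALYZE: the target answers the prompt for that region only
      const analysis = analyze(region, prompt);
      // 4. COMBINE: detection data and the answer travel together
      return { ...d, analysis };
    });
}

// Made-up stand-ins so the sketch runs without a model.
const fakeDetect: Detector = () => [
  { label: "person", confidence: 0.95, box: { x: 10, y: 20, width: 80, height: 200 } },
  { label: "person", confidence: 0.4, box: { x: 300, y: 40, width: 70, height: 180 } },
];
const fakeAnalyze: Analyzer = (region, prompt) => ({
  category: prompt,
  classLabel: region.height > 150 ? "standing" : null,
  confidence: 0.8,
});

const results = detectCropAnalyze("frame.jpg", fakeDetect, fakeAnalyze, "Activity", 0.9);
console.log(results.length, results[0].analysis.classLabel);
```

Only the first detection clears the 0.9 threshold, so one combined result comes back carrying both its box and its answer.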

Essential Components

Detection Components

These components find objects and create regions for analysis:

eyepop.person:latest - Detect people
eyepop.person.2d-body-points - Detect people along with 2D body points
eyepop.vehicle:latest - Detect vehicles
eyepop.text:latest - Detect text
eyepop.common-objects - Detect common objects
For the complete list of available models and abilities, see the Composable Pops Documentation.

Visual Intelligence Component

eyepop.image-contents:latest - The core Visual Intelligence model that analyzes images based on your prompts

Forward Operators

ForwardOperatorType.CROP - Extract detected regions and send them to target components
ForwardOperatorType.FULL - Send the entire image to target components
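The practical difference between the two operators is which regions the target component ends up seeing. The sketch below models that with a hypothetical `regionsToForward` function and a local `ForwardMode` type (not the SDK's enum). Clamping crops to the image bounds and forwarding the full frame once for FULL are assumptions made for illustration; EyePop's actual behavior may differ.

```typescript
// Local stand-in for the two operator names above; not the SDK's ForwardOperatorType.
type ForwardMode = "full" | "crop";

interface Region { x: number; y: number; width: number; height: number; }

// Returns the regions a target component would receive for one image (illustrative model).
function regionsToForward(
  mode: ForwardMode,
  imageWidth: number,
  imageHeight: number,
  detections: Region[],
): Region[] {
  if (mode === "full") {
    // FULL: one region covering the whole frame (assumed: forwarded once).
    return [{ x: 0, y: 0, width: imageWidth, height: imageHeight }];
  }
  // CROP: one region per detection, clamped so it stays inside the image.
  return detections.map((d) => {
    const x = Math.max(0, d.x);
    const y = Math.max(0, d.y);
    const right = Math.min(imageWidth, d.x + d.width);
    const bottom = Math.min(imageHeight, d.y + d.height);
    return { x, y, width: Math.max(0, right - x), height: Math.max(0, bottom - y) };
  });
}

// Two made-up boxes, each partly outside a 640x480 frame.
const boxes: Region[] = [
  { x: -10, y: 20, width: 100, height: 50 },
  { x: 600, y: 400, width: 100, height: 100 },
];
console.log(regionsToForward("full", 640, 480, boxes));
console.log(regionsToForward("crop", 640, 480, boxes));
```

CROP is the natural choice when each object should be judged on its own; FULL suits questions about the scene as a whole.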

Prompt Engineering for Vision

The Foundation Pattern

Always include this safety instruction in your prompts:

params: {
  prompts: [{
    prompt: "Analyze the image and determine [your question]. " +
           "If you are unable to provide a category with a value then set its classLabel to null"
  }]
}

This instructs the model to return classLabel: null when it is uncertain, which reduces (but does not eliminate) hallucinated answers.
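If you build prompts in code, a small helper keeps the safety instruction from being forgotten. This is a minimal sketch; `withNullSafety` is a hypothetical helper, not part of the EyePop SDK, and it simply reproduces the string concatenation shown above.

```typescript
// The safety instruction text from the foundation pattern.
const NULL_SAFETY =
  "If you are unable to provide a category with a value then set its classLabel to null";

// Hypothetical helper: wraps a question in the foundation pattern.
function withNullSafety(question: string): { prompt: string } {
  // Drop trailing sentence punctuation so the template controls it.
  const q = question.trim().replace(/[.?!]+$/, "");
  return { prompt: `Analyze the image and determine ${q}. ${NULL_SAFETY}` };
}

const params = {
  prompts: [withNullSafety("the person's age range (report as 20s, 30s, etc.)")],
};
console.log(params.prompts[0].prompt);
```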

Effective Prompt Structure

✅ Good Prompts:

// Clear, specific questions
"What is the person's age range (report as 20s, 30s, etc.) and gender?"

// Multiple categories with clear formatting
"Determine the categories of: Age (report as range), Gender (Male/Female), Fashion style (Casual/Formal/Sporty)"

// Specific instructions
"Describe the person's outfit including colors and style. Be specific about clothing items."

❌ Avoid These Patterns:

// Too vague
"Tell me about this person"

// Too complex in single prompt  
"What's their age, gender, emotion, clothing, activity, background, and any objects they're holding?"

// Leading questions
"Is this person wearing a red shirt?" // Better to ask "What color is their shirt?"
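The "multiple categories with clear formatting" pattern is easy to generate from data, which keeps the allowed values in one place. A minimal sketch, assuming a hypothetical `CategorySpec` shape and `buildCategoryPrompt` helper (neither is an EyePop API):

```typescript
// Hypothetical category description: a name plus either allowed values or a format hint.
interface CategorySpec {
  name: string;
  values?: string[];
  format?: string;
}

// Renders categories as "Name (A/B/C)" or "Name (format hint)", comma-separated.
function buildCategoryPrompt(categories: CategorySpec[]): string {
  const parts = categories.map((c) => {
    if (c.values && c.values.length > 0) return `${c.name} (${c.values.join("/")})`;
    if (c.format) return `${c.name} (${c.format})`;
    return c.name;
  });
  return `Determine the categories of: ${parts.join(", ")}`;
}

const categoryPrompt = buildCategoryPrompt([
  { name: "Age", format: "report as range" },
  { name: "Gender", values: ["Male", "Female"] },
  { name: "Fashion style", values: ["Casual", "Formal", "Sporty"] },
]);
console.log(categoryPrompt);
```

This reproduces the second "good" prompt above; append the null-safety instruction before sending it.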

Understanding the Response Format

Visual Intelligence returns structured data:

{
  "category": "Age and Fashion Style",  // Your prompt/question
  "classLabel": "20s, Casual",         // The AI's answer
  "confidence": 0.85,                  // Confidence score
  "id": "unique-id"                    // Result identifier
}

When uncertain, you'll get:

{
  "category": "Age and Fashion Style",
  "classLabel": null,                  // Indicates uncertainty
  "confidence": 0.12,                  // Low confidence
  "id": "unique-id"
}
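In application code it helps to treat "no answer" and "low-confidence answer" the same way. The sketch below types the fields shown above and applies a caller-chosen threshold; the interface name `VisualIntelligenceClass` and the helper `usableLabel` are hypothetical, and the 0.5 threshold is only an example value.

```typescript
// Field names follow the response examples above; the type name is hypothetical.
interface VisualIntelligenceClass {
  category: string;
  classLabel: string | null;
  confidence: number;
  id: string;
}

// Returns the label only if the model answered and the answer clears the threshold.
function usableLabel(result: VisualIntelligenceClass, minConfidence: number): string | undefined {
  if (result.classLabel === null) return undefined; // model declined to answer
  if (result.confidence < minConfidence) return undefined; // answered, but not confidently enough
  return result.classLabel;
}

const confident: VisualIntelligenceClass = {
  category: "Age and Fashion Style", classLabel: "20s, Casual", confidence: 0.85, id: "a",
};
const uncertain: VisualIntelligenceClass = {
  category: "Age and Fashion Style", classLabel: null, confidence: 0.12, id: "b",
};
console.log(usableLabel(confident, 0.5)); // "20s, Casual"
console.log(usableLabel(uncertain, 0.5)); // undefined
```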

Complete Examples

Example 1: Person Style Analysis

Analyze fashion and demographics of detected people:

const PersonVisualIntelligence = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.person:latest",
    categoryName: "person",
    confidenceThreshold: 0.9,

    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "Analyze the image provided and determine the categories of: " +
              ["Age (report as range, ex. 20s)",
               "Gender (Male/Female)", 
               "Fashion style (Casual, Formal, Bohemian, Streetwear, Vintage, Chic, Sporty, Edgy)",
               "Describe their outfit"].join(", ") +
              ". Report the values of the categories as classLabels. If you are unable to provide a category with a value then set its classLabel to null"
          }]
        }
      }]
    }
  }]
}

What this does:

Detects people, keeping only detections that meet the 0.9 confidenceThreshold
Crops each detected person
Analyzes age, gender, fashion style, and outfit description
Returns structured data for each person
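Downstream, you will usually want one flat record per person. The sketch below assumes a hypothetical result shape (one entry per detected person, each carrying its class entries with the fields described in the response format section); check your actual SDK output before relying on it. `PersonResult` and `summarizePeople` are illustrative names, and the sample values are made up.

```typescript
// Hypothetical result shape; verify against real output before use.
interface ClassEntry { category: string; classLabel: string | null; confidence: number; }
interface PersonResult { id: string; confidence: number; classes: ClassEntry[]; }

// Flattens each person's class entries into a category -> label record.
// A null classLabel is preserved so "unknown" stays distinguishable from "absent".
function summarizePeople(people: PersonResult[]): Record<string, string | null>[] {
  return people.map((p) => {
    const summary: Record<string, string | null> = {};
    for (const c of p.classes) summary[c.category] = c.classLabel;
    return summary;
  });
}

// Made-up sample input for illustration only.
const people: PersonResult[] = [
  {
    id: "p1",
    confidence: 0.93,
    classes: [
      { category: "Age", classLabel: "30s", confidence: 0.8 },
      { category: "Fashion style", classLabel: "Casual", confidence: 0.7 },
      { category: "Gender", classLabel: null, confidence: 0.2 },
    ],
  },
];
const summaries = summarizePeople(people);
console.log(summaries);
```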

Example 2: Multi-Question Object Analysis

Ask multiple specific questions about detected objects:

const DetailedObjectAnalysis = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.common-objects",
    confidenceThreshold: 0.8,

    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "What color is this object? What is the object's condition (new, used, damaged)? What material is this object made of? If uncertain, set classLabel to null"
          }]
        }
      }]
    }
  }]
}
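Following the advice to break complex analysis into multiple prompts, the three questions above can also be sent as separate entries. This is a sketch under the assumption that the `prompts` array accepts several entries, as its array shape suggests; `separatePrompts` is a hypothetical helper.

```typescript
// Hypothetical helper: one prompt entry per question, each with its own null-safety clause.
// Assumes `params.prompts` accepts multiple entries.
function separatePrompts(questions: string[]): { prompt: string }[] {
  return questions.map((q) => ({ prompt: `${q.trim()} If uncertain, set classLabel to null` }));
}

const params = {
  prompts: separatePrompts([
    "What color is this object?",
    "What is the object's condition (new, used, damaged)?",
    "What material is this object made of?",
  ]),
};
console.log(params.prompts.length); // 3
```

Separate prompts make each answer easier to attribute to its question, at the cost of more model calls per crop.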

Example 3: Activity Recognition

Understand what people are doing:

const ActivityAnalysis = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.person:latest",
    confidenceThreshold: 0.85,

    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "What activity is this person doing? Choose from: walking, running, sitting, standing, exercising, eating, talking, working, playing, or other. If you cannot determine the activity clearly, set classLabel to null"
          }]
        }
      }]
    }
  }]
}
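Even with a closed list in the prompt, it is prudent to validate the returned label before using it. A minimal sketch: `normalizeActivity` is a hypothetical post-processing step that lowercases and trims the label, keeps `null` as "could not determine", and maps anything outside the list to "other" (a design choice; you might prefer to treat unexpected labels as `null`).

```typescript
// The closed set offered in the prompt above.
const ACTIVITIES = [
  "walking", "running", "sitting", "standing", "exercising",
  "eating", "talking", "working", "playing", "other",
] as const;
type Activity = (typeof ACTIVITIES)[number];

// Hypothetical post-processing of a classLabel returned for the activity prompt.
function normalizeActivity(label: string | null): Activity | null {
  if (label === null) return null; // model could not determine the activity
  const cleaned = label.trim().toLowerCase().replace(/[.!]+$/, "");
  return (ACTIVITIES as readonly string[]).includes(cleaned) ? (cleaned as Activity) : "other";
}

console.log(normalizeActivity("Running."), normalizeActivity("juggling"), normalizeActivity(null));
```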

Custom Class Labels with Object Localization

When you need to find objects and display custom labels on bounding boxes, use eyepop.localize-objects:latest:

const CustomLabels = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: 'eyepop.localize-objects:latest',
    params: {
      prompts: [
        {prompt: 'dog', label: 'Best Friend'},
        {prompt: 'cat', label: 'Just a Cat'}
      ]
    }
  }]
}

What this does:

prompt tells the model what to find ("dog", "cat")
label sets what appears on the bounding box ("Best Friend", "Just a Cat")
Returns bounding boxes with your custom labels instead of generic "dog" or "cat"
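When the label set comes from configuration, the `prompts` array can be generated from a simple map. A minimal sketch; `labelPrompts` is a hypothetical helper that produces the same `{prompt, label}` entries used in the example above.

```typescript
// Hypothetical helper: "what to find" -> "what to show on the box".
function labelPrompts(labels: Record<string, string>): { prompt: string; label: string }[] {
  return Object.entries(labels).map(([prompt, label]) => ({ prompt, label }));
}

const params = { prompts: labelPrompts({ dog: "Best Friend", cat: "Just a Cat" }) };
console.log(JSON.stringify(params.prompts));
```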

Custom Business Logic Examples

Retail Analytics:

// Analyze customer demographics and shopping behavior
prompt: "Determine: Age range (teens/20s/30s/40s/50s+), Gender, Shopping bag count (0/1/2/3+), Engagement level (browsing/interested/deciding). If any category is unclear, set its classLabel to null"

Security & Safety:

// Detect safety compliance
prompt: "Is this person wearing required safety equipment: Hard hat (yes/no), Safety vest (yes/no), Safety glasses (yes/no). For each item, if you cannot clearly see it, set classLabel to null"

Healthcare:

// Patient positioning analysis  
prompt: "Describe the patient's position: Position (sitting/standing/lying down), Posture (upright/slouched/leaning), Mobility aid visible (none/wheelchair/walker/cane). If uncertain about any aspect, set classLabel to null"
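Business logic then turns these answers into decisions. As one illustration for the safety-compliance prompt, the sketch below treats a clear "no" as a violation and a `null` (item not clearly visible) as needing human review. It assumes each item's answer arrives as its own classLabel of "yes", "no", or `null`; `PpeLabels`, `toYesNo`, and `ppeStatus` are hypothetical names.

```typescript
type YesNo = "yes" | "no" | null;

// Assumed: one classLabel per safety item, as requested in the prompt.
interface PpeLabels {
  hardHat: string | null;
  safetyVest: string | null;
  safetyGlasses: string | null;
}

// Normalizes a label; anything other than yes/no is treated as unknown.
function toYesNo(classLabel: string | null): YesNo {
  if (classLabel === null) return null;
  const v = classLabel.trim().toLowerCase();
  if (v === "yes") return "yes";
  if (v === "no") return "no";
  return null;
}

function ppeStatus(labels: PpeLabels): "compliant" | "violation" | "needs-review" {
  const answers = [labels.hardHat, labels.safetyVest, labels.safetyGlasses].map(toYesNo);
  if (answers.includes("no")) return "violation"; // an item is clearly missing
  if (answers.includes(null)) return "needs-review"; // model could not see an item
  return "compliant";
}

console.log(ppeStatus({ hardHat: "Yes", safetyVest: null, safetyGlasses: "yes" })); // "needs-review"
```

A clear "no" outranks an unknown here, so a person missing a hard hat is flagged even if the vest could not be seen.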

Common Pitfalls & Troubleshooting

When to Use Which Component

Use eyepop.localize-objects:latest when:

You need bounding boxes with custom labels
You're building object detection with specific labeling needs
You want to find specific objects by description

Use eyepop.image-contents:latest when:

You need detailed analysis or descriptions
You want to ask questions about visual content
You need structured responses to custom prompts

Prompt Design Mistakes

❌ Don't:

Ask leading questions ("Is the person happy?")
Make prompts too complex or long
Forget the null safety instruction
Use ambiguous terms without definition

✅ Do:

Ask open-ended analytical questions
Define categories clearly
Include the null safety pattern
Break complex analysis into multiple prompts

Confidence Considerations

Visual Intelligence returns confidence scores
Low confidence often correlates with null responses
Use confidence thresholds in your application logic
Consider multiple prompts for critical decisions
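One way to combine confidence thresholds with multiple prompts is a simple agreement rule. In this sketch, `consensus` is a hypothetical helper, and the rule itself (ignore nulls and low-confidence answers, then require at least `minAgree` confident answers that all agree) is one reasonable choice among many, not an EyePop feature.

```typescript
interface Vote { classLabel: string | null; confidence: number; }

function consensus(votes: Vote[], threshold: number, minAgree: number): string | null {
  // Ignore abstentions (null) and answers below the application's threshold.
  const confident = votes.filter((v) => v.classLabel !== null && v.confidence >= threshold);
  const distinct = new Set(confident.map((v) => v.classLabel));
  // No confident answer, or confident answers disagree: defer the decision.
  if (distinct.size !== 1) return null;
  // Require enough independent prompts to agree before acting.
  return confident.length >= minAgree ? confident[0].classLabel : null;
}

console.log(consensus(
  [
    { classLabel: "yes", confidence: 0.9 },
    { classLabel: "yes", confidence: 0.8 },
    { classLabel: null, confidence: 0.1 },
  ],
  0.7,
  2,
)); // "yes"
```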

Getting Started

Ready to build your own Visual Intelligence Pop? Start with this template:

const MyVisualIntelligence = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.[YOUR_DETECTOR]:latest",  // Choose your detector
    confidenceThreshold: 0.8,                 // Adjust as needed

    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "[YOUR QUESTION HERE]. If you are unable to provide an answer, set classLabel to null"
          }]
        }
      }]
    }
  }]
}

Replace [YOUR_DETECTOR] with your chosen detection model and [YOUR QUESTION HERE] with your custom prompt, and you're ready to start analyzing!
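If you create several of these Pops, a small factory avoids copy-pasting the template. This is a sketch only: `makeVisualIntelligencePop` is hypothetical, and the `INFERENCE` and `CROP` constants are placeholder values standing in for `PopComponentType.INFERENCE` and `ForwardOperatorType.CROP`. In a real project, import those enums from the EyePop SDK instead of using these strings.

```typescript
// Placeholders for PopComponentType.INFERENCE and ForwardOperatorType.CROP;
// replace with the SDK's own enums in real code.
const INFERENCE = "inference";
const CROP = "crop";

// Hypothetical factory mirroring the template above.
function makeVisualIntelligencePop(detector: string, question: string, confidenceThreshold = 0.8) {
  const q = question.trim();
  // Keep the question's own punctuation; add a period only if it has none.
  const sep = /[.?!]$/.test(q) ? " " : ". ";
  return {
    components: [{
      type: INFERENCE,
      ability: detector,
      confidenceThreshold,
      forward: {
        operator: { type: CROP },
        targets: [{
          type: INFERENCE,
          ability: "eyepop.image-contents:latest",
          params: {
            prompts: [{ prompt: `${q}${sep}If you are unable to provide an answer, set classLabel to null` }],
          },
        }],
      },
    }],
  };
}

const pop = makeVisualIntelligencePop("eyepop.vehicle:latest", "What color is this vehicle?");
console.log(JSON.stringify(pop, null, 2));
```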


Last updated 1 month ago