
Visual Intelligence

Overview & Key Concept

Traditional computer vision gives you bounding boxes and labels like "person" or "car." EyePop's Visual Intelligence Agent goes beyond detection—it understands and describes what it sees using natural language prompts.

Instead of just knowing "there's a person here," you can ask:

  • "What's their age range and fashion style?"

  • "Are they wearing glasses?"

  • "What activity are they doing?"

  • "Describe their outfit in detail"

The Power of Visual Intelligence

Visual Intelligence combines object detection with vision-language understanding, enabling you to:

  • Ask any question about detected objects using natural language

  • Get structured responses with confidence scores

  • Build custom business logic around visual understanding

  • Create dynamic analysis that adapts to your specific needs

Key Component Distinction

Understanding when to use each component is crucial:

Component | Purpose | Use When
eyepop.localize-objects:latest | Find and label objects with bounding boxes | You need object locations and want custom bounding box labels
eyepop.image-contents:latest | Analyze and describe visual content with prompts | You need detailed analysis or descriptions of detected regions

Core Architecture Pattern

Visual Intelligence follows a simple but powerful pattern:

Detect → Crop → Analyze

{
  components: [{
    // 1. DETECT: Find objects in the image
    type: PopComponentType.INFERENCE,
    ability: "eyepop.person:latest",
    
    // 2. CROP: Extract detected regions  
    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      
      // 3. ANALYZE: Send crops to Visual Intelligence
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "What is the person's age range and fashion style?"
          }]
        }
      }]
    }
  }]
}

How Component Chaining Works

  1. Detection Component finds objects and creates bounding boxes

  2. Forward Operator crops each detected region from the original image

  3. Target Component receives cropped images and analyzes them with your custom prompts

  4. Results combine detection data with natural language analysis (see the end-to-end sketch below)
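
Putting the pattern together: here is a minimal Node sketch, assuming the @eyepop.ai/eyepop worker-endpoint API covered in the React/Node SDK pages (verify the exact calls there; myPop and photo.jpg are placeholders for your Pop definition and input image):

import { EyePop } from '@eyepop.ai/eyepop'

// Connect to a worker endpoint (assumes credentials are already configured,
// e.g. via environment variables), load the Pop, and stream results.
const endpoint = await EyePop.workerEndpoint().connect()
try {
  await endpoint.changePop(myPop)  // the Pop object defined above
  const results = await endpoint.process({ path: 'photo.jpg' })
  for await (const result of results) {
    // Each result combines the detections with the analyzer's answers
    console.log(JSON.stringify(result, undefined, 2))
  }
} finally {
  await endpoint.disconnect()
}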

Essential Components

Detection Components

These components find objects and create regions for analysis:

  • eyepop.person:latest - Detect people

  • eyepop.person.2d-body-points - Detect people with 2d body points

  • eyepop.vehicle:latest - Detect vehicles

  • eyepop.text:latest - Detect text

  • eyepop.common-objects - Detect common objects

For the complete list of available models and abilities, see the Composable Pops documentation.

Visual Intelligence Component

  • eyepop.image-contents:latest - The core Visual Intelligence model that analyzes images based on your prompts

Forward Operators

  • ForwardOperatorType.CROP - Extract detected regions and send them to target components

  • ForwardOperatorType.FULL - Send the entire image to target components (see the sketch below)
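
For scene-level questions, the same detect-and-forward pattern works with ForwardOperatorType.FULL, which hands the analyzer the whole frame rather than a crop. A minimal sketch (the prompt text is only an example):

const SceneAnalysis = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.person:latest",

    forward: {
      operator: {
        // FULL forwards the entire image to the target, not the detected region
        type: ForwardOperatorType.FULL,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "Describe the overall scene and setting. If you are unable to provide an answer, set classLabel to null"
          }]
        }
      }]
    }
  }]
}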

Prompt Engineering for Vision

The Foundation Pattern

Always include this safety instruction in your prompts:

params: {
  prompts: [{
    prompt: "Analyze the image and determine [your question]. " +
           "If you are unable to provide a category with a value then set its classLabel to null"
  }]
}

This ensures the model returns classLabel: null when it is uncertain, reducing the risk of hallucinated answers.

Effective Prompt Structure

✅ Good Prompts:

// Clear, specific questions
"What is the person's age range (report as 20s, 30s, etc.) and gender?"

// Multiple categories with clear formatting
"Determine the categories of: Age (report as range), Gender (Male/Female), Fashion style (Casual/Formal/Sporty)"

// Specific instructions
"Describe the person's outfit including colors and style. Be specific about clothing items."

❌ Avoid These Patterns:

// Too vague
"Tell me about this person"

// Too complex in single prompt  
"What's their age, gender, emotion, clothing, activity, background, and any objects they're holding?"

// Leading questions
"Is this person wearing a red shirt?" // Better to ask "What color is their shirt?"

Understanding the Response Format

Visual Intelligence returns structured data:

{
  "category": "Age and Fashion Style",  // Your prompt/question
  "classLabel": "20s, Casual",         // The AI's answer
  "confidence": 0.85,                  // Confidence score
  "id": "unique-id"                    // Result identifier
}

When uncertain, you'll get:

{
  "category": "Age and Fashion Style",
  "classLabel": null,                  // Indicates uncertainty
  "confidence": 0.12,                 // Low confidence
  "id": "unique-id"
}
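
In application code, filter out uncertain answers before acting on them. A minimal TypeScript sketch over the response shape shown above (the 0.5 threshold is an arbitrary example; tune it for your use case):

// One Visual Intelligence answer, per the response format above
interface VIAnswer {
  category: string
  classLabel: string | null
  confidence: number
  id: string
}

// Keep only answers the model committed to with reasonable confidence
function usableAnswers(answers: VIAnswer[], minConfidence = 0.5): VIAnswer[] {
  return answers.filter(a => a.classLabel !== null && a.confidence >= minConfidence)
}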

Complete Examples

Example 1: Person Style Analysis

Analyze fashion and demographics of detected people:

const PersonVisualIntelligence = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.person:latest",
    categoryName: "person",
    confidenceThreshold: 0.9,

    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "Analyze the image provided and determine the categories of: " +
              ["Age (report as range, ex. 20s)",
               "Gender (Male/Female)", 
               "Fashion style (Casual, Formal, Bohemian, Streetwear, Vintage, Chic, Sporty, Edgy)",
               "Describe their outfit"].join(", ") +
              ". Report the values of the categories as classLabels. If you are unable to provide a category with a value then set its classLabel to null"
          }]
        }
      }]
    }
  }]
}

What this does:

  1. Detects people with a 90% confidence threshold

  2. Crops each detected person

  3. Analyzes age, gender, fashion style, and outfit description

  4. Returns structured data for each person

Example 2: Multi-Question Object Analysis

Ask multiple specific questions about detected objects:

const DetailedObjectAnalysis = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.common-objects",
    confidenceThreshold: 0.8,

    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "What color is this object? What is the object's condition (new, used, damaged)? What material is this object made of? If uncertain, set classLabel to null"
          }]
        }
      }]
    }
  }]
}

Example 3: Activity Recognition

Understand what people are doing:

const ActivityAnalysis = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.person:latest",
    confidenceThreshold: 0.85,

    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "What activity is this person doing? Choose from: walking, running, sitting, standing, exercising, eating, talking, working, playing, or other. If you cannot determine the activity clearly, set classLabel to null"
          }]
        }
      }]
    }
  }]
}

Custom Class Labels with Object Localization

When you need to find objects and display custom labels on bounding boxes, use eyepop.localize-objects:latest:

const CustomLabels = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: 'eyepop.localize-objects:latest',
    params: {
      prompts: [
        {prompt: 'dog', label: 'Best Friend'},
        {prompt: 'cat', label: 'Just a Cat'}
      ]
    }
  }]
}

What this does:

  • prompt tells the model what to find ("dog", "cat")

  • label sets what appears on the bounding box ("Best Friend", "Just a Cat")

  • Returns bounding boxes with your custom labels instead of generic "dog" or "cat" (illustrated below)
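
An illustrative result fragment (hypothetical; the field names here are assumptions, so check your actual output):

{
  "objects": [{
    "classLabel": "Best Friend",   // your custom label, not "dog"
    "confidence": 0.91,
    "x": 120, "y": 80, "width": 340, "height": 410
  }]
}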

Custom Business Logic Examples

Retail Analytics:

// Analyze customer demographics and shopping behavior
prompt: "Determine: Age range (teens/20s/30s/40s/50s+), Gender, Shopping bag count (0/1/2/3+), Engagement level (browsing/interested/deciding). If any category is unclear, set its classLabel to null"

Security & Safety:

// Detect safety compliance
prompt: "Is this person wearing required safety equipment: Hard hat (yes/no), Safety vest (yes/no), Safety glasses (yes/no). For each item, if you cannot clearly see it, set classLabel to null"

Healthcare:

// Patient positioning analysis  
prompt: "Describe the patient's position: Sitting/Standing/Lying down, Posture (upright/slouched/leaning), Mobility aid visible (none/wheelchair/walker/cane). If uncertain about any aspect, set classLabel to null"

Common Pitfalls & Troubleshooting

When to Use Which Component

Use eyepop.localize-objects:latest when:

  • You need bounding boxes with custom labels

  • You're building object detection with specific labeling needs

  • You want to find specific objects by description

Use eyepop.image-contents:latest when:

  • You need detailed analysis or descriptions

  • You want to ask questions about visual content

  • You need structured responses to custom prompts

Prompt Design Mistakes

❌ Don't:

  • Ask leading questions ("Is the person happy?")

  • Make prompts too complex or long

  • Forget the null safety instruction

  • Use ambiguous terms without definition

✅ Do:

  • Ask open-ended analytical questions

  • Define categories clearly

  • Include the null safety pattern

  • Break complex analysis into multiple prompts

Confidence Considerations

  • Visual Intelligence returns confidence scores

  • Low confidence often correlates with null responses

  • Use confidence thresholds in your application logic

  • Consider multiple prompts for critical decisions (see the sketch below)
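
For example, the safety-compliance check from earlier can be split so each item is its own question. A hypothetical sketch, assuming eyepop.image-contents:latest accepts multiple entries in its prompts array the way eyepop.localize-objects:latest does:

params: {
  prompts: [
    { prompt: "Is the person wearing a hard hat (yes/no)? If you cannot clearly see it, set classLabel to null" },
    { prompt: "Is the person wearing a safety vest (yes/no)? If you cannot clearly see it, set classLabel to null" },
    { prompt: "Is the person wearing safety glasses (yes/no)? If you cannot clearly see it, set classLabel to null" }
  ]
}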

Getting Started

Ready to build your own Visual Intelligence Pop? Start with this template:

const MyVisualIntelligence = {
  components: [{
    type: PopComponentType.INFERENCE,
    ability: "eyepop.[YOUR_DETECTOR]:latest",  // Choose your detector
    confidenceThreshold: 0.8,                 // Adjust as needed

    forward: {
      operator: {
        type: ForwardOperatorType.CROP,
      },
      targets: [{
        type: PopComponentType.INFERENCE,
        ability: 'eyepop.image-contents:latest',
        params: {
          prompts: [{
            prompt: "[YOUR QUESTION HERE]. If you are unable to provide an answer, set classLabel to null"
          }]
        }
      }]
    }
  }]
}

Replace [YOUR_DETECTOR] with your chosen detection model and [YOUR QUESTION HERE] with your custom prompt, and you're ready to start analyzing!
