FAQ
This FAQ addresses common questions about creating high-quality datasets for object detection models, with a specific focus on state-based detection (such as Door Open/Closed) in real-world environments. Learn best practices for data collection, labeling, and avoiding common pitfalls that can impact model performance.
Example: Door Open / Door Closed Detection in Schools
Q: What kinds of images should we include in our dataset?
Your dataset should reflect the environments where your model will actually run. For a Door Open/Closed model in schools, that means:
Include:
School hallways, classrooms, gymnasiums, entrances, restrooms, janitorial spaces
Images taken from realistic camera placements (wall cameras, ceiling cameras, door-frame cameras)
Variations in lighting, angle, and usage (doors ajar, propped open, partially blocked)
Avoid: homes, garages, random internet images, architectural photos irrelevant to the target setting.
Q: What types of doors should we include?
Include only doors or doorways relevant to your use case.
For schools, that excludes: ornate residential doors, garage doors, barn doors, purely decorative doors, or gates.
Focus on:
Standard swinging doors
Push bar doors
Double doors
Doors with windows
Entryways and propped doors
Q: How should we handle images that contain no doors?
Include a reasonable number of negatives (images with no doors) that reflect the operational setting (school hallways, walls, entryways).
Do not include irrelevant negatives (bedrooms, outdoor scenes, kitchens, etc.).
Balance: ~20-25% negatives is typically enough. Too many will skew model behavior.
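If you want to verify that ratio before uploading, a few lines of Python will do. This is a minimal sketch assuming your positives and negatives sit in separate folders (the paths here are illustrative, not a required layout):

```python
from pathlib import Path

# Illustrative folder layout; adjust the paths to match your dataset.
positives = list(Path("dataset/positives").glob("*.jpg"))
negatives = list(Path("dataset/negatives").glob("*.jpg"))

total = len(positives) + len(negatives)
ratio = len(negatives) / total if total else 0.0
print(f"{len(negatives)} negatives out of {total} images ({ratio:.0%})")

# Aim for roughly 20-25% negatives; flag anything far outside that band.
if not 0.20 <= ratio <= 0.25:
    print("Consider adding or removing negatives to reach ~20-25%.")
```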
Q: How do I upload negative images?
Step-by-step process:
1. Prepare your negative images folder
Create a folder named exactly "negatives" (lowercase, plural). Add approximately 100 images that do NOT contain your target object.
2. Navigate to upload
Go to the Models tab in the sidebar. Find your dataset and click the three dots (...) menu. Select Upload.
3. Upload the folder
Drag and drop your entire "negatives" folder onto the "add more images" tile. The system will automatically recognize this as negative training data.
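If you are assembling that folder from a larger pool of frames, a short script can handle the sampling and copying. A minimal sketch, where "no_door_frames" is a hypothetical source folder of images you have already verified contain no doors:

```python
import random
import shutil
from pathlib import Path

# "no_door_frames" is an illustrative source folder; adjust to your setup.
source = Path("no_door_frames")
target = Path("negatives")  # must be named exactly "negatives"
target.mkdir(exist_ok=True)

# Sample roughly 100 images, per the guidance above.
images = sorted(source.glob("*.jpg"))
sample = random.sample(images, min(100, len(images)))
for img in sample:
    shutil.copy(img, target / img.name)

print(f"Copied {len(sample)} images into {target}/")
```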
Q: How precise do our bounding boxes need to be?
Bounding boxes should be tight around the door edges or doorway edges. Loose boxes can confuse the model, especially when subtle cues like open/closed gaps are important.
Tip: Propped doors and open doorways should be consistently handled — include guidance for your labelers.
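One way to catch obviously loose or degenerate boxes at scale is a simple coverage heuristic. The sketch below assumes your labels are exported in COCO format (an assumption for illustration, not an EyePop.ai requirement) and flags boxes that are near-zero or near-full-frame for manual review; the thresholds are illustrative:

```python
import json

# Assumes a COCO-format annotation file; the file name is illustrative.
with open("annotations.json") as f:
    coco = json.load(f)

images = {img["id"]: img for img in coco["images"]}

for ann in coco["annotations"]:
    img = images[ann["image_id"]]
    x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
    coverage = (w * h) / (img["width"] * img["height"])
    # Near-zero boxes are usually labeling slips; near-full-frame boxes
    # are often loose and worth a manual look.
    if coverage < 0.001 or coverage > 0.9:
        print(f"Check box {ann['id']} in {img['file_name']}: "
              f"{coverage:.1%} of the frame")
```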
Q: What is the definition of "door" vs. "doorway"?
You should define this clearly for your team:
Is an empty doorway without a physical door considered "open"?
Is a glass door treated differently?
How do you handle doors with windows, or doors partially blocked?
Answer these questions in a shared Labeling Guide to ensure consistency.
Q: What is the risk of missing labels?
If an image contains a door of interest that is not labeled, the model may treat it as a negative — this can lead to severe failure modes.
Recommendation: Carefully audit your dataset and label every instance of the object in scope, or remove the image.
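A quick script can surface candidates for that audit. Again assuming a COCO-format annotation file (file name illustrative), this lists every image that has no labels at all, so each can be confirmed as a deliberate negative, relabeled, or removed:

```python
import json

# Assumes a COCO-format annotation file; adapt to your export format.
with open("annotations.json") as f:
    coco = json.load(f)

labeled_ids = {ann["image_id"] for ann in coco["annotations"]}

# Any image with zero annotations should be either a deliberate negative
# or a candidate for relabeling/removal -- review each one.
for img in coco["images"]:
    if img["id"] not in labeled_ids:
        print(f"No labels: {img['file_name']}")
```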
Q: How diverse should our data be?
Diversity is good — but within your use case. You want:
Different lighting conditions
Different door types within your target environment
Occlusions (people passing through, signs on doors)
Partial views of doors
Avoid artificial diversity from irrelevant sources (non-school doors, home interiors, stock photos).
Q: How many images do we need?
It depends on your deployment goals, but a general rule:
500-1000 images per class as a starting point
More is better if your environment has high variability
Quality matters more than quantity — a smaller, well-curated dataset will outperform a large, noisy one
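To see where you stand against that rule of thumb, tally your labels per class. The sketch below counts labeled instances per class as a rough proxy for per-class coverage, again assuming a COCO-format export:

```python
import json
from collections import Counter

# Assumes a COCO-format annotation file; the file name is illustrative.
with open("annotations.json") as f:
    coco = json.load(f)

names = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(names[a["category_id"]] for a in coco["annotations"])

# Compare against the rough 500-1000 images-per-class starting point;
# instance counts will overstate coverage if images contain many objects.
for name, n in counts.most_common():
    print(f"{name}: {n} labeled instances")
```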
Common Mistakes & Best Practices
Q: What common mistakes should we avoid?
Top mistakes that hurt model performance:
Including doors that won't be seen in production (wrong door types)
Loose or imprecise bounding boxes
Poor balance of negative images
Missing labels for objects present in images
Too much "diverse" data that is not actually relevant (houses, internet photos)
Inconsistent definitions of what constitutes a door or an open/closed state
Q: What's the most important step before training?
Conduct a dataset audit pass — this step dramatically improves model quality. Review your dataset for:
Consistent labeling across all images
Proper representation of your target environment
Correct balance of positive and negative examples
Quality and relevance of all included images
Final Tip: EyePop.ai offers tools to help with dataset review and validation before training begins.