# FAQ

This FAQ addresses common questions about creating high-quality datasets for object detection models, with specific focus on state-based detection (like Door Open/Closed) in real-world environments. Learn best practices for data collection, labeling, and avoiding common pitfalls that can impact model performance.

*Example: Door Open / Door Closed Detection in Schools*

***

#### Q: What kinds of images should we include in our dataset?

Your dataset should reflect the environments where your model will actually run. For a Door Open/Closed model in schools, that means:

**Include:**

* School hallways, classrooms, gymnasiums, entrances, restrooms, janitorial spaces
* Images taken from realistic camera placements (wall cameras, ceiling cameras, door-frame cameras)
* Variations in lighting, angle, and usage (doors ajar, propped open, partially blocked)

**Avoid:** homes, garages, random internet images, architectural photos irrelevant to the target setting.

#### Q: What types of doors should we include?

Include only *doors or doorways relevant to your use case*.

For schools, that **excludes:** ornate residential doors, garage doors, barn doors, purely decorative doors, or gates.

**Focus on:**

* Standard swinging doors
* Push bar doors
* Double doors
* Doors with windows
* Entryways and propped doors

***

#### Q: How should we handle images that contain no doors?

Include a *reasonable number* of negatives (images with no doors) that reflect the operational setting (school hallways, walls, entryways).

**Do not include** irrelevant negatives (bedrooms, outdoor scenes, kitchens, etc.).

**Balance:** \~20-25% negatives is typically enough. Too many can bias the model toward missing real doors.
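
As a rough illustration of the arithmetic, the sketch below works out how many negatives bring a dataset to a target share. It assumes the 20-25% figure is measured against the total image count (positives plus negatives). The function names are hypothetical and not part of any EyePop.ai tooling.

```python
import math


def negative_share(num_positive: int, num_negative: int) -> float:
    """Fraction of the whole dataset made up of negative images."""
    total = num_positive + num_negative
    return num_negative / total if total else 0.0


def negatives_needed(num_positive: int, target_share: float) -> int:
    """Negatives required so they form `target_share` of the total dataset.

    Solves n / (num_positive + n) = target_share for n, rounded up.
    """
    if not 0 <= target_share < 1:
        raise ValueError("target_share must be in [0, 1)")
    exact = target_share * num_positive / (1 - target_share)
    # Round first so tiny floating-point noise does not push ceil() up by one.
    return math.ceil(round(exact, 9))


print(negatives_needed(300, 0.25))  # 100
print(negative_share(300, 100))     # 0.25
```

With 300 labeled door images, about 100 negatives lands at the upper end of the suggested range.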

#### Q: How do I upload negative images?

**Step-by-step process:**

1. **Prepare your negative images folder**
   * Create a folder named exactly `negatives` (lowercase, plural)
   * Add approximately 100 images that do NOT contain your target object
2. **Navigate to upload**
   * Go to the **Models** tab in the sidebar
   * Find your dataset and click the **three dots (...)** menu
   * Select **Upload**
3. **Upload the folder**
   * Drag and drop your entire `negatives` folder onto the "add more images" tile
   * The system will automatically recognize this as negative training data
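
If your negative images are scattered among other files, a small script can gather them before you drag the folder into the uploader. This is a minimal sketch: the accepted file extensions are an assumption, and `build_negatives_folder` is a hypothetical helper, not an EyePop.ai API.

```python
import shutil
from pathlib import Path

# Assumed image formats; adjust to whatever your uploader accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_negatives_folder(source_dir: Path, dest_parent: Path) -> Path:
    """Copy images from source_dir into dest_parent/negatives and return that path."""
    negatives_dir = dest_parent / "negatives"  # lowercase, plural, exactly as required
    negatives_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(source_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(path, negatives_dir / path.name)
    return negatives_dir


# Example call (paths are placeholders):
# build_negatives_folder(Path("raw/no_door_shots"), Path("upload"))
```

The resulting `negatives` folder is what you drag onto the upload tile in step 3.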

***

#### Q: How precise do our bounding boxes need to be?

Bounding boxes should be **tight around the door edges or doorway edges**. Loose boxes can confuse the model, especially when subtle cues like open/closed gaps are important.

**Tip:** Handle propped doors and open doorways consistently, and write the rules down for your labelers.
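
One way to make "tight" measurable during label review is intersection over union (IoU) between a labeler's box and a carefully drawn reference box. The sketch below is illustrative only: the `(x_min, y_min, x_max, y_max)` pixel format and the example coordinates are assumptions, and using IoU for label QA is a suggestion rather than a documented EyePop.ai feature.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


tight = (100, 50, 200, 300)  # hugs the door frame
loose = (80, 30, 220, 320)   # same door with 20 px of padding on every side

print(round(iou(tight, loose), 3))  # 0.616
```

In this example, 20 pixels of padding around a 100 x 250 pixel door already drops the overlap to roughly 0.62, which shows how quickly a loose box pulls in hallway background.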

#### Q: What is the definition of "door" vs. "doorway"?

You should define this clearly for your team:

* Is an empty doorway without a physical door considered "open"?
* Is a glass door treated differently?
* How do you handle doors with windows, or doors partially blocked?

**Answer these questions in a shared Labeling Guide to ensure consistency.**

#### Q: What is the risk of missing labels?

If an image contains a door of interest that is not labeled, the model may learn to treat that door as background, which can cause it to miss similar doors in production.

**Recommendation:** Carefully audit your dataset and label *every instance* of the object in scope, or remove the image.
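
Part of this audit can be automated. The sketch below flags images that carry no labels at all yet were not deliberately uploaded as negatives. It assumes you can export annotations as a mapping from image filename to a list of class names; that structure and the function name are hypothetical.

```python
from typing import Dict, List, Set


def find_suspect_images(
    annotations: Dict[str, List[str]], negatives: Set[str]
) -> List[str]:
    """Return images with zero labels that are not intentional negatives."""
    return sorted(
        name
        for name, labels in annotations.items()
        if not labels and name not in negatives
    )


annotations = {
    "hall_01.jpg": ["door_open"],
    "hall_02.jpg": [],  # no labels, not a declared negative: needs review
    "wall_07.jpg": [],  # no labels, but intentionally a negative
}
print(find_suspect_images(annotations, negatives={"wall_07.jpg"}))  # ['hall_02.jpg']
```

A check like this only catches images that are entirely unlabeled. An image with one labeled door and one missed door still needs a human reviewer.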

***

#### Q: How diverse should our data be?

Diversity is good — but *within your use case*. You want:

* Different lighting conditions
* Different door types within your target environment
* Occlusions (people passing through, signs on doors)
* Partial views of doors

**Avoid** artificial diversity from irrelevant sources (non-school doors, home interiors, stock photos).

#### Q: How many images do we need?

It depends on your deployment goals, but as a general rule:

* **500-1000 images per class** as a starting point
* More is better if your environment has high variability
* Quality matters more than quantity — a smaller, well-curated dataset will outperform a large, noisy one
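
To check where a dataset stands against the per-class guideline, you can count how many images contain each class. This is a minimal sketch that assumes annotations are available as a mapping from image filename to a list of class names; the class names `door_open` and `door_closed` are placeholders.

```python
from collections import Counter
from typing import Dict, List


def images_per_class(annotations: Dict[str, List[str]]) -> Counter:
    """Count images containing each class (an image counts once per class)."""
    counts: Counter = Counter()
    for labels in annotations.values():
        for cls in set(labels):
            counts[cls] += 1
    return counts


def classes_below_minimum(
    annotations: Dict[str, List[str]], minimum: int = 500
) -> Dict[str, int]:
    """Classes whose image count falls short of `minimum`."""
    counts = images_per_class(annotations)
    return {cls: n for cls, n in counts.items() if n < minimum}


sample = {
    "a.jpg": ["door_open", "door_open"],
    "b.jpg": ["door_open", "door_closed"],
    "c.jpg": [],
}
print(classes_below_minimum(sample, minimum=2))  # {'door_closed': 1}
```

The count only tells you about quantity. It says nothing about whether those images are relevant or well labeled, which still matters more.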

***

### Common Mistakes & Best Practices

#### Q: What common mistakes should we avoid?

**Top mistakes that hurt model performance:**

* Including doors that won't be seen in production (wrong door types)
* Loose or imprecise bounding boxes
* Poor balance of negative images
* Missing labels for objects present in images
* Too much "diverse" data that is not actually relevant (houses, internet photos)
* Inconsistent definitions of what constitutes a door or an open/closed state

#### Q: What's the most important step before training?

**Conduct a dataset audit pass** — this step dramatically improves model quality. Review your dataset for:

* Consistent labeling across all images
* Proper representation of your target environment
* Correct balance of positive and negative examples
* Quality and relevance of all included images
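
The labeling-consistency check is one part of the audit that lends itself to a script. The sketch below flags label names that do not match the classes agreed in your labeling guide. The class names and the annotation structure (image filename mapped to a list of class names) are assumptions for illustration.

```python
from typing import Dict, List, Set

# Hypothetical class names; use whatever your labeling guide defines.
ALLOWED_CLASSES: Set[str] = {"door_open", "door_closed"}


def find_unknown_labels(
    annotations: Dict[str, List[str]], allowed: Set[str] = ALLOWED_CLASSES
) -> Dict[str, List[str]]:
    """Map each image to the label names it uses that are not in `allowed`."""
    problems: Dict[str, List[str]] = {}
    for name, labels in annotations.items():
        unknown = sorted(set(labels) - allowed)
        if unknown:
            problems[name] = unknown
    return problems


labels_by_image = {
    "a.jpg": ["door_open"],
    "b.jpg": ["Door-Open", "door_closed"],  # inconsistent spelling
}
print(find_unknown_labels(labels_by_image))  # {'b.jpg': ['Door-Open']}
```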

**Final Tip:** EyePop.ai offers tools to help with dataset review and validation before training begins.

***

