Show HN: I built a render pipeline to generate 'impossible' CCTV training data
3 points by Simuletic 5 days ago | 1 comment
Hi HN,

I’m Fredrik, the founder of Simuletic. I’m an engineer working in Computer Vision, and I kept hitting a wall with object detection models in security scenarios: they are terrible at detecting "edge cases" that are rare or dangerous in the real world.

Some of the most difficult problems I have faced are "weapon detection" and "lying / falling detection".

To train an AI to detect someone falling or lying injured, you need thousands of images. But getting real data is an ethical nightmare: you can't ask elderly people to fall down stairs, and you can't use real accident footage because of privacy concerns.

The current industry solution is using actors. But actors instinctively protect themselves when falling; they don't capture the chaotic, dead-weight reality of a true medical incident. Models trained on actors fail on real victims.

The Solution: A Rendering Pipeline

Instead of scraping the web for bad data, I built a rendering pipeline to generate physically accurate, highly realistic synthetic data specifically for CCTV angles.

By using a local AI generation pipeline (no data leaves my own server), I can simulate any scenario with high realism (unlike 3D simulation tools) and still get bounding boxes.

The pipeline automatically generates perfectly annotated ground truth (bounding boxes and segmentation masks) for every frame, solving the massive headache of manual labeling.
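The post doesn't describe how the pipeline derives its labels, but a common way to get boxes "for free" from generated data is to compute them from each object's segmentation mask. A minimal sketch, assuming each object's mask is available as a 2D grid of 0/1 values; `mask_to_bbox` is a hypothetical helper, not part of Simuletic's code.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel indices
Box = Tuple[int, int, int, int]


def mask_to_bbox(mask: List[List[int]]) -> Optional[Box]:
    """Return the tight bounding box around all nonzero pixels of a binary mask.

    Returns None when the mask is empty (e.g. the object is fully occluded
    or outside the frame), so no label line should be written for it.
    """
    # Rows that contain at least one object pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Columns that contain at least one object pixel.
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (min(cols), min(rows), max(cols), max(rows))


if __name__ == "__main__":
    # A small 4x5 mask with one object blob (hypothetical example data).
    mask = [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 1, 1],
        [0, 0, 0, 0, 0],
    ]
    print(mask_to_bbox(mask))  # (2, 1, 4, 2)
```

Because the box is computed from the mask rather than drawn by hand, it is exact by construction for whatever the mask covers.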

I’ve released the first batch, covering several scenarios, on Kaggle. It’s designed specifically to bridge the "sim-to-real" gap for overhead cameras.

Kaggle Link: https://www.kaggle.com/datasets/simuletic/cctv-incident-data...

Specs: CCTV resolution and noise profiles, varied lighting (indoor/outdoor), and high-angle perspectives.
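The post doesn't say how the noise profiles or lighting variation are produced. For illustration only, the simplest form of this kind of augmentation is a brightness gain (lighting) plus zero-mean Gaussian sensor noise, clipped to the 8-bit range. The sketch assumes grayscale frames stored as nested lists; `add_cctv_noise`, `gain`, and `sigma` are hypothetical, and real camera noise is more complex (compression artifacts, for example).

```python
import random
from typing import List

Frame = List[List[int]]  # grayscale pixels, values 0..255


def add_cctv_noise(frame: Frame, gain: float = 1.0, sigma: float = 8.0,
                   seed: int = 0) -> Frame:
    """Apply a brightness gain, then Gaussian noise, then clip to 0..255.

    gain and sigma are hypothetical knobs for lighting and sensor noise;
    they are not parameters of Simuletic's pipeline.
    """
    rng = random.Random(seed)  # seeded so the same frame always gets the same noise
    return [
        [min(255, max(0, round(p * gain + rng.gauss(0.0, sigma)))) for p in row]
        for row in frame
    ]


if __name__ == "__main__":
    clean = [[0, 128, 255], [64, 64, 64]]
    # Darker scene (gain < 1) with moderate sensor noise.
    print(add_cctv_noise(clean, gain=0.6, sigma=10.0, seed=42))
```

Sampling `gain` and `sigma` per frame from a range, rather than fixing them, is what turns this into a (very basic) form of domain randomization.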

Format: Annotations are YOLO-ready out of the box.
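For reference, a YOLO label file holds one line per object: `class x_center y_center width height`, with the four coordinates normalized by the image width and height to [0, 1]. A small sketch of converting a pixel box to and from that format; the helper names and the class index used in the example (1 for "lying") are hypothetical, so check the dataset's own class list.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def to_yolo_line(class_id: int, box: Box, img_w: int, img_h: int) -> str:
    """Convert a pixel box to a YOLO label line with normalized center/size."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def from_yolo_line(line: str, img_w: int, img_h: int) -> Tuple[int, Box]:
    """Parse a YOLO label line back into (class_id, pixel box)."""
    cls, xc, yc, w, h = line.split()
    xc_px, w_px = float(xc) * img_w, float(w) * img_w
    yc_px, h_px = float(yc) * img_h, float(h) * img_h
    box = (xc_px - w_px / 2, yc_px - h_px / 2, xc_px + w_px / 2, yc_px + h_px / 2)
    return int(cls), box


if __name__ == "__main__":
    # Hypothetical "lying" box in a 1280x720 frame.
    line = to_yolo_line(1, (320, 360, 640, 540), 1280, 720)
    print(line)  # 1 0.375000 0.625000 0.250000 0.250000
```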

What I want from HN: I know many of you work with YOLO and other CNN-based detectors. I’d love for you to throw this dataset into your training mix and see if it improves recall on "lying down" classes in real-world tests.
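One straightforward way to run that check is to measure recall for the lying class alone on held-out real footage: the fraction of ground-truth boxes that some prediction overlaps at IoU of at least 0.5, compared between a model trained with and without the synthetic data. The sketch assumes boxes as (x_min, y_min, x_max, y_max) and greedy, confidence-ordered matching, which is similar in spirit to common detection benchmarks but not identical to any specific tool; `iou` and `matched_count` are hypothetical helpers.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matched_count(gts: List[Box], preds: List[Tuple[Box, float]],
                  thr: float = 0.5) -> int:
    """Count ground-truth boxes matched by a prediction with IoU >= thr.

    Predictions (box, confidence) are processed in descending confidence,
    and each ground-truth box can be matched at most once.
    """
    matched = [False] * len(gts)
    for box, _conf in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_iou = -1, thr
        for i, gt in enumerate(gts):
            if matched[i]:
                continue
            v = iou(box, gt)
            if v >= best_iou:
                best, best_iou = i, v
        if best >= 0:
            matched[best] = True
    return sum(matched)


if __name__ == "__main__":
    # Hypothetical per-image (ground truth, predictions) for the lying class only.
    images = [
        ([(10, 10, 50, 30)], [((12, 11, 49, 31), 0.9)]),   # hit
        ([(100, 80, 160, 110)], [((0, 0, 20, 20), 0.8)]),  # miss
    ]
    hits = sum(matched_count(g, p) for g, p in images)
    total = sum(len(g) for g, _ in images)
    print(f"recall = {hits}/{total} = {hits / total:.2f}")  # recall = 1/2 = 0.50
```

Summing matches and ground-truth boxes across all images before dividing gives dataset-level recall, so images with several people lying down are weighted correctly.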

I’m here to answer questions about the rendering pipeline, the domain randomization techniques used, or the challenges of sim-to-real transfer.

Thanks, Fredrik