I’m experimenting with ESP-WHO + ESP-DL on the ESP32-S3.
Right now, I have the default human_face_detect example running successfully with models like human_face_detect_msr_s8_v1.espdl.
Now I want to use a custom lightweight landmark model with 22 key points (instead of the full 468 MediaPipe points). My goal is to detect only the key facial features (eyes, mouth, etc.) needed for distraction/yawn detection.
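To make the goal concrete, here is a rough sketch of the kind of post-processing I have in mind once the model returns its landmarks. It is only an illustration: the landmark indices, the function names, and the 0.6 threshold are placeholders I made up, not values from any existing ESP-DL or MediaPipe model.

```python
import math

# Hypothetical indices into the 22-point layout (placeholders, not from a real model).
MOUTH_LEFT, MOUTH_RIGHT, MOUTH_TOP, MOUTH_BOTTOM = 0, 1, 2, 3


def mouth_open_ratio(landmarks):
    """Return mouth height divided by mouth width for a list of (x, y) points."""
    width = math.dist(landmarks[MOUTH_LEFT], landmarks[MOUTH_RIGHT])
    height = math.dist(landmarks[MOUTH_TOP], landmarks[MOUTH_BOTTOM])
    # Guard against a degenerate detection where both mouth corners coincide.
    return height / width if width > 0 else 0.0


def is_yawning(landmarks, threshold=0.6):
    """Flag a yawn when the mouth is open wider than the assumed threshold."""
    return mouth_open_ratio(landmarks) > threshold
```

On the device this would be a few lines of C++ run on the model output, so the model itself only needs to regress the 22 (x, y) pairs.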
Questions:
1. What is the correct workflow to train and quantize such a model for the ESP32-S3?
   - Should I use TensorFlow Lite → int8 quantization → conversion with the esp-dl tools?
   - Or is there an Espressif-specific flow?
2. Is there any example of how to prepare .espdl models from TensorFlow/PyTorch for deployment?
3. Are there constraints on input resolution or supported ops that I should consider when designing the model?
4. Would you recommend starting from the existing fpenet or landmark model in the ESP-DL repo and retraining it with fewer points?
I would appreciate any guidance or workflow references.