BlazePose

BlazePose TFJS uses the TF.js runtime to execute the model, as well as the preprocessing and postprocessing steps.

Three models are offered.

  • lite - our smallest model, less accurate but with the smallest model size and a minimal memory footprint.
  • full - a middle ground between performance and accuracy.
  • heavy - our largest model, intended for high accuracy regardless of size.

Please try it out using the live demo. In the runtime-backend dropdown, choose ‘tfjs-webgl’.


Table of Contents

  1. Installation
  2. Usage
  3. Performance
  4. Bundle Size
  5. Model Quality

Installation

To use BlazePose, you first need to select a runtime (TensorFlow.js or MediaPipe). To understand the advantages of each runtime, check the Performance and Bundle Size sections below. This guide is for the TensorFlow.js runtime. The guide for the MediaPipe runtime can be found here.

Via script tags:

<!-- Require the peer dependencies of pose-detection. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>

<!-- You must explicitly require a TF.js backend if you're not using the TF.js union bundle. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>

<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/pose-detection"></script>

Via npm:

yarn add @tensorflow-models/pose-detection
yarn add @tensorflow/tfjs-core @tensorflow/tfjs-converter
yarn add @tensorflow/tfjs-backend-webgl

Usage

If you are using the Pose API via npm, you need to import the libraries first.

Import the libraries

import * as poseDetection from '@tensorflow-models/pose-detection';
import * as tf from '@tensorflow/tfjs-core';
// Register WebGL backend.
import '@tensorflow/tfjs-backend-webgl';
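
Optionally, you can explicitly select the backend and wait for it to initialize before creating a detector. A minimal sketch using standard tfjs-core calls:

// Optional: explicitly select the WebGL backend and wait until it is ready.
await tf.setBackend('webgl');
await tf.ready();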

Create a detector

Pass in poseDetection.SupportedModels.BlazePose along with a detectorConfig to the createDetector method to load and initialize the model.

detectorConfig is an object that defines BlazePose-specific configurations for BlazePoseTfjsModelConfig:

  • runtime: Must be set to ‘tfjs’.

  • enableSmoothing: Defaults to true. Set it to false if your input is a static image. This flag indicates whether a temporal filter is used to smooth the predicted keypoints.

  • modelType: specifies which variant to load from BlazePoseModelType (i.e., ‘lite’, ‘full’, ‘heavy’). If unset, the default is ‘full’.

  • detectorModelUrl: An optional string that specifies a custom URL for the detector model. This is useful for areas/countries that don’t have access to the models hosted on tf.hub. It also accepts an io.IOHandler, which can be used with tfjs-react-native to load the model from the app bundle directory using bundleResourceIO.

  • landmarkModelUrl: An optional string that specifies a custom URL for the landmark model. This is useful for areas/countries that don’t have access to the models hosted on tf.hub. It also accepts an io.IOHandler, which can be used with tfjs-react-native to load the model from the app bundle directory using bundleResourceIO. For self-hosted URLs, see the sketch after the code snippet below.

const model = poseDetection.SupportedModels.BlazePose;
const detectorConfig = {
  runtime: 'tfjs',
  enableSmoothing: true,
  modelType: 'full'
};
const detector = await poseDetection.createDetector(model, detectorConfig);
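
If you self-host the models (see detectorModelUrl and landmarkModelUrl above), the configuration looks like the following sketch; the URLs are placeholders for your own mirror, not real endpoints:

const customConfig = {
  runtime: 'tfjs',
  modelType: 'lite',
  // Hypothetical self-hosted copies of the two model files.
  detectorModelUrl: 'https://your-cdn.example.com/blazepose/detector/model.json',
  landmarkModelUrl: 'https://your-cdn.example.com/blazepose/landmark/model.json'
};
const customDetector = await poseDetection.createDetector(
    poseDetection.SupportedModels.BlazePose, customConfig);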

Run inference

Now you can use the detector to detect poses. The estimatePoses method accepts both images and video in many formats, including tf.Tensor3D, HTMLVideoElement, HTMLImageElement, and HTMLCanvasElement. If you want more options, you can pass in a second estimationConfig parameter.

estimationConfig is an object that defines BlazePose specific configurations for BlazePoseTfjsEstimationConfig:

  • flipHorizontal: Optional. Defaults to false. Set it to true when the image data comes from a camera and the result should be flipped horizontally.

You can also override a video’s timestamp by passing in a timestamp in milliseconds as the third parameter. This is useful when the input is a tensor, which carries no timestamp information, or when you want to override a video’s own timestamp.

The following code snippet demonstrates how to run the model inference:

const estimationConfig = {flipHorizontal: true};
const timestamp = performance.now();
const poses = await detector.estimatePoses(image, estimationConfig, timestamp);

Please refer to the Pose API README about the structure of the returned poses.
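
For video input, inference is typically run once per frame. Below is a minimal sketch of such a loop, assuming a playing <video> element with id ‘video’ and the detector created above; the keypoint lookup relies on BlazePose keypoints carrying names such as ‘nose’:

const video = document.getElementById('video');

async function renderLoop() {
  // Estimate poses for the current video frame.
  const poses = await detector.estimatePoses(video, {flipHorizontal: false});
  if (poses.length > 0) {
    // Each pose holds an array of named keypoints with x, y and score.
    const nose = poses[0].keypoints.find((kp) => kp.name === 'nose');
    console.log(nose);
  }
  requestAnimationFrame(renderLoop);
}
requestAnimationFrame(renderLoop);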

Performance

To quantify the inference speed of BlazePose, the model was benchmarked across multiple devices. The inference speed (expressed in FPS) was measured on GPU with WebGL, as well as on WebAssembly (WASM), which is the typical backend for devices with lower-end GPUs or no GPU.

Each cell lists FPS for the lite / full / heavy model variants, respectively.

| Runtime | MacBook Pro 15” 2019 (Intel Core i9, AMD Radeon Pro Vega 20) | iPhone 12 | Pixel 5 | Desktop (Intel i9-10900K, Nvidia GTX 1070) |
| --- | --- | --- | --- | --- |
| MediaPipe Runtime (WASM with GPU accel.) | 92 / 81 / 38 | N/A | 32 / 22 / N/A | 160 / 140 / 98 |
| TensorFlow.js Runtime (WebGL backend) | 48 / 53 / 28 | 34 / 30 / N/A | 12 / 11 / 5 | 44 / 40 / 30 |

To see the model’s FPS on your device, try our demo. You can switch the model type and backends live in the demo UI to see what works best for your device.

Bundle Size

Bundle size affects the initial page loading experience, including Time-To-Interactive (TTI) and UI rendering. Here we evaluate the pose-detection API under the two runtime options. Bundle size affects file fetching time and UI smoothness, because parsing the code and loading it into memory competes with UI rendering on the CPU. It also determines when the model is available to make inferences.

The two runtimes load differently. For the MediaPipe runtime, only the @tensorflow-models/pose-detection and @mediapipe/pose libraries are loaded at initial page download; the runtime and the model assets are loaded when the createDetector method is called. For the TF.js runtime with the WebGL backend, the runtime is loaded at initial page download; only the model assets are loaded when the createDetector method is called.

The TensorFlow.js package sizes can be further reduced with a custom bundle technique. Also, if your application already uses TensorFlow.js, you don’t need to load those packages again; the models will share the same TensorFlow.js runtime. Choose the runtime that best suits your latency and bundle size requirements. A summary of loading times and bundle sizes is provided below:

| | Bundle Size (gzipped + minified) | Average Loading Time (download speed 100Mbps) |
| --- | --- | --- |
| MediaPipe Runtime | | |
| Initial page load | 22.1KB | 0.04s |
| Initial detector creation: runtime | 1.57MB | |
| Initial detector creation: lite model | 10.6MB | 1.91s |
| Initial detector creation: full model | 14MB | 1.91s |
| Initial detector creation: heavy model | 34.9MB | 4.82s |
| TensorFlow.js Runtime | | |
| Initial page load | 162.6KB | 0.07s |
| Initial detector creation: lite model | 10.41MB | 1.91s |
| Initial detector creation: full model | 13.8MB | 1.91s |
| Initial detector creation: heavy model | 34.7MB | 4.82s |

Model Quality

To evaluate the quality of our models against other well-performing publicly available solutions, we use three different validation datasets, representing different verticals: Yoga, Dance, and HIIT. Each image contains only a single person located 2-4 meters from the camera. To be consistent with other solutions, we perform evaluation only for 17 keypoints from the COCO topology. We report mAP and PCK@0.2 (percent of correct keypoints, where a prediction counts as correct if it lies within 20% of the person’s torso size from the ground truth). For more detail, see the article.

| Method | Yoga mAP | Yoga PCK@0.2 | Dance mAP | Dance PCK@0.2 | HIIT mAP | HIIT PCK@0.2 |
| --- | --- | --- | --- | --- | --- | --- |
| BlazePose.Heavy | 68.1 | 96.4 | 73.0 | 97.2 | 74.0 | 97.5 |
| BlazePose.Full | 62.6 | 95.5 | 67.4 | 96.3 | 68.0 | 95.7 |
| BlazePose.Lite | 45.0 | 90.2 | 53.6 | 92.5 | 53.8 | 93.5 |
| AlphaPose.ResNet50 | 63.4 | 96.0 | 57.8 | 95.5 | 63.4 | 96.0 |
| Apple.Vision | 32.8 | 82.7 | 36.4 | 91.4 | 44.5 | 88.6 |

Quality Chart