Snap your fingers and make your coffee maker brew you a fresh cup. Wave a hand near your smart TV and switch on today’s weather forecast. Tap a finger near your smartwatch and set an alarm in your child’s bedroom. How great would it be to get things done just by gesturing? It’s not that unrealistic anymore: hand tracking and gesture recognition technologies are penetrating multiple industries. But do we really need capabilities like these? And what is the true value of real-time hand gesture recognition (HGR)?
Gesturing is a natural and intuitive way to interact with people and the environment. So it makes perfect sense to use hand gestures as a method of human-computer interaction (HCI). But there are quite a few challenges, starting from needing to wave your hands in front of your small smartphone screen and ending with the complex machine learning algorithms needed to recognize more than a simple thumbs up. Is the juice worth the squeeze? Let’s find out, starting from definitions and moving to the technical details.
The need for gesture recognition technology
Markets and Markets projects that the gesture recognition market will reach $32.3 billion by 2025, up from $9.8 billion in 2020. Today’s top producers of gesture interface products are, unsurprisingly, Intel, Apple, Microsoft, and Google. The key industries driving mass adoption of touchless tech are automotive, healthcare, and consumer electronics.
Gesture recognition market in China, 2014–2025
Source: Grand View Research
Keep in mind that hand tracking and gesture recognition are not the same thing. Both technologies enable hands-based human-machine interaction (HMI) without touching, switching, or employing controllers. Some hand tracking and gesture recognition systems require the use of markers, gloves, or sensors, but the ideal system requires nothing but a human hand.
Systems employing gesture recognition technology are only capable of distinguishing specific gestures: a thumbs up, a wave, a peace sign, a rock sign, and so on. Hand tracking is more complex: it provides more variability in HMI, since it tracks hand size, finger position, and other characteristics. The number of potential interactions with digital objects is limitless, but overlapping, occlusion, and interpretation issues occur. The AI in a gesture recognition system, by contrast, is trained to identify only a limited set of gestures and is less flexible than hand tracking technology, but it doesn’t suffer from the same issues.
Why might people want to use gestures instead of just touching or tapping a device? A desire for contactless sensing and hygiene concerns are the top drivers of demand for touchless technology. Gesture recognition can also provide better ergonomics for consumer devices. Another market driver is the rise of biometric systems in many areas of people’s lives, from cars to homes to shops.
During the coronavirus pandemic, it’s not surprising that people are reluctant to use touchscreens in public places. Moreover, for drivers, tapping a screen can be dangerous, as it distracts them from the road. In other cases, tapping small icons or accidentally clicking on the wrong field increases frustration and makes people look for a better customer experience. Real-time hand gesture recognition for computer interactions is just the next step in technological evolution, and it’s ideally suited for today’s consumer landscape. Besides using gestures when you cannot conveniently touch equipment, hand tracking can be applied in augmented and virtual reality environments, sign language recognition, gaming, and other use cases.
The high cost of touchless sensing products is one of the major challenges of this technology, along with the complexity of software development for HGR. To create a robust system that detects hand positions, a hand tracking solution requires the implementation of advanced machine learning and deep learning algorithms, among other things.
Hand tracking and gesture recognition with AI: How does it work?
Gesture recognition provides real-time data to a computer to make it fulfill the user’s commands. Motion sensors in a device can track and interpret gestures, using them as the primary source of data input. A majority of gesture recognition solutions feature a combination of 3D depth-sensing cameras and infrared cameras together with machine learning systems. Machine learning algorithms are trained based on labeled depth images of hands, allowing them to recognize hand and finger positions.
Gesture recognition consists of three basic levels:
- Detection. With the help of a camera, a device detects hand or body movements, and a machine learning algorithm segments the image to find hand edges and positions.
- Tracking. A device monitors movements frame by frame to capture every movement and provide accurate input for data analysis.
- Recognition. The system tries to find patterns based on the gathered data. When the system finds a match and interprets a gesture, it performs the action associated with that gesture. In the scheme below, feature extraction and classification implement the recognition functionality.
HGR system
Source: ResearchGate
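To make these three levels concrete, here is a toy Python sketch of the detect–track–recognize loop using OpenCV. The skin-color segmentation and the solidity-based classifier are placeholder heuristics chosen for illustration, not a production HGR model:

```python
# Toy sketch of the detect -> track -> recognize loop described above.
# Assumes OpenCV and a webcam; segmentation and classification here are
# crude placeholders, not a trained model.
import cv2

def detect_hand(frame_bgr):
    """Detection: rough skin-color segmentation to find a hand contour."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 255, 255))  # rough skin range
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None

def classify_gesture(contour):
    """Recognition: a placeholder rule -- real systems use a trained model."""
    hull = cv2.convexHull(contour)
    solidity = cv2.contourArea(contour) / max(cv2.contourArea(hull), 1e-6)
    # A fist fills its convex hull; spread fingers leave gaps.
    return "fist" if solidity > 0.9 else "open_hand"

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    contour = detect_hand(frame)                   # 1. detection
    if contour is not None:
        x, y, w, h = cv2.boundingRect(contour)     # 2. tracking, frame by frame
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        print(classify_gesture(contour))           # 3. recognition
    cv2.imshow("hand", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```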
Many solutions use vision-based systems for hand tracking, but this approach has notable limitations: users have to keep their hands within a restricted area, and these systems struggle when hands overlap or aren’t fully visible. With sensor-based motion tracking, however, gesture recognition systems are capable of recognizing both static and dynamic gestures in real time.
In sensor-based systems, depth sensors are used to align computer-generated images with real ones. Leap Motion sensors are also used in hand tracking to detect the number and three-dimensional positions of fingers, locate the center of the palm, and determine hand orientation. The processed data provides insights on fingertip angles, distance from the palm center, fingertip elevation, coordinates in 3D space, and more. A hand gesture recognition system using image processing then looks for patterns with algorithms trained on data from depth and Leap Motion sensors:
- The system distinguishes a hand from the background using color and depth data. The hand sample is further divided into the arm, wrist, palm, and fingers. The system ignores the arm and wrist since they don’t provide gesture information.
- Next, the system obtains information about the distance from the fingertips to the center of the palm, the elevation of the fingertips, the shape of the palm, the position of the fingers, and so on.
- Lastly, the system collects all extracted features into a feature vector that represents a gesture. An AI-powered hand gesture recognition solution matches this feature vector against the gestures in its database and recognizes the user’s gesture, as sketched below.
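Here is a minimal sketch of that last matching step, assuming the features have already been extracted: the gesture is encoded as a vector of hypothetical normalized fingertip distances and elevations, then matched against stored templates with a nearest-neighbour search. The template values are invented for illustration:

```python
# Minimal sketch of feature-vector matching against a gesture database.
# Each vector: 5 fingertip-to-palm distances + 5 fingertip elevations,
# normalized to palm size. All template values are invented.
import numpy as np

GESTURE_TEMPLATES = {
    "open_hand": np.array([1.0, 1.2, 1.3, 1.2, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1]),
    "fist":      np.array([0.4, 0.4, 0.4, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]),
    "peace":     np.array([0.4, 1.2, 1.3, 0.4, 0.4, 0.0, 0.1, 0.1, 0.0, 0.0]),
}

def recognize(feature_vector, threshold=0.5):
    """Nearest-neighbour match of a feature vector against the database."""
    best_name, best_dist = None, float("inf")
    for name, template in GESTURE_TEMPLATES.items():
        dist = np.linalg.norm(feature_vector - template)
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject matches that are too far from every known gesture.
    return best_name if best_dist < threshold else "unknown"

print(recognize(np.array([0.9, 1.1, 1.3, 1.1, 1.0,
                          0.1, 0.1, 0.1, 0.1, 0.1])))  # -> "open_hand"
```

A real system would use a trained classifier rather than fixed templates, but the principle is the same: the closest known gesture wins, and anything too dissimilar is rejected.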
Depth sensors are crucial for hand tracking technology since they allow users to put aside specialized wearables like gloves and make HCI more natural.
Intel has released a suite of depth and tracking technologies called RealSense, providing the developer community with open-source tools for a variety of languages and platforms. The suite spans LiDAR, stereo depth, tracking, and coded light devices; the Intel RealSense Depth Camera D455, for example, delivers accurate stereo depth sensing at a longer range for HMI. With a camera like this, dynamic hand gesture recognition systems can be applied to various use cases, from robotics and drones to 3D scanning and people tracking.
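For the curious, here is a short sketch of reading depth data from a RealSense camera with Intel’s open-source pyrealsense2 wrapper; the stream settings are common defaults rather than D455-specific tuning:

```python
# Sketch: reading depth frames from an Intel RealSense camera with the
# open-source pyrealsense2 wrapper (pip install pyrealsense2).
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)
try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()
    # Distance in meters at the image center -- the raw input a
    # depth-based hand segmenter would work from.
    print("Center depth:", depth.get_distance(320, 240), "m")
finally:
    pipeline.stop()
```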
Things to consider while developing gesture recognition technology
Real-time hand gesture perception, while natural for people, is quite a challenge for computer vision. Hands often occlude themselves or each other as seen by a camera (think of a fist or a handshake) and lack high-contrast patterns.
To develop an HGR system, AI algorithms are trained on labeled data so that the resulting model can make predictions on data it hasn’t seen before. Building a hand tracking database is the first step in AI training. To create a training data set, depth cameras are used to segment a specific element from the background. High-quality segmentation helps AI distinguish between left and right hands, individual fingers, and so on, as in the simplified example below. The higher the quality of the data sets and the more annotations they include, the higher the accuracy of dynamic hand gesture recognition with computer vision.
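As a simplified illustration of depth-based segmentation, the toy function below keeps only pixels closer than a cutoff distance, assuming the hand is the nearest object to the camera. Real annotation pipelines are considerably more sophisticated:

```python
# Toy depth-based segmentation for building a training set: keep only
# pixels closer than a cutoff, assuming the hand is the nearest object.
# `depth_image` is a 2D array of distances in meters (e.g. from a
# RealSense frame as in the earlier sketch).
import numpy as np

def segment_hand(depth_image, max_distance_m=0.6):
    """Return a binary mask of pixels assumed to belong to the hand."""
    mask = (depth_image > 0) & (depth_image < max_distance_m)
    return mask.astype(np.uint8)

# A fake 4x4 depth map: the 2x2 block of ~0.4 m values plays the "hand".
fake_depth = np.array([
    [1.2, 1.2, 1.2, 1.2],
    [1.2, 0.4, 0.4, 1.2],
    [1.2, 0.4, 0.4, 1.2],
    [1.2, 1.2, 1.2, 1.2],
])
print(segment_hand(fake_depth))
```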
At CVPR, Google announced a new approach to hand perception implemented in MediaPipe — a cross-platform framework for building multimodal machine learning pipelines. With this new method, real-time performance can be achieved even on mobile devices, scaling to multiple hands.
The machine learning pipeline of this hand tracking solution consists of several models:
- Palm detector
- Hand landmark model
- Gesture recognizer
HGR machine learning pipeline
Source: Google AI Blog
Since this hand tracking and gesture recognition pipeline is open-source, developers have a complete stack for prototyping and innovating on top of Google’s model. Extensive datasets for AI training will help increase the number of gestures recognized accurately and make the system more robust.
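As a starting point, here is a minimal sketch using MediaPipe’s Python API (pip install mediapipe opencv-python). It runs the palm detector and hand landmark models on webcam frames and prints the wrist position; a gesture recognizer would be layered on top of the 21 landmarks:

```python
# Minimal MediaPipe Hands sketch: palm detection + hand landmarks on
# live webcam frames.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    static_image_mode=False,       # video mode: track between frames
    max_num_hands=2,               # the pipeline scales to multiple hands
    min_detection_confidence=0.5,
)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV delivers BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            wrist = hand.landmark[0]  # landmark 0 is the wrist
            print(f"wrist at x={wrist.x:.2f}, y={wrist.y:.2f}")
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
hands.close()
```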
Applications of hand gesture recognition technology
In recent years, HGR technology has started to penetrate various industries as advances in computer vision, sensors, machine learning, and deep learning have made it more available and accurate. The top four fields actively adopting hand tracking and gesture recognition are automotive, healthcare, virtual reality, and consumer electronics.
- Automotive
A gesture recognition solution from Sony DepthSensing Solutions uses a time-of-flight sensor that measures the time it takes for an infrared light signal to travel from the sensor to the object and back. The AI is trained to distinguish deliberate gestures from gestural noise and to operate under any lighting conditions.
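The arithmetic behind time of flight is simple: distance is half the round-trip time multiplied by the speed of light. A back-of-the-envelope sketch, with illustrative numbers:

```python
# Time-of-flight arithmetic: distance = speed of light * round trip / 2.
SPEED_OF_LIGHT_M_S = 299_792_458

def tof_distance_m(round_trip_s):
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2

# A ~3.3 ns round trip corresponds to an object about half a meter away.
print(tof_distance_m(3.3e-9))  # ~0.49 m
```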
The BMW 7 Series has a built-in HGR system that recognizes five gestures and can control music and incoming calls, among other things. Less interaction with the touchscreen makes the driving experience safer and more convenient.
- Healthcare
Emergency rooms and operating rooms can be chaotic, with lots of noise from personnel and machines. In such environments, voice commands are less effective than gestures. Touchscreens are not an option either, since there’s a strict boundary between what is and is not sterile. But accessing information and imaging during surgery or another procedure is possible with HGR tech. GestSure, built on Microsoft’s Kinect sensor, lets doctors check MRI, CT, and other imagery with simple gestures, without scrubbing out.
- Virtual reality
In 2016, Leap Motion (acquired by Ultrahaptics in 2019) presented updated HGR software that allows users not only to control a PC but also to track gestures in virtual reality. The Leap Motion controller is a USB device that observes an area of about one meter with the help of two IR cameras and three infrared LEDs. The controller is used for applications in the medical, automotive, and other fields.
A hand tracking application from ManoMotion recognizes gestures in three dimensions using a smartphone camera (on both Android and iOS) and can be applied in AR and VR environments. The use cases for this technology include gaming, IoT devices, consumer electronics, and robots.
- Consumer electronics
The global gesture recognition market is predicted to grow by $624 million from 2018 to 2022, and companies are racing to seize the opportunity. The Italian startup Limix uses a combination of IoT and dynamic hand gesture recognition to record sign language, translate it into words, and then play them on a smartphone via a voice synthesizer.
Home automation is another broad field within the consumer electronics domain in which gesture recognition is being employed. uSens develops hardware and software to make smart TVs sense finger movements and hand gestures. Gestoos’ AI platform with gesture recognition technology offers touchless control over lighting and audio systems. With Gestoos, gestures can be created and assigned via a smartphone or another device, and one gesture can trigger several commands.
These days, the consumer market is open to new experiences in HMI, and hand gesture recognition technology is a natural evolution from touchscreens. Demand for smoother and more hygienic interaction with devices, as well as concern for driver safety, is pushing the adoption of HGR in industries from healthcare to automotive and robotics. And while software development for gesture recognition systems is quite challenging, expertise in AI, deep learning, computer vision, and generative AI, along with innovative hardware from top tech providers, makes HGR solutions more affordable than they were even a few years ago.
Want to find out more about HGR system development from a trusted custom software development agency? Ask our experts.