Speech Recognition Based Emergency Stop Actuator

Abstract

Robots are becoming increasingly capable, but their range of motion can sometimes prevent the safe actuation of emergency stop buttons. By leveraging advances in artificial intelligence and emerging approaches to functional safety for artificial intelligence, new methods for actuation of emergency stops can be developed beyond the traditional button. For example, speech recognition algorithms, now a common technology, can be used for remote activation of an emergency stop function, helping to keep personnel out of harm’s way.

Keywords—Functional Safety, Artificial Intelligence, Emergency Stop, Robotics.

I. The Concept

A fundamental problem frequently experienced in the development of functionally safe robots is the use and placement of emergency stops.

Often the kinematic reach of advanced robotics encompasses the area immediately surrounding the robot (for example, humanoid robots with “extra-human” ranges of motion). This creates a challenge for placing an on-robot emergency stop: Where can the button be placed such that accessing it does not introduce a collision risk with the person attempting to press it?

Machinery standards do not require an emergency stop when one is not called for by the risk assessment, per R15.08 clause 5.1.5.2. However, including one is good practice and, frankly, serves as reassurance for personnel who work with or interact with the robot.

A potential solution to this problem is a sound detection-based emergency stop actuator that utilizes a verbal command— “robot stop” for example— to initiate an emergency stop.

In practice, this is a physical sensor with redundant microphones, a central processor, and output terminals for integration with a robotic system. It leverages advances in artificial intelligence-based speech recognition, along with emerging safety frameworks for artificial intelligence [ISO/IEC 22440], to safely stop a robot through voice commands.

II. The Hardware Design

At the highest level of architecture, this device uses redundant microphones that transmit signals to a central computer, which then outputs its decision to the integrated platform.

This redundant design meets the SIL 2 systematic capability recommendations of IEC 61508-2 when paired with appropriate diagnostic coverage. It can be improved to meet a higher SIL by adding a redundant processor, and further still by using diverse microphones and processors along the redundant functional channel.
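One way the required diagnostic coverage could be achieved is by cross-comparing the redundant microphone channels: a stuck-at or disconnected microphone shows up as persistent divergence between channels, letting the device fail safe. The sketch below illustrates the idea; the function names and tolerance values are hypothetical, not part of the proposed design.

```python
def channel_divergence(frame_a, frame_b):
    """Mean absolute difference between two same-length microphone frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def diagnose_channels(frame_a, frame_b, max_divergence=0.5):
    """Flag a fault when redundant channels disagree beyond a tolerance.

    Healthy channels observing the same acoustic scene track each other
    closely; a stuck or dead channel does not.
    """
    return channel_divergence(frame_a, frame_b) <= max_divergence

live = [0.1, -0.2, 0.3, -0.1]
stuck = [0.0, 0.0, 0.0, 0.0]
print(diagnose_channels(live, [x + 0.01 for x in live]))           # channels agree
print(diagnose_channels(live, stuck, max_divergence=0.05))         # fault detected
```

In a real device the comparison would run continuously and latch the output into a safe state on a sustained fault, rather than evaluating single frames.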

III. The Software Design

A. Traditional Software

The reception of data from the microphones, the conditioning of that data into a format suitable for ingestion by the voice recognition algorithms, and the output comparison, voting, and interpretation of the resulting binary outputs into analog signaling for the integrated platform all rely on traditional software development methods.
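As an illustration of the conditioning stage, the sketch below frames raw samples and normalizes each frame before handing it to the recognizer. The frame sizes, hop length, and function names are illustrative assumptions, not a specification.

```python
def frame_signal(samples, frame_len=4, hop=2):
    """Split raw microphone samples into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def normalize(frame):
    """Scale a frame to unit peak amplitude so the recognizer sees a
    consistent input range regardless of microphone gain."""
    peak = max(abs(s) for s in frame) or 1.0
    return [s / peak for s in frame]

samples = [0.0, 0.5, -1.0, 0.25, 0.1, -0.4, 0.0, 0.2]
features = [normalize(f) for f in frame_signal(samples)]
```

Keeping this stage in plain, reviewable code supports the goal stated below of confining opacity to the recognition algorithm itself.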

The primary goal is to isolate the black box algorithm for speech recognition as much as possible while ensuring that the surrounding software remains adequately human readable and interpretable.

B. Artificial Intelligence

The core artificial intelligence technology leveraged in this application is speech recognition, implemented here using a Recurrent Neural Network (RNN), though there are many ways to develop a successful speech recognition algorithm. The algorithm operates offline and is pre-trained on existing, language-specific, labeled datasets of the verbal stop command phrase “robot stop”. This classifies the algorithm as Application Usage Level A1 and Software Technology Class I, as defined by ISO/IEC 22440 clauses 6.2.2.4 and 6.2.2.5, respectively.
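For illustration only, the sketch below shows the shape of such a recurrent computation: a single Elman-style cell folds per-frame features into a hidden state that carries context across time, and a sigmoid converts the final state into a detection confidence. The scalar weights are made up for the example and do not represent a trained model.

```python
import math

def rnn_keyword_score(frames, w_in=0.8, w_rec=0.5, w_out=1.5):
    """Minimal Elman-style recurrent cell over per-frame scalar features.

    Each frame updates a hidden state via tanh; the final state is
    squashed to a confidence in [0, 1]. Weights are illustrative only.
    """
    h = 0.0
    for x in frames:
        h = math.tanh(w_in * x + w_rec * h)    # recurrent update
    return 1.0 / (1.0 + math.exp(-w_out * h))  # sigmoid confidence

score = rnn_keyword_score([0.9, 0.8, 0.7])
print(score > 0.5)
```

A production keyword spotter would use vector features (e.g., mel-spectral frames) and trained weight matrices, but the control flow is the same.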

To increase the safety integrity level of the application, redundant speech-recognition algorithms are used on separate processor cores. Each algorithm independently evaluates the microphone data and determines if the stop command has been spoken. Their decision outputs are then passed to a comparator, implemented using traditional software, to trigger the emergency stop if either algorithm votes yes.
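A minimal sketch of that 1-out-of-2 (1oo2) comparator, assuming each recognizer emits a confidence score; the threshold value and function names are assumptions for illustration.

```python
def channel_vote(confidence, threshold=0.9):
    """Reduce one recognizer's confidence to a binary vote."""
    return confidence >= threshold

def emergency_stop_1oo2(conf_a, conf_b, threshold=0.9):
    """Trigger the stop if either redundant channel votes yes (1oo2).

    OR-voting biases the system toward safety: a single channel can
    initiate the stop, at the cost of more spurious activations.
    """
    return channel_vote(conf_a, threshold) or channel_vote(conf_b, threshold)

print(emergency_stop_1oo2(0.95, 0.40))  # one channel suffices
print(emergency_stop_1oo2(0.70, 0.60))  # neither reached threshold
```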

IV. Proving Safety

The device as proposed in this paper is incomplete and must follow the functional safety lifecycle defined in IEC 61508-1 to achieve compliance. This technology can be developed either as a compliant item [IEC 61508-1] or as a Safety Element out of Context (SEooC) [ISO 26262] if intended to be a standalone solution for integration into various end-use applications. Developing it as an SEooC requires documenting assumptions of use, integration constraints, and other relevant considerations.

Alternatively, this design could be directly incorporated into a robotic system as the actuation component of an emergency stop, or another type of protective stop.

A. Hardware

The key functional safety standards for ensuring adequate development and testing of the device's hardware are IEC 61508-1 and IEC 61508-2.

B. Traditional Software

IEC 61508-3 outlines a clear process for the development of functionally safe software, which will be applied in this theoretical application.

C. Artificial Intelligence

As a relatively new field, safety in artificial intelligence is only beginning to be defined. The upcoming ISO/IEC 22440-1 will define guidelines for the design, development, and testing of safety-related AI systems.

V. Trading Performance for Up-Time

As many people have experienced with modern voice assistants, unintended activations can occur. In a robotic application, unintended triggering of the emergency stop reduces the robot's operational up-time; such nuisance trips should be minimized to the extent possible so that the robot system delivers adequate value.

There are several strategies to reduce accidental activation. One is to require a specific phrase or string of words for activation. Another is to train the voice recognition algorithm on the voices of a specific person or group of people who interact with the robot. A third method is to increase the minimum confidence level required for the algorithm to positively recognize the voice command.
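The confidence-threshold strategy can be illustrated with hypothetical recognizer scores: raising the threshold suppresses false trips from background chatter, but eventually starts missing genuine commands. The scores and function below are invented for the example.

```python
def activation_counts(true_scores, noise_scores, threshold):
    """Count correct detections and false activations at a threshold.

    Scores are hypothetical recognizer confidences in [0, 1].
    """
    hits = sum(s >= threshold for s in true_scores)
    false_trips = sum(s >= threshold for s in noise_scores)
    return hits, false_trips

# Hypothetical confidences: genuine "robot stop" utterances vs. chatter.
genuine = [0.97, 0.93, 0.88, 0.91]
chatter = [0.35, 0.62, 0.86, 0.41]

for t in (0.60, 0.85, 0.90):
    print(t, activation_counts(genuine, chatter, t))
```

At 0.60 every genuine command is caught but background speech trips the stop; at 0.90 false trips vanish while one genuine command is missed, which is exactly the trade-off the risk assessment must weigh.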

Each of these options comes with trade-offs that must be carefully considered as part of the risk assessment to determine how best to tailor the emergency stop to the specific application. This list is not exhaustive, and additional strategies should be explored.

VI. Summary

In summary, this paper highlights an alternative method for safe actuation of emergency stop functionality by describing, at a high level, the hardware and software technology that could be used. It also references the core standards necessary to achieve compliance for such a system.

As robotic systems continue to increase in capability, new tools and methods are needed to ensure the safety of the people in their vicinity.

Author

Zackary Anderson
Functional Safety Engineer, Reynolds & Moore
Salt Lake City, Utah, USA
zack.anderson@reynolds-moore.com