Failure modes and effects analysis (FMEA) is a step-by-step approach for identifying all possible failures in a design, a manufacturing or assembly process, or a product or service.
FMEA was one of the first highly structured, systematic techniques for failure analysis. It was developed by reliability engineers in the late 1950s to study problems that might arise from malfunctions of military systems. An FMEA is often the first step of a system reliability study. It involves reviewing as many components, assemblies, and subsystems as possible to identify failure modes and their causes and effects. For each component, the failure modes and their resulting effects on the rest of the system are recorded in a specific FMEA worksheet.
Problems and defects are expensive. Customers understandably place high expectations on manufacturers and service providers to deliver quality and reliability.
Often, faults in products and services are detected through extensive testing and predictive modeling in the later stages of development. However, finding a problem at this point in the cycle can add significant cost and delays to schedules. The challenge is to design in quality and reliability at the beginning of the process and ensure that defects never arise in the first place.
Product development and operations managers can run a failure modes and effects analysis (FMEA) to analyze potential failure risks within systems, classifying them according to severity and likelihood, based on past experience with similar products or processes. The objective of FMEA is to help design identified failures out of the system at the least cost in time and money.
FMEA defines the term “failure mode” to identify defects or errors, potential or actual, in a product design or process, with emphasis on those affecting the customer or end user. A “failure effect” is the result of a failure mode on the product or system function as perceived by the user. Failure effects can be described in terms of what the end user may see or experience. The study of consequences of identified failures is called effects analysis.
The first FMEA step is to analyze functional requirements and their effects to identify all failure modes. Examples of failure modes include warping, electrical short circuit, oxidation and fracture. A failure mode in one component can induce failure modes in others. List all failure modes per function in technical terms, considering the ultimate effect(s) of each failure mode and noting the failure effect(s). Examples of failure effects include overheating, noise, abnormal shutdown and user injury.
FMEA is applicable in the following cases:
- When a process, product or service is being designed or redesigned, after quality function deployment.
- When an existing process, product or service is being applied in a new way.
- Before developing control plans for a new or modified process.
- When improvement goals are planned for an existing process, product or service.
- When analyzing failures of an existing process, product or service.
- Periodically throughout the life of the process, product or service.
Process steps in FMEA
- Step 1: Identify potential failures and effects
- Step 2: Determine severity
- Step 3: Likelihood of occurrence
- Step 4: Failure detection
- Risk priority number (RPN)
Step 1: Identify potential failures and effects
One of the first actions to take when completing an FMEA is to determine the participants. The right people with the right experience, such as process owners and designers, should be involved in order to catch potential failure modes. Practitioners also should consider inviting customers and suppliers to gather alternative viewpoints.
FMEA also involves documenting current knowledge about failure risks. FMEA seeks to mitigate risk at all levels with resulting prioritized actions that prevent failures or at least reduce their severity and/or probability of occurrence. It also defines and aids in selecting remedial activities that mitigate the impact and consequences of failures. FMEA can be employed from the earliest design and conceptual stages onward through development and testing processes, into process control during ongoing operations throughout the life of the product or system.
Step 2: Determine severity
Severity is the seriousness of the consequences of a failure effect. Usual practice rates failure effect severity (S) on a scale of 1 to 10, where 1 is the lowest severity and 10 is the highest. The following table shows typical FMEA severity ratings and their meanings:
Rating | Meaning |
---|---|
1 | No effect, no danger |
2 | Very minor – usually noticed only by discriminating or very observant users |
3 | Minor – only minor part of the system affected; noticed by average users |
4-6 | Moderate – most users are inconvenienced and/or annoyed |
7-8 | High – loss of primary function; users are dissatisfied |
9-10 | Very high – hazardous. Product becomes inoperative, customers angered. Failure constitutes a safety hazard and can cause injury or death. |
Table 1. FMEA severity ratings.
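The bands in Table 1 can be encoded as a small lookup, which helps keep ratings consistent when several people score the same worksheet. The following is a minimal sketch; the names `SEVERITY_BANDS` and `severity_band` are illustrative and not part of any FMEA standard.

```python
# Severity bands transcribed from Table 1 (labels shortened).
# SEVERITY_BANDS and severity_band are illustrative names.
SEVERITY_BANDS = [
    (range(1, 2), "No effect"),
    (range(2, 3), "Very minor"),
    (range(3, 4), "Minor"),
    (range(4, 7), "Moderate"),
    (range(7, 9), "High"),
    (range(9, 11), "Very high"),
]


def severity_band(rating: int) -> str:
    """Return the Table 1 band label for a severity rating from 1 to 10."""
    for ratings, label in SEVERITY_BANDS:
        if rating in ratings:
            return label
    raise ValueError(f"severity must be an integer from 1 to 10, got {rating!r}")


print(severity_band(8))  # prints "High"
```

Rejecting out-of-range values at this point keeps an invalid score from silently flowing into later calculations.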
Step 3: Likelihood of occurrence
Examine cause(s) of each failure mode and how often failure occurs. Look at similar processes or products and their documented failure modes. All potential failure causes should be identified and documented in technical terms. Failure causes are often indicative of weaknesses in the design. Examples of causes include: incorrect algorithm, insufficient or excess voltage, operating environment too hot, cold, humid, etc. Failure modes are assigned an occurrence ranking (O), again from 1 to 10, as shown in the following table.
Rating | Meaning |
---|---|
1 | No documented failures on similar products/processes |
2-3 | Low – relatively few failures |
4-6 | Moderate – some occasional failures |
7-8 | High – repeated failures |
9-10 | Very high – failure is almost certain |
Table 2. FMEA occurrence ratings.
Step 4: Failure detection
After remedial actions are determined, they should be tested for efficacy and efficiency. The design should also be verified and inspection procedures specified.
- Engineers inspect current system controls that prevent failure mode occurrence, or detect failures before they impact the user/customer.
- Identify techniques used with similar products/systems to detect failures.
These steps enable engineers to determine the likelihood of identifying or detecting failures. Then, each failure mode and cause combination from the previous steps is assigned a detection value (D), which indicates how likely it is that failures will be detected and ranks the ability of identified actions to remedy or remove defects or detect failures. The higher the value of D, the less likely it is that the failure will be detected.
Rating | Meaning |
---|---|
1 | Fault is certain to be caught by testing |
2 | Fault almost certain to be caught by testing |
3 | High probability that tests will catch fault |
4-6 | Moderate probability that tests will catch fault |
7-8 | Low probability that tests will catch fault |
9-10 | Fault will be passed undetected to user/customer |
Table 3. Failure detection probability.
The identifying information is filled in at the top of the analysis form.
Function or Process Step | Potential Failure Type | Potential Failure Impact | S | Potential Causes | O | Detection Mode | D | RPN |
---|---|---|---|---|---|---|---|---|
Briefly outline function, step or item being analyzed | Describe what has gone wrong | What is the impact on the key output variables or internal requirements? | How severe is the effect to the customer? | What causes the key input to go wrong? | How frequently is this likely to occur? | What are the existing controls that either prevent the failure from occurring or detect it should it occur? | How easy is it to detect? | Risk priority number |
Table 5. Typical format of analysis table
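One way to hold a row of the analysis table in code is a small record type with one field per column. The sketch below is an assumption about how a team might model it: the class name `WorksheetRow`, its field names and the sample values are all hypothetical, and the RPN column is derived as the product of S, O and D.

```python
from dataclasses import dataclass


@dataclass
class WorksheetRow:
    """One row of the analysis table (hypothetical field names)."""
    function: str
    failure_type: str
    failure_impact: str
    severity: int        # S, 1 to 10
    causes: str
    occurrence: int      # O, 1 to 10
    detection_mode: str
    detection: int       # D, 1 to 10

    def __post_init__(self) -> None:
        # Reject scores that fall outside the 1 to 10 rating scales.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not isinstance(value, int) or not 1 <= value <= 10:
                raise ValueError(f"{name} must be an integer from 1 to 10")

    @property
    def rpn(self) -> int:
        # Risk priority number: S multiplied by O multiplied by D.
        return self.severity * self.occurrence * self.detection


# Sample values are invented for illustration only.
row = WorksheetRow(
    function="Seal pump housing",
    failure_type="Gasket leaks",
    failure_impact="Loss of pressure",
    severity=7,
    causes="Gasket compressed unevenly",
    occurrence=4,
    detection_mode="Pressure test at end of line",
    detection=5,
)
print(row.rpn)  # prints 140
```

Deriving the RPN from the three scores, rather than storing it, means it cannot drift out of sync when a score is revised.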
Recommended Actions | Responsibility | Target Date | Action Taken | S | O | D | RPN |
---|---|---|---|---|---|---|---|
What are the actions for reducing the occurrence of the cause or improving the detection? | Who is responsible for the recommended action? | What is the target date for the recommended action? | What were the actions implemented? Now recalculate the RPN to see if the action has reduced the risk. |
Table 6. Recommendations, responsibilities, targets and action.
Criteria for Analysis
FMEA prioritizes failures according to severity, frequency and detectability. Severity describes the seriousness of failure consequences. Frequency describes how often failures can occur. Detectability refers to degree of difficulty in detecting failures.
As shown in Table 5, FMEA uses three criteria to assess a problem: 1) (S) the severity of the effect on the customer, 2) (O) how frequently the problem is likely to occur and 3) (D) how easily the problem can be detected. Participants must rate each criterion from 1 to 10 according to Tables 1, 2 and 3.
Setting Priorities
Once all the failure modes have been assessed, the team should reorder the FMEA to list failures in descending RPN order. This highlights the areas where corrective actions should be focused. If resources are limited, practitioners must address the highest-ranked problems first.
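The reordering described above can be sketched in a few lines. The failure modes and scores below are invented for illustration, and RPN is taken as the product of S, O and D, as defined in the risk priority number section.

```python
# Hypothetical failure modes with their S, O and D scores.
failure_modes = [
    {"failure": "Connector corrodes", "S": 6, "O": 3, "D": 4},
    {"failure": "Battery overheats", "S": 9, "O": 2, "D": 7},
    {"failure": "Label peels off", "S": 4, "O": 8, "D": 2},
]


def rpn(entry: dict) -> int:
    """Risk priority number for one failure mode."""
    return entry["S"] * entry["O"] * entry["D"]


# Highest RPN first, so the top of the list is where to act first.
ranked = sorted(failure_modes, key=rpn, reverse=True)

for entry in ranked:
    print(f"{rpn(entry):4d}  {entry['failure']}")
```

With these sample scores the order is battery (126), connector (72), label (64). A team with limited resources could simply take the first few entries of `ranked`.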
Risk priority number (RPN)
After the foregoing basic steps, risk assessors calculate Risk Priority Numbers (RPNs). These influence the choice of action against failure modes. RPN is calculated from the values of S, O and D as follows:
RPN = S × O × D
RPN should be calculated for the entire design and/or process and documented in the FMEA. Results should reveal the most problematic areas, and the highest RPNs should get highest priority for corrective measures. These measures can include a variety of actions: new inspections, tests or procedures, design changes, different components, added redundancy, modified limits, etc. Goals of corrective measures include, in order of desirability:
- Eliminate failure modes (some are more preventable than others)
- Minimize the severity of failure modes
- Reduce the occurrence of failure modes
- Improve detection of failure modes
When corrective measures are implemented, RPN is calculated again and the results documented in the FMEA.
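As a worked example of recalculating the RPN, the sketch below compares one failure mode before and after a corrective measure. The scores are hypothetical: it assumes the measure halves occurrence and improves detection while leaving severity unchanged.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number: the product of the S, O and D ratings."""
    return severity * occurrence * detection


# Hypothetical scores before and after a corrective measure.
before = rpn(severity=8, occurrence=6, detection=5)   # 8 * 6 * 5 = 240
after = rpn(severity=8, occurrence=3, detection=2)    # 8 * 3 * 2 = 48

# Fraction of the original RPN removed by the measure.
reduction = (before - after) / before

print(f"RPN before: {before}, after: {after}, reduction: {reduction:.0%}")
# prints "RPN before: 240, after: 48, reduction: 80%"
```

Recording both values in the FMEA makes the effect of each corrective measure visible at the next review.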
Taking Corrective Actions
When the priorities have been agreed upon, one of the team’s last steps is to generate appropriate corrective actions for reducing the occurrence of failure modes, or at least for improving their detection. The FMEA leader should assign responsibility for these actions and set target completion dates (Table 6).
Once corrective actions have been completed, the team should meet again to reassess and rescore the severity, probability of occurrence and likelihood of detection for the top failure modes. This will enable them to determine the effectiveness of the corrective actions taken. These assessments may be helpful in case the team decides that it needs to enact new corrective actions.
The FMEA is a valuable tool that can be used to realize a number of benefits, including improved reliability of products and services, prevention of costly late design changes, and increased customer satisfaction.