FDA Requests Public Comment on How to Measure and Manage Performance of AI-Enabled Medical Devices

By: Jennifer Yoo, Pinar Bailey, Ph.D.

What You Need To Know

  • The U.S. Food and Drug Administration (FDA) is seeking comments on methods of assessing the real-world performance of AI-enabled medical devices, including generative AI.
  • Prior FDA draft guidance outlined a total product life cycle (TPLC) approach with clear expectations for performance monitoring, transparency, and bias mitigation; future guidance may follow suit.
  • Companies have an opportunity to influence future FDA requirements by submitting comments before the December 1, 2025 deadline.

The FDA has requested comments on current, practical ways to measure and manage real-world performance of artificial intelligence-enabled devices, including those that use generative AI. The FDA’s position on assessing the performance of AI-enabled devices has evolved from an initial concept proposing a new regulatory pathway tailored for adaptive algorithms toward comprehensive, prescriptive recommendations anchored in a total product life cycle (TPLC) approach, emphasizing transparency, continuous monitoring and bias mitigation. Submitting public comments presents an opportunity for companies to inform the regulatory landscape for AI-enabled devices.

Areas of Interest

In particular, the FDA’s Center for Devices and Radiological Health (CDRH) seeks public comment in the following areas as they relate to AI-enabled medical devices:

  • Performance metrics, indicators and timeframes used in measuring safety, effectiveness and reliability in real-world clinical use.
  • Real-world evaluation methods and infrastructure to proactively monitor performance post-deployment, including use of human expert review.
  • Postmarket data sources and quality management to address data quality, completeness, interoperability and incorporation of clinical outcomes and user feedback.
  • Monitoring protocols that trigger additional assessments and responses to performance degradation.
  • Human-AI interaction and user experience factors that influence design features, user training or communication strategies in order to maintain safe and effective use as systems evolve.
  • Additional considerations and best practices, including implementation barriers, incentives to support efforts, and approaches to maintain patient and data protections.

Evolution of FDA’s Approach to AI-Enabled Medical Devices

The FDA’s shift to a TPLC approach centers on addressing the unique nature of continuously learning AI/machine learning (ML) Software as a Medical Device (SaMD), which can adapt over time, contrasting with the traditional regulatory paradigm designed for “locked” algorithms.

This TPLC approach enables continuous improvement of performance while maintaining safeguards. The potential impact on device manufacturers stems from the clarity and predictability provided by these evolving expectations, especially through the proposed predetermined change control plan (PCCP) pathway.

Prior FDA draft guidance released by the Biden administration set the stage for the issues the agency is most concerned with in this area. Initially, the FDA stated that performance evaluation protocols (part of the Algorithm Change Protocol (ACP)) must include the delineation of appropriate metrics and analysis procedures, statistical analysis plans, and performance targets that the revised algorithm must achieve.

Later, in January and February 2025, the FDA focused further on postmarket performance monitoring plans as part of a manufacturer’s quality system, urging manufacturers to make proactive efforts to capture device performance after deployment. These plans would detail data collection and analysis methods for identifying, characterizing, and assessing changes in model performance, and monitoring potential causes like shifts in input data. The plans also would include mechanisms for deploying updates, mitigations, and corrective actions. In this framework, a robust performance monitoring plan can be leveraged as a crucial risk control element in the premarket submission. If the plan is approved, manufacturers would be provided clear regulatory instructions on how to handle changes: documenting minor changes internally if within the PCCP/ACP bounds or engaging in a “focused review” or new submission for changes outside the plan.

Data Drift

In the draft guidance, the FDA recognized that performance of AI-enabled devices deployed in the real world may change or degrade over time, potentially posing a risk to patients. Performance changes may result from various factors, including shifts in patient populations, disease patterns, or data drift, which occurs when the systems producing inputs for the AI-enabled device change over time, impacting performance in ways that may not be evident to users. The FDA also identified cybersecurity threats as potential causes of performance drift, given that they can change the underlying data distribution. The FDA therefore laid out its expectation that manufacturers have a strategy to proactively monitor, identify, and address device performance changes and changes to device inputs or context of use that could lead to performance issues.
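To make the concept concrete, a manufacturer might flag input data drift by comparing a recent batch of device inputs against a baseline distribution. The sketch below is purely illustrative (the FDA does not prescribe a specific statistical test); it uses a two-sample Kolmogorov-Smirnov statistic with an assumed alert threshold.

```python
# Illustrative only: one possible way to flag input data drift by comparing
# recent device inputs against a baseline distribution. The statistic and
# the 0.2 threshold are hypothetical choices, not FDA requirements.

def ks_statistic(baseline, recent):
    """Maximum vertical distance between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(recent)
    values = sorted(set(a) | set(b))
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in values)

def drift_alert(baseline, recent, threshold=0.2):
    """Return True when drift exceeds the plan's preset trigger."""
    return ks_statistic(baseline, recent) > threshold

baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
shifted  = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]
print(drift_alert(baseline, baseline))  # same distribution -> False
print(drift_alert(baseline, shifted))   # shifted inputs -> True
```

In practice, a monitoring plan would pair such a statistical trigger with the clinical context (which input features to watch, how large a batch to compare, and what response each trigger level requires).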

Metrics and Benchmarks

The FDA further expected performance validation to provide objective evidence that the device performs predictably and reliably according to its intended use. In the draft guidance, the FDA referenced specific, key clinical performance metrics, such as the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, true/false positive/negative counts (e.g., in a confusion matrix), and positive/negative predictive values (PPV/NPV). The FDA also referenced studies of repeatability and/or reproducibility to help quantify the uncertainty associated with device output, as well as studies incorporating subgroup analyses to assess potential bias. The FDA further discussed that the validation process must include explicit efforts to identify differential performance across various sub-populations to ensure that the device remains safe and effective across the expected intended use population.
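The confusion-matrix-derived metrics referenced in the draft guidance are straightforward to compute. The sketch below uses hypothetical counts for an imagined diagnostic device, solely to show how sensitivity, specificity, PPV, and NPV relate to true/false positive/negative counts.

```python
# Illustrative only: computing the clinical performance metrics the FDA
# draft guidance references from confusion-matrix counts. The counts
# below are hypothetical, not from any real device study.

def clinical_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts for a diagnostic device:
m = clinical_metrics(tp=90, fp=10, tn=180, fn=20)
print(round(m["sensitivity"], 3))  # 90 / 110 -> 0.818
print(round(m["ppv"], 3))          # 90 / 100 -> 0.9
```

A subgroup analysis of the kind the guidance describes would compute these same metrics separately for each relevant sub-population and compare them against the overall figures.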

Monitoring Methodology

The FDA’s TPLC approach also framed the monitoring methodology for ongoing vigilance and structured documentation and suggested that manufacturers should establish a postmarket performance monitoring plan as part of their quality system to identify and respond to performance changes after deployment. A postmarket monitoring plan is important because premarket testing may not completely control risks from real-world environmental changes. As part of a robust plan, manufacturers would be expected to include procedures for identifying, characterizing, and assessing changes in model performance, a monitoring process to identify potential causes of undesirable changes, such as shifts in input data (data drift) or changes in patient demographics, and an action plan for deploying updates, mitigations, and corrective actions in a timely manner. This process can leverage the streamlined approach of an approved PCCP. The FDA may, in the future, require performance monitoring plans to be included in submissions as a proactive means of risk control to provide reasonable assurance of safety and effectiveness.
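A monitoring plan of the kind described above might map an observed performance estimate to a tiered response. The sketch below is an assumed structure, not an FDA-mandated one; the target value, alert margin, and response tiers are hypothetical.

```python
# A minimal sketch (assumed structure, not FDA-mandated) of a postmarket
# monitoring check: compare a rolling performance estimate against the
# premarket target and escalate the response as degradation grows.

def monitoring_action(current_sensitivity, target=0.85, alert_margin=0.03):
    """Map observed performance to a response tier in the monitoring plan."""
    if current_sensitivity >= target:
        return "continue routine monitoring"
    if current_sensitivity >= target - alert_margin:
        return "investigate: characterize cause (e.g., data drift)"
    return "corrective action: deploy update or mitigation per plan"

print(monitoring_action(0.88))  # at/above target
print(monitoring_action(0.83))  # within alert margin -> investigate
print(monitoring_action(0.75))  # below margin -> corrective action
```

Under an approved PCCP, updates triggered by the lowest tier could be deployed without a new submission, provided they stay within the plan's documented bounds.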

User-Related Impacts on AI-Enabled Devices

The FDA’s concerns and expectations regarding user-related impacts of AI-enabled medical devices center on two interconnected pillars: ensuring clear and functional transparency for all users and actively identifying and mitigating bias to ensure equitable performance.

Transparency: The FDA considers transparency crucial because AI models, especially complex deep neural networks, may be opaque, making their logic and decision-making difficult for users to understand. This opacity, coupled with the device’s reliance on data correlations, may lead to misuse, misunderstanding, or misinterpretation of the device’s output, which poses a risk to safety and effectiveness.

Transparency refers to clearly communicating contextually relevant performance and design information to the appropriate stakeholders in a manner that is both accessible and functionally comprehensible, enabling them to understand and act on the information. The FDA has previously recommended that manufacturers integrate “transparency by design” from the earliest design phase, focusing on the user’s needs and the context of use. The FDA recognized a broad range of intended users, including healthcare professionals, patients, caregivers, technicians, administrators, and others who interact with the device during installation, use, and maintenance. Transparency considerations would account for each user’s specific needs, functional capabilities, experience, knowledge levels, and training. The FDA may, for example, require usability evaluations to address whether all intended users can achieve specific goals while using the device and whether they can consistently and correctly receive, understand, interpret, and apply information related to the AI-enabled device. For risk mitigation, the FDA referenced the user interface, which serves as a risk control mechanism, and specific labeling requirements, which, if not sufficiently tailored, may encumber a manufacturer’s trade secrets and other intellectual property assets.

Bias: AI bias has been a concern for the FDA because it represents a potential tendency for the device to produce incorrect results in a systematic yet potentially unforeseeable way, due to limitations or errors in the training data. Thus, data representativeness is a core concern since models are highly dependent on the data used to train and test them. If training data underrepresents certain populations, the model may overfit to the biases of the training data or inadvertently learn spurious correlations (e.g., correlations unique to specific scanners or sites rather than true biological mechanisms). This may lead to differential performance (AI bias) in underrepresented groups. The FDA has therefore suggested previously that manufacturers implement strategies to address bias throughout the TPLC and further suggested documentation requirements in marketing submissions on the demographic distributions of the development data (and if relevant population characteristics are unavailable, manufacturers should provide an explanation and justification for the use of the data without this information and how risks associated with this gap have been controlled). The agency may make similar suggestions in future guidance.

The FDA’s Request for Public Comments

As the next step in the FDA’s evolving stance on regulating AI-enabled medical devices, the agency’s request for comments appears to be aimed at understanding practical approaches to measuring and evaluating the performance of AI-enabled medical devices, including what metrics manufacturers use in practice, as well as what barriers to innovation they encounter, for ongoing, systematic monitoring. For example, the public is encouraged to comment on:

  • Metrics or performance indicators used to measure the safety, effectiveness, and reliability of AI-enabled medical devices in real-world clinical use.
  • Tools and methods for proactive monitoring, including the involvement of humans in review and monitoring approaches.
  • Human usage patterns and user interactions in clinical/real-world settings, and design features for safe and effective use of evolving medical device systems.
  • Data sources used for performance evaluation and approaches taken to ensure data quality.
  • Triggers for monitoring and approaches for maintaining patient privacy.
  • Interoperability challenges for monitoring and other implementation barriers manufacturers encounter.

Next Steps

It is not yet clear how the FDA will further evolve and solidify its current position on assessing the effectiveness of AI-enabled devices and how it plans to weigh risks against practical and commercial concerns in light of the current administration’s interest in accelerating AI applications in healthcare solutions. By submitting public comments, device manufacturers incorporating AI can help ensure their technical and business constraints, innovative approaches, and practical challenges regarding performance management are understood and considered by the agency when establishing future guidance. The public comment period is currently open and will end on December 1, 2025.