Liability Considerations for Superhuman (and Inscrutable) Performance of AI Models

By: Zach Harned

A fascinating question to consider in the field of artificial intelligence is what that intelligence should resemble. Modern-day deep neural networks (DNNs) do not bear resemblance to the complex network of neurons that make up the human brain; however, the building block of such DNNs, the artificial neuron devised by McCulloch and Pitts in 1943 (the precursor of the perceptron), was biologically motivated and intended to mimic human neuronal firing. Alan Turing’s famous Turing test (or imitation game) equates intelligence with conversational indistinguishability between person and machine. Is the goal to develop AI models that reason like a person, or to create AI models capable of superhuman performance even if such performance is achieved in a foreign and unfamiliar manner? And how do these two different paths impact considerations of liability?

The answer to this question is highly contextual, and the motivations in each case are interesting and varied. For instance, consider the history of AI’s role in the board games chess and Go. Each game’s history follows the same trajectory, starting with human superiority, followed by a time in which a human paired with an AI was the strongest player, and concluding with AI alone being dominant. Currently, giving a human some control over an AI chess or Go system only hampers performance, because these AI systems play at a level that is sometimes difficult for humans to understand. Examples include AlphaGo’s so-called “alien” move 37 in the epic faceoff with Lee Sedol, and AlphaZero’s queen-in-the-corner move, which DeepMind co-founder Demis Hassabis described as “like chess from another dimension.” In such cases, the inscrutability of the AI’s superhuman decisions is not necessarily a problem, and recent research has shown that it has even aided human players by spurring them to eschew traditional strategies and explore novel and winning gameplay. Of course, AI vendors should advertise an AI model as exhibiting superhuman performance only if it truly does exceed human capabilities, not least because the FTC recently issued guidance warning against exaggerated claims for AI products.

Unlike in board games, in the high-stakes realm of medical AI, having an AI model that reasons and performs in a manner similar to humans may favorably shift the liability risk profile for those developing and using such technology. For example, patients likely want an AI model that makes a diagnosis in a way similar to a typical physician, only better (e.g., the AI is still looking for the same telltale shadows on an x-ray or the same biomarker patterns from a blood panel). The ability of medical AI models to explain their reasoning in this way is also relevant to regulators such as the FDA, which notes that an algorithm’s inability to provide the basis for its recommendations can disqualify the algorithm from being classified as non-device Clinical Decision Support Software. That classification is desirable because non-device software is excluded from the FDA’s regulatory oversight, which reduces regulatory compliance overhead.

Another interesting example comes from researchers who demonstrated that medical AI models can determine the race of a patient merely by looking at a chest x-ray, even when the image is degraded to the point that a physician cannot tell that it is an x-ray at all. The researchers note that such inscrutable superhuman performance is actually undesirable in this case, as it may increase the risk of perpetuating or exacerbating racial disparities in the healthcare system. Hence it can sometimes be desirable to have a machine vision system “see” the world in a way similar to humans. The concern is whether this might come at a cost to the performance of the AI system, because an underperforming AI model introduces the potential for liability when such underperformance results in harm.

Luckily, some recent research has given us reason for optimism on this point, showing that sometimes you can have your cake and eat it too. This research involves Vision Transformers (ViT), which use the Transformer architecture originally proposed for text-based applications in 2017. The Transformer architecture played a large part in the rapid development and success of modern-day large language models (such as Google’s Bard), and now it is leading to great strides in machine vision as well, an area that until now has been dominated by the convolutional neural network (CNN) architecture. The ViT in this research is substantially scaled up, with a total of 22 billion parameters; for reference, the previous record holder had four billion parameters. The ViT was also trained on a much larger dataset of four billion images, as opposed to the previously used dataset of 300 million images. For more details, the academic paper also provides the ViT’s model card, essentially a nutrition label for machine learning models. This research is impressive not only because of its scale and the state-of-the-art results it achieved, but also because the resulting model exhibited an unexpected and humanlike quality, namely, a focus on shape rather than texture.

Most machine vision models demonstrate a strong texture bias. This means that, in making an image classification decision, the AI model may be focused 70% to 80% on the textures of the image and only 20% to 30% on the shapes in the image. This is in stark contrast to humans, who exhibit a strong 96% shape bias, with only a 4% focus on texture. The ViT mentioned in the research above achieves an 87% shape bias with a 13% focus on texture. Although not quite at human level, this is a radical reversal compared to previous state-of-the-art machine vision models. As the researchers note, this is a substantial improvement in the AI model’s “alignment to human visual object recognition.” This emergent humanlike capability shows that improved performance does not always need to come at the cost of scrutability. In fact, the two sometimes travel hand in hand, as is the case with this ViT, which achieves impressive, if not superhuman, performance while also exhibiting improved scrutability by aligning with the human bias (or emphasis) on shape in visual recognition tasks.
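For readers curious how a percentage such as “87% shape bias” might be arrived at, the following is a minimal sketch in Python. It assumes one common way of measuring shape bias: a model is shown “cue-conflict” images whose shape suggests one category and whose texture suggests another (for instance, a cat outline filled with elephant skin), and the shape bias is the share of shape-consistent answers among all answers that match either cue. The names `CueConflictTrial` and `shape_bias`, and the sample data, are hypothetical illustrations and are not taken from the research discussed above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CueConflictTrial:
    """One hypothetical test image whose shape and texture cues disagree."""
    shape_label: str      # category suggested by the object's outline
    texture_label: str    # category suggested by the surface texture
    predicted_label: str  # what the model (or person) actually answered


def shape_bias(trials: Iterable[CueConflictTrial]) -> float:
    """Return the fraction of cue-matching answers that followed shape.

    Assumption: answers matching neither cue are ignored, and trials
    whose two cues agree are skipped because they cannot reveal a bias.
    """
    shape_hits = 0
    texture_hits = 0
    for trial in trials:
        if trial.shape_label == trial.texture_label:
            continue  # not a genuine cue-conflict image
        if trial.predicted_label == trial.shape_label:
            shape_hits += 1
        elif trial.predicted_label == trial.texture_label:
            texture_hits += 1
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no answers matched either the shape or the texture cue")
    return shape_hits / decided


# Made-up example data: three shape-driven answers, one texture-driven
# answer, and one answer that matches neither cue.
example_trials = [
    CueConflictTrial("cat", "elephant", "cat"),
    CueConflictTrial("car", "clock", "car"),
    CueConflictTrial("dog", "bottle", "dog"),
    CueConflictTrial("bird", "bicycle", "bicycle"),
    CueConflictTrial("chair", "knife", "boat"),
]

print(f"shape bias: {shape_bias(example_trials):.0%}")  # shape bias: 75%
```

Under this definition the texture share is simply one minus the shape bias, which is why the figures in the paragraph above pair up as 96% and 4%, or 87% and 13%.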

So, is it safer from a liability perspective for your AI model to (a) reason like a human and perhaps suffer from some of our all-too-human flaws, or (b) exhibit superhuman performance and suffer from inscrutability? As with so many things, the lawyerly answer is that it depends; more specifically, it depends on the context of the AI model’s use. But luckily, as the aforementioned Vision Transformer research demonstrates, sometimes you can have the best of both worlds with a scrutable and high-performing AI system.

Published by PLI Chronicle.