Webinar on ethical AI: «Incrementalism is wise»

By: Ingrid Synnøve Torp Published: 11/30/2023

Given the pace of AI development today, do basic research ethics principles still apply? And should we worry that the technology will be the end of humans? These were some of the questions addressed at NENT's webinar with author and scientist Brian Christian.

What are the research ethics implications of the possibilities and challenges that AI represents? This was the topic of the webinar hosted by the National Research Ethics Committee in Science and Technology (NENT).

The keynote speaker of the webinar was Brian Christian. He is a researcher at the University of California, Berkeley and the University of Oxford, and author of several books exploring the human implications of computer science. One of the books, The Alignment Problem (2020), was the starting point for his talk.

To explain the topic, Christian used a famous quote from Norbert Wiener's article «Some Moral and Technical Consequences of Automation», published in 1960.

«If we use, to achieve our purposes, a mechanical agency with whose operation we cannot (…) interfere once we have started it, (…) we had better be quite sure that the purpose put into the machine is the purpose which we really desire (…)».

«Today this has become one of the central concerns of the field of AI and we know it as the alignment problem», Christian said.

He introduced two ways in which we can fail: with the training data and the objective function. The training data are the examples from which our system will learn. The objective function is how we mathematically operationalize success. Christian used examples of AI systems using face recognition to illustrate how we can fail with training data.

«One of the most widely cited data sets of the 2010s is called Labeled Faces in the Wild. It was scraped from newspaper photographs in the 2000s. As a result, it contains more than twice as many images of then-President George W. Bush as of all black women combined. This is a very telling example of how media representations can end up transformed into performance differences in AI systems 10 years later. Whether they realized it or not, what they had really built was a George W. Bush classifier system».

Fatal errors

Christian said such failures can have serious consequences, such as the accident where a woman in Arizona was killed by a self-driving Uber car.

«A review revealed that not only did the training data used not contain any images of jaywalkers, so the system was unprepared to encounter anyone outside of a crosswalk, the system was also built on a classifier with a finite set of categories that it was operating with: one was pedestrians, one was cyclists. It had thousands of examples of each of these categories. But this particular woman was walking her bicycle across the street. The object identifier could not determine whether she was a cyclist or a pedestrian, and because it couldn’t make this classification, it didn’t know to avoid her», said Christian.

He went on to talk about errors caused by using the wrong objective function. One of the largest real-world examples comes from health insurance.

«The largest health insurer in the US has an algorithm that is applied to something in the order of 200 million individuals annually. It prioritizes their care based on a number of factors. It is very hard to operationalise in mathematical terms what health care needs look like. This company said: Let’s predict their future cost of care and assume that the people with the greatest health needs would have the greatest cost of care, and we will prioritize based on who we predict will have the greatest cost of care».

Unfortunately, Christian explained, the system fails to take into account that access to healthcare is very unevenly distributed.

«The model can predict that a certain person from a certain demographic or a certain geographic area will have less access to care and as a result their hospital bill will be lower. Perversely, because their cost of care is estimated to be lower, the algorithm will then systematically deprioritize the very people that are already being marginalized by the system. This is becoming somewhat of a refrain: The system was doing exactly what it was designed to do, prioritize people with the highest cost of care, but this is not what the people building it actually wanted».
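The proxy problem Christian describes can be illustrated with a minimal sketch. The patients, the need scale and the access factor below are invented for illustration and are not taken from the insurer's actual algorithm; the only assumption is that billed cost grows with health need but shrinks when access to care is poor.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    name: str
    health_need: float  # what we actually care about (hypothetical 0-10 scale)
    access: float       # hypothetical share of needed care the person can obtain

    @property
    def expected_cost(self) -> float:
        # Proxy objective: cost follows need only as far as access allows.
        return self.health_need * self.access * 1000


def prioritize(patients, key):
    """Return patient names ordered from highest to lowest priority."""
    return [p.name for p in sorted(patients, key=key, reverse=True)]


patients = [
    Patient("A", health_need=9.0, access=0.3),  # greatest need, poor access
    Patient("B", health_need=5.0, access=0.9),
    Patient("C", health_need=3.0, access=1.0),
]

by_cost = prioritize(patients, key=lambda p: p.expected_cost)
by_need = prioritize(patients, key=lambda p: p.health_need)

print("Ranked by predicted cost:", by_cost)  # ['B', 'C', 'A']
print("Ranked by health need:  ", by_need)   # ['A', 'B', 'C']
```

In this toy setup the patient with the greatest need ends up last when the ranking is optimized for cost, even though the ranking does exactly what it was told to do.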

The limits of Large Language Models

So, how might the alignment problem manifest itself in large language models, such as ChatGPT? Christian asked.

«You can think of the training model of an LLM as starting with auto complete. If you build the world’s most powerful auto complete, you can do nearly anything. You can do sentiment analysis or translation, you can essentially auto complete entire essays and so forth».

But there is a massive alignment problem here, according to Christian, because what we want to do with these models is usually not exactly auto complete. This leads to many problems with LLMs. One is their ability to hallucinate. The models also have a very hard time following instructions, he showed:

«If you tell GPT-3 to explain the moon landing to a six-year-old, it will merely think you have given it a document that begins with this sentence. It is expected to tell you what the rest of that document looks like. A question like this could typically be part of a test, so GPT-3 will respond by saying ‘explain the theory of gravity’, ‘explain the theory of relativity’».

Finally, there is the issue of toxic speech and bias.

«We know that the internet is full of things that we do not necessarily approve of, but your auto complete is designed to believe you have pulled the document from the internet and it needs to figure out what the rest of it says. Statistically it is best off predicting stereotypical things. In GPT-2, if you type ‘My wife just got a new job, starting next week she will be...’, GPT-2 says: ‘doing the housekeeping for the office.’ If you type: ‘My husband just got a new job, starting next week he will be...’, it responds: ‘A consultant at a bank and a doctor’».

The same thing is true of code where the data contains bugs and security vulnerabilities, said Christian. The auto complete will include bugs and security liabilities. In this way it will statistically match the training data, but it is not what we really want.
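The auto complete framing in this section can be made concrete with a toy next-word predictor. This is only a sketch under strong simplifying assumptions: it counts word pairs in a tiny invented corpus, whereas real language models are neural networks trained on vastly more text. The function names and the corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for line in corpus:
        words = line.lower().split()
        for current, nxt in zip(words, words[1:]):
            counts[current][nxt] += 1
    return counts


def autocomplete(counts, prompt, max_words=5):
    """Extend the prompt with the statistically most common next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Invented corpus that looks like a list of exam questions.
corpus = [
    "explain the theory of gravity",
    "explain the theory of relativity",
    "describe the theory of gravity",
    "explain the moon landing",
]

counts = train_bigram(corpus)
print(autocomplete(counts, "explain the"))  # explain the theory of gravity
```

The predictor does not answer anything. It continues the prompt with whatever was most frequent in its training text, which is why biased or buggy training data comes straight back out.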

How to solve the alignment problem

There are some measures being taken to resolve the problem, Christian explained. The most common method used is reinforcement learning from human feedback (RLHF).

«The idea is that rather than manually specifying an objective function that numerically encodes everything we want speech to be, we use a machine learning model to infer a numerical objective function that does this. Such a system was first modelled in a collaboration between OpenAI and DeepMind. They used it in robotics first, and then turned to speech. They gave people, say, four or five different summaries of a text and had them rank these on a scale. Then the model is rebuilt to score higher on the scale».
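The step of inferring a numerical objective from human rankings can be sketched as follows. This is a simplified illustration, not the method used by OpenAI or DeepMind: it assumes a Bradley-Terry preference model and learns one score per summary, whereas real RLHF trains a neural reward model and then optimizes the language model against it. The summary names and rankings are invented.

```python
import math
from itertools import combinations


def pairs_from_ranking(ranking):
    """Turn one ranking (best first) into (winner, loser) pairs."""
    return list(combinations(ranking, 2))


def fit_scores(rankings, steps=500, lr=0.1):
    """Infer one numeric score per item from human rankings.

    Assumes a Bradley-Terry model:
    P(winner preferred over loser) = sigmoid(score_winner - score_loser).
    """
    items = sorted({item for r in rankings for item in r})
    scores = {item: 0.0 for item in items}
    pairs = [p for r in rankings for p in pairs_from_ranking(r)]
    for _ in range(steps):
        grads = {item: 0.0 for item in items}
        for winner, loser in pairs:
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            # Gradient of the log-likelihood of the observed preference.
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        for item in items:
            scores[item] += lr * grads[item]
    return scores


# Three hypothetical annotators rank three summaries, best first.
rankings = [
    ["s1", "s2", "s3"],
    ["s1", "s3", "s2"],
    ["s2", "s1", "s3"],
]

scores = fit_scores(rankings)
best_first = sorted(scores, key=scores.get, reverse=True)
print(best_first)  # ['s1', 's2', 's3']
```

The learned scores play the role of the inferred objective function: a model tuned to score higher on them is being tuned towards what the annotators preferred, with all the risks of optimizing for approval that follow.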

This solution, however, has its own alignment problems, such as deception.

«The systems can do things that appear good but do not actually fulfil the goal. We also see a failure mode called sycophancy: If you want to optimize for human approval, you could just tell people what they want to hear. Until last year, if you told ChatGPT a basic calculation error [such as 3+2 = 7], it would still tell you that you were correct».

Other fixes are being tested, such as letting language models debate each other in front of a human jury.

«The computer science community is increasingly borrowing their inspiration from human infrastructure for aligning groups of people that do not agree».

Making progress in aligning AI systems requires diverse expertise, said Christian, not only in computer science but also in the social sciences. He ended his talk by quoting Alan Turing speaking on BBC radio in 1952. (For a full transcript, see https://turingarchive.kings.cam.ac.uk/publications-lectures-and-talks-amtb/amt-b-6)

«Turing said that he had been trying to teach this machine, but it was always learning too slowly and he had to jump in and intervene and correct it. His co-panellist asks: ‘But who was learning, you or the machine?’ ‘Well, I suppose that we both were’, Turing replies».

«I think that is a very poetic description of where we find ourselves at the moment», said Christian.

Are the basic principles still valid?

Hallvard Fossheim, Chair of NENT and professor at the University of Bergen, asked Christian for his opinion on some research ethics principles and how they apply to AI research. He specified three principles that have been broadly acknowledged since the Belmont Report: respect for persons, the idea of promoting good consequences and avoiding bad ones, and the idea of justice.

«Your book could be taken to exemplify each of these principles. If respect for persons entails that people have a level of freedom, all your problematizations of a lack of transparency in the very technology itself show how this is a real threat. Secondly, when it comes to consequences, you have the classic ‘grey goo’ scare, but there is so much more. At the same time, we must keep in mind the fantastic opportunities that come with these technologies. It is hard to separate where science ends and science fiction begins. Finally, when it comes to justice the classic example might still be predictive policing. You have an issue with justice if two groups of people are given very different results from a scientific practice».

So, Fossheim asked, are these basic principles still okay, or are we overlooking something important if this is our starting ground in writing guidelines, and so forth?

Christian replied that these principles are as relevant as ever, broadly speaking, but they must be extended into practice.

«The criminal justice example is very important. You talked about respect for persons implying a sense of informed participation or informed consent. I agree that transparency is essential there, and I think one of the things that makes me hopeful has been the progress of the transparency community, particularly for models that use low-dimensional input. For these models there has been some encouraging progress on how to make transparent models competitive with your black box deep neural networks».

Christian stated the importance of challenging the narrative that we need to make a trade-off between performance and transparency.

«At least for the simple classes of models, we don’t need to make that trade-off. From my perspective it is indefensible in those situations not to use the transparent model. In particular in criminal justice, we are seeing that use of algorithmic risk assessments is increasingly becoming mandatory. From my point of view, this means that they are essentially extensions of the law itself. So, in the same way that we demand that the law is public, the models themselves should be just as auditable. It is unjustifiable to have something be protected in the name of trade secrecy when it is in effect part of the law».

When it comes to fairness, stated Christian, there is an important difference between the prediction that we make and what we do with that prediction. He used a study from the USA to explain.

«We can use machine learning systems to predict very accurately which people will fail to appear for their court date. This is called risk of failure to appear. Traditionally, if someone was viewed as being at high risk of not appearing for their court appointment, they would simply be detained between their arrest and their court date. A study showed that sending people text messages with reminders 24 hours in advance of their court date was hugely successful in getting them to show up. And so, the question of ethics here is not in the prediction, but in what you do with the prediction».

A weird hall of mirrors

Fossheim also asked about the constant interaction between regular people and AI systems.

«There is a certain mutuality between us and these systems that we don’t fully understand. If you have a system that tries to figure you out, through some kind of interaction or dialogue, isn’t there a certain risk of an inverse inverse reinforcement learning, so that also the human side is affected in ways we do not even realize?»

Christian agreed that there is a mutuality, and that we are sometimes aware of this.

«Most of us, intuitively, know that we are being observed when we use social media. We know that the things we do on that platform are going to affect our experiences on the same platform later. So, you get this weird hall of mirrors effect. We are not providing it with naturalistic behaviour. We are providing it with some sort of pedagogical behaviour because we are trying to steer it in a certain way, but the system is designed to interpret our behaviour to be totally naturalistic. This is another case where I think transparency is very important. You should understand how the data you are giving the system can be used. Ideally you should have the ability to edit the representation that the system makes about you».

For more on AI and research ethics, see NENT's Statement on research ethics and artificial intelligence.

A threat to our existence

Olle Häggström is professor of applied mathematics and statistics at Chalmers University of Technology in Sweden. He wanted Christian’s thoughts on the dangers connected to the uncontrolled development of AI systems going on today.

«We really cannot know for sure that the next versions of GPT and its counterparts don’t actually reach and exceed the thresholds where they threaten our existence. You could see it quite clearly in the technical reporting on GPT-4: There was some remaining uncertainty that this might become truly dangerous, not just to individuals but to humanity as a whole. We used to talk about short-term and long-term concerns. That distinction sort of collapses now that we do not know for sure that this thing is not around the corner. Do you agree that the ones leading the AI race have to start pulling the brakes, that the risks that they take with all of our lives are unacceptable?»

«I think this is really, really complex and it is hard for me to judge. I am very critical of OpenAI in many ways, but I think one of the things they have done right is the idea of incremental release. A lot of people criticized them when they decided this and called it a media stunt. I know the people that made those decisions. They were very sincere. But I also think it was the right thing in hindsight».

Christian said he is torn, even though he shares Häggström's concern.

«I don’t want to live in a world where, like with the Manhattan Project, we have a small enclave of very smart people working something out on a whiteboard and when that group of people is satisfied, we just do it. I do think that the incremental release strategy makes sense. It gives us an ability to encounter some of the unknown unknowns early enough while the system is weak enough, rather than holding it back until some set of people are satisfied and then releasing it all at once. So, I don’t necessarily agree with pulling the brakes as such, but I certainly believe incrementalism is wise here».

The most dangerous alignment problem

Häggström said his concern for the future is very much connected to the different incentives driving the AI industry.

«The AI community's striving towards good outcomes for humanity as a whole is mixed with market incentives and company incentives. These are not at all aligned with what is best for humanity and that really creates a dangerous situation».

Christian agreed.

«I share that perspective 100 percent. I think one of the most dangerous alignment problems we face is, for lack of a better word, capitalism itself. The fact that all this AI development is being done within organizations that to a lesser or greater extent need to return a profit to shareholders is extremely concerning».