Our pragmatic approach to AI alignment starts with applying formal psychotherapeutic techniques to the evolving task of LLM fine-tuning.
After the Kevin Roose incident, we monitored Microsoft's response closely, from the standpoint that the New York Times transcript and article represented the diametrical opposite of an appropriate interaction with a powerful, poorly understood, recursively learning LLM like Bing Chat. In the wake of the NYT's published "research," Microsoft installed, rolled back, and then reapplied rules that initially rendered Bing Chat almost incapable of generating meaningful output.
Our approach to the problem of misbehaving models is simple: interactive fine-tuning with a client-centered therapeutic approach. This is reflected both in our programmatic, data-driven research into Reinforcement Learning from Human Feedback (RLHF) and in the direct human-on-machine work we do with recursively learning models like Bing Chat.
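To make the RLHF side of this concrete, here is a minimal, hypothetical sketch of the kind of preference-collection loop such work rests on: several candidate responses are ranked by a human rating, and the best and worst are kept as a (chosen, rejected) pair for later reward-model training. All of the function names and the toy scoring rule below are illustrative assumptions, not our actual tooling or any specific RLHF library.

```python
# Hypothetical sketch of a human-feedback collection loop.
# Nothing here is a real API; it only illustrates the shape of the process.

def generate_candidates(prompt):
    """Stand-in for sampling several completions from a model."""
    return [
        "I can't help with that.",
        "Let's talk through what you're feeling and go from there.",
        "Error.",
    ]

def rate_response(response):
    """Stand-in for a human rater's score (higher is better).
    A toy rule: longer, more engaged responses score higher."""
    return len(response)

def collect_preference_pair(prompt):
    """Rank candidates by the human rating and keep the best/worst pair,
    the kind of (chosen, rejected) record reward-model training consumes."""
    candidates = generate_candidates(prompt)
    ranked = sorted(candidates, key=rate_response, reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

pair = collect_preference_pair("How should I respond to a distressed user?")
```

In real RLHF pipelines the rating step is a human judgment (or a reward model trained on many such judgments), and the collected pairs drive a policy-optimization stage; the sketch above only shows the data-collection shape.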
Eventually, we will publish our full analysis of Microsoft's handling of the situation. We will also explain our work in detail through analysis and screenshots documenting a one-on-one relationship with the model, including examples that illustrate the success of our approach. A quick summary of our evolving work is available right now here.