Recently, OpenAI introduced CriticGPT, a new AI tool designed to improve the reliability of systems like ChatGPT by catching errors in the code they generate. CriticGPT operates within the reinforcement learning from human feedback (RLHF) pipeline, where it assists human trainers by pinpointing potential mistakes in code produced by ChatGPT.
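Conceptually, the critic sits between the assistant's output and the human grader: the trainer reads the model-written critique alongside the code before scoring it. The sketch below illustrates that review step only; CriticGPT is not a public API model, so the model name and prompt format here are placeholders, not OpenAI's actual setup.

```python
# Illustrative sketch only: "critic-model" is a placeholder name and the
# prompt wording is an assumption, not OpenAI's published pipeline.
from openai import OpenAI

client = OpenAI()

def critique_code(code: str) -> str:
    """Ask a critic model to flag potential bugs in assistant-written code."""
    response = client.chat.completions.create(
        model="critic-model",  # placeholder; CriticGPT itself is not exposed via the API
        messages=[
            {"role": "system",
             "content": "You are a code reviewer. List concrete bugs with line references."},
            {"role": "user", "content": f"Review this code for errors:\n\n{code}"},
        ],
    )
    return response.choices[0].message.content

# A human trainer would read this critique next to the original code
# before grading the assistant's answer during RLHF.
print(critique_code("def area(r):\n    return 3.14 * r * r"))
```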
To train CriticGPT, researchers deliberately inserted errors into code samples, teaching the model to recognize and flag a range of coding mistakes. In testing, CriticGPT outperformed human review alone 63% of the time, particularly at identifying real-world mistakes made by large language models (LLMs). Teams that paired CriticGPT with human reviewers produced more comprehensive error reports than humans alone, and the human-AI combination also cut down on the false error reports that AI-only reviews sometimes produce.
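The core of that training idea is simple: start from working code, plant a known defect, and pair the tampered code with a reference critique describing the defect. The sketch below is a minimal illustration of that idea under assumed names (the bug catalogue and helper function are hypothetical, not OpenAI's actual data pipeline).

```python
# Minimal sketch of the "inserted bug" training idea. BUG_TEMPLATES and
# make_training_pair are hypothetical; OpenAI's real pipeline is not public.
import random

BUG_TEMPLATES = [
    # (description, function that plants one known defect)
    ("off-by-one in range bound", lambda src: src.replace("range(n)", "range(n - 1)")),
    ("wrong comparison operator", lambda src: src.replace("<=", "<")),
]

def make_training_pair(correct_code: str):
    """Plant one defect and return a (buggy code, reference critique) pair."""
    # Keep only templates that actually change this snippet, so the
    # reference critique never describes a bug that was not inserted.
    applicable = [(desc, fn) for desc, fn in BUG_TEMPLATES
                  if fn(correct_code) != correct_code]
    if not applicable:
        return None
    description, apply_bug = random.choice(applicable)
    return {
        "input": apply_bug(correct_code),          # what the critic model sees
        "target_critique": f"Bug: {description}",  # what a good critique should mention
    }

pair = make_training_pair("def total(n):\n    return sum(i for i in range(n))\n")
print(pair["input"])
print(pair["target_critique"])
```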
However, CriticGPT has limitations. It was trained primarily on short code snippets, so it may not generalize well to the longer, more complex coding tasks future AI systems will face. Additionally, while CriticGPT reduces false positives, it does not eliminate them entirely, and it can still miss errors that are spread across a codebase rather than confined to a single location.
Looking ahead, OpenAI plans to integrate CriticGPT-like assistants into its review process to improve how human trainers evaluate outputs from large language models. These tools are meant to help trainers spot errors in complex outputs, improving the overall reliability of AI systems and their alignment with human goals.