Machine Learning Research
Toward Safer, More Helpful Models
The technique known as reinforcement learning from human feedback fine-tunes large language models to be helpful and avoid generating harmful responses such as suggesting illegal or dangerous activities. An alternative method streamlines this approach and achieves better results.