the stream

2025/1/25 20:16

AI systems with research taste

How might we design an AI system that is skilled at asking questions whose verifiable answers (and therefore rewards) are maximally useful for RL?

This would be something like a "good research question taste" model, which would let us assemble a training set of more sample-efficient questions with verifiable answers. A teacher guides a student by asking the right questions at the right time. In the same way, a system optimized for good research taste could help humanity push the frontier of unsolved problems forward as efficiently as possible, especially when paired with AI systems that have "good problem-solving taste."

Good question taste may be a much harder technical accomplishment than good problem-solving taste, but it feels much more fundamental to creating intelligence that meaningfully reduces the cost of new science.