Overall accuracy for GPT-5.5 and Opus 4.7 remains flat on SpatialBench. Scientist-reviewed trajectories reveal persistent gaps in assay-aware biological judgment.
why not provide data analysis instructions/examples that are provided by the assay suppliers where possible? is the expectation that the agents should be able to retrieve this on their own?
Yes, we construct tasks with the kind of realistic context you would give an experienced scientist.
this makes sense.
my expectation would be that if the agents have gotten better at general intelligence, then perhaps we would see performance gains when instructions/examples are provided. not custom instructions written by the user, just the instructions/examples from the assay supplier.
but i agree that the agents should have ideally been able to retrieve this themselves.