A week ago I made a thread asking whether ICML 2026’s review policy might have affected review outcomes, particularly whether Policy A papers might have been judged more harshly than Policy B papers.

Original thread:

The goal was not to show causality. It was merely to collect a rough community snapshot and see whether there are any visible trends in:
Now, before rebuttal scores, I wanted to share the current results from the survey.

Important disclaimer

These results are still not conclusive. This is a self-selected community poll, not an official dataset, and there are many possible sources of bias. So please read this as descriptive, preliminary data, not as proof that one policy caused better or worse outcomes. Still, with 100 responses after one week, I think the data are now interesting enough to at least discuss.

Sample size
By policy:

Summary table
* based on 99 valid average score entries

Plot 1: score distribution by policy
[Figure: Distribution of Scores by Policy chosen]

First patterns I see:

1) Policy B currently has a higher reported mean score

At the moment, the average reported score is higher for Policy B (3.43) than for Policy A (3.26). That is not conclusive evidence that Policy B was advantaged in a causal sense. But the difference is visible enough that it seems worth discussing.

2) Policy A currently has higher reported reviewer confidence

Interestingly, the confidence pattern goes in the opposite direction: the average reported reviewer confidence is higher for Policy A (3.53) than for Policy B (3.35). To me, this inverse relationship between scores and confidence is one of the more interesting patterns in the current data. One possible interpretation is that reviewers who rely on external reasoning (in this case an LLM) are less confident in their opinion, perhaps because they did not spend as much time reading the paper, and at the same time are more skeptical that their review is valid.

3) Both groups lean toward “harsher than expected”, but this is stronger for Policy A
So both groups lean toward the feeling that scores were harsher than expected, but this is more pronounced for Policy A in the current sample. This, however, can also be attributed to the lower mean scores of Policy A, which may subjectively make Policy A respondents feel unfairly treated.

Plot 3: perceived harshness by policy
[Figure: Distribution of Harshness by policy]

4) “Especially polished” reviews are reported much more often for Policy B
The biggest difference here is the “Yes” category: in the current sample, respondents under Policy B are more likely to describe the reviews as especially polished. Of course, this does not prove LLM use, and I don’t want to overstate that point. But it is still a pattern that seems relevant to the original debate.

My current interpretation

My current reading is:
At the same time, I am not saying these data justify a strong conclusion like:
But they do justify an open debate. There are too many confounders, however:
I would love opinions on these early results

Also, if you have not filled in the survey yet, please do. And please share it, especially with people under both policies, so the sample can become larger, more informative, and more representative. If enough additional responses come in, I can post a follow-up after rebuttal as well.

Motivation

I openly admit that my motivations for doing this survey were A) I initially felt possibly treated unfairly and wanted to know the reality; and B) I really love data analysis of any kind and debates. After a week I mainly do it for motivation B.

submitted by /u/Available_Net_6429
![[D] ICML 2026 review policy debate: 100 responses suggest Policy B may score higher, while Policy A shows higher confidence](https://technologiesdigest.com/wp-content/uploads/2026/03/D-ICML-2026-review-policy-debate-100-responses-suggest-Policy.png)