DocVQA Challenge - 99.156% Accuracy with ADE DPT-2 on Validation Set
Smell Test - 45 errors, 18 dataset issues, and 63 of 5,286 successful cases shown
Explore our failures and successes.
Randomly picked successes
These questions tested complex spatial reasoning, table navigation, logo extraction, and handwritten text. ADE got them all correct.
Complete transparency - every failure case with detailed analysis
Images show bounding box overlays where available.
Questionable dataset instances excluded from accuracy calculation
These cases have questionable annotations, ambiguous questions, or debatable ground truth. They are excluded from the final accuracy calculation but shown here for transparency.
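The headline accuracy can be reproduced from the counts in the summary line. This is a minimal sketch that assumes accuracy is computed as successes / (successes + errors), with the 18 excluded instances removed from the denominator. The page does not spell out the formula. The second figure is a hypothetical comparison that scores the excluded cases as errors instead of dropping them.

```python
# Counts taken from the summary line (45 errors, 18 dataset issues, 5,286 successes).
successes = 5286  # correct answers
errors = 45       # failure cases
excluded = 18     # questionable dataset instances, left out of the denominator

# Assumed formula: accuracy over the non-excluded cases only.
evaluated = successes + errors
accuracy = 100 * successes / evaluated

# Hypothetical comparison: count the excluded cases as errors instead.
total = evaluated + excluded
accuracy_if_counted = 100 * successes / total

print(f"evaluated: {evaluated}, total incl. excluded: {total}")
print(f"accuracy: {accuracy:.3f}%")                       # 99.156%
print(f"if excluded counted as errors: {accuracy_if_counted:.3f}%")
```

Under this assumption, 5,286 / 5,331 rounds to the reported 99.156%, so the stated counts and the headline figure agree.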