Tech

Google’s Antigravity 2.0 Leads OpenSCAD Architectural Benchmark Amidst AI Coding Shifts

A practical benchmark comparing six AI coding tools on their ability to model the Pantheon in Rome highlights the capabilities and limitations of current large language models in spatial geometry generation.

Author
Owen Mercer
Markets and Finance Editor
Published
Draft
Source: Hacker News · original
ModelRift test reveals autonomous agents can generate complex parametric CAD, but human oversight remains essential for export-ready geometry.

ModelRift has released the results of a practical benchmark evaluating the capacity of artificial intelligence coding tools to generate parametric Computer-Aided Design (CAD) code using OpenSCAD. The test required six distinct AI systems to translate architectural reference images of the Pantheon in Rome into complex 3D geometry, a task chosen to assess how models handle radial symmetry, Boolean operations, and intricate structural details. The findings indicate that while autonomous generation is advancing rapidly, human-in-the-loop visual feedback remains critical for refining spatial geometry and ensuring meshes are export-ready for 3D printing.

Google’s Antigravity 2.0, powered by the newly released Gemini 3.5 Flash High model, achieved the highest quality score among fully autonomous agents: 4.5 out of 5. Launched shortly before the benchmark run, Antigravity 2.0 outperformed competitors including Codex 5.5 High and Claude Sonnet by autonomously researching real architectural parameters rather than relying solely on visual estimation. It implemented complex details such as the coffered ceiling, mixed-material columns, and a cutaway mode that shows the interior proportions.

The benchmark used OpenSCAD as the target language because its compact, text-based syntax suits the way language models express structure. Unlike natural language descriptions or UI-driven 3D applications, OpenSCAD lets agents describe buildings through nested transformations, loops, and named modules. The resulting geometry is inspectable, reproducible, and easily revised through parameter changes, which makes the language a practical environment for testing the geometric judgment of AI agents.
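To make those constructs concrete, the following is a minimal sketch rather than code from the benchmark: a short Python helper that writes out an OpenSCAD script containing a named module, a Boolean difference, and a loop wrapped around nested transformations. The helper name `rotunda_scad` and every dimension are assumptions chosen for illustration; none of them are measurements of the Pantheon or values reported by ModelRift.

```python
# Hypothetical helper: emits OpenSCAD source text for a hollow drum ringed
# by columns. All dimensions are illustrative placeholders.

def rotunda_scad(column_count: int = 8, ring_radius: float = 20.0,
                 column_height: float = 12.0) -> str:
    # Doubled braces ({{ }}) are literal braces in the emitted OpenSCAD.
    return f"""\
column_count = {column_count};
ring_radius = {ring_radius};
column_height = {column_height};

// Named module: one column, reused for every position on the ring.
module column(h) {{
    cylinder(h = h, r = 1, $fn = 32);
}}

// Boolean operation: subtract an inner cylinder to hollow out the drum.
module drum() {{
    difference() {{
        cylinder(h = column_height, r = ring_radius - 3, $fn = 96);
        translate([0, 0, -1])
            cylinder(h = column_height + 2, r = ring_radius - 5, $fn = 96);
    }}
}}

drum();

// Radial symmetry: a loop wrapped around nested transformations.
for (i = [0 : column_count - 1]) {{
    rotate([0, 0, i * 360 / column_count])
        translate([ring_radius, 0, 0])
            column(column_height);
}}
"""

scad_source = rotunda_scad()
print(scad_source)
```

Because the column count, ring radius, and height are plain parameters, a revision is a one-line change: calling `rotunda_scad(column_count=16)` produces a denser colonnade without touching the geometry code.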

Speed did not predict quality. Cursor Composer 2.5 completed the task fastest but produced the weakest result, a simplified placeholder that lacked architectural nuance. Conversely, Claude Sonnet took the longest among the original autonomous runs yet delivered the cleanest silhouette and most coherent proportions. Codex 5.5 High produced the densest model, with impressive details such as the entablature inscription, but its final exported STL file suffered from geometry problems around the portico roof, showing that a model can look correct in preview and still export a flawed mesh.
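One common way to catch this class of export problem is to check that every edge of the triangle mesh is shared by exactly two triangles. The sketch below illustrates that general test; it is not the procedure ModelRift used, and the function name `open_or_nonmanifold_edges` is hypothetical. It assumes triangles are given as tuples of vertex indices, so the vertex coordinates from a real STL file would need to be deduplicated into indices first.

```python
from collections import Counter

def open_or_nonmanifold_edges(triangles):
    """Return the edges that are not shared by exactly two triangles.

    An empty result means every edge is shared by exactly two faces,
    a necessary condition for a closed (watertight) mesh.
    """
    edge_use = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with sorted endpoints so direction is ignored.
            edge_use[(min(u, v), max(u, v))] += 1
    return {edge: n for edge, n in edge_use.items() if n != 2}

# A tetrahedron is the smallest closed mesh: 4 faces, 6 edges.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]

closed_problems = open_or_nonmanifold_edges(tetra)
# Dropping one face leaves a hole bounded by three edges used only once.
open_problems = open_or_nonmanifold_edges(tetra[:-1])

print("closed mesh problem edges:", closed_problems)
print("mesh with a missing face:", open_problems)
```

A check like this runs on the exported mesh rather than on the preview, which is where the problem described above appeared.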

The results underscore that fully autonomous generation is not yet the optimal workflow for complex spatial tasks. ModelRift demonstrated that a human-in-the-loop process, in which users annotate visual feedback directly on 3D renders, yielded higher-quality results than any single-pass autonomous run. Although the Antigravity 2.0 result was impressive, the benchmark concluded that production-quality architectural reconstruction still requires iterative visual refinement to correct misplaced elements and ensure structural accuracy.
