Google returns to wearable market with AI-powered audio glasses
The tech giant unveiled its latest hardware strategy at Google I/O, marking a distinct shift from its earlier visual-focused attempts and positioning the audio-only models as the first step in a broader mixed-reality roadmap.

Google has announced a new line of AI-powered audio glasses, marking its return to the wearable technology sector. The devices were unveiled at Google I/O on Tuesday as part of a collaborative effort involving Samsung, eyewear manufacturers Warby Parker and Gentle Monster, and Google’s own software engineering teams.
Unlike previous iterations of the company’s hardware, these new units are explicitly branded as audio glasses rather than full smart glasses with visual displays. The devices are designed to pair with both Android and iOS smartphones, allowing users to issue verbal commands to access Google’s ecosystem. This includes direct interaction with the Gemini AI model to perform tasks such as ordering services or managing applications through voice alone.
The announcement signals a strategic pivot from Google’s earlier foray into the space with Google Glass, which faced significant public scrutiny and helped popularise the term glassholes. The current approach mirrors the model currently employed by Meta, focusing on audio interaction and seamless integration with existing mobile ecosystems rather than introducing a separate visual display interface at launch.
Hardware development for the frames was handled in partnership with established eyewear brands Warby Parker and Gentle Monster, while Samsung provided hardware collaboration. The devices are scheduled for release later this year, serving as the initial phase of Google’s commercialisation strategy for mixed reality.
While the audio-only models are hitting the market in the coming months, Google has indicated that display-integrated versions of the smart glasses are expected to follow in the autumn. This phased rollout suggests a cautious approach to hardware adoption, prioritising voice-based utility before introducing more complex visual components.


