Tech

Google expands Gemini API File Search with multimodal and metadata capabilities

Google has announced enhancements to its Gemini API File Search tool, introducing support for multimodal data, custom metadata, and page-level citations to improve the efficiency and transparency of artificial intelligence applications.

Author
Owen Mercer
Markets and Finance Editor
Published
Draft
Source: Hacker News · original
New updates allow developers to build more verifiable retrieval-augmented generation systems by processing images alongside text and attaching granular source citations.

Google has updated the Gemini API File Search tool to support multimodal data, custom metadata, and page-level citations. These enhancements allow developers to construct more robust retrieval-augmented generation (RAG) systems capable of processing images and text simultaneously, filtering unstructured data with key-value labels, and providing granular source grounding for improved verification. Previously, the tool was limited to text-only processing, which hindered the organisation of visual assets and the verification of specific claims within large documents.

With the new multimodal capability, File Search natively processes images and text together using the Gemini Embedding 2 model. This enables searches based on visual style or emotional tone rather than keywords alone. For example, a creative agency could search an entire archive for an image matching the emotional tone described in a natural-language brief, rather than relying on filenames.
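
To illustrate the general idea behind searching by meaning rather than filename, the sketch below ranks items by cosine similarity between vectors. It is not the Gemini API: the three-dimensional vectors, the file names and the brief vector are all hand-made stand-ins for what an embedding model would produce, chosen only to show how images and text can be compared once they share one vector space.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical, hand-made vectors standing in for real embeddings.
# In a shared embedding space, images and text are compared the same way.
archive = {
    "IMG_0041.jpg": [0.9, 0.1, 0.0],
    "IMG_0042.jpg": [0.1, 0.9, 0.2],
    "notes.txt": [0.2, 0.8, 0.1],
}

# Hypothetical vector for a natural-language brief.
brief_vector = [0.0, 1.0, 0.1]

# Rank archive items from most to least similar to the brief.
ranked = sorted(
    archive,
    key=lambda name: cosine(brief_vector, archive[name]),
    reverse=True,
)
print(ranked)
```

Nothing in the ranking step looks at the file names, which is why a vaguely named image can still surface first.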

Custom metadata allows developers to attach key-value labels to unstructured data and use them to scope queries. By applying metadata filters at query time, an application can limit requests to specific slices of data, for example by department or status. According to Google, this reduces noise from irrelevant documents and improves both the speed and accuracy of RAG workflows.
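
The filtering idea can be sketched in a few lines of plain Python. This is a conceptual illustration only and does not call the Gemini API: the Chunk class, the filter_by_metadata and keyword_search helpers, and the sample labels are all hypothetical, and the keyword match is a simple stand-in for real retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One indexed piece of a document plus its key-value labels."""
    text: str
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(chunks, required):
    """Keep only chunks whose metadata matches every required key-value pair."""
    return [
        c for c in chunks
        if all(c.metadata.get(k) == v for k, v in required.items())
    ]


def keyword_search(chunks, query):
    """Stand-in for retrieval: rank chunks by how many query words they share."""
    words = set(query.lower().split())
    scored = [(len(words & set(c.text.lower().split())), c) for c in chunks]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [c for score, c in ranked if score > 0]


# Hypothetical sample data labelled by department and status.
chunks = [
    Chunk("Travel policy for remote staff", {"department": "hr", "status": "approved"}),
    Chunk("Draft travel policy revision", {"department": "hr", "status": "draft"}),
    Chunk("Travel budget forecast", {"department": "finance", "status": "approved"}),
]

# Narrow the search space first, then search only within that slice.
scoped = filter_by_metadata(chunks, {"department": "hr", "status": "approved"})
results = keyword_search(scoped, "travel policy")
print([c.text for c in results])
```

Without the filter, the same query would match all three chunks, including the draft and the finance forecast; with it, only the approved HR policy is searched.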

When an application pulls an answer from a massive document, users need to verify exactly where that answer came from. The updated File Search tool now ties the model's response directly to the original source by capturing the page number for every piece of indexed information. This level of granularity allows applications to link responses directly to source locations, aiding in fact-checking and transparency.
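
As a rough sketch of how page-level grounding can work in principle, the example below stores a page number alongside each piece of indexed text and returns it with every match. It is not Google's implementation: the PageChunk class, the index_pages and cite helpers, and the handbook.pdf sample are hypothetical, and a substring match stands in for real retrieval.

```python
from dataclasses import dataclass


@dataclass
class PageChunk:
    """Text from one page, stored with the page number it came from."""
    document: str
    page: int
    text: str


def index_pages(document, pages):
    """Record a 1-based page number for every page of text at indexing time."""
    return [PageChunk(document, n, text) for n, text in enumerate(pages, start=1)]


def cite(chunks, phrase):
    """Return (document, page) pairs for every chunk that contains the phrase."""
    needle = phrase.lower()
    return [(c.document, c.page) for c in chunks if needle in c.text.lower()]


# Hypothetical three-page document.
index = index_pages("handbook.pdf", [
    "Welcome to the company handbook.",
    "Annual leave is 25 days per year.",
    "Unused annual leave may carry over.",
])

citations = cite(index, "annual leave")
for document, page in citations:
    print(f"{document}, page {page}")
```

Because the page number is captured when the text is indexed, an application can turn each citation into a link that opens the source at the right page.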

File Search is designed to handle infrastructure tasks such as file uploading and searching, so developers can focus on building their products. Whether prototyping a weekend project or scaling a production application for thousands of users, developers can now use it to better organise text and visual data in RAG systems. Google states that uploading files and searching across them remains simple, with further guidance available in the developer guide and API documentation.

Although the announcement describes the update as arriving "today", the source material does not specify the exact release date or time zone. It also provides no data on how many developers have adopted these features or on performance improvements observed in real-world production environments.
