Google expands Gemini API File Search with multimodal and metadata capabilities
Google has announced enhancements to its Gemini API File Search tool, introducing support for multimodal data, custom metadata, and page-level citations to improve the efficiency and transparency of artificial intelligence applications.
The update lets developers build more robust retrieval-augmented generation (RAG) systems that process images and text together, filter unstructured data with key-value labels, and ground each answer in a specific source location for easier verification. Previously, the tool was limited to text-only processing, which hindered the organisation of visual assets and the verification of specific claims within large documents.
The new multimodal capability means the tool now natively processes images and text simultaneously using the Gemini Embedding 2 model. This shift enables searches based on visual style or emotional tone rather than just keywords. For example, a creative agency could search an entire archive for an image matching a specific emotional tone described in a natural language brief, rather than relying on filenames.
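To illustrate the general idea behind searching images and text together, the sketch below ranks items by cosine similarity in a shared embedding space. It does not call the Gemini API, and the announcement does not describe the tool's internals; the three-dimensional vectors, the file names, and the `rank` helper are all hypothetical stand-ins for what an embedding model would produce.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, items):
    """Return item names ordered from most to least similar to the query."""
    return sorted(items, key=lambda name: cosine(query_vec, items[name]), reverse=True)


# Hypothetical embeddings: images and text share one vector space.
archive = {
    "sunset_beach.jpg": [0.9, 0.1, 0.0],
    "annual_report.txt": [0.1, 0.9, 0.1],
    "storm_clouds.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a brief such as "warm, calm, end of day".
query = [0.8, 0.2, 0.1]

ranked = rank(query, archive)
print(ranked[0])
```

Because the query and the stored items are compared as vectors, the file name never enters the comparison, which is what makes searching by tone or style possible in principle.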
Custom metadata allows developers to attach key-value labels to unstructured data. By applying metadata filters at query time, an application can limit requests to specific data slices, such as a single department or document status. Scoping queries in this way cuts noise from irrelevant documents, which can improve both the speed and accuracy of RAG workflows.
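The filtering step can be pictured with a small in-memory sketch. It is not the File Search API: the `Document` class, the `filter_by_metadata` helper, and the sample labels are hypothetical, and the real tool applies its filters on the service side with its own filter syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)  # key-value labels


def filter_by_metadata(docs, required):
    """Keep only documents whose metadata matches every required pair."""
    return [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in required.items())
    ]


docs = [
    Document("Q3 budget forecast", {"department": "finance", "status": "final"}),
    Document("Q3 budget draft", {"department": "finance", "status": "draft"}),
    Document("Onboarding checklist", {"department": "hr", "status": "final"}),
]

# Scope the query to one data slice before any retrieval happens.
scoped = filter_by_metadata(docs, {"department": "finance", "status": "final"})
print([d.text for d in scoped])
```

Only the documents that survive the filter would be passed on to retrieval, so drafts and other departments' files never compete for relevance.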
When an application pulls an answer from a massive document, users need to verify exactly where that answer came from. The updated File Search tool now ties the model's response directly to the original source by capturing the page number for every piece of indexed information. This level of granularity allows applications to link responses directly to source locations, aiding in fact-checking and transparency.
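As a rough picture of page-level grounding, the sketch below stores a page number with each indexed chunk so that a match can be reported with its location. The `Chunk` class, both helper functions, and the file name `handbook.pdf` are hypothetical; the keyword match stands in for real retrieval, and the citation format returned by the actual API may differ.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str
    page: int
    text: str


def index_pages(source, pages):
    """Create one chunk per page, recording its 1-based page number."""
    return [Chunk(source, number, text) for number, text in enumerate(pages, start=1)]


def find_with_citation(chunks, term):
    """Return (source, page) pairs for every chunk containing the term."""
    term = term.lower()
    return [(c.source, c.page) for c in chunks if term in c.text.lower()]


chunks = index_pages(
    "handbook.pdf",
    [
        "Welcome and overview.",
        "Leave policy: 25 days per year.",
        "Expenses and travel.",
    ],
)

citations = find_with_citation(chunks, "leave policy")
print(citations)
```

Keeping the page number alongside the text at indexing time is what lets an application link an answer straight back to the page it came from.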
The tool is designed to take on infrastructure work such as uploading files and searching across them, leaving developers free to focus on building products. Whether prototyping a weekend project or scaling a production application for thousands of users, developers can now use RAG systems that organise text and visual data more effectively. Google states that uploading files and searching across them remains simple, with further guidance available in the developer guide and API documentation.
Although the update was described as occurring today, the source material does not specify the exact release date or time zone. It also gives no figures on how many developers have adopted these features or on performance improvements observed in production environments.


