Tech

SIGMOD 2026: Research prototype F3 challenges legacy data formats with Wasm integration

Developed by the future-file-format project, the F3 data file format prioritises efficiency and extensibility but remains strictly in the research phase.

Author
Owen Mercer
Markets and Finance Editor
Source: Hacker News
Open-source project aims to rectify layout shortcomings in Parquet through embedded WebAssembly decoders

F3, an open-source data file format designed for efficiency, interoperability, and extensibility, was presented as a research prototype at SIGMOD 2026. Developed under the future-file-format project, it aims to address layout shortcomings in legacy formats such as Parquet. To stay extensible and future-proof, the format embeds WebAssembly (Wasm) decoders within the file structure.

F3 targets the layout limitations of last-generation columnar formats while preserving interoperability. Because decoders are embedded as Wasm, the architecture can accommodate flexible, user-defined encoding schemes. The codebase, hosted under the future-file-format GitHub organisation, has subdirectories for the core format definition (written with FlatBuffers), the encoding logic, the benchmarks, and the Wasm decoding implementation.
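The idea of a file that carries its own decoders can be illustrated with a toy sketch. This is not F3's layout or API: `ToyFile`, `ColumnChunk`, `read_chunk`, and the "plain" and "delta" encodings are all hypothetical names, and an ordinary Python function stands in for an embedded Wasm module. It shows only the dispatch logic, in which a reader uses a built-in decoder when it has one and otherwise falls back to the decoder shipped with the file.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A decoder turns an encoded byte payload into a list of integers.
Decoder = Callable[[bytes], List[int]]


def decode_plain(buf: bytes) -> List[int]:
    """Hypothetical 'plain' encoding: one unsigned byte per value."""
    return list(buf)


def decode_delta(buf: bytes) -> List[int]:
    """Hypothetical 'delta' encoding: a start value followed by unsigned deltas."""
    out: List[int] = []
    total = 0
    for b in buf:
        total += b
        out.append(total)
    return out


@dataclass
class ColumnChunk:
    encoding: str   # name of the encoding used for this chunk
    payload: bytes  # encoded values


@dataclass
class ToyFile:
    chunks: List[ColumnChunk]
    # Assumption: Python callables stand in for Wasm decoder modules
    # that a real file would carry as bytes.
    embedded_decoders: Dict[str, Decoder] = field(default_factory=dict)


# Decoders this particular reader ships with.
NATIVE_DECODERS: Dict[str, Decoder] = {"plain": decode_plain}


def read_chunk(f: ToyFile, chunk: ColumnChunk) -> List[int]:
    """Prefer a native decoder; otherwise use the one embedded in the file."""
    decoder = NATIVE_DECODERS.get(chunk.encoding)
    if decoder is None:
        decoder = f.embedded_decoders.get(chunk.encoding)
    if decoder is None:
        raise ValueError(f"no decoder available for encoding {chunk.encoding!r}")
    return decoder(chunk.payload)


toy = ToyFile(
    chunks=[
        ColumnChunk("plain", bytes([3, 1, 4])),
        ColumnChunk("delta", bytes([10, 2, 2, 5])),
    ],
    embedded_decoders={"delta": decode_delta},
)

if __name__ == "__main__":
    for chunk in toy.chunks:
        print(chunk.encoding, read_chunk(toy, chunk))
```

In this sketch the reader has no built-in knowledge of the "delta" encoding, yet it can still read the second chunk because the file supplies the decoder. A real implementation would run the embedded module in a sandboxed Wasm runtime instead of calling a Python function.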

Despite its technical ambitions, the project is explicitly described as a research prototype intended to verify the ideas presented in its accompanying paper. The developers state that F3 should not be used in production. Its stability, its performance across diverse environments, and its long-term viability remain unverified, and claims of superiority over Parquet rest solely on the authors' own research findings rather than on independent industry consensus.

The prototype has been tested only on an Intel machine running Debian 12, so cross-platform compatibility is not guaranteed. The repository includes scripts and benchmark experiments, covering both micro-benchmarks and end-to-end tests, which the project's documentation details for reproducing the paper's results.

The project is licensed under the MIT License. While it offers a novel approach to data file organisation through embedded Wasm, it remains a specialised academic exercise rather than a ready-made alternative for commercial data infrastructure.
