Tech

FUTO releases one-million-swipe dataset for AI model training

FUTO, the organisation behind swipe.futo.org, has published a collection of English swipe-typing gestures on a QWERTY layout, gathered from August 2024 onwards, to support the development of swipe typing technology.

Author
Owen Mercer
Markets and Finance Editor
Source: Hacker News
Open-source dataset available on Hugging Face under the MIT license

FUTO has released a dataset of one million English swipe-typing gestures on a QWERTY layout under the permissive MIT license. The collection, now hosted on Hugging Face, is a substantial resource for developers and researchers working on input prediction and related natural language processing systems.

Data collection began in August 2024 at swipe.futo.org. Volunteers visited the site on a mobile device, where they were given instructions and a consent form. After agreeing to contribute, they were shown sentences, mostly drawn from Wikipedia, and asked to swipe them word by word.

The initial collection effort yielded more than one million swipes. FUTO then filtered out a small subset of low-quality data, leaving the one million clean gestures in the final release. The announcement did not detail the criteria used to judge data quality.

The dataset was officially released in March 2025. FUTO used it to train its proprietary models and to evaluate the performance of various swipe typing systems. The release gives the industry an open benchmark, allowing external parties to test their algorithms against a standardised set of user interactions.
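For readers curious what testing an algorithm against such a benchmark might involve, the following is a minimal sketch in Python, not FUTO's evaluation method. The record layout (`SwipeSample`, with a touch path and a target word), the simplified key grid, and the toy `nearest_key_decoder` are all hypothetical and invented for illustration; the actual dataset schema should be checked on its Hugging Face page before adapting anything here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[float, float]


@dataclass
class SwipeSample:
    # Hypothetical record layout, not the dataset's real schema.
    points: List[Point]  # touch path in simplified key-grid coordinates
    word: str            # the word the participant was asked to swipe


def qwerty_centres() -> Dict[str, Point]:
    """Approximate key centres on an idealised QWERTY grid (an assumption)."""
    rows = [("qwertyuiop", 0.0), ("asdfghjkl", 0.5), ("zxcvbnm", 1.5)]
    return {
        ch: (col + offset, float(row))
        for row, (keys, offset) in enumerate(rows)
        for col, ch in enumerate(keys)
    }


CENTRES = qwerty_centres()


def nearest_key_decoder(points: List[Point]) -> str:
    """Toy decoder: snap each point to its nearest key, collapsing repeats.

    A real swipe decoder would use a language model and path matching;
    this stand-in cannot even produce double letters.
    """
    out: List[str] = []
    for x, y in points:
        key = min(
            CENTRES,
            key=lambda k: (CENTRES[k][0] - x) ** 2 + (CENTRES[k][1] - y) ** 2,
        )
        if not out or out[-1] != key:
            out.append(key)
    return "".join(out)


def top1_accuracy(
    samples: Iterable[SwipeSample],
    decoder: Callable[[List[Point]], str],
) -> float:
    """Fraction of samples where the decoder's output equals the target word."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples to evaluate")
    hits = sum(decoder(s.points).lower() == s.word.lower() for s in samples)
    return hits / len(samples)


if __name__ == "__main__":
    path = [CENTRES["h"], CENTRES["i"]]
    demo = [SwipeSample(path, "hi"), SwipeSample(path, "ho")]
    print(top1_accuracy(demo, nearest_key_decoder))
```

Because every decoder is scored on the same gestures and target words, accuracy figures from different systems become directly comparable, which is the practical value of a shared benchmark.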

The announcement identifies FUTO as responsible for the collection and release but gives no further detail about the organisation itself. The data is available for immediate download, a step towards greater transparency in the development of predictive text technologies.
