Norway’s National Library builds sovereign LLM with Huawei storage infrastructure

Head of IT Platform Marius Husnes outlines the technical challenges of moving petabyte-scale datasets from preservation archives to high-throughput AI pipelines at Huawei’s ID Forum 2026.

Author

Owen Mercer

Markets and Finance Editor

Source: Hacker News · original

Ministry of Culture project leverages 20 petabytes of cultural heritage data to address gaps in local language AI capabilities

Norway’s National Library is developing a sovereign large language model for the Norwegian language, using two petabytes of Huawei OceanStor Dorado flash storage in its AI training data pipeline. Marius Husnes, the library’s Head of IT Platform, presented the initiative at Huawei’s ID Forum 2026 in Paris, noting that no commercial provider is currently building a model for the local language. Husnes argued that nations without sovereign AI trained in their native language are at a disadvantage, because globally trained, English-centric models often fail to capture national histories, news, and cultural nuances.

The Norwegian Ministry of Culture assigned the project to the National Library because of its legal deposit mandate, which requires the preservation of all published books, newspapers, web pages, and broadcast content. The mandate gives the library access to 20 petabytes of unique data, preserved under a 3-2-1 scheme whose three copies bring the total to 60 petabytes. An agreement with Norwegian newspapers permits the use of copyrighted content for training, a resource Husnes noted is unavailable to private companies. The library has been digitising its collection since 2005, generating extensive metadata and OCR-extracted text from the raw materials.

The technical architecture is a multi-stage processing workflow. In-house computation runs on an Nvidia DGX H200 system and a 384-core CPU cluster, backed by Huawei all-flash arrays for low-latency data preparation. This stage covers data ingestion, cleaning, deduplication, and format normalisation. Once processed, the data is transferred to the Sigma2 Olivia supercomputer, an HPE Cray system equipped with 448 GPUs and 64,512 CPU cores, for the actual training runs.
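
The preparation stages named above (ingestion, cleaning, deduplication, format normalisation) can be illustrated with a minimal Python sketch. It is not the library's pipeline, whose implementation was not described. The record layout (`id`, `text`), the function names, and the choice of exact-match SHA-256 deduplication and JSON Lines output are all assumptions made for illustration.

```python
import hashlib
import json
import unicodedata
from typing import Iterable, Iterator


def clean(text: str) -> str:
    """Normalise Unicode to NFC and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def deduplicate(docs: Iterable[dict]) -> Iterator[dict]:
    """Drop documents whose cleaned text was already seen (exact matches only)."""
    seen: set[str] = set()
    for doc in docs:
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc


def prepare(raw_docs: Iterable[dict]) -> Iterator[str]:
    """Ingest, clean, deduplicate, then emit one JSON object per line."""
    cleaned = ({"id": d["id"], "text": clean(d["text"])} for d in raw_docs)
    non_empty = (d for d in cleaned if d["text"])  # skip blank records
    for doc in deduplicate(non_empty):
        yield json.dumps(doc, ensure_ascii=False, sort_keys=True)


# Hypothetical sample records: two differ only in whitespace, one is blank.
sample = [
    {"id": "a1", "text": "Nasjonalbiblioteket   digitaliserer\naviser."},
    {"id": "a2", "text": "Nasjonalbiblioteket digitaliserer aviser."},
    {"id": "a3", "text": "   "},
]
lines = list(prepare(sample))
```

In this sample only the first record survives, because the second becomes identical after cleaning and the third is empty. Real corpora typically also need near-duplicate detection, which this sketch does not attempt.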

Husnes identified data quality, cleaning, and pipeline throughput as the primary bottlenecks, rather than compute power. The team faced significant challenges in bridging the gap between the high-latency preservation archive, optimised for durability and cost, and the high-throughput AI pipeline designed for parallel data input/output. Husnes noted that there was a lack of industry guidance on moving petabyte-scale datasets from archive storage to AI pipelines, forcing the library’s team to develop their own solutions.
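
One general way to narrow the gap between a high-latency archive and a throughput-hungry pipeline is to issue many archive reads concurrently so their latencies overlap. The sketch below illustrates that idea only; it is not the solution the library built, which was not detailed. `fetch_from_archive`, the simulated latency, and the worker count are hypothetical stand-ins.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, Sequence

ARCHIVE_LATENCY_S = 0.05  # hypothetical stand-in for slow archive recall


def fetch_from_archive(object_id: str) -> bytes:
    """Hypothetical archive read: waits, then returns the object's bytes."""
    time.sleep(ARCHIVE_LATENCY_S)
    return f"payload:{object_id}".encode("utf-8")


def stage(object_ids: Sequence[str], workers: int = 8) -> Iterator[bytes]:
    """Read objects concurrently so latencies overlap; yield in request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        yield from pool.map(fetch_from_archive, object_ids)


ids = [f"obj-{i}" for i in range(16)]
staged = list(stage(ids))
```

With eight workers, sixteen reads complete in roughly two latency periods instead of sixteen. `ThreadPoolExecutor.map` submits every task up front, so a petabyte-scale job would additionally need batching or a bounded queue to limit how much data is held in flight.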

The project underscores the growing role of Huawei storage in the European market, according to Husnes. He described the initiative as a solution to a problem every non-English-speaking nation will encounter: building artificial intelligence that reflects local language, culture, and history. Training is ongoing, and the library positions itself as a custodian of digital heritage rather than merely a builder of technology.
