Norway’s National Library builds sovereign LLM with Huawei storage infrastructure
Speaking at Huawei’s ID Forum 2026, Head of IT Platform Marius Husnes outlines the technical challenges of moving petabyte-scale datasets from preservation archives to high-throughput AI pipelines.
Norway’s National Library is developing a sovereign large language model designed to understand the Norwegian language, using two petabytes of Huawei OceanStor Dorado flash storage in its AI training data pipeline. Marius Husnes, the library’s Head of IT Platform, presented the initiative at Huawei’s ID Forum 2026 in Paris, noting that no commercial provider is currently building a language model for Norwegian. Husnes argued that nations without a sovereign AI trained in their native language are at a disadvantage, because globally trained, English-centric models often fail to capture national histories, news, and cultural nuances.
The Norwegian Ministry of Culture tasked the National Library with the project because of its legal deposit mandate, which requires the preservation of all published books, newspapers, web pages, and broadcast content. This mandate gives the library access to 20 petabytes of unique data, stored under a 3-2-1 preservation scheme (three copies, on two types of media, with one copy off-site) for a total of 60 petabytes. An agreement with Norwegian newspapers permits the use of copyrighted content for training, a resource Husnes noted is unavailable to private companies. The library has been digitising its collection since 2005, generating extensive metadata and OCR-extracted text from raw materials.
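The storage figures above follow from simple arithmetic on the 3-2-1 scheme, which keeps three copies of every object. A minimal sketch, in which the function name and its parameters are hypothetical and chosen only for illustration:

```python
def preservation_footprint_pb(unique_pb: float, copies: int = 3) -> float:
    """Return the total stored volume when every object is kept `copies` times.

    The default of three copies reflects the "3" in a 3-2-1 scheme; the
    function itself is illustrative and not part of the library's tooling.
    """
    return unique_pb * copies


# 20 PB of unique data, as reported in the article.
total_pb = preservation_footprint_pb(20)
print(f"Total preserved volume: {total_pb} PB")
```

With the reported 20 petabytes of unique data, this gives the 60 petabytes quoted by the library.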
The technical architecture is a multi-stage processing workflow. Data preparation runs in-house on an Nvidia DGX H200 system and a 384-core CPU cluster, backed by Huawei all-flash arrays for low-latency access. This stage covers data ingestion, cleaning, deduplication, and format normalisation. Once processed, the data is transferred to the Sigma2 Olivia supercomputer, an HPE Cray system equipped with 448 GPUs and 64,512 CPU cores, for the training runs themselves.
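The article does not describe how the library implements its preparation stage. As a rough illustration of what cleaning, deduplication, and normalisation can mean for text records, the sketch below makes its own assumptions: Unicode NFC normalisation, whitespace collapsing, and exact-duplicate removal by SHA-256 hash. The function names `normalise` and `prepare` are hypothetical.

```python
import hashlib
import re
import unicodedata
from typing import Iterable, Iterator


def normalise(text: str) -> str:
    """Apply Unicode NFC normalisation and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def prepare(records: Iterable[str]) -> Iterator[str]:
    """Yield normalised records, dropping empty ones and exact duplicates.

    Assumption: duplicates are detected by hashing the normalised text,
    so only byte-identical records (after normalisation) are removed.
    """
    seen: set[bytes] = set()
    for raw in records:
        cleaned = normalise(raw)
        if not cleaned:
            continue  # cleaning: skip records that are empty after normalisation
        digest = hashlib.sha256(cleaned.encode("utf-8")).digest()
        if digest in seen:
            continue  # deduplication: this exact text was already emitted
        seen.add(digest)
        yield cleaned


sample = ["Bjørnstjerne  Bjørnson\n", "Bjørnstjerne Bjørnson", "   ", "Henrik Ibsen"]
result = list(prepare(sample))
print(result)
```

Storing fixed-size hashes instead of full texts keeps the memory footprint of the duplicate check small. A production pipeline at petabyte scale would likely also need near-duplicate detection and a distributed hash store, neither of which is shown here.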
Husnes identified data quality, cleaning, and pipeline throughput, rather than compute power, as the primary bottlenecks. The team faced significant challenges in bridging the gap between the high-latency preservation archive, optimised for durability and cost, and the high-throughput AI pipeline, designed for parallel I/O. Husnes noted that little industry guidance exists on moving petabyte-scale datasets from archive storage to AI pipelines, which forced the library’s team to develop its own solutions.
According to Husnes, the project underscores the growing role of Huawei storage in the European market. He described the initiative as a solution to a problem every non-English-speaking nation will encounter: building artificial intelligence that reflects local language, culture, and history. Training is ongoing, with the library positioning itself as a custodian of digital heritage rather than merely a builder of technology.


