Tech

Nvidia unveils cuda-oxide, an experimental compiler bridging Rust and CUDA

In an early-stage alpha release, the company introduces a custom codegen backend that translates standard Rust code directly into CUDA PTX, prioritising safety while acknowledging the complexities of parallel computing.

Author
Owen Mercer
Markets and Finance Editor
Published
Draft
Source: Hacker News · original
The tech giant's new tool aims to let developers write GPU kernels using Rust's native type system and ownership model without foreign bindings.

Nvidia has launched cuda-oxide, an experimental compiler designed to translate Rust code directly into CUDA PTX without the need for domain-specific languages or foreign language bindings. The initial v0.1.0 release is an early-stage alpha, enabling developers to write GPU kernels using Rust's native type system, ownership model, and async programming features.

The tool functions as a custom rustc codegen backend that generates PTX from pure Rust, so developers no longer need external interfaces such as cc or bindgen to target GPU hardware. GPU work can be composed as lazy DeviceOperation graphs and scheduled across stream pools, all in idiomatic Rust syntax.
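For readers unfamiliar with the term, a "lazy" operation graph describes work up front and runs it only when asked. The sketch below illustrates that general idea in plain, host-only Rust. Every name in it (LazyOp, Value, Map, scaled_sum) is hypothetical and chosen for illustration; it is not cuda-oxide's API and does not touch a GPU.

```rust
// Hypothetical illustration only: these names are NOT the cuda-oxide API.
// A lazy operation describes work; nothing runs until `execute` is called.
trait LazyOp {
    type Output;
    fn execute(self) -> Self::Output;
}

// Leaf node: wraps a value that is already available.
struct Value<T>(T);

impl<T> LazyOp for Value<T> {
    type Output = T;
    fn execute(self) -> T {
        self.0
    }
}

// Interior node: applies `f` to the result of an upstream operation.
struct Map<Op, F> {
    upstream: Op,
    f: F,
}

impl<Op, F, U> LazyOp for Map<Op, F>
where
    Op: LazyOp,
    F: FnOnce(Op::Output) -> U,
{
    type Output = U;
    fn execute(self) -> U {
        // Run the upstream node first, then this node's closure.
        (self.f)(self.upstream.execute())
    }
}

// Builder method so operations chain like iterator adapters.
trait LazyOpExt: LazyOp + Sized {
    fn map<F, U>(self, f: F) -> Map<Self, F>
    where
        F: FnOnce(Self::Output) -> U,
    {
        Map { upstream: self, f }
    }
}

impl<Op: LazyOp> LazyOpExt for Op {}

// Builds a two-step graph (scale, then sum) and only then executes it.
fn scaled_sum(input: Vec<f32>, factor: f32) -> f32 {
    let graph = Value(input)
        .map(move |v| v.into_iter().map(|x| x * factor).collect::<Vec<f32>>())
        .map(|v| v.iter().sum::<f32>());
    graph.execute()
}
```

In this toy version, building the graph costs nothing beyond constructing a few structs, and the closures run only inside execute. A real GPU runtime would presumably use that gap to decide where and when each step is scheduled, which is an assumption about the design rather than something the release notes spell out.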

A key feature of the alpha release is support for async programming: host code can use .await on the results of GPU work with runtimes such as tokio. The #[cuda_module] attribute embeds generated device artifacts directly into the host binary and produces typed load functions and launch methods for each kernel. Lower-level APIs such as load_kernel_module and cuda_launch! remain available for developers who need custom sidecar artifacts or bespoke launch code.

The project treats safety as a first-class goal, using Rust's ownership model to guard against common memory errors. The documentation nonetheless acknowledges that GPUs have subtleties of their own, and it directs developers to the project's safety model for the rules that apply to parallel execution.

The v0.1.0 release carries the usual warnings for early-stage software: users should expect bugs, incomplete features, and API breakage as the tool evolves. How and when the tool might reach feature parity with mature CUDA C++ toolchains has not been spelled out, and the documentation advises that future updates may change the interface.

This development marks a significant step in the integration of systems programming languages with Nvidia's parallel computing platform. By removing the friction of DSLs and FFI bindings, Nvidia aims to broaden the accessibility of high-performance GPU computing for the Rust community while continuing to refine the underlying technology.
