The upcoming release of G2CPU v1.6 brings a major step forward for LabVIEW developers working with NVIDIA CUDA GPUs. A new feature, asynchronous DMA transfers, will soon be available across Windows, Linux, and LabVIEW Real-Time, making it possible to move data between LabVIEW and the GPU at speeds close to the limits of the PCIe bus.
The Starting Point: Synchronous Transfers
To understand why this is important, it helps to first look at how things work today with synchronous transfers. In this model, operations happen one after the other: data is uploaded to the GPU, processed, and then downloaded back to LabVIEW. Each step blocks the next: no analysis runs during upload, no data moves during analysis, and the GPU sits idle during download. When the three stages take comparable time, this means the GPU spends only about a third of the total time doing useful work, even under ideal conditions.
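The cost of serializing the stages can be sketched with a toy timing model. The numbers below are illustrative placeholders, not G2CPU measurements:

```python
# Toy timing model for a synchronous GPU pipeline: each dataset must
# finish upload -> analyze -> download before the next one can start.
def sync_time_per_dataset(t_upload, t_analyze, t_download):
    """Time (ms) to process one dataset when the stages run back to back."""
    return t_upload + t_analyze + t_download

# Hypothetical example: each stage takes 10 ms.
total = sync_time_per_dataset(10.0, 10.0, 10.0)  # 30 ms per dataset
gpu_busy_fraction = 10.0 / total                 # GPU busy only ~1/3 of the time
print(total, gpu_busy_fraction)
```

The model makes the "one third" figure concrete: the GPU computes for 10 ms out of every 30 ms cycle, and the rest is transfer time.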
Figure 1: Synchronous GPU LabVIEW code
Figure 2: Timeline of synchronous operations
As data sizes increase, the time-to-result stretches further, because every stage grows in proportion to the dataset.
Figure 3: Timeline of synchronous operations with larger datasets
The Upgrade: Asynchronous Transfers
Asynchronous transfers completely change the picture. Instead of moving step by step, uploading, analyzing, and downloading all take place in parallel. The GPU can analyze one dataset while the next is already uploading and the previous one is downloading. This overlap means that performance is no longer dictated by every step in sequence but only by the slowest operation in the chain. If analysis is the longest task, transfer times essentially disappear from the equation. Larger datasets no longer translate directly into longer wait times, unless I/O itself becomes the bottleneck.
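In steady state, the overlapped pipeline is paced only by its slowest stage. A minimal model of that claim (my own sketch with hypothetical timings, not G2CPU code):

```python
# Toy timing model for an asynchronous (pipelined) GPU workflow: upload of
# dataset N+1, analysis of dataset N, and download of dataset N-1 overlap,
# so the steady-state time per dataset is set by the slowest stage alone.
def async_time_per_dataset(t_upload, t_analyze, t_download):
    return max(t_upload, t_analyze, t_download)

# With analysis as the longest stage (hypothetical numbers), transfer
# times vanish from the steady-state cost entirely.
print(async_time_per_dataset(10.0, 25.0, 10.0))  # 25.0 ms: analysis-bound
# The synchronous equivalent would take 10 + 25 + 10 = 45 ms per dataset.
```

If upload or download ever exceeds the analysis time, I/O becomes the pacing stage instead, which is exactly the bottleneck caveat noted above.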
Figure 4: Asynchronous GPU LabVIEW code with CUDA DMA access
Figure 5: Timeline of asynchronous operations
As data sizes increase, the time-to-result is unaffected by upload and download operations as long as analysis is the limiting operation.
Figure 6: Timeline of asynchronous operations with larger datasets
Why DMA Takes It Further
In v1.6, asynchronous transfers are enhanced with Direct Memory Access (DMA). Rather than burdening the CPU and memory controller with every data copy, DMA gives the GPU direct access to LabVIEW memory regions through Data Value References (DVRs). This approach will feel familiar to anyone who has used NI’s high-performance drivers such as RFSA, FPGA, or TDMS. The practical effect is that CPU load is reduced and transfers can run at speeds very close to the limits of the PCIe bus.
On real systems, that means sustained transfer rates of around 6–7 GB/s on RADX Technologies PXIe GPUs, up to 24 GB/s on PCIe Gen4, and as much as 45 GB/s on PCIe Gen5. Compared to synchronous operation, the improvement is dramatic: what once consumed most of your processing time now runs invisibly in the background, allowing your GPU to remain busy with the actual work.
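Plugging those rates into a back-of-the-envelope calculation shows what they mean for a concrete dataset. The rates come from the text above; the 1 GiB dataset size is an arbitrary example:

```python
# Transfer time for a 1 GiB dataset at the sustained DMA rates quoted above.
GIB = 1024**3

def transfer_ms(size_bytes, rate_gb_per_s):
    """Transfer time in milliseconds at a given rate (decimal GB/s)."""
    return size_bytes / (rate_gb_per_s * 1e9) * 1e3

for label, rate in [("PXIe (~6.5 GB/s)", 6.5),
                    ("PCIe Gen4 (24 GB/s)", 24.0),
                    ("PCIe Gen5 (45 GB/s)", 45.0)]:
    print(f"{label}: {transfer_ms(GIB, rate):.1f} ms per GiB")
```

At Gen4 rates, a gigabyte moves in well under 50 ms, and with asynchronous operation even that cost overlaps with analysis rather than adding to it.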
Figure 7: Impact of transfer block size on DMA performance (PCIe Gen4 x16)
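The shape of a curve like Figure 7 follows from a simple model: every DMA transfer carries a fixed setup cost, so small blocks spend a larger fraction of their time on overhead and only large blocks approach the raw link rate. The overhead and bandwidth values below are made-up placeholders, not measured G2CPU numbers:

```python
# Effective DMA throughput vs. block size with a fixed per-transfer overhead.
def effective_gb_per_s(block_bytes, link_gb_per_s=24.0, overhead_s=20e-6):
    """link_gb_per_s: raw link rate; overhead_s: fixed setup cost per transfer."""
    wire_time = block_bytes / (link_gb_per_s * 1e9)
    return block_bytes / (wire_time + overhead_s) / 1e9

for kib in (64, 1024, 16384, 262144):  # 64 KiB up to 256 MiB blocks
    size = kib * 1024
    print(f"{kib:>7} KiB -> {effective_gb_per_s(size):5.1f} GB/s")
```

Under this model, throughput climbs steeply with block size and then saturates near the link rate, which is why batching data into larger transfers matters for DMA performance.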
How to Get Ready
When v1.6 becomes available, enabling asynchronous DMA transfers will be straightforward. Developers will simply update to the new version, use the asynchronous API for data transfers, and wrap their data in DVRs to take advantage of DMA. Built-in benchmarking tools will make it easy to confirm that your system is running at full throughput.
Why This Matters
Asynchronous DMA transfers unlock the full potential of PCIe and modern GPUs inside LabVIEW. For developers, the benefits are clear: faster results from large datasets, reduced load on the CPU, and a workflow that fits naturally into existing LabVIEW projects.
The release of G2CPU v1.6 is right around the corner, and with asynchronous DMA transfers, LabVIEW developers will be able to push their applications to a new level of performance.