Currently, digitizers have a bottleneck caused by having to use either the host PC's central processor with 8 or 16 cores or an FPGA that is complex to program.  Spectrum Instrumentation has solved this problem with its new SCAPP software option - the Spectrum CUDA Access for Parallel Processing - that opens an easy-to-use yet extremely powerful way to digitize, process and analyze electronic signals.  SCAPP allows a CUDA-based Graphics Processing Unit (GPU) to be used directly between any Spectrum digitizer and the PC.  The big advantage is that data is passed directly from the digitizer to the GPU, where high-speed parallel processing is possible using the GPU board's multiple (up to 5000) processing cores.  That provides a significant performance enhancement when compared to sending data directly to a PC that may have only 8 or 16 cores.  It becomes even more important when signals are being digitized at high speeds such as 50 MS/s, 500 MS/s or even 5 GS/s.

Spectrum's SCAPP
The Spectrum approach uses a standard off-the-shelf GPU, based on Nvidia's CUDA standard.  The GPU connects directly with the Spectrum digitizer card, without further CPU interaction, opening the huge parallel core architecture of the CUDA card for signal processing.  The structure of a CUDA graphics card fits very well as it is designed for parallel data processing, which is exactly what most signal processing jobs require.  For example, the processing tasks of data conversion, filtering, averaging, baseline suppression, FFT window functions or even FFTs themselves can all be easily parallelized.
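To illustrate why a task such as averaging parallelizes so well, here is a minimal C++ sketch of block averaging. It is not taken from the SCAPP package; the function name block_average and the data layout (consecutive blocks of equal length in one buffer) are assumptions made for illustration. The point is that each output index depends only on its own column of input samples, so on a GPU every index could be given to a separate thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of any Spectrum API): averages consecutive
// blocks of `block_len` samples stored back to back in `samples`.
// Output element i uses only samples i, i + block_len, i + 2*block_len, ...
// so the outer loop has no dependencies between iterations and maps
// naturally onto one GPU thread per output element.
std::vector<float> block_average(const std::vector<std::int16_t>& samples,
                                 std::size_t block_len) {
    assert(block_len > 0 && samples.size() % block_len == 0);
    const std::size_t num_blocks = samples.size() / block_len;
    std::vector<float> avg(block_len, 0.0f);
    if (num_blocks == 0) {
        return avg;  // nothing to average
    }
    for (std::size_t i = 0; i < block_len; ++i) {       // parallelizable loop
        std::int64_t sum = 0;                           // wide accumulator
        for (std::size_t b = 0; b < num_blocks; ++b) {
            sum += samples[b * block_len + i];
        }
        avg[i] = static_cast<float>(sum) / static_cast<float>(num_blocks);
    }
    return avg;
}
```

Called with the two blocks {1, 2, 3, 4} and {3, 4, 5, 6} stored back to back and a block length of 4, this sketch returns {2, 3, 4, 5}.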

Signal processing approaches
Until now, there have basically been two different approaches for processing data from high-speed digitizers.  The first and most common method simply uses the CPU for calculations.  This approach offers a straightforward way to create processing programs using a variety of different programming languages at almost no extra cost.  However, the performance is often limited by the CPU's resources as it must share its processing power with the rest of the PC system, the operating system and the GUI components.

The second approach is to use Field Programmable Gate Array (FPGA) technology, either with fixed processing packages from the vendor (like the Block Average package from Spectrum) or using an open FPGA with a Firmware Development Kit (FDK).  It is a really powerful solution but it comes with a much higher cost and complexity.  Large FPGAs are expensive, and using them requires an FDK from the digitizer vendor along with other implementation tools from the FPGA vendor.  Also, implementing signal processing in an FPGA using VHDL requires specialist knowledge that not everybody has.  This soon results in very long development cycles.  Even worse, it is easy to run into the limits of the FPGA that is soldered onto the card.  For example, if the block RAM is fully used, no further improvement is possible.

TCO - Total Cost of Ownership
Compared with any FPGA-based solution, the TCO of the SCAPP approach is very low: a matching CUDA graphics card ranges from around 150€ to 3000€ and the necessary software development kits (SDKs) are free of charge.  However, the largest cost saver is the development time.  Instead of spending weeks just to understand the FDK, the structure of the FPGA firmware, the FPGA design suite and the simulation tools, the user can start immediately with some easy-to-understand C code and common design tools.

Product Details
The SCAPP driver package consists of the driver extension for Remote Direct Memory Access (RDMA) that allows direct data transfer from the digitizer to the GPU.  It includes a set of examples for interaction with the digitizer and the CUDA card and another set of CUDA parallel processing examples with easy building blocks for basic functions like filtering, averaging, data de-multiplexing, data conversion or FFT.  All the software is based on C/C++ and can easily be implemented and improved with normal programming skills.  Starting with tested and optimized parallel processing examples gives first results within minutes.
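As a rough idea of what building blocks such as data de-multiplexing and data conversion involve, the following C++ sketch splits a 2-channel buffer into per-channel arrays and converts raw values to volts. It is not one of the SCAPP examples. The function name, the interleaved sample order (channel 0, channel 1, channel 0, ...) and the scaling formula (signed raw values, full scale at 2^(bits-1)) are all assumptions chosen for illustration. Every sample pair is handled independently, which is what makes the step suitable for parallel execution.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two de-multiplexed channels, in volts.
struct TwoChannels {
    std::vector<float> ch0;
    std::vector<float> ch1;
};

// Hypothetical helper (not a Spectrum API call).
// Assumptions: samples are interleaved as ch0, ch1, ch0, ch1, ...;
// raw values are signed; +/- full_scale_volts corresponds to
// +/- 2^(resolution_bits - 1) counts.
TwoChannels demux_and_convert(const std::vector<std::int16_t>& interleaved,
                              float full_scale_volts,
                              int resolution_bits) {
    assert(interleaved.size() % 2 == 0);
    assert(resolution_bits >= 2 && resolution_bits <= 16);
    const float volts_per_count =
        full_scale_volts / static_cast<float>(1 << (resolution_bits - 1));
    const std::size_t n = interleaved.size() / 2;
    TwoChannels out;
    out.ch0.resize(n);
    out.ch1.resize(n);
    for (std::size_t i = 0; i < n; ++i) {  // each i is independent
        out.ch0[i] = interleaved[2 * i] * volts_per_count;
        out.ch1[i] = interleaved[2 * i + 1] * volts_per_count;
    }
    return out;
}
```

With 14-bit data and an assumed full scale of 1.0 V, a raw value of 4096 maps to 0.5 V and a raw value of -8192 maps to -1.0 V.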

Performance
The interconnection between digitizer and GPU is based on PCI Express.  Depending on the selected Spectrum digitizer card, a continuous throughput of more than 3.0 GByte/s between the digitizer and GPU can be achieved.  That is enough to support continuous acquisition from a 1-channel 8-bit digitizer sampling at 2.5 GS/s or a 2-channel 14-bit unit running at 500 MS/s.  By using one of Spectrum's transfer-bandwidth saving data acquisition modes, like Multiple Recording, the sampling speeds can be considerably higher.
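The arithmetic behind these figures can be sketched in a few lines of C++. The helper names are hypothetical, and two assumptions are made: samples with more than 8 bits of resolution are stored in a 16-bit word, and 1 GByte/s is taken as 10^9 bytes per second.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store one sample. Assumption: resolutions above 8 bits
// occupy a full 16-bit word (so a 14-bit sample takes 2 bytes).
inline std::uint64_t bytes_per_sample(unsigned resolution_bits) {
    assert(resolution_bits >= 1 && resolution_bits <= 16);
    return resolution_bits <= 8 ? 1u : 2u;
}

// Continuous data rate in bytes per second for a streaming acquisition:
// channels x bytes per sample x samples per second.
inline std::uint64_t required_bytes_per_second(unsigned channels,
                                               unsigned resolution_bits,
                                               std::uint64_t samples_per_second) {
    return channels * bytes_per_sample(resolution_bits) * samples_per_second;
}

// True if the required rate fits into the available link throughput.
inline bool fits_link(std::uint64_t required_bytes,
                      std::uint64_t link_bytes_per_second) {
    return required_bytes <= link_bytes_per_second;
}
```

Under these assumptions, 1 channel at 8 bit and 2.5 GS/s needs 2.5 GByte/s, and 2 channels at 14 bit and 500 MS/s need 2.0 GByte/s, so both fit within 3.0 GByte/s. A single 8-bit channel streaming continuously at 5 GS/s would need 5.0 GByte/s, which is why a bandwidth-saving mode is required at that speed.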

CUDA cards are scalable with processing cores between 256 and 5000 (in comparison a dual Quad-Core Xeon CPU with Hyperthreading will only give 16 logical cores), with memory of several GByte and up to 12.0 TFLOP (10^12, or a trillion, floating point operations per second).  A small card with 1k cores and 3.0 TFLOP is already capable of doing continuous data conversion, multiplexing, windowing, FFT and averaging at 2 channels 500 MS/s with an FFT block size of 512k - and that can run for hours.  In contrast, an FFT package from other digitizer vendors will typically limit the FFT block size to a maximum of 4k or 8k as this is the limitation of the FPGA.

Supported Spectrum Products
The SCAPP package is a driver extension for all Spectrum cards.  It can be used with the ultra-fast digitizers of the M4i platform (250 MS/s 16 bit, 500 MS/s 14 bit or 5 GS/s 8 bit) as well as the latest medium performance M2p platform (20 to 80 MS/s multi-channel 16 bit).  The basic RDMA functionality is available under a Linux operating system.

Press pack at  https://spectrum-instrumentation.com/sites/default/files/download/20171120_scapp_signalprocessingwithcuda.zip 
Video at https://www.youtube.com/watch?v=HK5eZb65nlY

For more information, please visit www.spectrum-instrumentation.com.