qthreads

Intelligent GPU, Multi-Core CPU, and FPGA Accelerators

Single Thread to Multithreaded Executable 

x86, ARM, RISCV, Apple M1 CPU

nVidia and AMD GPU

AMD FPGA

All Automatically from Same Source Code

The Past

THERE WAS NO HETEROGENEOUS COMPUTE DEVELOPMENT PLATFORM

Compute is dominated by x86 and ARM. The focus for decades has been single core performance enhanced with multiple cores that mostly operate independently. Processor limitations are being augmented with GPUs, FPGAs and custom ML accelerators. However, SW developers do not have the platform to effectively utilize multiple cores and implement heterogeneous designs across processors plus accelerators.

Today

SW DEVELOPERS EASILY DEVELOP & DEPLOY APPLICATIONS

The CacheQ heterogeneous development platform enables SW developers to easily develop, deploy and orchestrate applications across multiple cores and heterogeneous distributed compute architectures delivering significant increases in application performance and at the same time dramatically reducing development time.

Multicore Acceleration

Proven Results

Multithreading is all about intelligence based code and data analysis

Generating multithreaded code from unmodified single thread source requires exhaustive analysis of loops, understanding data access patterns and knowledge of the target architecture. Unlike alternative approaches qthreads will automatically generate multithreaded code without the additions of pragmas and user directives. qthreads takes single threaded code and generates multithreaded C++ code that works with any C++ compiler. The number of threads is runtime specified depending on hardware availability and optimized execution. Profiling is an integral part of the process to ensure loops that contribute little execution time are not threaded and loops that contribute the majority of execution time are threaded. The reason loops are not multithreaded is clearly reported. If profiling indicates a high execution loop is not threaded educated steps can be taken by the developer to alleviate issues that preclude multithreading. The same unmodified source can be used to target x86, ARM, RISCV, embedded, monolithic heterogenous SOCs and servers. Lastly, executing code unthreaded or any number of multiple threads will produce identical results. Not the case with other technologies. Intelligence based multithreading produces superior results with dramatically less resource investment and time. It is time to adopt a new heterogeneous SW development platform.

FPGA Acceleration

Proven Results

Up to 100x plus performance gain

Greater than 15x reduction in development time

Performance improvements using FPGAs is all about accelerating loops.

If N is the number of iterations of a loop and C is the number of cycles the execution time is roughly (N*C)/(clock rate). On a fully pipelined FPGA implementation the execution time is (N+C)/(clock rate). QCC automatically pipelines loops. The fully pipelined loop is integrated with an application specific many port cached memory architecture ensuring high bandwidth data movement. In addition, with a command line option loops can be unrolled to deliver significantly higher performance bound by hardware resource constraints.

Developing solutions for FPGAs has been the domain of HW developers. Software applications must be extensively modified to achieve the desired performance goals. In most cases this is a nine to twelve month process. CacheQ delivers a complete development and deployment platform for SW developers.  Partitioning between host and accelerators is handled by the platform. Performance simulation, profiling, resource estimates and memory configuration are supported prior to implementation. All the capabilities required to support many ported dynamically allocated memory are available to the developer. A complete development platform requiring limited code modifications dramatically shortens development time and improves system quality.

The Cacheq Development Flow

The QCC development platform accepts HLL (C source or object) as input and through a number of steps generates an optimized multithreaded executable or a partitioned accelerated executable. The output is a multithreaded x86, ARM or RISCV executable or an x86/ARM executable and an FPGA bitstream.

The Technology

QCC is a new platform developed to shorten development time and accelerate code across heterogeneous compute platforms. Even though diverse hardware is supported the development platform should feel familiar to SW developers. Multicore execution requires very specific processor knowledge and the tools today offer little to no guidance. qthreads analyzes code and automatically produces multithreaded code. Guidance is provided when critical sections of code cannot be multithreaded. For FPGA acceleration companies and pundits suggest HLS was developed to fill the SW development void for FPGAs.  In reality, HLS is a higher level of abstraction for HW engineers. It is not a tool for software developers. QCC was specifically developed to streamline acceleration development across heterogeneous hardware in a fashion that meets the needs and knowledge of SW developers.

Request a Demo

Request a demo to learn more about our acceleration and distributed computing solution.

Contact Us

First

Last