QR Decomposition on NVIDIA GPU Using cuSOLVER Libraries
This example shows how to create a standalone CUDA® executable that leverages the CUDA Solver library (cuSOLVER). The example uses a curve fitting application that mimics automatic lane tracking on a road to illustrate:
Fitting an arbitrary-order polynomial to noisy data by using matrix QR factorization.
Using the
coder.LAPACKCallback
class to provide the LAPACK library information for the code generator when generating standalone executables.
Prerequisites
CUDA enabled NVIDIA® GPU.
NVIDIA CUDA toolkit and driver.
LAPACK library that is optimized for your execution environment. For more information, see LAPACK vendors implementations. This example uses the
mwlapack
libraries that MATLAB® provides in matlabroot/extern/lib.Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-Party Hardware. For setting up the environment variables, see Setting Up the Prerequisite Products.
Verify GPU Environment
To verify that the compilers and libraries necessary for running this example are set up correctly, use the coder.checkGpuInstall
function.
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
Solve a Linear System by Using Matrix Factorization
In curve fitting applications, the objective is to estimate the coefficients of a low-order polynomial. The polynomial is then used as a model for observed noisy data, which in this example represents the lane boundary of the road ahead of a vehicle. For example, when using a quadratic polynomial, there are three coefficients (, , and ) to estimate:
The polynomial that fits best is defined as the one that minimizes the sum of the squared errors between itself and the noisy data. To solve this least-squares problem, you get and solve an overdetermined linear system. An explicit matrix inverse is not required to solve the system.
In this example, the unknowns are the coefficients of each term in the polynomial. Because the polynomial you use as a model always starts from the current position on the road, the constant term in the polynomial is assumed to be zero. Estimate the coefficients for the linear and higher-order terms. Set up a matrix equation Ax=y such that:
contains the sensor outputs.
contains the polynomial coefficients that we need to obtain.
is a constant matrix related to the order of the polynomial and the locations of the sensors.
Solve the equation using the QR factorization of :
and
where pinv() represents pseudo-inverse. Given the matrix , you can use the following code to implement a solution of this matrix equation. Factoring allows for an easier solution of the system.
[Q,R,P] = qr(A); z = Q' * y; x = R \ z; yhat = A * x;
Use the linsolveQR function to solve the equation using the QR factorization of .
type linsolveQR.m
function [yhat,x] = linsolveQR(A,y) %#codegen % Copyright 2019 The bat365, Inc. [Q,R,P] = qr(A); z = Q' * y; x = R \ z; yhat = A * x; end
Signal Model for the Road
To test the algorithm, a continuously curving road model is used, that is, a sinusoid that is contaminated with additive noise. By varying the frequency of the sinusoid in the model, you can stress the algorithm by different amounts. This code simulates noisy sensor outputs using our road model:
% Duration - Distance that we look ahead % N - Total number of sensors providing estimates of road boundary % Ts - Sample interval % FracPeriod - Fraction of period of sinusoid to match % y - Contains the simulated sensor outputs Duration = 2; N = 25; Ts = Duration / N; FracPeriod = 0.5; y = sin(2*pi* (0:N-1)' * (FracPeriod/N)) + sqrt(0.3) * randn(N,1);
Use this code to form the Vandermonde matrix :
Npoly = 3; % Order of polynomial to use in fitting v = (0:Ts:((N-1)*Ts))'; A = zeros(length(v), Npoly); for i = Npoly : -1 : 1 A(:,i) = v.^i; end
The Vandermonde matrix and sensor outputs matrix are passed as input parameters to the linsolveQR
entry-point function. These inputs are written to comma-separated files and are read from the handwritten main qrmain.cu.
writematrix(reshape(A, 1, 75), 'inputA.csv'); writematrix(reshape(y, 1, 25), 'inputY.csv');
Custom Callback Class for Standalone Code Generation
The qr
function is only partially supported in the cuSOLVER
library. In such cases, GPU Coder™ uses the LAPACK
library for certain linear algebra function calls. LAPACK
is an external software library for numeric linear algebra. For MEX targets, the code generator uses the LAPACK
library included in MATLAB.
For standalone targets, you must define a custom coder.LAPACKCallback
class that specifies the LAPACK libraries along with the header files to use for linear algebra calls in the generated code. In this example, the lapackCallback callback class specifies the paths to these libraries in updateBuildInfo
method. You must modify this file with the library names and paths for the custom LAPACK installation on your computer.
type lapackCallback.m
classdef lapackCallback < coder.LAPACKCallback % % Copyright 2019 The bat365, Inc. methods (Static) function hn = getHeaderFilename() hn = 'lapacke.h'; end function updateBuildInfo(buildInfo, buildctx) [~, libExt] = buildctx.getStdLibInfo(); % Specify path to LAPACK library if ispc lapackLocation = [matlabroot,'\extern']; libName = ['libmwlapack' libExt]; buildInfo.addIncludePaths([lapackLocation,'\include']); libPath = [lapackLocation,'\lib\win64\microsoft\']; else lapackLocation = [matlabroot]; libName = ['libmwlapack' libExt]; buildInfo.addIncludePaths([lapackLocation,'/extern/include']); libPath = [lapackLocation,'/bin/glnxa64']; end % Add include path and LAPACK library for linking buildInfo.addLinkObjects(libName, libPath, 1000, true, true); buildInfo.addDefines('HAVE_LAPACK_CONFIG_H'); buildInfo.addDefines('LAPACK_COMPLEX_STRUCTURE'); end end end
Standalone Code Generation
Generate a standalone executable by the specifying CustomLAPACKCallback
property in the code configuration object and using a handwritten main qrmain.cu
.
cfg = coder.gpuConfig('exe'); cfg.GpuConfig.EnableCUSOLVER = 1; cfg.CustomLAPACKCallback = 'lapackCallback'; cfg.CustomSource = 'qrmain.cu'; cfg.CustomInclude = '.'; codegen -config cfg -args {A,y} linsolveQR -report
Code generation successful: View report
Standalone Code Execution
When you execute the generated standalone executable, the outputs and are computed and written to comma-separated files. Read these outputs back in MATLAB and use the plot
function to visualize the sensor data and fitted curve.
if ispc system('linsolveQR.exe'); else system('./linsolveQR'); end yhat = reshape(readmatrix('outputYhat.csv'), 25, 1); x = reshape(readmatrix('outputX.csv'), 3, 1); figure plot(v, y, 'k.', v, yhat, 'r') axis([0 N*Ts -3 3]); grid; xlabel('Distance Ahead of the Vehicle'); legend('Sensor data','Curve fit'); title('Estimate the Lane Boundary Ahead of the Vehicle');
See Also
Functions
codegen
|coder.gpu.kernel
|coder.gpu.kernelfun
|gpucoder.matrixMatrixKernel
|coder.gpu.constantMemory
|gpucoder.stencilKernel
|coder.checkGpuInstall