---
title: High Performance Computing 25 SP OpenCL Programming
date: 2025-08-31T13:51:02.0181970+08:00
tags:
  - 高性能计算
  - 学习资料
---

OpenCL stands for Open Computing Language.

- An open, royalty-free standard based on a C-language extension.
- For parallel programming of heterogeneous systems using GPUs, CPUs, the Cell Broadband Engine (CBE), DSPs and other processors, including embedded mobile devices.
- Managed by the Khronos Group.

![image-20250529185915068](./hpc-2025-opencl/image-20250529185915068.webp)

### Anatomy of OpenCL

- Platform Layer API
- Runtime API
- Language Specification

### Compilation Model

OpenCL uses a dynamic (runtime) compilation model, like OpenGL.

1. The code is compiled to an intermediate representation (IR).
2. The IR is compiled to machine code for execution.

In dynamic compilation, *step 1* is usually done only once and the IR is stored. The application then loads the IR and performs *step 2* at runtime.

### Execution Model

An OpenCL program is divided into:

- Kernel: the basic unit of executable code.
- Host program: runs on the host and manages the collection of compute kernels and internal functions.

The host program invokes a kernel over an index space called an **NDRange**. NDRange stands for *N-Dimensional Range*, and it can be a 1-, 2-, or 3-dimensional space.

A single kernel instance at a point in this index space is called a **work item**. Work items are further grouped into **work groups**.

### OpenCL Memory Model

![image-20250529191215424](./hpc-2025-opencl/image-20250529191215424.webp)

OpenCL defines multiple distinct address spaces. Address spaces can be collapsed depending on the device's memory subsystem.

Address spaces:

- Private: private to a work item.
- Local: local to a work group.
- Global: accessible by all work items in all work groups.
- Constant: read-only global memory.

> Comparison with CUDA:
>
> ![image-20250529191414250](./hpc-2025-opencl/image-20250529191414250.webp)

Memory regions for host and kernel:

![image-20250529191512490](./hpc-2025-opencl/image-20250529191512490.webp)

### Programming Model

#### Data Parallel Programming Model

1. Define an N-dimensional computation domain.
2. Work items can be grouped together into a *work group*.
3. Execute multiple work groups in parallel.

#### Task Parallel Programming Model

> The data parallel execution model must be implemented by all OpenCL compute devices, but the task parallel programming model is optional for vendors.

Some compute devices, such as CPUs, can also execute task-parallel compute kernels. A task-parallel kernel:

- Executes as a single work item.
- Can be a compute kernel written in OpenCL.
- Can be a native function.

### OpenCL Framework

![image-20250529192022613](./hpc-2025-opencl/image-20250529192022613.webp)

The basic OpenCL program structure:

![image-20250529192056388](./hpc-2025-opencl/image-20250529192056388.webp)

**Contexts** are used to contain and manage the state of the *world*.

A **command-queue** coordinates the execution of kernels.
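
To make the index space from the Execution Model and the data parallel steps more concrete, here is a minimal sketch in plain C. It is not OpenCL host code: it emulates a 1-dimensional NDRange sequentially on the CPU. The names `work_item_ids`, `square_kernel` and `run_ndrange_1d` are hypothetical and not part of any OpenCL API. The struct fields stand in for the values a real kernel would read with `get_global_id(0)`, `get_group_id(0)` and `get_local_id(0)`. The sketch assumes a zero global offset and a global size that is a multiple of the work-group size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the IDs a real OpenCL kernel would query with
   get_global_id(0), get_group_id(0) and get_local_id(0). */
typedef struct {
    size_t global_id; /* position in the whole NDRange  */
    size_t group_id;  /* which work group               */
    size_t local_id;  /* position inside the work group */
} work_item_ids;

/* "Kernel": one call corresponds to one work item and squares one element. */
static void square_kernel(work_item_ids ids, const float *in, float *out)
{
    out[ids.global_id] = in[ids.global_id] * in[ids.global_id];
}

/* Emulates a 1-D NDRange launch, one work item after another.
   Returns the number of work items executed, or 0 if local_size is zero
   or global_size is not a multiple of local_size (an assumption of this
   sketch). A real device would run work groups in parallel. */
static size_t run_ndrange_1d(size_t global_size, size_t local_size,
                             const float *in, float *out)
{
    if (local_size == 0 || global_size % local_size != 0)
        return 0;

    size_t executed = 0;
    size_t num_groups = global_size / local_size;

    for (size_t g = 0; g < num_groups; ++g) {       /* each work group */
        for (size_t l = 0; l < local_size; ++l) {   /* each work item  */
            /* With a zero offset: global_id = group_id * local_size + local_id */
            work_item_ids ids = { g * local_size + l, g, l };
            assert(ids.global_id < global_size);
            square_kernel(ids, in, out);
            ++executed;
        }
    }
    return executed;
}
```

For example, calling `run_ndrange_1d(8, 4, in, out)` with `in` holding the values 0 through 7 runs 2 work groups of 4 work items each. The work item with group ID 1 and local ID 1 has global ID 5 and writes 25 to `out[5]`.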