feat: rewrite about page for 2026. (#21)
Some checks failed
Build blog docker image / Build-Blog-Image (push) Failing after 14s
Signed-off-by: jackfiled <xcrenchangjun@outlook.com> Reviewed-on: #21
100
source/posts/hpc-2025-opencl.md
Normal file
@@ -0,0 +1,100 @@
---
title: High Performance Computing 25 SP OpenCL Programming
date: 2025-08-31T13:51:02.0181970+08:00
tags:
- High Performance Computing
- Study Materials
---

Open Computing Language.

<!--more-->

OpenCL stands for Open Computing Language.

- An open, royalty-free standard based on a C-language extension.
- For parallel programming of heterogeneous systems using GPUs, CPUs, CBE, DSPs, and other processors, including embedded mobile devices.
- Managed by the Khronos Group.

### Anatomy of OpenCL

- Platform Layer API
- Runtime API
- Language Specification

### Compilation Model

OpenCL uses a dynamic (runtime) compilation model, like OpenGL.

1. The code is compiled to an intermediate representation (IR).
2. The IR is compiled to machine code for execution.

In dynamic compilation, *step 1* is usually done only once and the IR is stored. The application then loads the IR and performs *step 2* at runtime.
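
The two-step flow can be illustrated by analogy, without touching any OpenCL API. The sketch below uses Python's own `compile` and `marshal` as stand-ins: the serialized bytecode plays the role of the stored IR, and `exec` plays the role of step 2. The source string and the names `scale` and `load_and_run` are made up for illustration; a real OpenCL application would hand its stored IR to the OpenCL runtime instead.

```python
import marshal

# Hypothetical "kernel" source, used only for this analogy.
SOURCE = "def scale(x, factor):\n    return x * factor\n"

# Step 1 (done once, ahead of time): source -> intermediate form, then store it.
stored_ir = marshal.dumps(compile(SOURCE, "<kernel>", "exec"))


def load_and_run(ir_bytes, x, factor):
    """Step 2 (at application runtime): load the stored form and execute it."""
    namespace = {}
    exec(marshal.loads(ir_bytes), namespace)
    return namespace["scale"](x, factor)


result = load_and_run(stored_ir, 21, 2)
```

The point of the split is that the expensive, device-independent work happens once, while the device-specific step is deferred until the target is known.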

### Execution Model

An OpenCL program is divided into:

- Kernel: the basic unit of executable code.
- Host program: the code that runs on the host, manages the collection of compute kernels and internal functions, and submits kernels for execution.

The host program invokes a kernel over an index space called an **NDRange**.

NDRange stands for *N-Dimensional Range*; it can be a 1-, 2-, or 3-dimensional space.

A single kernel instance at a point in this index space is called a **work item**. Work items are further grouped into **work groups**.
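
The relationship between an NDRange, its work items, and its work groups comes down to index arithmetic. The following is a minimal plain-Python model of that arithmetic, not OpenCL code; the function name `enumerate_ndrange` is hypothetical, and it assumes every global dimension is an exact multiple of the matching work-group dimension.

```python
from itertools import product


def enumerate_ndrange(global_size, local_size):
    """Yield (global_id, group_id, local_id) for every work item in an NDRange.

    Assumes each global dimension is an exact multiple of the local one.
    """
    for g, l in zip(global_size, local_size):
        if g % l != 0:
            raise ValueError("global size must be a multiple of local size")
    for global_id in product(*(range(g) for g in global_size)):
        # Which work group the item belongs to, and its position inside it.
        group_id = tuple(gid // l for gid, l in zip(global_id, local_size))
        local_id = tuple(gid % l for gid, l in zip(global_id, local_size))
        yield global_id, group_id, local_id


# A 2-dimensional NDRange of 4 x 6 work items, grouped into 2 x 3 work groups.
items = list(enumerate_ndrange((4, 6), (2, 3)))
groups = {group_id for _, group_id, _ in items}
```

In each dimension the identity `global_id = group_id * local_size + local_id` holds, which is how a kernel instance can locate its own element of the data.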

### OpenCL Memory Model

OpenCL defines multiple distinct address spaces. Address spaces can be collapsed depending on the device's memory subsystem.

Address spaces:

- Private: private to a work item.
- Local: local to a work group.
- Global: accessible by all work items in all work groups.
- Constant: read-only global memory.
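
To make the four address spaces concrete, here is a small plain-Python model of a per-work-group sum. It is only a sketch of the visibility rules, not OpenCL code: ordinary Python lists stand in for the memory regions, and the names `group_sums`, `local_scratch`, and `scale` are invented for this example.

```python
def group_sums(global_input, local_size, scale):
    """Model a per-work-group sum using the four address spaces.

    global_input : stands in for global memory (visible to every work item)
    scale        : stands in for constant memory (read-only for the kernel)
    """
    num_groups = len(global_input) // local_size
    global_output = [0] * num_groups          # global: one slot per work group

    for group_id in range(num_groups):
        local_scratch = [0] * local_size      # local: shared within this group only
        for local_id in range(local_size):
            global_id = group_id * local_size + local_id
            private_value = global_input[global_id] * scale  # private: per work item
            local_scratch[local_id] = private_value
        # Once every item in the group has written its value (a barrier on a
        # real device), the group's scratch data is combined into global memory.
        global_output[group_id] = sum(local_scratch)
    return global_output


sums = group_sums([1, 2, 3, 4, 5, 6, 7, 8], local_size=4, scale=10)
```

Each work group only ever touches its own `local_scratch`, which mirrors why local memory cannot be used to exchange data between work groups.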

> Comparison with CUDA:

Memory regions for the host and kernels:

### Programming Model

#### Data Parallel Programming Model

1. Define an N-dimensional computation domain.
2. Work items can be grouped together into *work groups*.
3. Execute multiple work groups in parallel.
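
The three steps can be sketched in plain Python with a vector addition. This is an illustrative model rather than OpenCL code: threads from `concurrent.futures` stand in for the device's compute units, and the names `vector_add_kernel`, `run_work_group`, and `launch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_add_kernel(global_id, a, b, out):
    """Body executed by one work item: handles exactly one element."""
    out[global_id] = a[global_id] + b[global_id]


def run_work_group(group_id, local_size, a, b, out):
    """Run every work item that belongs to one work group."""
    for local_id in range(local_size):
        vector_add_kernel(group_id * local_size + local_id, a, b, out)


def launch(a, b, local_size):
    # Step 1: a 1-dimensional computation domain, one work item per element.
    global_size = len(a)
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of local size")
    out = [0] * global_size
    # Step 2: group the work items into work groups.
    num_groups = global_size // local_size
    # Step 3: hand the work groups to a pool so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda g: run_work_group(g, local_size, a, b, out),
                      range(num_groups)))
    return out


result = launch([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], local_size=2)
```

Because every work item writes to a different element of `out`, the work groups need no synchronization with each other.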

#### Task Parallel Programming Model

> The data-parallel execution model must be implemented by all OpenCL compute devices, whereas task-parallel programming is optional for vendors.

Some compute devices, such as CPUs, can also execute task-parallel compute kernels. A task-parallel kernel:

- Executes as a single work item.
- Can be a compute kernel written in OpenCL.
- Can be a native function.
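
A task-parallel kernel can be pictured as an NDRange that contains exactly one work item, which then processes all of the data by itself. The plain-Python sketch below models that idea; `enqueue_task` is a hypothetical helper written for this example, not an OpenCL binding.

```python
def dot_product_task(a, b):
    """A task-style kernel: one work item walks the whole input by itself."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total


def enqueue_task(kernel, *args):
    """Hypothetical helper: run `kernel` as an NDRange with a single work item."""
    global_size = 1
    results = [kernel(*args) for _ in range(global_size)]
    return results[0]


value = enqueue_task(dot_product_task, [1, 2, 3], [4, 5, 6])
```

Any parallelism in this model comes from running several such tasks side by side, not from splitting one kernel across many work items.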

### OpenCL Framework

The basic OpenCL program structure:

**Contexts** are used to contain and manage the state of the *world*.

A **command-queue** coordinates the execution of kernels.
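
The division of labour between a context and a command-queue can be modelled with two toy classes. This is a sketch in plain Python under simplifying assumptions (a single in-order queue, buffers identified by name); the classes `Context` and `CommandQueue` and their methods are invented for illustration and are not the OpenCL API.

```python
from collections import deque


class Context:
    """Toy context: owns the shared state (here, named buffers)."""

    def __init__(self):
        self.buffers = {}

    def create_buffer(self, name, data):
        self.buffers[name] = list(data)


class CommandQueue:
    """Toy in-order command queue bound to one context."""

    def __init__(self, context):
        self.context = context
        self._pending = deque()

    def enqueue(self, kernel, *buffer_names):
        # Enqueueing only records the command; nothing runs yet.
        self._pending.append((kernel, buffer_names))

    def finish(self):
        # Run the recorded commands in submission order.
        while self._pending:
            kernel, names = self._pending.popleft()
            kernel(*(self.context.buffers[n] for n in names))


def double_in_place(buf):
    for i in range(len(buf)):
        buf[i] *= 2


def add_one_in_place(buf):
    for i in range(len(buf)):
        buf[i] += 1


ctx = Context()
ctx.create_buffer("data", [1, 2, 3])
queue = CommandQueue(ctx)
queue.enqueue(double_in_place, "data")
queue.enqueue(add_one_in_place, "data")
before = list(ctx.buffers["data"])   # still unchanged: commands are only queued
queue.finish()
after = ctx.buffers["data"]
```

Keeping the state in the context and the ordering in the queue is what lets the host queue up work first and wait for the results later.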