Before measuring the practical impact, we ran some theoretical tests to see whether PCI Express 3.0 delivers the speeds it promises. For this we used a performance test included in version 2.6 of AMD’s APP development kit.
This first test, you may remember, is somewhat unusual in that it attempts to achieve the highest possible transfer speeds using what is known as non-paged pool memory. On the system side, the tool reserves memory pages so that they can’t be moved. In practice this guarantees that, for the whole duration of the programme’s execution, the memory pages remain physically in RAM and are never written out to a swap file.
While this may seem unimportant on a test machine equipped with 16 GB of RAM, in practice it matters. The data transferred would of course remain in physical memory anyway, but the mere possibility that it might not forces additional memory copy operations. For non-paged pool memory, AMD uses algorithms optimised to make the most of PCI Express (just like NVIDIA in CUDA).
For this test and the following tests, we measured six distinct cases:
- PCI Express 3.0 x16, x8 and x4
- PCI Express 2.0 x16 and x8
- PCI Express 1.0 x16
From a theoretical point of view, several of these modes offer equivalent bandwidth. Each line below groups the modes that sit at the same level:
- PCI Express 3.0 x16
- PCI Express 3.0 x8 and PCI Express 2.0 x16
- PCI Express 3.0 x4, PCI Express 2.0 x8 and PCI Express 1.0 x16
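The grouping above can be checked with a little arithmetic. The sketch below is not taken from AMD's tool. It simply multiplies the per-lane signalling rate of each generation (2.5, 5 and 8 GT/s) by its line-encoding efficiency (8b/10b for 1.0 and 2.0, 128b/130b for 3.0) and by the number of lanes. The names `GENERATIONS`, `MODES` and `theoretical_gb_per_s` are ours, chosen for illustration, and the figures are peak one-direction values before any protocol overhead.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per generation.
GENERATIONS = {
    "1.0": (2.5, 8 / 10),     # 8b/10b encoding
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}

# The six modes measured in this article.
MODES = [("3.0", 16), ("3.0", 8), ("3.0", 4), ("2.0", 16), ("2.0", 8), ("1.0", 16)]


def theoretical_gb_per_s(generation, lanes):
    """Peak one-direction bandwidth in GB/s (10^9 bytes), before protocol overhead."""
    gigatransfers, efficiency = GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 to convert bits to bytes.
    return gigatransfers * efficiency * lanes / 8


for generation, lanes in MODES:
    bandwidth = theoretical_gb_per_s(generation, lanes)
    print(f"PCI Express {generation} x{lanes}: {bandwidth:.2f} GB/s")
```

Under these assumptions, PCI Express 2.0 x16 comes out at 8 GB/s and PCI Express 3.0 x8 at roughly 7.88 GB/s. The groups are therefore equivalent to within about 1.5%, with the small gap coming from the overhead of 128b/130b encoding.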
For games and applications that allow it, we also give the results for each of these cases in CrossFire mode. We measured the transfer rate from the CPU to the GPU (the typical case in games) and in the opposite direction (also used in OpenCL) independently.