Now all the data (including the output arrays) is sorted before computing the interactions
Hence the permutation vectors are no longer needed in the interactions stage
Some useless output data has been removed
The register pressure has been reduced
Let’s say goodbye to more than 10% of the simulation time!
For the moment the optimized version can be found in the optimization branch of the git repository.
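To illustrate the first two points, here is a minimal sketch in plain Python (not the actual OpenCL kernels). It assumes particles are sorted by a cell index, and the names `sort_by_cell`, `unsort` and `cell_mass_sums` are hypothetical helpers invented for this example. The idea is that every array is physically reordered once, so the interactions loop can read contiguous data without looking up a permutation vector; the permutation is only used again at the end, to return the results in the original particle order.

```python
def sort_by_cell(cells, *fields):
    """Return the permutation and every array physically reordered by cell."""
    # perm[k] is the original index of the particle placed at sorted slot k.
    perm = sorted(range(len(cells)), key=lambda i: cells[i])
    sorted_cells = [cells[i] for i in perm]
    sorted_fields = [[f[i] for i in perm] for f in fields]
    return perm, sorted_cells, sorted_fields


def unsort(perm, values):
    """Scatter values computed in sorted order back to the original order."""
    out = [None] * len(values)
    for sorted_idx, original_idx in enumerate(perm):
        out[original_idx] = values[sorted_idx]
    return out


def cell_mass_sums(sorted_cells, sorted_mass):
    """Toy stand-in for the interactions stage: no permutation lookups here."""
    sums = {}
    for cell, m in zip(sorted_cells, sorted_mass):
        sums[cell] = sums.get(cell, 0.0) + m
    return sums


# Hypothetical data: four particles spread over three cells.
cells = [2, 0, 1, 0]
mass = [1.0, 2.0, 3.0, 4.0]

perm, sorted_cells, (sorted_mass,) = sort_by_cell(cells, mass)
sums = cell_mass_sums(sorted_cells, sorted_mass)
restored = unsort(perm, sorted_mass)
```

The cost of the approach is one extra reordering pass per array and per time step, which pays off when the interactions stage dominates the run time, as it does here.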
P.S. I performed this work with CodeXL by AMD, basically because NVidia suddenly decided to remove OpenCL support from the profiler in CUDA 5.
NVidia is spending a lot of resources trying to destroy OpenCL, while AMD is beating them on the hardware side (the real world).
I just hope the NVidia people change their direction before they reach the end they deserve.
I was waiting like a fool for the CUSL development stage to start in order to upload the initial release, just because the development stage starts when the project is officially accepted (there is no kick-off).