Get rid of CUDA

For those who are following the Git tree: we just merged the GetRidOfCUDA branch into master, which had a big impact on the code.

Basically, as the name says, we replaced CUDA with OpenCL for NVidia. The discussions that led to the change can be found here:

https://github.com/hashcat/oclHashcat/issues/3
https://github.com/hashcat/oclHashcat/issues/4
https://github.com/hashcat/oclHashcat/issues/20

The reasons are:

There's no JIT compiler on CUDA. We can't fully accelerate DEScrypt cracking without one, since that requires the salt to be known at compile time
Preparation for supporting other OpenCL-compatible platforms. At the moment we support GPUs only, but once finished this should allow oclHashcat to run on CPUs and/or FPGAs
Preparation for restructuring the files to help integration into Linux distributions. Having two binaries (oclHashcat and cudaHashcat) is confusing and creates library conflicts
Get rid of the two separate packages for oclHashcat, namely get rid of cudaHashcat. Both AMD and NV users will use oclHashcat64.bin or oclHashcat64.exe
Distribute the kernels as source. That should greatly reduce the selection of imperfect binary kernels, especially for low-end GPUs
No more need for two separate code bases for AMD and NV; this will reduce maintenance cost
No more dependency on the CUDA SDK, which should make building easier. We can use the OpenCL headers from the AMD SDK; they are fully compatible, even cross-platform
Developers no longer need to precompile all kernels for all GPU types (this took around an hour for each beta)
Reduced package size. For example, the NVidia package dropped from 89MB to 3MB
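
To illustrate the DEScrypt point above: with a runtime compiler, the host can bake the salt into the kernel as a preprocessor constant every time it builds the kernel. Here is a minimal sketch in C. The macro name DESCRYPT_SALT and the helper build_salt_options are hypothetical and not taken from the oclHashcat source. The only hard facts used are that DEScrypt salts are 12 bits wide and that OpenCL accepts -D definitions in the options string passed to clBuildProgram.

```c
#include <stdio.h>
#include <stddef.h>

/* DEScrypt salts are 12 bits wide, so valid values are 0..4095. */
#define SALT_MAX 4095u

/*
 * Writes an OpenCL build-options string that defines the salt as a
 * preprocessor constant, so the kernel compiler can treat it as known
 * at compile time.
 *
 * DESCRYPT_SALT is a hypothetical macro name used for illustration.
 * Returns 0 on success, -1 on an invalid salt or a too-small buffer.
 */
int build_salt_options(char *buf, size_t len, unsigned salt)
{
    if (buf == NULL || len == 0 || salt > SALT_MAX)
        return -1;

    int n = snprintf(buf, len, "-D DESCRYPT_SALT=%u", salt);

    /* snprintf reports the length it wanted; reject truncated output. */
    if (n < 0 || (size_t)n >= len)
        return -1;

    return 0;
}
```

The resulting string would go into the options argument of clBuildProgram, and the kernel source would refer to DESCRYPT_SALT wherever it needs the salt. The price is one kernel build per salt, which is exactly what needs a JIT compiler.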

While refactoring I also dropped support for SIMD code, for almost the same reasons:

Make the kernel code more compact and therefore more portable. This may become even more important when adding other platforms
Almost no GPUs are left that require SIMD code to reach full performance; the ones that do are AMD 4xxx, 5xxx and 6xxx. While 4xxx was already dropped from Catalyst, AMD said they are about to drop support for 5xxx and 6xxx as well
In case we really need it back, OpenCL gives us "true" vector datatype support, even for NVidia, so we can use vector datatypes in inner-loop kernels
Preparation for porting some of the rules that were only usable on CPU, for example the @ (Purge) rule or the M (Memorize) rule, which in turn enables the append/prepend memory rules

This refactoring really created some work:

Half of the kernels dropped in speed before being optimized for OpenCL + NV. For each kernel it was necessary to analyze the root causes of the performance drop and find solutions
NVidia's OpenCL runtime does not support C++ code (as AMD's does via the -x c++ flag), but a lot of the shared GPU code relied on function overloading etc.
The HMS code was based almost completely on macro-dependent branches, which had to be rewritten as true runtime branches. This also had a big impact on the Makefile and the SDK dependencies
Dropping the SIMD code
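
To give an idea of what the macro-to-runtime rewrite mentioned above looks like, here is a rough sketch in C. This is not the actual HMS code: vendor_t, hm_get_temperature and the two stub backends are hypothetical names, and the stubs return placeholder values (not real readings) so the dispatch can be observed. With two binaries, an #ifdef picked the vendor backend at compile time. With a single binary, both backends are compiled in and the choice is made per device at runtime.

```c
/* Hypothetical vendor tag, decided once at startup for each device. */
typedef enum { VENDOR_AMD = 0, VENDOR_NV = 1, VENDOR_COUNT } vendor_t;

/*
 * Stubs standing in for the vendor monitoring libraries. The return
 * values are placeholders that make the dispatch visible; they are
 * not temperature readings.
 */
static int amd_temperature_stub(int device) { return 100 + device; }
static int nv_temperature_stub(int device)  { return 200 + device; }

typedef int (*temperature_fn)(int device);

/*
 * Both backends live in the same binary. This table replaces the old
 * compile-time split, which looked roughly like:
 *
 *   #ifdef BUILD_FOR_NV
 *     return nv_temperature(device);
 *   #else
 *     return amd_temperature(device);
 *   #endif
 */
static const temperature_fn temperature_backends[VENDOR_COUNT] = {
    [VENDOR_AMD] = amd_temperature_stub,
    [VENDOR_NV]  = nv_temperature_stub,
};

/* Returns the backend's value, or -1 if the vendor is not recognized. */
int hm_get_temperature(vendor_t vendor, int device)
{
    if ((int)vendor < 0 || (int)vendor >= VENDOR_COUNT)
        return -1;

    return temperature_backends[vendor](device);
}
```

Since nothing is decided by the preprocessor any more, the build has to be able to compile both code paths at once, which is where the Makefile and SDK dependency changes come from.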

Of course such a big change also has a big impact on performance, but we were able to work around almost all of the performance drops. In return we get some huge speed boosts for some other algorithms:

https://docs.google.com/spreadsheets/d/1...li=1#gid=0

Note that these numbers (especially the red boxes) are not final. I'll continue to work on solutions for them in the master branch.

Thanks to philsmd for porting the HMS (fan speed, utilization, temperature) code portion.
Thanks to dropdead, epixoip, philsmd, Rolf and Xanadrel for help with performance tuning.

--
atom
