bfsha1 0.1.2a
This is currently the fastest single-hash SHA1 brute forcer (on a GTS 450 it is about 15.6% faster than the next fastest, Hashcat-lite v0.10).
I learned something from Atom back on September 28-29, 2012: using one constant is better than using multiple constants.
I didn't do a release because I was too lazy. Anyway, here's the faster version with cleaner code, and, oh, did I mention source code?
bfsha1 {--benchmark|hash} [gpu-device-num]
So the four ways to use this are:
- bfsha1 --benchmark
- bfsha1 --benchmark 0
- bfsha1 ffffffffffffffffffffffffffffffffffffffff
- bfsha1 ffffffffffffffffffffffffffffffffffffffff 0
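The hash argument is a 40-hex-digit SHA-1 digest, and "single hash" means every candidate password is hashed and compared against that one target. Below is a minimal CPU reference sketch of that per-candidate check. It is not bfsha1's CUDA kernel. The names parse_hash, sha1_short and matches are hypothetical, and it assumes candidates are at most 55 bytes so they fit in one 64-byte SHA-1 block after padding, which covers every 8-character password.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate left, same form as the ROL macro in this README. Needs 0 < s < 32. */
#define ROL(x,s) (((x) << (s)) + ((x) >> (32 - (s))))

/* Parse a 40-hex-digit SHA-1 digest into five 32-bit words (big-endian word
   order, as SHA-1 defines them). Returns 0 on success, -1 on bad input. */
static int parse_hash(const char *hex, uint32_t out[5]) {
    if (strlen(hex) != 40) return -1;
    for (int i = 0; i < 5; i++) {
        uint32_t w = 0;
        for (int j = 0; j < 8; j++) {
            char c = hex[i * 8 + j];
            uint32_t v;
            if (c >= '0' && c <= '9')      v = (uint32_t)(c - '0');
            else if (c >= 'a' && c <= 'f') v = (uint32_t)(c - 'a' + 10);
            else if (c >= 'A' && c <= 'F') v = (uint32_t)(c - 'A' + 10);
            else return -1;
            w = (w << 4) | v;
        }
        out[i] = w;
    }
    return 0;
}

/* SHA-1 of a message of at most 55 bytes (one 64-byte block after padding). */
static void sha1_short(const unsigned char *msg, size_t len, uint32_t h[5]) {
    unsigned char block[64] = {0};
    uint32_t w[80];
    assert(len <= 55);
    memcpy(block, msg, len);
    block[len] = 0x80;                         /* append the single 1 bit */
    uint64_t bits = (uint64_t)len * 8;         /* message length in bits */
    for (int i = 0; i < 8; i++)                /* big-endian length at the end */
        block[63 - i] = (unsigned char)(bits >> (8 * i));
    for (int i = 0; i < 16; i++)
        w[i] = ((uint32_t)block[4 * i] << 24) | ((uint32_t)block[4 * i + 1] << 16) |
               ((uint32_t)block[4 * i + 2] << 8) | (uint32_t)block[4 * i + 3];
    for (int i = 16; i < 80; i++)              /* message schedule expansion */
        w[i] = ROL(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    uint32_t a = 0x67452301u, b = 0xEFCDAB89u, c = 0x98BADCFEu,
             d = 0x10325476u, e = 0xC3D2E1F0u;
    for (int i = 0; i < 80; i++) {             /* the 80 compression rounds */
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        uint32_t t = ROL(a, 5) + f + e + k + w[i];
        e = d; d = c; c = ROL(b, 30); b = a; a = t;
    }
    h[0] = 0x67452301u + a; h[1] = 0xEFCDAB89u + b; h[2] = 0x98BADCFEu + c;
    h[3] = 0x10325476u + d; h[4] = 0xC3D2E1F0u + e;
}

/* 1 if the candidate's SHA-1 equals the target digest, else 0. */
static int matches(const char *candidate, const uint32_t target[5]) {
    uint32_t h[5];
    sha1_short((const unsigned char *)candidate, strlen(candidate), h);
    return memcmp(h, target, sizeof h) == 0;
}
```

Usage: call parse_hash once on the hash argument, then matches for each candidate. For example, the digest a9993e364706816aba3e25717850c26c9cd0d89d (the SHA-1 of "abc") matches "abc" and nothing else you are likely to try.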
You only get stats when it finishes, which happens when it:
- Cracks the password
- Finishes brute forcing "[ -~]{8}" (95^8 = 6,634,204,312,890,625 ~ 2^52.56)
- Finishes the benchmark, if you use --benchmark (60 * 95 * 95 * blocks * threadsPerBlock passwords)
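The keyspace arithmetic above can be checked with a small C sketch. It assumes candidates are numbered 0 to 95^8 - 1 and decoded as base-95 digits over printable ASCII, ' ' (0x20) through '~' (0x7E); that ordering is only an illustration and not necessarily the order bfsha1 uses. The function names are hypothetical, and blocks and threadsPerBlock are parameters because their values are not given here.

```c
#include <assert.h>
#include <stdint.h>

#define CHARSET_SIZE 95u  /* printable ASCII: ' ' (0x20) through '~' (0x7E) */
#define PW_LEN 8

/* 95^8, the number of candidates matched by "[ -~]{8}". */
static uint64_t keyspace(void) {
    uint64_t n = 1;
    for (int i = 0; i < PW_LEN; i++)
        n *= CHARSET_SIZE;
    return n;
}

/* Hypothetical ordering: decode index n (0 <= n < 95^8) as 8 base-95 digits,
   least significant digit first, each mapped to the character ' ' + digit. */
static void index_to_password(uint64_t n, char out[PW_LEN + 1]) {
    assert(n < keyspace());
    for (int i = 0; i < PW_LEN; i++) {
        out[i] = (char)(' ' + n % CHARSET_SIZE);
        n /= CHARSET_SIZE;
    }
    out[PW_LEN] = '\0';
}

/* Passwords tried by --benchmark: 60 * 95 * 95 * blocks * threadsPerBlock. */
static uint64_t benchmark_count(uint64_t blocks, uint64_t threadsPerBlock) {
    return 60ull * CHARSET_SIZE * CHARSET_SIZE * blocks * threadsPerBlock;
}

/* Seconds to exhaust the whole keyspace at a given rate (hashes per second). */
static double exhaust_seconds(double hashes_per_second) {
    return (double)keyspace() / hashes_per_second;
}
```

At the roughly 244 M/s quoted for a GTS 450 in this README, exhaust_seconds(244e6) is about 2.7e7 seconds, or roughly 315 days for the full "[ -~]{8}" space.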
**** THIS ONLY WORKS WITH NEWER CARDS (Compute Capability 2.x) ****
The block count or threads per block may be too high for some cards to run.
All I know is that it works on a GTS 450, where it runs at about 244 M/s (after a fresh restart) on Windows 7 64-bit with driver version 306.23. It was compiled with CUDA 4.1.
I did a few things that made it faster but make no sense, such as:
d_foundPw[0] = pw0;
d_foundPw[1] = pw1;
d_foundPw[2] = pw2;
vs
d_foundPw[0] = 1;
Apparently writing 12 bytes is faster than writing 4 bytes to global memory.
#define ROL(x,s) (((x) << (s)) + ((x) >> (32 - (s))))
vs
#define ROL(x,s) ((x) << (s)) + ((x) >> (32 - (s)))
I checked whether I needed the outer parentheses around this and took them out because I thought it might be faster, but it's slower.
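To see what the outer parentheses change semantically, here is a small C sketch. ROL_PAREN, ROL_BARE and ROL_OR are hypothetical names so the variants can sit side by side. In a plain sum such as SHA-1's ROL(a,5) + f + e + k + w, both forms compute the same value, so dropping the parentheses is only unsafe next to a higher-precedence operator or after a minus sign. This shows semantics only; it says nothing about why one form was faster on the GTS 450.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left with outer parentheses (the form kept above). */
#define ROL_PAREN(x,s) (((x) << (s)) + ((x) >> (32 - (s))))
/* The same expression without the outer parentheses. */
#define ROL_BARE(x,s) ((x) << (s)) + ((x) >> (32 - (s)))
/* Conventional form using |. Equal to + here because the two shifted halves
   never have a set bit in the same position. Valid for 0 < s < 32, and x must
   be an unsigned 32-bit value so >> shifts in zeros. */
#define ROL_OR(x,s) (((x) << (s)) | ((x) >> (32 - (s))))

/* x - ROL_BARE(y, 5) expands to x - (y << 5) + (y >> 27),
   not x - ((y << 5) + (y >> 27)). */
static uint32_t sub_paren(uint32_t x, uint32_t y) { return x - ROL_PAREN(y, 5); }
static uint32_t sub_bare(uint32_t x, uint32_t y)  { return x - ROL_BARE(y, 5); }
```

For y = 0x80000001, sub_paren(100, y) gives 100 - 48 = 52, while sub_bare(100, y) gives 100 - 32 + 16 = 84.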
I don't know whether things like these hold across all cards or only on GTS 450s.
So this may very well be slower than Hashcat-lite on other cards.