	Known issues with using this release.

Under OS X, using the native "Heimdal" Kerberos 5 library, the krb5-23
format often fails.  This appears to be a bug in the library rather than
in JtR.

The following formats do not work on big-endian CPU architectures (they
fail self-test on such CPUs):
* OpenVMS
* SSH
* krb5-18
* krb5-23
* Mozilla
(x86 and x86-64 are little-endian, so they are not affected.)

Some SPARC problems may be compiler bugs and can be worked around using
"-fno-builtin-memcpy -fno-builtin-memset".  Some SPARC compilers hang
when compiling unrarppm.c; this can be worked around by using -O0
(perhaps -O1?) when compiling that file.  This was seen with gcc 3.4.6.
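
As a concrete sketch of the unrarppm.c workaround (paths and make usage
here are assumptions; adjust to your tree and your usual make target):

```shell
cd src
# Compile only the problematic file at -O0 to avoid the optimizer hang:
gcc -c -O0 unrarppm.c -o unrarppm.o
# Then rebuild the rest at the usual optimization level with your normal
# make target (run "make" with no arguments to list available targets):
make
```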
Some OpenCL formats fail at runtime on Mac OS X (whereas CUDA ones work
fine).  This is clearly caused by driver bugs, and it affects NVIDIA and
AMD devices as well as e.g. the Intel HD 4000 - but not the CPU device.
The same formats usually work fine on Linux using the same hardware.

OS X also has a problem with run-time compilation of kernels that include
header files. A workaround is to cd to the src directory and run each
OpenCL format once. After that, the kernel binary is cached so you can
move away from the src directory. This bug seems to be gone in OS X
Mavericks (10.9).
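
The warm-up described above can be scripted.  This is only a sketch,
assuming a build whose binary lives in ../run; the format names are
placeholders (use the OpenCL formats you actually intend to run):

```shell
cd src   # kernels that include header files only compile from here
for fmt in sha512crypt-opencl mscash2-opencl; do
    # --test runs a quick self-test/benchmark, which compiles and
    # caches the kernel binary as a side effect
    ../run/john --test --format="$fmt"
done
# After this, the cached kernels let you run from any directory.
```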

Intel's OCL SDK 1.5 CPU driver has problems with bfcrypt, mscash2 and
rar. Use a newer version of the SDK.

Some OpenCL-enabled formats (for "slow" hashes and non-hashes) may
sometimes trigger "ASIC hang" errors as reported by AMD/ATI GPU drivers,
requiring a system reboot to regain access to the GPU.  For example, on
HD 7970 this problem is known to occur with sha512crypt-opencl, but is
known not to occur with mscash2-opencl.  Our current understanding is
that this has to do with OpenCL kernel running time and watchdog timers.
We're working on reducing kernel run times to avoid such occurrences in
the future. Recent drivers don't seem to have this issue.

All CUDA formats substantially benefit from compile-time tuning.
README-CUDA includes some info on this.  In short, on GTX 400 series and
newer NVIDIA cards, you'll likely want to change "-arch sm_10" to "-arch
sm_20" or greater (as appropriate for your GPU) on the NVCC_FLAGS line
in Makefile.  You'll also want to tune BLOCKS and THREADS for the
specific format you're interested in.  These are typically specified in
cuda_*.h files.  README-CUDA includes a handful of pre-tuned settings.
It is not unusual to obtain e.g. a 3x speedup (compared to the generic
defaults) with this sort of tuning.
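
For example, the edit might look like this (sm_20 is per the text above;
the BLOCKS/THREADS names vary per format, so treat them as illustrative,
and the make target is just an example):

```shell
# In src/Makefile, on the NVCC_FLAGS line, change:
#     -arch sm_10
# to, e.g. for a Fermi-class or newer GPU:
#     -arch sm_20
# Then adjust BLOCKS and THREADS in the relevant cuda_*.h file for the
# format you care about, and rebuild with your usual make target:
make clean
make linux-x86-64   # example target; use the one you built with
```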

Some OpenCL formats benefit from run-time tuning, too.  In particular,
some may benefit from tuning of "keys per crypt", although higher values,
while generally increasing the c/s rate, may create usability issues
(more work lost on interrupted/restored sessions, less optimal order of
candidate passwords being tested).  All formats allow run-time tuning via
the environment variables LWS (local work size) and GWS (global work
size; this corresponds to keys per crypt, though a multiplier may be
involved).  Most formats also allow specifying such values per format in
john.conf under the [OpenCL] section.
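
For example, tuning via the environment variables might look like this
(the format name and the sizes are illustrative, not recommendations):

```shell
# Benchmark a specific local/global work size combination:
LWS=64 GWS=2048 ./john --test --format=sha512crypt-opencl

# Try another combination and compare the reported c/s rates:
LWS=128 GWS=4096 ./john --test --format=sha512crypt-opencl
```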

Even though wpapsk-cuda primarily uses the GPU, it also does a (small,
but not negligible) portion of the computation on the CPU, and thus it
substantially benefits from OpenMP-enabled builds.  We intend to reduce
its use of CPU in a future version. The OpenCL version does all
computation on GPU.

Interrupting a cracking session that uses an ATI/AMD GPU with Ctrl-C
often results in:
	../../../thread/semaphore.cpp:87: sem_wait() failed
	Aborted
When this happens, the john.pot and .log files are not updated with the
latest cracked passwords.  To mitigate this, reduce the Save setting in
john.conf from the default of 600 seconds to a lower value (e.g., 60).
Recent drivers do not have this problem.
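
That mitigation is a one-line change to john.conf; in default
configuration files the setting lives in the [Options] section (verify
against your own john.conf):

```
[Options]
# Crash recovery file saving delay in seconds; default is 600
Save = 60
```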

With GPU-enabled formats (and sometimes with OpenMP on CPU as well), the
number of candidate passwords being tested concurrently can be very
large (thousands).  When the format is of a "slow" type (such as an
iterated hash) and the number of different salts is large, interrupting
and restoring a session may result in a lot of work being re-done (many
minutes or even hours).  It is easy to see if a given session is going
to be affected by this or not: watch the range of candidate passwords
being tested as included in the status line printed on a keypress.  If
this range does not change for a long while, the session is going to be
affected since interrupting and restoring it will retry the entire
range, for all salts, including for salts that already had the range
tested against them.

"Single crack" mode is relatively inefficient with GPU-enabled formats
(and sometimes with OpenMP on CPU as well): it might not be able to
produce enough candidate passwords per target salt to fully utilize a
GPU, and its ordering of candidate passwords from most likely to least
likely is lost when the format can only test a large number of passwords
concurrently (before proceeding to do the same for another salt).  You
may reasonably start with quick "single crack" mode runs on CPU
(possibly without much use of OpenMP) and only after that proceed to
GPU-enabled formats (or to heavier use of OpenMP, beyond a few CPU
cores), locking those runs to specific cracking modes other than "single
crack".  This limitation does not affect MPI.
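
Such a workflow might look like this (the hash file and the format names
are placeholders):

```shell
# 1. Quick "single crack" pass on CPU first:
./john --single --format=sha512crypt hashes.txt

# 2. Then switch to the GPU-enabled format, locked to other modes:
./john --wordlist=password.lst --rules --format=sha512crypt-opencl hashes.txt
./john --incremental --format=sha512crypt-opencl hashes.txt
```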

Some formats lack proper binary_hash() functions, resulting in duplicate
hashes (if any) not being eliminated at loading and sometimes also in
slower cracking (when the number of hashes per salt is large).  When
this happens, the following message is printed:
	Warning: excessive partial hash collisions detected
	(cause: the "format" lacks proper binary_hash() function definitions)
Known to be affected are: dominosec.
Similar issues are theoretically present in non-hash formats as well,
but are less likely to be triggered in practice.
