First of all, we're now using ImageMagick to improve the screenshot so
that Tesseract has an esier time to recognize the text. The resulting
image of this post-processing is a scaled up black-and-white version
with the backgrounds almost entirely removed and the text edges a bit
blurred, so the screen shots now more or less resemble an image from a
scanner rather. This is what Tesseract is trained for by default.
As mentioned in the previous commit we now also use Tesseract 4, which
further improves the quality of text recognition.
I've spent countless hours just to test different postprocessing
variants and testing what works best for our tests and this is the one
that worked best so far. It's certainly not perfect and I'd like to
avoid the scaling step but we're way better off than before.
In addition to this, the OCR process is now done without an intermediate
file, solely using pipes.
I've tested this using the following VM tests which have OCR enabled:
* nixos/tests/chromium.nix -A stable
* nixos/tests/emacs-daemon.nix
* nixos/tests/installer.nix -A luksroot
* nixos/tests/lightdm.nix
* nixos/tests/plasma5.nix
* nixos/tests/sddm.nix
All of the tests still succeed and comparing some of the recognition
results to the earlier results it now also detects a lot more text than
before this commit.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
I've removed that attribute in 68bc260ca2,
because the language files no longer were distributed as seperate files,
but if we for example only want to use the English training data, the
closure size of Tesseract gets quite large (around 1.2 GB), which is a
bit much just to be able to run NixOS VM tests.
For this reason I've also switched the VM tests back to using only the
English language.
Tested using the following VM tests (the ones that have OCR enabled) on
x86_64-linux:
* nixos/tests/chromium.nix -A stable
* nixos/tests/emacs-daemon.nix
* nixos/tests/installer.nix -A luksroot
* nixos/tests/lightdm.nix
* nixos/tests/plasma5.nix
* nixos/tests/sddm.nix
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Previously we were using two or three (qemu_kvm, qemu_test, and
qemu_test with a different dbus when minimal.nix is included).
(cherry picked from commit 8bfa4ce82ea7d23a1d4c6073bcc044e6bf9c4dbe)
This option is defined in qemu-vm.nix, but that module is not always
imported.
http://hydra.nixos.org/build/44817443
(cherry picked from commit 03c55005dfd6fbcd5cf8e00128a3bb6336b3bc0f)
From the upstream changelog:
* Tesseract development is now done with Git and hosted at github.com
(Previously we used Subversion as a VCS and code.google.com for
hosting).
So let's move over to the GitHub repository, where the organisation also
includes a full repository for tessdata, so we no longer need to fetch
it one-by-one.
The build also got significantly simpler, because we no longer need to
run autoconf, neither do we need to patch the configure script for
Leptonica headers.
This also has the advantage that we don't need to use the
enableLanguages attribute for the test runner anymore.
Full upstream changelog can be found at:
https://github.com/tesseract-ocr/tesseract/blob/c4d273d33cc36e/ChangeLog
Tested against all NixOS tests with enabled OCR (chromium, emacs-daemon,
installer.luksroot and lightdm).
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @viric
Regression introduced by 4dcb685af9.
Unsetting the environment variable shortly before using it is not going
to end up very well, so let's just filter out the variable from the
output of export and unset it shortly afterwards.
This fixes the runInMachine NixOS test.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
The tests need to expand passed variable and very carefully.
I could see no other easy way than to change single-quoting in
makeWrapper to double-quoting.
The tests now fail with the same problem as on master...
Regression introduced by 032f0ffdd0.
The change doesn't look obvious at the first sight why it may cause
problems with lib.makePerlPath, but it introduces a Perl package called
"lib".
And using "with perlPackages;" uses the Perl library "lib" instead of
the lib attribute set from pkgs.
So let's use pkgs.lib.makePerlPath directly in hope that there won't be
a Perl package anytime soon which is called "pkgs".
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Only include the English language for the VM tests, because we most
likely won't need other languages. At least for now.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
By default this is now enabled, and it has to be explicitely enabled
using "enableOCR = true". If it is set to false, any usage of
getScreenText or waitForText will fail with an error suggesting to pass
enableOCR.
This should get rid of the rather large dependency on tesseract which
we don't need for most tests.
Note, that I'm using system("type -P") here to check whether tesseract
is in PATH. I know it's a bashism but we already have other bashisms
within the test scripts and we also run it with bash, so IMHO it's not a
problem here.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Basically, this creates a screenshot and throws tesseract at it to
recognize the characters from the screenshot. In order to produce a
result that is well enough, we're using lanczos scaling and scale the
image up to 400% of its original size.
This provides the base functionality for a new Machine method which will
be called waitForText. I originally had that idea long ago when writing
the VM tests for VirtualBox and Chromium, but thought it would be
disproportionate to the case.
The downside however is that VM tests now depend on tesseract, but given
the average runtime of our tests it really shouldn't have a too big
impact and it's only a runtime dependency after all.
Another issue is that the OCR process takes quite some time to finish,
but IMHO it's better (as in more deterministic) than to rely on sleep().
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
The current way test reports get jquery,
src="//ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"
only works when getting reports over http:// or https://, not file://.
Change it so that it works for all protocols by using a local copy of
jquery.
This fixes the issue where locally created and browsed test reports
cannot be navigated properly; clicking the '+' symbol to expand
sub-sections doesn't work.
Now you can just say:
$ nix-build '<nixos/tests/login.nix>'
You can still get the driver script for interactive testing:
$ nix-build '<nixos/tests/login.nix>' -A driver
$ ./result/bin/nixos-test-driver
You can now run a test in the nixos/tests directory directly using
nix-build, e.g.
$ nix-build '<nixos/tests/login.nix>' -A test
This gets rid of having to add the test to nixos/tests/default.nix.
(Of course, you still need to add it to nixos/release.nix if you want
Hydra to run the test.)
It requires a writable /nix/store to store the build result. Also,
wait until we've reached multi-user.target before doing the build, and
do a sync at the end to ensure all data to $out is properly written.
http://hydra.nixos.org/build/6496716