First of all, we're now using ImageMagick to improve the screenshot so
that Tesseract has an esier time to recognize the text. The resulting
image of this post-processing is a scaled up black-and-white version
with the backgrounds almost entirely removed and the text edges a bit
blurred, so the screen shots now more or less resemble an image from a
scanner rather. This is what Tesseract is trained for by default.
As mentioned in the previous commit we now also use Tesseract 4, which
further improves the quality of text recognition.
I've spent countless hours just to test different postprocessing
variants and testing what works best for our tests and this is the one
that worked best so far. It's certainly not perfect and I'd like to
avoid the scaling step but we're way better off than before.
In addition to this, the OCR process is now done without an intermediate
file, solely using pipes.
I've tested this using the following VM tests which have OCR enabled:
* nixos/tests/chromium.nix -A stable
* nixos/tests/emacs-daemon.nix
* nixos/tests/installer.nix -A luksroot
* nixos/tests/lightdm.nix
* nixos/tests/plasma5.nix
* nixos/tests/sddm.nix
All of the tests still succeed and comparing some of the recognition
results to the earlier results it now also detects a lot more text than
before this commit.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
I've removed that attribute in 68bc260ca2,
because the language files no longer were distributed as seperate files,
but if we for example only want to use the English training data, the
closure size of Tesseract gets quite large (around 1.2 GB), which is a
bit much just to be able to run NixOS VM tests.
For this reason I've also switched the VM tests back to using only the
English language.
Tested using the following VM tests (the ones that have OCR enabled) on
x86_64-linux:
* nixos/tests/chromium.nix -A stable
* nixos/tests/emacs-daemon.nix
* nixos/tests/installer.nix -A luksroot
* nixos/tests/lightdm.nix
* nixos/tests/plasma5.nix
* nixos/tests/sddm.nix
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This makes make-disk-image.nix slightly more consistent with other image
builders we have. Unfortunately I duplicated some code in doing so, but
this is temporary duplication on the path to consolidating everything.
See https://github.com/NixOS/nixpkgs/issues/23052 for more details on that.
I'm also exposing the option in the amazon-image.nix maintainer module.
A long-time issue and one of the reasons I've never used that function
before. So let's remove that todo-comment and escape the contents
properly.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @edolstra
`configuration` seems to be a reference to an argument that was
removed seven years ago in commit 2892aed7.
`configuration.nix` makes it a big more clear what we're referring to.
Ensure that archive members are added in sorted order with a fixed
mtime. This allows `nix-build --check` to succeed (when building a
tarball of a simple system configuration).
We also remove env-vars which doesn't appear to do much apart from
capture a bunch of store paths we probably don't want.
This is an alternative to
4b78a5b5fb
Previously we were using two or three (qemu_kvm, qemu_test, and
qemu_test with a different dbus when minimal.nix is included).
(cherry picked from commit 8bfa4ce82ea7d23a1d4c6073bcc044e6bf9c4dbe)
This option is defined in qemu-vm.nix, but that module is not always
imported.
http://hydra.nixos.org/build/44817443
(cherry picked from commit 03c55005dfd6fbcd5cf8e00128a3bb6336b3bc0f)
From the upstream changelog:
* Tesseract development is now done with Git and hosted at github.com
(Previously we used Subversion as a VCS and code.google.com for
hosting).
So let's move over to the GitHub repository, where the organisation also
includes a full repository for tessdata, so we no longer need to fetch
it one-by-one.
The build also got significantly simpler, because we no longer need to
run autoconf, neither do we need to patch the configure script for
Leptonica headers.
This also has the advantage that we don't need to use the
enableLanguages attribute for the test runner anymore.
Full upstream changelog can be found at:
https://github.com/tesseract-ocr/tesseract/blob/c4d273d33cc36e/ChangeLog
Tested against all NixOS tests with enabled OCR (chromium, emacs-daemon,
installer.luksroot and lightdm).
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @viric
tune2fs marks the filesystem as clean to prevent resize2fs from
complaining.
But we were invoking it before we mounted the filesystem, so the
counters would increase to 1 and it broke the functionality.
By moving the call after the mount, I have confirmed it works by:
$ nix-build nixos/tests/ec2.nix
cc @rbvermaa @edolstra
Regression introduced by 4dcb685af9.
Unsetting the environment variable shortly before using it is not going
to end up very well, so let's just filter out the variable from the
output of export and unset it shortly afterwards.
This fixes the runInMachine NixOS test.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
- Replace hand-rolled version of nixos-install in make-disk-image by an
actual call to nixos-install
- Required a few cleanups of nixos-install
- nixos-install invokes an activation script which the hand-rolled version
in make-disk-image did not do. We remove /etc/machine-id as that's
a host-specific, impure, output of the activation script
Testing:
nix-build '<nixpkgs/nixos/release.nix>' -A tests.installer.simple passes
Also tried generating an image with:
nix-build -E 'let
pkgs = import <nixpkgs> {};
lib = pkgs.lib;
nixos = import <nixpkgs/nixos> {
configuration = {
fileSystems."/".device = "/dev/disk/by-label/nixos";
boot.loader.grub.devices = [ "/dev/sda" ];
boot.loader.grub.extraEntries = '"''"'
menuentry "Ubuntu" {
insmod ext2
search --set=root --label ubuntu
configfile /boot/grub/grub.cfg
}
'"''"';
};
};
in import <nixpkgs/nixos/lib/make-disk-image.nix> {
inherit pkgs lib;
config = nixos.config;
diskSize = 2000;
partitioned = false;
installBootLoader = false;
}'
Then installed the image:
$ sudo df if=./result/nixos.img of=/dev/sdaX bs=1M
$ sudo resize2fs /dev/disk/by-label/nixos
$ sudo mount /dev/disk/by-label/nixos /mnt
$ sudo mount --rbind /proc /mnt/proc
$ sudo mount --rbind /dev /mnt/dev
$ sudo chroot /mnt /nix/var/nix/profiles/system/bin/switch-to-configuration boot
[ … optionally do something about passwords … ]
and successfully rebooted to that image.
Was doing all this from inside a Ubuntu VM with a single user nix install.
The tests need to expand passed variable and very carefully.
I could see no other easy way than to change single-quoting in
makeWrapper to double-quoting.
The tests now fail with the same problem as on master...
Regression introduced by d84741a4bf.
The mentioned commit actually is a good thing, because we now get the
output from the X session.
Unfortunately, for the i3wm test, the i3-config-wizard prints out the
raw keyboard symbols directly coming from xcb, so the output isn't
necessarily proper UTF-8.
As the XML::Writer already expects valid UTF-8 input, we assume that
everything that comes into sanitise() will be UTF-8 from the start. So
we just decode() it using FB_DEFAULT as the check argument so that
every invalid character is replaced by the unicode replacement
character:
https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character
We simply re-oncode it again afterwards and return it, so we should
always get out valid UTF-8 in the log XML.
For more information about FB_DEFAULT and FB_CROAK, have a look at:
http://search.cpan.org/~dankogai/Encode-2.84/Encode.pm#Handling_Malformed_Data
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We now generate a qcow2 image to prevent hitting Hydra's output size
limit. Also updated /root/user-data -> /etc/ec2-metadata/user-data.
http://hydra.nixos.org/build/33843133
Previously this was done in three derivations (one to build the raw
disk image, one to convert to OVA, one to add a hydra-build-products
file). Now it's done in one step to reduce the amount of copying
to/from S3. In particular, not uploading the raw disk image prevents
us from hitting hydra-queue-runner's size limit of 2 GiB.
- Enforce that an option declaration has a "defaultText" if and only if the
type of the option derives from "package", "packageSet" or "nixpkgsConfig"
and if a "default" attribute is defined.
- Enforce that the value of the "example" attribute is wrapped with "literalExample"
if the type of the option derives from "package", "packageSet" or "nixpkgsConfig".
- Warn if a "defaultText" is defined in an option declaration if the type of
the option does not derive from "package", "packageSet" or "nixpkgsConfig".
- Warn if no "type" is defined in an option declaration.
This prevents seeing lots of warnings about missing hashes/sizes in the
database when running "nix-store --verify --check-contents" for the
first time.