This is from the commit message I've written for the upstream pull
request (jflesch/pyocr#62):
This is a bit more involved, because Tesseract 3.05.00 comes not
only with improvements but also with a few quirks we need to deal
with.
The first quirk is that the order of the arguments to the `tesseract'
command now matters and the list of configurations has to be at the
end of the command line. So we add a new attribute tesseract_flags
to the BaseBuilder class that contains a list of all the flags to
pass to `tesseract'. The tesseract_configs attribute remains pretty
much the same, but now only contains a list of configs instead of
being mixed with flag arguments.
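A minimal sketch of the resulting argument order. The helper
build_tesseract_command() and its parameters are hypothetical and not
part of pyocr; only the rule "flags first, configs last" is taken from
the description above:

```python
def build_tesseract_command(input_file, output_base, lang=None,
                            flags=(), configs=()):
    """Assemble a tesseract command line with the configs placed last."""
    command = ["tesseract", input_file, output_base]
    if lang is not None:
        command += ["-l", lang]
    command += list(flags)    # flag arguments, e.g. ["-psm", "3"]
    command += list(configs)  # config names, e.g. ["hocr"], always at the end
    return command


command = build_tesseract_command(
    "input.bmp", "output", lang="eng",
    flags=["-psm", "3"], configs=["hocr"],
)
print(" ".join(command))
```

Keeping the two lists separate means the caller never has to care about
the ordering; it is enforced in exactly one place.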
Another quirk has to do with Leptonica >= 1.74, which Tesseract
3.05.00 now requires. Leptonica has special handling for files that
reside in /tmp and assumes that they are internal temporary files of
Leptonica. In order to deal with it, we now run Tesseract in a
temporary directory that contains the input/output files and use
the relative names of these files, because Leptonica only checks for
path names beginning with /tmp.
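The approach can be sketched roughly as follows. run_in_temp_dir() is a
hypothetical helper, not the actual pyocr code; the point is only that
the command runs with the temporary directory as its working directory
and refers to its files by relative name:

```python
import os
import subprocess
import tempfile


def run_in_temp_dir(command, input_name, input_data, output_name):
    """Run `command` inside a fresh temporary directory.

    `command` is expected to refer to `input_name` and `output_name` by
    their relative names, so no path starting with /tmp is ever passed.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Write the input file into the temporary directory.
        with open(os.path.join(workdir, input_name), "wb") as fd:
            fd.write(input_data)
        # The working directory is the temporary directory itself.
        subprocess.check_call(command, cwd=workdir)
        # Read the result back before the directory is removed.
        with open(os.path.join(workdir, output_name), "rb") as fd:
            return fd.read()
```

With Tesseract this could be called with a command such as
["tesseract", "input.bmp", "output"] and "output.txt" as the output
name, assuming plain text output.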
Fortunately, the last item we need to address is not really a quirk
but an API change. In Tesseract 3.05.00 there is now a new function
called TessBaseAPIDetectOrientationScript(), which doesn't fill the
OSResults object anymore but now allows us to pass the values we're
interested in directly by reference. We need to use this new
function because the old function TessBaseAPIDetectOS() now *always*
returns false.
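A rough ctypes sketch of calling the new function. It assumes an
already loaded libtesseract and an initialized TessBaseAPI handle with
an image set; detect_orientation() and the dictionary keys are made up
for illustration and are not the pyocr implementation:

```python
import ctypes


def detect_orientation(lib, handle):
    """Call TessBaseAPIDetectOrientationScript() and collect its results.

    `lib` is assumed to be a loaded libtesseract (e.g. via ctypes.CDLL)
    and `handle` a TessBaseAPI pointer (ideally a ctypes.c_void_p).
    """
    # Output variables that the C function fills in by reference.
    orient_deg = ctypes.c_int(0)
    orient_conf = ctypes.c_float(0.0)
    script_name = ctypes.c_char_p(None)
    script_conf = ctypes.c_float(0.0)

    ok = lib.TessBaseAPIDetectOrientationScript(
        handle,
        ctypes.pointer(orient_deg),
        ctypes.pointer(orient_conf),
        ctypes.pointer(script_name),
        ctypes.pointer(script_conf),
    )
    if not ok:
        return None  # detection failed
    return {
        "orientation": orient_deg.value,
        "confidence": orient_conf.value,
        "script": script_name.value,
        "script_confidence": script_conf.value,
    }
```

Passing the library object in as a parameter keeps the sketch
independent of how and where libtesseract is loaded.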
I've tested this specifically on NixOS and in conjunction with Paperwork
(the only package that's using pyocr so far) and all the tests of the
dependency chain are now succeeding. However, I didn't do any manual
tests of Paperwork.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Upstream changes for version 0.4.5:
* Clean up exceptions raised when OCR fails:
  * Now, all tools raise only exceptions inheriting from
    pyocr.PyocrException
  * There is now one and only one TesseractError (shared between
    pyocr.libtesseract and pyocr.tesseract)
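For callers this means a single except clause is enough. A small sketch
using stand-in classes so that it runs without pyocr installed; the
constructor arguments, run_ocr() and safe_ocr() are assumptions made
for illustration only:

```python
class PyocrException(Exception):
    """Stand-in for pyocr.PyocrException."""


class TesseractError(PyocrException):
    """Stand-in for the single, shared TesseractError."""

    def __init__(self, status, message):
        super().__init__(status, message)
        self.status = status
        self.message = message


def run_ocr(fail):
    """Hypothetical OCR call that may fail."""
    if fail:
        raise TesseractError(1, "unable to read image")
    return "recognized text"


def safe_ocr(fail):
    """Return the recognized text, or None if any OCR tool failed."""
    try:
        return run_ocr(fail)
    except PyocrException:
        # Catches TesseractError and every other pyocr failure alike.
        return None
```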
Upstream changes for version 0.4.6:
* hOCR outputs: Generate valid XHTML files
The full upstream changelog can be found at:
https://github.com/jflesch/pyocr/blob/master/ChangeLog
Note that because of the version bump of Tesseract, neither version 0.4.4
nor version 0.4.6 builds successfully, so we need to fix this up soon.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
The changes are a bit too big to include here in the commit message,
so if you want the details of what changed, please visit this URL:
http://leptonica.org/source/version-notes.html
I have also provided openjpeg, giflib and libwebp as dependencies so
that Leptonica is able to read/write those file formats.
Additionally I've added a patch that uses pkgconfig to resolve all
dependencies (except giflib), because unlike AC_CHECK_LIB() the
PKG_CHECK_MODULES() macro defines *_LIBS variables to include the linker
search path.
Unfortunately, that patch alone is not enough, because the *_LIBS
variables are substituted by the upstream configure.ac to *not* include
the linker search paths, so we need to remove the AC_SUBST() calls
within PKG_CHECK_MODULES().
The only dependency that's not yet using PKG_CHECK_MODULES() is giflib,
because giflib doesn't have a pkg-config description file. Therefore
we're using substituteInPlace to insert the linker search path after the
lept.pc file was generated by configure.
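The effect of that substitution can be illustrated with a short Python
sketch. The sample lept.pc contents, the giflib path and the helper
add_linker_search_path() are all hypothetical; the real package does
this with substituteInPlace during the build:

```python
def add_linker_search_path(pc_text, library, libdir):
    """Prefix `-l<library>` with `-L<libdir>` in pkg-config file text."""
    flag = "-l" + library
    return pc_text.replace(flag, "-L{} {}".format(libdir, flag))


# Made-up excerpt of a generated lept.pc file.
pc_text = "Libs: -L${libdir} -llept\nLibs.private: -lgif -lz\n"
patched = add_linker_search_path(pc_text, "gif", "/path/to/giflib/lib")
print(patched)
```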
Another thing that we no longer need is the dependency on libpng version
1.2, because Leptonica now also works with more recent libpng versions.
Tested by building the package itself and also the following packages
that immediately depend on leptonica:
* k2pdfopt
* tesseract
* jbig2enc
All of these packages built successfully on x86_64-linux.
The main reason why I'm bumping Leptonica to version 1.74.1 is that we
need at least version 1.74 to bump Tesseract to the latest upstream
version.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
There are a few dozen new failures on Darwin, probably related to
updates of stdenv's llvm and/or pkgconfig.
Still, the total number of successes has increased.
Including apple_sdk.sdk is generally a recipe for a bad time on LLVM 3.8
and above, since you end up with bad headers in the wrong place that hurt
the new libc++ in 3.8 and above. In this case, qt only wanted the super-
generic SDK for CUPS headers, which we can just depend on directly now.