1. Gnumeric has unbalanced XML tags in its doc translations.
2. itstool's XML error handler tries to print this error with context.
3. libxml2's context snipper treats the data as bytes, not UTF-8.
4. python3Packages.libxml2 casts the context to a UTF-8 Python string.
5. itstool dereferences a null pointer.
This patch intervenes at step 4.
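
Steps 3 and 4 can be reproduced in pure Python. This is only a sketch of the failure mode, not the patch itself: the helper names (snip_bytes, decode_strict, decode_lenient) are hypothetical, and the lenient variant merely shows one way a byte-clipped snippet can be turned into a string without failing.

```python
def snip_bytes(data: bytes, limit: int) -> bytes:
    """Clip a buffer at a byte offset, ignoring UTF-8 character boundaries."""
    return data[:limit]


def decode_strict(raw: bytes) -> str:
    """Strict decode: raises UnicodeDecodeError on a split multi-byte sequence."""
    return raw.decode("utf-8")


def decode_lenient(raw: bytes) -> str:
    """Lenient decode: invalid or truncated sequences become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# "ï" is two bytes in UTF-8 (0xC3 0xAF); clipping at 3 bytes splits it.
context = snip_bytes("naïve".encode("utf-8"), 3)

try:
    decode_strict(context)
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

lenient = decode_lenient(context)
```

Here the strict decode fails because the clipped buffer ends on a lone lead byte, while the lenient decode yields a usable string with a replacement character at the end.
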
In https://bugzilla.gnome.org/show_bug.cgi?id=789714#c4 , upstream
suggests that intervening at step 3 would be better -- that each of the four
copies of xmlParserPrintFileContextInternal() has four additional UTF-8
problems, one of which is that the caret indicator ought to count
"unicode characters" not bytes. But to position a caret correctly, a
character count is not sufficient -- this would need to use icu's BiDi
logic (with fallback to doing something wrong when libxml2 is configured
not to use icu) -- which makes a 'correct' fix a much larger project
than this simple band-aid.
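
To illustrate why neither a byte count nor a character count positions a caret reliably, here is a rough sketch. It is not libxml2 code: display_columns is a hypothetical helper that only accounts for combining marks and East Asian wide characters, and it ignores BiDi entirely, which is exactly the part that would need icu.

```python
import unicodedata


def display_columns(prefix: str) -> int:
    """Rough terminal column estimate for the text left of the caret.

    Assumption: combining marks take 0 columns, East Asian wide or
    fullwidth characters take 2, everything else takes 1. No BiDi.
    """
    cols = 0
    for ch in prefix:
        if unicodedata.combining(ch):
            continue
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cols


# "e" + combining acute accent + two CJK characters
prefix = "e\u0301\u65e5\u672c"

byte_count = len(prefix.encode("utf-8"))  # what a byte-based caret uses
char_count = len(prefix)                  # "unicode characters"
column_count = display_columns(prefix)    # closer to what a terminal shows
```

The three counts all differ for this prefix, so switching the caret from bytes to code points would still misplace it.
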
The itstool update is to get python3 support: #63174 flipped itstool to
python3, but itstool doesn't support python3 until 2.0.3 (and perhaps
does not support it well until 2.0.5).
This presses forward instead of rolling back, at the suggestion of
worldofpeace, who mentions that other distros seem to be able to ship
recent versions of itstool.
Tensions in this space seem two-fold. One set centers around libxml2
being a low-level C library with sharp edges, manual memory management,
and performance concerns; the python libxml2 wrapper being quite thin
(the most dubious character in this drama); and python's sentiment that
it ought to be quite hard to crash the interpreter casually. This comes
to a head in https://gitlab.gnome.org/GNOME/libxml2/issues/12 , where a
use-after-free problem in idiomatic-looking python code is declared
working-as-designed.
The other set is around python3 being more UTF-8-aware than libxml2's
python wrapper; see https://bugzilla.gnome.org/show_bug.cgi?id=789714
and https://src.fedoraproject.org/rpms/libxml2/blob/master/f/libxml2-2.9.8-python3-unicode-errors.patch .
itstool is caught in this crossfire merely for being a widely-used
python program that uses XML.