mailing-list for TeXmacs Users



Re: Dealing with Cork (e.g., in Emacs)


  • From: Sebastian Miele <address@hidden>
  • To: Massimiliano Gubinelli <address@hidden>
  • Cc: TeXmacs <address@hidden>
  • Subject: Re: Dealing with Cork (e.g., in Emacs)
  • Date: Wed, 29 Jul 2020 14:02:18 +0200

Massimiliano Gubinelli <address@hidden> writes:

> This program seems to be able to deal with cork:
>
> https://github.com/nijel/enca/

Thanks for the pointer.

enca adds encoding-detection heuristics on top of recode, which does the
actual conversion. However, enca does not correctly detect the encoding
of (at least) somewhat perverse examples of .tm files. But there is a
more fundamental problem.

I tried 'recode' directly. In general, it does support CORK to UTF-8 in
a precise and predictable way.

But: TeXmacs does not really use Cork! As usual on Linux, it uses 0x0a
for newlines. But 0x0a in Cork is the diacritic 'dot above', which
recode indeed faithfully converts to Unicode. The resulting file has no
newlines, only odd runs of dot-above characters in their place. So there
is at least one divergence from Cork in what TeXmacs actually uses.
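
To make the divergence concrete, here is a minimal sketch in Python. It
is not how recode or TeXmacs work internally. The table CORK_EXCERPT and
both function names are hypothetical, and the table holds only the one
entry discussed above; every other byte falls back to chr(), which is a
placeholder and not a real Cork mapping. The workaround shown is simply
to split on 0x0a first and convert each line separately.

```python
# Hypothetical, deliberately tiny excerpt of a Cork-to-Unicode table.
# Only the 0x0a entry is taken from the discussion; a real table would
# cover all 256 code points.
CORK_EXCERPT = {
    0x0A: "\u02d9",  # DOT ABOVE, the Cork glyph at position 0x0a
}


def strict_cork_decode(data: bytes) -> str:
    """Decode the way a strict Cork converter would: 0x0a -> DOT ABOVE.

    Bytes missing from the excerpt fall back to chr(b). That fallback is
    a placeholder assumption for this sketch, not a correct Cork mapping.
    """
    return "".join(CORK_EXCERPT.get(b, chr(b)) for b in data)


def texmacs_decode(data: bytes) -> str:
    """Split on 0x0a first, so that it survives as a newline."""
    lines = data.split(b"\n")
    return "\n".join(strict_cork_decode(line) for line in lines)


sample = b"ab\ncd"
print(repr(strict_cork_decode(sample)))  # newline replaced by a dot above
print(repr(texmacs_decode(sample)))      # newline preserved
```

The same line-by-line idea should carry over to an external converter:
feed it one line at a time and rejoin the results with newlines.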

According to https://en.wikipedia.org/wiki/Cork_encoding, Cork does not
seem to contain a newline character at all (the empty box at 0x20 in the
table most probably means 0x20 = Cork space = ASCII space). My search
for an official document describing Cork was frustrating and unsuccessful.

For now, I will just leave it at that.

Since the XML serialization of TeXmacs trees does use Unicode, there
already is all that I want in TeXmacs itself, including a kind of formal
specification of the TeXmacs (Cork-like) encoding in terms of code
converting to Unicode. Once I know the deeper parts of TeXmacs well
enough, I will either write a converter to UTF-8 TeXmacs Scheme or,
better, directly help to port TeXmacs to Unicode.


