mailing-list for TeXmacs Users



Re: [TeXmacs] Problem with polish language


  • From: Jakub Kuźniar <address@hidden>
  • To: address@hidden
  • Subject: Re: [TeXmacs] Problem with polish language
  • Date: Sun, 13 Mar 2011 22:39:52 +0100

On Friday, 11 March 2011 at 19:19:15 you wrote:
> On Friday, 11 March 2011 at 10:05:52 Sam Liddicott wrote:
> > On 11/03/11 08:54, Jakub Kuźniar wrote:
> > > 2011/3/11 Sam Liddicott <address@hidden <mailto:address@hidden>>
> > >
> > > On 10/03/11 22:42, Jakub Kuźniar wrote:
> > > Hello,
> > >
> > > I have a problem using TeXmacs with the Polish language.
> > > The problem concerns copying text from other programs and pasting
> > > it into TeXmacs. After pasting, the Polish diacritic letters are
> > > shown as question marks.
> > >
> > > sample of pasted text:
> > >
> > > "Indywidualne ?rodki ostro?no?ci"
> > >
> > > When using Paste from -> Verbatim, the text is also displayed
> > > incorrectly, but with some other unreadable characters instead.
> > >
> > > Everything is fine when I type Polish letters in TeXmacs: it
> > > displays them correctly. Interestingly, when I save such a file
> > > and open it with e.g. kwrite, I cannot find a matching encoding.
> > > It is neither UTF-8 nor ISO-8859-2.
> > >
> > > TeXmacs version: 1.0.7.9, qt
> > >
> > > I think TeXmacs uses the Cork encoding, which has some symbols
> > > that UTF-8 doesn't even have.
> > >
> > > I think if you cut-n-paste UTF-8 it will work. Do you know that
> > > your source is UTF-8? What OS are you using?
> > >
> > > Also, if you save in tml format (TeXmacs XML), the text will be
> > > UTF-8, but of course the whole file will be XML.
> > >
> > > Sam
> > >
> > > Hello,
> > > thank you for your response.
> > > I am using openSUSE 11.3; UTF-8 is the default encoding here. I
> > > copy and paste from different sources: a text editor like KDE
> > > kwrite, where I explicitly set the UTF-8 encoding, the web browser,
> > > or LibreOffice Writer.
> > > TeXmacs exhibits the same behaviour in the native Windows port: the
> > > Polish diacritic letters are displayed as question marks after
> > > pasting. As far as I remember, TeXmacs on Linux has behaved this way
> > > for many years. I always end up retyping the parts of the Polish
> > > text which I need to paste into a TeXmacs document.
> > > I will test saving as tml this evening.
> >
> > Now that I think about it some more: what I do when I want to import
> > UTF-8 symbols into TeXmacs is use openoffice-writer to select the
> > symbol, then cut-n-paste it into gedit and save that as text, import
> > it into TeXmacs as a new document, and then cut-n-paste from that into
> > my main document.
> >
> > Sam
>
> Sam,
>
> Thank you, your workaround works. Also, after saving as *.tmml, the
> letters are saved as UTF-8, but this is of no practical use to me; I was
> just curious. Now I have a way to import bigger portions of text into
> TeXmacs.
>
> I am wondering whether this copy/paste problem is some kind of bug, or
> a technical limitation for which some kind of solution should be
> implemented?
>
> I am not sure, but when I first started playing with TeXmacs a long time
> ago, around 2002, I think this copy/paste functionality worked with texts
> in non-Latin encodings. But maybe I am wrong...
>
> Kuba


I found the answer to my questions. This is a problem with the implementation.
After downloading the TeXmacs source and searching through the code, I found
that in the file qt_gui.cpp, in the function get_selection, all the text,
whether it is UTF-8 or anything else, is converted to bare ASCII by the Qt
function toAscii.
When I commented out this call and applied some other modifications, the
program worked as expected. After pasting text from the web browser or other
programs, the Polish letters were preserved.
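For anyone who wants to reproduce the effect without rebuilding TeXmacs, here is a rough Python sketch of that lossy step. It rests on an assumption: Qt 4's QString::toAscii() behaves like toLatin1() when no codec for C strings has been set, and characters outside Latin-1 may then come out as '?'. Python's Latin-1 encoder with errors="replace" only mimics that behaviour; it is not the Qt code, and the helper names are hypothetical.

```python
def qt4_to_ascii_like(text: str) -> bytes:
    """Mimic a lossy 8-bit conversion (assumed to match Qt 4 toAscii):
    every character outside Latin-1 is replaced by b'?'."""
    return text.encode("latin-1", errors="replace")


def utf8_keep(text: str) -> bytes:
    """The lossless alternative: keep the clipboard text as UTF-8 bytes."""
    return text.encode("utf-8")


if __name__ == "__main__":
    pasted = "Indywidualne środki ostrożności"
    print(qt4_to_ascii_like(pasted).decode("latin-1"))
    # prints: Indywidualne ?rodki ostro?no?ci
    print(utf8_keep(pasted).decode("utf-8") == pasted)
    # prints: True
```

This reproduces exactly the pasted sample quoted at the top of the thread. Note that "ó" would survive this particular conversion, since it is part of Latin-1, while "ś", "ż", "ą", "ę" and the other Polish letters do not.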

The problem is that what I have done is not the right solution. After
importing UTF-8 from an external source, TeXmacs tries to convert it to the
Cork encoding.
The program chooses the type of conversion based on the document language.
So when my document language is Polish, it assumes that the pasted text is in
ISO-8859-2 (not UTF-8) and runs the function il2_to_cork() (Latin-2 to Cork)
from edit_select.cpp. In other words, it tries to convert UTF-8 text to Cork
using the Latin-2-to-Cork function.
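A small Python sketch of why this goes wrong: the clipboard holds UTF-8 bytes, but a Latin-2 conversion reads each byte as one ISO-8859-2 character, so every two-byte Polish letter turns into two wrong characters. Python's iso8859_2 codec stands in here for the decoding half of il2_to_cork(); it is an illustration under that assumption, not the TeXmacs code, and the helper names are hypothetical.

```python
def decode_as_latin2(raw: bytes) -> str:
    # Every byte becomes one ISO-8859-2 character, which is what a
    # Latin-2-based conversion implicitly assumes about its input.
    return raw.decode("iso8859_2")


def decode_as_utf8(raw: bytes) -> str:
    # Multi-byte UTF-8 sequences are combined back into single code points.
    return raw.decode("utf-8")


if __name__ == "__main__":
    clipboard = "środki".encode("utf-8")   # b'\xc5\x9brodki'
    wrong = decode_as_latin2(clipboard)
    right = decode_as_utf8(clipboard)
    print(len(wrong), len(right))  # prints: 7 6
    print(right)                   # prints: środki
```

The first byte of "ś" (0xC5) is read as "Ĺ" in ISO-8859-2 and the second (0x9B) as a control character, which fits the "unreadable letters" seen with Paste from -> Verbatim.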

The solution would be to convert directly from the external encoding to Cork.
I believe that the function il2_to_cork was created in times when no Linux
distribution used UTF-8 and Polish was encoded in ISO-8859-2.
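A rough Python sketch of that direction: decode by the actual source encoding (UTF-8) first, and only then map each character to a Cork byte. The table covers only the Polish letters, with positions taken from the standard LaTeX T1 (Cork) layout; the name utf8_to_cork is hypothetical, and TeXmacs's own tables and fallbacks may differ.

```python
# Hypothetical table: Polish letters -> byte positions in the LaTeX T1 (Cork) layout.
POLISH_TO_T1 = {
    "Ą": 0x81, "Ć": 0x82, "Ę": 0x86, "Ł": 0x8A, "Ń": 0x8B,
    "Ó": 0xD3, "Ś": 0x91, "Ź": 0x99, "Ż": 0x9B,
    "ą": 0xA1, "ć": 0xA2, "ę": 0xA6, "ł": 0xAA, "ń": 0xAB,
    "ó": 0xF3, "ś": 0xB1, "ź": 0xB9, "ż": 0xBB,
}


def utf8_to_cork(raw: bytes) -> bytes:
    """Decode by the source encoding (UTF-8), not by the document language,
    then map each character to a Cork byte."""
    out = bytearray()
    for ch in raw.decode("utf-8"):
        if ord(ch) < 0x80:
            # ASCII is passed through unchanged in this sketch.
            out.append(ord(ch))
        elif ch in POLISH_TO_T1:
            out.append(POLISH_TO_T1[ch])
        else:
            # A real converter would need a fallback for everything else.
            raise ValueError(f"no Cork slot in this sketch for {ch!r}")
    return bytes(out)


if __name__ == "__main__":
    print(utf8_to_cork("Łódź".encode("utf-8")))  # prints: b'\x8a\xf3d\xb9'
```

The point of the design is that the choice of decoder depends on where the text comes from (the clipboard is UTF-8 on a modern desktop), while the document language only matters afterwards, if at all.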

So I could do something about it, but I am not sure whether it is a general
problem which requires some more coding.


Thanks for your help.
Kuba