
texmacs-users - XML conversion from TeXmacs documents

Subject: mailing-list for TeXmacs Users

  • From: David Allouche <address@hidden>
  • To: address@hidden
  • Subject: XML conversion from TeXmacs documents
  • Date: Thu, 7 Nov 2002 02:27:50 +0100

Over time, several users have requested an XML export filter for TeXmacs
documents. I decided it was time to throw together a quick hack and
see what happens.

Attached to this mail is a shell script named tm-to-xml.
Usage: tm-to-xml file.tm
Output the XML conversion of file.tm to standard output.

Conversion rules:

-- All TeXmacs elements are converted to XML elements of the same
name and arity.

-- String nodes are converted to special elements <s> (meaning
String).

-- When a string node contains universal symbols, the universal
symbols are converted to special elements <u s="name"> (meaning
Universal Symbol) where name is the symbol name.

-- Quote characters (") in name attributes are replaced by &quot;.

Conversion example:

Hello <with|mode|math|\<alpha\>> world!
-->
<concat><s>Hello </s><with><s>mode</s><s>math</s><s><u
s="alpha"/></s></with><s> world!</s></concat>
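
The string-node rules above can be sketched in a few lines of shell and awk. This is only an illustration, not the code used by tm-to-xml: tm_string_to_xml is a hypothetical helper name, it handles string nodes only (no element nodes), and it assumes each symbol appears as <name> in the input, as in the Scheme export.

```shell
#!/bin/sh
# Sketch of the string-node rules only; element nodes are not handled.
# tm_string_to_xml is a hypothetical helper name, not part of tm-to-xml.
tm_string_to_xml() {
    # Reads one TeXmacs string per line on stdin, e.g.  x = <alpha>
    awk '{
        s = $0; out = ""
        # Each <name> run becomes <u s="name"/>; text around it is copied.
        while (match(s, /<[^<>]*>/)) {
            name = substr(s, RSTART + 1, RLENGTH - 2)
            gsub(/"/, "\\&quot;", name)   # quotes inside names -> &quot;
            out = out substr(s, 1, RSTART - 1) "<u s=\"" name "\"/>"
            s = substr(s, RSTART + RLENGTH)
        }
        print "<s>" out s "</s>"
    }'
}

printf '%s\n' 'Hello <alpha> world' | tm_string_to_xml
# prints: <s>Hello <u s="alpha"/> world</s>
```

Unlike the Scheme code in the attached script, this sketch copies an unmatched < through unchanged instead of failing.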

How it is done:

1. texmacs is invoked (with the -b option) to export file.tm in
Scheme format to a temporary file.

2. The temporary file is processed by a small guile script which
produces the XML output.

Using the Scheme format as an intermediate format yields output
that is much more faithful to the internal tree structure than what
users are generally used to.
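
Step 1 depends on temporary files. Here is a minimal sketch of one way to create such a file and clean it up, assuming the mktemp command is available. The attached script builds its file names from the process ID instead; with_tempfile is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Sketch only: with_tempfile is a hypothetical helper, not part of tm-to-xml.
# It runs a command with the path of a fresh temporary file appended as the
# last argument, and removes the file when the command finishes or is
# interrupted.
with_tempfile() {
    (
        tmp=$(mktemp "${TMPDIR:-/tmp}/stx.XXXXXX") || exit 1
        trap 'rm -f "$tmp"' 0          # clean up on any exit
        trap 'exit 2' 1 2 3 13 15      # turn signals into a normal exit
        "$@" "$tmp"
    )
}

# Example: the command receives the path of an existing file.
with_tempfile test -f && echo "temporary file existed"
```

Running the body in a subshell keeps the traps local, so they do not replace traps set by the calling script.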

Caveats:

No pretty-printing of the output. There are hundreds of tools around
to do that job. Anyway, when one wants XML, it is generally for
automatic processing, so pretty-printing is not a requirement.
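
For readers who do want indented output, one of those tools is xmllint from libxml2. The wrapper below is a sketch: pretty_xml is a hypothetical name, and it simply passes the input through unchanged when xmllint is not installed.

```shell
#!/bin/sh
# pretty_xml is a hypothetical wrapper: indent XML from stdin with xmllint
# (from libxml2) when it is installed, otherwise pass the input through.
pretty_xml() {
    if command -v xmllint >/dev/null 2>&1; then
        xmllint --format -
    else
        cat
    fi
}

printf '%s\n' '<concat><s>Hello </s><s> world!</s></concat>' | pretty_xml
```

The output of tm-to-xml file.tm could be piped through pretty_xml in the same way. Since all text lives inside <s> elements, the whitespace added between elements should not change the content.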

No Unicode support. That means at least two things:

-- Universal symbols which have a Unicode representation are not
converted to anything directly usable. However the <u> elements
could be straightforwardly substituted by the appropriate
numeric entities. We just need someone to figure out the
conversion tables.

-- The encoding of the resulting XML is the encoding internally
used by TeXmacs. If the original document contains non-ASCII
letters, I do not know which encoding is used (Joris?).
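
The substitution of <u> elements by numeric entities could look like the sketch below. The three-entry table is illustrative only, and the symbol names beta and infty are assumptions (only alpha appears in this mail); a real conversion table would have to cover every TeXmacs symbol. u_to_entities is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: replace a few <u s="..."/> elements by numeric character references.
# The three-entry table is illustrative only; a real table would be far larger.
# u_to_entities is a hypothetical helper name.
u_to_entities() {
    sed -e 's|<u s="alpha"/>|\&#945;|g' \
        -e 's|<u s="beta"/>|\&#946;|g' \
        -e 's|<u s="infty"/>|\&#8734;|g'
}

printf '%s\n' '<s><u s="alpha"/></s>' | u_to_entities
# prints: <s>&#945;</s>
```

Symbols missing from the table are left as <u> elements, so the filter can be applied safely before the table is complete.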

Thanks:

http://www.shelldorado.com/goodcoding/tempfiles.html
How to handle temporary files properly in shell scripts.

Jorik Blaas (address@hidden)
For texmacs-printer.scm, which made me discover that TeXmacs
finally supported batch processing.

--
David Allouche | GNU TeXmacs -- Writing is a pleasure
Free software engineer | http://www.texmacs.org
http://ddaa.net | http://alqua.com/tmresources
address@hidden | address@hidden
TeXmacs is NOT a LaTeX front-end and is unrelated to emacs.

#!/bin/sh

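# Exactly one argument is expected: the TeXmacs document to convert.
if [ $# -ne 1 ]; then
    echo "Usage: tm-to-xml file.tm" >&2
    exit 1
fi

in_file="$1"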

buf_init="${TMPDIR:=/tmp}/stx$$1"
scm_xml="${TMPDIR:=/tmp}/stx$$2"
scm_file="${TMPDIR:=/tmp}/stx$$3"

# Ensure the temporary files are removed at program termination
# or after we receive a signal:
trap 'rm -f "$buf_init" "$scm_xml" "$scm_file" >/dev/null 2>&1' 0
trap "exit 2" 1 2 3 13 15

cat > "$buf_init" <<-EOF
(save-scheme-buffer "${scm_file}")
(quit-TeXmacs)
EOF

# Quote the input path so that file names containing spaces work.
texmacs -b "$buf_init" "$in_file" >/dev/null

cat > "$scm_xml" <<-EOF
(define (list-starts? l what)
  (cond ((null? what) #t)
        ((null? l) #f)
        (else (and (equal? (car l) (car what))
                   (list-starts? (cdr l) (cdr what))))))

(define (list-replace l what by)
  (cond ((null? l) l)
        ((list-starts? l what)
         (let ((tail (list-tail l (length what))))
           (append by (list-replace tail what by))))
        (else (cons (car l) (list-replace (cdr l) what by)))))

(define (string-head s n)
  (substring s 0 n))
(define (string-tail s n)
  (substring s n (string-length s)))

(define (string-replace s what by)
  (list->string
   (list-replace
    (string->list s)
    (string->list what)
    (string->list by))))

(define (escape-quotes s)
  (string-replace s "\\"" "&quot;"))

(define (print-xml-string s)
  (let ((i (string-index s #\\<)))
    (if (not i)
        (display s)
        (let ((head (string-head s i))
              (sym-and-tail (string-tail s (1+ i))))
          (display head)
          (display "<u s=\\"")
          (let ((j (string-index sym-and-tail #\\>)))
            (let ((sym (string-head sym-and-tail j))
                  (tail (string-tail sym-and-tail (1+ j))))
              (display (escape-quotes sym))
              (display "\\"/>")
              (print-xml-string tail)))))))

(define (print-xml x)
  (if (pair? x)
      (begin
        (display "<")
        (display (symbol->string (car x)))
        (display ">")
        (for-each print-xml (cdr x))
        (display "</")
        (display (symbol->string (car x)))
        (display ">"))
      (begin
        (display "<s>")
        (print-xml-string x)
        (display "</s>"))))

(print-xml (read))
(newline)
EOF

guile -s "$scm_xml" < "$scm_file"
