Discussion:
LaTeX + formulas & equations + graphics --> DocBook + MathML + SVG
Albretch Mueller
2008-10-18 01:40:28 UTC
Hi,
~
I need to do lots of reformatting from LaTeX to DocBook. The thing is
that the LaTeX files contain quite a few equations and formulas as
well as raster graphics
~
Converting the graphics in the LaTeX files to SVG will, I think, be a
matter of labor that I will have to somehow optimize
~
I have heard that the markup of these formats does not exactly match up
~
Do you know of any cases of people who have done similar things?
~
Any best practices for this kind of reformatting?
~
Thanks
lbrtchx
Albretch Mueller
2008-10-18 09:39:56 UTC
So I'm interested to hear that you're going the other way ...
~
Hi Stephen/list
~
I see why we are looking at the same thing from different perspectives
~
Your main interest as a (print) journal editor is
typesetting/formatting/layout, while I am trying to use DocBook as a
CMS repository. Also, the actual text I got was in LaTeX format
~
At least DocBook's XML is good at reformatting to many other formats,
including text (HTML and RTF) and image-based formats such as PDF and
PS
~
I am curious about the specific problems you are having because I
have always heard complaints from people going the other way around,
from LaTeX to, say, HTML, because LaTeX features are a superset not
well represented in the HTML format
~
lbrtchx
Stephen Taylor
2008-10-18 11:52:26 UTC
Albretch

You are right: we have a print journal's interest in math typesetting.
However, we also have a 25-year archive to curate. And we want it online.

We have wondered whether recent improvements to browsers and search engines
have by now eroded our long-standing objections to serving articles in PDF
instead of HTML. Our terse and unusual programming languages use Unicode
points far outside ASCII; cut/paste from articles is particularly valuable.
Encouragingly, the PDFs we've produced from DocBook seem to do this
correctly. (Adrian, take note.) Thus our interest in DocBook as a source
format from which to produce camera-ready PDF and web-ready HTML.

We do appreciate that DocBook is designed for technical documentation and
rightly contents itself with a lower standard of typesetting than TeX's. So
that's an open question for us: are the XSL-FO formatters up to it? We
wonder about the XEP line-break algorithm that put some truly awful
hyphenation in Vector 23:4 <http://www.vector.org.uk/archive/v234/v234.pdf>.
We have yet to tackle any serious CMN (common math notation) typesetting
using MathML. All the CMN in the current issue was kluged, but eventually we
have to set some real CMN.

In choosing to try DocBook before LaTeX, I noticed the existence of
a LaTeX=>HTML process <http://www.latex2html.org/>,
but paid it little attention as the site looked neglected. That might have
been a mistake. Knuth considers typesetting a finite problem completely
addressed by TeX, and since TeX version 3, the version numbers have
converged on π. So perhaps there aren't any loose ends to work on.

Did you consider storing the LaTeX source and using the LaTeX and latex2html
processes to generate other formats from it? Or does your CMS somehow
preclude that?

Or perhaps you took the view that any kind of XML was a better medium than
TeX for an archive?

*Vector* has slender production resources, so we want our usually tech-savvy
contributors to submit articles in a format that minimises our work. A
subset of XHTML might be a better markup for contributors: authors can get
immediate visual feedback with a browser and stylesheet. We already know we
can get camera-ready PDF by importing XHTML into Word (a fall-back), and
imagine we can write XSL to convert that XHTML subset into DocBook and
whatever HTML the site design du jour calls for.
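
As a rough illustration of the XHTML-subset-to-DocBook idea, here is a minimal sketch. It uses Python's standard library rather than the XSL mentioned above, purely to show the shape of the mapping. The element table `TAG_MAP`, the function names, and the choice of un-namespaced DocBook 4-style element names are all assumptions for the example, not an agreed subset.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical subset: only these XHTML elements are accepted from authors.
TAG_MAP = {
    "h1": "title",
    "p": "para",
    "em": "emphasis",
    "code": "literal",
    "pre": "programlisting",
}


def convert(node):
    """Recursively copy one XHTML element into its DocBook counterpart."""
    local = node.tag.replace(XHTML, "")
    if local not in TAG_MAP:
        # Reject anything outside the subset instead of guessing a mapping.
        raise ValueError(f"element <{local}> is not in the allowed subset")
    out = ET.Element(TAG_MAP[local])
    out.text, out.tail = node.text, node.tail
    for child in node:
        out.append(convert(child))
    return out


def xhtml_body_to_article(xhtml_text):
    """Wrap the converted children of <body> in a DocBook <article>."""
    root = ET.fromstring(xhtml_text)
    body = root.find(f"{XHTML}body")
    if body is None:
        raise ValueError("no XHTML <body> element found")
    article = ET.Element("article")
    for child in body:
        article.append(convert(child))
    return ET.tostring(article, encoding="unicode")
```

Rejecting elements outside the table is one way to keep contributors inside the subset; an XSL stylesheet could do the same with one template per allowed element.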

We might produce an issue using LaTeX before drawing a conclusion.

Any views on our production and archiving strategy most welcome.

Stephen Taylor
***@vector.org.uk
www.vector.org.uk
W. Martin Borgert
2008-10-18 13:05:43 UTC
Post by Stephen Taylor
Any views on our production and archiving strategy most welcome.
For DocBook-to-PDF, take a look at http://dblatex.sourceforge.net/,
which supports math nicely, if I remember correctly. For the HTML
side of things, I don't know. dblatex is very powerful and easy to
customise, if you have LaTeX and/or XSL knowledge.
Albretch Mueller
2008-10-18 13:14:35 UTC
We do appreciate that DocBook is designed for technical documentation and rightly contents itself with a lower standard of typesetting than TeX's ...
Or perhaps you took the view that any kind of XML was a better medium than TeX for an archive?
~
Stephen, I would still disagree, and I will again be "kind enough" ;-)
to explain to you why in detail.
~
I think these are "skinning the cat"-types of problems, not a problem
with the cat, docbook, itself. I am not claiming to be a docbook guru,
but I am a professional (Java, ANSI C, C++) programmer and since
docbook is xml-based, I would:
~
1) determine the type of output sought, which you could get from the
client based on their user-agent/browser and/or request URL (do they
need a web page, or a pdf or ps file, ...?);
~
2) channel the docbook through custom xsl or some code using
xpath-based addressing (which is based on MSO, so you will be able to
safely point to any section of your docbook); this custom
preprocessing will, as Vincent Hennebert pointed out to me, if needed,
first produce latex and then use extensions like pstricks or
pdftricks; and then,
~
3) produce the pdf, ps, html or whatever the output should be with
the processors that are out there
~
Please notice that I described the steps as an explanation of how to
do it, not as a process spec: this does not have to happen each time a
client hits your servers. You could, and should, easily classify your
clients and (pre-)produce their output type once
~
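The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the extension table `FORMATS`, the function names, and the `render` callable are hypothetical. `render` stands in for whatever processor you wire up (custom xsl, a latex route, an FO formatter); the sketch only shows choosing the output type and producing it once.

```python
import hashlib
from pathlib import Path

# Hypothetical mapping from requested file extension to output type.
FORMATS = {".html": "html", ".pdf": "pdf", ".ps": "ps"}


def pick_format(request_path, accept_header=""):
    """Step 1: guess the output type from the request URL, then the Accept header."""
    suffix = Path(request_path).suffix.lower()
    if suffix in FORMATS:
        return FORMATS[suffix]
    if "application/pdf" in accept_header:
        return "pdf"
    return "html"  # assumed default for ordinary browsers


def cached_output(source, fmt, cache_dir, render):
    """Steps 2-3, done once: render the docbook source and reuse the result.

    `render(source_path, fmt)` is a caller-supplied function returning bytes.
    """
    source = Path(source)
    # Key the cached file on the source content so edits trigger a re-render.
    digest = hashlib.sha256(source.read_bytes()).hexdigest()[:16]
    target = Path(cache_dir) / f"{source.stem}-{digest}.{fmt}"
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(render(source, fmt))
    return target
```

A request handler would call `pick_format` on each hit and then `cached_output`, so the expensive conversion runs only the first time a given source and output type are asked for.
~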
This would effectively be a poor man's CMS that would let you have
your cake and eat it too
~
lbrtchx
