Extensible Hypertext Markup Language (XHTML)


HTML as an XML Application

Why XHTML?

Requirements for XHTML-Conforming Documents

Example

<?xml version="1.0"?>
<!DOCTYPE html 
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Browser Title</title>
</head>
<body>
<h1>Document Title</h1>
<p>
A paragraph <br />
on two lines.
</p>
<math xmlns="http://www.w3.org/1998/Math/MathML">
   ... content in MathML ...
</math>
</body>
</html>
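
Because an XHTML document is an XML document, any XML parser can check it for well-formedness. The following is a minimal sketch in Python using the standard library module xml.etree.ElementTree. The sample strings and the helper names is_well_formed and element_names are illustrative assumptions, and the namespace URI is assumed to be the one from the final XHTML 1.0 Recommendation. The sketch checks well-formedness only; it does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the final XHTML 1.0 Recommendation.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# A small document in the spirit of the example above (illustrative only).
WELL_FORMED = f"""<html xmlns="{XHTML_NS}">
<head><title>Browser title</title></head>
<body><p>A paragraph <br />on two lines.</p></body>
</html>"""

# HTML-style empty element without the closing slash: accepted in HTML 4,
# but not well-formed XML.
NOT_WELL_FORMED = WELL_FORMED.replace("<br />", "<br>")


def is_well_formed(text):
    """Return True if text parses as XML (well-formedness only, no DTD validation)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def element_names(text):
    """Return the namespace-qualified tag of every element in document order."""
    return [element.tag for element in ET.fromstring(text).iter()]


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(NOT_WELL_FORMED))
    print(element_names(WELL_FORMED))
```

ElementTree reports each element as {namespace}localname, which makes visible that the html elements and any embedded vocabulary such as MathML live in different namespaces.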

Behavior of XML User Agents (Browsers)

Differences from HTML 4.0

This file in XHTML.

Tips and Notes

But ...

From: Tim Berners-Lee <timbl@w3.org>
To: w3c-ac-members@w3.org <w3c-ac-members@w3.org>
Date: Wednesday, November 03, 1999 11:22 AM
Subject: W3C: XHTML 1.0 returned to HTML WG

Dear Advisory Committee Member,

XHTML 1.0 is hereby sent back to the HTML working group for further work.

On 24th August 1999, we asked for review of the XHTML 1.0 Proposed
Recommendation. The review period lasted until September 22nd. It
was encouraging to see so many member organizations planning on
delivering products supporting XHTML.

There was, however, a significant lack of consensus around a number of
points, based on the feedback received during the review. The HTML
working group is being asked to address the issues raised and to
present a revised specification for further member review as soon
as possible.

In summary, W3C Members wanted the HTML working group to revise
the XHTML 1.0 specification to utilize a single namespace.
There are a number of separate questions involved in this,
such as whether a namespace identifier should be changed
(a) between versions of the same specification and (b) between
different strict, transitional and frameset document types of the
original HTML 4 spec on which xHTML is based.

A few respondents were also concerned about the use of the text/xml
media type for delivering xHTML, considering this to be "premature".
If a document conforming to XML 1.0 and XML Namespaces is not to be
considered "text/xml", this raises an important issue as to what is.

Whatever decision is made, XHTML as a specification must of
course define some conformance phrase (what it is that it is defining)
and the constraints on and meaning of a conforming document.

We will be sending out a separate message on the HTML 4.01
specification, which was reviewed at the same time as XHTML 1.0.

Tim Berners-Lee
Director W3C
Source: http://www.lists.ic.ac.uk/hypermail/xml-dev/xml-dev-Nov-1999/0106.html

How the W3C Works

Why the Withdrawal?

Outlook


HTML Tidy

Examples of How HTML Tidy Works

An example of bad HTML, bad.html, and the result after processing it with HTML Tidy, good.html.

Error Messages from the Example

> tidy  exam/bad.html >exam/good.html

Tidy (vers 19th October 1999) Parsing "exam/bad.html"
line 3 column 1 - Warning: inserting missing 'title' element
line 5 column 2 - Warning: replacing unexpected <h2> by </h1>
line 5 column 37 - Warning: discarding unexpected </h3>
line 7 column 42 - Warning: replacing unexpected </i> by </b>
line 8 column 15 - Warning: replacing unexpected </b> by </i>
line 10 column 45 - Warning: missing </i> before </h2>
line 12 column 4 - Warning: inserting implicit <i>
line 14 column 2 - Warning: missing </i> before <p>
line 14 column 4 - Warning: inserting implicit <i>
line 14 column 43 - Warning: discarding unexpected <a>
line 16 column 2 - Warning: missing </a> before <li>
line 16 column 2 - Warning: missing </i> before <li>
line 16 column 2 - Warning: inserting implicit <ul>
line 24 column 1 - Warning: unknown attribute "tidy"
line 30 column 1 - Warning: <img> lacks "alt" attribute

"exam/bad.html" appears to be HTML proprietary
15 warnings/errors were found!

The alt attribute should be used to give a short description
of an image; longer descriptions should be given with the
longdesc attribute which takes a URL linked to the description.
These measures are needed for people using non-graphical browsers.

For further advice on how to make your pages accessible
see "http://www.w3.org/WAI/GL". You may also want to try
"http://www.cast.org/bobby/" which is a free Web-based
service for checking URLs for accessibility.

HTML & CSS specifications are available from http://www.w3.org/
To learn more about Tidy see http://www.w3.org/People/Raggett/tidy/
Please send bug reports to Dave Raggett care of <html-tidy@w3.org>
Lobby your company to join W3C, see http://www.w3.org/Consortium
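
The diagnostics in the output above follow a regular "line N column M - Warning: message" pattern, so they are easy to post-process. The following Python sketch extracts them from captured Tidy output. The names DIAGNOSTIC, parse_diagnostics and count_by_level are hypothetical, the sample text is a shortened copy of the output shown above, and accepting "Error" as a second level is an assumption not demonstrated by that output.

```python
import re
from collections import Counter

# Matches lines such as: line 5 column 37 - Warning: discarding unexpected </h3>
# Assumption: errors, if any, use the same layout with "Error" as the level.
DIAGNOSTIC = re.compile(r"^line (\d+) column (\d+) - (Warning|Error): (.*)$")

# Shortened copy of the Tidy output shown above.
SAMPLE_OUTPUT = """\
Tidy (vers 19th October 1999) Parsing "exam/bad.html"
line 3 column 1 - Warning: inserting missing 'title' element
line 5 column 37 - Warning: discarding unexpected </h3>
line 30 column 1 - Warning: <img> lacks "alt" attribute

"exam/bad.html" appears to be HTML proprietary
"""


def parse_diagnostics(output):
    """Return (line, column, level, message) tuples for every diagnostic line."""
    found = []
    for line in output.splitlines():
        match = DIAGNOSTIC.match(line.strip())
        if match:
            line_no, column, level, message = match.groups()
            found.append((int(line_no), int(column), level, message))
    return found


def count_by_level(diagnostics):
    """Count diagnostics per level, e.g. how many warnings were reported."""
    return Counter(level for _, _, level, _ in diagnostics)


if __name__ == "__main__":
    for entry in parse_diagnostics(SAMPLE_OUTPUT):
        print(entry)
```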

Invoking and Using HTML Tidy

> tidy  [[options] files]*

tidy: file1 file2 ...
Utility to clean up & pretty print html files
see http://www.w3.org/People/Raggett/tidy/

options for tidy released on 19th October 1999
  -config <file>  set options from config file
  -indent or -i   indent element content
  -omit   or -o   omit optional endtags
  -wrap 72        wrap text at column 72 (default is 68)
  -upper  or -u   force tags to upper case (default is lower)
  -clean  or -c   replace font, nobr & center tags by CSS
  -raw            leave chars > 128 unchanged upon output
  -ascii          use ASCII for output, Latin-1 for input
  -latin1         use Latin-1 for both input and output
  -iso2022        use ISO2022 for both input and output
  -utf8           use UTF-8 for both input and output
  -mac            use the Apple MacRoman character set
  -numeric or -n  output numeric rather than named entities
  -modify or -m   to modify original files
  -errors or -e   only show errors
  -quiet or -q    suppress nonessential output
  -f <file>       write errors to <file>
  -xml            use this when input is wellformed xml
  -asxml          to convert html to wellformed xml
  -slides         to burst into slides on h2 elements
  -help   or -h   list command line options
Input/Output default to stdin/stdout respectively
Single letter options apart from -f may be combined
as in:  tidy -f errs.txt -imu foo.html
For further info on HTML see http://www.w3.org/MarkUp
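
When Tidy is called from a script rather than typed by hand, the command line can be assembled from the options listed above. The following Python sketch shows one way to do that. The function names build_tidy_command and run_tidy and their parameters are hypothetical; only the flags themselves (-f, -indent, -modify, -upper, -asxml) are taken from the help text above. run_tidy assumes that a tidy executable is available on the search path.

```python
import subprocess


def build_tidy_command(input_file, indent=False, modify=False, upper=False,
                       as_xml=False, error_file=None):
    """Build an argument list for tidy from a few of the documented flags."""
    command = ["tidy"]
    if error_file is not None:
        command += ["-f", error_file]   # write errors to this file
    if indent:
        command.append("-indent")       # indent element content
    if modify:
        command.append("-modify")       # modify the original file in place
    if upper:
        command.append("-upper")        # force tags to upper case
    if as_xml:
        command.append("-asxml")        # convert HTML to well-formed XML
    command.append(input_file)
    return command


def run_tidy(command):
    """Run tidy and capture its output; requires a tidy executable on PATH."""
    return subprocess.run(command, capture_output=True, text=True)


if __name__ == "__main__":
    # Long-option equivalent of the help text's: tidy -f errs.txt -imu foo.html
    print(build_tidy_command("foo.html", indent=True, modify=True,
                             upper=True, error_file="errs.txt"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.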

Some Important Options

markup: yes, no
Generate the corrected markup.
wrap: number
Wrap lines at the given column. 0 = disabled.
input-xml: yes, no
Read the input as XML.
output-xml: yes, no
Write the output as XML.
output-xhtml: yes, no
Write the output as XHTML.
doctype: omit, auto, strict, loose or <fpi>
Set the DOCTYPE in the output.
char-encoding: raw, ascii, latin1, utf8 or iso2022
Set the character encoding of the output.
fix-backslash: yes, no
Converts "\" in URLs to "/".
word-2000: yes, no
Attempts to remove the clutter produced by Word 2000.
clean: yes, no
Attempts to replace superfluous presentational markup with style rules (CSS) or structural markup.
logical-emphasis: yes, no
Replaces i with em and b with strong; implies clean.
enclose-text: yes, no
Wraps text at body level in paragraph elements. Important for style sheets to work properly.
split: yes, no
Splits the file at h2 elements into individual "slides".
new-empty-tags: tag1, tag2, tag3
new-inline-tags: tag1, tag2, tag3
new-blocklevel-tags: tag1, tag2, tag3
new-pre-tags: tag1, tag2, tag3
Define new tags of the corresponding kind.

Example Configuration File

/* HTML Tidy configuration file */
markup: yes
wrap: 0
doctype: strict
break-before-br: yes
logical-emphasis: yes
enclose-text: yes
/* eof */
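
The configuration file above is a list of "name: value" lines. To inspect or compare such files from a script, they can be read into a dictionary, as in the following Python sketch. The function name parse_tidy_config is hypothetical, and the comment handling (skipping lines that start with "/*", "//" or "#") is an assumption made for this sketch, not a statement about how Tidy itself parses the file.

```python
# Copy of the configuration file shown above.
CONFIG = """\
/* HTML Tidy configuration file */
markup: yes
wrap: 0
doctype: strict
break-before-br: yes
logical-emphasis: yes
enclose-text: yes
/* eof */
"""


def parse_tidy_config(text):
    """Parse 'name: value' lines into a dict; values are kept as strings."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and comment-like lines carry no option.
        if not line or line.startswith(("/*", "//", "#")):
            continue
        name, separator, value = line.partition(":")
        if not separator:
            continue  # not a 'name: value' line
        options[name.strip()] = value.strip()
    return options


if __name__ == "__main__":
    for name, value in parse_tidy_config(CONFIG).items():
        print(f"{name} = {value}")
```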

Work in Progress


© Universität Mannheim, Rechenzentrum, 1998-2000.

Heinz Kredel
Last modified: Wed Feb 23 22:08:41 MET 2000