www.destructor.de


XML Parser FAQ

Frequently Asked Questions about the XML Parser

↓ How can I write XML with the parser?

↓ I have Empty-Element-Tags in my XML (like <br/>) -- how can I parse them?

↓ How can I retrieve the attribute values from my start tags/empty-element tags?

↓ Why are the attribute functions case sensitive?

↓ What is the difference between TXmlParser and TXmlScanner?

↓ Does the parser work with C++Builder, Kylix, C#, Delphi for .NET?

↓ How can I find out the line number of the current line?

↓ How can I find out the depth of a node in the XML structure?

↓ How can I process UTF-8 based XML files?

↓ What's the difference between Text Content and CDATA sections?

↓ How can I display a progress bar during parsing?

↓ Can I use the parser for commercial projects?


Q: How can I write XML with the parser?

A: The parser is just that: a parser. There is nothing there to write XML. (But hey, you can read XML quite fast ;-)

As XML is so simple, my preferred method for writing XML is Writeln.
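The one thing Writeln does not do for you is escape characters that have a special meaning in XML. A minimal sketch of how this could look is shown below. EscapeXml is a hypothetical helper written for this example and is not part of the parser, and the element and attribute names are made up:

```pascal
program WriteXmlDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper (not part of the parser): replaces the characters
// that have a special meaning in XML with their predefined entities.
function EscapeXml (const S : string) : string;
var
  I : Integer;
begin
  Result := '';
  for I := 1 to Length (S) do
    case S [I] of
      '&' : Result := Result + '&amp;';
      '<' : Result := Result + '&lt;';
      '>' : Result := Result + '&gt;';
      '"' : Result := Result + '&quot;';
      else  Result := Result + S [I];
    end;
end;

begin
  Writeln ('<?xml version="1.0"?>');
  Writeln ('<book id="', EscapeXml ('42'), '">');
  Writeln ('  <title>', EscapeXml ('Tom & Jerry <uncut>'), '</title>');
  Writeln ('</book>');
end.
```

The sketch assumes that attribute values are always written in double quotes, so the apostrophe does not need to be escaped.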


Q: I have Empty-Element-Tags in my XML (like <br/>) -- how can I parse them?

A: They are handled as special parts:

TXmlParser will report them as a ptEmptyTag part type.

TXmlScanner will report them as an OnEmptyTag event.


Q: How can I retrieve the attribute values from my start tags/empty-element tags?

A: TXmlParser has a property named CurAttr and TXmlScanner passes an Attributes parameter, both of which are of type TAttrList. You can access the attributes by name or index from there:

Edit1.Text := CurAttr.Value ('name');   // To get the value of the 'name' attribute
Name := CurAttr.Name (0);               // To get the name of the first attribute
Value := CurAttr.Value (0);             // To get the value of the first attribute

The number of attributes can (of course) be found in the .Count property of TAttrList.
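With .Count and the index-based functions you can walk through all attributes of a tag. The following is only a sketch. It assumes that the unit is named LibXmlParser, that StartScan has to be called before the first Scan, and that CurName holds the name of the current tag; please check these against the documentation of your version. The file name is taken from the command line.

```pascal
program ListAttributes;

{$APPTYPE CONSOLE}

uses
  LibXmlParser;   // assumption: the unit that declares TXmlParser and TAttrList

var
  XP : TXmlParser;
  I  : Integer;
begin
  XP := TXmlParser.Create;
  try
    XP.LoadFromFile (ParamStr (1));   // file name from the command line
    XP.StartScan;                     // assumption: prepares the parser for scanning
    while XP.Scan do
      if XP.CurPartType in [ptStartTag, ptEmptyTag] then
        // Attribute indexes run from 0 to Count - 1
        for I := 0 to XP.CurAttr.Count - 1 do
          Writeln (XP.CurName, ': ', XP.CurAttr.Name (I), ' = "', XP.CurAttr.Value (I), '"');
  finally
    XP.Free;
  end;
end.
```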


Q: Why are the attribute functions case sensitive?

A: Everything in XML is case sensitive by definition. XML is designed for Unicode applications (XML 1.1 extends this even further), so all XML element, entity and attribute names are case sensitive. Many non-Latin scripts have no concept of upper and lower case, so it would be hard to define a case-insensitive comparison of two strings.

(I know that this is hard for Delphi language programmers, especially on a PC. But that's the XML way ...)


Q: What is the difference between TXmlParser and TXmlScanner?

A: TXmlParser is a Delphi CLASS for easy parsing of an XML file. You have to create an instance of this class and use its methods and properties to read your XML.

TXmlScanner is a VCL wrapper for TXmlParser. So you have a non-visual component which represents the parser and you have events like "OnStartTag" which are fired when there is a start tag found in the XML. To start parsing, you have to call the Execute method.

Advantage of TXmlScanner: it is very easy to use. Put the component on your form, fill in the event handlers, and call .Execute.

Advantage of TXmlParser: you have a local loop, so you can handle everything in local variables. 

To sum it up: Use TXmlParser for serious work and TXmlScanner for "quick hacks".


Q: Does the parser work with C++Builder, Kylix, C#, Delphi for .NET?

A: 

C++Builder: Yes

Kylix: Yes

C#/.NET: No -- there's too much specific code inside (this is why it's so fast ...)


Q: How can I find out the line number of the current line?

A: There is nothing in the parser to find out the line number. So you'll have to code that on your own.

(Note that the XML specification requires all line breaks to be converted to Linefeed (#x0A) characters before parsing. The parser does not do this, so your own code will see CR, LF and CR LF line breaks as they are in the file.)
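One way to code this yourself is to count the line breaks between the start of the document and the current position. The sketch below shows the idea. LineNumberAt is a hypothetical helper written for this example, not something the parser provides, and the small test document is made up:

```pascal
program LineNumberDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: returns the 1-based line number of the character
// at Pos by counting the line breaks between DocStart and Pos.
// CR LF, a lone CR and a lone LF each count as one line break.
function LineNumberAt (DocStart, Pos : PChar) : Integer;
var
  P : PChar;
begin
  Result := 1;
  P := DocStart;
  while (P < Pos) and (P^ <> #0) do begin
    if P^ = #13 then begin
      Inc (Result);
      if P [1] = #10 then   // CR LF is a single line break
        Inc (P);
    end
    else if P^ = #10 then
      Inc (Result);
    Inc (P);
  end;
end;

const
  Doc : PChar = '<a>'#13#10'<b/>'#10'</a>';
begin
  Writeln (LineNumberAt (Doc, Doc + 5));   // Doc + 5 is the "<" of "<b/>", which is on line 2
end.
```

With the parser you would call it as LineNumberAt (XP.DocBuffer, XP.CurStart). This rescans the document from the beginning on every call, so for large files it is probably better to remember the last position and line number and continue counting from there.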


Q: How can I find out the depth of a node in the XML structure?

A: There's nothing there for this. You'll have to code this on your own. I don't think that this is really bad, because then you can code it in a way that suits your application best.
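A simple way to do it is a counter that goes up on every start tag and down on every end tag. The following is a sketch only. It assumes that the unit is named LibXmlParser, that StartScan has to be called before the first Scan, that end tags are reported as ptEndTag, and that CurName holds the name of the current tag. The Depth variable is of course my own and not part of the parser.

```pascal
program ShowDepth;

{$APPTYPE CONSOLE}

uses
  LibXmlParser;   // assumption: the unit that declares TXmlParser

var
  XP    : TXmlParser;
  Depth : Integer;    // number of currently open elements
begin
  XP := TXmlParser.Create;
  try
    XP.LoadFromFile (ParamStr (1));   // file name from the command line
    XP.StartScan;                     // assumption: prepares the parser for scanning
    Depth := 0;
    while XP.Scan do
      case XP.CurPartType of
        ptStartTag : begin
                       Inc (Depth);                          // element opened
                       Writeln (Depth, ': ', XP.CurName);
                     end;
        ptEmptyTag : Writeln (Depth + 1, ': ', XP.CurName);  // opens and closes at once
        ptEndTag   : Dec (Depth);                            // element closed
      end;
  finally
    XP.Free;
  end;
end.
```

In this sketch the root element has depth 1. If your application needs the whole path to the current node instead of just a number, a string list used as a stack can replace the counter.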


Q: How can I process UTF-8 based XML files?

A for Version 2: Just parse your file.

A for Version 1: The virtual method TXmlParser.TranslateEncoding is responsible for transcoding from the source character set of your XML to the destination character set of your application.

The default method tries to translate UTF-8 to Windows-1252, which is not a good idea if you use characters outside the Windows-1252 range. You should override TranslateEncoding with a method that just passes UTF-8 through:

FUNCTION TMyOwnXmlParser.TranslateEncoding (CONST Source : STRING) : STRING;  // OVERRIDE;
BEGIN
  Result := Source;
END;

In this case, your application must be prepared to process UTF-8 strings.


Q: What's the difference between Text Content and CDATA sections?

A: As an application programmer who reads XML, you can treat them the same.

CDATA sections are there to help with text content that has a lot of characters which would have to be escaped otherwise. However, as there is the character sequence ]]> which is not allowed in CDATA sections (because it terminates them), you'll have to be careful, too. So, for me, there's no use for CDATA sections when you write XML.
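If you decide to write CDATA sections anyway, the usual workaround for ]]> is to split it across two adjacent CDATA sections. The sketch below shows this. WrapCData is a hypothetical helper written for this example and is not part of the parser:

```pascal
program CDataDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;   // StringReplace

// Hypothetical helper: wraps S in a CDATA section. Every "]]>" inside S
// is split across two adjacent CDATA sections, so it cannot terminate
// the section too early.
function WrapCData (const S : string) : string;
begin
  Result := '<![CDATA[' +
            StringReplace (S, ']]>', ']]]]><![CDATA[>', [rfReplaceAll]) +
            ']]>';
end;

begin
  Writeln (WrapCData ('if (a[b[0]]>1) then'));
end.
```

A reader that treats text content and CDATA sections the same gets the original string back when it concatenates the two sections.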


Q: How can I display a progress bar during parsing?

A: There are three important member variables for this:

DocBuffer (PChar) points to the first character of your XML

CurStart (PChar) always points to the first character of the part you are currently scanning

CurFinal (PChar) always points to the last character of the part you are currently scanning

So, for example, for a ptStartTag, CurStart points to the opening angle bracket (<) and CurFinal points to the closing angle bracket (>) of your tag.

You can use the difference between CurFinal and DocBuffer to show the progress of parsing:

var
  DocSize  : integer;   // Document Size in bytes
  Progress : integer;   // Progress in percent
begin
  XP.LoadFromFile (...);
  DocSize := StrLen (XP.DocBuffer);
  while XP.Scan do begin
    case XP.CurPartType of
      ...
      end;
    // Position of the last scanned character relative to the whole document
    Progress := Trunc ((XP.CurFinal - XP.DocBuffer) / DocSize * 100.0);
    end;
end;

The progress is calculated using PChar arithmetic: subtracting DocBuffer from CurFinal yields an integer value giving the "distance" between the two characters. This distance is divided by DocSize and multiplied by 100. (The 100 is written as "100.0" to make it obvious that this is a real type expression. The / operator yields a real result in Delphi anyway.)


Q: Can I use the parser for commercial projects?

A: Yes, there is no limitation and no royalty. You are not obliged to publish your source code or your work.

The XML parser is subject to my own DSL Licence, which says that you can do practically everything with my code.