7/9/08


Java and XML: parsing XML

SAX - Simple API for XML:

SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
saxParser.parse(new ByteArrayInputStream("<aaa>bbb</aaa>".getBytes()), new DefaultHandler());

The snippet above is the simplest possible example. Next, parsing with validation using SAX.
First, a simple SAX handler:

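import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class PhoneBookHandler extends DefaultHandler {
 // Print the local name of every element as it is opened.
 public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
  System.out.println(localName);
  super.startElement(uri, localName, name, attributes);
 }
 // Report validation errors; DefaultHandler.error() does nothing on its own.
 public void error(SAXParseException e) throws SAXException {
  e.printStackTrace();
  super.error(e);
 }
}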

And main code:

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class Main {
 public static void main(String argv[]) throws Exception {
  SAXParserFactory pf = SAXParserFactory.newInstance();
  pf.setNamespaceAware(true);
  pf.setValidating(true);
  SAXParser p = pf.newSAXParser();
  p.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
  p.parse("D:/dev/Temp_java/src/com/PhoneBook.xml", new PhoneBookHandler());
 }
}

If your XML file has no schemaLocation attribute, you can set the schema location through a parser property:

p.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource", new File("%PATH_TO_YOUR_XSD%/PhoneBook.xsd"));

The parser's schemaSource property takes precedence over schemaLocation in the XML instance.
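
To make the schemaSource property easier to try out, here is a minimal self-contained sketch that validates an in-memory document against an in-memory schema. The <note>/<to> schema, the class name SchemaSourceDemo and the validate() helper are made up for illustration; only the two JAXP property names come from the text above. It assumes the parser bundled with the JDK, and that schemaLanguage is set before schemaSource.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaSourceDemo {
    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical schema: a <note> element containing exactly one <to> string.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note'><xs:complexType><xs:sequence>"
        + "<xs:element name='to' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Parses the given XML with validation on and returns the collected validation error messages.
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory pf = SAXParserFactory.newInstance();
            pf.setNamespaceAware(true);
            pf.setValidating(true);
            SAXParser p = pf.newSAXParser();
            // schemaLanguage has to be set first, then schemaSource.
            p.setProperty(SCHEMA_LANGUAGE, "http://www.w3.org/2001/XMLSchema");
            p.setProperty(SCHEMA_SOURCE, new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));
            p.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // collect instead of printing
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("<note><to>Bob</to></note>"));     // valid document
        System.out.println(validate("<note><from>Bob</from></note>")); // violates the schema
    }
}
```

The instance documents carry no schemaLocation at all; the schema comes only from the parser property. Collecting messages in the error() callback rather than throwing lets the parse run to the end and report every problem it finds.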

DOM - Document Object Model

import javax.xml.parsers.*;
import org.w3c.dom.Document;
...
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse("D:/dev/Temp_java/src/com/xml.xml");
// Assuming the first child of <Record> is a whitespace text node, its next sibling is the first child element.
System.out.println(doc.getElementsByTagName("Record").item(0).getFirstChild().getNextSibling());
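
The getFirstChild().getNextSibling() chain depends on how the file is indented. As a rough sketch of a more robust variant, the code below walks the children of the first Record element until it reaches an element node. The class name DomDemo, the firstChildElementName() helper and the sample XML are illustrative only and not taken from the original file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomDemo {

    // Returns the name of the first child element of the first <Record>, or null if there is none.
    static String firstChildElementName(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList records = doc.getElementsByTagName("Record");
            if (records.getLength() == 0) {
                return null;
            }
            // Skip text nodes (such as indentation) until an element node shows up.
            for (Node n = records.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    return n.getNodeName();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String indented = "<Book>\n <Record>\n  <name>Ann</name>\n </Record>\n</Book>";
        String compact = "<Book><Record><name>Ann</name></Record></Book>";
        System.out.println(firstChildElementName(indented));
        System.out.println(firstChildElementName(compact));
    }
}
```

Both inputs give the same answer, whereas the one-liner would return different nodes for the indented and the compact form.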

JDOM - a Java-oriented (rather than XML-oriented) representation of an XML document
JDOM is a Java-based solution for accessing, manipulating, and outputting XML data from Java code. It corresponds to JSR 102, an API for easy and efficient reading, manipulation, and writing of XML documents and XML data.

SAXBuilder builder = new SAXBuilder();
Document doc = builder.build("D:/dev/Temp_java/src/com/PhoneBook.xml");
System.out.println(doc.getRootElement().getChildren().get(0).toString());

StAX - Streaming API for XML
JSR 173 defines a pull streaming model, StAX (short for "Streaming API for XML"), for processing XML documents. In this model, unlike in SAX, the client can start, proceed, pause, and resume the parsing process. The client has complete control.
StAX implementations:
- Sun's Implementation - SJSXP
- BEA Reference Implementation
- Woodstox XML Processor
- Oracle StAX Pull Parser Preview
- Codehaus StAX
Sun Java Streaming XML Parser (SJSXP) is an implementation of the StAX API. We need sjsxp.jar and jsr173_1.0_api.jar on the classpath.

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class Main {

 public static void main(String argv[]) throws Exception {
  FileInputStream fileInputStream = new FileInputStream("D:/dev/Temp_java/src/com/PhoneBook.xml");
  XMLStreamReader xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(fileInputStream);

  while (true) {
   int event = xmlStreamReader.next();
   if (event == XMLStreamConstants.END_DOCUMENT) {
    xmlStreamReader.close();
    break;
   }
   if (event == XMLStreamConstants.START_ELEMENT) {
    System.out.println(xmlStreamReader.getLocalName());
   }
  }
 }
}

Result:

PhoneBook
BookRecord
id
name
address
email
phone
BookRecord
id
name
address
email
phone
BookRecord
id
name
address
email
phone
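
Since the client controls a pull parser, it can simply stop asking for events once it has what it needs. The sketch below illustrates that: it pulls events until the first name element and then returns without reading the rest of the document. The class name StaxPullDemo, the firstName() helper and the sample values are made up for illustration; the element names follow the output above. It uses the javax.xml.stream API that has shipped with Java SE since Java 6, so no extra jars should be needed there.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullDemo {

    // Returns the text of the first <name> element, or null if the document has none.
    static String firstName(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "name".equals(reader.getLocalName())) {
                        // Stop pulling here; the remaining records are never parsed.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<PhoneBook>"
                + "<BookRecord><id>1</id><name>Ann</name></BookRecord>"
                + "<BookRecord><id>2</id><name>Bob</name></BookRecord>"
                + "</PhoneBook>";
        System.out.println(firstName(xml));
    }
}
```

With SAX the parser drives the callbacks, so stopping early typically means throwing an exception out of the handler; with StAX the loop just ends.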

Apache Axiom - AXis Object Model - an XML object model that uses StAX as its underlying XML parsing mechanism.
The XML infoset is the information contained in an XML document, and for programmatic manipulation it is convenient to have a language-specific representation of that infoset. For an object-oriented language the obvious choice is a model made up of objects. DOM and JDOM are two such XML models. Axiom is one too, but it uses "pull parsing", a recent trend in XML processing. Axiom is based on StAX (JSR 173), which is the standard streaming pull parser API. Axiom needs JAXP 1.3, so either use Java 5 or set -Djava.endorsed.dirs=D:/env/xml/jaxp-1_3.

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import org.apache.axiom.om.OMElement;
import org.apache.axiom.om.impl.builder.StAXOMBuilder;

public class Main {

 public static void main(String argv[]) throws Exception {
  String xmlFName = "D:/dev/Temp_java/src/com/PhoneBook.xml";

  XMLStreamReader parser = XMLInputFactory.newInstance().createXMLStreamReader(new FileInputStream(xmlFName));
  StAXOMBuilder builder = new StAXOMBuilder(parser);
  OMElement documentElement = builder.getDocumentElement();
  System.out.println(documentElement.getChildElements().next());
 }
}

Xerces - Apache parsers that support the standard APIs - the most popular SAX and DOM parser.
Xerces is a family of software packages for parsing and manipulating XML. It provides both XML parsing and generation.
Creating a DOM Parser:

import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;

public class Main {

 public static void main(String argv[]) throws Exception {
  String xmlFile = "D:/dev/Temp_java/src/com/PhoneBook.xml";
  DOMParser parser = new DOMParser();
  parser.parse(xmlFile);
  Document document = parser.getDocument();
  System.out.println(document.getChildNodes().item(0).getNodeName());
 }
}

Result:

pb:PhoneBook

Creating a SAX Parser:

import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.Locator;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.ParserFactory;

public class Main implements DocumentHandler {

 public static void main(String argv[]) throws Exception {
  String xmlFile = "D:/dev/Temp_java/src/com/PhoneBook.xml";
  String parserClass = "org.apache.xerces.parsers.SAXParser";
  Parser parser = ParserFactory.makeParser(parserClass);
  parser.setDocumentHandler(new Main());
  parser.parse(xmlFile);
 }

 public void characters(char[] ch, int start, int length) throws SAXException {}
 public void endDocument() throws SAXException {}
 public void endElement(String name) throws SAXException {}
 public void ignorableWhitespace(char[] ch, int start, int length){}
 public void processingInstruction(String target, String data)throws SAXException {}
 public void setDocumentLocator(Locator locator) {}
 public void startDocument() throws SAXException {}
 public void startElement(String name, AttributeList atts) throws SAXException {
  System.out.println(name);
 }
}

Result:

pb:PhoneBook
pb:BookRecord
pb:id
pb:name
pb:address
pb:email
pb:phone
pb:BookRecord
pb:id
pb:name
pb:address
pb:email
pb:phone
pb:BookRecord
pb:id
pb:name
pb:address
pb:email
pb:phone

Crimson XML – A faster SAX and DOM parser
...
Sparta XML – A fast and small SAX and DOM parser that also includes an XPath subset
...
StelsXML is a JDBC type 4 driver that lets you run SQL queries and other JDBC operations on XML files
...
