Parsing XML files in Perl using XML::Simple


The Perl XML::Simple module provides an easy way to parse XML files. Its XMLin function reads a whole XML file into a nested hash structure (returned as a hash reference), which can then be used to access the XML entries. This article uses a very simple example to explain how to use the XML::Simple module. The example script converts a catalog of books stored in an XML file into an HTML table.
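
As a minimal sketch of what XMLin returns, the snippet below parses a small inline XML string and reads the values back from the resulting hash reference. The <note> element and its contents are made up for illustration and are not part of the catalog example. With default options, the root element is dropped, and attributes and child elements both become hash keys.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical input, used only for illustration.
my $xml = '<note lang="en"><to>Alice</to><body>Hello</body></note>';

# XMLin accepts a file name or a string of XML and returns a hash reference.
my $note = XMLin($xml);

# The attribute (lang) and the child elements (to, body) become hash keys.
print "$note->{to}: $note->{body} ($note->{lang})\n";
```

Run as is, this should print "Alice: Hello (en)".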


A simple XML parser using XML::Simple

As an example, let's consider the following XML file.

<?xml version="1.0"?>
<catalog>
 
  <book id="bk111">
      <author>O'Brien, Tim</author>
      <title>MSXML3: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-01</publish_date>
      <description>The Microsoft MSXML3 parser is covered in
      detail, with attention to XML DOM interfaces, XSLT processing,
      SAX and more.</description>
  </book>
 
  <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <genre>Horror</genre>
      <price>4.95</price>
      <publish_date>2000-12-06</publish_date>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
  </book>
 
  <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters,
      battle one another for control of England. Sequel to
      Oberon's Legacy.</description>
  </book>
 
</catalog>

The above XML file is a database that stores a catalog of books.
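
To see how XML::Simple represents such a catalog, the following sketch parses a shortened, two-book version of it from an inline string (an assumption made to keep the example self-contained) and lists what ends up under the book key. With default options, repeated <book> elements that carry an id attribute are folded into a hash keyed on that id.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Shortened, two-book version of the catalog (illustration only).
my $xml = <<'XML';
<catalog>
  <book id="bk111"><author>O'Brien, Tim</author><title>MSXML3: A Comprehensive Guide</title></book>
  <book id="bk108"><author>Knorr, Stefan</author><title>Creepy Crawlies</title></book>
</catalog>
XML

my $doc = XMLin($xml);

# $doc->{book} is a hash reference: id => { author => ..., title => ... }.
# The keys are sorted here only to get a stable listing.
for my $id (sort keys %{ $doc->{book} }) {
    print "$id => $doc->{book}{$id}{title}\n";
}
```

This is why a book's fields can be reached as $doc->{book}->{$id}->{title}, where $id is the value of the id attribute.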

Using the XML::Simple module, we are going to implement a simple Perl script that parses the XML file and renders its contents as an HTML table.

This is the script:

use strict;
use warnings;
use XML::Simple;

my $file = './file.xml';

# Create a new parser object.
my $parser = XML::Simple->new();

# Parse the input file. $doc is a reference to a nested hash
# that mirrors the structure of the XML document.
my $doc = $parser->XMLin($file);

# Open the output file and stop with a message if that fails.
open(my $html, '>', 'output.html')
    or die "Cannot open output.html: $!";

print $html "<html>\n";
print $html "<body>\n";

print $html "<table border=1px cellpadding=5px>\n";
print $html "<tr bgcolor=\"#a6a6a6\">\n";

# Header row
print $html "<td style=\"font-weight:bold\">Title </td>\n";
print $html "<td style=\"font-weight:bold\">Author</td>\n";
print $html "<td style=\"font-weight:bold\">Description</td>\n\n";

print $html "</tr>\n";

# XML::Simple folds the <book> elements into a hash keyed on their
# id attribute, so each book is reached through its id.
# Note that keys returns the ids in no particular order.
foreach my $key (keys %{ $doc->{book} }) {
    my $book = $doc->{book}->{$key};

    print $html "<tr>\n";

    # For each book, display the title, author and description.
    # print is used instead of printf so that a '%' in the data
    # is not mistaken for a format directive.
    print $html "<td>" . $book->{title} . "</td>\n";
    print $html "<td>" . $book->{author} . "</td>\n";
    print $html "<td>" . $book->{description} . "</td>\n";

    print $html "</tr>\n";
}

print $html "</table>\n";
print $html "</body>\n";
print $html "</html>\n";
close($html) or die "Cannot close output.html: $!";
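
The script reaches each book through a hash keyed on its id, and Perl hashes do not keep their keys in document order. If the order of the books in the XML file matters, one option is to ask XML::Simple for an array instead. The sketch below uses the KeyAttr and ForceArray options for that; the shortened inline catalog is an assumption made to keep the example self-contained.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Shortened version of the catalog (illustration only).
my $xml = <<'XML';
<catalog>
  <book id="bk111"><title>MSXML3: A Comprehensive Guide</title></book>
  <book id="bk108"><title>Creepy Crawlies</title></book>
  <book id="bk105"><title>The Sundered Grail</title></book>
</catalog>
XML

# KeyAttr => []          : do not fold <book> elements into a hash on "id"
# ForceArray => ['book'] : always return an array, even for a single <book>
my $doc = XMLin($xml, KeyAttr => [], ForceArray => ['book']);

# $doc->{book} is now an array reference, in document order,
# and each book keeps its id as an ordinary hash key.
my @ids = map { $_->{id} } @{ $doc->{book} };
print join(', ', @ids), "\n";
```

Here the ids come back as bk111, bk108, bk105, the same order as in the file. ForceArray also avoids a common surprise: without it, a catalog that contains a single book is not returned as a list at all.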

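The output is as follows (the rows may come out in a different order, because Perl does not keep hash keys in any particular order):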

<html>
<body>
<table border=1px cellpadding=5px>
<tr bgcolor="#a6a6a6">
<td style="font-weight:bold">Title </td>
<td style="font-weight:bold">Author</td>
<td style="font-weight:bold">Description</td>
</tr>
 
<tr>
<td>MSXML3: A Comprehensive Guide</td>
<td>O'Brien, Tim</td>
<td>The Microsoft MSXML3 parser is covered in
      detail, with attention to XML DOM interfaces, XSLT processing,
      SAX and more.</td>
</tr>
<tr>
<td>Creepy Crawlies</td>
<td>Knorr, Stefan</td>
<td>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</td>
</tr>
<tr>
<td>The Sundered Grail</td>
<td>Corets, Eva</td>
<td>The two daughters of Maeve, half-sisters,
      battle one another for control of England. Sequel to
      Oberon's Legacy.</td>
</tr>
</table>
 
</body>
</html>

Rendered in a browser, the table looks as follows:

Title | Author | Description
MSXML3: A Comprehensive Guide | O'Brien, Tim | The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more.
Creepy Crawlies | Knorr, Stefan | An anthology of horror stories about roaches, centipedes, scorpions and other insects.
The Sundered Grail | Corets, Eva | The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon's Legacy.
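
One thing the example does not cover: XML::Simple decodes entities such as &amp; and &lt; into plain characters, so a title or description containing them would be written into the HTML unescaped. A minimal sketch of an escaping step follows; escape_html is a hypothetical helper introduced here, not part of XML::Simple.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
    );
    # Replace each special character with its entity in a single pass.
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

print escape_html('Tom & Jerry <uncut>'), "\n";
```

The call above prints "Tom &amp;amp; Jerry &amp;lt;uncut&amp;gt;" as it would appear in the HTML source, that is, the text Tom &amp; Jerry &lt;uncut&gt;. In the script, each value would be passed through such a helper before being placed inside a <td> element.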


