Parsing XML files in Perl using XML::Simple
The Perl XML::Simple module provides an easy way to parse XML files. Its XMLin function converts the whole XML file into a nested Perl data structure, returned as a hash reference, which can then be used to access the XML entries. This article uses a very simple example to explain how to use the XML::Simple module. The example script converts a catalog of books stored in an XML file into an HTML table.
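To make the idea concrete before the full example, here is a minimal sketch of XMLin applied to a short XML string instead of a file. The `<note>` document and its element names are made up for illustration and are not part of the book catalog used in this article.

```perl
use strict;
use warnings;
use XML::Simple;

# A tiny, made-up XML document used only for illustration
my $xml = '<note><to>Alice</to><from>Bob</from></note>';

# XMLin accepts a filename or a string of XML and returns a hash reference.
# By default the root element (<note>) is discarded.
my $data = XMLin($xml);

print "To:   $data->{to}\n";
print "From: $data->{from}\n";
```

Each child element becomes a key of the returned hash, which is the same access pattern the catalog script uses.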
A simple XML parser using XML::Simple
As an example, let's consider the following XML file.
    <?xml version="1.0"?>
    <catalog>
      <book id="bk111">
        <author>O'Brien, Tim</author>
        <title>MSXML3: A Comprehensive Guide</title>
        <genre>Computer</genre>
        <price>36.95</price>
        <publish_date>2000-12-01</publish_date>
        <description>The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more.</description>
      </book>
      <book id="bk108">
        <author>Knorr, Stefan</author>
        <title>Creepy Crawlies</title>
        <genre>Horror</genre>
        <price>4.95</price>
        <publish_date>2000-12-06</publish_date>
        <description>An anthology of horror stories about roaches, centipedes, scorpions and other insects.</description>
      </book>
      <book id="bk105">
        <author>Corets, Eva</author>
        <title>The Sundered Grail</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <publish_date>2001-09-10</publish_date>
        <description>The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon's Legacy.</description>
      </book>
    </catalog>
The above XML file is a database that stores a catalog of books.
Using the XML::Simple module, we are going to implement a simple Perl script that parses the XML file and renders its contents as an HTML table.
This is the script:
    use strict;
    use warnings;
    use XML::Simple;

    my $file = './file.xml';

    # Create a new parser object
    my $parser = XML::Simple->new();

    # Parse the input file; XMLin returns a hash reference, stored in $doc
    my $doc = $parser->XMLin($file);

    open(my $html, '>', 'output.html') or die "Cannot open output.html: $!";

    print $html "<html>\n";
    print $html "<body>\n";
    print $html "<table border=1px cellpadding=5px>\n";
    print $html "<tr bgcolor=\"#a6a6a6\">\n";
    print $html "<td style=\"font-weight:bold\">";
    print $html "Title </td>\n";
    print $html "<td style=\"font-weight:bold\">";
    print $html "Author</td>\n";
    print $html "<td style=\"font-weight:bold\">";
    print $html "Description</td>\n\n";
    print $html "</tr>\n";

    # XML::Simple folds the <book> elements into a hash keyed on their
    # id attribute, so each book is reached through its id.
    # Note that keys() returns the ids in no particular order.
    foreach my $key (keys %{ $doc->{book} }) {
        print $html "<tr>\n";
        # For each book we display the title, author and description
        print $html "<td>" . $doc->{book}->{$key}->{title} . "</td>\n";
        print $html "<td>" . $doc->{book}->{$key}->{author} . "</td>\n";
        print $html "<td>" . $doc->{book}->{$key}->{description} . "</td>\n";
        print $html "</tr>\n";
    }

    print $html "</table>\n";
    print $html "</body>\n";
    print $html "</html>\n";

    close($html);
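The script above relies on XML::Simple's default behaviour of folding repeated elements into a hash keyed on their `id` attribute. As far as I understand the module, that folding only applies when an element occurs more than once, so a catalog with a single book would produce a different structure. The sketch below shows one way to make the structure predictable with the `ForceArray` and `KeyAttr` options. The one-book catalog is a made-up test input, and sorting the keys is an addition for stable row order, not something the original script does.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical one-book catalog, used only to illustrate the edge case
my $xml = <<'XML';
<catalog>
  <book id="bk111">
    <author>O'Brien, Tim</author>
    <title>MSXML3: A Comprehensive Guide</title>
  </book>
</catalog>
XML

# ForceArray keeps <book> as a list even when there is only one;
# KeyAttr then folds that list into a hash keyed on the id attribute.
my $doc = XMLin($xml, ForceArray => ['book'], KeyAttr => { book => 'id' });

# Sorting the keys gives a stable row order
foreach my $id (sort keys %{ $doc->{book} }) {
    print "$id: $doc->{book}{$id}{title}\n";
}
```

With these options, `$doc->{book}` is a hash of books keyed by id whether the catalog holds one book or many, so the loop in the main script would not need to change.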
The output is as follows (the rows may appear in a different order, because Perl does not guarantee the order of hash keys):
    <html>
    <body>
    <table border=1px cellpadding=5px>
    <tr bgcolor="#a6a6a6">
    <td style="font-weight:bold">Title </td>
    <td style="font-weight:bold">Author</td>
    <td style="font-weight:bold">Description</td>

    </tr>
    <tr>
    <td>MSXML3: A Comprehensive Guide</td>
    <td>O'Brien, Tim</td>
    <td>The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more.</td>
    </tr>
    <tr>
    <td>Creepy Crawlies</td>
    <td>Knorr, Stefan</td>
    <td>An anthology of horror stories about roaches, centipedes, scorpions and other insects.</td>
    </tr>
    <tr>
    <td>The Sundered Grail</td>
    <td>Corets, Eva</td>
    <td>The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon's Legacy.</td>
    </tr>
    </table>
    </body>
    </html>
The HTML-rendered table of our XML data looks as follows:
| Title | Author | Description |
| --- | --- | --- |
| MSXML3: A Comprehensive Guide | O'Brien, Tim | The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more. |
| Creepy Crawlies | Knorr, Stefan | An anthology of horror stories about roaches, centipedes, scorpions and other insects. |
| The Sundered Grail | Corets, Eva | The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon's Legacy. |
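One caveat: the script writes the book fields into the page unchanged. The parser decodes entities such as `&amp;` while reading the XML, so a title containing `&` or `<` would end up as invalid HTML. A minimal sketch of an escaping helper follows. The name `escape_html` and the sample string are hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
# The ampersand must be replaced first so later entities are not re-escaped.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Made-up sample value containing characters that need escaping
print "<td>", escape_html('Tom & Jerry <uncut>'), "</td>\n";
```

In the catalog script, each field could be passed through such a helper before it is printed inside a `<td>` element.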