Serializing XML With PHP

Build nested XML documents from PHP data structures with XML_Serializer.

Letting The Creative Juices Flow

PHP has always been ahead of the curve when it comes to supporting new technologies - and XML is no exception. Early versions of PHP came with basic XML support built in; newer versions improved on this by adding support for new XML protocols and technologies like the DOM, WDDX and SOAP, making PHP one of the most versatile and flexible tools for XML application development.

Now, by default, all newer versions of PHP come with the XML SAX parser enabled; however, the DOM module needs to be explicitly turned on at compile time. If you don't have the ability to recompile your PHP build - and if you're sharing space on a server, it's quite likely you won't - then you're up a creek without a paddle if your application needs to dynamically create XML document instances.

So, a creative solution is needed. Which is where this article comes in.

Over the next few pages, I'm going to be introducing you to a free PHP class named XML_Serializer, which allows you to create XML documents from PHP data structures like arrays and objects without requiring you to first recompile your PHP build for DOM support. As you might imagine, this can come in handy in certain situations - for example, if you need a quick and dirty way to build an XML document tree from an external data source, like a MySQL database or a structured text file. So keep reading - you might find the rest of this show interesting!

A Twist In The Tale

The XML_Serializer class comes courtesy of PEAR, the PHP Extension and Application Repository (http://pear.php.net), and has been developed by Stephan Schmidt of phptools.de fame. In case you didn't know, PEAR is an online repository of free PHP software, including classes and modules for everything from data archiving to XML parsing. When you install PHP, a whole bunch of PEAR modules get installed as well.

In case your PHP distribution didn't include XML_Serializer, you can get yourself a copy from the official PEAR Web site, at http://pear.php.net - simply unzip the distribution archive into your PEAR directory and you're ready to roll!

Note that in order to use XML_Serializer, you will need to have the XML_Util package already installed. If you don't already have it, you can get it from the Web site above.

Let's begin with something simple - dynamically constructing an XML document from a PHP array. Here's the code:

<?php

// include class file
include("XML/Serializer.php");

// create object
$serializer = new XML_Serializer();

// create array to be serialized
$xml = array( "book" => array(
                "title" => "Oliver Twist",
                "author" => "Charles Dickens"));

// perform serialization
$result = $serializer->serialize($xml);

// check result code and display XML if success
if ($result === true) {
    echo $serializer->getSerializedData();
}

?>

Don't worry if it didn't make too much sense - all will be explained shortly. For the moment, just feast your eyes on the output (note that you may need to use the "View Source" feature of your browser to see this):

<array>
<book>
<title>Oliver Twist</title>
<author>Charles Dickens</author>
</book>
</array>

As you can see, the output of the script is a well-formed XML document - all created using PHP code!

Anatomy Class

Let's take a closer look at how I accomplished this.

  1. The first step is, obviously, to include the XML_Serializer class file:
<?php

// include class file
include("XML/Serializer.php");

?>

You can either provide an absolute path to this file, or do what most lazy programmers do - include the path to your PEAR installation in PHP's "include_path" variable, so that you can access any of the PEAR classes without needing to type in long, convoluted file paths.
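
If you can't edit php.ini, the include path can also be extended from inside the script itself. Here's a minimal sketch of that approach - note that the directory /usr/local/lib/php is only an assumed PEAR location, so substitute wherever PEAR actually lives on your system:

```php
<?php

// hypothetical PEAR location - adjust for your own installation
$pearDir = "/usr/local/lib/php";

// append the PEAR directory to the existing include path
set_include_path(get_include_path() . PATH_SEPARATOR . $pearDir);

// from here on, include("XML/Serializer.php") is also looked up under $pearDir
echo get_include_path();

?>
```

Appending (rather than replacing) keeps whatever directories were already on the path, so other includes in your application continue to work.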

  2. Next, an object of the class needs to be initialized, and assigned to a PHP variable.
<?php

// create object
$serializer = new XML_Serializer();

?>

This variable serves as the control point for future manipulation of XML_Serializer properties and methods.

  3. Next, you need to put together the data that you plan to encode in XML. The simplest way to do this is to create a nested set of arrays whose structure mimics that of the final XML document you desire.
<?php

// create array to be serialized
$xml = array( "book" => array(
                "title" => "Oliver Twist",
                "author" => "Charles Dickens"));

?>
  4. With all the pieces in place, all that's left is to perform the transformation. This is done via the object's serialize() method, which accepts a PHP structure and returns a result code indicating whether or not the serialization was successful.
<?php

// perform serialization
$result = $serializer->serialize($xml);

?>
  5. Once the serialization is complete, you can do something useful with it - write it to a file, pass it through a SAX parser or - as I've done here - simply output it to the screen for all to admire:
<?php

// check result code and display XML if success
if ($result === true) {
    echo $serializer->getSerializedData();
}

?>

The getSerializedData() method returns the serialized XML document tree as is, and serves a very useful purpose in debugging - you'll see it often over the next few pages.
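
Passing the result through a SAX parser, one of the uses mentioned above, is also a quick way to confirm that the output really is well-formed. The sketch below uses PHP's built-in xml_parser_create() and xml_parse() functions on a hard-coded copy of the serialized document (with the whitespace removed); in a real script, the string would come from getSerializedData() instead.

```php
<?php

// hard-coded copy of the XML produced by the first example
$xmlString = "<array><book><title>Oliver Twist</title>"
           . "<author>Charles Dickens</author></book></array>";

// create a SAX parser and feed it the whole document in one go
$parser = xml_parser_create();
$ok = xml_parse($parser, $xmlString, true);

if ($ok) {
    echo "Document is well-formed";
} else {
    // report what the parser tripped over, and where
    echo "XML error: " . xml_error_string(xml_get_error_code($parser))
       . " at line " . xml_get_current_line_number($parser);
}

?>
```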

Total Satisfaction

Now, if you're a nitpicker, the output of the example on the previous page still won't satisfy you. Here's why:

  1. The serialized XML document does not contain the XML declaration at the top.

  2. The root element of the document is called <array>, whereas what you actually want is for it to be <library>.

  3. The XML document is not correctly indented.

In order to account for these requirements, XML_Serializer comes with a setOption() method, which allows you to customize the behaviour of the serializer to your needs. To illustrate, consider the following example, which solves the first problem noted above:

<?php

// include class file
include("XML/Serializer.php");

// create object
$serializer = new XML_Serializer();

// create array to be serialized
$xml = array( "book" => array(
                "title" => "Oliver Twist",
                "author" => "Charles Dickens"));

// add XML declaration
$serializer->setOption("addDecl", true);

// perform serialization
$result = $serializer->serialize($xml);

// check result code and display XML if success
if ($result === true) {
    echo $serializer->getSerializedData();
}

?>

Here's the output:

<?xml version="1.0"?>
<array>
<book>
<title>Oliver Twist</title>
<author>Charles Dickens</author>
</book>
</array>

Thus, the setOption() method takes two arguments - the name of an option and the value to assign to it - and uses that information to tell the serializer how to return the XML document.

Next, how about fixing the root element and the indentation?

<?php

// include class file
include("XML/Serializer.php");

// create object
$serializer = new XML_Serializer();

// create array to be serialized
$xml = array( "book" => array(
                "title" => "Oliver Twist",
                "author" => "Charles Dickens"));

// add XML declaration
$serializer->setOption("addDecl", true);

// indent elements
$serializer->setOption("indent", "    ");

// set name for root element
$serializer->setOption("rootName", "library");

// perform serialization
$result = $serializer->serialize($xml);

// check result code and display XML if success
if ($result === true) {
    echo $serializer->getSerializedData();
}

?>

And here's the result:

<?xml version="1.0"?>
<library>
    <book>
        <title>Oliver Twist</title>
        <author>Charles Dickens</author>
    </book>
</library>

Pretty, isn't it?
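
All the examples so far only do something when serialize() returns true. Here's a minimal sketch of the other branch. It assumes that a failed serialize() call hands back a PEAR_Error object, which is the usual PEAR convention; PEAR::isError() and getMessage() come from the PEAR base classes rather than from XML_Serializer itself.

```php
<?php

// include class file
include("XML/Serializer.php");

// create object
$serializer = new XML_Serializer();

// create array to be serialized
$xml = array( "book" => array(
                "title" => "Oliver Twist",
                "author" => "Charles Dickens"));

// same options as in the example above
$serializer->setOption("addDecl", true);
$serializer->setOption("indent", "    ");
$serializer->setOption("rootName", "library");

// perform serialization
$result = $serializer->serialize($xml);

// display XML on success, the error message otherwise
if (PEAR::isError($result)) {
    echo "Serialization failed: " . $result->getMessage();
} else {
    echo $serializer->getSerializedData();
}

?>
```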

No Attribution

Now, what about those pesky attributes? Well, XML_Serializer comes with an option that allows you to represent array keys as attributes of the enclosing element (instead of elements themselves). Take a look:

<?php

// include class file
include("XML/Serializer.php");

// create object
$serializer = new XML_Serializer();

// create array to be serialized
$xml = array( "book" => array(
                "title" => "Oliver Twist",
                "author" => "Charles Dickens"));

// add XML declaration
$serializer->setOption("addDecl", true);

// indent elements
$serializer->setOption("indent", "    ");

// set name for root element
$serializer->setOption("rootName", "library");

// represent scalar values as attributes instead of elements
$serializer->setOption("scalarAsAttributes", true);

// perform serialization
$result = $serializer->serialize($xml);

// check result code and display XML if success
if ($result === true) {
    echo $serializer->getSerializedData();
}

?>

Here's the output:

<?xml version="1.0"?>
<library>
    <book author="Charles Dickens" title="Oliver Twist" />
</library>

Note that in order for this to work, the array key which is to be represented as an attribute should point to a single scalar value and not another array or object. To understand this better, consider the following example, which demonstrates the difference:

<?php

// include class file
include("XML/Serializer.php");

// create object
$serializer = new XML_Serializer();

// create array to be serialized
$xml = array( "book" => array(
                "title" => "Oliver Twist",
                "author" => "Charles Dickens",
                "price" => array(   "currency" => "USD",
                            "amount" => 24.50)));

// add XML declaration
$serializer->setOption("addDecl", true);

// indent elements
$serializer->setOption("indent", "    ");

// set name for root element
$serializer->setOption("rootName", "library");

// represent scalar values as attributes instead of elements
$serializer->setOption("scalarAsAttributes", true);

// perform serialization
$result = $serializer->serialize($xml);

// check result code and display XML if success
if ($result === true) {
    echo $serializer->getSerializedData();
}

?>

And here's the revised output:

<?xml version="1.0"?>
<library>
    <book author="Charles Dickens" title="Oliver Twist">
        <price amount="24.5" currency="USD" />
    </book>
</library>

To add attributes to the root node, set them with the "rootAttributes" option, as below:

<?php

// include class file
include("XML/Serializer.php");

// create object
$serializer = new XML_Serializer();

// create array
$xml = array("name" => "John Doe", "age" => 34, "sex" => "male");

// add XML declaration
$serializer->setOption("addDecl", true);

// indent elements
$serializer->setOption("indent", "    ");

// set name for root element
$serializer->setOption("rootName", "person");

// set attributes for root element
$serializer->setOption("rootAttributes", array("id" => 346747));

// perform serialization
$result = $serializer->serialize($xml);

// check result code and display XML if success
if ($result === true) {
    echo $serializer->getSerializedData();
}

?>

Here's the output:

<?xml version="1.0"?>
<person id="346747">
    <name>John Doe</name>
    <age>34</age>
    <sex>male</sex>
</person>

An Object Lesson

You can also serialize objects, in much the same way as you serialize arrays. Take a look at the following example, which demonstrates how:

<?php

// object definition
class Automobile
{

    // object properties
    public $color;
    public $year;
    public $model;

    public function setAttributes($c, $y, $m)
    {
        $this->color = $c;
        $this->year = $y;
        $this->model = $m;
    }
}

// include class file
include("XML/Serializer.php");

// create object
$serializer = new XML_Serializer();

// create object to be serialized
$car = new Automobile;
$car->setAttributes("blue", 1982, "Mustang");

// add XML declaration
$serializer->setOption("addDecl", true);

// indent elements
$serializer->setOption("indent", "    ");

// set name for root element
$serializer->setOption("rootName", "car");

// perform serialization
$result = $serializer->serialize($car);

// check result code and display XML if success
if ($result === true) {
    echo $serializer->getSerializedData();
}

?>

In this example, I've first defined a class called Automobile, and created some methods and properties for it. Then, further down in the script, I've instantiated an object of the class and set some very specific values for the object's properties. This object has then been serialized via XML_Serializer's serialize() method.

Here's the result:

<?xml version="1.0"?>
<car>
    <color>blue</color>
    <year>1982</year>
    <model>Mustang</model>
</car>
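
To get a feel for which parts of an object end up in the XML, it can help to look at the object the way PHP's own get_object_vars() function sees it. The sketch below uses only core PHP; that the serializer works from this same list of accessible properties is an assumption about its internals, not something the example above proves. The Automobile class is repeated so that the snippet runs on its own.

```php
<?php

// object definition (same as above)
class Automobile
{
    public $color;
    public $year;
    public $model;

    public function setAttributes($c, $y, $m)
    {
        $this->color = $c;
        $this->year = $y;
        $this->model = $m;
    }
}

// create object
$car = new Automobile;
$car->setAttributes("blue", 1982, "Mustang");

// list the object's public properties; methods are not included
$properties = get_object_vars($car);
print_r($properties);

?>
```

The property names line up with the element names in the XML document above, while the setAttributes() method leaves no trace in it.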

Not My Type

One of XML_Serializer's other interesting features is its ability to store data type information along with each value in the XML document. Called "type hints", this data type information can help in distinguishing between the integer 6 and the string "6", and comes in handy if your XML application is strongly typed.

To enable type hints, you simply need to set the "typeHints" option to true. The following example illustrates:

<?php

// include class file
include("XML/Serializer.php");

// set options
$options = array(   "addDecl" => true,
            "indent" => "    ",
            "rootName" => "car",
            "typeHints" => true);

// create object
$serializer = new XML_Serializer($options);

// create array
$car = array("color" => "blue", "year" => 1982, "model" => "Mustang", "price" => 15000.00);

// perform serialization
$result = $serializer->serialize($car);

// check result code and display XML if success
if ($result === true) {
    echo $serializer->getSerializedData();
}

?>

Once type hints are enabled, every element within the XML document will bear an additional attribute indicating the data type of the value contained within it. Here's what the output of the example above looks like:

<?xml version="1.0"?>
<car _type="array">
    <color _type="string">blue</color>
    <year _type="integer">1982</year>
    <model _type="string">Mustang</model>
    <price _type="double">15000</price>
</car>

Note that in the example above, I've used a slightly different method to set serializer options - I've created an array of options and values, and passed the array to the object constructor. When you have a large number of options to set, this method can save you a few lines of code.
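
The hint values in the output above (string, integer, double, array) are the same names that PHP's built-in gettype() function reports. The sketch below uses only core PHP and makes no claim about how the serializer works internally; it simply prints the type of each value in the $car array so that you can compare the two.

```php
<?php

// same array as in the example above
$car = array("color" => "blue", "year" => 1982, "model" => "Mustang", "price" => 15000.00);

// collect the PHP type of each value
$types = array();
foreach ($car as $key => $value) {
    $types[$key] = gettype($value);
}

print_r($types);

// the integer 6 and the string "6" are told apart the same way
echo gettype(6) . " / " . gettype("6");

?>
```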

Travelling In Reverse

Good things come in twos - Mickey and Donald, Tom and Jerry, yin and yang - and so it's no surprise that XML_Serializer has a doppelganger of its own. Called XML_Unserializer, this class can take an XML document and convert it into a series of nested PHP structures, suitable for use in a PHP script.

In order to understand how this works, consider the following XML document:

<?xml version='1.0'?>
<library>
    <book id="MFRE001">
        <title>The Adventures of Sherlock Holmes</title>
        <author>Arthur Conan Doyle</author>
        <price currency="USD">24.95</price>
    </book>
    <book id="MFRE002">
        <title>Life of Pi</title>
        <author>Yann Martel</author>
        <price currency="USD">7.99</price>
    </book>
    <book id="MFRE003">
        <title>Europe on a Shoestring</title>
        <author>Lonely Planet</author>
        <price currency="USD">16.99</price>
    </book>
</library>

Now, in order to convert this XML document into a PHP structure, simply put XML_Unserializer to work on it, as below:

<?php

// include class file
include("XML/Unserializer.php");

// create object
$unserializer = new XML_Unserializer();

// unserialize the document
$result = $unserializer->unserialize("library.xml", true);

// dump the result
$data = $unserializer->getUnserializedData();
print_r($data);

?>

Here, the unserialize() method accepts either a string containing XML data or the name of an XML file (set the second argument to false or true depending on which one you are passing). The resulting PHP structure, which represents the XML document, is then retrieved with the getUnserializedData() method. Here's what the output looks like:

Array
(
    [book] => Array
        (
            [0] => Array
                (
                    [title] => The Adventures of Sherlock Holmes
                    [author] => Arthur Conan Doyle
                    [price] => 24.95
                )

            [1] => Array
                (
                    [title] => Life of Pi
                    [author] => Yann Martel
                    [price] => 7.99
                )

            [2] => Array
                (
                    [title] => Europe on a Shoestring
                    [author] => Lonely Planet
                    [price] => 16.99
                )

        )

)

Now, in order to access the title of the third book (for example), you would use the notation

$data['book'][2]['title'];

which would return

Europe on a Shoestring

Note that XML_Unserializer uses the type hints generated in the serialization process to accurately map XML elements to PHP data types. If these hints are unavailable (as in the example above), XML_Unserializer will "guess" the type of each value. A look at the source code of the class reveals that "complex structures will be arrays and tags with only CData in them will be strings."
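
You may have noticed that the "id" and "currency" attributes from library.xml are missing from the array above. XML_Unserializer has a "parseAttributes" option to deal with this. The sketch below assumes that option name, and feeds the unserializer a shortened, inline copy of the document so that it runs without an external file. The exact layout of the resulting array depends on other options too, so it's best to dump it and take a look rather than assume a particular shape.

```php
<?php

// include class file
include("XML/Unserializer.php");

// shortened copy of library.xml, passed as a string
$xmlString = '<library>
    <book id="MFRE001">
        <title>The Adventures of Sherlock Holmes</title>
        <price currency="USD">24.95</price>
    </book>
</library>';

// ask the unserializer to keep attributes (assumed option name)
$options = array("parseAttributes" => true);

// create object
$unserializer = new XML_Unserializer($options);

// second argument false: the first argument is XML data, not a filename
$result = $unserializer->unserialize($xmlString, false);

// dump the result, or the error message if something went wrong
if (PEAR::isError($result)) {
    echo $result->getMessage();
} else {
    print_r($unserializer->getUnserializedData());
}

?>
```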

Keeping It Simple

It's also possible to convert an XML document into a PHP object instead of a nested set of arrays, simply by setting appropriate options for the unserializer. Consider the following example, which demonstrates how this may be done:

<?php

// include class file
include("XML/Unserializer.php");

// tell the unserializer to create an object
$options = array("complexType" => "object");

// create object
$unserializer = new XML_Unserializer($options);

// unserialize the document
$result = $unserializer->unserialize("library.xml", true);

// dump the result
print_r($unserializer->getUnserializedData());

?>

Here's the output:

stdClass Object
(
    [book] => Array
        (
            [0] => stdClass Object
                (
                    [title] => The Adventures of Sherlock Holmes
                    [author] => Arthur Conan Doyle
                    [price] => 24.95
                )

            [1] => stdClass Object
                (
                    [title] => Life of Pi
                    [author] => Yann Martel
                    [price] => 7.99
                )

            [2] => stdClass Object
                (
                    [title] => Europe on a Shoestring
                    [author] => Lonely Planet
                    [price] => 16.99
                )

        )

)

In this format, you can use standard object notation to access (for example) the title of the last book. If the unserialized data has been assigned to a variable named $obj, the notation

$obj->book[2]->title

would return

Europe on a Shoestring
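
The same notation works inside a loop. The sketch below builds a small stand-in for the unserialized library by hand (titles only, using casts to stdClass), so that it runs without XML_Unserializer or library.xml; in a real script, $obj would come from getUnserializedData().

```php
<?php

// hand-built stand-in for the unserialized library (titles only)
$obj = (object) array("book" => array(
    (object) array("title" => "The Adventures of Sherlock Holmes"),
    (object) array("title" => "Life of Pi"),
    (object) array("title" => "Europe on a Shoestring")));

// collect every title using object notation
$titles = array();
foreach ($obj->book as $book) {
    $titles[] = $book->title;
}

echo implode(", ", $titles);

?>
```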

Employment Options

Now, while all this is fine and dandy, how about using all this new-found knowledge for something practical?

This next example does just that, demonstrating how the XML_Serializer class can be used to convert data stored in a MySQL database into an XML document, and write it to a file for later use. Here's the MySQL table I'll be using,

mysql> SELECT * FROM employees;
+-----+--------+--------+-----+-----+----------------+---------+
| id  | lname  | fname  | age | sex | department     | country |
+-----+--------+--------+-----+-----+----------------+---------+
|  54 | Doe    | John   |  27 | M   | Engineering    | US      |
| 127 | Jones  | Sue    |  31 | F   | Finance        | UK      |
| 113 | Woo    | David  |  26 | M   | Administration | CN      |
| 175 | Thomas | James  |  34 | M   | Finance        | US      |
| 168 | Kent   | Jane   |  29 | F   | Administration | US      |
|  12 | Kamath | Ravina |  35 | F   | Finance        | IN      |
+-----+--------+--------+-----+-----+----------------+---------+
6 rows in set (0.11 sec)

and here's what I want my target XML document to look like:

<?xml version="1.0"?>
<employees>
    <employee>
        <lname>Doe</lname>
        <fname>John</fname>
        <age>27</age>
        <sex>M</sex>
        <department>Engineering</department>
        <country>US</country>
    </employee>
    <employee>
        <lname>Jones</lname>
        <fname>Sue</fname>
        <age>31</age>
        <sex>F</sex>
        <department>Finance</department>
        <country>UK</country>
    </employee>
    <employee>
        <lname>Woo</lname>
        <fname>David</fname>
        <age>26</age>
        <sex>M</sex>
        <department>Administration</department>
        <country>CN</country>
    </employee>
    <employee>
        <lname>Thomas</lname>
        <fname>James</fname>
        <age>34</age>
        <sex>M</sex>
        <department>Finance</department>
        <country>US</country>
    </employee>
    <employee>
        <lname>Kent</lname>
        <fname>Jane</fname>
        <age>29</age>
        <sex>F</sex>
        <department>Administration</department>
        <country>US</country>
    </employee>
    <employee>
        <lname>Kamath</lname>
        <fname>Ravina</fname>
        <age>35</age>
        <sex>F</sex>
        <department>Finance</department>
        <country>IN</country>
    </employee>
</employees>

With XML_Serializer, accomplishing this is a matter of a few lines of code. Here they are:

<?php

// include class file
include("XML/Serializer.php");

// set output filename
$filename = 'employees.xml';

// set options
$options = array(   "addDecl" => true,
            "defaultTagName" => "employee",
            "indent" => "    ",
            "rootName" => "employees");

// create object
$serializer = new XML_Serializer($options);

// open connection to database and select database
$connection = mysqli_connect("localhost", "user", "secret", "db1") or die("Unable to connect!");

// execute query
$query = "SELECT * FROM employees";
$result = mysqli_query($connection, $query) or die("Error in query: $query. " . mysqli_error($connection));

// iterate through rows and build the array to be serialized
$xml = array();
while ($row = mysqli_fetch_row($result)) {
    $xml[] = array( "lname" => $row[1],
                "fname" => $row[2],
                "age" => $row[3],
                "sex" => $row[4],
                "department" => $row[5],
                "country" => $row[6]);
}

// close database connection
mysqli_close($connection);

// perform serialization
$result = $serializer->serialize($xml);

// open file
if (!$handle = fopen($filename, 'w')) {
    print "Cannot open file ($filename)";
    exit;
}

// write XML to file
if (!fwrite($handle, $serializer->getSerializedData())) {
    print "Cannot write to file ($filename)";
    exit;
}

// close file
fclose($handle);

?>

Pretty simple, once you know how it works. First, I've opened up a connection to the database and retrieved all the records from the table. Then I've iterated over the result set, adding a new array of column values to the $xml array at each iteration. Finally, once all the rows have been processed, the array is serialized and the resulting XML is written to a file for later use. The "defaultTagName" option supplies the element name for numerically-indexed array entries, which is why each row shows up as an <employee> element.
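
The column-by-column copying inside the loop can also be written against associative rows, which saves you from counting column positions. The sketch below uses two hard-coded rows as a hypothetical stand-in for the database result (in the real script they would be fetched from the query), and drops the "id" column to match the target document.

```php
<?php

// hypothetical stand-in for rows fetched as associative arrays
$rows = array(
    array("id" => 54, "lname" => "Doe", "fname" => "John", "age" => 27,
          "sex" => "M", "department" => "Engineering", "country" => "US"),
    array("id" => 127, "lname" => "Jones", "fname" => "Sue", "age" => 31,
          "sex" => "F", "department" => "Finance", "country" => "UK"));

// build the array to be serialized, one entry per employee
$xml = array();
foreach ($rows as $row) {
    // the target document has no <id> element, so drop that column
    unset($row["id"]);
    $xml[] = $row;
}

print_r($xml);

?>
```

The resulting $xml array has the same shape as the one built in the script above, so it can be handed to serialize() in exactly the same way.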

Linking Out

And that's about it for this article. Over the last few pages, I showed you how to build an XML document tree even if your PHP build doesn't support the XML DOM, via the free add-on XML_Serializer class from PEAR. I showed you how to programmatically create an XML document from an array or an object, how to indent XML document nodes, how to attach attributes to elements, and how to customize the behaviour of the serializer. I also showed you how to reverse-serialize XML documents into PHP arrays or objects for use within a PHP script, together with examples of how type hints could help to make this a more accurate process. Finally, I wrapped things up with a composite example that demonstrated a practical, real-world use for all this code - converting the data in a MySQL database into XML and writing it to a file.

All this is, of course, only the tip of the iceberg - there are an infinite number of possibilities with power like this at your disposal. To find out what else you can do with XML and PHP, I'd encourage you to visit the following links:

XML Basics, at http://www.melonfire.com/community/columns/trog/article.php?id=78

XSL Basics, at http://www.melonfire.com/community/columns/trog/article.php?id=82

Using PHP With XML, at http://www.melonfire.com/community/columns/trog/article.php?id=71

XSLT Transformation With PHP And Sablotron, at http://www.melonfire.com/community/columns/trog/article.php?id=97

Building XML Trees With PHP, at http://www.melonfire.com/community/columns/trog/article.php?id=180

The XML and PHP book, at http://www.xmlphp.com/

Till next time...be good!

This article was first published on 12 Feb 2004.