I feel like I'm the last developer in the world who is dipping his toes into XML. So what I have to say about it is probably useless to the rest of you, but here are my latest experiences anyway.
So how's this for not falling prey to speculative generality or premature abstraction: I don't like to make tools until a clear need is demonstrated, when you can tell that a content creator is going to be a bottleneck and needs workflow improvements. On Schizoid and through much of our current project, code was always the bottleneck. (Actually, just recently it may have become art. Time to make some art tools.)
Accordingly, on Schizoid and through much of this project, we've had content creators work directly in the code base, editing .h files (well, with Schizoid, C# files) and data structure declarations, piles of curly braces and all.
That all changed recently when Richard's machine had trouble installing Visual Studio, so I thought it was time to do the right thing and put our data in text files. We'll eventually have to anyway, for downloadable content. It's been many years since I've written code to read data from a text file; my MO has always been to use fscanf.
So, time to catch up with the new millennium. For starters, do I really want XML? There was a whole XML vs. JSON debate going around a while back. My friend Bryan McNett was a big JSON advocate, and he did really cool stuff in the Spidey 3 codebase where you could tag data structures in your .h files and a text processor would generate the JSON importing code for you. He's written articles about it, but I can't seem to find them.
In the end, I decided to try XML rather than JSON. There seemed to be more support for it (XML handily won the Googlefight), but the deciding factor was being able to easily load XML files into Excel. Having all of our unit data in one big, nicely formatted table (the alternating dark blue and light blue rows soothe my soul) won me over. And McNett's .h file markup may have been cool, but we don't have enough different data structures to make adding another step to our build process worth it. (Not to mention there's no reason you couldn't, in theory, do the same thing with XML.)
The next question: which XML library do I use with C++? With C# it's built in, yo. Again I mostly relied on Google; when in doubt, do what's popular. The second hit was TinyXML, which seemed great: free license and small. Hooking it up and getting started was fairly painless. It often felt like it took 2-3 lines of code to do what should take 1, but with some small wrappers I was OK.
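TinyXML's real calls aren't reproduced here. As a rough sketch of the kind of small wrapper mentioned above, assume an element's attributes have already been collected into a std::map<std::string, std::string> (a stand-in, not TinyXML's API). The hypothetical helper GetFloat then collapses the usual find/check/convert sequence into a single call with a default value:

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Stand-in for a parsed element's attributes (hypothetical; not TinyXML's API).
using Attributes = std::map<std::string, std::string>;

// Hypothetical wrapper: look up an attribute and convert it to float in one
// call, falling back to `fallback` if it is missing or not a valid number.
float GetFloat(const Attributes& attrs, const std::string& key, float fallback) {
    auto it = attrs.find(key);
    if (it == attrs.end()) return fallback;
    const char* begin = it->second.c_str();
    char* end = nullptr;
    float value = std::strtof(begin, &end);
    // Reject empty strings and trailing garbage such as "12abc".
    if (end == begin || *end != '\0') return fallback;
    return value;
}
```

With that in place, `float hp = GetFloat(attrs, "hitpoints", 100.0f);` is one line, and a missing or mistyped value quietly becomes the default. Whether silent defaults are acceptable, or should at least be logged, is a judgment call.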
It took me about a day and a half to learn pidgin XML, hook up the library, and get our data converted. I tried creating a schema to help with importing the XML into Excel and threw up my hands; Excel seems to do a fine job on its own anyway. So we are schemaless. A schema, I know, would help with data validation. We do some validation in code, but not enough yet.
That's one of the big things I miss about the old way (typing our data straight into the code): the compiler validates our data for us, and does a great job of it.
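Without a schema, and without the compiler checking the data, some of that safety can be won back with explicit checks after loading. A minimal sketch, assuming a hypothetical UnitData record; the fields and the limits are made up for illustration:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical unit record; fields and limits are illustrative only.
struct UnitData {
    std::string name;
    float hitpoints = 0.0f;
    float armor = 0.0f;
};

// Returns one message per problem found; an empty result means the unit passed.
std::vector<std::string> ValidateUnit(const UnitData& u) {
    std::vector<std::string> errors;
    if (u.name.empty())
        errors.push_back("unit has no name");
    if (u.hitpoints <= 0.0f)
        errors.push_back(u.name + ": hitpoints must be positive");
    if (u.armor < 0.0f || u.armor > 100.0f)
        errors.push_back(u.name + ": armor must be in [0, 100]");
    return errors;
}
```

Running every loaded unit through something like this and printing the messages catches the kind of typo the compiler used to catch, though only the ones someone thought to write a rule for.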
One thing I didn't like about the old way was the formatting. You end up with a lot of comma-separated values, and it's easy to lose track: is this float here his hitpoints or his armor? The new files, where every value is labeled, are much easier to read and maintain.
What about XML vs. rolling our own format? This also feels like a win. Not a huge one (here I am writing a lot of XML glue code instead of a parser), but a material one.
And so, I've finally caught up with the new millennium.



Careful with XML on embedded platforms. I found that parsers like TinyXML make tons of tiny allocations as they parse, which is a recipe for fragmentation. I actually wound up writing my own parser that performed an analysis pass to determine the memory needed to represent the DOM, and didn't copy any strings, just pointed back into the original chunk the file was loaded into. It used 2 allocations to parse a file.
Posted by: Paul | May 25, 2009 at 10:44 AM
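Paul doesn't show his parser, so the following is only a toy sketch of the idea he describes, under loud assumptions: it handles a tiny XML subset (elements and quoted attributes, well-formed input, no entity decoding), and std::vector::reserve stands in for his two allocations. Document, Node, Attr, Scan and Parse are hypothetical names. The same scanning routine runs twice: pass 1 only counts, the arrays are then sized exactly once, and pass 2 fills them with string_views that point back into the caller's buffer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Toy subset only: elements and quoted attributes. Text, comments and
// declarations are skipped; names and values point into the input buffer.
struct Attr { std::string_view name, value; };

struct Node {
    std::string_view name;
    int parent = -1;     // index into Document::nodes, -1 for the root
    int firstAttr = 0;   // index into Document::attrs
    int attrCount = 0;
};

struct Document {
    std::vector<Node> nodes;  // sized once: one allocation
    std::vector<Attr> attrs;  // sized once: one allocation
};

// With doc == nullptr this only counts (pass 1); otherwise it also records (pass 2).
void Scan(std::string_view s, Document* doc, int& nodeCount, int& attrCount) {
    constexpr auto npos = std::string_view::npos;
    auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    nodeCount = attrCount = 0;
    int current = -1;  // innermost open element; only meaningful in pass 2
    std::size_t i = 0;
    while ((i = s.find('<', i)) != npos && ++i < s.size()) {
        if (s[i] == '?' || s[i] == '!' || s[i] == '/') {  // not a start tag
            // A closing tag walks back up via the parent index, so no stack is needed.
            if (s[i] == '/' && doc && current >= 0) current = doc->nodes[current].parent;
            if ((i = s.find('>', i)) == npos) break;
            continue;
        }
        std::size_t start = i;
        while (i < s.size() && !isSpace(s[i]) && s[i] != '/' && s[i] != '>') ++i;
        int self = nodeCount++;
        if (doc) {
            Node n;
            n.name = s.substr(start, i - start);
            n.parent = current;
            n.firstAttr = attrCount;
            doc->nodes.push_back(n);  // never reallocates: capacity was reserved
        }
        bool selfClosing = false;
        while (i < s.size() && s[i] != '>') {
            if (isSpace(s[i])) { ++i; continue; }
            if (s[i] == '/') { selfClosing = true; ++i; continue; }
            std::size_t nameStart = i;
            while (i < s.size() && s[i] != '=' && !isSpace(s[i])) ++i;
            std::string_view name = s.substr(nameStart, i - nameStart);
            std::size_t open = s.find_first_of("\"'", i);
            if (open == npos) return;
            std::size_t close = s.find(s[open], open + 1);
            if (close == npos) return;
            if (doc) {
                doc->attrs.push_back({name, s.substr(open + 1, close - open - 1)});
                ++doc->nodes[self].attrCount;
            }
            ++attrCount;
            i = close + 1;
        }
        if (!selfClosing) current = self;
    }
}

// Pass 1 measures, each array is reserved exactly once, then pass 2 fills.
Document Parse(std::string_view text) {
    int nodes = 0, attrs = 0;
    Scan(text, nullptr, nodes, attrs);
    Document doc;
    doc.nodes.reserve(static_cast<std::size_t>(nodes));
    doc.attrs.reserve(static_cast<std::size_t>(attrs));
    int nodes2 = 0, attrs2 = 0;
    Scan(text, &doc, nodes2, attrs2);
    assert(nodes2 == nodes && attrs2 == attrs);  // pass 2 never outgrew pass 1
    return doc;
}
```

The text buffer must outlive the Document, since nothing is copied out of it. A production version would also need text nodes, entity handling and error reporting.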
I was recently looking into XML parsers for consoles and found RapidXML. It's considerably faster than TinyXML and seems to have better allocation behavior, but I haven't tried it out yet.
Posted by: Vince | May 25, 2009 at 11:07 AM
The fundamental problem with XML and data structure serialization is that there is no single obvious way to represent numbers or arrays.
If you hand a C++ struct definition to two strangers, lock them in separate rooms, and ask one to write an XML writer and the other to write an XML reader for the struct, there is a near 0% chance that the programs will be compatible.
This is not true of image files - if you hand an image definition (size, color depth) to the two strangers, and ask them to write a GIF writer and GIF reader, compatibility is guaranteed.
This is true of JSON as well. Generally speaking, there is only one obvious way to transmit a simple data structure via JSON.
Since your application is C#, you are spared the task of writing "glue code" to serialize structures as XML, because it is in the standard library.
In this sense, XML as a data language is platform-specific.
Yes, there are many extra technologies on top of XML to support numbers and arrays, but again there is no single obvious choice, and few are implemented as well across all platforms as XML itself is.
Posted by: Bryan McNett | May 25, 2009 at 04:21 PM
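McNett's two-strangers argument is easy to demonstrate. Below, two hypothetical writers (WriteA and WriteB) serialize the same hypothetical Unit struct in equally reasonable ways, and a reader written against one would not understand the other. Both skip XML escaping, assuming names contain no special characters.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical struct; both writers below are reasonable, and incompatible.
struct Unit {
    std::string name;
    float hitpoints;
    std::vector<int> upgrades;
};

// Writer A: scalars as attributes, the array as a space-separated attribute.
std::string WriteA(const Unit& u) {
    std::ostringstream out;
    out << "<Unit name=\"" << u.name << "\" hitpoints=\"" << u.hitpoints
        << "\" upgrades=\"";
    for (std::size_t i = 0; i < u.upgrades.size(); ++i)
        out << (i ? " " : "") << u.upgrades[i];
    out << "\"/>";
    return out.str();
}

// Writer B: every field as a child element, one element per array item.
std::string WriteB(const Unit& u) {
    std::ostringstream out;
    out << "<Unit><name>" << u.name << "</name><hitpoints>" << u.hitpoints
        << "</hitpoints><upgrades>";
    for (int id : u.upgrades) out << "<item>" << id << "</item>";
    out << "</upgrades></Unit>";
    return out.str();
}
```

For Unit{"Grunt", 100, {3, 7}}, WriteA produces `<Unit name="Grunt" hitpoints="100" upgrades="3 7"/>` while WriteB produces nested `<name>`, `<hitpoints>` and `<item>` elements. By McNett's argument, the JSON version has one obvious shape, something like `{"name": "Grunt", "hitpoints": 100, "upgrades": [3, 7]}`.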
Riffing off seeing Excel and XML in the same post, in my last 2 projects we used Excel's XML files (yes, it will save spreadsheets in XML) and then wrote parsers to read the files. If you have an XML library, it's literally just a few lines of code to read them, or at least the parts you're likely to care about.
On one project I used them for the artists to enter animation information, like:
animationname, filename, startframe, endframe
On another, I used them to generate translation tables and images.
In both cases some compile time tool would read and parse the Excel XML file and use the data or create something that eventually turned into binary.
On my last project, though, the tech team had made a network file system so both the PS3 and the 360 could see our network. On top of that, both had file change notification. So for this project we parsed the Excel XML files directly in the game, and if we noticed a file had changed, we'd reload it on the fly. Much of the AI was stored in Excel, so a designer would change some fields, save, and the game would start using his changes immediately while he was testing his AI.
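A rough, portable approximation of that reload-on-change loop: gman's setup used the consoles' own file change notification, whereas this sketch swaps in plain polling of std::filesystem::last_write_time, which is a different technique. FileWatcher is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical helper: remembers a file's last-modified time and reports when
// it changes. Polling stands in for platform change notification here.
class FileWatcher {
public:
    explicit FileWatcher(fs::path path) : path_(std::move(path)) {
        std::error_code ec;
        last_ = fs::last_write_time(path_, ec);  // on error, holds file_time_type::min()
    }

    // Returns true once per observed change to the file's timestamp.
    bool Poll() {
        std::error_code ec;
        fs::file_time_type now = fs::last_write_time(path_, ec);
        if (ec || now == last_) return false;  // unreadable or unchanged
        last_ = now;
        return true;
    }

private:
    fs::path path_;
    fs::file_time_type last_{};
};
```

Checked once per frame, something like `if (watcher.Poll()) ReloadAiTable();` (ReloadAiTable being a hypothetical loader) gives the designer-saves, game-reloads loop. Polling is simpler but costs a file-system query per check, which is one reason to prefer real change notification where the platform offers it.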
If you want to know how to parse Excel's XML files there's an example here.
http://greggman.com/games/excel-python-xml.htm
Note that you have to save as Excel 2003 XML, because as of Office 2007 the default XML format is much, much harder to parse. Still, once you've saved a file in that format, re-opening it will keep saving to the same format automatically.
My personal preference is to use XML only during development and save out a binary format for release, but on the last project time didn't allow us to take that step, and the game shipped anyway.
Posted by: gman | May 26, 2009 at 12:16 AM
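gman's "XML in development, binary for release" split might look roughly like this: a build step reads the XML once and writes a compact blob that the game loads with a single read. UnitRecord, the header layout and kMagic are all made up for illustration. Dumping raw structs assumes the tool and the game agree on endianness and struct layout, which is not a given here: the PS3 and 360 are big-endian, so PC-side tools would need to byte-swap.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <vector>

// Hypothetical fixed-layout record baked from the XML at build time.
struct UnitRecord {
    float hitpoints;
    float armor;
    std::int32_t cost;
};
static_assert(std::is_trivially_copyable_v<UnitRecord>, "must be safe to dump as bytes");

// Hypothetical format: 4-byte tag, 4-byte count, then the raw records.
constexpr std::uint32_t kMagic = 0x54494E55;  // arbitrary tag value

void WriteUnits(std::ostream& out, const std::vector<UnitRecord>& units) {
    std::uint32_t count = static_cast<std::uint32_t>(units.size());
    out.write(reinterpret_cast<const char*>(&kMagic), sizeof kMagic);
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    out.write(reinterpret_cast<const char*>(units.data()),
              static_cast<std::streamsize>(count * sizeof(UnitRecord)));
}

// Returns false on a bad header or a short read. Assumes trusted input.
bool ReadUnits(std::istream& in, std::vector<UnitRecord>& units) {
    std::uint32_t magic = 0, count = 0;
    in.read(reinterpret_cast<char*>(&magic), sizeof magic);
    in.read(reinterpret_cast<char*>(&count), sizeof count);
    if (!in || magic != kMagic) return false;
    units.resize(count);
    in.read(reinterpret_cast<char*>(units.data()),
            static_cast<std::streamsize>(count * sizeof(UnitRecord)));
    return static_cast<bool>(in);
}
```

In a build this would write to a std::ofstream opened in binary mode; a std::stringstream works the same way for quick round-trip checks.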
Try LLamaXML. It's a pull parser (not push like SAX, or memory-hungry like DOM).
I've managed to process gigabytes of XML-format server log files with it on a PDA running Windows Mobile.
BTW, there are also some completely adorable C++ interfaces for TinyXML, for instance "ticpp".
Posted by: kert | May 26, 2009 at 04:19 PM
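LLamaXML's own API isn't reproduced here. To show what "pull" means, here is a toy reader (PullReader and Event are hypothetical names): the caller asks for the next event instead of receiving SAX-style callbacks, and no DOM is built. For brevity it reads from an in-memory string_view and skips attributes; a real streaming pull parser, like the one kert used on gigabyte logs, would also read its input in chunks, which this sketch omits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

enum class Event { StartElement, EndElement, Text, EndOfInput };

// Toy pull reader: tags and text only; attributes are skipped, entities are
// not decoded, and well-formed input is assumed.
class PullReader {
public:
    explicit PullReader(std::string_view xml) : s_(xml) {}

    Event Next() {
        if (pendingEnd_) { pendingEnd_ = false; return Event::EndElement; }  // from <x/>
        while (pos_ < s_.size()) {
            if (s_[pos_] != '<') {  // character data up to the next tag
                std::size_t end = s_.find('<', pos_);
                if (end == npos) end = s_.size();
                std::string_view text = s_.substr(pos_, end - pos_);
                pos_ = end;
                if (!IsBlank(text)) { value_ = text; return Event::Text; }
                continue;
            }
            std::size_t close = s_.find('>', pos_);
            if (close == npos) { pos_ = s_.size(); break; }
            std::string_view tag = s_.substr(pos_ + 1, close - pos_ - 1);
            pos_ = close + 1;
            if (tag.empty() || tag[0] == '?' || tag[0] == '!') continue;  // decl/comment
            bool closing = tag[0] == '/';
            bool selfClosing = !closing && tag.back() == '/';
            if (closing) tag.remove_prefix(1);
            if (selfClosing) tag.remove_suffix(1);
            value_ = tag.substr(0, tag.find_first_of(" \t\r\n"));  // element name
            if (closing) return Event::EndElement;
            pendingEnd_ = selfClosing;
            return Event::StartElement;
        }
        return Event::EndOfInput;
    }

    // Element name for Start/EndElement, raw text for Text.
    std::string_view Value() const { return value_; }

private:
    static constexpr std::size_t npos = std::string_view::npos;
    static bool IsBlank(std::string_view t) {
        for (char c : t)
            if (!std::isspace(static_cast<unsigned char>(c))) return false;
        return true;
    }
    std::string_view s_;
    std::size_t pos_ = 0;
    std::string_view value_;
    bool pendingEnd_ = false;
};
```

A typical loop is `for (Event e; (e = r.Next()) != Event::EndOfInput;) { ... }`, acting on the elements of interest and ignoring the rest. The reader's own state is just a position and the current event, which is what keeps pull parsing cheap on small devices.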
For me the main difference between XML and JSON is that XML has roots in *markup* languages: you have a document with text and you want to annotate that text with things like “this is a header” or “make this bold”. JSON on the other hand came from *data structures*. You can see this when you compare the fundamental structures: XML gives you an element, which has both an unordered set of attributes, and an ordered list of children, and then there are text nodes too; JSON gives you arrays, maps, numbers, strings, booleans.
Yes, XML has more support and that's a good reason to use it, but for data serialization, JSON “fits” better because it matches the data structures used in most programming languages.
Posted by: Amit Patel | June 11, 2009 at 01:57 PM
* Lightweight XML parser:
http://rapidxml.sourceforge.net/
* XML alternatives:
http://stackoverflow.com/questions/44207/what-are-good-alternative-data-formats-to-xml
Posted by: BQ | June 30, 2009 at 01:48 AM
The coolest XML technology is probably vtd-xml:
http://vtd-xml.sf.net
Posted by: Tom | February 09, 2010 at 03:44 PM