Ben Ramey's Blog
Scripture, programming problems, solutions and stories.

Formatting multiple XML files at a time

I recently had a situation where I needed to compare many XML files generated by a program of one version to the same set of XML files produced by a previous version of the same program. Unfortunately, the sets of XML files were formatted differently and so doing a file comparison with Beyond Compare (a GREAT file comparison tool, by the way) was going to be useless.

So, I started looking for a way to quickly format all the files in each set the same way with one program. I looked into using Notepad++, which has a great XML Tools plugin (look for it under Plugins > Plugin Manager). I tried combining the plugin's formatting commands with a macro that would format the XML file, save it and close it. That way I could open the few hundred files I had to format in Notepad++ (one set at a time), then run the macro multiple times (Macro > Run a Macro Multiple Times…) so that it would work through each file until all were formatted and closed. However, I couldn't get the Notepad++ macro system to actually perform the XML Tools format command. The macro would run successfully, saving and closing each file, but when I checked the files they had not been formatted. I worked with it for a while, but could not figure out what the matter was.

I knew the real solution had to be some type of command-line utility and a batch file. So, I started looking into that. The solution I ended up with was just that.

First of all, I found HTML Tidy, which I could run from the Windows command line to format a file. I used a configuration file for tidy.exe (placed in the same directory as tidy.exe and named tidycfg.ini, although neither the location nor the name matters, since the batch file below passes the full path with -config) that looked like this:

indent:yes
indent-attributes:yes

With that, I got the formatting I wanted.

Now, all I had to do was brush up on my Windows batch command skills to run tidy.exe on multiple files. Easy enough! This is what my batch file looked like:

for /d %%X in ("C:\<path_to_parent_directory>\*") do ("c:\<path_to_tidy.exe>\tidy.exe" -m -xml -config "c:\<path_to_tidy.ini_file>\tidycfg.ini" "%%X\<xml_file_name>.xml")

I had a folder structure where there were hundreds of directories inside this one parent directory. Each of the child directories had a single XML file in it. Therefore, I needed the C:\<path_to_parent_directory>\* wildcard.

So, what this batch file does is simply loop over each child directory (the /d switch makes the wildcard match directories instead of files) in my parent directory. For each directory it runs (do) the tidy.exe program, tells it to modify the input file itself (-m) instead of saving the formatted XML to another file, tells it that the input file is well-formed XML (-xml) and then tells it where the tidycfg.ini file is (-config). Finally, it tells tidy.exe to take the current directory (%%X) and use the .xml file inside it as the input file to format.
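
The same loop could also be sketched in Python, for example on a machine where a batch file is not convenient. This is not the batch file described above, only a rough equivalent under some assumptions: the function names build_tidy_commands and format_all are made up for illustration, every child directory is expected to hold an XML file with the same name, and directories that lack the file are skipped (the batch file makes no such check). The only tidy options used are the ones already described: -m, -xml and -config.

```python
import subprocess
from pathlib import Path


def build_tidy_commands(parent_dir, tidy_exe, config_file, xml_name):
    """Return one tidy argument list per child directory that contains xml_name.

    All four parameters are hypothetical placeholders for the paths
    left as <...> in the batch file.
    """
    commands = []
    for child in sorted(Path(parent_dir).iterdir()):
        xml_file = child / xml_name
        # Skip plain files and child directories without the expected XML file.
        if child.is_dir() and xml_file.is_file():
            commands.append([
                str(tidy_exe),
                "-m",                         # modify the input file in place
                "-xml",                       # input is well-formed XML
                "-config", str(config_file),  # formatting options to apply
                str(xml_file),
            ])
    return commands


def format_all(parent_dir, tidy_exe, config_file, xml_name):
    """Run tidy once per matching file and return the list of exit codes."""
    commands = build_tidy_commands(parent_dir, tidy_exe, config_file, xml_name)
    return [subprocess.run(cmd).returncode for cmd in commands]
```

Calling build_tidy_commands on its own gives a dry run: it only lists the commands, so they can be checked before format_all actually rewrites any files. Passing each command as a list of arguments also sidesteps quoting problems with paths that contain spaces.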

This little setup worked very well and quickly formatted all of my files in the same way so that I could successfully compare them with Beyond Compare.
