kSQL 

Shredding XML Containing Different Roots & Nodes

While coding a recent report (the same one mentioned in my post on Querying AD with Linked Servers) I needed to shred a column with XML values. The challenge was twofold: the data was stored as VARCHAR(MAX), not XML, and contained malformed XML insofar as there were different root elements and different nodes inside the roots. My first attempt involved a cobbled together mis-mash of LIKE statements and nodes() methods sprawled over thousands of lines of code that would fail if any row with a new root element was entered. It was insufficient to say the least. I approached our BI department and Grace Politano was kind enough to look over my situation and provide me with the elegant solution you see below. I wasn't able to find an answer for this on Google so I'm very excited to be able to punt this into the blogosphere.


First, let's put an example of the strings into a temp table. Note that the datatype of the string must be NVARCHAR(MAX) because in order to convert it to XML it must be Unicode.

IF OBJECT_ID('tempdb.dbo.#Data','U'IS NOT NULL DROP TABLE #Data; 

CREATE TABLE #Data( ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL, String NVARCHAR(MAX));

INSERT INTO #Data(String) 

VALUES('<?xml version="1.0" encoding="utf-16"?><BMW xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Model>135i</Model><Color>White</Color></BMW>')

,('<?xml version="1.0" encoding="utf-16"?><Audi xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Model>A5</Model><Year>2012</Year><Color>Black</Color></Audi>')

,('<?xml version="1.0" encoding="utf-16"?><Audi xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Model>R8</Model><Color>Silver</Color><Price>ArmLeg</Price></Audi>')

,('<?xml version="1.0" encoding="utf-16"?><Lexus xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Model>ISF</Model></Lexus>')

,('<?xml version="1.0" encoding="utf-16"?><Infiniti xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Model>G37S</Model><Color>Blue</Color><Interior>Grey</Interior></Infiniti>');


If you select from this table you should see a simple two column output with the rows of unconverted XML as strings. Now let's create a second temp table and get down to the business of converting these strings.


IF OBJECT_ID('tempdb.dbo.#xmlData','U'IS NOT NULL DROP TABLE #xmlData;

CREATE TABLE #xmlData( ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL, xmlString XML);

INSERT INTO #xmlData(xmlString) 

SELECT CONVERT(XML,String) 

FROM #Data;


Now when you select from our new table, #xmlData, you should see the same rows but now converted to XML. We're halfway there; now for the fun part - shredding!


SELECT DISTINCT ID, MAX(Node), MAX(NodeValue) 

FROM ( 

SELECT ID,

x.value('local-name(.)','VARCHAR(50)'AS 'Node',

x.value('.','VARCHAR(50)') AS 'NodeValue'

FROM #xmlData 

CROSS APPLY xmlString.nodes('/*/*') y(x)) AS x 

GROUP BY ID;

‚Äč

This outputs distinct nodes and node values using a wildcard " * " for the root element instead of explicitly stating it. MAX limits the output to distinct values. Also note that you'll need to use an appropriate length for the datatype within the subquery to avoid truncation.