Filtering XML data in a DataSet

by CameronM 27. October 2010 23:11

In the post Read XML File into DataSet we demonstrated how to read every row of data from an XML document by using the ReadXml method. Reading every row may be suitable for a small set of data, but you will rarely want to return every record from the XML document. There are a number of ways to filter data from an XML document, such as XQuery, which filters the data at the XML document level. In this example, however, we will filter at the DataSet level, once the DataSet has been populated from the XML document.

To filter out only the records we want from the DataSet, we will use the DataTable.Select method, as outlined below.

        //declare and load the DataSet from the XML document
        DataSet ds = new DataSet();
        ds.ReadXml("AppSettings.xml");

        //only select the rows where the name is CurrentInstance
        DataRow[] rows = ds.Tables[0].Select("name='CurrentInstance'");

        //loop through the rows to access each row that matches
        foreach (DataRow row in rows)
        {
            //TODO: do something with each row
        }

As you can see, the DataTable.Select method returns an array of DataRow objects. We are using the overload that takes a filterExpression as its argument. In this example we are using a simple equality filterExpression: return all records where name is equal to CurrentInstance. Depending on the data type of the columns in your DataTable, you could create a valid expression using common operators such as LIKE, > (greater than) and < (less than), and even join multiple expressions together using AND/OR.
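A minimal sketch of those other operators follows. It builds a small in-memory table rather than reading AppSettings.xml, so the column types, the extra rows and the class name SelectFilterDemo are assumptions made for illustration. Note that ReadXml without a schema infers every column as a string, so numeric comparisons like the ones below assume you have a typed column.

```csharp
using System;
using System.Data;

public static class SelectFilterDemo
{
    // Builds a hypothetical table so the sketch runs without an XML file.
    // The value column is typed as int here; this is an assumption.
    public static DataTable BuildTable()
    {
        DataTable table = new DataTable("setting");
        table.Columns.Add("name", typeof(string));
        table.Columns.Add("value", typeof(int));
        table.Rows.Add("CurrentInstance", 1);
        table.Rows.Add("CurrentRetries", 5);
        table.Rows.Add("MaxInstances", 10);
        return table;
    }

    public static void Main()
    {
        DataTable table = BuildTable();

        // LIKE with a trailing wildcard: names that start with "Current"
        DataRow[] likeRows = table.Select("name LIKE 'Current%'");

        // two numeric comparisons joined with AND
        DataRow[] rangeRows = table.Select("value > 1 AND value < 10");

        // alternatives joined with OR; the second argument is a sort expression
        DataRow[] eitherRows = table.Select("name = 'MaxInstances' OR value = 1", "value DESC");

        Console.WriteLine("LIKE matches: " + likeRows.Length);
        Console.WriteLine("AND matches: " + rangeRows.Length);
        foreach (DataRow row in eitherRows)
        {
            Console.WriteLine(row["name"] + " " + row["value"]);
        }
    }
}
```

The two-argument overload of Select is handy when the order of the matching rows matters, since the filter alone makes no promise about ordering.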

C# | XML

Reading XML files with child elements

by CameronM 11. March 2010 17:41

In the previous post we looked at how to create a DataSet by reading from a simple XML file. In this post we will investigate how .NET handles more complex XML files, and in particular how the .NET DataSet handles XML files containing multiple child elements under the root.

<?xml version="1.0" encoding="utf-8" ?>
<AppSettings>
  <setting name="CurrentInstance" serializeAs="String">
    <value>1</value>
    <currentStatus>
      <status>live</status>
      <lastUpdated>01-01-2010</lastUpdated>
    </currentStatus>
  </setting>
  <setting name="OutputDirectory" serializeAs="String">
    <value>\\SV-OCRMGR\mbrc\ocr\xmloutput</value>
    <currentStatus>
      <status>live</status>
      <lastUpdated>01-01-2010</lastUpdated>
    </currentStatus>
  </setting>
</AppSettings>

As we found in the previous post, when .NET parses the XML into the DataSet, the root element (AppSettings in this case) becomes the DataSet itself and each setting element becomes a row in a DataTable called setting. With the more complex XML shown above, simple elements like value are parsed into a column of the setting table. Elements that contain child elements themselves, such as currentStatus, are parsed into a new DataTable within the DataSet. .NET also creates a relationship between the two tables, similar to the foreign key relationships you may be familiar with from database development.
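One way to check what was inferred is to list the tables and columns of the DataSet after loading. The sketch below is only illustrative: it reads a trimmed copy of the XML from a string (via the ReadXml overload that accepts a TextReader) instead of from AppSettings.xml, and the class name SchemaDump is made up for this example.

```csharp
using System;
using System.Data;
using System.IO;

public static class SchemaDump
{
    // Same shape as the XML above, trimmed to a single setting for brevity.
    public const string Xml =
        "<AppSettings>" +
        "<setting name=\"CurrentInstance\" serializeAs=\"String\">" +
        "<value>1</value>" +
        "<currentStatus>" +
        "<status>live</status>" +
        "<lastUpdated>01-01-2010</lastUpdated>" +
        "</currentStatus>" +
        "</setting>" +
        "</AppSettings>";

    // Loads the XML string into a new DataSet, letting .NET infer the schema.
    public static DataSet Load()
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(Xml));
        return ds;
    }

    public static void Main()
    {
        DataSet ds = Load();
        Console.WriteLine("DataSet: " + ds.DataSetName);

        // list every inferred table together with its column names
        foreach (DataTable table in ds.Tables)
        {
            Console.WriteLine("Table: " + table.TableName);
            foreach (DataColumn column in table.Columns)
            {
                Console.WriteLine("  Column: " + column.ColumnName);
            }
        }
    }
}
```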

We can modify the code from the previous post to take advantage of this functionality.

//declare a new DataSet
DataSet ds = new DataSet();
 
//read the XML file into the DataSet
ds.ReadXml("AppSettings.xml");
 
 
DataRow[] drChildren;
DataRelation dr;
 
//access each node in the XML file using a DataRow
//in this example the nodes we want are in the setting table
foreach (DataRow drResult in ds.Tables["setting"].Rows)
{
    //access the elements of the XML file using the DataRow columns
    //we will just write them to Trace for now
    System.Diagnostics.Trace.Write(drResult["name"].ToString() + " " + drResult["value"].ToString());
 
    //declare the relationship
    dr = ds.Relations[0];
 
    //although we only expect one row of data GetChildRows returns an array
    drChildren = drResult.GetChildRows(dr);
 
    //iterate thru the array of DataRows to access the values
    for (int i = 0; i < drChildren.Length; i++)
    {
        System.Diagnostics.Trace.Write(" " + drChildren[i]["status"].ToString());
    }
    System.Diagnostics.Trace.WriteLine("");
}

The output from this code (shown below) shows that we have returned the status element for each row in the setting table:

CurrentInstance 1 live
OutputDirectory \\SV-OCRMGR\mbrc\ocr\xmloutput live

What isn't clear is how all this works. To illustrate this better, we can write some code to reveal the relationship .NET created between our original setting table and the child element currentStatus.

//declare a new DataSet
DataSet ds = new DataSet();
 
//read the XML file into the DataSet
ds.ReadXml("AppSettings.xml");
 
//iterate through the relationships
foreach (DataRelation dr in ds.Relations)
{
    System.Diagnostics.Trace.WriteLine(dr.RelationName + " " + dr.ParentTable + " " + dr.ChildTable);
 
    foreach (DataColumn dc in dr.ParentColumns)
    {
        System.Diagnostics.Trace.WriteLine(dc.ColumnName);
    }
}

The output of this code is:

setting_currentStatus setting currentStatus
setting_Id

.NET has created a DataRelation called setting_currentStatus, which links the setting table to the currentStatus table through a column called setting_Id. You'll notice the XML file does not have a field called setting_Id; .NET created it when it parsed the file into the DataSet. In our earlier code we took advantage of this relationship by calling the GetChildRows method on each DataRow in setting. GetChildRows takes as its parameter a DataRelation object, which tells .NET everything it needs to know to grab the related child rows.
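The same relationship can be walked in the other direction. The sketch below starts from each currentStatus row and uses GetParentRow to find the setting it belongs to, printing the generated setting_Id along the way. It is only an illustration: the XML is a trimmed copy loaded from a string, the class name KeyColumnDemo is made up, and the shortened OutputDirectory value and the test status are sample data.

```csharp
using System;
using System.Data;
using System.IO;

public static class KeyColumnDemo
{
    // Trimmed copy of the XML; the element values are sample data.
    public const string Xml =
        "<AppSettings>" +
        "<setting name=\"CurrentInstance\">" +
        "<value>1</value>" +
        "<currentStatus><status>live</status></currentStatus>" +
        "</setting>" +
        "<setting name=\"OutputDirectory\">" +
        "<value>xmloutput</value>" +
        "<currentStatus><status>test</status></currentStatus>" +
        "</setting>" +
        "</AppSettings>";

    // Loads the XML string into a new DataSet, letting .NET infer the schema.
    public static DataSet Load()
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(Xml));
        return ds;
    }

    public static void Main()
    {
        DataSet ds = Load();
        DataRelation relation = ds.Relations["setting_currentStatus"];

        // walk from each child row back to its parent row
        foreach (DataRow child in ds.Tables["currentStatus"].Rows)
        {
            DataRow parent = child.GetParentRow(relation);
            Console.WriteLine(child["status"] + " belongs to " + parent["name"]
                + " (setting_Id " + child["setting_Id"] + ")");
        }
    }
}
```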

As we only had one complex child element in our XML, we were able to select the correct DataRelation by simply pointing to the first DataRelation in the collection using the code dr = ds.Relations[0]. Obviously, when our XML has more than one complex child element, we need a way to determine the DataRelation we want to use. Let's modify the XML file again and add a new element called currentScope.

<?xml version="1.0" encoding="utf-8" ?>
<AppSettings>
  <setting name="CurrentInstance" serializeAs="String">
    <value>1</value>
    <currentStatus>
      <status>live</status>
      <lastUpdated>01-01-2010</lastUpdated>
    </currentStatus>
    <currentScope>
      <scope>user</scope>
      <lastUpdated>01-01-2010</lastUpdated>
    </currentScope>
  </setting>
  <setting name="OutputDirectory" serializeAs="String">
    <value>\\SV-OCRMGR\mbrc\ocr\xmloutput</value>
    <currentStatus>
      <status>live</status>
      <lastUpdated>01-01-2010</lastUpdated>
    </currentStatus>
    <currentScope>
      <scope>application</scope>
      <lastUpdated>01-01-2010</lastUpdated>
    </currentScope>
  </setting>
</AppSettings>

We have a couple of options when selecting the DataRelation. One is to use the index, as we did previously; the relations appear in the collection in the order their child elements appear in the XML. The second is to use the name of the DataRelation, which, as we saw earlier, is created by combining the names of the parent and child tables. Either way, we need to know the structure of the XML file and be confident that it is not going to change too dramatically. Let's modify the code to use the DataRelation name to retrieve the child rows from the newly added currentScope element.
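If you are unsure which relations exist, you can list them from the parent table and check for a name before using it. The following is a rough sketch under a few assumptions: the XML is a trimmed copy loaded from a string, and the class RelationLookup and its helper ChildRelationNames are hypothetical names introduced for this example.

```csharp
using System;
using System.Data;
using System.IO;

public static class RelationLookup
{
    // Trimmed copy of the XML with both complex child elements.
    public const string Xml =
        "<AppSettings>" +
        "<setting name=\"CurrentInstance\">" +
        "<value>1</value>" +
        "<currentStatus><status>live</status></currentStatus>" +
        "<currentScope><scope>user</scope></currentScope>" +
        "</setting>" +
        "</AppSettings>";

    // Loads the XML string into a new DataSet, letting .NET infer the schema.
    public static DataSet Load()
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(Xml));
        return ds;
    }

    // Returns the name of every relation in which tableName is the parent.
    public static string[] ChildRelationNames(DataSet ds, string tableName)
    {
        DataRelationCollection relations = ds.Tables[tableName].ChildRelations;
        string[] names = new string[relations.Count];
        for (int i = 0; i < relations.Count; i++)
        {
            names[i] = relations[i].RelationName;
        }
        return names;
    }

    public static void Main()
    {
        DataSet ds = Load();

        foreach (string name in ChildRelationNames(ds, "setting"))
        {
            Console.WriteLine(name);
        }

        // check that the relation exists before relying on it
        string wanted = "setting_currentScope";
        if (ds.Relations.Contains(wanted))
        {
            DataRow first = ds.Tables["setting"].Rows[0];
            DataRow[] scopes = first.GetChildRows(ds.Relations[wanted]);
            Console.WriteLine(first["name"] + " " + scopes[0]["scope"]);
        }
        else
        {
            Console.WriteLine("No relation named " + wanted);
        }
    }
}
```

Checking with Contains first means a change in the XML structure shows up as a clear message rather than a null reference further down.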

//declare a new DataSet
DataSet ds = new DataSet();
 
//read the XML file into the DataSet
ds.ReadXml("AppSettings.xml");
 
 
DataRow[] drChildren;
DataRelation dr;
 
//access each node in the XML file using a DataRow
//in this example the nodes we want are in the setting table
foreach (DataRow drResult in ds.Tables["setting"].Rows)
{
    //access the elements of the XML file using the DataRow columns
    //we will just write them to Trace for now
    System.Diagnostics.Trace.Write(drResult["name"].ToString() + " " + drResult["value"].ToString());
 
    //declare the relationship
    dr = ds.Relations["setting_currentScope"];
 
    //although we only expect one row of data GetChildRows returns an array
    drChildren = drResult.GetChildRows(dr);
 
    //iterate thru the array of DataRows to access the values
    for (int i = 0; i < drChildren.Length; i++)
    {
        System.Diagnostics.Trace.Write(" " + drChildren[i]["scope"].ToString());
    }
    System.Diagnostics.Trace.WriteLine("");
}

The output from this code:

CurrentInstance 1 user
OutputDirectory \\SV-OCRMGR\mbrc\ocr\xmloutput application

ASP.NET | C# | XML

Read XML File into DataSet

by CameronM 9. March 2010 17:55

There are a number of ways to read an XML file and use the contents in .NET. One nice way, for people who are familiar with DataSets, is the ReadXml method of the DataSet class. This method reads an XML file into a DataSet that can then be used to retrieve and manipulate the elements via standard DataTable and DataRow methods.

In this example we have created a simple XML file used to store application settings that need to be shared amongst several applications. It is basically a trimmed-down version of the app.config file Visual Studio creates whenever you add settings to a project via the IDE.

<?xml version="1.0" encoding="utf-8" ?>
<AppSettings>
    <setting name="CurrentInstance" serializeAs="String">
      <value>1</value>
    </setting>
    <setting name="OutputDirectory" serializeAs="String">
      <value>\\SV-OCRMGR\mbrc\ocr\xmloutput\test2</value>
    </setting>
</AppSettings>

To retrieve the value of any setting, we need to grab the setting node from the XML file and return the child element called value.

When we load the DataSet, in this simple case it consists of just one DataTable, called setting. The setting table consists of two rows. One interesting thing to note is that when a child element holds a single value, no distinction is made between child elements and attributes. The child element value and the attribute name are both treated as columns in the resulting DataTable.
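Although both kinds of node are read the same way through a DataRow, each DataColumn records where it came from in its ColumnMapping property. The sketch below prints that mapping for every column of the setting table. It loads a one-setting copy of the XML from a string, and the class name ColumnMappingDemo is invented for the example.

```csharp
using System;
using System.Data;
using System.IO;

public static class ColumnMappingDemo
{
    // One-setting copy of the simple XML file shown above.
    public const string Xml =
        "<AppSettings>" +
        "<setting name=\"CurrentInstance\" serializeAs=\"String\">" +
        "<value>1</value>" +
        "</setting>" +
        "</AppSettings>";

    // Loads the XML string into a new DataSet, letting .NET infer the schema.
    public static DataSet Load()
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(Xml));
        return ds;
    }

    public static void Main()
    {
        DataTable setting = Load().Tables["setting"];

        // ColumnMapping is a MappingType value such as Attribute or Element
        foreach (DataColumn column in setting.Columns)
        {
            Console.WriteLine(column.ColumnName + ": " + column.ColumnMapping);
        }
    }
}
```

This mapping is what lets WriteXml put each column back in the right place if the DataSet is saved again.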

//declare a new DataSet
DataSet ds = new DataSet();
 
//read the XML file into the DataSet
ds.ReadXml("AppSettings.xml");
 
//access each node in the XML file using a DataRow
//in this example the nodes we want are in the setting table
foreach (DataRow drResult in ds.Tables["setting"].Rows)
{
     //access the elements of the XML file using the DataRow columns
     //we will just write them to Trace for now
     System.Diagnostics.Trace.Write(drResult["name"].ToString() + " ");
     System.Diagnostics.Trace.WriteLine(drResult["value"].ToString());
}

The output for this procedure is shown below:

CurrentInstance 1
OutputDirectory \\SV-OCRMGR\mbrc\ocr\xmloutput\test2

To see how .NET DataSets handle more complex XML, we can modify the original XML file so that, in addition to the value child element, setting also has a complex child element called currentStatus. This element has a number of sub-elements, namely status and lastUpdated.

<?xml version="1.0" encoding="utf-8" ?>
<AppSettings>
    <setting name="CurrentInstance" serializeAs="String">
      <value>1</value>
      <currentStatus>
        <status>live</status>
        <lastUpdated></lastUpdated>
      </currentStatus>
    </setting>
    <setting name="OutputDirectory" serializeAs="String">
      <value>\\SV-OCRMGR\mbrc\ocr\xmloutput\test2</value>
      <currentStatus>
        <status>test</status>
        <lastUpdated></lastUpdated>
      </currentStatus>
    </setting>
</AppSettings>

If we run our code, nothing seems to have changed. However, if we add a breakpoint we will see that the DataSet now contains two DataTables, setting and currentStatus. In addition, .NET has added a new column called setting_Id to both tables. This column acts as the primary key in setting and the foreign key in currentStatus, linking the values in setting to those in currentStatus. Read the next post, where we investigate how .NET handles more complicated XML files.

VB.NET | XML