Extracting Groups from a MatchCollection

Problem

You have a regular expression that contains one or more named groups, such as the following:

	\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\

where the named group TheServer will match any server name within a UNC string, and TheService will match any service name within a UNC string.

You need to store the groups that are returned by this regular expression in a keyed collection (such as a Dictionary<string, Group>) in which the key is the group name.

Solution

The ExtractGroupings method shown below returns one dictionary per match; each dictionary holds that match's Group objects keyed by their group name.

ExtractGroupings method

using System;
using System.Collections;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static List<Dictionary<string, Group>> ExtractGroupings(string source,
                                                           string matchPattern,
                                                           bool wantInitialMatch)
{
    List<Dictionary<string, Group>> keyedMatches = 
        new List<Dictionary<string, Group>>();
    int startingElement = 1;
    if (wantInitialMatch)
    {
        startingElement = 0;
    }

    Regex RE = new Regex(matchPattern, RegexOptions.Multiline);
    MatchCollection theMatches = RE.Matches(source);

    foreach(Match m in theMatches)
    {
        Dictionary<string, Group> groupings = new Dictionary<string, Group>();

        for (int counter = startingElement; counter < m.Groups.Count; counter++)
        {
            // If we had just returned the MatchCollection directly, the
            // GroupNameFromNumber method would not be available to use.
            groupings.Add(RE.GroupNameFromNumber(counter), m.Groups[counter]);
        }

        keyedMatches.Add(groupings);
    }

    return (keyedMatches);
}

The ExtractGroupings method can be used in the following manner to extract named groups and organize them by name:

	public static void TestExtractGroupings()
	{
	    string source = @"Path = ""\\MyServer\MyService\MyPath;
	                              \\MyServer2\MyService2\MyPath2\""";
	    string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

	    foreach (Dictionary<string, Group> grouping in
	             ExtractGroupings(source, matchPattern, true))
	    {
	        foreach (KeyValuePair<string, Group> kvp in grouping)
	            Console.WriteLine("Key / Value = " + kvp.Key + " / " + kvp.Value);
	        Console.WriteLine("");
	    }
	}

This test method creates a source string and a regular expression pattern in the matchPattern variable. The two groupings in this regular expression are shown here:

	string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

The names for these two groups are: TheServer and TheService. Text that matches either of these groupings can be accessed through these group names.

The source and matchPattern variables are passed to the ExtractGroupings method, along with a Boolean value, which is discussed shortly. This method returns a List<T> containing Dictionary<string,Group> objects. Each Dictionary<string,Group> object holds the groups captured by one match of the regular expression, keyed by group name.

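This test method, TestExtractGroupings, displays the following (the order of the keys within each grouping may differ, because Dictionary<TKey,TValue> does not guarantee an enumeration order):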

	Key / Value = 0 / \\MyServer\MyService\
	Key / Value = TheService / MyService
	Key / Value = TheServer / MyServer

	Key / Value = 0 / \\MyServer2\MyService2\
	Key / Value = TheService / MyService2
	Key / Value = TheServer / MyServer2

If the last parameter to the ExtractGroupings method were to be changed to false, the following output would result:

	Key / Value = TheService / MyService
	Key / Value = TheServer / MyServer

	Key / Value = TheService / MyService2
	Key / Value = TheServer / MyServer2

The only difference between these two outputs is that the first grouping is not displayed when the last parameter to ExtractGroupings is changed to false. The first grouping, stored under the key 0, is always the complete match of the regular expression.
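The key 0 comes from the way the Regex class numbers its groups. As a minimal sketch (the GroupNameDemo class and DescribeGroups method are hypothetical names used only for illustration), the following lists each group number alongside the name that GroupNameFromNumber reports for it:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the recipe's ExtractGroupings code.
public static class GroupNameDemo
{
    // Returns one "number -> name" line for every group defined in the pattern.
    public static List<string> DescribeGroups(string pattern)
    {
        Regex re = new Regex(pattern);
        List<string> lines = new List<string>();

        // GetGroupNumbers includes group 0, which represents the whole match.
        foreach (int number in re.GetGroupNumbers())
        {
            lines.Add(number + " -> " + re.GroupNameFromNumber(number));
        }

        return lines;
    }

    public static void Main()
    {
        string pattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

        foreach (string line in DescribeGroups(pattern))
        {
            Console.WriteLine(line);
        }
    }
}
```

For the pattern used in this recipe, group 0 is reported with the name 0, followed by TheServer as group 1 and TheService as group 2, which is why the complete match appears under the key 0 in the output.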

Discussion

Groups within a regular expression can be defined in one of two ways. The first way is to add parentheses around the subpattern that you wish to define as a grouping. This type of grouping is sometimes labeled as unnamed. This grouping can later be easily extracted from the final text in each Match object returned by running the regular expression. The regular expression for this recipe could be modified, as follows, to use a simple unnamed group:

	string matchPattern = @"\\\\(\w*)\\(\w*)\\";

After running the regular expression, you can access these groups by integer index, starting with 1; index 0 always holds the complete match.
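To make the numbering concrete, here is a small sketch that collects every group of the first match by its index. The UnnamedGroupDemo class and GetGroupValues method are hypothetical names, and the sketch assumes the source string actually contains a match:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class used only to illustrate numbered group access.
public static class UnnamedGroupDemo
{
    // Returns the text captured by each group of the first match.
    // Element 0 is the complete match; elements 1..n are the unnamed groups.
    // Assumption: the caller passes a source that the pattern matches.
    public static string[] GetGroupValues(string source, string pattern)
    {
        Match m = Regex.Match(source, pattern);
        string[] values = new string[m.Groups.Count];

        for (int i = 0; i < m.Groups.Count; i++)
        {
            values[i] = m.Groups[i].Value;
        }

        return values;
    }

    public static void Main()
    {
        string source = @"\\MyServer\MyService\MyPath";
        string matchPattern = @"\\\\(\w*)\\(\w*)\\";

        string[] values = GetGroupValues(source, matchPattern);
        for (int i = 0; i < values.Length; i++)
        {
            Console.WriteLine("Group " + i + " = " + values[i]);
        }
    }
}
```

With this source string, index 1 yields MyServer and index 2 yields MyService, while index 0 yields the full matched text.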

The second way to define a group within a regular expression is to use one or more named groups. A named group is defined by adding parentheses around the subpattern that you wish to define as a grouping and, additionally, adding a name to each grouping, using the following syntax:

	(?<Name>\w*)

The Name portion of this syntax is the name you specify for this group. After executing this regular expression, you can access this group by the name Name.

To access each group, you must first use a loop to iterate over each Match object in the MatchCollection. For each Match object, you access the GroupCollection's indexer, using the following unnamed syntax:

	string group1 = m.Groups[1].Value;
	string group2 = m.Groups[2].Value;

or the following named syntax where m is the Match object:

	string group1 = m.Groups["Group1_Name"].Value;
	string group2 = m.Groups["Group2_Name"].Value;
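Putting the loop and the named indexer together might look like the following sketch. The NamedGroupLoopDemo class and ListServersAndServices method are hypothetical names; the pattern and group names are the ones used in this recipe:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class showing named group access inside a match loop.
public static class NamedGroupLoopDemo
{
    // Returns one "server / service" string for every UNC prefix found in source.
    public static List<string> ListServersAndServices(string source)
    {
        string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";
        List<string> results = new List<string>();

        // Each Match in the MatchCollection exposes its own GroupCollection.
        foreach (Match m in Regex.Matches(source, matchPattern))
        {
            string server = m.Groups["TheServer"].Value;
            string service = m.Groups["TheService"].Value;
            results.Add(server + " / " + service);
        }

        return results;
    }

    public static void Main()
    {
        string source = @"\\MyServer\MyService\MyPath; \\MyServer2\MyService2\MyPath2\";

        foreach (string result in ListServersAndServices(source))
        {
            Console.WriteLine(result);
        }
    }
}
```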

If the Match method was used to return a single Match object instead of the MatchCollection, use the following syntax to access each group:

	// Unnamed syntax
	string group1 = theMatch.Groups[1].Value;
	string group2 = theMatch.Groups[2].Value;

	// Named syntax
	string group1 = theMatch.Groups["Group1_Name"].Value;
	string group2 = theMatch.Groups["Group2_Name"].Value;

where theMatch is the Match object returned by the Match method.
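When only a single Match object is used, it is usually worth checking its Success property before reading any groups, because a failed match still returns a Match object. A minimal sketch, with the hypothetical names SingleMatchDemo and GetServerName:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class showing group access on a single Match object.
public static class SingleMatchDemo
{
    // Returns the server name from the first UNC prefix in source,
    // or null when the pattern does not match at all.
    public static string GetServerName(string source)
    {
        string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";
        Match theMatch = Regex.Match(source, matchPattern);

        // A failed match is reported through Success rather than a null return.
        if (!theMatch.Success)
        {
            return null;
        }

        return theMatch.Groups["TheServer"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(GetServerName(@"\\MyServer\MyService\MyPath"));
        Console.WriteLine(GetServerName("no UNC path here") == null);
    }
}
```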

See Also

See the ".NET Framework Regular Expressions" and "Hashtable Class" topics in the MSDN documentation.