Parsing a URI

Problem

You need to split a uniform resource identifier (URI) into its constituent parts.

Solution

Construct a System.Uri object, passing the URI string to its constructor. The constructor parses the string into its constituent parts and exposes them through the Uri properties. You can then display the URI pieces individually, as the ParseUri method below does.

ParseUri method

using System;
using System.Text;

// Container class added so the method compiles on its own; the class name is illustrative.
public static class UriRecipes
{
    public static void ParseUri(string uriString)
    {
        try
        {
            // Use just one of the constructors for the System.Uri class.
            // It parses the string for us.
            Uri uri = new Uri(uriString);

            // Collect the information the parsed Uri now exposes.
            StringBuilder uriParts = new StringBuilder();
            uriParts.AppendFormat("AbsoluteURI: {0}{1}",
                                uri.AbsoluteUri, Environment.NewLine);
            uriParts.AppendFormat("AbsolutePath: {0}{1}",
                                uri.AbsolutePath, Environment.NewLine);
            uriParts.AppendFormat("Scheme: {0}{1}",
                                uri.Scheme, Environment.NewLine);
            uriParts.AppendFormat("UserInfo: {0}{1}",
                                uri.UserInfo, Environment.NewLine);
            uriParts.AppendFormat("Authority: {0}{1}",
                                uri.Authority, Environment.NewLine);
            uriParts.AppendFormat("DnsSafeHost: {0}{1}",
                                uri.DnsSafeHost, Environment.NewLine);
            uriParts.AppendFormat("Host: {0}{1}",
                                uri.Host, Environment.NewLine);
            uriParts.AppendFormat("HostNameType: {0}{1}",
                                uri.HostNameType.ToString(), Environment.NewLine);
            uriParts.AppendFormat("Port: {0}{1}", uri.Port, Environment.NewLine);
            // LocalPath returns the path with escape sequences decoded.
            uriParts.AppendFormat("Path: {0}{1}", uri.LocalPath, Environment.NewLine);
            uriParts.AppendFormat("QueryString: {0}{1}", uri.Query, Environment.NewLine);
            uriParts.AppendFormat("Path and QueryString: {0}{1}",
                                uri.PathAndQuery, Environment.NewLine);
            uriParts.AppendFormat("Fragment: {0}{1}", uri.Fragment, Environment.NewLine);
            uriParts.AppendFormat("Original String: {0}{1}",
                                uri.OriginalString, Environment.NewLine);
            uriParts.AppendFormat("Segments: {0}", Environment.NewLine);
            for (int i = 0; i < uri.Segments.Length; i++)
                uriParts.AppendFormat("    Segment {0}: {1}{2}",
                                i, uri.Segments[i], Environment.NewLine);

            // GetComponents can be used to get commonly used combinations
            // of URI information.
            uriParts.AppendFormat("GetComponents for specialized combinations: {0}",
                                Environment.NewLine);
            uriParts.AppendFormat("Host and Port (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.HostAndPort,
                                UriFormat.Unescaped), Environment.NewLine);
            uriParts.AppendFormat("HttpRequestUrl (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.HttpRequestUrl,
                                UriFormat.Unescaped), Environment.NewLine);
            uriParts.AppendFormat("HttpRequestUrl (escaped): {0}{1}",
                                uri.GetComponents(UriComponents.HttpRequestUrl,
                                UriFormat.UriEscaped), Environment.NewLine);
            uriParts.AppendFormat("HttpRequestUrl (safeunescaped): {0}{1}",
                                uri.GetComponents(UriComponents.HttpRequestUrl,
                                UriFormat.SafeUnescaped), Environment.NewLine);
            uriParts.AppendFormat("Scheme And Server (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.SchemeAndServer,
                                UriFormat.Unescaped), Environment.NewLine);
            uriParts.AppendFormat("SerializationInfo String (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.SerializationInfoString,
                                UriFormat.Unescaped), Environment.NewLine);
            uriParts.AppendFormat("StrongAuthority (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.StrongAuthority,
                                UriFormat.Unescaped), Environment.NewLine);
            uriParts.AppendFormat("StrongPort (unescaped): {0}{1}",
                                uri.GetComponents(UriComponents.StrongPort,
                                UriFormat.Unescaped), Environment.NewLine);

            // Write out our summary.
            Console.WriteLine(uriParts.ToString());
        }
        catch (ArgumentNullException e)
        {
            // uriString is a null reference (Nothing in Visual Basic).
            Console.WriteLine("Uri string object is a null reference: {0}", e);
        }
        catch (UriFormatException e)
        {
            // uriString is empty or not a well-formed URI.
            Console.WriteLine("Uri formatting error: {0}", e);
        }
    }
}

Discussion

The Solution code uses the Uri class to do the heavy lifting. The constructor for the Uri class can throw two types of exceptions: an ArgumentNullException and a UriFormatException. The ArgumentNullException is thrown when the uri argument passed is null. The UriFormatException is thrown when the uri argument passed is of an incorrect or indeterminate format. Here are the error conditions that can throw a UriFormatException:

  • An empty URI string was passed in.

  • The scheme specified in the URI is not correctly formed (see Uri.CheckSchemeName).

  • The URI passed in contains too many slashes.

  • The password specified in the passed-in URI is invalid.

  • The hostname specified in the passed-in URI is invalid.

  • The filename specified in the passed-in URI is invalid.

  • The username specified in the passed-in URI is invalid.

  • The host or authority name specified in the passed-in URI cannot be terminated by backslashes.

  • The port number specified in the passed-in URI is invalid or cannot be parsed.

  • The length of the passed-in URI exceeds 65,534 characters.

  • The length of the scheme specified in the passed-in URI exceeds 1,023 characters.

  • There is an invalid character sequence in the passed-in URI.
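Rather than memorizing which inputs trigger which exception, you can probe the constructor directly. The following is a minimal sketch; the class name UriValidation, its method names, and the sample inputs are illustrative assumptions, not part of the recipe. It reports which exception the Uri(string) constructor throws, checks a scheme on its own with Uri.CheckSchemeName, and uses Uri.TryCreate as a non-throwing alternative:

```csharp
using System;

// Illustrative helper; the class and method names are hypothetical.
public static class UriValidation
{
    // Reports which exception, if any, the Uri(string) constructor throws.
    public static string Describe(string candidate)
    {
        try
        {
            Uri uri = new Uri(candidate);
            return "Parsed: " + uri.AbsoluteUri;
        }
        catch (ArgumentNullException)
        {
            return "ArgumentNullException";
        }
        catch (UriFormatException)
        {
            return "UriFormatException";
        }
    }

    // Non-throwing alternative: Uri.TryCreate returns false instead of throwing.
    public static bool IsAbsoluteUri(string candidate)
    {
        return Uri.TryCreate(candidate, UriKind.Absolute, out _);
    }

    // Uri.CheckSchemeName validates only the scheme part (for example, "http").
    public static bool IsValidScheme(string scheme)
    {
        return Uri.CheckSchemeName(scheme);
    }
}
```

With these definitions, Describe(null) reports ArgumentNullException, while Describe("") and Describe("http://localhost:99999/") report UriFormatException (an empty string and an out-of-range port, respectively). When malformed input is expected rather than exceptional, TryCreate avoids the cost and clutter of exception handling.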

No actual validation is performed on the username, host or authority name, password, or port number to ensure that they exist or are correct. The constructor only checks that they are in the correct format according to the URI specification (RFC 2396).

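To see that parsing is purely syntactic, a sketch like the following (the class and method names are hypothetical) parses a URI whose host uses the reserved .invalid top-level domain and therefore can never resolve. The constructor still succeeds, because no DNS lookup, network connection, or credential check takes place:

```csharp
using System;

// Illustrative helper; the class and method names are hypothetical.
public static class FormatOnlyCheck
{
    // The ".invalid" top-level domain is reserved and never resolves,
    // yet the Uri constructor accepts it because the syntax is valid.
    // The user name and password are likewise only checked for format.
    public static Uri ParseUnresolvableHost()
    {
        return new Uri("http://nobody:secret@no-such-host.invalid:8080/");
    }
}
```

The resulting Uri reports no-such-host.invalid as its Host, 8080 as its Port, and nobody:secret as its UserInfo; confirming that the host exists or that the credentials work is left to whatever code later uses the URI.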
System.Uri provides methods to compare, parse, and combine URIs. It covers most URI manipulation needs and is used by other classes in the Framework wherever a URI is required. The syntax for the pieces of a URI is:

	[scheme]://[user]:[password]@[host/authority]:[port]/[path];[params]?[query string]#[fragment]
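The compare and combine operations mentioned above map onto ordinary Uri members. Here is a minimal sketch; the class and method names are illustrative assumptions, while the Uri(Uri, string) constructor, Uri.Compare, and MakeRelativeUri are standard members of System.Uri:

```csharp
using System;

// Illustrative helper; the class and method names are hypothetical.
public static class UriCombineCompare
{
    // Combining: resolve a relative reference against a base URI.
    public static Uri Combine(string baseUri, string relative)
    {
        return new Uri(new Uri(baseUri), relative);
    }

    // Comparing: Uri.Compare can restrict the comparison to selected components.
    // Here only the scheme, host, and port are compared, ignoring case.
    public static bool SameServer(Uri a, Uri b)
    {
        return Uri.Compare(a, b, UriComponents.SchemeAndServer,
                           UriFormat.SafeUnescaped,
                           StringComparison.OrdinalIgnoreCase) == 0;
    }

    // Relating: MakeRelativeUri returns the relative reference from one URI to another.
    public static string RelativePath(string fromUri, string toUri)
    {
        return new Uri(fromUri).MakeRelativeUri(new Uri(toUri)).ToString();
    }
}
```

For example, Combine("http://localhost:8080/www.abc.com/", "home.htm") yields http://localhost:8080/www.abc.com/home.htm, and SameServer treats http://localhost:8080/a and HTTP://LOCALHOST:8080/b as the same server because the scheme and host are case-insensitive. Note that a base URI without a trailing slash resolves relative references against its parent directory, as RFC-style resolution requires.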

If you pass the following URI to ParseUri:

http://user:password@localhost:8080/www.abc.com/home%20page.htm?item=1233#stuff

it will display the following items:

	AbsoluteURI: http://user:password@localhost:8080/www.abc.com/home%20page.htm?item=1233#stuff
	AbsolutePath: /www.abc.com/home%20page.htm
	Scheme: http
	UserInfo: user:password
	Authority: localhost:8080
	DnsSafeHost: localhost
	Host: localhost
	HostNameType: Dns
	Port: 8080
	Path: /www.abc.com/home page.htm
	QueryString: ?item=1233
	Path and QueryString: /www.abc.com/home%20page.htm?item=1233
	Fragment: #stuff
	Original String: http://user:password@localhost:8080/www.abc.com/home%20page.htm?item=1233#stuff
	Segments:
	    Segment 0: /
	    Segment 1: www.abc.com/
	    Segment 2: home%20page.htm
	GetComponents for specialized combinations:
	Host and Port (unescaped): localhost:8080
	HttpRequestUrl (unescaped): http://localhost:8080/www.abc.com/home page.htm?item=1233
	HttpRequestUrl (escaped): http://localhost:8080/www.abc.com/home%20page.htm?item=1233
	HttpRequestUrl (safeunescaped): http://localhost:8080/www.abc.com/home page.htm?item=1233
	Scheme And Server (unescaped): http://localhost:8080
	SerializationInfo String (unescaped): http://user:password@localhost:8080/www.abc.com/home page.htm?item=1233#stuff
	StrongAuthority (unescaped): user:password@localhost:8080
	StrongPort (unescaped): 8080
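The escaped and unescaped variants in this output differ only in how the %20 sequence is rendered. If you want to check those values in code rather than by reading console output, a small sketch along these lines could work; the class name, method names, and the Sample constant are illustrative assumptions:

```csharp
using System;

// Illustrative helper; the class and member names are hypothetical.
public static class SampleUriValues
{
    // The sample URI from this recipe; %20 is an escaped space.
    public const string Sample =
        "http://user:password@localhost:8080/www.abc.com/home%20page.htm?item=1233#stuff";

    // AbsolutePath keeps escape sequences; LocalPath decodes them.
    public static (string Escaped, string Unescaped) PathForms(string uriString)
    {
        Uri uri = new Uri(uriString);
        return (uri.AbsolutePath, uri.LocalPath);
    }

    // The same components rendered with two different UriFormat values.
    public static (string Escaped, string Unescaped) RequestUrlForms(string uriString)
    {
        Uri uri = new Uri(uriString);
        return (uri.GetComponents(UriComponents.HttpRequestUrl, UriFormat.UriEscaped),
                uri.GetComponents(UriComponents.HttpRequestUrl, UriFormat.Unescaped));
    }
}
```

Picking the escaped form is usually right when the value is sent over the wire; the unescaped form suits display or file-system paths.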

See Also

See the "Uri Class," "ArgumentNullException Class," and "UriFormatException Class" topics in the MSDN documentation.