C# Archives - Teklibri

Converting Between Absolute & Relative Paths in MadCap Flare: Sample C# Code

I regularly use MadCap Flare for the production of technical documentation. Flare is a sophisticated content authoring tool, which stores all its topic and control files using XML. This makes it relatively easy to process the content of the files programmatically, as in the example of CSS class analysis that I described in a previous post.

The Flare software is based on Microsoft’s .NET framework, so the program runs only under Windows. For that reason, this discussion will be restricted to Windows file systems.

In Windows, the “path” to a file consists of a hierarchical list of subfolders beneath a root volume, for example:

c:\MyRoot\MyProject\Content\MyFile.htm

Sometimes, however, it’s convenient to specify a path relative to another location. For example, if the file at:

c:\MyRoot\MyProject\Content\Subsection\MyTopic.htm

contained a link to MyFile.htm as above, the relative path could be specified as:

..\MyTopic.htm

In the syntax of relative paths, “..” means “go up one folder level”. Similarly, “.” means “this folder level”, so .\MyFile.htm refers to a file that’s in the same folder as the file containing the relative path.

If you’ve ever examined the markup in Flare files, you’ll have noticed that extensive use is made of “relative paths”. For example, a Flare topic may contain a hyperlink to another topic in the same project, such as:

<MadCap:xref href="..\MyTopic.htm">Linked Topic</MadCap:xref>

Similarly, Flare’s Table-Of-Contents (TOC) files (which have .fltoc extensions) are XML files that contain trees of TocEntry elements. Each TocEntry element has a Link attribute that contains the path to the topic or sub-TOC that appears at that point in the TOC. All the Link attribute paths start at the project’s Content (for linked topics) or Project (for linked sub-TOCs) folder, so in that sense they are relative paths.

An example of a TocEntry element would be:

<TocEntry Title="Sample Topic" Link="/Content/Subsection/MyTopic.htm" />

When I’m writing code to process these files (for example to open and examine each topic in a Flare TOC file), I frequently have to convert Flare’s relative paths into absolute paths (because the XDocument.Load() method, as described in my previous post, will accept only an absolute path), and vice versa if I want to insert a path into a Flare file. Therefore, I’ve found it very useful to create “library” functions in C# to perform these conversions. I can then call the functions AbsolutePathToRelativePath() and RelativePathToAbsolutePath() without having to think again about the details of how to convert from one format to the other.

I’m sure that there are probably similar functions available in other programming languages. For example, I’m told that Python includes a built-in conversion function called os.path.relpath, which would make it unnecessary to create custom code. Anyway, my experience as a programmer suggests that you can never have too many code samples, so I’m offering my own versions here to add to the available set. I have tested both functions extensively and they do work as listed.

The methods below are designed as static methods for inclusion in a stringUtilities class. You could place them in any class, or make them standalone functions.

AbsolutePathToRelativePath

This static method converts an absolute file path specified by strTargFilepath to its equivalent path relative to strRootDir. strRootDir must be a directory tree only, and must not include a file name.

For example, if the absolute path strTargFilepath is:

c:\folder1\folder2\subfolder1\filename.ext

And the root directory strRootDir is:

c:\folder1\folder2\folder3\folder4

The method returns the relative file path:

..\..\subfolder1\filename.ext

Note that there must be some commonality between the folder tree of strTargFilepath and strRootDir. If there is no commonality, then the method just returns strTargFilepath unchanged.

The path separator character that will be used in the returned relative path is specified by strPreferredSeparator. The default value is correct for Windows.

using System.IO;

public static string AbsolutePathToRelativePath(string strRootDir, string strTargFilepath, string strPreferredSeparator = "\\")
{
	if (strRootDir == null || strTargFilepath == null)
		return null;

 	string[] strSeps = new string[] { strPreferredSeparator };

 	if (strRootDir.Length == 0 || strTargFilepath.Length == 0)
		return strTargFilepath;

 	// Convert to arrays
	string[] strRootFolders = strRootDir.Split(strSeps, StringSplitOptions.None);
	string[] strTargFolders = strTargFilepath.Split(strSeps, StringSplitOptions.None);
	if (string.Compare(strRootFolders[0], strTargFolders[0], StringComparison.OrdinalIgnoreCase) != 0)
		return strTargFilepath;

 	// Count common root folders
	int i = 0;
	List<string> listRelFolders = new List<string>();
	for (i = 0; i < strRootFolders.Length; i++)
	{
		if (string.Compare(strRootFolders[i], strTargFolders[i], StringComparison.OrdinalIgnoreCase) != 0)
			break;
	}
	
	for (int k = i; k < strTargFolders.Length; k++)
		listRelFolders.Add(strTargFolders[k]);

	System.Text.StringBuilder sb = new System.Text.StringBuilder();
	if (i > 0)
	{
		// Note: the last element of strTargFolders is actually the filename, so must adjust count for that
		for (int j = 0; j < strRootFolders.Length - i; j++)
		{
			sb.Append("..");
			sb.Append(strPreferredSeparator);
		}
	}

	return sb.Append(string.Join(strPreferredSeparator, listRelFolders.ToArray())).ToString();
}

RelativePathToAbsolutePath

This static method converts a relative file path specified by strTargFilepath to its equivalent absolute path using strRootDir. strRootDir must be a directory tree only, and must not include a file name.

For example, if the relative path strTargFilepath is:

..\..\subfolder1\filename.ext

And the root directory strRootDir is:

c:\folder1\folder2\folder3\folder4

The method returns the absolute file path:

c:\folder1\folder2\subfolder1\filename.ext

If strTargFilepath starts with “.\” or “\”, then strTargFilepath is simply appended to strRootDir

The path separator character that will be used in the returned relative path is specified by strPreferredSeparator. The default value is correct for Windows.

using System.IO;

public static string RelativePathToAbsolutePath(string strRootDir, string strTargFilepath, string strPreferredSeparator = "\\")
{
	if (string.IsNullOrEmpty(strRootDir) || string.IsNullOrEmpty(strTargFilepath))
		return null;
	
	string[] strSeps = new string[] { strPreferredSeparator };

 	// Convert to lists
	List<string> listTargFolders = strTargFilepath.Split(strSeps, StringSplitOptions.None).ToList<string>();
	List<string> listRootFolders = strRootDir.Split(strSeps, StringSplitOptions.None).ToList<string>();

	// If strTargFilepath starts with .\ or \, delete initial item
	if (string.IsNullOrEmpty(listTargFolders[0]) || (listTargFolders[0] == "."))
		listTargFolders.RemoveAt(0);
	while (listTargFolders[0] == "..")
	{
		listRootFolders.RemoveAt(listRootFolders.Count - 1);
		listTargFolders.RemoveAt(0);
	}
	if ((listRootFolders.Count == 0) || (listTargFolders.Count == 0))
		return null;

 	// Combine root and subfolders
	System.Text.StringBuilder sb = new System.Text.StringBuilder();
	foreach (string str in listRootFolders)
	{
		sb.Append(str);
		sb.Append(strPreferredSeparator);
	}
	for (int i = 0; i < listTargFolders.Count; i++)
	{
		sb.Append(listTargFolders[i]);
		if (i < listTargFolders.Count - 1)
			sb.Append(strPreferredSeparator);
	}

	return sb.ToString();
}

[7/1/16] Note that the method above does not check for the case where a relative path contains a partial overlap with the specified absolute path. If required, you would need to add code to handle such cases.

For example, if the relative path strTargFilepath is:

folder4\subfolder1\filename.ext

and the root directory strRootDir is:

c:\folder1\folder2\folder3\folder4

the method will not detect that folder4 is actually already part of the root path.

Analyzing CSS Styles in a Large Set of XML Documents

This post explains how I created a small program for the automated processing of text, so there won’t be anything about graphics or spelling here! I’ve created many such programs, large and small, over the years, so this is just intended as a sample of what’s possible. This program analyzes CSS styles applied to a large set of XML-based documents.

In my work, I frequently need to be able to analyze and modify large sets of XML-encoded documents (typically 10,000+ documents in a single project). It’s simply not practical to do this by opening and editing each document individually; the only realistic way is to write a script or program to perform the analysis and editing automatically.

Recently, I needed to update a Cascading Style Sheet (CSS file) that was linked to a large number of XHTML topic documents in a publishing project. The CSS file had been edited by several people over many years, with little overall strategy except to “keep it going”, so I was aware that it probably contained many “junk” style definitions. I wanted to be able to delete all the junk definitions from the CSS file, but how could I determine which styles were needed and which were not? I needed a way to scan through all 10,000+ files to find out exactly which CSS classes were used at least once in the set.

Years ago, I gained a great deal of experience of programming in C++, using Microsoft’s MFC framework. About 8 years ago, for compatibility with other work that was being done by colleagues, I began to transition to programming in C# using Microsoft’s .NET architecture. Thus, I decided to use C#/.NET to create a program that would allow me to scan huge numbers of XHTML files rapidly and create a list of the CSS styles actually found in the topic files.

Until the advent of .NET 3.5, I’d become accustomed to working with the class XmlDocument. While this was a definite improvement over previous “XML” handling classes, it could still be awkward for certain operations, such as, for example, constructing and inserting new XML snippets in an existing document. I was delighted, then, to discover the new XDocument class that was introduced with .NET 3.5, and I now use the newer class almost exclusively. (For some discussion of the differences, see http://stackoverflow.com/questions/1542073/xdocument-or-xmldocument)

Analyzing CSS Styles: Code Sample

Here are the critical methods of the class that I created to walk the XML tree in each document. The first method below, walkXMLTree(), executes the tree walking operation. To do that, it obtains the root element of the XML tree, then calls the second method, checkChildElemsRecursive(), which actually traverses the tree in each document.

using System.IO;
using System.Xml.Linq;

public int walkXMLTree(string strRootFolderpath, ref SortedList<string, string> setClasses)
{
    string[] strFilepaths = Directory.GetFiles(strRootFolderpath, "*.htm", SearchOption.AllDirectories);

    List<string> listDocsFailedToLoad = new List<string>();

    int iElemsChecked = 0;

    foreach (string strFilepath in strFilepaths)
    {
        try
        {
            _xdoc = XDocument.Load(strFilepath);
            _xelemDocRoot = _xdoc.Root;
            iElemsChecked += checkChildElemsRecursive(_xelemDocRoot, ref setClasses, strFilepath);
        }
        catch
        {
            listDocsFailedToLoad.Add(strFilepath);
        }
   }

   return iElemsChecked;
}

private int checkChildElemsRecursive(XElement xelemParent, ref SortedList<string, string> setClasses, string strFilename)
{
    int iElemsChecked = 0;
    string strClass;
    XAttribute xattClass;

    IEnumerable<XElement> de = xelemParent.Elements();

    foreach (XElement el in de)
    {
        // Find class attribute if any
        xattClass = el.Attribute("class");
        if (xattClass != null)
        {
            strClass = el.Name + "." + xattClass.Value;
            if (!setClasses.ContainsKey(strClass))
                setClasses.Add(strClass, strFilename);
        }
        iElemsChecked++;

        iElemsChecked += checkChildElemsRecursive(el, ref setClasses, strFilename);
    } 

    return iElemsChecked;
}

[Code correction 6/20/16: I changed xelemParent.Descendants() to xelemParent.Elements(). By using Descendants, I was doing the work twice, because Descendants returns child elements at all sub-levels, instead of just the first level. The code works correctly either way, but if you use Descendants, the recursion is unnecessary.]

The use of the System.IO and System.Xml.Linq libraries is declared at the top of the code.

The basic method is walkXMLTree(), which generates a sorted list setClasses of CSS classes used in every one of the XHTML files under the root folder strRootFolderpath. In this implementation, the returned list contains the CSS class name in the first element of each item (for example, “span.link”), and, in the second element, the file path of the first topic file that was found to contain that class.

The method walkXMLTree() contains a loop that examines every strFilepath in the array strFilepaths. Although every topic file under strRootFolderpath is expected to contain valid XML, it’s always possible that a file contains invalid XML markup. In that case, the XDocument.Load() method throws an exception, which stops program execution. To avoid crashing the program in such a case, I wrapped XDocument.Load() in a try-catch loop. If the method fails for a particular file, the code adds the path and name of that file to listDocsFailedToLoad, then moves on to the next file. When all the files have been scanned, I can then examine listDocsFailedToLoad to see how many files couldn’t be opened (hopefully not a large number, and usually it isn’t).

For each XHTML topic that it succeeds in opening, walkXMLTree() calls the method checkChildElemsRecursive() to traverse the element tree in that document. Note that the checkChildElemsRecursive() method is indeed recursive, since it calls itself in its own foreach loop. When checkChildElemsRecursive() is initially called from walkXMLTree(), the xelemParent parameter that is passed in is the root element of the XML tree in the document being scanned.

When control finally returns from checkChildElemsRecursive() to walkXMLTree(), the variable iElemsChecked contains the complete number of XML elements that were examined. This is likely to be a huge number; in one recent test, more than 8 million elements were processed.

The final content of setClasses will be a list of every class that’s used in at least one of the topic files. In the example above, I also set each item to show the filepath of the first file that was found that included that class, because I wasn’t expecting too many surprises! To obtain a complete analysis, you could, of course, make the second item in the SortedList a sublist that would include the file path of every topic using that class.