Converting Between Absolute & Relative Paths in MadCap Flare: Sample C# Code

I regularly use MadCap Flare for the production of technical documentation. Flare is a sophisticated content authoring tool, which stores all its topic and control files using XML. This makes it relatively easy to process the content of the files programmatically, as in the example of CSS class analysis that I described in a previous post.

The Flare software is based on Microsoft’s .NET framework, so the program runs only under Windows. For that reason, this discussion will be restricted to Windows file systems.

In Windows, the “path” to a file consists of a hierarchical list of subfolders beneath a root volume, for example:

c:\MyRoot\MyProject\Content\MyFile.htm

Sometimes, however, it’s convenient to specify a path relative to another location. For example, if the file at:

c:\MyRoot\MyProject\Content\Subsection\MyTopic.htm

contained a link to MyFile.htm as above, the relative path could be specified as:

..\MyFile.htm

In the syntax of relative paths, “..” means “go up one folder level”. Similarly, “.” means “this folder level”, so .\MyFile.htm refers to a file that’s in the same folder as the file containing the relative path.

If you’ve ever examined the markup in Flare files, you’ll have noticed that extensive use is made of “relative paths”. For example, a Flare topic may contain a hyperlink to another topic in the same project, such as:

<MadCap:xref href="..\MyTopic.htm">Linked Topic</MadCap:xref>

Similarly, Flare’s Table-Of-Contents (TOC) files (which have .fltoc extensions) are XML files that contain trees of TocEntry elements. Each TocEntry element has a Link attribute that contains the path to the topic or sub-TOC that appears at that point in the TOC. All the Link attribute paths start at the project’s Content (for linked topics) or Project (for linked sub-TOCs) folder, so in that sense they are relative paths.

An example of a TocEntry element would be:

<TocEntry Title="Sample Topic" Link="/Content/Subsection/MyTopic.htm" />

When I’m writing code to process these files (for example, to open and examine each topic in a Flare TOC file), I frequently have to convert Flare’s relative paths into absolute paths, and vice versa if I want to insert a path into a Flare file. (The XDocument.Load() method, as described in my previous post, resolves a relative path against the program’s current working directory rather than against the Flare project, so in practice it needs an absolute path.) Therefore, I’ve found it very useful to create “library” functions in C# to perform these conversions. I can then call the functions AbsolutePathToRelativePath() and RelativePathToAbsolutePath() without having to think again about the details of how to convert from one format to the other.
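
As a rough illustration of the TOC case just described, the sketch below reads the Link attribute of each TocEntry element and turns it into an absolute Windows path. The class and method names, the sample XML string, and the project folder c:\MyRoot\MyProject are all hypothetical; the only thing assumed about Flare is that a Link value looks like the TocEntry example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TocLinkDemo
{
    // Hypothetical helper: converts a TOC link such as
    // "/Content/Subsection/MyTopic.htm" into an absolute Windows path
    // beneath the given project folder.
    public static string TocLinkToAbsolutePath(string projectDir, string link)
    {
        string relative = link.TrimStart('/').Replace('/', '\\');
        return projectDir.TrimEnd('\\') + "\\" + relative;
    }

    // Returns the absolute path for every TocEntry element that has a Link attribute.
    public static List<string> GetLinkedPaths(string tocXml, string projectDir)
    {
        XDocument doc = XDocument.Parse(tocXml);
        return doc.Descendants("TocEntry")
                  .Select(e => (string)e.Attribute("Link"))
                  .Where(link => !string.IsNullOrEmpty(link))
                  .Select(link => TocLinkToAbsolutePath(projectDir, link))
                  .ToList();
    }

    public static void Main()
    {
        // Made-up sample TOC content, for illustration only
        string tocXml =
            "<CatapultToc>" +
            "<TocEntry Title=\"Sample Topic\" Link=\"/Content/Subsection/MyTopic.htm\" />" +
            "</CatapultToc>";

        foreach (string path in GetLinkedPaths(tocXml, @"c:\MyRoot\MyProject"))
            Console.WriteLine(path);   // c:\MyRoot\MyProject\Content\Subsection\MyTopic.htm
    }
}
```

The sketch joins the strings by hand rather than calling Path.Combine, because a second argument that starts with a separator would make Path.Combine discard the project folder.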

Similar functions probably exist in other programming languages. For example, I’m told that Python’s standard library includes a conversion function called os.path.relpath, which would make it unnecessary to create custom code. Anyway, my experience as a programmer suggests that you can never have too many code samples, so I’m offering my own versions here to add to the available set. I have tested both functions extensively and they do work as listed.
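
For readers on newer versions of .NET, the framework itself offers comparable conversions. The sketch below assumes .NET Core 2.1 or later (including .NET 5 and up), where System.IO.Path provides GetRelativePath and a GetFullPath overload that takes a base path; neither is available in the older .NET Framework. The class and wrapper names are hypothetical, and the folders are built under the temporary directory only so that the sample runs on any machine.

```csharp
using System;
using System.IO;

public static class BuiltInPathDemo
{
    // Thin wrapper around Path.GetRelativePath (.NET Core 2.0 and later).
    public static string ToRelative(string rootDir, string targetPath)
    {
        return Path.GetRelativePath(rootDir, targetPath);
    }

    // Thin wrapper around Path.GetFullPath(path, basePath) (.NET Core 2.1 and later).
    // rootDir must be a fully qualified path.
    public static string ToAbsolute(string rootDir, string relativePath)
    {
        return Path.GetFullPath(relativePath, rootDir);
    }

    public static void Main()
    {
        string baseDir = Path.Combine(Path.GetTempPath(), "folder1", "folder2");
        string root = Path.Combine(baseDir, "folder3", "folder4");
        string target = Path.Combine(baseDir, "subfolder1", "filename.ext");

        string relative = ToRelative(root, target);
        Console.WriteLine(relative);                    // two ".." levels, then subfolder1 and the file name
        Console.WriteLine(ToAbsolute(root, relative));  // the original target path
    }
}
```

Neither call touches the disk, so the folders do not need to exist.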

The methods below are designed as static methods for inclusion in a stringUtilities class. You could place them in any class, or make them standalone functions.

AbsolutePathToRelativePath

This static method converts an absolute file path specified by strTargFilepath to its equivalent path relative to strRootDir. strRootDir must be a directory tree only, and must not include a file name.

For example, if the absolute path strTargFilepath is:

c:\folder1\folder2\subfolder1\filename.ext

And the root directory strRootDir is:

c:\folder1\folder2\folder3\folder4

The method returns the relative file path:

..\..\subfolder1\filename.ext

Note that there must be some commonality between the folder tree of strTargFilepath and strRootDir. If there is no commonality, then the method just returns strTargFilepath unchanged.

The path separator is specified by strPreferredSeparator. It is used both to split the two input paths and to build the returned relative path, so both inputs must already use that separator. The default value is correct for Windows.

using System;
using System.Collections.Generic;

public static string AbsolutePathToRelativePath(string strRootDir, string strTargFilepath, string strPreferredSeparator = "\\")
{
	if (strRootDir == null || strTargFilepath == null)
		return null;

	if (strRootDir.Length == 0 || strTargFilepath.Length == 0)
		return strTargFilepath;

	string[] strSeps = new string[] { strPreferredSeparator };

	// Split both paths into their component folders
	string[] strRootFolders = strRootDir.Split(strSeps, StringSplitOptions.None);
	string[] strTargFolders = strTargFilepath.Split(strSeps, StringSplitOptions.None);

	// Paths on different volumes have nothing in common, so return the target unchanged
	if (string.Compare(strRootFolders[0], strTargFolders[0], StringComparison.OrdinalIgnoreCase) != 0)
		return strTargFilepath;

	// Count the leading folders that the two paths share. The last element of
	// strTargFolders is the file name, so it is excluded from the comparison;
	// this also keeps the index within the bounds of both arrays.
	int i = 0;
	while (i < strRootFolders.Length && i < strTargFolders.Length - 1 &&
		string.Compare(strRootFolders[i], strTargFolders[i], StringComparison.OrdinalIgnoreCase) == 0)
	{
		i++;
	}

	// Everything in the target after the common part is kept as-is
	List<string> listRelFolders = new List<string>();
	for (int k = i; k < strTargFolders.Length; k++)
		listRelFolders.Add(strTargFolders[k]);

	System.Text.StringBuilder sb = new System.Text.StringBuilder();
	if (i > 0)
	{
		// Go up one level for each root folder that is not part of the common path
		for (int j = 0; j < strRootFolders.Length - i; j++)
		{
			sb.Append("..");
			sb.Append(strPreferredSeparator);
		}
	}

	return sb.Append(string.Join(strPreferredSeparator, listRelFolders.ToArray())).ToString();
}

RelativePathToAbsolutePath

This static method converts a relative file path specified by strTargFilepath to its equivalent absolute path using strRootDir. strRootDir must be a directory tree only, and must not include a file name.

For example, if the relative path strTargFilepath is:

..\..\subfolder1\filename.ext

And the root directory strRootDir is:

c:\folder1\folder2\folder3\folder4

The method returns the absolute file path:

c:\folder1\folder2\subfolder1\filename.ext

If strTargFilepath starts with “.\” or “\”, that prefix is removed and the rest of strTargFilepath is simply appended to strRootDir.

The path separator character that will be used in the returned absolute path is specified by strPreferredSeparator. The default value is correct for Windows.

using System;
using System.Collections.Generic;
using System.Linq;

public static string RelativePathToAbsolutePath(string strRootDir, string strTargFilepath, string strPreferredSeparator = "\\")
{
	if (string.IsNullOrEmpty(strRootDir) || string.IsNullOrEmpty(strTargFilepath))
		return null;

	string[] strSeps = new string[] { strPreferredSeparator };

	// Convert to lists
	List<string> listTargFolders = strTargFilepath.Split(strSeps, StringSplitOptions.None).ToList();
	List<string> listRootFolders = strRootDir.Split(strSeps, StringSplitOptions.None).ToList();

	// If strTargFilepath starts with .\ or \, delete initial item
	if (string.IsNullOrEmpty(listTargFolders[0]) || (listTargFolders[0] == "."))
		listTargFolders.RemoveAt(0);

	// Each leading ".." removes one folder from the end of the root.
	// The Count checks stop the loop from indexing into an empty list.
	while (listTargFolders.Count > 0 && listTargFolders[0] == "..")
	{
		if (listRootFolders.Count == 0)
			return null;	// more ".." levels than the root has folders
		listRootFolders.RemoveAt(listRootFolders.Count - 1);
		listTargFolders.RemoveAt(0);
	}
	if ((listRootFolders.Count == 0) || (listTargFolders.Count == 0))
		return null;

	// Combine root and subfolders
	System.Text.StringBuilder sb = new System.Text.StringBuilder();
	foreach (string str in listRootFolders)
	{
		sb.Append(str);
		sb.Append(strPreferredSeparator);
	}
	for (int i = 0; i < listTargFolders.Count; i++)
	{
		sb.Append(listTargFolders[i]);
		if (i < listTargFolders.Count - 1)
			sb.Append(strPreferredSeparator);
	}

	return sb.ToString();
}

[7/1/16] Note that the method above does not check for the case where a relative path contains a partial overlap with the specified absolute path. If required, you would need to add code to handle such cases.

For example, if the relative path strTargFilepath is:

folder4\subfolder1\filename.ext

and the root directory strRootDir is:

c:\folder1\folder2\folder3\folder4

the method will not detect that folder4 is actually already part of the root path.
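
If you do need to detect that situation, one possible starting point is sketched below. CountOverlap is a hypothetical helper, not part of the methods above; it only reports how many leading folders of the relative path repeat the trailing folders of the root. Deciding what to do with that number is left to the caller, because a folder4 inside folder4 could also be perfectly legitimate.

```csharp
using System;

public static class OverlapDemo
{
    // Hypothetical helper: returns how many leading folders of strTargFilepath
    // repeat the trailing folders of strRootDir (0 means no overlap).
    public static int CountOverlap(string strRootDir, string strTargFilepath, string strSeparator = "\\")
    {
        string[] seps = new string[] { strSeparator };
        string[] root = strRootDir.Split(seps, StringSplitOptions.RemoveEmptyEntries);
        string[] targ = strTargFilepath.Split(seps, StringSplitOptions.RemoveEmptyEntries);

        // Try the longest possible overlap first. The last element of targ is
        // the file name, so it never takes part in the overlap.
        for (int n = Math.Min(root.Length, targ.Length - 1); n > 0; n--)
        {
            bool match = true;
            for (int k = 0; k < n && match; k++)
            {
                match = string.Equals(root[root.Length - n + k], targ[k],
                                      StringComparison.OrdinalIgnoreCase);
            }
            if (match)
                return n;
        }
        return 0;
    }

    public static void Main()
    {
        int n = CountOverlap(@"c:\folder1\folder2\folder3\folder4",
                             @"folder4\subfolder1\filename.ext");
        Console.WriteLine(n);   // 1: "folder4" appears in both paths
    }
}
```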

How to Avoid Mosquitoes (in Compressed Bitmap Images)

In this post, I’m going to explain how you can avoid mosquitoes. However, if you happen to live in a humid area, I’m afraid my advice won’t help you, because the particular “mosquitoes” I’m talking about are undesirable artifacts that occur in bitmap images.

For many years now, my work has included the writing of user assistance documents for various hardware and software systems. To illustrate such documents, I frequently need to capture portions of the display on a computer or device screen. As I explained in a previous post, the display on any device screen is a bitmap image. You can make a copy of the screen image at any time for subsequent processing. Typically, I capture portions of the screen display to illustrate the function of controls or regions of the software I’m describing. This capture operation seems like it should be simple, and, if you understand bitmap image formats and compression schemes, it is. Nonetheless, I’ve encountered many very experienced engineers and writers who were “stumped” by the problem described here, hence the motivation for my post.

Below is the sample screen capture that I’ll be using as an example in this post. (The sample shown is deliberately enlarged.) As you can see, the image consists of a plain blue rectangle, plus some black text and lining, all on a plain white background.

Screen Capture Example

Sometimes, however, someone approaches me complaining that a screen capture that they’ve performed doesn’t look good. Instead of the nice, clean bitmap of the screen, as shown above, their image has an uneven and fuzzy appearance, as shown below. (In the example below, I’ve deliberately made the effect exceptionally bad and magnified the image – normally it’s not this obvious!)

Poor Quality Screen Capture, with Mosquitoes

In the example above, you can see dark blemishes in what should be the plain white background around the letters, and further color blemishes near the colored frame at the top. Notice that the blemishes appear only in areas close to sharp changes of color in the bitmap. Because such blemishes appear to be “buzzing around” details in the image, they are colloquially referred to as “mosquitoes”.

Typically, colleagues present me with their captured bitmap, complete with mosquitoes, and ask me how they can fix the problems in the image. I have to tell them that it actually isn’t worth the effort to try to fix these blemishes in the final bitmap, and that, instead, they need to go back and redo the original capture operation in a different way.

What Causes Mosquitoes?

Mosquitoes appear when you apply the wrong type of image compression to a bitmap. How do you know which is the right type of compression and which is wrong?

There are many available digital file compression schemes, but most of them fall into one of two categories:

  • Block Transform Compression
  • Lossless Huffman & Dictionary-Based Compression

Block Transform Compression Schemes

Most people who have taken or exchanged digital photographs are familiar with the JPEG (Joint Photographic Experts Group) image format. As the name suggests, this format was specifically designed for the compression of photographs; that is, images taken with some type of camera. Most digitized photographic images display certain characteristics that affect the best choice for compressing them. The major characteristics are:

  • Few sharp transitions of color or luminance from one pixel to the next. Even a transition that looks sharp to the human eye actually occurs over several pixels.
  • A certain level of electrical noise in the image. This occurs due to a variety of causes, but it has the effect that pixels in regions of “solid” color don’t all have exactly the same value. The presence of this noise adds high-frequency information to the image that’s actually unnecessary and undesirable. In most cases, removing the noise would actually improve the image quality.

As a result, it’s usually possible to remove some of the image’s high-frequency information without any noticeable reduction in its quality. Schemes such as JPEG achieve impressive levels of compression, partially by removing unnecessary high-frequency information in this way.

JPEG analyzes the frequency information in an image by dividing up the bitmap into blocks of 8×8 pixels. (When the color information is subsampled, as it often is, each block of color data can span a 16×16 pixel area of the image.) Within each block, high-frequency information is removed or reduced. The frequency analysis is performed by using a mathematical operation called a transform; in JPEG’s case, this is the Discrete Cosine Transform (DCT). The problem is that, if a particular block happens to contain a sharp transition, removing the high-frequency components tends to cause “ringing” in all the pixels in the block. (Technically, this effect is caused by something called the Gibbs Phenomenon, the details of which I won’t go into here.) That’s why the “mosquitoes” cluster around areas of the image where there are sharp transitions. Blocks that don’t contain sharp transitions, such as plain-colored areas away from edges in the example, don’t contain so much high-frequency information, so they compress well and don’t exhibit mosquitoes.
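
To make the ringing effect more concrete, here is a minimal one-dimensional sketch. It is not JPEG itself: real JPEG works on two-dimensional blocks and quantizes the coefficients rather than simply zeroing them. The sketch just applies a discrete cosine transform to a single row of 8 pixels containing a sharp edge, throws away the upper half of the coefficients, and transforms back. All the names are hypothetical.

```csharp
using System;
using System.Linq;

public static class RingingDemo
{
    // Orthonormal 1-D DCT-II of the samples in x.
    public static double[] Dct(double[] x)
    {
        int n = x.Length;
        var coeffs = new double[n];
        for (int k = 0; k < n; k++)
        {
            double scale = Math.Sqrt((k == 0 ? 1.0 : 2.0) / n);
            double sum = 0.0;
            for (int i = 0; i < n; i++)
                sum += x[i] * Math.Cos(Math.PI * (2 * i + 1) * k / (2.0 * n));
            coeffs[k] = scale * sum;
        }
        return coeffs;
    }

    // Inverse of Dct().
    public static double[] InverseDct(double[] coeffs)
    {
        int n = coeffs.Length;
        var x = new double[n];
        for (int i = 0; i < n; i++)
        {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
            {
                double scale = Math.Sqrt((k == 0 ? 1.0 : 2.0) / n);
                sum += scale * coeffs[k] * Math.Cos(Math.PI * (2 * i + 1) * k / (2.0 * n));
            }
            x[i] = sum;
        }
        return x;
    }

    // Transforms the row, zeroes every coefficient from index 'keep' upward,
    // and transforms back.
    public static double[] LowPass(double[] row, int keep)
    {
        double[] coeffs = Dct(row);
        for (int k = keep; k < coeffs.Length; k++)
            coeffs[k] = 0.0;
        return InverseDct(coeffs);
    }

    public static void Main()
    {
        // One row of 8 pixels containing a sharp black-to-white edge
        double[] row = { 0, 0, 0, 0, 255, 255, 255, 255 };
        double[] result = LowPass(row, 4);
        Console.WriteLine(string.Join(", ", result.Select(v => v.ToString("F1"))));
    }
}
```

After the high-frequency coefficients are discarded, none of the 8 values comes back as exactly 0 or 255. The error is spread across the whole row, which is the one-dimensional equivalent of mosquitoes filling a block.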

In the poor-quality example above, you can actually see some of the 16×16 blocks in the corner of the blue area, because I enlarged the image to make each pixel more visible.

Note that the removal of high-frequency information from the image results in lossy compression. That is, some information is permanently removed from the image, and the original information can never be retrieved exactly.

Huffman Coding & Dictionary-Based Compression Schemes

Computer screens typically display bitmaps that have many sharp transitions from one color to another, as shown in the sample screen capture. These images are generated directly by software; they aren’t captured via a camera or some other form of transducer.

If you’re reading this article on a computer screen, it’s likely that the characters you’re viewing are rendered with very sharp black-to-white transitions. In fact, modern fonts for computer displays are specifically designed to be rendered in this way, so that the characters will appear sharp and easy to read even when the font size is small. The result is that the image has a lot of important high-frequency information. Similarly, such synthesized images have no noise, because they were not created using a transducer that could introduce noise.

Applying block-transform compression to such synthesized bitmaps results in an image that, at best, looks “fuzzy” and at worst contains mosquitoes. Text in such bitmaps can quickly become unreadable.

If you consider the pixel values in the “mosquito-free” sample screen capture above, it’s obvious that the bitmap will contain many pixels specifying “white”, many specifying “black”, and many specifying the blue shade. There’ll also be some pixels with intermediate gray or blue shades, in areas where there’s a transition from one color to another, but far fewer of those than of the “pure” colors. For synthesized images such as this, an efficient form of compression is Huffman coding. Essentially, this coding scheme compresses an image by assigning shorter codewords to the pixel values that appear more frequently, and longer codewords to values that are less frequent. When an image contains a large number of similar pixels, the overall compression can be substantial.
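
The following sketch shows the idea with some made-up pixel counts for a 1000-pixel image (900 white, 80 blue, 15 black, 5 gray). It computes only the length of each Huffman codeword, not the codewords themselves, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HuffmanDemo
{
    // Returns the Huffman codeword length (in bits) for each symbol,
    // given how often each symbol occurs.
    public static Dictionary<string, int> CodeLengths(IDictionary<string, int> counts)
    {
        var lengths = counts.Keys.ToDictionary(k => k, k => 0);
        var nodes = counts
            .Select(kv => (Weight: (long)kv.Value, Symbols: new List<string> { kv.Key }))
            .ToList();

        while (nodes.Count > 1)
        {
            // Merge the two least frequent nodes; every symbol beneath the
            // merged node gains one bit.
            nodes = nodes.OrderBy(n => n.Weight).ToList();
            var merged = (Weight: nodes[0].Weight + nodes[1].Weight,
                          Symbols: nodes[0].Symbols.Concat(nodes[1].Symbols).ToList());
            foreach (string s in merged.Symbols)
                lengths[s]++;
            nodes.RemoveRange(0, 2);
            nodes.Add(merged);
        }
        return lengths;
    }

    // Total number of bits needed to code every pixel with the given lengths.
    public static long TotalBits(IDictionary<string, int> counts, IDictionary<string, int> lengths)
    {
        return counts.Sum(kv => (long)kv.Value * lengths[kv.Key]);
    }

    public static void Main()
    {
        // Made-up pixel counts, for illustration only
        var counts = new Dictionary<string, int>
        {
            { "white", 900 }, { "blue", 80 }, { "black", 15 }, { "gray", 5 }
        };
        var lengths = CodeLengths(counts);
        foreach (var kv in lengths)
            Console.WriteLine(kv.Key + ": " + kv.Value + " bit(s)");
        Console.WriteLine("Total: " + TotalBits(counts, lengths) + " bits");   // Total: 1120 bits
    }
}
```

With these counts, white gets a 1-bit codeword, blue 2 bits, and black and gray 3 bits each, for a total of 1120 bits. A fixed 2-bit code for the same four colors would need 2000 bits.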

Another lossless approach is to create an on-the-fly “dictionary” of pixel sequences that appear repeatedly in the image. Again, in bitmaps that contain regions with repeated patterns, this approach can yield excellent compression. The details of how dictionary compression works can be found in descriptions of, for example, the LZW algorithm.
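
As a small illustration of the dictionary idea, here is a sketch of the compression half of LZW, applied to a string in which each character stands for one pixel (“W” for white). It is a textbook-style version written for this post, not the variant used by any particular file format, and it assumes that every input character has a code below 256.

```csharp
using System;
using System.Collections.Generic;

public static class LzwDemo
{
    // Compresses the input into a list of dictionary codes.
    // Codes 0-255 stand for single characters; new sequences get codes from 256 upward.
    public static List<int> Compress(string input)
    {
        var dict = new Dictionary<string, int>();
        for (int i = 0; i < 256; i++)
            dict[((char)i).ToString()] = i;

        int nextCode = 256;
        string w = "";
        var output = new List<int>();
        foreach (char c in input)
        {
            string wc = w + c;
            if (dict.ContainsKey(wc))
            {
                // Keep extending the current sequence while it is already known
                w = wc;
            }
            else
            {
                // Emit the code for the known part, then remember the new sequence
                output.Add(dict[w]);
                dict[wc] = nextCode++;
                w = c.ToString();
            }
        }
        if (w.Length > 0)
            output.Add(dict[w]);
        return output;
    }

    public static void Main()
    {
        List<int> codes = Compress("WWWWWWWW");
        Console.WriteLine(string.Join(", ", codes));   // 87, 256, 257, 256
    }
}
```

Eight identical pixels are reduced to four codes, and the saving grows as the runs get longer, because each new dictionary entry covers a longer sequence than the last.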

Unlike many block transform schemes, such compression schemes are lossless. Even though all the pixel values are mapped from one coding to another, there is no loss of information, and, by reversing the mapping, it’s possible to restore the original image, pixel-for-pixel, in its exact form.

One good choice for a bitmap format that offers lossless compression is PNG (Portable Network Graphics). This format uses a two-step compression method (the DEFLATE algorithm): it first applies dictionary-based compression, then Huffman-codes the results.
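
The lossless round trip can be demonstrated with .NET’s own DeflateStream class, which implements the same DEFLATE algorithm. This sketch compresses a made-up run of pixel bytes and restores it; it does not produce a PNG file, which also needs per-line filtering and a file structure around the compressed data. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DeflateDemo
{
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true, so the MemoryStream can still be read after
            // the DeflateStream has been closed and flushed
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
                deflate.Write(data, 0, data.Length);
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // Made-up pixel data: long runs of white with a short run of black
        byte[] pixels = Enumerable.Repeat((byte)255, 2000)
            .Concat(Enumerable.Repeat((byte)0, 48))
            .Concat(Enumerable.Repeat((byte)255, 2000))
            .ToArray();

        byte[] packed = Compress(pixels);
        byte[] restored = Decompress(packed);

        Console.WriteLine(pixels.Length + " bytes -> " + packed.Length + " bytes");
        Console.WriteLine("Identical after round trip: " + pixels.SequenceEqual(restored));
    }
}
```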

A Mosquito-Free Result

Here is the same screen capture sample, but this time I saved the bitmap as a PNG file instead of as a JPEG file. Although PNG does compress the image, the compression is lossless and there’s no block transform. Hence, there’s no danger that mosquitoes will appear.

High Quality Screen Capture without Artifacts

Avoiding Mosquitoes: Summary

As I’ve shown, the trick to avoiding mosquitoes in screen capture bitmaps or other computer-generated imagery is simply to avoid using file formats or compression schemes that are not suitable for this kind of image. The reality is that bitmap formats were designed for differing purposes, and are not all equivalent to each other.

  • Unsuitable formats include those that use block-transform and/or lossy compression, such as JPEG.
  • Suitable formats are those that use lossless Huffman coding and/or dictionary-based compression, such as PNG, or no compression at all.
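
If you have a folder full of screen captures and want to find the ones saved in an unsuitable format, one possible approach is to check the first few bytes of each file rather than trusting its extension. The sketch below relies on the standard file signatures for PNG and JPEG; the class and method names are hypothetical, and the files to check are passed on the command line.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ImageFormatCheck
{
    // Every PNG file starts with these 8 bytes; every JPEG file starts with FF D8 FF
    static readonly byte[] PngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    static readonly byte[] JpegSignature = { 0xFF, 0xD8, 0xFF };

    // Returns "PNG", "JPEG" or "unknown", based on the first bytes of a file.
    public static string DetectFormat(byte[] header)
    {
        if (StartsWith(header, PngSignature)) return "PNG";
        if (StartsWith(header, JpegSignature)) return "JPEG";
        return "unknown";
    }

    static bool StartsWith(byte[] data, byte[] prefix)
    {
        return data.Length >= prefix.Length
            && data.Take(prefix.Length).SequenceEqual(prefix);
    }

    public static void Main(string[] args)
    {
        foreach (string file in args)
        {
            byte[] header = new byte[8];
            int read;
            using (FileStream fs = File.OpenRead(file))
                read = fs.Read(header, 0, header.Length);
            Array.Resize(ref header, read);

            Console.WriteLine(file + ": " + DetectFormat(header));
        }
    }
}
```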