Bugfixing my podcast management script

As for most folk working in IT, the last few months at work have been a mad scramble to cope with unexpected changes, so I haven’t been doing a lot of scripting work.

I did recently have to fix a couple of bugs I’d found in my podcast management script which were interesting in that they boil down to an ongoing challenge with PowerShell – the variability of escape characters depending on what module or cmdlet you’re using.

The first problem I had was with invalid character detection. The original version of my code was very simple:

$invalidchars=@("/",":","*","?","<",">","|")
foreach ($char in $invalidchars) {
	if ($targetfile -match $("\$char")) {
		$targetfile=$targetfile -replace $("\$char"),"_"
	}
	if ($targetfile -match '"') {
		$targetfile=$targetfile -replace '"',"'"
	}
}

A simple array containing the forbidden characters I want to exclude from filenames, and a loop to check whether each character is present in the target filename. If it is, it’s replaced with an underscore. There is also a separate special-case check for double quotes, which replaces them with single quotes.

These characters are forbidden because they won’t get parsed correctly as part of a filename when using Rename-Item (for example). The -match and -replace operators treat their pattern as a regular expression, so to ensure that each character is checked as itself (rather than as its interpreted version), a backslash is used to escape the character. And this is where the problem is introduced.

Not all of the characters on my list need to be escaped, and for the ones that do, the escaping isn’t always the same. Take the backslash character itself: because -match treats its pattern as a regular expression, checking for a backslash requires “\\” as the pattern, where the first backslash is the regex escape character and the second is the literal backslash being matched. PowerShell strings, unlike strings in many other languages, do not treat the backslash as an escape character, so no further doubling is needed. The double-quote character ”, on the other hand, needs to be escaped with a backtick, `, when it appears inside a double-quoted string. Attempting to escape it with a backslash will cause the interpreter to treat it as the start of a string which is not correctly terminated, and usually results in a large number of confusing errors.
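A minimal sketch of the two escaping layers as they behave in a PowerShell session. The sample title is made up for illustration, and [regex]::Escape is the .NET method that adds whatever regex escaping a string needs, which is one way to avoid hand-written backslashes:

```powershell
# Hypothetical sample title, for illustration only
$sample = 'Part 1\2: "Intro"?'

# Regex layer: backslash is the regex escape, so '\\' matches one literal backslash
$hasBackslash = $sample -match '\\'

# String layer: inside a double-quoted string, the backtick escapes the quote
$hasQuote = $sample -match "`""

# [regex]::Escape adds regex escaping only where it is needed ('?' becomes '\?')
$escapedQuestion = [regex]::Escape('?')
$cleaned = $sample -replace $escapedQuestion, '_'
```

The backslash belongs to the regex engine and the backtick belongs to PowerShell's string parser, which is why the two cannot be swapped.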

I’ve run into this particular quirk before, but apparently forgot about it in this context.

In an effort to rationalise the code and make it easier to expand in future, I decided to replace the array with an ArrayList of invalid characters:

if ($PSScriptRoot) {
	[System.Collections.ArrayList]$invalidchars=Import-CSV -Path $($PSScriptRoot+"\invalidchars.csv")
} else {
	[System.Collections.ArrayList]$invalidchars=Import-CSV -Path $("<full path to invalidchars.csv>")
}

Only one line is needed for this, but the above approach means I can test changes to parts of this code without having to run the entire script. $PSScriptRoot only has a value while a script is running; otherwise it is empty.

Each entry in the list holds the escaped version of the character to check for (Escaped) and the replacement to use (Replace):

foreach ($char in $invalidchars) {
	if ($char.Replace -eq "") {
		$replace="_"
	} else {
		$replace=$char.Replace
	}
	if ($targetfile -match $char.Escaped) {
		$targetfile=$targetfile -replace $($char.Escaped),$replace
	}
	Remove-Variable -name replace -Force -ErrorAction SilentlyContinue
}

If no replacement character is specified, the default within the script, an underscore, is used. If I decide to change this in future, I only need to change one occurrence within the script rather than each relevant entry in the CSV file being imported. This method also allows replacement strings as well as single characters to be specified. For example, in order to maintain internal consistency within the script, I wanted to make sure that when a colon is replaced, the replacement string starts with a space. It also allows me to eliminate the separate check for quotation marks in the title.
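To make the CSV-driven approach concrete, here is a small sketch. The real invalidchars.csv isn't shown in this post, so the rows below are hypothetical; only the Escaped and Replace column names come from the loop above. ConvertFrom-Csv stands in for Import-Csv so the example runs without a file on disk:

```powershell
# Hypothetical CSV contents; the real invalidchars.csv is not shown in this post
$csvText = @'
Escaped,Replace
\?,
\*,
:," -"
'@

$invalidchars = $csvText | ConvertFrom-Csv

$targetfile = 'Episode 12: What now?'   # made-up title
foreach ($char in $invalidchars) {
    # Fall back to the script-wide default when the Replace column is empty
    $replace = if ([string]::IsNullOrEmpty($char.Replace)) { '_' } else { $char.Replace }
    $targetfile = $targetfile -replace $char.Escaped, $replace
}
```

The colon row shows a replacement that starts with a space, and the rows with an empty Replace column pick up the default underscore.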

The second issue is a logical issue at the end of the Add-Podcast function, which ultimately boils down to unwarranted assumptions on my part.

The problem is in this particular line of code, which sometimes returns an error about indexing into a null array:

$eps=($feed.rss.channel.item | ? {($_.Title -match $podcast.TitleFilter) -and ((($_.Title -split " ")[0] -as [int]) -is [int])})[0]

This does the following:
[ol][li]the $feed.rss.channel.item object contents are piped into a Where-Object check,
[li]the object contents are filtered by the TitleFilter regex for the podcast,
[li]the object contents are filtered by the outcome of splitting each item’s Title by a space character, casting the first fragment as an integer, and checking if the fragment is a valid integer.[/ol]

The root cause is that the above code assumes that, for podcasts whose titles have an episode number in them, there will be a space after the episode number. If that is not the case, e.g. podcasts using “001:” as their numbering format, the split operation will be performed correctly, but the first fragment will be “001:”, which cannot be cast as an integer. The filter therefore returns no results, so there is no array to index into, hence the error about indexing into a null array.
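The failure can be reproduced in isolation with a couple of made-up titles standing in for the feed items. The Select-Object -First 1 line at the end is not what my script does; it is only one possible way to avoid the indexing error when nothing matches:

```powershell
# Made-up titles standing in for $feed.rss.channel.item entries
$titles = @('001: Pilot', '002: Second episode')

# The first space-delimited fragment keeps the colon, so the cast yields $null
$fragment = ($titles[0] -split ' ')[0]      # '001:'
$asInt = $fragment -as [int]                # $null, because '001:' is not numeric

# Nothing passes the filter, so there is no array left to index with [0]
$matched = $titles | Where-Object { (($_ -split ' ')[0] -as [int]) -is [int] }

# Select-Object -First 1 yields nothing instead of raising the indexing error
$first = $matched | Select-Object -First 1
```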

My current fix for this is to remove the filtering:

$eps=($feed.rss.channel.item | ? {$_.Title -match $podcast.TitleFilter})[0]

Which resolves the issue well enough for now. Strictly speaking, it allows for inclusion of episodes which don’t have a number at the start of the title (e.g. promotional episodes of different podcasts), but that can already be addressed using the TitleFilter functionality and a suitable regular expression.

Another option would be to amend the split command as below:

$eps=($feed.rss.channel.item | ? {($_.Title -match $podcast.TitleFilter) -and ((($_.Title -split " |:")[0] -as [int]) -is [int])})[0]

There are only 2 characters different here: the “|:” in the pattern passed to the split operator. This works because the split operator, like the match operator, treats its pattern as a regular expression, in which a pipe separates alternatives. So this line splits the title in a single pass wherever either a space or a colon occurs. Since the alternatives are single characters, it doesn’t matter which order they are specified in.
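A quick check of that split pattern against both numbering formats, again using made-up titles:

```powershell
# Made-up titles covering both numbering formats
$withSpace = '001 Pilot'
$withColon = '001: Pilot'

# ' |:' is a regex alternation: split wherever a space OR a colon occurs
$a = ($withSpace -split ' |:')[0]
$b = ($withColon -split ' |:')[0]

# Both first fragments are now purely numeric, so the integer check passes
$aIsInt = ($a -as [int]) -is [int]
$bIsInt = ($b -as [int]) -is [int]
```

In both cases the first fragment is “001”, so the episode-number test succeeds whichever separator the feed uses.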

In the end I opted against using this approach because it solves a specific instance of the problem rather than the overall issue.
