Mass extraction of Zip files

This is another “scripts I write to make my life easier” post. This time around, it’s a straightforward task – bulk-extracting zip files, with a bit of finessing.

I buy music from artists on Bandcamp regularly and the downloads are almost always zip files. I wanted a script that would take care of the following:

  1. Find all zip files in a given directory
  2. For each zip file:
  3. Check if it has already been extracted (i.e. a directory with the desired name exists and contains mp3s)
  4. If not, validate the file name and then create a new folder with the same name
  5. Extract the files into the new directory
  6. Validate the names of the extracted files

All fairly simple, and because I could reuse existing code, it didn’t take long to write, either.

First up, a couple of function declarations:

Function Extract-Zip {
    param(
        [string]$file,
        [string]$location,
        [array]$extractlist,
        [boolean]$cleanup=$false
    )
    if (!(Test-Path $location)) {
        try {
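            # -ErrorAction Stop turns a failure into a terminating error so the catch block actually runs
            New-Item -ItemType Directory -Path $location -ErrorAction Stop | Out-Null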
        } catch {
            Write-Host "Unable to create folder $location, error was:`n$($_.Exception.Message)" -foregroundcolor red
			"Unable to create folder $location, error was:`n$($_.Exception.Message)" | Out-file -Filepath $global:logfile -append
        }
    }
    if ($extractlist) {
        Write-Host "Specific file extraction selected. Only the following files will be extracted:";$extractlist
    } else {
        Write-Host "Default mode selected, extracting all files..."
    }
    if ((Test-Path $file) -and (Test-Path $location)) {
        # Instantiate a new shell object and namespace
        $shell=New-Object -com Shell.Application
        $zip=$shell.NameSpace($file)
        # Check if the $extractlist parameter is set, and extract files accordingly.
        if (!$extractlist) {
            # Extract list is not set so default to extracting all files.
            try {
                foreach ($item in $zip.items()) {
                    $shell.Namespace($location).Copyhere($item)
                }
                Write-Host "Finished extracting contents of $file to $location." -foregroundcolor green
			    "Finished extracting contents of $file to $location." | Out-file -Filepath $global:logfile -append
            } catch {
                Write-Host "An error occurred while extracting the contents of $file to $location; the error message was:`n$($_.Exception.Message)" -foregroundcolor red
                "An error occurred while extracting the contents of $file to $location; the error message was:", "`n", "$($_.Exception.Message)" | Out-file -Filepath $global:logfile -append
            }
        } else {
            # Extract list is set, so iterate through each name in the array and extract that file from the zip. Items in extractlist are not assumed to be unique matches, so a list of matching contents is generated for each item and a foreach loop iterates through the list, extracting each match individually.
            foreach ($e in $extractlist) {
                $list=@($zip.Items() | Where-Object {$_.Name -like $e})
                if ($list) {
                    foreach ($l in $list) {
                        try {
                            $shell.Namespace($location).Copyhere($l)
                            # Report the matched item's name rather than the search pattern, and log the same event
                            Write-Host "Extracted file $($l.Name) successfully." -foregroundcolor green
                            "Extracted file $($l.Name) from $file to $location." | Out-file -Filepath $global:logfile -append
                        } catch {
                            Write-Host "Unable to extract file $($e), error was:`n$($_.Exception.Message)" -foregroundcolor red
                            "Unable to extract file $($e), error was:",$_.Exception.Message | Out-file -Filepath $global:logfile -append
                        }
                    }
                } else {
                    Write-Host "No file with name $($e) found in specified archive." -foregroundcolor yellow
                    "No file with name $($e) found in specified archive." | Out-file -Filepath $global:logfile -append
                }
				Remove-Variable -Name list -Force -ErrorAction SilentlyContinue
            }
        }
		if ($cleanup) {
			Write-Host "Cleanup enabled: deleting compressed file..." -foregroundcolor Green
			Remove-Item -Path $file -Force 
		}		
    } else {
        Write-Host "Unable to proceed with extraction, invalid input specified!" -foregroundcolor red
		"Unable to proceed with extraction, invalid input specified!" | Out-file -Filepath $global:logfile -append
        if (!(Test-Path $file)) {
            Write-Host "Could not find file $file!" -foregroundcolor red
			"Could not find file $file!" | Out-file -Filepath $global:logfile -append
			
        }
        if (!(Test-Path $location)) {
            Write-Host "Could not find or create folder path $location!" -foregroundcolor red
			"Could not find or create folder path $location!" | Out-file -Filepath $global:logfile -append
        }
    }
}

Function Rename-LongTracks {
	Param(
		[String]$location,
		[string]$replace
		
	)
	if (!$replace) {
		$replace=Read-Host("Type the string to be removed from the track names")
	}
	Push-Location
	Set-Location $location
	$tracklist=Gci -Filter "*.mp3"
	foreach ($t in $tracklist) {
		$Newname=$t.Name.ToString().Replace($replace,"")
		Rename-Item -Path $t.FullName -NewName $newname
		Remove-Variable -name newname -force
	}
	Pop-Location
}

I’ve already detailed how Extract-Zip works in a previous post, so I won’t repeat myself. Rename-LongTracks is pretty simple, and I went through it in my post about PowerShell profiles a few months back.

With those functions out of the way, here’s the script body:

# Main script body
$scriptroot=Split-Path -parent $MyInvocation.MyCommand.Definition
$global:logfile=$scriptroot+"\"+(Get-Date -format 'yyyy_MM_dd_HHmm')+"_Bandcamp_Zip_Extractor.log"
"$(Get-Date -Format 'yyyy-MM-dd HH:mm'): Bandcamp Zip Extractor" | Out-file -Filepath $global:logfile

# 1. Prompt for location to search for zip files.
[boolean]$validpath=$false
while (!$validpath) {
	$dirpath=Read-Host -Prompt "Enter top-level path to check for zipfiles"
	# Test-Path returns $false for a missing path instead of throwing, so check its result directly
	if ($dirpath -and (Test-Path $dirpath -PathType Container)) {
		"Searching $($dirpath) for Zip files to extract..." | Out-file -Filepath $global:logfile -append
		$validpath=$true
	} else {
		Write-Host "Invalid path entered, please try again!"
		Start-sleep 3
	}
	cls
}
Remove-variable -name validpath -force

$zipfiles=GCI -Recurse -Path $dirpath -Filter "*.zip"

# 2. Iterate through found files.
foreach ($zip in $zipfiles) {
	# 3. Check if directory already exists and is populated with mp3s
	$done=$false
	# '\.zip$' only strips a literal ".zip" extension; ".zip" as a regex would match any character followed by "zip"
	$extractdir=$zip.Fullname -replace '\.zip$',''
	if ((Test-Path $extractdir) -and (@(Gci -path $extractdir -filter "*.mp3").count -gt 0)) {
		$done=$true
		"File $($zip.Fullname) appears to have already been extracted." | Out-file -Filepath $global:logfile -append
	}
	if (!$done) {
		# 4. Check for dash in filename, rename if found.
		if ($zip.name -match "-") {
			# Split on the first dash only, so album names that contain a dash keep their full name and extension
			$newname=($zip.Name -split "-",2)[1].TrimStart(" ")
			#"Renaming $($zip.Name) to $($newname)..." | Out-file -Filepath $global:logfile -append
			# -LiteralPath stops characters such as [ and ] being treated as wildcards
			rename-item -LiteralPath $zip.fullname -NewName $newname
		}
		
		# 5. Extract zip file to new folder in same location
		if ($newname) {
			[string]$source=$zip.Directory.ToString()+"\"+$newname
			[string]$target=$zip.Directory.ToString()+"\"+$($newname -replace '\.zip$',"")
		} else {
			[string]$source=$zip.FullName
			[string]$target=($zip.FullName -replace '\.zip$',"")
		}
		# $newname only exists if the file was renamed, so suppress the "variable not found" error
		Remove-Variable -name newname -force -ErrorAction SilentlyContinue
		Extract-Zip -file $source -location $target -cleanup $true
		
		# 6. Examine filenames in new folder for common fragments e.g "Artist - Album - " or similar.
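		# @() keeps the indexing safe if the folder contains one mp3 or none
		$sample=@(GCI -path $target -Filter "*.mp3")[0]
		$count=($sample.Name -split "-").count
		if ($count -gt 1) {
			[string]$prefix=""
			for ($i=0;$i -lt $($count -1); $i++) {
				# Split the file name, not the FileInfo object (which may stringify to the full path)
				$prefix+=($sample.Name -split "-")[$i]
				$prefix+="-"
			}
			# Escape the prefix so characters like ( ) [ ] . are matched literally
			if (($sample.Name -replace "^$([regex]::Escape($prefix))","") -match "^ ") {
				$prefix+=" "
			}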
			"Renaming files in directory $($target) to remove prefix $($prefix)..." | Out-file -Filepath $global:logfile -append
			Rename-LongTracks -location $target -replace $prefix
		}
		"All actions for file $($zip.Fullname) complete." | Out-file -Filepath $global:logfile -append
	} else {
		Remove-Variable -Name done -force
	}
}

The numbered comments identify which steps are handled in each section – there’s nothing complicated or new here, really. The only slightly tricky part was the logic for determining what to remove in step 6, which is a combination of ensuring that all but the last part of the filename is discarded and making sure there are no leading spaces.
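
To make the step 6 logic easier to follow in isolation, here is a minimal sketch of the same idea as a standalone function. Get-TrackPrefix is a hypothetical helper, not part of the script above, and the sample file name is made up. It finds the last dash with plain string methods instead of splitting and rejoining, which should give the same prefix for typical "Artist - Album - Track" names while avoiding any regex handling.

```powershell
# Hypothetical helper (not in the original script): returns everything up to and
# including the last dash in a file name, plus a single space if one follows the dash.
function Get-TrackPrefix {
    param([string]$Name)
    $lastDash = $Name.LastIndexOf('-')
    if ($lastDash -lt 0) { return '' }   # no dash, so there is nothing to strip
    $prefix = $Name.Substring(0, $lastDash + 1)
    # Include the separating space so the renamed file does not start with one
    if ($Name.Length -gt $prefix.Length -and $Name[$prefix.Length] -eq ' ') {
        $prefix += ' '
    }
    return $prefix
}

# Made-up sample name for illustration
$sample = 'Some Artist - Some Album - 01 First Track.mp3'
$prefix = Get-TrackPrefix -Name $sample   # 'Some Artist - Some Album - '
$sample.Replace($prefix, '')              # '01 First Track.mp3'
```

As in the script, the prefix is taken from a single sample file, so a track whose own title contains a dash would produce a prefix that only matches that one file.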

As it stands, this didn’t take long to put together and works for what I need it to do. I’ve recently started using GitHub, so I have uploaded this script there and will add any further changes or bugfixes as I go. There are a couple of things I’d like to add: a check in Extract-Zip to make sure file extraction has succeeded before deleting the zip file when cleanup mode is enabled, and an optional “copy extracted files to remote location” section (for copying files to my mp3 player). There is also at least one bug relating to zip file names with non-alphanumeric characters.
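
For the extraction check, one possible approach (a sketch only, not what the script currently does) is to compare the archive's entries against the files on disk before removing the zip. Test-ZipExtracted is a hypothetical name. It relies on the .NET System.IO.Compression.ZipFile class rather than the Shell.Application object used in Extract-Zip, and it assumes $File is a full path, since .NET does not follow PowerShell's current location.

```powershell
# ZipFile lives in this assembly; Windows PowerShell needs it loaded explicitly
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper (not in the original script): returns $true only if every
# file entry in the archive exists under $Location with the same size.
function Test-ZipExtracted {
    param(
        [string]$File,      # full path to the zip file
        [string]$Location   # folder the zip was extracted to
    )
    $archive = [System.IO.Compression.ZipFile]::OpenRead($File)
    try {
        foreach ($entry in $archive.Entries) {
            # Directory entries have an empty Name; only files are checked
            if ($entry.Name -eq '') { continue }
            $extracted = Join-Path $Location $entry.FullName
            if (-not (Test-Path -LiteralPath $extracted -PathType Leaf)) { return $false }
            if ((Get-Item -LiteralPath $extracted).Length -ne $entry.Length) { return $false }
        }
        return $true
    } finally {
        $archive.Dispose()   # release the file handle so the zip can be deleted afterwards
    }
}
```

The cleanup step in Extract-Zip could then be guarded along the lines of `if ($cleanup -and (Test-ZipExtracted -File $file -Location $location)) { Remove-Item -LiteralPath $file -Force }`. The check is only meaningful once copying has actually finished, and it only covers the "extract everything" mode, not the $extractlist case.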