A little understanding of how find expressions work makes it easy to filter its output to exclude certain files and directories.
The Unix find command is great for exploring the contents of a directory tree, but sometimes it can find a little too much for our needs. Fortunately, it’s fairly easy to blind it to certain files and folders if you can get your head around how its expressions work.
By way of an example, let’s say we’re identifying files in an Apache Tomcat installation that we want to include in a backup. The directory tree looks a little like this:-
find . -type d -print
.
./work
./work/Catalina
./work/Catalina/localhost
./work/Catalina/localhost/ROOT
./temp
./webapps
./webapps/ROOT
./logs
./bin
./conf
./conf/Catalina
./conf/Catalina/localhost
./lib
Let’s say we want to exclude the work folder since it only contains temporary files. We could just pipe the output through grep to eliminate those paths:-
find . -print | grep -v "./work"
.
./temp
./temp/safeToDelete.tmp
./webapps
./webapps/ROOT
./webapps/ROOT/index.html
…
This works fine, though it’s a little cheesy as solutions go, and it’s no help if we wanted instead to use find’s -exec option to run commands on the located files.
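To see why the grep approach is a little cheesy, here's a minimal sketch that uses a few made-up paths in place of real find output (the work-in-progress.txt file is purely hypothetical). The grep pattern is an unanchored regular expression, so it can throw away more than we intended. The anchored version shown is one possible tightening, not the only one:

```shell
# Hypothetical paths standing in for find's output.
paths='./work/Catalina
./lib/work-in-progress.txt
./webapps/ROOT/index.html'

# Unanchored pattern from the article: "." matches any character, so any
# path containing "<any char>/work" is dropped, not just ./work itself.
loose=$(printf '%s\n' "$paths" | grep -v "./work")

# Anchored pattern: only ./work and the paths beneath it are dropped.
strict=$(printf '%s\n' "$paths" | grep -v -E '^\./work(/|$)')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Even with the tighter pattern we're still filtering after the fact, which is why the rest of this post does the exclusion inside find itself.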
Find has a rather useful -prune option which can be used in combination with -name or -path to cause it to ignore certain files or paths. So how about something like the following?
find . -path "./work/*" -prune -print
./work/Catalina
This isn’t doing what we expected. When no operator appears between them, find joins the parts of an expression with an implicit "and", so our -path test, -prune and -print all belong to one expression. Hence the first path matching ./work/*, in this case the Catalina subfolder, has been pruned (i.e. the find command will no longer descend into it) and has also been printed. Since we’ve not told it what to do with anything else, it does nothing.
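The implicit "and" can be made visible by writing it out. The sketch below builds a throwaway directory tree (the temporary directory and its contents are assumptions for illustration, loosely mirroring our Tomcat layout) and runs the command both ways, using -a, the POSIX spelling of "and":

```shell
# Throwaway tree mirroring part of the article's Tomcat layout.
demo=$(mktemp -d)
mkdir -p "$demo/work/Catalina/localhost" "$demo/webapps/ROOT" "$demo/temp"

# The article's command: the three primaries are joined by an implicit "and".
implicit=$(cd "$demo" && find . -path "./work/*" -prune -print)

# The same expression with the "and" operators (-a) spelled out.
explicit=$(cd "$demo" && find . -path "./work/*" -a -prune -a -print)

printf 'implicit: %s\nexplicit: %s\n' "$implicit" "$explicit"

rm -rf "$demo"
```

Both forms should report the same thing: only the entry that matched, was pruned, and was then printed.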
We can fix this by making our -print a separate expression using -or. Find only evaluates the right-hand side of an -or when the left-hand side is false, so anything that isn’t matched and pruned gets printed instead:-
find . -path "./work/*" -prune -or -print
.
./work
./temp
./temp/safeToDelete.tmp
./webapps
./webapps/ROOT
./webapps/ROOT/index.html
…
Note that we don’t really need the /* on the end of our path for our use case; simply looking for ./work will match and prune that folder, so find never descends into it. It also excludes the folder itself from our output:-
find . -path "./work" -prune -or -print
.
./temp
./temp/safeToDelete.tmp
./webapps
./webapps/ROOT
./webapps/ROOT/index.html
…
But we can’t eliminate the leading ./ from our -path option since that no longer matches what find locates – “./work” is not a text match for “work”.
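Here's a small sketch of that difference, run against a throwaway tree (the temporary directory is an assumption for illustration). It uses -o, the POSIX spelling of the -or we've been using, and sorts the output only to make it predictable:

```shell
# Throwaway tree with one folder to exclude and one file to keep.
demo=$(mktemp -d)
mkdir -p "$demo/work/Catalina" "$demo/webapps/ROOT"
touch "$demo/webapps/ROOT/index.html"

# The pattern matches the path exactly as find reports it, so ./work is pruned.
with_prefix=$(cd "$demo" && find . -path "./work" -prune -o -print | LC_ALL=C sort)

# Without the leading ./ nothing matches, so nothing is pruned.
without_prefix=$(cd "$demo" && find . -path "work" -prune -o -print | LC_ALL=C sort)

printf 'with ./ prefix:\n%s\n\nwithout:\n%s\n' "$with_prefix" "$without_prefix"

rm -rf "$demo"
```

The second listing should still contain ./work and everything under it, since the pattern never matched anything.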
Working with full paths
Since, in our example, we’re creating a manifest for a backup, we’ll want full paths in our output, so we’ll change our find command to use the full path of our tomcat folder:-
find /home/myapps/tomcat -path "./work" -prune -or -print
/home/myapps/tomcat
/home/myapps/tomcat/work
/home/myapps/tomcat/work/Catalina
/home/myapps/tomcat/work/Catalina/localhost
/home/myapps/tomcat/work/Catalina/localhost/ROOT
/home/myapps/tomcat/temp
/home/myapps/tomcat/temp/safeToDelete.tmp
/home/myapps/tomcat/webapps
/home/myapps/tomcat/webapps/ROOT
/home/myapps/tomcat/webapps/ROOT/index.html
…
This hasn’t worked, and the reason is that the -path value is simply matched as a pattern against the whole of each path find generates. Since those paths are no longer relative to the tomcat folder, we also need to update what we’re searching for:-
find /home/myapps/tomcat -path "/home/myapps/tomcat/work" -prune -or -print
/home/myapps/tomcat
/home/myapps/tomcat/temp
/home/myapps/tomcat/temp/safeToDelete.tmp
/home/myapps/tomcat/webapps
/home/myapps/tomcat/webapps/ROOT
/home/myapps/tomcat/webapps/ROOT/index.html
…
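One way to keep the starting point and the -path pattern in step is to build both from the same shell variable. This is only a sketch: the root variable and the temporary tree are assumptions for illustration, standing in for /home/myapps/tomcat.

```shell
# Throwaway tree; "root" is a hypothetical stand-in for /home/myapps/tomcat.
demo=$(mktemp -d)
root="$demo/tomcat"
mkdir -p "$root/work/Catalina" "$root/webapps/ROOT"
touch "$root/webapps/ROOT/index.html"

# The starting point and the pattern share the same prefix, so the pattern
# always has the same shape as the paths find generates.
listing=$(find "$root" -path "$root/work" -prune -o -print | LC_ALL=C sort)

printf '%s\n' "$listing"

rm -rf "$demo"
```

Bear in mind that -path treats its argument as a pattern, so this approach assumes the root path contains no wildcard characters such as * or [, and that it's written without a trailing slash.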
Multiple sub-directories
That temp folder is also kinda useless so let’s expand our command to exclude that one as well. We do this by combining additional expressions with -or:-
find /home/myapps/tomcat -path "/home/myapps/tomcat/work" -prune -or -path "/home/myapps/tomcat/temp" -prune -or -print
/home/myapps/tomcat
/home/myapps/tomcat/webapps
/home/myapps/tomcat/webapps/ROOT
/home/myapps/tomcat/webapps/ROOT/index.html
…
This command is starting to get a bit involved, so we’ll take advantage of find’s bracketing to make our expressions a little clearer:-
find /home/myapps/tomcat \( -path "/home/myapps/tomcat/work" -prune \) -or \( -path "/home/myapps/tomcat/temp" -prune \) -or \( -print \)
/home/myapps/tomcat
/home/myapps/tomcat/webapps
/home/myapps/tomcat/webapps/ROOT
/home/myapps/tomcat/webapps/ROOT/index.html
…
When doing this we need to leave a space either side of each bracket and escape it with a \ so that the shell passes it through to find rather than interpreting it itself.
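Brackets also let us group the two -path tests so they share a single -prune. The sketch below compares that variation with the one-group-per-folder form, again using a throwaway tree and a hypothetical root variable in place of /home/myapps/tomcat, and -o as the POSIX spelling of -or:

```shell
# Throwaway tree; "root" is a hypothetical stand-in for /home/myapps/tomcat.
demo=$(mktemp -d)
root="$demo/tomcat"
mkdir -p "$root/work/Catalina" "$root/temp" "$root/webapps/ROOT"
touch "$root/temp/safeToDelete.tmp" "$root/webapps/ROOT/index.html"

# One bracketed group per excluded folder, as in the article.
separate=$(find "$root" \( -path "$root/work" -prune \) -o \
                        \( -path "$root/temp" -prune \) -o \
                        \( -print \) | LC_ALL=C sort)

# Both tests grouped together, sharing one -prune.
grouped=$(find "$root" \( -path "$root/work" -o -path "$root/temp" \) -prune \
                       -o -print | LC_ALL=C sort)

printf 'separate:\n%s\n\ngrouped:\n%s\n' "$separate" "$grouped"

rm -rf "$demo"
```

Which form reads better is a matter of taste; the grouped version saves repeating -prune as the list of excluded folders grows.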
The directory trap
Having redirected our find output to a file, manifest.txt, we can now pass that manifest to tar as a parameter to create our backup:-
tar Tcfz manifest.txt mybackup.tar.gz
But a quick check of the contents of that tar file shows it’s not doing quite what we’d expected:-
tar tvfz mybackup.tar.gz
drwxrwxr-x devsumo/devsumo 0 2012-10-20 22:03 home/myapps/tomcat/
drwxr-xr-x devsumo/devsumo 0 2013-07-09 20:50 home/myapps/tomcat/work/
drwxr-xr-x devsumo/devsumo 0 2013-07-09 20:50 home/myapps/tomcat/work/Catalina/
…
Our backup file still contains the files and directories we’d filtered out. This isn’t so much a mistake in our original find command as a “feature” of tar. If we supply tar with a directory name, it’ll include that directory and everything beneath it. Since the first line in our find command’s output was /home/myapps/tomcat, everything under that folder gets included. Worse still, since tar just processes each entry in the list sequentially, it doesn’t “know” it’s already included those files and goes on to include them again later in the file.
Since we’re only interested in the files anyway we can eliminate this problem by adding a -type f to our -print expression to ensure tar isn’t given any directories to explicitly include:-
find /home/myapps/tomcat \( -path "/home/myapps/tomcat/work" -prune \) -or \( -path "/home/myapps/tomcat/temp" -prune \) -or \( -type f -print \)
/home/myapps/tomcat/webapps/ROOT/index.html
…
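Putting the pieces together, here's a rough end-to-end sketch: write the files-only listing to a manifest, hand it to tar, then list the archive to check what went in. The temporary tree, the root variable and the cache.tmp file are all assumptions for illustration, and the tar options are written in the dashed form (-czf, -T) rather than the bundled form used earlier:

```shell
# Throwaway tree; "root" is a hypothetical stand-in for /home/myapps/tomcat.
demo=$(mktemp -d)
root="$demo/tomcat"
mkdir -p "$root/work/Catalina" "$root/temp" "$root/webapps/ROOT"
touch "$root/work/Catalina/cache.tmp" "$root/temp/safeToDelete.tmp" \
      "$root/webapps/ROOT/index.html"

# Files only, with work and temp pruned, saved as the manifest.
find "$root" \( -path "$root/work" -prune \) -o \
             \( -path "$root/temp" -prune \) -o \
             \( -type f -print \) > "$demo/manifest.txt"

# Create the archive from the manifest. tar may warn on stderr about
# stripping the leading "/" from member names; hidden here for brevity.
tar -czf "$demo/mybackup.tar.gz" -T "$demo/manifest.txt" 2>/dev/null

# List what actually went into the archive.
contents=$(tar -tzf "$demo/mybackup.tar.gz")
printf '%s\n' "$contents"

rm -rf "$demo"
```

A manifest with one path per line assumes none of our file names contain a newline, which is a reasonable bet for a Tomcat installation but worth remembering elsewhere.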