diff --git a/docs/appendix-d-filters.md b/docs/appendix-d-filters.md index 590cfc9..81b54f5 100644 --- a/docs/appendix-d-filters.md +++ b/docs/appendix-d-filters.md @@ -9,7 +9,7 @@ Duplicati's filter engine processes folders first then files. The reason for tha When filter rules have been defined the first folder is taken and the filter rules are processed one by one. The first rule that matches is applied and the following rules are not processed anymore. For instance, if the first rule excludes a folder, then this folder and all files within will be excluded from the backup even if following rules include this folder or its files. Likewise, if the first rule includes a folder, then it will be included even if a following rule would exclude it. -It is recommended to write folder rules first and file rules afterwards. That way rules are written in the same order as they will be effective when Duplicati processes them and Duplicati's filters are easier to understand that way. +It is recommended to write folder rules first and file rules afterwards. Also it is recommended to write the folder rules one directory level at a time. That way rules are written in the same order as they will be effective when Duplicati processes them and Duplicati's filters are easier to understand that way. Per default, all files and folders will be backed up. That means, if no rule matches, the file or folder will be included. In the special case that all rules are include rules (which does not make sense when all files and folders are included per default) Duplicati assumes that all other files and folders are meant to be excluded (this had to be defined as another rule in Duplicati 1.3 but most people found that confusing so we changed that in Duplicati 2.0). @@ -47,6 +47,12 @@ In the UI, filters can be created using drop down boxes for common rule types. M Using the command-line there are specific settings to specify include or exclude rules. These are `--include` and `--exclude`. 
Multiple rules can be specified by using `--include` or `--exclude` repeatedly. +### Creating and validating your filters + +Duplicati UI updates the file and folder include/exclude icons on the fly to reflect the current filters. A green check-mark indicates that the folder will be traversed by Duplicati, but its content may be excluded by other filters. + +![Filter example](duplicati-filters-match-example.png "Filter example") + ### Settings Besides filter rules there are settings that can exclude specific files by their attributes. Those settings are `--skip-files-larger-than` and `--exclude-files-attributes`. The latter is able to exclude files that have any of the following attributes: `ReadOnly`, `Hidden`, `System`, `Directory`, `Archive`, `Device`, `Normal`, `Temporary`. Those settings are applied to all files of the backup. @@ -61,3 +67,52 @@ Besides filter rules there are settings that can exclude specific files by their **Include some files, exclude others.** Now let's define a filter that does both of the above. First it excludes @eaDir specifying `-*/@eaDir/`. Then it includes only JPG files specifying `+*.jpg`. The problem here is, that Duplicati includes all files and folders per default. This means that e.g. /photos/movie.avi will also be part of the backup. To make the including rule effective an additional rule is required that excludes all files that do not match any of the current rules. The filter must say "exclude this, exclude that, include this but nothing else". The best rule for "but nothing else" is a regular expression that excludes all files. It is `-[.*[^/]]` on Linux or Mac, and on Windows the rule is `-[.*[^\\]]`. The rule says "exclude everything that is not a folder". The final filter then is `-*/@eaDir/ +*.jpg +*.jpeg -[.*[^/]]`. Duplicati will process all folders but @eaDir/ and it will include JPG and JPEG files but exclude all other files. 
+ +**Advanced regular expression filter example** + + * Suppose we want `/mnt/(user|disk\d+)/media/.*` but not `Movie.*` folders within `media/` + * This includes `/mnt/user/media/X`, `/mnt/disk23/media/Y`, but not `/mnt/user/media/Movie/DieHard.mkv`. + * We don't want `/mnt/user0/.*`, or `/mnt/user/secrets` + +Duplicati applies the filters to a folder before its children, searching for the first filter-line matching the folder. If a parent path matches an exclude then that whole tree is cut off. Conceptually it's easiest to build the expressions by starting at the top of the folder hierarchy and moving down one level at a time, including or excluding the desired files. So let us first match the parent folders we want to be processed, then remove those we don't want. Then we add the subfolder we want, and exclude all others. + + * Source set: /mnt/ + * `+[/mnt/(disk\d*|user)/]` + * `-[/mnt/[^/]*/]` + * `+[/mnt/[^/]*/media/]` + * `-[/mnt/[^/]*/[^/]*/]` + * `-[/mnt/[^/]*/media/Movie[^/]*/]` + * `+[/mnt/[^/]*/media/[^/]*/]` + * `-[/mnt/[^/]*/.*\.log]` + +Note that `.*` matches anything, and `[^/]*` matches anything NOT containing a `/` (the Linux path separator). + +* The first two lines match the root folders in our `/mnt` source-set, including only folders like `disk123` and `user`. +* The next two lines include only the `media` subfolder of the folders already included. Notice how `[^/]*` does not match a path-separator `/`. +* Lines 5-6 exclude directories like `Movie` and `MovieSeen`, but include the rest. +* The last line excludes all files with `.log` extension (case-sensitive). Here we use `.*` to match the intermediate path, including path separators. An equivalent alternative is `[/mnt/[^/]*/.*/[^/]*\.log]` where the last path separator is included so `[^/]*\.log` matches a filename. Or just `[.*\.log]` to skip all that verbosity (while staying with regular expressions, which isn't required). + +Notice also that in lines 5-6 we stay on one directory level; we do not match `Movie.*`. 
Both would be valid, but staying on one level makes it easier to remember that Duplicati works through directories one level at a time. + +The 6th filter (include) can be omitted, as files and folders which do not match any filter are included (unless all filters are include filters). + +Here we rely on the fact that in Duplicati all folder paths end in `/` on Linux (a backslash on Windows), while file paths do not. + +# Testing Filters with the command line tool + +Filters can be tested with the command line tool, see https://github.com/duplicati/duplicati/wiki/Headless-installation-on-Debian-or-Ubuntu and https://duplicati.readthedocs.io/en/latest/04-using-duplicati-from-the-command-line/ + +For instance, test the above example on a [linuxserver docker](https://hub.docker.com/r/linuxserver/duplicati) or [duplicati/duplicati](https://hub.docker.com/r/duplicati/duplicati) with + +``` +docker exec duplicati mono /app/duplicati/Duplicati.CommandLine.exe test-filters /mnt/ \ + --include="[/mnt/(disk\d*|user)/]"\ + --exclude="[/mnt/[^/]*/]"\ + --include="[/mnt/[^/]*/media/]"\ + --exclude="[/mnt/[^/]*/[^/]*/]"\ + --exclude="[/mnt/[^/]*/media/Movie[^/]*/]"\ + --include="[/mnt/[^/]*/media/[^/]*/]"\ + --exclude="[/mnt/[^/]*/.*\.log]" +``` + +**Building Regular Expressions:** There are lots of online services such as [RegExr](https://regexr.com) to help build correct regular expressions. \ No newline at end of file diff --git a/docs/duplicati-filters-match-example.png b/docs/duplicati-filters-match-example.png new file mode 100644 index 0000000..d280a90 Binary files /dev/null and b/docs/duplicati-filters-match-example.png differ
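
The first-match evaluation described in the advanced example of this diff can be checked offline with a small Python sketch. This is not Duplicati's implementation, only an approximation under stated assumptions: a regular expression rule must match the whole path, a path with no matching rule is included, and an excluded folder cuts off everything below it. The special case where all rules are include rules is not modeled. The names `FILTERS`, `first_match` and `is_included` are hypothetical and exist only for this illustration.

```python
import re

# The seven rules from the example, in order: (is_include, regex body).
FILTERS = [
    (True,  r"/mnt/(disk\d*|user)/"),
    (False, r"/mnt/[^/]*/"),
    (True,  r"/mnt/[^/]*/media/"),
    (False, r"/mnt/[^/]*/[^/]*/"),
    (False, r"/mnt/[^/]*/media/Movie[^/]*/"),
    (True,  r"/mnt/[^/]*/media/[^/]*/"),
    (False, r"/mnt/[^/]*/.*\.log"),
]


def first_match(path, filters):
    """Return the verdict of the first rule matching the whole path, or None."""
    for include, pattern in filters:
        # Assumption: a rule only applies if it matches the complete path.
        if re.fullmatch(pattern, path):
            return include
    return None


def is_included(path, filters=FILTERS):
    """Check every parent folder top-down, then the path itself."""
    parts = path.strip("/").split("/")
    is_folder = path.endswith("/")
    current = "/"
    for i, part in enumerate(parts):
        last = i == len(parts) - 1
        # Folder paths end in "/", file paths do not.
        current += part + ("" if last and not is_folder else "/")
        verdict = first_match(current, filters)
        if verdict is None:
            verdict = True  # assumption: unmatched paths are included
        if not verdict:
            return False  # an excluded folder cuts off its whole tree
    return True


if __name__ == "__main__":
    for p in ["/mnt/user/media/X/photo.jpg",
              "/mnt/user/media/Movie/DieHard.mkv",
              "/mnt/user0/file",
              "/mnt/user/secrets/",
              "/mnt/user/media/X/a.log"]:
        print(p, "->", "include" if is_included(p) else "exclude")
```

The sketch is only a way to reason about rule order before running the real `test-filters` command, which remains the authoritative check.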