1. Preamble
parsyncfp2 initially started as a one-off bash script and moved to Perl (as parsync) when that became too cumbersome. I incorporated fpart and it became parsyncfp, running on a single host, transmitting to a single rsync receiver host. This MultiHost (MH) version allows it to run over multiple SEND hosts with shared storage to cooperatively send data much faster. It still runs on a SingleHost and, with a 64-core AMD CPU, will happily saturate a 100 Gb/s pipe. In MultiHost mode, it can also send data to multiple storage endpoints simultaneously, as well as to the same shared endpoint via multiple hosts.
Rather than appending yet more acronymic characters to the name, I differentiated it with the major version number, so … parsyncfp2 or pfp2. In the docs and src code, you may still find references to parsync, parsyncfp, as well as various abbrevs (why does abbreviation need one?).
2. Introduction
parsyncfp2 (pfp2) is a Perl script that wraps Andrew Tridgell’s & Paul Mackerras’s miraculous rsync to provide load balancing and parallel operation across network connections to substantially increase the amount of data it can send simultaneously. pfp2 exploits parallel operation to decrease the impact of the TCP Round Trip Time (RTT) and so significantly increase the total bandwidth of data across networks. For more information about the variables surrounding data transfer over networks, see How to Move Data. Even on low-latency networks, it can speed large transfers by 4-10x. However, it is not effective for small transfers, since the startup overhead will reduce the effective throughput.
2.1. General Features
Versions <2 allowed the SingleHost (SH) version to use 10s to 100s of rsyncs to increase aggregate bandwidth. Versions 2 and later allow MultiHost (MH) send & receive to increase bandwidth saturation to both regular rsync connections and rsyncd servers. This allows the traffic to be split out to servers on different networks as well as sending to multiple filesystems on the receiving end (tho the split dirs would then have to be re-combined).
pfp2 uses Ganael Laplanche’s excellent fpart to dynamically create chunkfiles for rsync to read, bypassing the need to wait for rsync’s complete recursive scan. ie, it starts the transfer almost immediately, as soon as the first chunk is written. For large, deep trees, this can be quite useful. Also see the filesfrom options below. pfp2 also allows huge transfers to take place without the memory overflow sometimes seen with using a single rsync, due to splitting the memory required over many smaller rsync instances.
In both SH and MH, pfp2 monitors the system loadavg. It will suspend spawned rsyncs when the 1m load rises above the cutoff (--maxload), then UNsuspend them once the load drops back below it.
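The suspend/resume cycle can be sketched in a few lines of Python. This is an illustration only, not pfp2's Perl code: the function names, the PID lists, and the use of SIGSTOP/SIGCONT as the suspend mechanism are assumptions; the one-process-per-cycle rule follows the --maxload description.

```python
import os
import signal


def plan_action(load_1m, maxload, running, suspended):
    """Decide what to do in one check cycle (hypothetical helper).

    running and suspended are lists of rsync PIDs. At most one process
    changes state per cycle. Returns (action, pid), where action is
    "suspend", "resume" or "none".
    """
    if load_1m > maxload and running:
        return ("suspend", running[-1])
    if load_1m < maxload and suspended:
        return ("resume", suspended[0])
    return ("none", None)


def apply_action(action, pid):
    """Send the matching job-control signal (assumed mechanism)."""
    if action == "suspend":
        os.kill(pid, signal.SIGSTOP)
    elif action == "resume":
        os.kill(pid, signal.SIGCONT)
```

In a real loop the first argument would come from something like os.getloadavg()[0], sampled once per check period.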
In the SH version, suspending the parent pfp2 (with Ctrl+Z) will suspend all rsync children, regardless of current state. Similarly, if you kill the parent pfp2 (Ctrl+C), all the children rsyncs will die with various cries of distress, depending on their states. In the MH version, the spawned rsyncs are running independently on separate hosts and can only be controlled by commands issued to that host. ie you have to ssh to the host and suspend or kill the processes separately. A version where the hosts communicate via sockets is in the works and a killer script pfp2stop is written out at each MH invocation, which will ssh to each of the SEND and REC hosts to kill off all the rsync and pfp2 processes running.
pfp2 can send files to any host with a standard rsync on the other end. In normal client mode (the remote rsync starts up on demand via ssh) the target syntax is either host:/fully/qualified/path or host:path (implying a dir off the user’s HOME dir; specified in other apps as host:~/path, but unacceptable to a native rsync). pfp2 can also send data to an rsyncd server. The rsyncd target syntax requires a module name (host::module) and, unless the server is running without any kind of authentication, the user must be pre-registered in the server’s /etc/rsyncd.conf and /etc/rsyncd.secrets files (see man rsyncd.conf).
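As a rough guide to the three target forms, the following Python sketch classifies a target string. The function name and the returned labels are hypothetical, and the colon test is deliberately simplified (for example, it does not handle local paths that contain a colon).

```python
def classify_target(target):
    """Classify an rsync-style target string (illustrative helper).

    Returns (kind, host, rest), where kind is one of "rsyncd",
    "absolute", "home-relative" or "local".
    """
    if "::" in target:
        # host::module addresses an rsyncd server module
        host, module = target.split("::", 1)
        return ("rsyncd", host, module)
    if ":" in target:
        # host:/abs/path or host:path (relative to the remote HOME)
        host, path = target.split(":", 1)
        kind = "absolute" if path.startswith("/") else "home-relative"
        return (kind, host, path)
    # no colon at all: a locally mounted path
    return ("local", None, target)
```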
Unless changed by --interface, pfp2 assumes and monitors the routable interface. The transfer will use whatever interface normal routing provides, normally set by the name of the target. While rsync can be used for non-host-based transfers (between mounted filesystems), it works less well than for strictly network-based syncs. pfp2 will honor requests to sync across local filesystems and shows low but significant speedup (2x-6x) on large transfers.
pfp2 only works on dirs and files that originate from the current dir (or specified via --startdir). You cannot include dirs and files from discontinuous or higher-level dirs. pfp2 also does not use rsync’s sophisticated/idiosyncratic treatment of trailing ‘/s’ to direct where files vs dirs are sent; dirs are treated as dirs regardless of the trailing ‘/’.
The .pfp2 dir : unless redirected to another dir via the --altcache option, this contains the cache dir (fpcache, which is cleared on each run) and the time-stamped rsync log files. These can accumulate quickly since each rsync instance will leave a date-stamped log. If you use the MH version, the .pfp2 dir is created in the common shared directory (--commondir), and contains the (common) fpcache dir. The rsync logs are stored in the host-named subdirectories in the .pfp2 dir and are NOT deleted by the next pfp2 run.
If you use the MH version, the STDERR/STDOUT of the entire transfer (the text that’s written to the screen) from each of the SEND hosts is captured in the host-specific dir named pfp2-log-(time)_(date).
Due to the terminal text coloration, the pfp2-log files are best viewed by cat’ing them to the terminal and then if necessary, copy-pasting them from the terminal.
Odd characters in names : pfp2 will refuse to transfer some oddly named files (tho it should copy filenames with spaces fine). Filenames with embedded newlines, DOS EOLs, and some other odd chars will be recorded in the log files in the .pfp2 dir (see above). You should be able to specify dirs and files in the pfp2 command with either/both escaped spaces or with quotes: "file\ with\ spaces" or ‘file with spaces’. Internal to pfp2, rsync rules prevail.
2.2. Release License
parsyncfp2 is distributed under the GNU General Public License (GPL) v3.
2.3. Installation
Installation of parsyncfp2 is fairly simple. There’s not yet a deb or rpm package, but the bits to make it work that are not part of a fairly standard Linux distro are the Perl scripts parsyncfp2, scut (like cut but more flexible), and stats (spits out descriptive statistics of whatever semi-numeric stream is fed to it). The rest of the dependencies are listed here:
- 
Debian/Ubuntu-like: 
  sudo apt install ethtool iproute2 fpart iw libstatistics-descriptive-perl infiniband-diags
  git clone https://github.com/hjmangalam/parsyncfp2
  cd parsyncfp2; cp parsyncfp2 scut stats ~/bin
- 
RHEL/CentOS/Rocky-like: 
  sudo yum install iw fpart ethtool iproute perl-Env.noarch \
       perl-Statistics-Descriptive wireless-tools infiniband-diags
  git clone https://github.com/hjmangalam/parsyncfp2
  cd parsyncfp2; cp parsyncfp2 scut stats ~/bin
2.3.1. Required utilities and packages
Should the above commands not fulfill the requirements or be missing from your set of repositories, the utilities are listed below.
- 
ethtool - query or control network driver and hardware settings. Install via repository. 
- 
ip - show / manipulate routing, network devices, interfaces and tunnels. Install via repository. 
- 
fpart - Sort and pack files into partitions. Now in many distro repositories or install from: github; 
- 
scut - a more intelligent cut. Included in the parsyncfp2 github 
- 
stats - calculate descriptive stats from STDIN. Included in the parsyncfp2 github 
- 
Statistics::Descriptive (Perl module) - basic descriptive statistical functions 
2.4. Recommended Utilities
- 
iwconfig - configure a wireless network interface. Needed only for WiFi. Install via repository. 
- 
perfquery - query InfiniBand port counters. Needed only for InfiniBand. Install via repository. 
3. Options in detail
pfp2 has a lot of options, but most are straightforward. The MultiHost and FilesFrom options require a little more description and are described in their own sections below.
3.1. Basic Options for both SH and MH
The only native rsync options that pfp2 uses are -a (archive), -s (protect-args), and -l (copy symlinks as symlinks). If you need to pass more options to rsync, then it’s up to you to provide them ALL via --ro and you must include the entire option string as rsync would see it, quoted (--ro='-slaz --times').
In the list of pfp2 options below, the brackets indicate:
[i] = integer number, [f] = floating point number, [s] = "quoted string", ( ) = the default if any
- 
--NP|np [i] (sqrt(#CPUs)) : The number of rsync processes to start. The optimal NP depends on many variables. Try the default and increase as needed. No point in using a high NP if your network won’t support it. 
- 
--altcache|ac [/path/to/dir] : The alternative cache dir for placing it on another FS or for running multiple SH (not MH) pfp2s simultaneously 
- 
--startdir|sd [s] (pwd) : The top-level directory at which pfp2 starts looking for files & dirs. You can use globs/regexes with --startdir, but only if you’re at that point in the dir tree. ie: if you’re not in the dir where the globs can be expanded, then the glob will fail. However, explicit dirs can be set from anywhere if given an existing dir with --startdir. 
- 
--maxbw [i] in KB/s (unlimited): pfp2 appropriates rsync’s bandwidth throttle mechanism, using --maxbw as a passthru to rsync’s bwlimit option, but divides it by the NP value so as to keep the total bandwidth the same as the stated limit. It monitors and shows total (not just pfp2’s) bandwidth thru the given interface. 
- 
--maxload|ml [f] (NP*2) : max system load - if 1m loadavg > maxload, 1 rsync process will be suspended per checkperiod cycle until the loadavg decreases below the maxload. At that point, the suspended rsyncs will be UNsuspended, one per checkperiod. rsync is very CPU-light; running 6 rsyncs with compression (--ro=-slaz) causes an increase in loadavg of only about 1-2, depending on the storage systems. This is handled independently on each of the SEND hosts. 
- 
--chunksize|cs [s] (10G) : aggregate size of files allocated to one rsync process. Can specify in human terms [100M, 50K, 1T] as well as integer bytes. pfp2 will warn once when/if you exceed the WARN # of chunkfiles (2000) and abort if you exceed the FATAL # of chunkfiles (5000). You CAN force it to use very high numbers of chunkfiles by setting the number negative (--chunksize=-50G), but this can be risky. Optimally, you want to choose a chunksize that will result in a fairly short startup time but will not result in 10s of 1000s of chunkfiles. Decrease the NUMBER of chunkfiles by increasing the SIZE of the chunkfiles. The sweet spot is to choose a chunksize that will result in no more than 10x the NP number, so if --NP=20, there should be no more than ~200 chunkfiles, altho you have very broad latitude to set your own preferences. 
- 
--interface|I [s] : network interface to monitor (not use; see above). Only SENT bytes are displayed, and the bytes are the total sent thru the link, not just from pfp2, so it’s a rough estimate of the bandwidth. 
- 
--ro [s] : Options passed to rsync as a quoted string. This option triggers a pause before executing to verify the command. The --ro string can pass any rsync option to all the rsyncs that will be started. This allows options like -z (compression) or --exclude-from (filter out unwanted files). If you use this option, you’re responsible for supplying ALL the options and providing the files and formats required. The --ro string is NOT appended to the default -as string. DO NOT use any delete options with this utility. See Hints below. 
- 
--checkperiod|cp [i] (3) : Sets the period in seconds between updates. This is a best effort attempt. If chunksize is set so small that 1000s of chunkfiles are created, file IO may lengthen this time. 
- 
--rdma : report RDMA bytes thru the IB or IB-bonded interface, otherwise, only TCP/IP bytes will be reported. 
- 
--reusechunks [i] (1) : Re-use the chunking data collected for the previous run, using the same chunk size. Useful for restarting a run that was mistakenly ended w/o waiting for fpart to recalculate the chunks. The integer argument is the chunk to start at, so rather than running thru all the (possibly 100s of) chunks, you can start at the one closest to where the interruption occurred. 
- 
--verbose|v [0-3] (2) : Sets chattiness. 3=very; 2=normal; 1=less; 0=none. This only affects verbosity post-start; warning & error messages will still be printed. This is a work in progress. 
- 
--slowdown [f] (0.5) : Introduces delays between ssh-mediated commands if the RTT is too long. It’s increased in steps automatically for large RTTs, but this option allows you to explicitly slow down the speed at which ssh connections are made. Increment in integer seconds if you see errors like: rsync error: unexplained error (code 255) at io.c(xxx) [sender=x.x.x] 
- 
--dispose|d [s] (l) : What to do with the fpart cache files. (l)eave untouched, (c)ompress to a tarball, (d)elete. 
- 
--email [s] (none) : Email address to send completion message. The email address should not need escaping or quoting, but should also work with either (joe\@go.com). The SEND host will need a working mailer for this to work. 
- 
--nowait : For scripting, sleep for a few sec instead of pausing and waiting for human intervention. 
- 
--version|V : Dumps version string and exits 
- 
--help|h : Dumps a short version of this help into your pager and then exits when you quit. 
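The arithmetic behind --maxbw and --chunksize can be sketched in Python. This is a back-of-envelope illustration, not pfp2's own code: the helper names are hypothetical, binary multiples (1K = 1024) are an assumption, and the chunk count is only a lower-bound estimate because fpart packs whole files. The 2000/5000 thresholds and the 10x-NP guideline are the ones quoted in the options above.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}  # assumed binary units
WARN_CHUNKS, FATAL_CHUNKS = 2000, 5000  # limits quoted for --chunksize


def parse_size(text):
    """Turn '10G', '50M' or a plain integer string into bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def per_rsync_bwlimit(maxbw_kbs, np):
    """Share of --maxbw (KB/s) given to each of the NP rsyncs."""
    return maxbw_kbs // np


def estimate_chunks(total_bytes, chunksize, np):
    """Estimate the chunkfile count and rate it against the guidelines."""
    size = parse_size(chunksize)
    n = -(-total_bytes // size)  # integer ceiling division
    if n > FATAL_CHUNKS:
        status = "fatal"
    elif n > WARN_CHUNKS:
        status = "warn"
    elif n > 10 * np:
        status = "above 10x NP"
    else:
        status = "ok"
    return n, status
```

For example, estimate_chunks(parse_size("2T"), "10G", 20) suggests that a 2T transfer at the default chunksize slightly overshoots the 10x-NP guideline for --NP=20, so a somewhat larger chunksize would be a reasonable choice.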
3.2. MultiHost (MH) Operation
3.2.1. Overview
The single pfp2 script has both SH and MH functionality.
The MH options allow you to rsync in parallel streams via multiple SEND hosts to the same or multiple RECEIVE hosts, including sending to different filesystems on the different RECEIVE hosts. The RECEIVE hosts can be:
- 
standard servers which launch matching rsyncs via the usual mechanism. These can also have the same or different endpoints. 
- 
rsyncd servers with different modules which, as such, can define different authentication for different users and different endpoints for the data. The comprehensive description of how this works is in rsyncd.conf(5). Make sure that the rsyncd can start as many rsyncs as the sending hosts require by modifying the max connections line. 
Both types can be mixed in the same hosts string. The MH version requires that the initiator and all SEND hosts have access to a common filesystem for both data and configuration info.
The required last element in a MH command is POD::/path (POD for a pod of whales), which is the default path for any RECEIVE hosts that haven’t been given one in the --hosts option. This is only the case for regular paths, not for rsyncd module definitions. So while the terminal target path will be appended to otherwise naked RECEIVE hosts, rsyncd modules have to be completely specified in the hosts string as host::module (more info below; see Good Example 5, Good Example 6 and Good Example 7).
3.2.2. MultiHost pfp2 sequence:
- 
start the process on the master host 
- 
process the options 
- 
check the status & separation of the SEND and REC hosts, rsync some required utilities to the SEND hosts (if requested via --checkhost), and verify that the pfp2 scripts being used are identical. 
- 
start the fpart chunking process on the master node (unless it’s been done previously and you’re using the --reusechunks option.) 
- 
reformat the pfp2 command based on the original options and how many SEND hosts were requested 
- 
start the SEND host processes (using the same pfp2 Perl script), each with the same number of parallel rsyncs. 
- 
and then exit the master process. 
The SEND hosts will continue to send output back to the originating terminal (prefixed or suffixed) with the SEND hostname so you can decipher which SEND host is saying what. This information is not failsafe since output from different hosts can overwrite each other. If you wish to view the complete output per SEND host, each SEND host log can be found in the host-specific subdir in the file pfp2-log-(time)_(date).
However, unlike the original parsyncfp or using the SH option, killing or suspending the originating program will have no effect on the SEND hosts; the remote rsyncs are independent and have to be killed manually. This SEND host independence should be addressed shortly via socket-based controls.
In the meantime, a killer script called pfp2stop is automatically generated when a MH run is initiated that will ssh to each SEND and RECEIVE host and kill off all YOUR rsync and pfp2 processes (even those not associated with the instigating pfp2, so be careful). The pfp2stop script is usually placed in your parsync_dir and its exact path is emitted a couple times in the run of the pfp2 script as a reminder.
3.3. Options for MultiHost transfers
The MultiHost (MH) version allows you to rsync multiple streams of data via multiple SEND hosts to the same or multiple RECEIVE (REC) hosts, including different filesystems on the different REC hosts. The REC hosts can be:
- 
rsyncd servers with multiple modules which, as such, can define different auth for different users and different endpoints for the data. The comprehensive description of how this works is in rsyncd.conf(5). 
- 
standard servers which launch matching rsyncs via the usual mechanism. These can also have the same or different endpoints. 
In a MH command, the last phrase is the POD:: string. This not only defines the command as MH, but also provides the default storage path for all REC hosts in the --hosts argument that lack an explicit one.
Both types can be mixed in the same hosts string. The MH version requires that the master and all the send hosts (which can include the master) have access to a common filesystem for both data and configuration info.
- 
--checkhost : Requests a pre-check to make sure that the SEND & RECEIVE hosts specified with --hosts do not have any rsyncs running. If they do, the number of them is reported. Those rsyncs may be valid and independent of pfp2 but it may be evidence of a failed pfp2 which may interfere with another pfp2 launch. This option also pushes the required utilities to the SEND hosts to make sure that they have the utilities necessary to run with full functionality. 
- 
--commondir [s] : The shared, common dir in which all chunk files and rsync logs will be stored. Similar to --altcache but MUST be readable by all SEND hosts. 
- 
--rpath [s] : the remote PATH prefix on the SEND hosts to check for the bits needed to run this. It is prefixed to the remote ssh cmd as export PATH=<rpath string>:$PATH; The rpath string can contain as many paths as you’d like, separated by colons (:), tho vars have to be escaped appropriately. 
ie:
  --rpath="~/bin:$HOME/pfp2/bin"
  (default is ~/bin:$parsync_dir/.pfp2), and :$PATH is also appended, so
        --rpath="~/bin:$HOME/pfp2/bin"
            is transmitted as:
        --rpath="~/bin:$HOME/pfp2/bin:$PATH"
- 
--hosts [s] : the string argument specifies the SEND and REC hosts, optionally supplying REC hosts with individual alternate paths to store data. The --hosts string format is a comma-delimited set of Send=Receive hosts. 
 example: "s1=r1:/path1,s2=r2:/path2,s3=r3:/path3,s4=r4,s5=r5"
 where each s# and r# imply a full "user@host" string. s# and r# obey the standard Linux rules that they are either long or short hostnames that are resolvable by your DNS or by an entry in the /etc/hosts file, or a numeric address (113.42.23.56). Also, each r# can have a storage path appended (r2:/path2). If the REC path is not given, the path from the final POD::/path target is appended. ie pfp2 [option option option..] POD::/common/default/receive/target.
If you specify different REC paths, the SEND data will be split over those host:/path combinations, so they will have to be manually combined afterwards. This is to allow different remote filesystems to accept high bandwidth transmission without impacting other FS operations. The SEND=REC couplets follow ssh rules so that if the user at one of the hosts is different than the one being used to initiate the process, you’ll have to specify the user. Similarly for the REC host, if the user is different than the initiating USER. ie: in the following option string:
--hosts="cooper=ben,tux@chinstrap=hjm@ben,nash=ben"
hjm is the initiating user and is the mediating user on cooper, ben, and nash, while tux is the mediating user on chinstrap. Because tux@chinstrap is mediating the command, ssh assumes the same user on ben, so hjm@ben has to be explicitly specified. The required last element in a MH command is POD::/path which is the default path for any REC hosts that haven’t been given one in the --hosts option. (More info below and see Good Examples 5, 6 & 7 below)
For rsyncd targets, you can specify the REC hosts as:
r1::module_name r2::module_name2 etc
and you can mix rsyncd targets with regular rsync targets so a valid hosts string could be:
"s1=r1:/path1,s2=r2::mod2,s3=r3:/path3,s4=r4::mod4,s5=r5"
However, unless the rsyncd server is open (without authorization) you must export your RSYNC_PASSWORD in the SEND host’s ~/.bashrc for this to work, or use --ro="--password-file=FILE" to point to a permission-protected file containing the appropriate credentials. Otherwise, the responding rsyncd will query for your rsync user password (not your login password). This is defined in the rsyncd host’s /etc/rsyncd.secrets file and explained in detail in rsyncd.conf(5).
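To make the --hosts rules concrete, here is a small Python sketch of how a hosts string and the final POD:: element could be expanded into SEND/REC pairs. It is an illustration of the rules described above, not pfp2's actual parser; the function name is hypothetical and corner cases such as IPv6 addresses are ignored.

```python
def parse_hosts(hosts, pod_target):
    """Expand a --hosts string into (send_host, receive_target) pairs.

    REC hosts with an explicit path (r:/path) or an rsyncd module
    (r::module) are used as given; naked REC hosts get the POD path.
    """
    if not pod_target.startswith("POD::"):
        raise ValueError("the last element of a MH command must be POD::/path")
    default_path = pod_target[len("POD::"):]

    pairs = []
    for couplet in hosts.split(","):
        couplet = couplet.strip()
        if not couplet:
            continue
        send, rec = (part.strip() for part in couplet.split("=", 1))
        if ":" in rec:
            target = rec  # explicit path or rsyncd module
        else:
            target = f"{rec}:{default_path}"  # naked REC host
        pairs.append((send, target))
    return pairs
```

With the mixed string shown above and POD::/d1/in as the last element, only s5=r5 picks up the default path and becomes r5:/d1/in; the other four targets are left untouched.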
The master parsyncfp2 command will exit once the fpart chunking process is finished and leave the rsyncs running independently on the SEND hosts. They will continue to send output back to the originating terminal (prefixed or suffixed) with the SEND hostname so you can decipher which SEND host is saying what.
However, unlike the single-host version, killing or suspending the originating program will have no effect on the SEND hosts; the remote rsyncs will have to be killed manually. This is made easier with the pfp2stop kill script that is generated at each MH invocation and kills all YOUR rsync and parsyncfp2 instances running on the SEND and REC hosts (including ones that were not part of the originating parsyncfp2, so be careful).
This SEND host independence should be addressed shortly via socket-based controls.
3.3.1. Stopping a MultiHost pfp2
As noted above in the Overview, a crude pfp2stop bash script is generated for each run of the pfp2 MultiHost version and will kill all running rsyncs and pfp2 processes on all the hosts specified in the --hosts option string.
3.4. Options for using filelists
(thanks to Bill Abbott for the inspiration/guidance).
These options were created so that people who use filesystem databases such as Robinhood or Starfish, or filesystems such as GPFS, can generate lists of files directly from these utilities and avoid the (fast, but additional) overhead of running fpart.
These options work with the MH version as well as the SH version.
The 3 options below provide a way of explicitly naming the files you wish to transfer by providing a file of fully qualified filenames. ie. the names start with a leading /.
If you use this list directly with rsync, it will remove the leading / but then place the file with that otherwise full path inside the target dir. So /home/hjm/DL/hello.c would be placed in /TARGET/home/hjm/DL/hello.c. If this result is OK, then simply use the --filesfrom option to specify the file of files. If this is NOT OK, see the --trimpath option below.
If the list of files is NOT fully qualified then you should make sure that the command is run from the correct dir so that the rsyncs can find the designated dirs & files.
- 
--filesfrom|ff [s] : Take explicit input file list from given file, 1 path name per line. 
- 
--trimpath|tp [s] : The path to trim from the front of full path name if --filesfrom file contains full path names and you want to trim them. If you want the file /home/hjm/DL/hello.c to end up as /TARGET/DL/hello.c (ie remove the original /home/hjm), you would use the --trimpath option as follows: --trimpath=/home/hjm. This will remove the given path before transferring it and assure that the file ends up in the right place. This should work even if the command is executed away from the directory where the files are rooted. If you have already modified the file list to remove the leading dir path, then of course you don’t need to use this option. A trailing / is not required; it will be removed regardless. 
- 
--trustme|tm : Used with --filesfrom above, this allows the use of file lists of the form: 
size in bytes<tab>/fully/qualified/filename/path
825692            /home/hjm/nacs/hpc/movedata.txt
87456826          /home/hjm/Downloads/xme.tar.gz
Such a file format can be generated with 'find' in the format:
  find $PWD/{dir} {criteria} -type f -printf '%s %p\n' | sed -e 's/ /\t/'
  ie:
  find $PWD/dir42  -maxdepth 5 -mtime +183 -type f -printf '%s %p\n' | sed -e 's/ /\t/'
  (to find regular files within 5 levels deep and >  183 days old)
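The path handling described for --filesfrom, --trimpath and --trustme can be mimicked in a few lines of Python. This sketch only models where a file ends up and how a size-prefixed line splits; the function names are hypothetical and it is not how pfp2 or rsync implement it internally.

```python
import posixpath


def destination_path(source, target_dir, trimpath=None):
    """Where a fully qualified source file lands under target_dir."""
    rel = source
    if trimpath:
        prefix = trimpath.rstrip("/")  # a trailing / makes no difference
        if rel == prefix or rel.startswith(prefix + "/"):
            rel = rel[len(prefix):]
    # the leading / is dropped and the rest is placed inside the target
    return posixpath.join(target_dir, rel.lstrip("/"))


def parse_trustme_line(line):
    """Split one 'size<tab>/full/path' line into (size_in_bytes, path)."""
    size, path = line.rstrip("\n").split("\t", 1)
    return int(size), path
```

Using the example above, destination_path("/home/hjm/DL/hello.c", "/TARGET") keeps the full source path under the target, while adding trimpath="/home/hjm" removes that prefix first.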
4. Hints & Workarounds
rsync --delete options will not work with --ro because the multiple parallel rsyncs that pfp2 launches are independent and therefore don’t know about each other (and so cannot exchange info about what should be deleted or not). Use a final, separate rsync --delete to clean up the transfer if that’s your need.
Also, rsync options related to additional output have been disallowed to avoid confusing pfp2’s IO handling. -v/--verbose, --version, and -h/--help are caught, and pfp2 will die with an error. Most of the info desired from these is captured in the rsync-logfile files in the .pfp2 dir.
Unless you want to view them, it’s usually a good idea to send all STDERR to /dev/null (append 2> /dev/null to the command) because there are often a variety of utilities that get upset by one thing or another. Generally, silencing the STDERR doesn’t hurt anything.
5. Examples
5.1. Good example 1
% parsyncfp2 --maxload=5.5 --NP=4 \
  --chunksize=$((1024 * 1024 * 4)) \
  --startdir='/home/hjm' dir[123] \
  hjm@remotehost:~/backups 2> /dev/null
where:
- 
--maxload=5.5 will start suspending rsync instances when the 1m system load gets to 5.5 and then unsuspending them when it goes below it. 
- 
--NP=4 starts 4 instances of rsync 
- 
--chunksize=$((1024 * 1024 * 4)) sets the chunksize by shell arithmetic; the explicit equivalent is 4194304 
- 
--startdir='/home/hjm' sets the working dir of this operation to /home/hjm and dir1 dir2 dir3 are subdirs from /home/hjm 
- 
the target hjm@remotehost:~/backups is the same target rsync would use 
- 
2> /dev/null silences all STDERR output from any offended utility. 
- 
It uses 4 instances to rsync dir1 dir2 dir3 to hjm@remotehost:~/backups 
5.2. Good example 2
% parsyncfp2 --checkperiod 6 --NP 3 \
  --interface eth0 --chunksize=87682352 \
  --ro="--exclude='[abc]*'" nacs/fabio \
  hjm@moo:~/backups
The above command shows several options used correctly:
- 
--chunksize=87682352 - shows that the chunksize option can be used with explicit integers as well as the human specifiers (TGMK). 
 
- 
--ro="--exclude='[abc]*'" - shows the correct form for excluding files based on patterns (note the nested quoting, which protects the pattern as it gets passed thru) 
- 
nacs/fabio - shows that you can specify subdirs as well as top-level dirs (as long as the shell is positioned in the dir above, or that dir has been specified via --startdir) 
 
5.3. Good example 3
parsyncfp2 -v 1 --nowait --ac pfp2cache1 --NP 4 --cp=5 --cs=50M --ro '-az' \
  linux-4.8.4 moo:~/test
The above command shows:
- 
short version of several options (-v for --verbose, --cp for checkperiod, etc) 
- 
shows use of --altcache (--ac pfp2cache1), writing to relative dir pfp2cache1 
- 
again shows use of --ro (--ro -az) indicating archive & compression. 
- 
includes --nowait to allow unattended scripting of pfp2 
5.4. Good example 4
parsyncfp2 --NP=8 --chunksize=500M --filesfrom=/home/hjm/dl550 \
  hjm@moo:/home/hjm/testparsync
The above command shows:
- 
if you use the --filesfrom option, you cannot use explicit source dirs (all the files come from the file of files, which requires full path names) 
- 
that the --chunksize format can use human abbreviations (m or M for Mega). 
5.5. Good example 5 (MultiHost)
parsyncfp2 --verbose=2 --ro='-asz' \
--hosts="bigben=bridgit.ure.edu:/d1/in, \
          pooki=bridgit.ure.edu:/d2/in, \
        stunted=bridgit.ure.edu:/d3/in" \
--checkhost --NP 4 --chunk 15G \
--checkperiod 5 --dispo=l --interface=wlp3s0 \
--commondir=/home/hjm/pfp2 --startdir /home/hjm/pfp2 \
dir1 dir2 dir3 dir4  POD::/
The above MH command shows:
- 
3 SEND hosts (bigben, pooki, stunted) all sending data to the REC host bridgit.ure.edu altho the data is being split among 3 filesystems. 
- 
You could also define 3 REC hosts, writing data to the SAME PATH if that was a better fit: 
 ...
  --hosts="bigben=bridgit.ure.edu:/d1/in, \
            pooki=bridgit.ure.edu:/d1/in, \
          stunted=bridgit.ure.edu:/d1/in" \
 ...
  and even shorter:
 ...
  --hosts="bigben=bridgit.ure.edu, \
            pooki=bridgit.ure.edu, \
          stunted=bridgit.ure.edu" \
 ...
with the final argument as:
      POD::/d1/in
which would distribute the same 'POD::' suffix to all REC hosts.
- 
the preferred way of defining the rsync options, with --ro='-asz' 
- 
the --dispo=l option requests that the cachefiles be left alone. In MH mode the chunk files MUST be left, since all the independent SEND hosts need to reference them until they’re finished. 
- 
the POD::/ terminal element is the (required) default path for any undefined REC hosts. Since all of the REC hosts paths are defined, they aren’t affected. 
5.6. Good Example 6 (MultiHost)
cd /home/pfp; ~/bin/pfp2 --ro='-slaz' --chunk=50M --dispose=c --NP=6 \
  --commondir=/home/pfp --filesfrom=/home/pfp/recentfilelist.txt \
  --trustme --trimpath='/home/pfp' --checkhost \
  --hosts="stunted=bridgit,bigben=bridgit" POD::~/test
This example shows:
- 
that you can symlink or rename the parsyncfp2 executable anything (to pfp2, above) and it will continue to be usable. The executable started is compared to the remote one (and is rsync’ed to the SEND hosts, if the --checkhost option is used, as it is here). 
- 
using the --filesfrom options in MH mode, where the prefix /home/pfp is removed from the path of all the filenames with the --trimpath option and the filenames are supplied with sizes, indicated by the --trustme option. 
- 
the TARGET string POD::~/test, indicating that the naked RECEIVE host (bridgit, in both couplets) is automatically suffixed with the string :~/test 
- 
an incorrect option --dispose=c that is overridden in the process. The chunk files need to be kept until the end so the given --dispose option is detected and changed to --dispose=l to enable this. 
- 
the use of --checkhost to make sure all the MH hosts are in good shape to begin a pfp2 session. 
5.7. Good example 7 (Multihost)
parsyncfp2  --hostcheck --NP=16 --chunk=50G --check 5  \\
--hosts="bigben=tux@moon1, \\
          pooki=tux@moon2, \\
        stunted=tux@moon3  \\
         cooper=gibson@moon4::circadian" \\
--maxload=20 --ro='-slaz' \\
--commondir=/home/pfp --startdir /home/pfp/incoming \\
dir1 dir2 dir3 dir4  POD::/d1/incoming
The above multihost command shows 4 SEND hosts (bigben, pooki, stunted, cooper), each sending 16 streams of data to the 4 clustered REC hosts (moon1 - moon4). The REC data path is provided by the POD default path /d1/incoming, except for moon4, which uses an rsyncd module (circadian) as the REC endpoint, with the rsyncd ID gibson as the authorized user. This requires the rsyncd password to be part of the ENV on cooper, ie the ~/.bashrc must contain RSYNC_PASSWORD=whateveritis.
Thus there are 64 (4x16) rsync streams pushing data to the REC cluster. This assumes the filesystem on the moon cluster can write that fast and that the intermediate network can provide the bandwidth. It also assumes that the rsync compression requested by the --ro argument (--ro='-slaz') can stay below the individual 1m loadavg of 20 requested by --maxload=20. If it doesn’t, the SEND hosts will start to suspend rsyncs until the loadavg goes below 20. The --commondir and --startdir paths define the shared storage and where in it the data to be sent is stored. --commondir and --startdir do not have to be identical, but they do have to be R/W available to all the SEND hosts. The --hostcheck option makes sure that required utilities are available, that the parsyncfp2 program is identical, and also checks the latency between the SEND and REC hosts.
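The stream arithmetic and the --maxload comparison can be sketched in a few lines of shell. This is only an illustration, not how pfp2 itself does it: the helper name load_exceeds is hypothetical, and reading /proc/loadavg assumes a Linux SEND host.

```shell
# Total rsync streams = number of SEND hosts x the --NP value.
send_hosts=4
np=16
echo "total streams: $((send_hosts * np))"    # prints: total streams: 64

# Hypothetical helper: exit 0 if a load value exceeds the limit, else 1.
# awk does the comparison because sh arithmetic is integer-only.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load > max) }'
}

# On Linux, the 1-minute loadavg is the first field of /proc/loadavg.
if [ -r /proc/loadavg ]; then
    read -r load1 _rest < /proc/loadavg
    if load_exceeds "$load1" 20; then
        echo "load $load1 is above 20: rsyncs would be suspended"
    else
        echo "load $load1 is at or below 20: rsyncs keep running"
    fi
fi
```

Running this on each SEND host before a large transfer gives a rough idea of how much headroom is left under the chosen --maxload value.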
5.8. Error Example 1
% pwd
/home/hjm      # executing parsyncfp2 from here
% parsyncfp2 --NP4 /usr/local /media/backupdisk
Why this is an error:
- --NP4 is not an option (parsyncfp2 will say "Unknown option: np4"). It should be --NP=4 or --NP 4.
- if you were trying to rsync /usr/local to /media/backupdisk, it will fail since there is no /home/hjm/usr/local dir to use as a source. This will be shown in the log files in ~/.parsync/rsync-logfile-<datestamp>_# as a spew of "No such file or directory (2)" errors.
The correct version of the above command is:
% parsyncfp2 --NP=4 --startdir=/usr local /media/backupdisk
Note that this example is sending data to another local mounted filesystem, not a remote host. This is OK.
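Since the sources are resolved relative to the start dir, a quick pre-flight check can catch this kind of mistake before any rsyncs are launched. The following is a sketch only: check_sources is a hypothetical helper, not a pfp2 feature, and the demo uses a throwaway directory rather than real data.

```shell
# Hypothetical pre-flight check (not part of pfp2): report any source that
# does not exist relative to the start dir; return non-zero if one is missing.
check_sources() {
    startdir=$1
    shift
    missing=0
    for src in "$@"; do
        if [ ! -e "$startdir/$src" ]; then
            echo "missing: $startdir/$src" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demo in a temporary directory so nothing real is touched.
demo=$(mktemp -d)
mkdir "$demo/local"
check_sources "$demo" local && echo "sources OK"
check_sources "$demo" usr/local 2>/dev/null || echo "usr/local not found under $demo"
rm -rf "$demo"
```

With the values from the example, check_sources /usr local would pass on a typical Linux system, while check_sources /home/hjm /usr/local would report the missing /home/hjm//usr/local.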
5.9. Error Example 2
% parsyncfp2 hjm@moo.boo.yoo.com:/usr/local --start-dir /home/hjm mooslocal
Why this is an error:
- this command is trying to PULL data from a remote SOURCE to a local TARGET. pfp2 doesn’t support that kind of operation yet.
The correct version of the above command is:
# ssh to hjm@moo, install parsyncfp2, then:
% parsyncfp2 --startdir=/usr local hjm@remote:/home/hjm/mooslocal
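A wrapper script could catch an accidental PULL by checking whether any source argument looks like a remote spec. The sketch below is an assumption-laden illustration, not pfp2's own logic: is_remote_spec is a hypothetical helper that treats an argument as remote when a colon appears before any slash, which mirrors the usual [user@]host:path form.

```shell
# Hypothetical helper (not part of pfp2): exit 0 if the argument looks like
# [user@]host:path, ie it contains a colon with no slash in front of it.
is_remote_spec() {
    case $1 in
        *:*)
            before=${1%%:*}             # text in front of the first colon
            case $before in
                */*) return 1 ;;        # a slash before the colon: local path
                '')  return 1 ;;        # nothing before the colon: not a host
                *)   return 0 ;;
            esac ;;
        *)  return 1 ;;
    esac
}

for arg in hjm@moo.boo.yoo.com:/usr/local mooslocal /usr/local; do
    if is_remote_spec "$arg"; then
        echo "$arg: looks remote (not usable as a pfp2 SOURCE)"
    else
        echo "$arg: looks local"
    fi
done
```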
5.10. Error Example 3
% parsyncfp2 --NP=4 --chunksize=500M --startdir=/usr/local/bin hjm@remote.host.edu:/home/backups
Why this is an error:
- you’ve specified a startdir but haven’t specified the dirs or files to be transferred.
The correct version of the above command is:
% parsyncfp2 --NP=4 --chunksize=500M --startdir=/usr/local bin hjm@remote.host.edu:/home/backups
6. Block tags, Version 2.243
The following is a functional block list of how pfp2 works, described by in-line comments indented to the same degree as the code itself to provide some functional hinting. If you modify the code yourself or want to add more such comments, just prefix them with the obvious '##: ' in the code and run grep -n '##: ' pfp2 to regenerate the list.
24:##: == COMMON TO MASTER & SLAVES ==
25:##: Lib Requirements
36:##: Dev/github/update gunk
45:##: ITER notes
56:##: Global Vars
79:##: Pre-Getopt var declarations
97:##: Getopt options & Setup
135:##: Var declarations
150:##: Reset colors
155:##: MD5 checks of executable
167:##: Declare run-permanent vars
199:##: Define cache and log dirs
232:##: Get current system stats
244:##: Define & init Getopt flag vars
315:##: ARGV processing
324:##: parse_rsync_target call
352:##: Hostlist processing
465:##: NETIF determination
533:##: IB / perfquery
550:##: get IF_SPEED
568:##: fix .ssh/config
572:##: checkhost on SINGLEMASTER, RSYNCD, RSYNC hosts (NOT POD hosts)
584:##: Check loadavg too high
603:##: == MASTER ONLY ==
637:##: process Files & Dirs to send
684:##: Process $FROMLIST, how to set up fpart cmd
742:##: Warn about OTHER FPs running
763:##: More $FROMLIST proc
891:##: == MASTER ONLY ==
892:##: reformat orig pfp2 arguments for SEND hosts
922:##: Write out pfpstop script
954:##: == SEND hosts only (SH/MH)
955:##: Compose RSYNC_CMD & send it to all the SEND hosts
1001:##: == MASTER ONLY ==
1002:##: Write feedgnuplot script to viz data xfer
1036:##: == MASTER EXITS ==
1037:##: == SEND hosts only
1056:##: init Bandwidth vars
1076:##: Start the overall common rsync loop
1146:##: stats print loop
1450:##: Final rsync log check to verify completions.
1451:##: Detect failed rsyncs and retransmit.
1478:##: Resend failed rsyncs all at once,
1494:##: Calc bytes of rsync logs and convert raw bytes to 'human'
1514:##: Print reminders
1538:##: Exit cleanup: email
1544:##: Dispose of cache
1557:##: Exit message
1578:##: Left over orphan warning
1593:##: == Subroutines
1639:##: parse_rsync_target ($LOCALUSER, $TARGET, $ALTCACHE, $recv_hoststring)
1889:##: checkhost ( "NODETYPE", $HOST2CHECK, $RSYNCMODULE, $ALTCACHE, $VERBOSE, $MAXLOAD )
2009:##: first_run_required_utils ()
2092:##: check_ssh_ok ($HOSTNAME)
2112:##: get_nbr_chunk_files () # 1st ver
2126:##: remove_fp_cache ()
2134:##: check_utils($required_str, $recommend_str)
2180:##: get_rPIDs ($pidfile, $spids)
2243:##: trim ($string)
2254:##: getavgnetbw ($NETIF, $CHECKPERIOD, $PERFQUERY)
2290:##: pause()
2297:##: INFO($message)
2309:##: WARN($string)
2331:##: FATAL($message)
2344:##: DEBUG (__LINE__, $message)
2369:##: fixfilenames ($CUR_FP_FLE, $ROOTDIR)
2404:##: ptgmk ("154.32M")
2422:##: fix_ssh_config ()
2459:##: usage ()
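As a sketch of how a list like the one above is produced, the following tags a tiny stand-in script and lists its tags with grep -n. The file contents and the helper name list_block_tags are made up for illustration. Only the '##: ' prefix and the grep invocation come from the text above.

```shell
# Hypothetical wrapper around the grep described above: list every
# '##: ' block tag in a file together with its line number.
list_block_tags() {
    grep -n '##: ' "$1"
}

# Demo on a throwaway stand-in for the pfp2 source.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/usr/bin/env perl
##: Lib Requirements
use strict;
  ##: Global Vars
my $x = 1;
EOF
list_block_tags "$demo"
rm -f "$demo"
```

Because grep -n prints the whole matching line, a tag keeps whatever indentation it has in the code, so nested blocks stay visually nested in the output.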