Transfer protocols

Several methods allow transferring data in and out of the PSMN computing center. In most cases, we recommend SSH-based file transfer commands such as scp, sftp or rsync, which generally provide the best performance for transfers to and from the computing center.

SCP (Secure Copy)

The easiest command for transferring files to and from PSMN is scp. It works like the cp command, except that it works over the network, using the SSH protocol to copy files from one computer to another.

For instance, the following command will copy the file named myfile from my local machine to the mydir directory in my home directory on PSMN:

scp myfile mylogin@allo-psmn.psmn.ens-lyon.fr:~/mydir/

(replace mylogin with your login as provided by PSMN).

You can copy myfile under a different name, or to another directory, with the following commands:

scp myfile mylogin@allo-psmn.psmn.ens-lyon.fr:~/inputfile
scp myfile mylogin@allo-psmn.psmn.ens-lyon.fr:~/mydir/subdir/foofile

To copy files back from PSMN to your local machine, just reverse the order of the arguments, as in this example:

scp mylogin@allo-psmn.psmn.ens-lyon.fr:~/inputfile local_inputfile

scp also supports recursive copying of directories with the -r option:

scp -r mydir/ mylogin@allo-psmn.psmn.ens-lyon.fr:~/

SCP from outside ENS network

To transfer files between your PC and allo-psmn from outside the ENS network, you have to use the ssh.psmn gateway as a proxy (see Connection on PSMN servers). In a terminal on your workstation, run:

# your PC -> your PSMN home:
scp -oProxyCommand="ssh mylogin@ssh.psmn.ens-lyon.fr netcat -w1 allo-psmn 22" source_file mylogin@allo-psmn:~/destination_file
# your PSMN home -> your PC:
scp -oProxyCommand="ssh mylogin@ssh.psmn.ens-lyon.fr netcat -w1 allo-psmn 22" mylogin@allo-psmn:~/source_file destination_file

Replace source_file and destination_file as needed. To transfer a directory rather than a file, add the -r option to scp (i.e. scp -r -oProxyCommand=...).
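Instead of repeating -oProxyCommand=... on every command, you can store the proxy setting in your OpenSSH client configuration. The snippet below is a minimal sketch, assuming OpenSSH on your workstation and the ssh.psmn gateway described above; the alias allo-psmn is simply the name chosen for this example, and %h and %p are ssh_config tokens for the target host name and port. Replace mylogin with your PSMN login.

```shell
# Append a host entry to your OpenSSH client configuration (sketch).
# Assumption: you connect from outside the ENS network; from inside,
# this entry would needlessly route through the gateway.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

cat >> "$HOME/.ssh/config" <<'EOF'

# PSMN login node, reached through the ssh.psmn gateway
Host allo-psmn
    User mylogin
    ProxyCommand ssh mylogin@ssh.psmn.ens-lyon.fr netcat -w1 %h %p
EOF

# OpenSSH refuses configuration files writable by other users.
chmod 600 "$HOME/.ssh/config"
```

With this entry, the commands above shrink to scp source_file allo-psmn:~/destination_file or scp -r mydir/ allo-psmn:~/. Since sftp, rsync -e ssh and sshfs all rely on the same ssh client, they also pick up the alias.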

SFTP (Secure File Transfer Protocol)

SFTP clients are interactive file transfer programs, similar to FTP clients, that perform all operations over an encrypted SSH connection.

A variety of graphical SFTP clients are available:

When setting up your connection to PSMN in one of these clients, use the following settings:

host: allo-psmn.psmn.ens-lyon.fr
port: 22
username: your login at PSMN
password: your password at PSMN

OpenSSH also provides a command-line SFTP client, sftp, which can take advantage of ssh-agent and SSH keys. For example:

$ sftp mylogin@allo-psmn.psmn.ens-lyon.fr
Connected to allo-psmn.psmn.ens-lyon.fr.
sftp>

Many online tutorials provide more information about SFTP clients. Here’s one.

rsync

If you have complex hierarchies of files to transfer, or if you need to synchronize a set of files and directories between your local machine and PSMN storage, rsync is one of the best tools for the job. It efficiently transfers and synchronizes files across systems by comparing the modification time and size of each file, so it does not re-transfer files that have not changed since the last transfer, and completes faster.

Also, if a transfer is interrupted for any reason, only part of the files may have been transferred. Rather than restarting the transfer from scratch, rsync only transfers what is still needed: missing files, modified files, etc.

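For instance, to transfer the whole ~/test/ folder tree from my local machine to my home directory on PSMN, I can use the following command, where -a enables archive mode (preserving permissions and timestamps), -v makes the output verbose, -z compresses data in transit, and -P keeps partially transferred files and shows progress: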

$ rsync -avzP -e ssh ~/test/ mylogin@allo-psmn.psmn.ens-lyon.fr:~/test

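Refer to the rsync manual for more options, such as:

  • --archive: archive mode, a recursive copy that preserves symbolic links, permissions, timestamps, group, owner and special files,

  • --verbose: more detailed output,

  • --recursive: recurse into directories (already implied by --archive),

  • --itemize-changes: print a summary of the changes made to each file,

  • --append-verify: resume partially transferred files by appending to them, then verify the whole file,

  • --progress: show the progress of each file transfer,

  • --bwlimit=56K: limit the bandwidth used (here, to 56 KiB per second),

  • --numeric-ids: transfer numeric user and group IDs instead of mapping them by name.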
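Before a large transfer, it can be worth previewing what rsync would do. This sketch, assuming rsync is installed and using temporary directories in place of your machine and PSMN, combines some of the long options above with --dry-run (an option not in the list above), which reports the planned changes without copying anything.

```shell
# Local sketch: preview a transfer without performing it.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "payload" > "$src/data.txt"

# --dry-run lists what would change; --bwlimit has no visible effect
# here since nothing is actually sent.
report=$(rsync --archive --itemize-changes --numeric-ids --bwlimit=56K \
               --dry-run "$src/" "$dst/")

printf '%s\n' "$report"
```

Dropping --dry-run from the same command performs the transfer that was just previewed.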

Here’s another tutorial.

fpart (+rsync)

fpart generates lists of files that can be fed to rsync, working around some of rsync’s default behaviours on large file trees:

  • no parallelism -> small parallelism (3 to 4 processes, don’t be greedy),

  • large batches that don’t fit in memory -> small batches (start early, fit in memory),

  • decreasing use of bandwidth over time -> frequent ‘restarts’, maintaining maximum use of bandwidth over time.

See the fpart documentation.

cd /Xnfs/planetary

fpart -L -v -f 2000 -Z -o /tmp/planetary.part.out -W \
'parallel --semaphore -j 4 \
"rsync -e ssh -az --numeric-ids --files-from=${FPART_PARTFILENAME} /Xnfs/planetary/ user@external_server:/data/planetary"' .

This example scans the /Xnfs/planetary file tree, creates lists of 2000 files each, and feeds them to 4 parallel rsync processes that copy these files over ssh to external_server.

Refer to the fpart manual for more options and use cases.

Unison

unison is a file-synchronization tool that is available on PSMN clusters.

See the Unison homepage for more.

SSHFS

Moving files in and out of the cluster, and maintaining two copies of each file you work on (one on your local machine and one on PSMN), can be painful. Fortunately, PSMN offers the ability to mount its home filesystem on your local machine over a secure, encrypted connection (and vice versa, if your workstation exposes an SSH server).

With SSHFS, a FUSE-based filesystem implementation used to mount remote SSH-accessible filesystems, you can access your files on PSMN as if they were locally stored on your own computer.

Hint

Be aware that, while very convenient, SSHFS is also much slower than local storage, since every file access goes over the network through FUSE.

This comes in particularly handy when you need to access those files from an application that is not available on PSMN but that you already use or can install on your local machine: for example, a data processing program licensed for your own computer that can’t be used on PSMN, a specific text editor that only runs on macOS, or data-intensive 3D rendering software that wouldn’t work comfortably over a forwarded X11 connection (see also Visualization server).

SSHFS is available for the major platforms (Linux, macOS and Windows).

Warning

SSHFS on macOS

SSHFS on macOS is known to try to automatically reconnect filesystem mounts after resuming from sleep or suspend, even without any valid credentials. As a result, it generates many failed connection attempts and is likely to get your IP address blacklisted on ssh.psmn.ens-lyon.fr or allo-psmn.psmn.ens-lyon.fr.

Make sure to unmount your SSHFS drives before putting your macOS system to sleep to avoid this situation.

For instance, on a Linux machine with SSHFS installed, you could mount your PSMN home directory with the following commands:

$ mkdir ~/PSMN_home
$ sshfs mylogin@allo-psmn.psmn.ens-lyon.fr:~/ ~/PSMN_home

(replace mylogin with your login as provided by PSMN).

And to unmount it:

$ umount ~/PSMN_home

or:

$ fusermount -u ~/PSMN_home
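Given the macOS sleep issue described above, it can help to have one command that unmounts the share with whichever tool your system provides. This is a sketch of a hypothetical helper, psmn_umount (not a command provided by PSMN), assuming a Linux or macOS workstation and the ~/PSMN_home mount point used above.

```shell
# Hypothetical helper: unmount an SSHFS share if it is mounted.
psmn_umount() {
    local mnt="${1:-$HOME/PSMN_home}"
    # "mount" prints one line per mounted filesystem: "<source> on <dir> ...".
    if ! mount | grep -qF " on $mnt "; then
        echo "$mnt is not mounted"
        return 0
    fi
    # Try Linux FUSE tools first (fuse2, then fuse3), then plain umount (macOS).
    fusermount -u "$mnt" 2>/dev/null ||
        fusermount3 -u "$mnt" 2>/dev/null ||
        umount "$mnt"
}
```

Add it to your shell startup file and run psmn_umount (or psmn_umount /other/mount/point) before putting your laptop to sleep.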

For more details and examples of using SSHFS on your local machine, refer to this tutorial.