Unix file management

This is likely the last post just on Unix for ASTR 119. We've previously seen how to move around the directory tree in Unix. Here let's see how to read, write, copy, move, edit, and delete files.

Reading files

There are many ways to read the contents of files. The program cat (short for concatenate) takes one or more file names as arguments and outputs the text within those files, concatenating the text in the order given if you provide multiple file names.
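Here is a minimal sketch of cat in action. The file names (a.txt, b.txt) are made up for illustration, and everything happens in a throwaway directory so nothing real is touched. The > used to create the files is covered in the input/output redirects section.

```shell
# Work in a temporary directory (hypothetical sandbox for this example).
workdir=$(mktemp -d)
cd "$workdir"

# Create two small example files.
printf 'first line\n' > a.txt
printf 'second line\n' > b.txt

# One file name: print that file's contents.
cat a.txt

# Two file names: the contents are printed back to back, in argument order.
cat a.txt b.txt
```

The second command prints "first line" followed by "second line", because a.txt comes first on the command line.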

Dumping a whole file to the screen with cat can be rather awkward for a large text file. The programs less and more take a file name as an argument and give you an interactive screen for scrolling through that file (press q to quit). less is a bit more full-featured than more, which is supposed to be humorous (I think).

Also of note are the programs head and tail, which take a file name as an argument and print the first or last lines of the file, respectively. By default, each prints 10 lines.
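As a quick sketch, we can try head and tail on a file of 100 numbered lines. The file name numbers.txt is made up, and I'm assuming the seq program is available (it ships with Linux and macOS). The -n option sets how many lines to print instead of the default 10.

```shell
# Work in a temporary directory (hypothetical sandbox for this example).
workdir=$(mktemp -d)
cd "$workdir"

# Build a 100-line file containing the numbers 1 through 100, one per line.
seq 1 100 > numbers.txt

# Default behavior: the first 10 lines (1 to 10) and the last 10 lines (91 to 100).
head numbers.txt
tail numbers.txt

# With -n we choose the number of lines ourselves.
head -n 3 numbers.txt   # prints 1, 2, 3
tail -n 3 numbers.txt   # prints 98, 99, 100
```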

File creation

Up to this point, we've mostly created our files in a GUI text editor. There are also a number of classic text editors that come with most Unix-like systems. If you're looking for a good, flexible terminal-based text editor, emacs and vim are both well worth learning to use. Note that both of these programs have a rather steep learning curve. nano is a simpler text editor which is useful for quickly modifying a text file.

Input/output redirects

One of the magical things about Unix is the ability to redirect the output of one program to the input of another. To capture the output of a program and save it to a file, we can use the > operator. For example, it might be useful to capture the output of a Python script and save it for later. In this case, we could run python3 myscript.py > output.txt. Be aware that > replaces whatever was already in output.txt. We can also redirect a text file to the input of a program with the < operator, and redirect the output of one program to the input of another program with the | operator (called a "pipe"). The Wikipedia article has a nice summary and some good examples.
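To make the three operators concrete, here is a small sketch using sort (which sorts lines alphabetically) and head. The file names fruit.txt and sorted.txt are made up for the example.

```shell
# Work in a temporary directory (hypothetical sandbox for this example).
workdir=$(mktemp -d)
cd "$workdir"

# > : save a program's output to a file (replacing the file if it exists).
printf 'banana\napple\ncherry\n' > fruit.txt

# < : feed a file to a program's input. Prints apple, banana, cherry.
sort < fruit.txt

# | : send one program's output to another program's input.
# Here sort's output goes to head, which keeps only the first line: apple.
sort < fruit.txt | head -n 1

# The operators can be combined: read from fruit.txt, write to sorted.txt.
sort < fruit.txt > sorted.txt
cat sorted.txt
```

Nothing is printed by the last sort command itself, since its output went into sorted.txt rather than to the screen.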

Copying, moving and deleting files

The command for copying files is called cp (short for copy). It takes two arguments: the source file you wish to copy and the target file you wish to copy to. Note that this target file doesn't need to exist, but if it does, the target file will be overwritten! There is no "undo" when you overwrite a file! You can recursively copy directories with the -r option. For example, cp -r foo/ bar/ will copy the directory foo, along with every file and subdirectory inside it (recursively, so including all files in those subdirectories), to bar. If bar already exists, the copy may end up inside bar rather than alongside its contents, so check the result with ls.
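Here is a short sketch of both behaviors: copying a single file (including the silent overwrite) and copying a directory recursively. All file and directory names are made up, and the target directory bar does not exist beforehand, so it becomes a copy of foo.

```shell
# Work in a temporary directory (hypothetical sandbox for this example).
workdir=$(mktemp -d)
cd "$workdir"

# Copy a file to a new name. Both files now exist with the same contents.
printf 'original\n' > notes.txt
cp notes.txt notes_backup.txt

# Copying onto an existing file overwrites it without warning.
printf 'something else\n' > other.txt
cp notes.txt other.txt
cat other.txt            # prints: original

# Recursive copy of a directory tree. bar does not exist yet,
# so it is created as a copy of foo.
mkdir -p foo/sub
printf 'data\n' > foo/sub/data.txt
cp -r foo bar
ls bar/sub               # prints: data.txt
```

If the silent overwrite makes you nervous, cp -i asks for confirmation before replacing an existing file.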

The command for moving files is called mv (short for move). Like cp, it takes two arguments for the source and the target, and like cp, it will overwrite an existing target. The difference from cp is that, after a successful run, the source file will no longer exist. As long as the source and target are on the same filesystem, we have just renamed the file and the actual bits haven't moved. This means that mv is generally quite fast, while cp can take a while to copy large files.
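A minimal sketch of mv, with made-up names. The first command renames a file in place; the second moves it into a directory, which works because the target is an existing directory.

```shell
# Work in a temporary directory (hypothetical sandbox for this example).
workdir=$(mktemp -d)
cd "$workdir"

# Rename a file: draft.txt disappears and final.txt takes its place.
printf 'my text\n' > draft.txt
mv draft.txt final.txt
ls                       # prints: final.txt

# Move the file into an existing directory, keeping its name.
mkdir archive
mv final.txt archive/
ls archive               # prints: final.txt
```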

To delete unwanted files, you can use the rm command (short for remove). Unlike cp and mv, rm takes one or more files as arguments, and will delete each of those files. As with cp, the -r option makes rm work recursively, deleting a directory and everything inside it.
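Here is a small sketch of rm, again in a throwaway directory with made-up names, since there is no way to get these files back afterwards. touch is used only to create empty files to delete.

```shell
# Work in a temporary directory (hypothetical sandbox for this example).
workdir=$(mktemp -d)
cd "$workdir"

# Create some empty files and a nested directory to practice on.
touch one.txt two.txt keep.txt
mkdir -p scratch/nested
touch scratch/nested/tmp.txt

# Delete several files in one command.
rm one.txt two.txt

# Delete a directory and everything inside it.
rm -r scratch

ls                       # prints: keep.txt
```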

I can't emphasize enough that any of these commands that remove or overwrite files are permanent. This is important to remember: if you are careless with these commands, you can easily overwrite or delete files you wanted to keep. Again, there is no undo on the command line. This is why we want to back up any important files, for example with version control software like git.

Wildcards

Sometimes we want to refer to a range of files that satisfy some simple requirements. Wildcards fill this need. The most common wildcard used is the * character. The asterisk will match zero or more characters. For example, if we wanted to list all the python files in a directory, we could use the command ls *.py. This is sometimes known as "globbing".
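As a quick sketch with made-up file names, here is the asterisk picking out subsets of a directory. The shell itself expands the pattern into a list of matching names before the program runs, which is why echo can show us what a pattern matches.

```shell
# Work in a temporary directory (hypothetical sandbox for this example).
workdir=$(mktemp -d)
cd "$workdir"

# Create a mix of empty files to match against.
touch plot.py fit.py notes.txt data1.csv data2.csv

# All names ending in .py
ls *.py                  # lists fit.py and plot.py

# All names starting with data
ls data*                 # lists data1.csv and data2.csv

# echo shows the expanded list the shell hands to the program.
echo *.py                # prints: fit.py plot.py
```

Running echo with a pattern is a handy way to check what will match before handing the same pattern to something destructive like rm.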

There are a number of other wildcard patterns. You can see a summary of wildcards available on Linux here.
