PHP Command line scripting basics


PHP on the command line

PHP was originally designed for the Web – but, as an easy to use language with access to a vast array of libraries, it can also be an excellent choice as a command line scripting tool.

To script with PHP or to script with shell

I was recently reading up on comparisons between PHP and Python and came across a few bullet points which suggested that PHP was really only useful in a Web context. After all, the clue is in the name, right? PHP originally stood for Personal Home Page (back when Rasmus Lerdorf first created it as a set of CGI tools written in C). Even its full modern name, PHP: Hypertext Preprocessor, while nicely recursive, is pretty suggestive of a Web orientation.

The reality, however, is that PHP is a versatile language and it can be just as useful on the command line as it is in generating Web and mobile applications. Of course, just because you can do something in tech, it doesn’t mean you always should (see, for example, the ongoing mission to run Doom on literally every possible device). So how do you make that call?

Here are a couple of reasons why you might choose to use a shell script instead of PHP.

NOTE: A shell script is a set of commands combined to drive the command line interface for Unix-like operating systems. There are a number of such shells but Bash is the most popular and is often installed by default. Pretty much anything you can type at a command prompt can be added to a shell script. Shell scripts support many familiar language features such as variables, flow control structures, and functions. A shell script is often the quickest way to automate the command line but the language variations can often seem clunky and may not scale well.

Shell scripts are, by definition, more tightly integrated with the shell

Although you can call Unix commands from most programming languages, shell scripts are a natural fit for the command line. In order to invoke a Linux or Unix command from a shell script, you do not need to deploy a special syntax as you do from other high level scripting languages (you may see this practice referred to as shelling out). Although complex shell scripts are rarely elegant, you can use them to create quick and powerful pipelines that link commands together. If you need to invoke many native commands, a shell script may well be the way to go. See this excellent blog post for some shell script best practices.
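
For comparison, here is roughly what shelling out looks like from PHP. This is only a minimal sketch: the helper name runCommand() is hypothetical, and it relies on PHP's built-in exec() function, which collects a command's output lines and its exit status.

```php
<?php
// Hypothetical helper: run a shell command and return its output lines
// together with its exit status.
function runCommand(string $command): array
{
    $lines = [];
    $status = 0;
    // exec() appends each line of output to $lines and sets $status.
    exec($command, $lines, $status);
    return ['lines' => $lines, 'status' => $status];
}

$result = runCommand('echo hello');
print implode("\n", $result['lines']) . "\n";
```

In a shell script the equivalent is just the command itself, which is the convenience this section describes.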

The shell is ubiquitous

If you’re looking to write something that can be run in most contexts, then a shell script is a good choice. Most, if not all, Linux instances will include Bash. macOS currently enables Zsh by default but also bundles Bash. I’m not much for Windows, but apparently Windows 10 includes Bash with its developer features too. Beware, though, not all common commands will be available or identical from platform to platform (there are some particularly marked differences between the Linux and macOS command line environments).

On the other hand, there are some compelling reasons to choose PHP for your command line scripting

PHP is friendly and familiar

If you’re already a PHP programmer, then writing code in the language can soon become second nature. So you can build something that works with less up front effort.

PHP is feature rich

The richness of a shell script lies in its easy access to all the other commands available in the shell. PHP provides access to a well-featured core and can be extended with numerous easily installed extensions. The Composer dependency manager offers simple access to a vast number of libraries and tools. All that, and access to shell commands too, if you need it.

PHP scales well

PHP is fast and efficient, for sure – especially since the advent of PHP 8. As your script grows, though, the language’s support for breaking up a project into packages, classes and functions will help keep your logic organised and elegant. Shell scripts, by contrast, often become unwieldy as they grow in size.

Shell or PHP: making the call

So how do you decide which way to jump?

For very limited and focussed requirements, I’ll often plump for a shell script. There’s nothing better for grabbing some data from one command, filtering it through another and then looping through the result. This kind of solution is fast, elegant and well integrated with the OS. As soon as I need to think about deeper features (processing lots of data, for example, working across networks, or handling formats like XML or JSON), the support provided by a language like PHP makes it a good choice.
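
As a small illustration of the kind of task that tips the balance towards PHP, here is a sketch that pulls one field out of a JSON document. The helper name namesFromJson() and the sample data are invented for this example; json_decode() and array_column() are standard PHP functions.

```php
<?php
// Hypothetical helper: decode a JSON list of records and return the
// "name" field from each one.
function namesFromJson(string $json): array
{
    // JSON_THROW_ON_ERROR (PHP 7.3+) raises a JsonException on bad input.
    $records = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    return array_column($records, 'name');
}

// Made-up sample data for illustration only.
$json = '[{"name": "alice", "id": 1}, {"name": "bob", "id": 2}]';
print implode(", ", namesFromJson($json)) . "\n";
```

Doing the same in a pure shell script usually means reaching for an external tool, which may or may not be installed.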

The hashbang

This name (AKA ‘shebang’ among others) is poetry. It describes a script’s opening line of #! followed by a file path. This tells the shell where to find the interpreter for the current script.

This allows us to make a script runnable without the need to invoke an interpreter explicitly.

Here is a basic hello world script that I will name hellophp:

<?php

print "hello world\n";

Here’s how I might run it

$ php scripts/hellophp

Let’s add a hashbang to a copy of the script in hellophp2:

#!/usr/local/bin/php
<?php

print "hello world\n";

Because we’re no longer passing the script explicitly to the PHP binary we should make the hellophp2 file executable

$ chmod 755 scripts/hellophp2

And now we can invoke it just like any other command

$ scripts/hellophp2

Or, if the location of the script is included in the $PATH environment variable, you can even omit the filepath. At this point the command can be invoked simply as hellophp2.

Whether or not you use a hashbang is largely a matter of choice and convenience. I quite like to be able to call my commands without the need to invoke the interpreter explicitly. On the other hand, if your PHP binary is not where the hashbang expects it to be, your script will fail with an error before it even starts. This may affect the portability of your scripts.
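
One common way to soften the portability problem is to point the hashbang at env rather than at a fixed PHP location, so that the php binary is looked up in the caller's $PATH. The sketch below assumes that env lives at /usr/bin/env, which is typical on Linux and macOS but not guaranteed everywhere.

```php
#!/usr/bin/env php
<?php

// env finds the php binary through $PATH, so no absolute path to the
// interpreter is hard-coded in the script.
$message = "hello world\n";
print $message;
```

The trade-off is that the script now runs whichever php comes first in $PATH, which may not be the version you expected.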

Let’s get on and look at some other aspects of a well-formed PHP script.

Begin with usage

A good script should provide information about its usage. You can output this in response to a help request (more on this later) or even as part of an error message. I’m going to build a simple demo script that counts the files and directories in a given directory.

So let’s describe the usage:

function usage(?string $msg=null): string
{
    $argv = $GLOBALS['argv']; 
    $usage  = "\n";
    $usage .= sprintf("usage: %s <directory>\n", $argv[0]);
    $usage .= sprintf("%6s %-6s %s\n", "-h", "",      "this help message");
    $usage .= sprintf("%6s %-6s %s\n", "-a", "",      "count all files including hidden");
    $usage .= sprintf("%6s %-6s %s\n", "-p", "<pat>", "apply regexp pattern");
    $usage .= "\n";
    if (! is_null($msg)) {
        $usage .= "$msg\n\n";
    }
    return $usage;
}

The usage() function accepts an optional string and combines it with general usage information. I have used the slightly arcane-looking sprintf() function to format my usage message. This accepts a format string and embeds subsequent arguments into its return string according to any number of conversion specifications. In this case, I’m specifying one string right-aligned in a field of six characters (padded with spaces), one string left-aligned in a field of six characters, and a third unformatted string.

The result looks like this:

usage: scripts/dircount.php <directory>
    -h        this help message
    -a        count all files including hidden
    -p <pat>  apply regexp pattern
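
To see the conversion specifications in isolation, here is a minimal sketch that formats a single line. The helper name flagLine() is hypothetical; the format string is the one used in usage().

```php
<?php
// Hypothetical helper wrapping the format string used in usage().
function flagLine(string $flag, string $arg, string $desc): string
{
    // %6s  : right-align the flag in a six character field
    // %-6s : left-align the flag's argument in a six character field
    // %s   : insert the description as-is
    return sprintf("%6s %-6s %s\n", $flag, $arg, $desc);
}

print flagLine("-p", "<pat>", "apply regexp pattern");
// prints "    -p <pat>  apply regexp pattern"
```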

A confession here. If I wasn’t trying to be all posh and proper in a blog post I’d probably just format my usage string manually. Something like this is easier to read and will give me identical results at the cost of a bit of fiddling around with alignment:

function usage(?string $msg=null): string
{
    $argv = $GLOBALS['argv']; 
    $usage  = "\n";
    $usage .= "usage: {$argv[0]} <directory>\n";
    $usage .= "    -h        this help message\n";
    $usage .= "    -a        count all files including hidden\n";
    $usage .= "    -p <pat>  apply regexp pattern\n";
    $usage .= "\n";
    if (! is_null($msg)) {
        $usage .= "$msg\n\n";
    }
    return $usage;
}

As we will see again later, PHP provides us with an array, $argv, in global space. This contains any arguments with which the script was invoked and, as the first element, the invoking call (including any directories specified). I use this first element to include the script name in my usage message. First, of course, I need to get access to $argv within the usage() function. For this I could use the global keyword, but I prefer to use the $GLOBALS superglobal variable.

As for the string itself, it should be pretty self-explanatory. It covers the name of the script, any arguments, and a list of supported flags. By convention, required arguments are <wrapped> in less-than and greater-than characters (angle brackets). Optional arguments are [enclosed] by square brackets.

Now that I can generate a usage string, I should create a function that I can call when an error occurs:

function errorUsage(string $msg): void
{
    fputs(STDERR, usage($msg)); 
    exit(1);
}

I can call errorUsage() with an error message any time I hit trouble I can’t deal with. Although this simply invokes usage() to generate the error string, there are two other aspects to this little function that are worth considering. Firstly, note that I don’t print the error in the usual way. Instead I send it to standard error. This will help clients of the script which need to treat error output differently from standard output (in order to send errors to a different log file, for example). Secondly, I call exit() with the integer 1. In the Unix world, commands that run successfully return an exit code of zero whereas errors generate non-zero results.

NOTE It might seem counter-intuitive that Unix commands exit with 0 for success. It might help to think of the exit status as an error code. 0, then, means no error, and a non-zero result is a particular error. This Stack Overflow discussion is interesting on the topic.

By following that convention, we make it easier for others to work with our command on the shell or in scripts.
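
To make the convention concrete, here is a rough sketch of how a calling program sees those exit codes. The helper name exitStatusOf() is hypothetical; it runs a snippet of PHP in a child process (using the PHP_BINARY constant to locate the interpreter) and reports the status that comes back.

```php
<?php
// Hypothetical helper: run a snippet of PHP in a child process and
// return the exit status that the parent process sees.
function exitStatusOf(string $phpCode): int
{
    $command = escapeshellarg(PHP_BINARY) . " -r " . escapeshellarg($phpCode);
    $output = [];
    $status = 0;
    exec($command, $output, $status);
    return $status;
}

print "success: " . exitStatusOf('exit(0);') . "\n";
print "failure: " . exitStatusOf('exit(1);') . "\n";
```

A shell script calling our command can branch on the same value, which is why sticking to zero for success matters.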

How to process command line flags

Now that we have our utility functions written, I can get on and process user input. We are going to look at two kinds of input in this article: flags and arguments. Typically, arguments provide the substance upon which your command will work – its targets. Flags, with some exceptions, modify the way that the command works. Since arguments are clearly more core than flags, why deal with flags first? As you’ll see, it’s easier to remove flags from the equation first.

We have already seen from the usage message that I intend to respect three flags: -h, -a and -p. The latter will require its own argument. I could write a complicated regular expression (or a little parser) to do this, but luckily, for most purposes, PHP’s getopt() function is perfectly serviceable.

getopt() requires a string argument representing the short flags you wish to support. It will optionally accept a second array argument for long flags (that is, flags --in --this --format – I don’t intend to delve into those in this article). Finally, getopt() accepts an optional $rest_index reference variable which it will populate with an integer value for you (a little more on that to come). The function reads the $argv array and extracts any matching flags, populating an associative array with keys and, where specified, values.

It’s a lot easier than it sounds. Let’s take a look:

$options = getopt("hap:", [], $rest_index);
$myargs = array_slice($argv, $rest_index);

if (isset($options['h'])) {
    print usage();
    exit(0);
}
$pattern = isset($options['p']) ? preg_quote($options['p'], "/") : null;
$countall = isset($options['a']);

So, breaking this down. I specify the short options hap:. That means I’ll accept two standalone flags: -h and -a. For the third flag I demand an argument. I have specified that with the colon in the short options string: p:. If I wanted to accept an optional flag argument, I would use two colons: p::. I’m skipping long options in this example, so I just pass in an empty array. Finally I include $rest_index which getopt() will populate for me.

So what happens if we run this code with some flags? Let’s say something like:

$ scripts/dircount.php -a -p ssss /usr/local

To start with, we will see an $argv variable that looks like this:

Array
(
    [0] => scripts/dircount.php
    [1] => -a
    [2] => -p
    [3] => ssss
    [4] => /usr/local
)

getopt() finds the -a and -p flags, and the argument to -p. It populates and returns an array (which I assign to $options).

Array
(
    [a] => 
    [p] => ssss
)
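
Notice that -a maps to an empty value. getopt() stores false for a flag that takes no argument, so a simple truthiness test would wrongly conclude that the flag was absent. The sketch below illustrates the difference using a hard-coded array shaped like the result above; the helper name flagGiven() is hypothetical.

```php
<?php
// Hypothetical helper: was a given flag present in a getopt() result?
function flagGiven(array $options, string $flag): bool
{
    // array_key_exists() is true even when the stored value is false.
    // isset() also works here, since it only rejects null values.
    return array_key_exists($flag, $options);
}

// Shaped like the getopt() result above: -a carries no value.
$options = ['a' => false, 'p' => 'ssss'];

var_dump(flagGiven($options, 'a'));  // bool(true)
var_dump((bool) $options['a']);      // bool(false): a truthiness test misses -a
var_dump(flagGiven($options, 'h'));  // bool(false)
```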

So now I have access to the user’s flags. But I still have a problem. Even though I have dispensed with the flags, I don’t know where any remaining arguments start. That is where the third argument to getopt() comes in. I passed in the empty variable $rest_index and, after processing, it is populated with the index of the position in the $argv array at which getopt() stopped processing. In other words, it gives me the starting point of non-flag arguments. That means I can then use array_slice() to generate a new arguments array by lopping the flags off the start of $argv.
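
The slicing step can be tried on its own with hard-coded values. In this sketch the helper name remainingArgs() is hypothetical, and the index of 4 is an assumption standing in for the value getopt() would report for the example invocation (the position just past -p ssss).

```php
<?php
// Hypothetical helper: return whatever follows the flags in an argv array.
function remainingArgs(array $argv, int $restIndex): array
{
    return array_slice($argv, $restIndex);
}

// Hard-coded stand-ins for the example invocation. In a real script these
// values come from PHP itself and from getopt().
$exampleArgv = ['scripts/dircount.php', '-a', '-p', 'ssss', '/usr/local'];
$exampleRestIndex = 4; // assumed: the index just past "-p ssss"

print_r(remainingArgs($exampleArgv, $exampleRestIndex));
```

Only /usr/local survives the slice, which is the directory argument the script goes on to check.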

The rest of the flag processing code is pretty straightforward. I check for -h and, if found, I print the usage message and end execution. Note that I’m not sending this usage string to standard error. The user has requested usage information, so I don’t treat the output as an error. Finally, I set a couple of variables according to the status of the -a and -p flags.

Process remaining arguments

Remember that we have now decanted the remaining arguments into a new variable $myargs. So all we really need to do is check that the minimum number of arguments – just one in this case – has been provided and that it is sane for our purposes. I assign the value to a variable for convenience and I’m done with boilerplate and set up.

// basic argument checks
if (count($myargs) < 1) {
    errorUsage("too few arguments");
}

// assignment 
$dir = $myargs[0];

// more detailed argument checks
if (! is_dir($dir)) {
    errorUsage("'{$dir}' is not a directory");
}

Now write your logic

I don’t propose to spend too much time on the script’s actual logic. It simply loops through and counts the contents of a given directory.

$di = new DirectoryIterator($dir);
$count = 0;
foreach ($di as $fileinfo) {
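    // always skip the directory's own "." entry
    if ($fileinfo->getFilename() == ".") {
        continue;
    }
    // without -a, skip hidden entries (names with a leading dot);
    // isDot() alone would only catch "." and ".."
    if (! $countall && strpos($fileinfo->getFilename(), ".") === 0) {
        continue;
    }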
    if (! empty($pattern)) {
        if (! preg_match("/{$pattern}/", $fileinfo->getFilename())) {
            continue;
        }
    }
    $count++;
}

print "{$dir} contains {$count} files and directories";
if (! empty($pattern)) {
    print " matching regexp pattern '{$pattern}'";
}
print "\n";

If the -a flag was not provided the code will exclude hidden files and directories (names that begin with a leading dot). If the -p flag was used, the script will attempt to match the given pattern (without much error checking). Because the pattern was passed through preg_quote() earlier, any regular expression metacharacters in it are matched literally. Finally, I output some results and I’m done.
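
If you want to exercise the counting logic separately from the argument handling, one option is to wrap it in a function. The following is a sketch rather than a drop-in replacement: the name countEntries() is hypothetical, and it makes two assumptions of its own, namely that the . and .. entries are never counted and that the pattern is matched literally.

```php
<?php
// Hypothetical function: count directory entries, optionally including
// hidden ones and optionally filtering names by a literal pattern.
function countEntries(string $dir, bool $countAll = false, ?string $pattern = null): int
{
    $count = 0;
    foreach (new DirectoryIterator($dir) as $fileinfo) {
        $name = $fileinfo->getFilename();
        // Assumption: "." and ".." are never counted.
        if ($fileinfo->isDot()) {
            continue;
        }
        // Hidden entries start with a dot; skip them unless $countAll is set.
        if (! $countAll && strpos($name, ".") === 0) {
            continue;
        }
        // preg_quote() escapes metacharacters, so the match is literal.
        if ($pattern !== null && $pattern !== ""
            && ! preg_match("/" . preg_quote($pattern, "/") . "/", $name)) {
            continue;
        }
        $count++;
    }
    return $count;
}

print countEntries(".", true) . "\n";
```

Keeping the logic in a function like this makes it straightforward to test against a temporary directory.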

NOTE I deal with the DirectoryIterator class in some detail in another post.

Conclusion

While that seems like a lot, most of the work is simple boilerplate. The actual logic involved is minimal. Get it right, though, and you can provide the same kind of clean experience on the command line that we all aspire to on the Web page.



matt zandstra

Hidden Hat Press