# Shell Scripting

## Scripting

So we know how to run commands from an interactive prompt, but what if we want to save those commands so that we can reuse them in the future? That's where scripting comes into play.

## Basics of Scripting

You can write programs directly at the prompt, or save them in a file, which is called writing a script.

```bash
#!/bin/sh
echo something
```

* Open an editor (for beginners, `nano` is recommended) and save the script as `example-script`.
* In your shell, run `chmod +x example-script` to make the file executable.
* Run your script with `./example-script`.
* The first line, `#!/bin/sh`, is known as the **shebang**; it specifies the interpreter that runs the script.
* `echo` is a command that prints its arguments to the standard output.
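
If you prefer to skip the editor, the same steps can be done entirely from the prompt. This is a minimal sketch: the `cat > example-script <<'EOF'` here-document (not covered above) writes every line up to `EOF` into the file, standing in for `nano`.

```bash
#!/bin/sh
# Write the two-line script into example-script (the here-document replaces the editor)
cat > example-script <<'EOF'
#!/bin/sh
echo something
EOF
chmod +x example-script   # mark the file as executable
./example-script          # run it; prints: something
```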

### More on Flags

Most command-line utilities take parameters using **flags**. They come in a short form (`-h`) and a long form (`--help`). Usually, running `COMMAND -h` or `man COMMAND` will give you a list of the flags the program takes.

* Short flags can be combined: `rm -r -f` is equivalent to `rm -rf` or `rm -fr`
* A double dash `--` signifies the end of command options; after it, only positional parameters are accepted.
  * For example, to create a file called `-v`, use `touch -- -v` instead of `touch -v`.
  * For example, to grep a file called `-v`, `grep pattern -- -v` will work while `grep pattern -v` will not.
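
The sketch below tries the `--` convention in a throwaway directory created with `mktemp -d` (an assumption made so the demo doesn't touch your own files):

```bash
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # scratch directory, so nothing else is touched
touch -- -v                   # creates a file literally named "-v"
echo "pattern here" > -v      # a redirection target is never parsed as a flag
grep pattern -- -v            # searches the file "-v"; prints: pattern here
rm -- -v                      # without --, rm would parse -v as an option
```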

#### Common Flags

There are a few flags that are widely accepted and have similar meanings across many programs:

* `-a` commonly refers to all files (i.e. also including hidden files, whose names start with a period), e.g. `ls -a`
* `-f` usually refers to forcing something, e.g. `rm -f`
* `-h` displays the help for most commands
* `-v` usually enables a verbose output
* `-V` usually prints the version of the command
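
A quick way to see `-a` in action, again in a scratch directory from `mktemp -d` (the file names are made up for the demo):

```bash
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # scratch directory
touch visible .hidden         # names starting with a period are "hidden"
ls                            # prints only: visible
ls -a                         # also prints .hidden (plus the . and .. entries)
```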

## Unix Directory Structure

Unix has a different directory structure from Windows:

* There is no concept of drives.
* Everything is files and directories, all under a single root directory, `/`.
* Paths use forward slashes `/` instead of backslashes `\`.

On Linux specifically, the layout is described by the Filesystem Hierarchy Standard (FHS):

{% embed url="<https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard>" %}

### Important Unix Directories

* `/bin`, `/sbin`, `/usr/bin`, `/usr/local/bin` = executables; `/opt` = add-on software packages, often with their own executables
* On Linux: `/home` = user home directories
* On macOS: `/Users` = user home directories
* `/var/log` = log files
* `/tmp` = temporary files
* `/dev/urandom` = a special file that produces random bytes
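
A few of these directories can be poked at directly. This sketch assumes `head -c` and `od` are available, as they are on typical Linux and macOS systems:

```bash
#!/bin/sh
ls /                                    # the top-level directories under the root /
echo "$HOME"                            # your home directory (typically under /home on Linux, /Users on macOS)
head -c 8 /dev/urandom | od -An -tx1    # read 8 random bytes and show them in hex
```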

## Shell Syntax

```bash
echo Hello
```

We've seen this command before, but we haven't named its parts yet. Whenever we type a command line, we can split the input into a **COMMAND** followed by its **ARGs** (short for arguments):

* `COMMAND ARG1 ARG2 ARG3`
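
One way to see where one ARG ends and the next begins is `printf`, which reuses its format string for every remaining argument. In the sketch below, `printf` is the COMMAND and everything after it is an ARG:

```bash
#!/bin/sh
# printf reuses its format ('[%s]\n') for every remaining ARG,
# so each ARG ends up on its own line, wrapped in brackets
printf '[%s]\n' Hello            # [Hello]
printf '[%s]\n' Hello World      # [Hello] then [World]: two separate ARGs
printf '[%s]\n' "Hello World"    # [Hello World]: quotes make it a single ARG
```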

## Variables

```bash
echo location
name=COM3
echo $name
```

* Used to store text
* `name=value` sets a variable (no spaces around `=`)
* `$name` accesses the variable
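
A short sketch of setting and reading variables (the second variable name, `course`, is made up for illustration):

```bash
#!/bin/sh
name=COM3               # no spaces around =; "name = COM3" would try to run a command called "name"
echo "$name"            # prints: COM3
course="$name intro"    # variables are expanded inside double quotes
echo "$course"          # prints: COM3 intro
```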

There are also a number of special variables we can use in our scripts:

* `$?`: the exit code of the previous command
* `$1` to `$9`: the arguments passed to the script
* `$0`: the name of the script itself
* `$#`: the number of arguments
* `$$`: the process ID of the current shell
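
The sketch below exercises some of these. `set -- alpha beta` (a standard shell built-in not covered above) replaces the current positional parameters, which lets the demo set `$1` and `$#` without running a separate script:

```bash
#!/bin/sh
true                                # a command that always succeeds
echo "exit code: $?"                # prints: exit code: 0
false                               # a command that always fails
echo "exit code: $?"                # prints: exit code: 1
echo "this shell's PID: $$"
set -- alpha beta                   # pretend the script was called with two arguments
echo "$# arguments, first is $1"    # prints: 2 arguments, first is alpha
```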

### Environment Variables

On top of the variables you declare yourself, there are variables that are already set when your shell starts and that are passed down to every program it runs. We call these **environment variables**. You can see the full list using the command:

```bash
env
```
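
The difference between an ordinary variable and an environment variable shows up when you start another program. In this sketch, `DEMO_GREETING` is a made-up name, `sh -c` starts a child shell, and `export` (not covered above) moves the variable into the environment:

```bash
#!/bin/sh
DEMO_GREETING=hello                            # a plain shell variable...
sh -c 'echo "child sees [$DEMO_GREETING]"'     # ...is not passed on: prints child sees []
export DEMO_GREETING                           # put it into the environment
sh -c 'echo "child sees [$DEMO_GREETING]"'     # now prints: child sees [hello]
env | grep '^DEMO_GREETING='                   # prints: DEMO_GREETING=hello
```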

### Quick Exercise

Create a script `variable-example` containing the code below, then try running it with various arguments.

```bash
#!/bin/sh
echo $0
echo $1
echo $2
echo $#
```
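
For reference, here is one way to create the exercise script from the prompt and run it; the arguments `foo bar baz` are arbitrary:

```bash
#!/bin/sh
# Create the exercise script (same contents as above) and make it executable
cat > variable-example <<'EOF'
#!/bin/sh
echo $0
echo $1
echo $2
echo $#
EOF
chmod +x variable-example
./variable-example foo bar baz   # prints ./variable-example, foo, bar, 3 (one per line)
./variable-example "foo bar"     # prints ./variable-example, foo bar, an empty line, 1
```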

## Loops

A loop runs a command several times.

For example:

```bash
for i in $(seq 1 5); do echo hello; done
```

Let's unpack this!

```
for x in list; do BODY; done
```

* `;` terminates a command; it is equivalent to a newline
* The shell splits `list` into words, assigns each one in turn to `x`, and runs `BODY`
* `list` is split on whitespace (we will get into this later)
* Compared to C, there are no curly braces; the body sits between `do` and `done`

So, knowing the above,

```bash
for i in $(seq 1 5); do echo hello; done
```

* `$(seq 1 5)`
  * Run the program `seq` with arguments `1` and `5`
  * Substitute the `$(...)` block with the output of the program
  * Equivalent to

    ```bash
    for i in 1 2 3 4 5; do echo hello; done
    ```
* `echo hello`
  * Everything in a shell script is a command
  * Here, it means run the `echo` command, with argument `hello`.
  * Commands that are not shell built-ins are looked up in the directories listed in `$PATH` (a colon-separated list)
  * Find out where a command is located by running `which COMMAND`, e.g. `which ls`
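
The loop syntax and the `$PATH` lookup can be combined into a rough imitation of `which`. This is only a sketch: `find_in_path` is a hypothetical helper name, and it relies on shell functions and on `IFS` (the variable that controls word splitting), neither of which is covered above. Setting `IFS=:` makes the unquoted `$PATH` split on colons instead of whitespace.

```bash
#!/bin/sh
# Hypothetical helper (not a standard command): try each $PATH directory in order
find_in_path() {
  old_ifs=$IFS
  IFS=:                               # split $PATH on colons instead of whitespace
  for dir in $PATH; do                # one loop iteration per directory
    if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
      IFS=$old_ifs
      echo "$dir/$1"                  # first match wins, like which
      return 0
    fi
  done
  IFS=$old_ifs
  return 1                            # not found in any directory
}
find_in_path ls    # e.g. /usr/bin/ls or /bin/ls
```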

## Conditionals

```bash
if test -d /bin; then echo true; else echo false; fi;
```

Let's unpack this!

```
if CONDITION; then BODY; fi
```

* `CONDITION` is a command.
* If its exit code is `0` (success), then `BODY` is run.
* Optionally, you can also add an `else` or `elif` branch.

So, knowing the above,

```bash
if test -d /bin; then echo true; else echo false; fi;
```

* `test -d /bin`
  * `test` is a program that performs various checks and comparisons; it exits with exit code `0` if the condition is true.
* Alternate syntax: `[ condition ]`, e.g. `[ -d /bin ]`
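
A few conditionals to try at the prompt. `/no/such/dir` is assumed not to exist, and the `elif` example uses `-gt` (numeric greater-than), one of the comparisons `test` provides:

```bash
#!/bin/sh
test -d /bin; echo "$?"            # 0: /bin is a directory
test -d /no/such/dir; echo "$?"    # 1: no such directory
[ -d /bin ] && echo "the bracket form behaves the same way"
n=5
if [ "$n" -gt 10 ]; then
  echo big
elif [ "$n" -gt 3 ]; then          # elif adds another condition to check
  echo medium                      # this branch runs for n=5
else
  echo small
fi
```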

Let's create a command that prints only the directories in the current directory.
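
A first attempt could look like the one-liner below. It is reconstructed from the discussion that follows and deliberately keeps the bug described next: it loops over the output of `ls` and prints each entry for which `test -d` succeeds.

```bash
#!/bin/sh
# Print every entry in the current directory that is a directory.
# Buggy on purpose: $(ls) and $f are unquoted, so names with spaces get split.
for f in $(ls); do if test -d $f; then echo "dir $f"; fi; done
```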

{% hint style="danger" %}
Bug! Hold on! What if the directory is called "`My Documents`"?
{% endhint %}

* `for f in $(ls)` expands to `for f in My Documents`
* So the test is performed first on `My`, then on `Documents`, and neither is the directory we wanted

## Argument Splitting

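* Bash splits the result of unquoted expansions such as `$(ls)` and `$f` into separate arguments on whitespace (spaces, tabs, and newlines)
* The same problem appears elsewhere: `test -d $f`
* If `$f` contains whitespace, `test` receives several arguments and fails with an error!
* Use quotes to keep text containing spaces together as one argument: `for f in "My Documents"`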
* How do we fix our script?
* What do you think `for f in "$(ls)"` does?
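
This sketch makes the splitting visible with `printf '[%s]\n'`, which prints each argument it receives in brackets on its own line. The directory `My Documents` is created in a scratch directory for the demo:

```bash
#!/bin/sh
f="My Documents"
printf '[%s]\n' $f       # unquoted: two arguments, prints [My] then [Documents]
printf '[%s]\n' "$f"     # quoted: one argument, prints [My Documents]
cd "$(mktemp -d)" || exit 1
mkdir "My Documents"
test -d $f || echo "unquoted: test got two words and failed"
test -d "$f" && echo "quoted: found the directory"
```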

## Globbing

`bash` knows how to look for files using patterns:

* For example, `for f in *` loops over all files in the current directory (hidden files excluded)
* When globbing, each matching file name becomes its own argument, even if it contains spaces
* However, you still need to quote variables when you use them, e.g. `test -d "$f"`

You can build more advanced patterns:

* `for f in a*`: all files starting with `a` in the current directory
* `for f in foo/*.txt`: all `.txt` files in `foo`
* `for f in foo/*/p??.txt`: all `.txt` files whose names are three characters long and start with `p`, in subdirectories of `foo`
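
Putting the pieces together, a version of the directory-listing command that survives spaces might look like this (a sketch run in a scratch directory; the directory and file names are made up):

```bash
#!/bin/sh
cd "$(mktemp -d)" || exit 1        # scratch directory for the demo
mkdir "My Documents" src           # two directories, one with a space in its name
touch notes.txt                    # and one regular file
for f in *; do                     # glob: each matching name is one argument, spaces included
  if test -d "$f"; then            # quoted, so "My Documents" stays one word
    echo "dir $f"
  fi
done
# prints: dir My Documents
#         dir src
```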

## Whitespace issues

* `if [ $foo = "bar" ]; then`: What's the issue?
* If `$foo` is empty, `[` receives only `=` and `bar` (plus the closing `]`), which is a syntax error
* Possible workaround: `[ x$foo = "xbar" ]`, but it is very hacky
* Instead, use `[[ CONDITION ]]`: a `bash` keyword with special parsing that does not split variables on whitespace
* Good news: it also allows `&&` instead of `-a`, `||` instead of `-o`, etc.
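
The sketch below compares the forms. It assumes the script runs under `bash`, since `[[ ]]` is not available in every `sh`:

```bash
#!/bin/bash
foo=""
# [ $foo = "bar" ] would expand to [ = bar ] and fail with an error
if [ "$foo" = "bar" ]; then echo match; else echo "no match"; fi    # quoting also fixes [
if [[ $foo = "bar" ]]; then echo match; else echo "no match"; fi    # no quotes needed inside [[ ]]
if [[ -z $foo && 2 -gt 1 ]]; then echo "both conditions hold"; fi   # && inside [[ ]]
```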

## Shellcheck

* The problems mentioned above are among the most common bugs in shell scripts.
* ShellCheck is a good tool for finding these kinds of bugs in your scripts: <https://www.shellcheck.net/>

## Composability

* Shell is powerful, in part because of **Composability**
* You can chain multiple programs together, rather than one program that does everything
* Remember **The Unix Philosophy**:
  1. Write programs that do one thing and do it well.
  2. Write programs to work together.
  3. Write programs to handle text streams, because that is a universal interface.

### More Pipes

`cat /var/log/sys*log | grep "Sep 10" | tail`

* `cat /var/log/sys*log` prints the system log
* This output is fed into `grep "Sep 10"`, which keeps only the entries from September 10.
* That output is then fed into `tail`, which prints only the last 10 lines.
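
If your system has no `/var/log/syslog`, the same pipeline shape can be tried on made-up input; here `printf` stands in for `cat` and the three log lines are invented:

```bash
#!/bin/sh
printf 'Sep 09 boot\nSep 10 login\nSep 10 logout\n' | grep "Sep 10" | tail -n 1
# prints: Sep 10 logout
```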

## Streams

* Every program starts with three standard streams:
  * `STDIN`: the program reads input from here
  * `STDOUT`: the program prints output here
  * `STDERR`: a second output, typically used for error messages
* By default, `STDIN` is your keyboard, and `STDOUT` and `STDERR` are both your terminal
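
To see that `STDOUT` and `STDERR` really are separate, this sketch uses a hypothetical helper `both` that writes one line to each. The `2>` and `>` redirections are covered next; `>&2` sends a command's `STDOUT` to `STDERR`.

```bash
#!/bin/sh
# Hypothetical helper: one line to each output stream
both() {
  echo "this goes to STDOUT"
  echo "this goes to STDERR" >&2   # >&2: echo's STDOUT is pointed at STDERR
}
both                  # both lines appear on the terminal
both 2>/dev/null      # only the STDOUT line appears
both >/dev/null       # only the STDERR line appears
```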

### Stream Redirection

* However, this can be changed!
* `a | b`: makes `STDOUT` of `a` the `STDIN` of `b`.
* `a > foo`: `STDOUT` of `a` goes to the file `foo`
* `a 2> foo`: `STDERR` of `a` goes to the file `foo`
* `a < foo`: `STDIN` of `a` is read from the file `foo`
* `a <<< "some text"`: `STDIN` of `a` is read from the string after `<<<` (a `bash` here-string)
* You can also pipe to `tee` (look up what `tee` does with `man tee`)
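
A sketch of the file redirections in a scratch directory (the file names are made up). It sticks to redirections that plain `sh` also supports, since `<<<` is specific to `bash` and a few other shells:

```bash
#!/bin/sh
cd "$(mktemp -d)" || exit 1     # scratch directory
echo hello > out.txt            # STDOUT goes into out.txt
ls /no/such/path 2> err.txt     # the error message lands in err.txt, not on the terminal
wc -l < out.txt                 # STDIN is read from out.txt; prints the line count, 1
echo hello | tee copy.txt       # tee prints hello AND writes it to copy.txt
```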

**So why is this useful?**

It lets you manipulate output of a program!

* `ls | grep foo`: all files whose names contain `foo`
* `ps | grep foo`: all processes that contain the word `foo`
* On Linux: `journalctl | grep -i intel | tail -n 5`: last 5 system log messages with the word `intel` (case-insensitive)
* Note that this forms the basis for **data wrangling**, which will be covered later.

### Grouping Commands `(a; b) | tac`

* Run `a`, then `b`, and send all of their output to `tac`, which prints its input lines in reverse order
* For example: `(echo qwe; echo asd; echo zxc) | tac`
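
The parentheses matter because a pipe otherwise applies only to the command right before it. In this sketch `tr a-z A-Z` uppercases whatever it reads:

```bash
#!/bin/sh
echo first; echo second | tr a-z A-Z      # only "second" goes through tr: first, SECOND
(echo first; echo second) | tr a-z A-Z    # both go through tr: FIRST, SECOND
```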

### Process Substitution `b <(a)`

* Run `a`, generate a temporary file name for its output stream, and pass that filename to `b`
* To demonstrate: `echo <(echo a) <(echo b)`
* On Linux: `diff <(journalctl -b -1 | head -n20) <(journalctl -b -2 | head -n20)`
* This shows the difference between the first 20 lines of the previous boot's log and those of the boot before it.
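
A self-contained variant that doesn't depend on `journalctl`. It assumes `bash`, since process substitution is not part of plain POSIX `sh`; the `/dev/fd/...` path printed by the first line may differ on your system:

```bash
#!/bin/bash
echo <(true)                                          # prints a path such as /dev/fd/63
diff <(printf 'a\nb\nc\n') <(printf 'a\nB\nc\n')      # reports that line 2 differs
cat <(echo one) <(echo two)                           # prints one, then two
```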

### Jobs

Jobs let you run long-running commands in the background.

* Use the `&` suffix
  * It will give back your prompt immediately.
  * For example: `(for i in $(seq 1 100); do echo hi; sleep 1; done) &`
  * Note that the background program still has your terminal as `STDOUT`; you can redirect its `STDOUT` to a file instead.
  * This is especially handy for running two programs at the same time, such as a server and a client: `server & client`
  * For example: `nc -l 1234 & nc localhost 1234 <<< test`
* `jobs`: list all jobs
* `fg %N`: bring job number `N` to the foreground (with no argument, `fg` brings the most recent job to the foreground)
* You can also background the current program: press `^Z` (Ctrl-Z), then run `bg`
  * `^Z` stops the current process and makes it a job.
  * `bg` resumes the most recently stopped job in the background.
* `$!` is the PID of the last background process.
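
A sketch of running a job in the background with its output redirected to a file, as suggested above. It uses `wait` (a standard shell built-in not covered above) to block until the job is done; `job.log` is a made-up file name:

```bash
#!/bin/sh
(sleep 1; echo "background job done") > job.log &   # & gives control back immediately
echo "started job with PID $!"                      # $! is the PID of that background process
wait "$!"                                           # block until the job finishes
cat job.log                                         # prints: background job done
```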

### Some Exercises

* Sometimes piping doesn't quite work, because the command being piped into does not expect newline-separated input.
* For example, the `file` command reports the type of each file it is given.
* Try running `ls | file` and `ls | xargs file`.
* What is `xargs` doing?
