Assignment/Shell


Assignment Out: Sept. 9 (Wednesday)
Assignment Due: Sept. 18 (Friday 10:00PM)

This is the first assignment in this class. It must be completed by both CS167 and CS169 students.

Please remember to read the course Programming Guide for good style guidelines.


Introduction

In this assignment you will be writing a real live UNIX shell. A shell is typically used to allow users to run other programs in a friendly environment, often offering features such as command history and job control. Shells are also interpreters, running programs written using the shell's language (shell scripts).

Your shell will have the same basic functionality as the shells you are used to working in (e.g. csh/tcsh or bash) - meaning it will allow the user to type in the name of an executable, with arguments, and then execute it. The shell will also provide a few built-in commands, such as cd, as well as some basic features such as file redirection. You don't need to implement anything not described on this page (e.g. history, job control, tab completion).

Your shell will be written and tested under Linux.

To run the TA shell demo, run /course/cs167/demo/shell/shell.

The Assignment

Your job is fairly simple: your shell must display a prompt and wait until the user types in a line of input. It must then do some simple text parsing on the input and take the appropriate action. For example, some input is passed on to built-in shell commands, while other inputs specify external programs to be executed by your shell.

Additionally, the command line may contain some special characters which will correspond to file redirection. The shell must set up the appropriate files to deal with this. As you know, users are far from perfect; your shell should have good error-checking.

The File System

Crucial to understanding how your shell will work is a working knowledge of the UNIX VFS (Virtual File System) model.

In the VFS model, there is a root file system denoted as "/", and zero or more mounted file systems which reside at mount points, like "/dev". All file systems expose an internal structure of directories and files: within the root file system there might be subdirectories such as "/bin", "/home", "/home/joeuser", and "/home/joeuser/src". There are also files within these directories, like sh.c and README. Mounted file systems behave just like root file systems except that names of files within the file system are prefixed with the mount point.

The effect of all this is to abstract the particular way of accessing a file (the on-disk structure) from the fact that a file exists. In fact, some file systems might have no on-disk structure at all, and simply provide names that behave like files for other purposes. For instance, files in "/proc" are not really stored anywhere; they simply provide a file interface to kernel data structures.

Files, File Descriptors, Terminal I/O

To explain how files are represented in UNIX, examine the open system call, which opens a file:

int open(const char *path, int oflag, mode_t mode);
  • path = an absolute (starting with "/") or relative pathname of the file to open
  • oflag = a combination of access modes and status flags as described in the open(2) man page
  • mode = the default permissions for the file if it needs to be created

open returns an integer which is a file descriptor (often abbreviated as "fd"). File descriptors are references to the open file in user mode. Instead of having access to the kernel-level file struct, users make system calls (like read(2)) that take file descriptors as input.
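
As a minimal sketch of how a file descriptor is used once open returns it, the function below opens a file read-only, passes the descriptor to read(2) until end of file, and closes it. The name count_bytes and the 4096-byte buffer are assumptions made for illustration; they are not part of the assignment.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return the size of `path` in bytes by reading it
 * through a file descriptor, or -1 if it cannot be opened or read. */
long count_bytes(const char *path)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);      /* no mode needed: nothing is created */
    if (fd < 0)
        return -1;

    /* read returns 0 at end of file and -1 on error */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    close(fd);                      /* give the descriptor back to the kernel */
    return n < 0 ? -1 : total;
}
```

Note that the caller never sees the kernel's file structure; the integer fd is the only handle it has.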

Each process is initially "given" (set up by its parent) three standard file descriptors for input, output, and error: file descriptors 0, 1, and 2, respectively. You will start your shell from your regular UNIX shell (e.g. bash), which will set these first three file descriptors, using code similar to this:

if (!fork()) {
    /* now in the child process */
    close(0);
    close(1);
    close(2);
    open("/dev/tty", O_RDONLY);
    open("/dev/tty", O_WRONLY);
    open("/dev/tty", O_WRONLY);
}

Since open always returns the lowest-numbered file descriptor that is not currently open, and the three files are opened in this order, we are assured that file descriptors 0, 1, and 2 are assigned correctly. This in turn means that to write to the terminal, all you need to do is call:

write(1, "foo", 3);

If you are having trouble remembering the file descriptor numbers for the standard streams, you can use STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO (after including unistd.h), which are macros for 0, 1, and 2, respectively.
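
One detail worth keeping in mind is that write(2) may write fewer bytes than requested. The sketch below shows one way to print a whole string, such as a prompt, to a descriptor. The helper name write_all is an assumption made for illustration.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the whole NUL-terminated string `s` to `fd`,
 * retrying after short writes. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *s)
{
    size_t left;

    assert(s != NULL);
    left = strlen(s);

    while (left > 0) {
        ssize_t n = write(fd, s, left);
        if (n < 0)
            return -1;
        s += n;                 /* skip past the bytes already written */
        left -= (size_t)n;
    }
    return 0;
}
```

A prompt could then be displayed with write_all(STDOUT_FILENO, "$ ").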

Executing a Program

Executing a program in UNIX takes several steps. The fork(2) system call creates a new child process which is a replica of the parent, and begins execution at the point that the call to fork(2) returns. fork(2) returns 0 to the child process, and the child's process id (abbreviated pid) to the parent.

The execve(2) system call begins the execution of a new program and if it succeeds it will not return. It is passed the path of the program to be executed, an argument list (argv), and a list of environment strings (envp). argv and envp are null-terminated arrays of pointers to strings (character arrays). The command:

$ /bin/echo Hello world!

would have an argv that looks like this:

char *argv[4];
argv[0] = "/bin/echo";
argv[1] = "Hello";
argv[2] = "world!";
argv[3] = NULL;

The environment strings envp are a set of strings of the form "variable=value". Processes typically access these through the libc function getenv(3). Run the command:

$ /usr/bin/env

in a shell to list all the environment variables set in your shell (and passed to env via execve(2) when you ran the command). You are not required to keep track of environment variables for this assignment; you can just pass NULL for envp. This may break some programs. As a workaround, you can have your main function take not only the standard argc and argv but also a third argument, char *envp[], which receives the environment of the program that executed your shell, and then pass it along to execve. You may also implement setting and unsetting environment variables for extra credit.
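
To make the "variable=value" format concrete, here is a minimal sketch that searches an envp array for one variable, roughly what getenv(3) does for the process's own environment. The function name env_lookup is an assumption made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the value of `name` inside the
 * NULL-terminated array `envp`, or NULL if the variable is not set. */
const char *env_lookup(char *const envp[], const char *name)
{
    size_t len;
    int i;

    assert(name != NULL);
    if (envp == NULL)
        return NULL;

    len = strlen(name);
    for (i = 0; envp[i] != NULL; i++) {
        /* match "name=" exactly, so that "PAT" does not match "PATH=..." */
        if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
            return envp[i] + len + 1;
    }
    return NULL;
}
```

If your main is declared as int main(int argc, char *argv[], char *envp[]), the same envp can be handed to env_lookup or passed unchanged to execve.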

Here is an example of forking and executing a new process:

if (!fork()) {
    /* now in child process */
    char *envp[] = { NULL };
 
    execve(argv[0], argv, envp);
 
    /* we won't get here unless execve failed */
    if (errno == ENOENT) {
        fprintf(stderr, "sh: command not found %s\n", argv[0]);
        exit(1);
    } else {
        fprintf(stderr, "sh: execution of %s failed: %s\n", argv[0], strerror(errno));
        exit(1);
    }
}
/* continue parent process execution */

UPDATE: Note that your shell needs to wait for the executed command to finish before continuing (and displaying a new prompt). Look at the man page for the wait(2) system call.
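
A minimal sketch of the full fork, execve, wait sequence might look like the following. The function name run_and_wait and its return convention are assumptions made for illustration, and the sketch assumes the shell has only one child at a time, so a plain wait(2) is enough. The child calls _exit(2) rather than exit(3) after a failed execve so that it does not flush stdio buffers inherited from the parent.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given arguments and environment,
 * block until it finishes, and return its exit status (-1 on failure). */
int run_and_wait(char *const argv[], char *const envp[])
{
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* child: replace this process image with the new program */
        execve(argv[0], argv, envp);
        /* only reached if execve failed */
        fprintf(stderr, "sh: execution of %s failed: %s\n",
                argv[0], strerror(errno));
        _exit(1);
    }

    /* parent: block here until the child terminates */
    if (wait(&status) < 0) {
        perror("wait");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Only after run_and_wait returns would the shell display its next prompt.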

Built-In Shell Commands

In addition to supporting the spawning of external programs, your shell will support a few internal (built-in) commands. When a built-in command is input, your shell should make the necessary system calls to handle the request and return control back to the user. The following is a list of the built-in commands that your shell should provide:

  • cd dir: change the current working directory
  • ln src dest: make a hard link to a file
  • rm file: remove a directory entry
  • exit: quit the shell

Note that we are only looking for the basic behavior of these commands. You do not need to implement rm -r or ln -s. You also do not need to support multiple arguments to rm or multiple commands on a single line. Your shell should print out an error message if the user enters a malformed command.

UNIX System Calls for Built-In Functions

To implement these commands you will need to understand the functionality of several Linux system calls. You can (and should if you want to succeed) read the man pages for these commands by running:

$ man 2 (syscall)

for the syscall you want information on. It is highly recommended that you read all of the man pages for these syscalls before even starting to implement built-in commands:

int open(const char* path, int oflag, mode_t mode);
int close(int fd);
int chdir(const char* path);
int link(const char* existing, const char* new);
int unlink(const char* path);
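
As a rough sketch of how these system calls map onto the built-in commands, the function below dispatches on the first word of an already-split command line. The name try_builtin, its return convention, and the exact error messages are assumptions made for illustration; your shell may be organized quite differently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical dispatcher: returns 1 if argv[0] named a built-in (whether or
 * not the command succeeded) and 0 if it should be run as an external program. */
int try_builtin(int argc, char *argv[])
{
    assert(argc > 0 && argv != NULL && argv[0] != NULL);

    if (strcmp(argv[0], "cd") == 0) {
        if (argc != 2)
            fprintf(stderr, "cd: usage: cd dir\n");
        else if (chdir(argv[1]) < 0)
            perror("cd");
    } else if (strcmp(argv[0], "ln") == 0) {
        if (argc != 3)
            fprintf(stderr, "ln: usage: ln src dest\n");
        else if (link(argv[1], argv[2]) < 0)
            perror("ln");
    } else if (strcmp(argv[0], "rm") == 0) {
        if (argc != 2)
            fprintf(stderr, "rm: usage: rm file\n");
        else if (unlink(argv[1]) < 0)
            perror("rm");
    } else if (strcmp(argv[0], "exit") == 0) {
        exit(0);
    } else {
        return 0;               /* not a built-in */
    }
    return 1;
}
```

The built-ins run inside the shell process itself; cd in particular would have no lasting effect if chdir were called in a forked child.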

File Redirection

From the sh(1) man page:

A command's input and output may be redirected using a special notation interpreted by the shell. (You do not need to support redirection for built-in commands.) The following may appear anywhere in a simple-command or may precede or follow a command and are not passed on as arguments to the invoked command.

  • < word - Use file word as standard input (file descriptor 0).
  • > word - Use file word as standard output (file descriptor 1). If the file does not exist, it is created; otherwise, it is truncated to zero length. (See the description of the O_CREAT and O_TRUNC flags in the open(2) man page.)
  • >> word - Use file word as standard output. If the file exists, output is appended to it; otherwise the file is created. (See the description of the O_APPEND flag in the open(2) man page.)

You must code your parser to support file redirection and concatenation, with error checking. For example, if the shell fails to create the file to which output should be redirected, the shell must report this error and abort execution of the specified program. If multiple input or multiple output redirections appear, this is also an error (i.e., it is illegal to redirect standard input twice, although it is perfectly legal to redirect both input and output). To understand the details of file redirection it will be helpful to experiment with redirection in your favorite UNIX shell (whose error messages you may borrow for your own shell).
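
The three redirection forms differ only in the flags passed to open and in which descriptor is replaced. Below is a minimal sketch of a helper that a child process could call between fork and execve. The names redirect and redir_kind and the 0666 creation mode are assumptions made for illustration. The sketch uses dup2(2), which this handout does not mention, in place of the close-then-open approach; both leave the file on the target descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

enum redir_kind { REDIR_IN, REDIR_OUT, REDIR_APPEND };   /* hypothetical */

/* Hypothetical helper: make `target_fd` (0 for "<", 1 for ">" and ">>")
 * refer to `path`. Returns 0 on success, -1 if the file cannot be opened. */
int redirect(int target_fd, const char *path, enum redir_kind kind)
{
    int flags, fd;

    assert(path != NULL);

    switch (kind) {
    case REDIR_IN:  flags = O_RDONLY;                      break;
    case REDIR_OUT: flags = O_WRONLY | O_CREAT | O_TRUNC;  break;
    default:        flags = O_WRONLY | O_CREAT | O_APPEND; break;
    }

    fd = open(path, flags, 0666);   /* mode only matters when creating */
    if (fd < 0)
        return -1;

    if (fd != target_fd) {
        /* dup2 closes target_fd if it is open, then makes it a copy of fd */
        if (dup2(fd, target_fd) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return 0;
}
```

If redirect fails in the child, the child should report the error and exit without calling execve, which matches the requirement to abort execution of the program.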

Parsing the Command Line

A significant part of your implementation will most likely be the command line parsing. Redirection symbols may appear anywhere on the command line, and the file name appears as the next word after the redirection symbol. One algorithm for parsing the command line is as follows:

  • Scan through the line for redirection symbols, keeping track of the input and output file names if they exist. Check for errors such as multiple redirection or missing file names (i.e. a redirection token that is not followed by a file name) at this point.
  • Remove all traces of redirection from the command line (i.e. replace the relevant characters with blanks).
  • Split the line into words. The first word will be the command, and each subsequent word will be an argument to the command.
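
The first two steps of this algorithm can be sketched as a single pass over the line, using only functions from the allowed list. The names extract_redirs, struct redirs, and REDIR_NAME_MAX are assumptions made for illustration, and this is only one of many reasonable ways to structure the scan.

```c
#include <assert.h>
#include <string.h>

#define REDIR_NAME_MAX 1024   /* hypothetical limit on a file name's length */

struct redirs {
    char in[REDIR_NAME_MAX];   /* file for "<"; empty string if none */
    char out[REDIR_NAME_MAX];  /* file for ">" or ">>"; empty string if none */
    int append;                /* nonzero if the output symbol was ">>" */
};

/* Hypothetical helper: copy redirection file names out of `line` and blank
 * out the symbols and names in place. Returns 0 on success, -1 on a
 * missing file name or a repeated input/output redirection. */
int extract_redirs(char *line, struct redirs *r)
{
    char *p = line;

    assert(line != NULL && r != NULL);
    r->in[0] = r->out[0] = '\0';
    r->append = 0;

    while ((p = strpbrk(p, "<>")) != NULL) {
        char sym = *p;
        int append = 0;
        char *name, *dst;
        size_t len;

        *p++ = ' ';                          /* erase the symbol itself */
        if (sym == '>' && *p == '>') {
            append = 1;
            *p++ = ' ';
        }

        name = p + strspn(p, " \t\n");       /* skip blanks before the name */
        len = strcspn(name, " \t\n<>");      /* the name ends at blank or symbol */
        if (len == 0 || len >= REDIR_NAME_MAX)
            return -1;                       /* missing or oversized file name */

        dst = (sym == '<') ? r->in : r->out;
        if (dst[0] != '\0')
            return -1;                       /* same stream redirected twice */
        memcpy(dst, name, len);
        dst[len] = '\0';
        if (sym == '>')
            r->append = append;

        memset(name, ' ', len);              /* erase the name from the line */
        p = name + len;
    }
    return 0;
}
```

After a successful call, the line contains only the command and its arguments, ready to be split into words. A line that is left blank means no command was specified, which should be reported before any output file is opened.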

Most symbols and words are separated by one or more spaces or tabs. Redirection characters may be separated from arguments by spaces or tabs. There need not be spaces or tabs before the first word on the line. Special characters such as control characters should be treated just like alphanumeric characters and should not crash your shell.
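
Splitting on spaces and tabs can be done with strspn and strcspn alone. The sketch below writes a NUL after each word in place and fills an argv-style array. The name split_words and its return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split `line` in place into whitespace-separated words.
 * Stores pointers to the words in `argv`, followed by a NULL entry.
 * Returns the number of words, or -1 if they do not fit in `max_args` slots
 * (one slot is reserved for the terminating NULL). */
int split_words(char *line, char *argv[], int max_args)
{
    int argc = 0;
    char *p = line;

    assert(line != NULL && argv != NULL && max_args > 0);

    for (;;) {
        p += strspn(p, " \t\n");        /* skip leading blanks */
        if (*p == '\0')
            break;
        if (argc == max_args - 1)
            return -1;                  /* too many words */
        argv[argc++] = p;
        p += strcspn(p, " \t\n");       /* advance to the end of the word */
        if (*p == '\0')
            break;
        *p++ = '\0';                    /* terminate the word */
    }
    argv[argc] = NULL;
    return argc;
}
```

The resulting array has the same shape that execve expects, and a return value of 0 indicates that no command was given.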

Be very careful to check for error conditions at all stages of command line parsing. Since the shell is controlled by a user, it is possible to receive bizarre input. For example, your shell should be able to handle all these errors (as well as many others):

$ /bin/cat < foo < gub

ERROR - can't have 2 input redirects on one line

$ /bin/cat <

ERROR - no redirection file specified

$ > gub

ERROR - no command specified. Make sure file gub is not overwritten.

Your shell should also handle bizarre yet correct input:

$ < bar /bin/cat

OK - redirection can appear anywhere in the input.

$ [TAB]/bin/ls <[TAB] foo

OK - any amount of whitespace is acceptable.

$ /bin/bug -p1 -p2 foobar

OK - make sure parameters are parsed correctly.

$ cat>bar<README

OK - whitespace around redirection symbols is optional.

UPDATE: You will not be held responsible if your input buffer is not big enough to handle user input. Use a large buffer size (e.g. 1024) and assume that the user will not enter more than that many characters. Note that in future assignments you will be responsible for handling similar cases.

Use of External Functions

You should use the read(2) and write(2) system calls to read and write from file descriptors 0, 1, and 2. Do not use C++ iostreams cin, cout, or cerr or C stdio (fopen, fread, etc.). Part of the purpose of this assignment is to learn about system calls you'll be implementing later on in the semester.

UPDATE: A common question has been how we "wait" for the user to input some text. You should note that the read() system call will block (wait) until the user presses enter or Ctrl+D. Try out both cases to see how you can distinguish them. (They should be handled differently by your shell.)

You may use almost all the library calls defined in string.h to perform string or buffer manipulation in your shell code. You may also use any syscalls (functions with section 2 man pages) your heart desires. Do not use floating point numbers (don't ask). If you have any questions about functions that you are able to use, please email the TAs.

In order to avoid confusion, here is a list of the external functions allowed. Note that you can't use strtok(); part of the assignment is to practice C string manipulation.

open      printf (and variants)   isalnum
close     str(n)cpy               isalpha
chdir     str(n)cat               iscntrl
link      str(n)cmp               isdigit
unlink    str(r)chr               islower
read      str(c)spn               isgraph
write     strpbrk                 isprint
fork      strstr                  ispunct
execve    strlen                  isspace
wait      strerror                isupper
exit      memcpy                  isxdigit
malloc    memmove                 tolower
free      memcmp                  toupper
assert    memchr
perror    memset

Another important aspect of parsing the command line is knowing how to handle Ctrl+D. When the user enters some text on the command line followed by Ctrl+D, handle it as a newline. If the user does not enter anything but Ctrl+D, the shell should exit.
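
One way to tell the two cases apart is to look at what read(2) returns. The sketch below assumes that a single read on the terminal returns one complete line, which holds for a terminal in its default line-buffered mode but not for arbitrary files or pipes. The name read_line and its return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: read one line of input from `fd` into `buf`.
 * Returns the length of the line without its newline, or -1 if the user
 * typed only Ctrl+D (read returned 0) or an error occurred. */
int read_line(int fd, char *buf, size_t size)
{
    ssize_t n;

    assert(buf != NULL && size > 1);

    n = read(fd, buf, size - 1);    /* leave room for the terminating NUL */
    if (n <= 0)
        return -1;                  /* nothing typed before Ctrl+D, or error */

    /* Enter leaves a trailing newline; text followed by Ctrl+D does not.
     * Either way the caller receives the text without a newline. */
    if (buf[n - 1] == '\n')
        n--;
    buf[n] = '\0';
    return (int)n;
}
```

A main loop could exit when read_line returns -1 and otherwise parse buf as a command line.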

Style

As this class will consist of many large programs all written in C, we will be emphasizing good style early on in order to make sure that your future code is easy for you and your TAs to understand. Therefore style will determine part of your grade on this project. We don't have any strict rules saying how many spaces you should indent here or there, but we require that you indent consistently. Also, clean up unused variables and blocks of code before handing in, give variables and functions meaningful names, and remember that one of the TAs will have to read and make sense of your code once you turn it in. You can take a look at the course Programming Guide for some guidelines to follow.

Code Exchange

To get started, copy the /course/cs167/asgn/shell directory and start hacking sh.c. To run your program, type:

$ ./sh

Note the './' before 'sh'. To debug your program, run:

$ gdb sh

When the gdb prompt appears type run.

You will need to write a README file documenting any bugs you have in your code, any extra features you added, and anything else you think we should know about your shell.

You should hand in your shell by running:

$ make clean
$ /course/cs167/bin/cs167_handin shell

from the directory containing your code.

Extra Credit

Here are some suggestions for extra credit:

  • PATH, either as an environment variable, or just as a hard coded search path.
  • rm -r
  • environment variables (setenv, printenv, and passing the environment to processes via execve).
  • pwd, though you must use the getdents(2) system call (not pwd) for it to count. This is a bit tricky but quite interesting. Yes, you must use getdents, not even readdir.