I decided I’m going to try to reimplement the tree
utility from
scratch in a very limited version. Mine will only work from the current
directory and will support no arguments. This article will take you
through my implementation of the problem.
The tree
program produces a diagram of the directory structure from a
given starting point, generally the current working directory. You can
see an example in the screenshot below.
My version is much simpler. It does not implement:
- Sorting of each subdirectory result.
- Colored output based on file type by
dircolors
- Command arguments to configure behavior.
- Multiple output formats.
The first order of business is to open a directory and read it’s
content. For that we need opendir
and readdir
.
These functions from the posix standard. opendir
opens a directory
stream, and readdir
reads from that stream to produce a dirent
structs. These structs contain information about the current entry, such
as the name and type of file.
The next task is to separate out the results of readdir
from what we
want. We need to be careful of the current directory entry, and the
previous directory entry, if we recurse on those directories we could
get an infinite loop during our in order traversal. We also need to
filter out hidden files, because the normal find does not include
hidden files by default. To do this, we examine the contents of the
dirent.d_name
field.
The next step is to recurse on entries that contain more entries.
According to the man page when d_type
is equal to DT_DIR
, that entry
is a directory, and we should recurse on them. But we run into a problem,
we don’t have the full path. We could do string concatenation, and build
on each step, but I didn’t feel like it. Instead I used chdir
to move our context to that directory, and drop down once we finished
up. This may not be very efficient(system calls), but it avoids buffer
overflows, and C-style strings are miserable.
The final piece we must consider is counting up the number of files and the number of directories. I used a struct containing the counts, and returned them from each layer of the walk in the classic recursive way of building up a solution.
The important takeaways are:
- Be careful of how you walk a directory, you need to ignore the current and parent directories.
- Watch out for exhusting the available file descriptors for your program. You could do this if you try to go too deep or end up with an infinite loop.
Below is the finished version of this code. It’s quite simple as you can see.