Lecture Notes by Anthony Zhang.

CS 349

User interfaces.

Jeff Avery
Section 001
Website: https://www.student.cs.uwaterloo.ca/~cs349/s16/about.html


All information is available on the course website.

For the purposes of the course, a user interface is any place where a person expresses intention to an artifact, and the artifact presents feedback to the person. For example, the panel and the screen/beeper on a microwave form a user interface. In most user interfaces, intention and feedback form a loop - we call this the feedback loop.

An event is an observable occurrence or phenomenon, as well as a message representing the fact that this phenomenon occurred.

An interface is what is presented to the user, such as the widgets and controls - the feedback part of the feedback loop. An interaction is an action that the user invokes to perform a task - the intention part of the feedback loop.

The important challenges in user interfaces and interaction design largely revolve around accounting for variability - users can have very different levels of expertise, and there is often a huge range of possible tasks to support. Well-designed user interfaces empower the user, allowing people to do more with the same tools.

This course covers designing and implementing user interfaces, with a focus on desktop and mobile apps. Most of the course will be about the design and architecture of interfaces, with interaction design as a focus.

Assignments are to be submitted using Git on your uWaterloo GitLab account. You should avoid posting assignments online until the course is over, after which you are free to post whatever you like. Make sure to log into your GitLab account before the next class, in order to get your account set up properly.


The X Window System was one of the first standardized UNIX windowing systems. Invented in 1984, it is still being used today in various UNIX-like systems. X runs as a separate component that the operating system starts and manages, and window managers/games/etc. communicate with it to display windows.

Note that the X Window System only provides the windowing primitives, with no specified window manager. These primitives include window creation, input handling, and relatively low-level graphical functionality. X is also responsible for routing input to the correct window, and for making sure only one thing can change the video buffer at a time (since the graphics card can only accept one set of instructions at a time; this is the reason most modern user interfaces are single-threaded).

Most window decorations, like title bars and window borders, are handled by the window manager. The window manager controls the look and feel of windows, but not the contents of the application itself.

The X Window System was designed to work on a variety of displays, and to support mirrored and extended displays, device independence, and even network transparency. It was the first to standardize the overlapping and resizable window conventions we're used to today.

X separates the user interface and the application - they can actually run on totally different machines, due to the network transparency mentioned previously. Basically, the client is the computer that runs the custom software, and the server is the computer that displays things to the user (intuitively, the server serves up windowing capabilities).

Users express intent to the X server input system. The input systems then update the X clients, which in turn notify the X server output systems. Then, the results are presented back to the user. This can be thought of as an MVC implementation, where the client is the model, the input system is the controller, and the output system is the view.

A typical X client application does the following:

  1. Initialize the program.
  2. Connect to the X server.
  3. Initialize the X portion of the program.
  4. Do the event loop:
    1. Get the next input event from the X server (note that the server actually drives the program).
    2. Handle the event (and break the loop if it was an exit event).
    3. Do client work if necessary.
  5. Close connection to the X server.
  6. Uninitialize the program.
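
As a rough sketch of this structure - written in Java for consistency with the rest of these notes, even though real X clients are typically C programs using Xlib - where XConnection, XEvent, and the helper methods are hypothetical stand-ins rather than a real API:

// XConnection and XEvent are hypothetical stand-ins for Xlib calls, not a real API.
XConnection server = XConnection.open(":0"); // step 2: connect to the X server
// step 3: create windows and tell the server which event types we want
boolean running = true;
while (running) { // step 4: the event loop
    XEvent event = server.nextEvent(); // 4.1: blocks until the server sends an event
    if (event.isExpose()) {
        repaintRegion(event); // 4.2: the server is asking us to redraw a revealed region
    } else if (event.isQuit()) {
        running = false; // 4.2: an exit event breaks the loop
    }
    doClientWork(); // 4.3: do client work if necessary
}
server.close(); // step 5: close the connection to the X server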

Before windowing systems, we had terminal applications that would take up the entire screen. Windowing systems introduced a new concept: different applications can take up different parts of the screen, and they can all run independently, not having to worry about which parts of the screen they need to draw, how to move/resize themselves, and whether to accept input. For applications, X exposes a buffer for the application to draw into, and a stream of input events for the application to consume.

These days, X is only used on Linux and the BSDs - Windows and OS X have their own windowing systems and window managers, which are generally not interchangeable. However, the same fundamental functionality is still needed.

In this course, we will be using Swing with Java. Swing has a different, simpler set of assumptions, and therefore is a lot easier to use for creating most modern applications. For example, modern architectures tend to run directly on the user's computer rather than on a mainframe accessed via a thin client. Here's a simple example:

import javax.swing.*;

public class TestWindow extends JFrame {
    public TestWindow() {
        this.setTitle("Test Window");
    }

    public static void main(String[] args) {
        TestWindow testWindow = new TestWindow();
        testWindow.setSize(400, 300);
        testWindow.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); // quit when the window closes
        testWindow.setVisible(true); // frames start hidden, so show it explicitly
    }
}

Review of DVCSs, plus a beginner's Git primer. Java review. Jeff confirms that the FOSS OpenJDK is okay for this course, contrary to what was stated on Piazza. The recommended system for building Java files is to just use Make.

Some highlights:


More Java stuff. Here are some of the useful class libraries:


The base window system manages where the window is located, overlapping windows, and various other things related to making multiple windows work well together. It also exposes a local coordinate system to the applications, so they can draw on the screen relative to, say, the top left corner of their window.

Conceptually, there are 3 different modes of drawing - pixel-by-pixel, strokes, and regions. When we're drawing real UIs, there is often a need to reuse drawing properties, like colors, stroke width, and stroke style. In X, these reusable properties are called the Graphics Context (GC). When the client sends a graphics context to the server, the server draws all subsequent things using that context, often saving us from sending a lot of redundant information.

When we have overlapping windows, X doesn't draw the parts of windows that are covered by other windows. That means when we move a window, parts of the covered windows underneath it get revealed, and other parts get covered. X handles this with clipping. Whenever a new part of a window gets revealed, just that part gets redrawn. All of this can be handled automatically by X, which will ask the application to draw just the needed part.

X does this using the painter's algorithm - draw things from background to foreground, progressively layering things on top of each other. In X, we can only repaint when we receive an event from X.

When an event is received from the hardware, the base window system makes sure it gets timestamped, queued up, and then handed off to the right window, or to the window manager. Applications can either check whether any events are available, or get the next event off the queue, blocking if there are none. Most X applications are therefore structured as event loops. Applications can also opt not to receive events that they don't want to handle.

The JVM is still an X application. When we create a window in Java, the JVM will start a new thread with the actual event loop, which handles all of the X stuff. That thread is controlled purely by the JVM.

Animation is done by simply showing a sequence of images at a fast enough rate. We do this by using a loop that sleeps enough that it only runs one iteration every, say, 1/24 seconds, and do the animation calculations and repainting on each iteration.
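
In Swing, a minimal sketch of this might use a javax.swing.Timer, which calls an ActionListener at a fixed interval on the event dispatch thread (updateAnimationState and canvas are illustrative names):

import java.awt.event.ActionEvent;
import javax.swing.Timer;

// Fire roughly 24 times per second; each tick advances the animation and repaints.
Timer timer = new Timer(1000 / 24, (ActionEvent e) -> {
    updateAnimationState(); // illustrative: move sprites, advance frame counters, etc.
    canvas.repaint();       // schedule a repaint, which redraws using the new state
});
timer.start();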

When graphics operations are slow enough, we might see flickers while the graphics operations are being executed. This is because when we do drawing in X, everything goes directly into the screen buffer. With double buffering, we can keep another, non-screen buffer around to actually draw on, and then swap the screen buffer and the background buffer after we're actually done drawing, effectively instantly updating the screen buffer. For X applications we must do this manually. In Java, double buffering is on by default (note that if a window is double buffered, all of the child windows must also be double buffered).
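
For toolkits where we do have to double buffer manually, the sketch looks something like this inside a paint routine (drawScene, width, and height are illustrative):

import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Draw the whole frame into an offscreen buffer, then copy it to the screen in one step.
BufferedImage buffer = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
Graphics2D offscreen = buffer.createGraphics();
drawScene(offscreen);            // illustrative: all the slow drawing happens offscreen
offscreen.dispose();
g.drawImage(buffer, 0, 0, null); // one fast blit, so the user never sees a half-drawn frame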

Notes on Assignment 1: design patterns and MVC not required (though they will be for later assignments), and the simplest way to do graphics is to have a single canvas that paints everything. A Timer is a great way to make sure things run at a fixed interval. It's very important that drawing and game logic are independent.


In complex applications, widgets are often nested quite deep in a hierarchy; this hierarchy is known as the interactor tree.

When the windowing toolkit receives an event that depends on position (like mouse clicks), it needs to figure out which widget should actually receive the event by walking down the tree, triggering events for widgets along the way down (this is known as top-down positional dispatch). Widgets can often decide whether to pass the event on to their children, handle it themselves, or ignore it.

There's also bottom-up positional dispatch, where we instead find the leaf widget in the interactor tree that contains the event's position, and then walk up the tree from there, triggering events for widgets on the way up.

Top-down positional dispatch is useful because a parent widget can enforce policies on its children - for example, if a toolbar gets disabled, we shouldn't have to individually disable all of its children as well. Bottom-up positional dispatch is useful because the parent has events triggered on it after the leaf widget, rather than before (as with top-down positional dispatch).

Positional dispatch by itself isn't enough. Consider what happens when you drag a scrollbar, and let the mouse leave the bounds of the scrollbar while dragging - it should continue to change the scrollbar even though the cursor isn't strictly in the bounds of the scrollbar, at least until the mouse is released. Most windowing toolkits handle this with the concept of mouse focus. This essentially overrides positional dispatch in some special situations to refine UI behaviours.

There's also keyboard focus, where one designated widget receives all keyboard events, and keyboard accelerators (keyboard shortcuts) that override all of these event handling mechanisms, at a window manager level.

Originally, each X application had one main event loop to receive events from X. This worked, but wasn't very scalable for larger applications, and made it harder to organize code. Early Java gave every widget every event, and each widget had to implement its own event handling, which was even less scalable, though it did make it easier to organize code. The goal is to keep the event behaviour separate from the widgets themselves.

Modern Java totally decouples the widget from the event handling. When we write class X extends JPanel implements MouseMotionListener, JPanel isn't dependent on MouseMotionListener in any way, while we can still handle event behaviours in an intuitive way. It's also common and good practice to use an inner class that implements the listener interface, rather than having the widget class itself implement it.
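
A minimal sketch of the inner-class style (the panel and the drag behaviour are illustrative):

import java.awt.event.MouseEvent;
import java.awt.event.MouseMotionListener;
import javax.swing.JPanel;

public class CanvasPanel extends JPanel {
    public CanvasPanel() {
        // The listener is a separate inner class, so JPanel itself stays
        // decoupled from the event-handling code.
        this.addMouseMotionListener(new DragListener());
    }

    private class DragListener implements MouseMotionListener {
        @Override
        public void mouseDragged(MouseEvent e) {
            // illustrative: react to the drag, e.g. move a shape to (e.getX(), e.getY())
            repaint();
        }

        @Override
        public void mouseMoved(MouseEvent e) {
            // ignore plain mouse movement
        }
    }
}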

In web applications, we have the DOM, which is sort of like an interactor tree that's exposed to JavaScript. Over various revisions of the DOM, we now have very fine-grained control over event propagation. JavaScript events propagate from the root down to the target (the capture phase), can be captured by a particular node, and then bubble back up again (the bubble phase). Therefore, we can choose whether to handle events on the way down or on the way up - we don't have to choose a particular dispatch style.

In .NET, we have delegates, which are basically sequences of methods. We declare a delegate signature, declare a delegate for a certain event, and then add methods to be called when that delegate is invoked.

Input devices like pens and multitouch screens often feed data too fast for dispatch mechanisms to work. For things like drawing applications with high performance inputs, applications often need to access the hardware events directly.


A model is a mathematical representation of an object. Computer graphics is the representation and manipulation of models.

A shape model is a model that includes all of the data needed to draw that shape, often represented as an array of points. A widget can be considered a shape model. What Java provides is the ability to draw these models, and to test whether an event falls inside a drawn shape.

How do we tell if a point is inside a polygon? For convex polygons, we can use something like SAT (the separating axis theorem). One of the simplest ways that works for any kind of polygon is to shoot a ray from the point to the edge of the screen, and check whether it crosses an even or odd number of polygon edges - an odd count means the point is inside. However, in this course, the polygon classes have better algorithms built in.
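
A minimal sketch of that even-odd ray-casting test, as a standalone helper (not the course's polygon classes):

// Cast a horizontal ray from (px, py) to the right and count how many polygon
// edges it crosses; an odd number of crossings means the point is inside.
static boolean contains(double[] xs, double[] ys, double px, double py) {
    boolean inside = false;
    for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
        if ((ys[i] > py) != (ys[j] > py)) { // this edge straddles the ray's height
            double xAtRay = xs[i] + (py - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i]);
            if (px < xAtRay) {
                inside = !inside; // each crossing to the right of the point toggles the result
            }
        }
    }
    return inside;
}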

Widgets in the interactor tree can be manipulated using affine transformations. This allows us to tell widgets how to draw themselves, and lets widgets tell when we're clicking or tapping on them. Note that widgets are always drawn relative to the parent - to figure out the actual position of a widget, we need to walk down the interactor tree, applying transformations in turn to get the final transformation of the target widget. For our purposes, an affine transformation is some combination of translations, rotations, and scalings.

Translation is done simply by adding values to the coordinates of each shape point. Scaling can be done by multiplying the X and Y coordinates by scalars. Rotation can be done by applying \tup{x \cos \theta - y \sin \theta, x \sin \theta + y \cos \theta} to every shape point.

All affine transformations are linear functions of the original coordinates. As a result, affine transformations are closed under composition - applying one affine transformation after another is always equivalent to applying some single affine transformation.

2D affine transformations can be better represented using 2 by 2 matrices, since they are simply linear combinations of 2 variables. The advantage of this is that the matrices are associative under multiplication. Also, GPUs and modern toolkits are optimised to work with this representation. For example, a scale matrix is \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}, and a rotation matrix is \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}.

A translation matrix would have to be something like \begin{bmatrix} 1 & \frac{t_x}{y} \\ \frac{t_y}{x} & 1 \end{bmatrix} (which only works when x and y are nonzero). Since this doesn't work for coordinates like the origin, we can't, in general, represent translations as 2 by 2 matrices. Instead, we can use homogeneous coordinates: in addition to the X and Y components, we also have a W component, so every coordinate now has infinitely many equivalent vectors, one for each W value. The actual coordinate is simply the X and Y values divided by the W value.

The advantage of homogeneous coordinates is that translation becomes a matrix operation too: every 2D affine transformation (including translation) can be represented as a 3 by 3 matrix of the form \begin{bmatrix} a & b & t_x \\ c & d & t_y \\ 0 & 0 & 1 \end{bmatrix}.
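
For example, translating the point \tup{x, y} by \tup{t_x, t_y} works out as:

\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x + t_x \\ y + t_y \\ 1 \end{bmatrix}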


When we're overriding methods like paintComponent, we should call super.paintComponent() before we do any drawing in the overridden method.
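
A minimal sketch (the rectangle is just illustrative drawing):

import java.awt.Graphics;
import javax.swing.JPanel;

public class MyCanvas extends JPanel {
    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g); // let Swing clear/prepare the background first
        g.drawRect(10, 10, 100, 50); // then do our own drawing on top
    }
}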

Although matrix transformations are nice because they're associative when we multiply them, note that they aren't commutative - order matters a lot when applying transformations.

The default coordinate system has the origin at the top left corner of the window, with coordinates increasing as you move rightward and downward. Transformations actually change the coordinate system we're drawing in, so they apply to things we draw after changing the coordinate system.

In Java, when we do a translation, we're shifting the canvas's origin by the specified offsets, not the shapes themselves - if we translate by positive values, everything we draw afterwards lands further rightward and downward. When we do a rotation, we're rotating the canvas - a positive rotation value results in a clockwise rotation (since the Y axis points down). When we do a scale operation, we're stretching the canvas - a value greater than 1 in both axes results in a bigger shape than the original one.

Rotations and scales are always about the origin - to rotate/scale about a particular point, we need to translate such that that point is at the origin, perform the rotation/scaling, then do the opposite translate operation to return the point to where it was originally. Since this is a bit of a hassle, it's generally best to simply draw shapes at the origin to begin with, and move them into place using transformations.

We can also save the current transformation matrix using someGraphics2D.getTransform(), and restore it later with someGraphics2D.setTransform(someTransform). This is useful for things like undoing a sequence of transformations without having to apply all of the inverse transformations.
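
Putting these together, here's a sketch of rotating a rectangle about a pivot point (g2, cx, and cy come from the surrounding paint code):

import java.awt.geom.AffineTransform;

AffineTransform saved = g2.getTransform(); // remember the current coordinate system
g2.translate(cx, cy);                      // move the pivot point to the origin
g2.rotate(Math.toRadians(45));             // rotate about the origin (i.e., about the pivot)
g2.translate(-cx, -cy);                    // move the pivot back where it was
g2.fillRect(cx - 40, cy - 20, 80, 40);     // drawn rotated about (cx, cy)
g2.setTransform(saved);                    // undo the whole sequence in one step

Graphics2D also provides rotate(theta, x, y), which performs exactly this translate-rotate-translate sequence for you.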


The scene graph is actually a tree, and allows us to represent hierarchical objects, with a relative coordinate system for each object with respect to its parent in the tree.

Consider the drawing of an arm. Without a scene graph, each time we transformed the upper arm, we'd have to manually recompute the transformations for the lower arm, hand, etc., which is a big hassle. Instead, we could maintain an affine transformation for each object, and child objects would get drawn with that transformation in place:

  1. Component draws itself using its affine transform.
  2. For each child object:
    1. Save the current affine transform.
    2. Do any other transforms necessary, on top of the existing transforms - multiply the current transform with the transforms needed by the child object.
    3. Recursively do all of the steps here for the child object.
    4. Restore the affine transform saved earlier.
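
A minimal Java sketch of this recursive drawing, using a hypothetical SceneNode class where each node stores its transform relative to its parent:

import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.util.ArrayList;
import java.util.List;

abstract class SceneNode {
    AffineTransform transform = new AffineTransform(); // relative to the parent
    List<SceneNode> children = new ArrayList<>();

    void paint(Graphics2D g2) {
        AffineTransform saved = g2.getTransform(); // save the parent's coordinate system
        g2.transform(transform);                   // multiply in this node's own transform
        drawSelf(g2);                              // draw this component in local coordinates
        for (SceneNode child : children) {
            child.paint(g2);                       // recurse: children build on our transform
        }
        g2.setTransform(saved);                    // restore the transform for our siblings
    }

    abstract void drawSelf(Graphics2D g2);
}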

With a scene graph, each part of the arm only needs to consider its transformation relative to its parent - the lower arm only needs to care about where it is relative to the upper arm, the hand only needs to care about where it is relative to the lower arm, and so on.

An inside test is a test for whether a point is inside a shape. Without a scene graph, this is simple because we know exactly where every shape is relative to the global coordinate space. With a scene graph, this is more difficult because shapes only know where they are relative to their parents. When testing whether the point is inside each shape, the usual solution is to transform the point into each shape's coordinate space, using the matrix inverse of the shape's transformation.
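
Continuing the SceneNode sketch from above, and assuming each node also keeps a java.awt.Shape describing its geometry in local coordinates, the inside test transforms the point rather than the shape:

import java.awt.Shape;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;

boolean hit(SceneNode node, Shape localShape, Point2D pointInParent)
        throws NoninvertibleTransformException {
    // Map the point from the parent's coordinate space into the node's local space
    // by applying the inverse of the node's transform.
    Point2D local = node.transform.inverseTransform(pointInParent, null);
    return localShape.contains(local); // Shape does the inside test in local coordinates
}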


Midterm is in 3 weeks. It's going to cover everything up until the midterm.

The Model-View-Controller (MVC) Pattern

MVC is useful because we can have multiple views of loosely coupled data models. For example, in PowerPoint, the slide editor, the outline, and the slide sorter are all different views of the same underlying presentation. When one view gets changed by the user, or if the model changes, we want everything else updated automatically, in order to keep a single true representation of our data.

MVC is a pattern that decouples input handling (controller), data representation/manipulations (model), and presentation (view). When the user expresses intent to the controller, the controller takes that input, and uses it to change the model. When the model changes, it notifies the view, which then updates what is shown to the user.

Ideally, the controller knows about the model (the controller calls into a public interface in the model), the view knows about the model (the view grabs all of the model's data and presents it), but the model doesn't know about the views (the model only has the view's interface, not a reference to the actual views themselves - this means we can swap out views arbitrarily).

For example, for Siri, the part that converts speech to intents is the controller, the part that actually retrieves data and performs actions is the model, and the part that displays results is the view.

Although the view knows about the model, the model doesn't know about the view. That means in general, the model should simply notify that it has changed, and the view should pull in all of the data it needs and reload everything, rather than having the model say exactly what changed and have the view update things selectively. This is because making sure things like cascading dependencies and other things are updated correctly in the view is hard, and should be the model's job. However, there are occasionally situations where we would want granular updates, such as when reloading the view would be very expensive.

Even though the view shouldn't have anything to do with the controller, in Java, widgets like buttons act as both views (because they display things) and controllers (because they handle input). So as a compromise, we generally treat things like buttons as views, and in their listeners we'll just change the model directly, as if the button were a controller. This is a hack, but it's necessary for most modern UI toolkits. Additionally, this means we have to implement listener interfaces in the controller. Because this forces us to tightly couple the controller and view, it's common to just fold the controller into the view, which then knows about and updates the model, while the model doesn't know about the view/controller.

The reason that we still have models at all is to ensure that we can have multiple views/controllers that work over a single data source (the model). Additionally, separating the model from everything else allows us to swap out different views as needed, such as changing a desktop UI for a smartphone UI. This also makes testing a lot easier, since we can just test each part separately at its interface.

In this course, we will be following that convention, and folding the controller into the view. The model will be loosely coupled to this view/controller using the observer pattern.

In Java, the model will keep a list of objects that implement the view interface, which are the views that rely on the model. This is to ensure that we can update the views properly when the model changes. Note that the model doesn't know anything about those view objects besides what's implemented in the view interface; it just tells the views that it's changed.

The MVC pattern is an instance of the observer pattern, where the model is the subject, and the view is the observer. Java doesn't have MVC classes built in, but it does have classes for the observer pattern. When implementing MVC, we can save some work by using java.util.Observable.
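
A minimal sketch of that wiring, with an illustrative counter model (CounterModel and CounterView are made-up names, but Observable and Observer are the real java.util classes):

import java.util.Observable;
import java.util.Observer;

// The model is the subject: it extends Observable and notifies on every change.
class CounterModel extends Observable {
    private int value = 0;

    public int getValue() { return value; }

    public void increment() {
        value++;
        setChanged();      // mark the model as changed...
        notifyObservers(); // ...and tell every registered view to refresh itself
    }
}

// The view is the observer: it pulls whatever data it needs when notified.
class CounterView implements Observer {
    public void update(Observable model, Object arg) {
        System.out.println("value is now " + ((CounterModel) model).getValue());
    }
}

Wiring them up is just model.addObserver(view); after that, every call to increment() automatically refreshes the view.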


While heavyweight widgets are nice because they have a native look and feel, they often appear and work differently on different platforms. In contrast, lightweight widgets are nice because we can ensure the UI looks the same across every platform, and they can be heavily optimised, but they can look out of place among actual native widgets.

A logical input device is a graphical component, defined by what their function is rather than what their physical form is. For example, locators (X/Y coordinates), buttons, numerical inputs, and so on. Logical input devices sort of have their own model (the properties and behaviours of the input device), view (the graphical part of it), and controller (the part that generates the events).

We can think of a widget as a specific type of logical input device. For example, a button widget is sort of like a logical button input device (something that generates a push event). Another example is a slider or text field widget being a logical number input device (something that generates number change events).

Basically, widgets have their own MVC architecture internally. Our application interfaces with the widget via its properties and events that it generates and accepts. Potentially, we can also subclass the widget and modify the model to implement new behaviours.

For example, a label widget has no model, no events, and various properties that determine how it appears. The text is a property rather than a model because it doesn't change - models only contain things that we interact with.

A button has no model, generates push events, and has various properties that control its appearance. A checkbox has a boolean as its model, generates change/check/uncheck events, and has various properties that control its appearance.


Although we could build layouts by positioning our widgets by hand at predetermined coordinates, our windows will often not be the same size, and our content should scale to match. To do this, widgets and layouts need to be flexible - they should be adaptable using layout managers.

Swing has multiple layout managers such as grids, flows, and boxes. Some of the strategies applied are:


Struts are fixed-size components. Springs are variable-size components that expand to fill space.
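
In Swing, a minimal sketch using a Box container (glue is Swing's name for a spring):

import javax.swing.*;

Box box = Box.createHorizontalBox();
box.add(new JButton("Left"));
box.add(Box.createHorizontalStrut(10)); // strut: a fixed 10-pixel gap
box.add(Box.createHorizontalGlue());    // glue: a spring that absorbs any extra width
box.add(new JButton("Right"));          // so this button hugs the right edge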

Many programming languages support defining UIs in a markup language, in order to separate presentation from content. This arose out of the need for designers to be able to build UIs directly.

Assignment 2 stuff: create a vector drawing application with save/load support using the MVC pattern.


Usefulness is the ability for the interface to meet requirements and actually be practical for real use. Usability is the ability for users to efficiently and effectively do the tasks allowed by the UI.

Devices and their interfaces are getting more complicated over time, but our cognitive capacity for remembering and figuring these interfaces out has more or less remained unchanged. As a result, modern UI design is focused on balancing usefulness and usability. A useful but non-usable interface is hard to work with, but can do everything we need to, while a non-useful but usable interface is really easy to pick up, but doesn't do what we need it to.

A good interface should help users form simple, consistent mental models of the user interfaces. To do this, the UI should do exactly what the appearance suggests - a button should perform an action, and a toggle should change a state variable. Ideally, users should get feedback as soon as possible for the results and progress on their operations.

Basically, an interface should strive to surprise the user as little as possible.


I dropped this course due to my injuries, as it became infeasible to continue going to class.

This work by Anthony Zhang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Copyright 2013-2017 Anthony Zhang.