Introduction to Artificial Intelligence.

```
Alan Tsang
Section 001
Email: akhtsang@uwaterloo.ca
Website: https://cs.uwaterloo.ca/~klarson/teaching/F17-486/, https://cs.uwaterloo.ca/~akhtsang/cs486.html
Office Hours: Tuesdays 1:15pm to 2:45pm in DC2306B, accessible via the AI Lab
Mondays/Wednesdays 4:00pm-5:20pm
```

Info on course website, assignments on LEARN, questions on Piazza. 5 assignments worth 40% total, 15% midterm on October 18, 2017, 45% final exam, optional project worth 5% bonus marks. Assignments can be submitted up to 48 hours after deadline.

There's no real agreed-upon definition for AI, but in this course we'll consider it to be the mathematical study of intelligent action in a complex environment. We usually think about 4 different kinds of AI: systems that think like humans, act like humans, think rationally, or act rationally. The definitions usually differ along the dimensions of thinking vs. acting and human behaviour vs. idealized rational behaviour. In other words, we're trying to duplicate what the human brain does (passing the turing test), and what the human brain should do (acting rationally).

Overview of AI in popular perception: threats of superintelligence, automation displacing jobs, biases encoded in AI models, self-driving cars. Overview of the turing test, including issues like not being amenable to mathematical analysis, whether it's more important to understand underlying principles than mimicking true intelligence, and whether it's even a good test (we don't make planes by trying to make something so bird-like that it fools a bird). Overview of history of AI, like Turing's work, the Dartmouth meeting, AI winter, and modern computational AI.

Classically, AI was seen to mean general problem solving, like solving chess. More recently, we focus more on specific problems, such as perception, robotics, and deliberative reasoning (thinking about what to do based on available data), with a greater focus on statistical methods, decision theories, and probability.

In this course we'll talk about classic AI problems like search, as well as knowledge representation/reasoning, making decisions under uncertainty, and learning.

An **agent** is an entity that perceives and acts. We usually evaluate agents based on things like how well it achieves goals and how many resources it consumes. Agents exist within an environment, with actuators, sensors, and environmental constraints. Environments can be fully/partially observable (can we see the entire environment at once?), deterministic/stochastic (does the same action always result in the same result?), episodic/dynamic (do previous steps affect later steps? in classification no, since previous predictions are independent, but in chess yes, since previous moves are very important), discrete/continuous (chess AI could be considered discrete, whereas controlling a robot arm might be considered continuous), and single/multi agent (is the agent the only one in its environment?).

Search is one of the first topics in AI, used in everything from theorem proving to robotic navigation. Search problems consist of a **state space** (the world we're searching within), a **successor function** (given a state, determines possible actions/next states and the cost of getting to those states), a **start state** (the state at the start of the search), and a **goal test** (determines whether we've reached the goal, and possibly also information about how close we are to the goal). A **path** is a sequence of states and operations. A solution is a path from the start state to a state that satisfies the goal test. An optimal solution is a solution of minimal cost.

Consider a pac-man AI. the state space would be the game map, the successor function would define actions like, moving left/right/up/down and how expensive those would be, the start state would be the initial game state, and the goal test would be getting to the end.

Overview of N-queens problem and why it's a search problem, as well as various other types of search problems. For our purposes, search problems won't have elements of chance, continuous states, adversaries, or partial observation - all of our problems will be deterministic, discrete, single-agent, and fully observable.

The world state includes every detail of the environment at a certain point in the search process. For something like pac-man, we might care about pac-man's position, the dots, and the ghosts, but pathfinding might only have current position in its state.

Search problems can be thought of as minimum-cost pathfinding on a graph of world states - we can use algorithms like breadth-first search and A*. The search tree is the tree made of possible paths within the graph of world states. The factors we care about in search algorithms are things like completeness (will it always find a solution?), optimality (does it always find the minimum cost solution?), time complexity, and space complexity. While DFS is light on memory and goes deep quickly (O(b^h) time and O(h) space, where b is arity of tree and h is tree height), cycles in the graph can result in infinite loops, so it's not complete. BFS is complete and can find shallower solutions quickly, but needs a lot of memory to store visited states (O(b^h) space and time where b is arity of tree and h is tree height).

Instead, we can use **iterative deepening search**, which is essentially a mix of both - do DFS with a small depth limit, then repeatedly increase the limit and rerun DFS until we reach the goal. This is both memory-efficient and complete, since the depth limit prevents infinite loops. This sounds wasteful, but it isn't too bad in practice since the number of nodes grows exponentially, so most nodes we visit will be in the later iterations.

Note that all three of these algorithms we've looked at so far are the same besides the order we visit nodes in - they're generic and aren't specialized for specific problems. We can do better in specific problems by taking advantage of their inherent structure.

Assignment 1 will be out this week.

Uninformed search is search that doesn't depend on any knowledge about the problem, like BFS or DFS, or iterative deepending. It's widely suitable, but performs poorly and is computationally expensive. We would like to do informed search instead, where we can take advantage of knowledge we have about the problem to make our searches more efficient.

Usually when searching a graph, we often have knowledge about the **merit** of a node - some approximation of how good the node is. There are often different ways to defined merit for any given problem, such as how much it would cost to use the node in our solution, or how computationally expensive it would be to use it. For example, uninformed search expands nodes based on the distance from the start node, but we might augment this by using a heuristic like Euclidean distance to the goal, if we're searching in a Euclidean space.

A **heuristic** estimates the distance from any given node to the goal node, such as Euclidean or Manhattan distance. In our search, if node A has a lower heuristic than node B, we might choose node A over B (and also factor in things like how far from the start nodes A or B are).

Best-first search is the most naive informed search. At each step, we expand exactly the node with the lowest heuristic function value - the node that seems closest to the goal function. This is not a complete search, since it might get stuck in a minimim-heuristic cycle of nodes (we don't keep track of which nodes we've already visited). Additionally, this is not optimal, since the resulting path isn't always the shortest one - the length of the path taken so far at any given node isn't taken into account.

While best-first search looked at the estimated forward cost (expand node with lowest estimated cost from current node to goal), breadth-first search looked at the backward cost (expand node with lowest cost from start to current node). We want to combine these two to make breadth-first search more efficient, and this gives us **A* search**. Essentially, let g(n) be the backward cost for the node n and h(n) be an estimate of the forward cost for the node n. We then perform essentially breadth-first search, repeatedly queueing up each visited nodes' neighbors, but always choose the node in the priority queue with the lowest value of f(n) = g(n) + h(n), stopping when the priority queue is empty (the priority queue starts off only having the start node).

A* is complete - it will give us a valid path for any start/goal vertex in any graph. A* is also optimal if the heuristic is **admissible and we're searching in a tree**, or if the heuristic is **consistent**. Admissible means that the forward cost heuristic's value is always less or equal to the true forward cost. For example, h(n) = 0 is an admissable heuristic for any graph. Proof of optimality:

Let s be the start node and e be the goal node. Suppose A* gives us a non-optimal solution within a tree. Since it's a tree, we must have reached a different ending node, so there must be a non-optimal ending node e'.

Let n be some node along the path from s to e. Clearly, at some point both n and g' must be in the A* priority queue at the same time, and A* must have removed e' over n.

So f(e') \le f(n) and g(e') + h(e') \le g(n) + h(n). Let c(x, y) be the true cost from node x to y. Since the path is non-optimal, g(e') > g(e). Clearly, g(e) = c(s, n) + c(n, e) \ge g(n) + h(n) = f(n).

So f(e') > f(n), which isn't possible since f(e') \le f(n). So A* cannot have given us a non-optimal solution within a tree.

Consistent means that the heuristic is admissible regardless of the goal node - h(n) \le c(n, n') + h(n'), where c(x, y) is the true cost of getting from node x to node y. Most admissible heuristics will also be consistent, but not all of them.

A* has good performance in practice, but in the worst case needs to visit every node, and keep track of all the backtracking information for every node. The real-world performance of A* depends heavily on the heuristic function.

For example, if we were to use A* to solve a sliding puzzle game (a 3x3 grid of squares with one tile missing, squares must be slid into a particular pattern), One heuristic we might use is to count the number of squares to move each tile to get to its proper place, assuming the tiles can occupy squares regardless of whether there's already a tile there - the sum of the Manhattan distances between each node to its destination. A somewhat worse heuristic would be to count the number of tiles not in their correct places.

To design a heuristic, we generally start by removing some constraints of the problem, finding a computationally cheap solution to the relaxed problem (or precompute a database of solutions for those simplified problem instances), and then use that as the heuristic. For example, for the sliding puzzle game, we started by relaxing the requirements that there only be one tile per square, and then trivially found a solution by simply moving the tiles to their correct places. There's often a big tradeoff between the cost of computing the heuristic, versus how good the heuristic is.

We often want to use A* while also limiting how much memory we use. One way to do this is using Iterative Deepening A*, which is sort of like iterative deepening search, but we consider nodes by order of their f(n) value, limit the value of f(n), and progressively rerun the search with increasing f(n) limit until the goal is reached. Another way is simplified memory-bounded A*, which simply starts dropping nodes from the priority queue with the highest f(n) values (or oldest nodes, as a tiebreaker) when it's out of memory (this is still optimal/complete if distance to shallowest goal node is less or equal to memory size).

This work by Anthony Zhang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright 2013-2017 Anthony Zhang.