Skip to content

Automata Notes

Alex edited this page Nov 7, 2013 · 2 revisions

Motivation

A finite automaton is a formal system which we may think of as either a table or a graph.

It remembers only a finite amount of information.

Remark. This finite memory is a feature though. We cannot tell if two programs do the same thing. But we can tell if two automata perform the same thing or not. Finiteness of memory permits this to happen. (End of Remark)

Informations is represented by its state.

State changes in response to inputs.

We have rules which tell us how states change, called transitions.

So, just have caution: inputs do not describe state, they describe the transitions we take.

Applications

We can use finite automata for both design and verification of circuits and communication protocols.

They're wonderful for many text-processing applications (e.g., regular expressions).

Automata play a critical part in compilers.

But what's more: automata describe simple patterns of events, which is interesting just philosophically.

Example: Tennis

The scoring system is arcane. A "match" consists of 3-5 sets. And a "Set" consists of 6 or more games.

How does a "game" work in Tennis? One person serves throughout the game. To win a game, you must score at least 4 points. But you must also win by at least 2 points.

We will use inputs s ("server wins point") and o ("opponent wins point").

It is common to represent a finite automata by a graph, nodes represent states and edges represent transitions.

The state is <server>-<opponent> and each player has Love, 15, 30, and 40 points - in that order.

When both players are tied and have at least 3 points, they are in the deuce state. The deuce state doesn't remember the history of the game. It just results in either Advantage In (in referring to the server) if the server scores a point, or Advantage Out if the opponent scores. These new states either results in a victory or a return to deuce.

digraph {
    rankdir=LR;
        
    node [style="invis"]; start;
    node [shape="doublecircle" style=""] "Server Wins" "Opponent Wins";
        
    node [style="" shape="circle"];
    start -> "Love" [label="start"];
    "Love" -> "15-Love" [label="s"];
    "Love" -> "Love-15" [label="o"];
    "15-Love" -> "30-Love" [label="s"];
    "15-Love" -> "15-All" [label="o"];
    "Love-15" -> "15-All" [label="s"];
    "Love-15" -> "Love-30" [label="o"];
    "30-Love" -> "40-Love" [label="s"];
    "30-Love" -> "30-15" [label="o"];
    "15-All" -> "30-15" [label="s"];
    "15-All" -> "15-30" [label="o"];
    "Love-30" -> "15-30" [label="s"];
    "Love-30" -> "Love-40" [label="o"];
    "Love-40" -> "Opponent Wins" [label="o"];
    "40-Love" -> "Server Wins" [label="s"];
    "Love-40" -> "15-40" [label="s"];
    "15-30" -> "15-40" [label="o"];
    "15-30" -> "30-All" [label="s"];
    "30-15" -> "30-All" [label="o"];
    "30-15" -> "40-15" [label="s"];
    "40-Love" -> "40-15" [label="o"];
    "40-15" -> "Server Wins" [label="s"];
    "40-15" -> "40-30" [label="o"];
    "30-All" -> "40-30" [label="s"];
    "30-All" -> "30-40" [label="o"];
    "15-40" -> "30-40" [label="s"];
    "15-40" -> "Opponent Wins" [label="o"];
    "40-30" -> "Server Wins" [label="s"];
    "40-30" -> "Deuce" [label="o"];
    "30-40" -> "Deuce" [label="s"];
    "30-40" -> "Opponent Wins" [label="o"];
    "Deuce" -> "Ad-in" [label="s"];
    "Deuce" -> "Ad-out" [label="o"];
    "Ad-in" -> "Server Wins" [label="s"];
    "Ad-in" -> "Deuce" [label="o"];
    "Ad-out" -> "Opponent Wins" [label="o"];
    "Ad-out" -> "Deuce" [label="s"];
}

Accepting Inputs

Given a sequence of inputs ("input string"), we start in the start state and follow the transition from each symbol in turn.

Input is "accepted" if you wind up in a final ("accepting") state after all inputs have been read.

So in tennis, we could have an input string sososososososs where the server and opponent win alternate points, until ultimately the server wins.

Language of an Automaton

Definition. The set of strings accepted by an automaton A is the "language" of A. We denote this L(A). (End of Definition)

So the language of tennis, for example, is generated {s, o} provided the string results in a final state (someone wins).

Deterministic Finite Automata

Definition. An "Alphabet" is any finite set of symbols. (End of definition)

Examples: ASCII, Unicode, {0,1} (binary), among others. For tennis we used {s,o} as the Alphabet.

Definition. A "String" over an alphabet Σ is a list, each element of which is a member of Σ. We denote the set of all strings over Σ by Σ* (End of Definition)

So 011 is a string over bits. It's also a string over the ASCII alphabet.

Definition. The "Length" of a string is its number of positions. (End of Definition)

Definition. The "Empty String" is the string without any letters, denoted as ϵ throughout. (End of Definition)

Example. Consider {0,1}* = {ϵ, 0, 1, 00, 01, 10, 11, 000, 001, ...}. We do not distinguish the string 0 from the symbol 0 in its alphabet. Most languages distinguish these two. For example, C has '0' denote the symbol and "0" denote the string.

Definition. A "Language" is a subset of Σ* for some alphabet Σ. (End of Definition)

Example. The set of strings over {0,1} with no two consecutive 1. So this looks like L = {ϵ, 0, 1, 00, 01, 10, 000, 001, 010, 100, 101, ...}.

Puzzle. Observe, if n is the length of a string in L, we have f(n) be the number of strings with that length. So f(0)=1, f(1)=2, f(2)=3, f(3)=5, and so on. What is a closed form expression for f(n)? (End of Puzzle)

Formalism

We have a formalism for defining languages consisting of:

  1. A finite set of "States" (denoted Q typically)
  2. An "Input Alphabet" (Σ traditionally)
  3. A "transition function" (denoted δ)
  4. A "Start State" (denoted q0 in Q, usually subscripting the 0)
  5. A set of "Final States" (a subset F of Q)

Note that "Final" and "Accepting" are synonyms in Automata.

Transition Function

It takes two arguments: a state and an input symbol. Then δ(q, a) is the state the automata goes to when it is currently in state q and input a is received.

Note: we always have a next state. But sometimes we have a "dead state". In tennis, when e.g. the server wins, the transition function would always return to the server.

Graph Representation of an Automaton

We can represent an automaton using a graph, where the nodes represent the states and the arcs are the transitions.

An arc from state p to state q is labeled by the input symbols that have transitions from p to q. Usually, we have 1 arc, and just write out all the input symbols separated by a comma. So instead of two arrows a:p->q and b:p->q, we label one single arrow with a,b. (This is our own lazy convention)

We have an arrow labeled "Start" to the start state.

Final states are indicated by double circles.

Example: Strings ending in "-ing"

Consider the automata that recognize strings ending in "-ing".

We start with Nothing. We read character by character from the string. When we read an i, we change states to Saw 'i'.

If the next character is an i, we don't change states. On the other hand, if the next character is an n, we move to the Saw 'in' state. Otherwise, we return to the Nothing state.

Once at Saw 'in', we can move to the end state if the next character read is g. If it's an i, we go back to the Saw 'i' state. Otherwise, we go back to the Nothing start state.

digraph {
    rankdir=LR;
        
    node [style="invis"]; start;
    node [shape="doublecircle" style=""] "Saw -ing";
        
    node [style="" shape="circle"];
    start -> "Nothing" [label="start"];
    "Nothing" -> "Nothing" [label="Not 'i'"];
    "Nothing" -> "Saw 'i'" [label="'i'"];
    "Saw 'i'" -> "Nothing" [label="Not 'i' or 'n'"];
    "Saw 'i'" -> "Saw 'in'" [label="'n'"];
    "Saw 'in'" -> "Saw 'i'" [label="'i'"];
    "Saw 'in'" -> "Nothing" [label="Not 'i' or 'g'"];
    "Saw 'in'" -> "Saw -ing" [label="'g'"];
    "Saw -ing" -> "Nothing" [label="Not 'i'"];
    "Saw -ing" -> "Saw 'i'" [label="'i'"];
}

Example: Protocol for Sending Data

The simplest protocol for sending data. We begin in the Ready state. Then we send data, transitioning to the Sending state via the data in transition.

We have Sending state sent to itself via the timeout transition. Similarly, if Sending fails, then we have an ack transition to the Ready state.

There is no final state, since this is designed to run forever.

It's also incomplete, we should have a timeout transition on the Ready state, which would do nothing. We'd have a new Error state, which occurs if we try to run the data in on Sending or ack on Ready. The Error state would be terminal.

digraph {
    rankdir=LR;
        
    node [style="invis"]; start;
        
    node [style="" shape="circle"];
    start -> "Ready" [label="start"];
    "Ready" -> "Sending" [label="data in"];
    "Sending" -> "Sending" [label="timeout"];
    "Sending" -> "Ready" [label="ack"];
}

Example: Binary Strings without duplicate 1s

Recall our example of the language over {0,1} without two consecutive 1s in the language.

We have an automata for this. State A is the string so far, which has no 11 nor does it end in 1.

The state B is the string so far, like A, except B ends with a single 1.

The state C is terminal but not and end state, because it has 11 as a substring.

digraph {
    rankdir=LR;
        
    node [style="invis"]; start;
    node [shape="doublecircle" style=""] A, B;
        
    node [style="" shape="circle"];
    start -> A [label="start"];
    A -> B [label="1"];
    A -> A [label="0"];
    B -> A [label="0"];
    B -> C [label="1"];
    C -> C [label="0,1"];
}

We could represent the automata as a table. The rows each correspond to one of the states. The columns correspond to input symbols.

Final states are starred. Start states have arrows pointing to it.

0 1
->*A A B
*B A C
C C C

Remark (Convention). We will represent ..., w, x, y, z are strings. Similarly, a, b, c, ... are single symbol input. (End of Remark).

Extended Transition Function

We describe the effect of a string of inputs on a DFA by extending δ to a state and a string.

The intuition is we write δ(q, abc) = δ(δ(δ(q,a),b),c). So looking at the graph, we start at the node q, then follow the arcs a, then b, then c.

(This notation is counter-intuitive to mathematicians, because we ought to be composing the partial functions δ(-,c)∘δ(-,b)∘δ(-,a) and it would have been more intuitive to write δ(-,c∘b∘a) or something like that. But meh, conventions are arbitrary anyways.)

Language of a DFA

If A is an automaton, then L(A) is its language.

For a Deterministic Finite Automaton A, its language L(A) is the set of strings labeling paths from the start state to a final state. Or more precisely, L(A) is the set of strings w such that δ(q0,w) is in F, the set of final states.

Example: Binary Strings without 11

Consider the automaton for the language over {0,1} which does not include any substring 11. The language it produces consists of the strings w such that w does not have two consecutive 1's. (If it had two consecutive 1's, the string would transition A to C, and C is not a final state!)

So the automaton generates the desired language.

Remark. Often, to prove results about automata, we will prove two sets are the same. For this example, one set is "The language of the DFA" and the other is "The set of strngs of 0's and 1's with no consecutive 1's". (End of Remark)

Clone this wiki locally