-
Notifications
You must be signed in to change notification settings - Fork 0
Automata Notes
A finite automaton is a formal system which we may think of as either a table or a graph.
It remembers only a finite amount of information.
Remark. This finite memory is a feature though. We cannot tell if two programs do the same thing. But we can tell if two automata perform the same thing or not. Finiteness of memory permits this to happen. (End of Remark)
Informations is represented by its state.
State changes in response to inputs.
We have rules which tell us how states change, called transitions.
So, just have caution: inputs do not describe state, they describe the transitions we take.
We can use finite automata for both design and verification of circuits and communication protocols.
They're wonderful for many text-processing applications (e.g., regular expressions).
Automata play a critical part in compilers.
But what's more: automata describe simple patterns of events, which is interesting just philosophically.
The scoring system is arcane. A "match" consists of 3-5 sets. And a "Set" consists of 6 or more games.
How does a "game" work in Tennis? One person serves throughout the game. To win a game, you must score at least 4 points. But you must also win by at least 2 points.
We will use inputs s
("server wins point") and o
("opponent wins point").
It is common to represent a finite automata by a graph, nodes represent states and edges represent transitions.
The state is <server>-<opponent>
and each player has Love
, 15
,
30
, and 40
points - in that order.
When both players are tied and have at least 3 points, they are in the
deuce
state. The deuce
state doesn't remember the history of the
game. It just results in either Advantage In
(in
referring to the
server) if the server scores a point, or Advantage Out
if the opponent
scores. These new states either results in a victory or a return to deuce
.
digraph {
rankdir=LR;
node [style="invis"]; start;
node [shape="doublecircle" style=""] "Server Wins" "Opponent Wins";
node [style="" shape="circle"];
start -> "Love" [label="start"];
"Love" -> "15-Love" [label="s"];
"Love" -> "Love-15" [label="o"];
"15-Love" -> "30-Love" [label="s"];
"15-Love" -> "15-All" [label="o"];
"Love-15" -> "15-All" [label="s"];
"Love-15" -> "Love-30" [label="o"];
"30-Love" -> "40-Love" [label="s"];
"30-Love" -> "30-15" [label="o"];
"15-All" -> "30-15" [label="s"];
"15-All" -> "15-30" [label="o"];
"Love-30" -> "15-30" [label="s"];
"Love-30" -> "Love-40" [label="o"];
"Love-40" -> "Opponent Wins" [label="o"];
"40-Love" -> "Server Wins" [label="s"];
"Love-40" -> "15-40" [label="s"];
"15-30" -> "15-40" [label="o"];
"15-30" -> "30-All" [label="s"];
"30-15" -> "30-All" [label="o"];
"30-15" -> "40-15" [label="s"];
"40-Love" -> "40-15" [label="o"];
"40-15" -> "Server Wins" [label="s"];
"40-15" -> "40-30" [label="o"];
"30-All" -> "40-30" [label="s"];
"30-All" -> "30-40" [label="o"];
"15-40" -> "30-40" [label="s"];
"15-40" -> "Opponent Wins" [label="o"];
"40-30" -> "Server Wins" [label="s"];
"40-30" -> "Deuce" [label="o"];
"30-40" -> "Deuce" [label="s"];
"30-40" -> "Opponent Wins" [label="o"];
"Deuce" -> "Ad-in" [label="s"];
"Deuce" -> "Ad-out" [label="o"];
"Ad-in" -> "Server Wins" [label="s"];
"Ad-in" -> "Deuce" [label="o"];
"Ad-out" -> "Opponent Wins" [label="o"];
"Ad-out" -> "Deuce" [label="s"];
}
Given a sequence of inputs ("input string"), we start in the start state and follow the transition from each symbol in turn.
Input is "accepted" if you wind up in a final ("accepting") state after all inputs have been read.
So in tennis, we could have an input string sososososososs
where the
server and opponent win alternate points, until ultimately the server
wins.
Definition. The set of strings accepted by an automaton A
is the
"language" of A
. We denote this L(A)
. (End of Definition)
So the language of tennis, for example, is generated {s, o}
provided
the string results in a final state (someone wins).
Definition. An "Alphabet" is any finite set of symbols. (End of definition)
Examples: ASCII, Unicode, {0,1}
(binary), among others. For tennis we
used {s,o}
as the Alphabet.
Definition. A "String" over an alphabet Σ is a list, each element of which is a member of Σ. We denote the set of all strings over Σ by Σ* (End of Definition)
So 011
is a string over bits. It's also a string over the ASCII
alphabet.
Definition. The "Length" of a string is its number of positions. (End of Definition)
Definition. The "Empty String" is the string without any letters,
denoted as ϵ
throughout. (End of Definition)
Example. Consider {0,1}* = {ϵ, 0, 1, 00, 01, 10, 11, 000, 001, ...}
.
We do not distinguish the string 0
from the symbol 0
in its
alphabet. Most languages distinguish these two. For example, C has '0'
denote the symbol and "0"
denote the string.
Definition. A "Language" is a subset of Σ* for some alphabet Σ. (End of Definition)
Example. The set of strings over {0,1}
with no two consecutive
1
. So this looks like L = {ϵ, 0, 1, 00, 01, 10, 000, 001, 010, 100, 101, ...}
.
Puzzle. Observe, if n
is the length of a string in L
, we have
f(n
) be the number of strings with that length. So f(0)=1, f(1)=2,
f(2)=3, f(3)=5, and so on. What is a closed form expression for f(n
)?
(End of Puzzle)
We have a formalism for defining languages consisting of:
- A finite set of "States" (denoted
Q
typically) - An "Input Alphabet" (Σ traditionally)
- A "transition function" (denoted δ)
- A "Start State" (denoted
q0
inQ
, usually subscripting the 0) - A set of "Final States" (a subset
F
ofQ
)
Note that "Final" and "Accepting" are synonyms in Automata.
It takes two arguments: a state and an input symbol. Then δ(q, a) is the state the automata goes to when it is currently in state q and input a is received.
Note: we always have a next state. But sometimes we have a "dead state". In tennis, when e.g. the server wins, the transition function would always return to the server.
We can represent an automaton using a graph, where the nodes represent the states and the arcs are the transitions.
An arc from state p
to state q
is labeled by the input symbols that
have transitions from p
to q
. Usually, we have 1 arc, and just write
out all the input symbols separated by a comma. So instead of two arrows
a:p->q
and b:p->q
, we label one single arrow with a,b
. (This is
our own lazy convention)
We have an arrow labeled "Start" to the start state.
Final states are indicated by double circles.
Consider the automata that recognize strings ending in "-ing".
We start with Nothing
. We read character by character from the
string. When we read an i
, we change states to Saw 'i'
.
If the next character is an i
, we don't change states. On the other
hand, if the next character is an n
, we move to the Saw 'in'
state. Otherwise, we return to the Nothing
state.
Once at Saw 'in'
, we can move to the end state if the next character
read is g
. If it's an i
, we go back to the Saw 'i'
state. Otherwise, we go back to the Nothing
start state.
digraph {
rankdir=LR;
node [style="invis"]; start;
node [shape="doublecircle" style=""] "Saw -ing";
node [style="" shape="circle"];
start -> "Nothing" [label="start"];
"Nothing" -> "Nothing" [label="Not 'i'"];
"Nothing" -> "Saw 'i'" [label="'i'"];
"Saw 'i'" -> "Nothing" [label="Not 'i' or 'n'"];
"Saw 'i'" -> "Saw 'in'" [label="'n'"];
"Saw 'in'" -> "Saw 'i'" [label="'i'"];
"Saw 'in'" -> "Nothing" [label="Not 'i' or 'g'"];
"Saw 'in'" -> "Saw -ing" [label="'g'"];
"Saw -ing" -> "Nothing" [label="Not 'i'"];
"Saw -ing" -> "Saw 'i'" [label="'i'"];
}
The simplest protocol for sending data. We begin in the Ready
state. Then we send data, transitioning to the Sending
state via the
data in
transition.
We have Sending
state sent to itself via the timeout
transition. Similarly, if Sending
fails, then we have an ack
transition to the Ready
state.
There is no final state, since this is designed to run forever.
It's also incomplete, we should have a timeout
transition on the
Ready
state, which would do nothing. We'd have a new Error
state,
which occurs if we try to run the data in
on Sending
or ack
on
Ready
. The Error
state would be terminal.
digraph {
rankdir=LR;
node [style="invis"]; start;
node [style="" shape="circle"];
start -> "Ready" [label="start"];
"Ready" -> "Sending" [label="data in"];
"Sending" -> "Sending" [label="timeout"];
"Sending" -> "Ready" [label="ack"];
}
Recall our example of the language over {0,1}
without two consecutive
1
s in the language.
We have an automata for this. State A
is the string so far, which has
no 11
nor does it end in 1
.
The state B
is the string so far, like A
, except B
ends with a
single 1
.
The state C
is terminal but not and end state, because it has 11
as
a substring.
digraph {
rankdir=LR;
node [style="invis"]; start;
node [shape="doublecircle" style=""] A, B;
node [style="" shape="circle"];
start -> A [label="start"];
A -> B [label="1"];
A -> A [label="0"];
B -> A [label="0"];
B -> C [label="1"];
C -> C [label="0,1"];
}
We could represent the automata as a table. The rows each correspond to one of the states. The columns correspond to input symbols.
Final states are starred. Start states have arrows pointing to it.
0 | 1 | |
---|---|---|
->*A | A | B |
*B | A | C |
C | C | C |
Remark (Convention). We will represent ..., w, x, y, z are strings. Similarly, a, b, c, ... are single symbol input. (End of Remark).
We describe the effect of a string of inputs on a DFA by extending δ to a state and a string.
The intuition is we write δ(q, abc) = δ(δ(δ(q,a),b),c). So looking at
the graph, we start at the node q
, then follow the arcs a
, then b
,
then c
.
(This notation is counter-intuitive to mathematicians, because we ought to be composing the partial functions δ(-,c)∘δ(-,b)∘δ(-,a) and it would have been more intuitive to write δ(-,c∘b∘a) or something like that. But meh, conventions are arbitrary anyways.)
If A
is an automaton, then L(A)
is its language.
For a Deterministic Finite Automaton A
, its language L(A)
is the set
of strings labeling paths from the start state to a final state. Or more
precisely, L(A)
is the set of strings w
such that δ(q0
,w
) is in
F
, the set of final states.
Consider the automaton for the language over {0,1}
which does not
include any substring 11
. The language it produces consists of the
strings w
such that w
does not have two consecutive 1
's. (If it
had two consecutive 1
's, the string would transition A
to C
, and
C
is not a final state!)
So the automaton generates the desired language.
Remark. Often, to prove results about automata, we will prove two sets are the same. For this example, one set is "The language of the DFA" and the other is "The set of strngs of 0's and 1's with no consecutive 1's". (End of Remark)