Jesse A. Tov
2 May 2007
Code: array.erl clock.erl dp.erl factorial.erl global_server.erl kv.erl listener.erl print_server.erl ring.erl subscription.erl
Let's design a scrabble server:
- Game state with threads representing agents, shared memory,
explicit synchronization
- State machine (works okay so long as thing are I/O bound)
- Message passing concurrent processes
Why?
- The world is parallel and distributed.
- The way we've learned to think about programming is not.
- "To program a real-world application, we _observe_ the concurrency
patterns = no guesswork."
- Also parallel speed-up.
- Erlang not just about parallel speed-up. Also:
- Reliability
- Modeling inherent concurrency
So you've decided you want to model concurrency with concurrency.
You shouldn't have to pay for massive concurrency:
- In performance
- In complexity
Thesis of Concurrency Oriented Programming:
- You can't add concurrency to a sequential language in a sane way.
- You need a COPL. COPL characteristics:
- Processes (self contained)
- Strong isolation (separate heaps . . .)
- Unique, unforgeable pids
- No sharing -- message passing
- Message passing is asynchronous and potentially unreliable (send
and pray)
- You can find out if another process fails, and why (might lie,
e.g. network errors)
- Claim: Programming such a system is surprisingly easy.
- Claim: Erlang satisfies the above.
Some Erlang success stories:
AXD301
- Ericsson (ATM) phone switch
- Impressive (nine 9s?) uptime
- 1.1 M lines of code, mostly _not_ concurrent (more on this later)
- 10 Gbit = 50-100k phone calls
- expandable to 160 Gbit
Bluetail Mail Robustifier / Alteon SSL Accelerator
- Network appliances
- Each ~30 Kloc
- BMR deployed by Swedish ISP since 1999 (as of 2003)
- ASA "best in class", market leader
ejabberd
- Server for XMPP chat/presence protocol
- Replacing open source Jabber 1.4/2 as _the_ XMPP server to use
- Why? Anecdotally, less trouble
- Easy to cluster
- Everybody, including jabber.org, has switched over
LANGUAGE INTRODUCTION
Erlang is
- Untyped, single-assignment functional language
- Lists, tuples, symbols (called "atoms"), binary data
- Light weight processes
- Message passing
- Transparent distribution
- Runs on OS independent VM
- Soft real-time GC
Sequential examples:
fac(0) -> 1;
fac(N) when N > 0 -> N * fac(N - 1).
lookup(Key, {Key, Val, _, _}) -> {ok, Val};
lookup(Key, {Key1, _, L, _}) when Key < Key1 -> lookup(Key, L);
lookup(Key, {Key1, _, _, R}) -> lookup(Key, R);
lookup(Key, _) -> not_found.
adder(A) -> fun(B) -> A + B end.
Add1 = adder(1).
Add1(5).
A simple concurrent example:
account(Balance) ->
receive
{deposit, Amount, Whom} ->
Whom ! {deposit_receipt, Amount},
account(Balance + Amount);
{balance, Whom} ->
Whom ! {balance, Balance},
account(Balance);
{withdrawal, Amount, Whom} when Amount > Balance ->
Whom ! overdraft,
account(Balance);
{withdrawal, Amount, Whom} ->
Whom ! {withdrawal_receipt, Amount},
account(Balance - Amount)
end.
Account = spawn(fun () -> account(0) end).
Account ! {withdrawal, 200}.
receive
{withdrawal_receipt, Amount} -> ok,
overdraft -> not_ok
end.
Account ! {deposit, 300}.
receive {deposit_receipt, A} -> A end.
Actor version:
account(Balance) ->
receive
{deposit, Amount, Whom} ->
become account(Balance + Amount) |
Whom ! {deposit_receipt, Amount};
{balance, Whom} ->
Whom ! {balance, Balance};
{withdrawal, Amount, Whom} when Amount > Balance ->
Whom ! overdraft;
{withdrawal, Amount, Whom} ->
become account(Balance - Amount) |
Whom ! {withdrawal_receipt, Amount},
end.
Comparison with Actors
----------------------
Similarities:
- can only talk to someone if you have their address (well, unless you
look in the process dictionary)
- can create new processes
- addresses are first-class, so you can pass them in messages
-> variable topology
- message passing is asynchronous
- Actor laws apply:
1) finite number of events between any two events
2) no event immediately causes more than a finite number of events
3) each event is generated by at most one send
4) arrivals are well-ordered
5) event ordering is well-founded
6) no event immediately causes more than a finite number of process
creations
7) actors have a finite number of acquaintances
Differences:
- Erlang processes correspond to serialized actors. Erlang has
"constant values" instead of unserialized actors. (No need for CPS
all the time.)
- If process A sends two messages to process B, they arrive in the order
they were sent. Actors don't have this restriction. (Claim: this
makes programming easier.)
- Erlang messages are not guaranteed to be delivered -- if you want to
be sure, get a receipt.
- Erlang processes are internally sequential; Actors may have some
internal parallelism (for example, becoming and sending
simultaneously).
- In the actor model, no global information. In erlang, you can
"register" a process to a global name, and you can create globally
unique tokens (called refs).
Dining philosophers:
In CSP:
FORK =
*[ phil(i) ? pickup() -> phil(i) ? putdown()
[]
phil((i-1) mod 5) ? pickup() -> phil((i-1) mod 5) ? putdown()
]
ROOM =
occupancy: integer;
occupancy := 0;
*[
(i: 0..4) occupancy < 4; phil(i) ? enter() ->
occupancy := occupancy + 1
[]
(i: 0..4) phil(i) ? exit() ->
occupancy := occupancy - 1
]
PHIL ==
*[ true ->
THINK;
room ! enter();
fork(i) ! pickup();
fork((i+1) mod 5) ! pickup();
EAT;
fork(i) ! putdown();
fork((i+1) mod 5) ! putdown();
room ! exit();
]
fork(Left, Right) ->
receive
{pickup, Left} ->
receive {putdown, Left} -> fork(Left, Right) end;
{pickup, Right} ->
receive {putdown, Right} -> fork(Left, Right) end
end.
room(Occupants, Capacity) ->
receive
{knock, Whom} when Occupants < Capacity ->
Whom ! {enter, self()},
room(Occupants + 1, Capacity);
{leave, _} ->
room(Occupants - 1, Capacity)
end.
philosopher(Name, Room, LFork, RFork) ->
think(Name),
Room ! {knock, self()},
receive {enter, Room} -> ok end,
LFork ! {pickup, self()},
RFork ! {pickup, self()},
receive
{a_fork, LFork} ->
receive {a_fork, RFork} -> ok end;
{a_fork, RFork} ->
receive {a_fork, LFork} -> ok end
end.
eat(Name),
LFork ! {putdown, self()},
RFork ! {putdown, self()},
Room ! {leave, self()},
philosopher(Name, Room, LFork, RFork).
fork_setup() ->
receive
{left, Left} ->
receive
{right, Right} ->
fork(Left, Right)
end
end.
new_philosopher(N, Room, Left, Right) ->
Phil = spawn_link(fun () -> philosopher(N, Room, Left, Right) end),
Left ! {right, Phil},
Right ! {left, Phil}.
start(N) ->
spawn(fun () ->
Room = spawn_link(fun () -> room(0, N - 1) end),
Fork = spawn_link(fun fork_setup/0),
setup(N, Room, Fork, Fork)
end).
setup(N, Room, Left, Last) when N > 1 ->
Right = spawn_link(fun fork_setup/0),
new_philosopher(N, Room, Left, Right),
setup(N - 1, Room, Right, Last);
setup(1, Room, Left, Right) ->
new_philosopher(1, Room, Left, Right),
receive
_ -> exit({done})
end.
(Spawn_link/link: When a process exits, its linked processes receive a
signal. When an internode connection goes does, linked processes
receive an exit signal with reason 'noconnection'. Exit/2 can fake an
exit.)
SCHEDULING
Fairness:
- processes can be waiting, runnable, or running.
- processes blocked on receive are waiting; others are runnable, and
one is running.
- small time slices; when running process runs out of slice, marked
runnable
- goal is to schedule all runnable processes in the order they became
runnable, subject to priority constraints
Priorities:
- priorities: high, normal, low.
- high processes are scheduled fairly, in preference to others.
- normal (or low) processes are scheduled fairly if there are only
normal (or low) processes runnable.
- when mix of normal/low, normal get scheduled /normaladvantage/(=8)
times as much as low.
CULTURE & CRITIQUE
Recommended programming techniques
Many are standard (non-Erlang-specific):
- Use libraries
- Reduce interdependency
- Don't optimize
A few are interesting!
Abstract out concurrency.
- Structure the system so 95% client code and 5% concurrency code written
by experts.
- A few patterns cover a whole lot of stuff:
- client/server
- worker/supervisor
- event-handler
- upgrade-handler
- keep alive
- hot/standby
- The OTP library provides "behaviours", essentially higher-order
implementations of patterns.
- E.g. gen_server (we could use it to write our bank server)
handle_call({deposit, Amount}, _, Balance) ->
{reply, {deposit_receipt, Amount}, Balance + Amount};
handle_call({balance}, _, Balance) ->
{reply, {balance, Balance}, Balance};
handle_call({withdrawal, Amount}, _, Balance) when Amount > Balance ->
{reply, {overdraft}, Balance);
handle_call({withdrawal, Amount}, _, Balance) ->
{reply, {withdrawal_receipt, Amount}, Balance - Amount).
- In AXD301, behavior instances:
- gen_server: 122
- gen_event: 36
- supervisor: 20
- gen_fsm: 10
- In BRM/ASA:
- gen_server: 56
- gen_event: 9
- supervisor: 19
- gen_fsm: 1
- More generally, considering dirty code (code with side effects):
- AXD301: 776/1472 dirty modules
- : 4090/57412 dirty functions
- BRM/ASA: 132/253 dirty modules
- : 610/6876 dirty functions
- Observation: dirtiness isn't well partitioned
Error handling:
- Separate error cases from normal cases -- generally good advice
- In Erlang: handle errors in a separate process
- Processes should have only one purpose
- Identify the error kernel: small portion of your code that you need
to be bug free.
- Armstrong's expectation: structure processes in a worker/supervisor
tree.
- Reality: cascading restarts don't work -- supervision hierarchies are
really flat.
Processes and messages:
- Assign exactly one process to each concurrent activity in the system
- Reality: too expensive for AXD301 (120 calls/second * 6 processes) +
100k monitoring processes
- Flush unknown messages (or you have space problems)
- Can processes leak?
- Encapsulate message passing into functions
- Reality: then you can't parallelize RPCs
How well does this work?
- No one knows -- only anecdotes, no evidence
- Logs weren't design to keep track of non-events (e.g. successful
upgrades)
- There are very few events
Pi-calculus?
REFERENCES
Joe Armstrong. Making reliable systems in the presence of software
errors. Ph.D. thesis.
----. Concurrency oriented programming in Erlang. Invited talk at the
Lightweight Languages Workshop 2002 (LL2).
Thomas Noll and Chanchal Kumar Roy. Modeling Erlang in the Pi-Calculus.
Erlang Workshop '05.
Erlang specification 4.7.
Erlang Reference Manual and implementation.
Programming rules and conventions.
http://www.erlang.se/doc/programming_rules.shtml.
Home