Erlang hot swapping

I started using the Erlang programming language a few months ago. Erlang is a wonderful language if you want to build reliable, distributed, highly concurrent, fault-tolerant, soft real-time, highly available, hot-swappable applications :-).

So, today I will give you a glimpse of one cool feature of Erlang: the ability to update your code without stopping your application. Here I only cover how it works for a single process (the principle is the same for a whole application, except that you have many more things to take into account when several thousand processes are talking to each other).

The code

We are going to use a simple piece of code. We want a server process listening for messages (each message is printed on the console with a sequence number), and a client process sending a message to the server every second. Here is the code (I usually don't write comments, but I put some here because you may not be familiar with Erlang code):

code_reload.erl

-module(code_reload). % This is the module declaration.
-export([start_server/0, start_client/1]). % Exporting functions allows them to be called from outside the module.
-export([server_loop/1, client_loop/1]).

start_server() ->
    spawn(?MODULE, server_loop, [0]). % Run server_loop(0) in a new process and return its pid.

server_loop(Count) ->
    receive % Wait for messages
        {From, quit} ->
            io:fwrite("Received quit command from ~p~n", [From]),
            ok;
        {From, Message} ->
            io:fwrite("~p: Server ~p received message ~p from ~p~n", [Count, self(), Message, From]), % Display the message we received.
            ?MODULE:server_loop(Count); % Call the same function again to wait for another message.
        _ ->
            throw(unexpected_message)
    end.

start_client(ServerPid) ->
    spawn(?MODULE, client_loop, [ServerPid]).

client_loop(ServerPid) ->
    receive
        {From, quit} ->
            io:fwrite("Received quit command from ~p~n", [From]),
            ok
    after 1000 -> % If no message was received after 1 second, send a message to the server.
            % now() is deprecated since OTP 18; erlang:timestamp() returns the same kind of tuple.
            ServerPid ! {self(), now()},
            ?MODULE:client_loop(ServerPid)
    end.

Let's start an Erlang shell:

$ erl
Erlang (BEAM) emulator version 5.6.2 [source] [smp:2] [async-threads:0] [kernel-poll:false]

Eshell V5.6.2  (abort with ^G)
1> c(code_reload). % This compiles and loads the module code_reload.
{ok,code_reload}
2> ServerPid = code_reload:start_server(). % Start the server and assign the server process id to the variable ServerPid.
<0.65.0>
3> ClientPid = code_reload:start_client(ServerPid).
<0.67.0>
0: Server <0.65.0> received message {1211,11183,231032} from <0.67.0>
0: Server <0.65.0> received message {1211,11184,232031} from <0.67.0>
0: Server <0.65.0> received message {1211,11185,233029} from <0.67.0>
0: Server <0.65.0> received message {1211,11186,234031} from <0.67.0>
0: Server <0.65.0> received message {1211,11187,235030} from <0.67.0>

We can see the server printing the messages. But oops, there is a bug! I forgot to increment the counter, so each line starts with "0" instead of an incrementing number.

Let's fix the recursive call at the end of the {From, Message} clause of server_loop so that it reads:

?MODULE:server_loop(Count + 1);

We need to go back to the shell and compile the new code:

4> c(code_reload).
{ok,code_reload}
0: Server <0.65.0> received message {1211,11188,236023} from <0.67.0>
0: Server <0.65.0> received message {1211,11189,237031} from <0.67.0>
1: Server <0.65.0> received message {1211,11190,238021} from <0.67.0>
2: Server <0.65.0> received message {1211,11191,239031} from <0.67.0>
3: Server <0.65.0> received message {1211,11192,240018} from <0.67.0>
4: Server <0.65.0> received message {1211,11193,241022} from <0.67.0>
5: Server <0.65.0> received message {1211,11194,242021} from <0.67.0>
6: Server <0.65.0> received message {1211,11195,243031} from <0.67.0>

Ah! That's better, the message number is growing as expected. Notice that the processes are still the same (<0.65.0> for the server and <0.67.0> for the client).

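There is no black magic here. The main part of the code is a loop: server_loop calls itself with ?MODULE:server_loop(…). When the new code is loaded, the process keeps running the old version until it makes that fully qualified call, which always goes to the newest loaded version of the module. This is also why two more lines start with "0" after compiling: the first is printed by the old code, which was already waiting in receive and then makes the call with the unchanged Count; the second is printed by the new code, which only then starts incrementing.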
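To make the old and new versions a bit more concrete, here is a minimal sketch of doing by hand roughly what c/1 does in the shell: compile, drop any leftover old version, then load. The module name reload_steps and the function reload/1 are my own invention for this example. It uses code:soft_purge/1 rather than code:purge/1, so it refuses to reload while some process is still running old code instead of killing that process.

```erlang
-module(reload_steps).  % hypothetical helper module, not part of the original post
-export([reload/1]).

%% Compile Module.erl from the current directory and load the result.
%% Returns {module, Module} on success, or an {error, Reason} tuple.
reload(Module) when is_atom(Module) ->
    case compile:file(Module) of
        {ok, Module} ->
            %% Erlang keeps at most two versions of a module (current and old).
            %% soft_purge removes the old one only if no process still runs it.
            case code:soft_purge(Module) of
                true  -> code:load_file(Module);
                false -> {error, old_code_in_use}
            end;
        Other ->
            {error, {compile_failed, Other}}
    end.
```

With the example from this post, reload_steps:reload(code_reload) would take the place of c(code_reload). Right after a reload, erlang:check_process_code(ServerPid, code_reload) should return true until the server handles its next message and jumps into the new version.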

Be careful: this happens only because we call the function with its module name, ?MODULE:server_loop(…) (code_reload:server_loop(…) also works). Hot upgrade does not happen if we call server_loop(…) directly; in that case the process stays in the code version it is already running. This lets you control where and when code reloading is done.
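The difference between the two kinds of call can be used on purpose. Here is a small sketch, assuming a made-up module switch_demo and a made-up upgrade message: the loop normally uses a local call, so it stays in its current version, and it only jumps to the newest version when it is explicitly told to.

```erlang
-module(switch_demo).  % hypothetical module name, used only for this sketch
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

loop(Count) ->
    receive
        upgrade ->
            %% Fully qualified call: continue in the newest loaded version.
            ?MODULE:loop(Count);
        {From, quit} ->
            io:fwrite("Received quit command from ~p~n", [From]),
            ok;
        {From, Message} ->
            io:fwrite("~p: received ~p from ~p~n", [Count, Message, From]),
            %% Local call: stay in the version this process already runs.
            loop(Count + 1)
    end.
```

After recompiling with c(switch_demo), a running process keeps the old behaviour until you send it Pid ! upgrade. One caveat: Erlang keeps only two versions of a module, so if you load the module a second time while the process is still in the old version, that process is killed.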

Now we can send a quit message to the client and the server:

5> ClientPid ! {self(), quit}.
Received quit command from <0.30.0>
{<0.30.0>,quit}
6> ServerPid ! {self(), quit}.
Received quit command from <0.76.0>
{<0.76.0>,quit}
