Desgrange.net


Monday, February 22, 2010

Login/Password autocomplete

One feature I really like in most web browsers is their ability to “remember” my login and password for a given website. I like the way Firefox does it: a non-intrusive notification bar appears at the top of the page while the next page is loading:

Save credentials

(I don’t like the way Safari does it, with a modal window: I have to decide whether to save my credentials before I know that I entered the right ones.)

I really like this feature for several reasons. I think it’s more secure than having to type my password. Some people may disagree, but to me, this feature is what lets me set a different password on every website where I have an account. Without it I would use the same password everywhere, which is a very bad idea: if somebody finds out what my password is, they get access to all my accounts. And finding it out is much easier than you may think:

  • lots of websites do not run the login procedure over a secure connection, so intercepting the data is not that difficult, especially on unsecured or free WiFi access points
  • lots of websites/companies store your password in plain text in their databases, so almost anybody who has worked at that company at some point may see your password (and I’m not kidding, I’ve seen that myself several times; if I were a bad guy, which I’m not by the way ;-), I would already own thousands of emails/logins/passwords)

Another reason why I think the web browser’s password manager is more secure: if at some point malware installs a key-logger on your computer (which is not unusual on Windows computers), each time you type your password is one more chance for the key-logger to record it.

And of course, web browsers save your passwords in an encrypted file (and not in a plain text file as some people do, which is not really secure either). To me, the biggest downside of this feature is that I can’t log in to a lot of websites when I’m not on my own computer, because I don’t remember my passwords.

So there is something I really don’t like when surfing the web: websites where, for some reason, Firefox/Safari does not offer to remember my password. Until recently I hadn’t checked why; I assumed that the login form was built in a way that web browsers did not recognize as a login form (maybe because of heavy use of JavaScript). But it looks like I was wrong, and that there are people stupid enough to call the ability of a website to prevent your web browser from storing your credentials a “feature”.

From what I have seen so far, several web browsers disable auto-completion and the password manager when the attribute autocomplete="off" is set on a form or input field. First of all: this attribute IS NOT STANDARD. It’s not part of any HTML/XHTML specification. It seems it was invented by Microsoft for Internet Explorer a long time ago (why do bad ideas always come from the same guys? ;-)).

There is a page on Mozilla’s developer website explaining how autocompletion works and how to turn it off, and a page on the autocomplete attribute on the MSDN website.

The second point is: OK, Internet Explorer has this stupid feature, but why have other web browsers implemented it too? The final decision has to be made by the user, not by some website manager who thinks he knows what you want better than you do.

The only case where I can see it being useful is that it also works for forms other than login forms. For login forms your web browser always asks you whether you want it to store your login and password in a secure place. For other forms, the web browser remembers everything, in a place that may not be secure, and without asking you anything, which might be quite bad when filling in a payment form with your credit card number. What would be useful here is a way to tell web browsers that some data in the form is sensitive (so the web browser may ask you whether it should remember that data, and if so put it in a secure place).

If you have followed me this far, my point is: the autocomplete attribute sucks, it does not solve any problem, and it annoys me.

How to make those broken websites behave correctly again?

Several possibilities:

  • use a web browser that does not understand the autocomplete attribute (I don’t know which ones)
  • if you are using an open source web browser that supports that attribute, remove the support from the sources, compile, enjoy (that’s one of the freedoms of open source)
  • if you use Firefox, use Greasemonkey

The first time I heard about Greasemonkey was several years ago, but for some reason my neurons did not connect at that time and I did not realize the power of this Firefox plugin until I saw Paul’s demo at FOSDEM. Since then, I love that plugin. Simply said, this plugin allows you to fix websites :-). The first thing I did after installing it was to fix my bank’s website, which was forbidding me from going straight to the login page and was also forcing me to open the login page in another tab/window. Greasemonkey allowed me to fix that with one line of code (really only one line, and a simple one in that case). Greasemonkey also has a lot of user-contributed scripts for many websites (among them I found one fixing the download links on Jamendo, so I can download the OGG Vorbis version of an album directly, which is not possible from the website, and without opening a stupid download window).

Something I love, on the scripts website, is the following sentence, at the bottom of the website: “Because it’s your web”.

How to fix the autocomplete attribute with Greasemonkey? My first try was with my company’s Outlook Web Access (yes, unfortunately there are some people/companies actually paying for that), and guess what? Somebody had already written a script for that: Allow Browser To Save Outlook Web Access Password.

So I was wondering: “do I have to write a script for every website that uses autocomplete="off"?” A few minutes later I ended up with this script (note: I don’t know JavaScript at all, any comments to improve this script are welcome):

// ==UserScript==
// @name           Turn ON autocompletion
// @namespace      http://desgrange.net
// @include        *
// ==/UserScript==
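(function() {
	// Flip an explicit autocomplete attribute (typically "off") back to "on".
	function turnAutocompleteOn(element) {
		if(element.hasAttribute('autocomplete')) {
			element.setAttribute('autocomplete', 'on');
		}
	}

	// Iterate by index: for...in may also visit non-element properties
	// such as "length", which have no hasAttribute() method.
	var forms = document.forms;
	for(var i=0; i<forms.length; i++) {
		turnAutocompleteOn(forms[i]);
	}

	var inputs = document.getElementsByTagName('input');
	for(var j=0; j<inputs.length; j++) {
		turnAutocompleteOn(inputs.item(j));
	}
})();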

It’s a bit brutal: on every page you visit, it looks for all forms and all input tags that have the autocomplete attribute and sets it to “on”. I don’t know how often this autocomplete attribute is used, so I don’t yet know the side effects of doing that on every page (that’s why I have not put this script on http://userscripts.org yet).

Monday, February 15, 2010

Logging in Weblogic console with Log4J

If you have developed a JEE web application using Log4J for logging and have it deployed on a WebLogic application server, you may wonder how to display the logs in WebLogic console:

WebLogic Console

Preparation

You simply need to create and add a Log4J appender. This appender will redirect Log4J events to WebLogic by using the NonCatalogLogger class. You can find this class in wls-api.jar or wlclient.jar (depending on your WebLogic version) in your WebLogic’s lib directory. For instance, if you are using Maven, you need to add one of the following dependencies to your pom.xml (enter the version corresponding to your WebLogic installation):

    <dependency>
        <groupId>weblogic</groupId>
        <artifactId>wlclient</artifactId>
        <version>10.3</version>
        <scope>provided</scope>
    </dependency>

or

    <dependency>
        <groupId>weblogic</groupId>
        <artifactId>wls-api</artifactId>
        <version>10.0</version>
        <scope>provided</scope>
    </dependency>

Obviously WebLogic JARs are not in the official Maven repositories (due to license/distribution restrictions; proprietary software is always there to hassle you). So type the following command in your shell to add the API to your local Maven repository:

mvn install:install-file -DgroupId=weblogic -DartifactId=wlclient -Dversion=10.3 -Dpackaging=jar -Dfile=wlclient.jar

or

mvn install:install-file -DgroupId=weblogic -DartifactId=wls-api -Dversion=10.0 -Dpackaging=jar -Dfile=wls-api.jar

Creating the appender

The appender needs to implement Log4J’s Appender interface, but it’s more convenient to extend AppenderSkeleton. WebLogic’s NonCatalogLogger class has “debug”, “info”… methods like Log4J, so the appender just maps one to the other.

Since you may deploy your application on something other than WebLogic (for instance I usually use Tomcat and/or Jetty for development/testing), you don’t want the appender to crash your application because the WebLogic classes are missing. The appender can check whether the class is on the classpath (using Class.forName()) and do nothing if NonCatalogLogger is not there.

WebLogic’s console has a “Subsystem” column; we can set it in the appender to display the application name.

package sample.project;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

import weblogic.logging.NonCatalogLogger;

public class WeblogicAppender extends AppenderSkeleton {
    private static final String SUBSYSTEM = "SampleProject";
    private NonCatalogLogger logger;

    public WeblogicAppender() {
        try {
            Class.forName("weblogic.logging.NonCatalogLogger");
            logger = new NonCatalogLogger(SUBSYSTEM);
        } catch (ClassNotFoundException e) {
            // Not running on WebLogic server.
        }
    }

    @Override
    protected void append(LoggingEvent event) {
        if (logger == null) {
            return;
        }
        if (Level.TRACE.equals(event.getLevel())) {
            logger.trace(getMessage(event), getThrowable(event));
        } else if (Level.DEBUG.equals(event.getLevel())) {
            logger.debug(getMessage(event), getThrowable(event));
        } else if (Level.INFO.equals(event.getLevel())) {
            logger.info(getMessage(event), getThrowable(event));
        } else if (Level.WARN.equals(event.getLevel())) {
            logger.warning(getMessage(event), getThrowable(event));
        } else if (Level.ERROR.equals(event.getLevel())) {
            logger.error(getMessage(event), getThrowable(event));
        } else if (Level.FATAL.equals(event.getLevel())) {
            logger.critical(getMessage(event), getThrowable(event));
        }
    }

    @Override
    public void close() {
        // Nothing to do here.
    }

    @Override
    public boolean requiresLayout() {
        return false;
    }

    private String getMessage(LoggingEvent event) {
        return String.valueOf(event.getMessage());
    }

    private Throwable getThrowable(LoggingEvent event) {
        if (event.getThrowableInformation() != null) {
            return event.getThrowableInformation().getThrowable();
        } else {
            return null;
        }
    }
}

Log4J configuration

In your log4j.xml just define a new appender using the above class and add it to the root logger. Here is an example:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
	<appender name="console" class="org.apache.log4j.ConsoleAppender">
		<param name="Target" value="System.out" />
		<layout class="org.apache.log4j.PatternLayout">
			<param name="ConversionPattern" value="%-5p %c{1} - %m%n" />
		</layout>
	</appender>

	<appender name="weblogic" class="sample.project.WeblogicAppender" />

	<root>
		<priority value="debug" />
		<appender-ref ref="console" />
		<appender-ref ref="weblogic" />
	</root>
</log4j:configuration>

Check

I have created a simple servlet logging some messages in debug, info and warn levels, let’s call it and see what happens:

WebLogic Logs

It works! (OK, I was not going to write all that just to show something that does not work ;-)). But be careful, as you can see, the debug message is not displayed even after setting Log4J’s level to debug. My WebLogic console is configured to display higher level messages. Don’t forget to check that it’s logging at the right level for you.

Notes

All source code in this post is in the public domain, do whatever you want with it. I wrote it quickly so it may not work the way you want; you are strongly advised to adapt it to your needs. I’m no WebLogic expert (in fact I don’t really like that application server) so this may not be the best way to do it (and usually you should not need to do such things), and it is not certified in any way to be “production ready”.

Monday, February 8, 2010

Post FOSDEM 2010

So last weekend I was at FOSDEM. First, it’s quite huge. Several thousand geeks in one location (if somebody hates open source, it’s the place to drop a bomb, lots of projects might die afterward (I’m just saying that it’s not so cool from a risk management point of view, I’m not encouraging anybody to do such a bad thing ;-))). A lot of smart people, good ideas, interesting stuff to see and hear, and free WiFi everywhere (such a thing would be illegal in France nowadays :-\).

As usual with conferences, goodies review. What do we get:

  • conference program (on paper)
  • a bag (made from biodegradable material)

That’s all good to me. Quite eco-friendly, and nothing unnecessary.

Here is the list of sessions I attended:

Welcome (FOSDEM Staff)

A quick history of FOSDEM and, of course, the FOSDEM dance.

Promoting Open Source Methods at a Large Company (Brooks Davis)

Brooks Davis told us how they managed to bring some of the open source way of working into a big aerospace company. I find it incredible that in an aerospace company (or any company doing software, for that matter) some developers are still not using any version control system.

Evil on the Internet (Richard Clayton)

Quick presentation of what “bad people” are doing on the internet and how it works, with live examples of phishing/fake banks/fake escrow websites.

Visit the AA419 website for more information.

Mozilla Europe/Mozilla Foundation (Tristan Nitot/Gervase Markham)

Some info on current status and future stuff at Mozilla. Some discussion about the ballot screen.

Personal note: the ballot screen will appear for every Windows XP/Vista/7 user who does not have any other web browser installed. This is a decision of the European Commission imposed on Microsoft. But what about people working at the European Commission? Are they going to see the ballot screen on their computers? Obviously, like in any company managing its computers, this is going to be blocked in order to keep the “homogeneity” and ease of system administration. Guess what? I am working at the EC (as an external contractor). Since I’m a developer, I can install whatever I want^H^H^H^Hneed on my work computer so I don’t have the problem (and in fact, since I’m doing a bit of web development, I have all major web browsers installed). Anyway, I will see if my colleagues get some choice for their web browser.

FLOSS: a key to self-determination in Internet life (Mitchell Baker)

OK, I can’t really summarize but it was interesting. Free and open source software has values, freedom-related ones (at least). To some extent we can see those values in how the internet has been built, and we need to be sure that those values are still going to drive the future of the internet and even take a more prominent place.

Hackability (Tristan Nitot/Paul Rouget)

Do you want the internet to be a place only for for-profit companies to sell you their products? I hope not (if you do, what the hell are you doing here?). An important thing that will prevent that is making sure that the internet is hackable. That means we can do what we want with it, even if it was not designed for that.

I would like to give an example of a product that is hackable by design: a Lego box. When you buy a Lego box, it’s shipped with a manual with one or two (sometimes more) models that Lego thinks you might want to build. But obviously, it’s for fun, and Lego does not forbid you to do anything else with it; on the contrary, they encourage you to do stuff they didn’t think you could do with it… and it’s quite normal, since Lego bricks are made to build whatever you want.

On the internet it’s quite the same. You have bricks. Different kinds of bricks: versatile ones (bits and bytes), on top of which people have created more complex bricks (HTML, HTTP, SMTP, IMAP, XMPP, XML, CSS, JavaScript…) allowing you to do all kinds of things. But some things do not follow that concept. Take Flash for instance: here you get the Lego box already assembled and you can’t take it apart; you can play with it a bit, but not that much.

Paul did a demo showing that the web is hackable (changing the UI of a website and, with the help of Firefox/Greasemonkey, changing how you interact with the website) and that Firefox is hackable (switching from one tab to another by shaking his Wiimote!).

HTML 5 (Paul Rouget)

The “theoretical” part of the presentation was done by someone else but I don’t have his name (sorry). Anyway, since most of the stuff I have developed so far has been web applications, I was quite interested in this presentation (and of course because I have been too lazy to check by myself what’s new in HTML 5).

HTML 5 syntax: very pragmatic. HTML has been slaughtered on so many web pages that web browsers are now very good at understanding the un-understandable. So of course, instead of imposing a strict syntax (as XML-based stuff usually requires) that nobody is going to apply, HTML 5 is quite “user friendly” (in the sense that you can type whatever you want and it’s going to work: uppercase, lowercase, it doesn’t care; you don’t close your tags? not a problem…). I think web browsers (except IE of course ;-)) are the perfect example of “be strict in what you send, but generous in what you receive”.

Anyway, lots of new tags like header, footer, aside and, of course, video and canvas.

Paul did an amazing demo with a “simple webpage” turning out to be an interactive presentation with CSS transitions, video playing, 2D transformations, 3D ones… impressive.

Amarok 2.2 Rocking (Sven Krohlas)

I was an Amarok user for a long time but since I switched to the Mac it’s not the case anymore (even though Amarok runs on Mac). Anyway, the moodbar is back!

I haven’t played a lot with Amarok 2.x, but I don’t feel very comfortable with the UI. In 2.2 it’s a bit better. Maybe a part of the problem is that I don’t like KDE’s default theme.

It was a conference on free (as in free speech) software, but software is not the only thing that can be free: there is also music. Go to Jamendo and listen to/download a bit of music; you might discover good music under Creative Commons licenses (I recommend Diablo Swing Orchestra and David TMX).

NoSQL for Fun & Profit (Tim Anglade)

A quick overview of what NoSQL is, no technical details, more a presentation for managers. Anyway, like lots of people I have suffered from SQL, for two reasons. First, it’s hard to find a project where the relational database is not badly used: an RDBMS can be very good at what it does (like PostgreSQL), but it still needs to be used correctly. Second, it was almost the only way of storing data that “managers” knew about. Who has never seen this kind of situation:

The manager: “On our new software we are going to use this programming language and that relational database.”

The developer: “I can understand that we need a programming language since we are going to write software, but we don’t need a relational database for it.”

The manager: “Of course we need a relational database, all software uses a relational database.”

The developer: “Well… no.”

The manager: “I’m the one deciding, you are only the mindless developer coding the stuff I ask so shut up.” (OK, maybe not that part)

Well anyway, NoSQL is a good way to make sure people know that we have choices in how we store data, and that some ways are better for some kinds of tasks and other ways are better for other kinds of tasks.

Mozmill (Henrik Skupin)

A quick presentation of Mozmill, a tool used to run automated functional tests on Mozilla products (Firefox, Thunderbird…). Each version of Firefox is in fact 225 versions of Firefox (75 languages on 3 platforms) and all of them should be tested. It looks like at Mozilla they are not really in the test-driven mindset (yet), and they are lacking tests. Wait… sorry, when I say tests, I always mean “automated tests”; it’s inhuman to make a person run a test suite manually, and unfortunately too many people are paid to do that. From what I understood, they have some manual test suites for Firefox and fortunately they are trying to automate them.

You can see the mozmill generated reports for Firefox here: http://brasstacks.mozilla.com/couchdb/mozmill/_design/reports/_list/summary/summary

Towards GNUstep GUI 1.0 (Fred Kiefer)

GNUstep has been in development for ages and there is still no 1.0 version. So the question was “do we need one and, if yes, what needs to be in it?” Obviously, the answer to the first part is “yes” (so it will attract more developers, *BSD and Linux distributions will update their packages…). The second part of the question was not really settled. One proposal was to name the version 10.2 and have it completely support Cocoa 10.2.

L20n (Axel Hecht)

I’m not a specialist in internationalization (i18n) and localization (l10n). I know some of the issues, but quite frankly, I didn’t really understand the presentation. It’s a bit clearer after a look at the l20n wiki. It sounds interesting to me, since I think the current way of doing things (key/value) sucks as soon as you have anything non-trivial to do.

Étoilé: Where it is, where it’s going, why it isn’t there yet (Quentin Mathé/David Chisnall)

What have they done since the beginning in 2004? This is a project with few people but lots of ideas. One thing I find interesting is the CoreObject framework. Well, in fact not the framework, but the ideas behind it. From a user’s point of view, having to save your documents sucks. Why is the default “in case of a problem you are going to lose all your unsaved work” and not “in case of a problem all your work is saved”? So here the idea is that everything you change in your document is recorded, so you can do/undo/redo modifications, close your document, open it again, ask to undo stuff you did before… the history of your changes to the document has been saved all along.

Such ideas are not new, we have been talking about them for decades (well, not me, I have only been talking about them for years, I’m not that old ;-)), but mainstream operating systems are still not implementing them.

Women and Mozilla (Delphine Lebédel)

Quick presentation of WoMoz.

Nepomuk (Sebastian Trüg)

Recent operating systems now index your data so it’s fast and easy to search for stuff on your computer. Nepomuk is a “semantic” way of doing so (using RDF and so on).

Several features are similar between Nepomuk and what I think Étoilé’s CoreObject does. But Nepomuk is based on “standards” like RDF and SPARQL.

Mozilla Panel Discussion (Mitchell Baker/Tristan Nitot/Mark Surman)

A discussion on Mozilla’s mission. Lots of questions about privacy. I confirm, Mozilla’s people have the right mindset (at least the mindset I like) and I’m glad that they are caring about the Internet.

Write and Submit your first Linux kernel Patch (Greg Kroah-Hartman)

A live example of what you need to do, and how to do it, to write a patch for the Linux kernel.

That’s all

There are several presentations I would have liked to go to, but we still have not invented a device giving us ubiquity.

I now have a lot more things to think about; I may write down some of my thoughts here soon.

Anyway, a big thank you to the FOSDEM staff for organizing all that, to all the speakers and finally to all the people attending the event.

Wednesday, February 3, 2010

FOSDEM 2010

FOSDEM (Free and Open Source Software Developers’ European Meeting) 2010 is happening this weekend (6-7 February) in… Brussels! At ULB, about 10 minutes on foot from home. So guess what? FOSDEM 2010

I never had the opportunity to attend FOSDEM before, so this time I’m not going to miss it.

I haven’t looked at the schedule seriously yet; there is a huge amount of stuff going on there, and it’s going to be hard to make choices. At least I have seen that Mozilla is presenting some stuff, so I hope I will be able to see Tristan at last (though I’m not sure it’s really fulfilling to listen to someone I always agree with (well, I’ve been reading his blog for several years now and I don’t really remember disagreeing with something)).

This reminds me I should find a way to start sharing with the community. I have been using open source software for years in my day-to-day work. What frustrates me the most is every time I struggle with proprietary software (which I tend to avoid) while knowing that the same problem with open source software would have been solved much more easily (thanks to the help of the community and the availability of the source code).

So, are you coming?

Monday, October 12, 2009

CITCON Paris 2009: Mock objects

Interfaces

During the session on mock objects there was a digression about interfaces. Too often I have seen interfaces used in a way that I don’t like. I will use the same example as Eric:

Let’s say that you have a FileManager, providing some services to manage files I suppose ;-), you may have an interface called IFileManager. And usually there is only one implementation of IFileManager which is FileManager.

I think this is wrong for at least two reasons:

  • Usefulness. If there is only one implementation, why do you need an interface?
  • Naming. The interface name should represent the “role”, so FileManager is suitable for the interface name, IFileManager has no meaning. Then the implementation should reflect what kind of implementation you have, like LocalFileManager, DistributedFileManager or a DummyFileManager for your tests (but not an ugly FileManagerImpl).

So usually, when I see software with that kind of one-to-one relationship between interface and implementation, and with bad names, it raises a warning light in my head, telling me that the person who wrote that code did not really know what they were doing (only applying some old and bad coding rules without trying to understand what they were useful for). As Antonio says, the ‘I’ prefix for interfaces and the ‘Impl’ suffix for implementations are code smells.
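As a rough sketch of the naming point above: FileManager, LocalFileManager and DummyFileManager are the names used in the list, while the read/write methods and the ReportWriter client are hypothetical, invented only for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// The interface is named after the role, with no 'I' prefix.
// (The two methods are assumptions made for the example.)
interface FileManager {
    void write(String path, String content);
    String read(String path);
}

// Each implementation is named after what makes it specific.
class LocalFileManager implements FileManager {
    public void write(String path, String content) {
        try {
            Files.write(Paths.get(path), content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public String read(String path) {
        try {
            return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// In-memory stand-in for tests: a second implementation of the same role.
class DummyFileManager implements FileManager {
    private final Map<String, String> files = new HashMap<String, String>();

    public void write(String path, String content) {
        files.put(path, content);
    }

    public String read(String path) {
        return files.get(path);
    }
}

// Hypothetical client: it depends on the role only, so either
// implementation can be handed to it.
class ReportWriter {
    private final FileManager fileManager;

    ReportWriter(FileManager fileManager) {
        this.fileManager = fileManager;
    }

    void save(String name, String body) {
        fileManager.write("reports/" + name, body);
    }
}
```

With two implementations the interface earns its place; with only LocalFileManager, ReportWriter could just as well take the concrete class.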

I have even seen some interfaces with only one or two methods while the implementation had a lot more… and the concrete class was used directly in other classes… so yes, very useful interface :-/.

Sometimes, when writing tests, I need to mock some classes that I haven’t defined any interface for… and since several mock libraries are able to mock concrete classes, I still do not extract any interface.

I like simple classes, with simple roles, so mostly all public methods (except constructor and setters) are the “implied” interface.

So my point on interfaces is “use an interface only when you really need it” (that reminds me of YAGNI):

  • when you need several implementations of a given “role”,
  • when defining some “ability” (sorry, I can’t find the right term) like Cloneable, Closeable, Comparable, Serializable, Anything-able (if you can add “able” at the end, it’s a good sign that you might be able to extract an interface for that ;-)).

Mock objects

So yes, we also spoke about mock objects. Steve Freeman was trying to explain some stuff to us; I have the feeling that there was something enlightening in his speech but I didn’t really get it (that’s why it’s only a feeling for the moment).

What I remember is that, when writing tests:

  • mock the collaborating classes that change the outside world,
  • use stubs, dummy implementations, etc. otherwise.

I don’t fully understand the reason yet. But something I learned recently, and that was said during the session: mock only the code you own, don’t mock external resources.

So for instance, if you have a Customer object and a table full of customers in your database, don’t try to mock JDBC classes like Connection, ResultSet and so on. Create a class accessing the data, let’s say CustomerDAO (I don’t like the name, but hey, it’s only an example), and then you can mock your CustomerDAO in your software.
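To make this concrete, here is a minimal sketch that uses a hand-written test double instead of a mock library, only to stay self-contained. CustomerDAO and Customer come from the paragraph above; the fields, findByName, GreetingService and FakeCustomerDAO are hypothetical names invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class Customer {
    final String name;
    final String email;

    Customer(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

// Code we own: the only place that would talk to JDBC.
// The database access itself is deliberately left out of this sketch.
class CustomerDAO {
    Customer findByName(String name) {
        throw new UnsupportedOperationException("needs a real database");
    }
}

// Business code depends on CustomerDAO, never on Connection or ResultSet.
class GreetingService {
    private final CustomerDAO customers;

    GreetingService(CustomerDAO customers) {
        this.customers = customers;
    }

    String greet(String name) {
        Customer c = customers.findByName(name);
        return c == null ? "Hello, stranger" : "Hello, " + c.name + " <" + c.email + ">";
    }
}

// Test double for the class we own; a mock library could play this role.
class FakeCustomerDAO extends CustomerDAO {
    private final Map<String, Customer> data = new HashMap<String, Customer>();
    int lookups = 0; // lets a test check how the collaborator was used

    void add(Customer c) {
        data.put(c.name, c);
    }

    @Override
    Customer findByName(String name) {
        lookups++;
        return data.get(name);
    }
}
```

A unit test can then build a GreetingService around a FakeCustomerDAO and never open a database connection; the real CustomerDAO is left to integration tests.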

I imagine that CustomerDAO will then be tested in integration tests (it’s a class using external software/server/stuff right? Can’t really unit test it (except maybe some data storage specific logic I may have to write in it)).

Anyway, it was an interesting session.

Misc

Books recommended during the session:

Frameworks:

  • jMock (the framework I usually use)
  • EasyMock
  • Mockito (more recent, I started using it a bit at work a week ago, looks quite nice)

Monday, January 19, 2009

Do developers need to work on fast or slow computers?

From time to time I see this question come up: when a developer is writing software, does he need a fast or a slow computer?

Why work on a slow computer?

  • If a developer works on a slow computer, he tends to be careful about how fast the code he is writing runs. So the software produced is fast and is more testable on the developer's computer.

Why work on a fast computer?

  • The developer can open all the resources he needs without having to close some programs because the computer cannot handle the load (I remember a colleague having 2 Internet Explorer windows open, 3 Firefox windows with 15 tabs open in each of them, 3 IntelliJ instances, a web server and a database, all running together on his computer).
  • Working on a slow computer is not the same thing as running the product under heavy load.
  • The tools the developer is using may not be "optimized" and run slowly on a slow computer or a computer without much memory. In general, everything is slow, which increases the developer's waiting time.
  • In case of failure/delay the developer can't use the argument of having a slow computer ;-).

When writing software, what matters to a lot of developers is the duration of the compile/run/test cycle. The shorter it is, the faster the developer can see the result of his work and get feedback on it. On a slow computer, this cycle is longer, so the developer tends to write more code before checking that it works; when a test fails, the modifications may be too large to easily find out what's wrong, leading to a big loss of time.

Depending on the technologies you are working with, tools may exist to distribute some computation, like compiling, across computers available on your network (hoping that other developers are not compiling at the same time). But compiling may not be the longest task. I worked on a project where compile time was almost nonexistent (everything was compiled on the fly), running 700 unit tests took around 5 seconds, and running 200 functional tests took around 5 minutes. We didn't have the fastest computers but they were fast enough. Here the functional tests took most of the time; I think writing them so they could be distributed to other available computers would have been more complicated and might have taken too much effort (the environment was not designed to be distributed). And distributing the functional tests would have completely defeated the point of seeing how well the system worked under load.

So to me, giving a powerful computer to developers is not a big cost and may make their job easier. If you need to check how your software behaves under load or in a restricted environment (slow CPU, little memory), write appropriate tests (if this is a requirement, tests should have been written to check it).

I think the real reason fat and slow software gets written is that having small, fast software is quite often not a requirement (you are still doing the same things with the latest version of Microsoft Word on Windows Vista as you were doing with your old Word on Windows 95, you just need a far more powerful computer now). Most of us work for companies whose goal is selling products/licenses, and they need more and more features to sell. Selling a new version that is only "faster, with a smaller footprint" is not enough for the marketing guys; even worse, it amounts to admitting that the previous version was fat and slow (I know it's dumb, but that's what I have seen sometimes, depending on what kind of industry you are working in).

Monday, January 5 2009

Exceptions in Erlang

Exception

In a programming language, an exception is something that can be generated when the system leaves the normal execution path. An exception is mostly an error. In lots of programming languages, developers use exceptions as meaningful information to decide whether to do something or not.

For example, while reading a file, an exception can be generated because the file does not exist. The developer may choose to catch the exception and display a popup asking the user to choose another file. Or the developer may not catch the exception: the file should have been there, and if it is not, something is going wrong and the developer has no clue what to do about it, so the best solution is to let the system crash (as opposed to trying to do something and maybe entering an inconsistent state).

Exceptions in Erlang

In Erlang there are exceptions too.

-module(exceptions).
-compile([export_all]).

run() ->
    io:fwrite("Test exception 1 starting...~n"),
    exception1(),
    io:fwrite("Test exception 1 finished.~n"),
    ok.

exception1() ->
  erlang:foo().

Running the function run calls exception1, which throws an exception (the function foo does not exist in module erlang) that is not caught by run (so the second fwrite is not displayed):

$ erl -s exceptions run
Erlang (BEAM) emulator version 5.6.2 [source] [smp:2] [async-threads:0] [kernel-poll:false]

Test exception 1 starting...
{"init terminating in do_boot",{undef,[{erlang,foo,[]},{exceptions,run,0},{init,start_it,1},{init,start_em,1}]}}

Crash dump was written to: erl_crash.dump
init terminating in do_boot ()

Types of exceptions

In Erlang there are 3 kinds of exceptions that can be generated:

  • normal exceptions, user generated (throw(Reason))
  • errors, something is going really wrong, should not be caught (erlang:error(Reason))
  • exit, used to terminate the current process (exit(Reason))

Catching an exception

  • Catching an exception with catch:

run() ->
    io:fwrite("Test exception 1 starting...~n"),
    Result = (catch erlang:foo()),
    io:fwrite("Test exception 1 finished: ~p~n", [Result]),
    ok.

Calling run displays:

Erlang (BEAM) emulator version 5.6.2 [source] [smp:2] [async-threads:0] [kernel-poll:false]

Test exception 1 starting...
Test exception 1 finished: {'EXIT',
                               {undef,
                                   [{erlang,foo,[]},
                                    {exceptions,run,0},
                                    {init,start_it,1},
                                    {init,start_em,1}]}}

…and the process is still alive.

  • Catching an exception with try … catch. The full syntax is something like this:

    try erlang:foo() of
        Any ->
            Any
    catch
        error:Reason ->
            io:fwrite("Error reason: ~p~n", [Reason]);
        throw:Reason ->
            io:fwrite("Throw reason: ~p~n", [Reason]);
        exit:Reason ->
            io:fwrite("Exit reason: ~p~n", [Reason])
    after
        io:fwrite("Doing some stuff no matter what happened.~n")
    end.

try evaluates the given expression and returns its value (which can be pattern matched) if everything is OK. If an exception is generated, it goes to the matching catch clause. In either case, it then runs the after block (similar to finally in Java).

Running the previous code outputs:

Error reason: undef
Doing some stuff no matter what happened.

(Calling a function that does not exist raises an error)

Like everything in Erlang, try … catch returns a value: the value of the evaluated expression, the value of the matching of clause, or the value of the matching catch clause.
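
As a minimal sketch of try … catch used as an expression, the hypothetical safe_div/2 below (the module and function names are mine, not part of exceptions.erl) binds the value of the whole construct to a variable:

```erlang
-module(try_value).
-export([safe_div/2]).

%% Hypothetical helper: the value of the whole try ... catch
%% expression is bound to Result.
safe_div(A, B) ->
    Result = try A div B of
                 Q -> {ok, Q}                  % value of the matching 'of' clause
             catch
                 error:badarith ->
                     {error, division_by_zero} % value of the matching catch clause
             end,
    Result.
```

Here try_value:safe_div(10, 2) evaluates the of clause, while try_value:safe_div(1, 0) goes through the catch clause because integer division by zero raises a badarith error.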

Examples

Let's see more examples.

-module(exceptions).
-compile([export_all]).

run() ->
    run(1, no_exception, 'catch'),
    run(2, no_exception, 'try'),
    run(3, 'throw', 'catch'),
    run(4, 'throw', 'try'),
    run(5, 'exit', 'catch'),
    run(6, 'exit', 'try'),
    run(7, 'error', 'catch'),
    run(8, 'error', 'try'),
    ok.

run(ID, Exception_type, Handling_type) ->
    io:fwrite("~p) Generating ~p, handled with ~p.~n", [ID, Exception_type, Handling_type]),
    Fun = fun() -> exception(Exception_type) end,
    Result = execute(Handling_type, Fun),
    io:fwrite("~p) Result: ~p~n", [ID, Result]).

exception(no_exception) ->
    ok;
exception('throw') ->
    throw("Throwed exception");
exception('exit') ->
    exit("Exited");
exception('error') ->
    erlang:error("Error generated").

execute('catch', Fun) ->
    (catch Fun());
execute('try', Fun) ->
    try Fun()
    catch
        Error:Reason ->
            {Error, Reason}
    end.

Output:

1) Generating no_exception, handled with 'catch'.
1) Result: ok
2) Generating no_exception, handled with 'try'.
2) Result: ok
3) Generating throw, handled with 'catch'.
3) Result: "Throwed exception"
4) Generating throw, handled with 'try'.
4) Result: {throw,"Throwed exception"}
5) Generating exit, handled with 'catch'.
5) Result: {'EXIT',"Exited"}
6) Generating exit, handled with 'try'.
6) Result: {exit,"Exited"}
7) Generating error, handled with 'catch'.
7) Result: {'EXIT',{"Error generated",
                    [{exceptions,exception,1},
                     {exceptions,execute,2},
                     {exceptions,run,3},
                     {exceptions,run,0},
                     {init,start_it,1},
                     {init,start_em,1}]}}
8) Generating error, handled with 'try'.
8) Result: {error,"Error generated"}

An interesting thing we can see here is that, in the case of an error, catch gets a stack trace, which can be very useful for debugging, but try … catch does not.

I prefer the try … catch syntax (it is also the recommended way to catch exceptions, because you can choose which kinds of exceptions you want to catch, whereas catch catches everything), but it's regrettable that it does not return the stack trace.

You can call erlang:get_stacktrace() inside the catch clause: it returns the stack trace of the last exception raised in the calling process. It is a separate call, though, and it has to be made before anything else in the clause raises and catches another exception, otherwise the stored trace is replaced.

Having a stack trace is very useful but it makes things a bit slower. I made a simple benchmark:

bench() ->
    Throw_fun1 = fun(_) -> (catch exception('throw')) end,
    Error_fun1 = fun(_) -> (catch exception('error')) end,
    Throw_fun2 = fun(_) -> try exception('throw') catch Error:Reason -> {Error, Reason} end end,
    Error_fun2 = fun(_) -> try exception('error') catch Error:Reason -> {Error, Reason} end end,
    Seq = lists:seq(1, 100000),
    timer:sleep(1000),
    {Time_throw1, _} = timer:tc(lists, foreach, [Throw_fun1, Seq]),
    {Time_error1, _} = timer:tc(lists, foreach, [Error_fun1, Seq]),
    {Time_throw2, _} = timer:tc(lists, foreach, [Throw_fun2, Seq]),
    {Time_error2, _} = timer:tc(lists, foreach, [Error_fun2, Seq]),
    io:fwrite("Throw (catch): ~p micro seconds~n", [Time_throw1]),
    io:fwrite("Error (catch): ~p micro seconds~n", [Time_error1]),
    io:fwrite("Throw (try): ~p micro seconds~n", [Time_throw2]),
    io:fwrite("Error (try): ~p micro seconds~n", [Time_error2]),
    ok.

Results (for 100 000 calls):

Throw (catch): 73920 micro seconds
Error (catch): 169576 micro seconds
Throw (try): 64118 micro seconds
Error (try): 63125 micro seconds

With try … catch, throwing an error or an exception takes the same amount of time, but using catch on an error is roughly 2.7 times slower than using try … catch (169576 versus 63125 microseconds). My quick conclusion is that this is because catch generates a stack trace.

I would like to have only one way of catching exceptions (no catch, only try … catch) and a way to specify whether I want a stack trace or not in case of error. Maybe something like this:

try Fun()
catch
    Error:Reason:Stack ->
        {Error, Reason, Stack}
end.

If a catch clause expects 3 elements (Error, Reason, Stack), the compiler adds whatever is needed to call Fun with a stack trace. If there are only 2 elements (Error, Reason), it keeps the current behavior.
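
For what it's worth, Erlang/OTP 21 and later accept this kind of three-part pattern in try … catch clauses (written Class:Reason:Stacktrace), and erlang:get_stacktrace/0 was deprecated in that same release. A minimal sketch, assuming OTP 21 or newer; the module name stack_demo is hypothetical:

```erlang
-module(stack_demo).
-export([run/0]).

%% Assumes Erlang/OTP 21 or newer, where a catch clause may bind
%% the stack trace as a third element of the pattern.
run() ->
    try erlang:error("Error generated")
    catch
        Class:Reason:Stack ->
            %% Stack is a list of stack frames; its exact content
            %% depends on the OTP version and on where run/0 is called from.
            {Class, Reason, Stack}
    end.
```

The stack trace is only bound when the clause asks for it; a two-part Class:Reason pattern keeps working as before.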

Do we really need exceptions in Erlang?

Exceptions may be useful but we can achieve the same goal in Erlang in other ways. Lots of functions have a signature like: {ok, Value} | {error, Reason}.

Those functions have different outputs between the normal case and the exceptional case. Combined with case we get the same behavior as catching an exception. If we don't use case, we get a badmatch error.

case_way() ->
    Fun = fun(ok) -> {ok, "It works"};
             (nok) -> {error, "It does not work"}
          end,
    run_case(1, Fun, ok),
    run_case(2, Fun, nok),
    ok.

run_case(ID, Fun, Arg) ->
    case Fun(Arg) of
        {error, Reason} ->
            io:fwrite("~p) Error: ~p~n", [ID, Reason]);
        {ok, Value} ->
            io:fwrite("~p) Value: ~p~n", [ID, Value])
    end.

Running case_way:

1) Value: "It works"
2) Error: "It does not work"
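
The badmatch behavior mentioned above can be sketched as follows. The module badmatch_demo and its lookup/1 function are hypothetical stand-ins for any function returning {ok, Value} | {error, Reason}:

```erlang
-module(badmatch_demo).
-export([lookup/1, strict/1]).

%% Hypothetical function following the {ok, Value} | {error, Reason} convention.
lookup(known) -> {ok, 42};
lookup(_)     -> {error, not_found}.

%% No case here: the result is matched directly against {ok, Value}.
%% If lookup/1 returns {error, Reason}, the match fails and the process
%% gets a {badmatch, {error, Reason}} error.
strict(Key) ->
    {ok, Value} = lookup(Key),
    Value.
```

Calling badmatch_demo:strict(known) returns the value, while badmatch_demo:strict(unknown) raises a badmatch error, which is a cheap way of saying "I only handle the normal case, crash otherwise".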

Boons:

  • Much faster than exceptions (around 4 times faster according to my quick bench)
  • No specific syntax

Banes:

  • The normal case needs some encapsulation like {ok, Value} instead of returning Value directly (not doing so may lead to unknown states if the return value is not pattern matched for errors)
  • No convention (you can return {exit, Type, Reason} or anything else you want, so the developer needs to carefully read the documentation of each function before using it)

In Joe Armstrong's book, he says that people usually use {error, Reason} when an error occurs quite often and exceptions for less frequent errors. It makes sense, but I do not completely agree with Joe on that matter. When you are developing software for yourself (or your firm), you may know how your code is used, so you know whether an error occurs often or not. But when you are developing software for other developers, you don't know how they are going to use it, so it's more difficult to "predict" whether the error is going to occur often or not.

So I tend to prefer throwing exceptions (maybe I have used too much Java). I think the added syntax is necessary in order to keep things clear, homogeneous and easy to use. I would love to see a version of Erlang without functions having an {error, Reason} thing in their return signature (but something like throw(Reason) instead).

Complete source code associated with this post (you can do whatever you want with it): exceptions.erl

Monday, December 22 2008

Object Oriented Programming

Programming languages use different paradigms; nowadays the most widely used is the Object Oriented paradigm. Object Oriented Programming was an attempt to simplify the creation of software, which was becoming more and more complex.

The idea was to model the software with objects (a car for example) with some attributes (it has wheels, a motor, etc.) and some methods (start, move forward, etc.). Objects alone are not enough, they need to be able to communicate with each other by sending messages.

I have worked for several years with Java, a well known OOP language, and I'm now working with Erlang, a functional language.

A functional language has functions as first-class citizens. Functions should be stateless: they take some parameters and return a deterministic result. That does not seem enough to design complex software.

Another feature of Erlang is the concurrent paradigm. It allows the language to do a lot of tasks in parallel. At first, most people think of this feature as a great tool to distribute calculation and/or use all the cores of a modern CPU without effort. I thought the same thing. But this is not the aim of Erlang. Concurrent processes are used for software design. A process is a system, mostly representing a real-life system, sending messages to and receiving messages from other processes.

After using Erlang for a while (I'm not a very fast thinker, or maybe I just don't think too often ;-)), I realized that Erlang is the most Object Oriented Language I have ever seen.

It matches the definition of OOP more than any other language. Objects are processes (with an internal state) talking to each other by sending messages.

Why? Because Erlang allows you to have several thousand threads (objects) running simultaneously, where other languages allow you to have several thousand objects running in a few concurrent threads (so if you have X threads, only X objects are running simultaneously). In Erlang, objects are really independent from each other, while in languages like C++, C# or Java, objects are executed sequentially.

Interestingly, although they share the same base idea, those languages need a completely different approach. One of the big differences is that communication between objects in Erlang is done by passing messages, while it's done by method calls in traditional OOP languages.

In those languages we do not deal with multiple threads very often, and we tend not to have threads talking to each other because that needs locks and synchronization. Erlang deals with that very easily.
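
To make the "object as a process" idea concrete, here is a minimal sketch of a counter "object". The module name counter and its message shapes are assumptions chosen for illustration: the internal state lives in the argument of loop/1, and the "methods" are messages.

```erlang
-module(counter).
-export([new/0, increment/1, value/1, loop/1]).

%% "Constructor": spawn a process whose internal state is the count.
new() ->
    spawn(?MODULE, loop, [0]).

%% "Method call" without a result: just send a message.
increment(Pid) ->
    Pid ! increment,
    ok.

%% "Method call" with a result: send a request and wait for the reply.
value(Pid) ->
    Pid ! {value, self()},
    receive
        {Pid, Count} -> Count
    after 1000 ->
        timeout
    end.

%% The object itself: a loop carrying its state from one message to the next.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From} ->
            From ! {self(), Count},
            loop(Count)
    end.
```

Each counter created with counter:new() runs independently of the others, and nothing outside the process can touch Count except by sending it a message.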

So in an OOP language like Java, when a thread dies, a lot of objects die, maybe the whole program, which may not be so bad (a kind of fail-fast approach). In Erlang, when a process dies, it's just one object and everything else keeps running (when a car crashes, the neighborhood does not disappear), but this can lead to some unknown state (imagine a firm where the boss dies and nobody notices (and everybody is glad not to receive orders anymore ;-))). Erlang provides some tools to manage that. Processes can be linked together: when a process dies, it sends a death note (sorry ;-)) to the processes it was linked to. When a process receives such a message and does not know how to handle it, it dies too (my wife is dead, I'm lost, I must commit suicide), but of course it may know what to do with the message, like starting a new process to replace the dead one (let's find a new wife). Erlang's libraries include such modules, called supervisors, specialized in re-spawning a process when it dies. It may look like overhead, but I think it's very useful for creating robust and fault-tolerant applications.

Anyway, I have the feeling that in the Erlang community there is a strong opinion against OOP. I think I would have understood how to design software in Erlang faster if people had told me first that Erlang is true OOP, and why and how. Moreover, it might bring more people to Erlang to develop the language, and reduce the number of people asking for such and such an OOP feature that is already in Erlang, just not in the form they are thinking of.
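
The link mechanism described above can be sketched by hand as follows. This is only an illustration with hypothetical names (restarter, worker/0); real applications would normally rely on the supervisor behaviour from OTP rather than on code like this.

```erlang
-module(restarter).
-export([start/0, worker/0]).

%% Start a linked worker, make it crash, and replace it when the
%% "death note" arrives.
start() ->
    %% Trap exits: exit signals from linked processes are turned into
    %% {'EXIT', Pid, Reason} messages instead of killing this process.
    process_flag(trap_exit, true),
    Pid = spawn_link(?MODULE, worker, []),
    Pid ! crash,
    receive
        {'EXIT', Pid, Reason} ->
            %% The worker died: spawn a replacement.
            NewPid = spawn_link(?MODULE, worker, []),
            {restarted, Reason, NewPid}
    after 1000 ->
        timeout
    end.

%% A worker that dies when told to.
worker() ->
    receive
        crash -> exit(boom)
    end.
```

Without the process_flag(trap_exit, true) call, the process calling start/0 would die along with its worker, which is the "I don't know how to handle it, so I die too" case.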

Ralph Johnson has an interesting (and short enough) article about Erlang being the next Java.

Same thing was discussed on PlanetErlang.

Monday, December 1 2008

Erlang hot swapping

I started using the Erlang programming language a few months ago. Erlang is a wonderful language if you want to build reliable, distributed, highly concurrent, fault-tolerant, soft-real-time, highly available, hot swapping applications :-).

So, today I will give you a glimpse of one cool feature of Erlang: the ability to update your code without stopping your application. Here it's just about how it works for a single process (the principle is the same for an application, except that you have many more things to take into account when you have several thousand processes talking to each other).

THE CODE

We are going to use a simple piece of code. We want a server thread listening for messages (each message is printed on the console with a sequence number), and a client thread sending a message to the server every second. Here is the code (I usually don't use comments, but I put some here because you may not be familiar with Erlang code):

-module(code_reload). % This is the module declaration.

-export([start_server/0, start_client/1]). % Exporting functions allow them to be called from outside the module.
-export([server_loop/1, client_loop/1]).

start_server() ->
    spawn(?MODULE, server_loop, [0]). % This function calls the server_loop function in a new thread.

server_loop(Count) ->
    receive % Wait for messages
        {From, quit} ->
            io:fwrite("Received quit command from ~p~n", [From]),
            ok;
        {From, Message} ->
            io:fwrite("~p: Server ~p received message ~p from ~p~n", [Count, self(), Message, From]), % Display the message we received.
            ?MODULE:server_loop(Count); % Call the same function again to wait for another message.
        _ ->
            throw(unexpected_message)
    end.

start_client(ServerPid) ->
    spawn(?MODULE, client_loop, [ServerPid]).

client_loop(ServerPid) ->
    receive
        {From, quit} ->
            io:fwrite("Received quit command from ~p~n", [From]),
            ok
    after 1000 -> % If no messages were received after 1 second, send a message to the server.
            ServerPid ! {self(), now()},
            ?MODULE:client_loop(ServerPid)
    end.

Let's start an Erlang shell:

$ erl
Erlang (BEAM) emulator version 5.6.2 [source] [smp:2] [async-threads:0] [kernel-poll:false]

Eshell V5.6.2  (abort with ^G)
1> c(code_reload). % This compiles the module code_reload.
{ok,code_reload}
2> ServerPid = code_reload:start_server(). % Start the server and assign the server process id to the variable ServerPid.
<0.37.0>
3> ClientPid = code_reload:start_client(ServerPid).
<0.39.0>
0: Server <0.65.0> received message {1211,11183,231032} from <0.67.0>
0: Server <0.65.0> received message {1211,11184,232031} from <0.67.0>
0: Server <0.65.0> received message {1211,11185,233029} from <0.67.0>
0: Server <0.65.0> received message {1211,11186,234031} from <0.67.0>
0: Server <0.65.0> received message {1211,11187,235030} from <0.67.0>

We can see the server printing the messages. But oops, there is a bug! I forgot to increment the counter, so each line starts with "0" instead of an incrementing number.

Let's change the following line in the code:

?MODULE:server_loop(Count + 1);

We need to go back to the shell and compile the new code:

4> c(code_reload).
{ok,code_reload}
0: Server <0.65.0> received message {1211,11188,236023} from <0.67.0>
0: Server <0.65.0> received message {1211,11189,237031} from <0.67.0>
1: Server <0.65.0> received message {1211,11190,238021} from <0.67.0>
2: Server <0.65.0> received message {1211,11191,239031} from <0.67.0>
3: Server <0.65.0> received message {1211,11192,240018} from <0.67.0>
4: Server <0.65.0> received message {1211,11193,241022} from <0.67.0>
5: Server <0.65.0> received message {1211,11194,242021} from <0.67.0>
6: Server <0.65.0> received message {1211,11195,243031} from <0.67.0>

Ah! It's better, the message number is growing as expected. Notice that the threads are still the same (<0.65.0> for the server and <0.67.0> for the client).

There is no black magic here. The main part of the code is a loop. The server function is server_loop and it calls itself with ?MODULE:server_loop(…). The process is running inside the old version of the code, and when we call server_loop again, the new call is sent to the new version of the code.

Be careful, this happens only because we call the function that way: ?MODULE:server_loop (code_reload:server_loop also works). Hot upgrade does not work if we call server_loop directly, without going through the module again. This allows you to control where and when code reloading is done.
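
As a minimal sketch of that kind of control, the hypothetical module below (upgrade_demo and its upgrade message are assumptions, not part of code_reload) uses a local call in the normal case and a fully qualified call only when it receives an explicit upgrade message:

```erlang
-module(upgrade_demo).
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

loop(Count) ->
    receive
        upgrade ->
            %% Fully qualified call: continues in the most recently
            %% loaded version of the module.
            ?MODULE:loop(Count);
        {From, get} ->
            From ! {self(), Count},
            %% Local call: stays in the version currently running.
            loop(Count);
        _ ->
            loop(Count + 1)
    end.
```

After recompiling the module, the process keeps running the old code until someone sends it upgrade. Keep in mind that Erlang only keeps two versions of a module, so a process that never makes the fully qualified call is killed once the module has been reloaded a second time.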

Now we can send a quit message to the client and the server:

5> ClientPid ! {self(), quit}.
Received quit command from <0.30.0>
{<0.30.0>,quit}
6> ServerPid ! {self(), quit}.                     
Received quit command from <0.76.0>
{<0.76.0>,quit}