Saturday, June 18, 2005

Atom vs RSS

For those who prefer blogs published in RSS rather than Atom, you can always use Feedburner to convert them.

For instance, my blog is available in RSS at:
http://feeds.feedburner.com/DavidKempsBlog

Beware of assuming ASCII encodings

Those of us who usually only read and write English tend to remain blissfully unaware of the problems associated with character encoding. I recently helped one of my ThoughtWorker colleagues (Mike Williams) with a character encoding problem, so I thought I would blog a little on the topic.

What Mike ran into was an XML document that included non-breaking spaces. The non-breaking space character is the (non-ASCII) Latin-1 character A0. The problem was that some system beyond our control was writing an XML file in Latin-1, but the document was missing the XML character encoding declaration.

What many of us do not realize is that, without a character encoding declaration, XML parsers will assume a UTF-8 encoding (unless the document starts with a special "byte-order mark" for UTF-16). Normally this is not a problem, since UTF-8 maps the plain ASCII characters to themselves. However, all non-ASCII characters, including the extended Latin-1 characters such as the non-breaking space and accented Latin characters like é, are encoded as two or more bytes per character. What's more, no valid UTF-8 character encoding begins with the byte A0 (it can only appear as a continuation byte), and so the XML parser "blew up" when it came across the A0.
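
The failure is easy to reproduce outside an XML parser. The following is a minimal sketch, not the code Mike was running: the class name EncodingDemo and the helper decodeStrict are made up for illustration, and it assumes Java 7 or later for StandardCharsets. It decodes the same three bytes first as Latin-1 and then as UTF-8, with the decoder configured to report malformed input rather than silently substitute a replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    /** Decodes strictly: malformed input raises an exception instead of being replaced. */
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        // "a", non-breaking space, "b" as a Latin-1 writer would emit them.
        byte[] latin1 = {'a', (byte) 0xA0, 'b'};

        // Read with the right charset, the lone A0 byte becomes U+00A0.
        String text = decodeStrict(latin1, StandardCharsets.ISO_8859_1);
        System.out.println("code point: " + (int) text.charAt(1));

        // The same character takes two bytes (C2 A0) in UTF-8.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 length: " + utf8.length);

        // Read as UTF-8, a bare A0 is a continuation byte with no lead byte.
        try {
            decodeStrict(latin1, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```

A lenient decode (for example new String(bytes, "UTF-8")) would not throw here; it would quietly replace the bad byte, which is arguably worse than the parser error.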

For more information, some useful sites include: UTF-16 - Wikipedia and ISO 8859 Alphabet Soup

I also found the chapter on Internationalization in my old Java 1.1 edition of "Java in a Nutshell" a very good introduction to the topic. I have not looked, but I assume the later editions are just as good.

Saturday, April 02, 2005

Futures and Eventual Values Part 3

Part 3 of my blogging on Futures and Eventual Values

In this part I discuss combining Futures with the Proxy Pattern. I will not discuss the differences between Futures and Eventual Values any further (unless provoked!), and indeed whatever I say about Futures from now on applies equally to Eventual Values.

A big disadvantage with using futures as I have shown them so far is that the impact on the client is significant. Instead of returning a simple float, the getTemperature() method of my weather service is now returning a Future<Float>. This is unfortunate if I want the freedom to switch back and forth between the synchronous version and asynchronous version. It is also a bit ugly to be exposing the fact that we are dealing with futures deep down in the presentation logic (i.e. where I am printing the results).

One solution is to use the Proxy Pattern. Unfortunately, you cannot create proxies for primitives like float. In Java you cannot even create a proxy for a Float: the class is final, and it implements no interface that exposes its value. Turning a problem into an opportunity, a good developer would say that getTemperature() should not be returning raw types anyway. Perhaps it should be returning a Temperature object with some useful methods (e.g. temperature conversion to/from Celsius and Fahrenheit).

Here is a suggestion for a Temperature interface:


public interface Temperature {
    double getCelciusValue();
    double getFahrenheitValue();
}


Similarly, our weather service should be implementing an interface:


public interface WeatherService {
    Temperature getTemperature();
}


Now our client code looks like this (using a synchronous version of the weather service):


WeatherService melb = new RemoteWeatherService("Melbourne");
WeatherService sydney = new RemoteWeatherService("Sydney");
WeatherService brisbane = new RemoteWeatherService("Brisbane");
Temperature melbTemperature = melb.getTemperature();
Temperature sydneyTemperature = sydney.getTemperature();
Temperature brisbaneTemperature = brisbane.getTemperature();
System.out.println("City\t\tTemperature");
System.out.println("Melbourne\t" + melbTemperature.getCelciusValue());
System.out.println("Sydney\t\t" + sydneyTemperature.getCelciusValue());
System.out.println("Brisbane\t" + brisbaneTemperature.getCelciusValue());


It is now possible to develop an asynchronous version of the weather service that only requires the client to change the first three lines to use AsynchWeatherService instead of RemoteWeatherService to look like this:


// NOTE: Specifying a 10 second time out. This is discussed later.
WeatherService melb = new AsynchWeatherService("Melbourne", 10, TimeUnit.SECONDS);
WeatherService sydney = new AsynchWeatherService("Sydney", 10, TimeUnit.SECONDS);
WeatherService brisbane = new AsynchWeatherService("Brisbane", 10, TimeUnit.SECONDS);
Temperature melbTemperature = melb.getTemperature();
Temperature sydneyTemperature = sydney.getTemperature();
Temperature brisbaneTemperature = brisbane.getTemperature();
System.out.println("City\t\tTemperature");
System.out.println("Melbourne\t" + melbTemperature.getCelciusValue());
System.out.println("Sydney\t\t" + sydneyTemperature.getCelciusValue());
System.out.println("Brisbane\t" + brisbaneTemperature.getCelciusValue());


The key to this is that AsynchWeatherService is a proxy for RemoteWeatherService (both implementing WeatherService), and it returns a proxy for the Temperature object returned by RemoteWeatherService. The twist is that the Temperature proxy actually takes an object of type Future<Temperature>. First, let's look at AsynchWeatherService:


import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;

public class AsynchWeatherService implements WeatherService {
    private final RemoteWeatherService weatherStation;
    private final long timeout;
    private final TimeUnit timeoutUnit;

    public AsynchWeatherService(String city, long timeout, TimeUnit timeoutUnit) {
        this.weatherStation = new RemoteWeatherService(city);
        this.timeout = timeout;
        this.timeoutUnit = timeoutUnit;
    }

    public Temperature getTemperature() {
        // Wrap the blocking remote call so that it can run on another thread.
        FutureTask<Temperature> result = new FutureTask<Temperature>(
            new Callable<Temperature>() {
                public Temperature call() {
                    return weatherStation.getTemperature();
                }
            });
        new Thread(result).start();
        // Hand back a Temperature that waits on the future only when it is read.
        return new FutureTemperature(result, timeout, timeoutUnit);
    }

}


As before, the getTemperature() method creates a FutureTask object initialized with a Callable that simply invokes the getTemperature() method of the RemoteWeatherService. It then hands the FutureTask to a Thread object and starts the new thread. Finally, it creates and returns a FutureTemperature object, where FutureTemperature is the Temperature proxy I mentioned (it could also be thought of as an Adapter, as it adapts Future<Temperature> to look like a Temperature). It is implemented as follows:


import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class FutureTemperature implements Temperature {
    private final Future<Temperature> futureTemperature;
    private final long timeout;
    private final TimeUnit timeoutUnit;

    public FutureTemperature(Future<Temperature> futureTemperature, long timeout, TimeUnit timeoutUnit) {
        this.futureTemperature = futureTemperature;
        this.timeout = timeout;
        this.timeoutUnit = timeoutUnit;
    }

    public double getCelciusValue() {
        return getFutureTemperature().getCelciusValue();
    }

    public double getFahrenheitValue() {
        return getFutureTemperature().getFahrenheitValue();
    }

    // Blocks (up to the configured timeout) until the real Temperature is available.
    private Temperature getFutureTemperature() {
        // TODO: Should give more thought to exceptions!
        try {
            return futureTemperature.get(timeout, timeoutUnit);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so that callers can still see the interruption.
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e);
        } catch (TimeoutException e) {
            throw new RuntimeException(e);
        }
    }

}


I suppose it is quite a bit of work, but the end result is that the use of futures is now hidden from the client (other than in the configuration of the weather services). Notice that in this example I allow for the specification of a timeout. This is important; otherwise the client thread may wait forever if the remote service fails. One drawback of using proxies as above is that the client cannot specify the timeout at the point where it requests the value, but instead has to specify it as part of the service configuration. For example, the client cannot pass a timeout to the getCelciusValue() method of the Temperature object since the Temperature interface quite naturally does not expect a timeout. In practice I suspect this is a small price to pay.
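
Some of that work can be avoided when the proxied type is an interface. The following is a sketch of one alternative, not what the code above does: it uses java.lang.reflect.Proxy to build the forwarding object at run time. The names FutureProxies, proxyFor and demo are hypothetical, the Temperature interface is trimmed to a single method, and the lambdas assume Java 8 or later.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class FutureProxies {
    /** Stand-in for the Temperature interface, trimmed to one method. */
    interface Temperature {
        double getCelciusValue();
    }

    /** Returns an object that looks like a T but waits on the future at every call. */
    static <T> T proxyFor(Class<T> type, Future<? extends T> future, long timeout, TimeUnit unit) {
        InvocationHandler handler = (self, method, args) -> {
            // Checked exceptions from get() reach the caller wrapped in an
            // UndeclaredThrowableException, because the interface does not declare them.
            T target = future.get(timeout, unit);
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();   // rethrow what the real method threw
            }
        };
        Object instance = Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(instance);
    }

    /** Simulates a slow service and reads the value through the proxy. */
    static double demo() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Temperature> pending = pool.submit(() -> {
                Thread.sleep(100);                    // pretend network latency
                return (Temperature) () -> 21.5;      // made-up reading
            });
            Temperature temperature = proxyFor(Temperature.class, pending, 10, TimeUnit.SECONDS);
            return temperature.getCelciusValue();     // blocks here, not at proxy creation
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The trade-off is that calls such as toString() and hashCode() on the proxy also wait on the future, and the reflective call is slower than the hand-written FutureTemperature.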

As hinted in my last blog, it is easy to fall into the trap of using futures when other concurrency patterns are more appropriate. Note that in my example above, by the time the main thread has invoked the getCelciusValue() method of the melb temperature object, you will end up with three threads all blocked waiting for responses from the three weather services, and the main thread blocked waiting on the Melbourne temperature future. If this is a simple desktop client connecting to three different weather services then that is fine. But if you had to connect to thousands of weather services, or if this were part of a server-side application that was expected to have a high transaction rate, then the result can be a large number of threads all sitting around doing effectively nothing. Threads can be expensive (they are usually an operating system level resource).

Often a more efficient approach is one based on message queues. Use of message queues could reduce the thread consumption to a single thread sending one-way messages to all of the remote services, and then fetching their responses from a single incoming message queue. But now I have gone completely off topic and so will stop for now!
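
To give a flavour of the shape of such a design, here is a rough sketch under loud assumptions: there is no real messaging layer, the "remote services" are simulated by a two-thread pool that posts made-up readings, and the names ReplyQueueDemo, Reply and fetchAll are hypothetical. The point is only that the client thread sends all its requests and then drains one shared reply queue, instead of parking one thread per outstanding request.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ReplyQueueDemo {
    /** A reply message: which city it is for, and the reading. */
    static final class Reply {
        final String city;
        final double celsius;

        Reply(String city, double celsius) {
            this.city = city;
            this.celsius = celsius;
        }
    }

    /** Sends one request per city, then collects the replies from a single queue. */
    static Map<String, Double> fetchAll(List<String> cities, long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        BlockingQueue<Reply> replies = new LinkedBlockingQueue<Reply>();
        // Stand-in for the messaging layer and the remote services (an assumption):
        // two workers serve every city, however many there are.
        ExecutorService transport = Executors.newFixedThreadPool(2);
        try {
            for (String city : cities) {
                // "Send" a one-way request; the reading is fabricated from the name length.
                transport.execute(() -> replies.add(new Reply(city, 10.0 + city.length())));
            }
            Map<String, Double> result = new TreeMap<String, Double>();
            for (int i = 0; i < cities.size(); i++) {
                // Replies arrive in any order; the timeout applies to each reply in turn.
                Reply reply = replies.poll(timeout, unit);
                if (reply == null) {
                    throw new TimeoutException("only " + i + " replies arrived in time");
                }
                result.put(reply.city, reply.celsius);
            }
            return result;
        } finally {
            transport.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchAll(Arrays.asList("Melbourne", "Sydney", "Brisbane"), 10, TimeUnit.SECONDS));
    }
}
```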

Saturday, March 26, 2005

Futures and Eventual Values Part 2

This blog entry is Part 2 of my discussion of Futures and Eventual Values. I will assume that you have read my previous blog entry: Futures and Eventual Values Part 1

In this blog I will attempt to explain the difference between a future and an eventual value. I will also look at the implementation of futures in the concurrency library of the JDK 1.5.

Before I go on, I should warn the reader that I am not entirely sure that the distinction that I am making between futures and eventual values is standard. It would not surprise me if many "experts" in the field will claim that I am splitting hairs and that what I am calling an eventual value is just a different type of future. However, I personally think that what I am defining here as eventual values and futures do have interesting differences.

Let me start with another example using an eventual value.


final EventualValue<Integer> ev = new EventualValue<Integer>();
new Thread() {
    public void run() {
        ev.set(3 + 4);
    }
}.start();
new Thread() {
    public void run() {
        ev.set(3 + 6);
    }
}.start();
System.out.println(ev.get());
Thread.sleep(500);
System.out.println(ev.get());


You can not determine what will be printed by just looking at this program. I ran it a couple of times, and each time got 7 printed out followed by 9. However, there is no reason it could not have been the other way around, nor any reason it might not have printed out 7 twice or 9 twice. An important point that will make more sense when we look at futures is that the expression "3 + 4" or "3 + 6" that will be used to determine the value of the eventual value is not necessarily fixed at the time the eventual value is created.

An important feature of a future (at least how it is defined for Multilisp), is that the expression used to determine the value of the future is fixed at the time the future is created.

Have a look at the Future interface as it is defined in the JDK 1.5 and note carefully that it does not have a set() method. It has several useful methods besides get(), but it does not have any way of setting the wrapped value.


public interface Future<V> {
    boolean cancel(boolean mayInterruptIfRunning);
    boolean isCancelled();
    boolean isDone();
    V get() throws InterruptedException, ExecutionException;
    V get(long timeout, TimeUnit unit)
        throws InterruptedException, ExecutionException, TimeoutException;
}


When I first looked at the Java concurrency library I was perplexed by the lack of a setter on Future. My experience with the ACE C++ library of Doug Schmidt (http://www.cs.wustl.edu/~schmidt/ACE.html) had led me to assume that a Future would have a setter and a getter. I now believe that the ACE Future class is more like an eventual value than a Future.

The JDK provides a number of different ways of creating a Future, one of which is to construct a FutureTask (which implements Future). But even FutureTask does not have a set() method. Instead, FutureTask takes as a parameter in its constructor what is effectively the "expression" that will be used to determine the future's value. Of course in Java you cannot actually pass expressions around (the expression will be evaluated before it gets passed in) and so FutureTask takes the next best thing: a Callable.


public interface Callable<V> {
    V call() throws Exception;
}


As you can see, a Callable is simply an interface with a call() method that returns a value. If I want to create a future whose value will eventually be the result of calling some method m(), then I simply create a Callable whose call() method calls m(). Using the example from Part 1, if I want a future whose value will be the result of invoking the getTemperature() method of a remote weather service, then I first need to define a Callable class:


public class TemperatureRequest implements Callable<Float> {
    private final RemoteWeatherStation weatherStation;

    public TemperatureRequest(RemoteWeatherStation weatherStation) {
        this.weatherStation = weatherStation;
    }

    public Float call() throws Exception {
        return weatherStation.getTemperature();
    }
}


The FutureTask object can now be created...


FutureTask<Float> result = new FutureTask<Float>(new TemperatureRequest(weatherStation));


The creation of the FutureTask does not automatically result in the Callable being invoked. My guess is that the designer (Doug Lea) realized that it was important that the FutureTask not be responsible for the creation and execution of the thread that calls the Callable, since it is quite possible that you want to use a thread pool or perhaps schedule the execution for a later time. Hence FutureTask, as well as being a Future, is also a Runnable and so can be passed to any thread of your choosing for execution. Here is the full getTemperature() method of our new asynchronous weather service:


public Future<Float> getTemperature() {
    FutureTask<Float> result = new FutureTask<Float>(new TemperatureRequest(weatherStation));
    new Thread(result).start();
    return result;
}
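
Because FutureTask is a Runnable, the same task can just as well be handed to a thread pool. The following is a small sketch of that variation, not part of the weather service above: PooledFutures and temperatureRequest are hypothetical names, the readings are made up, and the sleep merely simulates latency.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class PooledFutures {
    /** Hypothetical stand-in for the remote call; a real one would go over the network. */
    static Callable<Float> temperatureRequest(final float reading) {
        return new Callable<Float>() {
            public Float call() throws Exception {
                Thread.sleep(50);   // pretend latency
                return reading;
            }
        };
    }

    static float[] demo() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Option 1: build the FutureTask yourself and pass it to the pool as a Runnable.
            FutureTask<Float> melb = new FutureTask<Float>(temperatureRequest(18.5f));
            pool.execute(melb);

            // Option 2: let the pool wrap the Callable and hand back the Future.
            Future<Float> sydney = pool.submit(temperatureRequest(22.0f));

            return new float[] {melb.get(), sydney.get()};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        float[] readings = demo();
        System.out.println(readings[0] + " " + readings[1]);
    }
}
```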


Hopefully, if you have lasted this long, you can now see my point that, unlike eventual values, the expression that will eventually provide a value for the future is specified as part of the creation of the future. This possibly makes futures easier to analyse. Indeed the result is something akin to an immutable object: its value is determined by what it is given in its constructor and, once it has a value, it is not going to change.

The advantages that a future has over an eventual value come at a price. To start with, in languages like Java that have poor syntactic support for closures, using Futures results in more code and possibly code that is harder to understand. Futures are also less flexible than eventual values: you could easily implement your own Future class using an eventual value, but would find it more difficult to do the reverse.
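
As a rough illustration of that last claim, here is one way a Future might be layered on an eventual value. It is a sketch only: EventualFuture and LatchEventualValue are hypothetical names, the eventual value is built on a CountDownLatch rather than on wait/notify, cancellation is simply not supported, and the lambdas assume Java 8 or later.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** A Future whose "expression" is fixed when it is created, backed by an eventual value. */
final class EventualFuture<V> implements Future<V> {
    /** Either the computed value or the exception thrown while computing it. */
    private static final class Outcome<V> {
        final V value;
        final Throwable failure;

        Outcome(V value, Throwable failure) {
            this.value = value;
            this.failure = failure;
        }
    }

    private final LatchEventualValue<Outcome<V>> outcome = new LatchEventualValue<Outcome<V>>();

    private EventualFuture() { }

    /** Starts evaluating the expression on a new thread; only that thread calls set(). */
    static <V> EventualFuture<V> start(final Callable<V> expression) {
        final EventualFuture<V> future = new EventualFuture<V>();
        new Thread(() -> {
            try {
                future.outcome.set(new Outcome<V>(expression.call(), null));
            } catch (Throwable t) {
                future.outcome.set(new Outcome<V>(null, t));
            }
        }).start();
        return future;
    }

    public V get() throws InterruptedException, ExecutionException {
        return unwrap(outcome.get());
    }

    public V get(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        return unwrap(outcome.get(timeout, unit));
    }

    public boolean isDone() { return outcome.isDetermined(); }

    // Cancellation is left out of this sketch.
    public boolean cancel(boolean mayInterruptIfRunning) { return false; }
    public boolean isCancelled() { return false; }

    private V unwrap(Outcome<V> result) throws ExecutionException {
        if (result.failure != null) {
            throw new ExecutionException(result.failure);
        }
        return result.value;
    }

    public static void main(String[] args) throws Exception {
        Future<Integer> sum = EventualFuture.start(() -> 3 + 4);
        System.out.println(sum.get(5, TimeUnit.SECONDS));
    }
}

/** A small eventual value: get() blocks until some thread has called set(). */
class LatchEventualValue<T> {
    private final CountDownLatch determined = new CountDownLatch(1);
    private volatile T value;

    void set(T value) {
        this.value = value;
        determined.countDown();
    }

    boolean isDetermined() { return determined.getCount() == 0; }

    T get() throws InterruptedException {
        determined.await();
        return value;
    }

    T get(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        if (!determined.await(timeout, unit)) {
            throw new TimeoutException();
        }
        return value;
    }
}
```

Going the other way is harder because a Future exposes no setter to build on.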

One of the other topics I covered in last Wednesday's meeting was how useful it may be to combine the Proxy pattern with futures. But that can wait for another blog. I should also blog about the risks of relying too heavily on futures when you should possibly be using more scalable patterns like message queues.

Friday, March 25, 2005

Futures and Eventual Values Part 1

Last Wednesday evening I gave a presentation on the "Futures" concurrency pattern at the Melbourne Patterns Group. Given that a couple of people have stated that they regret missing out on the presentation, and given that the PowerPoint presentation will be of little use without the talking that went with it, I thought that I should blog about the topic.

I first came across the concept of a Future six or so years ago while working on a C++ project at Ericsson. I found them to be an elegant and useful approach to certain types of concurrency problems. Since that time, most of my programming has been targeted at J2EE environments where creating your own threads is generally frowned upon, and hence I have never had to use a Future since. So I have had fun this last month doing a bit of research on concurrency patterns and looking at the new concurrency library in the JDK 1.5.

A future is a placeholder for a value of an expression being computed by a separate thread.

I believe that the idea dates back to a language called Multilisp (see R. Halstead, "Multilisp: A Language for Concurrent Symbolic Computation", TOPLAS, pp. 501-538, Oct 1985, available from the ACM digital library at http://www.acm.org/). This paper in turn refers to Algol 68 (Algol 68 User Manual, March 8, 1978, http://members.dokom.net/w.kloke/a68s.txt), which has a similar feature called an Eventual Value.

I am not entirely sure that I understand the difference between an eventual value and a future, but I think that there is a difference as I will try to explain.

First, I will explain what I think an eventual value is. Given that most of us have been brainwashed into thinking in terms of objects, you can think of an eventual value as being a wrapper around a value and having a get() method and a set() method to get and set that value. The eventual value is deemed to initially be in an undetermined state and to remain so until the value is set via the set() method. After the set() method is called the eventual value is said to be in a determined state.

If the eventual value is in a determined state, then calls to the get() method will return the wrapped value without blocking. Otherwise, the get() method will block until some thread calls the set() method.

Imagine we are writing a remote proxy for a weather service able to return the current temperature of a specified city. Due to network latency and the load on the remote service, such a request could take several seconds to return. If we wish to get temperatures from a number of different services, then we can reduce the overall latency by making those requests concurrently. One way of achieving this is for our remote proxy to return eventual values like so:


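public class WeatherService {

    .....

    public EventualValue<Float> getTemperature() {
        final EventualValue<Float> result = new EventualValue<Float>();
        // Make the slow remote request on a separate thread.
        new Thread() {
            public void run() {
                result.set(remoteWeatherStation.getTemperature());
            }
        }.start();
        // Return immediately; callers block only when they call get().
        return result;
    }
}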



The intention is that WeatherService is able to fetch the current temperature of a specified city from a remote service. Rather than waiting for the remote service to reply, the getTemperature() method creates and returns an eventual value, while concurrently making the remote request in a separate thread. When the remote request returns (possibly several seconds later), that thread will call the set() method on the eventual value with the result.

Consider the following possible client code:


WeatherService melb = new WeatherService("Melbourne");
WeatherService sydney = new WeatherService("Sydney");
WeatherService brisbane = new WeatherService("Brisbane");
EventualValue<Float> melbTemperature = melb.getTemperature();
EventualValue<Float> sydneyTemperature = sydney.getTemperature();
EventualValue<Float> brisbaneTemperature = brisbane.getTemperature();
System.out.println("City\t\tTemperature");
System.out.println("Melbourne\t" + melbTemperature.get());
System.out.println("Sydney\t\t" + sydneyTemperature.get());
System.out.println("Brisbane\t" + brisbaneTemperature.get());


The three calls to getTemperature() will each return immediately, and the main thread will not actually block until it reaches melbTemperature.get(). At that point, the thread will block until the remote service returns its value. When the main thread reaches the next line and tries to call sydneyTemperature.get(), it is possible that the remote service has already returned the Sydney temperature, otherwise that call will also block. Either way, the total wall clock time taken to fetch and print the three temperatures should be significantly less than the sum of the times taken to get each temperature individually.

A fairly crude implementation of an EventualValue class might look something like this:


public class EventualValue<T> {
    private T value;

    private boolean ready = false;

    public synchronized void set(T value) {
        this.value = value;
        ready = true;
        notifyAll();
    }

    public synchronized T get() throws InterruptedException {
        while (!ready) {
            wait();
        }
        return value;
    }

}


A real one would supply more methods, e.g. a get() method that takes a timeout. It may also use a separate object for synchronization purposes to prevent clients explicitly performing synchronization operations on the EventualValue themselves.
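
As a sketch of those two refinements (and nothing more), the variant below adds a timed get() and synchronizes on a private lock object. The class name TimedEventualValue is made up, and it assumes that a timed-out get() should throw TimeoutException, as the JDK's Future does.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedEventualValue<T> {
    // Private monitor: clients cannot wait on, notify, or lock it themselves.
    private final Object lock = new Object();
    private T value;
    private boolean ready = false;

    public void set(T value) {
        synchronized (lock) {
            this.value = value;
            ready = true;
            lock.notifyAll();
        }
    }

    public T get() throws InterruptedException {
        synchronized (lock) {
            while (!ready) {
                lock.wait();
            }
            return value;
        }
    }

    public T get(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (lock) {
            // Loop because wait() can return early (spurious wake-ups).
            while (!ready) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    throw new TimeoutException();
                }
                TimeUnit.NANOSECONDS.timedWait(lock, remaining);
            }
            return value;
        }
    }

    public static void main(String[] args) throws Exception {
        final TimedEventualValue<Integer> ev = new TimedEventualValue<Integer>();
        new Thread() {
            public void run() {
                ev.set(3 + 4);
            }
        }.start();
        System.out.println(ev.get(5, TimeUnit.SECONDS));
    }
}
```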

If you are using Java, you should try to use the features of the new concurrency library. It comes with the JDK 1.5, and you can download a version for older versions of Java. However, instead of providing support for eventual values, Java provides support for futures, and I think a discussion of futures and how I think they differ from eventual values deserves a separate blog.

Saturday, March 05, 2005

Book Review: Agile Database Techniques

I recently finished reading "Agile Database Techniques" by Scott Ambler.

On the whole, I found the book enjoyable to read and reasonably informative. I would not rank it up as high as Domain Driven Design or Patterns of Enterprise Application Architecture, but I would certainly rank it above many other books I have read. I find it sad and frustrating that a lot of programmers actually take pride in not knowing anything about databases. While O/R mapping tools like Hibernate are excellent, ignorance of how databases work and lack of data modelling skills will lead to second rate solutions even when you use tools like Hibernate.

Things I liked about the book include:
  • Modelling tips
  • Database refactoring tips
  • Performance tuning tips
  • Pros and cons of implementing referential integrity and business logic in the database versus implementing them in application code
  • Discussion of natural versus artificial primary keys
  • Discussion of database encapsulation strategies
  • Security issues
What I particularly liked was that Scott seemed to have a very balanced view on a lot of these topics and points out that a lot of these issues are not black and white.

Things I think need improvement include:
  • I did not find the concept of "class normalization" all that helpful.
  • The same material could have been covered in a book half the size.
Despite its shortcomings, it is still definitely worth a read.

Monday, January 10, 2005

Default method implementations aren't so bad!

Goal for this year: Stop coming up with complicated solutions to simple problems :-)

I guess I should have dwelt on the problem further before blogging about it (though some feedback has helped!) Certainly for the problem described, having a default method implementation is not so bad, and perhaps I was too quick to suggest a complicated solution.

However, given that it is very easy in Java to miss the fact that a method has not been declared final, I strongly believe you should at least comment such methods, indicating that you expect subclasses to over-ride them and possibly even pointing out the consequences of doing so (see Joshua Bloch's item 15: "Design and document for inheritance or else prohibit it").
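
For instance, the Price example from the previous post might be documented along these lines. The wording of the comments is only a suggestion, and the PriceDocDemo class exists purely to make the sketch runnable.

```java
class PriceDocDemo {
    public static void main(String[] args) {
        System.out.println(new Price().getFrequentRenterPoints(3));
        System.out.println(new NewReleasePrice().getFrequentRenterPoints(3));
    }
}

class Price {
    /**
     * Returns the frequent renter points earned for a rental.
     *
     * <p>Designed for overriding: this default awards one point per rental, and
     * subclasses such as {@code NewReleasePrice} are expected to replace it.
     * In this sketch no other method of this class calls it, so an override
     * changes nothing else.
     */
    int getFrequentRenterPoints(int daysRented) {
        return 1;
    }
}

class NewReleasePrice extends Price {
    /** Awards a bonus point for rentals longer than one day. */
    @Override
    int getFrequentRenterPoints(int daysRented) {
        return (daysRented > 1) ? 2 : 1;
    }
}
```

The @Override annotation helps from the other direction: a reader of the subclass can see at a glance that a superclass method is being replaced.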

Sunday, January 09, 2005

Are default method implementations bad?

Surely I am not the only one who feels that, except for some very rare occasions, a method should either be abstract or final. Let me explain why.

When analysing a piece of code (e.g. to extend or to debug it), I am often fooled by an innocent looking method like this one (taken out of Martin Fowler’s excellent book on Refactoring, bottom of page 49):

class Price {
....
    int getFrequentRenterPoints(int daysRented) {
        return 1;
    }
}

When executing code like this, I may be surprised to find that the method is returning “2” instead of “1”. After wasting time adding logging or invoking the debugger, I eventually find that I am actually dealing with a subclass that overrides the method with:

class NewReleasePrice extends Price {
....
    int getFrequentRenterPoints(int daysRented) {
        return (daysRented > 1) ? 2 : 1;
    }
}

Am I the only one who falls for this time and time again? Am I over-engineering if I instead use a strategy pattern? E.g.:

class Price {
....
    private FrequentRenterStrategy frequentRenterStrategy;
    Price(FrequentRenterStrategy frequentRenterStrategy) {
        this.frequentRenterStrategy = frequentRenterStrategy;
    }

    final int getFrequentRenterPoints(int daysRented) {
        return frequentRenterStrategy.getFrequentRenterPoints(daysRented);
    }
}

interface FrequentRenterStrategy {
    int getFrequentRenterPoints(int daysRented);
}

final class DefaultFrequentRenterStrategy implements FrequentRenterStrategy {
    public int getFrequentRenterPoints(int daysRented) {
        return 1;
    }
}

final class NewReleaseFrequentRenterStrategy implements FrequentRenterStrategy {
    public int getFrequentRenterPoints(int daysRented) {
        return (daysRented > 1) ? 2 : 1;
    }
}

class RegularPrice extends Price {
    RegularPrice() {
        super(new DefaultFrequentRenterStrategy());
    }
....
}


Admittedly, this results in more classes and more code to write, but I find that I am far less likely to misunderstand what is going on. Am I alone in feeling this way?

About the only time I am happy to see a default implementation of a method is in a Decorator framework (see “Design Patterns” by Gamma, Helm, Johnson, and Vlissides). If a number of decorators are to be developed, then it is convenient to have an abstract decorator that delegates everything to the object it is decorating. This is clearly designed to have its methods over-ridden as a convenience to the creators of concrete decorators that only need to over-ride a small subset of the methods.
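
A minimal sketch of that arrangement follows. The TextSource interface and every class name here are invented for illustration and are not taken from the Design Patterns book.

```java
import java.util.Locale;

class DecoratorDemo {
    /** A trivial concrete source to decorate. */
    static TextSource fixed(final String text) {
        return new TextSource() {
            public String read() { return text; }
            public String name() { return "fixed"; }
            public void close() { }
        };
    }

    public static void main(String[] args) {
        TextSource source = new UpperCaseSource(fixed("hello"));
        System.out.println(source.name() + ": " + source.read());
    }
}

interface TextSource {
    String read();
    String name();
    void close();
}

/** Forwards every call to the wrapped source; it exists only to be subclassed. */
abstract class TextSourceDecorator implements TextSource {
    protected final TextSource delegate;

    protected TextSourceDecorator(TextSource delegate) {
        this.delegate = delegate;
    }

    public String read() { return delegate.read(); }
    public String name() { return delegate.name(); }
    public void close() { delegate.close(); }
}

/** A concrete decorator over-rides only the one method it changes. */
final class UpperCaseSource extends TextSourceDecorator {
    UpperCaseSource(TextSource delegate) {
        super(delegate);
    }

    @Override
    public String read() {
        return delegate.read().toUpperCase(Locale.ROOT);
    }
}
```

Here the default implementations are the whole point of the abstract class, so nobody reading TextSourceDecorator should be surprised to find them over-ridden.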

I might have had a different opinion if methods in Java were final by default and, as with C++, you had to explicitly state when you want a method to be virtual. If the method had the keyword “virtual” out the front then I might be more likely to expect it to be over-ridden by subclasses. PLEASE, NO COMMENTS ABOUT HOW C# IS BETTER THAN JAVA WITH RESPECT TO THIS :-)