The entropy of logging

A couple of weeks ago, my colleague António Vieira made an internal presentation called the “The entropy of logging”. In this post I share the main points of the presentation and my thoughts on the subject.


"Event logs record events taking place in the execution of a system in order to provide an audit trail that can be used to understand the activity of the system and to diagnose problems. They are essential to understand the activities of complex systems, particularly in the case of applications with little user interaction (such as server applications)." - Wikipedia
We all know what logging is but how to do it right?
 Some guidelines might help…


Captain's Log, Stardate 43125.8. We have entered a spectacular binary star system in the Kavis Alpha sector on a most critical mission of astrophysical research. Our eminent guest, Dr. Paul Stubbs, will attempt to study the decay of neutronium expelled at relativistic speeds from a massive stellar explosion which will occur here in a matter of hours.


Captain's Log, Stardate 43125.8. Today I had baked beans for lunch. Lt Commander Data didn’t understand my irony. Lt Commander La Forge has the feeling unwell, probably the galactic flu.


7 guidelines

  1. Log relevant events
  2. Log the unpredictable
  3. 1 event = 1 log line
  4. Log the boundaries of the application
  5. Consistent and parsable
  6. End-to-end traceability
  7. Protect privacy

Should libraries include logging?

No! Logging is an application level concern
Libraries should expose all the relevant state as return values, exceptions or event listeners.
If you really need events on your library keep it very simple and plugable.

My (Sergio) conclusion

Logging is something that many of us, professional developers, take for granted. In every project I've worked, as soon as deploy an application to an environment, things start to fail (which is expected), but it’s rare that the existing log is enough to quickly debug the problem. Most often, critical information is missing, and the existing log entries are mostly just clutter.

Based on my experience, all of these guidelines are valid. The ones I hadn't applied in the past were point 3, as well as point 5 and 7, but simply because they didn't apply to my application's domain.

Having one log entry per event may seem strange at first because it seems a strong constrain, but I've recently have been applying it, and it has actually made the code cleaner. Basically, each domain event will produce a single log entry in case of success (INFO), or in the case of failure (ERROR)To me it feels the correct approach, particularly for a Domain Driven Design approach (although I haven't been using it currently), as you should have one log entry per domain event. Having that said, this shouldn't be applied too literally: it's still OK to have a random log entry at the DEBUG level, just not at INFO and ERROR (you are filtering DEBUG entries in production, right? :))

By applying the single log entry guideline, you'll probably have to log a lot of information, which leads to the parsable guideline. In our current development, we've found that the best approach is to log it as JSON. We are not constraining the event to an actual schema, although we try to be consistent. In practise, this means we are using Json.NET to serialize a dynamic object (in .NET), or Jackson to serialize an HashMap object. Log parsing tools (such as logstash) should have any problem with this.

Regarding privacy, we have been very careful. We try to use SecureString (in .NET) or char[] (in Java) types for holding passwords in memory, but for the remaining data the best approach is to remove sensitive fields from serialization (preferably in a declaratively form, to be clear about this in the code). When the sensitive information is relevant for logging issues, the standard approach is to log the first N characters of the text, and add place-holders (such as the asterisk character) to the remaining text.

Finally, regarding on the point of not logging on libraries, I can't say I totally agree with António. I mean, I agree if the library is a thin layer, such as in an infrastructure layer library; but this may harder to implement if the library is complex, such as a domain layer library (as I have implemented in the past). What I can add is that we should avoid logging in libraries if we possible; if not,we should use a logging framework facade, such as CommonLogging in .NET or SLF4J in Java, to reduce the change of friction.

Thanks to António Vieira for the interesting discussion and allowing me to share some of the content of his presentation.


Checked exceptions in Java

In this post I describe some issues I had with checked exceptions in Java, and my current approach to handle them.

When I developed in Java non-professionally some years ago, I found the checked exception mechanism interesting. The possibility to add to a method's contract the exceptions it throws, and therefore enforce that they are handled at compile seemed a good idea. (I actually missed them when I first started developing in C#). Now that I a more experienced developer and that I try to apply the SOLID principles, I'm not so sure.

Dependency Inversion Principle

When you are applying the Dependency Inversion Principle, classes depend on interfaces, not the actual implementations. Let's say you have a Serializer interface with a single serialize() method and SerializerImpl class. So if SerializerImpl has a serialize() implementation which throws an exception, then Serializer must also declare it throws that exception on serialize(). It seems to me a bit awkward that the interface must know which exceptions the actual implementation throws, but that doesn't seem to break any SOLID principle too much, so I can live with that.

What happens when you need to have must implementations of that class? Let's say you need to implement an XML, a JSON and a Protobuff serializer. Each serialize() implementation will probably need to declare the exceptions it throws. For simplicity sake, let's say XmlSerializer throws XmlException, JsonSerializer throws JsonException, and ProtobuffException (in reality, depending on the framework you use in each of the implementations that may need to throw more than on). But then, the Serializer interface must also declare it throws the exceptions each implementation throws, which mean that each implementation must know all the exceptions the other implementations throw. That means that all the Serializer implementation are coupled with each other, and the interface is coupled with each one of the implementations. This definitely doesn't seem right!

So, my approach was to introduce a new exception: SerializerException and change the serialize() method to throw only this exception. The purpose of this exception is to wrap the actual exceptions thrown by the frameworks used for each of the serialization methods. It adds the extra work of having to define a new exception, but it decouples the Serializer from all the implementations, and decouples each implementation from all the others.

Checked or unchecked?

While this approach seems good, it's only elegant for simple applications, that is, if your application entry point invokes the Serializer. If you have a non-trivial application with a more complex dependency tree, and you just want to handle exceptions at the high-level, then it becomes cumbersome. Let's say you entry point class invokes a method foobar() in class Foobar, which invokes a method foo() in class Foo, which then invokes a method bar() in class Bar, which finally invokes serialize(). All these methods: foo(), bar() and foobar() must declare that they throw SerializerException, regardless of their responsibility. Also, if you have unit tests for these classes, then each test method must also declare it throws SerializerException.

My approach was to make SerializerException extend RuntimeException instead of the (checked) Exception. This means that the Serializer interface (and any implementation) no longer need to declare it throws any exception, so the code becomes much simpler. However, I did document that the method throws the exception using the @throws javadoc comment. (This is pretty much the approach used in C#).

As a I did this, I did do some research. Oracle's documentation on the matter (Unchecked exceptions - the controversy) didn't convince me, and lot of people shared my opinion, but none of them was quite an authority. Then I found Bruce Eckel's book "Thinking in Java" (4th edition). Namely in Chapter 9, section "Perspectives", page 347, he has a similar opinion to mine (and so does Martin Fowler).


In my opinion, avoid checked exceptions in Java unless you are doing a trivial application. If you need to work with checked exceptions, wrap them in your own unchecked exception. The extra work needed outweighs the clarity and decoupling you obtain in your code base. Nevertheless, documenting these exceptions in javadocs and properly (unit) testing your code is also necessary,


Hiding protobuffer generated code in a Java library

In this post I describe how to hide Protocol Buffer's generated code in a Java library.

Recently, I was developing a Java library that used Google's Protocol Buffers as a serialization format. As part of the design of the library, we only expose some classes, the remaining implementation has package-private visibility. In particular, we de not want to exposed the generated object, as it was mapped into a domain class, but the protoc.exe executable produces public code. The solution was to perform a post-processing step invoking "protoc" to remove all public definitions used the sed command (if you are on Windows, you should have this command available by installing Git Bash). Here is an example of a batch file to this on Windows:

@echo off
:: compiling ProtoBuff definition
protoc.exe --java_out=java\ java\pt\invisibleobjects\MyObject.proto
:: setting package-private class visibility for classes
sed -i 's/^public final class/final class/' java\pt\invisibleobjects\MyObject.java
:: setting package-private class visibility for static classes
sed -i 's/^public static final class/static final class/' java\pt\invisibleobjects\MyObject.java
:: setting package-private class visibility for enums
sed -i 's/^public enum/enum/' java\pt\invisibleobjects\MyObject.java

Back to the Future

I've recently decided to close my old personal blog. However, I picked all the software development related posts (wrote somewhere between 2007 and 2008) and imported them into this blog.


return java;

As mentioned previously in my blog, I'm back to Java development. And it's been a while since my MSc AspectJ prototype. So how does a .NET developer survive in a Java world?

First, read this article: Java Doesn’t Suck – You’re Just Using it Wrong

Second, if you are lucky enough, try to lobby for using a modern Java stack. We are a couple of interesting technologies:

Finally, a good IDE is also important. The Eclipse IDE hasn't changed much over the years, and so I've been using IntelliJ IDEA, which is quite better (not surprising, since it's developed by JetBrains, the same people that developed ReSharper). There is a nice Community Edition, which is free. As a bonus, if you want to use the same short-cuts as ReSharper, you can get them here.

Design by Contract for Java on github

I've decided to share on GitHub the Design by Contract for Java (DbC4J) library. This was a prototype I developed back in 2007 for my Master Thesis, "An Aspect Oriented Infrastructure for Design by Contract in Java".

The DbC4J library allowed enforcing contract clauses using regular Java methods via naming conventions. Internally, it was implemented using AspectJ, the original Aspect-Oriented Programming (AOP) framework, and a lot reflection. The library has been available on-line since 2007, but the since the research group I was part of closed their website, it has been unavailable. Currently the source code available at GitHub is the last release from 2007, unchanged, so I'm not sure how the code compiles under the latest versions of Java/AspectJ (this was originally implemented under JDK/AspectJ/AJDT 5 with Eclipse). Also available on GitHub is my Master Thesis dissertation, in case anyone in interested in the research underlying the prototype.


The blog formely known as Sergio Agostinho .NET

I've recently started working professionally on Java and I would like to blog about it in the near future. As such, it makes on sense to create (yet) another blog, so I decided to rename this blog to "Sergio Agostinho Dev". This means I'll also rename the site URL to http://sergioagostinho-dev.blogspot.com Yes, this will break existing links and will not benefit the blog SEO, but it's a change I would have to do, sooner of later.


My first .gitconfig

I've recently started to use git on an everyday basis. Here's a couple of configurations that you may find useful, if you're just getting started:

 sslVerify = false
 postBuffer = 104857600
 default = simple
 name = James Bond
 email = bond.james@mi6.com
 ui = auto
 autocrlf = true
 excludesfile = C:\\Users\\james\\Documents\\gitignore_global.txt
[difftool "sourcetree"]
 cmd = 'C:/Program Files (x86)/WinMerge/WinMergeU.exe' -e -u -dl Mine -wr -dr Theirs $LOCAL $REMOTE $MERGED
    tool = kdiff3
[mergetool "kdiff3"]
    cmd = 'C:/Program Files/KDiff3/kdiff3' $BASE $LOCAL $REMOTE -o $MERGED
 helper = store

A couple of notes:

  • You set all of this in Git Bash using the git-config command.
  • Your ".gitconfig" file is usually found in the root of your user folder.
  • If you have pluralsight, you can check the Git Fundamentals course.


Should be or not to be, that is the question

If you are doing a lot of unit or acceptance testing, you've probably have been using FluentAssertions library (if not, give it a try!). This library presents a very good alternative to NUnit's Assert class (or equivalent unit test framework you are using). However, this are some caveats. One of them is that the ShouldBeEquivalent() extension method is case-sensitive when comparing string (and there is no AssertionOptions to do that). However, if you use the Should() extension method, it is case-insensitive. Putting it in simply terms:

        public void ThisWayItsCaseInsenstive()
            var foo = "XPTO";
            var bar = "xpto";


        public void ThisWayItsNot()
            var foo = "XPTO";
            var bar = "xpto";


Yes, I know it's documented, but it's not obvious, which is a very unfortunate design decision.


Really resetting IIS

If you are developing ASP.Net web sites with IIS on your local machine it's not very uncommon to update libraries and sometimes IIS just gets messed up, something along the lines of "Could not load file of assembly...".

My recipe for this kind of issues is usually something like:

  • clean + rebuild from Visual Studio
  • iisreset
If that doesn't work, then I might get desperate and do this:
  • iisreset /stop
  • delete the web site from IIS Manager
  • delete the "bin" and "obj" folders for the project
  • restart Visual Studio
  • restart Windows
  • leave the room and start over (!)
But today, I've remembered that somewhere in between, it might be useful to delete the Temporary ASP.NET files, thanks to StackOverflow.

So, why not automatize this in a batch file? Here the prototype for my "really_clean_iis.bat":
@echo off

echo stopping IIS...
iisreset /stop

rem del %TEMP%\Temporary ASP.NET Files\*.*

echo deleting temporary ASP.NET files for .NET 2.0/4.0 32/64 bit...
del C:\Windows\Microsoft.NET\Framework\v2.0.50727\"Temporary ASP.NET Files"\*.* /s /q
del C:\Windows\Microsoft.NET\Framework\v4.0.30319\"Temporary ASP.NET Files"\*.* /s /q
del C:\Windows\Microsoft.NET\Framework64\v2.0.50727\"Temporary ASP.NET Files"\*.* /s /q
del C:\Windows\Microsoft.NET\Framework64\v4.0.30319\"Temporary ASP.NET Files"\*.* /s /q

echo starting IIS..
iisreset /start