Obfuscation and Decompilation

Many people are concerned about the ability to decompile .NET programs into fairly readable C# or VB.NET code. This article aims to dispel some of the myths and give some options for developers who are worried.

The Problem

A program written in .NET can easily be decompiled by programs such as Reflector. The IL generated by the C# and VB.NET compilers is very closely aligned to the original code, and those compilers don't do much to optimise the IL generated - they leave that up to the JIT compiler, which is able to do a much better job in general. Names of types, methods, instance and static variables are preserved in the compiled IL, as are string literals, even in release mode. Comments and local variable names are not preserved, however. (If you also provide the PDB file containing debugging information, the local variable names are contained within that; this is relatively rare in commercial software however, and some decompilers don't even look for a PDB.)
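The same split between what survives compilation and what doesn't shows up on other bytecode platforms. The sketch below is an analogy in Python rather than .NET: it compiles a small made-up class (LicenceChecker, is_valid and the key string are all hypothetical) and inspects the resulting code objects. The class name, method name and string literal are all still present, while the comment is gone. One difference to bear in mind is that CPython keeps local variable names in its bytecode, whereas IL leaves them to the PDB.

```python
import types

# Analogy only: this inspects CPython code objects, not .NET IL.
# The class, method and key below are hypothetical.
SOURCE = '''
class LicenceChecker:
    def is_valid(self, key):
        # Secret comment: compare against the master key
        expected = "MASTER-KEY-1234"
        return key == expected
'''


def collect_code_objects(code):
    """Return the given code object plus every code object nested inside it."""
    found = [code]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):  # nested class body or function
            found.extend(collect_code_objects(const))
    return found


compiled = compile(SOURCE, "<example>", "exec")
all_code = collect_code_objects(compiled)

# Names of the class body and the method survive compilation.
code_names = {c.co_name for c in all_code}

# String literals survive too; comments never reach the compiled form.
string_literals = {
    const
    for c in all_code
    for const in c.co_consts
    if isinstance(const, str)
}

print(sorted(code_names))
print("MASTER-KEY-1234" in string_literals)
print(any("Secret comment" in s for s in string_literals))
```

Anyone with the compiled form can run this kind of inspection, which is why a string literal is a poor hiding place for a key.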

Myth: "Decompilation Makes .NET Applications Open Source"

This is a common claim, and completely preposterous. Even if you could decompile to the absolute original source, including comments, local variable names etc., that wouldn't make the application Open Source Software. The Open Source movement is about software licences - what you have the legal right to do. If the author of the software doesn't let you have the source without reverse engineering it, that isn't Open Source Software. (There's more to it than that, of course, but that's enough to show the absurdity of the myth.)

How Severe Is The Problem?

Many developers have been shocked by how easy it is to decompile their code, and fear that it means they no longer have any way of protecting their intellectual property. In practice, I don't believe the problem is nearly as big as it's claimed to be. Firstly, intellectual property is almost always within the design of a system, not in the individual bits of implementation. If you design a world-beating application, chances are that the reason it's world-beating will be obvious to anyone who uses it anyway. Only a very few areas in computing are really all about which algorithms are used, and how they're implemented - areas such as sound and video compression.

Have you ever tried to read a large amount of code without any documentation, comments or meaningful local variable names? In my experience, it can be hard enough to understand code when you do have design documents and comments, let alone without them. Now, let's make it even harder...

Obfuscation

Obfuscation is the process of making code harder to understand in decompiled form, without changing the semantics. Simple obfuscators might just change the names of private variables and methods, while more complex ones can (with suitable configuration) make even public names unintelligible, updating references to those names in other assemblies to match. Others can make the code flow harder to follow, and some obfuscators can confuse some decompilers enough to stop them from producing any code at all. Others will even encrypt your code, only decrypting it at runtime.
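To make the simplest kind of obfuscation concrete, here is a minimal sketch of name mangling, written in Python rather than against any real .NET obfuscator. It treats names with a single leading underscore as "private" and renames them to meaningless identifiers, leaving public names alone. The function names, the a0/a1 naming scheme and the PrivateRenamer class are all hypothetical, and the sketch assumes Python 3.9 or later for ast.unparse. It doesn't check for name collisions or handle attributes, so it is an illustration of the idea and nothing more.

```python
import ast

# Hypothetical input: one "private" helper and one public function.
SOURCE = '''
def _apply_discount(price, rate):
    return price - price * rate

def final_price(price):
    return _apply_discount(price, 0.5)
'''


def is_private(name):
    """Single leading underscore only; leaves dunder names such as __name__ alone."""
    return name.startswith("_") and not name.startswith("__")


class PrivateRenamer(ast.NodeTransformer):
    """Rename private functions and references to them; keep public names."""

    def __init__(self):
        self.mapping = {}

    def _new_name(self, old):
        # Assumed naming scheme: a0, a1, ... in order of first appearance.
        if old not in self.mapping:
            self.mapping[old] = f"a{len(self.mapping)}"
        return self.mapping[old]

    def visit_FunctionDef(self, node):
        if is_private(node.name):
            node.name = self._new_name(node.name)
        self.generic_visit(node)  # also rewrite references in the body
        return node

    def visit_Name(self, node):
        if is_private(node.id):
            node.id = self._new_name(node.id)
        return node


def obfuscate(source):
    """Return source text with private names replaced by meaningless ones."""
    tree = PrivateRenamer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


obfuscated = obfuscate(SOURCE)
print(obfuscated)
```

The public function final_price keeps its name and behaves exactly as before, but the helper's descriptive name has gone. A real obfuscator does the same thing on the compiled assembly rather than the source, and has to cope with reflection, serialisation and anything else that depends on names, which is where the subtle problems tend to come from.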

I don't have any personal experience with .NET obfuscators, though I have used some for Java (which has the same characteristics in terms of decompilation). Unfortunately, that means I can't actually recommend any. I'd strongly advise you to try several different obfuscators before settling on one - look at the results of decompilation with a few different tools if possible. Before buying one, try getting someone you trust, but who hasn't been part of the development of your code, to read the decompiled version of your non-obfuscated application - you may well decide you don't need to use an obfuscator in the first place. Once you've settled on an obfuscator, though, use it all the time - you should carry out as much testing as possible on the obfuscated version rather than the "clear" version, as obfuscation can raise some subtle problems.

"Linking"

There are a few "linkers" on the market which effectively take your .NET assemblies and turn them into something else entirely - whether that's a precompiled image, or a proprietary wrapped version of your assemblies along with whatever .NET system libraries you're using. These give you a higher level of protection, but are generally expensive - and are likely to make it harder to find problems from a support point of view, as you're no longer using the "standard" .NET way of distributing your apps. Again, I haven't used these products, so can't comment on the quality of them, but I have no reason to believe they don't work. As with obfuscators, if you're going to use a "linker" you should use it throughout the development lifecycle - deciding to use one after all the development and testing has been performed is a very bad idea.

Security

If you are relying on your code being unreadable for security reasons (otherwise known as "security through obscurity") then your code just isn't secure. In some cases, genuine security may be impossible to achieve at the same time as some of your other goals, in which case obscuring the code is a reasonable first step, but you should always be aware that you haven't achieved real security. Even when obfuscated - or even in native code - anyone sufficiently determined will work out what your code does. Genuine security isn't obtained through not letting people know how your product works.

Licence Checking Code

One area which I believe is of valid concern in some situations is licence checking. This is quite a similar situation to security in some ways, except it's almost always impossible to have a genuinely secure product in terms of licensing. You only need to take a look at the games industry to see huge amounts of money being spent on different ways of trying to make it impossible for people to use unlicensed software - and the same look will reveal most big name games being cracked within days of release. Note that almost all these games are written in native code rather than managed code.

You can pepper your code with licence checks, along with things which look like they might be doing licence checking, but aren't, and all kinds of other things to make it harder to see what's going on. Add obfuscation in there and it could take a significant amount of time for a cracker to bypass your licensing system - but you always have to face the possibility that there might be someone with enough skill and patience. The more confusing code you add to your product, the harder it is for you to read the code too. Depending on the nature of your product, you may wish to spend a lot of effort on this and raise the bar on potential crackers, or just put a simple system in place and take the risk that your software will be used by people who haven't paid for it. It's a balancing act, basically. In the end, non-technological means (i.e. lawyers) are required if someone is really determined to crack your code.

No Software Is Truly Safe

In the end, if you provide people with the means to run your software on a PC, they will eventually be able to find out how it works, given enough time and patience. If the computer can run the code, so can a human - just much more slowly. This is as true of native code as it is of managed code; native code can't be decompiled as well as managed code, but the vast amount of cracked software in the world shows that it can clearly be understood well enough to work round licensing restrictions. It will therefore always be a matter of balance - how many of your potential customers do you think want to rip you off? (If someone isn't going to buy your software anyway, it doesn't really matter to you financially whether or not they use it illegally; only potential customers really matter.) If you can trust your customer base, don't worry about it at all. If you think there may be some bad apples, look at obfuscation and some of the other options - but always weigh up how much it will cost you to implement these solutions against the potential lost revenue.

