Hex-Rays decompiler
Summary:
- Hex-Rays Decompiler is a C decompiler
- Decompilers turn machine code back into high-level source code (C, in Hex-Rays' case), which is much easier to read than assembly
- Hex-Rays Decompiler requires IDA Pro 5.5 to work
Comments
I wonder how decompilers work. I wrote a disassembler as a side project a long time ago, but that was straightforward. Decompilers seem much harder.
The Hex-Rays decompiler requires IDA Pro 5.5 to work. According to this price listing, the bundle costs $2558. Pricey.
Submitted by circaee
I've written a C decompiler with alistra (it's on github if you're curious), so I can shed some light on this.
Decompilation is a tricky process. In the general case it is undecidable: no algorithm can correctly decompile every possible piece of code. This remains true even if you disallow self-modifying code.
However, there are still many patterns in compiled machine code which you can use to try to construct a partial decompilation.
Our approach is best shown using a flowchart (yes):
I will briefly describe the stages:
Object dump
Usually, executable programs are stored in a specific format (ELF and a.out on Unix, PE on Windows, etc.). An object dump takes an object file and extracts information from it, such as the machine code, the entry point (the address where execution starts when you run the executable), the list of symbols (function names, data, etc., with their associated addresses), and so on.
For that, you can use the objdump utility available on most GNU systems as part of GNU Binutils.
An example object dump (to get a list of symbols).
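To make the object-dump step a bit more concrete, here is a minimal Python sketch that reads the fixed 64-byte ELF64 file header, which is where the entry point lives. It only handles little-endian ELF64 files and ignores sections, segments and symbols; `parse_elf64_header` is a hypothetical helper, not part of objdump or ocd. The field order follows the ELF64 header layout.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# ELF64 header fields that follow the 16-byte e_ident array, in file order
# (e_type, e_machine, e_version, e_entry, ...), without the "e_" prefix.
FIELDS = ("type", "machine", "version", "entry", "phoff", "shoff",
          "flags", "ehsize", "phentsize", "phnum", "shentsize", "shnum", "shstrndx")

def parse_elf64_header(data: bytes) -> dict:
    """Parse the fixed part of a little-endian ELF64 header (first 64 bytes)."""
    if len(data) < 64 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # e_ident[4] is EI_CLASS (2 = 64-bit), e_ident[5] is EI_DATA (1 = little-endian)
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only little-endian ELF64 is handled in this sketch")
    values = struct.unpack_from("<HHIQQQIHHHHHH", data, 16)
    return dict(zip(FIELDS, values))
```

Passing the first 64 bytes of a compiled x86-64 Linux binary should give the same value that `readelf -h` (also from Binutils) prints as "Entry point address", and a `machine` value of 62 (EM_X86_64).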
For reference, tests/test_ack.c (I will be using this example throughout the post):
(The code implements the Ackermann function)
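For readers who want to check what the example computes, this is the standard two-argument Ackermann function, written in Python. It is a reference definition for comparison, not the contents of tests/test_ack.c.

```python
def ackermann(m: int, n: int) -> int:
    """Two-argument Ackermann function; it grows extremely fast in m."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))
```

For example, `ackermann(2, 3)` returns 9 and `ackermann(3, 3)` returns 61; values for m >= 4 quickly become impractical to compute this way.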
Disassembly
Compiled executable code is usually stored on disk as machine code (i.e. a set of binary instructions that is understood by CPUs directly). There is usually a human-readable way to transcribe this machine code, called assembly language.
Disassembly consists of turning machine code into assembly language. For example, here's a snippet of the x86-64 disassembly of test_ack:
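As a toy illustration of what a disassembler does, the sketch below decodes a handful of x86-64 byte sequences that typically appear in function prologues and epilogues, printing them in Intel syntax. The opcode table is a tiny hand-picked subset chosen for this example; a real x86-64 decoder has to handle prefixes, ModR/M and SIB bytes, displacements, immediates of several sizes, and much more. `disassemble` is a hypothetical name.

```python
# Tiny, hand-picked subset of fixed x86-64 encodings (illustrative only).
FIXED = {
    b"\x55": "push rbp",
    b"\x5d": "pop rbp",
    b"\xc9": "leave",
    b"\xc3": "ret",
    b"\x90": "nop",
    b"\x48\x89\xe5": "mov rbp, rsp",   # REX.W + MOV r/m64, r64, ModR/M 0xE5
}

def disassemble(code: bytes, base: int = 0) -> list[tuple[int, str]]:
    """Decode `code` into (address, instruction text) pairs."""
    out = []
    i = 0
    while i < len(code):
        # sub rsp, imm8: REX.W 83 /5 ib, where ModR/M 0xEC selects rsp
        if code[i:i + 3] == b"\x48\x83\xec" and i + 3 < len(code):
            imm = code[i + 3]
            imm = imm - 256 if imm >= 0x80 else imm   # imm8 is sign-extended
            out.append((base + i, f"sub rsp, {imm:#x}"))
            i += 4
            continue
        for pattern, text in FIXED.items():
            if code.startswith(pattern, i):
                out.append((base + i, text))
                i += len(pattern)
                break
        else:
            out.append((base + i, f"db {code[i]:#04x}"))  # byte we cannot decode
            i += 1
    return out
```

For example, `disassemble(bytes.fromhex("554889e54883ec10c9c3"))` yields `push rbp`, `mov rbp, rsp`, `sub rsp, 0x10`, `leave`, `ret` at offsets 0, 1, 4, 8 and 9. Note that x86 instructions have variable length, so the decoder must know each encoding's size before it can find the next instruction.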
ocd disassembles into an intermediate language, so that other instruction sets could be plugged in later.
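The post does not describe ocd's intermediate language, so the following is only an assumed illustration of the idea: each architecture-specific instruction is lifted into a small architecture-neutral statement (an operation, a destination and a tuple of sources), so that later stages never see x86 mnemonics. `Stmt`, `lift_x86` and `parse_operand` are hypothetical names, and only three x86 instructions are covered.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Stmt:
    op: str              # "assign", "add", "sub", "return", ...
    dst: Optional[str]   # destination location, if any
    srcs: tuple          # source operands: location names or integer constants

def parse_operand(text: str):
    """Integer literals become constants; anything else is a location name."""
    try:
        return int(text, 0)
    except ValueError:
        return text

def lift_x86(mnemonic: str, operands: list) -> Stmt:
    """Lift one Intel-syntax x86-64 instruction into the toy IL (subset only)."""
    if mnemonic == "mov":
        dst, src = operands
        return Stmt("assign", dst, (parse_operand(src),))
    if mnemonic in ("add", "sub"):
        dst, src = operands
        return Stmt(mnemonic, dst, (dst, parse_operand(src)))
    if mnemonic == "ret":
        # Assumes the System V x86-64 convention: the return value is in rax.
        return Stmt("return", None, ("rax",))
    raise NotImplementedError(mnemonic)
```

A front end for another architecture would only need its own `lift_*` function producing the same `Stmt` values.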
Control flow analysis
A control flow graph is generated, which encodes information about the different paths a program execution might take, depending on control statements of the language (jumps, conditional statements, loops).
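A common way to build the control flow graph (a textbook technique, not necessarily the one ocd uses) is to first find the leaders of basic blocks: the first instruction, every jump target, and every instruction that follows a jump or return. Each block runs from one leader up to the next, and edges come from the block's final instruction. The sketch uses a hypothetical instruction format of `(kind, target)` tuples, where the list index serves as the address.

```python
def build_cfg(instrs):
    """instrs: list of (kind, target); kind is "op", "jmp", "jcc" or "ret",
    target is an instruction index for jmp/jcc and None otherwise.
    Returns (blocks, edges): leader index -> instruction indices, and
    leader index -> sorted successor leaders."""
    n = len(instrs)
    leaders = {0}
    for i, (kind, target) in enumerate(instrs):
        if kind in ("jmp", "jcc"):
            leaders.add(target)          # a jump target starts a block
        if kind in ("jmp", "jcc", "ret") and i + 1 < n:
            leaders.add(i + 1)           # so does the instruction after a branch
    starts = sorted(leaders)
    blocks = {s: list(range(s, e)) for s, e in zip(starts, starts[1:] + [n])}
    edges = {}
    for s, body in blocks.items():
        last = body[-1]
        kind, target = instrs[last]
        succ = set()
        if kind in ("jmp", "jcc"):
            succ.add(target)             # taken branch
        if kind in ("op", "jcc") and last + 1 < n:
            succ.add(last + 1)           # fall-through to the next block
        edges[s] = sorted(succ)
    return blocks, edges
```

For an if/else such as `[("op", None), ("jcc", 4), ("op", None), ("jmp", 5), ("op", None), ("ret", None)]` this gives blocks starting at 0, 2, 4 and 5, with block 0 branching to both 2 and 4, and both of those joining at 5.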
Such control flow graphs have distinct patterns for certain control statements; here's a sample table:
Those patterns can then be rewritten (using a graph rewriting system, not unlike a grammar rewriting system) into single nodes until no more patterns can be found. With this process the original code patterns emerge and we get some insight into the original program structure.
Here is an example of such rewriting. This is the control flow graph for test_ack as it is rewritten into ultimately one node.
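The rewriting loop can be sketched as follows. This is a simplified form of structural analysis with only a few patterns (sequence, if-then, if-then-else, while loop and self-loop); the pattern set, the node labels and the function `reduce_cfg` are assumptions for illustration, not ocd's actual rules. Each rewrite merges nodes into one node whose label records the recovered construct, so a fully structured graph ends up as a single node.

```python
def reduce_cfg(succ, entry):
    """succ maps each node label to the set of its successors.
    Returns the reduced graph and the (possibly renamed) entry node."""
    g = {n: set(s) for n, s in succ.items()}

    def preds(n):
        return {p for p, ss in g.items() if n in ss}

    def rewrite(old, new, new_succ):
        # Replace the nodes in `old` by a single node `new`, redirecting edges.
        nonlocal entry
        new_succ = {new if s in old else s for s in new_succ}
        for o in old:
            del g[o]
        for ss in g.values():
            if ss & old:
                ss -= old
                ss.add(new)
        g[new] = new_succ
        if entry in old:
            entry = new

    def try_rewrite(a):
        sa = g[a]
        if a in sa:                                      # self-loop: a -> a
            rewrite({a}, f"loop({a})", sa - {a})
            return True
        if len(sa) == 1:                                 # sequence: a -> b
            (b,) = sa
            if b != entry and preds(b) == {a}:
                rewrite({a, b}, f"seq({a},{b})", set(g[b]))
                return True
        if len(sa) == 2:
            b, c = sorted(sa)
            for x, y in ((b, c), (c, b)):
                if x == entry or preds(x) != {a}:
                    continue
                if g[x] == {y}:                          # if-then: a -> x -> y, a -> y
                    rewrite({a, x}, f"if({a},{x})", {y})
                    return True
                if g[x] == {a}:                          # while: a -> x -> a, a -> y
                    rewrite({a, x}, f"while({a},{x})", {y})
                    return True
            if (entry not in (b, c) and preds(b) == {a} == preds(c)
                    and g[b] == g[c] and len(g[b]) <= 1):  # if-then-else
                rewrite({a, b, c}, f"ifelse({a},{b},{c})", set(g[b]))
                return True
        return False

    # Apply one rewrite at a time and rescan until nothing matches.
    while any(try_rewrite(a) for a in list(g)):
        pass
    return g, entry
```

For example, the diamond `{"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}` with entry `"A"` collapses to the single node `seq(ifelse(A,B,C),D)`. A graph that stops shrinking before reaching one node contains control flow these patterns cannot express (for example, an arbitrary goto).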
Data flow analysis
First, it makes sense to convert register operands (and similar machine-level storage locations) into fresh variables. For example, this:
might be converted to
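One simple way to do this, assumed here because the post does not say exactly how ocd names its variables, is to give every write to a register a fresh variable, in the spirit of SSA form, and to rewrite later reads to the most recent name. The sketch handles straight-line code only (branches would need phi functions) and ignores the fact that, say, `eax` is the low half of `rax`. The `(dst, op, srcs)` statement format and `rename_registers` are hypothetical.

```python
import itertools

def rename_registers(stmts):
    """stmts: list of (dst, op, srcs) over register names and int constants.
    Each write gets a fresh variable v0, v1, ...; reads use the latest name.
    Registers read before any write keep their name (e.g. incoming arguments)."""
    counter = itertools.count()
    current = {}                       # register -> variable holding its latest value
    out = []
    for dst, op, srcs in stmts:
        # Rewrite reads first, so "eax = eax + 1" reads the *old* eax.
        new_srcs = tuple(current.get(s, s) if isinstance(s, str) else s for s in srcs)
        var = f"v{next(counter)}"
        current[dst] = var
        out.append((var, op, new_srcs))
    return out
```

After renaming, every variable has exactly one definition, which makes the data flow analysis that follows much simpler.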
Then, the program's computation flow is analyzed. Dead instructions are removed and others are folded into more succinct expressions. This
might be converted into
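Using the same hypothetical `(dst, op, srcs)` format, dead-code elimination can be sketched as a backward liveness pass: a statement is kept only if its result is read later or is one of the outputs. Folding then inlines definitions into a single expression. Both are standard textbook transformations shown for straight-line code only; `eliminate_dead` and `fold` are illustrative names, and `fold` assumes each variable is defined once (as after renaming). It also inlines values used more than once, which a real decompiler would usually keep in a variable instead.

```python
OPS = {"add": "+", "sub": "-", "mul": "*"}

def eliminate_dead(stmts, live_out):
    """Keep a statement only if its destination is live at that point."""
    live = set(live_out)
    kept = []
    for dst, op, srcs in reversed(stmts):
        if dst in live:
            kept.append((dst, op, srcs))
            live.discard(dst)                                   # definition found...
            live.update(s for s in srcs if isinstance(s, str))  # ...its inputs are now needed
    kept.reverse()
    return kept

def fold(stmts, result):
    """Build one expression for `result` by substituting definitions."""
    defs = {dst: (op, srcs) for dst, op, srcs in stmts}

    def expr(x):
        if not isinstance(x, str) or x not in defs:
            return str(x)                   # constant or input register
        op, srcs = defs[x]
        if op == "mov":
            return expr(srcs[0])
        left, right = (expr(s) for s in srcs)
        return f"({left} {OPS[op]} {right})"

    return expr(result)
```

For `v0 = edi; v1 = v0 + 1; v2 = esi; v3 = v1 * 2` with only `v3` live at the end, the assignment to `v2` is dropped and `v3` folds to `((edi + 1) * 2)`.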
Compilers often emit idioms. For example, instead of writing
this might be used:
to save cycles. A decompiler might keep a "dictionary" of such idioms to simplify the output.
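Such a dictionary can be as simple as a map from mnemonic to a rule that recognizes the idiom and returns a readable statement. The rules below are well-known x86 examples chosen for illustration: `xor r, r` (or `sub r, r`) zeroes a register, and `shl r, k` multiplies by 2^k (modulo the register width). `IDIOMS` and `translate` are hypothetical names, and operands are plain Intel-syntax strings.

```python
def _zero_idiom(ops):
    # xor r, r / sub r, r: the result is always 0
    return f"{ops[0]} = 0;" if ops[0] == ops[1] else None

def _shift_idiom(ops):
    # shl r, k: multiply by 2**k
    return f"{ops[0]} = {ops[0]} * {1 << int(ops[1])};" if ops[1].isdigit() else None

IDIOMS = {"xor": _zero_idiom, "sub": _zero_idiom, "shl": _shift_idiom}

def translate(mnemonic, ops):
    """Use an idiom rule if one matches, else fall back to a raw comment."""
    rule = IDIOMS.get(mnemonic)
    result = rule(ops) if rule else None
    return result if result is not None else f"/* {mnemonic} {', '.join(ops)} */"
```

So `translate("xor", ["eax", "eax"])` gives `eax = 0;`, `translate("shl", ["eax", "3"])` gives `eax = eax * 8;`, and an instruction no rule recognizes is passed through as a comment.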
Program output
Finally, the program can be output.
For example, this is what ocd outputs for test_ack:
As you can see, it's not perfect, but the general structure is recognizable.
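The output stage essentially walks the structured tree recovered by the earlier stages and prints it as C. Below is a minimal pretty-printer over a hypothetical tuple-based AST (`("seq", ...)`, `("if", cond, then, else_or_None)`, `("return", expr)`, with conditions and expressions already rendered as strings); it only shows how nesting becomes braces and indentation, and is not ocd's output code.

```python
def emit(node, indent=0):
    """Render a tiny statement AST as a list of C source lines."""
    pad = "    " * indent
    kind = node[0]
    if kind == "seq":
        return [line for child in node[1:] for line in emit(child, indent)]
    if kind == "return":
        return [f"{pad}return {node[1]};"]
    if kind == "if":
        _, cond, then, orelse = node
        lines = [f"{pad}if ({cond}) {{"] + emit(then, indent + 1)
        if orelse is not None:
            lines += [f"{pad}}} else {{"] + emit(orelse, indent + 1)
        return lines + [f"{pad}}}"]
    raise ValueError(f"unknown node kind: {kind}")

def emit_function(name, params, body):
    """Wrap a statement tree in an int-returning C function definition."""
    header = f"int {name}({', '.join('int ' + p for p in params)}) {{"
    return "\n".join([header] + emit(body, 1) + ["}"])
```

For instance, an AST for the Ackermann function made of two early-return `if` nodes followed by `("return", "ack(m - 1, ack(m, n - 1))")` prints as a nine-line C function `int ack(int m, int n) { ... }`.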
For a more in-depth analysis of the problem, you can read Cristina Cifuentes' PhD thesis, Reverse Compilation Techniques.
Do you have any background in reverse engineering? Do you know if decompilers are used in practice?
I used to reverse engineer games a lot as a hobby. A decompiler would have been useful with old games written in C/C++, but only slightly.
Decompilers aren't widely used; they don't give enough value to be worth the effort of getting them to work. Perhaps they would be with a better interface?
I wrote the decompiler for fun, to see what could be done, rather than to be useful.