Recently, I did some research on Hidden VNCs (HVNC). This is a neat feature for attackers to have, since it allows them to remotely control a compromised system on a new and separate virtual desktop that is not visible to the victim. This way, it is possible to remotely launch and use GUI programs without generating any visual indicators for victims.
Implementing HVNC is complicated. First of all, you need to be able to grab the screen contents of the hidden desktop. On Windows, there are no APIs available to do this easily. You can only take screenshots of the default desktop, which is the one a potential victim is operating on. Instead, you have to use something like EnumDesktopWindows() to get a list of windows present on the hidden desktop and use another API call to get the window contents. In case that works: Great. But sometimes it just doesn’t, because some developer didn’t care about implementing the functionality required to grab screen contents (WM_PRINT, for example). On top of that, you have to do all of that in the correct order, since windows are often layered on top of each other. Sending input to a specific window is quite another thing: On the regular desktop you can just use SendInput() and coordinates to simulate a click. On a hidden desktop, this API cannot be used either. You have to check what kind of GUI element a user clicks on, e.g. a close button, and send the correct message to the correct window. So, you ultimately have to implement your own window manager. Coooooooool :/
Sometimes we might not need all of that complicated stuff. Watching users and sending input on the same desktop may be just enough to pull off a successful attack. For example, we wait until the user has logged into a 2FA-protected web GUI and grab the cookie once we see the session was established. We can even take over the session once we think the user is not watching and interact with the web application. There are many scenarios where this may be of use, for example stealing a specific secret stored in a password manager.
Microsoft Teams is able to constantly grab screen contents of the user’s desktop, so I wanted to replicate that in combination with the ability to interact with the machine. Also, WebSockets seem to be a good way of communicating with C2 servers nowadays, so I wanted that as well. I’ve created a proof of concept application that accomplishes this, and it’s called reinschauer.
It uses a Linux-based server, and the client application is obviously created for the Windows platform. The attacker runs the server component, which waits for a client to connect. I’ve implemented two clients: a Go-based client that can be used as a plain exe, and a client based on the .NET framework that has the following benefits:
- Smaller executable compared to the Golang version.
- Can be executed in-memory from Cobalt Strike via BOF.NET.
Creating an executable compatible with BOF.NET is quite easy. We only need to implement a class that inherits from BeaconObject of the BOFNET NuGet package. Also, it has to call the Main() method of our project:
using BOFNET;
[...]
namespace reinschauer_dotnet
{
    public class BofStuff : BeaconObject
    {
        public BofStuff(BeaconApi api) : base(api) { }

        // has to be called `Go`
        public override void Go(string[] args)
        {
            try
            {
                Reinschauer.Main(args);
            }
            catch (Exception ex)
            {
                BeaconConsole.WriteLine(String.Format("\nException: {0}.", ex));
            }
        }
    }
}
But now, let’s dive into some details that are important when implementing an implant like reinschauer.
Grabbing Screen Contents, efficiently
Getting and sending screen contents happens quite a few times per second. Therefore, it is beneficial to implement this piece of code as efficiently as possible. In general, I’ve used these steps to minimize load and message size:
- Allocate memory required for the bitmap data once and try to re-use it for every frame.
- Use JPG compression. Most APIs allow specifying a quality level between 0 and 100. At moderate to high quality settings, this step drastically reduces data volume with almost no visible impact on the resulting image.
- Scale the image before sending it.
The WinAPI functions BitBlt() and GetDIBits() allow reading screen contents into a bitmap. We can use these APIs in Golang by using a wrapper like win:
if !win.BitBlt(MEMORY_DEVICE, 0, 0, int32(WIDTH), int32(HEIGHT), HDC, 0, 0, win.SRCCOPY) {
    return nil, errors.New("BitBlt failed")
}
if win.GetDIBits(HDC, BITMAP, 0, uint32(HEIGHT), (*uint8)(memptr), (*win.BITMAPINFO)(unsafe.Pointer(&HEADER)), win.DIB_RGB_COLORS) == 0 {
    return nil, errors.New("GetDIBits failed")
}
Using the .NET framework, it’s even more straightforward:
bm = new Bitmap(Screen.PrimaryScreen.Bounds.Width, Screen.PrimaryScreen.Bounds.Height, PixelFormat.Format32bppArgb);
g = Graphics.FromImage(bm);
g.CopyFromScreen(0, 0, 0, 0, bm.Size, CopyPixelOperation.SourceCopy);
resized = new Bitmap(bm, new Size(bm.Width / scaler, bm.Height / scaler));
memStream = new MemoryStream();
resized.Save(memStream, codecfInfo, myEncoderParameters);
Emulating Input
Since the implant is operating on the default desktop, APIs like SendInput() can be used to emulate keyboard and mouse input. The required functionality is similar to an aimbot, since its task is to send keyboard inputs and clicks on a specific pixel on the screen.
The only tricky thing is handling different screen sizes and resolutions correctly: Clicks originating from the attacker’s machine have to be translated to click events on the target machine. But an attacker may use an ultra-wide screen and the target machine may be a laptop with a much smaller screen. Therefore, we can’t just transmit X and Y coordinates and generate click events for these values.
Luckily, the WinAPI can already handle this kind of thing: The MOUSEINPUT struct contains a bit field called dwFlags. It is possible to specify MOUSEEVENTF_ABSOLUTE to use coordinates that are normalized to the dimensions of the screen:
If MOUSEEVENTF_ABSOLUTE value is specified, dx and dy contain normalized absolute coordinates between 0 and 65,535. The event procedure maps these coordinates onto the display surface. Coordinate (0,0) maps onto the upper-left corner of the display surface; coordinate (65535,65535) maps onto the lower-right corner. In a multimonitor system, the coordinates map to the primary monitor.
This means that click coordinates can be converted to use absolute positioning by using a simple code snippet:
var CONV_BASE = float32(65535)
// The size of the attacker's GUI
curr_size := IMG.Size()
factor_x := CONV_BASE / curr_size.Width
factor_y := CONV_BASE / curr_size.Height
tap_x := int(event.Position.X * factor_x)
tap_y := int(event.Position.Y * factor_y)
Now we can resize the GUI however we want and the click events get translated correctly.
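To make the arithmetic concrete, here is a small self-contained sketch of the same conversion as a pure function. The names ToAbsolute and scale are made up for this example, and the clamping of out-of-range input is an addition that the snippet above does not have.

```go
package main

import "fmt"

// absMax is the upper bound of the normalized absolute coordinate range.
const absMax = 65535

// ToAbsolute (hypothetical helper) maps a click at (x, y) inside a view of
// size (w, h) to the normalized 0..65535 range. Each axis is handled
// independently, so the view may be stretched in either direction.
func ToAbsolute(x, y, w, h float32) (int32, int32) {
	return scale(x, w), scale(y, h)
}

// scale normalizes one axis and clamps the result to [0, absMax].
func scale(pos, size float32) int32 {
	if size <= 0 {
		return 0
	}
	v := pos / size * absMax
	if v < 0 {
		v = 0
	}
	if v > absMax {
		v = absMax
	}
	return int32(v)
}

func main() {
	// A click in the centre of an 800x600 view.
	ax, ay := ToAbsolute(400, 300, 800, 600)
	fmt.Println(ax, ay) // prints "32767 32767"
}
```

Because the result only depends on the ratio between the click position and the view size, the same click in the centre of a 1600x400 view yields the same normalized coordinates.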
Merging .NET Executables and DLLs
This is only relevant for the .NET variant. When using external libraries like NuGet packages, they need to be bundled with the resulting .NET executable. Otherwise, the implant cannot be executed on the target system because the required assemblies are missing. There’s a simple solution for this and it can be automated: The tool ILMerge allows bundling DLL files into existing exe files with a single command. Just download the ILMerge NuGet package, put this into your csproj file and build the project:
<Target Name="AfterBuild">
  <ItemGroup>
    <MergeAssemblies Include="$(OutputPath)\reinschauer-dotnet.exe" />
    <MergeAssemblies Include="$(OutputPath)\System.Reactive.dll" />
    <MergeAssemblies Include="$(OutputPath)\System.ValueTuple.dll" />
    <MergeAssemblies Include="$(OutputPath)\System.Runtime.CompilerServices.Unsafe.dll" />
    <MergeAssemblies Include="$(OutputPath)\System.Threading.Channels.dll" />
    <MergeAssemblies Include="$(OutputPath)\System.Threading.Tasks.Extensions.dll" />
    <MergeAssemblies Include="$(OutputPath)\Websocket.Client.dll" />
  </ItemGroup>
  <PropertyGroup>
    <OutputAssembly>$(OutputPath)\reinschauer-dotnet-standalone.exe</OutputAssembly>
    <Merger>"$(SolutionDir)\packages\ILMerge.3.0.41\tools\net452\ILMerge.exe"</Merger>
  </PropertyGroup>
  <Message Text="MERGING: @(MergeAssemblies->'%(Filename)') into $(OutputAssembly)" Importance="High" />
  <Exec Command="$(Merger) /out:&quot;$(OutputAssembly)&quot; @(MergeAssemblies->'&quot;%(FullPath)&quot;', ' ')" />
</Target>
The resulting file reinschauer-dotnet-standalone.exe then contains all DLLs specified with MergeAssemblies elements.
Tunneling Traffic
Most of the time, establishing direct TCP connections between an attacker and a target machine is not possible. Therefore, an Internet-facing server is needed to relay traffic to and from both peers. The most convenient way is to tunnel traffic via an existing Beacon connection. This is especially handy for the in-memory variant of reinschauer.
Beacon implements the rportfwd_local command, which sets up a remote port forward that makes the port reinschauer is listening on reachable from the target machine. The reinschauer client then only has to connect to localhost to establish a connection. The rportfwd_local command seems to cause Beacon to listen on 0.0.0.0, and apparently this can’t be changed. Listening on 127.0.0.1 would be enough, but okay, it still works.
An alternative is to use OpenSSH and the GatewayPorts feature. Using that, we can forward the reinschauer port from the attacker’s machine to an Internet-facing port on another server. More information regarding traffic tunneling can be found in the reinschauer repository on GitHub.