Introduction

When analysing malware, string extraction is one of the first steps: it quickly surfaces useful information such as IP addresses, domains, function names, embedded data, or anything else the developer has not removed. This short notebook combines string extraction with graph theory to understand and show string similarity across a set of malware samples.

Import

Directory of your Samples

Create list
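
As a minimal sketch of this step, assuming the samples sit as regular files in a single directory, the list can be built with `pathlib`. The path `samples` and the names `SAMPLE_DIR` and `list_samples` are hypothetical and only serve the illustration.

```python
from pathlib import Path

# Hypothetical location of the samples; point this at your own directory.
SAMPLE_DIR = Path("samples")


def list_samples(directory):
    """Return a sorted list of the regular files found directly in `directory`."""
    directory = Path(directory)
    if not directory.is_dir():
        # A missing directory yields an empty list instead of an exception.
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())


samples = list_samples(SAMPLE_DIR)
print(f"{len(samples)} sample(s) found in {SAMPLE_DIR}")
```

Subdirectories are skipped on purpose so that the list only contains files that can be opened and read as binaries.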

Strings Comparison

String similarity functions measure how similar two strings are. Similarity is expressed as a distance between the strings: the smaller the distance, the more alike they are. By setting a distance threshold, string metrics can be used to identify strings that are similar but not identical. This is a useful property for spotting patterns in binaries.
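
To make the idea of a distance threshold concrete, here is a small sketch that uses the Levenshtein (edit) distance as one possible string metric. It is an illustration only, not necessarily the metric used elsewhere in this notebook; the helper `is_similar` and its default threshold of 2 are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (free when the characters match)
            ))
        previous = current
    return previous[-1]


def is_similar(a: str, b: str, max_distance: int = 2) -> bool:
    """True when the strings differ, but by at most `max_distance` edits."""
    return 0 < levenshtein(a, b) <= max_distance


print(levenshtein("kitten", "sitting"))
print(is_similar("update.exe", "updata.exe"))
```

Raising `max_distance` catches looser variants at the cost of more false matches, so the threshold is worth tuning per data set.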

Full Strings Correlation

This section extracts all the strings from your list of samples and compares them using the Jaccard distance, that is, one minus the ratio of shared strings to all distinct strings across the two samples.
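
The sketch below shows one way this comparison could look, assuming strings are printable ASCII runs of at least four bytes and that each sample is reduced to a set of strings. The in-memory byte blobs, the function names, and the 0.5 threshold are all hypothetical; with real samples you would read the bytes from disk instead.

```python
import re
from itertools import combinations


def extract_strings(data: bytes, min_length: int = 4) -> set:
    """Return the set of printable ASCII runs of at least `min_length` bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return {match.decode("ascii") for match in pattern.findall(data)}


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(string_sets: dict) -> list:
    """Return (name_a, name_b, distance) for every unordered pair of samples."""
    return [
        (left, right, jaccard_distance(string_sets[left], string_sets[right]))
        for left, right in combinations(sorted(string_sets), 2)
    ]


# Hypothetical stand-ins for file contents.
raw_samples = {
    "sample_a": b"\x00\x01http://example.com/gate.php\x00CreateFileA\x00\x02ab\x00",
    "sample_b": b"\xffCreateFileA\x00http://example.com/gate.php\x00WinExec\x00",
    "sample_c": b"\x00nothing in common\x00",
}
string_sets = {name: extract_strings(data) for name, data in raw_samples.items()}
distances = pairwise_distances(string_sets)

# Pairs under the threshold become the edges of a similarity graph.
THRESHOLD = 0.5
edges = [(a, b, d) for a, b, d in distances if d <= THRESHOLD]
print(edges)
```

The resulting edge list can be handed to any graph library to draw the similarity graph, with samples as nodes and distances as edge weights.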

Specific Strings Correlation

In this section we extract specific strings from the binaries and compare them. The first thing to do is to modify the following variables. In the following example, we are looking for strings related to the C2 connection.
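
As a rough illustration of narrowing the comparison to C2-related strings, the sketch below keeps only strings that match a set of regular expressions. The patterns, the name `C2_PATTERNS`, and the sample strings are assumptions made for this example, not the notebook's actual variables.

```python
import re

# Hypothetical patterns for network indicators; adapt them to what you are hunting for.
C2_PATTERNS = [
    re.compile(r"https?://[^\s\"']+", re.IGNORECASE),  # URLs
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),        # IPv4-looking values (octet range not validated)
]


def filter_strings(strings, patterns):
    """Keep only the strings in which at least one pattern matches."""
    return {s for s in strings if any(p.search(s) for p in patterns)}


# Hypothetical strings already extracted from one sample.
extracted = {
    "http://example.com/gate.php",
    "CreateFileA",
    "connect to 192.0.2.10",
    "This program cannot be run in DOS mode.",
}
c2_strings = filter_strings(extracted, C2_PATTERNS)
print(sorted(c2_strings))
```

The filtered sets can then be compared pairwise with the same distance as in the full correlation, which tends to highlight samples that share infrastructure rather than boilerplate strings.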

If you like this content, you can follow me on Twitter @fr0gger_ for more like this. ❤