Read and write operations for memory-heavy cyber security data using MMAP

I really enjoy living in the woodland area. You can walk for hours without meeting anyone, which helps to clear your mind of the constant stream of thoughts and images. Sometimes you just need to stop, focus on priorities and push the other thoughts to your brain’s long-term memory storage. That is what we do. And we can teach our programs that ability as well.

In the cyber security and artificial intelligence area, one of the most common tasks is to store filtered data for future analysis. For instance, when you study the behavior of users within your system, you not only need long series of data monitoring user activity, failed logins, permission violations and so on, but, at the same time, you have to analyze the data in real time, with a maximum delay of a couple of seconds from the moment the data first appeared. To put it simply, you need to be fast.

In order to do so, your program has to find a way to quickly recover data from long-term memory, analyze it in real time, and let it go if it is no longer a priority (no security incident was detected as the result of the analysis). You can start by keeping the data in the limited, short-term, dynamically allocated memory, but it will be lost once the program is restarted or killed.

Of course, the next step could be file persistence, where you periodically store the data in the long-term memory (filesystem). However, what if the program stops at the very moment a security incident is being evaluated? There would be no time for the periodic store to persist the data, hence you would lose the result of the analysis. And additional orchestration, such as keeping only relevant data in the limited dynamically allocated memory, would be way too slow.

There is, however, a native and efficient approach to keep the data synchronized between memory and files (thus avoiding data loss on restart), while utilizing the limited short-term memory only for actually used, prioritized and analyzed data. It builds on the same mechanism as swap. What does it mean? Swapping happens when the applications running on the operating system need more memory than the available RAM. The operating system then moves some of the less used memory pages to a swap file or swap partition and loads them back on demand. Hence the memory available to the applications, the so-called virtual memory, is actually larger than the physical RAM. Reading pages back from disk is much slower than reading RAM, for swap as well as for MMAP, which is why it should not happen very often.

If we just take the swap function and additionally instruct it to mirror everything from the allocated memory to a file of our choice, we get the functionality we need: a fast, low-level way to persistently store large data while keeping it instantly available for real-time analysis. On UNIX systems, this “instruction” is quite easy to give: you call the memory map (MMAP) function. The signature of the MMAP function is straightforward:

void * mmap (void *address, size_t length, int protect, int flags, int filedes, off_t offset)

In our case, we are not interested in a specific location in the available memory space where our mapping should take place, hence the first argument, called address, can be NULL. It is up to the kernel to choose the place for the memory mapping. The second argument, called length, specifies the amount of memory we want to use. And here comes the trick: the length of the memory used is the same as the size of the file that the mirroring synchronizes the data with. The kernel maps whole memory pages (usually 4096 bytes each), so the simplest first step is to create a file whose size is a multiple of the page size, and then use the size of the file here as the length of the memory. The mapping should not extend beyond the end of the file, because touching a page that lies entirely past the end of the file raises SIGBUS.
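
The sizing rule can be sketched in a few lines of C. This is a minimal sketch, not part of the sample program: the helper names round_up_to_page and system_page_size are hypothetical, and the 4096-byte fallback is an assumption for systems where sysconf reports no page size.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Round `wanted` bytes up to the next multiple of `page`. */
static size_t round_up_to_page(size_t wanted, size_t page) {
    assert(page > 0);
    size_t remainder = wanted % page;
    return remainder == 0 ? wanted : wanted + (page - remainder);
}

/* Ask the system for its page size instead of hard-coding 4096. */
static size_t system_page_size(void) {
    long page = sysconf(_SC_PAGESIZE);
    return page > 0 ? (size_t)page : 4096; /* fallback value is an assumption */
}
```

For example, round_up_to_page(5000, 4096) yields 8192, which can then be used both as the file size and as the length argument of MMAP.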

The third argument, called protect, must allow both read and write operations in the virtual memory, hence we set it to PROT_READ | PROT_WRITE. The most interesting argument is the fourth one, named flags, which can instruct MMAP to use the swapping/mirroring ability. To do so, set the flags to MAP_SHARED, which means that all changes in the memory will also be written to the file and, on top of that, will be visible to other processes that map the same file (here comes the keyword: shared), which is especially useful for parallel tasks.
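
To see what the flag changes, the following sketch writes one byte through a mapping and then reads the file back through the file descriptor. The function name write_reaches_file and the scratch file it creates and deletes are hypothetical, and the behavior described is what I would expect on Linux. With MAP_SHARED the byte should arrive in the file; with MAP_PRIVATE the write stays in a private copy of the page.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a write through the mapping reaches the file,
 * 0 if it does not, and -1 on any error. */
static int write_reaches_file(const char *path, int flags) {
    assert(path != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t length = (size_t)page;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) return -1;
    if (ftruncate(fd, (off_t)length) == -1) { close(fd); return -1; }

    char *mem = mmap(NULL, length, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (mem == MAP_FAILED) { close(fd); return -1; }

    mem[0] = 'X';                       /* modify the mapped memory */
    int result = -1;
    char from_file = 0;
    /* Flush first, then read the same byte back through the descriptor. */
    if (msync(mem, length, MS_SYNC) == 0 && pread(fd, &from_file, 1, 0) == 1)
        result = (from_file == 'X');

    munmap(mem, length);
    close(fd);
    unlink(path);                       /* remove the scratch file */
    return result;
}
```

Calling it once with MAP_SHARED and once with MAP_PRIVATE shows why only the shared variant is useful for persistence.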

The fifth argument is the file descriptor of the open file that we want the mapping to store its data to, and whose size is the same as the length of the memory specified in the second argument. The file descriptor can be obtained by calling the system function open. You see, MMAP is really a fast low-level function, since it requires only the file descriptor instead of more advanced library structures such as FILE obtained via fopen. The file descriptor received from open gives MMAP access to the file without any other orchestration. The last argument, offset, we will just set to zero, since we want to map the entire file, not only a part of it (if you need some more orchestration, it is better to do so using structures, since offset is not flexible: it must always be a multiple of the memory page size).
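
As a sketch of the "use structures" advice, the whole file can be mapped at offset zero and treated as an array of fixed-size records. The struct login_event layout and the function map_events are hypothetical illustrations, not part of the sample program; ftruncate is used here only to size the file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical fixed-size record; the fields are illustrative only. */
struct login_event {
    int64_t  timestamp;      /* seconds since the epoch */
    uint32_t user_id;
    uint32_t failed_logins;
};

/* Map `count` records backed by `path`. Returns the array and stores the
 * mapping length in *len, or returns NULL on error. */
static struct login_event *map_events(const char *path, size_t count, size_t *len) {
    assert(path != NULL && len != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || count == 0) return NULL;

    /* Round the array size up to a whole number of pages. */
    size_t bytes = count * sizeof(struct login_event);
    size_t rounded = (bytes + (size_t)page - 1) / (size_t)page * (size_t)page;

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1) return NULL;
    if (ftruncate(fd, (off_t)rounded) == -1) { close(fd); return NULL; }

    void *mem = mmap(NULL, rounded, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (mem == MAP_FAILED) return NULL;

    *len = rounded;
    return mem;
}
```

After the call, events[7].failed_logins++ updates record 7 directly in the mapped file, with no page-aligned offset arithmetic in the calling code.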

The return value of MMAP is the pointer to the mapped memory, so the behavior is similar to malloc (however, the mapping is released with munmap, not with free). The return value can also be MAP_FAILED, if the mapping fails for some reason; note that MAP_FAILED is not the same value as NULL, so it has to be compared explicitly. Here is a sample program in C using MMAP:
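
Before the full sample, here is a minimal sketch of just the return-value handling. The function name map_then_unmap is hypothetical. It checks the result against MAP_FAILED, reports the reason through errno, and releases the mapping with munmap using the same length.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Map `length` bytes of the file behind `fd` and release the mapping again.
 * Returns 0 on success, -1 on failure. */
static int map_then_unmap(int fd, size_t length) {
    assert(length > 0);
    void *mem = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) {            /* compare with MAP_FAILED, not NULL */
        fprintf(stderr, "mmap failed: %s\n", strerror(errno));
        return -1;
    }

    /* ... read and write through `mem` here ... */

    if (munmap(mem, length) == -1) {    /* counterpart of free() */
        fprintf(stderr, "munmap failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Passing an invalid descriptor such as -1 makes the call fail, which is an easy way to exercise the error branch.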

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>


int main(void) {
    int file_descriptor;
    size_t page_size;
    char * memory_content;

    // Open the file to be used for "swapping" inside mmap
    file_descriptor = open("./mymemoryfile.bin", O_RDWR | O_CREAT, (mode_t)0644);

    // Size of the file must be based on page size of the system
    // Select any size you want, but multiply it by the memory page size
    page_size = getpagesize();
    fallocate(file_descriptor, 0, 0, page_size * 1024);

    // Obtain the memory and map it to characters/bytes
    // The length of the memory is the same as the size of the mapped file
    // MAP_SHARED - writes to the memory are carried through to the file
    memory_content = (char *)mmap(
        NULL,
        page_size * 1024,
        PROT_READ | PROT_WRITE,
        MAP_SHARED,
        file_descriptor,
        0
    );

    // Read the data
    fprintf(stdout, "Old memory content: '%s'.\n", memory_content);

    // Write new data (time_t is cast so that it matches the format specifier)
    snprintf(memory_content, 1024, "Modification time: %ld", (long)time(NULL));

    // Print the new segment data
    fprintf(stdout, "New memory content: '%s'.\n", memory_content);

    // Make sure to rerun the application to see changes in the output

    // Release the mapping and the descriptor; the data stays in the file
    munmap(memory_content, page_size * 1024);
    close(file_descriptor);

    return 0;
}

This sample code creates a file named mymemoryfile.bin (if it does not already exist) and sets its size to page_size * 1024 bytes using fallocate, a Linux-specific system call that works on modern file systems and reserves the disk space for the file without the program having to write the actual bytes. (If you prefer a sparse file, whose blocks are allocated only when they are first written, ftruncate can set the size instead.) There should be additional checks for every function to see if the opening of the file, allocation etc. was successful, but for the purposes of this sample I decided to avoid them.
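
For completeness, this is roughly what the same flow could look like with the checks in place. It is a sketch under a few assumptions: the function name stamp_mapped_file is hypothetical, posix_fallocate is swapped in for fallocate because it is the portable variant, and an explicit msync is added so that the data is flushed to the file before the mapping is released.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Copy the text stored at the start of `path` into `old`, then overwrite it
 * with a fresh timestamp. Returns 0 on success, -1 on error. */
static int stamp_mapped_file(const char *path, char *old, size_t old_size) {
    assert(path != NULL && old != NULL && old_size > 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t length = (size_t)page * 1024;

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1) { perror("open"); return -1; }

    /* posix_fallocate returns an error number instead of setting errno. */
    int err = posix_fallocate(fd, 0, (off_t)length);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        close(fd);
        return -1;
    }

    char *memory = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping keeps its own reference to the file */
    if (memory == MAP_FAILED) { perror("mmap"); return -1; }

    /* Save the old content (truncated to fit), then write the new one. */
    snprintf(old, old_size, "%.*s", (int)(old_size - 1), memory);
    snprintf(memory, 1024, "Modification time: %ld", (long)time(NULL));

    int status = 0;
    if (msync(memory, length, MS_SYNC) == -1) { perror("msync"); status = -1; }
    if (munmap(memory, length) == -1) { perror("munmap"); status = -1; }
    return status;
}
```

A main function would call stamp_mapped_file("./mymemoryfile.bin", buffer, sizeof buffer) and print the buffer as the old content, exiting with a non-zero status if the call returns -1.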

Afterwards, MMAP is called in the way described above with the file and the MAP_SHARED flag. Then the content of the memory is printed to standard output as a string, modified (snprintf) and printed out again. The most interesting part comes when you rerun the application. During the first run, mymemoryfile.bin does not exist yet, hence the output looks as follows:

$ ./mmapsample
Old memory content: ''.
New memory content: 'Modification time: 1650958473'.

When the memory is used to add the “modification time” message, the result is shown in the output. Now, when you rerun the application, because of the automatic mirroring of the memory to the file, you should see the message as the “old memory content”:

$ ./mmapsample
Old memory content: 'Modification time: 1650958473'.
New memory content: 'Modification time: 1650958606'.

And yet one more run:

$ ./mmapsample
Old memory content: 'Modification time: 1650958606'.
New memory content: 'Modification time: 1650958620'.

The old content of the current run is always the same as the new content of the previous run. When you delete the mymemoryfile.bin file and run the application again, there is no content available:

$ ./mmapsample
Old memory content: ''.
New memory content: 'Modification time: 1650958690'.

Since we are storing only ASCII characters as bytes in the memory/file, you should be able to see the content of the file simply by using the cat command:

$ cat mymemoryfile.bin
Modification time: 1650958690

That’s it! Just play with the code, try to set different MMAP flags (see: https://linuxhint.com/using_mmap_function_linux/) and measure the performance. In the following article, I would like to continue with the topic and show you more specifically how MMAP can be used in the cyber security area.
