Anatomy of DDoS attacks

Mon Dec 06 2021

Authored by Kevin Jackson (Principal Software Engineer at THG)

As part of the Soteria research project at THG, we needed to look at DDoS attacks, their features and how to generate the amount of traffic required to simulate an actual attack.

There are few resources available for folks researching cyber attacks and DDoS attacks in general. Attack data corpora are few and far between, and those that exist are typically very small samples or synthetic/generated data.

There are many approaches to DDoS mitigation, from simple rate-limiting and dropping traffic, to manipulating iptables/host firewall rules in reaction to signalling, to asking upstream routers to initiate congestion-control-like approaches.

Which of these approaches warrants further investigation depends upon factors such as where in the network the mitigation will sit: anywhere from directly on the attacked host [iptables] to between the datacentre and the “wild internet” [BGP and routing based]. Large solution vendors in this space (e.g. Cloudflare, Akamai Prolexic) typically favour sitting between the attack and the customer datacentre.


A subset of the current DDoS attack space

To verify approaches to DDoS mitigation, and alongside a literature review of prior art around the problem, we needed to create a DDoS simulation test-bed.

Simulating a real attack

By definition, a DDoS attack has the capability to cause harm to a network and its attached services. As such, even creating a controlled simulation of an attack is a task to be undertaken with great care.

To ensure that the attack software didn’t impact our operations, the first step was to build a separate network fabric and set of devices, decoupled from anything production related. However, for the simulation to provide useful information, the design of the test network had to match the production network: the same hardware devices and switches, the same SDN, the same NIC provider, the same drivers and so on.

Simplified ‘equivalent’ network environment

Other aspects of the network environment to take into consideration are:

  • the DDoS traffic generation software should not be limited by any hardware bottlenecks
  • the switches etc should not ‘mitigate’ the attack due to memory constraints or other configuration limits

Essentially we need to ensure that the traffic being generated as the ‘attack’ is all received at the ‘attacked device’ network card and packets are not being lost along the way due to incorrect config (poorly sized packet buffers etc).

Working out all the points in the network where packets could be lost

With a test environment / laboratory configured we had a safe platform to test the DDoS packet generation software. Next we had to evaluate the various tools and software packages for packet generation at scale.

hping3, TRex & bonesi

After an initial evaluation period we focused our efforts on the following: hping3, TRex and bonesi. Each of these could generate packets in a fashion that could simulate various types of attacks. As we required a scriptable or programmable solution, it was important that each of these had an API which we could program against.

DDoS Botnet Simulator is a Tool to simulate Botnet Traffic in a testbed environment on the wire —

We considered bonesi because it is a C project, which fits well with the rest of the code the team is working with, and because it has features around TCP traffic that are harder to find in other packages. It’s also explicitly designed to simulate a botnet-based attack, which is a good foundation for learning about how a botnet could be constructed and used.

hping is a command-line oriented TCP/IP packet assembler/analyzer —

Hping3 is again a C project, but this time the API is a Tcl/Tk script interface which allows for more rapid prototyping of ideas. It’s fairly old at this point and there is a decent amount of documentation around it.

While hping3 and bonesi are designed to create Denial of Service style attacks, TRex is a little different: it is designed around generating (realistic) packets quickly and efficiently, and is less focused purely on security testing:

TRex is an open source, low cost, stateful and stateless traffic generator fuelled by DPDK. —

For our DDoS simulation we required the following additional characteristics:

  • repeatable
  • controllable
  • the packet generator must be able to saturate the network interface


A key focus for the team was to ensure that the simulated DDoS packets/traffic, although similar in content, flow rate etc to a real attack, would be 100% reproducible. This meant that we had to remove sources of randomness from the generation and instead use seeded random-number generators in the code.
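To illustrate what seeding buys you, here is a minimal sketch in C. It is not the Soteria code: the `seeded_rng` type and the `rng_seed`, `rng_next` and `fill_ports` helpers are names made up for this example, and splitmix64 is simply one well-known small generator, picked here as an assumption. The point is only that the same seed always yields the same sequence of "random" field values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* splitmix64: the whole state is one 64-bit word, so the seed alone
 * determines every value that follows. (Hypothetical helper names.) */
struct seeded_rng {
	uint64_t state;
};

static void rng_seed(struct seeded_rng *rng, uint64_t seed)
{
	rng->state = seed;
}

static uint64_t rng_next(struct seeded_rng *rng)
{
	uint64_t z = (rng->state += 0x9E3779B97F4A7C15ULL);

	z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
	z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
	return z ^ (z >> 31);
}

/* Fill out[0..n) with pseudo-random 16-bit values, e.g. source ports.
 * Calling this twice with the same seed gives identical arrays. */
static void fill_ports(uint64_t seed, uint16_t *out, size_t n)
{
	struct seeded_rng rng;

	rng_seed(&rng, seed);
	for (size_t i = 0; i < n; i++)
		out[i] = (uint16_t)(rng_next(&rng) >> 48); /* top 16 bits */
}
```

Calling `fill_ports(42, a, 8)` and `fill_ports(42, b, 8)` fills `a` and `b` with the same eight values, which is the property a repeatable attack run depends on. Anything that reads the clock, the PID or `/dev/urandom` would break it.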


It was of utmost importance that the packet generation was under control at all times and that the process could easily be killed if needed (for fairly obvious reasons!). We also had a requirement to be able to feed “industry standard” pcap files into the packet generator, so we could use these files as the corpus of packets on which to model our simulated or generated packets.
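For readers who have not looked inside a pcap file, the sketch below walks the classic libpcap layout: a 24-byte global header starting with the magic number `0xA1B2C3D4`, followed by one 16-byte record header per packet whose third 32-bit field holds the captured length. It makes several simplifying assumptions: the capture is already in memory, it was written in the host's byte order, and it uses microsecond timestamps. `pcap_count_packets` and `read_u32` are hypothetical names for this example; real tooling would normally use libpcap rather than parsing by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAP_MAGIC_USEC     0xA1B2C3D4u /* classic pcap, microsecond stamps */
#define PCAP_GLOBAL_HDR_LEN 24
#define PCAP_RECORD_HDR_LEN 16

/* Read a 32-bit value stored in the host's byte order (no swapping). */
static uint32_t read_u32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Count packet records in an in-memory capture.
 * Returns -1 if the magic number does not match or a record is truncated. */
static long pcap_count_packets(const uint8_t *buf, size_t len)
{
	size_t off = PCAP_GLOBAL_HDR_LEN;
	long count = 0;

	if (len < PCAP_GLOBAL_HDR_LEN || read_u32(buf) != PCAP_MAGIC_USEC)
		return -1;

	while (off < len) {
		uint32_t incl_len;

		if (len - off < PCAP_RECORD_HDR_LEN)
			return -1;
		/* Record header: ts_sec, ts_usec, incl_len, orig_len */
		incl_len = read_u32(buf + off + 8);
		off += PCAP_RECORD_HDR_LEN;
		if (len - off < incl_len)
			return -1;
		off += incl_len; /* skip the captured bytes */
		count++;
	}
	return count;
}
```

The same loop is where a generator would copy each record's bytes into its packet templates instead of merely counting them.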

Network interface saturation

As the network lab has limited hardware capacity, it was important from an efficiency (and cost) point of view that we could generate sufficient traffic to simulate an attack using the fewest resources possible. Ideally we should be limited only by the network interface capability, and the software stack should scale to use the resources available.
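It helps to know what "saturated" means in packets per second. On Ethernet every frame also costs 20 bytes on the wire that never reach the host (7-byte preamble, 1-byte start-of-frame delimiter and 12-byte inter-frame gap), so the ceiling for minimum-size 64-byte frames is the link speed divided by 84 bytes. The sketch below does that arithmetic. `line_rate_pps` is a hypothetical helper, and the link speeds in the usage note are illustrative, since this post does not state the lab's NIC speed.

```c
#include <assert.h>
#include <stdint.h>

/* Wire overhead per frame: preamble (7) + SFD (1) + inter-frame gap (12). */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Maximum frames per second a link can carry for a given frame size.
 * frame_bytes includes the Ethernet header and FCS (minimum 64). */
static uint64_t line_rate_pps(uint64_t link_bits_per_sec, uint32_t frame_bytes)
{
	uint64_t bits_per_frame =
		(uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;

	return link_bits_per_sec / bits_per_frame;
}
```

For example, `line_rate_pps(10000000000ULL, 64)` gives 14,880,952 packets per second for 10GbE, and `line_rate_pps(100000000000ULL, 64)` gives 148,809,523 for 100GbE. Comparing the generator's measured rate against this ceiling shows whether the NIC or the software stack is the limiting factor.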

Crafting the ‘attack’

With a set of requirements defined and a short-list of candidate tools to use, the team started testing the tooling and modifying it where required.

Modified TRex code

Each packet “stream” takes a field engine (udp_fe in this case) which uses the random seed to generate values in the range specified by the field engine. As one of our requirements is reproducibility, it was important to us to specify a random_seed value so we could guarantee that each stream would generate the same data.
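Conceptually, a field engine of this kind is a small rule that rewrites one field of a packet template on every send. The sketch below is a stand-in written in C to show the idea, not TRex's API (TRex profiles are written against its own Python interface). The `field_engine` struct and the `fe_*` functions are hypothetical, xorshift32 is an arbitrary choice of generator, and offset 34 in the usage note assumes an Ethernet frame carrying an IPv4 header without options, which puts the UDP source port at bytes 34 and 35.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a field engine writing one 16-bit field. */
struct field_engine {
	uint32_t state;     /* PRNG state, initialised from the seed */
	uint16_t min_value; /* inclusive lower bound */
	uint16_t max_value; /* inclusive upper bound */
	size_t offset;      /* byte offset of the field in the packet */
};

static void fe_init(struct field_engine *fe, uint32_t seed,
		    uint16_t min_value, uint16_t max_value, size_t offset)
{
	fe->state = seed ? seed : 1u; /* xorshift32 must not start at zero */
	fe->min_value = min_value;
	fe->max_value = max_value;
	fe->offset = offset;
}

/* One xorshift32 step: deterministic for a given starting state. */
static uint32_t fe_rand(struct field_engine *fe)
{
	uint32_t x = fe->state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	fe->state = x;
	return x;
}

/* Write the next in-range value into the packet, big-endian, and return it.
 * The modulo introduces a slight bias, which is fine for a sketch. */
static uint16_t fe_apply(struct field_engine *fe, uint8_t *pkt, size_t pkt_len)
{
	uint32_t span = (uint32_t)fe->max_value - fe->min_value + 1u;
	uint16_t v = (uint16_t)(fe->min_value + fe_rand(fe) % span);

	assert(fe->offset + 2 <= pkt_len);
	pkt[fe->offset] = (uint8_t)(v >> 8);
	pkt[fe->offset + 1] = (uint8_t)(v & 0xFF);
	return v;
}
```

Two engines initialised with `fe_init(&fe, 1234, 1025, 65000, 34)` write the same sequence of source ports into their packets, which is why pinning the seed per stream is enough to make a whole run repeatable.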

This required a small adjustment to TRex to allow us to manipulate the udp field engine to our liking. We also modified the code to allow us to feed in a pcap file and use that along with the normally generated packets. Using a pcap as the source of some of the packets allows us to determine the distribution of different types of packets.

With the changes to TRex implemented, we could fire up the test and see how it performed.

Stats from the packet generation machine

As we started the test, we recorded the key statistics from the packet generation device. Here we can see that we are generating about 53 million packets per second. This initial attack consumed 16 CPU cores on the packet generation device.

34 million packets / second received

However, on the device under test we are only receiving 34 million packets per second. Despite this drop in packets received (something we need to investigate further), the test proved that with the correct configuration, and some modifications to how the packets are generated for our use case, we are able to generate enough packets to simulate an attack which would cause disruption in a real-world scenario.
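Taking the two figures above at face value, the gap is easy to quantify: (53 - 34) / 53 is roughly 36% of generated packets going missing somewhere between the generator and the receiver. A tiny helper makes the same calculation reusable for later runs; `loss_fraction` is a hypothetical name for this example.

```c
#include <assert.h>

/* Fraction of generated packets that never reached the receiver. */
static double loss_fraction(double sent_pps, double received_pps)
{
	if (sent_pps <= 0.0)
		return 0.0; /* nothing sent, so nothing lost */
	return (sent_pps - received_pps) / sent_pps;
}
```

`loss_fraction(53e6, 34e6)` evaluates to about 0.358.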

These stats from the device under test were retrieved via some XDP code that was used to record the received packets.

/* Read one entry of the stats map into rec, stamped with the read time. */
static bool map_collect(int fd, __u32 map_type, __u32 key, struct record *rec)
{
	struct datarec value = { 0 }; /* stays zeroed if the lookup fails */

	/* Get time as close as possible to reading map contents */
	rec->timestamp = gettime();

	switch (map_type) {
	case BPF_MAP_TYPE_ARRAY:
		map_get_value_array(fd, key, &value);
		break;
	case BPF_MAP_TYPE_PERCPU_ARRAY:
		map_get_value_percpu_array(fd, key, &value);
		break;
	default:
		fprintf(stderr, "ERR: Unknown map_type(%u) cannot handle\n",
			map_type);
		return false;
	}

	rec->total.rx_packets = value.rx_packets;
	rec->total.rx_bytes   = value.rx_bytes;
	rec->total.rx_tcp_packets = value.rx_tcp_packets;
	rec->total.rx_tcp_bytes = value.rx_tcp_bytes;
	rec->total.rx_syn_packets = value.rx_syn_packets;
	rec->total.rx_syn_bytes = value.rx_syn_bytes;
	rec->total.rx_rst_packets = value.rx_rst_packets;
	rec->total.rx_rst_bytes = value.rx_rst_bytes;
	rec->total.rx_udp_packets = value.rx_udp_packets;
	rec->total.rx_udp_bytes = value.rx_udp_bytes;
	return true;
}

static void stats_collect(int map_fd, __u32 map_type,
			  struct stats_record *stats_rec)
{
	/* Collect all XDP actions stats  */
	__u32 key;

	for (key = 0; key < XDP_ACTION_MAX; key++) {
		map_collect(map_fd, map_type, key, &stats_rec->stats[key]);
	}
}

/* Poll the pinned stats map every `interval` seconds and print the deltas. */
static int stats_poll(const char *pin_dir, int map_fd, __u32 id,
		      __u32 map_type, int interval)
{
	struct bpf_map_info info = {};
	struct stats_record prev, record = { 0 };

	/* Trick to pretty printf with thousands separators use %' */
	setlocale(LC_NUMERIC, "en_US");

	/* Get initial reading quickly */
	stats_collect(map_fd, map_type, &record);

	while (1) {
		prev = record; /* struct copy */

		/* Re-open the pinned map so a reloaded program is noticed */
		map_fd = open_bpf_map_file(pin_dir, "xdp_stats_map", &info);
		if (map_fd < 0) {
			return EXIT_FAIL_BPF;
		} else if (id != info.id) {
			printf("BPF map xdp_stats_map changed its ID, restarting\n");
			close(map_fd);
			return 0;
		}

		stats_collect(map_fd, map_type, &record);
		stats_print(&record, &prev);
		close(map_fd); /* avoid leaking one fd per iteration */
		sleep(interval);
	}

	return 0;
}

void map_get_value_array(int fd, __u32 key, struct datarec *value)
{
	if ((bpf_map_lookup_elem(fd, &key, value)) != 0) {
		fprintf(stderr,
			"ERR: bpf_map_lookup_elem failed key:0x%X\n", key);
	}
}

void map_get_value_percpu_array(int fd, __u32 key, struct datarec *value)
{
	/* For percpu maps, userspace gets a value per possible CPU */
	unsigned int nr_cpus = bpf_num_possible_cpus();
	struct datarec values[nr_cpus];
	__u64 sum_bytes = 0;
	__u64 sum_pkts = 0;
	__u64 sum_tcp_pkts = 0;
	__u64 sum_tcp_bytes = 0;
	__u64 sum_syn_pkts = 0;
	__u64 sum_syn_bytes = 0;
	__u64 sum_rst_pkts = 0;
	__u64 sum_rst_bytes = 0;
	__u64 sum_udp_pkts = 0;
	__u64 sum_udp_bytes = 0;
	unsigned int i;

	if ((bpf_map_lookup_elem(fd, &key, values)) != 0) {
		fprintf(stderr,
			"ERR: bpf_map_lookup_elem failed key:0x%X\n", key);
		return; /* values[] is not valid, so do not sum it */
	}

	/* Sum values from each CPU */
	for (i = 0; i < nr_cpus; i++) {
		sum_pkts  += values[i].rx_packets;
		sum_bytes += values[i].rx_bytes;
		sum_tcp_pkts += values[i].rx_tcp_packets;
		sum_tcp_bytes += values[i].rx_tcp_bytes;
		sum_syn_pkts += values[i].rx_syn_packets;
		sum_syn_bytes += values[i].rx_syn_bytes;
		sum_rst_pkts += values[i].rx_rst_packets;
		sum_rst_bytes += values[i].rx_rst_bytes;
		sum_udp_pkts += values[i].rx_udp_packets;
		sum_udp_bytes += values[i].rx_udp_bytes;
	}
	value->rx_packets = sum_pkts;
	value->rx_bytes   = sum_bytes;
	value->rx_tcp_packets = sum_tcp_pkts;
	value->rx_tcp_bytes = sum_tcp_bytes;
	value->rx_syn_packets = sum_syn_pkts;
	value->rx_syn_bytes = sum_syn_bytes;
	value->rx_rst_packets = sum_rst_pkts;
	value->rx_rst_bytes = sum_rst_bytes;
	value->rx_udp_packets = sum_udp_pkts;
	value->rx_udp_bytes = sum_udp_bytes;
}

/* Print totals and per-second rates derived from two consecutive readings. */
static void stats_print(struct stats_record *stats_rec,
			struct stats_record *stats_prev)
{
	struct record *rec;
	struct record *prev;
	__u64 packets, bytes, tcp_packets, tcp_bytes, udp_bytes, udp_packets, syn_packets, syn_bytes;
	__u64 rst_packets, rst_bytes;
	double period;
	double pps, bps, tcp_pps, tcp_bps, udp_pps, udp_bps, syn_pps, syn_bps; /* packets per sec */
	double rst_pps, rst_bps;

	stats_print_header(); /* Print stats "header" */
	rec  = &stats_rec->stats[2];  /* index 2 is the XDP_PASS action */
	prev = &stats_prev->stats[2];

	period = calc_period(rec, prev);
	if (period == 0)
		return; /* no time elapsed, rates would divide by zero */

	tcp_packets = rec->total.rx_tcp_packets - prev->total.rx_tcp_packets;
	tcp_pps = tcp_packets / period;

	tcp_bytes = rec->total.rx_tcp_bytes - prev->total.rx_tcp_bytes;
	tcp_bps = (tcp_bytes * 8) / period;

	syn_packets = rec->total.rx_syn_packets - prev->total.rx_syn_packets;
	syn_pps = syn_packets / period;

	syn_bytes = rec->total.rx_syn_bytes - prev->total.rx_syn_bytes;
	syn_bps = (syn_bytes * 8) / period;

	rst_packets = rec->total.rx_rst_packets - prev->total.rx_rst_packets;
	rst_pps = rst_packets / period;

	rst_bytes = rec->total.rx_rst_bytes - prev->total.rx_rst_bytes;
	rst_bps = (rst_bytes * 8) / period;

	udp_packets = rec->total.rx_udp_packets - prev->total.rx_udp_packets;
	udp_pps = udp_packets / period;

	udp_bytes = rec->total.rx_udp_bytes - prev->total.rx_udp_bytes;
	udp_bps = (udp_bytes * 8) / period;

	packets = rec->total.rx_packets - prev->total.rx_packets;
	pps     = packets / period;

	bytes   = rec->total.rx_bytes - prev->total.rx_bytes;
	bps     = (bytes * 8) / period;

	printf("TCP: %12s %12s %12s %12s  \n", calculateInt(rec->total.rx_tcp_packets), calculateSize(rec->total.rx_tcp_bytes), calculateInt(tcp_pps), calculateSize(tcp_bps));
	printf("SYN: %12s %12s %12s %12s \n", calculateInt(rec->total.rx_syn_packets), calculateSize(rec->total.rx_syn_bytes), calculateInt(syn_pps), calculateSize(syn_bps));
	printf("RST: %12s %12s %12s %12s \n", calculateInt(rec->total.rx_rst_packets), calculateSize(rec->total.rx_rst_bytes), calculateInt(rst_pps), calculateSize(rst_bps));
	printf("UDP: %12s %12s %12s %12s \n", calculateInt(rec->total.rx_udp_packets), calculateSize(rec->total.rx_udp_bytes), calculateInt(udp_pps), calculateSize(udp_bps));
	printf("Tot: %12s %12s %12s %12s", calculateInt(rec->total.rx_packets), calculateSize(rec->total.rx_bytes), calculateInt(pps), calculateSize(bps));
}

Core functions of XDP packet stats recording code
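The listing above is the userspace half, which reads counters out of a BPF map. The kernel-side program that fills those counters is not shown in this post. As a rough idea of the per-packet decision that TCP, SYN, RST and UDP counters imply, here is a plain C sketch that classifies a raw Ethernet frame held in a buffer. It is an assumption-laden illustration, not the Soteria XDP program: `classify_frame`, the `pkt_class` enum and the `SK_`-prefixed constants are invented for this example, only untagged IPv4 is handled, and each frame gets a single label, whereas a real counter set might also count SYN and RST packets as TCP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum pkt_class { PKT_OTHER, PKT_UDP, PKT_TCP, PKT_TCP_SYN, PKT_TCP_RST };

#define SK_ETH_HDR_LEN    14
#define SK_ETHERTYPE_IPV4 0x0800
#define SK_PROTO_TCP      6
#define SK_PROTO_UDP      17
#define SK_TCP_FLAG_SYN   0x02
#define SK_TCP_FLAG_RST   0x04

/* Classify one frame; every read is bounds-checked first, in the same
 * spirit as the checks an XDP program must make before touching data. */
static enum pkt_class classify_frame(const uint8_t *data, size_t len)
{
	const uint8_t *ip, *l4;
	size_t ihl, l4_len;

	if (len < SK_ETH_HDR_LEN + 20)
		return PKT_OTHER;
	if ((uint16_t)((data[12] << 8) | data[13]) != SK_ETHERTYPE_IPV4)
		return PKT_OTHER;

	ip = data + SK_ETH_HDR_LEN;
	if ((ip[0] >> 4) != 4)
		return PKT_OTHER;
	ihl = (size_t)(ip[0] & 0x0F) * 4; /* IPv4 header length in bytes */
	if (ihl < 20 || len < SK_ETH_HDR_LEN + ihl)
		return PKT_OTHER;

	l4 = ip + ihl;
	l4_len = len - SK_ETH_HDR_LEN - ihl;

	if (ip[9] == SK_PROTO_UDP) /* byte 9 of the IPv4 header: protocol */
		return l4_len >= 8 ? PKT_UDP : PKT_OTHER;

	if (ip[9] == SK_PROTO_TCP) {
		if (l4_len < 20)
			return PKT_OTHER;
		if (l4[13] & SK_TCP_FLAG_RST) /* byte 13 of TCP: flags */
			return PKT_TCP_RST;
		if (l4[13] & SK_TCP_FLAG_SYN)
			return PKT_TCP_SYN;
		return PKT_TCP;
	}
	return PKT_OTHER;
}
```

In an XDP program the equivalent of each returned class would be an increment of the matching per-CPU counter, which the userspace code then sums across CPUs.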

Next steps

With the initial testing of the setup completed, the team came up with some refinements and ideas for improving the testing.

First, we need to capture the stats of packets received at various places through the network fabric so we can discover the source of the packet losses. This is essential to give us a robust testing platform for the future.
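Once counters exist at each measurement point, locating the loss is a matter of comparing neighbours. The sketch below assumes an array of packet counts ordered from the generator to the device under test, taken over the same interval, and returns the index of the hop after which the largest drop occurs. `worst_hop` is a hypothetical helper, and the numbers in the usage note are made up for illustration apart from the 53 and 34 million endpoints reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return index i where the drop between hop i and hop i+1 is largest,
 * or -1 if there are fewer than two hops or no loss was observed. */
static int worst_hop(const uint64_t *rx_packets, size_t n_hops)
{
	int worst = -1;
	uint64_t worst_drop = 0;

	for (size_t i = 0; i + 1 < n_hops; i++) {
		uint64_t drop;

		if (rx_packets[i] <= rx_packets[i + 1])
			continue; /* no loss on this segment */
		drop = rx_packets[i] - rx_packets[i + 1];
		if (drop > worst_drop) {
			worst_drop = drop;
			worst = (int)i;
		}
	}
	return worst;
}
```

With illustrative counts of `{53000000, 53000000, 36000000, 34000000}`, the function returns 1, pointing at the segment between the second and third measurement points as the place to look first.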

Once we have debugged and resolved all the packet losses in the test network, we plan to look at packet fingerprinting and at how packet features can be adjusted to avoid being fingerprinted.

The code for packet generation can be found in the Soteria Research github repository.