Performance Tools Benchmarking

With more load testing tools coming to market, performance engineers need to make better informed decisions about which tool best suits their needs. One area of concern is a tool's computational resource consumption, such as CPU and memory, and how it affects the tool's performance and its ability to generate high loads efficiently.

When resource usage is excessive and the load testing tool is not performant, the result can be unreliable measurements or high performance-testing costs.

In our experiment, we created load testing scripts with an identical load model in 3 popular open-source load testing tools:

  1. JMeter
  2. k6
  3. Locust

We’ve run these scripts in a clean, isolated environment and collected performance metrics from the load generator. Overall, we saw that JMeter has severe difficulties generating high load, while Locust and k6 performed better, with a slight advantage to k6.


Tools Overview

JMeter

First introduced in 1998, JMeter is one of the longest-standing load testing tools. It is written in Java and implements a thread-based architecture, meaning that every virtual user runs as a thread on our operating system.

Scripting is done through a GUI but can be extended with code, most commonly in the Groovy language.

JMeter supports distributed execution using a manager-worker architecture. This way, we can generate our load from multiple load agents.

JMeter supports various protocols like HTTP, MQTT, JMS, SMTP and many others, and it can be extended with plugins.

JMeter is supported by many Platform as a Service (PaaS) offerings, making it easy to run performance tests in a cloud environment.

Since JMeter uses threads to generate load, it is usually recommended to tune the Java heap memory, so as part of our experiment we will run JMeter both with the default heap size (1 GB) and with an enlarged heap size (4 GB).
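
One way to enlarge the heap (a sketch, assuming a typical JMeter 5.x installation; the plan and log file names are hypothetical) is the `JVM_ARGS` environment variable, which the `jmeter` startup script honors:

```shell
# JMeter 5.x ships with HEAP="-Xms1g -Xmx1g -XX:MaxMetaspaceSize=256m"
# in its startup script. Override it for a single non-GUI run:
JVM_ARGS="-Xms4g -Xmx4g" ./jmeter -n -t test_plan.jmx -l results.jtl
```

Here `-n` runs JMeter in non-GUI mode, `-t` names the test plan, and `-l` names the results log.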

Locust

Locust is a Python-based open-source tool.

Unlike JMeter’s thread-based architecture, Locust is built on gevent (greenlet-based coroutines), which means it runs user code on a single thread while I/O operations are performed concurrently.

Scripting is done in Python, and Locust provides a simple interface for writing performance test scripts, keeping them short and readable.

Locust supports distributed execution using a manager-worker architecture, but unlike JMeter, Locust also allows intercommunication between the nodes, which improves the ability to synchronize between the nodes and share data at run time.

Owing to Python’s Global Interpreter Lock (GIL), Locust can only use a single CPU core at a time. To take advantage of multiple cores, it is recommended to run multiple worker processes on a single machine.
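
The reasoning behind the multiple-worker recommendation can be sketched with the standard library alone (illustrative only, not Locust code): CPU-bound Python work does not parallelize across threads under the GIL, but separate processes each get their own interpreter and core.

```python
# Illustrative sketch (not Locust itself): separate worker processes
# sidestep the GIL, so CPU-bound work can use multiple cores, much
# like running several Locust workers on one machine.
import multiprocessing as mp


def busy(n: int) -> int:
    # CPU-bound loop; within one process it is pinned to a single core
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:  # analogous to two Locust workers
        results = pool.map(busy, [200_000, 200_000])
    print(results)
```
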

Locust supports various protocols like HTTP, MQTT, JMS, SMTP and many others, and it can be extended with plugins.

To the best of our knowledge, there aren’t many (or any) PaaS offerings that support Locust for in-cloud execution, which means a platform for cloud execution needs to be implemented (and maintained) by developers.

k6

k6 was recently acquired by Grafana Labs, and it is being actively maintained. Written in Go, it takes advantage of Go’s powerful concurrency capabilities.

Unlike JMeter and Locust, the open-source version of k6 does not support distributed execution; for that, you would need the commercial version, which allows in-cloud distributed execution.

A key advantage of k6 is its ease of integration with visualization tools such as Grafana, Datadog, or CloudWatch, as well as with IDEs such as Visual Studio Code or IntelliJ.

k6 supports various protocols like HTTP, MQTT, JMS, SMTP and many others, and it can be extended with plugins.

| Feature | JMeter | k6 | Locust |
| --- | --- | --- | --- |
| Runtime | Java | Go | Python |
| Scripting | GUI + Groovy | JavaScript | Python |
| Architecture | Threads | Goroutines | gevent |
| Protocols supported | Extensive | Extensive | Extensive |
| Plugin extension | Difficult | Easy | Very easy |
| Distributed mode | Supported | Commercial | Supported |

Tools Comparison

Experimental material

Test Setup

To evaluate the performance of the 3 tools, we first set up a testing environment using an m4.large EC2 instance on AWS, which has 2 vCPU cores and 8 GB of memory. We installed all the required prerequisites and used AWS CloudWatch to gather performance insights from the EC2 instance.

| Mode | Configuration | Reference |
| --- | --- | --- |
| Vanilla JMeter | Execute load with JMeter, using the default 1 GB heap size | link |
| JMeter-4GB | Execute load with JMeter, using a 4 GB heap size | link |
| k6 | Execute load with k6 | link |
| Locust Single | Execute load with Locust in non-distributed mode | link |
| Locust Distributed | Execute load with Locust in distributed mode with 2 workers | link |
Experiments

Software Under Test

Our Software Under Test (SUT) is a demo ‘pet clinic’ website developed by yCrash; we use its root path as our target API. It runs on a t3a.medium EC2 instance with 2 vCPUs, 4 GB of memory, and a standalone load balancer.

Performance Test Scenario

Our performance test scenario is simple: we spin up 1000 virtual users over a 60-second ramp-up period, then each virtual user sends an HTTP request every 1-5 seconds, drawn from a uniform distribution. The load was sustained for 1 hour.
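
As a back-of-the-envelope check (ignoring response time, which slightly lowers the real rate), a closed load model like this settles at roughly the number of users divided by the mean think time:

```python
# Expected steady-state request rate for 1000 users with a uniform
# 1-5 s think time. Actual tools report somewhat less, because each
# user's cycle also includes the server's response time.
users = 1000
mean_think_s = (1 + 5) / 2       # mean of uniform(1, 5) is 3 s
expected_rps = users / mean_think_s
print(round(expected_rps, 1))    # 333.3
```
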

Measurements

  1. CPU usage – collected from AWS CloudWatch
  2. Memory usage – collected from AWS CloudWatch
  3. Request rate – collected individually from each tool’s reporter
  4. Network bytes sent – collected from AWS CloudWatch

Results

CPU Consumption

[Figures: CPU consumption for JMeter with default heap size, JMeter with 4GB heap size, Locust single executor, Locust with 2 workers, and k6]

JMeter consumed around 20% of the CPU and no difference was observed between the default 1 GB heap size and 4 GB heap size.

Locust, when executed with a single worker, consumed 40% of the CPU, and when bumped up to 2 worker nodes, CPU consumption went up to 45%.

k6 consumed slightly more than 40% of the CPU.

So at a glance, it might seem that JMeter consumes far less CPU than the other two tools.

Memory Consumption

[Figures: memory consumption for JMeter with default heap size, JMeter with 4GB heap size, Locust single executor, Locust with 2 workers, and k6]

JMeter’s memory usage went up to 20%, and as with CPU consumption, it made no difference whether the heap was configured to 1 GB or 4 GB. Locust in single mode consumed 4.8% of memory, while Locust with 2 workers consumed 5.7%. k6’s memory usage grew gradually from 13% to 18%.

It seems that in all cases memory usage was not an issue, with JMeter using the most, k6 slightly less, and Locust being the most memory-efficient.

Request Rate

[Figures: request rate for JMeter with default heap size, JMeter with 4GB heap size, Locust single executor, Locust with 2 workers, and k6]

When looking at request rates we see that all 5 executions performed ~315 requests per second.

However, Locust finished with 0 errors, and k6 had 59 errors in total, a negligible rate (0.006%), while JMeter failed at a constant rate of about 71 requests per second, meaning around 21% of total requests failed. Results were similar with a 4 GB heap size.
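
The percentages above can be roughly reproduced from the reported figures (approximate arithmetic, assuming a steady ~315 req/s over the full hour, since exact request totals were not recorded):

```python
# Error-rate arithmetic from the reported numbers.
total_requests = 315 * 3600            # ~1.13M requests in 1 hour
k6_error_pct = 59 / total_requests * 100
jmeter_error_pct = 71 / 315 * 100      # ~71 failures/s out of ~315 req/s
print(round(k6_error_pct, 3))          # 0.005 - same order as the quoted 0.006%
print(round(jmeter_error_pct, 1))      # 22.5 - close to the quoted ~21%
```
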

So, despite JMeter’s CPU load being half that of the other tools, Locust and k6 outperformed it when it comes to success rates.

When looking at the errors in JMeter’s dashboard, we can see the error types:

[Figure: JMeter dashboard showing the error types]

Non HTTP response code: org.apache.http.NoHttpResponseException/Non HTTP response message: petclinic.ycrash.io:443 failed to respond

This is a low level communication problem.

Our interpretation is that the CPU performs many context switches between the threads. Although this is not reflected in the CPU consumption figures, it nonetheless stifles JMeter’s ability to generate high load from a single instance.

Network – bytes sent

[Figures: network bytes sent for JMeter with default heap size, JMeter with 4GB heap size, Locust single executor, Locust with 2 workers, and k6]

Locust showed no difference in network traffic between parallel mode and a single process; in both cases, traffic reached 25M bytes per minute. We expected additional workers to improve performance, but that did not seem to be the case in this experiment.

One possible explanation is that any gain from using the 2 cores is lost to costly inter-process communication between the manager and the two workers. k6 performed slightly better than Locust, at 30M bytes per minute. JMeter, however, lags far behind at 12M bytes per minute, revealing a severe performance bottleneck. This remained the same whether the heap size was set to 1 GB or 4 GB.
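
Dividing the per-minute traffic by the common ~315 req/s rate gives a rough bytes-sent-per-request figure, which makes JMeter's shortfall concrete (approximate arithmetic from the figures above; JMeter's low per-request figure is consistent with its many failed requests sending little data):

```python
# Approximate bytes sent per request, from the per-minute network
# figures and the shared ~315 req/s rate.
rps = 315
requests_per_min = rps * 60
for tool, bytes_per_min in [("Locust", 25_000_000),
                            ("k6", 30_000_000),
                            ("JMeter", 12_000_000)]:
    print(tool, round(bytes_per_min / requests_per_min), "bytes/request")
```
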

Conclusion

To the best of our knowledge, this is the first attempt to benchmark these load testing tools against one another. Our experiment shows that JMeter can be costly and requires more computational resources than the other tools evaluated.

While JMeter failed to generate load from 1000 virtual users on a single m4 instance, both k6 and Locust did so successfully with no excess resource usage. Furthermore, our experiment shows a slight advantage for k6 over Locust when it comes to network traffic, which is expected given Go’s concurrency support.

While our experiment shows that JMeter requires more computational resources, further experiments are needed to evaluate how much more costly JMeter is and what the implications might be for performance assurance efforts.

Acknowledgement

We would like to thank our friends at yCrash for allowing us to use their pet clinic website as a demo application. yCrash is a state-of-the-art troubleshooting and root cause analysis tool for Java applications. Using log file analysis, it captures 360-degree artifacts from your technology stack, such as garbage collection logs, thread dumps, heap dumps, netstat, vmstat, and kernel logs, then analyzes them and instantly identifies the root cause of the problem.

