
merge upstream

Alex Pyattaev 2023-08-22 19:48:39 +03:00
commit 2579a5fff8
22 changed files with 501 additions and 510 deletions

.gitignore (vendored)

@@ -14,3 +14,5 @@ scratch
 docs/
 .DS_Store
 .vs/
+**/*.perflog
+wrk/*.png


@@ -1,543 +1,245 @@
# ⚡blazingly fast⚡
Initially, I conducted a series of quick tests, using wrk with simple HTTP
servers written in GO and in zig zap. I made sure that all servers only output
17 bytes of HTTP body.

Just to get some sort of indication, I also included measurements for python
since I used to write my REST APIs in python before creating zig zap.
You can check out the scripts I used for the tests in [./wrk](wrk/).
## Why

I aimed to enhance the performance of my Python + Flask backends by replacing
them with a Zig version. To evaluate the success of this transition, I compared
the performance of a static HTTP server implemented in Python and its Zig
counterpart, which showed significant improvements.

To further assess the Zig server's performance, I compared it with a Go
implementation, to benchmark against a widely used industry standard. I
expected similar performance levels but was pleasantly surprised when Zap
outperformed Go by approximately 30% on my test machine.

Intrigued by Rust's reputed performance capabilities, I also experimented with a
Rust version. The results of this experiment are discussed in the
[Flaws](#flaws) section below.
## What

So, what are the benchmarks testing?

- simple HTTP servers that reply to GET requests with a constant, 17-byte-long response
- 4 cores are assigned to the subject under test (the respective server)
- 4 cores are assigned to `wrk`
  - using 4 threads
  - aiming at 400 concurrent connections
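For orientation, a `wrk` call matching these parameters would look roughly like
the sketch below. Treat the exact flags as an assumption for illustration (the
real invocations live in the scripts under [./wrk](wrk/)); the `--latency` flag
is implied by the latency distributions in the logs further down:

```console
$ wrk -t4 -c400 -d10s --latency http://127.0.0.1:3000
```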
## How

I have fully automated the benchmarks and graph generation.

To generate the data:

```console
$ ./wrk/measure_all.sh
```

To generate the graphs:

```console
$ python wrk/graph.py
```

For dependencies, please see the [flake.nix](./flake.nix#L46).

## Flaws

The benchmarks have limitations, such as the lack of request-latency reporting.
The Rust community has often criticized these benchmarks as biased. However, no
such criticisms have come from the Go or Python communities.

In response to the Rust community's concerns, we've added three Rust
implementations for comparison:

- A standard version from [the Rust book](https://doc.rust-lang.org/book/ch20-00-final-project-a-web-server.html).
- An "axum" version to highlight Rust's speed.
- A refined version of the Rust book version.

Originally, the goal was to compare "batteries included" versions, which created
a disparity by comparing the optimized zap / facil.io code with basic bundled
functionalities. These tests were for personal interest and not meant to be
definitive benchmarks.

To address this bias, we've added the Rust-axum and Python-sanic benchmarks. For
more information, refer to the relevant discussions and pull requests.

## More benchmarks?

I often receive requests or PRs to include additional benchmarks, many of which
I find to be either ego-driven or a source of unnecessary disputes. People tend
to favor their preferred language or framework. Zig, Rust, C, and C++ are all
capable of efficiently creating fast web servers, with different frameworks
potentially excelling in certain benchmarks. My main concern was whether Zap,
given its current level of abstraction, could compete with standard web servers.
This question has been answered, and I see no need for further benchmarks.

## The computer makes the difference

After automating the performance benchmarks, I gathered data from three
different computers. It's interesting to see the variation in relative numbers.

### The test machine (graphs in the README)

These are the original runs. You can see the verbatim output of `wrk`, and some
infos about the test machine below the code snippets.

**Update**: I was intrigued to compare against a basic rust HTTP server.
Unfortunately, knowing nothing at all about rust, I couldn't find one and hence
went for the one in [The Rust Programming
Language](https://doc.rust-lang.org/book/ch20-00-final-project-a-web-server.html).
Wanting a somewhat fair comparison, I opted for the multi-threaded example. It
didn't work out-of-the-box, but I got it to work and changed it to not read
files but output a static text, just like in the other examples. **Maybe
someone with rust experience** can have a look at my
[wrk/rust/hello](wrk/rust/hello) code and tell me why it's surprisingly slow, as
I expected it to be faster than the basic GO example. I'll enable the GitHub
discussions for this matter. My suspicion is bad performance of the mutexes.

![](wrk_tables.png)

#### requests / sec

![](wrk_requests.png)

#### transfer MB / sec

![](wrk_transfer.png)

#### zig code

zig version 0.11.0-dev.1265+3ab43988c

```zig
const std = @import("std");
const zap = @import("zap");

fn on_request_minimal(r: zap.SimpleRequest) void {
    _ = r.sendBody("Hello from ZAP!!!");
}

pub fn main() !void {
    var listener = zap.SimpleHttpListener.init(.{
        .port = 3000,
        .on_request = on_request_minimal,
        .log = false,
        .max_clients = 100000,
    });
    try listener.listen();

    std.debug.print("Listening on 0.0.0.0:3000\n", .{});

    // start worker threads
    zap.start(.{
        .threads = 4,
        .workers = 4,
    });
}
```
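For context on how this server is driven during measurement: wrk/measure.sh
(its diff appears further down) builds and starts it like so, where `$TSK_SRV`
is, by the look of it, the scripts' CPU-pinning prefix for the server side (an
assumption based on the 4-core setup described above):

```console
$ zig build -Doptimize=ReleaseFast wrk > /dev/null
$ $TSK_SRV ./zig-out/bin/wrk &
```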
#### go code

go version go1.16.9 linux/amd64
```go
package main

import (
    "fmt"
    "net/http"
)

func hello(w http.ResponseWriter, req *http.Request) {
    fmt.Fprintf(w, "hello from GO!!!\n")
}

func main() {
    print("listening on 0.0.0.0:8090\n")
    http.HandleFunc("/hello", hello)
    http.ListenAndServe(":8090", nil)
}
```
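A quick way to sanity-check any of these servers before benchmarking is a plain
GET with curl; for the Go server above, using the port and route from its code:

```console
$ curl http://127.0.0.1:8090/hello
hello from GO!!!
```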
#### python code

python version 3.9.6
```python
# Python 3 server example
from http.server import BaseHTTPRequestHandler, HTTPServer

hostName = "127.0.0.1"
serverPort = 8080


class MyServer(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(bytes("HI FROM PYTHON!!!", "utf-8"))

    def log_message(self, format, *args):
        return


if __name__ == "__main__":
    webServer = HTTPServer((hostName, serverPort), MyServer)
    print("Server started http://%s:%s" % (hostName, serverPort))

    try:
        webServer.serve_forever()
    except KeyboardInterrupt:
        pass

    webServer.server_close()
    print("Server stopped.")
```
#### rust code

[main.rs](wrk/rust/hello/src/main.rs)
```rust
use hello::ThreadPool;
use std::io::prelude::*;
use std::net::TcpListener;
use std::net::TcpStream;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    let pool = ThreadPool::new(4);

    // for stream in listener.incoming().take(2) {
    for stream in listener.incoming() {
        let stream = stream.unwrap();

        pool.execute(|| {
            handle_connection(stream);
        });
    }

    println!("Shutting down.");
}

fn handle_connection(mut stream: TcpStream) {
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).unwrap();

    let status_line = "HTTP/1.1 200 OK";
    let contents = "HELLO from RUST!";

    let response = format!(
        "{}\r\nContent-Length: {}\r\n\r\n{}",
        status_line,
        contents.len(),
        contents
    );

    stream.write_all(response.as_bytes()).unwrap();
    stream.flush().unwrap();
}
```
[lib.rs](wrk/rust/hello/src/lib.rs)
```rust
use std::{
    sync::{mpsc, Arc, Mutex},
    thread,
};

pub struct ThreadPool {
    workers: Vec<Worker>,
    sender: Option<mpsc::Sender<Job>>,
}

type Job = Box<dyn FnOnce() + Send + 'static>;

impl ThreadPool {
    /// Create a new ThreadPool.
    ///
    /// The size is the number of threads in the pool.
    ///
    /// # Panics
    ///
    /// The `new` function will panic if the size is zero.
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);

        let (sender, receiver) = mpsc::channel();

        let receiver = Arc::new(Mutex::new(receiver));

        let mut workers = Vec::with_capacity(size);

        for id in 0..size {
            workers.push(Worker::new(id, Arc::clone(&receiver)));
        }

        ThreadPool {
            workers,
            sender: Some(sender),
        }
    }

    pub fn execute<F>(&self, f: F)
    where
        F: FnOnce() + Send + 'static,
    {
        let job = Box::new(f);

        self.sender.as_ref().unwrap().send(job).unwrap();
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        drop(self.sender.take());

        for worker in &mut self.workers {
            println!("Shutting down worker {}", worker.id);

            if let Some(thread) = worker.thread.take() {
                thread.join().unwrap();
            }
        }
    }
}

struct Worker {
    id: usize,
    thread: Option<thread::JoinHandle<()>>,
}

impl Worker {
    fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
        let thread = thread::spawn(move || loop {
            let message = receiver.lock().unwrap().recv();

            match message {
                Ok(job) => {
                    // println!("Worker got a job; executing.");
                    job();
                }
                Err(_) => {
                    // println!("Worker disconnected; shutting down.");
                    break;
                }
            }
        });

        Worker {
            id,
            thread: Some(thread),
        }
    }
}
```
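On the mutex suspicion above: every worker in this pool blocks on
`receiver.lock().unwrap().recv()`, so all threads contend on a single `Mutex`
guarding one shared receiver, and job hand-off under 400 concurrent connections
serializes on that lock. The `wrk/rust/clean` implementation added further down
in this commit avoids exactly that by giving each worker its own channel and
dispatching jobs round-robin. A minimal, self-contained sketch of the idea
(toy `u32` jobs rather than the project's connection-handling code):

```rust
use std::{sync::mpsc, thread};

fn main() {
    let size = 4;
    let mut senders = Vec::with_capacity(size);
    let mut handles = Vec::with_capacity(size);

    for id in 0..size {
        let (tx, rx) = mpsc::channel::<u32>();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            // each worker drains only its own channel: no shared lock to fight over
            for job in rx {
                println!("worker {id} handles job {job}");
            }
        }));
    }

    // round-robin dispatch, the idea behind ThreadPool::execute in the clean version
    for job in 0..8u32 {
        senders[job as usize % size].send(job).unwrap();
    }

    drop(senders); // close all channels so the workers exit their loops
    for h in handles {
        h.join().unwrap();
    }
}
```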
#### wrk output

wrk version: `wrk 4.1.0 [epoll] Copyright (C) 2012 Will Glozer`

```
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh zig
Listening on 0.0.0.0:3000
========================================================================
zig
========================================================================
Running 10s test @ http://127.0.0.1:3000
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 331.40us 115.09us 8.56ms 91.94%
Req/Sec 159.51k 9.44k 175.23k 56.50%
Latency Distribution
50% 312.00us
75% 341.00us
90% 375.00us
99% 681.00us
6348945 requests in 10.01s, 0.94GB read
Requests/sec: 634220.13
Transfer/sec: 96.17MB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh zig
Listening on 0.0.0.0:3000
========================================================================
zig
========================================================================
Running 10s test @ http://127.0.0.1:3000
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 322.43us 103.25us 3.72ms 86.57%
Req/Sec 166.35k 2.89k 182.78k 68.00%
Latency Distribution
50% 297.00us
75% 330.00us
90% 482.00us
99% 657.00us
6619245 requests in 10.02s, 0.98GB read
Requests/sec: 660803.71
Transfer/sec: 100.20MB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh zig
Listening on 0.0.0.0:3000
========================================================================
zig
========================================================================
Running 10s test @ http://127.0.0.1:3000
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 325.47us 105.86us 4.03ms 87.27%
Req/Sec 164.60k 4.69k 181.85k 84.75%
Latency Distribution
50% 300.00us
75% 333.00us
90% 430.00us
99% 667.00us
6549594 requests in 10.01s, 0.97GB read
Requests/sec: 654052.56
Transfer/sec: 99.18MB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh go
listening on 0.0.0.0:8090
========================================================================
go
========================================================================
Running 10s test @ http://127.0.0.1:8090/hello
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 680.63us 692.05us 12.09ms 88.04%
Req/Sec 126.49k 4.28k 139.26k 71.75%
Latency Distribution
50% 403.00us
75% 822.00us
90% 1.52ms
99% 3.34ms
5033360 requests in 10.01s, 643.22MB read
Requests/sec: 502584.84
Transfer/sec: 64.23MB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh go
listening on 0.0.0.0:8090
========================================================================
go
========================================================================
Running 10s test @ http://127.0.0.1:8090/hello
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 683.97us 695.78us 10.01ms 88.04%
Req/Sec 126.31k 4.34k 137.63k 65.00%
Latency Distribution
50% 408.00us
75% 829.00us
90% 1.53ms
99% 3.34ms
5026848 requests in 10.01s, 642.39MB read
Requests/sec: 502149.91
Transfer/sec: 64.17MB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh go
listening on 0.0.0.0:8090
========================================================================
go
========================================================================
Running 10s test @ http://127.0.0.1:8090/hello
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 688.89us 702.75us 12.70ms 88.09%
Req/Sec 126.06k 4.20k 138.00k 70.25%
Latency Distribution
50% 414.00us
75% 836.00us
90% 1.54ms
99% 3.36ms
5015716 requests in 10.01s, 640.97MB read
Requests/sec: 500968.28
Transfer/sec: 64.02MB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh python
Server started http://127.0.0.1:8080
========================================================================
python
========================================================================
Running 10s test @ http://127.0.0.1:8080
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 12.89ms 101.69ms 1.79s 97.76%
Req/Sec 1.83k 2.11k 7.53k 82.18%
Latency Distribution
50% 215.00us
75% 260.00us
90% 363.00us
99% 485.31ms
34149 requests in 10.02s, 4.33MB read
Socket errors: connect 0, read 34149, write 0, timeout 15
Requests/sec: 3407.63
Transfer/sec: 442.60KB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh python
Server started http://127.0.0.1:8080
========================================================================
python
========================================================================
Running 10s test @ http://127.0.0.1:8080
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 9.87ms 90.32ms 1.79s 98.21%
Req/Sec 2.16k 2.17k 7.49k 80.10%
Latency Distribution
50% 234.00us
75% 353.00us
90% 378.00us
99% 363.73ms
43897 requests in 10.02s, 5.57MB read
Socket errors: connect 0, read 43897, write 0, timeout 14
Requests/sec: 4379.74
Transfer/sec: 568.85KB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh python
Server started http://127.0.0.1:8080
========================================================================
python
========================================================================
Running 10s test @ http://127.0.0.1:8080
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.98ms 51.85ms 1.66s 99.16%
Req/Sec 2.69k 2.58k 7.61k 51.14%
Latency Distribution
50% 234.00us
75% 357.00us
90% 381.00us
99% 568.00us
50165 requests in 10.02s, 6.36MB read
Socket errors: connect 0, read 50165, write 0, timeout 9
Requests/sec: 5004.06
Transfer/sec: 649.95KB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh rust
Finished release [optimized] target(s) in 0.00s
========================================================================
rust
========================================================================
Running 10s test @ http://127.0.0.1:7878
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.20ms 1.38ms 208.35ms 99.75%
Req/Sec 34.06k 2.00k 38.86k 75.25%
Latency Distribution
50% 1.32ms
75% 1.63ms
90% 1.87ms
99% 2.32ms
1356449 requests in 10.01s, 71.15MB read
Socket errors: connect 0, read 1356427, write 0, timeout 0
Requests/sec: 135446.00
Transfer/sec: 7.10MB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh rust
Finished release [optimized] target(s) in 0.00s
========================================================================
rust
========================================================================
Running 10s test @ http://127.0.0.1:7878
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.21ms 592.89us 10.02ms 63.64%
Req/Sec 32.93k 2.91k 37.94k 80.50%
Latency Distribution
50% 1.31ms
75% 1.64ms
90% 1.90ms
99% 2.48ms
1311445 requests in 10.02s, 68.79MB read
Socket errors: connect 0, read 1311400, write 0, timeout 0
Requests/sec: 130904.50
Transfer/sec: 6.87MB
(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh rust
Finished release [optimized] target(s) in 0.00s
========================================================================
rust
========================================================================
Running 10s test @ http://127.0.0.1:7878
4 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.26ms 2.88ms 211.74ms 99.92%
Req/Sec 33.92k 2.04k 38.99k 74.00%
Latency Distribution
50% 1.34ms
75% 1.66ms
90% 1.91ms
99% 2.38ms
1350527 requests in 10.02s, 70.84MB read
Socket errors: connect 0, read 1350474, write 0, timeout 0
Requests/sec: 134830.39
Transfer/sec: 7.07MB
```

#### test machine

```
rs@ryzen
--------
OS: NixOS 22.05 (Quokka) x86_64
Host: Micro-Star International Co., Ltd. B550-A PRO (MS-7C56)
Kernel: 6.0.15
Uptime: 7 days, 5 hours, 29 mins
Packages: 5950 (nix-system), 893 (nix-user), 5 (flatpak)
Shell: bash 5.1.16
Resolution: 3840x2160
DE: none+i3
WM: i3
Terminal: Neovim Terminal
CPU: AMD Ryzen 5 5600X (12) @ 3.700GHz
GPU: AMD ATI Radeon RX 6700/6700 XT / 6800M
Memory: 10378MiB / 32033MiB
```

### Workstation at work

A beast. Many cores (which we don't use).

![](./wrk/samples/workstation_req_per_sec_graph.png)
![](./wrk/samples/workstation_xfer_per_sec_graph.png)

```
[rene@nixos:~]$ neofetch --stdout
rene@nixos
----------
OS: NixOS 23.05.2947.475d5ae2c4cb (Stoat) x86_64
Host: LENOVO 1038
Kernel: 6.1.46
Uptime: 26 mins
Packages: 5804 (nix-system), 566 (nix-user)
Shell: bash 5.2.15
Terminal: /dev/pts/2
CPU: Intel Xeon Gold 5218 (64) @ 3.900GHz
GPU: NVIDIA Quadro P620
GPU: NVIDIA Tesla M40
Memory: 1610MiB / 95247MiB

[rene@nixos:~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
Stepping: 7
CPU(s) scaling MHz: 57%
CPU max MHz: 3900,0000
CPU min MHz: 1000,0000
BogoMIPS: 4600,00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs b
ts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_
deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpri
ority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xs
avec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabi
lities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 1 MiB (32 instances)
L1i: 1 MiB (32 instances)
L2: 32 MiB (32 instances)
L3: 44 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Vulnerabilities:
Gather data sampling: Mitigation; Microcode
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Retbleed: Mitigation; Enhanced IBRS
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Mitigation; TSX disabled
```
### Work Laptop

Very strange. It absolutely **LOVES** zap 🤣!

![](./wrk/samples/laptop_req_per_sec_graph.png)
![](./wrk/samples/laptop_xfer_per_sec_graph.png)

```
➜ neofetch --stdout
rs@nixos
--------
OS: NixOS 23.05.2918.4cdad15f34e6 (Stoat) x86_64
Host: LENOVO 20TKS0W700
Kernel: 6.1.45
Uptime: 1 day, 4 hours, 50 mins
Packages: 6259 (nix-system), 267 (nix-user), 9 (flatpak)
Shell: bash 5.2.15
Resolution: 3840x1600, 3840x2160
DE: none+i3
WM: i3
Terminal: tmux
CPU: Intel i9-10885H (16) @ 5.300GHz
GPU: NVIDIA GeForce GTX 1650 Ti Mobile
Memory: 4525MiB / 31805MiB
➜ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
CPU family: 6
Model: 165
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 2
CPU(s) scaling MHz: 56%
CPU max MHz: 5300.0000
CPU min MHz: 800.0000
BogoMIPS: 4800.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp pku ospke sgx_lc md_clear flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 256 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 2 MiB (8 instances)
L3 cache: 16 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Not affected
```

flake.lock (generated)

@@ -92,11 +92,11 @@
     },
     "locked": {
       "dir": "contrib",
-      "lastModified": 1691226237,
-      "narHash": "sha256-/+JDL1T9dFh2NqCOXqsLSNjrRcsKAMWdJiARq54qx6c=",
+      "lastModified": 1692702614,
+      "narHash": "sha256-FeY8hAB77tnUTDbK6WYA+DG3Nx5xQrbOC17Cfl3pTm4=",
       "owner": "neovim",
       "repo": "neovim",
-      "rev": "42630923fc00633d806af97c1792b2ed4a71e1cc",
+      "rev": "014b87646fc3273a09d6b20ebb648a8eb24a0a98",
       "type": "github"
     },
     "original": {
@@ -108,11 +108,11 @@
     },
     "nixpkgs": {
       "locked": {
-        "lastModified": 1691235410,
-        "narHash": "sha256-kdUw6loESRxuQEz+TJXE9TdSBs2aclaF1Yrro+u8NlM=",
+        "lastModified": 1692698134,
+        "narHash": "sha256-YtMmZWR/dlTypOcwiZfZTMPr3tj9fwr05QTStfSyDSg=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "d814a2776b53f65ea73c7403f3efc2e3511c7dbb",
+        "rev": "a16f7eb56e88c8985fcc6eb81dabd6cade4e425a",
         "type": "github"
       },
       "original": {
@@ -184,11 +184,11 @@
       "nixpkgs": "nixpkgs_2"
     },
     "locked": {
-      "lastModified": 1691237213,
-      "narHash": "sha256-RReB+o6jjJXjCHHJSny0p7NR/kNOu57jXEDX7jq9bp0=",
+      "lastModified": 1692663634,
+      "narHash": "sha256-wioqr80UOA0tNXaJy4D0i9fFaLG2RoQi5e9Dpd4WojE=",
       "owner": "mitchellh",
       "repo": "zig-overlay",
-      "rev": "a9d85674542108318187831fbf376704b71590f3",
+      "rev": "d666e5137fe0c43353c555fb47748813084decab",
       "type": "github"
     },
     "original": {

flake.nix

@@ -48,6 +48,7 @@
           wrk
           python39
           python39Packages.sanic
+          python39Packages.matplotlib
           poetry
           poetry
           pkgs.rustc
@@ -65,6 +66,9 @@
           pkgs.zlib
           pkgs.icu
           pkgs.openssl
+          pkgs.neofetch
+          pkgs.util-linux # lscpu
         ];
         buildInputs = with pkgs; [
buildInputs = with pkgs; [

wrk/graph.py (new file)

@@ -0,0 +1,90 @@
import re
import os
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
from collections import defaultdict
import statistics

directory = "./wrk"  # Replace with the actual directory path

requests_sec = defaultdict(list)
transfers_sec = defaultdict(list)
mean_requests = {}
mean_transfers = {}


def plot(kind='', title='', ylabel='', means=None):
    # Sort the labels and requests_sec lists together based on the requests_sec values
    labels = []
    values = []

    # silly, I know
    for k, v in means.items():
        labels.append(k)
        values.append(v)

    # sort the labels and value lists
    labels, values = zip(*sorted(zip(labels, values), key=lambda x: x[1], reverse=True))

    # Plot the graph
    plt.figure(figsize=(10, 6))  # Adjust the figure size as needed
    bars = plt.bar(labels, values)
    plt.xlabel("Subject")
    plt.ylabel(ylabel)
    plt.title(title)
    plt.xticks(rotation=45)  # Rotate x-axis labels for better readability

    # Display the actual values on top of the bars
    for bar in bars:
        yval = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, yval, f'{yval:,.2f}', ha='center', va='bottom')

    plt.tight_layout()  # Adjust the spacing of the graph elements
    png_name = f"{directory}/{kind.lower()}_graph.png"
    plt.savefig(png_name)  # Save the graph as a PNG file
    print(f"Generated: {png_name}")


if __name__ == '__main__':
    if not os.path.isdir(".git"):
        print("Please run from root directory of the repository!")
        print("e.g. python wrk/graph.py")
        import sys
        sys.exit(1)

    # Iterate over the files in the directory
    for filename in os.listdir(directory):
        if filename.endswith(".perflog"):
            label = os.path.splitext(filename)[0]
            file_path = os.path.join(directory, filename)
            with open(file_path, "r") as file:
                lines = file.readlines()
                for line in lines:
                    # Extract the Requests/sec value using regular expressions
                    match = re.search(r"Requests/sec:\s+([\d.]+)", line)
                    if match:
                        requests_sec[label].append(float(match.group(1)))
                    match = re.search(r"Transfer/sec:\s+([\d.]+)", line)
                    if match:
                        value = float(match.group(1))
                        if 'KB' in line:
                            value *= 1024
                        elif 'MB' in line:
                            value *= 1024 * 1024
                        value /= 1024.0 * 1024
                        transfers_sec[label].append(value)

    # calculate means
    for k, v in requests_sec.items():
        mean_requests[k] = statistics.mean(v)
    for k, v in transfers_sec.items():
        mean_transfers[k] = statistics.mean(v)

    # save the plots
    plot(kind='req_per_sec', title='Requests/sec Comparison',
         ylabel='requests/sec', means=mean_requests)
    plot(kind='xfer_per_sec', title='Transfer/sec Comparison',
         ylabel='transfer/sec [MB]', means=mean_transfers)
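To make the unit normalization concrete: a `.perflog` line reading
`Transfer/sec: 442.60KB` is scaled to bytes (442.60 × 1024) and then divided by
1024², yielding roughly 0.43 MB, while `Transfer/sec: 96.17MB` survives the
round-trip unchanged at 96.17 MB, so all subjects end up on one MB-scaled axis.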

wrk/measure.sh

@@ -14,7 +14,7 @@ if [ "$SUBJECT" = "" ] ; then
     exit 1
 fi

-if [ "$SUBJECT" = "zig" ] ; then
+if [ "$SUBJECT" = "zig-zap" ] ; then
     zig build -Doptimize=ReleaseFast wrk > /dev/null
     $TSK_SRV ./zig-out/bin/wrk &
     PID=$!
@@ -41,20 +41,27 @@ if [ "$SUBJECT" = "python" ] ; then
     URL=http://127.0.0.1:8080
 fi

-if [ "$SUBJECT" = "sanic" ] ; then
+if [ "$SUBJECT" = "python-sanic" ] ; then
     $TSK_SRV python wrk/sanic/sanic-app.py &
     PID=$!
     URL=http://127.0.0.1:8000
 fi

-if [ "$SUBJECT" = "rust" ] ; then
-    cd wrk/rust/hello && cargo build --release
+if [ "$SUBJECT" = "rust-bythebook" ] ; then
+    cd wrk/rust/bythebook && cargo build --release
     $TSK_SRV ./target/release/hello &
     PID=$!
     URL=http://127.0.0.1:7878
 fi

-if [ "$SUBJECT" = "axum" ] ; then
+if [ "$SUBJECT" = "rust-clean" ] ; then
+    cd wrk/rust/clean && cargo build --release
+    $TSK_SRV ./target/release/hello &
+    PID=$!
+    URL=http://127.0.0.1:7878
+fi
+
+if [ "$SUBJECT" = "rust-axum" ] ; then
     cd wrk/axum/hello-axum && cargo build --release
     $TSK_SRV ./target/release/hello-axum &
     PID=$!
@@ -68,7 +75,7 @@ if [ "$SUBJECT" = "csharp" ] ; then
     URL=http://127.0.0.1:5026
 fi

-if [ "$SUBJECT" = "cpp" ] ; then
+if [ "$SUBJECT" = "cpp-beast" ] ; then
     cd wrk/cpp && zig build -Doptimize=ReleaseFast
     $TSK_SRV ./zig-out/bin/cpp-beast 127.0.0.1 8070 . &
     PID=$!
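With the renamed subjects, a single implementation can still be measured on its
own, run from the repository root as the scripts expect, e.g.:

```console
$ ./wrk/measure.sh zig-zap
```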

wrk/measure_all.sh (new, executable)

@@ -0,0 +1,20 @@
#! /usr/bin/env bash

if [ ! -d ".git" ] ; then
    echo "This script must be run from the root directory of the repository!"
    echo "./wrk/measure_all.sh"
    exit 1
fi

SUBJECTS="zig-zap go python python-sanic rust-bythebook rust-clean rust-axum csharp cpp-beast"

rm -f wrk/*.perflog

for S in $SUBJECTS; do
    L="$S.perflog"
    for R in 1 2 3 ; do
        ./wrk/measure.sh $S | tee -a wrk/$L
    done
done

echo "Finished"

wrk/rust/clean/.gitignore (vendored, new file)

@@ -0,0 +1,14 @@
# Generated by Cargo
# will have compiled files and executables
debug/
target/
# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
# More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
Cargo.lock
# These are backup files generated by rustfmt
**/*.rs.bk
# MSVC Windows builds of rustc generate these, which store debugging information
*.pdb

wrk/rust/clean/Cargo.toml (new file)

@@ -0,0 +1,9 @@
[package]
name = "hello"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
# crossbeam = { version = "0.8.2", features = ["crossbeam-channel"] }


@@ -0,0 +1 @@
Hello from RUST!

wrk/rust/clean/src/lib.rs (new file)

@@ -0,0 +1,101 @@
//Crossbeam should, but does not make this faster.
//use crossbeam::channel::bounded;
use std::{net::TcpStream, sync::mpsc, thread};

type Job = (fn(TcpStream), TcpStream);

type Sender = mpsc::Sender<Job>;
//type Sender = crossbeam::channel::Sender<Job>;
type Receiver = mpsc::Receiver<Job>;
//type Receiver = crossbeam::channel::Receiver<Job>;

pub struct ThreadPool {
    workers: Vec<Worker>,
    senders: Vec<Sender>,
    next_sender: usize,
}

impl ThreadPool {
    /// Create a new ThreadPool.
    ///
    /// The size is the number of threads in the pool.
    ///
    /// # Panics
    ///
    /// The `new` function will panic if the size is zero.
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);

        let mut workers = Vec::with_capacity(size);
        let mut senders = Vec::with_capacity(size);

        for id in 0..size {
            //let (sender, receiver) = bounded(2);
            let (sender, receiver) = mpsc::channel();
            senders.push(sender);
            workers.push(Worker::new(id, receiver));
        }

        ThreadPool {
            workers,
            senders,
            next_sender: 0,
        }
    }

    /// round robin over available workers to ensure we never have to buffer requests
    pub fn execute(&mut self, handler: fn(TcpStream), stream: TcpStream) {
        let job = (handler, stream);
        self.senders[self.next_sender].send(job).unwrap();
        //self.senders[self.next_sender].try_send(job).unwrap();
        self.next_sender += 1;
        if self.next_sender == self.senders.len() {
            self.next_sender = 0;
        }
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        self.senders.clear();

        for worker in &mut self.workers {
            println!("Shutting down worker {}", worker.id);

            if let Some(thread) = worker.thread.take() {
                thread.join().unwrap();
            }
        }
    }
}

struct Worker {
    id: usize,
    thread: Option<thread::JoinHandle<()>>,
}

impl Worker {
    fn new(id: usize, receiver: Receiver) -> Worker {
        let thread = thread::spawn(move || Self::work(receiver));

        Worker {
            id,
            thread: Some(thread),
        }
    }

    fn work(receiver: Receiver) {
        loop {
            let message = receiver.recv();

            match message {
                Ok((handler, stream)) => {
                    // println!("Worker got a job; executing.");
                    handler(stream);
                }
                Err(_) => {
                    // println!("Worker disconnected; shutting down.");
                    break;
                }
            }
        }
    }
}

wrk/rust/clean/src/main.rs (new file)

@@ -0,0 +1,41 @@
use hello::ThreadPool;
use std::io::prelude::*;
use std::net::TcpListener;
use std::net::TcpStream;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    //Creating a massive amount of threads so we can always have one ready to go.
    let mut pool = ThreadPool::new(128);

    // for stream in listener.incoming().take(2) {
    for stream in listener.incoming() {
        let stream = stream.unwrap();
        //handle_connection(stream);
        pool.execute(handle_connection, stream);
    }

    println!("Shutting down.");
}

fn handle_connection(mut stream: TcpStream) {
    stream.set_nodelay(true).expect("set_nodelay call failed");
    let mut buffer = [0; 1024];
    let nbytes = stream.read(&mut buffer).unwrap();
    if nbytes == 0 {
        return;
    }

    let status_line = "HTTP/1.1 200 OK";
    let contents = "HELLO from RUST!";

    let response = format!(
        "{}\r\nContent-Length: {}\r\n\r\n{}",
        status_line,
        contents.len(),
        contents
    );

    stream.write_all(response.as_bytes()).unwrap();
}
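A design note on the `set_nodelay(true)` call above: it disables Nagle's
algorithm on the accepted socket, so the tiny 17-byte responses are written out
immediately instead of waiting to be coalesced with later segments; a plausible
contributor to this version's better behavior on small responses, though its
effect was not isolated in these benchmarks.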

(Four added binary PNG graph samples are not shown: 50 KiB, 38 KiB, 48 KiB, and 38 KiB.)


@@ -19,6 +19,6 @@ pub fn main() !void {
     // start worker threads
     zap.start(.{
         .threads = 4,
-        .workers = 4,
+        .workers = 2, // empirical tests: yield best perf on my machine
     });
 }
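A note on this tweak: as far as I understand the underlying facil.io semantics
(worth a cross-check against its docs), `threads` sets the number of worker
threads per process while `workers` forks that many separate worker processes,
so 4 threads across 2 worker processes stays within the 4 cores that the
benchmark assigns to the server.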