mirror of https://github.com/zigzap/zap.git synced 2025-10-21 07:34:08 +00:00

merge upstream

Alex Pyattaev 2023-08-22 19:48:39 +03:00
commit 2579a5fff8
22 changed files with 501 additions and 510 deletions

.gitignore

@@ -14,3 +14,5 @@ scratch
 docs/
 .DS_Store
 .vs/
+**/*.perflog
+wrk/*.png


@@ -1,543 +1,245 @@
 # ⚡blazingly fast⚡
 
-I conducted a series of quick tests, using wrk with simple HTTP servers written
-in GO and in zig zap. I made sure that all servers only output 17 bytes of HTTP
-body.
+Initially, I conducted a series of quick tests, using wrk with simple HTTP
+servers written in GO and in zig zap. I made sure that all servers only output
+17 bytes of HTTP body.
 
 Just to get some sort of indication, I also included measurements for python
 since I used to write my REST APIs in python before creating zig zap.
 
 You can check out the scripts I used for the tests in [./wrk](wrk/).
-## results
-
-You can see the verbatim output of `wrk`, and some infos about the test machine
-below the code snippets.
-
-**Update**: I was intrigued comparing to a basic rust HTTP server.
-Unfortunately, knowing nothing at all about rust, I couldn't find one and hence
-tried to go for the one in [The Rust Programming
-Language](https://doc.rust-lang.org/book/ch20-00-final-project-a-web-server.html).
-Wanting it to be of a somewhat fair comparison, I opted for the multi-threaded
-example. It didn't work out-of-the-box, but I got it to work and changed it to
-not read files but outputting a static text just like in the other examples.
-
-**maybe someone with rust experience** can have a look at my
-[wrk/rust/hello](wrk/rust/hello) code and tell me why it's surprisingly slow, as
-I expected it to be faster than the basic GO example. I'll enable the
-GitHub discussions for this matter. My suspicion is bad performance of the
-mutexes.
-
-![](wrk_tables.png)
+## Why
+
+I aimed to enhance the performance of my Python + Flask backends by replacing
+them with a Zig version. To evaluate the success of this transition, I compared
+the performance of a static HTTP server implemented in Python and its Zig
+counterpart, which showed significant improvements.
+
+To further assess the Zig server's performance, I compared it with a Go
+implementation, to compare against a widely used industry-standard. I expected
+similar performance levels but was pleasantly surprised when Zap outperformed Go
+by approximately 30% on my test machine.
+
+Intrigued by Rust's reputed performance capabilities, I also experimented with a
+Rust version. The results of this experiment are discussed in the
+[Flaws](#flaws) section below.
-### requests / sec
-
-![](wrk_requests.png)
-
-### transfer MB / sec
+## What
+
+So, what are the benchmarks testing?
+
+- simple http servers that reply to GET requests with a constant, 17-bytes long response
+- 4 cores are assigned to the subject under test (the respective server)
+- 4 cores are assigned to `wrk`
+  - using 4 threads
+  - aiming at 400 concurrent connections
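In practice this amounts to an invocation of roughly the following shape (a sketch: the exact CPU-pinning incantation lives in `wrk/measure.sh`; the core lists and target URL here are illustrative):

```console
# pin the server under test to 4 cores (illustrative core list)
$ taskset -c 0-3 ./zig-out/bin/wrk &
# pin wrk to 4 other cores: 4 threads, 400 connections, 10 s, with latency stats
$ taskset -c 4-7 wrk -t4 -c400 -d10s --latency http://127.0.0.1:3000
```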
-![](wrk_transfer.png)
-
-## zig code
-
-zig version 0.11.0-dev.1265+3ab43988c
-
-```zig
-const std = @import("std");
-const zap = @import("zap");
-
-fn on_request_minimal(r: zap.SimpleRequest) void {
-    _ = r.sendBody("Hello from ZAP!!!");
-}
-
-pub fn main() !void {
-    var listener = zap.SimpleHttpListener.init(.{
-        .port = 3000,
-        .on_request = on_request_minimal,
-        .log = false,
-        .max_clients = 100000,
-    });
-    try listener.listen();
-
-    std.debug.print("Listening on 0.0.0.0:3000\n", .{});
-
-    // start worker threads
-    zap.start(.{
-        .threads = 4,
-        .workers = 4,
-    });
-}
-```
-
-## go code
-
-go version go1.16.9 linux/amd64
-
-```go
-package main
-
-import (
-	"fmt"
-	"net/http"
-)
-
-func hello(w http.ResponseWriter, req *http.Request) {
-	fmt.Fprintf(w, "hello from GO!!!\n")
-}
-
-func main() {
-	print("listening on 0.0.0.0:8090\n")
-	http.HandleFunc("/hello", hello)
-	http.ListenAndServe(":8090", nil)
-}
-```
+## How
+
+I have fully automated the benchmarks and graph generation.
+
+To generate the data:
+
+```console
+$ ./wrk/measure_all.sh
+```
+
+To generate the graphs:
+
+```console
+$ python wrk/graph.py
+```
-## python code
-
-python version 3.9.6
-
-```python
-# Python 3 server example
-from http.server import BaseHTTPRequestHandler, HTTPServer
-
-hostName = "127.0.0.1"
-serverPort = 8080
-
-
-class MyServer(BaseHTTPRequestHandler):
-    def do_GET(self):
-        self.send_response(200)
-        self.send_header("Content-type", "text/html")
-        self.end_headers()
-        self.wfile.write(bytes("HI FROM PYTHON!!!", "utf-8"))
-
-    def log_message(self, format, *args):
-        return
-
-
-if __name__ == "__main__":
-    webServer = HTTPServer((hostName, serverPort), MyServer)
-    print("Server started http://%s:%s" % (hostName, serverPort))
-
-    try:
-        webServer.serve_forever()
-    except KeyboardInterrupt:
-        pass
-
-    webServer.server_close()
-    print("Server stopped.")
-```
+For dependencies, please see the [flake.nix](./flake.nix#L46).
+
+## Flaws
+
+The benchmarks have limitations, such as the lack of request latencies. The Rust
+community has often criticized these benchmarks as biased. However, no such
+criticisms have come from the Go or Python communities.
+
+In response to the Rust community's concerns, we've added three Rust
+implementations for comparison:
+
+- A standard version from [the Rust book](https://doc.rust-lang.org/book/ch20-00-final-project-a-web-server.html).
+- An "axum" version to highlight Rust's speed.
+- A refined version of the Rust book version.
+
+Originally, the goal was to compare "batteries included" versions, which created
+a disparity by comparing the optimized zap / facil.io code with basic bundled
+functionalities. These tests were for personal interest and not meant to be
+definitive benchmarks.
+
+To address this bias, we've added the Rust-axum and Python-sanic benchmarks. For
+more information, refer to the relevant discussions and pull requests.
+
+## More benchmarks?
+
+I often receive requests or PRs to include additional benchmarks, which a lot of
+times I find to be either ego-driven or a cause for unnecessary disputes. People
+tend to favor their preferred language or framework. Zig, Rust, C, and C++ are
+all capable of efficiently creating fast web servers, with different frameworks
+potentially excelling in certain benchmarks. My main concern was whether Zap,
+given its current level of abstraction, could compete with standard web servers.
+This question has been answered, and I see no need for further benchmarks.
+
+## The computer makes the difference
+
+After automating the performance benchmarks, I gathered data from three
+different computers. It's interesting to see the variation in relative numbers.
-## rust code
-
-[main.rs](wrk/rust/hello/src/main.rs)
-
-```rust
-use hello::ThreadPool;
-use std::io::prelude::*;
-use std::net::TcpListener;
-use std::net::TcpStream;
-
-fn main() {
-    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
-    let pool = ThreadPool::new(4);
-
-    // for stream in listener.incoming().take(2) {
-    for stream in listener.incoming() {
-        let stream = stream.unwrap();
-
-        pool.execute(|| {
-            handle_connection(stream);
-        });
-    }
-
-    println!("Shutting down.");
-}
-
-fn handle_connection(mut stream: TcpStream) {
-    let mut buffer = [0; 1024];
-    stream.read(&mut buffer).unwrap();
-
-    let status_line = "HTTP/1.1 200 OK";
-    let contents = "HELLO from RUST!";
-
-    let response = format!(
-        "{}\r\nContent-Length: {}\r\n\r\n{}",
-        status_line,
-        contents.len(),
-        contents
-    );
-
-    stream.write_all(response.as_bytes()).unwrap();
-    stream.flush().unwrap();
-}
-```
-
-[lib.rs](wrk/rust/hello/src/lib.rs)
-
-```rust
-use std::{
-    sync::{mpsc, Arc, Mutex},
-    thread,
-};
-
-pub struct ThreadPool {
-    workers: Vec<Worker>,
-    sender: Option<mpsc::Sender<Job>>,
-}
-
-type Job = Box<dyn FnOnce() + Send + 'static>;
-
-impl ThreadPool {
-    /// Create a new ThreadPool.
-    ///
-    /// The size is the number of threads in the pool.
-    ///
-    /// # Panics
-    ///
-    /// The `new` function will panic if the size is zero.
-    pub fn new(size: usize) -> ThreadPool {
-        assert!(size > 0);
-
-        let (sender, receiver) = mpsc::channel();
-
-        let receiver = Arc::new(Mutex::new(receiver));
-
-        let mut workers = Vec::with_capacity(size);
-
-        for id in 0..size {
-            workers.push(Worker::new(id, Arc::clone(&receiver)));
-        }
-
-        ThreadPool {
-            workers,
-            sender: Some(sender),
-        }
-    }
-
-    pub fn execute<F>(&self, f: F)
-    where
-        F: FnOnce() + Send + 'static,
-    {
-        let job = Box::new(f);
-
-        self.sender.as_ref().unwrap().send(job).unwrap();
-    }
-}
-
-impl Drop for ThreadPool {
-    fn drop(&mut self) {
-        drop(self.sender.take());
-
-        for worker in &mut self.workers {
-            println!("Shutting down worker {}", worker.id);
-
-            if let Some(thread) = worker.thread.take() {
-                thread.join().unwrap();
-            }
-        }
-    }
-}
-
-struct Worker {
-    id: usize,
-    thread: Option<thread::JoinHandle<()>>,
-}
-
-impl Worker {
-    fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
-        let thread = thread::spawn(move || loop {
-            let message = receiver.lock().unwrap().recv();
-
-            match message {
-                Ok(job) => {
-                    // println!("Worker got a job; executing.");
-                    job();
-                }
-                Err(_) => {
-                    // println!("Worker disconnected; shutting down.");
-                    break;
-                }
-            }
-        });
-
-        Worker {
-            id,
-            thread: Some(thread),
-        }
-    }
-}
-```
+### The test machine (graphs in the README)
+
+To be added when I get home.
+
+### Workstation at work
+
+A beast. Many cores (which we don't use).
+
+![](./wrk/samples/workstation_req_per_sec_graph.png)
+![](./wrk/samples/workstation_xfer_per_sec_graph.png)
-## wrk output
-
-wrk version: `wrk 4.1.0 [epoll] Copyright (C) 2012 Will Glozer`
-
-```
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh zig
-Listening on 0.0.0.0:3000
-========================================================================
-zig
-========================================================================
-Running 10s test @ http://127.0.0.1:3000
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   331.40us  115.09us   8.56ms   91.94%
-    Req/Sec   159.51k     9.44k  175.23k    56.50%
-  Latency Distribution
-     50%  312.00us
-     75%  341.00us
-     90%  375.00us
-     99%  681.00us
-  6348945 requests in 10.01s, 0.94GB read
-Requests/sec: 634220.13
-Transfer/sec:     96.17MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh zig
-Listening on 0.0.0.0:3000
-========================================================================
-zig
-========================================================================
-Running 10s test @ http://127.0.0.1:3000
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   322.43us  103.25us   3.72ms   86.57%
-    Req/Sec   166.35k     2.89k  182.78k    68.00%
-  Latency Distribution
-     50%  297.00us
-     75%  330.00us
-     90%  482.00us
-     99%  657.00us
-  6619245 requests in 10.02s, 0.98GB read
-Requests/sec: 660803.71
-Transfer/sec:    100.20MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh zig
-Listening on 0.0.0.0:3000
-========================================================================
-zig
-========================================================================
-Running 10s test @ http://127.0.0.1:3000
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   325.47us  105.86us   4.03ms   87.27%
-    Req/Sec   164.60k     4.69k  181.85k    84.75%
-  Latency Distribution
-     50%  300.00us
-     75%  333.00us
-     90%  430.00us
-     99%  667.00us
-  6549594 requests in 10.01s, 0.97GB read
-Requests/sec: 654052.56
-Transfer/sec:     99.18MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh go
-listening on 0.0.0.0:8090
-========================================================================
-go
-========================================================================
-Running 10s test @ http://127.0.0.1:8090/hello
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   680.63us  692.05us  12.09ms   88.04%
-    Req/Sec   126.49k     4.28k  139.26k    71.75%
-  Latency Distribution
-     50%  403.00us
-     75%  822.00us
-     90%    1.52ms
-     99%    3.34ms
-  5033360 requests in 10.01s, 643.22MB read
-Requests/sec: 502584.84
-Transfer/sec:     64.23MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh go
-listening on 0.0.0.0:8090
-========================================================================
-go
-========================================================================
-Running 10s test @ http://127.0.0.1:8090/hello
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   683.97us  695.78us  10.01ms   88.04%
-    Req/Sec   126.31k     4.34k  137.63k    65.00%
-  Latency Distribution
-     50%  408.00us
-     75%  829.00us
-     90%    1.53ms
-     99%    3.34ms
-  5026848 requests in 10.01s, 642.39MB read
-Requests/sec: 502149.91
-Transfer/sec:     64.17MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh go
-listening on 0.0.0.0:8090
-========================================================================
-go
-========================================================================
-Running 10s test @ http://127.0.0.1:8090/hello
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   688.89us  702.75us  12.70ms   88.09%
-    Req/Sec   126.06k     4.20k  138.00k    70.25%
-  Latency Distribution
-     50%  414.00us
-     75%  836.00us
-     90%    1.54ms
-     99%    3.36ms
-  5015716 requests in 10.01s, 640.97MB read
-Requests/sec: 500968.28
-Transfer/sec:     64.02MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh python
-Server started http://127.0.0.1:8080
-========================================================================
-python
-========================================================================
-Running 10s test @ http://127.0.0.1:8080
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency    12.89ms  101.69ms   1.79s    97.76%
-    Req/Sec     1.83k     2.11k    7.53k    82.18%
-  Latency Distribution
-     50%  215.00us
-     75%  260.00us
-     90%  363.00us
-     99%  485.31ms
-  34149 requests in 10.02s, 4.33MB read
-  Socket errors: connect 0, read 34149, write 0, timeout 15
-Requests/sec:   3407.63
-Transfer/sec:    442.60KB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh python
-Server started http://127.0.0.1:8080
-========================================================================
-python
-========================================================================
-Running 10s test @ http://127.0.0.1:8080
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     9.87ms   90.32ms   1.79s    98.21%
-    Req/Sec     2.16k     2.17k    7.49k    80.10%
-  Latency Distribution
-     50%  234.00us
-     75%  353.00us
-     90%  378.00us
-     99%  363.73ms
-  43897 requests in 10.02s, 5.57MB read
-  Socket errors: connect 0, read 43897, write 0, timeout 14
-Requests/sec:   4379.74
-Transfer/sec:    568.85KB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh python
-Server started http://127.0.0.1:8080
-========================================================================
-python
-========================================================================
-Running 10s test @ http://127.0.0.1:8080
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     3.98ms   51.85ms   1.66s    99.16%
-    Req/Sec     2.69k     2.58k    7.61k    51.14%
-  Latency Distribution
-     50%  234.00us
-     75%  357.00us
-     90%  381.00us
-     99%  568.00us
-  50165 requests in 10.02s, 6.36MB read
-  Socket errors: connect 0, read 50165, write 0, timeout 9
-Requests/sec:   5004.06
-Transfer/sec:    649.95KB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$
+
+```
+[rene@nixos:~]$ neofetch --stdout
+rene@nixos
+----------
+OS: NixOS 23.05.2947.475d5ae2c4cb (Stoat) x86_64
+Host: LENOVO 1038
+Kernel: 6.1.46
+Uptime: 26 mins
+Packages: 5804 (nix-system), 566 (nix-user)
+Shell: bash 5.2.15
+Terminal: /dev/pts/2
+CPU: Intel Xeon Gold 5218 (64) @ 3.900GHz
+GPU: NVIDIA Quadro P620
+GPU: NVIDIA Tesla M40
+Memory: 1610MiB / 95247MiB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh rust
-Finished release [optimized] target(s) in 0.00s
-========================================================================
-rust
-========================================================================
-Running 10s test @ http://127.0.0.1:7878
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     1.20ms    1.38ms 208.35ms   99.75%
-    Req/Sec    34.06k     2.00k   38.86k    75.25%
-  Latency Distribution
-     50%    1.32ms
-     75%    1.63ms
-     90%    1.87ms
-     99%    2.32ms
-  1356449 requests in 10.01s, 71.15MB read
-  Socket errors: connect 0, read 1356427, write 0, timeout 0
-Requests/sec: 135446.00
-Transfer/sec:      7.10MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh rust
-Finished release [optimized] target(s) in 0.00s
-========================================================================
-rust
-========================================================================
-Running 10s test @ http://127.0.0.1:7878
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     1.21ms  592.89us  10.02ms   63.64%
-    Req/Sec    32.93k     2.91k   37.94k    80.50%
-  Latency Distribution
-     50%    1.31ms
-     75%    1.64ms
-     90%    1.90ms
-     99%    2.48ms
-  1311445 requests in 10.02s, 68.79MB read
-  Socket errors: connect 0, read 1311400, write 0, timeout 0
-Requests/sec: 130904.50
-Transfer/sec:      6.87MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh rust
-Finished release [optimized] target(s) in 0.00s
-========================================================================
-rust
-========================================================================
-Running 10s test @ http://127.0.0.1:7878
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     1.26ms    2.88ms 211.74ms   99.92%
-    Req/Sec    33.92k     2.04k   38.99k    74.00%
-  Latency Distribution
-     50%    1.34ms
-     75%    1.66ms
-     90%    1.91ms
-     99%    2.38ms
-  1350527 requests in 10.02s, 70.84MB read
-  Socket errors: connect 0, read 1350474, write 0, timeout 0
-Requests/sec: 134830.39
-Transfer/sec:      7.07MB
-```
+
+[rene@nixos:~]$ lscpu
+Architecture:            x86_64
+CPU op-mode(s):          32-bit, 64-bit
+Address sizes:           46 bits physical, 48 bits virtual
+Byte Order:              Little Endian
+CPU(s):                  64
+On-line CPU(s) list:     0-63
+Vendor ID:               GenuineIntel
+Model name:              Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
+CPU family:              6
+Model:                   85
+Thread(s) per core:      2
+Core(s) per socket:      16
+Socket(s):               2
+Stepping:                7
+CPU(s) scaling MHz:      57%
+CPU max MHz:             3900,0000
+CPU min MHz:             1000,0000
+BogoMIPS:                4600,00
+Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
+Virtualization features:
+  Virtualization:        VT-x
+Caches (sum of all):
+  L1d:                   1 MiB (32 instances)
+  L1i:                   1 MiB (32 instances)
+  L2:                    32 MiB (32 instances)
+  L3:                    44 MiB (2 instances)
+NUMA:
+  NUMA node(s):          2
+  NUMA node0 CPU(s):     0-15,32-47
+  NUMA node1 CPU(s):     16-31,48-63
+Vulnerabilities:
+  Gather data sampling:  Mitigation; Microcode
+  Itlb multihit:         KVM: Mitigation: VMX disabled
+  L1tf:                  Not affected
+  Mds:                   Not affected
+  Meltdown:              Not affected
+  Mmio stale data:       Mitigation; Clear CPU buffers; SMT vulnerable
+  Retbleed:              Mitigation; Enhanced IBRS
+  Spec rstack overflow:  Not affected
+  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
+  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
+  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
+  Srbds:                 Not affected
+  Tsx async abort:       Mitigation; TSX disabled
+```
-## test machine
-
-```
-[NixOS ASCII-art logo omitted]
-rs@ryzen
---------
-OS: NixOS 22.05 (Quokka) x86_64
-Host: Micro-Star International Co., Ltd. B550-A PRO (MS-7C56)
-Kernel: 6.0.15
-Uptime: 7 days, 5 hours, 29 mins
-Packages: 5950 (nix-system), 893 (nix-user), 5 (flatpak)
-Shell: bash 5.1.16
-Resolution: 3840x2160
-DE: none+i3
-WM: i3
-Terminal: Neovim Terminal
-CPU: AMD Ryzen 5 5600X (12) @ 3.700GHz
-GPU: AMD ATI Radeon RX 6700/6700 XT / 6800M
-Memory: 10378MiB / 32033MiB
-```
+### Work Laptop
+
+Very strange. It absolutely **LOVES** zap 🤣!
+
+![](./wrk/samples/laptop_req_per_sec_graph.png)
+![](./wrk/samples/laptop_xfer_per_sec_graph.png)
+
+```
+➜ neofetch --stdout
+rs@nixos
+--------
+OS: NixOS 23.05.2918.4cdad15f34e6 (Stoat) x86_64
+Host: LENOVO 20TKS0W700
+Kernel: 6.1.45
+Uptime: 1 day, 4 hours, 50 mins
+Packages: 6259 (nix-system), 267 (nix-user), 9 (flatpak)
+Shell: bash 5.2.15
+Resolution: 3840x1600, 3840x2160
+DE: none+i3
+WM: i3
+Terminal: tmux
+CPU: Intel i9-10885H (16) @ 5.300GHz
+GPU: NVIDIA GeForce GTX 1650 Ti Mobile
+Memory: 4525MiB / 31805MiB
+
+➜ lscpu
+Architecture:                       x86_64
+CPU op-mode(s):                     32-bit, 64-bit
+Address sizes:                      39 bits physical, 48 bits virtual
+Byte Order:                         Little Endian
+CPU(s):                             16
+On-line CPU(s) list:                0-15
+Vendor ID:                          GenuineIntel
+Model name:                         Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
+CPU family:                         6
+Model:                              165
+Thread(s) per core:                 2
+Core(s) per socket:                 8
+Socket(s):                          1
+Stepping:                           2
+CPU(s) scaling MHz:                 56%
+CPU max MHz:                        5300.0000
+CPU min MHz:                        800.0000
+BogoMIPS:                           4800.00
+Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp pku ospke sgx_lc md_clear flush_l1d arch_capabilities
+Virtualization:                     VT-x
+L1d cache:                          256 KiB (8 instances)
+L1i cache:                          256 KiB (8 instances)
+L2 cache:                           2 MiB (8 instances)
+L3 cache:                           16 MiB (1 instance)
+NUMA node(s):                       1
+NUMA node0 CPU(s):                  0-15
+Vulnerability Gather data sampling: Mitigation; Microcode
+Vulnerability Itlb multihit:        KVM: Mitigation: VMX disabled
+Vulnerability L1tf:                 Not affected
+Vulnerability Mds:                  Not affected
+Vulnerability Meltdown:             Not affected
+Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
+Vulnerability Retbleed:             Mitigation; Enhanced IBRS
+Vulnerability Spec rstack overflow: Not affected
+Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
+Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
+Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
+Vulnerability Srbds:                Mitigation; Microcode
+Vulnerability Tsx async abort:      Not affected
+```

flake.lock (generated)

@@ -92,11 +92,11 @@
     },
     "locked": {
       "dir": "contrib",
-      "lastModified": 1691226237,
-      "narHash": "sha256-/+JDL1T9dFh2NqCOXqsLSNjrRcsKAMWdJiARq54qx6c=",
+      "lastModified": 1692702614,
+      "narHash": "sha256-FeY8hAB77tnUTDbK6WYA+DG3Nx5xQrbOC17Cfl3pTm4=",
       "owner": "neovim",
       "repo": "neovim",
-      "rev": "42630923fc00633d806af97c1792b2ed4a71e1cc",
+      "rev": "014b87646fc3273a09d6b20ebb648a8eb24a0a98",
       "type": "github"
     },
     "original": {
@@ -108,11 +108,11 @@
     },
     "nixpkgs": {
       "locked": {
-        "lastModified": 1691235410,
-        "narHash": "sha256-kdUw6loESRxuQEz+TJXE9TdSBs2aclaF1Yrro+u8NlM=",
+        "lastModified": 1692698134,
+        "narHash": "sha256-YtMmZWR/dlTypOcwiZfZTMPr3tj9fwr05QTStfSyDSg=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "d814a2776b53f65ea73c7403f3efc2e3511c7dbb",
+        "rev": "a16f7eb56e88c8985fcc6eb81dabd6cade4e425a",
         "type": "github"
       },
       "original": {
@@ -184,11 +184,11 @@
       "nixpkgs": "nixpkgs_2"
     },
     "locked": {
-      "lastModified": 1691237213,
-      "narHash": "sha256-RReB+o6jjJXjCHHJSny0p7NR/kNOu57jXEDX7jq9bp0=",
+      "lastModified": 1692663634,
+      "narHash": "sha256-wioqr80UOA0tNXaJy4D0i9fFaLG2RoQi5e9Dpd4WojE=",
       "owner": "mitchellh",
      "repo": "zig-overlay",
-      "rev": "a9d85674542108318187831fbf376704b71590f3",
+      "rev": "d666e5137fe0c43353c555fb47748813084decab",
       "type": "github"
     },
     "original": {


@@ -48,6 +48,7 @@
         wrk
         python39
         python39Packages.sanic
+        python39Packages.matplotlib
         poetry
         poetry
         pkgs.rustc
@@ -65,6 +66,9 @@
         pkgs.zlib
         pkgs.icu
         pkgs.openssl
+        pkgs.neofetch
+        pkgs.util-linux # lscpu
       ];
       buildInputs = with pkgs; [
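The added packages land in the benchmark dev shell, so (assuming the flake exposes the usual `devShell`) the hardware-reporting tools used below become available via:

```console
$ nix develop
$ neofetch --stdout && lscpu
```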

wrk/graph.py (new file)

@@ -0,0 +1,90 @@
import re
import os
import matplotlib.pyplot as plt
from collections import defaultdict
import statistics

directory = "./wrk"  # Replace with the actual directory path

requests_sec = defaultdict(list)
transfers_sec = defaultdict(list)
mean_requests = {}
mean_transfers = {}


def plot(kind='', title='', ylabel='', means=None):
    # Sort the labels and requests_sec lists together based on the requests_sec values
    labels = []
    values = []

    # silly, I know
    for k, v in means.items():
        labels.append(k)
        values.append(v)

    # sort the labels and value lists
    labels, values = zip(*sorted(zip(labels, values), key=lambda x: x[1], reverse=True))

    # Plot the graph
    plt.figure(figsize=(10, 6))  # Adjust the figure size as needed
    bars = plt.bar(labels, values)
    plt.xlabel("Subject")
    plt.ylabel(ylabel)
    plt.title(title)
    plt.xticks(rotation=45)  # Rotate x-axis labels for better readability

    # Display the actual values on top of the bars
    for bar in bars:
        yval = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, yval, f'{yval:,.2f}', ha='center', va='bottom')

    plt.tight_layout()  # Adjust the spacing of the graph elements
    png_name = f"{directory}/{kind.lower()}_graph.png"
    plt.savefig(png_name)  # Save the graph as a PNG file
    print(f"Generated: {png_name}")


if __name__ == '__main__':
    if not os.path.isdir(".git"):
        print("Please run from root directory of the repository!")
        print("e.g. python wrk/graph.py")
        import sys
        sys.exit(1)

    # Iterate over the files in the directory
    for filename in os.listdir(directory):
        if filename.endswith(".perflog"):
            label = os.path.splitext(filename)[0]
            file_path = os.path.join(directory, filename)
            with open(file_path, "r") as file:
                lines = file.readlines()
                for line in lines:
                    # Extract the Requests/sec value using regular expressions
                    match = re.search(r"Requests/sec:\s+([\d.]+)", line)
                    if match:
                        requests_sec[label].append(float(match.group(1)))
                    match = re.search(r"Transfer/sec:\s+([\d.]+)", line)
                    if match:
                        # normalize KB/MB suffixes to plain MB
                        value = float(match.group(1))
                        if 'KB' in line:
                            value *= 1024
                        elif 'MB' in line:
                            value *= 1024 * 1024
                        value /= 1024.0 * 1024
                        transfers_sec[label].append(value)

    # calculate means
    for k, v in requests_sec.items():
        mean_requests[k] = statistics.mean(v)
    for k, v in transfers_sec.items():
        mean_transfers[k] = statistics.mean(v)

    # save the plots
    plot(kind='req_per_sec', title='Requests/sec Comparison',
         ylabel='requests/sec', means=mean_requests)
    plot(kind='xfer_per_sec', title='Transfer/sec Comparison',
         ylabel='transfer/sec [MB]', means=mean_transfers)
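For reference, the two regular expressions above pick up lines like the following from the collected `*.perflog` files (taken verbatim from the wrk logs elsewhere on this page); the `KB`/`MB` handling then normalizes everything to MB:

```
Requests/sec: 634220.13
Transfer/sec:     96.17MB
```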


@ -14,7 +14,7 @@ if [ "$SUBJECT" = "" ] ; then
exit 1 exit 1
fi fi
if [ "$SUBJECT" = "zig" ] ; then if [ "$SUBJECT" = "zig-zap" ] ; then
zig build -Doptimize=ReleaseFast wrk > /dev/null zig build -Doptimize=ReleaseFast wrk > /dev/null
$TSK_SRV ./zig-out/bin/wrk & $TSK_SRV ./zig-out/bin/wrk &
PID=$! PID=$!
@ -41,20 +41,27 @@ if [ "$SUBJECT" = "python" ] ; then
URL=http://127.0.0.1:8080 URL=http://127.0.0.1:8080
fi fi
if [ "$SUBJECT" = "sanic" ] ; then if [ "$SUBJECT" = "python-sanic" ] ; then
$TSK_SRV python wrk/sanic/sanic-app.py & $TSK_SRV python wrk/sanic/sanic-app.py &
PID=$! PID=$!
URL=http://127.0.0.1:8000 URL=http://127.0.0.1:8000
fi fi
if [ "$SUBJECT" = "rust" ] ; then if [ "$SUBJECT" = "rust-bythebook" ] ; then
cd wrk/rust/hello && cargo build --release cd wrk/rust/bythebook && cargo build --release
$TSK_SRV ./target/release/hello & $TSK_SRV ./target/release/hello &
PID=$! PID=$!
URL=http://127.0.0.1:7878 URL=http://127.0.0.1:7878
fi fi
if [ "$SUBJECT" = "axum" ] ; then if [ "$SUBJECT" = "rust-clean" ] ; then
cd wrk/rust/clean && cargo build --release
$TSK_SRV ./target/release/hello &
PID=$!
URL=http://127.0.0.1:7878
fi
if [ "$SUBJECT" = "rust-axum" ] ; then
cd wrk/axum/hello-axum && cargo build --release cd wrk/axum/hello-axum && cargo build --release
$TSK_SRV ./target/release/hello-axum & $TSK_SRV ./target/release/hello-axum &
PID=$! PID=$!
@ -68,7 +75,7 @@ if [ "$SUBJECT" = "csharp" ] ; then
URL=http://127.0.0.1:5026 URL=http://127.0.0.1:5026
fi fi
if [ "$SUBJECT" = "cpp" ] ; then if [ "$SUBJECT" = "cpp-beast" ] ; then
cd wrk/cpp && zig build -Doptimize=ReleaseFast cd wrk/cpp && zig build -Doptimize=ReleaseFast
$TSK_SRV ./zig-out/bin/cpp-beast 127.0.0.1 8070 . & $TSK_SRV ./zig-out/bin/cpp-beast 127.0.0.1 8070 . &
PID=$! PID=$!
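With the renamed subjects, a single benchmark of one server is started as, e.g.:

```console
$ ./wrk/measure.sh zig-zap
```

`wrk/measure_all.sh` (below) loops over all subjects and appends three such runs per subject to `wrk/<subject>.perflog`.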

wrk/measure_all.sh (new executable file)

@@ -0,0 +1,20 @@
#! /usr/bin/env bash
if [ ! -d ".git" ] ; then
    echo "This script must be run from the root directory of the repository!"
    echo "./wrk/measure_all.sh"
    exit 1
fi

SUBJECTS="zig-zap go python python-sanic rust-bythebook rust-clean rust-axum csharp cpp-beast"

rm -f wrk/*.perflog

for S in $SUBJECTS; do
    L="$S.perflog"
    for R in 1 2 3 ; do
        ./wrk/measure.sh $S | tee -a wrk/$L
    done
done
echo "Finished"

wrk/rust/clean/.gitignore (new file)

@@ -0,0 +1,14 @@
# Generated by Cargo
# will have compiled files and executables
debug/
target/

# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
# More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
Cargo.lock

# These are backup files generated by rustfmt
**/*.rs.bk

# MSVC Windows builds of rustc generate these, which store debugging information
*.pdb


@@ -0,0 +1,9 @@
[package]
name = "hello"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
# crossbeam = { version = "0.8.2", features = ["crossbeam-channel"] }


@@ -0,0 +1 @@
Hello from RUST!

wrk/rust/clean/src/lib.rs (new file)

@@ -0,0 +1,101 @@
// Crossbeam should, but does not, make this faster.
//use crossbeam::channel::bounded;
use std::{net::TcpStream, sync::mpsc, thread};

type Job = (fn(TcpStream), TcpStream);
type Sender = mpsc::Sender<Job>;
//type Sender = crossbeam::channel::Sender<Job>;
type Receiver = mpsc::Receiver<Job>;
//type Receiver = crossbeam::channel::Receiver<Job>;

pub struct ThreadPool {
    workers: Vec<Worker>,
    senders: Vec<Sender>,
    next_sender: usize,
}

impl ThreadPool {
    /// Create a new ThreadPool.
    ///
    /// The size is the number of threads in the pool.
    ///
    /// # Panics
    ///
    /// The `new` function will panic if the size is zero.
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);

        let mut workers = Vec::with_capacity(size);
        let mut senders = Vec::with_capacity(size);

        // one channel per worker, so workers never contend on a shared receiver
        for id in 0..size {
            //let (sender, receiver) = bounded(2);
            let (sender, receiver) = mpsc::channel();
            senders.push(sender);
            workers.push(Worker::new(id, receiver));
        }

        ThreadPool {
            workers,
            senders,
            next_sender: 0,
        }
    }

    /// round robin over available workers to ensure we never have to buffer requests
    pub fn execute(&mut self, handler: fn(TcpStream), stream: TcpStream) {
        let job = (handler, stream);
        self.senders[self.next_sender].send(job).unwrap();
        //self.senders[self.next_sender].try_send(job).unwrap();
        self.next_sender += 1;
        if self.next_sender == self.senders.len() {
            self.next_sender = 0;
        }
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        // dropping the senders disconnects the channels, so workers exit their loops
        self.senders.clear();

        for worker in &mut self.workers {
            println!("Shutting down worker {}", worker.id);

            if let Some(thread) = worker.thread.take() {
                thread.join().unwrap();
            }
        }
    }
}

struct Worker {
    id: usize,
    thread: Option<thread::JoinHandle<()>>,
}

impl Worker {
    fn new(id: usize, receiver: Receiver) -> Worker {
        let thread = thread::spawn(move || Self::work(receiver));
        Worker {
            id,
            thread: Some(thread),
        }
    }

    fn work(receiver: Receiver) {
        loop {
            let message = receiver.recv();

            match message {
                Ok((handler, stream)) => {
                    // println!("Worker got a job; executing.");
                    handler(stream);
                }
                Err(_) => {
                    // println!("Worker disconnected; shutting down.");
                    break;
                }
            }
        }
    }
}


@@ -0,0 +1,41 @@
use hello::ThreadPool;
use std::io::prelude::*;
use std::net::TcpListener;
use std::net::TcpStream;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    // Creating a massive amount of threads so we can always have one ready to go.
    let mut pool = ThreadPool::new(128);

    // for stream in listener.incoming().take(2) {
    for stream in listener.incoming() {
        let stream = stream.unwrap();

        //handle_connection(stream);
        pool.execute(handle_connection, stream);
    }

    println!("Shutting down.");
}

fn handle_connection(mut stream: TcpStream) {
    stream.set_nodelay(true).expect("set_nodelay call failed");
    let mut buffer = [0; 1024];
    let nbytes = stream.read(&mut buffer).unwrap();
    if nbytes == 0 {
        return;
    }

    let status_line = "HTTP/1.1 200 OK";
    let contents = "HELLO from RUST!";

    let response = format!(
        "{}\r\nContent-Length: {}\r\n\r\n{}",
        status_line,
        contents.len(),
        contents
    );

    stream.write_all(response.as_bytes()).unwrap();
}
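To try the `rust-clean` server in isolation, the same steps `wrk/measure.sh` performs for the `rust-clean` subject can be run by hand (the `curl` check at the end is just an illustration):

```console
$ cd wrk/rust/clean && cargo build --release
$ ./target/release/hello &
$ curl http://127.0.0.1:7878
HELLO from RUST!
```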

Binary file not shown (added: 50 KiB)

Binary file not shown (added: 38 KiB)

Binary file not shown (added: 48 KiB)

Binary file not shown (added: 38 KiB)


@@ -19,6 +19,6 @@ pub fn main() !void {
     // start worker threads
     zap.start(.{
         .threads = 4,
-        .workers = 4,
+        .workers = 2, // empirical tests: yield best perf on my machine
     });
 }
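In context, the minimal zap benchmark server quoted earlier on this page now reads as follows (same code; only the `.workers` value changed):

```zig
const std = @import("std");
const zap = @import("zap");

fn on_request_minimal(r: zap.SimpleRequest) void {
    _ = r.sendBody("Hello from ZAP!!!");
}

pub fn main() !void {
    var listener = zap.SimpleHttpListener.init(.{
        .port = 3000,
        .on_request = on_request_minimal,
        .log = false,
        .max_clients = 100000,
    });
    try listener.listen();

    std.debug.print("Listening on 0.0.0.0:3000\n", .{});

    // start worker threads
    zap.start(.{
        .threads = 4,
        .workers = 2, // empirical tests: yield best perf on my machine
    });
}
```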