mirror of https://github.com/zigzap/zap.git synced 2025-10-21 07:34:08 +00:00

merge upstream

Alex Pyattaev 2023-08-22 19:48:39 +03:00
commit 2579a5fff8
22 changed files with 501 additions and 510 deletions

.gitignore

@@ -14,3 +14,5 @@ scratch
 docs/
 .DS_Store
 .vs/
+**/*.perflog
+wrk/*.png


@@ -1,543 +1,245 @@
 # ⚡blazingly fast⚡
 
-I conducted a series of quick tests, using wrk with simple HTTP servers written
-in GO and in zig zap. I made sure that all servers only output 17 bytes of HTTP
-body.
+Initially, I conducted a series of quick tests, using wrk with simple HTTP
+servers written in GO and in zig zap. I made sure that all servers only output
+17 bytes of HTTP body.
 
 Just to get some sort of indication, I also included measurements for python
 since I used to write my REST APIs in python before creating zig zap.
 
 You can check out the scripts I used for the tests in [./wrk](wrk/).
-## results
-
-You can see the verbatim output of `wrk`, and some infos about the test machine
-below the code snippets.
-
-**Update**: I was intrigued comparing to a basic rust HTTP server.
-Unfortunately, knowing nothing at all about rust, I couldn't find one and hence
-tried to go for the one in [The Rust Programming
-Language](https://doc.rust-lang.org/book/ch20-00-final-project-a-web-server.html).
-Wanting it to be of a somewhat fair comparison, I opted for the multi-threaded
-example. It didn't work out-of-the-box, but I got it to work and changed it to
-not read files but outputting a static text just like in the other examples.
-
-**maybe someone with rust experience** can have a look at my
-[wrk/rust/hello](wrk/rust/hello) code and tell me why it's surprisingly slow, as
-I expected it to be faster than the basic GO example. I'll enable the
-GitHub discussions for this matter. My suspicion is bad performance of the
-mutexes.
-
-![](wrk_tables.png)
+## Why
+
+I aimed to enhance the performance of my Python + Flask backends by replacing
+them with a Zig version. To evaluate the success of this transition, I compared
+the performance of a static HTTP server implemented in Python and its Zig
+counterpart, which showed significant improvements.
+
+To further assess the Zig server's performance, I compared it with a Go
+implementation, to compare against a widely used industry-standard. I expected
+similar performance levels but was pleasantly surprised when Zap outperformed Go
+by approximately 30% on my test machine.
+
+Intrigued by Rust's reputed performance capabilities, I also experimented with a
+Rust version. The results of this experiment are discussed in the
+[Flaws](#flaws) section below.
-### requests / sec
-
-![](wrk_requests.png)
-
-### transfer MB / sec
+## What
+
+So, what are the benchmarks testing?
+
+- simple http servers that reply to GET requests with a constant, 17-bytes long response
+- 4 cores are assigned to the subject under test (the respective server)
+- 4 cores are assigned to `wrk`
+  - using 4 threads
+  - aiming at 400 concurrent connections
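In practice this amounts to an invocation of roughly the following shape (a sketch: the exact CPU-pinning incantation lives in `wrk/measure.sh`; the core lists and target URL here are illustrative):

```console
# pin the server under test to 4 cores (illustrative core list)
$ taskset -c 0-3 ./zig-out/bin/wrk &
# pin wrk to 4 other cores: 4 threads, 400 connections, 10 s, with latency stats
$ taskset -c 4-7 wrk -t4 -c400 -d10s --latency http://127.0.0.1:3000
```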
-![](wrk_transfer.png)
-
-## zig code
-
-zig version 0.11.0-dev.1265+3ab43988c
-
-```zig
-const std = @import("std");
-const zap = @import("zap");
-
-fn on_request_minimal(r: zap.SimpleRequest) void {
-    _ = r.sendBody("Hello from ZAP!!!");
-}
-
-pub fn main() !void {
-    var listener = zap.SimpleHttpListener.init(.{
-        .port = 3000,
-        .on_request = on_request_minimal,
-        .log = false,
-        .max_clients = 100000,
-    });
-    try listener.listen();
-
-    std.debug.print("Listening on 0.0.0.0:3000\n", .{});
-
-    // start worker threads
-    zap.start(.{
-        .threads = 4,
-        .workers = 4,
-    });
-}
-```
-
-## go code
-
-go version go1.16.9 linux/amd64
-
-```go
-package main
-
-import (
-	"fmt"
-	"net/http"
-)
-
-func hello(w http.ResponseWriter, req *http.Request) {
-	fmt.Fprintf(w, "hello from GO!!!\n")
-}
-
-func main() {
-	print("listening on 0.0.0.0:8090\n")
-	http.HandleFunc("/hello", hello)
-	http.ListenAndServe(":8090", nil)
-}
-```
+## How
+
+I have fully automated the benchmarks and graph generation.
+
+To generate the data:
+
+```console
+$ ./wrk/measure_all.sh
+```
+
+To generate the graphs:
+
+```console
+$ python wrk/graph.py
+```
-## python code
-
-python version 3.9.6
-
-```python
-# Python 3 server example
-from http.server import BaseHTTPRequestHandler, HTTPServer
-
-hostName = "127.0.0.1"
-serverPort = 8080
-
-
-class MyServer(BaseHTTPRequestHandler):
-    def do_GET(self):
-        self.send_response(200)
-        self.send_header("Content-type", "text/html")
-        self.end_headers()
-        self.wfile.write(bytes("HI FROM PYTHON!!!", "utf-8"))
-
-    def log_message(self, format, *args):
-        return
-
-
-if __name__ == "__main__":
-    webServer = HTTPServer((hostName, serverPort), MyServer)
-    print("Server started http://%s:%s" % (hostName, serverPort))
-
-    try:
-        webServer.serve_forever()
-    except KeyboardInterrupt:
-        pass
-
-    webServer.server_close()
-    print("Server stopped.")
-```
+For dependencies, please see the [flake.nix](./flake.nix#L46).
+
+## Flaws
+
+The benchmarks have limitations, such as the lack of request latencies. The Rust
+community has often criticized these benchmarks as biased. However, no such
+criticisms have come from the Go or Python communities.
+
+In response to the Rust community's concerns, we've added three Rust
+implementations for comparison:
+
+- A standard version from [the Rust book](https://doc.rust-lang.org/book/ch20-00-final-project-a-web-server.html).
+- An "axum" version to highlight Rust's speed.
+- A refined version of the Rust book version.
+
+Originally, the goal was to compare "batteries included" versions, which created
+a disparity by comparing the optimized zap / facil.io code with basic bundled
+functionalities. These tests were for personal interest and not meant to be
+definitive benchmarks.
+
+To address this bias, we've added the Rust-axum and Python-sanic benchmarks. For
+more information, refer to the relevant discussions and pull requests.
+
+## More benchmarks?
+
+I often receive requests or PRs to include additional benchmarks, which a lot of
+times I find to be either ego-driven or a cause for unnecessary disputes. People
+tend to favor their preferred language or framework. Zig, Rust, C, and C++ are
+all capable of efficiently creating fast web servers, with different frameworks
+potentially excelling in certain benchmarks. My main concern was whether Zap,
+given its current level of abstraction, could compete with standard web servers.
+This question has been answered, and I see no need for further benchmarks.
+
+## The computer makes the difference
+
+After automating the performance benchmarks, I gathered data from three
+different computers. It's interesting to see the variation in relative numbers.
-## rust code
-
-[main.rs](wrk/rust/hello/src/main.rs)
-
-```rust
-use hello::ThreadPool;
-use std::io::prelude::*;
-use std::net::TcpListener;
-use std::net::TcpStream;
-
-fn main() {
-    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
-    let pool = ThreadPool::new(4);
-
-    // for stream in listener.incoming().take(2) {
-    for stream in listener.incoming() {
-        let stream = stream.unwrap();
-
-        pool.execute(|| {
-            handle_connection(stream);
-        });
-    }
-
-    println!("Shutting down.");
-}
-
-fn handle_connection(mut stream: TcpStream) {
-    let mut buffer = [0; 1024];
-    stream.read(&mut buffer).unwrap();
-
-    let status_line = "HTTP/1.1 200 OK";
-    let contents = "HELLO from RUST!";
-
-    let response = format!(
-        "{}\r\nContent-Length: {}\r\n\r\n{}",
-        status_line,
-        contents.len(),
-        contents
-    );
-
-    stream.write_all(response.as_bytes()).unwrap();
-    stream.flush().unwrap();
-}
-```
-
-[lib.rs](wrk/rust/hello/src/lib.rs)
-
-```rust
-use std::{
-    sync::{mpsc, Arc, Mutex},
-    thread,
-};
-
-pub struct ThreadPool {
-    workers: Vec<Worker>,
-    sender: Option<mpsc::Sender<Job>>,
-}
-
-type Job = Box<dyn FnOnce() + Send + 'static>;
-
-impl ThreadPool {
-    /// Create a new ThreadPool.
-    ///
-    /// The size is the number of threads in the pool.
-    ///
-    /// # Panics
-    ///
-    /// The `new` function will panic if the size is zero.
-    pub fn new(size: usize) -> ThreadPool {
-        assert!(size > 0);
-
-        let (sender, receiver) = mpsc::channel();
-
-        let receiver = Arc::new(Mutex::new(receiver));
-
-        let mut workers = Vec::with_capacity(size);
-
-        for id in 0..size {
-            workers.push(Worker::new(id, Arc::clone(&receiver)));
-        }
-
-        ThreadPool {
-            workers,
-            sender: Some(sender),
-        }
-    }
-
-    pub fn execute<F>(&self, f: F)
-    where
-        F: FnOnce() + Send + 'static,
-    {
-        let job = Box::new(f);
-
-        self.sender.as_ref().unwrap().send(job).unwrap();
-    }
-}
-
-impl Drop for ThreadPool {
-    fn drop(&mut self) {
-        drop(self.sender.take());
-
-        for worker in &mut self.workers {
-            println!("Shutting down worker {}", worker.id);
-
-            if let Some(thread) = worker.thread.take() {
-                thread.join().unwrap();
-            }
-        }
-    }
-}
-
-struct Worker {
-    id: usize,
-    thread: Option<thread::JoinHandle<()>>,
-}
-
-impl Worker {
-    fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
-        let thread = thread::spawn(move || loop {
-            let message = receiver.lock().unwrap().recv();
-
-            match message {
-                Ok(job) => {
-                    // println!("Worker got a job; executing.");
-                    job();
-                }
-                Err(_) => {
-                    // println!("Worker disconnected; shutting down.");
-                    break;
-                }
-            }
-        });
-
-        Worker {
-            id,
-            thread: Some(thread),
-        }
-    }
-}
-```
+### The test machine (graphs in the README)
+
+To be added when I get home.
+
+### Workstation at work
+
+A beast. Many cores (which we don't use).
+
+![](./wrk/samples/workstation_req_per_sec_graph.png)
+![](./wrk/samples/workstation_xfer_per_sec_graph.png)
-## wrk output
-
-wrk version: `wrk 4.1.0 [epoll] Copyright (C) 2012 Will Glozer`
-
-```
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh zig
-Listening on 0.0.0.0:3000
-========================================================================
-zig
-========================================================================
-Running 10s test @ http://127.0.0.1:3000
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   331.40us  115.09us   8.56ms   91.94%
-    Req/Sec   159.51k     9.44k  175.23k    56.50%
-  Latency Distribution
-     50%  312.00us
-     75%  341.00us
-     90%  375.00us
-     99%  681.00us
-  6348945 requests in 10.01s, 0.94GB read
-Requests/sec: 634220.13
-Transfer/sec:     96.17MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh zig
-Listening on 0.0.0.0:3000
-========================================================================
-zig
-========================================================================
-Running 10s test @ http://127.0.0.1:3000
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   322.43us  103.25us   3.72ms   86.57%
-    Req/Sec   166.35k     2.89k  182.78k    68.00%
-  Latency Distribution
-     50%  297.00us
-     75%  330.00us
-     90%  482.00us
-     99%  657.00us
-  6619245 requests in 10.02s, 0.98GB read
-Requests/sec: 660803.71
-Transfer/sec:    100.20MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh zig
-Listening on 0.0.0.0:3000
-========================================================================
-zig
-========================================================================
-Running 10s test @ http://127.0.0.1:3000
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   325.47us  105.86us   4.03ms   87.27%
-    Req/Sec   164.60k     4.69k  181.85k    84.75%
-  Latency Distribution
-     50%  300.00us
-     75%  333.00us
-     90%  430.00us
-     99%  667.00us
-  6549594 requests in 10.01s, 0.97GB read
-Requests/sec: 654052.56
-Transfer/sec:     99.18MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh go
-listening on 0.0.0.0:8090
-========================================================================
-go
-========================================================================
-Running 10s test @ http://127.0.0.1:8090/hello
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   680.63us  692.05us  12.09ms   88.04%
-    Req/Sec   126.49k     4.28k  139.26k    71.75%
-  Latency Distribution
-     50%  403.00us
-     75%  822.00us
-     90%    1.52ms
-     99%    3.34ms
-  5033360 requests in 10.01s, 643.22MB read
-Requests/sec: 502584.84
-Transfer/sec:     64.23MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh go
-listening on 0.0.0.0:8090
-========================================================================
-go
-========================================================================
-Running 10s test @ http://127.0.0.1:8090/hello
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   683.97us  695.78us  10.01ms   88.04%
-    Req/Sec   126.31k     4.34k  137.63k    65.00%
-  Latency Distribution
-     50%  408.00us
-     75%  829.00us
-     90%    1.53ms
-     99%    3.34ms
-  5026848 requests in 10.01s, 642.39MB read
-Requests/sec: 502149.91
-Transfer/sec:     64.17MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh go
-listening on 0.0.0.0:8090
-========================================================================
-go
-========================================================================
-Running 10s test @ http://127.0.0.1:8090/hello
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency   688.89us  702.75us  12.70ms   88.09%
-    Req/Sec   126.06k     4.20k  138.00k    70.25%
-  Latency Distribution
-     50%  414.00us
-     75%  836.00us
-     90%    1.54ms
-     99%    3.36ms
-  5015716 requests in 10.01s, 640.97MB read
-Requests/sec: 500968.28
-Transfer/sec:     64.02MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh python
-Server started http://127.0.0.1:8080
-========================================================================
-python
-========================================================================
-Running 10s test @ http://127.0.0.1:8080
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency    12.89ms  101.69ms   1.79s    97.76%
-    Req/Sec     1.83k     2.11k    7.53k    82.18%
-  Latency Distribution
-     50%  215.00us
-     75%  260.00us
-     90%  363.00us
-     99%  485.31ms
-  34149 requests in 10.02s, 4.33MB read
-  Socket errors: connect 0, read 34149, write 0, timeout 15
-Requests/sec:   3407.63
-Transfer/sec:    442.60KB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh python
-Server started http://127.0.0.1:8080
-========================================================================
-python
-========================================================================
-Running 10s test @ http://127.0.0.1:8080
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     9.87ms   90.32ms   1.79s    98.21%
-    Req/Sec     2.16k     2.17k    7.49k    80.10%
-  Latency Distribution
-     50%  234.00us
-     75%  353.00us
-     90%  378.00us
-     99%  363.73ms
-  43897 requests in 10.02s, 5.57MB read
-  Socket errors: connect 0, read 43897, write 0, timeout 14
-Requests/sec:   4379.74
-Transfer/sec:    568.85KB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh python
-Server started http://127.0.0.1:8080
-========================================================================
-python
-========================================================================
-Running 10s test @ http://127.0.0.1:8080
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     3.98ms   51.85ms   1.66s    99.16%
-    Req/Sec     2.69k     2.58k    7.61k    51.14%
-  Latency Distribution
-     50%  234.00us
-     75%  357.00us
-     90%  381.00us
-     99%  568.00us
-  50165 requests in 10.02s, 6.36MB read
-  Socket errors: connect 0, read 50165, write 0, timeout 9
-Requests/sec:   5004.06
-Transfer/sec:    649.95KB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$
+
+```
+[rene@nixos:~]$ neofetch --stdout
+rene@nixos
+----------
+OS: NixOS 23.05.2947.475d5ae2c4cb (Stoat) x86_64
+Host: LENOVO 1038
+Kernel: 6.1.46
+Uptime: 26 mins
+Packages: 5804 (nix-system), 566 (nix-user)
+Shell: bash 5.2.15
+Terminal: /dev/pts/2
+CPU: Intel Xeon Gold 5218 (64) @ 3.900GHz
+GPU: NVIDIA Quadro P620
+GPU: NVIDIA Tesla M40
+Memory: 1610MiB / 95247MiB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh rust
-Finished release [optimized] target(s) in 0.00s
-========================================================================
-rust
-========================================================================
-Running 10s test @ http://127.0.0.1:7878
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     1.20ms    1.38ms 208.35ms   99.75%
-    Req/Sec    34.06k     2.00k   38.86k    75.25%
-  Latency Distribution
-     50%    1.32ms
-     75%    1.63ms
-     90%    1.87ms
-     99%    2.32ms
-  1356449 requests in 10.01s, 71.15MB read
-  Socket errors: connect 0, read 1356427, write 0, timeout 0
-Requests/sec: 135446.00
-Transfer/sec:      7.10MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh rust
-Finished release [optimized] target(s) in 0.00s
-========================================================================
-rust
-========================================================================
-Running 10s test @ http://127.0.0.1:7878
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     1.21ms  592.89us  10.02ms   63.64%
-    Req/Sec    32.93k     2.91k   37.94k    80.50%
-  Latency Distribution
-     50%    1.31ms
-     75%    1.64ms
-     90%    1.90ms
-     99%    2.48ms
-  1311445 requests in 10.02s, 68.79MB read
-  Socket errors: connect 0, read 1311400, write 0, timeout 0
-Requests/sec: 130904.50
-Transfer/sec:      6.87MB
-(base) rs@ryzen:~/code/github.com/renerocksai/zap$ ./wrk/measure.sh rust
-Finished release [optimized] target(s) in 0.00s
-========================================================================
-rust
-========================================================================
-Running 10s test @ http://127.0.0.1:7878
-  4 threads and 400 connections
-  Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     1.26ms    2.88ms 211.74ms   99.92%
-    Req/Sec    33.92k     2.04k   38.99k    74.00%
-  Latency Distribution
-     50%    1.34ms
-     75%    1.66ms
-     90%    1.91ms
-     99%    2.38ms
-  1350527 requests in 10.02s, 70.84MB read
-  Socket errors: connect 0, read 1350474, write 0, timeout 0
-Requests/sec: 134830.39
-Transfer/sec:      7.07MB
-```
+
+[rene@nixos:~]$ lscpu
+Architecture:            x86_64
+CPU op-mode(s):          32-bit, 64-bit
+Address sizes:           46 bits physical, 48 bits virtual
+Byte Order:              Little Endian
+CPU(s):                  64
+On-line CPU(s) list:     0-63
+Vendor ID:               GenuineIntel
+Model name:              Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
+CPU family:              6
+Model:                   85
+Thread(s) per core:      2
+Core(s) per socket:      16
+Socket(s):               2
+Stepping:                7
+CPU(s) scaling MHz:      57%
+CPU max MHz:             3900,0000
+CPU min MHz:             1000,0000
+BogoMIPS:                4600,00
+Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
+Virtualization features:
+  Virtualization:        VT-x
+Caches (sum of all):
+  L1d:                   1 MiB (32 instances)
+  L1i:                   1 MiB (32 instances)
+  L2:                    32 MiB (32 instances)
+  L3:                    44 MiB (2 instances)
+NUMA:
+  NUMA node(s):          2
+  NUMA node0 CPU(s):     0-15,32-47
+  NUMA node1 CPU(s):     16-31,48-63
+Vulnerabilities:
+  Gather data sampling:  Mitigation; Microcode
+  Itlb multihit:         KVM: Mitigation: VMX disabled
+  L1tf:                  Not affected
+  Mds:                   Not affected
+  Meltdown:              Not affected
+  Mmio stale data:       Mitigation; Clear CPU buffers; SMT vulnerable
+  Retbleed:              Mitigation; Enhanced IBRS
+  Spec rstack overflow:  Not affected
+  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
+  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
+  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
+  Srbds:                 Not affected
+  Tsx async abort:       Mitigation; TSX disabled
+```
-## test machine
-
-```
-[NixOS ASCII-art logo omitted]
-rs@ryzen
---------
-OS: NixOS 22.05 (Quokka) x86_64
-Host: Micro-Star International Co., Ltd. B550-A PRO (MS-7C56)
-Kernel: 6.0.15
-Uptime: 7 days, 5 hours, 29 mins
-Packages: 5950 (nix-system), 893 (nix-user), 5 (flatpak)
-Shell: bash 5.1.16
-Resolution: 3840x2160
-DE: none+i3
-WM: i3
-Terminal: Neovim Terminal
-CPU: AMD Ryzen 5 5600X (12) @ 3.700GHz
-GPU: AMD ATI Radeon RX 6700/6700 XT / 6800M
-Memory: 10378MiB / 32033MiB
-```
+### Work Laptop
+
+Very strange. It absolutely **LOVES** zap 🤣!
+
+![](./wrk/samples/laptop_req_per_sec_graph.png)
+![](./wrk/samples/laptop_xfer_per_sec_graph.png)
+
+```
+➜ neofetch --stdout
+rs@nixos
+--------
+OS: NixOS 23.05.2918.4cdad15f34e6 (Stoat) x86_64
+Host: LENOVO 20TKS0W700
+Kernel: 6.1.45
+Uptime: 1 day, 4 hours, 50 mins
+Packages: 6259 (nix-system), 267 (nix-user), 9 (flatpak)
+Shell: bash 5.2.15
+Resolution: 3840x1600, 3840x2160
+DE: none+i3
+WM: i3
+Terminal: tmux
+CPU: Intel i9-10885H (16) @ 5.300GHz
+GPU: NVIDIA GeForce GTX 1650 Ti Mobile
+Memory: 4525MiB / 31805MiB
+
+➜ lscpu
+Architecture:                       x86_64
+CPU op-mode(s):                     32-bit, 64-bit
+Address sizes:                      39 bits physical, 48 bits virtual
+Byte Order:                         Little Endian
+CPU(s):                             16
+On-line CPU(s) list:                0-15
+Vendor ID:                          GenuineIntel
+Model name:                         Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
+CPU family:                         6
+Model:                              165
+Thread(s) per core:                 2
+Core(s) per socket:                 8
+Socket(s):                          1
+Stepping:                           2
+CPU(s) scaling MHz:                 56%
+CPU max MHz:                        5300.0000
+CPU min MHz:                        800.0000
+BogoMIPS:                           4800.00
+Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp pku ospke sgx_lc md_clear flush_l1d arch_capabilities
+Virtualization:                     VT-x
+L1d cache:                          256 KiB (8 instances)
+L1i cache:                          256 KiB (8 instances)
+L2 cache:                           2 MiB (8 instances)
+L3 cache:                           16 MiB (1 instance)
+NUMA node(s):                       1
+NUMA node0 CPU(s):                  0-15
+Vulnerability Gather data sampling: Mitigation; Microcode
+Vulnerability Itlb multihit:        KVM: Mitigation: VMX disabled
+Vulnerability L1tf:                 Not affected
+Vulnerability Mds:                  Not affected
+Vulnerability Meltdown:             Not affected
+Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
+Vulnerability Retbleed:             Mitigation; Enhanced IBRS
+Vulnerability Spec rstack overflow: Not affected
+Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
+Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
+Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
+Vulnerability Srbds:                Mitigation; Microcode
+Vulnerability Tsx async abort:      Not affected
+```

flake.lock (generated)

@@ -92,11 +92,11 @@
     },
     "locked": {
       "dir": "contrib",
-      "lastModified": 1691226237,
-      "narHash": "sha256-/+JDL1T9dFh2NqCOXqsLSNjrRcsKAMWdJiARq54qx6c=",
+      "lastModified": 1692702614,
+      "narHash": "sha256-FeY8hAB77tnUTDbK6WYA+DG3Nx5xQrbOC17Cfl3pTm4=",
       "owner": "neovim",
       "repo": "neovim",
-      "rev": "42630923fc00633d806af97c1792b2ed4a71e1cc",
+      "rev": "014b87646fc3273a09d6b20ebb648a8eb24a0a98",
       "type": "github"
     },
     "original": {
@@ -108,11 +108,11 @@
     },
     "nixpkgs": {
       "locked": {
-        "lastModified": 1691235410,
-        "narHash": "sha256-kdUw6loESRxuQEz+TJXE9TdSBs2aclaF1Yrro+u8NlM=",
+        "lastModified": 1692698134,
+        "narHash": "sha256-YtMmZWR/dlTypOcwiZfZTMPr3tj9fwr05QTStfSyDSg=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "d814a2776b53f65ea73c7403f3efc2e3511c7dbb",
+        "rev": "a16f7eb56e88c8985fcc6eb81dabd6cade4e425a",
         "type": "github"
       },
       "original": {
@@ -184,11 +184,11 @@
       "nixpkgs": "nixpkgs_2"
     },
     "locked": {
-      "lastModified": 1691237213,
-      "narHash": "sha256-RReB+o6jjJXjCHHJSny0p7NR/kNOu57jXEDX7jq9bp0=",
+      "lastModified": 1692663634,
+      "narHash": "sha256-wioqr80UOA0tNXaJy4D0i9fFaLG2RoQi5e9Dpd4WojE=",
       "owner": "mitchellh",
      "repo": "zig-overlay",
-      "rev": "a9d85674542108318187831fbf376704b71590f3",
+      "rev": "d666e5137fe0c43353c555fb47748813084decab",
       "type": "github"
     },
     "original": {


@@ -48,6 +48,7 @@
         wrk
         python39
         python39Packages.sanic
+        python39Packages.matplotlib
         poetry
         poetry
         pkgs.rustc
@@ -65,6 +66,9 @@
         pkgs.zlib
         pkgs.icu
         pkgs.openssl
+        pkgs.neofetch
+        pkgs.util-linux # lscpu
       ];
       buildInputs = with pkgs; [
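The added packages land in the benchmark dev shell, so (assuming the flake exposes the usual `devShell`) the hardware-reporting tools used below become available via:

```console
$ nix develop
$ neofetch --stdout && lscpu
```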

wrk/graph.py (new file)

@@ -0,0 +1,90 @@
import re
import os
import matplotlib.pyplot as plt
from collections import defaultdict
import statistics

directory = "./wrk"  # Replace with the actual directory path

requests_sec = defaultdict(list)
transfers_sec = defaultdict(list)
mean_requests = {}
mean_transfers = {}


def plot(kind='', title='', ylabel='', means=None):
    # Sort the labels and requests_sec lists together based on the requests_sec values
    labels = []
    values = []

    # silly, I know
    for k, v in means.items():
        labels.append(k)
        values.append(v)

    # sort the labels and value lists
    labels, values = zip(*sorted(zip(labels, values), key=lambda x: x[1], reverse=True))

    # Plot the graph
    plt.figure(figsize=(10, 6))  # Adjust the figure size as needed
    bars = plt.bar(labels, values)
    plt.xlabel("Subject")
    plt.ylabel(ylabel)
    plt.title(title)
    plt.xticks(rotation=45)  # Rotate x-axis labels for better readability

    # Display the actual values on top of the bars
    for bar in bars:
        yval = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, yval, f'{yval:,.2f}', ha='center', va='bottom')

    plt.tight_layout()  # Adjust the spacing of the graph elements
    png_name = f"{directory}/{kind.lower()}_graph.png"
    plt.savefig(png_name)  # Save the graph as a PNG file
    print(f"Generated: {png_name}")


if __name__ == '__main__':
    if not os.path.isdir(".git"):
        print("Please run from root directory of the repository!")
        print("e.g. python wrk/graph.py")
        import sys
        sys.exit(1)

    # Iterate over the files in the directory
    for filename in os.listdir(directory):
        if filename.endswith(".perflog"):
            label = os.path.splitext(filename)[0]
            file_path = os.path.join(directory, filename)
            with open(file_path, "r") as file:
                lines = file.readlines()
                for line in lines:
                    # Extract the Requests/sec value using regular expressions
                    match = re.search(r"Requests/sec:\s+([\d.]+)", line)
                    if match:
                        requests_sec[label].append(float(match.group(1)))
                    match = re.search(r"Transfer/sec:\s+([\d.]+)", line)
                    if match:
                        # normalize KB/MB suffixes to plain MB
                        value = float(match.group(1))
                        if 'KB' in line:
                            value *= 1024
                        elif 'MB' in line:
                            value *= 1024 * 1024
                        value /= 1024.0 * 1024
                        transfers_sec[label].append(value)

    # calculate means
    for k, v in requests_sec.items():
        mean_requests[k] = statistics.mean(v)
    for k, v in transfers_sec.items():
        mean_transfers[k] = statistics.mean(v)

    # save the plots
    plot(kind='req_per_sec', title='Requests/sec Comparison',
         ylabel='requests/sec', means=mean_requests)
    plot(kind='xfer_per_sec', title='Transfer/sec Comparison',
         ylabel='transfer/sec [MB]', means=mean_transfers)
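For reference, the two regular expressions above pick up lines like the following from the collected `*.perflog` files (taken verbatim from the wrk logs elsewhere on this page); the `KB`/`MB` handling then normalizes everything to MB:

```
Requests/sec: 634220.13
Transfer/sec:     96.17MB
```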


@ -14,7 +14,7 @@ if [ "$SUBJECT" = "" ] ; then
exit 1 exit 1
fi fi
if [ "$SUBJECT" = "zig" ] ; then if [ "$SUBJECT" = "zig-zap" ] ; then
zig build -Doptimize=ReleaseFast wrk > /dev/null zig build -Doptimize=ReleaseFast wrk > /dev/null
$TSK_SRV ./zig-out/bin/wrk & $TSK_SRV ./zig-out/bin/wrk &
PID=$! PID=$!
@ -41,20 +41,27 @@ if [ "$SUBJECT" = "python" ] ; then
URL=http://127.0.0.1:8080 URL=http://127.0.0.1:8080
fi fi
if [ "$SUBJECT" = "sanic" ] ; then if [ "$SUBJECT" = "python-sanic" ] ; then
$TSK_SRV python wrk/sanic/sanic-app.py & $TSK_SRV python wrk/sanic/sanic-app.py &
PID=$! PID=$!
URL=http://127.0.0.1:8000 URL=http://127.0.0.1:8000
fi fi
if [ "$SUBJECT" = "rust" ] ; then if [ "$SUBJECT" = "rust-bythebook" ] ; then
cd wrk/rust/hello && cargo build --release cd wrk/rust/bythebook && cargo build --release
$TSK_SRV ./target/release/hello & $TSK_SRV ./target/release/hello &
PID=$! PID=$!
URL=http://127.0.0.1:7878 URL=http://127.0.0.1:7878
fi fi
if [ "$SUBJECT" = "axum" ] ; then if [ "$SUBJECT" = "rust-clean" ] ; then
cd wrk/rust/clean && cargo build --release
$TSK_SRV ./target/release/hello &
PID=$!
URL=http://127.0.0.1:7878
fi
if [ "$SUBJECT" = "rust-axum" ] ; then
cd wrk/axum/hello-axum && cargo build --release cd wrk/axum/hello-axum && cargo build --release
$TSK_SRV ./target/release/hello-axum & $TSK_SRV ./target/release/hello-axum &
PID=$! PID=$!
@ -68,7 +75,7 @@ if [ "$SUBJECT" = "csharp" ] ; then
URL=http://127.0.0.1:5026 URL=http://127.0.0.1:5026
fi fi
if [ "$SUBJECT" = "cpp" ] ; then if [ "$SUBJECT" = "cpp-beast" ] ; then
cd wrk/cpp && zig build -Doptimize=ReleaseFast cd wrk/cpp && zig build -Doptimize=ReleaseFast
$TSK_SRV ./zig-out/bin/cpp-beast 127.0.0.1 8070 . & $TSK_SRV ./zig-out/bin/cpp-beast 127.0.0.1 8070 . &
PID=$! PID=$!
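With the renamed subjects, a single benchmark of one server is started as, e.g.:

```console
$ ./wrk/measure.sh zig-zap
```

`wrk/measure_all.sh` (below) loops over all subjects and appends three such runs per subject to `wrk/<subject>.perflog`.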

wrk/measure_all.sh (new executable file)

@@ -0,0 +1,20 @@
#! /usr/bin/env bash
if [ ! -d ".git" ] ; then
    echo "This script must be run from the root directory of the repository!"
    echo "./wrk/measure_all.sh"
    exit 1
fi

SUBJECTS="zig-zap go python python-sanic rust-bythebook rust-clean rust-axum csharp cpp-beast"

rm -f wrk/*.perflog

for S in $SUBJECTS; do
    L="$S.perflog"
    for R in 1 2 3 ; do
        ./wrk/measure.sh $S | tee -a wrk/$L
    done
done
echo "Finished"

wrk/rust/clean/.gitignore (new file)

@@ -0,0 +1,14 @@
# Generated by Cargo
# will have compiled files and executables
debug/
target/

# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
# More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
Cargo.lock

# These are backup files generated by rustfmt
**/*.rs.bk

# MSVC Windows builds of rustc generate these, which store debugging information
*.pdb


@@ -0,0 +1,9 @@
[package]
name = "hello"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
# crossbeam = { version = "0.8.2", features = ["crossbeam-channel"] }


@@ -0,0 +1 @@
Hello from RUST!

wrk/rust/clean/src/lib.rs (new file)

@@ -0,0 +1,101 @@
// Crossbeam should, but does not, make this faster.
//use crossbeam::channel::bounded;
use std::{net::TcpStream, sync::mpsc, thread};

type Job = (fn(TcpStream), TcpStream);
type Sender = mpsc::Sender<Job>;
//type Sender = crossbeam::channel::Sender<Job>;
type Receiver = mpsc::Receiver<Job>;
//type Receiver = crossbeam::channel::Receiver<Job>;

pub struct ThreadPool {
    workers: Vec<Worker>,
    senders: Vec<Sender>,
    next_sender: usize,
}

impl ThreadPool {
    /// Create a new ThreadPool.
    ///
    /// The size is the number of threads in the pool.
    ///
    /// # Panics
    ///
    /// The `new` function will panic if the size is zero.
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);

        let mut workers = Vec::with_capacity(size);
        let mut senders = Vec::with_capacity(size);

        // one channel per worker, so workers never contend on a shared receiver
        for id in 0..size {
            //let (sender, receiver) = bounded(2);
            let (sender, receiver) = mpsc::channel();
            senders.push(sender);
            workers.push(Worker::new(id, receiver));
        }

        ThreadPool {
            workers,
            senders,
            next_sender: 0,
        }
    }

    /// round robin over available workers to ensure we never have to buffer requests
    pub fn execute(&mut self, handler: fn(TcpStream), stream: TcpStream) {
        let job = (handler, stream);
        self.senders[self.next_sender].send(job).unwrap();
        //self.senders[self.next_sender].try_send(job).unwrap();
        self.next_sender += 1;
        if self.next_sender == self.senders.len() {
            self.next_sender = 0;
        }
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        // dropping the senders disconnects the channels, so workers exit their loops
        self.senders.clear();

        for worker in &mut self.workers {
            println!("Shutting down worker {}", worker.id);

            if let Some(thread) = worker.thread.take() {
                thread.join().unwrap();
            }
        }
    }
}

struct Worker {
    id: usize,
    thread: Option<thread::JoinHandle<()>>,
}

impl Worker {
    fn new(id: usize, receiver: Receiver) -> Worker {
        let thread = thread::spawn(move || Self::work(receiver));
        Worker {
            id,
            thread: Some(thread),
        }
    }

    fn work(receiver: Receiver) {
        loop {
            let message = receiver.recv();

            match message {
                Ok((handler, stream)) => {
                    // println!("Worker got a job; executing.");
                    handler(stream);
                }
                Err(_) => {
                    // println!("Worker disconnected; shutting down.");
                    break;
                }
            }
        }
    }
}


@@ -0,0 +1,41 @@
use hello::ThreadPool;
use std::io::prelude::*;
use std::net::TcpListener;
use std::net::TcpStream;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    // Creating a massive amount of threads so we can always have one ready to go.
    let mut pool = ThreadPool::new(128);

    // for stream in listener.incoming().take(2) {
    for stream in listener.incoming() {
        let stream = stream.unwrap();

        //handle_connection(stream);
        pool.execute(handle_connection, stream);
    }

    println!("Shutting down.");
}

fn handle_connection(mut stream: TcpStream) {
    stream.set_nodelay(true).expect("set_nodelay call failed");
    let mut buffer = [0; 1024];
    let nbytes = stream.read(&mut buffer).unwrap();
    if nbytes == 0 {
        return;
    }

    let status_line = "HTTP/1.1 200 OK";
    let contents = "HELLO from RUST!";

    let response = format!(
        "{}\r\nContent-Length: {}\r\n\r\n{}",
        status_line,
        contents.len(),
        contents
    );

    stream.write_all(response.as_bytes()).unwrap();
}
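To try the `rust-clean` server in isolation, the same steps `wrk/measure.sh` performs for the `rust-clean` subject can be run by hand (the `curl` check at the end is just an illustration):

```console
$ cd wrk/rust/clean && cargo build --release
$ ./target/release/hello &
$ curl http://127.0.0.1:7878
HELLO from RUST!
```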

Binary file not shown (added: 50 KiB)

Binary file not shown (added: 38 KiB)

Binary file not shown (added: 48 KiB)

Binary file not shown (added: 38 KiB)


@@ -19,6 +19,6 @@ pub fn main() !void {
     // start worker threads
     zap.start(.{
         .threads = 4,
-        .workers = 4,
+        .workers = 2, // empirical tests: yield best perf on my machine
     });
 }
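In context, the minimal zap benchmark server quoted earlier on this page now reads as follows (same code; only the `.workers` value changed):

```zig
const std = @import("std");
const zap = @import("zap");

fn on_request_minimal(r: zap.SimpleRequest) void {
    _ = r.sendBody("Hello from ZAP!!!");
}

pub fn main() !void {
    var listener = zap.SimpleHttpListener.init(.{
        .port = 3000,
        .on_request = on_request_minimal,
        .log = false,
        .max_clients = 100000,
    });
    try listener.listen();

    std.debug.print("Listening on 0.0.0.0:3000\n", .{});

    // start worker threads
    zap.start(.{
        .threads = 4,
        .workers = 2, // empirical tests: yield best perf on my machine
    });
}
```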