mirrors/zig - "Borealis" Git by INX: Hosted by INX "Xenon".

mirror of https://codeberg.org/ziglang/zig.git synced 2025-12-07 14:24:43 +00:00

Author	SHA1	Message	Date
Isaac Freund	0f3fa4d654	zig fmt: array types	2021-02-05 11:36:19 -08:00
Isaac Freund	6f3b93e2e8	zig fmt: struct and anon array initialization	2021-02-05 10:51:45 -08:00
Isaac Freund	3e960cfffe	zig fmt: float literal with exponent	2021-02-05 10:51:45 -08:00
Andrew Kelley	7069459a76	zig fmt: implement struct init	2021-02-04 19:59:06 -07:00
Andrew Kelley	8e46d06650	zig fmt: implement fn protos and defers	2021-02-04 16:38:29 -07:00
Asherah Connor	4428acf0f7	zig fmt: deref, unwrap optional	2021-02-04 10:49:45 -08:00
Andrew Kelley	725adf8332	zig fmt: builtin calls and array access	2021-02-03 22:12:11 -07:00
Andrew Kelley	f5279cbada	zig fmt: implement top-level fields	2021-02-03 17:02:12 -07:00
Andrew Kelley	1a83b29bea	zig fmt: implement if, call, field access, assignment	2021-02-02 21:05:53 -07:00
Andrew Kelley	0c6b98b825	zig fmt: implement simple test with doc comments	2021-02-01 21:31:41 -07:00
Andrew Kelley	272a0ab359	zig fmt: implement "line comment followed by top-level comptime"	2021-02-01 20:11:55 -07:00
Andrew Kelley	20554d32c0	zig fmt: start reworking with new memory layout * start implementation of ast.Tree.firstToken and lastToken * clarify some ast.Node doc comments * reimplement renderToken	2021-02-01 17:23:49 -07:00
Andrew Kelley	4dca99d3f6	stage2: rework AST memory layout

This is a proof-of-concept of switching to a new memory layout for tokens and AST nodes. The goal is threefold:

* smaller memory footprint
* faster performance for tokenization and parsing
* most importantly, a proof-of-concept that can also be applied to ZIR and TZIR to improve the entire compiler pipeline in this way.

I had a few key insights here:

* Underlying premise: using less memory will make things faster, because of fewer allocations and better cache utilization. Also, using less memory is valuable in and of itself.
* Using a Struct-Of-Arrays for tokens and AST nodes saves the bytes of padding between the enum tag (which kind of token is it; which kind of AST node is it) and the next fields in the struct. It also improves cache coherence, since one can peek ahead in the tokens array without having to load the source locations of tokens.
* Token memory can be conserved by only having the tag (1 byte) and byte offset (4 bytes) for a total of 5 bytes per token. It is not necessary to store the token ending byte offset because one can always re-tokenize later. For most tokens the length can be trivially determined from the tag alone, and for the ones where it cannot, string literals for example, one must parse the string literal again later anyway in astgen, making it free to re-tokenize.
* AST nodes do not actually need to store more than 1 token index because one can poke left and right in the tokens array very cheaply.

So far we are left with one big problem though: how can we put AST nodes into an array, since different AST nodes are different sizes? This is where my key observation comes in: one can have a hash table for the extra data for the less common AST nodes!

But it gets even better than that: I defined this data that is always present for every AST Node:

* tag (1 byte) - which AST node is it
* main_token (4 bytes, index into tokens array) - the tag determines which token this points to
* struct{lhs: u32, rhs: u32} - enough to store 2 indexes to other AST nodes, the tag determines how to interpret this data

You can see how a binary operation, such as `a * b`, would fit into this structure perfectly. A unary operation, such as `*a`, would also fit, and leave `rhs` unused. So this is a total of 13 bytes per AST node. And again, we don't have to pay for the padding to round up to 16 because we store in struct-of-arrays format.

I made a further observation: the only kind of data AST nodes need to store other than the main_token is indexes to sub-expressions. That's it. The only purpose of an AST is to bring a tree structure to a list of tokens. This observation means all the data that nodes store are only sets of u32 indexes to other nodes. The other tokens can be found later by the compiler, by poking around in the tokens array, which again is super fast because it is struct-of-arrays, so you often only need to look at the token tags array, which is an array of bytes, very cache friendly.

So for nearly every kind of AST node, you can store it in 13 bytes. For the rarer AST nodes that have 3 or more indexes to other nodes to store, either the lhs or the rhs will be repurposed to be an index into an extra_data array which contains the extra AST node indexes. In other words, no hash table needed, it's just 1 big ArrayList with the extra data for AST Nodes.

Final observation: there is no need to have a canonical tag for a given AST. For example, the expression `foo(bar)` is a function call. Function calls can have any number of parameters. However, in this example, we can encode the function call into the AST with a tag called `FunctionCallOnlyOneParam`, and use lhs for the function expr and rhs for the only parameter expr.

Meanwhile, if the code was `foo(bar, baz)` then the AST node would have to be `FunctionCall` with lhs still being the function expr, but rhs being the index into `extra_data`. Then because the tag is `FunctionCall` it means `extra_data[rhs]` is the "start" and `extra_data[rhs+1]` is the "end". Now the range `extra_data[start..end]` describes the list of parameters to the function.

Point being, you only have to pay for the extra bytes if the AST actually requires it. There's no limit to the number of different AST tag encodings.

Preliminary results:

* 15% improvement on cache-misses
* 28% improvement on total instructions executed
* 26% improvement on total CPU cycles
* 22% improvement on wall clock time

This is 1 of the 4 items on the checklist before this can actually be merged:

* [x] parser
* [ ] render (zig fmt)
* [ ] astgen
* [ ] translate-c	2021-01-30 20:16:59 -07:00
Tadeo Kondrak	0b5f3c2ef9	Replace @TagType uses, mostly with std.meta.Tag	2021-01-30 22:26:44 +02:00
Jay Petacat	a9b505fa77	Reduce use of deprecated IO types Related: #4917	2021-01-07 23:48:58 -08:00
Andrew Kelley	974c008a0e	convert more {} to {d} and {s}	2021-01-02 19:03:14 -07:00
LemonBoy	dd973fb365	std: Use {s} instead of {} when printing strings	2021-01-02 17:12:57 -07:00
Frank Denis	6c2e0c2046	Year++	2020-12-31 15:45:24 -08:00
LemonBoy	fa6449dac0	zig fmt: Fix alignment of initializer elements Resetting `column_counter` is not needed as the effective column number is calculated by taking that value modulo `row_size`. Closes #7289	2020-12-11 02:34:44 -05:00
Vexu	be71994fb1	zig fmt: improve var decl initializer formatting	2020-12-09 13:47:22 +02:00
Vexu	a63fd34c50	return a valid node even if invalid deref was used	2020-10-29 19:20:15 +02:00
Travis	d7f9128b5d	add error message to zig side of tokenizing/parsing	2020-10-29 12:03:45 -05:00
Lachlan Easton	4496a6c9cc	zig fmt: Special case un-indent comma after multiline string in param list	2020-09-18 20:34:00 +10:00
Lachlan Easton	1aacedf6e1	zig fmt: Fix regression in ArrayInitializers	2020-09-18 20:34:00 +10:00
Lachlan Easton	40b6e86a99	zig fmt: fix #6171	2020-09-18 20:34:00 +10:00
Lachlan Easton	206a8cf670	zig fmt: fix comments and multiline literals in function args	2020-09-18 20:34:00 +10:00
Lachlan Easton	291482a031	zig fmt: Don't consider width of expressions containing multiline string literals when calculating padding for array initializers. fixes #3739 Changes some of the special casing for multiline string literals.	2020-09-18 20:34:00 +10:00
Lachlan Easton	e1bd271192	zig fmt: Allow trailing comments to do manual array formatting. close #5948	2020-09-18 20:34:00 +10:00
Lachlan Easton	9f0821e688	zig fmt: Fix erroneously commented out code, add passing test case to close #5722	2020-09-18 20:34:00 +10:00
Lachlan Easton	ea6181aaf6	zig fmt: Add test for nesting if expressions	2020-09-18 20:34:00 +10:00
Lachlan Easton	601331833a	Add passing test. close #5343	2020-09-09 21:54:42 +10:00
Lachlan Easton	283d441c19	zig fmt: fix #3978 , fix #2748	2020-09-09 21:54:42 +10:00
Lachlan Easton	bb848dbeee	zig fmt: Patch rename stream to ais (auto indenting stream) & other small refactors	2020-09-02 20:16:28 +10:00
Lachlan Easton	bc24b86d82	zig fmt: Fix regression not covered by testing	2020-09-01 13:19:34 +10:00
Lachlan Easton	029ec456bc	zig fmt: Set indent_delta to 2 when rendering inline asm	2020-08-31 23:39:50 +10:00
Lachlan Easton	5aca3baea6	zig fmt: Remove dynamic stack from auto-indenting-stream	2020-08-31 23:39:50 +10:00
Lachlan Easton	a72b9d403d	Refactor zig fmt indentation. Remove indent from rendering code and have a stream handle automatic indentation	2020-08-29 13:35:00 +10:00
Andrew Kelley	4a69b11e74	add license header to all std lib files add SPDX license identifier copyright ownership is zig contributors	2020-08-20 16:07:04 -04:00
Vexu	f962315363	fix missing parser error for missing comma before eof Closes #5952	2020-07-30 13:10:55 +03:00
Andrew Kelley	804b51b179	stage2: VarDecl and FnProto take advantage of TrailerFlags API These AST nodes now have a flags field and then a bunch of optional trailing objects. The end result is lower memory usage and consequently better performance. This is part of an ongoing effort to reduce the amount of memory parsed ASTs take up. Running `zig fmt` on the std lib: * cache-misses: 2,554,321 => 2,534,745 * instructions: 3,293,220,119 => 3,302,479,874 * peak memory: 74.0 MiB => 73.0 MiB Holding the entire std lib AST in memory at the same time: 93.9 MiB => 88.5 MiB	2020-07-15 02:07:30 -07:00
Vexu	1a989ba39d	fix parser tests and add test for anytype conversion	2020-07-11 21:20:50 +03:00
Vexu	010c58e303	fix zig fmt out of bounds on empty file	2020-05-30 23:07:51 +03:00
Jakub Konka	e61e8c94be	Reenable zig parser tests disabled targeting Wasm I'm not sure why I disabled them when landing extended Wasm/WASI support, but they pass the parser tests just fine now, so I'm gonna go ahead and re-enable them.	2020-05-26 21:01:54 -04:00
Vexu	e07b467c7c	fix missing compile error on while/for missing block	2020-05-25 23:25:06 +03:00
Andrew Kelley	8df0841d6a	stage2 parser: token ids in their own array To prevent cache misses, token ids go in their own array, and the start/end offsets go in a different one. perf measurement before: 2,667,914 cache-misses:u 2,139,139,935 instructions:u 894,167,331 cycles:u perf measurement after: 1,757,723 cache-misses:u 2,069,932,298 instructions:u 858,105,570 cycles:u	2020-05-22 12:34:12 -04:00
Andrew Kelley	93384f7428	use singly linked lists for std.zig.parse std.ast uses a singly linked list for lists of things. This is a breaking change to the self-hosted parser API. std.ast.Tree has been separated into a private "Parser" type which represents in-progress parsing, and std.ast.Tree which has only "output" data. This means cleaner, but breaking, API for parse results. Specifically, `tokens` and `errors` are no longer SegmentedList but a slice. The way to iterate over AST nodes has necessarily changed since lists of nodes are now singly linked lists rather than SegmentedList. From these changes, I observe the following on the self-hosted-parser benchmark from ziglang/gotta-go-fast: throughput: 45.6 MiB/s => 55.6 MiB/s maxrss: 359 KB => 342 KB This commit breaks the build; more updates are necessary to fix API usage of the self-hosted parser.	2020-05-19 21:22:52 -04:00
Jakub Konka	3d267bab71	Re-enable refAllDecls gen and check in std.zig	2020-05-18 21:05:29 +02:00
Vexu	28d449b38d	fix zig fmt regression	2020-05-17 15:13:19 +03:00
Vexu	081ffe24cf	fix infinite loop with invalid comptime	2020-05-16 19:23:59 +03:00
Vexu	ed62081d38	recover from missing semicolon after if stmt	2020-05-16 12:29:01 +03:00
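
The size figures quoted in commit 4dca99d3f6 above (5 bytes per token, 13 bytes per AST node, 16 once padding is counted) can be checked with a little arithmetic. The sketch below uses Python's `struct` module purely as a calculator; the format strings and constant names are illustrative assumptions and are not taken from the Zig source. It also assumes a typical platform where an unsigned int is 4 bytes with 4-byte alignment.

```python
import struct

# Array-of-structs layout: native alignment ("@") pads the 1-byte tag so
# that the following 32-bit fields start on a 4-byte boundary.
TOKEN_AOS = struct.calcsize("@BI")    # tag: u8, start offset: u32
NODE_AOS = struct.calcsize("@BIII")   # tag: u8, main_token, lhs, rhs: u32

# Struct-of-arrays layout: every field lives in its own densely packed
# array, so the per-element cost is just the sum of the field sizes.
# "=" selects standard sizes with no padding, which models that sum.
TOKEN_SOA = struct.calcsize("=BI")
NODE_SOA = struct.calcsize("=BIII")

print("token bytes (AoS, SoA):", TOKEN_AOS, TOKEN_SOA)
print("node bytes  (AoS, SoA):", NODE_AOS, NODE_SOA)
```

Under those assumptions the packed sizes come out to 5 and 13 bytes, matching the commit message, while the padded per-struct sizes are 8 and 16.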

328 commits
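
The `extra_data` encoding described in commit 4dca99d3f6 can be sketched as follows. This is a minimal Python model, not the compiler's implementation: the `Tree` class, its method names, and the numeric tag values are all hypothetical, and `CALL_ONE` stands in for the `FunctionCallOnlyOneParam` tag named in the commit message. Each field is kept in its own array (struct-of-arrays), and only calls that do not fit in `lhs`/`rhs` spill into `extra_data`.

```python
from array import array

# Hypothetical tag values; the real compiler defines its own enum.
IDENT = 0
CALL_ONE = 1  # lhs = callee node, rhs = the single argument node
CALL = 2      # lhs = callee node, rhs = index of a (start, end) pair in extra_data


class Tree:
    def __init__(self):
        # One array per field instead of one array of node structs.
        self.tags = array("B")
        self.main_tokens = array("I")
        self.lhs = array("I")
        self.rhs = array("I")
        self.extra_data = array("I")

    def add_node(self, tag, main_token, lhs=0, rhs=0):
        self.tags.append(tag)
        self.main_tokens.append(main_token)
        self.lhs.append(lhs)
        self.rhs.append(rhs)
        return len(self.tags) - 1  # index of the new node

    def add_call(self, main_token, callee, args):
        if len(args) == 1:
            # Common case: everything fits in lhs/rhs, no extra bytes.
            return self.add_node(CALL_ONE, main_token, callee, args[0])
        # General case: append the argument nodes, then a (start, end) pair
        # describing that range; rhs points at the pair.
        start = len(self.extra_data)
        self.extra_data.extend(args)
        end = len(self.extra_data)
        pair = len(self.extra_data)
        self.extra_data.extend([start, end])
        return self.add_node(CALL, main_token, callee, pair)

    def call_args(self, node):
        tag = self.tags[node]
        if tag == CALL_ONE:
            return [self.rhs[node]]
        if tag == CALL:
            i = self.rhs[node]
            start, end = self.extra_data[i], self.extra_data[i + 1]
            return list(self.extra_data[start:end])
        raise ValueError("not a call node")


# Usage: model `foo(bar)` and `foo(bar, baz)`; token indexes are made up.
tree = Tree()
foo = tree.add_node(IDENT, 0)
bar = tree.add_node(IDENT, 2)
baz = tree.add_node(IDENT, 4)
one = tree.add_call(1, foo, [bar])
two = tree.add_call(1, foo, [bar, baz])
print(tree.call_args(one), tree.call_args(two), list(tree.extra_data))
```

The one-argument call adds nothing to `extra_data`; only the two-argument call pays for the argument list plus its (start, end) pair, which is the "pay only if the AST requires it" property the commit message describes.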