Commit graph

583 commits

Author SHA1 Message Date
Andrew Kelley
7b255235d6 wasm linker: fix off-by-one in function table indexes 2025-01-15 15:11:36 -08:00
Andrew Kelley
9c14645b58 wasm codegen: fix freeing of locals 2025-01-15 15:11:36 -08:00
Andrew Kelley
1c4b4fb516 implement indirect function table for object functions 2025-01-15 15:11:36 -08:00
Andrew Kelley
21a2888561 wasm linker: don't assume nav callees are fully resolved
codegen can be called which contains calls to navs which have only their
type resolved. this means the indirect function table needs to track nav
indexes not ip indexes.
2025-01-15 15:11:36 -08:00
Andrew Kelley
7d224516c4 wasm linker: chase relocations for references 2025-01-15 15:11:36 -08:00
Andrew Kelley
eb943890d9 resolve merge conflicts
with 497592c9b4
2025-01-15 15:11:36 -08:00
Andrew Kelley
f1e167c1d8 use fixed writer in more places 2025-01-15 15:11:36 -08:00
Andrew Kelley
728103467e wasm linker: implement indirect function calls 2025-01-15 15:11:36 -08:00
Andrew Kelley
389b29fd8c wasm linker: avoid recursion in lowerZcuData
instead of recursion, callers of the function are responsible for
checking the respective tables that might have new entries in them and
then calling lowerZcuData again.
2025-01-15 15:11:36 -08:00
Andrew Kelley
568d9936ab wasm codegen: fix call_indirect 2025-01-15 15:11:36 -08:00
Andrew Kelley
91efc5c98b wasm linker: fix calling imported functions
and more disciplined type safety for output function indexes
2025-01-15 15:11:35 -08:00
Andrew Kelley
1a58ae2ed6 wasm codegen: fix extra index not relative 2025-01-15 15:11:35 -08:00
Andrew Kelley
85b53730fe add safety for calling functions that get virtual addrs 2025-01-15 15:11:35 -08:00
Andrew Kelley
2d899e9a9f wasm codegen: fix wrong union field for locals 2025-01-15 15:11:35 -08:00
Andrew Kelley
416fc2df94 complete wasm.Emit implementation 2025-01-15 15:11:35 -08:00
Andrew Kelley
458f658b42 wasm linker: implement missing logic
fix some compilation errors for reworked Emit now that it's actually
referenced

introduce DataSegment.Id for sorting data both from object files and
from the Zcu.

introduce optimization: data segment sorting includes a descending sort
on reference count so that references to data can be smaller integers
leading to better LEB encodings. this optimization is skipped for object
files.

implement uav address access function which is based on only 1 hash
table lookup to find out the offset after sorting.
2025-01-15 15:11:35 -08:00
Andrew Kelley
4ecc4addc4 wasm codegen: remove dependency on PerThread where possible 2025-01-15 15:11:35 -08:00
Andrew Kelley
098e0b1906 wasm codegen: fix lowering of 32/64 float rt calls 2025-01-15 15:11:35 -08:00
Andrew Kelley
55773aee3f remove bad deinit 2025-01-15 15:11:35 -08:00
Andrew Kelley
e21a42723b wasm linker: implement name, module name, and type for function imports 2025-01-15 15:11:35 -08:00
Andrew Kelley
031c84c8cb wasm: fix many compilation errors
Still, the branch is not yet passing semantic analysis.
2025-01-15 15:11:35 -08:00
Andrew Kelley
bf20a4aa9e wasm: use call_intrinsic MIR instruction 2025-01-15 15:11:35 -08:00
Andrew Kelley
c443a7a57f wasm: move error_name lowering to Emit phase 2025-01-15 15:11:35 -08:00
Andrew Kelley
d45e5ac5eb wasm codegen: rename func: CodeGen to cg: CodeGen 2025-01-15 15:11:35 -08:00
Andrew Kelley
4a1447d1db wasm codegen: switch on bool instead of int 2025-01-15 15:11:35 -08:00
Andrew Kelley
e24f635c75 wasm: implement errors_len as a MIR opcode with no linker involvement 2025-01-15 15:11:35 -08:00
Andrew Kelley
bffa148600 wasm codegen: fix some compilation errors 2025-01-15 15:11:35 -08:00
Andrew Kelley
e521879e47 rewrite wasm/Emit.zig
mainly, rework how relocations works. This is the point at which symbol
indexes are known - not before. And don't emit unnecessary relocations!
They're only needed when emitting an object file.

Changes wasm linker to keep MIR around long-lived so that fixups can be
reapplied after linker garbage collection.

use labeled switch while we're at it
2025-01-15 15:11:35 -08:00
Andrew Kelley
943dac3e85 compiler: add type safety for export indices 2025-01-15 15:11:35 -08:00
Andrew Kelley
795e7c64d5 wasm linker: aggressive DODification
The goals of this branch are to:
* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker
  state to disk.
* more efficient compiler memory utilization
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do
  linker garbage collection.
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls

In order to accomplish this goals, this removes the ZigObject
abstraction, as well as Symbol and Atom. These abstractions resulted
in overly generic code, doing unnecessary work, and needless
complications that simply go away by creating a better in-memory data
model and emitting more things lazily.

For example, this makes wasm codegen emit MIR which is then lowered to
wasm code during linking, with optimal function indexes etc, or
relocations are emitted if outputting an object. Previously, this would
always emit relocations, which are fully unnecessary when emitting an
executable, and required all function calls to use the maximum size LEB
encoding.

This branch introduces the concept of the "prelink" phase which occurs
after all object files have been parsed, but before any Zcu updates are
sent to the linker. This allows the linker to fully parse all objects
into a compact memory model, which is guaranteed to be complete when Zcu
code is generated.

This commit is not a complete implementation of all these goals; it is
not even passing semantic analysis.
2025-01-15 15:11:35 -08:00
mlugg
3afda4322c
compiler: analyze type and value of global declaration separately
This commit separates semantic analysis of the annotated type vs value
of a global declaration, therefore allowing recursive and mutually
recursive values to be declared.

Every `Nav` which undergoes analysis now has *two* corresponding
`AnalUnit`s: `.{ .nav_val = n }` and `.{ .nav_ty = n }`. The `nav_val`
unit is responsible for *fully resolving* the `Nav`: determining its
value, linksection, addrspace, etc. The `nav_ty` unit, on the other
hand, resolves only the information necessary to construct a *pointer*
to the `Nav`: its type, addrspace, etc. (It does also analyze its
linksection, but that could be moved to `nav_val` I think; it doesn't
make any difference).

Analyzing a `nav_ty` for a declaration with no type annotation will just
mark a dependency on the `nav_val`, analyze it, and finish. Conversely,
analyzing a `nav_val` for a declaration *with* a type annotation will
first mark a dependency on the `nav_ty` and analyze it, using this as
the result type when evaluating the value body.

The `nav_val` and `nav_ty` units always have references to one another:
so, if a `Nav`'s type is referenced, its value implicitly is too, and
vice versa. However, these dependencies are trivial, so, to save memory,
are only known implicitly by logic in `resolveReferences`.

In general, analyzing ZIR `decl_val` will only analyze `nav_ty` of the
corresponding `Nav`. There are two exceptions to this. If the
declaration is an `extern` declaration, then we immediately ensure the
`Nav` value is resolved (which doesn't actually require any more
analysis, since such a declaration has no value body anyway).
Additionally, if the resolved type has type tag `.@"fn"`, we again
immediately resolve the `Nav` value. The latter restriction is in place
for two reasons:

* Functions are special, in that their externs are allowed to trivially
  alias; i.e. with a declaration `extern fn foo(...)`, you can write
  `const bar = foo;`. This is not allowed for non-function externs, and
  it means that function types are the only place where it is possible
  for a declaration `Nav` to have a `.@"extern"` value without actually
  being declared `extern`. We need to identify this situation
  immediately so that the `decl_ref` can create a pointer to the *real*
  extern `Nav`, not this alias.
* In certain situations, such as taking a pointer to a `Nav`, Sema needs
  to queue analysis of a runtime function if the value is a function. To
  do this, the function value needs to be known, so we need to resolve
  the value immediately upon `&foo` where `foo` is a function.

This restriction is simple to codify into the eventual language
specification, and doesn't limit the utility of this feature in
practice.

A consequence of this commit is that codegen and linking logic needs to
be more careful when looking at `Nav`s. In general:

* When `updateNav` or `updateFunc` is called, it is safe to assume that
  the `Nav` being updated (the owner `Nav` for `updateFunc`) is fully
  resolved.
* Any `Nav` whose value is/will be an `@"extern"` or a function is fully
  resolved; see `Nav.getExtern` for a helper for a common case here.
* Any other `Nav` may only have its type resolved.

This didn't seem to be too tricky to satisfy in any of the existing
codegen/linker backends.

Resolves: #131
2024-12-24 02:18:41 +00:00
Jacob Young
c894ac09a3 dwarf: fix stepping through an inline loop containing one statement
Previously, stepping from the single statement within the loop would
always exit the loop because all of the code unrolled from the loop is
associated with the same line and treated by the debugger as one line.
2024-11-24 17:28:12 -05:00
mlugg
d11bbde5f9
compiler: remove anonymous struct types, unify all tuples
This commit reworks how anonymous struct literals and tuples work.

Previously, an untyped anonymous struct literal
(e.g. `const x = .{ .a = 123 }`) was given an "anonymous struct type",
which is a special kind of struct which coerces using structural
equivalence. This mechanism was a holdover from before we used
RLS / result types as the primary mechanism of type inference. This
commit changes the language so that the type assigned here is a "normal"
struct type. It uses a form of equivalence based on the AST node and the
type's structure, much like a reified (`@Type`) type.

Additionally, tuples have been simplified. The distinction between
"simple" and "complex" tuple types is eliminated. All tuples, even those
explicitly declared using `struct { ... }` syntax, use structural
equivalence, and do not undergo staged type resolution. Tuples are very
restricted: they cannot have non-`auto` layouts, cannot have aligned
fields, and cannot have default values with the exception of `comptime`
fields. Tuples currently do not have optimized layout, but this can be
changed in the future.

This change simplifies the language, and fixes some problematic
coercions through pointers which led to unintuitive behavior.

Resolves: #16865
2024-10-31 20:42:53 +00:00
Andrew Kelley
ba2d006634 link.File.Wasm: remove the "files" abstraction
Removes the `files` field from the Wasm linker, storing the ZigObject
as its own field instead using a tagged union.

This removes a layer of indirection when accessing the ZigObject, and
untangles logic so that we can introduce a "pre-link" phase that
prepares the linker state to handle only incremental updates to the
ZigObject and then minimize logic inside flush().

Furthermore, don't make array elements store their own indexes, that's
always a waste.

Flattens some of the file system hierarchy and unifies variable names
for easier refactoring.

Introduces type safety for optional object indexes.
2024-10-30 19:34:58 -07:00
mlugg
ec19086aa0
compiler: remove @setAlignStack
This commit finishes implementing #21209 by removing the
`@setAlignStack` builtin in favour of `CallingConvention` payloads. The
x86_64 backend is updated to use the stack alignment given in the
calling convention (the LLVM backend was already updated in a previous
commit).

Resolves: #21209
2024-10-19 19:15:23 +01:00
mlugg
bc797a97b1
std: update for new CallingConvention
The old `CallingConvention` type is replaced with the new
`NewCallingConvention`. References to `NewCallingConvention` in the
compiler are updated accordingly. In addition, a few parts of the
standard library are updated to use the new type correctly.
2024-10-19 19:15:23 +01:00
mlugg
51706af908
compiler: introduce new CallingConvention
This commit begins implementing accepted proposal #21209 by making
`std.builtin.CallingConvention` a tagged union.

The stage1 dance here is a little convoluted. This commit introduces the
new type as `NewCallingConvention`, keeping the old `CallingConvention`
around. The compiler uses `std.builtin.NewCallingConvention`
exclusively, but when fetching the type from `std` when running the
compiler (e.g. with `getBuiltinType`), the name `CallingConvention` is
used. This allows a prior build of Zig to be used to build this commit.
The next commit will update `zig1.wasm`, and then the compiler and
standard library can be updated to completely replace
`CallingConvention` with `NewCallingConvention`.

The second half of #21209 is to remove `@setAlignStack`, which will be
implemented in another commit after updating `zig1.wasm`.
2024-10-19 19:08:59 +01:00
Pavel Verigo
4b89a4c7cb stage2-wasm: airRem + airMod for floats 2024-10-08 20:58:15 +02:00
David Rubin
043b1adb8d
remove @fence (#21585)
closes #11650
2024-10-04 22:21:27 +00:00
Linus Groh
8588964972 Replace deprecated default initializations with decl literals 2024-09-12 16:01:23 +01:00
Jacob Young
e046977354 codegen: implement output to the .debug_info section 2024-09-10 12:27:57 -04:00
mlugg
cb68c0917a
wasm: un-regress loop and switch_br
`.loop` is also a block, so the block_depth must be stored *after* block
creation, ensuring a correct block_depth to jump back to when receiving
`.repeat`.

This also un-regresses `switch_br` which now correctly handles ranges
within cases. It supports it for both jump tables as well as regular
conditional branches.
2024-09-01 18:30:31 +01:00
mlugg
5e12ca9fe3
compiler: implement labeled switch/continue 2024-09-01 18:30:31 +01:00
mlugg
5fb4a7df38
Air: add explicit repeat instruction to repeat loops
This commit introduces a new AIR instruction, `repeat`, which causes
control flow to move back to the start of a given AIR loop. `loop`
instructions will no longer automatically perform this operation after
control flow reaches the end of the body.

The motivation for making this change now was really just consistency
with the upcoming implementation of #8220: it wouldn't make sense to
have this feature work significantly differently. However, there were
already some TODOs kicking around which wanted this feature. It's useful
for two key reasons:

* It allows loops over AIR instruction bodies to loop precisely until
  they reach a `noreturn` instruction. This allows for tail calling a
  few things, and avoiding a range check on each iteration of a hot
  path, plus gives a nice assertion that validates AIR structure a
  little. This is a very minor benefit, which this commit does apply to
  the LLVM and C backends.

* It should allow for more compact ZIR and AIR to be emitted by having
  AstGen emit `repeat` instructions more often rather than having
  `continue` statements `break` to a `block` which is *followed* by a
  `repeat`. This is done in status quo because `repeat` instructions
  only ever cause the direct parent block to repeat. Now that AIR is
  more flexible, this flexibility can be pretty trivially extended to
  ZIR, and we can then emit better ZIR. This commit does not implement
  this.

Support for this feature is currently regressed on all self-hosted
native backends, including x86_64. This support will be added where
necessary before this branch is merged.
2024-09-01 18:30:31 +01:00
mlugg
1b000b90c9
Air: direct representation of ranges in switch cases
This commit modifies the representation of the AIR `switch_br`
instruction to represent ranges in cases. Previously, Sema emitted
different AIR in the case of a range, where the `else` branch of the
`switch_br` contained a simple `cond_br` for each such case which did a
simple range check (`x > a and x < b`). Not only does this add
complexity to Sema, which we would like to minimize, but it also gets in
the way of the implementation of #8220. That proposal turns certain
`switch` statements into a looping construct, and for optimization
purposes, we want to lower this to AIR fairly directly (i.e. without
involving a `loop` instruction). That means we would ideally like a
single instruction to represent the entire `switch` statement, so that
we can dispatch back to it with a different operand as in #8220. This is
not really possible to do correctly under the status quo system.

This commit implements lowering of this new `switch_br` usage in the
LLVM and C backends. The C backend just turns any case containing ranges
entirely into conditionals, as before. The LLVM backend is a little
smarter, and puts scalar items into the `switch` instruction, only using
conditionals for the range cases (which direct to the same bb). All
remaining self-hosted backends are temporarily regressed in the presence
of switch range cases. This functionality will be restored for at least
the x86_64 backend before merge.
2024-09-01 18:30:31 +01:00
mlugg
0fe3fd01dd
std: update std.builtin.Type fields to follow naming conventions
The compiler actually doesn't need any functional changes for this: Sema
does reification based on the tag indices of `std.builtin.Type` already!
So, no zig1.wasm update is necessary.

This change is necessary to disallow name clashes between fields and
decls on a type, which is a prerequisite of #9938.
2024-08-28 08:39:59 +01:00
Jacob Young
8c3f6c72c0 Dwarf: fix and test string format 2024-08-27 02:09:59 -04:00
mlugg
6808ce27bd
compiler,lib,test,langref: migrate @setCold to @branchHint 2024-08-27 00:44:35 +01:00
mlugg
457c94d353
compiler: implement @branchHint, replacing @setCold
Implements the accepted proposal to introduce `@branchHint`. This
builtin is permitted as the first statement of a block if that block is
the direct body of any of the following:

* a function (*not* a `test`)
* either branch of an `if`
* the RHS of a `catch` or `orelse`
* a `switch` prong
* an `or` or `and` expression

It lowers to the ZIR instruction `extended(branch_hint(...))`. When Sema
encounters this instruction, it sets `sema.branch_hint` appropriately,
and `zirCondBr` etc are expected to reset this value as necessary. The
state is on `Sema` rather than `Block` to make it automatically
propagate up non-conditional blocks without special handling. If
`@panic` is reached, the branch hint is set to `.cold` if none was
already set; similarly, error branches get a hint of `.unlikely` if no
hint is explicitly provided. If a condition is comptime-known, `cold`
hints from the taken branch are allowed to propagate up, but other hints
are discarded. This is because a `likely`/`unlikely` hint just indicates
the direction this branch is likely to go, which is redundant
information when the branch is known at comptime; but `cold` hints
indicate that control flow is unlikely to ever reach this branch,
meaning if the branch is always taken from its parent, then the parent
is also unlikely to ever be reached.

This branch information is stored in AIR `cond_br` and `switch_br`. In
addition, `try` and `try_ptr` instructions have variants `try_cold` and
`try_ptr_cold` which indicate that the error case is cold (rather than
just unlikely); this is reachable through e.g. `errdefer unreachable` or
`errdefer @panic("")`.

A new API `unwrapSwitch` is introduced to `Air` to make it more
convenient to access `switch_br` instructions. In time, I plan to update
all AIR instructions to be accessed via an `unwrap` method which returns
a convenient tagged union a la `InternPool.indexToKey`.

The LLVM backend lowers branch hints for conditional branches and
switches as follows:

* If any branch is marked `unpredictable`, the instruction is marked
  `!unpredictable`.
* Any branch which is marked as `cold` gets a
  `llvm.assume(i1 true) [ "cold"() ]` call to mark the code path cold.
* If any branch is marked `likely` or `unlikely`, branch weight metadata
  is attached with `!prof`. Likely branches get a weight of 2000, and
  unlikely branches a weight of 1. In `switch` statements, un-annotated
  branches get a weight of 1000 as a "middle ground" hint, since there
  could be likely *and* unlikely *and* un-annotated branches.

For functions, a `cold` hint corresponds to the `cold` function
attribute, and other hints are currently ignored -- as far as I can tell
LLVM doesn't really have a way to lower them. (Ideally, we would want
the branch hint given in the function to propagate to call sites.)

The compiler and standard library do not yet use this new builtin.

Resolves: #21148
2024-08-27 00:41:49 +01:00
David Rubin
863f74dcd2
comp: rename module to zcu 2024-08-25 15:17:21 -07:00