In order to not regress the quality of compile errors, some improvements
had to be made.
* std.zig.parseCharLiteral is improved to return more detailed parse
failure information.
* tokenizer is improved to handle null bytes in the middle of strings,
character literals, and line comments.
* validating how many unicode escape digits in string literals is moved
to std.zig.parseStringLiteral rather than handled in the tokenizer.
* when a tokenizer error occurs, if the reported token is the 'invalid'
tag, an error note is added to point to the invalid byte location.
Further improvements would be:
- Mention the expected set of allowed bytes at this location.
- Display the invalid byte (if printable, print it, otherwise
escape-print it).
This matches the behaviour of other languages and leaves us
the ability to create actual static Wasm archives with
```
zig build-lib -static some.zig
```
which can then be combined with other Wasm object files and linked
into either a Wasm lib or executable using `wasm-ld`.
Update langref to reflect the fact we now ship WASI libc.
Conflicts:
* doc/langref.html.in
* lib/std/enums.zig
* lib/std/fmt.zig
* lib/std/hash/auto_hash.zig
* lib/std/math.zig
* lib/std/mem.zig
* lib/std/meta.zig
* test/behavior/alignof.zig
* test/behavior/bitcast.zig
* test/behavior/bugs/1421.zig
* test/behavior/cast.zig
* test/behavior/ptrcast.zig
* test/behavior/type_info.zig
* test/behavior/vector.zig
Master branch added `try` to a bunch of testing function calls, and some
lines also had changed how to refer to the native architecture and other
`@import("builtin")` stuff.
* Remove some unused imports in AstGen.zig. I think it would make sense
to start decoupling AstGen from the rest of the compiler code,
similar to how the tokenizer and parser are decoupled.
* AstGen: For decls, move the block_inline instructions to the top of
the function so that they get lower ZIR instruction indexes. With
this, the block_inline instruction index combined with its corresponding
break_inline instruction index can be used to form a ZIR instruction
range. This is useful for allocating an array to map ZIR instructions
to semantically analyzed instructions.
* Module: extract emit-h functionality into a struct, and only allocate
it when emit-h is activated.
* Module: remove the `decl_table` field. This previously was a table of
all Decls in the entire Module. A "name hash" strategy was used to
find decls within a given namespace, using this global table. Now,
each Namespace has its own map of name to children Decls.
- Additionally, there were 3 places that relied on iterating over
decl_table in order to function:
- C backend and SPIR-V backend. These now have their own decl_table
that they keep populated when `updateDecl` and `removeDecl` are
called.
- emit-h. A `decl_table` field has been added to the new GlobalEmitH
struct which is only allocated when emit-h is activated.
* Module: fix ZIR serialization/deserialization bug in debug mode having
to do with the secret safety tag for untagged unions. There is still an
open TODO to investigate a friendlier solution to this problem with
the language.
* Module: improve deserialization of ZIR to allocate only exactly as
much capacity as length in the instructions array so as to not waste
space.
* Module: move `srcHashEql` to `std.zig` to live next to the definition
of `SrcHash` itself.
* Module: re-introduce the logic for scanning top level declarations
within a namespace.
* Compilation: add an `analyze_pkg` Job which is used to kick off the
start of semantic analysis by doing the equivalent of
`_ = @import("std");`. The `analyze_pkg` job is unconditionally added
to the work queue on every update(), with pkg set to the std lib pkg.
* Rename TZIR to AIR in a few places. A more comprehensive rename will
come later.
This branch adds "builtin" and "std" to the import table when using the
self-hosted backend.
"builtin" gains one additional item:
```
pub const zig_is_stage2 = true; // false when using stage1 backend
```
This allows the std lib to do conditional compilation based on detecting
which backend is being used. This will be removed from builtin as soon
as self-hosted catches up to feature parity with stage1.
Keep a sharp eye out - people are going to be tempted to abuse this.
The general rule of thumb is do not use `builtin.zig_is_stage2`. However
this commit breaks the rule so that we can gain limited start.zig support
as we incrementally improve the self-hosted compiler.
This commit also implements `fullyQualifiedNameHash` and related
functionality, which effectively puts all Decls in their proper
namespaces. `fullyQualifiedName` is not yet implemented.
Stop printing "todo" log messages for test decls unless we are in test
mode.
Add "previous definition here" error notes for Decl name collisions.
This commit does not bring us yet to a newly passing test case.
Here's what I'm working towards:
```zig
const std = @import("std");
export fn main() c_int {
const a = std.fs.base64_alphabet[0];
return a - 'A';
}
```
Current output:
```
$ ./zig-cache/bin/zig build-exe test.zig
test.zig:3:1: error: TODO implement more analyze elemptr
zig-cache/lib/zig/std/start.zig:38:46: error: TODO implement structInitExpr ty
```
So the next steps are clear:
* Sema: improve elemptr
* AstGen: implement structInitExpr
The memory layout for ZIR instructions is completely reworked. See
zir.zig for those changes. Some new types:
* `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating
each instruction independently, there is now a Tag and 8 bytes of
data available for all ZIR instructions. Small instructions fit
within these 8 bytes; larger ones use 4 bytes for an index into
`extra`. There is also `string_bytes` so that we can have 4 byte
references to strings. `zir.Inst.Tag` describes how to interpret
those 8 bytes of data.
- This is shared by all `Block` scopes.
* `Module.WipZirCode`: represents an in-progress `zir.Code`. In this
structure, the arrays are mutable, and get resized as we add/delete
things. There is extra state to keep track of things. This struct is
stored on the stack. Once it is finished, it produces an immutable
`zir.Code`, which will remain on the heap for the duration of a
function's existence.
- This is shared by all `GenZir` scopes.
* `Sema`: represents in-progress semantic analysis of a `zir.Code`.
This data is stored on the stack and is shared among all `Block`
scopes. It is now the main "self" argument to everything in the file
that was previously named `zir_sema.zig`.
Additionally, I moved some logic that was in `Module` into here.
`Module.Fn` now stores its parameter names inside the `zir.Code`,
instead of inside ZIR instructions. When the TZIR memory layout
reworking time comes, codegen will be able to reference this data
directly instead of duplicating it.
astgen.zig is (so far) almost entirely untouched, but nearly all of it
will need to be reworked to adhere to this new memory layout structure.
I have no benchmarks to report yet, as I am still working through
compile errors and fixing various things that I broke in this branch.
Overhaul of Source Locations:
Previously we used `usize` everywhere to mean byte offset, but sometimes
also mean other stuff. This was error prone and also made us do
unnecessary work, and store unnecessary bytes in memory.
Now there are more types involved into source locations, and more ways
to describe a source location.
* AllErrors.Message: embrace the assumption that files always have less
than 2 << 32 bytes.
* SrcLoc gets more complicated, to model more complicated source
locations.
* Introduce LazySrcLoc, which can model interesting source locations
with very little stored state. Useful for avoiding doing unnecessary
work when no compile errors occur.
Also, previously, we had `src: usize` on every ZIR instruction. This is
no longer the case. Each instruction now determines whether it even cares
about source location, and if so, how that source location is stored.
This requires more careful work inside `Sema`, but it results in fewer
bytes stored on the heap, without compromising accuracy and power of
compile error messages.
Miscellaneous:
* std.zig: string literals have more helpful result values for
reporting errors. There is now a lower level API and a higher level
API.
- side note: I noticed that the string literal logic needs some love.
There is some unnecessarily hacky code there.
* cut & pasted some TZIR logic that was in zir.zig to ir.zig. This
probably broke stuff and needs to get fixed.
* Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't
think this quite how this code will be organized. Need some more
careful planning about how to implement structs, unions, enums. They
need to be independent Decls, just like a top level function.
Conflicts:
* lib/std/zig/ast.zig
* lib/std/zig/parse.zig
* lib/std/zig/parser_test.zig
* lib/std/zig/render.zig
* src/Module.zig
* src/zir.zig
I resolved some of the conflicts by reverting a small portion of
@tadeokondrak's stage2 logic here regarding `callconv(.Inline)`.
It will need to get reworked as part of this branch.
This commit does not reach any particular milestone, it is
work-in-progress towards getting things to build.
There's a `@panic("TODO")` in translate-c that should be removed when
working on translate-c stuff.
Zig's format system is flexible enough to add custom formatters. This PR removes the new z/Z format specifiers that were added for printing Zig identifiers and replaces them with custom formatters.
This is convenient for debugging purposes, as well as simplifying the
caching system since executable basenames will not conflict with their
corresponding object files.
* change some {} to be {s} to gain type safety
* fix libraries being libfoo.lib instead of foo.lib for COFF
* when linking mingw-w64, add the "always link" libs so that we
generate DLL import .lib files for them as the linker code relies on.
* COFF LLD linker does not support -r so we do a file copy as an
alternative to the -r thing that ELF linking does.
I will file an issue for the corresponding TODO upon merging this
branch, to look into an optimization that possibly elides this copy
when the source and destination are both cache directories.
* add a CLI error message when trying to link multiple objects into one
and using COFF object format.
* Don't try to generate C header files yet since it will only cause a
crash saying the feature is unimplemented.
* Rename the CLI options for release modes to use the `-O` prefix to
match C compiler precedent. Options are now `-ODebug`,
`-OReleaseFast`, `-OReleaseSafe`, `-OReleaseSmall`. The optimization
mode matches the enum tags of std.builtin.Mode. It is planned to, at
some point, rename std.builtin.Mode to std.builtin.OptimizationMode
and modify the tags to be lower case to match the style convention.
- Update build.zig code to support this new CLI.
* update std.zig.binNameAlloc to support an optional Version and update
the implementation to correctly deal with dynamic library version
suffixes.
* std.cache_hash exposes Hasher type
* std.cache_hash makes hasher_init a global const
* std.cache_hash supports cloning so that clones can share the same
open manifest dir handle as well as fork from shared hasher state
* start to populate the cache_hash for stage2 builds
* remove a footgun from std.cache_hash add function
* get rid of std.Target.ObjectFormat.unknown
* rework stage2 logic for resolving output artifact names by adding
object_format as an optional parameter to std.zig.binNameAlloc
* support -Denable-llvm in stage2 tests
* Module supports the use case when there are no .zig files
* introduce c_object_table and failed_c_objects to Module
* propagate many new kinds of data from CLI into Module and into
linker.Options
* introduce -fLLVM, -fLLD, -fClang and their -fno- counterparts.
closes#6251.
- add logic for choosing when to use LLD or zig's self-hosted linker
* stub code for implementing invoking Clang to build C objects
* add -femit-h, -femit-h=foo, and -fno-emit-h CLI options
- This avoids having multiple `init()` functions for every combination
of optional parameters
- The API is consistent across all hash functions
- New options can be added later without breaking existing applications.
For example, this is going to come in handy if we implement parallelization
for BLAKE2 and BLAKE3.
- We don't have a mix of snake_case and camelCase functions any more, at
least in the public crypto API
Support for BLAKE2 salt and personalization (more commonly called context)
parameters have been implemented by the way to illustrate this.
Instead of having all primitives and constructions share the same namespace,
they are now organized by category and function family.
Types within the same category are expected to share the exact same API.
* Introduce the concept of anonymous Decls
* Primitive Hello, World with inline asm works
* There is still an unsolved problem of how to manage ZIR instructions
memory when generating from AST. Currently it leaks.
Zig now supports a more fine-grained sense of what is native and what is
not. Some examples:
This is now allowed:
-target native
Different OS but native CPU, default Windows C ABI:
-target native-windows
This could be useful for example when running in Wine.
Different CPU but native OS, native C ABI.
-target x86_64-native -mcpu=skylake
Different C ABI but otherwise native target:
-target native-native-musl
-target native-native-gnu
Lots of breaking changes to related std lib APIs.
Calls to getOs() will need to be changed to getOsTag().
Calls to getArch() will need to be changed to getCpuArch().
Usage of Target.Cross and Target.Native need to be updated to use
CrossTarget API.
`std.build.Builder.standardTargetOptions` is changed to accept its
parameters as a struct with default values. It now has the ability to
specify a whitelist of targets allowed, as well as the default target.
Rather than two different ways of collecting the target, it's now always
a string that is validated, and prints helpful diagnostics for invalid
targets. This feature should now be actually useful, and contributions
welcome to further improve the user experience.
`std.build.LibExeObjStep.setTheTarget` is removed.
`std.build.LibExeObjStep.setTarget` is updated to take a CrossTarget
parameter.
`std.build.LibExeObjStep.setTargetGLibC` is removed. glibc versions are
handled in the CrossTarget API and can be specified with the `-target`
triple.
`std.builtin.Version` gains a `format` method.