Commit graph

93 commits

Author SHA1 Message Date
Andrew Kelley
79f267f6b9 std.Io: delete GenericReader
and delete deprecated alias std.io
2025-08-29 17:14:26 -07:00
Andrew Kelley
30b41dc510 std.compress.zstd.Decompress fixes
* std.Io.Reader: appendRemaining no longer supports alignment and has
  different rules about how exceeding limit. Fixed bug where it would
  return success instead of error.StreamTooLong like it was supposed to.

* std.Io.Reader: simplify appendRemaining and appendRemainingUnlimited
  to be implemented based on std.Io.Writer.Allocating

* std.Io.Writer: introduce unreachableRebase

* std.Io.Writer: remove minimum_unused_capacity from Allocating. maybe
  that flexibility could have been handy, but let's see if anyone
  actually needs it. The field is redundant with the superlinear growth
  of ArrayList capacity.

* std.Io.Writer: growingRebase also ensures total capacity on the
  preserve parameter, making it no longer necessary to do
  ensureTotalCapacity at the usage site of decompression streams.

* std.compress.flate.Decompress: fix rebase not taking into account seek

* std.compress.zstd.Decompress: split into "direct" and "indirect" usage
  patterns depending on whether a buffer is provided to init, matching
  how flate works. Remove some overzealous asserts that prevented buffer
  expansion from within rebase implementation.

* std.zig: fix readSourceFileToAlloc returning an overaligned slice
  which was difficult to free correctly.

fixes #24608
2025-08-15 10:44:35 -07:00
Andrew Kelley
749f10af49 std.ArrayList: make unmanaged the default 2025-08-11 15:52:49 -07:00
Linus Groh
eb37552536 Remove numerous things deprecated during the 0.14 release cycle
Basically everything that has a direct replacement or no uses left.

Notable omissions:

- std.ArrayHashMap: Too much fallout, needs a separate cleanup.
- std.debug.runtime_safety: Too much fallout.
- std.heap.GeneralPurposeAllocator: Lots of references to it remain, not
  a simple find and replace as "debug allocator" is not equivalent to
  "general purpose allocator".
- std.io.Reader: Is being reworked at the moment.
- std.unicode.utf8Decode(): No replacement, needs a new API first.
- Manifest backwards compat options: Removal would break test data used
  by TestFetchBuilder.
- panic handler needs to be a namespace: Many tests still rely on it
  being a function, needs a separate cleanup.
2025-07-11 08:17:43 +02:00
Andrew Kelley
0e37ff0d59 std.fmt: breaking API changes
added adapter to AnyWriter and GenericWriter to help bridge the gap
between old and new API

make std.testing.expectFmt work at compile-time

std.fmt no longer has a dependency on std.unicode. Formatted printing
was never properly unicode-aware. Now it no longer pretends to be.

Breakage/deprecations:
* std.fs.File.reader -> std.fs.File.deprecatedReader
* std.fs.File.writer -> std.fs.File.deprecatedWriter
* std.io.GenericReader -> std.io.Reader
* std.io.GenericWriter -> std.io.Writer
* std.io.AnyReader -> std.io.Reader
* std.io.AnyWriter -> std.io.Writer
* std.fmt.format -> std.fmt.deprecatedFormat
* std.fmt.fmtSliceEscapeLower -> std.ascii.hexEscape
* std.fmt.fmtSliceEscapeUpper -> std.ascii.hexEscape
* std.fmt.fmtSliceHexLower -> {x}
* std.fmt.fmtSliceHexUpper -> {X}
* std.fmt.fmtIntSizeDec -> {B}
* std.fmt.fmtIntSizeBin -> {Bi}
* std.fmt.fmtDuration -> {D}
* std.fmt.fmtDurationSigned -> {D}
* {} -> {f} when there is a format method
* format method signature
  - anytype -> *std.io.Writer
  - inferred error set -> error{WriteFailed}
  - options -> (deleted)
* std.fmt.Formatted
  - now takes context type explicitly
  - no fmt string
2025-07-07 22:43:51 -07:00
mlugg
018262d537
std: update eval branch quotas after bdbc485
Also, update `std.math.Log2Int[Ceil]` to more efficient implementations
that don't use up so much damn quota!
2024-08-21 01:30:46 +01:00
Andrew Kelley
377e8579f9 std.zig.tokenizer: simplify
I pointed a fuzzer at the tokenizer and it crashed immediately. Upon
inspection, I was dissatisfied with the implementation. This commit
removes several mechanisms:
* Removes the "invalid byte" compile error note.
* Dramatically simplifies tokenizer recovery by making recovery always
  occur at newlines, and never otherwise.
* Removes UTF-8 validation.
* Moves some character validation logic to `std.zig.parseCharLiteral`.

Removing UTF-8 validation is a regression of #663, however, the existing
implementation was already buggy. When adding this functionality back,
it must be fuzz-tested while checking the property that it matches an
independent Unicode validation implementation on the same file. While
we're at it, fuzzing should check the other properties of that proposal,
such as no ASCII control characters existing inside the source code.

Other changes included in this commit:

* Deprecate `std.unicode.utf8Decode` and its WTF-8 counterpart. This
  function has an awkward API that is too easy to misuse.
* Make `utf8Decode2` and friends use arrays as parameters, eliminating a
  runtime assertion in favor of using the type system.

After this commit, the crash found by fuzzing, which was
"\x07\xd5\x80\xc3=o\xda|a\xfc{\x9a\xec\x91\xdf\x0f\\\x1a^\xbe;\x8c\xbf\xee\xea"
no longer causes a crash. However, I did not feel the need to add this
test case because the simplified logic eradicates most crashes of this
nature.
2024-07-31 16:57:42 -07:00
Andrew Kelley
08e83fee57
Merge pull request #20297 from sno2/wtf8-conversion-buffer-overflows
std: fix buffer overflows from improper WTF encoding
2024-07-28 20:24:31 -07:00
Ryan Liptak
959d227d13 ArgIteratorWindows: Reduce allocated memory by parsing the WTF-16 string directly
Before this commit, the WTF-16 command line string would be converted to WTF-8 in `init`, and then a second buffer of the WTF-8 size + 1 would be allocated to store the parsed arguments. The converted WTF-8 command line would then be parsed and the relevant bytes would be copied into the argument buffer before being returned.

After this commit, only the WTF-8 size of the WTF-16 string is calculated (without conversion) which is then used to allocate the buffer for the parsed arguments. Parsing is then done on the WTF-16 slice directly, with the arguments being converted to WTF-8 on-the-fly.

This has a few (minor) benefits:

- Cuts the amount of memory allocated by ArgIteratorWindows in half (or better)
- Makes the total amount of memory allocated by ArgIteratorWindows predictable, since, before, the upfront `wtf16LeToWtf8Alloc` call could end up allocating more-memory-than-necessary temporarily due to its internal use of an ArrayList. Now, the amount of memory allocated is always exactly `calcWtf8Len(cmd_line) + 1`.
2024-07-13 14:48:17 -07:00
Carter Snook
56929795a8 std.unicode: add encode overflow check function and friends 2024-06-14 15:40:54 -05:00
Ryan Liptak
76fb2b685b std: Convert deprecated aliases to compile errors and fix usages
Deprecated aliases that are now compile errors:

- `std.fs.MAX_PATH_BYTES` (renamed to `std.fs.max_path_bytes`)
- `std.mem.tokenize` (split into `tokenizeAny`, `tokenizeSequence`, `tokenizeScalar`)
- `std.mem.split` (split into `splitSequence`, `splitAny`, `splitScalar`)
- `std.mem.splitBackwards` (split into `splitBackwardsSequence`, `splitBackwardsAny`, `splitBackwardsScalar`)
- `std.unicode`
  + `utf16leToUtf8Alloc`, `utf16leToUtf8AllocZ`, `utf16leToUtf8`, `fmtUtf16le` (all renamed to have capitalized `Le`)
  + `utf8ToUtf16LeWithNull` (renamed to `utf8ToUtf16LeAllocZ`)
- `std.zig.CrossTarget` (moved to `std.Target.Query`)

Deprecated `lib/std/std.zig` decls were deleted instead of made a `@compileError` because the `refAllDecls` in the test block would trigger the `@compileError`. The deleted top-level `std` namespaces are:

- `std.rand` (renamed to `std.Random`)
- `std.TailQueue` (renamed to `std.DoublyLinkedList`)
- `std.ChildProcess` (renamed/moved to `std.process.Child`)

This is not exhaustive. Deprecated aliases that I didn't touch:
  + `std.io.*`
  + `std.Build.*`
  + `std.builtin.Mode`
  + `std.zig.c_translation.CIntLiteralRadix`
  + anything in `src/`
2024-06-13 10:18:59 -04:00
Ryan Liptak
84f4c5d9cc std.unicode: Fix ArrayList functions when using populated ArrayLists
ensureTotalCapacityPrecise only satisfies the assumptions made in the ArrayListImpl functions (that there's already enough capacity for the entire converted string if it's all ASCII) when the ArrayList has no items, otherwise it would hit illegal behavior.
2024-04-23 03:20:38 -07:00
mlugg
9c3670fc93
compiler: implement analysis-local comptime-mutable memory
This commit changes how we represent comptime-mutable memory
(`comptime var`) in the compiler in order to implement the intended
behavior that references to such memory can only exist at comptime.

It does *not* clean up the representation of mutable values, improve the
representation of comptime-known pointers, or fix the many bugs in the
comptime pointer access code. These will be future enhancements.

Comptime memory lives for the duration of a single Sema, and is not
permitted to escape that one analysis, either by becoming runtime-known
or by becoming comptime-known to other analyses. These restrictions mean
that we can represent comptime allocations not via Decl, but with state
local to Sema - specifically, the new `Sema.comptime_allocs` field. All
comptime-mutable allocations, as well as any comptime-known const allocs
containing references to such memory, live in here. This allows for
relatively fast checking of whether a value references any
comptime-mtuable memory, since we need only traverse values up to
pointers: pointers to Decls can never reference comptime-mutable memory,
and pointers into `Sema.comptime_allocs` always do.

This change exposed some faulty pointer access logic in `Value.zig`.
I've fixed the important cases, but there are some TODOs I've put in
which are definitely possible to hit with sufficiently esoteric code. I
plan to resolve these by auditing all direct accesses to pointers (most
of them ought to use Sema to perform the pointer access!), but for now
this is sufficient for all realistic code and to get tests passing.

This change eliminates `Zcu.tmp_hack_arena`, instead using the Sema
arena for comptime memory mutations, which is possible since comptime
memory is now local to the current Sema.

This change should allow `Decl` to store only an `InternPool.Index`
rather than a full-blown `ty: Type, val: Value`. This commit does not
perform this refactor.
2024-03-25 14:49:41 +00:00
Andrew Kelley
12191c8a22 std: promote tests to doctests
Now these show up as "example usage" in generated documentation.
2024-03-21 14:11:46 -07:00
Jacob Young
2fcb2f5975 Sema: implement vector coercions
These used to be lowered elementwise in air, and now are a single air
instruction that can be lowered elementwise in the backend if necessary.
2024-02-25 11:22:10 +01:00
Jacob Young
2fdc9e6ae8 x86_64: implement @shuffle 2024-02-25 11:22:10 +01:00
Ryan Liptak
68b87918df Fix handling of Windows (WTF-16) and WASI (UTF-8) paths
Windows paths now use WTF-16 <-> WTF-8 conversion everywhere, which is lossless. Previously, conversion of ill-formed UTF-16 paths would either fail or invoke illegal behavior.

WASI paths must be valid UTF-8, and the relevant function calls have been updated to handle the possibility of failure due to paths not being encoded/encodable as valid UTF-8.

Closes #18694
Closes #1774
Closes #2565
2024-02-24 14:05:24 -08:00
Ryan Liptak
f6b6b8a4ae Add std.unicode.fmtUtf8 that can handle ill-formed UTF-8
Ill-formed UTF-8 byte sequences are replaced by the replacement character (U+FFFD) according to "U+FFFD Substitution of Maximal Subparts" from Chapter 3 of the Unicode standard, and as specified by https://encoding.spec.whatwg.org/#utf-8-decoder
2024-02-24 14:04:59 -08:00
Ryan Liptak
4ee1309a8d std.unicode: Refactor and add WTF-16/WTF-8 functions
Renamed functions for consistent `Le` capitalization and conventions:

- utf16leToUtf8Alloc -> utf16LeToUtf8Alloc
- utf16leToUtf8AllocZ -> utf16LeToUtf8AllocZ
- utf16leToUtf8 -> utf16LeToUtf8
- utf8ToUtf16LeWithNull -> utf8ToUtf16LeAllocZ
- fmtUtf16le -> fmtUtf16Le

New UTF related functions:

- utf16LeToUtf8ArrayList
- utf8ToUtf16LeArrayList
- utf8ToUtf16LeAlloc
- isSurrogateCodepoint

(the ArrayList functions are mostly to allow the Alloc and AllocZ to share an implementation)

New WTF related functions/structs:

- wtf8Encode
- wtf8Decode
- wtf8ValidateSlice
- Wtf8View
- Wtf8Iterator
- wtf16LeToWtf8ArrayList
- wtf16LeToWtf8Alloc
- wtf16LeToWtf8AllocZ
- wtf16LeToWtf8
- wtf8ToWtf16LeArrayList
- wtf8ToWtf16LeAlloc
- wtf8ToWtf16LeAllocZ
- wtf8ToWtf16Le
- wtf8ToUtf8Lossy
- wtf8ToUtf8LossyAlloc
- wtf8ToUtf8LossyAllocZ
- Wtf16LeIterator
2024-02-24 14:04:58 -08:00
vinnichase
279607cae5
Fix fmt UTF-8 characters as fill (#18533)
Co-authored-by: Jacob Young <jacobly0@users.noreply.github.com>
2024-01-13 22:47:03 -05:00
Andrew Kelley
6a32d58876
Merge pull request #18318 from castholm/simd-segfault
Rename `simd.suggestVectorSize` to clarify intent and fix related segfault
2024-01-09 17:13:58 -08:00
davideger
e426ae43ae
Updated Utf8View example to format the single codepoint UTF-8 slice with {s} (#18288) 2024-01-01 18:47:27 -05:00
Carl Åstholm
59ac0d1eed Deprecate suggestVectorSize in favor of suggestVectorLength
The function returns the vector length, not the byte size of the vector or the bit size of individual elements. This distinction is very important and some usages of this function in the stdlib operated under these incorrect assumptions.
2024-01-01 16:18:57 +01:00
Meghan Denny
6a12fd62c1 std: make std.unicode.initComptime() a comptime-known function
resolved a TODO :)
2023-12-08 15:59:17 +02:00
Ryan Liptak
15a6b27957 std.unicode: Disable utf8 -> utf16 ASCII fast path on mips
Fixes a compile error when the target is mips, since std.simd.interlace does not work correctly on mips and raises a compile error if it is used.
2023-11-21 13:51:03 +02:00
mlugg
51595d6b75
lib: correct unnecessary uses of 'var' 2023-11-19 09:55:07 +00:00
Andrew Kelley
3fc6fc6812 std.builtin.Endian: make the tags lower case
Let's take this breaking change opportunity to fix the style of this
enum.
2023-10-31 21:37:35 -04:00
Jacob Young
d890e81761 mem: fix ub in writeInt
Use inline to vastly simplify the exposed API.  This allows a
comptime-known endian parameter to be propogated, making extra functions
for a specific endianness completely unnecessary.
2023-10-31 21:37:35 -04:00
Ryan Liptak
13c8ec9db0 std.unicode: Add ASCII fast path to UTF-16 -> UTF-8 conversion functions 2023-10-31 02:23:35 -07:00
Ryan Liptak
03117c5290 std.unicode: Add ASCII fast path to UTF-8 -> UTF-16 conversion functions 2023-10-31 02:23:33 -07:00
Jacob Young
fe93332ba2 x86_64: implement enough to pass unicode tests
* implement vector comparison
 * implement reduce for bool vectors
 * fix `@memcpy` bug
 * enable passing std tests
2023-10-23 22:42:18 -04:00
Jacob Young
27fe945a00 Revert "Revert "Merge pull request #17637 from jacobly0/x86_64-test-std""
This reverts commit 6f0198cadb.
2023-10-22 15:46:43 -04:00
Andrew Kelley
6f0198cadb Revert "Merge pull request #17637 from jacobly0/x86_64-test-std"
This reverts commit 0c99ba1eab, reversing
changes made to 5f92b070bf.

This caused a CI failure when it landed in master branch due to a
128-bit `@byteSwap` in std.mem.
2023-10-22 12:16:35 -07:00
Jacob Young
ccc9ebf0bd std: slightly improve codegen of std.unicode.utf8ValidateSlice 2023-10-22 12:07:23 -04:00
Jacob Young
2e6e39a700 x86_64: fix bugs and disable erroring tests 2023-10-21 10:55:41 -04:00
Veikka Tuominen
c919e9a280 std.simd: return comptime_int from suggestVectorSize 2023-10-13 16:58:05 +03:00
Karl Seguin
d68f39b541
std.unicode.utf8ValidateSlice: optimize implementation (#17329)
Originally inspired by Go's `utf8.Valid` function. Includes some test cases from Go's test suite.

Further optimized to be faster in all tested cases (short/long ascii/UTF8), in all release modes.

Takes advantage of SIMD for the ASCII fast path.
2023-10-06 23:49:21 -04:00
Ryan Liptak
a155e35850
std.json: Fix decoding of UTF-16 surrogate pairs (#16830)
* std.unicode: Add more UTF-16 decoding functions

This mostly makes parts of Utf16LeIterator reusable

* std.json: Fix decoding of UTF-16 surrogate pairs

Before this commit, there were 524,288 codepoints that would get decoded improperly. After this commit, there are 0.

Fixes #16828
2023-08-15 09:11:59 -04:00
mlugg
f26dda2117 all: migrate code to new cast builtin syntax
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:

* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
2023-06-24 16:56:39 -07:00
Eric Joldasov
d884d7050e
all: replace comptime try with try comptime
Signed-off-by: Eric Joldasov <bratishkaerik@getgoogleoff.me>
2023-06-13 23:46:58 +06:00
dweiller
bd3360e03d convert s[start..start+len] to s[start..][0..len] 2023-05-07 15:55:21 +10:00
mlugg
ccf670c2b0 Zir: implement explicit block_comptime instruction
Resolves: #7056
2023-04-12 12:06:19 -04:00
Andrew Kelley
50eb7983cd remove most conditional compilation based on stage1
There are still a few occurrences of "stage1" in the standard library
and self-hosted compiler source, however, these instances need a bit
more careful inspection to ensure no breakage.
2022-12-06 20:38:54 -07:00
Andrew Kelley
ceb0a632cf std.mem.Allocator: allow shrink to fail
closes #13535
2022-11-29 23:30:38 -07:00
Jan Philipp Hafer
cf744cf04f
add suggestions by ifreund
also remove 2 redundant and outcommented tests
2022-05-17 18:56:06 +02:00
Jan Philipp Hafer
405f4286f7
std.unicode: add utf16 byte length and codepoints counting routines
* comptime and runtime tests are based on tests for counting utf8 code points
2022-05-17 18:54:29 +02:00
r00ster91
62d717e2ff Add std.unicode.replacement_character 2022-04-15 11:20:11 +03:00
r00ster
c4aac28a42 Reuse code in Utf8Iterator.nextCodepoint 2022-04-12 05:34:12 -04:00
PhaseMage
8a97807d68
Full response file (*.rsp) support
I hit the "quotes in an RSP file" issue when trying to compile gRPC using
"zig cc". As a fun exercise, I decided to see if I could fix it myself.
I'm fully open to this code being flat-out rejected. Or I can take feedback
to fix it up.

This modifies (and renames) _ArgIteratorWindows_ in process.zig such that
it works with arbitrary strings (or the contents of an RSP file).

In main.zig, this new _ArgIteratorGeneral_ is used to address the "TODO"
listed in _ClangArgIterator_.

This change closes #4833.

**Pros:**

- It has the nice attribute of handling "RSP file" arguments in the same way it
  handles "cmd_line" arguments.
- High Performance, minimal allocations
- Fixed bug in previous _ArgIteratorWindows_, where final trailing backslashes
  in a command line were entirely dropped
- Added a test case for the above bug
- Harmonized the _ArgIteratorXxxx._initWithAllocator()_ and _next()_ interface
  across Windows/Posix/Wasi (Moved Windows errors to _initWithAllocator()_
  rather than _next()_)
- Likely perf benefit on Windows by doing _utf16leToUtf8AllocZ()_ only once
  for the entire cmd_line

**Cons:**

- Breaking Change in std library on Windows: Call
  _ArgIterator.initWithAllocator()_ instead of _ArgIterator.init()_
- PhaseMage is new with contributions to Zig, might need a lot of hand-holding
- PhaseMage is a Windows person, non-Windows stuff will need to be double-checked

**Testing Done:**

- Wrote a few new test cases in process.zig
- zig.exe build test -Dskip-release (no new failures seen)
- zig cc now builds gRPC without error
2022-01-30 21:27:52 +02:00
Lee Cannon
85de022c56
allocgate: std Allocator interface refactor 2021-11-30 23:32:47 +00:00