Commit graph

184 commits

Author SHA1 Message Date
Alex Rønne Petersen
4c0913ff7c
Merge pull request #23417 from dweiller/zstd-fixes
Zstd fixes
2025-03-31 17:52:23 +02:00
mlugg
8e074f1549
std: remove dependencies on legacy coercion 2025-02-26 00:17:09 +00:00
Shun Sakai
a3ad0a2f77 std.compress.flate.Lookup: Replace invisible doc comments with top-level doc comments
I think it would be better if this invisible doc comments is top-level
doc comments rather than doc comments. Because it is at the start of a
source file. This makes the doc comments visible.
2025-01-22 23:34:57 +09:00
Andrew Kelley
09d021c908 add test coverage for previous commit 2025-01-20 21:41:30 -08:00
Kamil T
d50bae4da9 Fix memcpy alias bug in std.compress.lzma 2025-01-20 21:41:12 -08:00
tgschultz
ba569bb8e9
Rewrite bit_reader and bit_writer to take advantage of current zig semantics and enhance readability (#21689)
Co-authored-by: Tanner Schultz <tgschultz@tgschultz-dl.tail7ba92.ts.net>
2024-10-13 18:44:42 -07:00
mlugg
0fe3fd01dd
std: update std.builtin.Type fields to follow naming conventions
The compiler actually doesn't need any functional changes for this: Sema
does reification based on the tag indices of `std.builtin.Type` already!
So, no zig1.wasm update is necessary.

This change is necessary to disallow name clashes between fields and
decls on a type, which is a prerequisite of #9938.
2024-08-28 08:39:59 +01:00
Jora Troosh
13070448f5
std: fix typos (#20560) 2024-07-09 14:25:42 -07:00
Michael Bradshaw
642093e04b Rename *[UI]LEB128 functions to *[UI]leb128 2024-06-23 04:30:12 +01:00
Pavel Verigo
d4d1efeb3e std.compress.flate: fix panic when reading into empty buffer 2024-05-09 15:51:42 -07:00
Igor Anić
791c4491a7 compress.xz: remove unnecessary variable
`to_read.items.len is always zero when entering readBlock.
2024-03-13 18:43:36 +01:00
Igor Anić
54f882c4aa compress.xz: make reader loop little more readable
No need to do same error check on two places. First return all
uncompressed data then on last read check error.
2024-03-13 18:41:20 +01:00
Igor Anić
a21f9b6d8b compress.xz: remove copyForwards from tight loop
In the example from the issue #19052 to_read holds 213_315_584
uncompressed bytes. Calling read with small output results in many
shifts of that big buffer.
This removes need to shift to_read after each read.
2024-03-13 18:22:08 +01:00
Igor Anić
a06a305f97 zlib: fix missing comptime attribute 2024-03-04 09:53:01 +01:00
Igor Anić
c680b5d138 compress.zlib: add overshoot test cast
Using example from [zigimg](https://github.com/zigimg/zigimg/pull/164) project.
2024-03-04 09:53:01 +01:00
Igor Anić
f2508abfa6 flate: use 4 bytes lookahead for zlib
That ensures no bytes are left in the BitReader buffer after we reach
end of the stream.
2024-03-04 09:53:01 +01:00
Igor Anić
711281602a flate: option to fill BitReader
fill(0) will fill all bytes in bit reader. If bit reader is aligned to
the byte, as it is at the end of the stream this ensures no overshoot
when reading footer. Footer is 4 bytes (zlib) or 8 bytes (gzip). For
zlib we will use 4 bytes BitReader and 8 for gzip. After align and fill
we will read those bytes and leave BitReader empty after that.
2024-03-04 09:53:01 +01:00
Igor Anić
8a963fd66e flate: 32 bit BitReader
Extend BitReader to accept size of internal buffer. It can be u64 (only
option until now) or u32.
2024-03-04 09:53:01 +01:00
IntegratedQuantum
6e078883ee Expand the memcpy fast path in flate.CircularBuffer.writeMatch to allow for overlapping regions. 2024-02-27 21:26:26 -08:00
Igor Anić
62ce753814 compress: activate tests in wasm32
They were disabled because insufficient stack size.
That is
[changed](d51aa9748f) now.
2024-02-27 19:19:59 -08:00
dweiller
bd0dbb0a13 std.compress.zstd: enable tests for wasm32
The increase in stack size for wasm32 targets in commit d51aa9748f
allows the streaming decompressor to be tested on wasm32-wasi.
2024-02-27 11:37:48 -08:00
Ryan Liptak
726a1149e0 Change many test blocks to doctests/decltests 2024-02-26 15:18:31 -08:00
Ryan Liptak
16b3d1004e Remove redundant test name prefixes now that test names are fully qualified
Follow up to #19079, which made test names fully qualified.

This fixes tests that now-redundant information in their test names. For example here's a fully qualified test name before the changes in this commit:

"priority_queue.test.std.PriorityQueue: shrinkAndFree"

and the same test's name after the changes in this commit:

"priority_queue.test.shrinkAndFree"
2024-02-26 15:18:31 -08:00
Andrew Kelley
d51aa9748f change default WASI stack size
to match the other operating systems. 16 MiB

closes #18885
2024-02-26 10:33:17 -08:00
Robinson Collado
119b2030f7
std.compress.flate: fix typo in function name (#19002) 2024-02-24 20:47:17 -05:00
Andrew Kelley
c44a902836 fix zstd compilation errors from previous commit 2024-02-23 02:37:11 -07:00
dweiller
5c12783094 std.compress.zstd: make DecompressStream options runtime 2024-02-23 02:37:11 -07:00
dweiller
accbba3cd8 std.compress.zstd: disable failing wasm32 tests
This commit can be reverted after
https://github.com/ziglang/zig/pull/18971 is merged.
2024-02-23 02:37:11 -07:00
dweiller
ac1b957e79 std.compress.zstd: remove allocation from DecompressStream 2024-02-23 02:37:11 -07:00
dweiller
73f6d3afb5 std.compress.zstd: fix decompressStreamOptions 2024-02-23 02:37:11 -07:00
dweiller
63fa151f1c std.compress.zstandard: fix buffer sizes
This change corrects the size of various internal buffers used. The
previous behavior did not cause validity problems but wasted space.
2024-02-23 02:37:11 -07:00
Ian Johnson
80f3ef6e14 Package.Fetch: fix Git package fetching
This commit works around #18967 by adding an `AccumulatingReader`, which
accumulates data read from the underlying packfile, and by keeping track
of the position in the packfile and hash/checksum information separately
rather than using reader composition. That is, the packfile position and
hashes/checksums are updated with the accumulated read history data only
after we can determine what data has actually been used by the
decompressor rather than merely being buffered.

The only addition to the standard library APIs to support this change is
the `unreadBytes` function in `std.compress.flate.Inflate`, which allows
the user to determine how many bytes have been read only for buffering
and not used as part of compressed data.

These changes can be reverted if #18967 is resolved with a decompressor
that reads precisely only the number of bytes needed for decompression.
2024-02-19 13:43:32 -08:00
Igor Anić
3e8cb153ea fix flate regression
Until now literal and distance code lengths where treated as two
different arrays. But according to rfc they can overlap:

  The code length repeat codes can cross from HLIT + 257 to the
  HDIST + 1 code lengths.  In other words, all code lengths form
  a single sequence of HLIT + HDIST + 258 values.

Now code lengths are decoded in single array which is then split
to literal and distance part.
2024-02-17 15:31:13 -08:00
Igor Anić
99cb201438 skip failing wasm tests 2024-02-15 00:35:08 +01:00
Igor Anić
fd9db4962c reorganize compress package root folder 2024-02-14 23:34:13 +01:00
Igor Anić
2457b68b2f remove v1 deflate implementation 2024-02-14 22:34:13 +01:00
Igor Anić
e20080be13 preserve valuable tests from v1 implementation
Before removal of v1.
2024-02-14 22:12:54 +01:00
Igor Anić
0afe808928 remove testing struct sizes
It was usefull during development.

From andrewrk code review comment:
In fact, Zig does not guarantee the @sizeOf structs, and so these tests are not valid.
2024-02-14 21:06:45 +01:00
Igor Anić
d49cdf5b2d skip calculating struct sizes on 32 bit platforms 2024-02-14 19:58:45 +01:00
Igor Anić
c2361bf548 fix top level docs comments
I didn't understand the difference.

ref: https://ziglang.org/documentation/0.11.0/#Comments
2024-02-14 18:28:20 +01:00
Igor Anić
5fbc371b41 fix wording in comment 2024-02-14 18:28:20 +01:00
Igor Anić
f81b3a2095 fix reading input stream during decompression
By using read instead of readAll decompression reader could get bytes
then available in the stream and then later wrongly failed with end of
stream.
2024-02-14 18:28:20 +01:00
Igor Anić
d645114f7e add deflate implemented from first principles
Zig deflate compression/decompression implementation. It supports compression and decompression of gzip, zlib and raw deflate format.

Fixes #18062.

This PR replaces current compress/gzip and compress/zlib packages. Deflate package is renamed to flate. Flate is common name for deflate/inflate where deflate is compression and inflate decompression.

There are breaking change. Methods signatures are changed because of removal of the allocator, and I also unified API for all three namespaces (flate, gzip, zlib).

Currently I put old packages under v1 namespace they are still available as compress/v1/gzip, compress/v1/zlib, compress/v1/deflate. Idea is to give users of the current API little time to postpone analyzing what they had to change. Although that rises question when it is safe to remove that v1 namespace.

Here is current API in the compress package:

```Zig
// deflate
    fn compressor(allocator, writer, options) !Compressor(@TypeOf(writer))
    fn Compressor(comptime WriterType) type

    fn decompressor(allocator, reader, null) !Decompressor(@TypeOf(reader))
    fn Decompressor(comptime ReaderType: type) type

// gzip
    fn compress(allocator, writer, options) !Compress(@TypeOf(writer))
    fn Compress(comptime WriterType: type) type

    fn decompress(allocator, reader) !Decompress(@TypeOf(reader))
    fn Decompress(comptime ReaderType: type) type

// zlib
    fn compressStream(allocator, writer, options) !CompressStream(@TypeOf(writer))
    fn CompressStream(comptime WriterType: type) type

    fn decompressStream(allocator, reader) !DecompressStream(@TypeOf(reader))
    fn DecompressStream(comptime ReaderType: type) type

// xz
   fn decompress(allocator: Allocator, reader: anytype) !Decompress(@TypeOf(reader))
   fn Decompress(comptime ReaderType: type) type

// lzma
    fn decompress(allocator, reader) !Decompress(@TypeOf(reader))
    fn Decompress(comptime ReaderType: type) type

// lzma2
    fn decompress(allocator, reader, writer !void

// zstandard:
    fn DecompressStream(ReaderType, options) type
    fn decompressStream(allocator, reader) DecompressStream(@TypeOf(reader), .{})
    struct decompress
```

The proposed naming convention:
 - Compressor/Decompressor for functions which return type, like Reader/Writer/GeneralPurposeAllocator
 - compressor/compressor for functions which are initializers for that type, like reader/writer/allocator
 - compress/decompress for one shot operations, accepts reader/writer pair, like read/write/alloc

```Zig
/// Compress from reader and write compressed data to the writer.
fn compress(reader: anytype, writer: anytype, options: Options) !void

/// Create Compressor which outputs the writer.
fn compressor(writer: anytype, options: Options) !Compressor(@TypeOf(writer))

/// Compressor type
fn Compressor(comptime WriterType: type) type

/// Decompress from reader and write plain data to the writer.
fn decompress(reader: anytype, writer: anytype) !void

/// Create Decompressor which reads from reader.
fn decompressor(reader: anytype) Decompressor(@TypeOf(reader)

/// Decompressor type
fn Decompressor(comptime ReaderType: type) type

```

Comparing this implementation with the one we currently have in Zig's standard library (std).
Std is roughly 1.2-1.4 times slower in decompression, and 1.1-1.2 times slower in compression. Compressed sizes are pretty much same in both cases.
More resutls in [this](https://github.com/ianic/flate) repo.

This library uses static allocations for all structures, doesn't require allocator. That makes sense especially for deflate where all structures, internal buffers are allocated to the full size. Little less for inflate where we std version uses less memory by not preallocating to theoretical max size array which are usually not fully used.

For deflate this library allocates 395K while std 779K.
For inflate this library allocates 74.5K while std around 36K.

Inflate difference is because we here use 64K history instead of 32K in std.

If merged existing usage of compress gzip/zlib/deflate need some changes. Here is example with necessary changes in comments:

```Zig

const std = @import("std");

// To get this file:
// wget -nc -O war_and_peace.txt https://www.gutenberg.org/ebooks/2600.txt.utf-8
const data = @embedFile("war_and_peace.txt");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    try oldDeflate(allocator);
    try new(std.compress.flate, allocator);

    try oldZlib(allocator);
    try new(std.compress.zlib, allocator);

    try oldGzip(allocator);
    try new(std.compress.gzip, allocator);
}

pub fn new(comptime pkg: type, allocator: std.mem.Allocator) !void {
    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    var cmp = try pkg.compressor(buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.finish();

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    var dcp = pkg.decompressor(fbs.reader());

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldDeflate(allocator: std.mem.Allocator) !void {
    const deflate = std.compress.v1.deflate;

    // Compressor
    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();
    // Remove allocator
    // Rename deflate -> flate
    var cmp = try deflate.compressor(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.close(); // Rename to finish
    cmp.deinit(); // Remove

    // Decompressor
    var fbs = std.io.fixedBufferStream(buf.items);
    // Remove allocator and last param
    // Rename deflate -> flate
    // Remove try
    var dcp = try deflate.decompressor(allocator, fbs.reader(), null);
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldZlib(allocator: std.mem.Allocator) !void {
    const zlib = std.compress.v1.zlib;

    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    // Rename compressStream => compressor
    // Remove allocator
    var cmp = try zlib.compressStream(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.finish();
    cmp.deinit(); // Remove

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    // decompressStream => decompressor
    // Remove allocator
    // Remove try
    var dcp = try zlib.decompressStream(allocator, fbs.reader());
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldGzip(allocator: std.mem.Allocator) !void {
    const gzip = std.compress.v1.gzip;

    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    // Rename compress => compressor
    // Remove allocator
    var cmp = try gzip.compress(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.close(); // Rename to finisho
    cmp.deinit(); // Remove

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    // Rename decompress => decompressor
    // Remove allocator
    // Remove try
    var dcp = try gzip.decompress(allocator, fbs.reader());
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

```
2024-02-14 18:28:20 +01:00
Andrew Kelley
7680c5330c some API work on std.c, std.os, std.os.wasi
* std.c: consolidate some definitions, making them share code. For
  example, freebsd, dragonfly, and openbsd can all share the same
  `pthread_mutex_t` definition.
* add type safety to std.c.O
  - this caught a bug where mode flags were incorrectly passed as the
    open flags.
* 3 fewer uses of usingnamespace keyword
* as per convention, remove purposeless field prefixes from struct field
  names even if they have those prefixes in the corresponding C code.
* fix incorrect wasi libc Stat definition
* remove C definitions from incorrectly being in std.os.wasi
* make std.os.wasi definitions type safe
* go through wasi native APIs even when linking libc because the libc
  APIs are problematic and wasteful
* don't expose WASI definitions in std.posix
* remove std.os.wasi.rights_t.ALL: this is a footgun. should it be all
  future rights too? or only all current rights known? both are
  the wrong answer.
2024-02-11 13:38:55 -07:00
Jacob Young
4dfca01de4 gzip: implement compression 2024-01-29 14:30:23 -08:00
mlugg
51595d6b75
lib: correct unnecessary uses of 'var' 2023-11-19 09:55:07 +00:00
dweiller
138a35df8f zstandard: fix division by zero when using RingBuffer
This change fixes some division-by-zero bugs introduced by the optimized
ring buffer read/write functions in d8c067966.

There are edge cases where decompression can use a length zero ring
buffer as the size of the ring buffer used is exactly the the window
size specified by a Zstandard frame, and this can be zero. Switching
away from loops to mem copies means that we need to ensure ring buffers
do not have length zero ring when attempting to read/write from them.
2023-11-10 15:18:16 -05:00
Jacob Young
509be7cf1f x86_64: fix std test failures 2023-11-03 23:18:21 -04:00
dweiller
f6de3ec963 zstandard: fix incorrect RLE decompression into ring buffer
This reverts a change introduced in #17400 causing a bug when
decompressing an RLE block into a ring buffer.

RLE blocks contain only a single byte of data to copy into the output,
so attempting to copy a slice causes buffer overruns and incorrect
decompression.
2023-11-03 23:03:43 -04:00
Andrew Kelley
3fc6fc6812 std.builtin.Endian: make the tags lower case
Let's take this breaking change opportunity to fix the style of this
enum.
2023-10-31 21:37:35 -04:00