Commit graph

5 commits

Author SHA1 Message Date
mlugg
6808ce27bd
compiler,lib,test,langref: migrate @setCold to @branchHint 2024-08-27 00:44:35 +01:00
Harrison McCarty
8ff01f78f3 std.fmt.parseFloat: add f80 formatFloat support 2024-07-14 11:19:34 +12:00
Alex Kladov
772debb03a reduce AstGen.numberLiteral stack usage
At the moment, the LLVM IR we generate for this fn is

define internal fastcc void @AstGen.numberLiteral ...  {
Entry:
  ...
  %16 = alloca %"fmt.parse_float.decimal.Decimal(f128)", align 8
  ...

That `Decimal` is huuuge! It stores

    pub const max_digits =  11564;
    digits: [max_digits]u8,

on the stack.

It comes from `convertSlow` function, which LLVM happily inlined,
despite it being the cold path. Forbid inlining that to not penalize
callers with excessive stack usage.

Backstory: I was looking for needles memcpys in TigerBeetle, and came up
with this copyhound.zig tool for doing just that:

   ee67e2ab95/src/copyhound.zig

Got curious, run it on the Zig's own code base, and looked at some of
the worst offenders.

List of worst offenders:

warning: crypto.kyber_d00.Kyber.SecretKey.decaps: 7776 bytes memcpy
warning: crypto.ff.Modulus.powPublic: 8160 bytes memcpy
warning: AstGen.numberLiteral: 11584 bytes memcpy
warning: crypto.tls.Client.init__anon_133566: 13984 bytes memcpy
warning: http.Client.connectUnproxied: 16896 bytes memcpy
warning: crypto.tls.Client.init__anon_133566: 16904 bytes memcpy
warning: objcopy.ElfFileHelper.tryCompressSection: 32768 bytes memcpy

Note from Andrew: I removed `noinline` from this commit since it should
be enough to set it to be cold.
2023-07-20 12:51:18 -07:00
mlugg
f26dda2117 all: migrate code to new cast builtin syntax
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:

* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
2023-06-24 16:56:39 -07:00
Marc Tiehuis
2085a4af56 add new float-parser based on eisel-lemire algorithm
The previous float-parsing method was lacking in a lot of areas. This
commit introduces a state-of-the art implementation that is both
accurate and fast to std.

Code is derived from working repo https://github.com/tiehuis/zig-parsefloat.
This includes more test-cases and performance numbers that are present
in this commit.

* Accuracy

The primary testing regime has been using test-data found at
https://github.com/tiehuis/parse-number-fxx-test-data. This is a fork of
upstream with support for f128 test-cases added. This data has been
verified against other independent implementations and represents
accurate round-to-even IEEE-754 floating point semantics.

* Performance

Compared to the existing parseFloat implementation there is ~5-10x
performance improvement using the above corpus. (f128 parsing excluded
in below measurements).

** Old

    $ time ./test_all_fxx_data
    3520298/5296694 succeeded (1776396 fail)

    ________________________________________________________
    Executed in   28.68 secs    fish           external
       usr time   28.48 secs    0.00 micros   28.48 secs
       sys time    0.08 secs  694.00 micros    0.08 secs

** This Implementation

    $ time ./test_all_fxx_data
    5296693/5296694 succeeded (1 fail)

    ________________________________________________________
    Executed in    4.54 secs    fish           external
       usr time    4.37 secs  515.00 micros    4.37 secs
       sys time    0.10 secs  171.00 micros    0.10 secs

Further performance numbers can be seen using the
https://github.com/tiehuis/simple_fastfloat_benchmark/ repository, which
compares against some other well-known string-to-float conversion
functions. A breakdown can be found here:

0d9f020f1a/PERFORMANCE.md (commit-b15406a0d2e18b50a4b62fceb5a6a3bb60ca5706)

In summary, we are within 20% of the C++ reference implementation and
have about ~600-700MB/s throughput on a Intel I5-6500 3.5Ghz.

* F128 Support

Finally, f128 is now completely supported with full accuracy. This does
use a slower path which is possible to improve in future.

* Behavioural Changes

There are a few behavioural changes to note.

 - `parseHexFloat` is now redundant and these are now supported directly
   in `parseFloat`.
 - We implement round-to-even in all parsing routines. This is as
   specified by IEEE-754. Previous code used different rounding
   mechanisms (standard was round-to-zero, hex-parsing looked to use
   round-up) so there may be subtle differences.

Closes #2207.
Fixes #11169.
2022-05-03 16:46:40 +12:00