* unify the logic for exporting math functions from compiler-rt,
with the appropriate suffixes and prefixes.
- add all missing f128 and f80 exports. Functions with missing
implementations call other functions and have TODO comments.
- also add f16 functions
* move math functions from freestanding libc to compiler-rt (#7265)
* enable all the f128 and f80 code in the stage2 compiler and behavior
tests (#11161).
* update std lib to use builtins rather than `std.math`.
There were a few minor bugs in the rounding behavior and Inf/NaN
handling for the f80 __addxf3 and __subtf3 functions.
This change updates the original generic implementation to correctly
handle f80 floats, including the explicit integer bit.
Some SPARC CPUs (particularly old and/or embedded ones) only has atomic
TAS instruction available (`ldstub`). This adds support for emitting
that instruction in the spinlock.
* goals
- zig as linker for object files generated by other compilers
- zig-specific runtime features for eventual standardisation
* changes
- missing routines are marked with `missing`
- structure inspired by libgcc docs, but improved order and wording
- rename misspelled functions
- reorder and rephrase compiler_rt.zig to reflect documentation
- potential decimal float or fixed-point arithmetic support:
* 'Decimal float library routines' ca. 120 functions
* 'Fixed-point fractional library routines' ca. 300 functions
thanks to @Vexu for multiple reviews and @scheibo for review
Get rid of `std.math.F80Repr`. Instead of trying to match the memory
layout of f80, we treat it as a value, same as the other floating point
types. The functions `make_f80` and `break_f80` are introduced to
compose an f80 value out of its parts, and the inverse operation.
stage2 LLVM backend: fix pointer to zero length array tripping LLVM
assertion. It now checks for when the element type is a zero-bit type
and lowers such thing the same way that pointers to other zero-bit types
are lowered.
Both stage1 and stage2 LLVM backends are adjusted so that f80 is lowered
as x86_fp80 on x86_64 and i386 architectures, and identical to a u80 on
others. LLVM constants are lowered in a less hacky way now that #10860
is fixed, by using the expression `(exp << 64) | fraction` using llvm
constants.
Sema is improved to handle c_longdouble by recursively handling it
correctly for whatever the float bit width is. In both stage1 and
stage2.
- approach by Hacker's Delight with wrapping subtraction
- performance expected to be similar to addo
- tests with all relevant combinations of min,max with -1,0,+1 and all
combinations of sequences +-1,2,4..,max
- approach by Hacker's Delight with wrapping addition
- ca. 1.10x perf over the standard approach on my laptop
- tests with all combinations of min,max with -1,0,+1 and combinations of
sequences +-1,2,4..,max
We're going to remove the first parameter from this function in the
future. Stage2 already ignores the first parameter. So we put an `@as`
in here to make it work for both.
- use usize to decide if register size is big enough to store
multiplication result or if division is necessary
- multiplication routine with check of integer bounds
- wrapping multipliation and division routine from Hacker's Delight
Before this commit, compiling an empty main with Stage 2 on macOS x86_64 results in
```
../stage2/bin/zig build-exe -ODebug -fLLVM empty_main.zig
error: sub-compilation of compiler_rt failed
[...]/zig/stage2/lib/zig/std/special/compiler_rt/os_version_check.zig:26:10: error: TODO: Sema.zirStructInit for runtime-known struct values
```
By assigning the value to a variable we can sidestep the issue for now.
This allows stage2 to build more of compiler-rt.
I also changed `-%` to `-` for comptime ints in the div and mul
implementations of compiler-rt. This is clearer code and also happens to
work around a bug in stage2.
This improves readability as well as compatibility with stage2. Most of
compiler-rt is now enabled for stage2 with just a few functions disabled
(until stage2 passes more behavior tests).
- neg can only overflow, if a == MIN
- case `-0` is properly handled by hardware, so overflow check by comparing
`a == MIN` is sufficient
- tests: MIN, MIN+1, MIN+4, -42, -7, -1, 0, 1, 7..
See #1290
- abs can only overflow, if a == MIN
- comparing the sign change from wrapping addition is branchless
- tests: MIN, MIN+1,..MIN+4, -42, -7, -1, 0, 1, 7..
See #1290
- adds __cmpsi2, __cmpdi2, __cmpti2
- adds __ucmpsi2, __ucmpdi2, __ucmpti2
- use 2 if statements with 2 temporaries and a constant
- tests: MIN, MIN+1, MIN/2, -1, 0, 1, MAX/2, MAX-1, MAX if applicable
See #1290
- use negXi2.zig to prevent confusion with negXf2.zig
- used for size optimized builds and machines without carry instruction
- tests: special cases 0, -INT_MIN
* use divTrunc range and shift with constant offsets
See #1290
- each byte gets masked, shifted and combined
- use boring masks instead of comptime for readability
- tests: bit patterns with reverse operation, if applicable
See #1290
- use Bit Twiddling Hacks: Compute parity in parallel
- test cases derived from popcount.zig
- tests: compare naive approach 10_000 times with random numbers created
from naive seed 42
- compiler_rt.zig: sort by LLVM builtin order and add comments to improve structure
See #1290
- apply simpler approach than LLVM for __popcountdi2
taken from The Art of Computer Programming and generalized
- rename popcountdi2.zig to popcount.zig
- test cases derived from popcountdi2_test.zig
- tests: compare naive approach 10_000 times with
random numbers created from naive seed 42
See #1290