157 commits

Author SHA1 Message Date
Cody Tapscott
b2950866b1 compiler_rt: Fix rounding/NaN handling for f80 add/sub
There were a few minor bugs in the rounding behavior and Inf/NaN
handling for the f80 __addxf3 and __subxf3 functions.

This change updates the original generic implementation to correctly
handle f80 floats, including the explicit integer bit.
2022-04-18 21:45:46 -07:00
Cody Tapscott
d760cae2b1 compiler_rt: implement __mulxf3 for f80 2022-04-18 20:46:03 -07:00
Koakuma
33956b8e55 compiler_rt: atomics: clr -> clrb 2022-04-15 19:48:25 +07:00
Koakuma
fac2a2e754 compiler_rt: atomics: Formatting change for flag definition 2022-04-15 19:44:46 +07:00
Koakuma
5b283fba77 compiler_rt: atomics: Add Leon CAS instruction check for SPARC atomics 2022-04-15 19:40:36 +07:00
Koakuma
6aa89115f9 compiler_rt: atomics: Split long lines and add comment on constants 2022-04-15 19:31:55 +07:00
Koakuma
274e2a1ef1 compiler_rt: atomics: Add TAS lock support for SPARC
Some SPARC CPUs (particularly old and/or embedded ones) only have an atomic
TAS instruction available (`ldstub`). This adds support for emitting
that instruction in the spinlock.
2022-04-15 08:09:57 +07:00
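The test-and-set spinlock described in this commit can be sketched in portable C11. This is only an illustration, not the Zig code from the commit: the type `tas_spinlock` and the three function names are hypothetical. A C11 `atomic_flag` offers exactly a test-and-set operation, which a compiler targeting a TAS-only CPU can in principle lower to a single instruction such as `ldstub`.

```c
#include <stdatomic.h>

/* Hypothetical lock type: one flag, set while the lock is held. */
typedef struct {
    atomic_flag held;
} tas_spinlock;

/* Try once. Test-and-set returns the previous value, so "previously
   clear" means this caller is the one that took the lock. */
int tas_spinlock_try_acquire(tas_spinlock *lock) {
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Spin until a test-and-set observes the flag clear. */
void tas_spinlock_acquire(tas_spinlock *lock) {
    while (!tas_spinlock_try_acquire(lock)) {
        /* busy-wait */
    }
}

/* Clear the flag so that another caller can take the lock. */
void tas_spinlock_release(tas_spinlock *lock) {
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A lock is initialized with `tas_spinlock lock = { ATOMIC_FLAG_INIT };`. Unlike a compare-and-swap loop, this design needs no instruction beyond TAS, at the cost of only supporting a single "held" state.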
Cody Tapscott
319555a669 Add floatFractionalBits to replace floatMantissaDigits 2022-04-12 12:33:16 -07:00
Cody Tapscott
04dd43934a Skip some floatXiYf tests on non-x86 platforms
These need to be skipped because of a bug with `@floatToInt`
on stage1: https://github.com/ziglang/zig/issues/11408
2022-04-12 10:25:29 -07:00
Cody Tapscott
b5d5685a4e compiler_rt: Implement floatXiYf/fixXfYi, incl f80
This change:
 - Adds a generic implementation of the integer -> float conversion
   functions floatXiYf, including support for f80
 - Updates the existing implementation of the float -> integer conversion
   functions fixXfYi to support f16 and f80
 - Fixes the handling of the explicit integer bit in `__trunctfxf2`
 - Combines the test cases for fixXfYi/floatXiYf into a single file
 - Renames `fmodl` to `fmodq`, since it operates on 128-bit floats

The new implementation for floatXiYf has been benchmarked, and generally
provides equal or better performance versus the current implementations:

Throughput (MiB/s) - Before
     |    u32   |    i32   |    u64   |    i64   |   u128   |   i128   |
-----|----------|----------|----------|----------|----------|----------|
 f16 |     none |     none |     none |     none |     none |     none |
 f32 |  2231.67 |  2001.19 |  1745.66 |  1405.77 |  2173.99 |  1874.63 |
 f64 |  1407.17 |  1055.83 |  2911.68 |  2437.21 |  1676.05 |  1476.67 |
 f80 |     none |     none |     none |     none |     none |     none |
f128 |   327.56 |   321.25 |   645.92 |   654.52 |  1153.56 |  1096.27 |

Throughput (MiB/s) - After
     |    u32   |    i32   |    u64   |    i64   |   u128   |   i128   |
-----|----------|----------|----------|----------|----------|----------|
 f16 |  1407.61 |  1637.25 |  3555.03 |  2594.56 |  3680.60 |  3063.34 |
 f32 |  2101.36 |  2122.62 |  3225.46 |  3123.86 |  2860.05 |  1985.21 |
 f64 |  1395.57 |  1314.87 |  2409.24 |  2196.30 |  2384.95 |  1908.15 |
 f80 |   475.53 |   457.92 |   884.50 |   812.12 |  1475.27 |  1382.16 |
f128 |   359.60 |   350.91 |   723.08 |   706.80 |  1296.42 |  1198.87 |
2022-04-12 10:25:26 -07:00
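As a rough illustration of what an integer -> float routine of the floatXiYf family has to do, here is a minimal C sketch for one case only (u32 to the binary32 bit pattern). The function name `u32_to_f32_bits` is hypothetical, and the loop-based normalization is chosen for clarity; it is not the generic Zig implementation benchmarked above.

```c
#include <stdint.h>

/* Hypothetical helper: returns the IEEE-754 binary32 bit pattern of x,
   rounded to nearest with ties to even. */
uint32_t u32_to_f32_bits(uint32_t x) {
    if (x == 0) {
        return 0; /* +0.0 */
    }
    /* Normalize: shift left until the leading 1 sits in bit 31.
       `exp` ends up as the position of the leading 1 in the input. */
    int exp = 31;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        exp--;
    }
    /* Bits 30..8 become the 23 stored fraction bits (the leading 1 is
       implicit in binary32); bits 7..0 are rounded away. */
    uint32_t fraction = (x >> 8) & 0x007FFFFFu;
    uint32_t discarded = x & 0xFFu;
    uint32_t bits = ((uint32_t)(exp + 127) << 23) | fraction;
    /* Round up when above half, or exactly half with an odd fraction.
       A carry out of the fraction correctly bumps the exponent. */
    if (discarded > 0x80u || (discarded == 0x80u && (fraction & 1u))) {
        bits++;
    }
    return bits;
}
```

For f80 the same steps apply, except that the leading 1 is stored explicitly rather than implied, so a 64-bit integer fits in the significand without rounding.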
viri
9b5c02022f compiler-rt(divtf3): fix remark, add more tests 2022-04-08 20:43:27 -06:00
viri
e46c612503 use math/float.zig everywhere 2022-04-07 05:04:38 -06:00
Meghan
b73cf97c93 replace other uses of std.meta.Vector with @Vector (#11346) 2022-03-30 14:12:14 -04:00
Jan Philipp Hafer
5d89955543 compiler_rt: specify goals, organize README and compiler_rt.zig
* goals
  - zig as linker for object files generated by other compilers
  - zig-specific runtime features for eventual standardisation

* changes
  - missing routines are marked with `missing`
  - structure inspired by libgcc docs, but improved order and wording
  - rename misspelled functions
  - reorder and rephrase compiler_rt.zig to reflect documentation
  - potential decimal float or fixed-point arithmetic support:
    * 'Decimal float library routines' ca. 120 functions
    * 'Fixed-point fractional library routines' ca. 300 functions

thanks to @Vexu for multiple reviews and @scheibo for review
2022-02-23 16:38:51 -05:00
Mateusz Radomski
b5f8fb85e6 Implement f128 @rem 2022-02-13 15:37:38 +02:00
Andrew Kelley
a024aff932 make f80 less hacky; lower as u80 on non-x86
Get rid of `std.math.F80Repr`. Instead of trying to match the memory
layout of f80, we treat it as a value, same as the other floating point
types. The functions `make_f80` and `break_f80` are introduced to
compose an f80 value out of its parts, and the inverse operation.

stage2 LLVM backend: fix pointer to zero length array tripping LLVM
assertion. It now checks for when the element type is a zero-bit type
and lowers such a pointer the same way that pointers to other zero-bit types
are lowered.

Both stage1 and stage2 LLVM backends are adjusted so that f80 is lowered
as x86_fp80 on x86_64 and i386 architectures, and identical to a u80 on
others. LLVM constants are lowered in a less hacky way now that #10860
is fixed, by building the expression `(exp << 64) | fraction` from LLVM
constants.

Sema is improved to handle c_longdouble by recursively handling it
correctly for whatever the float bit width is, in both stage1 and
stage2.
2022-02-12 11:18:23 +01:00
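To make the "value composed of parts" idea concrete, the following C sketch carries the 80 bits as two integer fields and shows the layout that `(exp << 64) | fraction` implies. Everything here is a hypothetical stand-in (the type `f80_bits` and the `f80_*` functions are not the Zig `make_f80`/`break_f80` API); it only assumes the x87 extended layout, where the 64-bit fraction stores its integer bit explicitly in bit 63.

```c
#include <stdint.h>

/* Hypothetical C stand-in for an 80-bit float carried as raw bits. */
typedef struct {
    uint64_t fraction; /* bits 0..63; bit 63 is the explicit integer bit */
    uint16_t exp;      /* bits 64..79: 1 sign bit + 15-bit biased exponent */
} f80_bits;

/* Compose a value out of its two parts. */
f80_bits f80_compose(uint16_t exp, uint64_t fraction) {
    f80_bits v;
    v.fraction = fraction;
    v.exp = exp;
    return v;
}

/* The inverse direction: pick the individual fields back out. */
int f80_sign(f80_bits v) {
    return v.exp >> 15;
}

int f80_biased_exponent(f80_bits v) {
    return v.exp & 0x7FFF;
}

int f80_integer_bit(f80_bits v) {
    return (int)(v.fraction >> 63);
}
```

For example, 1.0 is `f80_compose(0x3FFF, 0x8000000000000000)`: the biased exponent is 16383 and the integer bit is set, which is the bit that routines written for f32/f64/f128 (where the leading 1 is implicit) have to handle specially.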
Jan Philipp Hafer
fc59a04061 compiler_rt: add subo
- approach by Hacker's Delight with wrapping subtraction
- performance expected to be similar to addo
- tests with all relevant combinations of min,max with -1,0,+1 and all
  combinations of sequences +-1,2,4..,max
2022-02-08 02:14:29 -05:00
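The wrapping-subtraction check can be sketched in C as follows. This is a 32-bit illustration of the Hacker's Delight idea, not the Zig source; the name `sub_overflow_i32` and the out-parameter style are assumptions made for the example. It also assumes the usual two's complement result when converting the wrapped value back to a signed type.

```c
#include <stdint.h>

/* Hypothetical helper: returns a - b with wrapping semantics and sets
   *overflow to 1 if the mathematically exact result does not fit. */
int32_t sub_overflow_i32(int32_t a, int32_t b, int *overflow) {
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t diff = ua - ub; /* unsigned subtraction wraps by definition */
    /* Overflow happens only when the operands have different signs and
       the sign of the result differs from the sign of a. */
    *overflow = (int)(((ua ^ ub) & (ua ^ diff)) >> 31);
    return (int32_t)diff;
}
```

The check is branch-free: two XORs, one AND and one shift on top of the subtraction itself.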
matu3ba
3db130ff3d compiler_rt: add addo (#10824)
- approach by Hacker's Delight with wrapping addition
- ca. 1.10x perf over the standard approach on my laptop
- tests with all combinations of min,max with -1,0,+1 and combinations of
  sequences +-1,2,4..,max
2022-02-07 13:27:21 -05:00
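A minimal C sketch of the wrapping-addition check, in the spirit of Hacker's Delight. The function name `add_overflow_i32` and its signature are hypothetical and the Zig implementation may differ in detail; the conversion of the wrapped sum back to a signed type assumes two's complement.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b with wrapping semantics and sets
   *overflow to 1 if the mathematically exact result does not fit. */
int32_t add_overflow_i32(int32_t a, int32_t b, int *overflow) {
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub; /* unsigned addition wraps by definition */
    /* Overflow happens only when both operands share a sign and the
       sign of the sum differs from it. */
    *overflow = (int)(((sum ^ ua) & (sum ^ ub)) >> 31);
    return (int32_t)sum;
}
```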
Andrew Kelley
d4805472c3 compiler_rt: addXf3: add coercion to @clz
We're going to remove the first parameter from this function in the
future. Stage2 already ignores the first parameter. So we put an `@as`
in here to make it work for both.
2022-02-06 20:06:00 -07:00
Andrew Kelley
8dcb1eba60 Merge pull request #10738 from Vexu/f80
Add compiler-rt functions for f80
2022-02-05 20:57:32 -05:00
Jan Philipp Hafer
01d48e55a5 compiler_rt: optimize mulo
- use usize to decide if register size is big enough to store
  multiplication result or if division is necessary
- multiplication routine with check of integer bounds
- wrapping multiplication and division routine from Hacker's Delight
2022-02-05 01:35:46 -05:00
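The two strategies named in this commit can be sketched in C: when a wider type is available, multiply in that type and check the bounds; otherwise multiply with wrapping and verify the result by dividing. Both function names are hypothetical, and the fixed 32/64-bit split below stands in for the register-size decision the commit makes with `usize`.

```c
#include <stdint.h>

/* Strategy 1 (a wider type exists): multiply exactly, then check bounds. */
int32_t mul_overflow_i32(int32_t a, int32_t b, int *overflow) {
    int64_t wide = (int64_t)a * (int64_t)b; /* cannot overflow in 64 bits */
    *overflow = (wide < INT32_MIN || wide > INT32_MAX);
    return (int32_t)(uint32_t)(uint64_t)wide; /* keep the low 32 bits */
}

/* Strategy 2 (no wider type): wrapping multiply, then divide to verify,
   following the check given in Hacker's Delight. */
int64_t mul_overflow_i64(int64_t a, int64_t b, int *overflow) {
    int64_t r = (int64_t)((uint64_t)a * (uint64_t)b); /* wrapping product */
    /* The first clause catches -1 * MIN, which would otherwise make the
       division below overflow; || short-circuits before dividing. */
    *overflow = (a < 0 && b == INT64_MIN) || (a != 0 && r / a != b);
    return r;
}
```

The division-based check is slower, which is why it is reserved for operand widths where no wider multiplication is available.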
Veikka Tuominen
6a736f0c8c compiler-rt: add add/sub for f80 2022-02-04 22:38:13 +02:00
Veikka Tuominen
9bbd3ab257 compiler-rt: add comparison functions for f80 2022-02-04 22:22:43 +02:00
Veikka Tuominen
72cef17b1a compiler-rt: add trunc functions for f80 2022-02-04 22:18:44 +02:00
Veikka Tuominen
5c4ef1a64c compiler-rt: add extend functions for f80 2022-02-04 22:16:07 +02:00
John Schmidt
9ee67b967b stage2: avoid inferred struct in os_version_check.zig
Before this commit, compiling an empty main with Stage 2 on macOS x86_64 results in

```
../stage2/bin/zig build-exe -ODebug -fLLVM empty_main.zig
error: sub-compilation of compiler_rt failed
    [...]/zig/stage2/lib/zig/std/special/compiler_rt/os_version_check.zig:26:10: error: TODO: Sema.zirStructInit for runtime-known struct values
```

By assigning the value to a variable we can sidestep the issue for now.
2022-01-26 00:48:05 -05:00
Meghan
c08b190c69 lint: duplicate import (#10519) 2022-01-07 00:06:06 -05:00
Andrew Kelley
5e086b2b4c compiler-rt: small refactor in atomics 2022-01-02 17:58:54 -07:00
Andrew Kelley
b4d6e85a33 Sema: implement peer type resolution of signed and unsigned ints
This allows stage2 to build more of compiler-rt.

I also changed `-%` to `-` for comptime ints in the div and mul
implementations of compiler-rt. This is clearer code and also happens to
work around a bug in stage2.
2022-01-02 14:11:37 -07:00
Andrew Kelley
6b14c58f63 compiler-rt: simplify implementations
This improves readability as well as compatibility with stage2. Most of
compiler-rt is now enabled for stage2 with just a few functions disabled
(until stage2 passes more behavior tests).
2022-01-02 13:16:17 -07:00
Andrew Kelley
be5130ec53 compiler_rt: move more functions to the stage2 section
also move more already-passing behavior tests to the passing section.
2021-12-29 00:39:25 -07:00
Jan Philipp Hafer
17046674a7 compiler_rt: add __negvsi2, __negvdi2, __negvti2
- neg can only overflow if a == MIN
- case `-0` is properly handled by hardware, so overflow check by comparing
  `a == MIN` is sufficient
- tests: MIN, MIN+1, MIN+4, -42, -7, -1, 0, 1, 7..

See #1290
2021-12-27 14:35:45 -08:00
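The overflow condition for negation can be shown in a few lines of C. Note the assumptions: the libgcc routines of this family trap on overflow, whereas this hypothetical `neg_overflow_i32` reports it through a flag so the behaviour is easy to inspect.

```c
#include <stdint.h>

/* Hypothetical helper: returns -a with wrapping semantics and sets
   *overflow to 1 in the single case where -a is not representable. */
int32_t neg_overflow_i32(int32_t a, int *overflow) {
    /* MIN has no positive counterpart; every other value, including 0,
       negates without overflow. */
    *overflow = (a == INT32_MIN);
    return (int32_t)(0u - (uint32_t)a); /* wrapping negation */
}
```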
Jan Philipp Hafer
405ff911da compiler_rt: add __absvsi2, __absvdi2, __absvti2
- abs can only overflow if a == MIN
- comparing the sign change from wrapping addition is branchless
- tests: MIN, MIN+1,..MIN+4, -42, -7, -1, 0, 1, 7..

See #1290
2021-12-26 13:21:18 -08:00
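A branchless absolute value with overflow detection might look like the following C sketch. The name `abs_overflow_i32` and the flag-based reporting are assumptions for illustration (the libgcc routines of this family trap instead), and the exact sequence in the Zig source may differ.

```c
#include <stdint.h>

/* Hypothetical helper: returns |a| with wrapping semantics and sets
   *overflow to 1 when a == INT32_MIN. No branches are used. */
int32_t abs_overflow_i32(int32_t a, int *overflow) {
    uint32_t ua = (uint32_t)a;
    uint32_t mask = 0u - (ua >> 31);   /* all ones if a < 0, else zero */
    uint32_t mag = (ua + mask) ^ mask; /* wrapping add, then flip bits */
    /* The magnitude keeps a set sign bit only for a == MIN, so that
       bit doubles as the overflow flag. */
    *overflow = (int)(mag >> 31);
    return (int32_t)mag;
}
```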
Isaac Freund
9f9f215305 stage1, stage2: rename c_void to anyopaque (#10316)
zig fmt now replaces c_void with anyopaque to make updating
code easy.
2021-12-19 00:24:45 -05:00
Andrew Kelley
3532abe0c6 compiler_rt: reorganize in a way that stage2 understands
Before this commit, the stage2 behavior tests had regressed because stage2
could not build compiler-rt.
2021-12-15 14:23:28 -07:00
Jan Philipp Hafer
20328e976f compiler_rt: add __cmpXi2 and __ucmpXi2
- adds __cmpsi2, __cmpdi2, __cmpti2
- adds __ucmpsi2, __ucmpdi2, __ucmpti2
- use 2 if statements with 2 temporaries and a constant
- tests: MIN, MIN+1, MIN/2, -1, 0, 1, MAX/2, MAX-1, MAX if applicable

See #1290
2021-12-14 14:21:30 -08:00
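The libgcc convention for these comparison routines is to return 0 if a < b, 1 if a == b and 2 if a > b. A compact C sketch of that contract follows; the names `cmp_i32` and `ucmp_u32` are hypothetical, and the expression form is used here for brevity in place of the two `if` statements the commit mentions.

```c
#include <stdint.h>

/* Signed three-way compare: 0 if a < b, 1 if a == b, 2 if a > b. */
int cmp_i32(int32_t a, int32_t b) {
    /* Each comparison yields 0 or 1; the constant shifts -1..1 to 0..2. */
    return (a > b) - (a < b) + 1;
}

/* Unsigned variant with the same return convention. */
int ucmp_u32(uint32_t a, uint32_t b) {
    return (a > b) - (a < b) + 1;
}
```

The signed and unsigned variants differ only in how the operands are interpreted: the bit pattern 0xFFFFFFFF is the largest value for `ucmp_u32` but -1 for `cmp_i32`.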
Jan Philipp Hafer
0550198c98 compiler_rt: simplify popcount "magic constants"
- magic constants are nicer to construct, i.e. with
  (~@as(unsigned type, 0) / 3) == 0x55...55
- thanks to Stefan Kanthak for the idea
2021-12-14 14:19:51 -08:00
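The division trick generalizes to all the masks of the classic parallel popcount: dividing the all-ones value by 3, 5, 17 and 255 yields 0x55…, 0x33…, 0x0F… and 0x01… for any word width. A minimal 32-bit C sketch, with the hypothetical function name `popcount32` (the Zig code is generic over the integer type):

```c
#include <stdint.h>

/* Hypothetical helper: counts the set bits of x using masks derived
   from the all-ones value instead of hand-written hex literals. */
uint32_t popcount32(uint32_t x) {
    const uint32_t all = ~(uint32_t)0;
    const uint32_t m1 = all / 3;    /* 0x55555555: every other bit   */
    const uint32_t m2 = all / 5;    /* 0x33333333: pairs of bits     */
    const uint32_t m4 = all / 17;   /* 0x0F0F0F0F: low nibbles       */
    const uint32_t h01 = all / 255; /* 0x01010101: one bit per byte  */

    x -= (x >> 1) & m1;             /* 2-bit sums */
    x = (x & m2) + ((x >> 2) & m2); /* 4-bit sums */
    x = (x + (x >> 4)) & m4;        /* 8-bit sums */
    return (x * h01) >> 24;         /* add the four bytes into the top byte */
}
```

Because the masks are computed from the type's all-ones value, the same source works unchanged for 64-bit and 128-bit words once the final shift is adjusted to the word width minus 8.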
Jan Philipp Hafer
eb1e75b2b8 compiler_rt: refactor __mulodi2 and __muloti2 to get __mulosi2
- use comptime instead of 2 identical implementations
- tests: port missing tests and link to archived llvm-mirror release 80

See #1290
2021-12-14 14:16:24 -08:00
Jan Philipp Hafer
c56663dee8 compiler_rt: add __negsi2, __negdi2, __negti2
- use negXi2.zig to prevent confusion with negXf2.zig
- used for size optimized builds and machines without carry instruction
- tests: special cases 0, -INT_MIN
  * use divTrunc range and shift with constant offsets

See #1290
2021-12-14 14:14:31 -08:00
Jan Philipp Hafer
efdb94486b compiler_rt: add __bswapsi2, __bswapdi2 and __bswapti2
- each byte gets masked, shifted and combined
- use boring masks instead of comptime for readability
- tests: bit patterns with reverse operation, if applicable

See #1290
2021-12-11 01:43:37 -08:00
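The mask-shift-combine approach reads as follows in C for the 32-bit case. The function name `bswap32` is hypothetical; the explicit masks mirror the "boring masks" choice described above.

```c
#include <stdint.h>

/* Hypothetical helper: reverses the byte order of x. Each byte is
   isolated with a mask, moved to its mirrored position, and the four
   results are ORed together. */
uint32_t bswap32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```

Applying the function twice returns the original value, which is the "reverse operation" property the tests rely on.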
matu3ba
282e2c714f compiler_rt: add __ffssi2, __ffsdi2 and __ffsti2 (#10268)
See #1290
2021-12-04 16:23:33 -05:00
Jan Philipp Hafer
e4c053f047 compiler_rt: add __paritysi2, __paritydi2, __parityti2
- use Bit Twiddling Hacks: Compute parity in parallel
- test cases derived from popcount.zig
- tests: compare naive approach 10_000 times with random numbers created
  from naive seed 42
- compiler_rt.zig: sort by LLVM builtin order and add comments to improve structure

See #1290
2021-12-01 13:35:19 -08:00
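The "compute parity in parallel" technique from Bit Twiddling Hacks folds the word onto itself with XOR and finishes with a 16-entry lookup packed into the constant 0x6996. A 32-bit C sketch under the hypothetical name `parity32`:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if x has an odd number of set bits,
   0 otherwise. */
int parity32(uint32_t x) {
    /* XOR-fold the halves so the parity of the whole word ends up in
       the low nibble. */
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x &= 0xFu;
    /* Bit i of 0x6996 is the parity of the 4-bit value i. */
    return (int)((0x6996u >> x) & 1u);
}
```

A naive reference that loops over all 32 bits is enough to cross-check this against random inputs, which is the testing approach the commit describes.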
Jan Philipp Hafer
f2608df0fb compiler_rt: add __ctzsi2, __ctzdi2 and __ctzti2
- structure derived from count0bits.zig
- test cases derived from clzsi2_test.zig and
  cross-checked via short helper program

See #1290
2021-11-30 18:31:16 -08:00
Andrew Kelley
902df103c6 std lib API deprecations for the upcoming 0.9.0 release
See #3811
2021-11-30 00:13:07 -07:00
Jan Philipp Hafer
1ea650bb75 compiler_rt: add __popcountsi2, __popcountdi2 and __popcountti2
- apply simpler approach than LLVM for __popcountdi2
  taken from The Art of Computer Programming and generalized
- rename popcountdi2.zig to popcount.zig
- test cases derived from popcountdi2_test.zig
- tests: compare naive approach 10_000 times with
  random numbers created from naive seed 42

See #1290
2021-11-29 12:50:25 -08:00
Stephen Gutekanst
b613210140 compiler_rt: implement __isPlatformVersionAtLeast (Objective-C @available expressions) for Darwin (#10232) 2021-11-29 14:54:23 -05:00
Kenta Iwasaki
00966b7258 compiler_rt: disable spinlocks for atomic intrinsics for bpf
The BPF target does not support mutable global variables. Mark the BPF
target as a target that does not support atomic variables in order to
avoid including the global spinlock table provided in compiler_rt.
2021-11-18 13:16:35 -05:00
LemonBoy
d03e9d0b83 compiler-rt: Fix f16 API declarations to be consistent
LLVM and compiler-rt must agree on how the parameters are passed. It
turns out that something changed in LLVM 13 and broke the test case for
AArch64 systems.

It has nothing to do with fma at all.

Closes #9900
2021-11-04 14:30:35 -04:00
Andrew Kelley
9ed599b4e3 stage2: LLVM backend: miscompilation fixes
 * work around a stage1 miscompilation leading to the wrong integer
   comparison predicate being emitted.
 * fix the bug of not annotating callsites with the calling convention
   of the callee, leading to undefined behavior.
 * add the `nobuiltin` attribute when building freestanding libc or
   compiler_rt libraries to prevent e.g. memcpy from being "optimized"
   into a call to itself.
 * compiler-rt: change a call to be comptime to make the generated LLVM
   IR simpler and easier to study.

I still can't enable the widening tests due to the compiler-rt compare
function being miscompiled in some not-yet-diagnosed way.
2021-10-05 20:36:04 -07:00
Andrew Kelley
01ad6c0b02 freestanding libc: don't rely on compiler_rt symbols we don't have yet
Previous commit made fmal depend on __extendxftf2 and __trunctfxf2 but
we don't have implementations of those yet.
2021-10-05 18:19:31 -07:00