n0thhhing/zeon
ARM/ARM64 Neon intrinsics implemented in pure Zig as well as in assembly!
Zeon aims to provide high-performance Neon intrinsics for ARM and ARM64 architectures, implemented in both pure Zig and inline assembly. This project prioritizes portability, performance, and flexibility, ensuring compatibility across various environments.
🚧 This project is under active development (522/3803 implemented). Contributions and feedback are welcome!
Complete inline assembly/LLVM builtin implementations.
Write thorough tests for all functions to ensure correctness.
Refactor into multiple files.
Eliminate repetitive patterns to improve maintainability.
Implement fallbacks for non-ARM architectures.
Instruction stripping: functions like vget_lane_f64 should compile down to nothing more than accessing the appropriate register (e.g., d0 for a vec in v0). Currently, we explicitly insert instructions, which prevents the compiler from optimizing them away when they are not needed.
Add support for big-endian ARM/AArch64, and add tests for it.
For vector load intrinsics, don't assume the input length is the exact length of the output vector.
Test against C/C++ implementation.
Add a better way to switch between implementations (assembly, builtins, and the fallback).
Use the fallback instead of assembly implementation when not in release.
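As an illustration of the instruction-stripping item above, here is a minimal sketch of a pure-Zig lane read. It is not the project's actual implementation: the `float64x1_t` alias and the function body are assumptions made for this example, and only the name `vget_lane_f64` comes from the list above. Because the body is plain vector indexing with no inline assembly, the optimizer is free to reduce it to a register access, or to remove it entirely when the result is unused.

```zig
const std = @import("std");

// Assumed alias for illustration; zeon's real type definitions may differ.
const float64x1_t = @Vector(1, f64);

/// Sketch of a pure-Zig lane read. No instructions are inserted by hand,
/// so the compiler decides how (and whether) to materialize the access.
pub inline fn vget_lane_f64(vec: float64x1_t, comptime lane: usize) f64 {
    // A float64x1_t has exactly one lane, so reject anything else at compile time.
    if (lane != 0) @compileError("float64x1_t only has lane 0");
    return vec[lane];
}
```

A call such as `vget_lane_f64(v, 0)` then behaves like ordinary indexing into `v`.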
vld1*: on non-ARM architectures (or if use_asm and use_builtins are both off), these loads assume the underlying type fits the size of the vector.
ReleaseFast
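One way the vld1* caveat above could be handled in a fallback path is to take a slice and check its length before loading. The sketch below is hypothetical: `vld1_u8_checked`, the `uint8x8_t` alias, and the `InputTooShort` error are names invented for this example and are not part of zeon's API.

```zig
const std = @import("std");

// Assumed alias for illustration; zeon's real type definitions may differ.
const uint8x8_t = @Vector(8, u8);

/// Hypothetical length-checked load: reads the first 8 bytes of `src`
/// and ignores any bytes beyond them.
pub fn vld1_u8_checked(src: []const u8) error{InputTooShort}!uint8x8_t {
    // Refuse inputs that cannot fill the whole vector.
    if (src.len < 8) return error.InputTooShort;
    // Slicing with comptime-known bounds yields a pointer to [8]u8;
    // the array value then coerces to the vector type on return.
    const head: [8]u8 = src[0..8].*;
    return head;
}
```

With this shape, a 10-byte input loads its first 8 bytes, and a 3-byte input returns `error.InputTooShort` instead of reading out of bounds.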
To test and simulate ARM/ARM64 environments, QEMU user mode is required. Make sure QEMU is properly installed and configured before running tests. You'll also need Make for build and test automation.
For usage examples, see examples.
Clone the repository:
git clone https://github.com/n0thhhing/zeon
cd zeon
Run tests:
make test
Run examples:
make examples
This project is licensed under the MIT License. See the LICENSE file for more information.