4
12bitfloat
201d

Is writing hand-optimized SIMD code even worth it anymore? I'm thinking about writing my own little math library for my game engine, but I tried writing a hand-optimized `dot(normalize(b - a), foo) >= bar` and somehow it's actually slower than the same thing written with a math lib that is implemented exclusively with scalar math and auto-vectorized by LLVM.

LLVM... I kneel
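
For reference, here is roughly what the scalar side of that comparison could look like. This is only a minimal sketch: the `Vec3` type, its methods, and the `facing` function are hypothetical names made up for illustration, not taken from any particular crate. The compiler is left to vectorize this however it sees fit.

```rust
/// Hypothetical 3-component vector, written with plain scalar math.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl Vec3 {
    pub fn new(x: f32, y: f32, z: f32) -> Self {
        Self { x, y, z }
    }

    /// Component-wise difference `self - o`.
    pub fn sub(self, o: Self) -> Self {
        Self::new(self.x - o.x, self.y - o.y, self.z - o.z)
    }

    /// Standard dot product.
    pub fn dot(self, o: Self) -> f32 {
        self.x * o.x + self.y * o.y + self.z * o.z
    }

    /// Scales the vector to unit length (no guard against zero length).
    pub fn normalize(self) -> Self {
        let inv_len = 1.0 / self.dot(self).sqrt();
        Self::new(self.x * inv_len, self.y * inv_len, self.z * inv_len)
    }
}

/// The expression from the post: dot(normalize(b - a), foo) >= bar
pub fn facing(a: Vec3, b: Vec3, foo: Vec3, bar: f32) -> bool {
    b.sub(a).normalize().dot(foo) >= bar
}
```

Calling `facing(a, b, foo, bar)` with `foo` pointing the same way as `b - a` gives a dot product of 1, so it passes for any `bar <= 1.0`.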

Comments
  • 3
    Using Agner Fog's VCL *and* compiling with an LLVM-based compiler should theoretically give the best results.

    Plot twist: read the disassembly this would generate and learn the optimizations by heart; you can then write perfect SIMD fuckery by hand. But who's got time for that?
  • 5
    Nothing is worth hand-optimizing if there are already optimized system libs for it.
  • 1
    @Lensflare fair enough, but I'm not happy with the linear algebra crates in Rust so I wanna do it my own way (and just steal all the actual maths from JOML lol)
  • 2
    Soooooo yeah... Uhhh turns out auto-vectorization kinda sucks ass actually

    I didn't notice much difference in my initial tests because, apparently, multiple rustflags fields in the Cargo config overwrite each other (which is fun), so it didn't actually compile with the proper SIMD features enabled

    Here's the auto-vectorized assembly:
  • 3
    And here's the high-level hand-optimized version:

    I.e. this isn't even a fully specialized SIMD thing; I have just written a Vec struct with optimized dot and normalize methods... Kinda stark difference
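
The actual hand-optimized code isn't shown in the thread, so the following is only a guess at the general shape of "a Vec struct with optimized dot and normalize methods". Everything here (`Vec4`, `new3`, `facing`) is a hypothetical name, and the sketch assumes plain SSE intrinsics from `std::arch::x86_64`, which are part of the x86_64 baseline; other targets fall back to scalar code.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical 4-lane vector; for 3D use the last lane is left at 0.0.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct Vec4(pub [f32; 4]);

impl Vec4 {
    pub fn new3(x: f32, y: f32, z: f32) -> Self {
        Vec4([x, y, z, 0.0])
    }

    /// Component-wise difference, left to the compiler.
    pub fn sub(self, o: Self) -> Self {
        Vec4(std::array::from_fn(|i| self.0[i] - o.0[i]))
    }

    /// Dot product: one multiply, then a horizontal sum of the four lanes.
    #[cfg(target_arch = "x86_64")]
    #[allow(unused_unsafe)]
    pub fn dot(self, o: Self) -> f32 {
        // SSE is always available on x86_64, so no runtime detection here.
        unsafe {
            let m = _mm_mul_ps(_mm_loadu_ps(self.0.as_ptr()), _mm_loadu_ps(o.0.as_ptr()));
            let hi = _mm_movehl_ps(m, m); // [m2, m3, m2, m3]
            let s = _mm_add_ps(m, hi); // lane0 = m0+m2, lane1 = m1+m3
            let s1 = _mm_shuffle_ps::<0b01>(s, s); // lane0 = m1+m3
            _mm_cvtss_f32(_mm_add_ss(s, s1)) // m0+m1+m2+m3
        }
    }

    /// Scalar fallback for non-x86_64 targets.
    #[cfg(not(target_arch = "x86_64"))]
    pub fn dot(self, o: Self) -> f32 {
        self.0.iter().zip(o.0.iter()).map(|(a, b)| a * b).sum()
    }

    /// Divides all four lanes by the length (no guard against zero length).
    #[cfg(target_arch = "x86_64")]
    #[allow(unused_unsafe)]
    pub fn normalize(self) -> Self {
        let len = self.dot(self).sqrt();
        let mut out = [0.0f32; 4];
        unsafe {
            let v = _mm_div_ps(_mm_loadu_ps(self.0.as_ptr()), _mm_set1_ps(len));
            _mm_storeu_ps(out.as_mut_ptr(), v);
        }
        Vec4(out)
    }

    /// Scalar fallback for non-x86_64 targets.
    #[cfg(not(target_arch = "x86_64"))]
    pub fn normalize(self) -> Self {
        let len = self.dot(self).sqrt();
        Vec4(self.0.map(|c| c / len))
    }
}

/// The expression from the post: dot(normalize(b - a), foo) >= bar
pub fn facing(a: Vec4, b: Vec4, foo: Vec4, bar: f32) -> bool {
    b.sub(a).normalize().dot(foo) >= bar
}
```

Whether something like this beats the auto-vectorized scalar version depends on which target features the build really enables. When comparing, a compile-time check such as `cfg!(target_feature = "avx2")` is one way to confirm that the intended rustflags were actually applied.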