Octavian Rusu
x.com/riemannianmani
github.com/gandalftea
contact @octavian.work
i do independent research in high-performance computing and machine learning. interested in physics, especially QFT and the spin-statistics theorem, decoding retinal ganglion cells, etc.
posts
07.2024 > efficient 2D convolutions on the Zen 2 microarchitecture [new]
02.2024 > vectorizing a gotoBLAS/BLIS 8x8 sgemm kernel
04.2021 > using ANNs to decode retinal ganglion cell encoding
currently
- optimizing 2D convolutions in asm for x86-64 Zen 2 and documenting it
- afterwards might try on an ARM Exynos 7420 and nvidia GA106-300-A1
- writing an efficient MNIST convnet in C
ongoing projects
tensorlib : general-purpose tensor library with 256-bit AVX vectorization. Built to find the abstractions needed to keep full control over the generated asm while generalizing everything else. No dependencies.
capable of:
- ~8% faster sgemm than OpenBLAS when N<2048, ~13% slower when N>4096
- random number generators with various distributions like Kaiming normal/uniform and various statistical tests like Kolmogorov-Smirnov, D'Agostino skewness/omnibus, etc.
- internal wrappers around instructions like CPUID for hardware information, with the output registers exposed to the user through bit-field helpers.
- internal use of movntps non-temporal streaming stores to avoid loading destination cache lines and achieve faster copies than memcpy
- syntax and ops similar to pytorch, with more explicit memory control
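a minimal sketch of the non-temporal copy idea from the list above, assuming SSE intrinsics rather than hand-written asm. `stream_copy_f32` is a hypothetical name for illustration, not tensorlib's actual API. `_mm_stream_ps` compiles to movntps, which writes through write-combining buffers so the destination lines are not read into cache first. On non-x86 targets the sketch falls back to a plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_STREAM 1
#endif

/* Hypothetical helper (not tensorlib's API): copy n floats from src to dst
 * using non-temporal stores. dst must be 16-byte aligned; src may be unaligned. */
static void stream_copy_f32(float *dst, const float *src, size_t n)
{
    assert(((uintptr_t)dst & 15u) == 0);   /* movntps requires an aligned destination */
    size_t i = 0;
#ifdef HAVE_SSE_STREAM
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);  /* ordinary cached load of 4 floats */
        _mm_stream_ps(dst + i, v);         /* movntps: store that bypasses the cache */
    }
    _mm_sfence();                          /* make streaming stores visible before later stores */
#endif
    for (; i < n; ++i)                     /* scalar tail, or full fallback without SSE */
        dst[i] = src[i];
}
```

whether this beats memcpy depends on buffer size and the memcpy implementation: streaming stores tend to pay off for copies much larger than the last-level cache and to lose when the destination is read again soon after.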
TODO:
- better benchmarking tools using PMU hardware and software counters from the perf_events ABI on Linux machines.
- better TLB and cache control of existing kernels
- data and control flow graphs
older projects
Mixed Reality Computer Vision : Simultaneous Localisation and Mapping (SLAM) computer vision library capable of monocular/binocular environment mapping and position tracking from a live video feed or recorded videos.
- built using Eigen and some OpenCV helper functions
- visualize and navigate around the resulting point cloud using Pangolin or OpenGL
- successfully used by an autonomous drone to fly around the interior of a building
opendataset : collaborative dataset creation and source control
- git-like object-based source control with custom binary commit objects, byte headers, delta files and database entries that allow contributions to be reverted and verified with cryptographic checksums.
- automatic schema generation from .csv and .json files.
- PostgreSQL database, user authentication and management with pgcrypto hashing and express-session cookies
- express.js webserver, react and cli frontends
enginehmw : procedural game engine with OpenGL rendering
- procedurally generate realistic terrain using 2D Perlin noise and fractal noise
- render geometry using OpenGL and use GLUT for windowing and input capture
- collision, quaternions, .obj file reader, etc.
- first person camera control
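a rough sketch of the terrain idea from the list above: 2D gradient (Perlin) noise summed over several octaves to get fractal noise. This is not the engine's actual code. The lattice hash, the 8-direction gradient table and the names `perlin2` and `fbm2` are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed integer hash for lattice points; any well-mixed hash would do. */
static uint32_t hash2(int32_t x, int32_t y)
{
    uint32_t h = (uint32_t)x * 374761393u + (uint32_t)y * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    return h ^ (h >> 16);
}

/* Floor to integer without pulling in libm. */
static int32_t ifloor(float x)
{
    int32_t i = (int32_t)x;
    return ((float)i > x) ? i - 1 : i;
}

/* Dot product of the offset (dx, dy) with the unit gradient at lattice point (ix, iy). */
static float grad_dot(int32_t ix, int32_t iy, float dx, float dy)
{
    static const float g[8][2] = {
        { 1.0f, 0.0f}, {-1.0f, 0.0f}, {0.0f, 1.0f}, {0.0f, -1.0f},
        { 0.70710678f,  0.70710678f}, {-0.70710678f,  0.70710678f},
        { 0.70710678f, -0.70710678f}, {-0.70710678f, -0.70710678f}
    };
    const float *v = g[hash2(ix, iy) & 7u];
    return v[0] * dx + v[1] * dy;
}

/* Quintic fade curve 6t^5 - 15t^4 + 10t^3 and linear interpolation. */
static float fade(float t) { return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f); }
static float lerpf(float a, float b, float t) { return a + t * (b - a); }

/* 2D gradient noise: zero at lattice points, smooth in between. */
static float perlin2(float x, float y)
{
    int32_t ix = ifloor(x), iy = ifloor(y);
    float dx = x - (float)ix, dy = y - (float)iy;
    float u = fade(dx), v = fade(dy);
    float n00 = grad_dot(ix,     iy,     dx,        dy);
    float n10 = grad_dot(ix + 1, iy,     dx - 1.0f, dy);
    float n01 = grad_dot(ix,     iy + 1, dx,        dy - 1.0f);
    float n11 = grad_dot(ix + 1, iy + 1, dx - 1.0f, dy - 1.0f);
    return lerpf(lerpf(n00, n10, u), lerpf(n01, n11, u), v);
}

/* Fractal noise: sum octaves of perlin2, raising the frequency by `lacunarity`
 * and scaling the amplitude by `gain` each octave, normalized by total amplitude. */
static float fbm2(float x, float y, int octaves, float lacunarity, float gain)
{
    float sum = 0.0f, norm = 0.0f, amp = 1.0f, freq = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        sum  += amp * perlin2(x * freq, y * freq);
        norm += amp;
        amp  *= gain;
        freq *= lacunarity;
    }
    return (norm > 0.0f) ? sum / norm : 0.0f;
}
```

a heightmap could then be filled with something like `height = scale * fbm2(x * 0.01f, z * 0.01f, 5, 2.0f, 0.5f)`, where the input scaling, octave count and `scale` are tuning choices, not values taken from the engine.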