Novel implementation of DCGAN inference on resource-constrained RISC-V microcontroller (CH32H417) with 512KB shared SRAM, using int8 quantization, SD card weight streaming with double buffering, and custom C inference engine achieving bit-identical PyTorch outputs. Demonstrates practical techniques for embedded generative models on non-ARM architectures where ecosystem tools like CMSIS-NN don't exist, with creative integration of quantum entropy for latent vector seeding.
DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]
r/MachineLearning
·
14d ago
·
8
·
inference
open source
deployment
quantization
r/LocalLLaMA
·
45d ago
·
8
·
benchmark
inference
quantization
Empirical study showing KV cache quantization (q8_0, q4_0) has significant, model-dependent quality impact—contrary to conventional wisdom that q8_0 is "practically lossless." Gemma models show substantial degradation (KL 0.108-0.377 at q8_0) while Qwen remains robust (KL <0.04), with detailed methodology using KL divergence across 250K tokens across 6 categories, enabling engineers to make informed quantization tradeoff decisions.