Endava's case study demonstrates using OpenAI's Codex to automate requirements analysis and accelerate software delivery through agentic workflows. The approach reduces traditionally weeks-long processes to hours, showing practical application of code generation models in enterprise software development.
Q-Judger is a fine-tuned vision-language model for automated evaluation of text-to-image generation quality, built on Qwen2.7-27B with structured JSON output. The article provides practical setup instructions for Transformers, vLLM, SGLang, and Docker, and demonstrates hierarchical evaluation criteria validated against human expert rankings.
A survey of 1,260 quantitative social scientists (Feb-Mar 2026) reveals 81% adoption of generative AI for research, with coding agents like Claude Code enabling autonomous research execution: automating data analysis, interpretation, and iteration, tasks previously considered irreducibly human. The research explores disparities in tool access, differences in output quality, and potential impacts on the scholarly record, with randomized experiments planned to measure productivity effects.
A researcher training GPT-like Transformer decoder models (100M-500M parameters) on 750M tokens reports a common failure mode in which the model gets stuck generating a single token repeatedly, suggesting a training-dynamics issue. The post lists detailed hyperparameters (AdamW optimizer, 1e-3 learning rate, 4M-token batch size) and asks whether training decoder-only models requires specific tricks or has undocumented failure modes.
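A quick back-of-envelope check on the numbers quoted in the post can help frame the question. The sketch below assumes a single pass over the data (the number of epochs is not stated in the summary) and treats the batch size as tokens per optimizer step; the function names are illustrative only.

```python
# Back-of-envelope numbers taken from the post's stated setup.
TOTAL_TOKENS = 750_000_000                   # dataset size
BATCH_TOKENS = 4_000_000                     # tokens per optimizer step
PARAM_COUNTS = (100_000_000, 500_000_000)    # smallest and largest model


def steps_per_epoch(total_tokens: int, batch_tokens: int) -> int:
    """Number of full optimizer steps in one pass over the data."""
    return total_tokens // batch_tokens


def tokens_per_parameter(total_tokens: int, n_params: int) -> float:
    """Training tokens available per model parameter."""
    return total_tokens / n_params


steps = steps_per_epoch(TOTAL_TOKENS, BATCH_TOKENS)
ratios = [tokens_per_parameter(TOTAL_TOKENS, n) for n in PARAM_COUNTS]

print(f"optimizer steps per epoch: {steps}")
for n, r in zip(PARAM_COUNTS, ratios):
    print(f"{n / 1e6:.0f}M params: {r:.1f} tokens per parameter")
```

If the run really is a single epoch, the optimizer takes fewer than 200 updates in total, which may be worth ruling out as a factor before looking for decoder-specific tricks.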
This project applies diffusion models to sketch-guided trajectory simulation in basketball, enabling controllable generation of player movements conditioned on partial instructions. The approach uses joint refinement of all trajectories through diffusion rather than autoregressive methods, with open-sourced code and models demonstrating a practical application of conditional generation for sports analytics.
This article covers a merged 31B-parameter model (Gemma-4-Harmonia) with practical integration guides for Transformers, vLLM, and SGLang, along with MMLU benchmark results showing 84.55% accuracy. While the technical details on model merging and quantization are useful, the content focuses heavily on a niche fine-tuned variant rather than on core workflows or breakthrough capabilities.
CVE-2026-48710 (BadHost) is a critical vulnerability in Starlette that affects FastAPI, vLLM, LiteLLM, and MCP servers—allowing HTTP Host header injection to bypass authentication. AI engineers building agents and services must immediately upgrade Starlette to version 1.0.1+ and audit any systems using these frameworks, as credentials stored in MCP servers are particularly at risk.
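As a starting point for the audit, a small script can report whether the Starlette installed in a given environment meets the 1.0.1 minimum quoted above. This is a minimal sketch: the helper names are hypothetical, the version parsing is deliberately simple (it only compares the first three numeric components), and it says nothing about whether a deployment is actually exploitable.

```python
from importlib import metadata

PATCHED = (1, 0, 1)  # first fixed release according to the summary above


def parse_version(text: str) -> tuple:
    """Turn '0.46.2' into (0, 46, 2).

    Simplification: suffixes such as 'rc1' or '.post1' are ignored, so a
    pre-release of the patched version would be reported as patched.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version_text: str, minimum: tuple = PATCHED) -> bool:
    """True if the version is at or above the patched release."""
    return parse_version(version_text) >= minimum


def installed_starlette_status() -> str:
    """Describe the Starlette version found in the current environment."""
    try:
        version = metadata.version("starlette")
    except metadata.PackageNotFoundError:
        return "starlette is not installed in this environment"
    state = "patched" if is_patched(version) else "needs upgrade"
    return f"starlette {version}: {state}"


print(installed_starlette_status())
```

Because FastAPI and similar frameworks pull in Starlette as a dependency, the check is most useful when run inside each service's own virtual environment or container image.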
SQLite has published AGENTS.md documenting their policy on AI agents interacting with their codebase—they reject agentic code contributions but accept high-quality AI-generated bug reports with reproducible test cases. This reflects practical workflow considerations for engineers using AI agents in development, including how open-source projects are adapting policies around AI-generated contributions.
Open-source Context Swarm Memory (CSM) system benchmarked against Hindsight on BEAM 100K, scoring 0.757573 on AMB versus Hindsight's 0.733658 while using 38.2% fewer context tokens, at the cost of 4.5x slower retrieval. The author seeks methodology feedback before pursuing official leaderboard validation.
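To put the reported trade-off in proportion, the figures from the summary can be converted into relative terms. The snippet below only rearranges the quoted numbers; it assumes the scores are directly comparable, which is exactly what the requested methodology review would need to confirm.

```python
# Figures quoted in the summary above.
CSM_SCORE = 0.757573
HINDSIGHT_SCORE = 0.733658
TOKEN_REDUCTION = 0.382      # CSM uses 38.2% fewer context tokens
RETRIEVAL_SLOWDOWN = 4.5     # CSM retrieval takes 4.5x as long

absolute_gain = CSM_SCORE - HINDSIGHT_SCORE
relative_gain = absolute_gain / HINDSIGHT_SCORE
token_fraction = 1.0 - TOKEN_REDUCTION

print(f"absolute score gain: {absolute_gain:.6f}")
print(f"relative score gain: {relative_gain:.2%}")
print(f"context tokens used relative to Hindsight: {token_fraction:.1%}")
print(f"retrieval time relative to Hindsight: {RETRIEVAL_SLOWDOWN:.1f}x")
```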
TritonMoE is a portable MoE inference kernel written in Triton that achieves 89-131% of Megablocks throughput while running unchanged on both NVIDIA and AMD GPUs. The key optimization uses fused gate+up GEMM operations to reduce global memory traffic by 35%, though performance degrades at very long sequences (2048+ tokens) and under extreme routing skew.
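The idea behind fusing the gate and up projections can be illustrated without Triton. The sketch below is plain Python on nested lists, not the TritonMoE kernel: it assumes a SwiGLU-style expert (SiLU applied to the gate, multiplied by the up projection), which the summary does not confirm, and simply shows that one GEMM against the concatenated weights gives the same result as two separate GEMMs while reading the input activations once.

```python
import math
import random


def matmul(a, b):
    """Naive (rows x inner) @ (inner x cols) product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def ffn_separate(x, w_gate, w_up):
    """Two GEMMs: the input x is read once per projection."""
    gate = matmul(x, w_gate)
    up = matmul(x, w_up)
    return [[silu(g) * u for g, u in zip(g_row, u_row)]
            for g_row, u_row in zip(gate, up)]


def ffn_fused(x, w_gate, w_up):
    """One GEMM against [w_gate | w_up], then split the output columns.

    A real kernel would store the concatenated weights once at load time
    instead of rebuilding them on every call.
    """
    hidden = len(w_gate[0])
    w_cat = [g_row + u_row for g_row, u_row in zip(w_gate, w_up)]
    both = matmul(x, w_cat)
    return [[silu(row[j]) * row[hidden + j] for j in range(hidden)] for row in both]


random.seed(0)
tokens, d_model, d_hidden = 4, 8, 16
x = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(tokens)]
w_gate = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_model)]
w_up = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_model)]

separate = ffn_separate(x, w_gate, w_up)
fused = ffn_fused(x, w_gate, w_up)
max_diff = max(abs(a - b) for r1, r2 in zip(separate, fused) for a, b in zip(r1, r2))
print(f"max difference between separate and fused outputs: {max_diff:.3e}")
```

On a GPU the benefit comes from memory traffic rather than arithmetic: the fused form loads each activation tile once and can apply the activation before writing the result back.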
Open-source UK GDPR compliance QA dataset (1K pairs) with SME-focused questions, detailed answers linked to specific articles/ICO guidance, and generation metadata. Generated via Qwen 14B + DeepSeek API, released in JSON/Parquet with MIT license—directly applicable for fine-tuning compliance assistants or building RAG systems for privacy tools.
BioHub released ESMFold 2, a transformer-based protein structure prediction engine that achieves state-of-the-art performance on protein interactions and antibody design by scaling simple BERT-like models on diverse protein sequence data rather than using specialized architectures like AlphaFold3. The release includes an atlas of 6.8 billion predicted protein structures and demonstrates that inference-time scaling works across multiple biological targets, representing a significant shift toward general-purpose foundation models in structural biology.
A systems-focused writeup on building self-improving AI agent harnesses for benchmark tasks, exploring the challenge of safely compounding agent-proposed improvements and parallels to coding-agent customization patterns. The author shares both successful and failed approaches to implementing continuous self-improvement loops, offering practical insights for engineers building autonomous improvement systems.
NVIDIA's SOL-ExecBench revealed critical issues in AI-generated CUDA kernels that passed the benchmark verifier but failed when deployed in production training loops. A detailed case study of a fused embedding-gradient + RMSNorm kernel demonstrates how bf16 accumulation bugs can cause training divergence that masquerades as a research failure, with practical debugging insights for transformer training implementations.
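Why accumulating in bf16 is dangerous can be shown with a toy example. The sketch below is not the kernel from the case study: it emulates bfloat16 rounding in pure Python (finite values only) and sums many equal contributions, the way an embedding-gradient kernel might add up gradients for a frequently used token. The helper names are illustrative.

```python
import struct


def to_bf16(value: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)   # round-to-nearest-even on the low 16 bits
    bits &= 0xFFFF0000                    # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_fp32(value: float) -> float:
    """Round a float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def accumulate(values, round_fn):
    """Sum values, rounding the running total after every addition."""
    total = 0.0
    for v in values:
        total = round_fn(total + round_fn(v))
    return total


grads = [1.0] * 1000                      # 1000 identical gradient contributions
bf16_total = accumulate(grads, to_bf16)
fp32_total = accumulate(grads, to_fp32)

print(f"bf16 accumulator: {bf16_total}")
print(f"fp32 accumulator: {fp32_total}")
```

With an 8-bit significand, the bf16 accumulator stops at 256.0: adding 1.0 falls exactly halfway between representable values and rounds back down, so every later contribution is silently dropped, while the fp32 accumulator reaches 1000.0. A verifier that only checks small inputs would not see the difference.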
noisekit is an open-source tool that generates realistic degraded audio datasets from clean annotated speech data, enabling accurate STT vendor benchmarking under production conditions (phone noise, codecs, reverb). It fills a critical gap for voice agent builders by providing WER-measurable datasets that approximate real-world phone call audio rather than relying on clean studio recordings.
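For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the STT output, divided by the number of reference words. The sketch below is a generic implementation, not noisekit's API, and its normalization (lowercasing and whitespace splitting) is an assumption that real benchmarks usually refine.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


clean = word_error_rate("please transfer fifty dollars",
                        "please transfer fifty dollars")
noisy = word_error_rate("please transfer fifty dollars",
                        "please transfer fifteen dollars")
print(f"clean audio WER: {clean:.2f}")
print(f"degraded audio WER: {noisy:.2f}")
```

Running the same reference transcripts through a vendor on both the clean and the degraded audio gives a per-condition WER gap, which is the comparison a degraded dataset is meant to enable.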
NeuroFlow is a training-free dynamic routing framework for Vision Transformers that reports a 55.8× wall-clock speedup on high-resolution video inference by eliminating redundant tokens via semantic surprise tracking in embedding space. The approach uses a dual-memory architecture with retinal gating and cortical caching to maintain 97%+ fidelity at extreme sparsity (84% token reduction), with code and paper publicly available.
Cross-species neuroscience study comparing learning rules (BP, FA, PC, STDP) across human fMRI and macaque electrophysiology (V1/V2/V4/IT), finding that early visual alignment is conserved but IT alignment scales with model capacity rather than learning rule. Includes careful controls for stimulus confounds and capacity baselines, with code and companion papers provided.