r/LocalLLaMA · 45d ago · 9 · new model open source agent deployment benchmark

MiniMax-M2.7 is a new open-source model with strong programming and agent capabilities, featuring self-evolving optimization during training and native multi-agent collaboration support. It posts high scores on code tasks (SWE-Pro 56.22%, Terminal Bench 57.0%) and on system-level reasoning for SRE work, scores competitively against GPT-5.3 and Claude variants, and can be deployed via SGLang, vLLM, and Transformers.

Simon Willison · 45d ago · 5 · tool open source

The SQLite 3.53.0 release improves result formatting through a new Query Results Formatter library, accompanied by a WebAssembly playground built using Claude Code. While SQLite is foundational infrastructure, this release focuses on general database improvements rather than AI-specific tooling or capabilities.

GitHub Trending AI · 46d ago · 6 · tool deployment open source

Hermes Web UI is a full-featured dashboard for managing Hermes Agent instances, offering session management, multi-platform channel configuration (8 platforms, including Telegram, Discord, and Slack), usage monitoring, and skill browsing. Built with a Vue 3 + TypeScript frontend and a Koa 2 backend, it provides unified control over AI chat deployments, with features such as SSE streaming, credential management, and web terminal access.

Latent Space · 46d ago · 7 · new model agent workflow inference

GLM-5.1 reaches top-tier coding performance (#3 on Code Arena), while the 'cheap executor + expensive advisor' pattern, in which a low-cost model does most of the work and consults a stronger model only when needed, is emerging as a standard orchestration approach for reducing inference costs. Notable examples include Anthropic's API-level advisor tools, research from Berkeley, and new agent engineering primitives in Qwen Code (v0.14.x) such as model routing and sub-agent selection.
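
To make the executor/advisor idea concrete, here is a minimal sketch under stated assumptions: `executor` and `advisor` are hypothetical plain callables standing in for real API clients, and the executor is assumed to self-report a confidence score. It is not Anthropic's advisor tool API or Qwen Code's implementation; the threshold and consult cap are illustrative values.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Draft:
    text: str
    confidence: float  # hypothetical self-reported confidence in [0, 1]

# Hypothetical model interfaces: callables in place of real API clients.
Executor = Callable[[str, Optional[str]], Draft]
Advisor = Callable[[str, Draft], str]

def run_with_advisor(task: str, executor: Executor, advisor: Advisor,
                     threshold: float = 0.7, max_consults: int = 2) -> tuple[Draft, int]:
    """Let the cheap executor do the work; call the expensive advisor only
    while the executor's confidence stays below `threshold`."""
    advice: Optional[str] = None
    draft = executor(task, advice)
    consults = 0
    while draft.confidence < threshold and consults < max_consults:
        advice = advisor(task, draft)   # expensive call, made sparingly
        consults += 1
        draft = executor(task, advice)  # executor revises using the advice
    return draft, consults

# Toy stand-ins: the executor is only confident once it has received advice.
def toy_executor(task: str, advice: Optional[str]) -> Draft:
    suffix = f" ({advice})" if advice else ""
    return Draft(f"answer to {task}{suffix}", 0.9 if advice else 0.4)

def toy_advisor(task: str, draft: Draft) -> str:
    return "check edge cases"

draft, n = run_with_advisor("fix bug", toy_executor, toy_advisor)
print(n, draft.text)  # 1 answer to fix bug (check edge cases)
```

Capping consultations bounds the worst-case cost per task; up-front model routing (picking a model per sub-task before any work starts) is the complementary primitive.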

Simon Willison · 47d ago · 6 · api update benchmark workflow

Technical analysis of the capability gap between OpenAI's voice mode (a GPT-4o-era model with an April 2024 knowledge cutoff) and its advanced reasoning models, highlighting how different access points expose very different model capabilities. References Andrej Karpathy's observation on the disconnect between consumer-facing voice interfaces and specialized paid models that excel at code analysis and complex reasoning.

GitHub Trending AI · 47d ago · 6 · tool agent workflow

SkillClaw is a system for managing and evolving LLM agent skill libraries through automatic deduplication, improvement, and cross-session knowledge sharing. It consists of a local client proxy that intercepts agent requests and records skills, plus an optional evolution server for automatic skill optimization and team-wide sharing across multiple agents/devices.
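
SkillClaw's actual deduplication logic is not described here, so the following sketch only illustrates the simplest form of the idea: exact-match deduplication on normalized skill text, plus merging libraries recorded on different devices. `SkillLibrary`, `record`, and `merge` are hypothetical names, and a real system would likely need semantic (e.g. embedding-based) matching to catch paraphrased duplicates.

```python
import hashlib
import re
from dataclasses import dataclass, field

def _key(instructions: str) -> str:
    # Normalize case and whitespace so trivially different copies collide.
    norm = re.sub(r"\s+", " ", instructions.strip().lower())
    return hashlib.sha256(norm.encode("utf-8")).hexdigest()

@dataclass
class Skill:
    name: str
    instructions: str
    uses: int = 1

@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)

    def record(self, name: str, instructions: str) -> Skill:
        """Record a skill seen in an agent session; a normalized repeat
        bumps the existing entry's usage counter instead of adding a copy."""
        k = _key(instructions)
        if k in self.skills:
            self.skills[k].uses += 1
        else:
            self.skills[k] = Skill(name, instructions)
        return self.skills[k]

    def merge(self, other: "SkillLibrary") -> None:
        """Fold another library (e.g. from a teammate's device) into this one."""
        for k, s in other.skills.items():
            if k in self.skills:
                self.skills[k].uses += s.uses
            else:
                self.skills[k] = Skill(s.name, s.instructions, s.uses)

lib = SkillLibrary()
lib.record("run-tests", "Run   pytest -q before committing")
lib.record("tests", "run pytest -q before committing")
print(len(lib.skills))  # 1
```

Hashing normalized text keeps lookups cheap on the client side; heavier optimization (rewriting or merging near-duplicates) would fit the optional server role the summary describes.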

GitHub Trending AI · 47d ago · 8 · tutorial open source fine tuning agent rag

Open-source hands-on reinforcement learning curriculum covering classical control through cutting-edge applications including LLM post-training, DPO/GRPO alignment, RLVR, and multimodal agentic RL. Emphasizes practical code-first learning with runnable examples organized by chapter, directly applicable to building production RL systems and LLM fine-tuning pipelines.
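
As a hedged illustration of two objectives the curriculum names (not code from the curriculum itself), the sketch below computes GRPO-style group-relative advantages, where each sampled response's reward is normalized against the other responses to the same prompt instead of using a learned critic, and the per-pair DPO loss. Plain floats stand in for rewards and summed log-probabilities; the function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """-log(sigmoid(margin)) for one preference pair, where the margin compares
    policy-vs-reference log-prob gains on the chosen and rejected responses."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])  # binary, verifier-style (RLVR) rewards
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
print(round(dpo_loss(-10.0, -12.0, -11.0, -11.0), 4))  # 0.5981
```

When every response in a group gets the same reward, all advantages are zero and that prompt contributes no gradient, which is why GRPO-style setups usually care about prompts of intermediate difficulty.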

OpenAI Blog · 47d ago · 6 · workflow prompt engineering tutorial

Guide on creating ChatGPT Skills to build reusable workflows and automate tasks through custom instructions and configurations. Covers practical approaches to ensuring consistent outputs; relevant for engineers looking to operationalize LLM-based automation in their workflows.

OpenAI Blog · 47d ago · 6 · workflow tutorial

Guide on using ChatGPT's file upload capabilities for document analysis, summarization, and content generation across various file formats. Covers practical workflows for processing PDFs, spreadsheets, and other documents through the ChatGPT interface.

OpenAI Blog · 47d ago · 6 · prompt engineering tutorial

A guide to fundamental prompting techniques for ChatGPT, covering strategies to write clearer prompts and extract more useful outputs. Relevant for engineers who use LLMs regularly, though it likely covers well-established practices rather than novel methods.

OpenAI Blog · 47d ago · 7 · tutorial workflow prompt engineering

Practical guide on building custom GPTs for workflow automation and maintaining consistent outputs through purpose-built AI assistants. Covers the technical process of creating and deploying specialized GPT configurations for specific use cases.

OpenAI Blog · 47d ago · 6 · workflow tutorial

ChatGPT's Projects feature enables organizing related conversations, files, and custom instructions in a single workspace, improving workflow management and team collaboration. This is useful for engineers managing multiple AI-assisted tasks, though it's primarily a UI/UX feature rather than a technical capability advancement.

OpenAI Blog · 47d ago · 5 · prompt engineering

General guide on responsible AI usage covering safety, accuracy, and transparency practices for tools like ChatGPT. While useful for foundational understanding, it lacks specific technical implementations or novel engineering approaches that would directly affect daily development workflows.

OpenAI Blog · 47d ago · 6 · prompt engineering workflow tutorial

Guide on using ChatGPT's image generation capabilities (DALL-E integration), with practical techniques for prompt engineering and iterative refinement. Covers the workflow for creating visuals through the ChatGPT interface, useful for engineers building AI applications that need visual generation features.

OpenAI Blog · 47d ago · 6 · tutorial workflow prompt engineering

Guide on leveraging ChatGPT's search and deep research capabilities to find current information, evaluate source credibility, and organize findings into structured outputs. Practical for engineers building research-heavy applications or integrating search features into AI workflows.

OpenAI Blog · 47d ago · 6 · tutorial workflow

A practical guide on using ChatGPT for data analysis workflows, covering dataset exploration, insight generation, and visualization creation. While useful for engineers integrating AI into analytics pipelines, it's general-purpose instruction rather than a new tool or technical breakthrough.

OpenAI Blog · 47d ago · 6 · prompt engineering tool deployment

Resource compilation for deploying AI in financial services, covering prompt templates, GPT configurations, implementation guides, and security-focused tools. Relevant for engineers building compliant AI systems in regulated environments, though likely more business-oriented than a technical deep dive.

OpenAI Blog · 47d ago · 6 · tutorial workflow prompt engineering

A tutorial on leveraging ChatGPT as a research assistant for source gathering, information analysis, and citation management. Covers practical workflows for using LLMs to structure research tasks, though the specific techniques may be familiar to those already working with prompt engineering and RAG patterns.

OpenAI Blog · 47d ago · 5 · prompt engineering workflow tutorial

A guide on using ChatGPT as a writing assistant for content development through drafting, revision, and refinement workflows. While practical for daily writing tasks, it covers general LLM usage patterns rather than novel technical insights or advanced engineering techniques.