AI Radar
Real-time signals, quick thoughts, and curated picks from the AI frontier.
Karpathy's Autoresearch: AI Agents Running ML Experiments Overnight
Karpathy pushed 630 lines of Python to GitHub and went to sleep. By morning, his AI agent had run 50 experiments and committed the results to git. No human input in between.
The tool is called autoresearch — and it's now the most viral open-source AI project of the month.
The setup is almost too simple. You write research instructions in a Markdown file. An AI agent reads it, modifies a training script, runs a 5-minute experiment on a single GPU, checks if validation loss improved, keeps or reverts, and repeats. 12 experiments per hour. ~100 overnight.
After 2 days and 700 experiments: 20 genuine improvements that stacked, an 11% training speedup on code he thought was already optimized, and a bug in his own attention implementation he'd missed for months. The agent caught it.
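The core of that loop is a greedy keep-or-revert search over code edits. Here's a minimal sketch of the idea — the function names and callbacks are illustrative, not the actual autoresearch internals, and the real tool delegates the "propose an edit" step to an AI agent:

```python
def autoresearch_loop(propose_edit, run_experiment, keep, revert, n=100):
    """Greedy keep-or-revert search: try an edit, measure validation
    loss, and keep the edit only if the loss improved.

    propose_edit   -- agent rewrites the training script (no-op stub here)
    run_experiment -- runs a short training job, returns validation loss
    keep / revert  -- e.g. `git commit` on improvement, `git checkout` otherwise
    """
    best = run_experiment()          # baseline loss before any edits
    for _ in range(n):
        propose_edit()               # agent modifies the training script
        loss = run_experiment()      # ~5-minute run on a single GPU
        if loss < best:
            best = loss
            keep(loss)               # improvement: commit it
        else:
            revert()                 # regression: roll the edit back
    return best
```

Because improvements are only ever kept when validation loss drops, the accepted edits stack — which is how 700 experiments can distill down to 20 genuine wins.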
Claude Code is changing how I build
Been using Claude Code for a week now and it's genuinely changing my workflow. The agentic loop + file editing combo means I can describe architecture-level changes and watch them materialize. Not perfect, but the iteration speed is wild.
Liquid AI's LFM2.5: Full AI Models Running in Your Browser
A 1.2B parameter model just ran chain-of-thought reasoning in my browser tab. No API. No server. No bill.
Liquid AI dropped LFM2.5 and the WebGPU demos are wild. Vision model does real-time webcam captioning, fully client-side. Thinking model runs chain-of-thought reasoning in a browser tab in 0.28 seconds.
Beats Llama 3.2 1B on GPQA, MMLU Pro, and IFEval benchmarks. Static deployment ships as HTML/JS/WASM — host on any CDN with zero inference cost.
This is what edge AI was supposed to look like.
Google Quietly Built the Most Complete Agentic AI Ecosystem
Google quietly built the most complete agentic AI ecosystem in the industry. And nobody's talking about the full picture.
Models + Tools + Frameworks + Protocols. Gemini 2.5 Pro, ADK (Agent Development Kit), A2A protocol. 750M+ Gemini users, 18.3K ADK stars, 150+ A2A partners.
While everyone's focused on individual model benchmarks, Google assembled the full stack for agentic AI: the models, the developer tools, the communication protocols, and the distribution.
RAG evaluation is still an unsolved problem
Every team I talk to is building RAG. Almost none of them have good evaluation. We default to vibes-based testing — "does this answer look right?" — and call it done. The gap between building RAG and knowing if it works is massive.
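Even a tiny labeled eval set beats vibes. One step up is measuring retrieval quality directly with hit rate and MRR — a minimal sketch, assuming you can supply your own retriever and a handful of (query, relevant document) pairs:

```python
def retrieval_metrics(eval_set, retrieve, k=5):
    """Hit rate@k and mean reciprocal rank over a labeled eval set.

    eval_set -- list of (query, relevant_doc_id) pairs
    retrieve -- your retriever: query -> ranked list of doc ids
    """
    hits, rr = 0, 0.0
    for query, relevant_id in eval_set:
        ranked = retrieve(query)[:k]          # top-k retrieved doc ids
        if relevant_id in ranked:
            hits += 1
            rr += 1.0 / (ranked.index(relevant_id) + 1)  # rank is 1-based
    n = len(eval_set)
    return {"hit_rate": hits / n, "mrr": rr / n}
```

It says nothing about answer quality, but it catches the most common failure — the right document never being retrieved at all — with a few dozen labeled examples.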
How Alibaba's Qwen Became #1 in Open-Source AI
Alibaba's Qwen just became the #1 open-source AI family on the planet. 700M+ downloads. 90,000+ derivative models. The top 4 spots on HuggingFace. All in 3 years.
From a small Alibaba experiment to dominating every global leaderboard — this is one of the most underrated stories in AI right now.
Cloudflare Just Launched a Web Crawling API
The company that built its reputation blocking bots just launched a web crawling API.
Cloudflare quietly dropped a /crawl endpoint this week. One API call, and you get clean, structured content from any URL. The irony is beautiful — and the implications for RAG pipelines are massive.
If you're building any kind of retrieval system, this changes the data ingestion game completely.
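The ingestion side might look something like this. To be clear about what's assumed: the post only mentions a /crawl endpoint, so the URL path, auth header, and response shape below are all hypothetical — only the chunking step is generic:

```python
import json
import urllib.request

def crawl(url: str, api_token: str) -> dict:
    """Fetch structured page content from a crawl API.
    Hypothetical: endpoint path and request/response fields are assumed,
    not taken from Cloudflare's actual documentation."""
    req = urllib.request.Request(
        "https://api.cloudflare.com/client/v4/crawl",  # assumed path
        data=json.dumps({"url": url}).encode(),
        headers={"Authorization": f"Bearer {api_token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split crawled text into overlapping chunks ready for embedding."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```

Clean crawl output plus a chunker is most of a RAG ingestion pipeline; the rest is embedding the chunks and writing them to a vector store.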
The State of Local LLM Inference
Every local LLM user has done this. Download a model. Wait 20 minutes. Launch it. Watch it crawl at 3 tokens per second — or not load at all.
The gap between cloud inference and local inference is still massive. But tools like llmfit are starting to close it — optimizing models for your specific hardware, quantization level, and memory constraints.
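The core of that "does it fit?" question is simple arithmetic that tools like llmfit presumably automate. A back-of-envelope sketch — the bits-per-weight figures are rough approximations, and it deliberately ignores KV-cache growth with context length:

```python
def pick_quantization(n_params_b: float, vram_gb: float,
                      overhead_gb: float = 1.5):
    """Return the highest-precision quantization whose weights fit in VRAM.

    n_params_b  -- model size in billions of parameters
    vram_gb     -- available GPU memory
    overhead_gb -- rough allowance for runtime buffers and KV cache
    """
    # (name, approximate bits per weight), highest precision first
    options = [("fp16", 16), ("q8", 8), ("q5_k", 5.5), ("q4_k", 4.5)]
    for name, bits in options:
        weight_gb = n_params_b * bits / 8   # billions of params * bytes/weight
        if weight_gb + overhead_gb <= vram_gb:
            return name
    return None  # won't fit even at ~4-bit
```

On an 8 GB card, a 7B model lands around 5-bit; on 24 GB, fp16 fits. The real value of a fitting tool is doing this per-device, per-quant-format, and accounting for context length instead of a flat overhead guess.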
The future of local AI isn't just about smaller models; it's about smarter deployment.