Spent two days of driving time listening to this hour-and-a-half Latent Space interview with the legendary Bret Taylor. Got a lot out of it! #podcast #ai
https://www.latent.space/p/bret
www.latent.space
The AI Architect — Bret Taylor
The legendary CEO of Sierra, Chairman of OpenAI, and creator of Google Maps/Facebook Likes on the future of software engineering and building great products and teams at the dawn of AGI.
https://jax-ml.github.io/scaling-book/ A share well worth studying. The author list includes several people from the Gemini core team 😃 Sholto, Jacob, Sharad, and others are all top-tier research engineers 🙏 #llm
https://huggingface.co/spaces/nanotron/ultrascale-playbook
Hugging Face released a playbook on scaling LLM training on GPUs, which should be more broadly applicable than DeepMind's TPU-focused scaling book. #llm
huggingface.co
The Ultra-Scale Playbook - a Hugging Face Space by nanotron
The ultimate guide to training LLMs on large GPU clusters
YouTube
Anyma - Hypnotized (feat. Ellie Goulding) [Live from Sphere Las Vegas]
Ellie Goulding and Anyma perform “Hypnotized” live from Sphere Las Vegas.
Listen to “Hypnotized (feat. Ellie Goulding)” now: https://anyma-ellie.lnk.to/hypnotized
A while back, while preparing for ML interviews (with a focus on LLMs), I went through quite a few learning resources. Sharing some here:
CMU 11-711 Advanced NLP
A survey of language modeling.
The Transformer Blueprint: A Holistic Guide to the Transformer Neural Network Architecture
A pretty good Transformer survey.
3Blue1Brown: Attention in transformers, step-by-step
The best video explaining attention, bar none (see the formula at the end of this post).
Hugging Face: Mixture of Experts Explained
Hugging Face: RLHF
Hugging Face: Introduction to Deep Reinforcement Learning
Hugging Face: Multimodal Models
These HF resources are great for quickly filling gaps on the respective topics.
Lilian Weng: Agents
Still one of the best surveys of agents.
Understanding Reasoning LLMs
Some post-training details, with a focus on DeepSeek R1 and R1 Zero.
Notes on Designing Machine Learning Systems by @tms_ur_way
Good for quickly reviewing the key points of ML practice.
Stable Diffusion Explained From Scratch
An explanation of the fundamentals of diffusion.
Beyond these, the following people's content is consistently good; sample it selectively by topic.
- Andrej Karpathy's YouTube videos
- Lilian Weng's blog
- Chip Huyen's blog
The recommendations here are mostly introductory / high level, meant for filling gaps. To dig deep into a specific topic, you'll still need to go to further resources, papers, and so on. #ml #llm
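As promised above, here is the scaled dot-product attention that the 3Blue1Brown video walks through, in its standard form from the Transformer paper (included for quick reference, not quoted from the video itself):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, $V$ are the query, key, and value matrices and $d_k$ is the key dimension; the $\sqrt{d_k}$ scaling keeps the logits from saturating the softmax.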
Attended a ClickHouse meetup at the Netflix campus. As a showcase, their CTO built a slick aircraft-trajectory visualization site on top of ADS-B data. There's a lot of detail in it, including interesting patterns and implementation notes; worth a look.
https://github.com/ClickHouse/adsb.exposed
GitHub
GitHub - ClickHouse/adsb.exposed: Interactive visualization and analytics on ADS-B data with ClickHouse
Interactive visualization and analytics on ADS-B data with ClickHouse - ClickHouse/adsb.exposed
Pretty entertaining classic murder mystery set in the White House
https://www.imdb.com/title/tt8740614/
IMDb
The Residence (TV Mini Series 2025) ⭐ 7.8 | Comedy, Crime, Drama
50m | TV-MA
https://store.steampowered.com/app/2394650/Crypt_Custodian/
🎮 Yet another metroidvania. The controls feel great and the game is adorable. #game
Steampowered
Crypt Custodian on Steam
Crypt Custodian is a charming metroidvania about cleaning up the afterlife. Play as Pluto - a mischievous cat who has died, and is sentenced to be the afterworld's janitor... FOREVER! Hang out with other doomed ghosts, battle beasts, and explore a vastly…
An easy-to-follow intro to Zero Knowledge Proofs: https://youtu.be/Otvcbw6k4eo
YouTube
I can prove I’ve solved this Sudoku without revealing it
I can convince you that I’ve solved a sudoku without giving you any information about my solution. We discuss how to do this using what cryptographers call a zero-knowledge proof, and how the same tricks…
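The core trick is compact enough to sketch in code. Below is a toy Python illustration of the commitment idea the video describes, written as my own sketch under the usual simplifications (salted SHA-256 hashes as commitments, a 4x4 grid instead of 9x9, a single challenge round); it illustrates the idea and is not real cryptography.

```python
import hashlib
import os
import random

# Toy sketch of the sudoku zero-knowledge proof: salted hashes stand in
# for commitments, a 4x4 grid stands in for 9x9, one challenge round.

SOLUTION = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]

def commit(value: int):
    """Commit to a cell value; return (digest, salt) so it can be opened later."""
    salt = os.urandom(16)
    return hashlib.sha256(salt + bytes([value])).digest(), salt

# Prover: relabel the digits with a fresh random permutation, then commit
# to every cell. The relabeling is what hides the actual solution.
perm = dict(zip([1, 2, 3, 4], random.sample([1, 2, 3, 4], 4)))
relabeled = [[perm[v] for v in row] for row in SOLUTION]
committed = [[commit(v) for v in row] for row in relabeled]

# Verifier: challenge one random row. (A full protocol also challenges
# columns, boxes, and the given clues, over many rounds, with a fresh
# permutation each round.)
row = random.randrange(4)

# Prover opens only the challenged row; verifier checks every opening
# against its commitment and that the row is a permutation of {1,2,3,4}.
opened = [(relabeled[row][c], committed[row][c][1]) for c in range(4)]
hashes_ok = all(
    hashlib.sha256(salt + bytes([v])).digest() == committed[row][c][0]
    for c, (v, salt) in enumerate(opened)
)
print("row", row, "verified:", hashes_ok and sorted(v for v, _ in opened) == [1, 2, 3, 4])
```

Because every round uses a fresh permutation, the verifier only ever sees a random relabeling of the challenged unit, so it learns nothing about the underlying solution, while a cheating prover gets caught with growing probability as rounds accumulate.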
Forwarded from C’s Random Collection
https://ai-2027.com “We predict that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.” Regardless of how it pans out, the interactions on this page are excellent. #ai
Ai-2027
AI 2027
A research-backed AI scenario forecast.
Found a really useful Obsidian plugin: https://github.com/RyotaUshio/obsidian-pdf-plus
It uses backlinks to let you annotate and take notes on PDFs without ever leaving Obsidian, and the notes can be spread across multiple files. The design is very Obsidian-native.
#obsidian
A really good and concise deep dive into RLHF in LLM post-training, Proximal Policy Optimization (PPO), and Group Relative Policy Optimization (GRPO)
https://yugeten.github.io/posts/2025/01/ppogrpo/
#llm
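For quick reference, the two objectives the post centers on, in their standard forms from the PPO and DeepSeekMath papers (paraphrased here, not quoted from the post). PPO maximizes a clipped surrogate over the probability ratio $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_\mathrm{old}}(a_t \mid s_t)$:

$$\mathcal{L}_{\mathrm{PPO}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\, A_t,\ \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) A_t\right)\right]$$

GRPO drops the learned value function: it samples a group of $G$ responses per prompt, scores each with the reward model, and uses the group-normalized reward as the advantage:

$$A_i = \frac{r_i - \mathrm{mean}(r_1, \ldots, r_G)}{\mathrm{std}(r_1, \ldots, r_G)}$$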
https://www.anthropic.com/research/tracing-thoughts-language-model
This LLM interpretability research from Anthropic reaches quite a few interesting conclusions. For a TL;DR, read the blog post; if you're interested, the two corresponding papers have more detail, and their page interactions are nicely done. #llm
https://transformer-circuits.pub/2025/attribution-graphs/biology.html
https://transformer-circuits.pub/2025/attribution-graphs/methods.html
Anthropic
Tracing the thoughts of a large language model
Anthropic's latest interpretability research: a new microscope to understand Claude's internal mechanisms
https://newsletter.pragmaticengineer.com/p/the-philosophy-of-software-design
John Ousterhout, author of A Philosophy of Software Design, joins The Pragmatic Engineer. #podcast #software_design
Pragmaticengineer
The Philosophy of Software Design – with John Ousterhout
Stanford professor John Ousterhout explains why thoughtful software design matters more than ever as AI tools transform coding practices and developer workflows.
https://arxiv.org/abs/2305.18290 #llm #ai
Spent today digging into DPO, and once again marveled at how much a solid math foundation matters for AI/ML research...
The original RLHF recipe uses pairwise human preference data (which of A and B is better?) to train a reward model by minimizing negative log likelihood, then trains the main policy model with RL against that reward plus regularization (PPO, for instance, regularizes via the KL divergence between the new and old policies). The downside is that RL is notoriously hard to get right, and you also need a critic model to predict the reward, so the whole system ends up quite complex.
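In symbols, the RL stage solves the standard KL-regularized reward maximization (as written in the DPO paper):

$$\max_{\pi_\theta}\ \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(y \mid x)}\left[r_\phi(x, y)\right] - \beta\, \mathbb{D}_{\mathrm{KL}}\!\left[\pi_\theta(y \mid x)\ \|\ \pi_{\mathrm{ref}}(y \mid x)\right]$$

where $\pi_{\mathrm{ref}}$ is the frozen reference (SFT) model and $\beta$ controls how far the policy may drift from it.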
DPO's insight: the RLHF objective is essentially a loss minimized over a (latent) reward function. Through a reparameterization and some algebra, it can be rewritten as a loss minimized directly over the policy, bypassing the intermediate reward model entirely; the gradient update then directly raises the policy's probability of the winning response and lowers that of the losing one, greatly simplifying the pipeline.
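The resulting objective from the paper, with $\sigma$ the logistic function and $(y_w, y_l)$ the preferred and dispreferred responses:

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

A single supervised-looking loss over preference pairs, which is exactly why the RL machinery and the separate reward model can be dropped.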
Further reading:
- KTO: goes a step further; no pairwise comparisons needed, as preferences can be learned from upvotes/downvotes on individual examples.
- IPO: addresses DPO's tendency to overfit.
arXiv.org
Direct Preference Optimization: Your Language Model is Secretly a...
While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely...