While preparing for ML interviews a while back (with a focus on LLMs), I went through quite a few learning resources. Sharing some of them here:
CMU 11-711 Advanced NLP
An overview of language modeling.
The Transformer Blueprint: A Holistic Guide to the Transformer Neural Network Architecture
A pretty good survey of the Transformer architecture.
3Blue1Brown: Attention in transformers, step-by-step
The single best video explaining attention, bar none.
Hugging Face: Mixture of Experts Explained
Hugging Face: RLHF
Hugging Face: Introduction to Deep Reinforcement Learning
Hugging Face: Multimodal Models
These Hugging Face resources are great for quickly filling in gaps on the relevant topics.
Lilian Weng: Agents
Still one of the best overviews of agents.
Understanding Reasoning LLMs
Covers some post-training details, with a focus on DeepSeek R1 and R1 Zero.
Notes on Designing Machine Learning Systems by @tms_ur_way
Good for quickly brushing up on the key points of ML practice.
Stable Diffusion Explained From Scratch
An explanation of the fundamentals of diffusion models.
Beyond these, content from the following people is consistently good; sample it selectively by topic.
- Andrej Karpathy's YouTube videos
- Lilian Weng's blog
- Chip Huyen's blog
The recommendations here are mostly introductory / high-level, meant for filling in gaps. To dig deep into a specific topic, you'll still need further resources, papers, and so on. #ml #llm
Went to a ClickHouse meetup at the Netflix campus, where their CTO, as a showcase, built a slick aircraft-trajectory visualization site on ADS-B data. It's packed with details, including interesting patterns and implementation notes; worth a look.
https://github.com/ClickHouse/adsb.exposed
Pretty entertaining classic murder mystery set in the White House: The Residence (TV mini series, 2025)
https://www.imdb.com/title/tt8740614/
https://store.steampowered.com/app/2394650/Crypt_Custodian/
🎮 Yet another metroidvania. The controls feel great and the game is adorable. #game
An easy-to-follow intro to zero-knowledge proofs: https://youtu.be/Otvcbw6k4eo
Forwarded from C’s Random Collection
https://ai-2027.com "We predict that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution." Either way, the interactive design of this page is superb. #ai
Found a really useful Obsidian plugin: https://github.com/RyotaUshio/obsidian-pdf-plus
It uses backlinks so you can annotate and take notes on PDFs without leaving Obsidian, and the notes can even be spread across multiple files; the design feels thoroughly Obsidian-native.
#obsidian
A really good and concise deep dive into RLHF in LLM post-training, Proximal Policy Optimization (PPO), and Group Relative Policy Optimization (GRPO)
https://yugeten.github.io/posts/2025/01/ppogrpo/
#llm
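As a taste of what the post covers: GRPO's central move is to drop PPO's learned critic and instead compute advantages by standardizing rewards within a group of responses sampled for the same prompt. A minimal sketch of that step, with function name and tensor shapes of my own choosing (not from the post):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # rewards: (num_prompts, G) scalar rewards for G sampled responses per prompt.
    # Standardizing within each group supplies the baseline that PPO would
    # otherwise get from a separate value/critic network.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)
```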
https://www.anthropic.com/research/tracing-thoughts-language-model
This Anthropic research on LLM interpretability turns up quite a few interesting conclusions. For the TL;DR, read this blog post; if you're interested, the two companion papers below have more detail, and the page interactions are nicely done. #llm
https://transformer-circuits.pub/2025/attribution-graphs/biology.html
https://transformer-circuits.pub/2025/attribution-graphs/methods.html
https://newsletter.pragmaticengineer.com/p/the-philosophy-of-software-design
John Ousterhout, author of A Philosophy of Software Design, joins The Pragmatic Engineer. #podcast #software_design
https://arxiv.org/abs/2305.18290 #llm #ai
Took a deep dive into DPO today, and was once again struck by how much a solid math foundation matters in AI/ML research...
The original RLHF recipe uses pairwise human preference data (which of A or B is better) to train a reward model, then trains the main policy model with RL; the objective minimizes negative log likelihood plus a regularization term (PPO, for example, regularizes via the KL divergence between the new and old policies). The downsides: RL is notoriously hard to get right, and you also need a critic model to estimate rewards, so the whole system becomes quite complex.
DPO's insight is that the RLHF objective is essentially a loss minimized over a (latent) reward function. Through a reparameterization and some mathematical derivation, it redesigns this into an objective minimized directly over the policy, bypassing the intermediate reward model: the gradient update directly raises the policy's probability of generating the winning response and lowers that of the losing one, which greatly simplifies the pipeline. (A minimal sketch of the resulting loss follows below.)
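For concreteness, here is the resulting DPO loss sketched in PyTorch. This is a minimal sketch under my own naming: inputs are per-example sequence log-probabilities (summed over response tokens) of the winner and loser responses under the trained policy and the frozen reference policy, and beta=0.1 is just an illustrative value.

```python
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # DPO: -log sigmoid(beta * [(log pi - log pi_ref) margin of winner
    # over loser]). No reward model and no RL loop: just a classifier-style
    # objective on preference pairs.
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()
```

Minimizing this pushes the policy's log-probability up on winners and down on losers relative to the reference, with beta playing the role of the KL-regularization strength from RLHF.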
Further reading:
- KTO: goes a step further; instead of pairwise comparisons, it can learn preferences from upvotes/downvotes on individual examples.
- IPO: addresses DPO's tendency to overfit.
https://www.youtube.com/watch?v=lcjdwSY2AzM
This episode's angle on the principle of least action is quite original, and it also covers the contributions of several scientists who rarely get mentioned 👍
https://store.steampowered.com/app/1569580/Blue_Prince/
Strongly recommended: the most stunning game I've played so far in 2025. It blends puzzle-solving with roguelike structure, and the puzzles have multiple layers of depth. Fun and replayable 💯
#game
Interesting opinion piece. I'm most impressed by the sheer number of links in this post 😅
https://www.latent.space/p/clippy-v-anton
Forwarded from 散步中
A friend asked me to be a guest on his podcast. I told him I'd never recorded one before, and that I wasn't just being modest; he should really find someone better. But he said he wanted to talk about the experience of moving to SF, so I said I could do that:
https://www.xiaoyuzhoufm.com/episode/680eee0d7a449ae8581a3820
Episode: 04. 在南湾上班,为什么却住在旧金山? (Working in the South Bay, so why live in San Francisco?) — from 《Bay Area人文活动汇总》 on Xiaoyuzhou.