Parallel Experiments

https://huggingface.co/spaces/nanotron/ultrascale-playbook
Hugging Face released a playbook on scaling LLM training on GPUs. It should be more broadly applicable than DeepMind's TPU-focused scaling book. #llm

huggingface.co

The Ultra-Scale Playbook - a Hugging Face Space by nanotron

The ultimate guide to training LLM on large GPU Clusters

1.1K views · Linghao Zhang, 20:32

Parallel Experiments

💃 Saw this live at the Las Vegas Sphere last week. Absolutely amazing.
https://www.youtube.com/watch?v=DKvWHjQAGqo

YouTube

Anyma - Hypnotized (feat. Ellie Goulding) [Live from Sphere Las Vegas]

Ellie Goulding and Anyma perform “Hypnotized” live from Sphere Las Vegas.

Listen to “Hypnotized (feat. Ellie Goulding)” now: https://anyma-ellie.lnk.to/hypnotized

Follow Ellie:
Instagram: https://www.instagram.com/elliegoulding
TikTok: https://www.ti…

745 views · Linghao Zhang, 03:53

Parallel Experiments

I recently prepared for ML interviews (with a focus on LLMs) and went through a lot of learning resources. Here are some of them:

CMU 11-711 Advanced NLP

A survey of language modeling.

The Transformer Blueprint: A Holistic Guide to the Transformer Neural Network Architecture

A fairly good overview of the Transformer.

3Blue1Brown: Attention in transformers, step-by-step

The best video explaining attention, bar none.

Hugging Face: Mixture of Experts Explained

Hugging Face: RLHF

Hugging Face: Introduction to Deep Reinforcement Learning

Hugging Face: Multimodal Models

These Hugging Face resources are great for quickly filling gaps on these topics.

Lilian Weng: Agents

Still one of the best overviews of agents.

Understanding Reasoning LLMs

Covers some post-training details, with a focus on DeepSeek R1 and R1-Zero.

Designing Machine Learning Systems notes by @tms_ur_way

Good for quickly filling gaps on key points of ML in practice.

Stable Diffusion Explained From Scratch

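An explanation of the basic principles of diffusion.

Beyond these, content from the following people is all very good; pick selectively by topic.

- Andrej Karpathy's YouTube videos
- Lilian Weng's blog
- Chip Huyen's blog

Most of what is recommended here is fairly introductory / high level, meant mainly for filling gaps. To dig deep into a specific topic, you will still need further resources and papers. #ml #llm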

1.7K views · Linghao Zhang, edited 19:22

Parallel Experiments

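I went to a ClickHouse meetup at the Netflix campus. As a showcase, their CTO built a slick website that visualizes aircraft trajectories from ADS-B data. There is a lot of detail, including interesting patterns and implementation notes. Worth a look.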

https://github.com/ClickHouse/adsb.exposed

GitHub

GitHub - ClickHouse/adsb.exposed: Interactive visualization and analytics on ADS-B data with ClickHouse

Interactive visualization and analytics on ADS-B data with ClickHouse - ClickHouse/adsb.exposed

890 views · Linghao Zhang, edited 07:22

Parallel Experiments

975 views · Linghao Zhang, edited 00:46

Parallel Experiments

Pretty entertaining classic murder mystery set in the White House
https://www.imdb.com/title/tt8740614/

IMDb

The Residence (TV Mini Series 2025) ⭐ 7.8 | Comedy, Crime, Drama

50m | TV-MA

530 views · Linghao Zhang, 22:22

Parallel Experiments

https://store.steampowered.com/app/2394650/Crypt_Custodian/
🎮 Yet another metroidvania. The controls feel great and the game is very cute. #game

Steampowered

Crypt Custodian on Steam

Crypt Custodian is a charming metroidvania about cleaning up the afterlife. Play as Pluto - a mischievous cat who has died, and is sentenced to be the afterworld's janitor... FOREVER! Hang out with other doomed ghosts, battle beasts, and explore a vastly…

518 views · Linghao Zhang, 18:00

Parallel Experiments

656 views · Linghao Zhang, edited 19:05

Parallel Experiments

An easy-to-follow intro to zero-knowledge proofs: https://youtu.be/Otvcbw6k4eo

YouTube

I can prove I’ve solved this Sudoku without revealing it

Support us on Patreon: http://patreon.com/polylog
I can convince you that I’ve solved a sudoku without giving you any information about my solution. We discuss how to do this using what cryptographers call a zero-knowledge proof, and how the same tricks…

689 views · Linghao Zhang, 23:17

Parallel Experiments

A four-episode miniseries where every episode is shot in a single continuous take. Worth rewatching again and again!
https://www.imdb.com/title/tt31806037/

IMDb

Adolescence (TV Mini Series 2025) ⭐ 8.2 | Crime, Drama, Mystery

1h | TV-MA

801 views · Linghao Zhang, 22:34

Parallel Experiments

Forwarded from C’s Random Collection

https://ai-2027.com “We predict that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.” Either way, the interaction design on this page is great. #ai

Ai-2027

AI 2027

A research-backed AI scenario forecast.

540 views · Linghao Zhang, 06:41

Parallel Experiments

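Found a really useful Obsidian plugin: https://github.com/RyotaUshio/obsidian-pdf-plus

It uses backlinks to let you annotate and take notes on PDFs without leaving Obsidian, and the notes can be spread across multiple files. The design feels very Obsidian-native.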

#obsidian

710 views · Linghao Zhang, 21:21

Parallel Experiments

A really good and concise deep dive into RLHF in LLM post-training, Proximal Policy Optimization (PPO), and Group Relative Policy Optimization (GRPO)
https://yugeten.github.io/posts/2025/01/ppogrpo/
#llm

455 views · Linghao Zhang, edited 02:24

Parallel Experiments

https://www.anthropic.com/research/tracing-thoughts-language-model
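This LLM interpretability research from Anthropic reaches quite a few interesting conclusions. For a TL;DR, read this blog post; if you are interested, check out the two accompanying papers, which have more detail and nicely done interactive pages. #llm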

https://transformer-circuits.pub/2025/attribution-graphs/biology.html
https://transformer-circuits.pub/2025/attribution-graphs/methods.html

Anthropic

Tracing the thoughts of a large language model

Anthropic's latest interpretability research: a new microscope to understand Claude's internal mechanisms

470 views · Linghao Zhang, 21:37

Parallel Experiments

332 views · Linghao Zhang, edited 05:22

Parallel Experiments

https://newsletter.pragmaticengineer.com/p/the-philosophy-of-software-design

John Ousterhout, author of A Philosophy of Software Design, is a guest on The Pragmatic Engineer. #podcast #software_design

Pragmaticengineer

The Philosophy of Software Design – with John Ousterhout

Stanford professor John Ousterhout explains why thoughtful software design matters more than ever as AI tools transform coding practices and developer workflows.

343 views · Linghao Zhang, 04:53

Parallel Experiments

https://arxiv.org/abs/2305.18290 #llm #ai

Took a deep dive into DPO today, and was once again struck by how important a solid math foundation is for AI/ML research...

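The original RLHF pipeline uses pairwise human preference data (which of A and B is better) to train a reward model, typically by minimizing a negative log-likelihood loss over the preference pairs. It then uses RL to train the main policy model, with the objective of maximizing the learned reward subject to a regularization penalty (in the usual PPO-based setup, a KL divergence term that keeps the policy close to a reference model). The downside is that RL is notoriously hard to get right, and PPO also needs a critic (value) model to estimate expected reward, which makes the whole system quite complex.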

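DPO's idea is this: observing that the RLHF objective is essentially minimizing a loss over a (latent) reward function, it applies a reparameterization and some further derivation to design a new objective that minimizes a loss over the policy directly. This bypasses the intermediate reward model, so that gradient updates directly increase the policy model's probability of generating the winning response and decrease that of the losing response, which greatly simplifies the pipeline.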
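
As a rough illustration of the objective described above, here is a minimal Python sketch of the per-example DPO loss from the paper: -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, not something from the post. The inputs are assumed to be summed token log-probabilities of each response under the policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss (hypothetical helper, for illustration only).

    Each argument is the log-probability of a full response given the
    prompt; `beta` scales how strongly the policy is pushed away from the
    reference model.
    """
    # Log-ratio of policy to reference for the winner and the loser.
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected

    # Implicit reward margin between winner and loser.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference, the margin is zero and the loss equals log 2. Raising the winner's log-probability relative to the loser's lowers the loss, which is the "increase winner, decrease loser" behavior described above, with no separate reward model involved.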

Further reading:
- KTO: goes a step further. It needs no pairwise comparisons and can learn preferences from just an upvote/downvote on individual examples.
- IPO: addresses DPO's tendency to overfit.

arXiv.org

Direct Preference Optimization: Your Language Model is Secretly a...

While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely...

393 views · Linghao Zhang, edited 05:31
