Parallel Experiments
Stay informed. Stay authentic.

Welcome to the public part of my brain. Here I share curations and thoughts.

Created with ❤️ by @linghao.
Gemini 2.5 was released yesterday. This post isn't about the model itself, but about a fun math puzzle brought up in the related HN discussion [1]. The OP claims Gemini 2.5 is the first model to get it right in one shot. The puzzle:

There are three people in a circle. Each person has a positive integer floating above their heads, such that each person can see the other two numbers but not his own. The sum of two of the numbers is equal to the third. The first person is asked for his number, and he says that he doesn't know. The second person is asked for his number, and he says that he doesn't know. The third person is asked for his number, and he says that he doesn't know. Then, the first person is asked for his number again, and he says: 65. What is the product of the three numbers?


The answer is here: [2]

[1] https://news.ycombinator.com/item?id=43473489
[2] https://www.reddit.com/r/math/comments/32m611/logic_question_that_has_me_stumped/
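The cascade of eliminations behind the three "don't know"s can also be checked mechanically. A brute-force sketch (the search bound of 200 is my own assumption; it happens to be large enough to cover the answer, and it does spoil the puzzle):

```python
from functools import lru_cache

def candidates(triple, p):
    # Person p sees the other two numbers; their own must be either the
    # sum of those two or their (positive) difference.
    others = [v for i, v in enumerate(triple) if i != p]
    s, d = sum(others), abs(others[0] - others[1])
    return [s] if d == 0 else [s, d]

@lru_cache(maxsize=None)
def knows(triple, turn):
    # True if the speaker at `turn` (speaking order: 0, 1, 2, 0) can
    # deduce their own number, given that every earlier speaker said
    # "I don't know".
    p = turn % 3
    viable = [v for v in candidates(triple, p)
              if all(not knows(triple[:p] + (v,) + triple[p + 1:], t)
                     for t in range(turn))]
    return len(viable) == 1

solutions = []
for b in range(1, 201):          # assumed search bound
    for c in range(1, 201):
        if 65 not in (b + c, abs(b - c)):
            continue             # one number must be the sum of the others
        t = (65, b, c)
        if (not knows(t, 0) and not knows(t, 1)
                and not knows(t, 2) and knows(t, 3)):
            solutions.append(t)

print(solutions)                                            # [(65, 26, 39)]
print(solutions[0][0] * solutions[0][1] * solutions[0][2])  # 65910
```

The `knows` recursion is the whole trick: a candidate value survives only if the triple it implies is consistent with every earlier "I don't know", which is itself a (smaller) instance of the same question.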
Forwarded from C’s Random Collection
https://ai-2027.com “We predict that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.” Whatever you make of that, the page's interactions are superb. #ai
Found a really useful Obsidian plugin: https://github.com/RyotaUshio/obsidian-pdf-plus

It uses backlinks to let you annotate and take notes on PDFs without ever leaving Obsidian, and the notes can be spread across multiple files; the design feels thoroughly Obsidian-native.

#obsidian
A really good and concise deep dive into RLHF in LLM post-training, Proximal Policy Optimization (PPO), and Group Relative Policy Optimization (GRPO)
https://yugeten.github.io/posts/2025/01/ppogrpo/
#llm
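For context on the PPO-vs-GRPO contrast the post walks through: GRPO drops PPO's learned value network (critic) and instead baselines each sampled response against its own group. A minimal sketch of the group-relative advantage (function name and the zero-spread guard are my own):

```python
import statistics

def grpo_advantages(rewards):
    # Sample a group of G responses for the same prompt, score each,
    # and normalize by the group's mean and standard deviation; the
    # group itself replaces PPO's critic as the baseline.
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard: identical rewards
    return [(r - mean) / std for r in rewards]

# e.g. two correct and two incorrect answers in a group of four:
print(grpo_advantages([1, 0, 1, 0]))  # [1.0, -1.0, 1.0, -1.0]
```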
Truly a thought-provoking piece, from the author of τ-bench.
https://ysymyth.github.io/The-Second-Half/ #ai

So what’s suddenly different now?

In three words: RL finally works. More precisely: RL finally generalizes. After several major detours and a culmination of milestones, we’ve landed on a working recipe to solve a wide range of RL tasks using language and reasoning.

The second half of AI — starting now — will shift focus from solving problems to defining problems. In this new era, evaluation becomes more important than training. Instead of just asking, “Can we train a model to solve X?”, we’re asking, “What should we be training AI to do, and how do we measure real progress?” To thrive in this second half, we’ll need a timely shift in mindset and skill set, ones perhaps closer to a product manager.

It turned out the most important part of RL might not even be the RL algorithm or environment, but the priors, which can be obtained in a way totally unrelated to RL (LLMs).
https://arxiv.org/abs/2305.18290 #llm #ai

Spent today digging into DPO, and was once again struck by how much solid math fundamentals matter for AI/ML research…

The original RLHF recipe trains a reward model on pairwise human preference data (which of A and B is better), then trains the main policy model with RL, with an objective of minimizing negative log likelihood plus a regularization term (PPO, for example, regularizes via the KL divergence between the new and old policies). The downside is that RL is notoriously hard to get right, and you additionally need a critic model to predict rewards, so the whole system is quite complex.

DPO's insight is that the RLHF objective is essentially minimizing a loss over a (latent) reward function. Through a reparameterization and some algebra, it redesigns this as a loss minimized directly over the policy, bypassing the intermediate reward model: the gradient update directly raises the policy's probability of generating the winning response and lowers that of the losing one, which greatly simplifies the pipeline.
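The reparameterized objective fits in a few lines. A scalar sketch of the per-example DPO loss (real implementations batch this over sequence log-probabilities in PyTorch; `beta` is the usual KL-strength hyperparameter):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # DPO's implicit reward for a response y is
    #   beta * (log pi(y|x) - log pi_ref(y|x)),
    # and the loss is -log sigmoid(reward_chosen - reward_rejected):
    # push the policy to prefer the winner more strongly than the
    # frozen reference model does.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At initialization (policy == reference) the margin is 0 and the loss
# is log 2; widening the margin on the winner drives the loss down.
print(round(dpo_loss(-1.5, -1.5, -1.5, -1.5), 4))  # 0.6931
```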

Further reading:
- KTO: goes a step further; instead of pairwise comparisons, it can learn preferences from simple upvotes/downvotes on individual examples.
- IPO: addresses DPO's tendency to overfit.
A shout-out for a friend of the channel; a very interesting person! 👇
https://koomen.dev/essays/horseless-carriages/
I find Industrial Revolution analogies for the AI era a bit cliché by now, but this piece's central argument and examples land well, and it has nice interactive touches too.
In most AI apps, System Prompts should be written and maintained by users, not software developers or even domain experts hired by developers.
https://julian.digital/2025/03/27/the-case-against-conversational-interfaces/
Worth reading alongside the one above. The title is clickbait (the author admits as much), but it's actually a thoughtful take on what kind of UX gets the most out of AI.
AI should function as an always-on command meta-layer that spans across all tools. Users should be able to trigger actions from anywhere with simple voice prompts without having to interrupt whatever they are currently doing with mouse and keyboard.

Productivity and collaboration shouldn’t be two separate workflows.


P.S. This blogger's posts are consistently excellent, e.g. https://julian.digital/2023/07/06/multi-layered-calendars/ and https://julian.digital/2020/09/04/a-meta-layer-for-notes/
Forwarded from C’s Random Collection
[image attachment: image_2025-05-14_23-36-37.png (landing page screenshot)]
New landing page design and live at https://deeptime.now 🎉 and deeptime is now in beta, all features are free! Sign up today! #DeeptimeNow