
Reinforcement learning vs “regular” training: the real difference is not the math; it is the loop

Most ML people grow up on a simple mental model: you have a dataset, you define a loss, you run gradient descent, you ship a checkpoint. That covers supervised learning and a lot of self-supervised pretraining. The model is learning from a fixed distribution of examples, and the training pipeline is basically a linear flow from data to gradients. Reinforcement learning (RL) breaks that mental model because the model is not only learning from data, it is also actively creating

2025: The Year I Bet on Myself

On December 30th, 2024, I finished my last day at IBM. It was the kind of ending that looks simple from the outside, but internally it carried years of thought and a lot of quiet pressure. I wasn’t leaving because I hated the work, and I wasn’t leaving because something broke. I was leaving because I could feel myself outgrowing the comfort of a structured path. IBM gave me discipline, exposure, and a solid environment to sharpen my skills, but I kept feeling a stronger pull

From Scaling To Research: Reflections On The Ilya Sutskever Conversation With Dwarkesh

There is a moment in the recent Dwarkesh Podcast episode with Ilya Sutskever that captures a turning point in how the AI community understands its own progress. Sutskever, one of the central figures behind modern deep learning and now the founder of Safe Superintelligence Inc., looks back at the last few years and says, in effect: the era when simply scaling models was the main engine of progress is ending. It is time to return to the age of research, only this time with very

My thoughts on Sora 2

Sora 2 isn’t just another milestone in AI video generation; it’s a revolution that changes how we define creativity, truth, and even perception itself. What OpenAI has achieved with Sora 2 is beyond impressive; it’s transformative. For the first time, we’re seeing a model that doesn’t just generate a sequence of moving images; it generates understanding. It sees the world, reasons about it, and simulates motion, lighting, emotion, and cause and effect as if it were directing r

Drop Me a Line, Let Me Know What You Think
