My Year in Review 2024
2024 was an eventful but also difficult year for me. We made the decision to go back to running Explosion as an independent-minded, self-sufficient company, and the transition wasn’t easy. Doing open source in a sustainable way is still a big challenge, and supporting the company with product and services revenue required many difficult decisions. On the bright side, we finally got approved for a substantial R&D government grant this year! I also got to travel a lot and met many cool people, and it’s been incredibly motivating to see that our vision for applied NLP resonates so much with the developer community.
Conferences and talks
This year, I decided to say yes to as many invitations as I could fit into my schedule. This meant a substantial amount of travelling, writing a bunch of talks and meeting lots and lots of people (which I love). Of course, I won’t be able to keep up this pace long-term, so in 2025, I’ll take it easier. I will be speaking at data:unplugged in Münster in April and giving a talk in German for a change, and otherwise my focus will be on my favourite community and ecosystem conferences (current shortlist: PyCon DE, PyCon Italy and PyData Amsterdam).
- Keynotes: PyCon Lithuania · DataHack Summit India · EuroSciPy · PyData Amsterdam · PyCon FR
- Talks: QCon London · PyCon DE & PyData Berlin · PyCon Italy · Budapest ML Forum · PyData London · InfoQ Dev Summit Munich · PyBerlin · dotAI Paris
- Roundups & Highlights: Berlin (PyCon & PyData) · Florence (PyCon) · Immergut Festival · London (PyData) · Bengaluru (DataHack Summit) · Amsterdam (PyData) · Munich (InfoQ Dev Summit) · Paris (dotAI)
- The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
With the latest advancements in NLP and LLMs, and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies? I don’t think so, and in this talk, I show you why.
- Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation
LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, I show some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.
- Applied NLP in the Age of Generative AI
In this talk, I share the most important lessons we’ve learned from solving real-world information extraction problems in industry, and show you a new approach and mindset for designing robust and modular NLP pipelines in the age of Generative AI.
- 10 Years of Open Source: Navigating the Next AI Revolution
In this talk, I share the most important lessons we’ve learned in 10 years of working on open-source software, our core philosophies that helped us adapt to an ever-changing AI landscape and why open source and interoperability still win over black-box, proprietary APIs.
Interviews and discussions
- Ines Montani on Natural Language Processing (Software Engineering Radio)
- AI – The Artistic Intelligence? (Immergut Festival)
- The AI Revolution Won’t Be Monopolized (TalkPython)
- Building the Future of NLP: Insights on spaCy, Prodigy and Generative AI (Leading With Data)
- The NLP and AI Revolution with the spaCy Creators (Vanishing Gradients)
- Accelerate your Career with Open-Source AI (dotAI)
- PyLadies entrepreneurs and career development (PyLadiesCon)
Blog posts and writing
I wrote quite a few blog posts this year and especially enjoyed putting together practical case studies of NLP use cases with spaCy and Prodigy, including projects from S&P Global, GitLab and Nesta. I would love to do more of those, so if you’re using spaCy and/or Prodigy at work and are open to sharing your use case, get in touch!
- A beginner’s guide to making beautiful slides for your talks (February)
- How Nesta uses NLP to process 7m job ads and shed light on the UK’s labor market (February, Explosion)
- Three transformative tools: a review of my favourite apps (June)
- Making beautiful slides for your talks, part 2: All about aesthetics (June)
- How S&P Global is making markets more transparent with NLP, spaCy and Prodigy (June, Explosion)
- A practical guide to human-in-the-loop distillation (June, Explosion)
- Back to our roots: Company update and future plans (July, Explosion)
- The Window-Knocking Machine Test (August)
- How GitLab uses spaCy to analyze support tickets and empower their community (September, Explosion)
- Serverless custom NLP with LLMs, Modal and Prodigy (October, Explosion)
- From PDFs to AI-ready structured data: a deep dive (December, Explosion)
Focus of my work and vision
🔮 distilling Large Language Models into smaller, task-specific components
🔮 developing new workflows to bring modularity and software engineering best practices to modern AI development
🔮 making specialised training workflows and UX around them as approachable as writing prompts
🔮 using clever automation to create better data faster
🔮 data privacy
🔮 strategies for refactoring code and data
🔮 taking Generative AI beyond just chat bots and natural language interfaces
🔮 structured data
🔮 helping developers take back control
🔮 scaling down, not just up
Personal
As I mentioned in my productivity tools review, I’m very much obsessed with tracking things and logging everything I do, watch, listen to, read and like. I also love my Oura ring, which I wear 24/7. The most notable change was that my morning routine and schedule have shifted and I wake up much earlier now. Maybe it’s age, or cats waking me up demanding their breakfast. Thanks to my bike, living in a super walkable city and my modest home gym, I’ve also been able to stay reasonably active.
I listen to podcasts almost daily, mostly on topics completely unrelated to tech. It helps me unwind and makes chores and travelling less dull. My favourite genres include investigative journalism, true crime (I know, I know) and interesting cultural stories and audio reporting. I listened to over 150 podcasts this year alone, although that counts every podcast I listened to at least one episode of.
I don’t watch a lot of TV, but some series and films that stood out this year: Shōgun, The Zone of Interest, The Completely Made-Up Adventures of Dick Turpin (definitely recommended for fellow fans of the Mighty Boosh) and the new season of MasterChef Professionals, which started this autumn. Plus, my ultimate guilty pleasure when staying in hotels by myself (which I did a lot of this year): watching Antiques Roadshow or the German equivalent, curled up in bed with a snack.
I also started several books, but finished very few. One of my favourites was the long-anticipated Rath by Volker Kutscher, the 10th and final part of a series that was also the basis for the neo-noir show Babylon Berlin, set in 1920s to 1930s Berlin. (Highly recommended! Outside of Germany, you should be able to watch it on Netflix with English subs.)
In 2025, I really want to get back into music again. I’ve never been able to listen to music while working and programming, not even instrumental songs, so this limits the time I have to actually enjoy it. I did go to quite a few concerts, though, most notably several shows of my favourite German band Die Ärzte, which would have made teenage me so proud and excited. I also took part in a panel discussion on AI and the music industry at Immergut Festival, combining my work with my love for live music – never thought those two worlds would ever collide!