<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Raphaël&#39;s blog</title>
    <link>https://raphaelb.net/</link>
    <description>Recent content on Raphaël&#39;s blog</description>
    <generator>Hugo</generator>
    <language>en</language>
    <copyright>CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en)</copyright>
    <lastBuildDate>Wed, 18 Feb 2026 21:06:27 +0100</lastBuildDate>
    <atom:link href="https://raphaelb.net/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Harissa scrambled tofu</title>
      <link>https://raphaelb.net/posts/tofu-brouille/</link>
      <pubDate>Wed, 18 Feb 2026 21:06:27 +0100</pubDate>
      <guid>https://raphaelb.net/posts/tofu-brouille/</guid>
      <description>&lt;p&gt;&lt;img src=&#34;photo_plat.jpg&#34; alt=&#34;Harissa scrambled tofu&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;This recipe quickly became a staple of my everyday cooking. Ready in a few minutes, it is a great way to start cooking with tofu.&#xA;It is adapted from Ottolenghi&amp;rsquo;s &amp;ldquo;Brouillade de tofu à la harissa&amp;rdquo;, from his excellent book &amp;ldquo;Simple&amp;rdquo;.&#xA;You can find silken tofu in Asian grocery stores or organic shops.&lt;/p&gt;&#xA;&lt;h2 id=&#34;ingrédients&#34;&gt;Ingredients&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;2 medium onions&lt;/li&gt;&#xA;&lt;li&gt;1 tablespoon of rose harissa (regular harissa works too)&lt;/li&gt;&#xA;&lt;li&gt;400 g of silken tofu (one block)&lt;/li&gt;&#xA;&lt;li&gt;a few coriander leaves (optional)&lt;/li&gt;&#xA;&lt;li&gt;chopped walnuts (optional)&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;préparation&#34;&gt;Preparation&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Finely chop the onions. Heat 2 tablespoons of olive oil in a pan, then sauté the onions until golden.&lt;/li&gt;&#xA;&lt;li&gt;Add the harissa to the pan, stir, and cook everything together.&lt;/li&gt;&#xA;&lt;li&gt;Add the silken tofu. Break up the blocks with a spatula until the texture resembles scrambled eggs. Keep cooking for a few minutes.&lt;/li&gt;&#xA;&lt;li&gt;Serve immediately.&lt;/li&gt;&#xA;&lt;li&gt;(optional) Top with the coriander leaves and chopped walnuts.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;variantes&#34;&gt;Variations&lt;/h2&gt;&#xA;&lt;p&gt;The original recipe is served on large slices of toasted bread, which is also a great way to eat this dish.
It is also excellent if you replace the harissa with curry paste (yellow or red).&lt;/p&gt;</description>
    </item>
    <item>
      <title>Adding a score to each pre-annotation result on Label Studio</title>
      <link>https://raphaelb.net/posts/label-studio-individual-prediction-score/</link>
      <pubDate>Mon, 09 Feb 2026 14:06:27 +0100</pubDate>
      <guid>https://raphaelb.net/posts/label-studio-individual-prediction-score/</guid>
      <description>&lt;p&gt;Label Studio is an annotation tool that comes in really handy when dealing with object detection datasets. A major feature in my workflow is the ability to upload &amp;ldquo;pre-annotations&amp;rdquo;, which are used when first opening a task: a draft is automatically created with all objects present in the pre-annotation.&lt;/p&gt;&#xA;&lt;p&gt;To speed up labeling, I often use this pre-annotation feature to label all images using a zero-shot model (such as &lt;a href=&#34;https://arxiv.org/abs/2401.17270&#34;&gt;YOLO-World&lt;/a&gt; or &lt;a href=&#34;https://huggingface.co/facebook/sam3&#34;&gt;SAM 3&lt;/a&gt;). Once I&amp;rsquo;ve annotated enough images, I train a first object detection model, run this model on the full dataset again, and import these predictions as pre-annotations.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Never Trust Your Dataset</title>
      <link>https://raphaelb.net/posts/never-trust-your-dataset/</link>
      <pubDate>Mon, 10 May 2021 12:23:40 +0200</pubDate>
      <guid>https://raphaelb.net/posts/never-trust-your-dataset/</guid>
      <description>&lt;p&gt;Datasets are ubiquitous in machine learning. There is literally nothing to learn without datasets, labeled or unlabeled. The lack of datasets has impeded progress in NLP for low-resource languages: most academic work in NLP focuses on English and, to a lesser extent, on a handful of high-resource languages (Spanish, German, Japanese, French,&amp;hellip;).&lt;/p&gt;&#xA;&lt;p&gt;Recently, a diverse team of NLP researchers studied the quality of the web-crawled corpora behind most of the progress in NLP over the last few years (Caswell et al. 2021). More specifically, they studied three parallel corpora used for machine translation (CCAligned, ParaCrawl, WikiMatrix) and two monolingual corpora (OSCAR and mC4) used to train language-specific language models.&lt;/p&gt;</description>
    </item>
    <item>
      <title>How many layers of my BERT model should I freeze? ❄️</title>
      <link>https://raphaelb.net/posts/freezing-bert/</link>
      <pubDate>Fri, 04 Dec 2020 15:56:37 +0100</pubDate>
      <guid>https://raphaelb.net/posts/freezing-bert/</guid>
      <description>&lt;p&gt;Since the advent of the Transformer architecture (Vaswani et al. 2017) and of BERT models (Devlin et al. 2019), Transformer models have become ubiquitous in NLP, achieving SOTA results on most NLP datasets.&lt;/p&gt;&#xA;&lt;p&gt;Before Sesame Street puppets flooded arXiv, the de facto method to train an NLP model leveraged word embeddings pre-trained using GloVe or word2vec. These word embeddings were used to initialize the first embedding layer of your model, and you just had to plug the rest of your architecture on top of this first layer.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Prodigy, a must-have in the Data Scientist toolbox</title>
      <link>https://raphaelb.net/posts/prodigy/</link>
      <pubDate>Sat, 07 Nov 2020 17:41:06 +0100</pubDate>
      <guid>https://raphaelb.net/posts/prodigy/</guid>
      <description>&lt;p&gt;If you sometimes find yourself annotating data for machine learning projects and you&amp;rsquo;ve never heard of &lt;a href=&#34;https://prodi.gy/&#34;&gt;Prodigy&lt;/a&gt;, it&amp;rsquo;s definitely a tool you would be interested in.&lt;/p&gt;&#xA;&lt;p&gt;The project was initiated by the makers of spaCy - the well-known Python NLP library - after they realized that while supervised learning works well, &lt;a href=&#34;https://explosion.ai/blog/supervised-learning-data-collection&#34;&gt;data collection was broken&lt;/a&gt;. To this day, many data collection projects still rely on Amazon Mechanical Turk, a crowdsourcing platform with low wages, questionable UX, and low incentives for quality. Given the major impact of data quality on the performance of supervised learning models, data collection is too important to be outsourced to Mechanical Turk. Whenever possible, annotations should be done in-house and at least partially by the scientist in charge of the research project. The latter ensures that any quality or annotation issues that could impact training are noticed beforehand.&lt;/p&gt;</description>
    </item>
    <item>
      <title>About</title>
      <link>https://raphaelb.net/about/</link>
      <pubDate>Fri, 06 Nov 2020 15:33:01 +0100</pubDate>
      <guid>https://raphaelb.net/about/</guid>
      <description>&lt;p&gt;I&amp;rsquo;m a machine learning engineer passionate about deep learning, machine learning and data science in general. I&amp;rsquo;m mainly focused on natural language processing, even if I enjoy computer vision as well.&lt;/p&gt;&#xA;&lt;p&gt;Beyond data science, I&amp;rsquo;m fascinated by design and UX. I&amp;rsquo;m convinced that a good UX is sometimes a serious alternative to machine learning-based solutions.&lt;/p&gt;&#xA;&lt;p&gt;I have a tendency to fall down the &lt;a href=&#34;http://www.sridattalabs.com/2012/02/06/rabbit-holes-being-smart-hurts-prod/&#34;&gt;productivity tool rabbit hole&lt;/a&gt;, so I may occasionally post articles about productivity tools from a data science perspective.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
