Easy Languages Dictionary
Published 24/08/2024
1 Overview
Easy Languages Dictionary is a browser extension to help one learn languages. It provides translations of individual words in the subtitles of videos published by any of the Easy Languages YouTube channels. It recognizes the words in the subtitles using OCR (Optical Character Recognition) and translates them using various online translation services.
The project is free and open-source, and hosted on GitHub.
2 Background
I first became interested in learning Polish in 2020, to be able to communicate with my girlfriend in her native tongue. Naturally, I was keen to find out what the most efficient way of learning a language would be. Based on my experience learning English, I felt that the language teaching methods typically applied in schools (learning grammar rules and out-of-context vocabulary) were suboptimal.
I then came across a video by Jeff Brown, a foreign language instructor, in which he reviewed language learning approaches and showed how he learned Arabic. I was impressed with his research-based approach and its focus on comprehensible input, and found his method compelling. Jeff’s language learning approach essentially boils down to finding a native speaker and getting them to explain children’s books and magazines (in any language) to you in the target language. The nature of the material used is not so important, as long as there are large pictures in it. Jeff also insists that no English should be used in the sessions. This essentially mimics the way children learn languages.
I tried Jeff’s approach, but I felt that the lack of English made the production of comprehensible input more difficult for the language parent than it had to be, especially if one is a complete beginner in the target language. After all, in contrast to children, adults learning a second (or n-th) language don’t also need to learn the concepts that words represent, only the words themselves. In my view, a word’s meaning is thus most efficiently conveyed via translation, using a language one already knows, rather than sign language and wild gesturing. (I also think that Jeff’s method may be most suited to extroverts like him, and perhaps less suited to introverts or people with perfectionist tendencies, who might find it stressful or unpleasant to attempt to communicate in a language they have little or no knowledge of. For those people it may be better to focus on comprehensible input, i.e. listening and reading, first, and only attempt to communicate with native speakers once they have acquired a basic level of proficiency. This approach may also be closer to the way children, who only start speaking at 12-18 months, learn their first language.) In summary, I found Jeff’s approach too painful, and gave up on it after only a few sessions. It did, however, make me aware of the importance of comprehensible input for language learning.
A while later I tried continuing my journey learning Polish using the Duolingo app. I gave this up after a week, due to Duolingo’s unbearable repetitiveness, incredibly boring content, and lack of human communication.
After that, my Polish learning journey stalled for quite a while. My idea of the perfect Polish learning content at that point was beginner-level videos with subtitles and translations, to make the input comprehensible. I did not find content of this nature until 2022, when I came across the Easy Polish YouTube channel. This channel is part of the Easy Languages franchise, a collection of YouTube channels which aims to teach people languages by interviewing native speakers on the streets about various topics. The videos include subtitles, along with whole-sentence English translations. While I was very happy to have found this excellent resource, I quickly realised that there was little to no content for complete beginners, and that having whole-sentence English translations, while conveying the meaning of what was being said, did nothing to help me understand the meaning of individual words. The content did therefore not qualify as comprehensible input. What was missing were individual word translations. Such a feature would be difficult to embed in a video though, and would require some level of interactivity to work. At that point I had the idea for a browser extension which would perform OCR of the video in real time, and then offer individual word translations upon request (e.g. by hovering over the word).
In mid-2023 I finally got around to starting the work on this browser extension. I estimated that this would take 2-3 weekends to get done, but this turned out to be wildly inaccurate (it was actually closer to 6 months). Following an arduous development journey (see Sec. 3), I did eventually end up with what I had envisioned, and it does seem to work rather well. Now I just have to stop procrastinating and actually learn Polish…
3 Development journey
I had not previously developed a browser extension, so there was a lot to learn. The Parcel JavaScript bundler, which has a Web Extension configuration, allowed me to get started quickly. Parcel takes care of bundling, copying files to the output directory, and even rebuilding and reloading the extension when changes are detected. Unfortunately, the auto-reload feature became more and more unreliable as development continued, and almost never worked in the end. Debugging these issues took too much time, so I ended up replacing Parcel with the excellent Esbuild (which is around 40x faster than Parcel) to perform bundling, web-ext for auto-reloading, and a custom build script to take care of the plumbing: copying the necessary files and packaging the extension for publishing. Getting started with OCR was also relatively straightforward, thanks to tesseract.js, a WebAssembly port of the well-known Tesseract OCR Engine.
The more difficult part turned out to be translation. Initially, I attempted using various online dictionaries for translating individual words. This doesn’t work because online dictionaries can only translate lemmas, which are canonical, non-inflected words (for nouns, this is usually the nominative singular form; for verbs, the infinitive). As I learned later, the process of turning an inflected form of a word into its corresponding lemma is called lemmatisation, and a software program capable of performing this action is called a lemmatiser. More on this in Sec. 4.
I thus decided to use online translators such as Google Translate instead of dictionaries, as these can translate words in their inflected forms as well. Using such services programmatically can be a bit tricky. When performing translations by visiting websites such as Google Translate or DeepL, the website makes requests to the service’s servers to perform the actual translations. As it turns out, translators don’t like you using their services through any means other than their websites, and typically perform a number of checks to prevent this. (Since online translators usually offer paid services, I guess this is to prevent abuse. However, the most effective abuse prevention is to limit the number of requests that can be made in a certain timeframe by a given IP address. Any other checks are ultimately futile, as a sufficiently motivated individual will be able to bypass them: if translation can be done via a website without an API key, it can also be done programmatically.) Programmatic translation thus requires imitating those requests perfectly, so that the server thinks they were made by the website rather than by a script. Some translators get rather creative with those checks. For example, DeepL takes into account the number of times the letter “i” appears in the text to be translated when forming the request to be sent to the server.
I found out quickly that out-of-context translation can be highly inaccurate. For example, translating the Polish word “was” (accusative plural of the personal pronoun) with Google Translate yields “mustache” instead of “you”. (The Polish translation of “mustache” is actually “wąs”, not “was”, so it seems Google Translate does not consider diacritics.) Thus the question arose of how to perform in-context translation. The answer relies on almost all online translators being capable of translating HTML. To perform an in-context translation, one only needs to wrap the word to be translated in an HTML tag:
Sentence containing <em>word</em> to be translated in context.
If all goes well, the translation will contain the same HTML tags, with the in-context-translated word inside of those tags. I discovered pretty quickly that Google Translate performs very poorly at in-context translation, and so switched to Bing Translate (which performs much better). In some instances, however, in-context translation will not work, especially if the sentence structure of the translated text is substantially different from that of the text to be translated. In those cases it is useful to have an out-of-context translation as well. I therefore decided to always display both an in-context and an out-of-context translation.
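The tag-wrapping trick can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the helper names `markWord` and `extractMarked` are hypothetical, and the call to the translation service itself is left out.

```typescript
// Marker tag; <em> mirrors the example above, any inline tag could be used.
const MARK_OPEN = "<em>";
const MARK_CLOSE = "</em>";

// Escape characters that would otherwise be read as HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap the word starting at `start` (with the given length) in marker tags.
function markWord(sentence: string, start: number, length: number): string {
  const before = sentence.slice(0, start);
  const word = sentence.slice(start, start + length);
  const after = sentence.slice(start + length);
  return (
    escapeHtml(before) + MARK_OPEN + escapeHtml(word) + MARK_CLOSE + escapeHtml(after)
  );
}

// Pull the marked fragment back out of the translated HTML.
// Returns null when the tags were dropped or came back empty, in which
// case the caller can fall back to the out-of-context translation.
function extractMarked(translatedHtml: string): string | null {
  const match = /<em>([\s\S]*?)<\/em>/i.exec(translatedHtml);
  if (match === null) return null;
  const inner = (match[1] ?? "").trim();
  return inner.length > 0 ? inner : null;
}
```

For example, `markWord("Nie widziałem was wczoraj.", 14, 3)` produces `Nie widziałem <em>was</em> wczoraj.`, which would be sent to the translator as HTML. If the response keeps the tags, `extractMarked` returns whatever the translator placed inside them. A `null` result signals the failure case described above.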
I wanted the extension to run on both Firefox and Chromium-based
browsers. (The extension is not available on Safari, as this requires
enrolling in the ‘Apple Developer Program’ at a cost of $99/year, and
because Safari doesn’t support a WebExtension API method which the
extension relies on.) This was a rather challenging task due to
differences in the way Firefox and Chrome have implemented browser
extension support. Every browser extension contains a file named
manifest.json, which contains metadata about the extension. It also
contains a key named manifest_version, which determines the API version
to be used. There are two versions currently in use: 2 (MV2) and 3
(MV3). MV3 is the latest version, and the Chrome Web Store is in the
process of removing MV2 extensions. Unfortunately, Firefox and Chromium
have implemented MV3 sufficiently differently for this to be an issue,
particularly when it comes to permissions.
To make my life easier, I ended up using MV2 for the Firefox version,
and MV3 for the Chrome version. My build
script automatically takes care of the necessary
adaptations.
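To give an idea of what such adaptations can look like, here is a minimal sketch that derives an MV2 manifest from an MV3 one. The function name `toManifestV2` and the choice of MV3 as the source format are assumptions made for illustration. The actual build script may work differently and certainly handles more keys.

```typescript
// Small subset of manifest keys; real manifests contain many more.
interface ManifestV3 {
  manifest_version: 3;
  name: string;
  version: string;
  permissions?: string[];
  host_permissions?: string[];
  action?: Record<string, unknown>;
  background?: { service_worker: string };
}

interface ManifestV2 {
  manifest_version: 2;
  name: string;
  version: string;
  permissions?: string[];
  browser_action?: Record<string, unknown>;
  background?: { scripts: string[] };
}

// Hypothetical helper: derive an MV2 manifest from an MV3 one.
function toManifestV2(mv3: ManifestV3): ManifestV2 {
  const mv2: ManifestV2 = {
    manifest_version: 2,
    name: mv3.name,
    version: mv3.version,
  };
  // MV2 has no host_permissions key: host match patterns go in permissions.
  const permissions = [...(mv3.permissions ?? []), ...(mv3.host_permissions ?? [])];
  if (permissions.length > 0) mv2.permissions = permissions;
  // The MV3 "action" key corresponds to "browser_action" in MV2.
  if (mv3.action !== undefined) mv2.browser_action = mv3.action;
  // MV3 uses a background service worker, MV2 a list of background scripts.
  // Assumes the same script file can run in both environments.
  if (mv3.background !== undefined) {
    mv2.background = { scripts: [mv3.background.service_worker] };
  }
  return mv2;
}
```

A build script could read the shared manifest, pass it through a function like this for the Firefox target, and write the result to that target's output directory.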
During the development process I also discovered a number of Chromium and Firefox bugs, which had to be worked around.
Finally, I wanted the browser extension to work on mobile browsers as
well. Unfortunately, there aren’t many mobile browsers that support
extensions. Firefox for Android started supporting extensions in
December 2023, and on iOS there is a browser called Orion which
supports extensions. I modified the extension to ensure it would work
on the mobile YouTube page, m.youtube.com, in addition to desktop
YouTube. (During this process I learned that the mobile YouTube page
uses a library called incremental-dom, which manages updates to the
DOM, much like a virtual DOM library would. I found this out when the
elements I was inserting into the DOM were no longer present a few
milliseconds later! Incremental-dom had spotted the extraneous elements
and removed them. The solution to this issue was to place all inserted
DOM elements outside of the subtree managed by incremental-dom.)
Unfortunately, the extension doesn’t currently work on Android, due to
a Firefox for Android bug.
4 Future work
There are several items yet to be implemented. Feature-wise, I’d like to translate Easy Languages Dictionary into various languages, so that it can also be used by people who don’t speak English. This includes the option to translate words in subtitles to languages other than English.
I’d also like to add lemmatisation capabilities to Easy Languages Dictionary. This would allow me to remove the out-of-context translation and replace it with a more accurate translation of the lemma/dictionary entry, along with grammatical information such as case, number, and tense. Performing lemmatisation accurately requires a context-aware lemmatiser. (Consider the term “means of compliance”. Lemmatising “means” with a context-unaware lemmatiser would likely yield the verb “to mean” rather than the noun “means”.) Open-source lemmatisers are available for almost any spoken language, but they are written in a variety of programming languages. Integrating them into a browser extension would be a challenge, as they would either need to be written in JavaScript or be compilable to WebAssembly, which is not always possible. A better option would be to have a server running all these lemmatisers, with the browser extension sending lemmatisation requests to that server.
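The “means of compliance” ambiguity can be illustrated with a deliberately tiny toy. Everything in it is invented for illustration: the one-entry lexicon, the function names, and the "followed by 'of'" rule. Real lemmatisers use full morphological dictionaries and statistical or neural disambiguation.

```typescript
type PartOfSpeech = "noun" | "verb";

interface Candidate {
  lemma: string;
  pos: PartOfSpeech;
}

// Toy lexicon: each surface form maps to all of its possible lemmas.
const LEXICON: Record<string, Candidate[]> = {
  means: [
    { lemma: "mean", pos: "verb" },
    { lemma: "means", pos: "noun" },
  ],
};

// Context-unaware: always takes the first candidate for the word.
function lemmatiseWithoutContext(word: string): string {
  const candidates = LEXICON[word.toLowerCase()] ?? [];
  return candidates[0]?.lemma ?? word;
}

// Context-aware (toy rule): a following "of" suggests a noun reading.
function lemmatiseWithContext(words: string[], index: number): string {
  const word = words[index] ?? "";
  const candidates = LEXICON[word.toLowerCase()] ?? [];
  if (candidates.length === 0) return word;
  const next = (words[index + 1] ?? "").toLowerCase();
  const preferred: PartOfSpeech = next === "of" ? "noun" : "verb";
  const hit = candidates.find((c) => c.pos === preferred) ?? candidates[0];
  return hit ? hit.lemma : word;
}
```

Here `lemmatiseWithoutContext("means")` returns `mean`, whereas `lemmatiseWithContext(["means", "of", "compliance"], 0)` returns `means`, because the neighbouring word tips the choice towards the noun.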
It would also be great if the extension worked on videos outside of the Easy Languages channels, in particular with YouTube’s closed captions.
Development-wise, I’d like to improve test coverage, which is fairly low at the moment. In addition to the above, there are about a million smaller items to fix/implement at some point.