A treasure map for the coast-less ocean. Or: Diving into the semantic space.
- Mirko Vogel
- Jun 26
- 4 min read
Updated: Jun 27

No fish has ever been observed swimming in the Arabic language, so in the simile "العربية بحر" ("Arabic is a sea"), the tertium comparationis is definitely not that animals with gills can breathe there. Commonly, the saying is understood in the sense that the Arabic language is as infinite¹ as a coast-less ocean, but the image has much more to tell than that.
An alternative interpretation of the ocean metaphor
If we imagine every drop of that ocean to represent an Arabic expression, then we would expect 💧مَحَبَّة and 💧عَشق to be quite close to one another, 💧لُطف to be still nearby and 💧كَراهِيَّة to be far off.

The same should hold true for longer expressions: the pair 💧حُجَّة مُقنِع and 💧سَبَب وَجِيه should be close, just like the pair 💧سَبَب وَهمِي and 💧حُجَّة واهِيَة, with some distance between these pairs.

The (metaphorical) treasure map
Even ordinary oceans with clearly defined boundaries can be said to contain an infinite number of water drops — so how about a coast-less ocean? There are just too many drops in the sea to find anything; we are literally drowning in words.
Luckily, we can draw upon thousands of years of human experience of finding treasures in the sea and make use of the most appropriate tool for this purpose: a treasure map. So if you want to describe a smile, the following map might guide you to the treasures you are looking for.

From the metaphor to the engine room
The idea that phrases have a "geographic position" and can be found using maps works not only as a metaphor but also in everyday reality. If you came to this page via a Google search, this technology has just worked for you. For a couple of years now, search engines have no longer relied solely on the literal words in your query; they also map the query to a position in a so-called "semantic space", which has many more dimensions than the three we are used to. Every web page has a position in that semantic space too, so the search engine can return the pages that are closest to your query.
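To make "close in semantic space" a little more tangible, here is a minimal sketch in Python. It is not Muraija's implementation: the three-dimensional positions are hand-made for illustration (real embedding models learn hundreds of dimensions), and cosine similarity is only one common way of measuring closeness.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

LOVE = "مَحَبَّة"
PASSION = "عَشق"
KINDNESS = "لُطف"
HATRED = "كَراهِيَّة"

# Hypothetical positions, invented for this example only.
toy_space = {
    LOVE: (0.9, 0.8, 0.1),
    PASSION: (0.95, 0.7, 0.15),
    KINDNESS: (0.6, 0.9, 0.2),
    HATRED: (-0.9, -0.7, 0.1),
}

def nearest(query_vector, space, k=2):
    """Return the k expressions whose positions are most similar to the query."""
    ranked = sorted(
        space,
        key=lambda expr: cosine_similarity(query_vector, space[expr]),
        reverse=True,
    )
    return ranked[:k]

# Rank every drop by its closeness to the drop for "love".
for expression in nearest(toy_space[LOVE], toy_space, k=4):
    print(expression)
```

With these toy numbers, the ranking starts with مَحَبَّة itself, followed by عَشق and لُطف, and ends with كَراهِيَّة, mirroring the picture of the drops described earlier.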
Of course, this technology needs to be adapted for our maritime purpose, as searching the Internet is different from searching the sea that is the Arabic language. One important element is diacritics, as short phrases are ambiguous without them². Without the context of a sentence, how could we know whether "أزمة دين" refers to a crisis of religion or a debt crisis³?
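The ambiguity can be illustrated with a few lines of Python. This is only a sketch using the standard library, not part of Muraija: it relies on the fact that the Arabic short-vowel marks are Unicode combining marks, and the helper name `strip_diacritics` is made up for this example.

```python
import unicodedata

def strip_diacritics(text):
    """Drop combining marks (Unicode category "Mn"), which include the Arabic short vowels."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

debt_crisis = "أزمة دَين"       # with fatha: "debt"
religion_crisis = "أزمة دِين"   # with kasra: "religion"

# With diacritics the two phrases are different strings ...
print(debt_crisis == religion_crisis)

# ... but once the diacritics are removed they become indistinguishable.
print(strip_diacritics(debt_crisis) == strip_diacritics(religion_crisis))
```

A model that never sees diacritics has to place both readings at the same position, which is why short phrases without sentence context benefit from diacritic-aware input.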

Currently, Muraija uses this technology for searching collocations, but as soon as it becomes more mature, we will roll it out for other purposes like grouping collocations and ideas.
Searching in Arabic
Let's start exploring the semantic space with a search for استغل فرصة. According to the terminology established in „Morphological variants: An idea in many shapes“, "استغل فرصة" is a patient-action idea, because the noun „فرصة“ is undergoing the action „استغلال“.

As the semantic space is designed to be about meaning, not form, it is not surprising to see that morphological variants of the same collocation are close to each other, e.g. both "انتهز فرصة" and "انتهاز فرصة" are included in the search results.
To understand to what extent the linguistic form of ideas is abstracted away, we can search for "ضاعت الفرصة", which is an agent-action idea, as the noun "فرصة" is doing the action of "ضياع". Not only do we find "فاتت فرصة" (another agent-action idea), but corresponding patient-action ideas as well: "ضيّع فرصة", "فوّت فرصة" and "فقد فرصة".
Eager to try it out? Search for "أرهف السمع", "انتهت الصلاحية" or "ابتسامة مشرقة"!
We have to admit that Muraija's semantic space is still in an early development stage and thus error-prone. Searching for "أخلف وعد", for example, currently yields (among other sensible results) the collocation "وفي بوعد", which happens to have the opposite meaning!
Searching in English
As our semantic space is about meaning, not about form, it is very straightforward to extend the concept to multiple languages. As both "overcome an obstacle" and "تجاوز عقبة" express the same idea, they should occupy almost the same position, and they actually do. So you can use the English expression as a search query:

Currently, Muraija's semantic space works reasonably well for English, though searching in other languages⁴ can sometimes yield meaningful results, too. You can try "bright smile", "make a suggestion" or "listen attentively".
The remark made above about Muraija's semantic space still being in an early development stage applies even more to English expressions. Searching for "forced smile" yields "ابتسامة ماكرة", which translates to "cunning smile", although the correct Arabic equivalent, "ابتسامة صفراء", is found in the Muraija corpus almost twice as often.
Nevertheless, it is important to understand that no translation is involved in this process. The English expression is only used to put a mark on our treasure map — and the treasures are then genuine Arabic.

So the collocations returned by a search in English might not express the idea you want to express, but they are genuinely Arabic, extracted from original Arabic sources. That is: No translationese, as you find sometimes in translation databases like Reverso Context, and no AI hallucinations.
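The following Python sketch illustrates why no translation step is needed. It is a toy under stated assumptions, not Muraija's code: the two-dimensional vectors are invented, and the lookup table `toy_query_vectors` stands in for a multilingual model that would place an English query in the same space as the Arabic collocations.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

OVERCOME_OBSTACLE = "تجاوز عقبة"
SEIZE_OPPORTUNITY = "استغل فرصة"
BRIGHT_SMILE = "ابتسامة مشرقة"

# Only Arabic collocations are indexed (hypothetical positions).
arabic_index = {
    OVERCOME_OBSTACLE: (0.9, 0.1),
    SEIZE_OPPORTUNITY: (0.1, 0.9),
    BRIGHT_SMILE: (-0.8, 0.3),
}

# Stand-in for a multilingual encoder: hand-made positions for English queries.
toy_query_vectors = {
    "overcome an obstacle": (0.88, 0.15),
    "bright smile": (-0.75, 0.35),
}

def search(query, k=1):
    """Mark the query's position, then return the k closest indexed collocations."""
    position = toy_query_vectors[query]
    ranked = sorted(
        arabic_index,
        key=lambda c: cosine_similarity(position, arabic_index[c]),
        reverse=True,
    )
    return ranked[:k]

print(search("overcome an obstacle"))
print(search("bright smile"))
```

Whatever the query language, `search` can only ever return entries of `arabic_index`. The English phrase contributes nothing but a position, so a poorly placed query leads to a less fitting result, never to an invented one.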
Closing remark
At Muraija, we believe that surfing this sea — the Arabic language — can be a great pleasure, and work constantly on improving our surfboard. We appreciate your remarks by email, on X or via the feedback form!
If you want us to keep you posted on the evolution of Muraija, please subscribe to our mailing list or follow us on our social media channels!
¹ It makes a lot of sense, at least mathematically, to assume that the number of words in any language is infinite. For the nerds, see: András Kornai: How many words are there?
² To our knowledge, our model is the first Arabic embedding model supporting diacritics. All other models are geared towards dealing with sentences, where diacritics are rarely needed to resolve ambiguities.
³ We have to admit that even within the context of a sentence, Muraija's automatic analyzer sometimes fails to determine whether "دين" refers to "debt" or "religion".
⁴ Try "surmonter des difficultés" (French) or "scharfe Kritik" (German).