In this post, we introduce the concept of morphological variants of collocations, a way of using derivational morphology to group semantically similar collocations. An experimental implementation of this concept was released in Muraija 0.9.2.
In Basra they say: In the beginning, there was the masdar (verbal noun¹). In Kufa they say: In the beginning, there was the verb. We say: In the beginning, there was the idea, and ideas can be expressed in many ways.
A common way of expressing an action is a verbal sentence, consisting of a verb, a subject and an object.
سَوْفَ تُوَقِّعُ رَئِيسَةُ الجُمْهُورِيَّةِ اِتِّفاقِيَّةَ السَّلامِ فِي الأَيّامِ القادِمَةِ.
لَمْ يُوَقِّعْ الرَّئِيسُ الاِتِّفاقِيّاتِ المُتَعَلِّقَةَ بِتَعْزِيزِ التَّعاوُنِ بَعْدُ.
وَقَّعَ الرُّؤَساءُ اِتِّفاقِيّاتٍ لا حَصْرَ لَها!
By focusing on these core elements, abstracting away morphological variation, we can claim that all of the above sentences express the same idea. This idea can be represented by a triangle, where the lemmatized verb, subject and object rest at the angles. Inside the triangle, we put the surface forms encountered in the examples above next to their corresponding lemma.
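The triangle can also be written down as a small data structure. The following Python sketch is only an illustration: the lemma of each surface form is annotated by hand (in particular, treating رَئِيسَة as a form of the lemma رَئِيس is an assumption made for this example), and the names `SENTENCES`, `triangle` and `same_idea` are hypothetical, not part of Muraija.

```python
from collections import defaultdict

# Hand-assigned lemmas (an assumption of this sketch, not Muraija output).
SIGN = "وَقَّعَ"
PRESIDENT = "رَئِيس"
AGREEMENT = "اِتِّفاقِيَّة"

# One entry per example sentence: corner -> (surface form, lemma).
SENTENCES = [
    {"verb": ("تُوَقِّعُ", SIGN),
     "subject": ("رَئِيسَةُ", PRESIDENT),
     "object": ("اِتِّفاقِيَّةَ", AGREEMENT)},
    {"verb": ("يُوَقِّعْ", SIGN),
     "subject": ("الرَّئِيسُ", PRESIDENT),
     "object": ("الاِتِّفاقِيّاتِ", AGREEMENT)},
    {"verb": ("وَقَّعَ", SIGN),
     "subject": ("الرُّؤَساءُ", PRESIDENT),
     "object": ("اِتِّفاقِيّاتٍ", AGREEMENT)},
]


def triangle(sentences):
    """Collect, for each corner, the surface forms seen for each lemma."""
    corners = defaultdict(lambda: defaultdict(list))
    for sentence in sentences:
        for corner, (surface, lemma) in sentence.items():
            corners[corner][lemma].append(surface)
    return {corner: dict(lemmas) for corner, lemmas in corners.items()}


def same_idea(a, b):
    """Two sentences express the same idea if all three lemmas agree."""
    return all(a[c][1] == b[c][1] for c in ("verb", "subject", "object"))


if __name__ == "__main__":
    for corner, lemmas in triangle(SENTENCES).items():
        print(corner, lemmas)
```

Each corner ends up with a single lemma and three surface forms, which is exactly what the triangle depicts.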
The concept of „the idea of an action“
We can take abstraction one step further by defining the „idea of an action“ as the action (الحدث) itself, together with its agent (المؤثر) and its patient (المتأثر). That is, we assign semantic rather than syntactic roles² to these elements. Following this approach, all of the following sentences express the same „idea of an action“:
وُقِّعَت الاِتِّفاقِيَّةُ مِن قَبْلِ الرَّئِيسَةِ.
قامَ الرَّئِيسُ بِتَوْقِيعِ هذِهِ الاِتِّفاقِيّاتِ.
هُنا الاِتِّفاقِيَّةُ المُوَقَّعَةُ مِن قَبْلِ الرَّئِيسِ.
The image of the triangle can easily be adapted to this more abstract concept:
As expected, the morphological variation increases, because our concept of „idea of an action“ does not impose any syntactic constraints. In particular, this increase affects the action, which can now be a verb, a verbal noun, or a participle. Here we use the term „morphological variation“ in a broad, derivational sense, to cover a change of part of speech as well, and call the set of all surface forms obtained by inflection and derivation a „morphological field“³. The surface forms making up this field are then morphological variants of one another.
This offers us the opportunity to come back to the old quarrel between Basra and Kufa: Which part of speech should we choose to represent the morphological field? Verbal noun or verb? Here we pick the verbal noun, following the Basra argument that it exhibits less morphological variation.
Morphological variants of collocations
We cannot relate the „idea of an action“ to collocations directly, because no collocation pattern captures all three of its components. Thus we have to break the idea into two pieces, one consisting of the agent and the action, and another consisting of the patient and the action. These agent-action and patient-action ideas can then be (mostly) captured by common collocation patterns:
If the action is expressed as a verb, the agent-action idea corresponds to a "verb + subject" collocation, and the patient-action idea to a "verb + object" collocation⁴.
If the action is expressed as a verbal noun, the agent-action idea corresponds to a "noun + noun" collocation, and the patient-action idea to a "noun + noun" collocation or a "noun + preposition + noun" collocation.
If the action is expressed as a participle, both the agent-action and the patient-action idea correspond to a "noun + adjective" collocation, where the adjective is an active participle in the case of the agent-action idea, and a passive participle in the case of the patient-action idea.
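The three cases above amount to a lookup table from the form of the action and the semantic role to one or more collocation patterns. A minimal Python sketch of that table follows; the pattern labels are taken from the list above, while the enum and function names are hypothetical and do not reflect Muraija's internals.

```python
from enum import Enum


class Role(Enum):
    AGENT = "agent"
    PATIENT = "patient"


class ActionForm(Enum):
    VERB = "verb"
    VERBAL_NOUN = "verbal noun"
    PARTICIPLE = "participle"


# (form of the action, role of the noun) -> collocation patterns.
PATTERNS = {
    (ActionForm.VERB, Role.AGENT): ["verb + subject"],
    (ActionForm.VERB, Role.PATIENT): ["verb + object"],
    (ActionForm.VERBAL_NOUN, Role.AGENT): ["noun + noun"],
    (ActionForm.VERBAL_NOUN, Role.PATIENT): [
        "noun + noun",
        "noun + preposition + noun",
    ],
    (ActionForm.PARTICIPLE, Role.AGENT): ["noun + adjective (active participle)"],
    (ActionForm.PARTICIPLE, Role.PATIENT): ["noun + adjective (passive participle)"],
}


def patterns_for(form, role):
    """Patterns that can express the given role for the given action form."""
    return PATTERNS[(form, role)]


def roles_for(pattern):
    """Roles that a given pattern label can express."""
    return {role for (_, role), labels in PATTERNS.items() if pattern in labels}


if __name__ == "__main__":
    print(patterns_for(ActionForm.VERBAL_NOUN, Role.PATIENT))
    print(roles_for("noun + noun"))
```

Looking the table up in the reverse direction shows that "noun + noun" is the only pattern listed under both roles.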
In our triangle representation of the „idea of an action“, the agent-action idea corresponds to the right side, and the patient-action idea to the left side:
Extending the notion of a morphological field defined above to collocations, we define an agent-action morphological field, consisting of all collocations on the right side, and a patient-action morphological field, consisting of all collocations on the left side. Collocations in the same field are called morphological variants of one another.
Some examples
Let us look at some examples of the agent-action morphological field …
Agent-action morphological field (الحقل الصرفي مؤثر - حدث) | Noun (الاسم) | Action (الحدث) |
اِنْتَشَرَت ظاهِرَةٌ / اِنْتِشارُ ظاهِرَةٍ / ظاهِرَةٌ مُنْتَشِرَةٌ "تنتشر ظاهرة الاسلاموفوبيا في الغرب على نطاق واسع" | الظاهرة | الانتشار |
صَدَرَ قانُونٌ / صُدُورُ قانُونٍ / صُدُورٌ لِـ قانُونٍ / قانُونٌ صادِرٌ "حسب قانون المطبوعات الصادر العام الماضي" | القانون | الصدور |
اِنْتَهَكَ قانُونٌ / اِنْتِهاكُ قانُونٍ / قانُونٌ مُنْتَهِكٌ "هذا القانون ينتهك الدستور بشكل صارخ." | القانون | الانتهاك |
… and the patient-action morphological field.
Patient-action morphological field (الحقل الصرفي متأثر - حدث) | Noun (الاسم) | Action (الحدث) |
شَكَّلَ حُكُومَةً / تَشْكِيلُ حُكُومَةٍ / تَشْكِيلٌ لِــ حُكُومَةٍ / حُكُومَةٌ مُشَكَّلَةٌ "هناك حكومة وحدة وطنية مشكلة من الحزب الحاكم وأحزاب المعارضة" | الحكومة | التشكيل |
أَصْدَرَ قانُوناً / إِصْدارُ قانُونٍ / إِصْدارٌ لِـ قانُونٍ / قانُونٌ مُصْدَرٌ | القانون | الإصدار |
اِنْتَهَكَ قانُوناً / اِنْتِهاكُ قانُونٍ / اِنْتِهاكٌ لِـ قانُونٍ / قانُونٌ مُنْتَهَكٌ " نرفض هذا الانتهاك الفاضح للقانون الدولي!" | القانون | الانتهاك |
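Rows like the ones above can be produced by grouping collocations on a key made of the role, the action and the noun. The Python sketch below shows one way to do this with two of the example rows; the `Collocation` type and the function names are hypothetical, and the role and lemma annotations are entered by hand rather than extracted from a corpus.

```python
from collections import defaultdict
from typing import NamedTuple


class Collocation(NamedTuple):
    surface: str  # the collocation as listed in the table
    role: str     # "agent" or "patient"
    action: str   # verbal noun standing for the action
    noun: str     # lemma of the agent or patient


SPREAD, PHENOMENON = "انتشار", "ظاهرة"
FORMATION, GOVERNMENT = "تشكيل", "حكومة"

COLLOCATIONS = [
    Collocation("اِنْتَشَرَت ظاهِرَةٌ", "agent", SPREAD, PHENOMENON),
    Collocation("اِنْتِشارُ ظاهِرَةٍ", "agent", SPREAD, PHENOMENON),
    Collocation("ظاهِرَةٌ مُنْتَشِرَةٌ", "agent", SPREAD, PHENOMENON),
    Collocation("شَكَّلَ حُكُومَةً", "patient", FORMATION, GOVERNMENT),
    Collocation("تَشْكِيلُ حُكُومَةٍ", "patient", FORMATION, GOVERNMENT),
    Collocation("حُكُومَةٌ مُشَكَّلَةٌ", "patient", FORMATION, GOVERNMENT),
]


def field_key(collocation):
    """A morphological field is identified by role, action and noun."""
    return (collocation.role, collocation.action, collocation.noun)


def build_fields(collocations):
    """Group surface collocations into morphological fields."""
    fields = defaultdict(list)
    for collocation in collocations:
        fields[field_key(collocation)].append(collocation.surface)
    return dict(fields)


def are_variants(a, b):
    """Collocations in the same field are morphological variants."""
    return field_key(a) == field_key(b)


if __name__ == "__main__":
    for key, surfaces in build_fields(COLLOCATIONS).items():
        print(key, surfaces)
```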
Goal for Kufa: The Ambiguity of the Verbal Noun
It is not uncommon that a noun can be both agent and patient for a given action. In the above examples we have seen "قانون" both as an agent and as a patient for the action "انتهاك". Here we can observe one particularity of the masdar + noun collocation pattern: By looking at "اِنْتِهاكُ قانُونٍ" out of context, we do not know which role "قانون" plays, that is, this collocation is part of both the agent-action and the patient-action morphological field:
Given the ambiguity of the verbal noun⁵, we have to choose another collocation pattern to represent the morphological field. Since there does not seem to be any city advocating for participles, we side with Kufa and use the verb-based patterns „verb + subject“ and „verb + object“. That is, "اِنْتَهَكَ قانُونٌ" represents the agent-action field and "اِنْتَهَكَ قانُوناً" the patient-action field.
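To make the ambiguity concrete, the two fields for "انتهاك" and "قانون" can be compared directly. The following Python sketch uses the surface forms from the examples; the dictionary layout and the function names are assumptions made for illustration, not Muraija's actual data model.

```python
# The masdar + noun form is written once so that both fields share it exactly.
MASDAR_NOUN = "اِنْتِهاكُ قانُونٍ"

# pattern label -> surface collocation, one dictionary per field
AGENT_FIELD = {
    "verb + subject": "اِنْتَهَكَ قانُونٌ",
    "noun + noun": MASDAR_NOUN,
    "noun + adjective": "قانُونٌ مُنْتَهِكٌ",
}
PATIENT_FIELD = {
    "verb + object": "اِنْتَهَكَ قانُوناً",
    "noun + noun": MASDAR_NOUN,
    "noun + preposition + noun": "اِنْتِهاكٌ لِـ قانُونٍ",
    "noun + adjective": "قانُونٌ مُنْتَهَكٌ",
}


def ambiguous_surfaces(field_a, field_b):
    """Surface collocations that occur in both fields."""
    return set(field_a.values()) & set(field_b.values())


def representative(field):
    """Pick the verb-based collocation as the representative of a field."""
    for pattern, surface in field.items():
        if pattern.startswith("verb + "):
            return surface
    raise ValueError("field has no verb-based collocation")


if __name__ == "__main__":
    print(ambiguous_surfaces(AGENT_FIELD, PATIENT_FIELD))
    print(representative(AGENT_FIELD))
    print(representative(PATIENT_FIELD))
```

Only the masdar + noun form appears in both fields, whereas the two verb-based forms differ in the case ending of the noun and therefore identify their field unambiguously.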
Morphological variants in Muraija
Although it might look as if we took up a stance on the old quarrel between Kufa and Basra, we still stick to our initial claim: „In the beginning, there was the idea“. So if a user looks up "كمنت المشكلة", they might be interested in knowing how to express this idea differently. That's why Muraija displays the morphological variants which are actually used: You can say "مشكلة كامنة" but not "كمون المشكلة".
Other collocations have many more morphological variants, especially masdar + noun collocations, where the variants come from both the agent-action and the patient-action morphological field, e.g. "انتهاك قانون":
Interestingly, Muraija does not list the collocation “قانُونٌ مُنْتَهِكٌ” from the agent-action field, although it is perfectly possible to say, e.g., “نرفض هذا القانون المنتهك لسيادة الدول”. (Here the usual precautions apply: Muraija's single source of truth is its corpus, from which the collocations are extracted automatically, so mistakes can happen, especially when collocations are rare.) The corresponding collocation from the patient-action field, “قانُونٌ مُنْتَهَكٌ”, is found, though, and a click brings you there:
Future developments: From collocations to ideas
Currently, Muraija can only display a tiny portion of the collocations in use, because there are so many: For every pattern, only the ten most frequent ones are listed, which is unsatisfactory. But by inspecting the entry “قانُون”, we can observe that many of the displayed collocations are actually morphological variants of one another:
We could gain precious space by hiding the non-canonical variants of a collocation (shown in green) or by moving from a pattern-based presentation to an idea-based presentation.
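An idea-based presentation could, for instance, collapse all variants of the same idea into a single row before applying the top-ten cut-off. The Python sketch below illustrates this under several assumptions: the function name is hypothetical, ranking ideas by the summed frequency of their variants is just one possible choice, and the frequency counts are invented for the example and are not Muraija data.

```python
VIOLATION_OF_LAW = ("patient", "انتهاك", "قانون")
ISSUING_OF_LAW = ("agent", "صدور", "قانون")

# (surface collocation, idea key, frequency); the counts are made up.
ENTRIES = [
    ("اِنْتِهاكٌ لِـ قانُونٍ", VIOLATION_OF_LAW, 5),
    ("صُدُورُ قانُونٍ", ISSUING_OF_LAW, 4),
    ("قانُونٌ مُنْتَهَكٌ", VIOLATION_OF_LAW, 2),
    ("قانُونٌ صادِرٌ", ISSUING_OF_LAW, 6),
]


def collapse_by_idea(entries, limit=10):
    """Return one row per idea, ranked by the summed frequency of its variants."""
    totals = {}
    variants = {}
    for surface, idea, frequency in entries:
        totals[idea] = totals.get(idea, 0) + frequency
        variants.setdefault(idea, []).append(surface)
    ranked = sorted(totals, key=lambda idea: totals[idea], reverse=True)
    return [(idea, totals[idea], variants[idea]) for idea in ranked[:limit]]


if __name__ == "__main__":
    for idea, total, surfaces in collapse_by_idea(ENTRIES):
        print(idea, total, surfaces)
```

With this grouping, four pattern-based rows shrink to two idea-based rows, and the variants remain available for display on demand.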
If you want us to keep you posted on the evolution of Muraija, please subscribe to our mailing list or follow us on our social media channels!
¹ It can be argued that some definitions of “verbal noun” are much broader than “masdar”; in the context of this article, we use the two terms interchangeably.
² The terms „agent“ and „patient“ are used very coarsely here. There is a cornucopia of semantic roles, but for now we are only interested in those we can easily identify in dependency parse trees.
³ The definition of „morphological field“ does not state which kinds of derivation are covered. This is intentional: Although the morphological fields defined in this article only use verb – masdar – participle, other derivations might be useful to define other morphological fields, such as nisba adjectives for „description ideas“: ملكة بريطانيا – الملكة البريطانية
⁴ The passive construction "وُقِّعَ اِتِّفاقِيَّةٌ" is considered a verb + object collocation in Muraija.
⁵ Actually, there is even more ambiguity to the verbal noun, as it refers not only to the action itself (nomen actionis), but also to the result of this action (nomen resultatis). So the collocation “اِقْتِراحُ قانُونٍ” can designate both the act of introducing a bill, as in “يجب على الحكومة اقتراح قانون مناسب لعصرنا”, and the bill itself, as in “ناقشت اللجنة اقتراح قانون اللامركزية الإدارية”. The first sentence can be reformulated using a masdar muawwal (“يجب على الحكومة أن تقترح قانونًا مناسبًا لعصرنا”), whereas the second cannot.