While improving our translation process, we ran into some inconsistencies in our Arabic properties files. Below is a small example (1). Suddenly the curly braces in the Unicode-escaped file were no longer balanced (2), while in the editor everything seemed to be fine (3).
(1)
message.title = Message {0}:
(2)
message.title = :{\u0627\u0644\u0631\u0633\u0627\u0644\u0629 {0
(3)
message.title = :{الرسالة {0
How could this happen? There is no voodoo going on, just the bidi (bidirectional text) algorithm working behind the scenes and someone editing the file without knowing about it.
The job of this algorithm is to display text with mixed directions. For every character it determines the direction of the text flow and whether the next character is placed to the left or to the right. Latin (LTR) and Arabic (RTL) letters carry their direction by nature: they have a so-called “strong” direction. Parentheses, numbers, punctuation and similar characters have a “weak” or “neutral” direction, which is resolved from their neighbouring characters.
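To make the strong/weak/neutral distinction more tangible, here is a minimal sketch that asks the JDK for the bidi class of a few characters from our example. The class name `BidiClassDemo` and the helper `category` are made up for illustration; only `Character.getDirectionality` and its constants come from the JDK.

```java
class BidiClassDemo {

    // Maps a character to the coarse category used in the text:
    // "strong", "weak" or "neutral". Explicit formatting codes and
    // unassigned characters are reported as "other".
    static String category(char c) {
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                return "strong";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER:
            case Character.DIRECTIONALITY_ARABIC_NUMBER:
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR:
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR:
            case Character.DIRECTIONALITY_COMMON_NUMBER_SEPARATOR:
            case Character.DIRECTIONALITY_NONSPACING_MARK:
            case Character.DIRECTIONALITY_BOUNDARY_NEUTRAL:
                return "weak";
            case Character.DIRECTIONALITY_PARAGRAPH_SEPARATOR:
            case Character.DIRECTIONALITY_SEGMENT_SEPARATOR:
            case Character.DIRECTIONALITY_WHITESPACE:
            case Character.DIRECTIONALITY_OTHER_NEUTRALS:
                return "neutral";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        // A Latin letter, an Arabic letter, a digit, a colon, a brace and a space.
        char[] samples = {'M', '\u0627', '0', ':', '{', ' '};
        for (char c : samples) {
            System.out.printf("U+%04X -> %s%n", (int) c, category(c));
        }
    }
}
```

The letters come out as strong, while the digit, the colon, the braces and the space have no fixed direction of their own. That is exactly the material a placeholder like {0} is made of.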
Here is the correctly entered text and how the bidi algorithm displays it:
message.title = \u0627\u0644\u0631\u0633\u0627\u0644\u0629 {0}:
message.title = الرسالة {0}:
While this might be acceptable for most mixed text with only a few “against-the-stream” words, in our case (which is quite common in a properties file) it looks really wrong: the parameter placeholder is split across an LTR and an RTL part, and the opening curly brace (in the RTL part) is mirrored while the closing one is not. Someone with no bad intentions at all then tried to “correct” this (still proper) property and accidentally broke everything without even noticing. The result is shown above.
What can be done to prevent this tricky translation bug?
First of all, we added a sanity check that detects broken parameter placeholders:
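import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches MessageFormat placeholders such as {0} or {12}
Pattern messageFormatParameter = Pattern.compile("\\{\\d{1,2}\\}");
Matcher matcher = messageFormatParameter.matcher(englishValue);
while (matcher.find()) {
    String parameter = matcher.group();
    // Every placeholder of the English value must appear intact in the translation
    if (!translatedValue.contains(parameter)) error = true;
}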
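For readers who want to try the check on the values from this post, here is one way it could be wrapped into a self-contained helper. This is only a sketch: the class name `PlaceholderCheck` and the method `missingPlaceholders` are invented for this example and are not the names used in our build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderCheck {

    // Matches MessageFormat-style placeholders with one or two digits, e.g. {0} or {12}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\d{1,2}\\}");

    // Returns every placeholder of the English value that does not occur
    // literally (in logical character order) in the translated value.
    static List<String> missingPlaceholders(String englishValue, String translatedValue) {
        List<String> missing = new ArrayList<>();
        Matcher matcher = PLACEHOLDER.matcher(englishValue);
        while (matcher.find()) {
            String parameter = matcher.group();
            if (!translatedValue.contains(parameter)) {
                missing.add(parameter);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String english = "Message {0}:";
        // The accidentally "corrected" value (2) and the proper value from this post.
        String broken = ":{\u0627\u0644\u0631\u0633\u0627\u0644\u0629 {0";
        String correct = "\u0627\u0644\u0631\u0633\u0627\u0644\u0629 {0}:";
        System.out.println("broken  -> missing " + missingPlaceholders(english, broken));
        System.out.println("correct -> missing " + missingPlaceholders(english, correct));
    }
}
```

The check works on the logical character order stored in the file, not on what the editor displays, which is why it catches the broken value even though it looks fine on screen.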
Then we corrected the order of the characters in the file.
And finally we inserted a Unicode control character, the right-to-left mark (U+200F), which prevents the text from being displayed wrongly. Now there is no reason to “correct” anything, and at the same time no opportunity to break something. (Well, if you want to, or if you act thoughtlessly, you can break almost everything. But it was too easy before.)
message.title = \u0627\u0644\u0631\u0633\u0627\u0644\u0629 {0}:\u200f
message.title = الرسالة {0}:
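Why the trailing control character helps can be checked with `java.text.Bidi`. The following is a rough sketch under one assumption: the line is laid out in a left-to-right paragraph, as a typical editor would do for a properties file. The class name `RlmDemo` and its helper are hypothetical. An odd resolved level means a character is laid out right-to-left, an even level means left-to-right.

```java
import java.text.Bidi;

class RlmDemo {

    // True if the bidi algorithm resolves the final colon of the line to a
    // right-to-left (odd) embedding level. The paragraph direction is forced
    // to LTR here, which is an assumption about how the editor lays out the line.
    static boolean colonIsRightToLeft(String line) {
        Bidi bidi = new Bidi(line, Bidi.DIRECTION_LEFT_TO_RIGHT);
        int level = bidi.getLevelAt(line.lastIndexOf(':'));
        return (level & 1) == 1;
    }

    public static void main(String[] args) {
        String plain = "message.title = \u0627\u0644\u0631\u0633\u0627\u0644\u0629 {0}:";
        String withMark = plain + "\u200f"; // right-to-left mark appended

        System.out.println("without mark, colon is RTL: " + colonIsRightToLeft(plain));
        System.out.println("with mark,    colon is RTL: " + colonIsRightToLeft(withMark));
    }
}
```

Without the mark, the colon sits between the Arabic part and the end of an LTR line, so it falls back to the paragraph direction. With the mark, it is enclosed by right-to-left characters on both sides and joins the RTL run together with the rest of the placeholder.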
One open item remains: we are still waiting for an advanced editor that is not only able to display and handle bidirectional text, but also has some fancy features like coloring the text depending on its direction.



