Monday 26 July 2010

disunification (2)

Michael Everson correctly identifies a number of reasons to advocate the disunification of the Latin letters beta, theta, and chi from their Greek versions. If this happened, as IPA symbols we would use the Latin versions rather than the Greek ones.
He quotes briefly, without identifying the source, from the IPA 1949 Principles booklet. Here, more fully, is what it says there (The Principles of the International Phonetic Association, pages 1-2). Although unattributed, these are clearly Daniel Jones’s words.
Note the very clear intention to treat IPA θ (vertical) as distinct from Greek theta (typically oblique). Greek letters are to be incorporated into the IPA only as roman [sic] adaptations.
As Jones says, Greek theta has an alternative form, ϑ. This is encoded at U+03D1, whereas ordinary θ is at U+03B8.
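
You can check the two encodings with Python's standard `unicodedata` module. The sketch below is a minimal one: the helper `describe` is hypothetical, written only for this illustration. It prints each code point with its official Unicode name. It also shows that compatibility normalization (NFKC) folds the variant ϑ back into ordinary θ, while canonical normalization (NFC) keeps the two apart.

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

theta = "\u03b8"         # ordinary Greek theta, θ
theta_symbol = "\u03d1"  # the variant form, ϑ

print(describe(theta))         # U+03B8 GREEK SMALL LETTER THETA
print(describe(theta_symbol))  # U+03D1 GREEK THETA SYMBOL

# NFC keeps the two distinct; NFKC maps the variant ϑ onto ordinary θ.
print(unicodedata.normalize("NFC", theta_symbol) == theta)   # False
print(unicodedata.normalize("NFKC", theta_symbol) == theta)  # True
```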

In English printed texts that mix the Latin and Greek scripts, the Greek letters are typically oblique, the Latin ones upright. The purpose is to distinguish clearly between the two scripts (whereas the IPA wants everything in the same script). Here is an example, from Abbott and Mansfield’s Primer of Greek Grammar (my copy printed in 1949).
I think disunification of Latin and Greek beta, theta, chi would be a good thing.

An existing disunification that might be thought surprising is that of the IPA symbol for a voiced velar plosive, ɡ, U+0261, from ordinary lower-case g, U+0067. In many fonts there is no difference in the appearance of these two; in other fonts there is, e.g. in Times New Roman ɡ g (which I hope shows up properly in your browser). The IPA is on record as declaring that the two symbol shapes are equivalent and interchangeable. Nevertheless many phoneticians persist in treating them as distinct, which justifies Unicode’s disunification.
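
Because the two g's are disunified, software treats them as unrelated characters, however alike they look. The following is a minimal Python sketch using the standard `unicodedata` module. The function name `same_after_nfkc` is hypothetical. Unicode gives U+0261 no decomposition at all, so even the most aggressive standard normalization (NFKC) leaves it distinct from U+0067.

```python
import unicodedata

ipa_g = "\u0261"    # ɡ, the IPA voiced velar plosive
plain_g = "\u0067"  # g, ordinary lower-case g

print(unicodedata.name(ipa_g))    # LATIN SMALL LETTER SCRIPT G
print(unicodedata.name(plain_g))  # LATIN SMALL LETTER G

def same_after_nfkc(a, b):
    """True if the two strings are identical after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

print(same_after_nfkc(ipa_g, plain_g))  # False

# A plain substring search for [ɡɪv] typed with U+0261 misses [gɪv] typed with U+0067.
print("\u0261\u026av" in "g\u026av")    # False
```

In practice, text typed with one g will not match a search for the other unless the search tool maps them together itself.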

It is worth noting that a number of obsolete, derecognized former IPA symbols are located in the Unicode block Latin Extended-B. They include ƍ ƞ ƪ ƫ ƺ ƾ ƻ. This is also where we find upper-case versions of certain IPA symbols. These might be used in orthographies, though not in phonetic texts as such: Ɔ Ə Ɛ Ɣ Ɯ Ɵ Ʊ Ʌ. I have sometimes had to correct careless authors who used them in place of the lower-case phonetic symbols.
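
This kind of careless substitution can be caught mechanically. The following is a minimal Python sketch. It assumes that the text being checked is a phonetic transcription in which none of these capitals belong. `flag_capitals` and the character set it uses are illustrative choices, not an established tool. The check relies on the standard Unicode lower-case mapping, which takes each of these Latin Extended-B capitals to the corresponding IPA symbol.

```python
# Upper-case letters from Latin Extended-B (U+0180 to U+024F) listed above:
# Ɔ Ə Ɛ Ɣ Ɯ Ɵ Ʊ Ʌ
ORTHOGRAPHIC_CAPITALS = "\u0186\u018F\u0190\u0194\u019C\u019F\u01B1\u0245"

def flag_capitals(text):
    """List (position, capital, lower-case IPA symbol) for each capital found."""
    return [
        (i, ch, ch.lower())
        for i, ch in enumerate(text)
        if ch in ORTHOGRAPHIC_CAPITALS
    ]

for i, cap, small in flag_capitals("[b\u0186n]"):
    print(f"position {i}: U+{ord(cap):04X} {cap} should be U+{ord(small):04X} {small}")
# position 2: U+0186 Ɔ should be U+0254 ɔ
```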

Another difficult area is that of letters with diacritics. It is possible to encode any such letter by using the base form plus one (or more) of the Combining Diacritical Marks provided in the Unicode range U+0300–U+036F. However, doing so puts you at the mercy of the designers of fonts, browsers and word processing software, who may or may not have done the necessary work to make diacritics line up correctly above, below, or through the base letter. For “accented” letters used in orthographies, Unicode provides separate precomposed characters, for example á ê ï õ ù ă ē į ő ů ç đ ġ ķ ň. However, no precomposed combinations are provided for explicitly phonetic use. Obviously, since the range of possible combinations is potentially enormous, we cannot expect to have many of these; but it would certainly be convenient to have precomposed versions of the symbols for the French nasalized vowels (blog, 15 July), ɑ̃ ɛ̃ ɔ̃ œ̃, which are abundantly attested in printed texts.
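
The difference between the orthographic and the phonetic cases shows up directly under Unicode normalization. The following is a minimal Python sketch using the standard `unicodedata` module; `code_points` is a hypothetical helper. NFC composes a plus a combining acute into the single precomposed á. For the French nasal vowels, there is no precomposed character to compose into, so each vowel stays as a base letter plus U+0303 COMBINING TILDE.

```python
import unicodedata

COMBINING_TILDE = "\u0303"

def code_points(s):
    """Return the code points of s as 'U+XXXX' strings (hypothetical helper)."""
    return [f"U+{ord(c):04X}" for c in s]

# Orthographic case: NFC composes a + combining acute into precomposed á.
print(code_points(unicodedata.normalize("NFC", "a\u0301")))  # ['U+00E1']

# Phonetic case: no precomposed ɑ̃ ɛ̃ ɔ̃ œ̃ exist, so NFC leaves two code points.
for base in ("\u0251", "\u025B", "\u0254", "\u0153"):  # ɑ ɛ ɔ œ
    # Each line prints the base letter's code point followed by 'U+0303'.
    print(code_points(unicodedata.normalize("NFC", base + COMBINING_TILDE)))
```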

31 comments:

  1. It would also be quite handy for there to have been precomposed versions of ɛ́ ɔ́ and ɛ̀ ɔ̀ used in African orthographies… but the gate there has closed and the train has left the station. No new precomposed letter combinations will be added to the standard.

  2. You say: "I think disunification of Latin and Greek beta, theta, chi would be a good thing."

    So do I. There are, as I have said, implications to this. There is a certain amount of phonetic data using the Greek characters. Of course, there is also a certain amount of phonetic data using ASCII-hack fonts. So even if Latin Chi were added to support Lepsius transcription, the question would remain: Will the International Phonetic Association endorse the use of Latin Chi, a shift from the current recommendation of using Greek Chi?

  3. The variant form, ϑ, is encoded separately from ordinary θ because of its specialized use as a technical symbol.

    The theta used in Greek text may have either of these forms depending on the design of the font. If I recall correctly, there was even a typographic tradition, chiefly in France, where different forms of the theta were used depending on whether it appeared as the first letter of a word or not.

    This is another situation like that of ɡ and g, where a variant form, which may or may not be identical to the shape of the ordinary letter depending on the font, is given a specialized use and is thus encoded separately.

    In my opinion, the same arguments for a separate encoding apply to the beta, theta, and chi used as phonetic symbols. It really is surprising that ɡ is encoded separately but beta, theta, and chi are not.

  4. Not only Latin g and Latin ɡ, but also Latin ɛ and Greek ε, Latin ɸ and Greek φ, Latin ɣ and Greek γ, Latin ɩ and Greek ι.

  5. Hmm, this raises another question in my mind: if g has an IPA doppelganger in order to enforce a certain typographic representation, why doesn't a get the same treatment? As things stand, Comic Sans is virtually useless to phoneticians, and that's a damn shame.

  6. Comic Sans is virtually useless to phoneticians, and that's a damn shame.

    I can't tell whether you are being serious or not, but it's not like the lack of distinct glyphs for a and ɑ in Comic Sans is the only reason that it is useless for phonetics (and many other uses besides, for that matter).

  7. Actually…

    In italic faces and script fonts, double-storey a is usually written like a script-a (like a d sans ascender) and script-a ɑ is usually written as a Greek alpha α.

    I observed this in the italics of the Uralic Phonetic Alphabet.

    It is implemented in my own "answer to Comic Sans", Allatuq. Both characters appear in an IPA text at the end of that PDF document.

  8. There are many other fonts in which the ordinary a of Latin alphabet looks like ɑ, so having a codepoint specifically for the front open unrounded vowel would be a good idea.

    (But then, as a physicist I am already bothered by fonts which don't distinguish italic Latin v from italic Greek nu, and there are *lots* of them.)

    BTW, if the IPA treats ɡ and g as the same, how come the official chart uses a different glyph for the velar plosive symbol than for the g used in running text (e.g. in the "Pharyngeal" label)? I can't see such a difference for any other letter.

  9. I love chaos.

    But could we focus on opinions about the disunification? It's important that we know what people think about this.

  10. This comment has been removed by the author.

  11. My opinion is that the train has left the station on disunification too, and for the same reason. To introduce a second "Unicode spelling" of [aβa] would mean that when you searched using a naive search tool (and most search tools are naive) for the older version you would not find the newer, and vice versa. The exact same would be true of a hypothetical LATIN LETTER OE WITH TILDE as opposed to the existing two-character combination. IPA is far more in use than Latin-script Kurdish, the last successful disunification in Latin.

    That said, it would be perfectly sensible in a font specialized for linguists to make the Greek roman (non-italic) letters look IPA-ish, and use the specialized italic forms for Greek italic that "Franks" are accustomed to, since the font would not be used for running Greek text. Lots of ligatures should also be included.

    Michael: What a superb specimen! And a very nice-looking comic font as well. But what are the languages on the last page after Basque? The first IPA one is evidently a Romance language, but that's all I can say.

  12. I'm for this disunification, and a good bit more besides. Of course, the reunification of g would be very sensible, and the IPA did try to be sensible, but as army points out, there they are in 2005, and I'm glad of it. I'm too old to accept U+0067 in phonetics, and it appears these blog fonts are too young to accept it in anything.

  13. But John, there are good grounds for adding Latin Chi anyway (Lepsius' capital).

    You mean Cyrillic-script Kurdish.

    And it doesn't address the sorting problem. See my blog.

    Thanks for enjoying Allatuq. I hope to add many characters to it. It has a full set of Canadian Syllabics already. Latin, Greek, and Cyrillic support is not yet comprehensive.

    The first IPA is a Romance conlang, if I remember right. The second is Swedish, if I remember right. I made that file a year ago.

  14. Coming back to disunification, the unified Greek IPA letters are not only aesthetically displeasing, they can also create real confusion. I remember some time ago, it might have been on this blog, I tried to post a comment that contained a Greek chi to represent the voiceless uvular fricative. It looked fine in preview, but the site's font specified a descenderless chi (which is, apparently, an option in Greek typography) that looked much like a Latin x.

  15. Sorry to be thick, but aren't the versions of theta and beta used by the IPA modified from the Greek already? I can see how the chi needs to be modified (though I am not clear on what it would look like if it were more Latin-looking: like a capital letter X?).
    And I believe that there is some confusion here about the use of the term "font" vs. "typeface". A font is merely an electronic file that contains the glyphs of a typeface, while a typeface is the design of those glyphs. In many cases people are writing about problems with the design of a face (e.g. whether a face has only a single-story "g" or not, usually because its designer doesn't realize that phoneticians differentiate between the two and thinks it is merely a matter of taste), and at other times people are talking about issues with a font (that it does or doesn't include a particular glyph). And at other times people mean a problem with the Unicode standard, though I think they are very clear about that.

    And

  16. Eric:

    Of course a font designer can make any glyph shape he likes for any letter. The problem is that of all the Greek-based letters that are used in the IPA, all but three of them have been disunified from Greek and are encoded separately as Latin letters.

    Please see my blog entry for sample glyphs and a fuller discussion.

    Luke: That's a very interesting comment. I wonder if you could take a screen-shot of that. What OS and what browser are you using?

  17. John, any idea why your distinction between Times New Roman voiced velar plosive, ɡ, U+0261 and ordinary lower-case g, U+0067 doesn't show up properly in my browser?

    On disunification (1) the Cyrillic Uk (maj Ꙋ, min ꙋ) for which I suggested Code 2000 to David displays OK whether I allow pages to choose their own fonts instead of my selections or not, although it doesn't seem to be in any of my versions of the fonts you have specified for this blog, but the distinction between TNR IPA g and LC g isn't working under either condition.

  18. Michael Everson: The second is Swedish, if I remember right.
    Yes, it is Swedish, specifically the tale of 'Nordanvinden och solen' (The North Wind and the Sun).

  19. mallamb, I placed an HTML font tag around the two Times New Roman letters. My Firefox browser obeys it.

  20. mallamb

    Something very strange is happening.

    On disunification (1) the Cyrillic Uk (maj Ꙋ, min ꙋ)

    Sometimes when I look at it, it displays what you intended. Then I might go away and come back — only to find boxes in place of the uks.

    According to the Character Viewer, 2000 is invisible. All the characters from 2000 to 200F are spaces. 2000 is EN QUAD.

    As I write, I begin to suspect that I was wrong about the uks disappearing. They've probably just appeared after I got everson.

    But that would mean that you typed A640A and A640B, or did you paste them from somewhere?

  21. mallamb

    That was it. I've disabled everson and you're back to boxes.

  22. Please call the font "Everson Mono", which is its name.

  23. To John Cowan, the Romance conlang which is the first of my IPA samples is Tundrian.

  24. Michael Everson,

    I can't seem to reproduce the issue, I'm afraid. I must have used Firefox on Windows Vista or Ubuntu at the time.

  25. John,

    Thank you for helping with the process of elimination. It does deepen the mystery, however. I had looked at the page source and seen that you placed an HTML font tag around the two Times New Roman letters, and whether I pasted or Paste Special-ed them into Word they appeared correctly, so it must BE the browser, but what can I do but try the two options I mentioned?

    David,

    I did say perhaps we should just give up trying to understand these mysteries, but I think this is one I can explain: as with the new IPA symbols we have been discussing, I found that whatever font they are in they only appear in the top LH corner of the symbol chart where SP (U+0020) is normally displayed. You can enter their codes in the Character code box or type the code and do Alt+X in Word, but they always appear in place of SP. So I have both typed A640A and A640B, and pasted them from my own posts and various other places. And yes they do need to be in the font! If your browser can't see them at all, I suppose that means you haven't got them in any font. I don't remember where I got Code2000. Perhaps that's one of Everson's too.

  26. Just an extremely short comment. One does not want to trigger certain sensitivities. However...however... It would not be a waste of time to consider "parapraxis" in Andrew M. Colman's excellent Oxford Dictionary of Psychology: "a minor error...a *slip of the pen...Sigmund Freud...advanced the theory that such errors are not random but often represent fulfilments of unconscious wishes...".

    Now, of course, Colman, having composed the clearly superior dictionary of psychology, does not know what he is talking about. So much for him.

    David Crosbie: "That was it. I've disabled 'everson' and you're back to boxes."

    God. mallamb?

  27. Goddit, so please observe the system-specificity of triggers, in psychology no more or less than anywhere else.

  28. Michael

    I would never seek to disable you.

    But neither did I want to defraud you.

  29. It's shareware. You're allowed to try it out.

