...but the kriek is so tasty.

Anyway, I was recently snarked at on IRC for uttering the following sentence:

I.e., you start with the functional spec, you hand it in to the professor, he grades it, then you do whatever the next bit of the process is?
The snarker in question took issue with my use of the word 'he' as the anaphor, or "pronoun that refers to a previously introduced noun", for "professor". I remarked that in my dialect, 'he' is the commonly accepted third-person singular gender-neutral pronoun. "Oh, complete with gender-neutral penis?" snarked the snarker. I offered to use 'it', and remarked that while 'they' has become more common when the referent of a pronoun is known to be gendered but that gender is unknown, using a plural pronoun plays hell with my sense of number.

Thinking further about the original sentence, it occurred to me that "it grades it" also plays merry hell with my sense of syntax -- in this case, anaphora resolution.

Terminology time! An anaphor, plural anaphora, is a pronoun which refers to something. (Why, yes, there can be pronouns which don't refer to anything -- the "weather 'it'" of "It's raining" is one example.) A referent is the noun to which an anaphor refers. Anaphora resolution is the process of matching up anaphora and their referents in a sentence.

So. In our example sentence, we have three possible referents: you, the person being addressed; the professor, a third person animate entity of currently-unknown gender; and the functional spec, a third person inanimate entity, thus of no (or, neutral) gender (well, at least in English).

We also have a semantic structure which we need to encode: grades(the professor, the functional spec). (I've placed this in predicate logic format. Since both referents are definite, as opposed to "some professor" or "all the functional specs", we don't need to use any of predicate logic's nifty symbols, such as the quantifiers ∃ and ∀. We also know that both anaphora will be singular, because both our referents are singular.)

I encoded this semantic structure with the phrase "he grades it", which is a complete sentence being used as a phrase. Syntactically, I would encode it as [IP [NP he ] [I' [I [-Pst] ] [VP [V grades ] [NP it ] ] ] ]. (Sorry, not really sure how to do trees on LJ.)
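Since LJ won't draw the tree, here is one way to hold it as data and print it back out in the same labeled-bracket notation. This is a minimal sketch under my own assumptions: a tree is a tuple of (label, children...), a leaf is a plain string, and the tense feature [-Pst] is just stored as a leaf. The function name brackets is made up for the example.

```python
def brackets(node) -> str:
    """Render a tree as labeled brackets. A tree is (label, *children); a leaf is a str."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + label + " " + " ".join(brackets(c) for c in children) + " ]"


# "he grades it" as an IP: a subject NP, then an I' holding tense and the VP.
he_grades_it = (
    "IP",
    ("NP", "he"),
    ("I'",
     ("I", "[-Pst]"),                          # the tense feature, stored as a leaf
     ("VP", ("V", "grades"), ("NP", "it"))),
)

print(brackets(he_grades_it))
# [IP [NP he ] [I' [I [-Pst] ] [VP [V grades ] [NP it ] ] ] ]
```

Nested tuples keep the sketch dependency-free; a real parser would use a proper tree class, but the bracket string comes out the same.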

So, let's look at the possible ways to resolve "he grades it", given our current scope, or "what referents do we have available?" Both 'he' and 'it' are third person pronouns, so that rules out you. The professor could be 'it', but that means that the functional spec would have to be 'he', which isn't possible, because as we already said, the functional spec is inanimate, and 'he' applies to animate referents. Thus, the functional spec resolves to 'it', and as we already ruled out you, the professor must be 'he'. There is only one syntactically legal resolution for all the anaphora in the sentence.
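The elimination above is really a little constraint-satisfaction problem: each pronoun carries features (person, number, animacy), each referent has features, and a pronoun can only point at a referent whose features match. Here's a rough sketch, assuming a hand-built feature table (the REFERENTS and PRONOUNS dictionaries and the function names are hypothetical, not from any real NLP library). I've also added the "subject and object must differ" rule, since a coreferent object would come out as 'himself' or 'itself'.

```python
from itertools import product

# Hypothetical feature bundles for the three referents in scope.
REFERENTS = {
    "you":       {"person": 2, "number": "sg", "animate": True},
    "professor": {"person": 3, "number": "sg", "animate": True},
    "spec":      {"person": 3, "number": "sg", "animate": False},
}

# What each pronoun requires of its referent; a missing key means "no constraint".
PRONOUNS = {
    "he": {"person": 3, "number": "sg", "animate": True},
    "it": {"person": 3, "number": "sg"},  # 'it' can pick out a dog, so animacy is free
}

def compatible(pronoun: str, referent: str) -> bool:
    """True if every feature the pronoun specifies matches the referent."""
    wants = PRONOUNS[pronoun]
    has = REFERENTS[referent]
    return all(has[feat] == val for feat, val in wants.items())

def readings(subj: str, obj: str) -> list[tuple[str, str]]:
    """All (subject, object) referent assignments licensed by the features.

    Subject and object must differ: a coreferent object would be reflexive.
    """
    return [
        (s, o)
        for s, o in product(REFERENTS, repeat=2)
        if s != o and compatible(subj, s) and compatible(obj, o)
    ]

print(readings("he", "it"))  # [('professor', 'spec')]
```

Exactly one assignment survives, which is the "only one syntactically legal resolution" from the paragraph above.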

But some people object to calling the professor 'he' when we don't know whether the professor is male or female, because they argue that the speaker is assuming that only men are smart enough to be professors. WTFever, I'm a chick and you're listening to me school you, so you already know that I know better than that; STFU and keep reading. I'm going to explain why 'he' is a more reasonable anaphor for that position than any alternative that was put forward.

As you'll recall, two options were discussed: 'it' and 'they'. We'll take 'it' first, because it's the general case.

All we really know, when the phrasing is "it grades it", is that whatever 'it' is, it is not grading itself -- otherwise 'itself' would be the second anaphor. We also know that you can't be the referent either, so we have two possible assignments: grades(the professor, the functional spec) or grades(the functional spec, the professor). Since 'it' can be either animate or inanimate ("Who put the dog in the trash compactor?" "I put it there."), 'it grades' is an acceptable phrasing ('grades' needs an animate ACTOR), so this syntactic encoding is acceptable. Thus, the syntax parts of the brain pass a validated parse tree to the semantics parts of the brain to perform anaphora resolution. It is more likely that a professor will grade a functional spec than vice versa; in fact, the latter idea is kind of silly, so that reading is "marked". (In optimality theory terms, we might say that it falls hors du combat.) Having to determine which reading is more likely is an extra step that the 'he' case does not require.
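That extra semantic step can be sketched as "take the readings the syntax hands over, and if there's more than one, rank them by how plausible they are in the world". This is an illustration only: the plausibility scores below are invented, the two candidate readings are written out by hand rather than computed, and nobody's brain is literally calling max().

```python
# The two (grader, graded) readings that pass the syntactic checks for
# "it grades it", written out by hand here.
CANDIDATES = [("professor", "spec"), ("spec", "professor")]

# Hypothetical world-knowledge scores for grades(x, y); the numbers are made up.
PLAUSIBILITY = {
    ("professor", "spec"): 0.99,
    ("spec", "professor"): 0.01,   # the marked, kind-of-silly reading
}

def resolve(candidates: list[tuple[str, str]]) -> tuple[tuple[str, str], bool]:
    """Pick a reading; also report whether a semantic ranking step was needed."""
    if len(candidates) == 1:
        return candidates[0], False            # the 'he grades it' case
    best = max(candidates, key=lambda r: PLAUSIBILITY.get(r, 0.0))
    return best, True                          # the 'it grades it' case

print(resolve(CANDIDATES))                 # (('professor', 'spec'), True)
print(resolve([("professor", "spec")]))   # (('professor', 'spec'), False)
```

Both inputs land on the same meaning; the difference is the True flag, i.e. the ranking step the unambiguous encoding gets to skip.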

Now to consider 'they'. Remember, we had three possible referents, all singular. 'They' is a plural pronoun. Syntactically speaking, 'they' does not fit anywhere in the tree, because there is no plural referent for it to refer to. I'll be honest, I'm not quite sure what happens next, because I know very little about how the brain processes language, but my best guess is that 'they' gets downgraded to 'it' (number being the most common difference between 'they' and the possible referents) and the same process as before occurs. (Of course, now that 'they' is becoming more common as an anaphor for 'singular gendered animate of currently unknown gender', people may be rewriting their own syntax rules.)
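My guess about the 'they' case amounts to "try strict matching, and if nothing fits, drop the number feature and try again". A minimal sketch of that guess, with a hypothetical (person, number) table for the referents; this models my speculation, not any established account of how people actually process singular 'they'.

```python
# Hypothetical (person, number) features for the referents in scope.
REFERENTS = {"you": (2, "sg"), "professor": (3, "sg"), "spec": (3, "sg")}
THEY = (3, "pl")

def matches(pronoun: tuple[int, str], referent: str, ignore_number: bool = False) -> bool:
    person, number = pronoun
    ref_person, ref_number = REFERENTS[referent]
    return person == ref_person and (ignore_number or number == ref_number)

def resolve_they() -> tuple[list[str], bool]:
    """Return candidate referents for 'they' and whether number had to be dropped."""
    strict = [r for r in REFERENTS if matches(THEY, r)]
    if strict:
        return strict, False
    # Guessed repair step: ignore number, i.e. treat 'they' like 'it', and retry.
    relaxed = [r for r in REFERENTS if matches(THEY, r, ignore_number=True)]
    return relaxed, True

print(resolve_they())  # (['professor', 'spec'], True)
```

The strict pass finds nothing; the relaxed pass leaves the same two third-person candidates that 'it' had, so we're back to the ambiguous case plus one failed attempt.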

Anyway, in the end, this gets me thinking about computational linguistics and how to write language generators that generate correct and non-confusing syntax. In the 'he grades it' case, we created an encoding using anaphora which had only one valid reading upon decoding. In the 'it grades it' case, the encoding has two possible readings and must be further decoded by a different piece of the language mechanism. In the 'they grade it' case, there's actually no strictly valid reading at first (due to number disagreement), and other encodings have to be tried. It is thus important for a language generator to consider what the most computationally inexpensive-to-decode encoding will be, before it transmits a sentence to a listener.
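Here's a toy version of that generator check: try every pronoun pair for subject and object, decode each one as a listener would, and keep the cheapest encoding that still means what we intended. Everything here is assumed for illustration: the feature table, the cost scheme (one clean parse costs 1, each extra reading costs another step, a failed strict parse pays an arbitrary retry penalty of 10), and all of the names.

```python
from itertools import product

# Hypothetical features: (person, number, animate); None means "unspecified".
REFERENTS = {"you": (2, "sg", True), "professor": (3, "sg", True), "spec": (3, "sg", False)}
PRONOUNS = {"he": (3, "sg", True), "it": (3, "sg", None), "they": (3, "pl", None)}
RETRY_PENALTY = 10  # arbitrary cost for "no strict parse, relax and try again"

def fits(pronoun: str, referent: str) -> bool:
    return all(p is None or p == r for p, r in zip(PRONOUNS[pronoun], REFERENTS[referent]))

def readings(subj: str, obj: str) -> list[tuple[str, str]]:
    """(subject, object) referent pairs; distinct, since a coreferent object would be reflexive."""
    return [(s, o) for s, o in product(REFERENTS, repeat=2)
            if s != o and fits(subj, s) and fits(obj, o)]

def decode_cost(encoding: tuple[str, str], intended: tuple[str, str]) -> float:
    found = readings(*encoding)
    if not found:
        return RETRY_PENALTY           # like 'they grade it'
    if intended not in found:
        return float("inf")            # parses, but means something else
    return len(found)                  # 1 = unambiguous; more = extra ranking step

def choose_encoding(intended: tuple[str, str]) -> tuple[str, str]:
    return min(product(PRONOUNS, repeat=2), key=lambda enc: decode_cost(enc, intended))

print(choose_encoding(("professor", "spec")))  # ('he', 'it')
```

Under these made-up costs 'he grades it' wins with 1, 'it grades it' costs 2, and 'they grade it' pays the retry penalty. Change the costs and the ranking can change, which is rather the point: the generator has to model the decoder.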

Either that, or English needs a pronoun which signifies 'singular gendered animate of currently-unknown gender', and I'll let getting that into the language be your problem. Until then, the OED and I will say 'he grades it' until you tell me that your professor is a woman.

ETA: ... and of course this is interesting to me as a computer scientist, because it hints at a potentially NP-complete problem embedded in our neurological language framework: "most effective assignment of anaphora". Of course, n is not particularly large in most cases, but we are talking about encodings that have to be decoded in realtime, and as the number of referents and anaphora in a sentence increases, the number of possible encodings grows like the number of permutations, which gets very large very fast...
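To put numbers on "very large very fast": if each of k anaphora has to take a distinct referent out of n (as the subject and object did above, since a shared referent would force a reflexive), the raw number of assignments is n!/(n-k)!. This only counts the search space; it says nothing either way about whether the problem is actually NP-complete. If referents may be reused, the count is n**k instead.

```python
from math import perm  # Python 3.8+

def candidate_assignments(n_referents: int, n_anaphora: int) -> int:
    """Ways to give each anaphor its own distinct referent: n! / (n - k)!."""
    return perm(n_referents, n_anaphora)

for n, k in [(3, 2), (5, 3), (8, 4), (12, 6)]:
    print(f"{n} referents, {k} anaphora: {candidate_assignments(n, k)} assignments")
# 3 referents, 2 anaphora: 6 assignments
# 5 referents, 3 anaphora: 60 assignments
# 8 referents, 4 anaphora: 1680 assignments
# 12 referents, 6 anaphora: 665280 assignments
```

The first row is our professor sentence: three referents, two anaphora, six raw pairings before any features are checked.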

Sep. 23rd, 2006

  • 1:33 AM
Dear all developers who are considering writing a language with a "more English-like syntax" because it'll make it easier for people to learn,

Don't. Seriously. Or I will find you and kill you by cramming your own design notes down your throat. (If your design notes are all in your head, I will cram your brains down your throat. Simple enough.)

Any language which purports to be "English-like" but gives totally different semantics to 'contains' and 'in' needs to be put down like Old Yeller.

Also, having a type system doesn't mean we don't need type introspection. Death on toast to whoever came up with that little omission.

That is all.

--mlp, frustrated
Tonight's steaming double shot of hatred gets flung into the twin gaping maws of the American National Standards Institute and the International Organization for Standardization. Pull up a chair, all of you -- I may be about to rant about a programming language, but it'll make sense even to the non-geeks in the house, I promise.
In my advisor's office the other day, $collaborator2 and I were going back and forth about this soft-query interface my advisor wants us to build. (This is what CHARUN has come back from Development Hell to provide. I talked about that a few entries ago, so I won't repeat myself.) One thing led to another, and when the dust settled, it was on me to write the backend (in C) and retool the frontend (in PHP and, now, JavaScript via Sajax), while $collaborator2 did the middleware (based on SVM-light, which is also in C; I would then write a SWIG PHP wrapper for it).

Well, he emailed me tonight to say he'd finished the middleware portion to do what I wanted. I took a look at what he'd written, and was immediately suspicious: I'd asked for a function which returned a vector of weights in the form of a double[], and he had indeed written such a thing, but the parameters were (int argc, char* argv[]). Further, the elements in argv[] were mostly turning into file handles. "The hell?" I said to myself, dug through this rather baroque function, found the part that actually computed the weight vector, and rewrote that as a much smaller function. (Still not elegant, as this is C, but at least it was easier to read.)
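For the non-C folks, here's roughly what "a much smaller function" means: data in, vector out, no argc/argv and no file handles. For a linear-kernel SVM the weight vector is w = Σ (α_i · y_i) · x_i over the support vectors. This is a Python sketch for brevity (the real code was C), and the input layout, a list of (α_i · y_i, sparse feature dict) pairs with 0-based feature indices, is my own assumption for the example, not SVM-light's actual API or data structures.

```python
def weight_vector(support_vectors: list[tuple[float, dict[int, float]]],
                  n_features: int) -> list[float]:
    """Linear-kernel SVM weights: w = sum over i of (alpha_i * y_i) * x_i.

    Each support vector is (alpha_i * y_i, sparse x_i), where x_i maps a
    0-based feature index to its value.
    """
    w = [0.0] * n_features
    for coef, x in support_vectors:
        for idx, val in x.items():
            w[idx] += coef * val
    return w

print(weight_vector([(0.5, {0: 2.0}), (-1.0, {0: 1.0, 2: 3.0})], 3))  # [0.0, 0.0, -3.0]
```

The signature says exactly what goes in and what comes out, which is what I asked for in the first place; wrapping something shaped like this with SWIG is also far less painful than wrapping a transplanted main().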

I fired that off to him, and a few minutes later he came online to discuss what I'd written. During the course of the discussion, he admitted he'd simply yanked main() part and parcel from another part of SVM-light which did what I wanted (among other things), which I'd already suspected from the parameter names. DO NOT DO THIS, PEOPLE. IT MAKES CODE LOOK UGLY AND REUSES UNNECESSARY MATERIAL.

Now, don't get me wrong: $collaborator2 is a smart guy, and he understands the math behind this way better than I do. I would have a hard time doing this project without him, because among other things, I've never done a partial differential equation before. But it's really frustrating when people parcel out the work on a project and then I have to go back and micromanage them because they do something dumb and slipshod. I could have spent the same amount of time writing the code myself in the first place, you know?

Oh well. In happier news, tonight I discovered DOM-Drag and its offspring, dragsort, which are going to make the UI for this so very very pretty. Still don't like JavaScript (though I like C less), but the fact that it has a good DOM API redeems it somewhat (ok, a lot) in my sight.

Also, availability of online library resources will be one of the deciding factors in my choice of institutions to do a postdoc with. A subscription to Safari would be ideal.

I really need to stop writing software in languages I hate. It would do wonders for my mood.