June 9th, 2013

purple hair

Let's talk about optimality theory. And also ethics.

Lately I have been much enjoying the blog Slate Star Codex, which treads in sparkling prose much of the same rationality, ethics, cognitive science, &c ground that Less Wrong has gotten bad about stomping into dittoheady mud of late. By which I mean it's actually good and stuff. One recent post sparked off some recollections from, of all things, phonology.

"But, Meredith," I hear you say, "what could the study of how sounds are composed into syllables in different languages have to do with whether people are inherently pretty decent or inherently pretty awful and just want to be seen as nice?" Well, I last cracked a phonology text lo these ten and a half years ago -- you will find posts about it on this very blog if you look back that far -- so I may be off about some of the details, and the field has doubtless moved on despite my inattention. (I welcome correction from practicing linguists [q_pheevr? kirinqueen?], more attentive students, &c.) But here goes.

One of the common underpinnings of the various phonological theories that I studied in undergrad and grad school1 is the notion that every syllable, word, &c that is spoken has an underlying representation2 -- i.e., a mental representation of a sequence of sounds to be produced, some abstractable piece of input for one of the state machines on the composed chain that leads from brain to vocal tract. The output of this state machine is (presumably) the sequence(s) of nerve impulses that make your vocal tract do the necessary to make the sounds you wanted to say -- but the sounds articulated (the surface representation) will vary predictably from the underlying representation. The job of a phonologist is to characterise languages in terms of these transformations, ideally in the most compact (or, as both linguists and computer scientists prefer to say, elegant) way possible.

Here's a concrete (and classic) example: English pluralization. The regular plural affix in English is -s, and in cases such as cat → cat-s or top → top-s, indeed the phoneme produced in the surface representation is /s/. But what about dog → dog-s, pronounced /dɔgz/? Or toy → toy-s, pronounced /tɔɪz/? You get the picture. So this sort of thing got formalised in the 4th century BC for Sanskrit, but the West only got round to working it out starting in the mid-20th century after lots and lots of descriptive work from people like the Grimm brothers (yes, really). The theoretical frameworks of the '60s and '70s (of which the several I learned about, and have mostly forgotten, grew out of the work of Noam Chomsky and Morris Halle) were all fairly rule-oriented in the way that writing software is rule-oriented, and they all aimed to give linguists the ability to produce a complete description of the rules necessary to produce all underlying → surface transformations for whatever language they happened to be studying.
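(If it helps to see that as code rather than prose, here's a toy version of the voicing-assimilation part of the rule. The segment inventory is abbreviated and the transcriptions are mine, so treat it as a sketch, not an analysis; the real rule also sticks a vowel in after sibilants -- bus → buses -- which I'm ignoring here.)

    # Toy version of the English plural rule: the affix surfaces as /s/
    # after a voiceless segment and /z/ after a voiced one (vowels count
    # as voiced). Inventory and transcriptions are simplified.
    VOICELESS = set("ptkfθsʃ")   # rough voiceless inventory; everything else counts as voiced

    def plural(stem: str) -> str:
        """Pick the surface form of the regular plural affix for a stem."""
        affix = "s" if stem[-1] in VOICELESS else "z"
        return stem + affix

    for stem in ["kæt", "tɔp", "dɔg", "tɔɪ"]:
        print(stem, "->", plural(stem))   # kæts, tɔps, dɔgz, tɔɪz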

By now you may very well be saying, "Where do these underlying representations come from, anyway?" I know I am; it's kind of amazing how much clarity one loses about a field of study when one hasn't touched it in a decade. That said, the Chomskyan family of theories has often been criticised for coming up with "just-so stories" about what goes on between brain and vocal tract (the work of Steven Pinker notwithstanding; let's just say there is a lot of ground still to cover), so it's a good thing we're segueing to optimality theory now3. Optimality theory, which came on the scene in 1991, still relies on this notion of underlying representation, but instead of a however-intricate-it-needs-to-be spiderweb of rules describing every little edge case of how pronunciation rules interact to map a single input to a single output in a given language, it posits a ranked set of constraints. Applied against the set of all possible candidate surface representations available at the time (which could, in principle, be any old bullshit your brain decides to come up with -- we are talking about a massively parallel computer here), the constraints select a "least-bad" candidate, which is then vocalized. The set is the same for all languages, but the ranking differs from language to language. So now language acquisition (you learn the constraint ranking for the language you're learning) and linguistic typology (linear edit distance between constraint rankings), oh and also phonology, fall out of one theory, albeit one that still needs some empirical validation.
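Here's the same idea as a toy bit of Python, since that's how my brain files things away these days: a couple of made-up constraints, a ranking, and a min() over violation tuples standing in for the evaluation step. The candidate forms and the "Dutch-like" versus "English-like" rankings are invented for illustration, not lifted from any actual OT analysis; real constraints are rather more careful than these.

    # Toy EVAL: every constraint assigns each candidate a violation count,
    # and the ranking decides whose violations matter most.

    def no_voiced_coda(form):
        """Penalize a voiced obstruent at the end of the form."""
        return 1 if form and form[-1] in "bdgvz" else 0

    def faithful_to(underlying):
        """Penalize each place where a candidate differs from the input."""
        def constraint(form):
            mismatches = sum(1 for a, b in zip(underlying, form) if a != b)
            return mismatches + abs(len(underlying) - len(form))
        return constraint

    def evaluate(candidates, ranking):
        """Pick the least-bad candidate, reading the constraints from
        most to least important."""
        return min(candidates, key=lambda form: tuple(c(form) for c in ranking))

    underlying = "bɛd"
    candidates = ["bɛd", "bɛt", "pɛt", "any old bullshit"]

    # "Dutch-like" ranking: the coda ban outranks faithfulness -> "bɛt".
    print(evaluate(candidates, [no_voiced_coda, faithful_to(underlying)]))
    # "English-like" ranking: faithfulness outranks the coda ban -> "bɛd".
    print(evaluate(candidates, [faithful_to(underlying), no_voiced_coda]))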

So now let's talk about ethics.

Part of doing computer security is being able to think like the bad guy: it is a useful thing, when operating as a defender4, to be able to conceive of attacks you would never yourself perform while coming up with your defense strategies. Put another way, out of all the possible constraints on what it is possible to do with a Turing machine, developers tend to have one typology of rankings ("who would ever ask our database for anything other than what our application asks for?") and attackers a very different one. But a defender who can't adopt the attacker mindset for the purposes of risk assessment cannot be an effective defender, even if the options they consider during risk assessments are ones they would never carry out on their own. Furthermore, if the defender's model of "the attacker mindset" (in the analogy we're constructing here, the "attacker constraint ranking" that the defender uses as a temporary replacement for their own) doesn't comport with what attackers do, the defender won't be very effective either. So not only do you have to be able to think like a bad guy, you have to be able to do it well. A lot of people have cognitive dissonance over this (e.g., as Rob Graham points out, a particular prosecutor, judge and jury in New Jersey). I don't, but then, that's why I'm a computer security researcher.
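To make the database quip concrete (with entirely hypothetical table and function names): the developer's ranking assumes the query only ever sees friendly input, the attacker's ranking puts "what else can I make this string do?" right at the top, and the defender who can hold both rankings at once writes the second version.

    # The developer's ranking ("nobody sends anything weird"): glue user
    # input straight into the query string.
    def find_user_unsafe(cursor, username):
        cursor.execute("SELECT * FROM users WHERE name = '%s'" % username)

    # The attacker's ranking surfaces input the application would "never"
    # send, e.g. username = "x' OR '1'='1", which turns the WHERE clause
    # into a tautology and hands back every row.

    # The defender's version: let the database driver keep query structure
    # and data separate. (Placeholder syntax varies by driver -- %s for
    # psycopg2, ? for sqlite3.)
    def find_user(cursor, username):
        cursor.execute("SELECT * FROM users WHERE name = %s", (username,))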

But it goes beyond that. As you might expect of someone who thrashed Rob Graham so hard in a Cards Against Humanity game that he wrote a blog post around it, I possess a deep capacity for being a terrible person and I am totally okay with this. There have been times in the past when I have done terrible things that have hurt people. Hell, there are times today when I do terrible things that hurt people, like buy goods made in shitty working conditions, although really I have been doing my best to minimize the amount of direct personal harm I inflict on people. What I've noticed, through introspection and discussion and so on, is that by and large, the harm I bring about is through ignorance or inattention rather than intent; having realised this, my gut response has been to try to be more mindful and less of a fuckup, thereby decreasing the extent to which my fuckups inconvenience anyone else. So one could say, as one example, that I raised the rank of the constraint "be conscientious with other people's things", and that while my brain might produce the idea "juggle your housemate's coffee cups," it would fall hors de combat early on in processing due to its violation of this highly ranked constraint. However, nothing prevents me from altering my constraint ordering in a different situation where it's appropriate to do so -- like changing politeness registers for whatever culture I'm in, or taking all the filters off and optimizing for balls-out hilarious evil in a Cards Against Humanity game. When it is contextually appropriate and safe to be horrible, I can be a son of a bitch with the best of them, which is fun because being good at things is usually fun. (And by "safe" I mean "nobody gets hurt", which is usually the case in a Cards Against Humanity game apart from your illusions about how your friends spend their spare time being shattered.)
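Which is to say: same least-bad selection as the phonology toy above, with a different constraint ranking swapped in per context -- and note that the "nobody gets hurt" constraint stays at the top in both. (The constraints and candidates below are, obviously, jokes.)

    def hurts_someone(action):
        return 1 if action == "juggle the housemate's coffee cups" else 0

    def horrifies_the_room(action):
        return 1 if action == "play the most appalling card in your hand" else 0

    def is_boring(action):
        return 1 if action == "play a bland, inoffensive card" else 0

    def least_bad(options, ranking):
        return min(options, key=lambda a: tuple(c(a) for c in ranking))

    options = ["juggle the housemate's coffee cups",
               "play a bland, inoffensive card",
               "play the most appalling card in your hand"]

    # Everyday ranking: don't hurt anyone, don't horrify anyone, then don't be boring.
    print(least_bad(options, [hurts_someone, horrifies_the_room, is_boring]))
    # Cards Against Humanity ranking: still don't hurt anyone, but boring is the cardinal sin.
    print(least_bad(options, [hurts_someone, is_boring, horrifies_the_room]))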

Really I guess this isn't so different from Kant's notion of the categorical imperative, but with lots of them and a ranked-choice ordering. And I also see, off in the distance, something that might be a parallel with Jonathan Haidt's moral foundations research, or which might just be an oncoming train. But it could be interesting to, say, design scenarios that require people to make moral-dilemma decisions quickly and look at the correlations between their choices and their scores along those axes.

Anyway, I'm not sure this gets us fundamentally any closer to answering whether humans are inherently good-seeking or good-appearance-seeking, because obviously there's no objective way to evaluate what constraint ranking a person is using, or whether a person is telling you the truth about their self-reporting of the constraint ranking they're using, or even whether they're right about their self-reporting. But it has been of practical use to me, in the sense that I don't feel any particular cognitive dissonance (e.g., revulsion) when my brain suggests particularly horrific or vile responses to stimuli; when I have the time to think about it, at least, if these things register at all they register as "considered and rejected," as a neat little monadic package. I suspect that it's also an instance of the "I made it but that doesn't mean it's part of me" distinction that I have also found of considerable utility in the last year and change, but that is another topic for another time.

1The University of Houston and the University of Iowa were both Chomskyan programs when I went through them; I learned a little about head-driven phrase structure grammar but that was about it as far as exposure to other theoretical frameworks. I know all the cool kids do statistical everything these days; I work for the world leader in the field, turning research code into production code, so I don't actually get all that much theory, and also I work in natural language understanding rather than speech recognition anyway. But these are details.

2I always got the feeling the whole underlying-representation thing had to do with historical similarities, especially since when you study a whole bunch of languages all in one family (which I got to do, for a lot of different families), it quickly becomes clear that a lot of the phonological parallels in languages like, say, Dutch and English are predictable because it's the same word, just said differently. But I don't remember any of my profs or any of the books or papers I read explicitly coming out and saying that. Maybe it's obvious? I don't know. It seems kind of simplistic now that I lay it out like that. My memory is kind of shit sometimes.

3Is that your lampshade?

4My research actually operates a level up from this, focusing on hardening software in rigorous ways, because I don't like having to do the same thing over and over again.