Meredith L. Patterson (maradydd) wrote,

[Code] TR1, why do you hate me? A study in unnecessary overcomplication

As most of you know, I have a deep and abiding love for the C++ Standard Template Library. Perhaps paradoxically, the first language I ever got any good with was Python, and I still adore it deeply. It's not the best language for much of the heavy numerical lifting that I have to do, so I mostly use Python for prototyping, and part of what I love about the STL is that it lets me do the same kinds of cool higher-order things I can do in Python.

For instance, suppose I'm working in Python and I have a list of objects, foo. (These could be numbers, strings, files, network sockets, objects I defined, whatever.) I can filter that list based on some arbitrary criterion that I define on the fly: filter(lambda x: x > 13, foo). Instead of the lambda, I could instead use a reference to some other function -- even a method of an object. Consider the following:
>>> foo = ['123', 'abc', '123abc']
>>> filter(str.isalpha, foo)
['abc']
Better yet, those function references can be handed around just like ordinary variables. I can write a function which takes another function (let's call this one bar) as one of its arguments, and use bar anywhere it can be legally applied. I don't have to know bar's real name -- in fact, bar might not even have a real name, because it could be a lambda function and thus anonymous.

The technical term for this is functions as first-order data. It's a hallmark of the functional programming paradigm, so if you learn Lisp or Haskell, you'll be doing this all the time. Treating functions as first-order data lets you do all kinds of crazy shit: you can compose them, you can bind arguments to them (and perhaps those bound arguments are the results of other functions!), you can return them as results (seriously!), and most importantly, you can construct them at runtime. It provides a level of power, expressiveness, elegance and readability which just isn't found in languages which don't support functions as first-order data.

Which languages don't support functions as first-order data? That would be most of them.

"But Meredith!" I hear you say. "I can pass a function pointer as an argument in C; doesn't that mean C supports functions as first-order data?" Well, no, it really doesn't. A function pointer is a pointer, not a function. You can construct a function pointer at runtime, but you can't construct the underlying function it points to at runtime, so you're stuck. Function pointers are also fugly as hell, and although readability is not a requirement -- see Unlambda as an example of a purely functional language which is also completely unreadable -- it sure is handy.

But let's get back to the STL. C++ also doesn't truly support functions as first-order data, but it comes closer by providing function objects, which it calls "functors". In C++, any class can have an operator() method, which allows objects of that type to behave as functions: pass a suitably-typed argument to the object, and voila, you'll get back a result. Template polymorphism makes this even more powerful: it's quite easy to define, for instance, a functor which takes two arguments of the same type, T a and T b, and returns their sum, a + b. (This works as long as operator+ is defined for type T; if not, the compiler will yell at you.) The STL provides a wide variety of predefined functors, along with ways to combine them to create more sophisticated ones. Best of all, any ordinary function can be instantly promoted to a functor with ptr_fun, and any class member function can become a functor via mem_fun (or its slightly less useful brother mem_fun_ref, but that's another story).

Okay, but why is this useful? Indeed, the <functional> library is rather masturbatory on its own, but it really comes into its own when paired with the <algorithm> library. This bad boy provides several dozen handy algorithms for operating on sequences -- searching, sorting, replacing, copying, swapping, removing, shuffling, reversing, applying arbitrary functors to members, and so on, plus conditional versions of many of these (e.g. find_if and remove_if). These algorithms are all generic: you can use them on any STL container1, and an STL container can contain anything you want2.

Also -- and I am putting this on its own line because it's important -- they are lightning goddamn fast. They are also somebody else's problem3. Many people who are much smarter than I will ever be have put hundreds of hours of work into making sure these libraries work and perform well. Sure, if you really know what you're doing, you can hand-hack yourself a special case in C or assembler and beat the STL's performance, but that requires (1) really knowing what you're doing, (2) the Copious Free Time to do it yourself, and (3) getting down and dirty with a profiler. (If you're trying to tell me that your C will always beat the C++ equivalent but you haven't profiled the two against each other and your name isn't Dennis Ritchie, get the hell off my lawn.)

Anyway. The <algorithm> library means that conditional operations over containers, like "how many even numbers are in this list?", don't require a for-loop -- you can express them in one line (see the example at the end). The catch, however, is that any function you apply in an STL algorithm must be expressed as a functor. So, if you want to start using C++ to its full power, you're going to need to take an hour or two to get comfortable with functors and how they work.

Now, one of the coolest inventions in the C++ world in the last fifteen years was the concept of smartpointers. (I have waxed rhapsodic on this before.) Smartpointers are objects that help programmers avoid the problem of memory leaks. Dynamic allocation functions (malloc) and operators (C++ new) grab memory off the heap and return a pointer to that storage. If the pointer goes out of scope and you haven't explicitly put that memory back with free or delete, the memory is now unreachable by your program or any other one -- congrats, you've created a memory leak. It'll free itself when the program terminates4</sup> ... but will that happen because the user terminated it normally, or because it ate up all available memory, threw bad_alloc and died?

By contrast, since smartpointers are objects, they have constructors and destructors. When they come into scope, they allocate whatever they point to; when they go out of scope, they destroy their target and free up its storage, even in troublesome cases like exception-triggered stack unwinding. They are immensely useful, and they are rather more complicated than I have described them here; indeed, it took some three months for the Boost team to decide how best to implement them, and they weren't accepted into the C++ Standards Committee's Technical Report 1 until 2005, some seven years later. But now they're here, they've been part of libstdc++ since gcc4.05, and they're ready to rock.

They are also where all the trouble started.

I found myself in a coding situation where I needed to maintain a std::vector of objects all belonging to the same abstract class. Doing exactly this is impossible, for two reasons:
  1. A vector<T> needs to know how big T is, in order to allocate its own storage properly. Otherwise, T is an "incomplete type", and the compiler won't let you instantiate that vector.
  2. Abstract classes are incomplete types.
However, pointers are all the same size -- a pointer is just a variable that holds an address. So even if T is an incomplete type, *T isn't, and you can create a vector<T*> just fine -- create your derived-class objects, upcast them, and you're golden.

However, although vectors manage their own memory, a vector of pointers will only release the pointers it contains -- not their contents. I could write a routine to free up the pointers' contents, but could I be sure it would be called if the vector were unexpectedly destroyed? I couldn't, but a smartpointer could do that for me. Better yet, shared_ptr (one of the smartpointer classes described in TR1) will automatically upcast itself whenever it needs to. Result: I created a vector of smartpointers to my base class, filled it with smartpointers to objects of various derived classes, and life was good.

For my next trick, I needed to be able to search my vector for the first object which returned false for a certain parameterless method defined in the base class. In most cases, this is easy: find_if(v.begin(), v.end(), not1(mem_fun(&X::foo))), where X is the class, foo is the method, and not1 is an STL adaptor function6 which provides a functor returning the opposite of whatever its argument (here, the functor created by mem_fun) returns.

Alas, this doesn't work for smartpointers, because a shared_ptr<T> is not a T; it doesn't have T's methods. This isn't the STL's fault; mem_fun entered the standard before shared_ptr did. The TR1 authors noticed this, and helpfully included mem_fn, a cleverer version of mem_fun which plays nicely with smartpointers.

However, they did not provide an update to not1 (or, more properly, its underlying function object adaptor unary_negate), and thus my compiler choked all over the place.7 What's a girl to do?

Well, as cipherpunk pointed out when I brought this up to him after I'd already solved the problem, the canonical solution is to write up a little adaptor class and point not1 at that. I'll leave that as an exercise for the reader; I could do that, but I didn't want to, because, dammit, standard tools should be able to do standard things. There had to be a way -- but what was it?

The documentation for unary_negate provided a clue: a footnote at the end noted that unary_negate could be constructed from logical_not and unary_compose. Well, unary_compose isn't actually part of the Standard; it's an SGI extension, and gcc provides it in the <ext/functional> header, but that left me in the same boat as not1 did. However, TR1 has expanded facilities for function composition, in the form of bind. The STL has limited support for function composition in the form of binder1st and binder2nd; you can bind a constant value to either the first or second argument of a two-argument function, and that's it. TR1's bind -- which, like mem_fn, had its genesis in Boost's bind -- is infinitely more flexible. You can bind as many arguments as you like, to functions of as many arguments as you like. Cooler still, you don't have to bind constant arguments -- you can bind "placeholder arguments", which substitute runtime arguments positionally. _1 is "the first argument given", _2 is the second, and so on and so forth. And, best of all, you can bind other functions. Or functors, if you so choose.

Armed with this knowledge, I tried the following:
find_if(v.begin(), v.end(), bind(logical_not<bool>(), mem_fn(&X::foo)))
That failed too. If you've read this far, you probably understand enough to figure out why; take a moment and see if you can find the bug.

Okay, here's the deal. find_if takes as its last argument a functor, which it applies to every element of the sequence bounded by the iterators that are its first two arguments. bind, as we said, binds arguments to functions (or functors). logical_not is a functor, but it isn't a function adaptor. It takes a boolean value and returns the negation of that value. So, we have to bind a boolean value to it. mem_fn returns a functor. (Unary, in this case.) A functor isn't a boolean value. In this case, it returns a boolean value, but in order for logical_not to work, I have to bind it to a boolean value.

So, how do we turn a unary functor into a boolean value? bind to the rescue again! Use one of those placeholder values I mentioned, and now we have
find_if(v.begin(), v.end(), bind(logical_not<bool>(), bind(&X::foo, _1)))
It compiles, it runs! Hallelujah!

And yet I am disappointed. In principle, I really shouldn't be. What I've written here is very functional in style -- I am quite literally composing a function with another composed function, and I am doing so with strong typing and polymorphism. (If foo were virtual, the appropriate subclass method would be called.) But it is less readable than it ought to be, and I had to do it because my new tools broke my old tools without providing a replacement for the old tools. I expect better from the latest-and-greatest in my language of choice; at the very least I expect not to have to find out the hard way. (Boost's mem_fn documentation provides a brief warning that it isn't 100% compatible with STL adaptors, but apparently this issue is not general knowledge.)

Hmm. I've been asked, in the past, to write articles for programming magazines about projects I've worked on. Perhaps something less long-winded than this diatribe is in order.

1With a few exceptions. Scott Meyers' Effective STL is a wonderful field guide to the STL, its capabilities, and most importantly, its weird edge cases. I recommend it highly.
2Except references. I haven't run into many situations where I'd want a container of references, though.
3Then again, tonight this got me into trouble. Which is actually why I'm so surprised.
4Unless you were using POSIX shared memory, in which case, sucks to be you. man(1) ipcrm or reboot, sucker.
5No love from Visual C++ until whatever the release after Visual Studio 2008 is, alas. Go download Boost.
6A syntactic-sugar function to create a particular function object adaptor, in this case unary_negate. The STL has several of these, to make code easier to read; others include bind1st and bind2nd (which are handy for function composition, but too nuanced for me to explain in a footnote; go read about them) and in fact our aforementioned ptr_fun and mem_fun.
7Reading the output closely, it actually appears to be a problem with argument deduction:
/Developer/SDKs/MacOSX10.4u.sdk/usr/include/c++/4.0.0/bits/stl_algo.h:259: error: no match for call to '(std::unary_negate<std::tr1::_Mem_fn<bool (pqxx::connection_base::*)()const> >) (std::tr1::shared_ptr<pqxx::lazyconnection>&)'
/Developer/SDKs/MacOSX10.4u.sdk/usr/include/c++/4.0.0/bits/stl_function.h:322: note: candidates are: bool std::unary_negate<_Predicate>::operator()(const typename _Predicate::argument_type&) const [with _Predicate = std::tr1::_Mem_fn<bool (pqxx::connection_base::*)()const>]
Looks to me like unary_negate's operator() doesn't like accepting a shared_ptr to a derived class, or else mem_fn isn't providing an argument_type. OTOH, if this is a const mismatch, I am going to go throw someone out a window.

Tags: but meredith i hear you say, code
  • Post a new comment

    Error

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded  

  • 38 comments