I feel like the Cheney on the MTA algorithm might be more effort than it's worth. It's entirely black magic and abusing constructs in ways a compiler would never expect, and as such it can never be implemented portably. Hell even alloca isn't portable.
Cheney's algorithm on its own is fine (the from-tospace page flipping is pretty ingenious) but when you abuse the stack as in the Cheney on the MTA algorithm it seems like a recipe for arcane code that requires major revisions for every system it'll ever run on. It might be massively more efficient (I remember hearing about some CPU cache arcana that it exploits?) but like many optimizations I wonder if the maintenance burden would be more trouble than it's worth.