Another way to implement it is to simply make L1 larger, and reserve the additional space for speculative execution. (I'm thinking this could work along similar lines to register renaming?)
In that scheme, if speculation succeeds, the lines that speculation brought in become part of the active L1, and the lines they would have evicted from a normally sized L1 are invalidated and returned to the speculation-reserved pool.
If speculation fails, the lines speculation added are invalidated and returned to the pool, so the active L1 looks exactly as it did before speculation started.
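
To make the bookkeeping concrete, here's a rough toy model of the idea in Python. It is only a sketch under my own assumptions: a fully associative cache with LRU replacement, a single speculation window at a time, and no modelling of timing, coherence, or sets/ways. The names (`SpeculativeL1`, `load`, `commit`, `squash`) are made up for illustration and don't correspond to any real hardware interface.

```python
from collections import OrderedDict


class SpeculativeL1:
    """Toy model: an L1 with extra lines reserved for speculative fills."""

    def __init__(self, active_lines, spare_lines):
        self.active_lines = active_lines
        self.spare_lines = spare_lines
        self.active = OrderedDict()  # tag -> slot, oldest (LRU) first
        self.spec = OrderedDict()    # tag -> slot, filled under speculation
        # Physical slots not currently holding a valid line.
        self.free = list(range(active_lines + spare_lines))

    def _evict_lru(self):
        # Invalidate the least recently used active line and free its slot.
        _, slot = self.active.popitem(last=False)
        self.free.append(slot)

    def load(self, tag, speculative):
        """Return True on a hit, False on a miss."""
        if tag in self.active:
            if not speculative:
                # Only committed accesses are allowed to update LRU state.
                self.active.move_to_end(tag)
            return True
        if speculative:
            if tag in self.spec:
                return True
            # Fill from the reserved pool; if it is exhausted, do not fill
            # (real hardware would presumably stall the load instead).
            if len(self.spec) < self.spare_lines:
                self.spec[tag] = self.free.pop()
            return False
        # Ordinary miss: behave like a normal L1 of size active_lines.
        if len(self.active) >= self.active_lines:
            self._evict_lru()
        self.active[tag] = self.free.pop()
        return False

    def commit(self):
        """Speculation succeeded: promote speculative lines into the active L1."""
        for tag, slot in self.spec.items():
            if tag in self.active:
                # Already fetched non-speculatively; drop the duplicate.
                self.free.append(slot)
                continue
            if len(self.active) >= self.active_lines:
                # The line a smaller L1 would have purged goes back to the pool.
                self._evict_lru()
            self.active[tag] = slot
        self.spec.clear()

    def squash(self):
        """Speculation failed: invalidate speculative lines, return their slots."""
        self.free.extend(self.spec.values())
        self.spec.clear()
```

A quick walk-through: with `SpeculativeL1(2, 2)`, load `A` and `B` normally, then load `C` speculatively. After `squash()` the active set is still `A` and `B` and the pool is back to two free slots. If the same speculative load is followed by `commit()` instead, `C` joins the active set, `A` (the LRU line) is invalidated, and its slot goes back to the pool.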