If your motivation is only to conserve energy, it might be simpler to load the FIFO while the RFM69 is asleep, start the Tx, immediately put the ATmega328P to sleep, and let the radio wake the ATmega328P with an interrupt (for example PacketSent on DIO0) when the Tx finishes.
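Here is a minimal sketch of that sequence, under several assumptions. The register addresses and mode values are the ones I recall from the RFM69/SX1231 datasheet, so check them against your revision. The packet format is assumed to be variable-length and configured elsewhere. Like the suggestion above, it assumes the FIFO keeps its contents across the Sleep to Tx transition; the datasheet's FIFO-clearing table is worth checking. `spi_write_reg` and `mcu_sleep_until_dio0` are hypothetical hooks. Here they only log calls so the order of operations can be checked on a PC.

```c
#include <stdint.h>
#include <stddef.h>

/* RFM69 registers and values (verify against your datasheet revision). */
#define REG_FIFO         0x00
#define REG_OPMODE       0x01
#define REG_DIOMAPPING1  0x25

#define OPMODE_SLEEP     0x00  /* Mode bits 4:2 = 000 */
#define OPMODE_TX        0x0C  /* Mode bits 4:2 = 011 */
#define DIO0_PACKETSENT  0x00  /* DIO0 mapping 00 = PacketSent while in Tx */

/* Hypothetical platform hooks. Real firmware would send (addr | 0x80)
   followed by the value over SPI. This mock just records each write. */
#define LOG_MAX 64
static uint8_t log_addr[LOG_MAX], log_val[LOG_MAX];
static size_t log_len = 0;
static int mcu_slept = 0;

static void spi_write_reg(uint8_t addr, uint8_t value) {
    if (log_len < LOG_MAX) {
        log_addr[log_len] = addr;
        log_val[log_len] = value;
        log_len++;
    }
}

static void mcu_sleep_until_dio0(void) {
    /* Real firmware: arm a wake-up source on the DIO0 pin, then
       set_sleep_mode(...); sleep_enable(); sei(); sleep_cpu();
       In power-down, the ATmega328P's INT0 wakes only on a low level,
       so a rising PacketSent edge needs a pin-change interrupt
       (or a lighter sleep mode). */
    mcu_slept = 1;
}

/* Fill the FIFO while the radio sleeps, start Tx, then sleep the MCU
   until the radio signals PacketSent on DIO0. */
static void send_packet_low_power(const uint8_t *payload, uint8_t len) {
    spi_write_reg(REG_OPMODE, OPMODE_SLEEP);          /* radio stays asleep */
    spi_write_reg(REG_DIOMAPPING1, DIO0_PACKETSENT);  /* DIO0 = PacketSent in Tx */
    spi_write_reg(REG_FIFO, len);                     /* length byte first */
    for (uint8_t i = 0; i < len; i++)
        spi_write_reg(REG_FIFO, payload[i]);          /* each write pushes one byte */
    spi_write_reg(REG_OPMODE, OPMODE_TX);             /* start transmission */
    mcu_sleep_until_dio0();                           /* MCU sleeps during Tx */
    spi_write_reg(REG_OPMODE, OPMODE_SLEEP);          /* radio back to sleep */
}
```

Calling `send_packet_low_power(msg, 3)` with a 3-byte `msg` logs eight register writes. The FIFO writes all happen before the single switch to Tx mode, and the MCU stays asleep for the whole transmission.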
If your main goal is to reduce latency, though, your approach seems worthwhile.