There are two processes to switch between: when one issues a run:io instruction, the simulator switches to the other process. After the I/O completes, the first process's state becomes ready, and the simulator must switch back to it so that it can "receive" and handle the result, for example by moving the data over the bus into main memory. This step does not use cpu/io resources, but it still consumes one time unit.
Of course, the "receive" operation is not itself a cpu/io instruction, so that time unit is spent before the next run:cpu or run:io instruction can be invoked.
For the question
"Is IO_RUN_IMMEDIATE always better than IO_RUN_LATER?"
The answer is no: it depends on the composition of the instructions, that is, the number and the order of run:cpu and run:io.
In the simulator, using the parameters -s 10 10:50, 5:50, unlike the case in the homework, IO_RUN_IMMEDIATE is not better than IO_RUN_LATER.
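
To make the dependence on instruction order concrete, here is a minimal sketch of a toy two-process model in Python. It is not the homework's simulator: the function name simulate, the parameter names io_length and io_run_immediate, and the one-tick "receive" step are all assumptions made for illustration, and the tick counts it produces will not necessarily match the real tool's output.

```python
from collections import deque

def simulate(programs, io_length=3, io_run_immediate=False):
    """Toy model (hypothetical, not the homework simulator).

    programs: one list of instructions per process, each "cpu" or "io".
    Returns the number of ticks until every process has finished.
    """
    todo = [deque(p) for p in programs]   # remaining instructions per process
    wake = [None] * len(programs)         # tick at which a pending I/O completes
    current = None                        # index of the process on the CPU
    tick = 0
    while any(todo) or any(w is not None for w in wake):
        # 1. Finish completed I/Os: the process becomes ready again and
        #    still owes one tick for the "receive" step.
        just_woken = []
        for i, w in enumerate(wake):
            if w is not None and w <= tick:
                wake[i] = None
                todo[i].appendleft("receive")
                just_woken.append(i)
        # 2. Choose who runs this tick.
        ready = [i for i in range(len(todo)) if todo[i] and wake[i] is None]
        if io_run_immediate and just_woken:
            current = just_woken[0]       # IO_RUN_IMMEDIATE: switch back at once
        elif current not in ready:
            current = ready[0] if ready else None   # blocked or done: switch
        # 3. Run one instruction (or idle if nothing is ready).
        if current is not None:
            op = todo[current].popleft()
            if op == "io":
                wake[current] = tick + 1 + io_length
        tick += 1
    return tick

# Case 1: the process that was waiting has only CPU work left.
a = ["io", "cpu", "cpu", "cpu"]
b = ["cpu"] * 5 + ["io"]
print(simulate([a, b], io_run_immediate=False))   # 12
print(simulate([a, b], io_run_immediate=True))    # 15

# Case 2: the process that was waiting issues more I/O.
c = ["io", "io", "io"]
d = ["cpu"] * 9
print(simulate([c, d], io_run_immediate=False))   # 21
print(simulate([c, d], io_run_immediate=True))    # 15
```

In this toy model, switching back immediately helps when the woken process goes on to issue another I/O, because that I/O overlaps with the other process's CPU work. It hurts when the woken process has only CPU work left, because the other process's final I/O then has nothing to overlap with.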