I try to visualize the dependency graph for my variables. I figure OoO cores and optimizing compilers are good enough nowadays that, as long as I don’t mess things up too badly, the hardware should figure out how to extract the ILP.
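A rough sketch of the kind of dependency graph I have in mind, using a plain array sum. The function names `sum_chain` and `sum_split` are made up for illustration. The first version is one long chain where every add waits on the previous one; the second splits the work into four independent chains that an out-of-order core can overlap.

```c
#include <stddef.h>

/* One accumulator: each add depends on the previous add,
   so the whole loop is a single serial dependency chain. */
double sum_chain(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four accumulators: four independent chains that can be in flight
   at once. This reassociates the floating-point sum, so the result
   may differ from sum_chain in the last bits. */
double sum_split(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++) /* leftover elements when n is not a multiple of 4 */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

For floating point, compilers typically won't make this transformation on their own unless you allow reassociation (for example with fast-math style flags), because it changes rounding. So this is one of the cases where the shape of the graph is on me.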
Or, as a special case if I am writing an MPI code, I try to imagine what in the heck the slowest process is doing while everybody else waits at the barrier.
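The barrier picture can be put into numbers with a toy model. This is not real MPI code, just a sketch under the assumption that you already have one work time per rank; `slowest_rank` and `total_barrier_wait` are hypothetical helper names.

```c
#include <stddef.h>

/* Index of the rank with the largest work time (the straggler).
   Assumes nranks >= 1. */
size_t slowest_rank(const double *work, size_t nranks) {
    size_t slow = 0;
    for (size_t r = 1; r < nranks; r++)
        if (work[r] > work[slow])
            slow = r;
    return slow;
}

/* Total time spent idle at the barrier, summed over all ranks,
   assuming every rank waits until the slowest one arrives. */
double total_barrier_wait(const double *work, size_t nranks) {
    if (nranks == 0)
        return 0.0;
    double tmax = work[slowest_rank(work, nranks)];
    double idle = 0.0;
    for (size_t r = 0; r < nranks; r++)
        idle += tmax - work[r];
    return idle;
}
```

In a real run the per-rank times would come from something like `MPI_Wtime()` calls around the work section, gathered to one rank. If the idle total is large compared to the useful work, the straggler is the thing to go look at.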