Note that I was referring to scaling of cost, not of performance. If your application parallelizes ideally, then in the worst case your cost will scale linearly, because you just add more machines and increase your power consumption by new_machine_count/previous_machine_count. It's possible adding more processors in the same address space increases the cost by an amount below new_core_count/previous_core_count, in which case the cost scales better than linearly.