zlacker

1. jaunty+(OP)[view] [source] 2023-08-31 05:10:43
This is definitely an interesting challenge!

Often CPU steal time is visible in cloud environments. This could be useful for spotting some noisy-neighbor behavior, and for deciding whether to adjust expectations or rerun.
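To make that concrete, here is a rough sketch of checking steal on a Linux guest. It assumes the usual /proc/stat layout, where the eighth counter on the aggregate "cpu" line is steal; the function names are just mine for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_fraction(before_text, after_text):
    """Fraction of elapsed CPU time reported as steal between two samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight fields are summed for the total.
    total = sum(deltas[:8])
    steal = deltas[7] if len(deltas) > 7 else 0
    return steal / total if total > 0 else 0.0


def read_proc_stat():
    """Read the raw text of /proc/stat (Linux only)."""
    with open("/proc/stat") as f:
        return f.read()


if __name__ == "__main__":
    first = read_proc_stat()
    time.sleep(1.0)  # sampling interval; an arbitrary choice
    second = read_proc_stat()
    print(f"steal over the last second: {steal_fraction(first, second):.2%}")
```

Sampling this before and after a benchmark run, and rerunning when the steal fraction is well above whatever you normally see, is one way to act on it. What counts as "too high" is something you would have to calibrate yourself.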

But things like I/O, GPU, or memory contention could also be responsible. There are some fancy new-ish extensions for controlling memory throughput. Intel has Memory Bandwidth Allocation controls in its Resource Director Technology, a suite of capabilities designed for observing & managing shared system resources. There are also controls for setting up cache usage/allocation.
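On Linux these controls are exposed through the resctrl filesystem, usually mounted at /sys/fs/resctrl, where each control group has a schemata file. A small sketch of reading one, assuming the common "RESOURCE:domain=value;domain=value" line format; the sample text is made up, and the exact resources listed depend on the CPU and kernel:

```python
def parse_schemata(text):
    """Parse resctrl schemata text into {resource: {domain_id: value_string}}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "L3:0=fffff;1=fffff" or "MB:0=100;1=100".
        resource, _, domains = line.partition(":")
        entries = {}
        for item in domains.split(";"):
            domain_id, _, value = item.partition("=")
            entries[int(domain_id)] = value.strip()
        result[resource.strip()] = entries
    return result


# Made-up sample: a cache bitmask per L3 domain, and a memory
# bandwidth setting per domain (a percentage in the default mode).
sample = "    L3:0=fffff;1=fffff\n    MB:0=100;1=100\n"
print(parse_schemata(sample))
```

Values are kept as strings because the cache entries are hex bitmasks while the bandwidth entries are plain numbers.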

replies(1): >>yellow+Jj
2. yellow+Jj[view] [source] 2023-08-31 08:07:22
>>jaunty+(OP)
Yeah, I think I/O has bitten us before. Running on AWS, you may randomly get a huge spike in I/O latency (maybe Amazon adjusted the physical hardware, a drive failed, etc.) and have no idea why.
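One cheap way to at least notice it is to probe write latency before a run and compare against a baseline you recorded earlier. A minimal sketch; the function names, the 4 KiB payload, and the 5x threshold are all arbitrary assumptions on my part, not anything AWS-specific:

```python
import os
import statistics
import tempfile
import time


def probe_write_latency(directory, samples=20, size=4096):
    """Time small write+fsync calls in `directory`; return latencies in seconds."""
    payload = b"\0" * size
    latencies = []
    # The temporary file is deleted automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(samples):
            start = time.perf_counter()
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data to the device
            latencies.append(time.perf_counter() - start)
    return latencies


def looks_noisy(latencies, baseline_median, factor=5.0):
    """Flag the probe if its median latency exceeds factor x the baseline."""
    return statistics.median(latencies) > factor * baseline_median


if __name__ == "__main__":
    lat = probe_write_latency(tempfile.gettempdir())
    print(f"median fsync latency: {statistics.median(lat) * 1000:.2f} ms")
```

It won't tell you why the latency spiked, but it gives you a reason to rerun or discard a result.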