zlacker
1 comments
1. ericho+(OP)
2025-12-06 23:10:57
Kimi K2 also uses MLA, and Kimi Linear runs Kimi Delta Attention (it's SSM-like) for three out of every four layers (the fourth uses MLA).
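To make the 3:1 interleaving concrete, here is a minimal sketch of the layer pattern described above. The function name `layer_schedule`, the string labels, and the 8-layer example depth are hypothetical and chosen only for illustration; this is not taken from the Kimi Linear code.

```python
def layer_schedule(num_layers: int, period: int = 4) -> list[str]:
    """Label each layer with its attention type.

    Assumes every `period`-th layer (the 4th, 8th, ...) uses MLA and all
    other layers use Kimi Delta Attention (KDA), per the comment above.
    """
    return ["MLA" if (i + 1) % period == 0 else "KDA" for i in range(num_layers)]


# Example with an arbitrary depth of 8 layers (not the real model depth).
schedule = layer_schedule(8)
print(schedule)
```

With this pattern, three quarters of the layers carry the SSM-like KDA state and only one quarter keeps an MLA-style KV cache.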
replies(1):
>>jychan+s1
2. jychan+s1
2025-12-06 23:21:11
>>ericho+(OP)
Kimi K2 is literally a "copy DeepSeek's homework" model. Seriously. It even has exactly 61 layers, the same as DeepSeek V3/R1.