RISC-V in AI and HPC Part 1: Per Aspera Ad Astra?
(eetimes.com)
35 points
by: fork-bomber
8 days ago
☆
12 comments
☆
justin66
4 days ago
The first sentence contains an obvious falsehood:
> Introduced in 2014, the RISC-V instruction set architecture has been evolving at a pace that Arm and x86 ISAs have never experienced.
In eleven years the PC world went from having the 8086 to having the 80486.
☆
kragen
4 days ago
https://riscv.org/blog/2024/05/14-years-of-risc-v-a-journey-...
It did (01978 to 01989), and RISC-V hasn't changed nearly that much since 02014 despite the introduction of important extensions like V, but there's an even more obvious falsehood: the RISC-V instruction set was introduced in 02010, not 02014.
This article has a lot of huge honesty problems, but maybe the worst one is how it focuses exclusively on US companies, who aren't the ones doing the work of making RISC-V real. Where are the mentions of Alibaba, T-Head, WCH, Seeed Studio, Tencent, Pine64, Espressif, Rockchip, and all the other Chinese brands that are such huge players in the RISC-V world? This is like reading a news article about computer networking in 01984 in France that focuses entirely on Minitel and Groupe Bull and doesn't bother to mention Tymnet, IBM, Ethernet, Xerox, or the internet. Did MIPS hire a PR agency to write it for them?
Shilov himself has written about some of Huawei's RISC-V development, so he really has no excuse: https://www.tomshardware.com/news/huaweis-hisilicon-develops... Doesn't that seem a bit more significant than anything related to MIPS?
And in https://merics.org/en/report/huawei-quietly-dominating-china...:
> For instruction set architecture, the software for executing chip production which represents another key supply chain chokepoint with high levels of market concentration, China’s government has chosen an open-source architecture that was pioneered in the United States: RISC-V. Top artificial intelligence and software companies like Alibaba and Tencent have been charged with furthering progress in using RISC-V.
And Huawei's HarmonyOS supports RISC-V: https://www.huaweicentral.com/runhe-software-launches-harmon...
☆
nullc
4 days ago
Has RISC-V gained a cmov yet, or is security-critical code still left to branch-and-pray or use byzantine bitops?
☆
kragen
4 days ago
I'm sure at least one proposed extension has a constant-time conditional move, but I don't know of a ratified extension that has one. But as T-Head demonstrated with the V extension, shipped silicon can implement non-ratified extensions, and as the fast interrupt handling in WCH's CH32V003 demonstrates, shipped silicon can extend the architecture in ways that haven't even been proposed as extensions. But I don't know of any shipped silicon with a constant-time conditional move, either.
For most people, though, using "byzantine" bitops in their security-critical code is less important than being able to run it on a processor that doesn't implement IME or other presumable US backdoors. (Huawei backdoors, though?)
☆
nullc
4 days ago
Ending up with visible timing side channels where there otherwise wouldn't be any is not an awesome tradeoff, though. Like... vulnerable to the NSA vs. vulnerable to everyone? The former is probably preferable. And as you say, the RISC-V option may not be backdoor-free; it may just be an alternative backdoor.
Seems like such an unforced error too, particularly because CMOVs are extremely beneficial for performance in out-of-order, deeply pipelined architectures. Though the performance side can at least be answered with extended behavior, the security side needs guarantees (and ideally ones that aren't "this instruction sequence is constant time on some chips and variable time on others").
☆
kragen
4 days ago
Yeah, obviously you do want to do the "byzantine" bitops, not just ship code that's vulnerable to timing side channels due to conditional jumps depending on secret data. But you can do that pretty easily on RISC-V, and it doesn't even cost much performance. It's four register-to-register instructions instead of the one you'd have with CMOV:
            .globl minop
    minop:  slt t0, a0, a1     # set t0 to "is a0 < a1?" (0 or 1)
            addi t0, t0, -1    # convert 0 to -1 (a0 ≥ a1) and 1 to 0 (a0 < a1)
            sub t1, a1, a0     # set t1 := a1 - a0
            and t1, t0, t1     # t1 is now either 0 or, if a0 ≥ a1, a1 - a0
            add a0, a0, t1     # transform a0 into a1 if a0 ≥ a1
            ret
Possibly what you meant by "byzantine bitops" is this version:
    minop:  slt t0, a0, a1
            addi t0, t0, -1
            xor t1, a1, a0     # set t1 := a1 ^ a0
            and t1, t0, t1
            xor a0, a0, t1     # transform a0 into a1 using xor
            ret
(http://canonical.org/~kragen/sw/dev3/minop.S http://canonical.org/~kragen/sw/dev3/testminop.c http://canonical.org/~kragen/sw/dev3/minoptests)
I'm interested in knowing if there's a faster way to do this! You could do it in one fewer instruction with a multiply, but it's pretty common for a multiply to take multiple cycles.
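For anyone who wants to check the logic off-target, here is a minimal C sketch of the same mask trick, plus the multiply variant mentioned above. The function names (min_addsub, min_xor, min_mul, minop_selfcheck) are made up for illustration and are not taken from the linked files. It models only the arithmetic: a C compiler is free to turn any of this back into a branch, so it says nothing about timing on real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors slt / addi / sub / and / add.
 * Arithmetic is done on uint32_t so wraparound is well defined;
 * the final conversion back to int32_t assumes two's complement. */
int32_t min_addsub(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t mask = (uint32_t)(a < b) - 1u;   /* 0 if a < b, all ones if a >= b */
    return (int32_t)(ua + ((ub - ua) & mask));
}

/* Mirrors slt / addi / xor / and / xor. */
int32_t min_xor(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t mask = (uint32_t)(a < b) - 1u;
    return (int32_t)(ua ^ ((ub ^ ua) & mask));
}

/* Hypothetical multiply variant (slt / sub / mul / add):
 * b + (a < b) * (a - b), one instruction shorter but needs a multiplier. */
int32_t min_mul(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t lt = (uint32_t)(a < b);          /* 0 or 1 */
    return (int32_t)(ub + lt * (ua - ub));
}

/* Small sanity check covering both orderings, equality and the extremes. */
void minop_selfcheck(void)
{
    const int32_t v[] = { INT32_MIN, -7, 0, 3, 5, INT32_MAX };
    const int n = (int)(sizeof v / sizeof v[0]);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int32_t want = v[i] < v[j] ? v[i] : v[j];
            assert(min_addsub(v[i], v[j]) == want);
            assert(min_xor(v[i], v[j]) == want);
            assert(min_mul(v[i], v[j]) == want);
        }
    }
}
```

Calling minop_selfcheck() from a test harness exercises all three variants. Checking that the compiled output is actually branch-free still means reading the disassembly for the target in question.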
Apparently CMOV isn't such a big win for superscalar architectures, which is what you'd normally use when performance is critical. But I don't know enough about superscalar architectures to really understand that assertion. And, for low-power architectures, people are moving to shorter pipeline lengths, like Cortex-M0 (3 stages) to Cortex-M0+ (2 stages).
In general, the RISC-V standard doesn't make any guarantees about execution time at all. That's out of its scope.
☆
IshKebab
3 days ago
It does actually - there's the Zkt extension which declares that some instructions have data independent execution times. (Basically bitwise ops, add, sub, shift and multiply.)
If there was a cmov instruction it could be included in Zkt.
That said I'm also not sure how much performance benefit it gets you. I think the answer isn't obvious and you'd definitely need to do simulations to see.
☆
kragen
3 days ago
Thank you!
☆
camel-cdr
3 days ago
https://github.com/riscvarchive/riscv-zicond/blob/main/zicon...
There are min/max instructions and Zicond. Most conditional instruction sequences are two instructions; cmov is:
    czero.eqz rd, rs1, rc
    czero.nez rtmp, rs2, rc
    or rd, rd, rtmp
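As a rough illustration of why that sequence amounts to a select, here is a small C model of the two Zicond instructions as I understand their semantics (czero.eqz zeroes the result when the condition register is zero, czero.nez zeroes it when the condition is nonzero). The helper names are hypothetical, 64-bit registers are assumed, and the ternaries only describe the result values; they make no claim about how the hardware computes them or how long it takes.

```c
#include <assert.h>
#include <stdint.h>

/* czero.eqz rd, rs1, rc : rd = (rc == 0) ? 0 : rs1 */
uint64_t czero_eqz(uint64_t rs1, uint64_t rc)
{
    return rc == 0 ? 0 : rs1;
}

/* czero.nez rd, rs2, rc : rd = (rc != 0) ? 0 : rs2 */
uint64_t czero_nez(uint64_t rs2, uint64_t rc)
{
    return rc != 0 ? 0 : rs2;
}

/* The three-instruction sequence: exactly one of the two
 * intermediate values is zero, so OR-ing them yields
 * rd = (rc != 0) ? rs1 : rs2. */
uint64_t zicond_select(uint64_t rs1, uint64_t rs2, uint64_t rc)
{
    uint64_t rd   = czero_eqz(rs1, rc);   /* rs1 if rc != 0, else 0 */
    uint64_t rtmp = czero_nez(rs2, rc);   /* rs2 if rc == 0, else 0 */
    return rd | rtmp;
}
```

Each instruction in the sequence reads at most two registers, which is the property the extension was designed around.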
☆
kragen
3 days ago
Aha, thanks! That seems like a much less appealing proposal to me than a simple mv.eqz/mv.nez. I wonder if there are merits to it that aren't obvious to me.
☆
camel-cdr
3 days ago
The reason against a simple conditional move instruction was that it would require a 3R1W integer register file, while the current extension allows for a 2R1W one.
I can definitely see a three-source conditional move being added in conjunction with more 3R1W instructions (see what the scalar efficiency SIG is doing). But my understanding is that the 2R1W design is desired for some of the wide, high-performance designs. (Register file size grows quadratically with port count.)
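To put a rough number on that, take the quadratic rule of thumb at face value and assume area scales with the square of the total port count p = R + W (an assumed simplification for illustration, not a measured figure for any real design):

```latex
\frac{A_{\mathrm{3R1W}}}{A_{\mathrm{2R1W}}}
  \approx \left(\frac{3+1}{2+1}\right)^{2}
  = \frac{16}{9}
  \approx 1.78
```

Under that assumption, a single-issue 3R1W register file would be roughly 1.8 times the size of a 2R1W one, and the absolute cost grows further on wide designs where every issue slot needs its own set of ports.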
☆
kragen
3 days ago
Hmm, I guess I had thought you would implement it with a 2R1W register file by just not writing to the register file in the register-file-writing stage of the pipeline when the condition didn't hold, in such simple implementations, same as for jump or store instructions, or those that write to x0. You do have three read dependencies, but without knowing much, I'd guess that's a much smaller problem than a bloated four-port register file.