Hackr News App

Open-Source RISC-V: Energy Efficiency of Superscalar, Out-of-Order Execution

(arxiv.org)

101 points

by: PaulHoule

1 day ago

27 comments

dkjaudyeqooe

1 day ago

I feel like an open source RV CPU is very likely in the high-performance space.
The amount of effort required to design and implement such a device makes it difficult for a single company to invest in, but many interested users of it could band together to create a viable open source implementation.
I guess it's a question of finding a project that such an effort can crystallize around.
kimixa

1 day ago

<@dkjaudyeqooe> Don't forget how much of a "high-performance" implementation comes down to the physical implementation; a lot of engineering effort goes in after the HDL. And much of what sits below the HDL is hard to share, as it relies too heavily on (closed) fab IP libraries and PDK specifics. And then there's the verification of that result.
This might discourage an open-source hardware project with shared ownership on the scale a high-performance implementation would require, as each cooperating company would end up using rather different products anyway.
I fear it'll become just a "Dump Over The Wall An Old Snapshot" of a few different companies' work at best, rather than true cooperation.
zozbot234

22 hours ago

<@kimixa> There are open source PDK and IP libraries, though only for nodes far from the leading edge. OTOH, trailing-edge nodes are also the most viable overall for cheaper and smaller-scale fabrication.
adgjlsfhk1

19 hours ago

<@kimixa> I don't think open source will be getting anywhere near leading edge in the near future, but I feel like a really good n12 or n7 chip might be possible. That would be enough to get to ~Zen1 levels of performance (or maybe a bit higher, since we know Zen1 had some fairly avoidable mistakes).
SlowTao

19 hours ago

<@dkjaudyeqooe> In a way I am less worried about the ISA than about having a standard boot system that you can target. This is where x86 still wins and ARM have dropped the ball. You can boot something like FreeDOS on an 8086 or the latest i9 with the exact same code base thanks to BIOS compatibility. But with ARM you are looking at hundreds of different targets.
The issue with ARM looks to be creeping into RISC-V, because anyone can build a processor entirely to their own spec. For better or worse.
A standard boot target is much more useful to the end user than an open chip behind yet another boot standard. That I am praising the mediocre and closed x86 for this shows a little of how bad the situation can be.
wmf

21 hours ago

<@dkjaudyeqooe> I don't know if that kind of collaboration has ever worked in chip design. It seems simpler for one company to design the core and license it out (which is the Arm business model).
vFunct

19 hours ago

<@dkjaudyeqooe> Unfortunately, a lot of the architecture is decided by your technology node as well as your cell library. Examples include the cache architecture and the performance-power tradeoffs. There are thousands of standard cells in libraries now, and they are all custom-tuned for each technology node.
almostgotcaught

23 hours ago

<@dkjaudyeqooe> > The amount of effort required to design and implement such a device makes it difficult for a single company to invest in, but many interested users of it could band together to create a viable open source implementation.
There are lots of companies that have their own high-performance accelerator cores (though not general purpose), across multiple generations. E.g. every FAANG (except Netflix, as far as I know).
There are exactly zero such OSS cores.
So I think you have this exactly backwards.
Pet_Ant

1 day ago

> some (e.g. BOOM, Xiangshan) are developed in Chisel with limited support from industrial electronic design automation (EDA) tools
Isn't translating between languages something that LLMs should excel at? I mean, I'm sure it's more than just pasting it into ChatGPT, but if the design has been validated and is understood, validating the translated version should be several orders of magnitude easier than starting from scratch.
zozbot234

1 day ago

<@Pet_Ant> Chisel can be compiled to Verilog out of the box, and Verilog itself should have the required support from existing EDA tools. That remark from the paper may be somewhat confused.
bjourne

20 hours ago

<@zozbot234> That is not enough. The generated Verilog code can be very opaque, which makes it very difficult to analyze in cycle-accurate simulators. It is also (afaik) mostly impossible to automatically correlate an error in the Verilog code with a specific line in the Chisel code. Also, pure Verilog is often not enough. You also need tons of vendor-specific pragmas to ensure that the design synthesizes well.
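As a rough illustration of the correlation problem: Chisel's Verilog emitters can annotate some generated lines with source-locator comments such as `// @[Adder.scala 9:10]`, but the exact format has changed across FIRRTL/CIRCT versions and many lines carry no locator at all. The sketch below is a hypothetical helper (`map_verilog_to_scala` is not part of any Chisel tool) that collects whatever locators are present. It assumes only the two formats named in its comments and gives, at best, a line-level hint, which is consistent with the point that reliable automatic correlation remains hard.

```python
import re
from collections import defaultdict

# Assumed locator shapes (real formats vary by compiler version):
#   older FIRRTL:  // @[Foo.scala 12:7 Bar.scala 3:4]
#   newer CIRCT:   // @[src/main/scala/Foo.scala:12:7]
LOCATOR = re.compile(r"//\s*@\[([^\]]+)\]")
ENTRY = re.compile(r"([\w./-]+\.scala)[ :](\d+):(\d+)")


def map_verilog_to_scala(verilog_text):
    """Return {verilog_line_number: [(scala_file, line, col), ...]}.

    Lines without a locator comment are simply absent from the result.
    """
    mapping = defaultdict(list)
    for lineno, line in enumerate(verilog_text.splitlines(), start=1):
        m = LOCATOR.search(line)
        if not m:
            continue
        for file, ln, col in ENTRY.findall(m.group(1)):
            mapping[lineno].append((file, int(ln), int(col)))
    return dict(mapping)


# Hypothetical generated Verilog, written by hand for this example.
sample = """module Adder(
  input  [7:0] io_a,
  input  [7:0] io_b,
  output [7:0] io_out
);
  assign io_out = io_a + io_b; // @[src/main/scala/Adder.scala:9:10]
endmodule
"""

print(map_verilog_to_scala(sample))
```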
IshKebab

1 day ago

<@zozbot234> This is true, but unless great care is taken to generate nice Verilog you're going to run into issues when you try to integrate standard tools like functional coverage, formal SVA, etc.
I haven't looked at Chisel's SVA support, but I do recall another HDL (I can't remember which one) touting readable Verilog generation as a feature, in response to Chisel's output being bad, so I guess it can't be great.
I think Veryl stands a decent chance of success precisely because it hews so closely to SystemVerilog: you don't lose access to all the features industry uses. It's kind of the TypeScript of SystemVerilog.
https://veryl-lang.org/
dkjaudyeqooe

1 day ago

<@Pet_Ant> > Isn't translating between languages something that LLMs should excel at?
No, not at all. Unless there is a large amount of training data relevant to the translation then LLMs are likely just to make up nonsense. Chisel is a very niche hardware description language.
Pet_Ant

1 day ago

<@dkjaudyeqooe> Very niche? That's surprising to hear. I'm not in the space, and I know it's not in the big 2/3 (is SystemVerilog distinct from Verilog?), but it's been around for 13 years and even DARPA has it on their radar:
> Chisel is mentioned by the Defense Advanced Research Projects Agency (DARPA) as a technology to improve the efficiency of electronic design, where smaller design teams do larger designs. Google has used Chisel to develop a Tensor Processing Unit for edge computing
[0] https://en.wikipedia.org/wiki/Chisel_(programming_language)#...
bee_rider

1 day ago

<@Pet_Ant> I wonder if they just mean niche in the context of languages generally, human or programming? I mean there are, relatively speaking, boatloads and boatloads of open source software projects out there. Hardware open source projects, well, a few exist…
MobiusHorizons

19 hours ago

<@Pet_Ant> I think it is niche in the sense that it is almost completely unused professionally. Most usage tends to be academic or hobbyist. I don’t mean to imply that it isn’t suitable for professional work, but more that it is not very easy to make work with the industrial EDA tools necessary for fabrication.
brucehoult

14 hours ago

<@MobiusHorizons> SiFive, the leading RISC-V IP vendor, with cores available (at the moment) up to around Cortex-X2 level, has been taping out chips from Chisel since 2016.
Their first chip, a 32-bit microcontroller, ran at 320 MHz on TSMC 180nm, while the comparable Arm Cortex-M4 was typically limited to 180 MHz on the same process node.
The EIC7700X, using SiFive P550 cores, gives nice, solid Core 2 Quad (or Raspberry Pi 4) performance.
SiFive's X280 cores are being used in rad-hard Microchip chips for NASA.
This is not exactly "academic" or "hobby".
adrian_b

12 hours ago

<@brucehoult> SiFive was founded by "academics", including some of those who designed Chisel.
So it is no surprise that they have used their pet language.
Apart from them, professional use of Chisel is rare, and the future of SiFive is unclear.
Regardless of how good it may be, it is difficult for any hardware-description language to replace the incumbents, SystemVerilog and VHDL, because all designers are too dependent on whatever the foundries or the FPGA manufacturers support.
Choosing another language is pretty much impossible, unless you translate it to either SystemVerilog or VHDL. If you do that, then it is hard to justify using another language instead of writing directly in SystemVerilog or VHDL.
GregarianChild

9 hours ago

<@adrian_b> Chisel has a compiler to Verilog. That is not the problem. Many semiconductor companies use a toolchain to generate much of their Verilog from higher-level sources.
The rumour I heard was this: the problem with Chisel was that (at least in the past) the Chisel compiler did not preserve port structure well. So if you had a Chisel source that translated to 80M LoC of Verilog, verified those 80M lines (which is very expensive), and then made a tiny change to the Chisel source, the resulting new Verilog used different port names even for the parts that were not affected by the change. (To quip: the (old?) Chisel compiler was a bit of a hash function ...) So you had to re-verify the whole 80M lines of Verilog. That is prohibitively expensive compared to re-verifying only the parts that truly need to change. The high verification costs forced by this problem were rumoured to have nearly sunk a company.
This is a compiler problem, not a Chisel language problem. I was told that the compiler problem has been fixed since. But I did not check this.
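A toy model of the naming issue in that rumour (not the actual Chisel compiler's behaviour): if emitted names come from a global counter, one inserted node renames everything after it, whereas names derived from each node's own hierarchical path survive unrelated edits. Both naming schemes, the `_GEN_` prefix and the node names here are hypothetical, chosen only to show why unstable names force re-verification of untouched logic.

```python
def counter_names(nodes):
    # Hypothetical unstable scheme: names come from a global counter,
    # so every name depends on the traversal order of the whole design.
    return {node: f"_GEN_{i}" for i, node in enumerate(nodes)}


def path_names(nodes):
    # Hypothetical stable scheme: names come from each node's own
    # hierarchical path, so unrelated edits cannot change them.
    return {node: node.replace(".", "_") for node in nodes}


def renamed(old_nodes, new_nodes, namer):
    """Nodes present in both designs whose emitted name changed."""
    old, new = namer(old_nodes), namer(new_nodes)
    return sorted(n for n in old if n in new and old[n] != new[n])


before = ["fetch.pc", "decode.op", "alu.sum", "lsu.addr"]
# One small change: a single new signal added in the fetch stage.
after = ["fetch.pc", "fetch.pc_next", "decode.op", "alu.sum", "lsu.addr"]

print(renamed(before, after, counter_names))  # ['alu.sum', 'decode.op', 'lsu.addr']
print(renamed(before, after, path_names))     # []
```

Under the counter scheme, three untouched nodes would look "changed" to a diff-based verification flow; under the path scheme, none do.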
brucehoult

8 hours ago

<@adrian_b> SiFive was founded by academics who had successfully taped out a number of processor chips. They subsequently hired many experienced industry CPU designers from Arm, Intel, AMD and others.
> the future of SiFive is unclear
What is that supposed to mean? The future of Intel is unclear. The future of Arm is unclear. The future of Tesla is unclear. The future of Boeing is unclear. That's just life in a highly competitive industry.
> Choosing another language is pretty much impossible, unless you translate it to either SystemVerilog or VHDL.
?? Which of course is exactly what Chisel has always done. Do you even know anything about it?
> If you do that, then it is hard to justify using another language instead of writing directly in SystemVerilog or VHDL.
No it is not.
Chisel enables much more abstraction than Verilog, enabling you to design not just a single CPU core but a family with very different characteristics. Diplomacy simply has no analog in the Verilog world.
Chisel, FIRRTL, CIRCT enable the same kind of optimisations on RTL as GCC or LLVM do for C code. In fact CIRCT is built on LLVM. You can emit Verilog that is optimised for different hardware technologies, including different PDKs, or FPGA vs ASIC, in a way that is completely impossible with Verilog.
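To make the compiler analogy concrete, here is a minimal sketch, assuming a made-up three-node expression IR (`Const`, `Ref`, `Add`), of a constant-folding pass followed by Verilog emission. It is not FIRRTL or CIRCT, it ignores bit widths entirely, and `offset_adder` is a hypothetical generator; it only shows how a parameter that is 0 in one variant of a design family can be optimised away before any Verilog is written.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Ref:
    name: str


@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, Ref, Add]


def fold(e: Expr) -> Expr:
    """Constant-fold additions bottom-up, like a compiler pass on an IR."""
    if isinstance(e, Add):
        lhs, rhs = fold(e.lhs), fold(e.rhs)
        if isinstance(lhs, Const) and isinstance(rhs, Const):
            return Const(lhs.value + rhs.value)
        if isinstance(rhs, Const) and rhs.value == 0:
            return lhs  # x + 0 -> x
        if isinstance(lhs, Const) and lhs.value == 0:
            return rhs  # 0 + x -> x
        return Add(lhs, rhs)
    return e


def emit(e: Expr) -> str:
    """Print an (already folded) expression as Verilog."""
    if isinstance(e, Const):
        return str(e.value)
    if isinstance(e, Ref):
        return e.name
    return f"({emit(e.lhs)} + {emit(e.rhs)})"


def emit_assign(target: str, e: Expr) -> str:
    return f"assign {target} = {emit(fold(e))};"


def offset_adder(offset: int) -> str:
    """Hypothetical generator: one design family, parameterised by offset."""
    return emit_assign("io_out", Add(Ref("io_a"), Const(offset)))


print(offset_adder(0))  # assign io_out = io_a;
print(offset_adder(4))  # assign io_out = (io_a + 4);
```

The design choice being illustrated is that optimisation happens on the IR, so each variant gets its own simplified Verilog rather than one hand-written module carrying dead logic.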
dkjaudyeqooe

22 hours ago

<@Pet_Ant> Very niche on the scale of LLM training data.
eigenform

1 day ago

<@Pet_Ant> I'm not sure this sentence [from the paper] makes a lot of sense. The only thing non-standard is the use of Chisel (and then probably CIRCT to lower it into Verilog) - if you're actually taping these out, you're still feeding that to industry-standard EDA tools.
dlcarrier

1 day ago

<@Pet_Ant> To the contrary, it's something especially suited to being done parametrically. Effectively, you can make a really big regex string to convert one language into a subset of another, then let the optimizer of the second language make it performant.
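The rule-based translation described here can be sketched for a very small subset: Chisel-style `io.x := ...` connections mapped to Verilog `assign` statements, using the `io_` port-name flattening that Chisel-generated Verilog uses. This is an illustrative assumption, not a real tool. `translate_line` is hypothetical, and anything outside the subset (Scala generator code, `when` blocks, registers, width inference) is rejected rather than translated.

```python
import re

# Toy rule: a connection "io.x := <expr over io.* ports>" becomes a
# Verilog continuous assignment. Only straight-line connects are handled.
CONNECT = re.compile(r"^\s*io\.(\w+)\s*:=\s*(.+?)\s*$")


def translate_line(line: str) -> str:
    m = CONNECT.match(line)
    if m is None:
        raise ValueError(f"outside the supported subset: {line!r}")
    target, expr = m.groups()
    # Chisel spells (in)equality "===" / "=/="; Verilog uses "==" / "!=".
    expr = expr.replace("=/=", "!=").replace("===", "==")
    # Flatten bundle fields: io.a -> io_a
    expr = re.sub(r"\bio\.(\w+)", r"io_\1", expr)
    return f"assign io_{target} = {expr};"


print(translate_line("io.out := io.a + io.b"))   # assign io_out = io_a + io_b;
print(translate_line("io.eq := io.a === io.b"))  # assign io_eq = io_a == io_b;
```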
fithisux

15 hours ago

RISC-V also needs an open ecosystem to succeed: open boards with fully documented chips.
Maybe it will be a very positive step if the CPU/GPU/DSP fused cores materialize.
sylware

12 hours ago

Anybody with deep knowledge of current RISC-V opensource implementations here?
Do harts have store queue and load queue optimizations? Namely some kind of memory request fusion?
I ask because I am writing rv64 assembly, and since rv64 is a load/store architecture, I tend to pack memory-ordered loads and stores together as much as I can.
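For reference, one common form of "memory request fusion" is a coalescing (write-combining) store buffer: stores that land in the same cache line are merged into a single line write with per-byte validity before draining. The following is a minimal sketch, assuming a 64-byte line and ignoring fences, atomics and I/O memory, all of which a real design must handle. `CoalescingStoreBuffer` is a hypothetical name, and this does not describe how any particular open-source core implements its store queue.

```python
LINE_BYTES = 64  # assumed cache-line size


class CoalescingStoreBuffer:
    """Toy write-combining buffer: stores to the same cache line merge
    into one entry (line address -> {byte offset: byte value})."""

    def __init__(self):
        # dicts keep insertion order, i.e. the order lines were first touched
        self.entries = {}

    def store(self, addr: int, data: bytes):
        for i, b in enumerate(data):
            a = addr + i
            line, off = a - a % LINE_BYTES, a % LINE_BYTES
            # a later store to the same byte overwrites the earlier one
            self.entries.setdefault(line, {})[off] = b

    def drain(self):
        """Return one (line_address, {offset: byte}) write per touched line."""
        writes = list(self.entries.items())
        self.entries.clear()
        return writes


buf = CoalescingStoreBuffer()
for k in range(4):  # e.g. four consecutive 8-byte sd stores
    buf.store(0x1000 + 8 * k, (k + 1).to_bytes(8, "little"))
writes = buf.drain()
print(len(writes), hex(writes[0][0]), len(writes[0][1]))  # 1 0x1000 32
```

Four separate stores become one memory write here; a store that straddles a line boundary still splits into two writes.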
brucehoult

8 hours ago

<@sylware> I suppose everything that isn't a toy implementation has a store queue.
Even the U54 Core Complex (later U54-MC) manual from August 2018 states in Section 3.4 "Stores are pipelined and commit on cycles where the data memory system is otherwise idle. Loads to addresses currently in the store pipeline result in a five-cycle penalty."
It probably inherited this from Rocket.
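The behaviour quoted from the U54 manual can be approximated with a toy timing model. It assumes one instruction issues per cycle, that any cycle without a load counts as idle for the data memory (so the oldest pending store commits), and that the five penalty cycles are also idle and keep draining stores. Addresses are compared exactly, whereas real hardware matches at some coarser granularity. `simulate` is hypothetical; the only number taken from the manual is the five-cycle penalty.

```python
from collections import deque

PENALTY = 5  # from the quoted U54 manual text


def simulate(ops):
    """Toy timing model. ops: ("alu",), ("store", addr) or ("load", addr).

    Each op issues in one cycle. A cycle without a load is idle for the
    data memory, so the oldest pending store commits then. A load to an
    address still in the store pipeline pays PENALTY extra cycles
    (assumed idle, so they keep draining stores).
    """
    pending = deque()
    cycles = 0
    for op in ops:
        kind = op[0]
        if kind == "load":
            if op[1] in pending:
                for _ in range(PENALTY):
                    cycles += 1
                    if pending:
                        pending.popleft()
            cycles += 1  # the load itself uses the data memory this cycle
        else:
            cycles += 1
            if pending:
                pending.popleft()  # memory idle: commit the oldest store
            if kind == "store":
                pending.append(op[1])
    return cycles


print(simulate([("store", 0x10), ("load", 0x10), ("alu",)]))  # 8
print(simulate([("store", 0x10), ("alu",), ("load", 0x10)]))  # 3
```

Under these assumptions, moving one independent instruction between a store and a load from the same address lets the store commit first and avoids the penalty.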
IshKebab

11 hours ago

<@sylware> I'm pretty sure XiangShan has a store queue. I expect the other chips mentioned do too; as I understand it, it's a standard optimisation.