Faster timing estimates #217
Replies: 37 comments
-
As seen in the linked discussion, there are routes to faster timing feedback than running full syn+pnr tools - thanks @bartokon. This likely involves estimation/modeling of some sort. Currently there still needs to be some work to make models like ...
-
https://www.rapidwright.io/docs/Introduction.html#what-is-rapidwright
-
Hiho, here are the times for the Artix implementation of mypipeline. (Yeah, I left my PC running synthesis for more than 8 hours xD)
-
And this is for Yosys+nextpnr (this graph looks so broken...) on the LFE5UM5G-85F-8BG756C.
-
Wow, how interesting that the nextpnr version came back so ...unpredictable... Doesn't seem like the results I've gotten from nextpnr before...
-
@bartokon I want to have distinct issues for the two parts of the fmax finding problem. See #48 for more discussion like this. Can you share your above plots again there? :)
-
I find this paper pretty interesting on how to do static timing estimations: "Timing Driven C-Slow Retiming on RTL". It has timing models for various primitives as a function of width. And a related article where the author states "the µLN indicator can be used for fast static timing estimations", see https://www.eetimes.com/introducing-c-slow-retiming-system-hyper-pipelining/2/
-
I'm sending a pull request with a function that estimates delays based on operations and bit sizes.
-
See https://github.com/JulianKemmerer/PipelineC/blob/master/src/DEVICE_MODELS.py
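A minimal sketch of that idea in Python - per-operation delay as a function of the wider operand's bit width. The coefficient values and names here are made up for illustration, not the actual DEVICE_MODELS.py contents; real values would be fit to synthesis+pnr data from the path delay cache.

```python
# Hypothetical per-operation delay model keyed by op name.
# Coefficients are placeholders, NOT measured values.
DELAY_COEFFS_NS = {
    # op_name: (fixed_ns, per_bit_ns)
    "BIN_OP_PLUS": (0.8, 0.05),   # carry chain: roughly linear in width
    "BIN_OP_MINUS": (0.8, 0.05),
    "BIN_OP_AND": (0.6, 0.0),     # bitwise op: roughly width independent
}

def estimate_delay_ns(op_name: str, width_a: int, width_b: int) -> float:
    """Estimate combinatorial delay of a binary op from operand widths.
    Simplification (as described above): only the wider operand matters."""
    fixed, per_bit = DELAY_COEFFS_NS[op_name]
    width = max(width_a, width_b)
    return fixed + per_bit * width

print(estimate_delay_ns("BIN_OP_PLUS", 22, 22))  # roughly 1.9 ns with these coeffs
```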
-
Let's continue here @suarezvictor. You ask: ... I encourage you to take the work in the direction you most want to work on. Feel free to experiment adding things to ...
-
I think getting to 1) models of basic pipelined operations and then 2) models of 'full netlist' timing is generally what we want. The goal being: instead of asking a synthesis+pnr tool to give the timing of a pipelined operation (or entire design), there should be some kind of faster modeling that can be done... So let's focus on the smaller initial goal from 1): just predicting the timing of pipelined basic operators... For that I need to explain how pipelining works right now...
-
I propose that, for any selected work, you provide the data and I try to develop the models. In the case of pipelining a single binary operation (split into smaller widths), the data should specify the operation, all the intermediate and final widths, plus the delay (or maybe, as a start, just the input widths and number of stages).
-
Let's consider the basics first. You can say "pipeline an operation to N stages", ex. pipeline a 32b adder to 2 stages (16b per stage). And these are the kinds of pipeline cache values @suarezvictor is looking for. However, the tool does not really use just a single 'stages' latency number to characterize the pipeline. Consider a design with three of these adders back to back in series.
If you ask 'pipeline that to 2 stages', i.e. split it in half - where does the split happen?
The stage split would occur halfway through the middle adder. So pipelining is ultimately not done on the basic operations as 'pipeline to N stages' but instead as 'pipeline the op, breaking it into these pieces/slices'. So the model I think would be most useful is ... Right now the tool describes the chunks/slices/locations of pipelining by each operation instance having an associated list of numbers.
It would not be hard to configure the tool to collect more data by doing a bunch of ...
And then you could work with that as ... Maybe the even stage slicing is all that's needed... Consider an odd example:
Do we really need to model that case specifically? So maybe, ex.:
If the biggest stage is 75%, then it should have the same timing as a design "split evenly" into ~1/0.75 ≈ 1.33 stages. So maybe the big TLDR here is that the timing model can take the form of: ...
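A rough Python sketch of that model shape. The list-of-fractions representation and function names are my assumption about what "an associated list of numbers" per operation instance could look like, not the tool's actual data structure:

```python
# Assumed representation: each operator instance carries a list of
# stage-size fractions that sum to 1.0.

def even_slices(n_stages: int) -> list:
    """'Pipeline to N stages' as equal fractions, e.g. 2 -> [0.5, 0.5]."""
    return [1.0 / n_stages] * n_stages

def equivalent_even_stages(slices: list) -> float:
    """Approximate an uneven slicing by its biggest stage: a pipeline whose
    largest slice is 75% of the op should time like one split evenly into
    1/0.75 ~= 1.33 stages (the TLDR idea above)."""
    return 1.0 / max(slices)

print(even_slices(2))                        # [0.5, 0.5]
print(equivalent_even_stages([0.75, 0.25]))  # about 1.33
```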
-
The data is available in the repo path_delay_cache for a few different FPGA parts. Any more data needs to be manually collected by running the synthesis+pnr tools to build the cache+models. That is something I can help you set up and run (ex. write an adder function, tell it to pipeline to 0,1,2,3 etc. clock cycles, collect data of latency->fmax, repeat for other operators, script it to repeat for all the FPGA parts, repeat for all the widths, etc.). I personally don't have time to create and maintain these models, so I am leaving it up to whoever wants to contribute to ...
-
My proposal is that you throw all the data at the model, and then the model can simplify the data before estimation. In the current state, for example, when you specify the widths of the two operands, I simplify by just taking the greater one.
-
It will increase in reliability over time as more data points are used to update the cache (it likely isn't very accurate to start), or if more specific tests are intentionally done (ex. a design of just a single BIN_OP_MINUS_int22_t_int22_t set to ...).
-
Let's assume it's useful data.
-
I may not understand the data. When we have, let's say, a 22+22 bit add and you register a 0.5 factor, does it mean you made a 2-stage pipeline of 11 bits each? If you used 11 bits, we'll need that number. We need to know what the synthesizer was asked and what it returned. Maybe if you can illustrate an example with VHDL code it would be even clearer.
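Not VHDL, but here is one reading of that entry sketched in Python. The interpretation - factor 0.5 meaning a 2-stage split with 11 bits per stage and the carry registered between stages - is an assumption to be confirmed, not the tool's documented behavior:

```python
# Assumed meaning of "22-bit add, factor 0.5": a 2-stage pipeline,
# 11 bits per stage, with the carry registered between stages.
MASK11 = (1 << 11) - 1

def staged_add_22(a: int, b: int) -> int:
    # Stage 1: low 11 bits, producing a carry (registered in hardware).
    lo = (a & MASK11) + (b & MASK11)
    carry, lo = lo >> 11, lo & MASK11
    # Stage 2: high 11 bits plus the carried-in bit.
    hi = ((a >> 11) + (b >> 11) + carry) & MASK11
    return (hi << 11) | lo

# Matches a plain 22-bit add (mod 2^22):
print(staged_add_22(123456, 654321) == (123456 + 654321) & ((1 << 22) - 1))  # True
```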
-
|
Beta Was this translation helpful? Give feedback.
-
How do we round up if the operands don't have the same size?
-
Generally, for reference: these questions about exactly what is seen at the lowest level might be getting towards 'what LUTs is this?' #45
-
Also @suarezvictor, I encourage you: if you want good data for a specific operator, write a little demo function of just that operator and do an ex. ...
-
I'll use the algorithm of taking the widest operand and rounding upwards with the factor, until we can do it in a cleaner way (i.e. just saving the actual operand sizes).
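A sketch of that interim algorithm (the function name is illustrative, not from the codebase):

```python
import math

# Interim algorithm described above: take the widest operand, then round
# the per-stage width upward using the slice factor (until the actual
# operand sizes are saved in the cache).
def per_stage_width(width_a: int, width_b: int, factor: float) -> int:
    widest = max(width_a, width_b)
    return math.ceil(widest * factor)

print(per_stage_width(22, 16, 0.5))  # ceil(22 * 0.5) = 11
```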
-
Re: stuff like building our own ~nextpnr - or just a timing estimate tool, etc. Very cool to see ECP5 delays documented here from ...
-
I don't get it... 5 years ago?
-
I was noting that in that ...
-
You say before PNR?
-
In theory, yeah - but it's almost recreating PNR in difficulty - it's still a big project to undertake. I was just noting that I know where the raw timing data lives now, giving some more scope to the problem.
-
Note to remember: yosys longest topological path
-
Tell the synth tool to not flatten (preserve hierarchy bounds), then report timing on each module? Instead of synthesizing each module separately? No IO regs, so will the paths make sense? / Doesn't work for combinatorial logic?
-
Discussed in #42
Originally posted by bartokon November 9, 2021
Hi,
I'm wondering if it is possible to create a fake part with fake timings, so the whole place and route could take seconds (cached synthesis p&r?). That part should have unlimited resources, and each path delay could be, for example, 1.5*logic_delay. Logic delay could be an average from different FPGA families. All latency-finding algorithms could then iterate much faster...
For example: generate a netlist from the PipelineC-parsed code, then generate fake timing paths depending on logic depth and wire length.
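The fake-timing idea could be sketched like this. All the constants are placeholders (a real version would use per-level delays averaged from actual FPGA families, as proposed above), and the names are illustrative:

```python
# Sketch of the "fake part" idea: unlimited resources, every path delay
# modeled as a constant multiple of its logic depth. Constants are
# placeholders, NOT measured values.
AVG_LOGIC_DELAY_NS = 0.5   # assumed average per-LUT-level delay
ROUTING_FACTOR = 1.5       # fake routing overhead (the 1.5*logic_delay above)

def fake_path_delay_ns(logic_depth: int) -> float:
    """Path delay estimated purely from logic depth (levels of logic)."""
    return ROUTING_FACTOR * AVG_LOGIC_DELAY_NS * logic_depth

def fake_fmax_mhz(longest_path_depth: int) -> float:
    """Fmax implied by the deepest path in the fake-timing netlist."""
    return 1000.0 / fake_path_delay_ns(longest_path_depth)

print(fake_path_delay_ns(8))  # 8 levels -> 6.0 ns with these constants
```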