[Gnucap-devel] gnucap profiling
From: gserdyuk
Subject: [Gnucap-devel] gnucap profiling
Date: Fri, 30 Jan 2009 00:54:00 +0200
Hello All,
This is part of a discussion about Gnucap profiling. It began in private
email, but we decided it may be of interest to a wider audience.
address@hidden on 29-Jan-2009 wrote:
====================================================================
Fast MOSFET Model Implementation Proposal for Gnucap.
-----------------------------------------------------
This document evaluates the idea of replacing the complex MOSFET model with a
much simpler one and using it for the simulation of digital circuits.
1. Code profiling
To understand where most of the time is spent, some profiling was done, using
Gnucap's own timers and tools such as gprof (GNU profiler) [1], OProfile [2],
and sysprof [3].
Tests were run on a BICMOS circuit made of 400 MOSFETs, with simulated times of
20 and 200 ns, that is, around 40 and 400 periods.
Simulation time, step counts, iteration counts, and relative times are
presented in Table 1.
Table 1. Gnucap Timers and Counters
------------------------------------------------------------------------------
Value                    sim time, sec    steps successful         steps total
Sim duration, ns          20       200        20       200        20       200
embedded model         49.92    742.93      1107     15158      1114     15220
bsim model             65.96    648.87       191      1856       353      3576
------------------------------------------------------------------------------
Value                       iterations  time per iter, sec      time/step, sec
Sim duration, ns          20       200        20       200        20       200
embedded model          6235     82995  0.008006  0.008952     0.044     0.048
bsim model              5207     50578  0.012668  0.012829     0.186     0.181
------------------------------------------------------------------------------
From the results it is visible that the short profiling run (20 ns) behaves
like the long one (200 ns), so the short run can also be used for profiling.
Time per iteration is similar for the embedded and bsim models, and the
number of iterations is comparable too.
Meanwhile the number of steps differs significantly.
Open Question: Al, maybe you could comment briefly on why the step counts
differ so much?
Profiling results for selected functions are presented in Table 2 (measured
with the system timer).
Table 2(a): Profiling results, Embedded Model
------------------------------------------------------------------------------------
Model:                     embedded model 20ns           embedded model 200ns
sampling time              57.86                         779.595
                           samples  rel_time  abs_time   samples  rel_time  abs_time
gnucap                      95.50%      100%   55.2563    96.57%   100.00%    752.85
| sweep                     93.59%       98%  54.15117    95.95%    99.36%    748.02
|| sim::solve               88.19%       92%  51.02673    90.75%    93.97%    707.48
||| sim::solve_equat        11.69%       12%  6.763834    11.97%    12.40%    93.317
||| sim::load matrix        11.31%       12%  6.543966    11.30%    11.70%    88.094
||| sim::advance_time           5%        5%     2.893     5.13%     5.31%    39.993
||| sim::eval._models       51.20%       54%  29.62432    57.48%    59.52%    448.11
|||| DEV_MOS..do_it         51.20%       54%  29.62432    53.71%    55.62%    418.72
||||| MOS8::tr_eval         28.70%       30%  16.60582    29.99%    31.06%    233.80
|||| CARD_LIST::do_tr       20.31%       21%  11.75137    21.14%    21.89%    164.80
------------------------------------------------------------------------------------
Table 2(b): Profiling results, BSIM3 Model
------------------------------------------------------------------------------------
Model:                     bsim3 20ns                    bsim3 200ns
sampling time              76.305                        693.67
                           samples  rel_time  abs_time   samples  rel_time  abs_time
gnucap                      95.60%   100.00%  72.94758    97.56%   100.00%  676.7445
| sweep                     92.39%    96.64%  70.49819    96.90%    99.32%  672.1662
|| sim::solve               91.79%    96.01%  70.04036    96.20%    98.61%  667.3105
||| sim::solve_equat         9.61%    10.05%  7.332911     9.67%     9.91%  67.07789
||| sim::load matrix        18.60%    19.46%  14.19273    22.40%    22.96%  155.3821
||| sim::advance_time        0.26%     0.27%  0.198393     0.26%     0.27%  1.803542
||| sim::eval._model        61.37%    64.19%  46.82838    61.89%    63.44%  429.3124
|||| DEV_SPICE::do_tr       57.30%    59.94%  43.72277    57.25%    58.68%  397.1261
||||| BSIMload              44.40%    46.44%  33.87942    44.15%    45.25%  306.2553
------------------------------------------------------------------------------------
2. Analysis
Embedded model
Sweep (the main simulation loop) takes >97% of the time, i.e. the overhead
related to data processing is small enough to be neglected.
SIM::evaluate_models() takes around 50% of the time (54% to 60% in
Table 2(a)) regardless of the simulation length. That means that even if
the model were improved infinitely (evaluation time = 0), the speed gain
would be only about a factor of two. That is the best case; any real
implementation will still need some computation.
The model has no internal nodes, so simplifying it cannot bring any gain
from a reduced node count.
Bsim model
For the BSIM model the considerations are much the same.
SIM::evaluate_models() takes >60% of the time. Within it, BSIMload takes
the largest share (45%), so the expected speedup is again around 2 times,
as in the previous case.
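The "around 2 times" bound above is Amdahl's law applied to the rel_time
column of Table 2. A small Python sketch of that arithmetic follows; the
function name max_speedup and the dictionary keys are illustrative only, and
the fractions are the rel_time values copied from the tables.

```python
def max_speedup(model_fraction: float) -> float:
    """Upper bound on overall speedup if this fraction of run time dropped to zero."""
    if not 0.0 <= model_fraction < 1.0:
        raise ValueError("model_fraction must be in [0, 1)")
    return 1.0 / (1.0 - model_fraction)

# rel_time fractions copied from Table 2(a) and 2(b)
fractions = {
    "embedded eval._models 20ns": 0.54,
    "embedded eval._models 200ns": 0.5952,
    "bsim3 BSIMload 20ns": 0.4644,
    "bsim3 BSIMload 200ns": 0.4525,
}

bounds = {name: max_speedup(f) for name, f in fractions.items()}
for name, bound in bounds.items():
    print(f"{name}: at most {bound:.2f}x faster")
```

These are upper bounds only: they assume the replaced code costs nothing and
that the step and iteration counts stay the same.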
3. Implementation
The simplest approach is to add a completely new, simplistic MOSFET model
to Gnucap with its own parameter set. Conversion of BSIM parameters to the
parameters of this simplistic model could be done outside the Gnucap code,
at or before the netlist generation step.
So a simulation would look like:
1. Convert BSIM parameters to MOS_SIMPLE model parameters.
2. Generate a netlist that uses the MOS_SIMPLE model.
3. Simulate.
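Steps 1 and 2 could be sketched as an external Python script like the one
below. Everything in it is an assumption for illustration: MOS_SIMPLE does not
exist yet, so level-1-style names (vto, kp) stand in for its parameters, and
only three BSIM3 parameters (vth0, u0 in cm^2/(V*s), tox in m) are mapped,
using the textbook relation kp = u0 * eps_ox / tox. A real conversion would
need fitting against the BSIM curves.

```python
EPS_OX = 3.9 * 8.854e-12  # permittivity of SiO2, F/m

def bsim_to_simple(bsim: dict) -> dict:
    """Map a few BSIM3-style parameters to level-1-style ones (illustrative only)."""
    cox = EPS_OX / bsim["tox"]      # gate capacitance per unit area, F/m^2
    mobility = bsim["u0"] * 1e-4    # cm^2/(V*s) -> m^2/(V*s)
    return {"vto": bsim["vth0"], "kp": mobility * cox}

def model_card(name: str, polarity: str, params: dict) -> str:
    """Format a SPICE-style .model line; level=1 stands in for MOS_SIMPLE."""
    body = " ".join(f"{key}={value:.4g}" for key, value in params.items())
    return f".model {name} {polarity} (level=1 {body})"

# Hypothetical input values, not taken from any real process file.
simple = bsim_to_simple({"vth0": 0.7, "u0": 670.0, "tox": 1.5e-8})
card = model_card("nch_simple", "nmos", simple)
print(card)
```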
As a next step it would be possible to embed the model transformation
into Gnucap (if Al considers that reasonable).
Open Question: Al, do you think model-conversion capabilities could be
built into Gnucap, or should they be external to Gnucap?
This could save about 50% of the run time. To improve speed further it is
necessary to change the computation model:
a) Implement different time steps in different parts of the circuit (this
can be done in the same "continuous time" computation model, but is quite
complex to implement, AFAIK).
b) Switch to a clocked discrete-time model (unsure about that yet).
c) Use event-driven simulation (like IRSIM), which will require
"adapters" to plug the event-based part into "continuous time" and back.
References
[1] Gprof home page. http://www.gnu.org/software/binutils/binutils.html,
http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html
[2] Oprofile home page. http://oprofile.sourceforge.net/news/
[3] Sysprof home page. http://www.daimi.au.dk/~sandmann/sysprof/
====================================================================
Al Davis <address@hidden> on 30-Jan-2009 wrote:
====================================================================
#1 -- code profiling -
What version of gnucap (and the models) are you using? There
are some significant changes in the 12-23 snapshot.
This is serious. If your tests were made with an older version,
they are no longer relevant.
What the numbers tell me is that the time step control in the
Spice BSIM model is inadequate.
353 steps, 191 successful, tells me that there were 162 steps
rejected. Almost as many steps were rejected as accepted.
Step rejections should be rare. They usually indicate some
kind of problem. They always indicate wasted time.
In contrast, 1114 steps, 1107 successful, tells me that 7 steps
were rejected. That's less than 1%. The "embedded" model uses
the step control associated with the internal components, which
is quite strict. As a result, there were very few rejections.
The iteration count is even more revealing. The "embedded"
model gives you an average of 5.6 iterations per step.
Considering that the algorithm needs one extra for checking,
and defaults to one extra as insurance, that means it usually
converges in 3 or 4 iterations. That is good. It means that on
every time step only a minor trim is needed.
The BSIM model gives you an average of 14.7 iterations per step, or 27
iterations per accepted step. A per-step iteration count this
high says that convergence is not easy. Considering the number
of rejections, and that the default settings allow 20
iterations per step, that tells me that often it was failing to
converge, then reducing the time step and trying again.
Clearly, the time steps taken were too large.
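The figures quoted above can be reproduced from the 20 ns columns of Table 1.
A small Python sketch of that arithmetic follows; the function and key names
are illustrative, not Gnucap identifiers.

```python
# 20 ns counters copied from Table 1
counters = {
    "embedded": {"accepted": 1107, "total": 1114, "iterations": 6235},
    "bsim": {"accepted": 191, "total": 353, "iterations": 5207},
}

def step_stats(c: dict) -> dict:
    """Derive rejection and iteration statistics from raw step counters."""
    rejected = c["total"] - c["accepted"]
    return {
        "rejected": rejected,
        "rejected_fraction": rejected / c["total"],
        "iter_per_step": c["iterations"] / c["total"],
        "iter_per_accepted_step": c["iterations"] / c["accepted"],
    }

stats = {name: step_stats(c) for name, c in counters.items()}
for name, s in stats.items():
    print(
        f"{name}: {s['rejected']} rejected "
        f"({s['rejected_fraction']:.1%}), "
        f"{s['iter_per_step']:.2f} iter/step, "
        f"{s['iter_per_accepted_step']:.2f} iter/accepted step"
    )
```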
#2 -- analysis
If all you do is speed up model evaluation, you save 50%. That
is misleading. If you try running with "option nobypass" you
might see a bigger difference. Using a simpler model should
also reduce the iteration count and allow bigger time steps.
There already is a simple model. It's called "level 1".
Aside from that, it would be interesting to know how the
options "nobypass", "notraceload", "noincmode" change the
results.
#3 -- implementation
Use "level 1".
You could make a new model with modelgen that is derived from
level 1, that adds parameter mapping.
If you want something even simpler, start with level 1. Make
the capacitors linear. Eliminate code related to "lambda",
essentially setting lambda=0. If you do that, add a fixed
parallel resistor so you don't get "open circuit". Simplify
the diode. A two-region piecewise linear model may work well.
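A minimal sketch of the DC part of such a simplified model, in Python rather
than modelgen syntax. It assumes the standard square-law (level 1) drain
current with lambda = 0 plus a fixed parallel resistor. The function name, the
parameters vto, beta (meaning KP*W/L) and r_par, and all default values are
illustrative assumptions, not taken from Gnucap. Capacitors and the diode are
left out.

```python
def drain_current(vgs: float, vds: float,
                  vto: float = 0.7, beta: float = 1e-4,
                  r_par: float = 1e9) -> float:
    """Square-law n-channel drain current with lambda = 0, valid for vds >= 0."""
    leak = vds / r_par                  # fixed parallel resistor, avoids an open circuit
    vov = vgs - vto                     # overdrive voltage
    if vov <= 0.0:
        return leak                     # cutoff: only the parallel resistor conducts
    if vds < vov:
        return beta * (vov - 0.5 * vds) * vds + leak   # linear region
    return 0.5 * beta * vov * vov + leak               # saturation, no lambda term

i_cutoff = drain_current(0.0, 1.0)
i_linear = drain_current(1.7, 0.5)
i_sat = drain_current(1.7, 2.0)
print(i_cutoff, i_linear, i_sat)
```

With lambda = 0 the saturation current does not depend on vds, so the parallel
resistor is the only source of output conductance in cutoff and saturation.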
Whether a model is embedded or a plugin has no impact on speed.
All of the embedded models are designed as plugins. Therefore,
you should assume that all new models will be plugins.
There is overhead associated with the "spice-wrapper" which maps
data structures. It is probably not significant with a big
model like a BSIM, but probably very significant with simple
models like a level-1.
a) different time step in different parts of the circuit ...
This is difficult and very experimental. I don't know, without
trying it, what the benefit would be. The situation now is
that although the time step is global, models see it as local,
and iteration is local. If part of the circuit takes many
iterations and another part takes few, iteration on the part
that takes few iterations stops when there is local convergence.
So, you do get some of the expected benefit now.
When you have extra time steps, the iteration count per time
step is usually reduced, because each step has a closer
starting point.
b) clocked discrete time model??? .... I'm not sure how that
would help.
c) event driven ... it already sort of is.
Other ideas ...
It should be significantly faster to use "Euler"
differentiation. For reduced accuracy fast simulations, Euler
is preferred. Euler time stepping ignores traditional
truncation error, and becomes completely based on events and
dv/dt.
"Gear" doesn't work right with the spice-wrapper. It is fine
anywhere else. This will be fixed. I think it substitutes
Euler now.
Even if "Gear" did work there, it would not be the best choice
for high speed simulations of digital circuits. Euler is.
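For readers unfamiliar with the method, here is a minimal Python sketch of
Euler integration on a single RC node, assuming "Euler" means backward
(implicit) Euler. It is not Gnucap code; the function name, component values
and step size are illustrative. It shows that even a coarse step (half the
time constant) gives a stable, monotonic, reduced-accuracy waveform.

```python
import math

def backward_euler_rc(v_src: float, r: float, c: float,
                      h: float, steps: int, v0: float = 0.0) -> list:
    """Integrate C*dv/dt = (v_src - v)/R with backward Euler; return all samples."""
    a = h / (r * c)
    v = v0
    samples = [v]
    for _ in range(steps):
        # Implicit update v_new = v + a*(v_src - v_new), solved in closed form.
        v = (v + a * v_src) / (1.0 + a)
        samples.append(v)
    return samples

r, c, v_src = 1e3, 1e-12, 1.0       # 1 kOhm, 1 pF, 1 V step: tau = 1 ns
h = 0.5e-9                           # deliberately coarse step, tau / 2
v = backward_euler_rc(v_src, r, c, h, steps=10)
exact = v_src * (1.0 - math.exp(-10 * h / (r * c)))
print(f"backward Euler: {v[-1]:.4f}  exact: {exact:.4f}")
```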
====================================================================
Al Davis <address@hidden> on 3-Jan-2008 wrote:
====================================================================
> Aside from that, it would be interesting to know how the
> options "nobypass", "notraceload", "noincmode" change the
> results.
Doing this will make it slower, maybe by a lot. It would be
interesting to see by how much.
====================================================================
--
Best regards,
gserdyuk mailto:address@hidden