| Author |
Message |
Guest
|
Posted: Tue Oct 18, 2005 5:34 am Post subject: Speed gain of MMX, ISSE, SSEx |
|
|
As a user (not an x86 ASM coder, only Z80), can I get
typical numbers (a range) for the speed gain from using
one of these CPU optimizations?
I know it depends on the code, but just as a rough idea? |
|
|
 |
Clouded

Joined: 25 Jul 2005 Posts: 67
|
Posted: Tue Oct 18, 2005 8:38 am Post subject: |
|
|
kassandro is able to quote scarily precise speed-up factors, so I will leave the main answer to him... but for what it's worth, in my last assembly project I got a x5 speed-up. The range varies heavily on either side of that... _________________ a.k.a. mg262 |
|
|
 |
kassandro Site Admin
Joined: 17 May 2005 Posts: 255
|
Posted: Sun Oct 23, 2005 8:45 pm Post subject: |
|
|
The speed gain of using SSE/SSE2/SSE3 depends very much on the subject and probably to a lesser extent also on the CPU.
With SSE you can process 8 8-bit quantities or 4 16-bit quantities simultaneously. With SSE2/SSE3 you can process 16 8-bit quantities or 8 16-bit quantities simultaneously, but there are some alignment problems, which can be very costly, especially if you have only SSE2 and not SSE3.
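The lane counts above follow from the register widths (64-bit MMX registers for the integer SSE instructions, 128-bit XMM registers for SSE2/SSE3), and the alignment issue concerns 16-byte boundaries. A minimal sketch in plain C, assuming nothing about RemoveGrain's actual code; `lanes` and `is_aligned` are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper: how many elements fit into one register. */
static unsigned lanes(unsigned register_bits, unsigned element_bits)
{
    return register_bits / element_bits;
}

/* Nonzero if p lies on a multiple of `alignment` bytes.
   Assumption: alignment is a power of two (16 for XMM registers). */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

For example, `lanes(64, 8)` gives the 8 bytes per MMX register and `lanes(128, 8)` the 16 bytes per XMM register. A routine could call `is_aligned(ptr, 16)` to decide whether an aligned load is possible or a slower unaligned path is needed.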
There is one further advantage of SSE/SSE2/SSE3: programming is essentially jump-free. This is also a disadvantage if the code branches a lot. In C or normal assembler code, only one alternative of a branch is executed. In SSE/SSE2/SSE3 programming, the CPU always has to execute both, because one alternative has to be chosen for some bytes, while the other alternative is necessary for the other bytes. In RemoveGrain I would say that the average gain is about 8 times for SSE, 9 times for SSE2 and 13 times for SSE3 on my cpu (Prescott P4 design). |
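The "both alternatives are executed" point can be sketched in plain C. This is not real SSE code and not taken from RemoveGrain; it only emulates, byte by byte, what a packed compare followed by and/or masking does. The function name `select_bytes` and the threshold condition are assumptions made for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: for every byte, take a[i] if a[i] > threshold,
   otherwise take b[i]. There is no if/else: both candidates are
   available for every byte and a mask picks one of them. */
static void select_bytes(const uint8_t *a, const uint8_t *b,
                         uint8_t threshold, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* 0xFF where the condition holds, 0x00 elsewhere; a packed
           compare produces this kind of mask for each lane. */
        uint8_t mask = (uint8_t)-(a[i] > threshold);
        out[i] = (uint8_t)((a[i] & mask) | (b[i] & (uint8_t)~mask));
    }
}
```

In packed code the loop body would run once per register rather than once per byte, which is where the gain comes from. The cost is that the work for both alternatives is always paid.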
|
|
 |
Libor aka Poutnik Guest
|
Posted: Sun Oct 23, 2005 10:05 pm Post subject: |
|
|
Thank you for the answer. I had already suspected
that tuning SSE optimizations can be very delicate work
in "SSE-hostile" code, where the speed gain is not worth the coder's effort. |
|
|
 |
fish14765
Joined: 14 Dec 2005 Posts: 5
|
Posted: Wed Dec 14, 2005 9:44 pm Post subject: Hmmm... |
|
|
| kassandro wrote: | | The speed gain of using SSE/SSE2/SSE3 depends very much on the subject and probably to a lesser extent also on the CPU. |
Could someone explain this bit to me... |
|
|
 |
kassandro Site Admin
Joined: 17 May 2005 Posts: 255
|
Posted: Fri Dec 30, 2005 10:07 am Post subject: Re: Hmmm... |
|
|
| fish14765 wrote: | | kassandro wrote: | | The speed gain of using SSE/SSE2/SSE3 depends very much on the subject and probably to a lesser extent also on the CPU. |
Could someone explain this bit to me... |
MMX/SSE programming makes sense when the input and output data are properly aligned and when no random memory lookup is needed. The quality of such an implementation depends on how much branching there is in the code. Fortunately, jumps involving min and max operations do not cause branching in SSE programming. That's why SSE programming works so well for RemoveGrain etc.
CPU dependence is moderate for MMX/SSE, but very significant for SSE2/SSE3. The AMD implementation of SSE2/SSE3 is very poor. Even the slowest Celeron D handily beats the fastest AMD cpu. I therefore recommend using the SSE plugins for AMD processors even if the cpu supports SSE2 or SSE3. |
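To illustrate why min and max need no jumps: packed instructions such as PMINUB and PMAXUB compute an unsigned byte minimum or maximum per lane in a single instruction. The sketch below is plain C, not intrinsics, and only emulates that per-byte behaviour with masks. The names `min_u8`, `max_u8` and `clamp_bytes` are hypothetical, and clamping is chosen here merely as a typical min/max use, not as a statement of what RemoveGrain does:

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-free unsigned byte minimum: the mask is 0xFF if x < y, else 0x00. */
static uint8_t min_u8(uint8_t x, uint8_t y)
{
    uint8_t mask = (uint8_t)-(x < y);
    return (uint8_t)((x & mask) | (y & (uint8_t)~mask));
}

/* Branch-free unsigned byte maximum, built the same way. */
static uint8_t max_u8(uint8_t x, uint8_t y)
{
    uint8_t mask = (uint8_t)-(x > y);
    return (uint8_t)((x & mask) | (y & (uint8_t)~mask));
}

/* Clamp every src[i] into the range [lo[i], hi[i]] without any jump
   in the loop body. Assumption: lo[i] <= hi[i] for all i. */
static void clamp_bytes(const uint8_t *src, const uint8_t *lo,
                        const uint8_t *hi, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = min_u8(max_u8(src[i], lo[i]), hi[i]);
}
```

With real packed instructions the two helper calls collapse into one max and one min instruction covering 8 or 16 bytes at once, so a condition of the form "if the value is below the lower bound, use the bound" never turns into a jump.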
|
|
 |
sade
Joined: 27 Mar 2006 Posts: 1
|
Posted: Mon Mar 27, 2006 11:05 am Post subject: |
|
|
| Quote: | | With SSE you can process 8 8-bit quantities or 4 16-bit quantities |
Does it really make sense to use SSE when you can process the same number of pixels with MMX? After all, SSE registers are wider and the instructions should therefore be slower, no?
| Quote: | | In RemoveGrain I would say that the average gain is about 8 times for SSE, 9 times for SSE2 and 13 times for SSE3 on my cpu (Prescott P4 design). |
Does the SSE3 improvement come only through lddqu, or do you use any of the floating point instructions? |
|
|
 |
|