videoprocessing Forum Index
Video Processing based on Avisynth
 

Speed gain of MMX, ISSE, SSEx

 
videoprocessing Forum Index -> MMX/SSE/SSE2 Programming
Guest






Posted: Tue Oct 18, 2005 5:34 am    Post subject: Speed gain of MMX, ISSE, SSEx

As a user (not an x86 ASM coder, though I know Z80), can I get
a feel for the typical range of speed gain from using
one of these CPU optimizations?
I know it has to depend on the code, but just as a rough idea?
Clouded



Joined: 25 Jul 2005
Posts: 67

Posted: Tue Oct 18, 2005 8:38 am

kassandro is able to quote scarily precise speed-up factors, so I will leave the main answer to him... but for what it's worth, my last assembly project gave a 5x speed-up. The range varies heavily on either side of that...
_________________
a.k.a. mg262
kassandro
Site Admin


Joined: 17 May 2005
Posts: 255

Posted: Sun Oct 23, 2005 8:45 pm

The speed gain of using SSE/SSE2/SSE3 depends very much on the subject and probably to a lesser extent also on the CPU.
With SSE you can process 8 8-bit quantities or 4 16-bit quantities simultaneously (the integer SSE instructions still operate on the 64-bit MMX registers). With SSE2/SSE3 you can process 16 8-bit quantities or 8 16-bit quantities simultaneously, but there are some alignment problems, which can be very costly, especially if you have only SSE2 and not SSE3.
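To make the lane counts concrete, here is a minimal sketch in C using SSE2 intrinsics. It is only an illustration, not code from RemoveGrain: the function name `add_saturate_u8` is made up, and it assumes a compiler that ships `<emmintrin.h>` (with a plain scalar fallback otherwise). It adds two byte arrays with saturation, 16 bytes per instruction, and uses the unaligned load/store variants so that it works for any address.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: dst[i] = min(a[i] + b[i], 255) for n bytes. */
void add_saturate_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* One 128-bit register holds 16 unsigned bytes. */
    for (; i + 16 <= n; i += 16) {
        /* loadu/storeu accept any address; the aligned variants
           (_mm_load_si128 / _mm_store_si128) require 16-byte alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail (and the whole loop when SSE2 is not available). */
    for (; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(s > 255 ? 255 : s);
    }
}
```

If the buffers are known to be 16-byte aligned, the aligned load/store intrinsics can be used instead; how much that matters depends on the CPU.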
There is one further advantage of SSE/SSE2/SSE3: programming is essentially jump-free. This is also a disadvantage if the code branches a lot. In C or normal assembler code, only one alternative of a branch is executed. In SSE/SSE2/SSE3 programming the CPU always has to execute both, because one alternative has to be chosen for some bytes, while the other alternative is needed for the remaining bytes. In RemoveGrain I would say that the average gain is about 8 times for SSE, 9 times for SSE2 and 13 times for SSE3 on my cpu (Prescott P4 design).
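The "both alternatives are executed" point can be sketched as follows. This is an illustrative example only, not taken from RemoveGrain: the filter and the name `pull_to_threshold_u8` are hypothetical, and SSE2 intrinsics from `<emmintrin.h>` are assumed (with a scalar fallback). Both results are computed for all 16 bytes, and a per-byte mask then picks one of them.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical filter: pixels at or above thr are darkened by d,
   all other pixels are brightened by d (both with saturation). */
void pull_to_threshold_u8(uint8_t *dst, const uint8_t *src, size_t n,
                          uint8_t thr, uint8_t d)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i vthr = _mm_set1_epi8((char)thr);
    const __m128i vd   = _mm_set1_epi8((char)d);
    for (; i + 16 <= n; i += 16) {
        __m128i x = _mm_loadu_si128((const __m128i *)(src + i));
        /* Both alternatives are computed for all 16 bytes. */
        __m128i darker   = _mm_subs_epu8(x, vd);
        __m128i brighter = _mm_adds_epu8(x, vd);
        /* Mask byte is 0xFF where x >= thr (max(x, thr) == x), else 0x00. */
        __m128i mask = _mm_cmpeq_epi8(_mm_max_epu8(x, vthr), x);
        /* Per-byte select: (mask & darker) | (~mask & brighter). */
        __m128i r = _mm_or_si128(_mm_and_si128(mask, darker),
                                 _mm_andnot_si128(mask, brighter));
        _mm_storeu_si128((__m128i *)(dst + i), r);
    }
#endif
    /* Scalar tail: here only one alternative runs per pixel. */
    for (; i < n; i++) {
        int v = src[i];
        v = (v >= thr) ? v - d : v + d;
        dst[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}
```

The vector loop does roughly twice the arithmetic of the scalar branch, which is why heavily branching code gains less from this style.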
Libor aka Poutnik
Guest





Posted: Sun Oct 23, 2005 10:05 pm

Thank you for the answer. I had already suspected
that SSE optimization can be very delicate tuning work
in "SSE-hostile code", where the speed gain is not worth the coder's effort.
fish14765



Joined: 14 Dec 2005
Posts: 5

Posted: Wed Dec 14, 2005 9:44 pm    Post subject: Hmmm...

kassandro wrote:
The speed gain of using SSE/SSE2/SSE3 depends very much on the subject and probably to a lesser extent also on the CPU.


Could someone explain this bit to me...
kassandro
Site Admin


Joined: 17 May 2005
Posts: 255

Posted: Fri Dec 30, 2005 10:07 am    Post subject: Re: Hmmm...

fish14765 wrote:
kassandro wrote:
The speed gain of using SSE/SSE2/SSE3 depends very much on the subject and probably to a lesser extent also on the CPU.


Could someone explain this bit to me...


MMX/SSE programming makes sense when the input and output data are properly aligned and when no random memory lookup is needed. The quality of such an implementation depends on how much branching is in the code. Fortunately, conditionals that amount to min and max operations do not cause branching in SSE programming. That's why SSE programming works so well for RemoveGrain etc.
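As a rough sketch of the min/max point: clipping a pixel to the range spanned by two neighbours needs three conditionals in scalar C, but only min and max instructions in SIMD code. The example below is an assumption-laden illustration, not RemoveGrain's actual code; the name `clip_to_neighbours_u8` is hypothetical, and SSE2 intrinsics from `<emmintrin.h>` are assumed (with a scalar fallback).

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: clip each centre pixel c[i] to the range
   [min(a[i], b[i]), max(a[i], b[i])] given by two neighbours. */
void clip_to_neighbours_u8(uint8_t *dst, const uint8_t *c,
                           const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        __m128i lo = _mm_min_epu8(va, vb);  /* per-byte lower bound */
        __m128i hi = _mm_max_epu8(va, vb);  /* per-byte upper bound */
        /* Clamp without any jump: min(max(c, lo), hi). */
        __m128i r = _mm_min_epu8(_mm_max_epu8(vc, lo), hi);
        _mm_storeu_si128((__m128i *)(dst + i), r);
    }
#endif
    /* Scalar tail: the same clamp written with conditionals. */
    for (; i < n; i++) {
        uint8_t lo = a[i] < b[i] ? a[i] : b[i];
        uint8_t hi = a[i] < b[i] ? b[i] : a[i];
        uint8_t v = c[i];
        if (v < lo) v = lo;
        if (v > hi) v = hi;
        dst[i] = v;
    }
}
```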
CPU dependence is moderate for MMX/SSE, but very significant for SSE2/SSE3. The AMD implementation of SSE2/SSE3 is very poor. Even the slowest Celeron D handily beats the fastest AMD cpu. I therefore recommend using the SSE plugins on AMD processors even if the cpu supports SSE2 or SSE3.
sade



Joined: 27 Mar 2006
Posts: 1

Posted: Mon Mar 27, 2006 11:05 am

Quote:
With SSE you can process 8 8-bit quantities or 4 16-bit quantities

Does it really make sense to use SSE when you can process the same number of pixels with MMX? After all, SSE registers are wider and the instructions should therefore be slower, no?

Quote:
In RemoveGrain I would say that the average gain is about 8 times for SSE, 9 times for SSE2 and 13 times for SSE3 on my cpu (Prescott P4 design).

Does the SSE3 improvement come only from lddqu, or do you use any of the floating-point instructions?
All times are GMT + 1 Hour