 |
videoprocessing Video Processing based on Avisynth
|
| Author |
Message |
Boulder
Joined: 19 Aug 2005 Posts: 15
|
Posted: Sun Sep 09, 2007 2:04 pm Post subject: Optimizing MVTools cont'd |
|
|
| kassandro wrote: | | Didée wrote: | Page layout is broken for me, too. Everything after kassandro's last post is squeezed at the right border. Looks quite funny.
|
The page layout is broken with both Firefox and IE. Maybe we should try a new thread. | Here's the new thread.
| Quote: |
If you look at the MVDegrain1 source, the code for handling block overlapping is already SSE optimized. On the other hand, the code for temporal averaging of the blocks is not yet SSE optimised. Obviously the code for handling overlapping blocks was simply taken from the FFT3DFilter source. | Is there any chance you could take a look to find any quick-and-easy improvements? It would also be useful in FFT3DFilter if the source is actually pretty much the same.
Also, what do you guys think about the "divide" feature? I really don't know what to say about it and haven't done any testing yet.
| Code: | divide: post-processing motion vectors by dividing every block into 4 subblocks.
0 - do not divide;
1 - divide blocks and assign the original vector to all 4 subblocks;
2 - divide blocks and assign median (with 2 neighbors) vectors to subblocks;
Default = 0. Block size and overlap values must be selected to be acceptable after internal dividing. |
|
|
|
 |
Didée
Joined: 02 Jul 2005 Posts: 46
|
Posted: Sun Sep 09, 2007 7:24 pm Post subject: |
|
|
Well ... ask Fizick for a more verbose explanation.
From what's written in the docs, it seems to make little sense to me. What are those 2 neighbors? Ex.: there's a 16x16 block with one vector. If you split this one up into four 8x8 blocks, each with the same vector, then
- median filtering each vector with the vectors of two opposed neighbours is a no-op, because each vector is surrounded by enough "same" vectors (from the splitting) that a 3-element median will change nothing at all.
- median filtering each vector with the vectors of two perpendicular neighbours (from neighbouring original 16x16 blocks, obviously) is pretty strong: you could easily lose good vectors that are following small objects.
Generally, it's not a bad idea at all. Basically this is a sort of "denoising the vector field". I already thought about this method about two years ago, but never asked for such a feature...
(...since most times I suggest something for MVTools, Fizick concludes "no-no, we must not do so.") ;)
Probably it would require a 2-pass motion estimation ... do the initial motion search, denoise the vector field, then do an additional pass, either to check the SAD (or whatever metric) of the "denoised" vectors, or to do another small-range motion search with the denoised vector as an additional predictor. A definite weakness of the current motion search is that predictors during motion search are "only" coming from the upper-left neighbourhood (because the whole ME is working its way from top-left to bottom-right). But it's a gamble whether that particular piece of neighbourhood can deliver a usable prediction ... often enough it cannot. 2-pass could improve on this problem, but of course it would be noticeably slower.
However: brainstorming about this topic is tricky; the matter is full of pitfalls, everywhere. What sounds like a good idea in theory might easily turn out to be disastrous once you try it out.  |
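The two median cases described in the post above can be checked with a small sketch. It is not MVTools code: the docs quoted in this thread do not say which two neighbours "divide=2" uses, so both readings are modelled here, and the component-wise median, the border clamping and all function names are assumptions made for illustration.

```python
from statistics import median

def median3(a, b, c):
    """Component-wise median of three (dx, dy) vectors (assumed definition)."""
    return (median([a[0], b[0], c[0]]), median([a[1], b[1], c[1]]))

def split(field):
    """Divide every block into 4 subblocks that inherit the parent's vector."""
    out = []
    for row in field:
        doubled = [v for v in row for _ in range(2)]
        out.append(doubled)
        out.append(list(doubled))
    return out

def at(field, y, x):
    """Clamp coordinates so border subblocks reuse the nearest valid vector."""
    y = min(max(y, 0), len(field) - 1)
    x = min(max(x, 0), len(field[0]) - 1)
    return field[y][x]

def filter_opposed(field):
    """Median of each vector with its left and right neighbours."""
    return [[median3(at(field, y, x - 1), field[y][x], at(field, y, x + 1))
             for x in range(len(field[0]))]
            for y in range(len(field))]

def filter_perpendicular(field):
    """Median with the horizontal and vertical neighbours that lie
    outside the subblock's own parent block."""
    out = []
    for y in range(len(field)):
        row = []
        for x in range(len(field[0])):
            dx = -1 if x % 2 == 0 else 1   # step away from the parent block
            dy = -1 if y % 2 == 0 else 1
            row.append(median3(at(field, y, x + dx), field[y][x],
                               at(field, y + dy, x)))
        out.append(row)
    return out

# A static background with one small moving object in the centre block.
coarse = [[(0, 0), (0, 0), (0, 0)],
          [(0, 0), (5, -3), (0, 0)],
          [(0, 0), (0, 0), (0, 0)]]
fine = split(coarse)

print(filter_opposed(fine) == fine)        # opposed neighbours change nothing
print(filter_perpendicular(fine)[2][2])    # the object's vector is replaced
```

In this toy field the opposed-neighbour median leaves every vector untouched, while the perpendicular variant overwrites all four subblocks of the small object with the background vector, which is the loss of good vectors the post warns about.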
|
|
 |
kassandro Site Admin
Joined: 17 May 2005 Posts: 255
|
Posted: Mon Sep 10, 2007 1:53 am Post subject: Re: Optimizing MVTools cont'd |
|
|
| Boulder wrote: | | kassandro wrote: |
If you look at the MVDegrain1 source, the code for handling block overlapping is already SSE optimized. On the other hand, the code for temporal averaging of the blocks is not yet SSE optimised. Obviously the code for handling overlapping blocks was simply taken from the FFT3DFilter source. | Is there any chance you could take a look to find any quick-and-easy improvements? It would also be useful in FFT3DFilter if the source is actually pretty much the same.
|
No, I am not really good at hacking other people's code. In all my Avisynth filters I never copied a single line of code from somebody else. I simply have my own style. It is also not clear whether the improvement suggested above would be significant. If most of the time is spent searching for motion vectors, it wouldn't be. Roughly, it would be as useful as Fizick's SSE optimisation of the treatment of overlapping blocks. Perhaps Fizick included this code only because he could simply copy it from FFT3DFilter. Besides the handling of overlapping blocks, FFT3DFilter is very different from MVDegrain1/2.
| Quote: |
Also, what do you guys think about the "divide" feature? I really don't know what to say about it and haven't done any testing yet.
| Code: | divide: post-processing motion vectors by dividing every block into 4 subblocks.
0 - do not divide;
1 - divide blocks and assign the original vector to all 4 subblocks;
2 - divide blocks and assign median (with 2 neighbors) vectors to subblocks;
Default = 0. Block size and overlap values must be selected to be acceptable after internal dividing. |
|
This is just another attempt to deal with the discontinuity of motion vectors. That is a fundamental problem, and so far it is dealt with by averaging overlapping blocks after motion compensation or motion compensated denoising.
| Didée wrote: |
- median filtering each vector with the vectors of two perpendicular neighbours (from neighbouring original 16x16 blocks, obviously) is pretty strong: you could easily lose good vectors that are following small objects.
|
This is the only reasonable possibility for the "divide" feature I can think of, and of course you are right. For a very modest gain in continuity you lose some quality of the motion vectors. Sometimes discontinuity of motion vectors is simply caused by a poor search. However, discontinuity may also be quite natural, when one block belongs to the static background and the other to the moving foreground. Personally I only like motion vectors in the context of compression. There the discontinuity is not essential. Even bad motion vectors only hurt compression quality but not image quality. Otherwise motion compensation is an algorithmic mess without really good ideas. For me as a mathematician the beauty and logic of an algorithm is much more important than its usefulness, and from that point of view I do not want to deal with this subject. |
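The "averaging overlapping blocks" mentioned in the post above can be illustrated in one dimension. This is only a sketch of the general idea: the linear ramp window and the function name `blend_overlapping` are assumptions, and the thread does not say which window MVTools or FFT3DFilter actually applies.

```python
def blend_overlapping(blocks, block, overlap):
    """Blend equally sized 1-D blocks that overlap by `overlap` samples.

    Each sample is weighted by a linear ramp near the block edges
    (an assumed window), and the weighted sum is normalised by the
    total weight, so regions covered by one block pass through unchanged.
    """
    step = block - overlap
    length = step * (len(blocks) - 1) + block
    acc = [0.0] * length      # weighted sum of sample values
    wsum = [0.0] * length     # sum of weights at each position
    for n, values in enumerate(blocks):
        for i, v in enumerate(values):
            if overlap:
                w = min(1.0, (i + 0.5) / overlap, (block - i - 0.5) / overlap)
            else:
                w = 1.0
            acc[n * step + i] += w * v
            wsum[n * step + i] += w
    return [a / w for a, w in zip(acc, wsum)]

# Two compensated blocks that disagree: the hard edge becomes a ramp.
left = [10, 10, 10, 10]
right = [20, 20, 20, 20]
print(blend_overlapping([left, right], block=4, overlap=2))
```

Without overlap the two blocks would meet at a hard step from 10 to 20; with a 2-sample overlap the transition is spread over the shared samples, which is how overlapping hides discontinuities between neighbouring vectors.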
|
|
 |
Boulder
Joined: 19 Aug 2005 Posts: 15
|
Posted: Tue Oct 02, 2007 7:11 pm Post subject: |
|
|
| Another question about MVTools and overlapping: do you think that a 1/4 overlap is enough, or should it be 1/2, or should I even leave it out? I currently use a 16px block size and 4px for overlapping; 8px overlap is simply too slow for my purposes, at least until I can get a dual-core CPU some day. |
|
|
 |
kassandro Site Admin
Joined: 17 May 2005 Posts: 255
|
Posted: Wed Oct 03, 2007 7:41 am Post subject: |
|
|
| Boulder wrote: | Another question about MVTools and overlapping - do you think that a 1/4 overlap is enough or should it be 1/2, or even leave it out?
|
As a non-user I am probably not really qualified to answer this question. On the other hand, overlapping is the only idea for dealing with the discontinuity of motion vectors. For frame interpolation and any other situation where smooth motion compensation is necessary, it is obviously a very good idea. I personally wouldn't use it for denoising. Rather, I would prefer more aggressive, unlimited temporal cleaning in static areas and apply only spatial denoising in motion areas. That's what I tried with RemoveDirt.
| Quote: |
I currently use 16px block size and 4px for overlapping, 8px overlap is simply too slow for my purposes, at least until I can get a dual-core CPU some day. |
If you use 8px overlap, the block step shrinks from 12px to 8px, so you have to handle roughly 1.5*1.5 = 2.25 times as many blocks. In addition you have roughly twice as much overlapping work per block. Thus using 8px overlap should result in approximately 4.5 times the processing time for handling overlapping and 2.25 times the processing time for searching motion vectors. As the search for motion vectors is likely the dominant factor, you should not have more than, say, 2.5 times the processing time if you use 8px overlap.
I don't know how a multi-core CPU could be helpful here. To use a multi-core CPU for a processing task, the task has to be split into several smaller tasks, which can be handled independently from each other. However, for searching motion vectors, the motion vectors of the neighboring blocks, which have been computed before, must be used for starting the search. That really requires fully sequential processing and prohibits task division here. Of course, one can simply split the frame into two halves and process them independently. But that's a very lousy idea here, because it causes severe problems along the split line, which has to be treated as a boundary line in both half frames. |
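The overlap estimate in the post above can be reproduced with a few lines of arithmetic. The block and overlap sizes come from the thread; the share of time spent on motion search is a hypothetical parameter, since no measured profile is given here.

```python
def block_count_ratio(block, overlap_from, overlap_to):
    """How many times more blocks are needed when the overlap changes.

    The block step is block size minus overlap, and the number of
    blocks scales with 1/step along each of the two frame axes.
    """
    per_axis = (block - overlap_from) / (block - overlap_to)
    return per_axis ** 2

def estimated_slowdown(search_share, search_factor, overlap_factor):
    """Weighted slowdown, given the (assumed) fraction of the original
    run time that is spent searching for motion vectors."""
    return search_share * search_factor + (1.0 - search_share) * overlap_factor

blocks = block_count_ratio(16, 4, 8)   # (12 / 8) ** 2 = 2.25
overlap_work = 2 * blocks              # twice the overlap work per block = 4.5

# Hypothetical search shares; the real value depends on the script.
for share in (0.8, 0.9, 0.95):
    print(share, round(estimated_slowdown(share, blocks, overlap_work), 3))
```

Under these assumptions, the "no more than 2.5 times" figure holds once motion search accounts for roughly nine tenths of the original run time; with a smaller search share the slowdown moves toward 4.5.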
|
|
 |
Boulder
Joined: 19 Aug 2005 Posts: 15
|
Posted: Wed Oct 03, 2007 9:28 am Post subject: |
|
|
| kassandro wrote: | | I don't know how a multi-core CPU could be helpful here. | A multi-core CPU would help a lot since I could run two encoder processes simultaneously, as I often have more than one video in the queue. Currently I get a slight performance gain on my HT-capable processor, but having two separate cores do the job should provide much better performance. Then again, I shouldn't complain; my scripts run rather fast compared to some stuff I've seen. Sometimes I hit 6-7fps on progressive material, and 2-3fps on interlaced. |
|
|
 |
kassandro Site Admin
Joined: 17 May 2005 Posts: 255
|
Posted: Fri Oct 05, 2007 8:56 am Post subject: |
|
|
| HT should be less efficient if you switch all the time between two different processes rather than two different threads of the same process. In that respect a dual-core CPU is better. There each core has its own cache, perhaps even its own memory management unit. Actually HT would also make sense for multi-core CPUs, to use idle parts of the individual cores for additional threads. I am very satisfied with my cheap Athlon X2 system. I only had crashes when I used the video overlay, obviously a bug in the NVIDIA driver. Otherwise the system is extremely robust compared with my old P4 board with VIA chipset. |
|
|
 |
|
|
|
|