sakurai-AC2009

Slide 1

RaVioli: A Parallel Video Processing Library with Auto Resolution Adjustability
Hiroko SAKURAI†, Masaomi OHNO†, Shintaro OKADA‡, Tomoaki TSUMURA†, Hiroshi MATSUO†
† Nagoya Institute of Technology, Japan
‡ Toyota Motor Corp., Japan
IADIS International Conference APPLIED COMPUTING 2009
November 19–21, 2009, Rome, Italy

Slide 2

Background (1/2): Portability of Video Applications
- Real-time video processing applications should run on a great variety of platforms:
  - Cell phones
  - Cars
  - PCs
- Principal goal of an application:
  - Long battery life
  - High throughput
  - Good accuracy

Slide 3

Background (2/2): The Many-Core Era Is Coming
- Multi/many-core processors have come into wide use
- Video processing applications have various parallelisms:
  - Pixels in video frames have data parallelism
  - Multiple frames can be processed in parallel by pipelining
- These parallelisms promise good performance on such parallel systems

Slide 4

A Video Processing Library: RaVioli
RaVioli provides:
- Easy writeability of pseudo real-time video processing
- Interfaces for parallelization
  - Detecting data dependencies and formulating reductions
  - Balancing loads of pipeline stages

Slide 5

Outline
- Concept of RaVioli
  - RaVioli hides resolutions from programmers
    - Easy writeability of video processing applications
    - Pseudo real-time processing by adjusting loads
- Semi-automatic parallelization functions
  - Automatic block decomposition
  - Pipelining interface with automatic load balance mechanism
- Evaluation results

Slide 6

Traditional Image Processing Program
Image processing program written in traditional C:

    int main(){
      // Input image
      int luma;
      for(int y=0;y<180;y++){
        for(int x=0;x<200;x++){
          luma = (int)( InImg[x][y].R*0.299
                       +InImg[x][y].G*0.587
                       +InImg[x][y].B*0.114);
          OutImg[x][y].R = luma;
          OutImg[x][y].G = luma;
          OutImg[x][y].B = luma;
        }
      }
    }

(Figure: InImg and OutImg.)

Slide 7

Image Processing Program with RaVioli
Grayscale program using RaVioli:

    RV_Pixel GrayScale(RV_Pixel Pix){
      int luma;
      luma=(int)( Pix.R()*0.299
                 +Pix.G()*0.587
                 +Pix.B()*0.114);
      return(Pix.setRGB(luma, luma, luma));
    }
    int main(){
      RV_Image InImg,OutImg;
      // Input image
      OutImg=InImg.procPix(GrayScale);
    }

(Figure labels: GrayScale is the component function; procPix is the higher-order method; InImg and OutImg are RV_Image objects.)

Slide 8

Video Processing Program with RaVioli
(Figure: video processing program with RaVioli; labels: RV_Image obj, higher-order method.)

Slide 9

Outline
- Concept of RaVioli
  - RaVioli hides resolutions from programmers
    - Easy writeability of video processing applications
    - Pseudo real-time processing by adjusting loads
- Semi-automatic parallelization functions
  - Automatic block decomposition
  - Pipelining interface with automatic load balance mechanism
- Evaluation results

Slide 10

Auto-Adjustment of Computation Load
- Spatial resolution (pixel rate)
  - Ss: spatial stride
- Temporal resolution (frame rate)
  - St: temporal stride
(Figure: sampling with Ss=1 and Ss=2, and with St=1 and St=2.)
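As a rough illustration of how the two strides reduce the load, the sketch below counts how many pixels and frames are still visited for a given Ss and St. It is not RaVioli's implementation: the helpers sampled, processedPixels and processedFrames are hypothetical names, and applying the spatial stride to both axes is an assumption.

```cpp
#include <cassert>

// Number of items visited when stepping through `length` items with `stride`.
int sampled(int length, int stride) {
    assert(length >= 0 && stride >= 1);
    return (length + stride - 1) / stride;  // ceiling division
}

// Pixels visited per frame, ASSUMING the spatial stride Ss applies to both axes.
int processedPixels(int width, int height, int Ss) {
    return sampled(width, Ss) * sampled(height, Ss);
}

// Frames visited out of `frames` when only every St-th frame is processed.
int processedFrames(int frames, int St) {
    return sampled(frames, St);
}
```

Under these assumptions, a 200x180 frame has 36000 pixels at Ss=1 and 9000 visited pixels at Ss=2, while St=2 halves the number of processed frames.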

Slide 11

Priority Set
Which stride should be increased? We can specify resolution priorities by a priority set.
(Spatial resolution, Temporal resolution) =
- (7,3): keep spatial stride and temporal stride in the ratio of 3:7
- (1,0): keep spatial stride at 1
(Figure: Ss=1, Ss=2, St=1, St=2.)
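One way to read the priority set is as a target ratio: with priorities (Ps, Pt), the strides should satisfy Ss : St = Pt : Ps, that is Ss * Ps = St * Pt. The sketch below picks which stride to increase so that the strides drift toward that ratio. The struct Strides, the function raiseStride and the step size of one are assumptions made for illustration, not the library's actual policy.

```cpp
#include <cassert>

struct Strides {
    int Ss;  // spatial stride
    int St;  // temporal stride
};

// Increase one stride by one step, steering toward Ss * Ps == St * Pt.
// (Ps, Pt) is the priority set (spatial resolution, temporal resolution).
Strides raiseStride(Strides s, int Ps, int Pt) {
    assert(Ps >= 0 && Pt >= 0 && (Ps > 0 || Pt > 0));
    if (s.Ss * Ps < s.St * Pt) {
        ++s.Ss;  // spatial stride is below its share: give up spatial resolution
    } else {
        ++s.St;  // otherwise give up temporal resolution
    }
    return s;
}
```

With the priority set (1,0) the condition is never true, so only St grows and Ss stays at 1. With (7,3), starting from Ss=1 and St=1, eight steps lead to Ss=3 and St=7.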

Slide 12

Detecting Overload
- The RV_Video class compares the frame interval with the processing time of the image processing program
- Frame interval < processing time: overloaded!
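The overload test itself is a single comparison. The sketch below shows it with std::chrono; Millis, frameInterval, measure and isOverloaded are hypothetical names chosen for this example, and how RV_Video actually measures time is not stated on the slide.

```cpp
#include <cassert>
#include <chrono>

using Millis = std::chrono::duration<double, std::milli>;

// Time available per frame at a given frame rate.
Millis frameInterval(double fps) {
    assert(fps > 0.0);
    return Millis(1000.0 / fps);
}

// Wall-clock time taken by one call of the per-frame processing function.
template <class F>
Millis measure(F processFrame) {
    const auto start = std::chrono::steady_clock::now();
    processFrame();
    return std::chrono::duration_cast<Millis>(
        std::chrono::steady_clock::now() - start);
}

// Overloaded when processing a frame takes longer than the frame interval.
bool isOverloaded(Millis interval, Millis processingTime) {
    return interval < processingTime;
}
```

At 30 fps the interval is about 33.3 ms, so a frame that takes 40 ms to process would count as overloaded, and one that takes 20 ms would not.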

Slide 13

Outline
- Concept of RaVioli
  - RaVioli hides resolutions from programmers
    - Easy writeability of video processing applications
    - Pseudo real-time processing by adjusting loads
- Semi-automatic parallelization functions
  - Automatic block decomposition
  - Pipelining interface with automatic load balance mechanism
- Evaluation results of our work

Slide 14

Parallelization: Block Decomposition
Image processing with C/C++:

    int main(){
      RGB  InImg[180][200];   // RGB: pixel type with members R, G, B
      byte OutImg[180][200];
      for( int y=0; y<180; y++ ){
        for( int x=0; x<200; x++ ){
          OutImg[y][x]=(int)( InImg[y][x].R*0.299
                             +InImg[y][x].G*0.587
                             +InImg[y][x].B*0.114);
        }
      }
    }

Image processing with RaVioli:

    RV_Pix GrayScale(RV_Pix Pix){
      int Y;
      Y = (int)( Pix.R()*0.299
                +Pix.G()*0.587
                +Pix.B()*0.114);
      return( Pix.setRGB(Y, Y, Y) );
    }
    int main(){
      RV_Img InImg, OutImg;
      OutImg = InImg.procPix(GrayScale);
    }

Slide 15

Parallelization: Block Decomposition
Image processing with RaVioli:

    RV_Pix GrayScale(RV_Pix Pix){
      int Y;
      Y = (int)( Pix.R()*0.299
                +Pix.G()*0.587
                +Pix.B()*0.114);
      return( Pix.setRGB(Y, Y, Y) );
    }
    int main(){
      RV_Img InImg,OutImg;
      OutImg = InImg.procPix(GrayScale);
    }

Parallel version of the call:

    OutImg = InImg.procPix(GrayScale, 4);

(Figure: InImg divided into four blocks, processed by thread1, thread2, thread3 and thread4.)
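To make block decomposition concrete, here is a minimal sketch of what a call like procPix(GrayScale, 4) could do internally. It is only an assumption about the mechanism: Pixel, Image and this free function procPix are hypothetical stand-ins rather than RaVioli's classes, and the image is split into horizontal bands of rows, one std::thread per band.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

struct Pixel { unsigned char r, g, b; };

struct Image {
    int width, height;
    std::vector<Pixel> data;  // row-major, width * height pixels
};

// Component function: converts one pixel to grayscale.
Pixel grayScale(Pixel p) {
    unsigned char luma =
        static_cast<unsigned char>(p.r * 0.299 + p.g * 0.587 + p.b * 0.114);
    return Pixel{luma, luma, luma};
}

// Hypothetical stand-in for procPix(f, numBlocks): applies f to every pixel,
// with each block of rows handled by its own thread.
template <class F>
Image procPix(const Image& in, F f, int numBlocks) {
    assert(numBlocks >= 1);
    Image out{in.width, in.height, std::vector<Pixel>(in.data.size())};
    const int rowsPerBlock = (in.height + numBlocks - 1) / numBlocks;
    std::vector<std::thread> workers;
    for (int b = 0; b < numBlocks; ++b) {
        const int first = std::min(in.height, b * rowsPerBlock);
        const int last = std::min(in.height, first + rowsPerBlock);
        workers.emplace_back([&in, &out, f, first, last] {
            // Blocks are disjoint, so no locking is needed here.
            for (int i = first * in.width; i < last * in.width; ++i)
                out.data[i] = f(in.data[i]);
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

Because each output pixel depends only on the matching input pixel, the blocks can be processed independently. A component function that updates shared state would need a reduction instead.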

Slide 16

Translator for Block Decomposition
Reduction operations may be required.

Source program:

    RV_Pix GrayScale(RV_Pix Pix){
      int Y;
      Y = (int)( Pix.R()*0.299
                +Pix.G()*0.587
                +Pix.B()*0.114);
      return( Pix.setRGB(Y, Y, Y) );
    }
    int main(){
      RV_Img InImg,OutImg;
      OutImg = InImg.procPix(GrayScale);
    }

Program parallelized by the translator:

    RV_Pix GrayScale(RV_Pix Pix){
      int Y;
      Y = (int)( Pix.R()*0.299
                +Pix.G()*0.587
                +Pix.B()*0.114);
      return( Pix.setRGB(Y, Y, Y) );
    }
    int main(){
      RV_Img InImg,OutImg;
      OutImg = InImg.procPix(GrayScale, 4);
    }

Slide 17

For Reference: Example Code with OpenMP
- OpenMP: standardized model of parallel programming for C/C++ and Fortran
- A reduction on the variable sum is declared with the pragma clause reduction(+:sum)
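The code from this slide did not survive in the transcript. As a stand-in, the following short example shows the reduction(+:sum) clause on a parallel loop; the function sumValues and its input are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Sums all values. When compiled with OpenMP enabled (for example -fopenmp on
// GCC or Clang), each thread accumulates a private copy of sum and the copies
// are added together at the end of the loop. Without OpenMP the pragma is
// ignored and the loop runs sequentially with the same result.
long sumValues(const std::vector<int>& values) {
    long sum = 0;
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += values[i];
    }
    return sum;
}
```

The programmer has to spot the reduction and write the clause by hand, which is the step the RaVioli translator aims to automate.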

Slide 18

Reduction Operations Can Be Automatically Added
Source program:

    int sum = 0;
    void pixSum(RV_Pixel p){
      sum += 1;
    }
    int main(){
      RV_Image InputImg;
      // read image data in "InputImg"
      InputImg.procPix(pixSum);
    }

The translator examines the update "sum += 1" in the component function: associative law OK, commutative law OK, so it is a reduction operation.

Translated program:

    int sum = 0;
    __thread int _localsum = 0;
    void pixSum(RV_Pixel p){
      _localsum += 1;
    }
    void __pixSum(int threadNum){
      mutex_lock(&Mutex);
      sum += _localsum;
      mutex_unlock(&Mutex);
    }
    int main(){
      RV_Image InputImg;
      // read image data in "InputImg"
      InputImg.procPix(pixSum, 4);
      InputImg.reduction(__pixSum);
    }
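The same transformation can be tried in standard C++ without RaVioli. In the sketch below, thread_local replaces __thread and std::mutex replaces the mutex calls; countPixels, reduceLocal and the interleaved split of pixels among threads are assumptions made so that the example is self-contained.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

int totalSum = 0;               // shared result (the slide's "sum")
std::mutex sumMutex;            // protects totalSum
thread_local int localSum = 0;  // per-thread partial result ("_localsum")

// Component function after translation: touches only thread-local state.
void pixSum() { localSum += 1; }

// Reduction step, run once by each thread after it has processed its pixels.
void reduceLocal() {
    std::lock_guard<std::mutex> lock(sumMutex);
    totalSum += localSum;
}

// Counts numPixels pixels using numThreads threads.
int countPixels(int numPixels, int numThreads) {
    assert(numPixels >= 0 && numThreads >= 1);
    totalSum = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([t, numPixels, numThreads] {
            // Thread t handles pixels t, t + numThreads, t + 2 * numThreads, ...
            for (int i = t; i < numPixels; i += numThreads) pixSum();
            reduceLocal();
        });
    }
    for (auto& w : workers) w.join();
    return totalSum;
}
```

The lock is taken once per thread rather than once per pixel, which is why the update has to be associative and commutative: the partial sums are combined in whatever order the threads finish.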

Slide 19

Outline
- Concept of RaVioli
  - RaVioli hides resolutions from programmers
    - Easy writeability of video processing applications
    - Pseudo real-time processing by adjusting loads
- Semi-automatic parallelization functions
  - Automatic block decomposition
  - Pipelining interface with automatic load balance mechanism
- Evaluation results of our work

Slide 20

Assisting Pipeline Implementation
For building a pipeline:
- The whole process is split into several stages
- Several threads are created and assigned to the stages
- FIFOs must be implemented and managed for data transfer between stages
(Figure: binarize on thread1, edge detect on thread2, hough trans on thread3.)
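To give a sense of the work these three points imply when done by hand, here is a minimal hand-written pipeline in standard C++. Everything in it is an illustrative assumption: frames are represented by plain non-negative int values, Fifo and runPipeline are hypothetical names, and the value -1 is reserved as an end-of-stream marker.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Blocking FIFO used to pass data from one stage to the next.
class Fifo {
public:
    void put(int v) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(v);
        }
        cv_.notify_one();
    }
    int take() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        int v = q_.front();
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
};

const int kEnd = -1;  // end-of-stream marker; stage outputs must never be -1

// Runs every stage on its own thread, with one FIFO between neighbours.
std::vector<int> runPipeline(const std::vector<int>& frames,
                             const std::vector<std::function<int(int)>>& stages) {
    assert(!stages.empty());
    std::vector<Fifo> fifos(stages.size() + 1);
    std::vector<std::thread> workers;
    for (std::size_t s = 0; s < stages.size(); ++s) {
        workers.emplace_back([&fifos, &stages, s] {
            for (int v = fifos[s].take(); v != kEnd; v = fifos[s].take())
                fifos[s + 1].put(stages[s](v));
            fifos[s + 1].put(kEnd);  // forward the marker to the next stage
        });
    }
    for (int f : frames) fifos[0].put(f);
    fifos[0].put(kEnd);
    std::vector<int> out;
    for (int v = fifos.back().take(); v != kEnd; v = fifos.back().take())
        out.push_back(v);
    for (auto& w : workers) w.join();
    return out;
}
```

Even this small version needs locking, condition variables and shutdown handling, which is the boilerplate a pipelining interface is meant to hide.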

Slide 21

Interface for Pipelining

    RV_Pipedata* GrayScale(RV_Pipedata* data){
      // Grayscale processing for a frame
      return data;
    }
    RV_Pipedata* Laplacian(RV_Pipedata* data){
      // Laplacian filter processing for a frame
      return data;
    }
    int main (){
      RV_Pipeline pipe;
      pipe.push(GrayScale);
      pipe.push(Laplacian);
      pipe.run();
      return 0;
    }

(Figure: push adds GrayScale and Laplacian to RV_Pipeline pipe; run executes them on thread1 and thread2.)

Slide 22

Load Imbalance between Stages
(Figure: stages A, B and C are assigned to thread1, thread2 and thread3 and process frame1, frame2, frame3, ...; stage C is the bottleneck and the pipeline stalls.)

Slide 23

Automatic Load Balancing
(Figure: stages A, B and C on thread1, thread2 and thread3 processing frame1, frame2, frame3; the threads are then reassigned so that thread1 runs B while thread2 and thread3 both run C.)

Slide 24

Automatic Load Balancing
(Figure: after load balancing, thread1 runs stages A and B, and thread2 and thread3 both run stage C, for frame1, frame2, frame3.)
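A simple cost model shows why reassigning threads helps. In the sketch below, the pace of a pipeline is set by its slowest group of stages, and a group served by several threads is assumed to handle that many frames at once. StageGroup, frameIntervalOf and the stage times used in the note that follows are hypothetical; the model ignores synchronization overhead and is not the balancing algorithm RaVioli uses.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One or more consecutive stages that share a set of threads.
struct StageGroup {
    double time;  // processing time per frame of all stages in the group
    int threads;  // number of threads assigned to the group
};

// Steady-state time between finished frames: the slowest group sets the pace.
double frameIntervalOf(const std::vector<StageGroup>& groups) {
    double interval = 0.0;
    for (const StageGroup& g : groups) {
        assert(g.threads >= 1);
        interval = std::max(interval, g.time / g.threads);
    }
    return interval;
}
```

With made-up stage times of 1, 1 and 4 units for A, B and C, one thread per stage yields a frame every 4 units. Putting A and B on one thread (2 units) and giving C two threads (4 / 2 = 2 units) yields a frame every 2 units with the same three threads.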

Slide 25

Outline
- Concept of RaVioli
  - RaVioli hides resolutions from programmers
    - Easy writeability of video processing applications
    - Pseudo real-time processing by adjusting loads
- Semi-automatic parallelization functions
  - Automatic parallelization with block decomposition
  - Pipelining interface with automatic load balance mechanism
- Evaluation results of our work

Slide 26

Evaluation: Resolution Adjustment
(Figure: frame rate (fps) and number of pixels for the priority sets, spatial resolution : temporal resolution = 0:1, 1:0 and 3:7.)

Slide 27

Evaluation: Parallelization Functions

Slide 28

Evaluation: Auto Block Decomposition
(Figure: results for hough, pixAverage, laplacian and voronoi.)

Slide 29

Evaluation: Hough transform

Slide 30

Evaluation: Automatic load balancing

Slide 31

Conclusion
- RaVioli hides resolutions from programmers
  - Pseudo real-time processing
- RaVioli has semi-automatic parallelization functions
  - Semi-automatic block decomposition
  - Load balancing mechanism between pipeline stages
- Our future work
  - Implementing an automatic power-saving function in RaVioli
  - Making RaVioli adaptive to various platforms such as Cell Broadband Engine
  - Designing an easy-to-write language that cooperates with RaVioli

Summary: Hiroko Sakurai, Masaomi Ohno, Shintaro Okada, Tomoaki Tsumura, Hiroshi Matsuo: "RaVioli: a Parallel Video Processing Library with Auto Resolution Adjustability (full paper)", Proc. IADIS Int'l. Conf. Applied Computing 2009 (AC2009), Rome, Italy, Vol.1, pp.321-329 (Nov. 2009)
