Opencv Canny 源码解析

void cv::Canny( const Mat& image, Mat& edges,
                double threshold1, double threshold2,
                int apertureSize, bool L2gradient )
{
    Mat src = image;
    edges.create(src.size(), CV_8U);
    CvMat _src = src, _dst = edges;
    cvCanny( &_src, &_dst, threshold1, threshold2,
        apertureSize + (L2gradient ? CV_CANNY_L2_GRADIENT : 0));
}

这里使用的代码是opencv 2.2 版本的代码,也是最早的代码,与最新版本的代码相比,基本思想不变,最重要的是代码结构简单,是最容易理解的版本。 本文主要是对代码进行了行间注释,解释canny 算法的c++实现细节。

阅读前提:

  • 已经理解canny 算法的基本原理,包括sobel 算子获取梯度图,canny 算法使用的极大值抑制与滞后阈值。
  • 希望了解opencv 的c++ 算法实现。

源码地址:https://github.com/opencv/opencv/blob/2.2/modules/imgproc/src/canny.cpp

canny函数接口

canny函数 调用了cvCanny, cvCanny 是主要的函数。 默认计算梯度幅值,使用L1范数,如果使用L2范数,实际的sobel 算子 大小= apertureSize + CV_CANNY_L2_GRADIENTCV_CANNY_L2_GRADIENT 的值为16

为了减少计算复杂度,使用L1范数更快。

void cv::Canny( const Mat& image, Mat& edges,
                double threshold1, double threshold2,
                int apertureSize, bool L2gradient )
{
    Mat src = image;
    edges.create(src.size(), CV_8U);
    CvMat _src = src, _dst = edges;
    cvCanny( &_src, &_dst, threshold1, threshold2,
        apertureSize + (L2gradient ? CV_CANNY_L2_GRADIENT : 0));
}

cvCanny函数签名

CV_IMPL void cvCanny( const void* srcarr, void* dstarr,
                      double low_thresh, double high_thresh,
                      int aperture_size )

CV_IMPL: 这是一个宏定义,值为:extern "C", 在 C++ 中,函数名会被编译器进行符号重载,即会根据函数参数的类型和数量生成不同的符号名。然而,C 函数没有符号重载的概念,因此在 C++ 代码中调用 C 函数时,需要使用 extern "C" 来告诉编译器使用 C 的链接规则来编译和链接这个函数。这样可以保证 C++ 代码和 C 代码之间的函数调用正确地进行。

cvCanny 函数的参数:

  • srcarr: 输入图片的指针

  • dstarr: 保存输出图片的指针

  • low_thresh: 低阈值,在cvCanny函数中,有一个步骤叫做极大值抑制,它处理的是梯度幅值图(稍后解释),其中它会将低于阈值的像素标记为非边缘,注意极大值抑制不止是标记像素,极大值抑制还有很多其他的步骤, 这里只是稍微解释low_thresh参数的一点含义。

  • high_thresh: 高阈值,同理,Canny算法处理过程中,会将高于该值的像素标记轮廓,而处于低阈值与高阈值中间的值标记为弱轮廓。注意:canny 算法最后输出的轮廓点包含:轮廓点与经过滞后阈值(稍后解释)的弱轮廓点.

  • aperture_size: sobel kernel size,一般是3x3的矩阵,soebel算法是为了得到梯度图。

接下来都是cvCanny函数的实现细节

获取梯度图

// get the src size
size = cvGetMatSize( src );

// horizontal gradient
dx = cvCreateMat( size.height, size.width, CV_16SC1 );
// vertical gradient
dy = cvCreateMat( size.height, size.width, CV_16SC1 );
// get the horizontal gradient
cvSobel( src, dx, 1, 0, aperture_size );
// get the vertical gradient
cvSobel( src, dy, 0, 1, aperture_size );

环形缓存区与边缘点缓存(map)

定义了3个缓存区:mag_buf[0]mag_buf[1]mag_buf[2],之所以说它们是环形缓存区后面再解释。 边缘点缓存(map),是用于保存轮廓点映射,在极大值抑制与滞后阈值中会使用,后面再具体解释。

这里贴出了代码,并进行了较为详细注释。

// allocates buffer, because 2 padding added, so size.width and size.height should add 2
buffer.allocate( (size.width+2)*(size.height+2) + (size.width+2)*3*sizeof(int) );

// First convert it to a char* type, and then convert it to an int* type.
mag_buf[0] = (int*)(char*)buffer;

// mag_buf[1] points to the second gradient cache, and the width is increased by 2 here because the gradient map needs to be padded with 1 on the left and right sides.
mag_buf[1] = mag_buf[0] + size.width + 2;

// Similarly, mag_buf[2] points to the third cache.
mag_buf[2] = mag_buf[1] + size.width + 2;

// Get the end position of the mag_buf[2] cache and then convert it to a uchar* type. The area pointed to by the pointer will be used for mapping contour points in the future. The mapped values ​​are 0, 1, and 2, which represent  weak contour, non-contour, and strong contour, respectively.
map = (uchar*)(mag_buf[2] + size.width + 2);

// represents map area width
mapstep = size.width + 2;

// 1 << 10 represents 1024
maxsize = MAX( 1 << 10, size.width*size.height/10 );
// the stack is used to restore strong contour points, and the stack used for Hysteresis threshold
stack.resize( maxsize );
// &stack[0] is the stack[0] pointer
stack_top = stack_bottom = &stack[0];

// mag_buf[0] is the first gradient cache, and (size.width+2)*sizeof(int) bytes is set 0
memset( mag_buf[0], 0, (size.width+2)*sizeof(int) );

// Consider the area pointed by th map pointer. the area's fisrt row and last row set to 1 which represents non-contour
memset( map, 1, mapstep );
memset( map + mapstep*(size.height + 1), 1, mapstep );

极大值抑制,边缘点入栈

入栈与出栈宏定义:d 代表着 轮廓点的地址。

#define CANNY_PUSH(d)    *(d) = (uchar)2, *stack_top++ = (d)
#define CANNY_POP(d)     (d) = *--stack_top

接下来继续介绍实现细节,这里贴出了代码,并进行了较为详细注释。

// calculate magnitude and angle of gradient, perform non-maxima supression.
// fill the map with one of the following values:
//   0 - the pixel might belong to an edge
//   1 - the pixel can not belong to an edge
//   2 - the pixel does belong to an edge
for( i = 0; i <= size.height; i++ )
{    
    // get the latest mag_buf, firstly is mag_buf[1], later is magbuf[2]
    int* _mag = mag_buf[(i > 0) + 1] + 1;
    // there float and int have same bytes
    float* _magf = (float*)_mag;
    // get dx i-th row pointer
    const short* _dx = (short*)(dx->data.ptr + dx->step*i);
    // get dy i-th row pointer
    const short* _dy = (short*)(dy->data.ptr + dy->step*i);
    uchar* _map;
    int x, y;
    ptrdiff_t magstep1, magstep2;
    // in the same row , if The previous one pixel belong to an edge, then the flag is 1
    int prev_flag = 0;

    if( i < size.height )
    {
        // in i-th row, first and last pixel set to 0
        _mag[-1] = _mag[size.width] = 0;

        if( !(flags & CV_CANNY_L2_GRADIENT) )
            for( j = 0; j < size.width; j++ )
                // calculate the magnitude 
                _mag[j] = abs(_dx[j]) + abs(_dy[j]);
        /*else if( icvFilterSobelVert_8u16s_C1R_p != 0 ) // check for IPP
        {
            // use vectorized sqrt
            mag_row.data.fl = _magf;
            for( j = 0; j < size.width; j++ )
            {
                x = _dx[j]; y = _dy[j];
                _magf[j] = (float)((double)x*x + (double)y*y);
            }
            cvPow( &mag_row, &mag_row, 0.5 );
        }*/
        else
        {
            for( j = 0; j < size.width; j++ )
            {
                x = _dx[j]; y = _dy[j];
                _magf[j] = (float)std::sqrt((double)x*x + (double)y*y);
            }
        }
    }
    else
        // when reach the bottom, padding 0 to magbuf[2]
        memset( _mag-1, 0, (size.width + 2)*sizeof(int) );

    // at the very beginning we do not have a complete ring
    // buffer of 3 magnitude rows for non-maxima suppression
    if( i == 0 )
        continue;
    // get pointer in map which point to i-th row and 1 col pixel. add one is because padding 1 in map
    _map = map + mapstep*i + 1;
    // first and last pixel set 1 which means non-contour or non-edge
    _map[-1] = _map[size.width] = 1;
    
    // the central row , previous is mag_buf[0] the next one is mag_buf[2]
    _mag = mag_buf[1] + 1; // take the central row
    // get the previous row pointer  of dx , the current one is i-th row, but the (i-1)-th row perform non-maxima supression
    _dx = (short*)(dx->data.ptr + dx->step*(i-1));
    _dy = (short*)(dy->data.ptr + dy->step*(i-1));
    
    // calculate distance, magstep1 is the distance next row to current row in the   same col。magstep2 is current row to previous row in the same col
    magstep1 = mag_buf[2] - mag_buf[1];
    magstep2 = mag_buf[0] - mag_buf[1];

    if( (stack_top - stack_bottom) + size.width > maxsize )
    {
        int sz = (int)(stack_top - stack_bottom);
        maxsize = MAX( maxsize * 3/2, maxsize + 8 );
        stack.resize(maxsize);
        stack_bottom = &stack[0];
        stack_top = stack_bottom + sz;
    }

    for( j = 0; j < size.width; j++ )
    {
        #define CANNY_SHIFT 15
        // TG22 is Integer type, it is convenient for calculation, represent 22.5 degrees
        #define TG22  (int)(0.4142135623730950488016887242097*(1<<CANNY_SHIFT) + 0.5)

        x = _dx[j];
        y = _dy[j];
        // if x and y are same sign s = 1, else s = 0
        int s = x ^ y;
        int m = _mag[j];

        x = abs(x);
        y = abs(y);
        if( m > low )
        {   
            int tg22x = x * TG22;
            int tg67x = tg22x + ((x + x) << CANNY_SHIFT);

            y <<= CANNY_SHIFT;
            // y < tg22x represent angle of gradient more than (360-22.5)) degrees and less than 22.5 degrees. more than (180-22.5) degrees and less than (180+22.5) degrees
            if( y < tg22x )
            {
                // Compare magnitude in the 0-degree direction
                if( m > _mag[j-1] && m >= _mag[j+1] )
                {    
                    // !prev_flag and _map[j-mapstep]!= 2 make nearby pixels Both are edges, It reduces the workload of hysteresis threshold.
                    if( m > high && !prev_flag && _map[j-mapstep] != 2 )
                    {
                        // push postion of (_map+j) into stack, it is edges, it will set 2
                        CANNY_PUSH( _map + j );
                        prev_flag = 1;
                    }
                    else
                        // maybe a edge
                        _map[j] = (uchar)0;
                    // if prev_flag == 1,  next time !prev_flag != 1
                    continue;
                }
            }
            // y > tg67x represent angle of gradient more than (90-22.5) degrees less than (90+22.5) degrees. more than (270-22.5) degress and less (270+22.5) degrees.
            else if( y > tg67x )
            {
                // Compare magnitude in the 90-degree direction
                if( m > _mag[j+magstep2] && m >= _mag[j+magstep1] )
                {
                    if( m > high && !prev_flag && _map[j-mapstep] != 2 )
                    {
                        CANNY_PUSH( _map + j );
                        prev_flag = 1;
                    }
                    else
                        _map[j] = (uchar)0;
                    continue;
                }
            }
            else
            // when s == -1, represent angle of gradient  is between the [135-22.5, 135+22.5] or [315-22.5, 315+22.5]
            // when s == 1, represent angle of gradient  is between the [45-22.5, 45+22.5] or [225-22.5, 225+22.5]
            {
                s = s < 0 ? -1 : 1;
                // Compare magnitude in the 45-degree or 135-degree direction
                if( m > _mag[j+magstep2-s] && m > _mag[j+magstep1+s] )
                {
                    if( m > high && !prev_flag && _map[j-mapstep] != 2 )
                    {
                        CANNY_PUSH( _map + j );
                        prev_flag = 1;
                    }
                    else
                        _map[j] = (uchar)0;
                    continue;
                }
            }
        }
        // this loop no edge, set flag 0 , so next time !prev_flag will be true
        prev_flag = 0;
        // 1 represent not a edge
        _map[j] = (uchar)1;
    }

    // scroll the ring buffer
    _mag = mag_buf[0];
    mag_buf[0] = mag_buf[1];
    mag_buf[1] = mag_buf[2];
    mag_buf[2] = _mag;
}

极大值抑制采用了循环缓存,分别是mag_buf[0] mag_buf[1] mag_buf[2], 并会进行循环移动,保证mag_buf[1]是用来执行极大值抑制的行。 并在边缘梯度周围补0,这个解释了 一开始mag_buf[0] 设置为0, 循环到最后时,mag_buf[2]也设置为0 另外缓存的第一个元素与最后一个元素都是0。

!prev_flag && _map[j-mapstep] != 2 这个判断上次循环已经当前列的上一行是否已经存在边缘,如果存在,那么这个位置的像素就不算边缘,这样可以减少滞后阈值的计算量,因为滞后阈值会判断当前像素的周围是否标记为0的像素,如果存在,那么也记为边缘。为了减少重复计算,这里极大值抑制就通过这种方式减少了入栈的像素,以此来减少计算量。

这里判断梯度方向时,会将梯度方向判定为[0, 45, 90, 135]之一,用如下图表示:

滞后阈值

滞后阈值的的作用是将一些可能是边缘点的像素设置为边缘点,这可以连接一些间断了的边缘,提高边缘检测的鲁棒性。

滞后阈值的代码比较简短,主要是将边缘点出栈,然后判断改点8个方向上是否存在标记为0的边缘点,如果存在,那么该点也入栈,并标记为边缘点,一直到栈为空。


// now track the edges (hysteresis thresholding)
while( stack_top > stack_bottom )
{
    uchar* m;
    if( (stack_top - stack_bottom) + 8 > maxsize )
    {
        int sz = (int)(stack_top - stack_bottom);
        maxsize = MAX( maxsize * 3/2, maxsize + 8 );
        stack.resize(maxsize);
        stack_bottom = &stack[0];
        stack_top = stack_bottom + sz;
    }

    CANNY_POP(m);

    if( !m[-1] )
        CANNY_PUSH( m - 1 );
    if( !m[1] )
        CANNY_PUSH( m + 1 );
    if( !m[-mapstep-1] )
        CANNY_PUSH( m - mapstep - 1 );
    if( !m[-mapstep] )
        CANNY_PUSH( m - mapstep );
    if( !m[-mapstep+1] )
        CANNY_PUSH( m - mapstep + 1 );
    if( !m[mapstep-1] )
        CANNY_PUSH( m + mapstep - 1 );
    if( !m[mapstep] )
        CANNY_PUSH( m + mapstep );
    if( !m[mapstep+1] )
        CANNY_PUSH( m + mapstep + 1 );
}

构造最后的边缘图片

// the final pass, form the final image
for( i = 0; i < size.height; i++ )
{
    const uchar* _map = map + mapstep*(i+1) + 1;
    uchar* _dst = dst->data.ptr + dst->step*i;
    
    for( j = 0; j < size.width; j++ )
        _dst[j] = (uchar)-(_map[j] >> 1);
}

_map[j] 如果是0, (_map[j] >> 1) 还是0, -(_map[j] >> 1) 是0, (uchar)-(_map[j] >> 1) 是0 _map[j] 如果是1,(_map[j] >> 1) 还是0,-(_map[j] >> 1) 是0, (uchar)-(_map[j] >> 1) 是0 _map[j] 如果是2,(_map[j] >> 1) 是1,-(_map[j] >> 1) 是-1, (uchar)-(_map[j] >> 1) 是255

所以只有标记为2的像素会以225的值输出。