The Canny edge detector is a famous algorithm in image processing. One of our customers inquired about its performance, so our performance engineers took a look.
As always when dealing with unfamiliar codebases, we didn’t know what to expect. Some algorithms have a lot of room for performance improvements, others don’t. The implementation we got was written in C, with many obsolete constructs (loops running over raw pointers, functions without return types, etc.) that you rarely see nowadays. The source code of the Canny edge detector is available in our repository.
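To give a flavor of that style, here is a hypothetical sketch of ours (not a verbatim excerpt from the repository) showing an old-style function definition with no declared return type and a loop that walks a raw pointer instead of an index:

/* K&R-style definition: implicit int return type, parameter types declared
 * separately -- legal C89, but long obsolete. */
clear_image(image, n)
unsigned char *image;
int n;
{
    unsigned char *p;
    /* Loop running over a pointer rather than an index. */
    for (p = image; p < image + n; p++)
        *p = 0;
    return 0;
}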
In this post we will talk about the modifications we made to the serial version of Canny. With these changes the hardware can execute the code faster and the compiler can optimize it better. We didn’t use multiple CPU cores or offloading to the graphics card to speed it up; we leave that for our next post.
Profiling
The first step was to use a profiler to figure out which functions take the most time. Here is the flamegraph of the Canny algorithm for a really large image (15360 x 8640 pixels).

Four functions clearly dominate the profile: gaussian_smooth, derivative_x_y, non_max_supp and apply_hysteresis. We didn’t know what the functions do, but we did know the key principles the code must obey in order to be fast. So let’s dig into the code:
Analysis
Function gaussian_smooth
Here is the source code of the gaussian_smooth function (only relevant parts):
void gaussian_smooth(unsigned char *image, int rows, int cols, float sigma,
                     short int **smoothedim)
{
    int r, c, rr, cc,   /* Counter variables. */
        windowsize,     /* Dimension of the gaussian kernel. */
        center;         /* Half of the windowsize. */
    float *tempim,      /* Buffer for separable filter gaussian smoothing. */
          *kernel,      /* A one dimensional gaussian kernel. */
          dot,          /* Dot product summing variable. */
          sum;          /* Sum of the kernel weights variable. */
    ...
    /****************************************************************************
     * Blur in the x - direction.
     ****************************************************************************/
    if (CANNY_LIB_VERBOSE) printf("   Bluring the image in the X-direction.\n");
    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            dot = 0.0;
            sum = 0.0;
            for (cc = (-center); cc <= center; cc++) {
                if (((c + cc) >= 0) && ((c + cc) < cols)) {
                    dot += (float)image[r*cols + (c + cc)] * kernel[center + cc];
                    sum += kernel[center + cc];
                }
            }
            tempim[r*cols + c] = dot / sum;
        }
    }
    /****************************************************************************
     * Blur in the y - direction.
     ****************************************************************************/
    if (CANNY_LIB_VERBOSE) printf("   Bluring the image in the Y-direction.\n");
    for (c = 0; c < cols; c++) {
        for (r = 0; r < rows; r++) {
            sum = 0.0;
            dot = 0.0;
            for (rr = (-center); rr <= center; rr++) {
                if (((r + rr) >= 0) && ((r + rr) < rows)) {
                    dot += tempim[(r + rr)*cols + c] * kernel[center + rr];
                    sum += kernel[center + rr];
                }
            }
            (*smoothedim)[r*cols + c] = (short int)(dot*BOOSTBLURFACTOR / sum + 0.5);
        }
    }
    free(tempim);
    free(kernel);
}
Two loop nests are clearly visible: one that does the blurring in the x-direction, and one that blurs in the y-direction.
These are the rules of thumb to get good performance in this type of code:
- When iterating over arrays or matrices, all the memory accesses should be sequential¹. We want to avoid strided² and random³ accesses as much as possible (see the sketch after this list).
- The innermost loop should have a lot of iterations (a high trip count). This makes more optimization opportunities available to the compiler: vectorization, loop unrolling, loop interleaving, software pipelining, etc.
- Move all loop-invariant computations outside of the loop. The smaller the loop body, the faster it will run.
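To make the first rule concrete, here is a minimal sketch of ours (not code from the Canny source) contrasting a sequential traversal with a strided traversal of the same row-major matrix:

#include <stddef.h>

/* Sequential: walks a[0], a[1], a[2], ... -- the pattern the hardware
 * prefetcher and the cache like best. */
void sum_rows(const float *a, size_t rows, size_t cols, float *out)
{
    for (size_t r = 0; r < rows; r++) {
        float s = 0.0f;
        for (size_t c = 0; c < cols; c++)
            s += a[r * cols + c];   /* stride 1 */
        out[r] = s;
    }
}

/* Strided: walks a[0], a[cols], a[2*cols], ... -- each access lands in a
 * different cache line, which is exactly what we want to avoid. */
void sum_columns(const float *a, size_t rows, size_t cols, float *out)
{
    for (size_t c = 0; c < cols; c++) {
        float s = 0.0f;
        for (size_t r = 0; r < rows; r++)
            s += a[r * cols + c];   /* stride == cols */
        out[c] = s;
    }
}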
Applying these principles allows the hardware to do its job as fast as possible, and it allows the compiler to generate the most efficient version of the code by vectorizing the innermost loop. When we say vectorization, we mean that the compiler generates special SIMD instructions that are faster than regular instructions because they process several pieces of data with a single instruction.
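As a side note, you can ask CLANG to report its vectorization decisions. Here is a minimal sketch of ours (not from the Canny source) of a loop the auto-vectorizer handles easily; compiling it with clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize prints a remark for every loop that was, or was not, vectorized:

/* saxpy.c -- sequential accesses, a high trip count and no branches in
 * the body: ideal conditions for the auto-vectorizer. */
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];  /* one SIMD instruction processes several floats */
}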
Let’s see how our code relates to these principles. In the first loop nest, which blurs in the x-direction, all memory accesses are sequential. Inside the innermost loop there is a condition if (((c + cc) >= 0) && ((c + cc) < cols)) that guards against out-of-bounds memory accesses. This check doesn’t have to run on every iteration; we can eliminate it by changing the loop start and end conditions.
It is not clear what the trip count of the innermost loop over cc is just by looking at the code. When we measured on a real input image, the trip count was really low, only 5. This is too small a number for the compiler to work with, but luckily, the loop that iterates over variable c has a high trip count. We can interchange those two loops; by doing this we wouldn’t be introducing any non-sequential memory accesses. To make the interchange possible, however, we need to introduce temporary arrays to store the intermediate results.

So, here is how our loop nest looks when transformed:
float* dot_arr = calloc(cols, sizeof(float));
float* sum_arr = calloc(cols, sizeof(float));
for (r = 0; r < rows; r++) {
    memset(dot_arr, 0, cols * sizeof(float));
    memset(sum_arr, 0, cols * sizeof(float));
    for (cc = (-center); cc <= center; cc++) {
        for (c = MAX(0, -cc); c < MIN(cols, cols - cc); c++) {
            dot_arr[c] +=
                (float)image[r * cols + (c + cc)] * kernel[center + cc];
            sum_arr[c] += kernel[center + cc];
        }
    }
    for (c = 0; c < cols; c++) {
        tempim[r * cols + c] = dot_arr[c] / sum_arr[c];
    }
}
We introduced two new temporary arrays, dot_arr and sum_arr, to hold the intermediate values. Please also note that we moved the innermost if condition out of the loop by changing the boundaries of the new innermost loop over c. This loop nest is now hardware and compiler friendly.
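The clamped loop bounds rely on MAX and MIN helpers, which are not part of standard C. A typical definition (our assumption; the repository may define them differently) is:

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))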
The second loop nest looks like this:
for (c = 0; c < cols; c++) {
    for (r = 0; r < rows; r++) {
        sum = 0.0;
        dot = 0.0;
        for (rr = (-center); rr <= center; rr++) {
            if (((r + rr) >= 0) && ((r + rr) < rows)) {
                dot += tempim[(r + rr)*cols + c] * kernel[center + rr];
                sum += kernel[center + rr];
            }
        }
        (*smoothedim)[r*cols + c] = (short int)(dot*BOOSTBLURFACTOR / sum + 0.5);
    }
}
It looks similar to the previous loop nest, but it is much less efficient. The arrays tempim and smoothedim are accessed with stride cols, which is really bad for performance. The innermost loop over rr has a low trip count (the same as in the previous loop nest). And there is a condition inside the innermost loop that we can move out of the loop.
We can safely interchange the loop over variable c and the loop over variable r and get rid of the non-sequential accesses. But ideally, we want to take the short innermost loop over rr and interchange it with the loops that have a higher trip count. With the introduction of temporary arrays to hold intermediate values we can do exactly that, and our code looks like this:
for (r = 0; r < rows; r++) {
    memset(dot_arr, 0, cols * sizeof(float));
    memset(sum_arr, 0, cols * sizeof(float));
    for (rr = (-center); rr <= center; rr++) {
        if (((r + rr) >= 0) && ((r + rr) < rows)) {
            for (c = 0; c < cols; c++) {
                dot_arr[c] += tempim[(r + rr) * cols + c] * kernel[center + rr];
                sum_arr[c] += kernel[center + rr];
            }
        }
    }
    for (c = 0; c < cols; c++) {
        (*smoothedim)[r*cols + c] =
            (short int)(dot_arr[c]*BOOSTBLURFACTOR / sum_arr[c] + 0.5);
    }
}
This transformation is more complex. We completely rearranged the loops, from (c, r, rr) to (r, rr, c), and by doing this we’ve made our code much more hardware friendly and easier for the compiler to optimize.
For both loop nests in the original version, the CLANG compiler didn’t vectorize the innermost loop: its cost model calculated that the loop was not worth vectorizing, because of the low trip count and the non-sequential memory access pattern. In the optimized version, CLANG vectorized the innermost loop in both loop nests.
The runtime of the function gaussian_smooth fell from 5.4 seconds to 0.5 seconds, and the overall program runtime went down from 9 seconds to 4.5 seconds. The function became essentially irrelevant as far as overall time consumption is concerned.
Function derivative_x_y
The second function that dominates the performance profile is derivative_x_y. Let’s look at the source code (only relevant parts):
void derrivative_x_y(short int *smoothedim, int rows, int cols,
                     short int **delta_x, short int **delta_y)
{
    int r, c, pos;
    ...
    /****************************************************************************
     * Compute the x-derivative. Adjust the derivative at the borders to avoid
     * losing pixels.
     ****************************************************************************/
    if (CANNY_LIB_VERBOSE) printf("   Computing the X-direction derivative.\n");
    for (r = 0; r < rows; r++) {
        pos = r * cols;
        (*delta_x)[pos] = smoothedim[pos + 1] - smoothedim[pos];
        pos++;
        for (c = 1; c < (cols - 1); c++, pos++) {
            (*delta_x)[pos] = smoothedim[pos + 1] - smoothedim[pos - 1];
        }
        (*delta_x)[pos] = smoothedim[pos] - smoothedim[pos - 1];
    }
    /****************************************************************************
     * Compute the y-derivative. Adjust the derivative at the borders to avoid
     * losing pixels.
     ****************************************************************************/
    if (CANNY_LIB_VERBOSE) printf("   Computing the Y-direction derivative.\n");
    for (c = 0; c < cols; c++) {
        pos = c;
        (*delta_y)[pos] = smoothedim[pos + cols] - smoothedim[pos];
        pos += cols;
        for (r = 1; r < (rows - 1); r++, pos += cols) {
            (*delta_y)[pos] = smoothedim[pos + cols] - smoothedim[pos - cols];
        }
        (*delta_y)[pos] = smoothedim[pos] - smoothedim[pos - cols];
    }
}
This function again consists of two loop nests: the first computes the derivative in the X direction, the second computes it in the Y direction.
The first loop nest looks good. The innermost loop has a high trip count and all the memory accesses are to sequential memory addresses.
The second loop nest, however, looks bad. The arrays are accessed with stride cols, which is bad for performance. To perform a loop interchange, the loop over c and the loop over r would have to be perfectly nested, which is not the case here: the statements handling the first and last rows sit between the two loops and prevent the interchange.
We can make the interchange possible, however, by splitting the loop over variable c into two loops. The first loop deals with the code that is preventing us from doing the interchange. The second loop is perfectly nested with the inner loop, thus allowing the interchange. Here is what the transformation looks like for the second loop nest:
for (c = 0; c < cols; c++) {
    pos = c;
    (*delta_y)[pos] = smoothedim[pos + cols] - smoothedim[pos];
    pos = cols * (rows - 1) + c;
    (*delta_y)[pos] = smoothedim[pos] - smoothedim[pos - cols];
}
for (r = 1; r < (rows - 1); r++) {
    for (c = 0; c < cols; c++) {
        pos = r * cols + c;
        (*delta_y)[pos] = smoothedim[pos + cols] - smoothedim[pos - cols];
    }
}
We split the original loop nest into two: the first loop deals with the edge cases (the first and last rows) that were preventing the loop interchange, and the second loop nest now processes the data sequentially. The first loop is not optimal, but it has only cols iterations, compared to the second loop nest, which has cols * (rows - 2) iterations and dominates the runtime.
In the original function, CLANG vectorized the innermost loop in the first loop nest, but not in the second; its cost model again predicted that vectorization would be inefficient due to the non-sequential memory access pattern. In the optimized version, CLANG vectorized the innermost loop in both loop nests.
The runtime of the function derrivative_x_y went down from 2.8 seconds to 0.15 seconds, and the overall program runtime went down from 4.5 seconds to 1.8 seconds.
Function non_max_supp
This is a really long function that shows up in the performance profile, so we are omitting its source code here. You can see the full source in our repository.
Its main loop body processes pixels row-wise, which is good for performance. For each pixel it also accesses the neighboring pixels, but which neighbors are accessed depends on a complicated set of nested if conditions.

The major slowdown in this case comes from the nested if conditions. This type of code causes a large number of branch mispredictions, and each misprediction is costly. One way to speed it up is to go branchless.
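For illustration, here is a minimal sketch of the branchless idiom (a hypothetical example of ours, not code from non_max_supp): instead of selecting a value with an if, we turn the comparison result into an arithmetic operand, so there is no conditional jump for the CPU to mispredict:

#include <stdint.h>

/* Branchy: the branch predictor must guess (mag > thr) for every pixel. */
static uint8_t classify_branchy(int16_t mag, int16_t thr)
{
    if (mag > thr) return 255;   /* edge */
    return 0;                    /* no edge */
}

/* Branchless: (mag > thr) evaluates to 0 or 1; multiplying by 255 selects
 * the result without a conditional jump. */
static uint8_t classify_branchless(int16_t mag, int16_t thr)
{
    return (uint8_t)((mag > thr) * 255);
}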
In our case, however, going branchless didn’t bring the expected speedup, because there is a lot of work to be done in the branch bodies. So no significant speedup was possible by rewriting parts of this function.
Function apply_hysteresis
Function apply_hysteresis is a longer function consisting of several loops. Some loops iterate over sequential memory locations, and these are fine; others iterate over strided memory locations, but there is nothing we can do about that, since they are simple loops where no interchange is possible. Here is one such loop:
for (r = 0, pos = 0; r < rows; r++, pos += cols) {
    edge[pos] = NOEDGE;
    edge[pos + cols - 1] = NOEDGE;
}
However, one loop stands out: the histogram computing loop. It looks like this:
int hist[32768];
for (r = 0, pos = 0; r < rows; r++) {
    for (c = 0; c < cols; c++, pos++) {
        if (edge[pos] == POSSIBLE_EDGE) hist[mag[pos]]++;
    }
}
This loop computes a histogram over a really large iteration space. Access to the array hist is random, which is bad, and worse, no vectorization is possible due to potential dependencies. Imagine that mag[0] = 10 and mag[1] = 10. In that case, executing hist[mag[0]]++ and hist[mag[1]]++ in parallel would give an undefined result.
It is, however, possible to speed up histogram computation with a trick. Instead of one histogram array, let’s keep two. We can then unroll the loop by a factor of two and update the two histograms in parallel (modern CPUs can execute several instructions at once when the instructions don’t depend on one another):
int hist[32768], hist1[32768];
memset(hist, 0, 32768 * sizeof(int));
memset(hist1, 0, 32768 * sizeof(int));
int end = (rows * cols) / 2 * 2;
for (pos = 0; pos < end; pos += 2) {
    hist[mag[pos]] += (edge[pos] == POSSIBLE_EDGE);
    hist1[mag[pos + 1]] += (edge[pos + 1] == POSSIBLE_EDGE);
}
if (end != rows * cols) {
    hist[mag[end]] += (edge[end] == POSSIBLE_EDGE);
}
for (pos = 0; pos < 32768; pos++) {
    hist[pos] += hist1[pos];
}
The speedup was more modest this time. Originally the function took 640 ms to execute; with the change, the time went down to 450 ms. The overall program runtime went down from 1.8 seconds to 1.6 seconds.
Conclusion
There was a lot of speedup potential in this implementation of the Canny edge detector. By making modest changes to the Canny source code, the runtime went down from 9 seconds to 1.6 seconds, which is a remarkable result. The key to good performance is focusing on the hot loops and making sure they have a good memory access pattern and enough iterations for the compiler to be able to do its job.
If we wanted even more speed, we would have to move from the serial version of the code to a parallelized one. We will cover parallelization in our next post.
🔗 Go to part 2 of ‘How we made the Canny edge detector run faster‘
Additional Information
The source code of both the original version and the optimized version is available in our repository. You can easily see the changes we made by comparing the source files canny-original.c and canny-optimized.c.
All the tests were executed on an AMD Ryzen 7 4800H CPU (8 cores, 16 threads) with 16 GB of RAM, using the CLANG 10 compiler on Ubuntu 20.04.
¹ Sequential access means that the program is accessing neighboring memory addresses: a[0], a[1], a[2], etc.
² Strided access means that the program is accessing memory addresses that are not neighboring but are separated by a constant stride, e.g. a[0], a[4], a[8], etc.
³ Random access means that the program is accessing memory addresses in random order, e.g. a[5], a[28], a[11], a[141], etc.
