Hi Gerhard,
I've been playing around with your toy example, and as far as lambda functions go, a lambda will only perform better if the alternative is an actual (non-inlined) function call.
Consider the two following functions (not inlined so I can find the assembly code):
static __attribute__ ((noinline)) double do_things_lambda(const Hits &c) {
    // define a predicate
    auto layer2 = [](const Hit& h) { return h.layer() == 2; };
    // define an operation..
    double x = 0;
    auto sumx = [&x](const Hit& h) { x += h.x(); };
    // and... action!
    for ( const auto& hit : c ) {
        if (!layer2(*hit)) continue;
        sumx( *hit );
    }
    return x;
}
static __attribute__ ((noinline)) double do_things_normal(const Hits &c) {
    double x = 0;
    for ( const auto& hit : c ) {
        if (hit->layer() != 2) continue;
        x += hit->x();
    }
    return x;
}
Both functions result in the same assembly code:
.LFB1852:
        .cfi_startproc
        movq    (%rdi), %rax
        movq    8(%rdi), %rcx
        xorpd   %xmm0, %xmm0
        cmpq    %rcx, %rax
        jne     .L4
        jmp     .L8
        .p2align 4,,10
        .p2align 3
.L3:
        addq    $8, %rax
        cmpq    %rax, %rcx
        je      .L7
.L4:
        movq    (%rax), %rdx
        cmpl    $2, 12(%rdx)
        jne     .L3
        movss   (%rdx), %xmm1
        addq    $8, %rax
        cmpq    %rax, %rcx
        cvtps2pd %xmm1, %xmm1
        addsd   %xmm1, %xmm0
        jne     .L4
.L7:
        rep ret
.L8:
        ret
        .cfi_endproc
As for me, I'd rather see the immediately obvious code of do_things_normal() than have to work out what the all-lambda version does.
To be continued... :)
-- Wilco
Hi Wilco,
in this example, I’m certainly not going to argue against you ;-)
But if you start ‘lifting’ (i.e. factoring out implementation details and writing fully generic code), then one ‘has’ to remove the ‘detail’ of what the predicate actually does and ‘package’ it into some functor which is passed into the generic code. In general, I’d like to see some ‘higher level iterators’, along the lines of http://www.boost.org/doc/libs/1_55_0/libs/range/doc/html/range/reference/ada... , in which case one can separate the ‘looping over the good bits of stuff only’ from the ‘do something with the individual items of good stuff’. Extrapolating from this example, the predicate shouldn’t even appear in the loop body, but in the loop-control part. That way one can ‘compose’ various predicates, and also various loop bodies. Concrete example: there is no ‘std::transform_if’, but if one uses the Boost range adaptors then there is no need for one, as the ‘if’ bit is pushed into the iterators ;-)
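To sketch what I mean (names made up for illustration; this assumes Hits is a container of Hit pointers as in Wilco's example, plus a C++11 compiler and Boost.Range), the predicate moves out of the loop body and into the iteration itself:

    #include <boost/range/adaptor/filtered.hpp>

    // Sketch only: 'Hits' is assumed to be a container of Hit*, as in do_things_normal().
    static double do_things_ranges(const Hits &c) {
        auto onLayer2 = [](const Hit* h) { return h->layer() == 2; };
        double x = 0;
        // The 'if' lives in the loop control, not in the loop body:
        for ( const auto& hit : c | boost::adaptors::filtered(onLayer2) ) {
            x += hit->x();
        }
        return x;
    }

Stacking a second adaptor (e.g. boost::adaptors::transformed) on top of the filter is exactly why a separate ‘transform_if’ is not needed.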
So in that sense, it is good that the compilers we have are good enough that a lambda has no overhead ;-) Also, if the lambda becomes more than a few lines of code, or is used in several locations, then it makes sense to write a dedicated struct with an operator() and give it a name; just the fact that it has a name serves as documentation…
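E.g. something as simple as this (just a sketch, with a made-up name):

    // The name documents the intent; the call operator carries the detail.
    struct IsOnLayer2 {
        bool operator()(const Hit& h) const { return h.layer() == 2; }
    };

    // used exactly like the lambda, e.g.:  if (!IsOnLayer2{}(*hit)) continue;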
Cheers, — Gerhard
On 11/02/14 16:38, Gerhard Raven wrote:
> Hi Wilco,
> in this example, I’m certainly not going to argue against you ;-)
> But if you start ‘lifting’ (i.e. factoring out implementation details and writing fully generic code), then one ‘has’ to remove the ‘detail’ of what the predicate actually does and ‘package’ it into some functor which is passed into the generic code. [...] Concrete example: there is no ‘std::transform_if’, but if one uses the Boost range adaptors then there is no need for one, as the ‘if’ bit is pushed into the iterators ;-)
Most of this can be done with inline functions, though lambda functions are particularly useful as arguments to maps, sorts, filters, or other functions which take a predicate.
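For example (a sketch with made-up names, assuming Hits holds Hit pointers as before), a throw-away comparison or predicate is exactly where a lambda shines:

    #include <algorithm>

    // Sketch: 'hits' is a hypothetical non-const Hits (container of Hit*).
    static void sort_and_count(Hits &hits) {
        // Sort hits by x(); the comparison is too trivial to deserve a named function.
        std::sort( hits.begin(), hits.end(),
                   [](const Hit* a, const Hit* b) { return a->x() < b->x(); } );

        // Count the hits on layer 2 with an ad-hoc predicate.
        auto n = std::count_if( hits.begin(), hits.end(),
                                [](const Hit* h) { return h->layer() == 2; } );
        (void)n;
    }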
> So in that sense, it is good that the compilers we have are good enough that a lambda has no overhead ;-) Also, if the lambda becomes more than a few lines of code, or is used in several locations, then it makes sense to write a dedicated struct with an operator() and give it a name; just the fact that it has a name serves as documentation…
Sure, but if you start naming your lambda/anonymous functions and reusing them in multiple functions, they are no longer anonymous, and you might as well use an inline function, which has the same net effect.
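I.e. (sketching with made-up names) the two end up interchangeable:

    // The 'named lambda' version...
    const auto isOnLayer2_l = [](const Hit& h) { return h.layer() == 2; };

    // ...and the ordinary inline function it is equivalent to:
    inline bool isOnLayer2(const Hit& h) { return h.layer() == 2; }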
-- Wilco