[section Object Code]

Let's look at some assembly. All assembly here was produced with Clang 4.0
with `-O3`. Given these definitions:

[arithmetic_perf_decls]

Here is a _yap_-based arithmetic function:

[arithmetic_perf_eval_as_yap_expr]

and the assembly it produces:

    arithmetic_perf[0x100001c00] <+0>: pushq %rbp
    arithmetic_perf[0x100001c01] <+1>: movq %rsp, %rbp
    arithmetic_perf[0x100001c04] <+4>: mulsd %xmm1, %xmm0
    arithmetic_perf[0x100001c08] <+8>: addsd %xmm2, %xmm0
    arithmetic_perf[0x100001c0c] <+12>: movapd %xmm0, %xmm1
    arithmetic_perf[0x100001c10] <+16>: mulsd %xmm1, %xmm1
    arithmetic_perf[0x100001c14] <+20>: addsd %xmm0, %xmm1
    arithmetic_perf[0x100001c18] <+24>: movapd %xmm1, %xmm0
    arithmetic_perf[0x100001c1c] <+28>: popq %rbp
    arithmetic_perf[0x100001c1d] <+29>: retq

And for the equivalent function using builtin expressions:

[arithmetic_perf_eval_as_cpp_expr]

the assembly is:

    arithmetic_perf[0x100001e10] <+0>: pushq %rbp
    arithmetic_perf[0x100001e11] <+1>: movq %rsp, %rbp
    arithmetic_perf[0x100001e14] <+4>: mulsd %xmm1, %xmm0
    arithmetic_perf[0x100001e18] <+8>: addsd %xmm2, %xmm0
    arithmetic_perf[0x100001e1c] <+12>: movapd %xmm0, %xmm1
    arithmetic_perf[0x100001e20] <+16>: mulsd %xmm1, %xmm1
    arithmetic_perf[0x100001e24] <+20>: addsd %xmm0, %xmm1
    arithmetic_perf[0x100001e28] <+24>: movapd %xmm1, %xmm0
    arithmetic_perf[0x100001e2c] <+28>: popq %rbp
    arithmetic_perf[0x100001e2d] <+29>: retq

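As a reading aid, the two listings above perform the same instruction sequence apart from their addresses. The sketch below is a hand-written function that mirrors that sequence step by step. It is inferred only from the assembly, assuming the three operands arrive as doubles in `%xmm0`, `%xmm1`, and `%xmm2`; the name `eval_by_hand` and the parameter names are hypothetical, and this is not the contents of the snippets referenced above.

```cpp
// Hypothetical reconstruction from the assembly listings; the function
// name and the parameter names a, x, y are made up for illustration.
double eval_by_hand(double a, double x, double y)
{
    double t = a * x;   // mulsd  %xmm1, %xmm0   (xmm0 = xmm0 * xmm1)
    t = t + y;          // addsd  %xmm2, %xmm0   (xmm0 = xmm0 + xmm2)
    double r = t;       // movapd %xmm0, %xmm1   (copy the partial result)
    r = r * r;          // mulsd  %xmm1, %xmm1   (square the copy)
    r = r + t;          // addsd  %xmm0, %xmm1   (add the unsquared value)
    return r;           // movapd %xmm1, %xmm0   (result goes back in xmm0)
}
```

Read this way, both functions compute `t * t + t` with `t = a * x + y`, evaluating `t` only once.
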
If we increase the number of terminals by a factor of four:

[arithmetic_perf_eval_as_yap_expr_4x]

the results are the same: in this simple case, the _yap_ and builtin
expressions result in the same object code.

However, when the number of terminals is increased by an additional factor of
2.5 (for a total of 90 terminals), the inliner can no longer do as well for
_yap_ expressions as for builtin ones.

More complex nonarithmetic code produces more mixed results. For example, here
is a function using code from the Map Assign example:

    std::map<std::string, int> make_map_with_boost_yap ()
    {
        return map_list_of
            ("<", 1)
            ("<=",2)
            (">", 3)
            (">=",4)
            ("=", 5)
            ("<>",6)
            ;
    }

By contrast, here is the Boost.Assign version of the same function:

    std::map<std::string, int> make_map_with_boost_assign ()
    {
        return boost::assign::map_list_of
            ("<", 1)
            ("<=",2)
            (">", 3)
            (">=",4)
            ("=", 5)
            ("<>",6)
            ;
    }

Here is how you might do it "manually":

    std::map<std::string, int> make_map_manually ()
    {
        std::map<std::string, int> retval;
        retval.emplace("<", 1);
        retval.emplace("<=",2);
        retval.emplace(">", 3);
        retval.emplace(">=",4);
        retval.emplace("=", 5);
        retval.emplace("<>",6);
        return retval;
    }

Finally, here is the same map created from an initializer list:

    std::map<std::string, int> make_map_inializer_list ()
    {
        std::map<std::string, int> retval = {
            {"<", 1},
            {"<=",2},
            {">", 3},
            {">=",4},
            {"=", 5},
            {"<>",6}
        };
        return retval;
    }

All of these produce roughly the same number of assembly instructions.
Benchmarking these four functions with Google Benchmark yields these results:

[table Runtimes of Different Map Constructions
    [[Function] [Time (ns)]]

    [[make_map_with_boost_yap()] [1285]]
    [[make_map_with_boost_assign()] [1459]]
    [[make_map_manually()] [985]]
    [[make_map_inializer_list()] [954]]
]

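The _yap_-based implementation finishes in the middle of the pack: it takes
about 12% less time than the Boost.Assign version, and roughly 30-35% more
time than the manual and initializer-list versions.
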
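For readers who want a rough comparison of their own without setting up Google Benchmark, the following is a minimal sketch that times two of the map constructions with `std::chrono`. It is not the harness that produced the table above, and it lacks the warm-up and statistical handling a benchmarking library provides, so its numbers are only indicative. The helper `average_ns` and the function names used here are hypothetical; only the two standard-library variants are included, since the _yap_ and Boost.Assign versions need their respective libraries.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// Same contents as the "manual" construction shown in the text.
std::map<std::string, int> build_map_manually()
{
    std::map<std::string, int> retval;
    retval.emplace("<", 1);
    retval.emplace("<=", 2);
    retval.emplace(">", 3);
    retval.emplace(">=", 4);
    retval.emplace("=", 5);
    retval.emplace("<>", 6);
    return retval;
}

// Same contents as the initializer-list construction shown in the text.
std::map<std::string, int> build_map_from_initializer_list()
{
    return {{"<", 1}, {"<=", 2}, {">", 3}, {">=", 4}, {"=", 5}, {"<>", 6}};
}

// Hypothetical helper: average wall-clock nanoseconds per call of f,
// measured over `iterations` calls.
template <typename F>
double average_ns(F f, int iterations)
{
    using clock = std::chrono::steady_clock;
    std::size_t sink = 0;  // use each result so the calls are not dead code
    auto const start = clock::now();
    for (int i = 0; i < iterations; ++i)
        sink += f().size();
    auto const stop = clock::now();
    volatile std::size_t keep = sink;  // make the accumulated value observable
    (void)keep;
    std::chrono::duration<double, std::nano> const elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

A call such as `average_ns(build_map_manually, 100000)` returns the mean time per construction; absolute values depend heavily on the compiler, the optimization level, and the standard library in use.
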
In general, the expression trees produced by _yap_ get evaluated down to
something close to the hand-written equivalent. There is an abstraction
penalty, but it is small for reasonably sized expressions.

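To illustrate why an expression tree can collapse to the hand-written equivalent, here is a minimal hand-rolled expression-template sketch. It does not use _yap_ and does not reflect how _yap_ is implemented; the `sketch` namespace and every type in it are hypothetical. The expression is the one suggested by the assembly listings earlier in this section, which is an assumption rather than the contents of the referenced snippets.

```cpp
namespace sketch {

// Leaf node: holds one value.
struct terminal
{
    double value;
    constexpr double eval() const { return value; }
};

// Interior node for addition; the operand types are encoded in the node type.
template <typename L, typename R>
struct plus_expr
{
    L lhs;
    R rhs;
    constexpr double eval() const { return lhs.eval() + rhs.eval(); }
};

// Interior node for multiplication.
template <typename L, typename R>
struct times_expr
{
    L lhs;
    R rhs;
    constexpr double eval() const { return lhs.eval() * rhs.eval(); }
};

// The operators build tree nodes instead of computing a value.
template <typename L, typename R>
constexpr plus_expr<L, R> operator+(L lhs, R rhs)
{
    return {lhs, rhs};
}

template <typename L, typename R>
constexpr times_expr<L, R> operator*(L lhs, R rhs)
{
    return {lhs, rhs};
}

// Builds the tree for (a*x + y) * (a*x + y) + (a*x + y), then evaluates it.
inline double eval_as_tree(double a, double x, double y)
{
    terminal ta{a}, tx{x}, ty{y};
    auto expr = (ta * tx + ty) * (ta * tx + ty) + (ta * tx + ty);
    return expr.eval();
}

} // namespace sketch
```

Because the whole tree shape is part of the static type of `expr`, every `eval()` call is a small, statically resolved function that an optimizer can inline. Larger trees mean deeper nesting and more calls to inline, which is one plausible reason the results diverge for very large expressions.
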
[endsect]