What does the tbb::scalable_allocator in Intel Threading Building Blocks actually do under the hood?
It can certainly be effective. I've just used it to take 25% off an app's execution time (and seen CPU utilization rise from ~200% to 350% on a 4-core system) by changing a single std::vector<T> to std::vector<T,tbb::scalable_allocator<T> >. On the other hand, in another app I've seen it double an already large memory consumption and send things to swap city.
Intel's own documentation doesn't give a lot away (e.g. a short section at the end of this FAQ). Can anyone tell me what tricks it uses before I go and dig into its code myself?
UPDATE: I've just used TBB 3.0 for the first time and seen my best speedup from scalable_allocator yet. Changing a single vector<int> to a vector<int,scalable_allocator<int> > reduced the runtime of something from 85s to 35s (Debian Lenny, Core2, with TBB 3.0 from testing).
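For anyone wanting to try the same one-line swap, here is a minimal sketch of it. The macro USE_TBB_ALLOCATOR, the aliases Alloc and Vec, and the function sum_first are hypothetical names made up for illustration; only std::vector and tbb::scalable_allocator come from the post above. Without the macro the code falls back to std::allocator, so the two builds can be timed against each other.

```cpp
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// Assumption: build with -DUSE_TBB_ALLOCATOR and link against tbbmalloc to
// get the TBB allocator; otherwise this falls back to std::allocator.
#ifdef USE_TBB_ALLOCATOR
#include <tbb/scalable_allocator.h>
template <typename T>
using Alloc = tbb::scalable_allocator<T>;
#else
template <typename T>
using Alloc = std::allocator<T>;
#endif

// Hypothetical alias: the vector type is the only thing that differs
// between the two builds.
template <typename T>
using Vec = std::vector<T, Alloc<T>>;

// Hypothetical workload: fill a vector with 0..n-1 and return the sum.
long long sum_first(std::size_t n) {
    Vec<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    return std::accumulate(v.begin(), v.end(), 0LL);
}
```

A single-threaded function like this is unlikely to show much difference on its own. The gains described above came from multi-threaded apps, so a fair comparison would call something like this from several threads at once.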
-
There is a good paper on the allocator: download.intel.com/technology/itj/2007/v11i4/5-foundations/5-Foundations_for_Scalable_Multi-core_Software.pdf
My limited experience: I overloaded the global new/delete with the tbb::scalable_allocator for my AI application. But there was little change in the time profile. I didn't compare the memory usage though.
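Roughly what that global overload can look like, as a sketch rather than the exact code I used. It assumes the C-style entry points scalable_malloc and scalable_free declared in tbb/scalable_allocator.h. The USE_TBB_ALLOCATOR macro, the raw_alloc/raw_free helpers and the g_new_calls counter are hypothetical additions for illustration; without the macro it falls back to std::malloc so it builds without TBB installed.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Assumption: build with -DUSE_TBB_ALLOCATOR and link against tbbmalloc.
#ifdef USE_TBB_ALLOCATOR
#include <tbb/scalable_allocator.h>  // declares scalable_malloc / scalable_free
static void* raw_alloc(std::size_t n) { return scalable_malloc(n); }
static void raw_free(void* p) { scalable_free(p); }
#else
static void* raw_alloc(std::size_t n) { return std::malloc(n); }
static void raw_free(void* p) { std::free(p); }
#endif

// Hypothetical instrumentation: counts calls to the replaced operator new.
std::atomic<std::size_t> g_new_calls{0};

// Replacement for the global scalar operator new. The default array and
// nothrow forms forward to this one, so they pick it up as well.
void* operator new(std::size_t size) {
    g_new_calls.fetch_add(1, std::memory_order_relaxed);
    if (size == 0) size = 1;  // operator new must return a unique pointer
    if (void* p = raw_alloc(size)) return p;
    throw std::bad_alloc();
}

// Matching replacements for the global scalar operator delete.
void operator delete(void* p) noexcept {
    if (p) raw_free(p);
}

void operator delete(void* p, std::size_t) noexcept {
    if (p) raw_free(p);
}
```

The over-aligned C++17 forms are not replaced here, so they keep their default implementation. TBB also ships a tbbmalloc_proxy library that is meant to do this kind of replacement without source changes, which may be the easier route where it is available.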
timday : Thanks! Article contains exactly the sort of information I was looking for.