This document describes how the heap profiler works and how to add heap profiling support to your allocator. If you just want to know how to use it, see Heap Profiling with MemoryInfra.
[TOC]
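The heap profiler consists of three main components: the context tracker (AllocationContextTracker), the allocation register (AllocationRegister), and the heap dump writer (ExportHeapDump()).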
These components are designed to work well together, but to be usable independently as well.
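When there is a way to get notified of all allocations and frees, this is the normal flow:

1. When an allocation occurs, call AllocationContextTracker::GetContextSnapshot() to get an AllocationContext.
2. Insert that context, together with the address and size, into an AllocationRegister by calling Insert().
3. When memory is freed, remove it from the register with Remove().
4. On memory dump, collect the allocations from the register, call ExportHeapDump(), and add the generated heap dump to the memory dump.

Note: An allocator can skip steps 2 and 3 if it is able to store the context itself, and if it is able to enumerate all allocations for step 4.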
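The four steps can be sketched with a toy allocator wrapper. This is only an illustration under stated assumptions: `ToyTrackedAllocator`, its `Record` struct, and the string `context` parameter are hypothetical stand-ins (the string plays the role of the snapshot returned by `GetContextSnapshot()`), and a plain `std::unordered_map` takes the place of the real register.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical wrapper around malloc/free that records live allocations.
class ToyTrackedAllocator {
 public:
  ToyTrackedAllocator() = default;
  ToyTrackedAllocator(const ToyTrackedAllocator&) = delete;
  ToyTrackedAllocator& operator=(const ToyTrackedAllocator&) = delete;

  // Frees anything the caller forgot, so the sketch itself does not leak.
  ~ToyTrackedAllocator() {
    for (const auto& entry : live_)
      std::free(entry.first);
  }

  // Steps 1 and 2: take a context (here passed in by the caller) and insert
  // it together with the address and size.
  void* Allocate(size_t size, const std::string& context) {
    void* address = std::malloc(size);
    if (address)
      live_[address] = Record{size, context};
    return address;
  }

  // Step 3: remove the record when the memory is freed.
  void Free(void* address) {
    live_.erase(address);
    std::free(address);
  }

  // Step 4: collect the live allocations, grouped by context.
  std::map<std::string, size_t> Dump() const {
    std::map<std::string, size_t> bytes_by_context;
    for (const auto& entry : live_)
      bytes_by_context[entry.second.context] += entry.second.size;
    return bytes_by_context;
  }

 private:
  struct Record {
    size_t size;
    std::string context;
  };
  std::unordered_map<void*, Record> live_;
};
```

After allocating 16 and 32 bytes under one context and 64 bytes under another, then freeing the 32-byte block, `Dump()` reports only the bytes that are still live for each context.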
When heap profiling is enabled (the --enable-heap-profiling flag is passed), the memory dump manager calls OnHeapProfilingEnabled() on every MemoryDumpProvider as early as possible, so allocators can start recording allocations. This should be done even when tracing has not been started, because these allocations might still be around when a heap dump happens during tracing.
The AllocationContextTracker is a thread-local object. Its main purpose is to keep track of a pseudo stack of trace events. Chrome has been instrumented with lots of TRACE_EVENT macros. These trace events push their name to a thread-local stack when they go into scope, and pop when they go out of scope, if all of the following conditions have been met:

* A trace is being recorded.
* The category of the event is enabled in the trace config.
* Heap profiling is enabled (with the --enable-heap-profiling flag).

This means that allocations that occur before tracing is started will not have backtrace information in their context.
A thread-local instance of the context tracker is initialized lazily when it is first accessed. This might be because a trace event was pushed or popped, or because GetContextSnapshot() was called when an allocation occurred.
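The pseudo stack idea can be sketched in a few lines. This is a simplified illustration, not the tracker's actual code: `PseudoStack()`, `ScopedToyEvent`, and `SnapshotToyContext()` are hypothetical names, the event names are made up, and the sketch ignores the conditions listed above (it always pushes and pops).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the thread-local pseudo stack. The function-local
// thread_local is created lazily the first time a thread calls this function.
inline std::vector<const char*>& PseudoStack() {
  thread_local std::vector<const char*> stack;
  return stack;
}

// Hypothetical stand-in for a scoped trace event: pushes its name when it
// goes into scope and pops it when it goes out of scope.
class ScopedToyEvent {
 public:
  explicit ScopedToyEvent(const char* name) { PseudoStack().push_back(name); }
  ~ScopedToyEvent() { PseudoStack().pop_back(); }
  ScopedToyEvent(const ScopedToyEvent&) = delete;
  ScopedToyEvent& operator=(const ScopedToyEvent&) = delete;
};

// Copies the current pseudo stack into a single string, outermost frame first.
inline std::string SnapshotToyContext() {
  std::string joined;
  for (const char* frame : PseudoStack()) {
    if (!joined.empty())
      joined += " > ";
    joined += frame;
  }
  return joined;
}

// Simulates an allocation that happens while two events are in scope.
inline std::string ToyAllocateInsideEvents() {
  ScopedToyEvent outer("MessageLoop::RunTask");
  ScopedToyEvent inner("Document::UpdateLayout");
  return SnapshotToyContext();
}
```

A snapshot taken outside any event is empty, which mirrors the point above that allocations made before events are being pushed carry no backtrace information.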
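AllocationContext is what is used to group and break down allocations. Currently AllocationContext has the following fields: a backtrace, filled in by the context tracker from the thread-local pseudo stack, and a type name, which may not be known at the time of allocation.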
It is possible to modify this context after insertion into the register, for instance to set the type name if it was not known at the time of allocation.
The AllocationRegister is a hash table specialized for storing (size, AllocationContext) pairs by address. It has been optimized for Chrome's typical number of unfreed allocations, and it is backed by mmap memory directly so there are no reentrancy issues when using it to record malloc allocations.
The allocation register is threading-agnostic. Access must be synchronised properly.
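Because the register does no locking of its own, the caller has to wrap every access, including later edits to a stored context. A minimal sketch of that pattern follows, assuming a hypothetical `LockedToyRegister` built on `std::unordered_map` and `std::mutex`. It does not reproduce the mmap backing of the real register, and the `"[unknown]"` placeholder for the type name is an assumption made for this example.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical record: size plus a simplified context.
struct ToyAllocation {
  size_t size = 0;
  std::string backtrace;
  std::string type_name = "[unknown]";  // Placeholder until the type is known.
};

// Hypothetical register wrapper in which every operation takes the same lock.
class LockedToyRegister {
 public:
  void Insert(const void* address, size_t size, const std::string& backtrace) {
    ToyAllocation record;
    record.size = size;
    record.backtrace = backtrace;
    std::lock_guard<std::mutex> lock(lock_);
    allocations_[address] = record;
  }

  void Remove(const void* address) {
    std::lock_guard<std::mutex> lock(lock_);
    allocations_.erase(address);
  }

  // Updates the context after insertion. Returns false if the address is not
  // (or no longer) in the register.
  bool SetTypeName(const void* address, const std::string& type_name) {
    std::lock_guard<std::mutex> lock(lock_);
    auto it = allocations_.find(address);
    if (it == allocations_.end())
      return false;
    it->second.type_name = type_name;
    return true;
  }

  // Copies the record out so no reference escapes the lock.
  bool Get(const void* address, ToyAllocation* out) const {
    std::lock_guard<std::mutex> lock(lock_);
    auto it = allocations_.find(address);
    if (it == allocations_.end())
      return false;
    *out = it->second;
    return true;
  }

 private:
  mutable std::mutex lock_;
  std::unordered_map<const void*, ToyAllocation> allocations_;
};
```

Returning copies from `Get()` rather than references keeps callers from touching a record after the lock has been released.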
Dumping every single allocation in the allocation register straight into the
trace log is not an option due to the sheer volume (~300k unfreed allocations).
The role of the ExportHeapDump() function is to group allocations, striking a balance between trace log size and detail.
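To make the size-versus-detail trade-off concrete, here is one possible grouping strategy. It is not a description of how `ExportHeapDump()` works: the `FoldSmallContexts` helper, the `min_bytes` threshold, and the `"<other>"` bucket name are all assumptions invented for this sketch.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Keeps every context that accounts for at least |min_bytes| and folds all
// smaller contexts into a single "<other>" bucket, so the output stays small
// while the total byte count is preserved.
inline std::map<std::string, size_t> FoldSmallContexts(
    const std::map<std::string, size_t>& bytes_by_context,
    size_t min_bytes) {
  std::map<std::string, size_t> folded;
  size_t other_bytes = 0;
  for (const auto& entry : bytes_by_context) {
    if (entry.second >= min_bytes)
      folded[entry.first] = entry.second;
    else
      other_bytes += entry.second;
  }
  if (other_bytes > 0)
    folded["<other>"] = other_bytes;
  return folded;
}
```

Raising `min_bytes` shrinks the output at the cost of detail, and lowering it does the opposite.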
See the Heap Dump Format document for more details about the structure of the heap dump in the trace log.
Below is an example of adding heap profiling support to an allocator that has an existing memory dump provider.
    class FooDumpProvider : public MemoryDumpProvider {
     public:
      static FooDumpProvider* GetInstance();

      // Records a new allocation together with the current context.
      void InsertAllocation(void* address, size_t size) {
        AllocationContext context = AllocationContextTracker::GetContextSnapshot();
        AutoLock lock(allocation_register_lock_);
        allocation_register_->Insert(address, size, context);
      }

      // Forgets an allocation once its memory has been freed.
      void RemoveAllocation(void* address) {
        AutoLock lock(allocation_register_lock_);
        allocation_register_->Remove(address);
      }

      // Will be called as early as possible by the memory dump manager.
      void OnHeapProfilingEnabled(bool enabled) override {
        AutoLock lock(allocation_register_lock_);
        allocation_register_.reset(new AllocationRegister());

        // At this point, make sure that from now on, for every allocation and
        // free, |FooDumpProvider::GetInstance()->InsertAllocation()| and
        // |RemoveAllocation| are called.
      }

      bool OnMemoryDump(const MemoryDumpArgs& args,
                        ProcessMemoryDump* pmd) override {
        // Do regular dumping here.

        // Dump the heap only for detailed dumps.
        if (args.level_of_detail == MemoryDumpLevelOfDetail::DETAILED) {
          TraceEventMemoryOverhead overhead;
          hash_map<AllocationContext, size_t> bytes_by_context;

          {
            AutoLock lock(allocation_register_lock_);
            if (allocation_register_) {
              // Group allocations in the register into |bytes_by_context|, but do
              // no additional processing inside the lock.
              for (const auto& alloc_size : *allocation_register_)
                bytes_by_context[alloc_size.context] += alloc_size.size;

              allocation_register_->EstimateTraceMemoryOverhead(&overhead);
            }
          }

          if (!bytes_by_context.empty()) {
            scoped_refptr<TracedValue> heap_dump = ExportHeapDump(
                bytes_by_context,
                pmd->session_state()->stack_frame_deduplicator(),
                pmd->session_state()->type_name_deduplicator());
            pmd->AddHeapDump("foo_allocator", heap_dump);
            overhead.DumpInto("tracing/heap_profiler", pmd);
          }
        }

        return true;
      }

     private:
      // Kept as pointer because |AllocationRegister| allocates a lot of virtual
      // address space when constructed, so only construct it when heap profiling is
      // enabled.
      scoped_ptr<AllocationRegister> allocation_register_;
      Lock allocation_register_lock_;
    };
Note: The implementation for malloc is more complicated because it needs to deal with reentrancy.
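The text does not say how the malloc implementation handles reentrancy. One common technique, shown here purely as an assumption, is a thread-local guard flag that makes the hook ignore allocations triggered by its own bookkeeping. `ToyHookStats`, `g_inside_hook`, and `ToyOnAllocation` are hypothetical names, and the nested call only simulates bookkeeping that allocates.

```cpp
#include <cstddef>

// Hypothetical counters that stand in for the real bookkeeping.
struct ToyHookStats {
  int recorded = 0;
  int skipped = 0;
  size_t bytes = 0;
};

// Set while the current thread is inside the hook.
thread_local bool g_inside_hook = false;

// Hypothetical allocation hook. If |bookkeeping_allocates| is true, the hook
// simulates bookkeeping that itself allocates, which re-enters the hook.
inline void ToyOnAllocation(size_t size,
                            bool bookkeeping_allocates,
                            ToyHookStats* stats) {
  if (g_inside_hook) {
    // Re-entrant call: the allocation came from the hook's own bookkeeping,
    // so it is ignored instead of being recorded (and recursing again).
    ++stats->skipped;
    return;
  }

  g_inside_hook = true;
  ++stats->recorded;
  stats->bytes += size;
  if (bookkeeping_allocates)
    ToyOnAllocation(64, false, stats);  // Simulated nested allocation.
  g_inside_hook = false;
}
```

The guard is per thread, so a nested call on one thread does not cause allocations on other threads to be skipped.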