|
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149 |
- This is a description of the profile.proto format.
-
- # Overview
-
- Profile.proto is a data representation for profile data. It is independent of
- the type of data being collected and the sampling process used to collect that
- data. On disk, it is represented as a gzip-compressed protocol buffer, described
- at src/proto/profile.proto
-
- A profile in this context refers to a collection of samples, each one
- representing measurements performed at a certain point in the life of a job. A
- sample associates a set of measurement values with a list of locations, commonly
- representing the program call stack when the sample was taken.
-
- Tools such as pprof analyze these samples and display this information in
- multiple forms, such as identifying hottest locations, building graphical call
- graphs or trees, etc.
-
- # General structure of a profile
-
- A profile is represented on a Profile message, which contain the following
- fields:
-
- * *sample*: A profile sample, with the values measured and the associated call
- stack as a list of location ids. Samples with identical call stacks can be
- merged by adding their respective values, element by element.
- * *location*: A unique place in the program, commonly mapped to a single
- instruction address. It has a unique nonzero id, to be referenced from the
- samples. It contains source information in the form of lines, and a mapping id
- that points to a binary.
- * *function*: A program function as defined in the program source. It has a
- unique nonzero id, referenced from the location lines. It contains a
- human-readable name for the function (eg a C++ demangled name), a system name
- (eg a C++ mangled name), the name of the corresponding source file, and other
- function attributes.
- * *mapping*: A binary that is part of the program during the profile
- collection. It has a unique nonzero id, referenced from the locations. It
- includes details on how the binary was mapped during program execution. By
- convention the main program binary is the first mapping, followed by any
- shared libraries.
- * *string_table*: All strings in the profile are represented as indices into
- this repeating field. The first string is empty, so index == 0 always
- represents the empty string.
-
- # Measurement values
-
- Measurement values are represented as 64-bit integers. The profile contains an
- explicit description of each value represented, using a ValueType message, with
- two fields:
-
- * *Type*: A human-readable description of the type semantics. For example “cpu”
- to represent CPU time, “wall” or “time” for wallclock time, or “memory” for
- bytes allocated.
- * *Unit*: A human-readable name of the unit represented by the 64-bit integer
- values. For example, it could be “nanoseconds” or “milliseconds” for a time
- value, or “bytes” or “megabytes” for a memory size. If this is just
- representing a number of events, the recommended unit name is “count”.
-
- A profile can represent multiple measurements per sample, but all samples must
- have the same number and type of measurements. The actual values are stored in
- the Sample.value fields, each one described by the corresponding
- Profile.sample_type field.
-
- Some profiles have a uniform period that describe the granularity of the data
- collection. For example, a CPU profile may have a period of 100ms, or a memory
- allocation profile may have a period of 512kb. Profiles can optionally describe
- such a value on the Profile.period and Profile.period_type fields. The profile
- period is meant for human consumption and does not affect the interpretation of
- the profiling data.
-
- By convention, the first value on all profiles is the number of samples
- collected at this call stack, with unit “count”. Because the profile does not
- describe the sampling process beyond the optional period, it must include
- unsampled values for all measurements. For example, a CPU profile could have
- value[0] == samples, and value[1] == time in milliseconds.
-
- ## Locations, functions and mappings
-
- Each sample lists the id of each location where the sample was collected, in
- bottom-up order. Each location has an explicit unique nonzero integer id,
- independent of its position in the profile, and holds additional information to
- identify the corresponding source.
-
- The profile source is expected to perform any adjustment required to the
- locations in order to point to the calls in the stack. For example, if the
- profile source extracts the call stack by walking back over the program stack,
- it must adjust the instruction addresses to point to the actual call
- instruction, instead of the instruction that each call will return to.
-
- Sources usually generate profiles that fall into these two categories:
-
- * *Unsymbolized profiles*: These only contain instruction addresses, and are to
- be symbolized by a separate tool. It is critical for each location to point to
- a valid mapping, which will provide the information required for
- symbolization. These are used for profiles of compiled languages, such as C++
- and Go.
-
- * *Symbolized profiles*: These contain all the symbol information available for
- the profile. Mappings and instruction addresses are optional for symbolized
- locations. These are used for profiles of interpreted or jitted languages,
- such as Java or Python. Also, the profile format allows the generation of
- mixed profiles, with symbolized and unsymbolized locations.
-
- The symbol information is represented in the repeating lines field of the
- Location message. A location has multiple lines if it reflects multiple program
- sources, for example if representing inlined call stacks. Lines reference
- functions by their unique nonzero id, and the source line number within the
- source file listed by the function. A function contains the source attributes
- for a function, including its name, source file, etc. Functions include both a
- user and a system form of the name, for example to include C++ demangled and
- mangled names. For profiles where only a single name exists, both should be set
- to the same string.
-
- Mappings are also referenced from locations by their unique nonzero id, and
- include all information needed to symbolize addresses within the mapping. It
- includes similar information to the Linux /proc/self/maps file. Locations
- associated to a mapping should have addresses that land between the mapping
- start and limit. Also, if available, mappings should include a build id to
- uniquely identify the version of the binary being used.
-
- ## Labels
-
- Samples optionally contain labels, which are annotations to discriminate samples
- with identical locations. For example, a label can be used on a malloc profile
- to indicate allocation size, so two samples on the same call stack with sizes
- 2MB and 4MB do not get merged into a single sample with two allocations and a
- size of 6MB.
-
- Labels can be string-based or numeric. They are represented by the Label
- message, with a key identifying the label and either a string or numeric
- value. For numeric labels, the measurement unit can be specified in the profile.
- If no unit is specified and the key is "request" or "alignment",
- then the units are assumed to be "bytes". Otherwise when no unit is specified
- the key will be used as the measurement unit of the numeric value. All tags with
- the same key should have the same unit.
-
- ## Keep and drop expressions
-
- Some profile sources may have knowledge of locations that are uninteresting or
- irrelevant. However, if symbolization is needed in order to identify these
- locations, the profile source may not be able to remove them when the profile is
- generated. The profile format provides a mechanism to identify these frames by
- name, through regular expressions.
-
- These expressions must match the function name in its entirety. Frames that
- match Profile.drop\_frames will be dropped from the profile, along with any
- frames below it. Frames that match Profile.keep\_frames will be kept, even if
- they match drop\_frames.
-
|