This is a description of the profile.proto format.

# Overview

Profile.proto is a data representation for profile data. It is independent of
the type of data being collected and the sampling process used to collect that
data. On disk, it is represented as a gzip-compressed protocol buffer, described
at src/proto/profile.proto.
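
Because the on-disk form is gzip wrapped around a serialized message, a reader can decompress first and then hand the bytes to a protobuf decoder. The sketch below uses only Python's standard `gzip` module. The helper name `read_profile_bytes` is hypothetical, and accepting already-uncompressed input is an assumption added for convenience. Decoding the result into a Profile message (for example with classes generated from src/proto/profile.proto) is not shown.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_profile_bytes(data: bytes) -> bytes:
    """Return the serialized Profile message contained in `data`.

    Hypothetical helper: profiles on disk are gzip-compressed, but this
    sketch also passes uncompressed bytes through unchanged. The result
    still has to be parsed by a protobuf decoder.
    """
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


# Usage: round-trip placeholder bytes (not a meaningful profile) through gzip.
payload = b"\x0a\x02\x08\x01"
assert read_profile_bytes(gzip.compress(payload)) == payload
```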

A profile in this context refers to a collection of samples, each one
representing measurements performed at a certain point in the life of a job. A
sample associates a set of measurement values with a list of locations, commonly
representing the program call stack when the sample was taken.

Tools such as pprof analyze these samples and display this information in
multiple forms, such as identifying the hottest locations or building graphical
call graphs or trees.

# General structure of a profile

A profile is represented as a Profile message, which contains the following
fields:

* *sample*: A profile sample, with the values measured and the associated call
  stack as a list of location ids. Samples with identical call stacks can be
  merged by adding their respective values, element by element.
* *location*: A unique place in the program, commonly mapped to a single
  instruction address. It has a unique nonzero id, to be referenced from the
  samples. It contains source information in the form of lines, and a mapping id
  that points to a binary.
* *function*: A program function as defined in the program source. It has a
  unique nonzero id, referenced from the location lines. It contains a
  human-readable name for the function (e.g. a C++ demangled name), a system name
  (e.g. a C++ mangled name), the name of the corresponding source file, and other
  function attributes.
* *mapping*: A binary that is part of the program during the profile
  collection. It has a unique nonzero id, referenced from the locations. It
  includes details on how the binary was mapped during program execution. By
  convention the main program binary is the first mapping, followed by any
  shared libraries.
* *string_table*: All strings in the profile are represented as indices into
  this repeated field. The first string is empty, so index 0 always
  represents the empty string.
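
Two of the rules above, the string table with its reserved empty string at index 0 and the element-by-element merging of samples with identical call stacks, can be sketched in a few lines. This is an illustrative model only: `StringTable` and `merge_samples` are hypothetical names, samples are plain `(location_ids, values)` pairs rather than protobuf messages, and labels (which can also keep samples apart) are ignored here.

```python
from typing import Dict, List, Tuple


class StringTable:
    """Minimal sketch of Profile.string_table interning (hypothetical helper)."""

    def __init__(self) -> None:
        self.strings: List[str] = [""]  # index 0 is always the empty string
        self._index: Dict[str, int] = {"": 0}

    def intern(self, s: str) -> int:
        """Return the index of `s`, appending it on first use."""
        if s not in self._index:
            self._index[s] = len(self.strings)
            self.strings.append(s)
        return self._index[s]


def merge_samples(
    samples: List[Tuple[List[int], List[int]]]
) -> Dict[Tuple[int, ...], List[int]]:
    """Merge (location_ids, values) pairs that share an identical call stack.

    Values are added element by element; every sample is assumed to carry
    the same number of values.
    """
    merged: Dict[Tuple[int, ...], List[int]] = {}
    for location_ids, values in samples:
        key = tuple(location_ids)
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], values)]
        else:
            merged[key] = list(values)
    return merged
```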

# Measurement values

Measurement values are represented as 64-bit integers. The profile contains an
explicit description of each value represented, using a ValueType message, with
two fields:

* *Type*: A human-readable description of the type semantics. For example, “cpu”
  to represent CPU time, “wall” or “time” for wall-clock time, or “memory” for
  bytes allocated.
* *Unit*: A human-readable name of the unit represented by the 64-bit integer
  values. For example, it could be “nanoseconds” or “milliseconds” for a time
  value, or “bytes” or “megabytes” for a memory size. If this is just
  representing a number of events, the recommended unit name is “count”.

A profile can represent multiple measurements per sample, but all samples must
have the same number and type of measurements. The actual values are stored in
the Sample.value fields, each one described by the corresponding
Profile.sample_type field.
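
The requirement that every sample carry one value per sample_type entry is easy to check mechanically. In the sketch below the dataclasses are simplified stand-ins for the protobuf messages: the real ValueType stores `type` and `unit` as string table indices, while here they are plain strings for readability. The function name `check_sample_values` is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValueType:
    type: str  # e.g. "cpu" (a string table index in the real message)
    unit: str  # e.g. "nanoseconds"


@dataclass
class Sample:
    location_id: List[int]
    value: List[int]


def check_sample_values(sample_type: List[ValueType], samples: List[Sample]) -> None:
    """Raise ValueError if any sample's values don't line up with sample_type."""
    for i, s in enumerate(samples):
        if len(s.value) != len(sample_type):
            raise ValueError(
                f"sample {i} has {len(s.value)} values, expected {len(sample_type)}"
            )
```

Because value[i] is interpreted through sample_type[i], a length mismatch is not just untidy: it would leave some values with no described meaning.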

Some profiles have a uniform period that describes the granularity of the data
collection. For example, a CPU profile may have a period of 100ms, or a memory
allocation profile may have a period of 512KB. Profiles can optionally describe
such a value in the Profile.period and Profile.period_type fields. The profile
period is meant for human consumption and does not affect the interpretation of
the profiling data.

By convention, the first value on all profiles is the number of samples
collected at this call stack, with unit “count”. Because the profile does not
describe the sampling process beyond the optional period, it must include
unsampled values for all measurements, that is, values already adjusted to
account for sampling. For example, a CPU profile could have value[0] == samples,
and value[1] == time in milliseconds.
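
As a worked example of the CPU case, assume a profiler that takes one sample every 10 ms and treats each sample as accounting for one full period of CPU time. The profile source would then write the estimated time directly, rather than leaving readers to rescale sample counts. The 10 ms period, the per-sample accounting, and the helper name `cpu_sample_values` are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sampling setup: one CPU sample every 10 ms (100 Hz).
PERIOD_MS = 10


def cpu_sample_values(
    hits: Dict[Tuple[int, ...], int], period_ms: int = PERIOD_MS
) -> Dict[Tuple[int, ...], List[int]]:
    """Turn raw hit counts per call stack into [samples, time_ms] values.

    value[0] is the number of samples (unit "count"); value[1] is the
    estimated CPU time in milliseconds, assuming each sample stands for one
    full sampling period. The scaling happens when the profile is written.
    """
    return {stack: [count, count * period_ms] for stack, count in hits.items()}


# A stack hit 7 times at 10 ms per sample is recorded as [7, 70].
values = cpu_sample_values({(3, 2, 1): 7})
```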

## Locations, functions and mappings

Each sample lists the id of each location where the sample was collected, in
bottom-up order, starting with the leaf (the innermost frame). Each location has
an explicit unique nonzero integer id, independent of its position in the
profile, and holds additional information to identify the corresponding source.

The profile source is expected to make any adjustments the locations require so
that they point to the calls in the stack. For example, if the profile source
extracts the call stack by walking back over the program stack, it must adjust
the instruction addresses to point to the actual call instruction, instead of
the instruction that each call will return to.
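
A common way to make this adjustment, assumed here rather than required by the format, is to subtract one from every return address. That lands inside the call instruction without having to decode it. The leaf address, where the sample was actually taken, is not a return address and is left alone. The function name `call_site_addresses` is hypothetical.

```python
from typing import List


def call_site_addresses(stack_pcs: List[int]) -> List[int]:
    """Adjust a leaf-first list of program counters to point at call sites.

    stack_pcs[0] is where the sample was taken and is kept as is. Every
    other entry is a return address (the instruction after a call), so
    subtracting one moves it back into the call instruction itself.
    """
    if not stack_pcs:
        return []
    return [stack_pcs[0]] + [pc - 1 for pc in stack_pcs[1:]]
```

The adjusted addresses are not instruction boundaries, but they resolve to the same function and source line as the call instruction, which is what symbolization needs.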

Sources usually generate profiles that fall into one of two categories:

* *Unsymbolized profiles*: These only contain instruction addresses, and are to
  be symbolized by a separate tool. It is critical for each location to point to
  a valid mapping, which will provide the information required for
  symbolization. These are used for profiles of compiled languages, such as C++
  and Go.
* *Symbolized profiles*: These contain all the symbol information available for
  the profile. Mappings and instruction addresses are optional for symbolized
  locations. These are used for profiles of interpreted or jitted languages,
  such as Java or Python.

The profile format also allows the generation of mixed profiles, with both
symbolized and unsymbolized locations.

The symbol information is represented in the repeated lines field of the
Location message. A location has multiple lines if it reflects multiple program
sources, for example if representing inlined call stacks. Each line references
a function by its unique nonzero id and records the source line number within
the source file listed by that function. A function contains the source
attributes for a function, including its name, source file, etc. Functions
include both a user and a system form of the name, for example to include C++
demangled and mangled names. For profiles where only a single name exists, both
should be set to the same string.
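
The following sketch expands a sample's location ids into readable frames. When a location has several lines, each one becomes its own frame. It assumes the ordering described in the comments of profile.proto, where the last line of a location is the caller into which the preceding entries were inlined, so iterating in order keeps the whole stack leaf-first. The classes are simplified stand-ins, and the plain `function_names` dictionary replaces the full Function message (which also carries a system name and source file). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Line:
    function_id: int
    line: int


@dataclass
class Location:
    id: int
    lines: List[Line]  # innermost inlined function first, outermost caller last


def expand_stack(
    location_ids: List[int],
    locations: Dict[int, Location],
    function_names: Dict[int, str],
) -> List[str]:
    """Expand leaf-first location ids into leaf-first "name:line" frames.

    A location with several lines contributes one frame per line, so
    inlined calls show up as separate frames.
    """
    frames: List[str] = []
    for loc_id in location_ids:
        for ln in locations[loc_id].lines:
            frames.append(f"{function_names[ln.function_id]}:{ln.line}")
    return frames
```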

Mappings are also referenced from locations by their unique nonzero id, and
include all information needed to symbolize addresses within the mapping. This
is similar to the information in the Linux /proc/self/maps file. Locations
associated with a mapping should have addresses that fall between the mapping
start and limit. Also, if available, mappings should include a build id to
uniquely identify the version of the binary being used.
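
A symbolizer's first step is usually to find which mapping an address belongs to. This sketch uses field names from the Mapping message (`memory_start`, `memory_limit`, `filename`, `build_id`) but is a simplified model. In the real message `filename` and `build_id` are string table indices. The sketch also assumes that `memory_limit` is exclusive, and the function name `find_mapping` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mapping:
    id: int
    memory_start: int
    memory_limit: int  # assumed exclusive in this sketch
    filename: str
    build_id: str = ""


def find_mapping(address: int, mappings: List[Mapping]) -> Optional[Mapping]:
    """Return the mapping whose [memory_start, memory_limit) range holds address."""
    for m in mappings:
        if m.memory_start <= address < m.memory_limit:
            return m
    return None
```

A linear scan keeps the example short. A profile with many mappings could sort them by `memory_start` and use binary search instead.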

## Labels

Samples optionally contain labels, which are annotations to discriminate samples
with identical locations. For example, a label can be used on a malloc profile
to indicate allocation size, so two samples on the same call stack with sizes
2MB and 4MB do not get merged into a single sample with two allocations and a
size of 6MB.
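
The malloc example amounts to making the labels part of the merge key. In the sketch below, samples are hypothetical `(location_ids, values, labels)` tuples with numeric labels in a plain dictionary, and the label key `"bytes"` is chosen only for illustration. Sorting the label items makes the key independent of dictionary order.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory sample: (location_ids, values, numeric labels).
SampleT = Tuple[List[int], List[int], Dict[str, int]]


def merge_with_labels(samples: List[SampleT]) -> Dict[tuple, List[int]]:
    """Merge samples only when both the call stack and the labels match."""
    merged: Dict[tuple, List[int]] = {}
    for location_ids, values, labels in samples:
        key = (tuple(location_ids), tuple(sorted(labels.items())))
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], values)]
        else:
            merged[key] = list(values)
    return merged
```

With this key, a 2MB and a 4MB allocation on the same stack stay separate, while two 2MB allocations on that stack still merge into one sample.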

Labels can be string-based or numeric. They are represented by the Label
message, with a key identifying the label and either a string or numeric
value. For numeric labels, the measurement unit can be specified in the profile.
If no unit is specified and the key is "request" or "alignment", the unit is
assumed to be "bytes". Otherwise, when no unit is specified, the key is used as
the measurement unit of the numeric value. All labels with the same key should
have the same unit.
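
The unit rules in this paragraph reduce to a three-step fallback. In this sketch an unset unit is represented by `None` or an empty string. That is an assumption for readability: in the real message the unit is a string table index, and index 0 means the empty string. The function name `label_unit` is hypothetical.

```python
from typing import Optional


def label_unit(key: str, unit: Optional[str]) -> str:
    """Resolve the measurement unit of a numeric label.

    1. An explicitly specified unit wins.
    2. Otherwise the keys "request" and "alignment" default to "bytes".
    3. Otherwise the key itself is used as the unit.
    """
    if unit:
        return unit
    if key in ("request", "alignment"):
        return "bytes"
    return key
```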

## Keep and drop expressions

Some profile sources may have knowledge of locations that are uninteresting or
irrelevant. However, if symbolization is needed in order to identify these
locations, the profile source may not be able to remove them when the profile is
generated. The profile format provides a mechanism to identify these frames by
name, through regular expressions.

These expressions must match the function name in its entirety. Frames that
match Profile.drop\_frames will be dropped from the profile, along with any
frames below them (closer to the leaf). Frames that match Profile.keep\_frames
will be kept, even if they match drop\_frames.
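
Below is a simplified sketch of how a consumer might apply these expressions to one leaf-first stack of function names. It uses `re.fullmatch` so that a pattern must match the whole name, walks from the root toward the leaf, and cuts the stack at the first frame that matches drop\_frames but not keep\_frames. The function name `prune_stack` is hypothetical. pprof's own pruning logic may handle additional cases, such as inlined frames within a location, that this sketch does not.

```python
import re
from typing import List, Optional


def prune_stack(
    frames: List[str], drop_frames: str, keep_frames: Optional[str] = None
) -> List[str]:
    """Drop the first matching frame (searching from the root) and all frames below it.

    `frames` is leaf-first, as in Sample.location_id. A frame is dropped if
    its name fully matches drop_frames and does not fully match keep_frames.
    """
    drop = re.compile(drop_frames)
    keep = re.compile(keep_frames) if keep_frames else None
    # Walk from the root (last entry) toward the leaf (first entry).
    for i in range(len(frames) - 1, -1, -1):
        name = frames[i]
        if drop.fullmatch(name) and not (keep and keep.fullmatch(name)):
            return frames[i + 1:]  # keep only the frames above the match
    return list(frames)
```

For example, with the leaf-first stack `["malloc_internal", "malloc", "alloc_buffer", "main"]` and drop\_frames set to `malloc.*`, the result is `["alloc_buffer", "main"]`.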