author | Bertrand SIMONNET <bsimonnet@google.com> | 2015-12-21 11:50:58 -0800 |
---|---|---|
committer | Bertrand SIMONNET <bsimonnet@google.com> | 2016-01-06 17:19:30 -0800 |
commit | 5451fc175104bba335308ffe75b22a7724ef1434 (patch) | |
tree | 26c38dfa556d28b6c8f4166d2bd7dd3b337cf969 | |
parent | b4f534d77926867f2cded1f1c0d82381b32cceb8 (diff) | |
download | metricsd-5451fc175104bba335308ffe75b22a7724ef1434.tar.gz |
Fix documentation.
This fixes the documentation for UploadService and the README file:
* stop mentioning Chrome OS as we forked the code.
* update the histogram declaration workflow. It is not tied to Chrome's
histograms.xml file anymore.
* update the architecture explanation of metricsd. We split
metrics_daemon into two daemons and are now using binder to log
metrics.
* convert README to markdown to make it prettier when viewed in gitiles.
Bug: 26314417
Change-Id: I1e492f1211c1784e65dd4d3e473bb9aacefc3b5d
-rw-r--r-- | README | 150 |
-rw-r--r-- | README.md | 124 |
-rw-r--r-- | uploader/upload_service.h | 43 |
3 files changed, 147 insertions, 170 deletions
@@ -1,150 +0,0 @@
-Copyright (C) 2015 The Android Open Source Project
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-
-================================================================================
-
-The Chrome OS "metrics" package contains utilities for client-side user metric
-collection.
-When Chrome is installed, Chrome will take care of aggregating and uploading the
-metrics to the UMA server.
-When Chrome is not installed (embedded build) and the metrics_uploader USE flag
-is set, metrics_daemon will aggregate and upload the metrics itself.
-
-
-================================================================================
-The Metrics Library: libmetrics
-================================================================================
-
-libmetrics is a small library that implements the basic C and C++ API for
-metrics collection. All metrics collection is funneled through this library. The
-easiest and recommended way for a client-side module to collect user metrics is
-to link libmetrics and use its APIs to send metrics to Chrome for transport to
-UMA. In order to use the library in a module, you need to do the following:
-
-- Add a dependence (DEPEND and RDEPEND) on chromeos-base/metrics to the module's
-  ebuild.
-
-- Link the module with libmetrics (for example, by passing -lmetrics to the
-  module's link command). Both libmetrics.so and libmetrics.a are built and
-  installed under $SYSROOT/usr/lib/. Note that by default -lmetrics will link
-  against libmetrics.so, which is preferred.
-
-- To access the metrics library API in the module, include the
-  <metrics/metrics_library.h> header file. The file is installed in
-  $SYSROOT/usr/include/ when the metrics library is built and installed.
-
-- The API is documented in metrics_library.h under src/platform/metrics/. Before
-  using the API methods, a MetricsLibrary object needs to be constructed and
-  initialized through its Init method.
-
-  For more information on the C API see c_metrics_library.h.
-
-- Samples are sent to Chrome only if the "/home/chronos/Consent To Send Stats"
-  file exists or the metrics are declared enabled in the policy file (see the
-  AreMetricsEnabled API method).
-
-- On the target platform, shortly after the sample is sent, it should be visible
-  in Chrome through "about:histograms".
-
-
-================================================================================
-Histogram Naming Convention
-================================================================================
-
-Use TrackerArea.MetricName. For example:
-
-Platform.DailyUseTime
-Network.TimeToDrop
-
-
-================================================================================
-Server Side
-================================================================================
-
-If the histogram data is visible in about:histograms, it will be sent by an
-official Chrome build to UMA, assuming the user has opted into metrics
-collection. To make the histogram visible on "chromedashboard", the histogram
-description XML file needs to be updated (steps 2 and 3 after following the
-"Details on how to add your own histograms" link under the Histograms tab).
-Include the string "Chrome OS" in the histogram description so that it's easier
-to distinguish Chrome OS specific metrics from general Chrome histograms.
-
-The UMA server logs and keeps the collected field data even if the metric's name
-is not added to the histogram XML. However, the dashboard histogram for that
-metric will show field data as of the histogram XML update date; it will not
-include data for older dates. If past data needs to be displayed, manual
-server-side intervention is required. In other words, one should assume that
-field data collection starts only after the histogram XML has been updated.
-
-
-================================================================================
-The Metrics Client: metrics_client
-================================================================================
-
-metrics_client is a simple shell command-line utility for sending histogram
-samples and user actions. It's installed under /usr/bin on the target platform
-and uses libmetrics to send the data to Chrome. The utility is useful for
-generating metrics from shell scripts.
-
-For usage information and command-line options, run "metrics_client" on the
-target platform or look for "Usage:" in metrics_client.cc.
-
-
-================================================================================
-The Metrics Daemon: metrics_daemon
-================================================================================
-
-metrics_daemon is a daemon that runs in the background on the target platform
-and is intended for passive or ongoing metrics collection, or metrics collection
-requiring feedback from multiple modules. For example, it listens to D-Bus
-signals related to the user session and screen saver states to determine if the
-user is actively using the device or not and generates the corresponding
-data. The metrics daemon uses libmetrics to send the data to Chrome.
-
-The recommended way to generate metrics data from a module is to link and use
-libmetrics directly. However, the module could instead send signals to or
-communicate in some alternative way with the metrics daemon. Then the metrics
-daemon needs to monitor for the relevant events and take appropriate action --
-for example, aggregate data and send the histogram samples.
-
-
-================================================================================
-FAQ
-================================================================================
-
-Q. What should my histogram's |min| and |max| values be set at?
-
-A. You should set the values to a range that covers the vast majority of samples
-   that would appear in the field. Note that samples below the |min| will still
-   be collected in the underflow bucket and samples above the |max| will end up
-   in the overflow bucket. Also, the reported mean of the data will be correct
-   regardless of the range.
-
-Q. How many buckets should I use in my histogram?
-
-A. You should allocate as many buckets as necessary to perform proper analysis
-   on the collected data. Note, however, that the memory allocated in Chrome for
-   each histogram is proportional to the number of buckets. Therefore, it is
-   strongly recommended to keep this number low (e.g., 50 is normal, while 100
-   is probably high).
-
-Q. When should I use an enumeration (linear) histogram vs. a regular
-   (exponential) histogram?
-
-A. Enumeration histograms should really be used only for sampling enumerated
-   events and, in some cases, percentages. Normally, you should use a regular
-   histogram with exponential bucket layout that provides higher resolution at
-   the low end of the range and lower resolution at the high end. Regular
-   histograms are generally used for collecting performance data (e.g., timing,
-   memory usage, power) as well as aggregated event counts.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..8d4828c
--- /dev/null
+++ b/README.md
@@ -0,0 +1,124 @@
+Metricsd
+========
+
+The metricsd daemon is used to gather metrics from the platform and application,
+aggregate them and upload them periodically to a server.
+The metrics will then be available in their aggregated form to the developer
+for analysis.
+
+Three components are provided to interact with `metricsd`: `libmetrics`,
+`metrics_collector` and `metrics_client`.
+
+The Metrics Library: libmetrics
+-------------------------------
+
+`libmetrics` is a small library that implements the basic C++ API for
+metrics collection. All metrics collection is funneled through this library. The
+easiest and recommended way for a client-side module to collect user metrics is
+to link `libmetrics` and use its APIs to send metrics to `metricsd` for transport to
+UMA. In order to use the library in a module, you need to do the following:
+
+- Add a dependency on the shared library in your Android.mk file:
+  `LOCAL_SHARED_LIBRARIES += libmetrics`
+
+- To access the metrics library API in the module, include the
+  <metrics/metrics_library.h> header file.
+
+- The API is documented in `metrics_library.h`. Before using the API methods, a
+  MetricsLibrary object needs to be constructed and initialized through its
+  Init method.
+
+- Samples are uploaded only if the `/data/misc/metrics/enabled` file exists.
+
+
+Server Side
+-----------
+
+You will be able to see all uploaded metrics on the metrics dashboard,
+accessible via the developer console.
+
+*** note
+It usually takes a day for metrics to be available on the dashboard.
+***
+
+
+The Metrics Client: metrics_client
+----------------------------------
+
+`metrics_client` is a simple shell command-line utility for sending histogram
+samples and querying `metricsd`. It's installed under `/system/bin` on the target
+platform and uses `libmetrics`.
+
+For usage information and command-line options, run `metrics_client` on the
+target platform or look for "Usage:" in `metrics_client.cc`.
+
+
+The Metrics Daemon: metricsd
+----------------------------
+
+`metricsd` is the daemon that listens for metrics logging calls (via Binder),
+aggregates the metrics and uploads them periodically. This daemon should start as
+early as possible so that depending daemons can log at any time.
+
+`metricsd` is made of two threads that work as follows:
+
+* The binder thread listens for one-way Binder calls, aggregates the metrics in
+  memory (via `base::StatisticsRecorder`) and increments the crash counters when a
+  crash is reported. This thread is kept as simple as possible to ensure the
+  maximum throughput possible.
+* The uploader thread takes care of backing up the metrics to disk periodically
+  (to avoid losing metrics on crashes), collecting metadata about the client
+  (version number, channel, etc..) and uploading the metrics periodically to the
+  server.
+
+
+The Metrics Collector: metrics_collector
+----------------------------------------
+
+metrics_collector is a daemon that runs in the background on the target platform,
+gathers health information about the system and maintains long running counters
+(ex: number of crashes per week).
+
+The recommended way to generate metrics data from a module is to link and use
+libmetrics directly. However, we may not want to add a dependency on libmetrics
+to some modules (ex: kernel). In this case, we can add a collector to
+metrics_collector that will, for example, take measurements and report them
+periodically to metricsd (this is the case for the disk utilization histogram).
+
+
+FAQ
+---
+
+### What should my histogram's |min| and |max| values be set at?
+
+You should set the values to a range that covers the vast majority of samples
+that would appear in the field. Note that samples below the |min| will still
+be collected in the underflow bucket and samples above the |max| will end up
+in the overflow bucket. Also, the reported mean of the data will be correct
+regardless of the range.
+
+### How many buckets should I use in my histogram?
+
+You should allocate as many buckets as necessary to perform proper analysis
+on the collected data. Note, however, that the memory allocated in metricsd
+for each histogram is proportional to the number of buckets. Therefore, it is
+strongly recommended to keep this number low (e.g., 50 is normal, while 100
+is probably high).
+
+### When should I use an enumeration (linear) histogram vs. a regular (exponential) histogram?
+
+Enumeration histograms should really be used only for sampling enumerated
+events and, in some cases, percentages. Normally, you should use a regular
+histogram with exponential bucket layout that provides higher resolution at
+the low end of the range and lower resolution at the high end. Regular
+histograms are generally used for collecting performance data (e.g., timing,
+memory usage, power) as well as aggregated event counts.
+
+### How can I test that my histogram was reported correctly?
+
+* Make sure no error messages appear in logcat when you log a sample.
+* Run `metrics_client -d` to dump the currently aggregated metrics. Your
+  histogram should appear in the list.
+* Make sure that the aggregated metrics were uploaded to the server successfully
+  (check for an OK message from `metricsd` in logcat).
+* After a day, your histogram should be available on the dashboard.
diff --git a/uploader/upload_service.h b/uploader/upload_service.h
index 1d36121..420653e 100644
--- a/uploader/upload_service.h
+++ b/uploader/upload_service.h
@@ -34,30 +34,33 @@
 class SystemProfileSetter;
 
-// Service responsible for uploading the metrics periodically to the server.
-// This service works as a simple 2-state state-machine.
+// Service responsible for backing up the currently aggregated metrics to disk
+// and uploading them periodically to the server.
 //
-// The two states are the presence or not of a staged log.
-// A staged log is a compressed protobuffer containing both the aggregated
-// metrics and event and information about the client. (product,
-// model_manifest_id, etc...).
+// A given metrics sample can be in one of three locations.
+// * in-memory metrics: in memory aggregated metrics, waiting to be staged for
+//   upload.
+// * saved log: protobuf message, written to disk periodically and on shutdown
+//   to make a backup of metrics data for uploading later.
+// * staged log: protobuf message waiting to be uploaded.
 //
-// At regular intervals, the upload event will be triggered and the following
-// will happen:
-// * if a staged log is present:
-//    The previous upload may have failed for various reason. We then retry to
-//    upload the same log.
-//    - if the upload is successful, we discard the log (therefore
-//      transitioning back to no staged log)
-//    - if the upload fails, we keep the log to try again later.
+// The service works as follows:
+// On startup, we create the in-memory metrics from the saved log if it exists.
 //
-// * if no staged logs are present:
-//    Take a snapshot of the aggregated metrics, save it to disk and try to send
-//    it:
-//    - if the upload succeeds, we discard the staged log (transitioning back
-//      to the no staged log state)
-//    - if the upload fails, we continue and will retry to upload later.
+// Periodically (every |disk_persistence_interval_| seconds), we take a snapshot
+// of the in-memory metrics and save them to disk.
 //
+// Periodically (every |upload_interval| seconds), we:
+// * take a snapshot of the in-memory metrics and create the staged log
+// * save the staged log to disk to avoid losing it if metricsd or the system
+//   crashes between two uploads.
+// * delete the last saved log: all the metrics contained in it are also in the
+//   newly created staged log.
+//
+// On shutdown (SIGINT or SIGTERM), we save the in-memory metrics to disk.
+//
+// Note: the in-memory metrics can be stored in |current_log_| or
+// base::StatisticsRecorder.
 class UploadService : public base::HistogramFlattener, public brillo::Daemon {
  public:
   UploadService(const std::string& server,
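
Reviewer note: the lifecycle described in the new `UploadService` comment (in-memory metrics, saved log, staged log) may be easier to follow as a small model. The sketch below is not the `metricsd` implementation. `Metrics`, `LogSlot` and `LogLifecycleSketch` are hypothetical names, logs are plain maps rather than protobuf messages, and "disk" is just a struct member. Clearing the in-memory metrics once they are staged is also an assumption; the comment only says a snapshot is taken.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for aggregated metrics: histogram name -> sample count.
using Metrics = std::map<std::string, int>;

// A slot that may or may not hold a log (stands in for a file on disk).
struct LogSlot {
  bool present = false;
  Metrics data;
};

class LogLifecycleSketch {
 public:
  // On startup: rebuild the in-memory metrics from the saved log if it exists.
  explicit LogLifecycleSketch(const LogSlot& saved_log)
      : saved_log_(saved_log) {
    if (saved_log_.present) in_memory_ = saved_log_.data;
  }

  // A sample arrives and is aggregated in memory.
  void Record(const std::string& histogram) { ++in_memory_[histogram]; }

  // Every disk persistence interval (and on shutdown): back up to disk.
  void PersistToDisk() {
    saved_log_.present = true;
    saved_log_.data = in_memory_;
  }

  // Every upload interval: snapshot into the staged log, then delete the
  // saved log, whose contents are now all in the staged log.
  void StageForUpload() {
    staged_log_.present = true;
    staged_log_.data = in_memory_;
    in_memory_.clear();      // assumption: staged samples leave memory
    saved_log_ = LogSlot();  // delete the last saved log
  }

  const Metrics& in_memory() const { return in_memory_; }
  const LogSlot& saved_log() const { return saved_log_; }
  const LogSlot& staged_log() const { return staged_log_; }

 private:
  Metrics in_memory_;
  LogSlot saved_log_;
  LogSlot staged_log_;
};
```

In this model a sample recorded before a crash survives only if `PersistToDisk()` or `StageForUpload()` ran after it was recorded, which is the trade-off the two intervals in the comment control.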