Fix documentation.

This fixes the documentation for UploadService and the README file: * stop mentioning Chrome OS as we forked the code. * update the histogram declaration workflow. It is not tied to Chrome's histograms.xml file anymore. * update the architecture explanation of metricsd. We split metrics_daemon into two daemons and are now using binder to log metrics. * convert README to markdown to make it prettier when viewed in gitiles. Bug: 26314417 Change-Id: I1e492f1211c1784e65dd4d3e473bb9aacefc3b5d
author: Bertrand SIMONNET <bsimonnet@google.com> 2015-12-21 11:50:58 -0800
committer: Bertrand SIMONNET <bsimonnet@google.com> 2016-01-06 17:19:30 -0800
commit: 5451fc175104bba335308ffe75b22a7724ef1434 (patch)
tree: 26c38dfa556d28b6c8f4166d2bd7dd3b337cf969
parent: b4f534d77926867f2cded1f1c0d82381b32cceb8 (diff)
download: metricsd-5451fc175104bba335308ffe75b22a7724ef1434.tar.gz
3 files changed, 147 insertions, 170 deletions
diff --git a/README b/README
deleted file mode 100644
index d4c9a0e..0000000
--- a/README
+++ /dev/null
@@ -1,150 +0,0 @@
-Copyright (C) 2015 The Android Open Source Project
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-     http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-
-================================================================================
-
-The Chrome OS "metrics" package contains utilities for client-side user metric
-collection.
-When Chrome is installed, Chrome will take care of aggregating and uploading the
-metrics to the UMA server.
-When Chrome is not installed (embedded build) and the metrics_uploader USE flag
-is set, metrics_daemon will aggregate and upload the metrics itself.
-
-
-================================================================================
-The Metrics Library: libmetrics
-================================================================================
-
-libmetrics is a small library that implements the basic C and C++ API for
-metrics collection. All metrics collection is funneled through this library. The
-easiest and recommended way for a client-side module to collect user metrics is
-to link libmetrics and use its APIs to send metrics to Chrome for transport to
-UMA. In order to use the library in a module, you need to do the following:
-
-- Add a dependence (DEPEND and RDEPEND) on chromeos-base/metrics to the module's
-  ebuild.
-
-- Link the module with libmetrics (for example, by passing -lmetrics to the
-  module's link command). Both libmetrics.so and libmetrics.a are built and
-  installed under $SYSROOT/usr/lib/. Note that by default -lmetrics will link
-  against libmetrics.so, which is preferred.
-
-- To access the metrics library API in the module, include the
-  <metrics/metrics_library.h> header file. The file is installed in
-  $SYSROOT/usr/include/ when the metrics library is built and installed.
-
-- The API is documented in metrics_library.h under src/platform/metrics/. Before
-  using the API methods, a MetricsLibrary object needs to be constructed and
-  initialized through its Init method.
-
-  For more information on the C API see c_metrics_library.h.
-
-- Samples are sent to Chrome only if the "/home/chronos/Consent To Send Stats"
-  file exists or the metrics are declared enabled in the policy file (see the
-  AreMetricsEnabled API method).
-
-- On the target platform, shortly after the sample is sent, it should be visible
-  in Chrome through "about:histograms".
-
-
-================================================================================
-Histogram Naming Convention
-================================================================================
-
-Use TrackerArea.MetricName. For example:
-
-Platform.DailyUseTime
-Network.TimeToDrop
-
-
-================================================================================
-Server Side
-================================================================================
-
-If the histogram data is visible in about:histograms, it will be sent by an
-official Chrome build to UMA, assuming the user has opted into metrics
-collection. To make the histogram visible on "chromedashboard", the histogram
-description XML file needs to be updated (steps 2 and 3 after following the
-"Details on how to add your own histograms" link under the Histograms tab).
-Include the string "Chrome OS" in the histogram description so that it's easier
-to distinguish Chrome OS specific metrics from general Chrome histograms.
-
-The UMA server logs and keeps the collected field data even if the metric's name
-is not added to the histogram XML. However, the dashboard histogram for that
-metric will show field data as of the histogram XML update date; it will not
-include data for older dates. If past data needs to be displayed, manual
-server-side intervention is required. In other words, one should assume that
-field data collection starts only after the histogram XML has been updated.
-
-
-================================================================================
-The Metrics Client: metrics_client
-================================================================================
-
-metrics_client is a simple shell command-line utility for sending histogram
-samples and user actions. It's installed under /usr/bin on the target platform
-and uses libmetrics to send the data to Chrome. The utility is useful for
-generating metrics from shell scripts.
-
-For usage information and command-line options, run "metrics_client" on the
-target platform or look for "Usage:" in metrics_client.cc.
-
-
-================================================================================
-The Metrics Daemon: metrics_daemon
-================================================================================
-
-metrics_daemon is a daemon that runs in the background on the target platform
-and is intended for passive or ongoing metrics collection, or metrics collection
-requiring feedback from multiple modules. For example, it listens to D-Bus
-signals related to the user session and screen saver states to determine if the
-user is actively using the device or not and generates the corresponding
-data. The metrics daemon uses libmetrics to send the data to Chrome.
-
-The recommended way to generate metrics data from a module is to link and use
-libmetrics directly. However, the module could instead send signals to or
-communicate in some alternative way with the metrics daemon. Then the metrics
-daemon needs to monitor for the relevant events and take appropriate action --
-for example, aggregate data and send the histogram samples.
-
-
-================================================================================
-FAQ
-================================================================================
-
-Q. What should my histogram's |min| and |max| values be set at?
-
-A. You should set the values to a range that covers the vast majority of samples
-   that would appear in the field. Note that samples below the |min| will still
-   be collected in the underflow bucket and samples above the |max| will end up
-   in the overflow bucket. Also, the reported mean of the data will be correct
-   regardless of the range.
-
-Q. How many buckets should I use in my histogram?
-
-A. You should allocate as many buckets as necessary to perform proper analysis
-   on the collected data. Note, however, that the memory allocated in Chrome for
-   each histogram is proportional to the number of buckets. Therefore, it is
-   strongly recommended to keep this number low (e.g., 50 is normal, while 100
-   is probably high).
-
-Q. When should I use an enumeration (linear) histogram vs. a regular
-   (exponential) histogram?
-
-A. Enumeration histograms should really be used only for sampling enumerated
-   events and, in some cases, percentages. Normally, you should use a regular
-   histogram with exponential bucket layout that provides higher resolution at
-   the low end of the range and lower resolution at the high end. Regular
-   histograms are generally used for collecting performance data (e.g., timing,
-   memory usage, power) as well as aggregated event counts.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..8d4828c
--- /dev/null
+++ b/README.md
@@ -0,0 +1,124 @@
+Metricsd
+========
+
+The metricsd daemon is used to gather metrics from the platform and application,
+aggregate them and upload them periodically to a server.
+The metrics will then be available in their aggregated form to the developer
+for analysis.
+
+Three components are provided to interact with `metricsd`: `libmetrics`,
+`metrics_collector` and `metrics_client`.
+
+The Metrics Library: libmetrics
+-------------------------------
+
+`libmetrics` is a small library that implements the basic C++ API for
+metrics collection. All metrics collection is funneled through this library. The
+easiest and recommended way for a client-side module to collect user metrics is
+to link `libmetrics` and use its APIs to send metrics to `metricsd` for transport to
+UMA. In order to use the library in a module, you need to do the following:
+
+- Add a dependency on the shared library in your Android.mk file:
+  `LOCAL_SHARED_LIBRARIES += libmetrics`
+
+- To access the metrics library API in the module, include the
+  <metrics/metrics_library.h> header file.
+
+- The API is documented in `metrics_library.h`. Before using the API methods, a
+  MetricsLibrary object needs to be constructed and initialized through its
+  Init method.
+
+- Samples are uploaded only if the `/data/misc/metrics/enabled` file exists.
+
+
+Server Side
+-----------
+
+You will be able to see all uploaded metrics on the metrics dashboard,
+accessible via the developer console.
+
+*** note
+It usually takes a day for metrics to be available on the dashboard.
+***
+
+
+The Metrics Client: metrics_client
+----------------------------------
+
+`metrics_client` is a simple shell command-line utility for sending histogram
+samples and querying `metricsd`. It's installed under `/system/bin` on the target
+platform and uses `libmetrics`.
+
+For usage information and command-line options, run `metrics_client` on the
+target platform or look for "Usage:" in `metrics_client.cc`.
+
+
+The Metrics Daemon: metricsd
+----------------------------
+
+`metricsd` is the daemon that listens for metrics logging calls (via Binder),
+aggregates the metrics and uploads them periodically. This daemon should start as
+early as possible so that depending daemons can log at any time.
+
+`metricsd` is made of two threads that work as follows:
+
+* The binder thread listens for one-way Binder calls, aggregates the metrics in
+  memory (via `base::StatisticsRecorder`) and increments the crash counters when a
+  crash is reported. This thread is kept as simple as possible to ensure the
+  maximum throughput possible.
+* The uploader thread takes care of backing up the metrics to disk periodically
+  (to avoid losing metrics on crashes), collecting metadata about the client
+  (version number, channel, etc..) and uploading the metrics periodically to the
+  server.
+
+
+The Metrics Collector: metrics_collector
+----------------------------------------
+
+metrics_collector is a daemon that runs in the background on the target platform,
+gathers health information about the system and maintains long running counters
+(ex: number of crashes per week).
+
+The recommended way to generate metrics data from a module is to link and use
+libmetrics directly. However, we may not want to add a dependency on libmetrics
+to some modules (ex: kernel). In this case, we can add a collector to
+metrics_collector that will, for example, take measurements and report them
+periodically to metricsd (this is the case for the disk utilization histogram).
+
+
+FAQ
+---
+
+### What should my histogram's |min| and |max| values be set at?
+
+You should set the values to a range that covers the vast majority of samples
+that would appear in the field. Note that samples below the |min| will still
+be collected in the underflow bucket and samples above the |max| will end up
+in the overflow bucket. Also, the reported mean of the data will be correct
+regardless of the range.
+
+### How many buckets should I use in my histogram?
+
+You should allocate as many buckets as necessary to perform proper analysis
+on the collected data. Note, however, that the memory allocated in metricsd
+for each histogram is proportional to the number of buckets. Therefore, it is
+strongly recommended to keep this number low (e.g., 50 is normal, while 100
+is probably high).
+
+### When should I use an enumeration (linear) histogram vs. a regular (exponential) histogram?
+
+Enumeration histograms should really be used only for sampling enumerated
+events and, in some cases, percentages. Normally, you should use a regular
+histogram with exponential bucket layout that provides higher resolution at
+the low end of the range and lower resolution at the high end. Regular
+histograms are generally used for collecting performance data (e.g., timing,
+memory usage, power) as well as aggregated event counts.
+
+### How can I test that my histogram was reported correctly?
+
+* Make sure no error messages appear in logcat when you log a sample.
+* Run `metrics_client -d` to dump the currently aggregated metrics. Your
+  histogram should appear in the list.
+* Make sure that the aggregated metrics were uploaded to the server successfully
+  (check for an OK message from `metricsd` in logcat).
+* After a day, your histogram should be available on the dashboard.
diff --git a/uploader/upload_service.h b/uploader/upload_service.h
index 1d36121..420653e 100644
--- a/uploader/upload_service.h
+++ b/uploader/upload_service.h
@@ -34,30 +34,33 @@
 
 class SystemProfileSetter;
 
-// Service responsible for uploading the metrics periodically to the server.
-// This service works as a simple 2-state state-machine.
+// Service responsible for backing up the currently aggregated metrics to disk
+// and uploading them periodically to the server.
 //
-// The two states are the presence or not of a staged log.
-// A staged log is a compressed protobuffer containing both the aggregated
-// metrics and event and information about the client. (product,
-// model_manifest_id, etc...).
+// A given metrics sample can be in one of three locations.
+// * in-memory metrics: in memory aggregated metrics, waiting to be staged for
+//   upload.
+// * saved log: protobuf message, written to disk periodically and on shutdown
+//   to make a backup of metrics data for uploading later.
+// * staged log: protobuf message waiting to be uploaded.
 //
-// At regular intervals, the upload event will be triggered and the following
-// will happen:
-// * if a staged log is present:
-//    The previous upload may have failed for various reason. We then retry to
-//    upload the same log.
-//    - if the upload is successful, we discard the log (therefore
-//      transitioning back to no staged log)
-//    - if the upload fails, we keep the log to try again later.
+// The service works as follows:
+// On startup, we create the in-memory metrics from the saved log if it exists.
 //
-// * if no staged logs are present:
-//    Take a snapshot of the aggregated metrics, save it to disk and try to send
-//    it:
-//    - if the upload succeeds, we discard the staged log (transitioning back
-//      to the no staged log state)
-//    - if the upload fails, we continue and will retry to upload later.
+// Periodically (every |disk_persistence_interval_| seconds), we take a snapshot
+// of the in-memory metrics and save them to disk.
 //
+// Periodically (every |upload_interval| seconds), we:
+// * take a snapshot of the in-memory metrics and create the staged log
+// * save the staged log to disk to avoid losing it if metricsd or the system
+//   crashes between two uploads.
+// * delete the last saved log: all the metrics contained in it are also in the
+//   newly created staged log.
+//
+// On shutdown (SIGINT or SIGTERM), we save the in-memory metrics to disk.
+//
+// Note: the in-memory metrics can be stored in |current_log_| or
+// base::StatisticsRecorder.
 class UploadService : public base::HistogramFlattener, public brillo::Daemon {
  public:
   UploadService(const std::string& server,
author	Bertrand SIMONNET <bsimonnet@google.com>	2015-12-21 11:50:58 -0800
committer	Bertrand SIMONNET <bsimonnet@google.com>	2016-01-06 17:19:30 -0800
commit	5451fc175104bba335308ffe75b22a7724ef1434 (patch)
tree	26c38dfa556d28b6c8f4166d2bd7dd3b337cf969
parent	b4f534d77926867f2cded1f1c0d82381b32cceb8 (diff)
download	metricsd-5451fc175104bba335308ffe75b22a7724ef1434.tar.gz