Change-Id: I1c291fba04d70b5a5613a495a404bc38de76e1fd
Adds trace parsing for ext4_da_write_begin, ext4_da_write_end,
ext4_sync_file_enter, and ext4_sync_file_exit events
Test: run the 4 new unit tests (nosetests tests/test_filesystem.py)
Change-Id: I9b7049d5e9879cc3dedb5fd9d3d1433fa8304618
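A hedged usage sketch, assuming the four new event classes are registered
like trappy's other ftrace parsers and exposed as attributes named after
the event:

import trappy

trace = trappy.FTrace("trace.txt")  # hypothetical trace containing ext4 events
print(trace.ext4_da_write_begin.data_frame.head())
print(trace.ext4_sync_file_enter.data_frame.head())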
Traces from clock_set_rate, clock_enable and clock_disable need
specialized parsing. Add support for parsing these events.
Test: run the 3 new unit tests (nosetests tests/test_common_clk.py)
Change-Id: Ib93b72697fc4d5eb30cffb914bbe0cb4c4cd872d
bare_trace: Fix get_duration() for window use
The previous implementation only looked at the maximum timestamp
of the traces. However, if trace windows are used to analyse a
subset of the full trace, get_duration() would return an
erroneous value:
(times are normalized)
trace = <trace built with window=[5, 10]>
print trace.get_duration()
> 10.0
This commit adds a lower-bound lookup to fix this issue.
tests/test_baretrace.py has also been updated to account
for this change.
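A minimal sketch of the corrected logic, with illustrative attribute names
rather than trappy's exact internals:

def get_duration(traces):
    # With window=[5, 10] the max-only logic reports 10.0; subtracting
    # the lower bound yields the true duration of 5.0.
    start = min(t.basetime for t in traces)   # new lower-bound lookup
    end = max(t.endtime for t in traces)      # upper bound, as before
    return end - start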
There are some issues with caching at the moment which I haven't been able
to debug, but which are confirmed as affecting multiple people. Until we
can find the time to fix them, let's disable caching by default.
Classes that implement the generate_data_dict method add an additional
argument to its signature. This leads to errors when the base class method
is called with that additional argument.
This patch also adds two tests for FTrace and SysTrace parsing.
To prevent an annoying warning.
Change-Id: I60f006a3c806864939019a90d12b236487fb51aa
Signed-off-by: Joel Fernandes <joelaf@google.com>
There's a bug in trappy caching where corruption or removal of single CSVs
causes all CSVs to get removed. Fix this and add a test-case to cover it.
Also test that only missing items are regenerated.
Change-Id: Iea41c8e0a005515790b580e7f78f06bdd60141a3
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
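A sketch of the fixed behaviour, with an assumed helper name and cache
layout; a bad or missing CSV invalidates only its own entry:

import os
import pandas as pd

def events_needing_reparse(cache_dir, events):
    # Probe each cached CSV individually instead of wiping the whole
    # cache when a single file is corrupt or missing.
    stale = []
    for event in events:
        path = os.path.join(cache_dir, event + ".csv")
        try:
            pd.read_csv(path, nrows=1)        # cheap validity check
        except (OSError, ValueError):         # missing or corrupt CSV
            stale.append(event)
    return stale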
Change-Id: I238bb50ed1907def19b23b0610eec87234ef4d51
Signed-off-by: Joel Fernandes <joelaf@google.com>
Change-Id: I3e8ff4d9d11220cddeac955e9a949fc3464ecc36
Signed-off-by: Joel Fernandes <joelaf@google.com>
Forward propagate the secondary DF into the primary DF and return the merged DF.
Implements: https://github.com/ARM-software/trappy/issues/250
Change-Id: I312d77302bbca8bb13bfa598785ebc0cc879fe34
Signed-off-by: Joel Fernandes <joelaf@google.com>
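A pandas sketch of the merge; the function name is hypothetical and it
assumes both frames are indexed by timestamp with disjoint column names:

import pandas as pd

def forward_merge(primary, secondary):
    # Align the secondary DF onto the union of timestamps, carry its
    # most recent values forward, then keep only the primary's rows.
    merged = primary.join(secondary, how="outer")
    merged[secondary.columns] = merged[secondary.columns].ffill()
    return merged.loc[primary.index]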
- json params loading represents keys as unicode; use dumps to fix it
- finalize_objects needs to run even when some events are not cached; fix
it by moving it up, and also address bjackman's concern that we run it
twice by skipping finalize for cached events
- fix the time normalization breakage resulting from the above changes
Change-Id: I2011de0ae8112e937ee61baee8d53a63d0bbe85a
Signed-off-by: Joel Fernandes <joelaf@google.com>
Change-Id: Ie79698d90e0406cc11c52d364144ec08c33dfac4
Signed-off-by: Joel Fernandes <joelaf@google.com>
Let's move this out of the main parsing loop so that the code is easier to read.
Suggested-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
generate_data_dict uses try/except for integer conversion. This is
expensive considering the frequency of the calls. Optimize it by using a
slightly uglier but much faster version of the same check. With this I get
a speed-up of ~8.5% in parsing.
Change-Id: I909ad170756fd284c7d924950945e755880ceb5f
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
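A sketch of the kind of replacement, not trappy's literal code; testing
the characters up front avoids paying for an exception on every
non-integer field:

def convert(value):
    # try/except around int(value) is costly when most fields are not
    # integers; this explicit check is uglier but much faster.
    if value.isdigit() or (value.startswith("-") and value[1:].isdigit()):
        return int(value)
    return value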
I found that trappy spends a lot of time looking for the array pattern
"{}". The vast majority of trace lines don't have one. Instead of running
the regex every time, check for the '={}' pattern using the 'in' operator,
which is much faster than the regex for the common case, and only do the
regex sub when needed.
The speed-up is huge: in my test run, the parsing stage of a 100MB trace
went from 13s to 10.5s!
Change-Id: I7ae68efc99eaf8844624871be2a4d66a7820a9b0
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
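A sketch of the fast path; the pattern is illustrative:

import re

ARRAY_RE = re.compile(r"=\{[^}]*\}")

def strip_array_fields(line):
    # A substring test with 'in' is far cheaper than running the regex
    # on every line when almost no lines contain '={...}' arrays.
    if "={" in line:                  # rare slow path
        return ARRAY_RE.sub("", line)
    return line                       # common fast path: no regex at all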
It's not necessary to check for the unique word twice. Reusing the 'class
detection' path for this purpose is sufficient. This improves the
performance of the parsing stage by 3-6%.
Change-Id: Iff8ebd0086d2ccc10bec14e3d40a63f3c04aaf1a
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
Some trace analysis tasks are most easily expressed by iterating over
events in chronological order. Add an apply_callbacks() method to the
GenericFtrace class that will run user-defined functions on trace
events in the order they occurred.
Signed-off-by: Connor O'Brien <connoro@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
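A hedged usage sketch, assuming the fn_map argument maps event names to
callables that receive each event's parsed fields in chronological order:

import trappy

switch_count = [0]

def on_sched_switch(data):
    switch_count[0] += 1              # invoked per event, in time order

trace = trappy.FTrace("trace.txt")
trace.apply_callbacks({"sched_switch": on_sched_switch})
print(switch_count[0])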
Store a JSON file with the parameters used to parse a trace in the cache. If, on
retrieval, we find a mismatch then simply invalidate the cache.
A richer implementation might detect whether we can still use the cache (e.g. if
the requested window is a subset of the cached one), but for now I don't think
that's worth the effort/complexity.
Reviewed-by: KP Singh <kpsingh@google.com>
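A sketch of the check, with 'params.json' as an assumed file name:

import json
import os

def cache_matches(cache_dir, params):
    # True only if the cache was built with exactly these parameters;
    # any mismatch means the cache is simply thrown away.
    path = os.path.join(cache_dir, "params.json")
    if not os.path.exists(path):
        return False
    with open(path) as f:
        return json.load(f) == params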
Pandas is extremely fast at parsing csv to data frames. Astonishingly, it
takes < 1s to serialize/deserialize 100MB worth of traces with 430000
events to/from csv. We leverage this and write data frames out to csv
files when they are first created; the next time, we read them back in if
they exist. To make sure the cache isn't stale, we take the md5sum of the
trace file and also ensure all CSVs exist before reading from the cache.
I get a speed-up from 16s to 1s when parsing a 100MB trace.
Co-developed-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
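A sketch of the scheme, with hypothetical helper names and cache layout:

import hashlib
import os
import pandas as pd

def cache_dir_for(trace_path):
    # Key the cache on the trace's md5 so a changed trace can never
    # hit a stale cache.
    with open(trace_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return os.path.join(os.path.dirname(trace_path), ".cache-" + digest)

def load_or_parse(trace_path, event, parse_fn):
    # Read the event's DataFrame back from CSV if cached, otherwise
    # parse it and write the CSV for next time.
    csv_path = os.path.join(cache_dir_for(trace_path), event + ".csv")
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path, index_col=0)
    df = parse_fn(trace_path, event)
    os.makedirs(os.path.dirname(csv_path), exist_ok=True)
    df.to_csv(csv_path)
    return df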
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
Since we no longer generate these, we should let the user either remove
their text files manually or merge the files if they no longer have the
.dat file. The exception describes how to deal with this format change
and which events should be moved from the raw to the formatted text file
if the user no longer has the .dat file.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
Now that we have events constructed before generating trace files, we can
scan them and determine which events need to be formatted and which should
be unformatted (raw). trace-cmd report lets us specify events we want to
print raw. Use this to produce a single trace.txt which has all the
necessary formatted & raw events instead of calling trace-cmd report
twice.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
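A sketch of the invocation, assuming trace-cmd report's '-r EVENT' option,
which prints the named event in raw format:

import subprocess

def report_to_txt(dat_file, raw_events, out_path="trace.txt"):
    # One trace-cmd report pass: the listed events come out raw, the
    # rest formatted, all in a single trace.txt.
    cmd = ["trace-cmd", "report", dat_file]
    for event in raw_events:
        cmd += ["-r", event]
    with open(out_path, "w") as out:
        subprocess.check_call(cmd, stdout=out)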
This patch set is to modify the parsing step so that we don't need to
build a raw and a formatted trace file. To do that, we need to know
which events should have raw output and which should use their default
formatting. The events indicate which are which, but currently we
generate the trace files before we populate the events.
Splitting the initialisation into two parts means that we can populate
the events so that a later patch can create the text trace with each
event either formatted or raw as required.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
When the sched_switch event is added to the parser the constructor is
not called so we never set the parse_raw flag. Ensure there is a class
variable to hold the correct value and use it in the constructor.
We use the raw file to extract the events since we actually do construct
one of these objects during parsing.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
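A sketch of the shape of the fix, assuming a Base constructor that accepts
parse_raw:

from trappy.base import Base

class SchedSwitch(Base):
    # Class attribute: correct even when the class is registered with
    # the parser without ever being instantiated.
    parse_raw = True

    def __init__(self):
        super(SchedSwitch, self).__init__(parse_raw=self.parse_raw)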
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
The regex is broken for strings like this:
C|594|HW_VSYNC_ON_0|1
This is because the "1" (data) is optional, so the greedy operator globs
everything, causing
func = "HW_VSYNC_ON_0|1" and data = NaN
Change the match to be greedy only until a '|' is encountered, so that:
func = "HW_VSYNC_ON_0" and data = "1".
Change-Id: I3695a8da8ad10598192cea08ca92e27271565ccf
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
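A reconstruction of the failure with illustrative patterns, not trappy's
exact regex:

import re

# Greedy func swallows the optional data: func='HW_VSYNC_ON_0|1', no data
BROKEN = re.compile(r"^C\|(?P<pid>\d+)\|(?P<func>.*)(\|(?P<data>.*))?$")

# Stop func at the first '|' so the trailing data survives
FIXED = re.compile(r"^C\|(?P<pid>\d+)\|(?P<func>[^|]*)(\|(?P<data>[^|]*))?$")

m = FIXED.match("C|594|HW_VSYNC_ON_0|1")
print(m.group("func"), m.group("data"))   # HW_VSYNC_ON_0 1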
FTrace can be configured to report events timestamp using different
clock sources which can be selected via the trace_clock sysfs attribute
as described in:
https://www.kernel.org/doc/Documentation/trace/ftrace.txt
The global clock source reports time in [s] with a [us] resolution.
Other sources, like for example the boot clock, instead use [ns].
Thus, in those cases we do not have decimals in the timestamp.
Let's update the special fields regexp to match both [s] and [ns]
formatted times and do the required pre-processing to ensure that
DataFrames are always expressed using a [s].[decimals] format.
This also updates the base test to add a set of trace events which are
expressed in [ns] resolution.
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
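A sketch of the normalization; the regex is illustrative:

import re

TS_RE = re.compile(r"(?P<timestamp>\d+(?P<decimals>\.\d+)?):")

def timestamp_in_seconds(stamp):
    # '1234.567890:' (global clock) parses as-is; '123456789012:'
    # (boot clock, [ns]) is scaled to the [s].[decimals] format.
    m = TS_RE.match(stamp)
    if m.group("decimals"):
        return float(m.group("timestamp"))
    return int(m.group("timestamp")) / 1e9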
Since Patrick changed hasnt_started to use the regex for checking whether
a line is valid, it's better if we don't use the hardcoded logic we have
now that checks if the trace line begins with '#'. Instead, let's
piggyback on the base class when systrace thinks the line shouldn't be
skipped. This makes for a more consistent design.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
Useful for joining DataFrames that have timestamp collisions or for
iterating through a group of DataFrames in line order.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
Knowing the line number of a trace event in the original trace file can
be super useful for two main reasons:
1. it can be used as a truly unique ID for each DF entry, since it is not
subject to the possible index collisions due to the timestamp resolution
2. it can be used to correlate DF entries with the original trace
If this ID is also tracked the same way tools like kernelshark track it,
then we have a definitive and easy way to match DF records with
kernelshark.
Since kernelshark uses 0-based indexing for trace events, let's number
trace event lines starting from 0 as well. This number is available as a
class property, in case we are interested in the real total number of
events available in the original trace. Moreover, it will be used by a
following patch to add a unique ID to each DF entry.
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
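A hedged usage sketch, assuming the follow-up patch lands the line number
in a '__line' column of each DataFrame:

import trappy

trace = trappy.FTrace("trace.txt")
df = trace.sched_switch.data_frame
# Unlike the time-based index, '__line' stays collision-free even when
# two events share a timestamp.
unique = df.set_index("__line")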
Currently we open an FTrace file two times: once to parse the metadata and
again to actually parse the trace events. The second time we open the
trace we don't skip the metadata, since it is filtered out by the
attention words match.
Since trappy is now empowered by the trace_hasnt_started() iterator
filter, let's use it to skip the metadata before entering the parsing
loop.
This change is preparatory for proper accounting of trace events in the
parsing loop.
While we are at it, let's be eco-friendly and recycle the special fields
regexp for the exit condition of __populate_metadata() too.
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
Certain fake events can be injected using tracing_mark_write and should be
classified into their own event classes rather than as tracing_mark_write.
One way to handle this is to use tracing_mark_write only as a backup. For
this reason, let's introduce the concept of backup event classes and
classify TracingMarkWrite as one.
This case normally doesn't happen, because from what I've seen dynamic
events are inserted at the beginning of the event list. Since
tracing_mark_write is a built-in event, and not dynamic, it's always after
them and so is less preferred. However, we should protect against the
future case where a built-in happens to come after the TracingMarkWrite
class in the list, so let's handle it now.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
Here we provide custom data string parsing for Android userspace events.
The regex and parsing logic are borrowed from Patrick's patch.
Co-developed-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
Create tracing_mark_write as an event and treat it as such. It overrides
data string processing since it cannot use 'key=value' parsing. Instead,
it relies on the tracer to provide a parser (systrace, for example, has a
custom trace format for userspace events). If the tracer cannot provide
one, it just returns a default dictionary with the key 'string' and the
event's string as the value.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
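A sketch of the fallback; method and attribute names approximate the
behaviour described above:

from trappy.base import Base

class TracingMarkWrite(Base):
    def generate_data_dict(self, data_str):
        # Userspace markers are not 'key=value' data: delegate to the
        # tracer's custom parser, else fall back to a plain payload.
        if self.tracer:
            data = self.tracer.generate_data_dict(data_str)
            if data:
                return data
        return {"string": data_str}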
In preparation for parsing systrace, split data string processing out into
a separate function so that it can be overridden by subclasses.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
This is needed so that we no longer depend on the data starting at
something with an '='; it will also allow us to parse custom systrace
tracing_mark_write events which don't have an '=' in them. The new regex
still works for lines that do have an '=' and doesn't change their
behavior.
The last line of the new regex in this patch works like this:
r"(?P<timestamp>[0-9]+\.[0-9]+): (\w+:\s+)+(?P<data>.+)"
We skip over '<string>:<whitespace> ' and the next thing we find is
considered the start of the data. This works well to find the start of the
data for existing traces and is needed for systraces which don't have '='.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
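A quick demonstration of that regex against a systrace-style line (the
sample line is illustrative):

import re

DATA_RE = re.compile(r"(?P<timestamp>[0-9]+\.[0-9]+): (\w+:\s+)+(?P<data>.+)")

line = "surfaceflinger-598 [001] 12.345678: tracing_mark_write: B|598|doFrame"
m = DATA_RE.search(line)
print(m.group("timestamp"))   # 12.345678
print(m.group("data"))        # B|598|doFrame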
ftrace: Improve error messages when failing to parse trace
javimerino: Added a test and fixed a typo.
This patch reduces the frequency of getrusage calls, resulting in a ~3-5%
performance improvement in parsing. We do a check at the beginning and
then check once every 10000 events.
Signed-off-by: Joel Fernandes <joelaf@google.com>
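A sketch of the throttling pattern; what the usage check actually does is
left abstract here:

import resource

CHECK_INTERVAL = 10000

def parse_all(lines, handle_line, check_usage):
    # Sample the (comparatively expensive) getrusage call once at the
    # start and then once every 10000 events, not once per line.
    for count, line in enumerate(lines):
        if count % CHECK_INTERVAL == 0:
            check_usage(resource.getrusage(resource.RUSAGE_SELF))
        handle_line(line)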
base: Don't error on event field values containing '='
If a trace field value contains a '=', we currently get a ValueError('Too many
values to unpack'). Instead, let's only split on the first '='.
In practice if a field value contains a '=' it's probably because a kernel
developer typo'd a custom event like:
/* Note missing space between "%d" and "bar" */
trace_printk("my_broken_event: foo=%dbar=%d", foo, bar)
So I did consider raising an explicit 'malformed event field' error. But this
approach is more flexible in case someone really wanted to trace fields
containing strings with '=' in them.
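The fix in miniature:

field = "foo=10bar=20"             # the typo'd event above, with foo=10, bar=20

# key, value = field.split("=")    # ValueError: too many values to unpack
key, value = field.split("=", 1)   # split on the first '=' only
print(key, value)                  # foo 10bar=20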
stats/grammar: Raise helpful error when asked to parse absent events
If you use the Parser to access events that Trappy understands, but that are not
present in the trace, you currently get an inscrutable exception when trying to
access `.loc[ self._window[0]:]` on the empty DataFrame in _get_data_frame.
Ideally attempting to parse absent events would just return an empty DataFrame,
but then we don't know what columns it should have. So instead let's just raise
a more helpful error saying that the event is not present.
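A sketch of the friendlier failure; names are illustrative rather than the
grammar's real internals:

def event_data_frame(trace, event):
    df = getattr(trace, event).data_frame
    if df.empty:
        # The event class exists but the event never fired in this
        # trace; say so instead of crashing on the empty DataFrame.
        raise ValueError("event '%s' is not present in the trace" % event)
    return df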
If you typo a trace event name in a call to Parser.solve, you currently get a
KeyError from trying to access the result of globals(); the stacktrace goes
through a pyparsing callback to str_to_attr via _pre_process ("""Pre-process
accessors for super-indexing""")...
Add an exception with a message to hint at what went wrong.