Age | Commit message (Collapse) | Author |
|
This reverts commit 2b9b25c74416328a0f68af5b60b73534e4ccbd02.
|
|
|
|
Let's move this out of the main parsing loop so that the code is easier to read.
Suggested-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
generate_data_dict uses try/except for integer conversion. This is expensive
considering the frequency of the calls. Optimize it by using a slightly
uglier but much faster version of the same. With this I get a speedup of
~8.5% in parsing.
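A minimal sketch of the idea (function names are illustrative, not the actual
trappy code): avoid paying exception overhead on the common non-integer case
by testing the string before converting.

```python
def parse_value_slow(value):
    """Original approach: try/except is costly when most values aren't ints."""
    try:
        return int(value)
    except ValueError:
        return value


def parse_value_fast(value):
    """Slightly uglier: check the string first, so the hot path never raises."""
    if value.isdigit() or (value.startswith("-") and value[1:].isdigit()):
        return int(value)
    return value
```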
Change-Id: I909ad170756fd284c7d924950945e755880ceb5f
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
I found that trappy spends a lot of time looking for the array pattern "{}".
The vast majority of trace lines don't have them. Instead of running the regex
every time, check for the ={} pattern using the 'in' operator, which is much
faster than the regex for the common case, and then do the regex sub if needed.
The speedup is huge: in my test run, I saw the parsing stage on a 100MB trace
speed up from 13s -> 10.5s !!
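A sketch of the guard described above (the regex and the 'ARRAY' replacement
string are assumptions for illustration, not the actual trappy code):

```python
import re

# Matches key={...} array payloads; the substitution target is illustrative.
ARRAY_RE = re.compile(r"=\{[^}]*\}")


def strip_arrays(line):
    # The 'in' test is far cheaper than a regex scan, and almost every
    # trace line takes this early-exit path.
    if "={" not in line:
        return line
    return ARRAY_RE.sub("=ARRAY", line)
```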
Change-Id: I7ae68efc99eaf8844624871be2a4d66a7820a9b0
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
It's not necessary to check for the unique word twice. Reusing the 'class
detection' path for this purpose is sufficient. This improves parsing-stage
performance by 3-6%.
Change-Id: Iff8ebd0086d2ccc10bec14e3d40a63f3c04aaf1a
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
|
|
Some trace analysis tasks are most easily expressed by iterating over
events in chronological order. Add an apply_callbacks() method to the
GenericFtrace class that will run user-defined functions on trace
events in the order they occurred.
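A hypothetical sketch of what such a method could look like (the signature and
DataFrame layout are assumptions, not the actual trappy API):

```python
import pandas as pd


def apply_callbacks(dfs, callbacks):
    """Run per-event callbacks in chronological order.

    dfs: {event_name: DataFrame indexed by timestamp}
    callbacks: {event_name: function taking a dict of the row's fields}
    """
    events = []
    for name, df in dfs.items():
        for ts, row in df.iterrows():
            events.append((ts, name, row.to_dict()))
    # Merge all events from all DataFrames and dispatch in timestamp order
    for ts, name, data in sorted(events, key=lambda e: e[0]):
        callbacks[name](data)
```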
Signed-off-by: Connor O'Brien <connoro@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
|
|
Add test for invalid cache
Add test for caching with extra events
Add test for normalize_time and window parameters
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Store a JSON file with the parameters used to parse a trace in the cache. If,
on retrieval, we find a mismatch, we simply invalidate the cache.
A richer implementation might detect whether we can still use the cache (e.g. if
the requested window is a subset of the cached one), but for now I don't think
that's worth the effort/complexity.
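A minimal sketch of the scheme (the file name and parameter keys are
assumptions for illustration):

```python
import json
import os

PARAMS_FILE = "params.json"  # assumed name, for illustration


def write_cache_params(cache_dir, params):
    with open(os.path.join(cache_dir, PARAMS_FILE), "w") as f:
        json.dump(params, f)


def cache_valid(cache_dir, params):
    path = os.path.join(cache_dir, PARAMS_FILE)
    if not os.path.exists(path):
        return False
    with open(path) as f:
        cached = json.load(f)
    # Any mismatch (e.g. a different window) simply invalidates the cache
    return cached == params
```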
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Pandas is extremely fast at parsing CSV into data frames. Astonishingly, it
takes < 1s to serialize/deserialize a 100MB worth of traces with 430000 events
to/from CSV. We leverage this and write out data frames to a CSV file when they
are created for the first time; next time, we read them back if they exist. To
make sure the cache isn't stale, we take the md5sum of the trace file and also
ensure all CSVs exist before reading from the cache. I get a speedup from 16s
to 1s when parsing a 100MB trace.
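The staleness check can be sketched like this (a simplified illustration, not
the exact trappy code):

```python
import hashlib


def trace_checksum(path):
    """md5 of the trace file; the cache is keyed on this, so a modified
    trace can never hit a stale cache."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1MB chunks so large traces don't need to fit in memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest()
```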
Co-developed-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
|
|
|
|
tests/test_ftrace.py:
Stop creating and checking for existence of <trace>.raw.txt files.
tests/test_sched.py:
Stop creating raw.txt file.
tests/raw_trace.raw.txt:
Remove the file as it is no longer needed. The raw-formatted events
are moved into raw_trace.txt
tests/raw_trace.txt:
Replace the formatted sched_switch events with raw formatted events,
like we get when parsing a trace file.
tests/trace_empty.txt:
Remove the default sched_ events we would expect to parse in raw
format, since we want to have events here which we are not looking
for.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Since we no longer generate these, we should let the user either remove
their text files manually or merge the files if they no longer have the
.dat file. The exception describes how to deal with this format change
and which events should be moved from the raw to the formatted text file
if the user no longer has the .dat file.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Now that we have events constructed before generating trace files, we can
scan them and determine which events need to be formatted and which should
be unformatted (raw). trace-cmd report lets us specify events we want to
print raw. Use this to produce a single trace.txt which has all the
necessary formatted & raw events instead of calling trace-cmd report
twice.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
This patch set is to modify the parsing step so that we don't need to
build a raw and a formatted trace file. To do that, we need to know
which events should have raw output and which should use their default
formatting. The events indicate which are which, but currently we
generate the trace files before we populate the events.
Splitting the initialisation into two parts means that we can populate
the events so that a later patch can create the text trace with each
event either formatted or raw as required.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
When the sched_switch event is added to the parser, the constructor is
not called, so we never set the parse_raw flag. Ensure there is a class
variable to hold the correct value and use it in the constructor.
We use the raw file to extract the events since we actually do construct
one of these objects during parsing.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
generate_data_dict uses try/except for integer conversion. This is expensive
considering the frequency of the calls. Optimize it by using a slightly
uglier but much faster version of the same. With this I get a speedup of
~8.5% in parsing.
Change-Id: I909ad170756fd284c7d924950945e755880ceb5f
Signed-off-by: Joel Fernandes <joelaf@google.com>
|
|
I found that trappy spends a lot of time looking for the array pattern "{}".
The vast majority of trace lines don't have them. Instead of running the regex
every time, check for the ={} pattern using the 'in' operator, which is much
faster than the regex for the common case.
The speedup is huge: in my test run, I saw the parsing stage on a 100MB trace
speed up from 13s -> 10.5s !!
Change-Id: I7ae68efc99eaf8844624871be2a4d66a7820a9b0
Signed-off-by: Joel Fernandes <joelaf@google.com>
|
|
It's not necessary to check for the unique word twice. Reusing the 'class
detection' path for this purpose is sufficient. This improves parsing-stage
performance by 3-6%.
Change-Id: Iff8ebd0086d2ccc10bec14e3d40a63f3c04aaf1a
Signed-off-by: Joel Fernandes <joelaf@google.com>
|
|
|
|
Some trace analysis tasks are most easily expressed by iterating over
events in chronological order. Add a run_event_callbacks() method to
the GenericFTrace class that will run user-defined functions on trace
events in the order they occurred.
Signed-off-by: Connor O'Brien <connoro@google.com>
|
|
|
|
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
The regex is broken for strings like this:
C|594|HW_VSYNC_ON_0|1
This is because the "1" (data) is optional, so the greedy operator globs
everything, causing
func = "HW_VSYNC_ON_0|1" and data = NaN
Change the match to be greedy only until '|' is encountered, so that:
func = "HW_VSYNC_ON_0" and data = "1".
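The two patterns below are reconstructed from the description above to show
the difference (the real trappy regex has more surrounding context):

```python
import re

# Broken: greedy '.+' for func swallows the optional data field.
broken = re.compile(r"C\|(?P<pid>\d+)\|(?P<func>.+)\|?(?P<data>.+)?")
# Fixed: func stops at the next '|', and data stays optional.
fixed = re.compile(r"C\|(?P<pid>\d+)\|(?P<func>[^|]+)(\|(?P<data>[^|]*))?")

line = "C|594|HW_VSYNC_ON_0|1"
```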
Change-Id: I3695a8da8ad10598192cea08ca92e27271565ccf
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
I promised @derkling I would write this so here you go.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
|
|
FTrace can be configured to report events timestamp using different
clock sources which can be selected via the trace_clock sysfs attribute
as described in:
https://www.kernel.org/doc/Documentation/trace/ftrace.txt
The global clock source reports time in [s] with [us] resolution.
Other sources, like for example the boot clock, use [ns] instead.
Thus, in those cases we do not have decimals in the timestamp.
Let's update the special fields regexp to match both [s]- and [ns]-
formatted times and do the required pre-processing to ensure that
DataFrames are always expressed using a [s].[decimals] format.
This also updates the base test to add a set of trace events which are
expressed with [ns] resolution.
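The idea can be sketched as follows (the pattern and helper are illustrative,
not the exact trappy regexp):

```python
import re

# Accept "12345.678901" (global clock: [s] with [us] decimals) as well as
# "12345678901234" (e.g. boot clock: plain [ns], no decimal point).
TS_RE = re.compile(r"(?P<timestamp>\d+(\.\d+)?):")


def timestamp_seconds(ts_str):
    if "." in ts_str:
        return float(ts_str)      # already [s].[decimals]
    return int(ts_str) / 1e9      # [ns] -> [s]
```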
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
|
|
|
|
Due to the recent patches to skip invalid lines using a regex, this test case
fails. Fix it by appending the invalid line to the end of the test trace.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Since Patrick changed hasnt_started to use the regex for checking if a line is
valid, it's better if we don't use the hardcoded logic we have now that checks
if the trace line begins with '#'. Instead, let's piggyback on the base class
to decide whether systrace thinks the line should be skipped. This makes the
design more consistent.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Let's test that the parsed events have the proper line number reported
in their DF.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Useful for joining DataFrames that have timestamp collisions or for
iterating through a group of DataFrames in line order.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Knowing the line number of a trace event in the original trace file can
be super useful for two main reasons:
1. it can be used as a truly unique ID for each DF entry, since it is not
   subject to the possible index collisions due to the timestamp resolution
2. it can be super useful to correlate DF entries with the original trace
If this ID is also tracked the same way tools like kernelshark do, then
we have a definitive and easy way to match DF records with kernelshark.
Since kernelshark uses 0-based indexing for trace events, let's account
for trace event line numbers starting from 0. This number is available
as a class property, in case we are interested in knowing the real total
number of events available in the original trace. Moreover, it will be
used by a following patch to add a unique ID to each DF entry.
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Currently we open an FTrace file twice: once to parse metadata and again
to actually parse trace events. The second time we open the trace we don't
skip the metadata, since it is filtered out by the attention words match.
Since trappy is now empowered by the trace_hasnt_started() iterator filter,
let's use it to skip the metadata before entering the parsing loop.
This change is preparatory for proper accounting of trace events in
the parsing loop.
While we are at it, let's be eco-friendly and recycle the special field
regexp for the exit condition of __populate_metadata() as well.
|
|
|
|
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Certain fake events can be injected using tracing_mark_write and should be
classified into their own event classes rather than tracing_mark_write. One way
to handle this is to use tracing_mark_write only as a backup. For this reason,
let's introduce the concept of backup event classes and classify
TracingMarkWrite as one.
This case normally doesn't happen, because dynamic events are inserted at the
beginning of the event list from what I've seen. Since tracing_mark_write is a
built-in event, and not dynamic, it always comes after and so is less
preferred. However, we should protect against the future case where a built-in
happens to come after the TracingMarkWrite class in the list, so let's handle
it now.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Here we provide custom data string parsing for Android userspace events.
The regex and parsing logic are borrowed from Patrick's patch.
Co-developed-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Create tracing_mark_write as an event and treat it like one. It overrides data
string processing, since it cannot use 'key=value' parsing. Instead it relies
on the tracer to provide a parser (systrace, for example, has a custom trace
format for userspace events). If the tracer cannot provide one, it just returns
a default dictionary with the key 'string' and the event's string as the value.
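The fallback described above can be sketched like this (the class shape is a
simplified assumption; only the generate_data_dict method name comes from the
surrounding patches):

```python
class TracingMarkWrite:
    """Simplified sketch of the fallback behaviour."""

    def generate_data_dict(self, data_str):
        # No 'key=value' structure and no tracer-specific parser available:
        # wrap the raw payload under a single 'string' key.
        return {"string": data_str}
```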
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
In preparation for being able to parse systrace, split out data string
processing into a separate function so it can be overridden by subclasses.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
This is needed so that we're no longer dependent on making sure data starts
at something with '=', and it will allow parsing custom systrace
tracing_mark_write events which don't have an '=' in them. The new regex will
also work for lines that do already have an '=' and doesn't change their
behavior.
The last line of the new regex in this patch works like this:
r"(?P<timestamp>[0-9]+\.[0-9]+): (\w+:\s+)+(?P<data>.+)"
We skip over '<string>:<whitespace> ' and the next thing we find is
considered the start of the data. This works well to find the start of the
data for existing traces and is needed for systraces which don't have '='.
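Exercising the quoted regex on both kinds of line shows the behaviour (the two
sample lines below are made up for illustration):

```python
import re

# The regex quoted in the commit message above
DATA_RE = re.compile(r"(?P<timestamp>[0-9]+\.[0-9]+): (\w+:\s+)+(?P<data>.+)")

# A conventional 'key=value' event and a systrace-style one without '='
with_eq = "12345.678901: sched_switch: prev_comm=foo next_comm=bar"
without_eq = "12345.678901: tracing_mark_write: B|1234|some_section"
```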
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
into HEAD"
This reverts commit fdd827ae1b164ce7e05c5131d0c32f8028e395a5, reversing
changes made to 3bcad0b54476cdb1162a71ffdab54bfc4b80f9bf.
|
|
This reverts commit 0cf2110dc3824e4fdf1463b5b785b06c36474d7c.
|
|
This reverts commit da27bee4e96d3e2f4513d1f3394d781642ec2ce5.
|