Age | Commit message | Author |
|
Pandas is extremely fast at parsing CSV into data frames. Astonishingly,
it takes < 1s to serialize/deserialize 100MB worth of traces with 430000
events to/from CSV. We leverage this and write the data frames out to
CSV files when they are first created; the next time around we read them
back from there. To make sure the cache isn't stale, we take the md5sum
of the trace file and also ensure all CSVs exist before reading from the
cache. I get a speed-up from 16s to 1s when parsing a 100MB trace.
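A minimal sketch of the caching scheme (the helper names below are
illustrative, not the actual trappy API):

    import hashlib
    import os

    import pandas as pd

    def trace_md5(trace_path):
        # Hash the trace so a modified file invalidates the cache.
        with open(trace_path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest()

    def cached_data_frame(trace_path, csv_path, md5_path, parse_fn):
        checksum = trace_md5(trace_path)
        if os.path.exists(csv_path) and os.path.exists(md5_path):
            with open(md5_path) as f:
                if f.read().strip() == checksum:
                    # Fast path: ~1s for a 100MB trace.
                    return pd.read_csv(csv_path, index_col=0)
        df = parse_fn(trace_path)   # slow path: full parse, ~16s
        df.to_csv(csv_path)
        with open(md5_path, "w") as f:
            f.write(checksum)
        return df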
Co-developed-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
tests/test_ftrace.py:
Stop creating and checking for the existence of <trace>.raw.txt files.
tests/test_sched.py:
Stop creating the raw.txt file.
tests/raw_trace.raw.txt:
Remove the file as it is no longer needed. The raw-formatted events
are moved into raw_trace.txt.
tests/raw_trace.txt:
Replace the formatted sched_switch events with raw-formatted events,
like the ones we get when parsing a trace file.
tests/trace_empty.txt:
Remove the default sched_ events that we would expect to parse in raw
format, since this file should only contain events we are not looking
for.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
This patch set modifies the parsing step so that we don't need to
build both a raw and a formatted trace file. To do that, we need to
know which events should have raw output and which should use their
default formatting. The events themselves indicate which is which, but
currently we generate the trace files before we populate the events.
Splitting the initialisation into two parts means that we can populate
the events first, so that a later patch can create the text trace with
each event either formatted or raw, as required (sketched below).
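A sketch of the two-phase initialisation (class and attribute names
are illustrative, not the actual trappy internals):

    class GenericFTrace(object):
        def __init__(self, path, events=None):
            # Phase 1: populate the event classes first, so that each
            # event can declare whether it needs raw output.
            self.events = list(events or [])
            self.raw_events = [e for e in self.events
                               if getattr(e, "parse_raw", False)]
            # Phase 2: only now generate the text trace, with each
            # event rendered raw or formatted as required.
            self._generate_trace_txt(path)

        def _generate_trace_txt(self, path):
            # Placeholder for the later patch that writes a single
            # trace.txt containing mixed raw/formatted events.
            pass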
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
I promised @derkling I would write this so here you go.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
FTrace can be configured to report event timestamps using different
clock sources, which can be selected via the trace_clock sysfs
attribute as described in:
https://www.kernel.org/doc/Documentation/trace/ftrace.txt
The global clock source reports time in [s] with [us] resolution.
Other sources, like for example the boot clock, use [ns] instead; in
those cases there are no decimals in the timestamp.
Let's update the special fields regexp to match both [s]- and
[ns]-formatted times, and do the required pre-processing to ensure
that DataFrames are always expressed using a [s].[decimals] format.
This also updates the base test to add a set of trace events which are
expressed in [ns] resolution.
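An illustrative version of the match and conversion (a sketch of the
idea, not the exact regexp in the patch):

    import re

    # "12345.678901:" (global clock, [s].[us]) or "12345678901:" ([ns])
    TIMESTAMP_RE = re.compile(r"(\d+)(\.\d+)?:")

    def to_seconds(secs, frac):
        if frac is None:
            # No decimals: the clock reports [ns]; convert to [s].
            return float(secs) / 1e9
        return float(secs + frac)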
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Due to the recent patches to skip invalid lines using a regex, this test case
fails. Fix it by appending the invalid line to the end of the test trace.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Let's test that the parsed events have the proper line number reported
in their DataFrame.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Useful for joining DataFrames that have timestamp collisions or for
iterating through a group of DataFrames in line order.
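For example, assuming each parsed event DataFrame carries a
line-number column (called "__line" here for illustration):

    import pandas as pd

    df_a = pd.DataFrame({"val": [1, 3], "__line": [10, 30]},
                        index=[0.1, 0.1])
    df_b = pd.DataFrame({"val": [2], "__line": [20]}, index=[0.1])
    # The timestamps collide, yet the events can still be interleaved
    # in their original trace order via the line-number column:
    merged = pd.concat([df_a, df_b]).sort_values(by="__line")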
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: KP Singh <kpsingh@google.com>
|
|
ftrace: Improve error messages when failing to parse trace
javimerino: Added a test and fixed a typo.
|
|
base: Don't error on event field values containing '='
|
|
If a trace field value contains a '=', we currently get a ValueError('Too many
values to unpack'). Instead, let's only split on the first '='.
In practice if a field value contains a '=' it's probably because a kernel
developer typo'd a custom event like:
/* Note missing space between "%d" and "bar" */
trace_printk("my_broken_event: foo=%dbar=%d", foo, bar)
So I did consider raising an explicit 'malformed event field' error. But this
approach is more flexible in case someone really wanted to trace fields
containing strings with '=' in them.
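The fix boils down to a bounded split, e.g.:

    field = "foo=1bar=2"               # value itself contains '='
    key, value = field.split('=', 1)   # -> ("foo", "1bar=2")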
|
|
If you use the Parser to access events that Trappy understands, but that are not
present in the trace, you currently get an inscrutable exception when trying to
access `.loc[self._window[0]:]` on the empty DataFrame in _get_data_frame.
Ideally attempting to parse absent events would just return an empty DataFrame,
but then we don't know what columns it should have. So instead let's just raise
a more helpful error saying that the event is not present.
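A sketch of the check (the message wording is illustrative):

    if data_frame.empty:
        raise ValueError(
            "Event [{}] not found in the trace".format(event_name))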
|
|
Pandas DataFrame columns may be named with any hashable object.
|
|
This requires removing cpu_idle events from the trace_empty.txt, so that
the FTrace tests that use it are not broken.
|
|
The EAS patchset includes an enhancement of sched_migrate_task that
adds an enum field printed with __print_symbolic, which only gets
parsed in the trace.txt and not in the trace.raw.txt.
Remove the instances of sched_migrate_task from the trace_empty file
to keep the tests testing a trace that is "empty", that is, one in
which no event is parsed.
|
|
grammar: apply filters to data accesses
|
|
Change-Id: I85327148401972ec477fd6c43889a487bc8ae083
|
|
Parser() operations happen across the whole event. While it is
possible to filter the events and add them back to the trace object
with .add_parsed_event(), that is a kludge we can remove by bringing
the concept of filters from the plotters over here.
With this change, we can simplify this:

    ftrace = trappy.FTrace(trace_fname)
    sbt_dfr = ftrace.sched_boost_task.data_frame
    boost_task_rtapp = sbt_dfr[sbt_dfr.comm == rta_task_name]
    ftrace.add_parsed_event("boost_task_rtapp", boost_task_rtapp)
    analyzer = Analyzer(ftrace, {})
    analyzer.assertStatement("blah")

To:

    ftrace = trappy.FTrace(trace_fname)
    analyzer = Analyzer(ftrace, filters={"comm": rta_task_name})
    analyzer.assertStatement("blah")
This fixes #145
|
|
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
|
|
ILinePlot: only pass the necessary data when xlim is passed
|
|
xlim limits the x axis to a given range, but for ILinePlot we pass all
the data to dygraph and then let dygraph apply the window. That means
we embed a lot of useless data in the notebook and we waste time
parsing data that will never be plotted.
Improve xlim for ILinePlot so that it only embeds the data relevant to
the plot.
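Conceptually, the window is now applied before handing the data over
(a sketch, not the actual ILinePlot code):

    lo, hi = xlim
    # Only rows inside the window get shipped to dygraph; everything
    # else would be embedded in the notebook and never drawn.
    windowed = df[(df.index >= lo) & (df.index <= hi)]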
|
|
Plotter color everywhere v2
|
|
Now that we have color support in all plots, augment the signal spec to
have color as an additional third parameter. With this syntax, signal
"thermal:temp:255,0,255" plots the temp column of the thermal trace in
pink.
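Illustrative parsing of the extended "event:column:R,G,B" spec:

    signal = "thermal:temp:255,0,255"
    parts = signal.split(":")
    event, column = parts[0], parts[1]
    color = None
    if len(parts) == 3:
        # Optional third field: an R,G,B triple.
        color = [int(c) for c in parts[2].split(",")]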
|
|
Make the signals test more thorough.
|
|
The static plotter accepts a colors parameter to change the colors of
the lines. Let the ILinePlot accept the same argument with the same
syntax so that we can customize the colors of the plots in the same way
for all plots created by trappy.plotter.
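Usage sketch, assuming colors takes R,G,B triples like the signal spec
does (the accepted format follows the static plotter):

    import trappy

    trappy.ILinePlot(ftrace, signals=["thermal:temp"],
                     colors=[[255, 0, 255]]).view()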
|
|
Raise the right error when the column to plot is not specified
When you try to plot a dataframe without specifying the column, the
code that checks this fails with
    KeyError: "column"
Fix the check so that the appropriate error (ValueError: Column not
specified for DataFrame input) is raised.
|
|
Old systrace html files don't use "<!-- BEGIN TRACE -->" to mark the
beginning of the trace, or '<script class="trace-data"' for the raw
trace data.
This patch adds compatible markers for the old systrace format while
preserving support for the new one: it uses
"<title>Android System Trace</title>" to indicate the trace start and
" var linuxPerfData" for the trace data start.
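A sketch of the detection logic (variable names illustrative):

    in_trace = in_data = False
    for line in trace_file:
        if ("<!-- BEGIN TRACE -->" in line or
                "<title>Android System Trace</title>" in line):
            in_trace = True
        elif ('<script class="trace-data"' in line or
                " var linuxPerfData" in line):
            in_data = True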
Signed-off-by: Leo Yan <leo.yan@linaro.org>
|
|
The prototype of MultiTriggerAggregator makes aggfunc an optional
parameter (it defaults to None). Fix it so that callers of
MultiTriggerAggregator that don't specify an aggfunc actually work
instead of barfing:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-141-1fd320357650> in <module>()
----> 1 l = vector_agg.aggregate(level="cluster")
/usr/local/share/trappy/trappy/stats/Aggregator.pyc in aggregate(self, **kwargs)
138 for group in level_groups:
139 group = listify(group)
--> 140 level_res = self._aggfunc(self._result[group[0]], **kwargs)
141
142 for node in group[1:]:
TypeError: 'NoneType' object is not callable
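The guard amounts to something like this (a sketch against the
aggregate() loop shown in the traceback):

    if self._aggfunc is not None:
        level_res = self._aggfunc(self._result[group[0]], **kwargs)
    else:
        level_res = self._result[group[0]]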
|
|
Since the dawn of time we have had a "data_frame" variable in
ILinePlot that is actually defined as a pd.Series() but acts as a
dict. Rename it and turn the pd.Series into a dict, as that is simpler
and really is what this variable carries around.
|
|
Signed-off-by: Kapileshwar Singh <kpsingh@google.com>
|
|
This test checks the following:
- given two data series with different indexes
- ILinePlot will merge them using _fix_indexes
- the set of indexes of the merged series must be the union of the
sets of indexes of the initial series, as illustrated below
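In pandas terms, the property under test is (illustrative):

    import pandas as pd

    s1 = pd.Series([1, 2], index=[0.0, 1.0])
    s2 = pd.Series([3, 4], index=[0.5, 1.5])
    # An outer concat indexes the result on the union of the inputs:
    merged = pd.concat([s1, s2], axis=1)
    assert set(merged.index) == set(s1.index) | set(s2.index)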
|
|
612384c (ftrace: match comm names that have a '[' as part of their
name) fixed the parsing of special fields to also match names which
contain a '[', by forcing a task name to match up to 16 chars.
That regexp turns out not to be robust enough for systrace-collected
traces: not only can names contain the '[' char but, in general, they
can also be longer than 16 chars.
What happens is that, while the linux kernel truncates a task name
(comm) to 16 chars, the systrace tool tries its best to "fix" that by
reconstructing the truncated names. Specifically, this is done by the
"atrace" agent, e.g.
https://github.com/catapult-project/catapult/blob/0d71be6/systrace/systrace/agents/atrace_agent.py#L583
Thus, in traces collected via systrace/atrace, names can both contain
the '[' character and be longer than 16 chars.
This patch uses a more generic regexp, a refinement of the one
proposed in PR #90, which should finally allow matching task names of
any length and character set.
A meaningless but challenging task name has been added to the test to
prove the patch and ensure more robust coverage of this sensitive
code.
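The shape of the generic match (an illustrative sketch, not the exact
regexp in the patch):

    import re

    # Greedy match: the comm is everything up to the last '-' that
    # precedes the PID, so any length/charset (including '[') works.
    SPECIAL_RE = re.compile(
        r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]")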
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
|
|
SysTrace does not report the same metadata that trace-cmd report
generates. The "_cpus" attribute is required by plot_trace to properly
size the plot.
This patch ensures that the attribute is available by estimating the
number of CPUs from the events available in the trace. The
sched_switch event is always enabled, so it should be good enough for
calculating the number of CPUs.
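Illustrative estimate, assuming the parsed sched_switch DataFrame
carries the "__cpu" special field:

    df = trace.sched_switch.data_frame
    # CPU ids are zero-based: the count is the highest id seen plus 1.
    trace._cpus = int(df["__cpu"].max()) + 1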
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
|
|
For trace-cmd traces, we use the raw trace to parse sched_switch.
That's not possible for SysTrace traces, as we don't have access to
the raw data. Luckily SysTrace doesn't mangle the data as much as
trace-cmd's plugin does, so a simple string replace is enough to get
rid of the stray "==>".
|
|
The SysTrace class has the same interface as FTrace. SysTrace doesn't
support raw traces. The only special thing that we have to do is skip
all the HTML in the file and parse the actual trace.
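From the user's point of view it is a drop-in replacement, e.g.:

    import trappy

    trace = trappy.SysTrace("trace.html")
    df = trace.sched_switch.data_frame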
|
|
If one of the dataframes passed to ILinePlot has duplicate indexes,
view() fails with
File "/home/javi/src/trappy/trappy/tests/trappy/plotter/ILinePlot.py", line 151, in view
self._plot(self._attr["permute"], test)
File "/home/javi/src/trappy/trappy/tests/trappy/plotter/ILinePlot.py", line 199, in _plot
data_frame = self._fix_indexes(data_frame)
File "/home/javi/src/trappy/trappy/tests/trappy/plotter/ILinePlot.py", line 243, in _fix_indexes
merged_df = pd.concat(data_frame.get_values(), axis=1)
File "/usr/lib/python2.7/dist-packages/pandas/tools/merge.py", line 813, in concat
return op.get_result()
File "/usr/lib/python2.7/dist-packages/pandas/tools/merge.py", line 966, in get_result
tmpdf = DataFrame(data, index=index)
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 226, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 363, in _init_dict
dtype=dtype)
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 5163, in _arrays_to_mgr
arrays = _homogenize(arrays, index, dtype)
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 5465, in _homogenize
v = v.reindex(index, copy=False)
File "/usr/lib/python2.7/dist-packages/pandas/core/series.py", line 2268, in reindex
return super(Series, self).reindex(index=index, **kwargs)
File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1962, in reindex
method, fill_value, copy).__finalize__(self)
File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1981, in _reindex_axes
fill_value=fill_value, copy=copy, allow_dups=False)
File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 2073, in _reindex_with_indexers
copy=copy)
File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3503, in reindex_indexer
self.axes[axis]._can_reindex(indexer)
File "/usr/lib/python2.7/dist-packages/pandas/core/index.py", line 2086, in _can_reindex
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
Fix it by filtering the DataFrames through handle_duplicate_index()
before merging them.
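That is, every DataFrame goes through the helper before pd.concat(),
roughly:

    from trappy.utils import handle_duplicate_index

    # Nudge repeated index entries so that reindexing succeeds:
    data_frame = handle_duplicate_index(data_frame)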
Change-Id: I34e6af523d14e35c3a17652924fbaa5989c1243d
|
|
Let view() parse the test parameter so that we can test ILinePlot from
the testsuite. It's not a complete solution, but it's good enough to
get us started.
Change-Id: Ie0ce7a541b8714b579118143208b818277a585a6
|
|
Test fixes
|
|
When you try to plot a dataframe with only one entry in its index,
matplotlib complains:
---8<---
/usr/lib/python2.7/dist-packages/matplotlib/axes/_base.py:2787: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=0.0, right=0.0
'left=%s, right=%s') % (left, right))
---8<---
Change the timestamp of the second event to make it more realistic and
make sure that the dataframe that is plotted has more than one row.
|
|
pandas 0.17.1 has a more accurate assert_series_equal() that makes
test_filter_prev_values fail because the index name is different:
======================================================================
FAIL: Trigger works with a filter that depends on previous values of the same pivot
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/javi/src/trappy/tests/test_stats.py", line 227, in test_filter_prev_values
assert_series_equal(expected, trigger.generate("blank"))
File "/usr/lib/python2.7/dist-packages/pandas/util/testing.py", line 924, in assert_series_equal
obj='{0}.index'.format(obj))
File "/usr/lib/python2.7/dist-packages/pandas/util/testing.py", line 694, in assert_index_equal
assert_attr_equal('names', left, right, obj=obj)
File "/usr/lib/python2.7/dist-packages/pandas/util/testing.py", line 729, in assert_attr_equal
left_attr, right_attr)
File "/usr/lib/python2.7/dist-packages/pandas/util/testing.py", line 819, in raise_assert_detail
raise AssertionError(msg)
AssertionError: Series.index are different
Attribute "names" are different
[left]: [u'Time']
[right]: [None]
----------------------------------------------------------------------
The series that the trigger generates doesn't have a name for its
index. Remove the named index from the expected output so that the
test passes again.
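i.e., something like:

    # Drop the index name so the expected series matches what
    # trigger.generate() returns:
    expected.index.name = None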
|
|
If tar fails it's a test error. Don't blindly ignore it.
|
|
990f925224107 ("run: update special fields regexp to parse task names
with a space") fixed parsing of task names that had spaces by assuming
that no task would have a "[" as part of its name. The day has come:
we have found a task with a "[".
Set the regexp to match anything up to 16 characters. The kernel seems
to limit the task name to 16 characters, so this should be safe from
now on (until we find another task that breaks the regexp, that is).
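The bounded match looks roughly like this (illustrative, not the exact
pattern):

    import re

    # comm is capped at 16 chars by the kernel, so bound the match:
    comm_re = re.compile(
        r"(?P<comm>.{1,16})-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]")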
|
|
Calculating the timestamp for the index of events ourselves is not
needed. Tell trappy not to normalize the time, and that way we don't
have to do the error-prone calculation ourselves.
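i.e. (sketch):

    import trappy

    # Keep absolute trace timestamps instead of rebasing them to zero:
    ftrace = trappy.FTrace("trace.txt", normalize_time=False)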
|
|
timestap -> timestamp. It was working because a previous timestamp was
present.
|