Skip to content

[Diangostics][dotnet-trace] Add collect-linux verb #47894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mdh1418
Copy link
Member

@mdh1418 mdh1418 commented Aug 7, 2025

Summary

Add new collect-linux verb to utilize new user_events based event tracing.


Internal previews

📄 File 🔗 Preview link
docs/core/diagnostics/dotnet-trace.md dotnet-trace performance analysis utility

@mdh1418 mdh1418 requested review from tommcdon and a team as code owners August 7, 2025 20:53
@dotnetrepoman dotnetrepoman bot added this to the August 2025 milestone Aug 7, 2025
@mdh1418 mdh1418 marked this pull request as draft August 7, 2025 21:16
@mdh1418 mdh1418 force-pushed the diagnostics_dotnet_trace_collect_linux branch from bc293c0 to e1a3ec7 Compare August 7, 2025 21:19
@@ -43,7 +43,7 @@ The `dotnet-trace` tool:
* Is a cross-platform .NET Core tool.
* Enables the collection of .NET Core traces of a running process without a native profiler.
* Is built on [`EventPipe`](./eventpipe.md) of the .NET Core runtime.
* Delivers the same experience on Windows, Linux, or macOS.
* On Linux, provides additional integration with kernel user_events for native tracing tool compatibility.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to tweak the wording a bit later, but not focusing on this for the moment :)

### Synopsis

```dotnetcli
dotnet-trace collect-linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can prune away some of these options

  • buffersize - If OneCollect can reasonably pick the size then we may not need an option to explicitly set it.
  • diagnostics-port - I think we'd only need this in some advanced scenarios. Given that we could be profiling multiple processes maybe we'd even need multiple ports? I'd suggest lets leave it out for now and wait to see what happens.
  • resume-runtime - if we don't have diagnostics port then we shouldn't need this.
  • event-filters - I'd suggest we leave this out for now to keep things simpler.
  • tracepoint-configs - can we leave this out? If the user cares about the tracepoint names that imples they are going to use some other tool besides dotnet-trace to record the events. But if that is true I'm not sure why they'd want dotnet-trace to be creating a nettrace file as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a scenario where users are capturing events with another tool, but they want to start capturing runtime events from their .NET app as well, and just use dotnet-trace as the medium to start a user_events EventPipeSession?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can conceive of that scenario, but I'm suggesting we don't do anything to proactively support it right now to keep the scope smaller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pruned and grouped the options into functionality (the previous order already deviates from dotnet-trace collect --help, and unless there's a reason to keep the options in some sort of alphabetical order, I think the functional order is easier to understand)

[--diagnostic-port]
[-o|--output <trace-file-path>]
[-p|--process-id <pid>]
[--profile <profile-name>]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to decide what profiles are available and which events they each collect. I expect this is mostly the same as for the 'collect' verb, but cpu-sampling probably should be different. We also might want a thread-time profile that collects context switches.

If we do update these profiles, we should strongly consider also renaming the highly misleading "cpu-sampling" profile for the collect verb. Currently that profile collects thread-time information, not CPU samples.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lightly updated the profiles for both collect and collect-linux. Still up for refinement once I have a better understanding of how to mix/match the .NET events and linux perf events. I'm wondering if we should allow --profiles if we want to offer several orthogonal profiles and allow customers to mix/match. We already have some logic to mix a profile with the providers/keywords as well.

# All enabled events from MyCustomProvider will be written to MyCustomProvider_custom_events
```

- **`--kernel-events <list-of-kernel-events>` (optional)**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should work with Beau to figure out what kind of event names can be supported. For example https://man7.org/linux/man-pages/man1/perf-record.1.html shows a whole bunch of different things can be specified as an event and I have no idea what part of that OneCollect handles.

I'm hoping we can at least support anything in /sys/kernel/tracing/available_events as well as the symbolic PMU events like cpu-cycles.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like all events under /sys/kernel/tracing/available_events can be enabled, so I changed the wording to suggest that all of those events found there or found categorically under /sys/kernel/tracing/events/ can be enabled through this option --perf-event-tracepoints.

| `net` | Network-related events | `net:netif_rx`, `net:net_dev_xmit` |
| `fs` | Filesystem I/O events | `ext4:*`, `vfs:*` |
| `mm` | Memory management events | `kmem:*`, `vmscan:*` |

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone specified "sched:sched_wakeup,sched:sched_switch" what provider name, event name, and field lists would we expect to show up in the nettrace file? (This level of detail may not go into the docs, but we should understand it ourselves to decide what info should go in the docs)

- **perf**: Use `perf list user_events*` to see available events
- **System monitoring tools**: Any tool that can consume Linux tracepoints

### Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lower in this doc are some examples of using dotnet-trace collect in various situations. No need just yet, but we'd probably want to update or add to those examples for the collect-linux functionality once we've clarified what it will be.

Copy link
Contributor

PoliCheck Scan Report

The following report lists PoliCheck issues in PR files. Before you merge the PR, you must fix all severity-1 issues. Other issues are also a high priority. The AI Review Details column lists suggestions for either removing or replacing the terms. If you find a false positive result, mention it in a PR comment and include this text: #policheck-false-positive. This feedback helps reduce false positives in future scans.

✅ No issues found

More information about PoliCheck

Information: PoliCheck | Severity Guidance | Term
For any questions: Try searching the learn.microsoft.com contributor guides or post your question in the Learn support channel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants