Analysis of IETF mailing lists in 2020
by Stephen McQuistin • Tuesday 20 April 2021
In 2020, there were 118,537 unique e-mails sent across IETF mailing lists by 3,616 addresses. In this blog post, we briefly outline highlights and trends in the dataset, analyse the ietf@ietf.org list, and describe the raw dataset.
Highlights and trends
In 2020, we saw the following highlights:
- There was activity (i.e., at least one e-mail sent) across 335 mailing lists, with 16,612 e-mails to the quic-issues list, making up 12.5% of all e-mails in the dataset
- 17,045 e-mails were sent from notifications@github.com and noreply@github.com: these addresses were the first and third biggest contributors of e-mail, respectively. This highlights the growing role that GitHub plays in managing working group activity.
- Addresses were active on an average of 3.3 mailing lists
- Active addresses sent 33 e-mails on average
Figures 1 and 2 show CDFs of the e-mail volume and list participation, respectively, for each address. These figures show that participation across IETF mailing lists is heavy-tailed: 93.4% of addresses contributed fewer than 100 e-mails each, with 71.2% sending fewer than 10 each. Similarly, approximately 80% of addresses were active in 3 or fewer mailing lists.
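As a rough illustration of how fractions like those above can be read off per-address e-mail counts, the following minimal sketch computes the share of addresses that fall below a threshold. The function name `fraction_below` and the toy sender list are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def fraction_below(counts, threshold):
    """Return the fraction of senders whose e-mail count is strictly below threshold."""
    values = list(counts.values())
    if not values:
        return 0.0
    return sum(1 for value in values if value < threshold) / len(values)


# Toy data (hypothetical): one entry per e-mail, holding the sender address.
senders = (
    ["a@example.org"] * 120
    + ["b@example.org"] * 12
    + ["c@example.org"] * 3
    + ["d@example.org"]
)
emails_per_address = Counter(senders)

print(fraction_below(emails_per_address, 100))  # 0.75
print(fraction_below(emails_per_address, 10))   # 0.5
```

Evaluating the same function over a range of thresholds gives the points of an empirical CDF like the ones plotted in the figures.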
Matching e-mail addresses to Datatracker profiles
The raw IETF mail archive dataset can be noisy: it contains automated e-mails from notification addresses, and participants split their activity across multiple e-mail accounts. To tidy up the dataset, we attempted to match e-mail senders with Datatracker profiles. Names and e-mail addresses were extracted from the From header of each e-mail and matched against the Datatracker. Many exact matches were found using e-mail addresses; where this was not possible, a number of heuristics were applied to the name to find a match. In some cases, no Datatracker account could be matched to a given sender.
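The two-stage matching described above can be sketched as follows. This is only an illustration under stated assumptions: `PROFILES` is a hypothetical stand-in for Datatracker profile data, and the single name heuristic shown (ignoring case, punctuation, and word order) is an assumption, since the exact heuristics used are not listed here.

```python
from email.utils import parseaddr

# Hypothetical stand-in for Datatracker profiles: person name -> known addresses.
PROFILES = {
    "Alice Example": {"alice@example.org", "alice@example.com"},
    "Bob Sample": {"bob@example.net"},
}


def normalise_name(name):
    """Lower-case a display name, drop punctuation, and sort its parts.

    This makes 'Example, Alice' compare equal to 'Alice Example'.
    It is an assumed heuristic, chosen for illustration only.
    """
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


# Lookup tables built once from the profile data.
EMAIL_INDEX = {addr: person for person, addrs in PROFILES.items() for addr in addrs}
NAME_INDEX = {normalise_name(person): person for person in PROFILES}


def match_sender(from_header):
    """Return the matched person for a From header value, or None if unmatched."""
    name, address = parseaddr(from_header)
    # Stage 1: exact match on the e-mail address.
    person = EMAIL_INDEX.get(address.lower())
    if person is not None:
        return person
    # Stage 2: fall back to a heuristic match on the display name.
    if name:
        return NAME_INDEX.get(normalise_name(name))
    return None


print(match_sender("Alice Example <alice@example.org>"))       # Alice Example
print(match_sender('"Example, Alice" <alice@other.example>'))  # Alice Example
print(match_sender("notifications@github.com"))                # None
```

Senders that return `None` here correspond to the cases where no Datatracker account could be matched.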
In total, 88,662 e-mails were sent by participants matched to their Datatracker profile, representing 74.8% of the total e-mails in the dataset. We found that 2,088 people (i.e., Datatracker users) sent e-mails across 319 mailing lists.
Table 1: Top 5 e-mail senders (Datatracker-mapped dataset)

Name | E-mail count (percentage of all e-mail) |
---|---|
Martin Thomson | 6577 (7.42%) |
Jana Iyengar | 2024 (2.28%) |
Michael Richardson | 1629 (1.84%) |
Mike Bishop | 1492 (1.68%) |
Carsten Bormann | 1253 (1.41%) |

Table 2: Top 5 mailing lists (Datatracker-mapped dataset)

List name | E-mail count (percentage of all e-mail) |
---|---|
quic-issues | 12884 (13.06%) |
ietf | 5356 (5.43%) |
ipv6 | 4247 (4.31%) |
last-call | 2736 (2.77%) |
dmarc | 2008 (2.04%) |

Tables 1 and 2 show the top 5 e-mail senders and mailing lists, respectively. Table 1 illustrates the purpose of matching senders to Datatracker profiles: a significant portion of the e-mail attributed to the top participants was sent via GitHub issues, and was previously aggregated under the @github.com addresses.
Figures 3 and 4 replot Figures 1 and 2, but across the Datatracker-mapped e-mail dataset. Figure 4 highlights that 80% of people are active (i.e., send at least one e-mail) on 5 or fewer mailing lists.
Analysing ietf@ietf.org
The dataset also provides insight into the ietf@ietf.org mailing list. As the general IETF discussion list, it should ideally be representative of the broader IETF community. We found that 322 people, 15.4% of the 2,088 people in the dataset, sent at least one e-mail to the ietf@ietf.org list. In total, 5,356 e-mails were sent to the ietf@ietf.org list in the Datatracker-mapped dataset. Of those, 29.6%, or 1,589 e-mails, were sent by the 10 most active participants; more than 50% of the e-mails sent to the ietf@ietf.org list were sent by the top 25 contributors.
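Concentration figures of this kind (the share sent by the top 10, or the number of contributors needed to reach 50%) can be derived from per-person counts for a single list. The sketch below shows one way to do so; the function names and the toy counts are hypothetical and do not come from the dataset.

```python
def top_k_share(counts, k):
    """Fraction of all e-mails sent by the k most active senders."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sorted(counts.values(), reverse=True)[:k]
    return sum(top) / total


def senders_needed(counts, share):
    """Smallest number of top senders whose e-mails reach the given share of the total."""
    total = sum(counts.values())
    running = 0
    for n, value in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += value
        if running >= share * total:
            return n
    return 0


# Toy per-person counts for one list (hypothetical).
list_counts = {"p1": 50, "p2": 30, "p3": 10, "p4": 5, "p5": 5}

print(top_k_share(list_counts, 2))       # 0.8
print(senders_needed(list_counts, 0.5))  # 1
```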
Figure 5 highlights this difference in behaviour, as compared with wider contribution trends. As shown, people who participate in the ietf@ietf.org list tend to send greater volumes of e-mail to that list.
Raw data tables
To allow for further analysis, we provide the raw data tables used in the analysis above. Each data table is a tab-separated file. Data tables are provided for both the address-based and the Datatracker-mapped, person-based datasets described.
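For readers who want to work with the tables programmatically, a tab-separated file can be read with Python's standard `csv` module. The sketch below assumes that each file starts with a header row; the column names `address` and `email_count` are hypothetical placeholders and may not match the real tables.

```python
import csv
import io


def read_table(handle):
    """Read a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical sample content; the real column names may differ.
sample = "address\temail_count\nsomeone@example.org\t12\nother@example.org\t3\n"

rows = read_table(io.StringIO(sample))
total = sum(int(row["email_count"]) for row in rows)
print(total)  # 15
```

To read a downloaded file instead of the in-memory sample, pass a handle opened with `open(path, newline="", encoding="utf-8")` to `read_table`.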
All of the raw data tables can be downloaded here. Each data table is described below.