Unix Email

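Create a directed graph, allowing multiple edges and self loops, from a unix mailbox. The nodes are email addresses, and each edge points from a sender to a recipient. The edge data is a Python email.message.Message object that contains all of the email message data.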
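As a minimal sketch of how header values turn into sender and recipient nodes, the snippet below runs `parseaddr` and `getaddresses` from the standard `email.utils` module on a few header strings. The header values are made up for illustration and are not taken from the sample mailbox.

```python
from email.utils import getaddresses, parseaddr

# Illustrative header values; not taken from unix_email.mbox.
from_header = "Alice <alice@edu>"
to_headers = ["Bob <bob@gov>, ted@com"]
cc_headers = ["Carol <carol@gov>"]

# parseaddr splits one header value into (display name, address).
sender_name, sender_addr = parseaddr(from_header)

# getaddresses accepts a list of header values and returns every
# (display name, address) pair found in them.
recipients = getaddresses(to_headers + cc_headers)

# One (sender, recipient) pair per directed edge.
edge_pairs = [(sender_addr, addr) for _name, addr in recipients]
print(edge_pairs)
```

Each pair corresponds to one directed edge, so a message with three recipients contributes three edges.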

This example shows how a MultiDiGraph can hold arbitrary Python objects as edge data (here, one email message per edge).
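A small sketch of that idea, independent of any mailbox file: two messages are built in memory with `email.message.EmailMessage` and each is stored on its own parallel edge. The helper `make_message`, the addresses, and the subjects are hypothetical and only serve the illustration.

```python
from email.message import EmailMessage

import networkx as nx


def make_message(sender, recipient, subject):
    """Build a small in-memory message (illustrative data only)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    return msg


G = nx.MultiDiGraph()
first = make_message("alice@edu", "bob@gov", "NetworkX")
second = make_message("alice@edu", "bob@gov", "NetworkX, again")

# Two messages between the same ordered pair become two parallel edges,
# each carrying its own message object as edge data.
G.add_edge("alice@edu", "bob@gov", message=first)
G.add_edge("alice@edu", "bob@gov", message=second)

subjects = [d["message"]["Subject"] for _, _, d in G.edges(data=True)]
print(G.number_of_edges("alice@edu", "bob@gov"), subjects)
```

The graph stores references to the message objects rather than copies, so the full message remains reachable from each edge.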

The script reads a sample unix mailbox named "unix_email.mbox" from the working directory.
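If the sample file is not at hand, a small stand-in mailbox can be written with the standard `mailbox` module. This is only a sketch: the file name `demo.mbox`, the temporary directory, and the two messages are assumptions for illustration, not the contents of the real sample file.

```python
import mailbox
import os
import tempfile

# Hypothetical stand-in data; not the contents of unix_email.mbox.
demo_messages = [
    ("alice@edu", "bob@gov", "NetworkX"),
    ("bob@gov", "alice@edu", "Re: NetworkX"),
]

path = os.path.join(tempfile.mkdtemp(), "demo.mbox")
box = mailbox.mbox(path)  # creates the file if it does not exist
try:
    for sender, recipient, subject in demo_messages:
        msg = mailbox.mboxMessage()
        msg["From"] = sender
        msg["To"] = recipient
        msg["Subject"] = subject
        msg.set_payload("demo body\n")
        box.add(msg)
finally:
    box.close()  # flushes pending writes and closes the file

# Read the file back to confirm that both messages were stored.
reader = mailbox.mbox(path)
stored = [(m["From"], m["To"], m["Subject"]) for m in reader]
reader.close()
print(stored)
```

Pointing `mailbox.mbox(...)` in the script at such a file would let the rest of the example run on the stand-in data.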

[Figure: plot of the unix email graph]

Script output:
From: alice@edu To: bob@gov Subject: NetworkX
From: bob@gov To: alice@edu Subject: Re: NetworkX
From: bob@gov To: ted@com Subject: Re: Graph package in Python?
From: ted@com To: bob@gov Subject: Graph package in Python?
From: ted@com To: bob@gov Subject: get together for lunch to discuss Networks?
From: ted@com To: carol@gov Subject: get together for lunch to discuss Networks?
From: ted@com To: alice@edu Subject: get together for lunch to discuss Networks?

from email.utils import getaddresses, parseaddr
import mailbox

import matplotlib.pyplot as plt
import networkx as nx

# unix mailbox recipe
# see https://docs.pythonlang.cn/3/library/mailbox.html


def mbox_graph():
    mbox = mailbox.mbox("unix_email.mbox")  # parse unix mailbox

    G = nx.MultiDiGraph()  # create empty graph

    # parse each message and build the graph
    for msg in mbox:  # msg is a mailbox.mboxMessage (an email.message.Message subclass)
        (source_name, source_addr) = parseaddr(msg["From"])  # sender
        # get all recipients
        # see https://docs.pythonlang.cn/3/library/email.html
        tos = msg.get_all("to", [])
        ccs = msg.get_all("cc", [])
        resent_tos = msg.get_all("resent-to", [])
        resent_ccs = msg.get_all("resent-cc", [])
        all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs)
        # now add the edges for this mail message
        for target_name, target_addr in all_recipients:
            G.add_edge(source_addr, target_addr, message=msg)

    return G


G = mbox_graph()

# print edges with message subject
for u, v, d in G.edges(data=True):
    print(f"From: {u} To: {v} Subject: {d['message']['Subject']}")

pos = nx.spring_layout(G, iterations=10, seed=227)
nx.draw(G, pos, node_size=0, alpha=0.4, edge_color="r", font_size=16, with_labels=True)
ax = plt.gca()
ax.margins(0.08)
plt.show()
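Because every message is its own edge, message counts fall out of ordinary degree queries. The sketch below rebuilds the seven sender and recipient pairs from the script output, using plain subject strings as stand-ins for the message objects, and counts messages per address and per ordered pair.

```python
from collections import Counter

import networkx as nx

# Sender, recipient, subject triples copied from the script output;
# plain strings stand in for the email message objects.
edges = [
    ("alice@edu", "bob@gov", "NetworkX"),
    ("bob@gov", "alice@edu", "Re: NetworkX"),
    ("bob@gov", "ted@com", "Re: Graph package in Python?"),
    ("ted@com", "bob@gov", "Graph package in Python?"),
    ("ted@com", "bob@gov", "get together for lunch to discuss Networks?"),
    ("ted@com", "carol@gov", "get together for lunch to discuss Networks?"),
    ("ted@com", "alice@edu", "get together for lunch to discuss Networks?"),
]

G = nx.MultiDiGraph()
for sender, recipient, subject in edges:
    G.add_edge(sender, recipient, subject=subject)

# Out-degree counts outgoing edges (one per recipient of each message);
# in-degree counts incoming edges.
sent = dict(G.out_degree())
received = dict(G.in_degree())

# Parallel edges between the same ordered pair are separate messages.
pair_counts = Counter((u, v) for u, v in G.edges())

print(sent)
print(received)
print(pair_counts[("ted@com", "bob@gov")])
```

A plain `DiGraph` would keep only one edge per ordered pair, so the second message from ted@com to bob@gov would overwrite the first; that is why the example uses a `MultiDiGraph`.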

Total running time of the script: (0 minutes 0.133 seconds)

Gallery generated by Sphinx-Gallery