可视化对于大家来说确实是有关的,因为确实是直观的,每一组大数据如果可以用可视化进行展示的话可以让大家豁然开朗。但在另外一些场景中,辅之以少量的文字提示(textual cue)
和标签是必不可少的。虽然最基本的注释(annotation)
类型可能只是坐标轴标题与图标题,但注释可远远不止这些。让我们可视化一些数据,看看如何通过添加注释来更恰当地表达信息。
首先导入画图需要用到的一些函数:
import matplotlib.pyplot as plt import matplotlib as mpl plt.style.use('seaborn-whitegrid') import numpy as np import pandas as pd
1 案例:节假日对美国出生率的影响
数据可以在 https://github.com/jakevdp/data-CDCbirths 下载,数据类型如下:
用清洗方法处理数据,然后画出结果。
日均出生人数统计图
births = pd.read_csv('births.csv') quartiles = np.percentile(births['births'], [25, 50, 75]) mu, sig = quartiles[1], 0.74 * (quartiles[2] - quartiles[0]) births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)') births['day'] = births['day'].astype(int) births.index = pd.to_datetime(10000 * births.year + 100 * births.month + births.day, format='%Y%m%d'<strong>本文来源gaodai#ma#com搞@@代~&码网</strong>) births_by_date = births.pivot_table('births', [births.index.month, births.index.day]) births_by_date.index = [pd.datetime(2012, month, day) for (month, day) in births_by_date.index] fig, ax = plt.subplots(figsize=(12, 4)) births_by_date.plot(ax=ax);
import matplotlib.pyplot as plt import matplotlib as mpl plt.style.use('seaborn-whitegrid') import numpy as np import pandas as pd births = pd.read_csv('C:\\Users\\Y\\Desktop\\data-CDCbirths-master\\births.csv') quartiles = np.percentile(births['births'], [25, 50, 75]) mu, sig = quartiles[1], 0.74 * (quartiles[2] - quartiles[0]) births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)') births['day'] = births['day'].astype(int) births.index = pd.to_datetime(10000 * births.year + 100 * births.month + births.day, format='%Y%m%d') births_by_date = births.pivot_table('births', [births.index.month, births.index.day]) births_by_date.index = [pd.datetime(2012, month, day) for (month, day) in births_by_date.index] fig, ax = plt.subplots(figsize=(12, 4)) births_by_date.plot(ax=ax); plt.show()
为日均出生人数统计图添加注释
在用这样的图表达观点时,如果可以在图中增加一些注释,就更能吸引读者的注意了。可以通过 plt.text / ax.text
命令手动添加注释,它们可以在具体的 x / y 坐标点上放上文字
fig, ax = plt.subplots(figsize=(12, 4)) births_by_date.plot(ax=ax) # 在图上增加文字标签 style = dict(size=10, color='gray') ax.text('2012-1-1', 3950, "New Year's Day", **style) ax.text('2012-7-4', 4250, "Independence Day", ha='center', **style) ax.text('2012-9-4', 4850, "Labor Day", ha='center', **style) ax.text('2012-10-31', 4600, "Halloween", ha='right', **style) ax.text('2012-11-25', 4450, "Thanksgiving", ha='center', **style) ax.text('2012-12-25', 3850, "Christmas ", ha='right', **style) # 设置坐标轴标题 ax.set(title='USA births by day of year (1969-1988)', ylabel='average daily births') # 设置x轴刻度值,让月份居中显示 ax.xaxis.set_major_locator(mpl.dates.MonthLocator()) ax.xaxis.set_minor_locator(mpl.dates.MonthLocator(bymonthday=15)) ax.xaxis.set_major_formatter(plt.NullFormatter()) ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%h'));
import matplotlib.pyplot as plt import matplotlib as mpl plt.style.use('seaborn-whitegrid') import numpy as np import pandas as pd births = pd.read_csv('C:\\Users\\Y\\Desktop\\data-CDCbirths-master\\births.csv') quartiles = np.percentile(births['births'], [25, 50, 75]) mu, sig = quartiles[1], 0.74 * (quartiles[2] - quartiles[0]) births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)') births['day'] = births['day'].astype(int) births.index = pd.to_datetime(10000 * births.year + 100 * births.month + births.day, format='%Y%m%d') births_by_date = births.pivot_table('births', [births.index.month, births.index.day]) births_by_date.index = [pd.datetime(2012, month, day) for (month, day) in births_by_date.index] fig, ax = plt.subplots(figsize=(12, 4)) births_by_date.plot(ax=ax) # 在图上增加文字标签 style = dict(size=10, color='gray') ax.text('2012-1-1', 3950, "New Year's Day", **style) ax.text('2012-7-4', 4250, "Independence Day", ha='center', **style) ax.text('2012-9-4', 4850, "Labor Day", ha='center', **style) ax.text('2012-10-31', 4600, "Halloween", ha='right', **style) ax.text('2012-11-25', 4450, "Thanksgiving", ha='center', **style) ax.text('2012-12-25', 3850, "Christmas ", ha='right', **style) # 设置坐标轴标题 ax.set(title='USA births by day of year (1969-1988)', ylabel='average daily births') # 设置x轴刻度值,让月份居中显示 ax.xaxis.set_major_locator(mpl.dates.MonthLocator()) ax.xaxis.set_minor_locator(mpl.dates.MonthLocator(bymonthday=15)) ax.xaxis.set_major_formatter(plt.NullFormatter()) ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%h')); plt.show()
ax.text
方法需要一个 x 轴坐标、一个 y 轴坐标、一个字符串和一些可选参数,比如文字的颜色、字号、风格、对齐方式以及其他文字属性。这里用了 ha='right'
与 ha='center'
,ha
是水平对齐方式(horizonal alignment
)的缩写。关于配置参数的更多信息,请参考plt.text()
与 mpl.text.Text()
的程序文档。