Performance and Measurement part 1

Measurement

example 1

Producing Wrong Data Without Doing Anything Obviously Wrong! Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. ASPLOS 2009.

445 references


A sample blog post about this paper.

violin plots

data

A dataset of 2,185 CPUs and 2,668 GPUs, intended to help researchers understand development trends in CPUs and GPUs. Set up by Kaggle.

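# Libraries and dataset
import seaborn as sns
import pandas as pd
import numpy as np

# Load the chip dataset and preview the first rows
df = pd.read_csv('images/chip_dataset.csv')
print(df.head())

# Use a pastel palette for the plots that follow
sns.set_palette("pastel")

# Violin plot of log clock frequency per vendor, split by chip type
sns.violinplot(x=df["Vendor"], y=np.log(df["Freq (MHz)"]), hue=df['Type'])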
   Unnamed: 0                  Product Type Release Date  Process Size (nm)  \
0           0      AMD Athlon 64 3500+  CPU   2007-02-20               65.0   
1           1         AMD Athlon 200GE  CPU   2018-09-06               14.0   
2           2     Intel Core i5-1145G7  CPU   2020-09-02               10.0   
3           3    Intel Xeon E5-2603 v2  CPU   2013-09-01               22.0   
4           4  AMD Phenom II X4 980 BE  CPU   2011-05-03               45.0   

   TDP (W)  Die Size (mm^2)  Transistors (million)  Freq (MHz)  Foundry  \
0     45.0             77.0                  122.0      2200.0  Unknown   
1     35.0            192.0                 4800.0      3200.0  Unknown   
2     28.0              NaN                    NaN      2600.0    Intel   
3     80.0            160.0                 1400.0      1800.0    Intel   
4    125.0            258.0                  758.0      3700.0  Unknown   

  Vendor  FP16 GFLOPS  FP32 GFLOPS  FP64 GFLOPS  
0    AMD          NaN          NaN          NaN  
1    AMD          NaN          NaN          NaN  
2  Intel          NaN          NaN          NaN  
3  Intel          NaN          NaN          NaN  
4    AMD          NaN          NaN          NaN  

Apply np.log to the Freq (MHz) column because the values span a wide range (100 to 4,700 MHz in this dataset).
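A small sketch of what the log transform does to that range. It uses only the minimum and maximum frequencies reported by df.describe(); the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Smallest and largest clock frequencies reported by df.describe(), in MHz.
freq_mhz = np.array([100.0, 4700.0])

# Natural log, as in the violin plot. np.log10 is an alternative whose
# axis values are easier to read as powers of ten.
log_freq = np.log(freq_mhz)

raw_ratio = freq_mhz[1] / freq_mhz[0]   # multiplicative spread of the raw values
log_span = log_freq[1] - log_freq[0]    # additive spread after the transform

print(f"raw ratio: {raw_ratio:.0f}x, log span: {log_span:.2f}")
```

On the log scale, equal distances correspond to equal ratios, so the slow and fast chips both get room on the axis.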

Set a pastel palette with sns.set_palette("pastel"). These colors make it easier to see the parts of the violin plot.

Hue parameter: the hue parameter differentiates between the types of chips (the Type column). Each hue level is drawn in its own color from the palette.


I might want to take date into account in these plots


A violin plot shows density curves. The width of the violin at a given value approximates the frequency of data points near that value.

Best for comparing distributions

Consider ordering the groups.
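One possible way to order the groups is by their median, sketched below. The sample rows are made up for illustration (only the column names follow the dataset), and sorting by median is an assumption rather than something the notes prescribe.

```python
import pandas as pd

# Tiny made-up sample; vendor names and numbers are illustrative only.
sample = pd.DataFrame({
    "Vendor": ["AMD", "AMD", "Intel", "Intel", "NVIDIA", "NVIDIA"],
    "Freq (MHz)": [2200.0, 3200.0, 2600.0, 1800.0, 900.0, 1100.0],
})

# Median frequency per vendor, sorted from lowest to highest.
medians = sample.groupby("Vendor")["Freq (MHz)"].median().sort_values()
order = medians.index.tolist()
print(order)

# The list can then be passed to seaborn through the `order` parameter, e.g.
# sns.violinplot(data=sample, x="Vendor", y="Freq (MHz)", order=order)
```

With the groups sorted, neighboring violins differ in a consistent direction, which makes shifts in the distributions easier to spot.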

The details

  1. The white dot represents the median.
  2. The thick gray bar in the center represents the inter-quartile range.
  3. The thin gray line represents the rest of the distribution, except for points that are determined to be "outliers" using a method that is a function of the inter-quartile range.
  4. On each side of the gray line is a kernel density estimation to show the distribution shape of the data. Wider sections of the violin plot represent a higher probability that members of the population will take on the given value; the skinnier sections represent a lower probability.
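The four pieces listed above can be computed by hand, which may help when reading a violin plot. This is a minimal sketch on synthetic data, not seaborn's implementation: the 1.5 × IQR outlier rule (Tukey's convention) and the Gaussian kernel with Scott's-rule bandwidth are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a column of log frequencies.
values = rng.normal(loc=7.0, scale=0.5, size=200)

# 1 and 2: median and inter-quartile range (the dot and the thick bar).
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# 3: thin line, spanning the points within 1.5 * IQR of the quartiles
#    (assumed outlier rule).
inside = values[(values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)]
whisker_low, whisker_high = inside.min(), inside.max()

# 4: kernel density estimate with a Gaussian kernel.
#    Bandwidth from Scott's rule for one-dimensional data: std * n^(-1/5).
n = len(values)
bandwidth = values.std(ddof=1) * n ** (-1 / 5)
grid = np.linspace(values.min(), values.max(), 100)
z = (grid[:, None] - values[None, :]) / bandwidth
density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

print(f"median={median:.2f}, IQR={iqr:.2f}, "
      f"line=[{whisker_low:.2f}, {whisker_high:.2f}]")
```

The `density` array is what gets mirrored on both sides of the center line to draw the violin outline over `grid`.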

error bars

print(df.describe())

# Parse 'Release Date'; entries that cannot be parsed become NaT
df['Release Date'] = pd.to_datetime(df['Release Date'], errors='coerce')
# Bucket the release year into 5-year bins (e.g. 2007 -> 2005)
df['Release Year'] = (df['Release Date'].dt.year // 5) * 5

# Bar chart of mean TDP per 5-year bin and chip type,
# with standard-deviation error bars
ax = sns.barplot(x=df['Release Year'], y=df["TDP (W)"], hue=df['Type'], estimator=np.mean, errorbar="sd")
        Unnamed: 0  Process Size (nm)      TDP (W)  Die Size (mm^2)  \
count  4854.000000        4845.000000  4228.000000      4139.000000   
mean   2426.500000          55.109598    81.359981       188.440445   
std    1401.373433          44.998676    76.807808       126.189383   
min       0.000000           0.000000     1.000000         1.000000   
25%    1213.250000          22.000000    33.000000       104.000000   
50%    2426.500000          40.000000    65.000000       148.000000   
75%    3639.750000          90.000000   100.000000       239.000000   
max    4853.000000         250.000000   900.000000       826.000000   

       Transistors (million)   Freq (MHz)    FP16 GFLOPS   FP32 GFLOPS  \
count            4143.000000  4854.000000     536.000000   1948.000000   
mean             1929.922279  1484.406057    8397.459851   2134.756653   
std              4044.891098  1066.701523   13799.551131   3898.431487   
min                 8.000000   100.000000      10.020000     12.800000   
25%               154.000000   590.000000     768.800000    257.300000   
50%               624.000000  1073.500000    2965.500000    696.000000   
75%              1550.000000  2400.000000   10600.000000   2116.750000   
max             54200.000000  4700.000000  184600.000000  40000.000000   

        FP64 GFLOPS  
count   1306.000000  
mean     363.670511  
std     1145.931856  
min        3.600000  
25%       38.295000  
50%       89.280000  
75%      220.000000  
max    11540.000000  
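The numbers behind a bar chart of means with standard-deviation error bars can also be tabulated directly. The sketch below repeats the 5-year binning on a few made-up rows (only the column names follow the dataset) and computes the mean and sample standard deviation per bin with pandas. It is meant as a cross-check, not as a description of how seaborn computes its bars.

```python
import pandas as pd

# Illustrative rows only; the last date is deliberately unparseable.
sample = pd.DataFrame({
    "Release Date": ["2007-02-20", "2008-06-01", "2011-05-03",
                     "2013-09-01", "not a date"],
    "Type": ["CPU", "CPU", "CPU", "CPU", "CPU"],
    "TDP (W)": [45.0, 65.0, 125.0, 80.0, 35.0],
})

# Same preparation as in the notes: parse dates, then bin years by 5.
sample["Release Date"] = pd.to_datetime(sample["Release Date"], errors="coerce")
sample["Release Year"] = (sample["Release Date"].dt.year // 5) * 5

# Mean (bar height) and sample standard deviation (error bar half-length)
# per bin and chip type. Rows without a valid date are dropped explicitly.
summary = (
    sample.dropna(subset=["Release Year"])
    .groupby(["Release Year", "Type"])["TDP (W)"]
    .agg(["mean", "std", "count"])
)
print(summary)
```

Printing `count` alongside the mean is useful because a bin with only a handful of chips can produce a misleadingly short or long error bar.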
