Figure 12.1: Obanazawa City, Yamagata Prefecture – A Hot Springs Economy. Photo credit and more here.
Onsen-goers often seek out specific types of hot springs, so it’s important for an onsen to actually provide what it advertises! Serbulea and Payyappallimana (2012) describe some of these benchmarks.
These are decent examples of quality control metrics that onsen operators might want to keep tabs on!
The process overview chart is one of the most important tools in SPC. It shows us how our process behaves over time, helping us identify patterns, trends, and potential issues. We’ll create a visualization that shows individual measurements, subgroup means, and the overall process average.
    # Process overview: jittered points and a boxplot for each subgroup,
    # drawn over a thick grey line at the overall process mean
    g1 = (ggplot(water, aes(x='time', y='temp', group='time')) +
      geom_hline(yintercept=water['temp'].mean(), color='lightgrey', size=3) +
      geom_jitter(height=0, width=0.25) +
      geom_boxplot() +
      labs(x='Time (Subgroup)', y='Temperature (Celsius)',
           subtitle='Process Overview', caption=tab['caption'][0]))

    # Save the plot (the images/ directory must already exist)
    g1.save('images/05_process_overview.png', width=8, height=6, dpi=100)
    # Histogram of all temperature measurements, flipped to sit beside the overview chart
    g2 = (ggplot(water, aes(x='temp')) +
      geom_histogram(bins=15, color='white', fill='grey') +
      theme_void() +
      coord_flip())

    # Save the plot
    g2.save('images/05_process_histogram.png', width=8, height=6, dpi=100)
The histogram shows us the distribution of all temperature measurements, giving us insight into the overall process variation. This helps us understand if our process is centered and how much variation we’re seeing.
Control charts are the heart of SPC. They help us monitor process stability over time and detect when the process is out of control. We’ll create charts for both the subgroup means (X-bar chart) and standard deviations (S chart).
Points outside the control limits or showing non-random patterns indicate the process may be out of control and requires investigation.
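One way to turn these two signals into code is sketched below. It is not part of the workshop's functions/ directory: the helper name flag_out_of_control, its arguments, and the run length of 8 are assumptions made for illustration, and the toy numbers are made up rather than taken from the onsen data. It assumes a table shaped like stat_s, with one row per subgroup and columns for the plotted value and its lower and upper limits.

```python
import numpy as np
import pandas as pd

def flag_out_of_control(stat, value="xbar", lower="lower", upper="upper",
                        center=None, run_length=8):
    """Return a copy of `stat` with two boolean flag columns (hypothetical helper)."""
    out = stat.copy()
    x = out[value].astype(float)
    if center is None:
        center = x.mean()  # assumption: the centerline is the mean of the plotted values
    # Signal 1: a point falls outside the control limits
    out["outside"] = (x < out[lower]) | (x > out[upper])
    # Signal 2: a long run of consecutive points on the same side of the centerline
    side = np.sign(x - center)
    run_id = (side != side.shift()).cumsum()          # new id each time the side changes
    run_len = side.groupby(run_id).transform("size")  # length of the run each point is in
    out["long_run"] = (run_len >= run_length) & (side != 0)
    return out

# Toy subgroup means with fixed limits (made-up numbers, not the onsen data)
toy = pd.DataFrame({
    "time": [1, 2, 3, 4, 5],
    "xbar": [44.8, 45.1, 44.6, 47.9, 45.0],
    "lower": 43.5,
    "upper": 46.5,
})
flags = flag_out_of_control(toy)
print(flags[["time", "xbar", "outside", "long_run"]])
```

A flagged row is a prompt to investigate, not proof of a problem. The run rule here is only one common convention for spotting non-random patterns, and other rule sets could be substituted.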
When we have individual measurements rather than subgroups, we use moving range charts. The moving range is the absolute difference between consecutive measurements, which helps us estimate process variation when we can’t calculate within-subgroup statistics.
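    import numpy as np

    # Take one individual measurement from each of the first 8 subgroups
    indiv = water.iloc[[0, 20, 40, 60, 80, 100, 120, 140]]

    # Moving range: absolute difference between consecutive measurements
    mr = indiv['temp'].diff().abs().dropna()
    mrbar = mr.mean()

    # Approximate d2 for ranges of size 2 by simulation (about 1.128);
    # the value varies slightly between runs because no seed is set
    d2 = np.mean(np.abs(np.diff(np.random.normal(0, 1, 10000))))

    # Estimate the process standard deviation and the control limits
    sigma_s = mrbar / d2
    se = sigma_s / (1**0.5)
    upper = mrbar + 3*se
    lower = 0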
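The simulated d2 above can be checked against a closed form. For two independent standard normal draws, the expected absolute difference is 2/sqrt(pi), roughly 1.128. The sketch below compares that value with a seeded simulation; the seed and the number of draws are arbitrary choices for illustration, and the simulation uses independent pairs rather than the consecutive differences used above, which have the same expected value.

```python
import math
import numpy as np

# Closed form: E|X1 - X2| for two independent standard normal draws
d2_exact = 2 / math.sqrt(math.pi)

# Seeded simulation so the result is reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(seed=42)
draws = rng.normal(0, 1, size=(200_000, 2))
d2_sim = np.abs(draws[:, 0] - draws[:, 1]).mean()

print(round(d2_exact, 4), round(d2_sim, 4))
```

Using the closed-form constant in place of a simulated one would make sigma_s, and therefore the upper limit, identical from run to run.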
This chart helps us monitor process variation when we have individual measurements rather than subgroups.
    # Labels for the centerline and control limits, placed at the last subgroup
    labels = pd.DataFrame({
        'time': [stat_s['time'].max()]*3,
        'type': ['xbbar', 'upper', 'lower'],
        'name': ['mean', '+3 s', '-3 s'],
        'value': [stat_s['xbar'].mean(), stat_s['upper'].iloc[0], stat_s['lower'].iloc[0]]
    })

    # Subgroup means over time, with a shaded band between the control limits
    control_chart = (ggplot(stat_s, aes(x='time', y='xbar')) +
      geom_hline(yintercept=stat_s['xbar'].mean(), color='lightgrey', size=3) +
      geom_ribbon(aes(ymin='lower', ymax='upper'), fill='steelblue', alpha=0.2) +
      geom_line(size=1) +
      geom_point(size=5) +
      geom_label(data=labels, mapping=aes(x='time', y='value', label='name'), ha='right') +
      labs(x='Time (Subgroups)', y='Average', subtitle='Average and Standard Deviation Chart'))

    # Save the plot
    control_chart.save('images/05_control_chart.png', width=8, height=6, dpi=100)
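The chart above plots subgroup averages. For the subgroup standard deviations, a minimal sketch of limits for an S chart follows. It uses the common textbook convention based on the bias-correction constant c4, which is not necessarily how this workshop's own functions compute it; the helper names c4 and s_chart_limits, and the example values, are assumptions for illustration.

```python
import math
import pandas as pd

def c4(n):
    """Bias-correction constant for the sample standard deviation of n normal draws."""
    return math.sqrt(2 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def s_chart_limits(sd, n, k=3):
    """Centerline and k-sigma limits for subgroup standard deviations (hypothetical helper)."""
    sd = pd.Series(sd, dtype=float)
    sbar = sd.mean()                                   # centerline: average subgroup sd
    spread = k * sbar * math.sqrt(1 - c4(n) ** 2) / c4(n)
    return {
        "center": sbar,
        "lower": max(0.0, sbar - spread),              # a standard deviation cannot be negative
        "upper": sbar + spread,
    }

# Example with made-up subgroup standard deviations and subgroups of size 20
limits = s_chart_limits([1.8, 2.1, 2.0, 1.9, 2.2], n=20)
print(limits)
```

With a table like stat_s, the inputs would be the 'sd' column and the subgroup size, and the returned values could be drawn with the same ribbon and line layers used for the averages.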
    # Collect the moving ranges and their limits in one table
    istat = pd.DataFrame({
        'time': indiv['time'].iloc[1:],
        'mr': mr,
        'mrbar': mrbar,
        'upper': upper,
        'lower': lower
    })

    # Moving ranges over time, with a shaded band between the control limits
    mr_chart = (ggplot(istat, aes(x='time', y='mr')) +
      geom_ribbon(aes(ymin='lower', ymax='upper'), fill='steelblue', alpha=0.25) +
      geom_hline(yintercept=mr.mean(), size=3, color='darkgrey') +
      geom_line(size=1) +
      geom_point(size=5) +
      labs(x='Time (Subgroup)', y='Moving Range', subtitle='Moving Range Chart'))

    # Save the plot
    mr_chart.save('images/05_moving_range_chart.png', width=8, height=6, dpi=100)
In this workshop, we will learn how to perform statistical process control in Python, using statistical tools and plotnine visualizations! Statistical Process Control (SPC) refers to using statistics to (1) measure variation in product quality over time and (2) identify benchmarks to know when intervention is needed. Let’s get started!
Packages

    # Remember to install these packages first, if you haven't already!
    # Run this in a notebook cell, or drop the leading "!" to run it in a terminal.
    !pip install pandas plotnine scipy

We’ll be using pandas for data manipulation, plotnine for visualization, and scipy for statistical functions.

    import pandas as pd
    from plotnine import *
Temperature: Onsen are divided into “Extra Hot Springs” (>42°C), “Hot Springs” (41~34°C), and “Warm Springs” (33~25°C).
pH: Onsen are classified into “Acidic” (pH < 3), “Mildly Acidic” (pH 3~6), “Neutral” (pH 6~7.5), “Mildly Alkaline” (pH 7.5~8.5), and “Alkaline” (pH > 8.5).
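These categories can be applied to measurements in code. The sketch below is one possible reading of the ranges above, not an official rule: the published ranges leave gaps (for example between 41°C and 42°C) and share boundary values (pH 6 and 7.5), so the exact cut points here are assumptions, and the function names classify_temp and classify_ph are hypothetical.

```python
import numpy as np

def classify_temp(temp_c):
    """Label temperatures in Celsius with the onsen categories (hypothetical helper)."""
    t = np.asarray(temp_c, dtype=float)
    # Conditions are checked in order; the first one that is true wins.
    # Assumption: values in the gaps (41 to 42, 33 to 34) fall into the cooler category.
    conditions = [t > 42, t >= 34, t >= 25]
    labels = ["Extra Hot Springs", "Hot Springs", "Warm Springs"]
    return np.select(conditions, labels, default="Unclassified")

def classify_ph(ph):
    """Label pH values with the onsen categories (hypothetical helper)."""
    p = np.asarray(ph, dtype=float)
    # Assumption: shared boundary values (6 and 7.5) go to "Neutral".
    conditions = [p < 3, p < 6, p <= 7.5, p <= 8.5, p > 8.5]
    labels = ["Acidic", "Mildly Acidic", "Neutral", "Mildly Alkaline", "Alkaline"]
    return np.select(conditions, labels, default="Unclassified")

print(classify_temp([43.2, 42.0, 34.0, 33.5, 20.0]))
print(classify_ph([2.9, 3.0, 6.0, 7.5, 8.5, 8.6]))
```

Applied to a column of temperature readings, a classifier like this gives the share of samples that still qualify for a given category, which is one simple way to see how close a spring is to slipping into the next one.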
In SPC, we often work with subgroups – small samples taken at regular intervals. This allows us to distinguish between common cause variation (inherent to the process) and special cause variation (due to specific events). Let’s calculate statistics for each subgroup to see how the process behaves over time.
    # Within-subgroup statistics: mean, range, standard deviation, and size
    stat_s = (water.groupby('time').apply(lambda d: pd.Series({
        'xbar': d['temp'].mean(),
        'r': d['temp'].max() - d['temp'].min(),
        'sd': d['temp'].std(),
        'nw': len(d)
    })).reset_index())

    # Degrees of freedom for each subgroup
    stat_s['df'] = stat_s['nw'] - 1

    # Pooled within-subgroup standard deviation
    stat_s['sigma_s'] = (
        (stat_s['df'] * (stat_s['sd']**2)).sum() / stat_s['df'].sum()
    )**0.5

    # Standard error of each subgroup mean, and 3-sigma control limits
    stat_s['se'] = stat_s['sigma_s'] / (stat_s['nw']**0.5)
    stat_s['upper'] = stat_s['xbar'].mean() + 3*stat_s['se']
    stat_s['lower'] = stat_s['xbar'].mean() - 3*stat_s['se']
    stat_s.head(3)
Now let’s calculate the overall process statistics that summarize the behavior across all subgroups:
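    stat_t = pd.DataFrame({
        'xbbar': [stat_s['xbar'].mean()],                # grand mean of the subgroup means
        'rbar': [stat_s['r'].mean()],                    # average subgroup range
        'sdbar': [stat_s['sd'].mean()],                  # average subgroup standard deviation
        'sigma_s': [(stat_s['sd']**2).mean()**0.5],      # within-subgroup (short-term) sigma
        'sigma_t': [water['temp'].std()]                 # total sigma across all measurements
    })
    stat_t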
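The gap between sigma_s and sigma_t is what separates variation within subgroups from variation between them. The simulated sketch below illustrates the idea under stated assumptions: the data are random draws rather than the onsen measurements, the drift of 0.3°C per month is invented, and within_and_total is a hypothetical helper that mirrors the sigma_s and sigma_t formulas above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)  # arbitrary seed for reproducibility
n_groups, n_per = 15, 20
time = np.repeat(np.arange(1, n_groups + 1), n_per)

# A stable process: every month shares the same mean and spread
stable = pd.DataFrame({"time": time, "temp": rng.normal(45, 2, size=time.size)})
# The same draws, but with the mean drifting down 0.3 degrees per month (made-up drift)
drift = stable.assign(temp=stable["temp"] - 0.3 * (stable["time"] - 1))

def within_and_total(df):
    """Return (sigma_s, sigma_t): within-subgroup and total standard deviation."""
    sd = df.groupby("time")["temp"].std()
    sigma_s = float((sd ** 2).mean() ** 0.5)
    sigma_t = float(df["temp"].std())
    return sigma_s, sigma_t

stable_s, stable_t = within_and_total(stable)
drift_s, drift_t = within_and_total(drift)
print("stable:", round(stable_s, 2), round(stable_t, 2))
print("drift: ", round(drift_s, 2), round(drift_t, 2))
```

Shifting a whole subgroup by a constant leaves its standard deviation unchanged, so sigma_s is the same in both cases while sigma_t grows with the drift. A sigma_t well above sigma_s is therefore a hint that subgroup means are moving over time.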
Custom Functions

This workshop uses custom functions from the functions/ directory. You may need both:

– functions_distributions.py – for reliability and distribution functions
– functions_process_control.py – for statistical process control functions

To use these functions, download them from the repository at github.com/timothyfraser/sigma/tree/main/functions.

Add the functions directory to your Python path:

    import sys
    import os

    # Add the functions directory to the Python path
    sys.path.append('functions')  # or the path to wherever you placed the functions folder

Once you have the functions available, you can import them:

    from functions_distributions import density, tidy_density, approxfun
    # from functions_process_control import ggprocess, ggsubgroup, ggmoving, ggcapability  # if needed
Our Data

You’ve been hired to evaluate quality control at a local onsen in sunny Kagoshima prefecture! Every month, for 15 months, you systematically took 20 random samples of hot spring water and recorded its temperature, pH, and sulfur levels. How might you determine if this onsen is at risk of slipping out of one sector of the market (e.g. Extra Hot!) and into another (just normal Hot Springs)?

Let’s read in our data from workshops/onsen.csv!

    # Add the functions directory to the path if it is not already there
    import sys
    if 'functions' not in sys.path:
        sys.path.append('functions')

    from functions_distributions import density, tidy_density, approxfun

    water = pd.read_csv('workshops/onsen.csv')
    water.head(3)

    ##    id  time  temp   ph  sulfur
    ## 0   1     1  43.2  5.1     0.0
    ## 1   2     1  45.3  4.8     0.4
    ## 2   3     1  45.5  6.2     0.9
16.1 Process Descriptive Statistics

First, let’s get a sense of our process by calculating some basic descriptive statistics. We’ll create a simple function to calculate the mean and standard deviation, which are fundamental to evaluating process variation.

    from pandas import Series

    def describe(x: Series):
        x = Series(x)
        # One-row table with the process mean and standard deviation
        out = pd.DataFrame({
            'mean': [x.mean()],
            'sd': [x.std()],
        })
        # Caption text reused later in the chart labels
        out['caption'] = ("Process Mean: " + out['mean'].round(2).astype(str) +
                          " | SD: " + out['sd'].round(2).astype(str))
        return out

Now let’s apply this to our temperature data to see the overall process mean and variation.

    tab = describe(water['temp'])
    tab

    ##     mean        sd                         caption
    ## 0  44.85  1.989501  Process Mean: 44.85 | SD: 1.99


