Probabilistic Programming and Bayesian Methods for Hackers

Version 0.1

Original content created by Cam Davidson-Pilon

Ported to Python 3 and PyMC3 by Max Margenot (@clean_utensils) and Thomas Wiecki (@twiecki) at Quantopian (@quantopian)


Welcome to Bayesian Methods for Hackers. The full Github repository is available at github/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers. The other chapters can be found on the project’s homepage. We hope you enjoy the book, and we encourage any contributions!

Chapter 1


The Philosophy of Bayesian Inference

You are a skilled programmer, but bugs still slip into your code. After a particularly difficult implementation of an algorithm, you decide to test your code on a trivial example. It passes. You test the code on a harder problem. It passes once again. And it passes the next, even more difficult, test too! You are starting to believe that there may be no bugs in this code…

If you think this way, then congratulations, you already are thinking Bayesian! Bayesian inference is simply updating your beliefs after considering new evidence. A Bayesian can rarely be certain about a result, but he or she can be very confident. Just like in the example above, we can never be 100% sure that our code is bug-free unless we test it on every possible problem; something rarely possible in practice. Instead, we can test it on a large number of problems, and if it succeeds we can feel more confident about our code, but still not certain. Bayesian inference works identically: we update our beliefs about an outcome; rarely can we be absolutely sure unless we rule out all other alternatives.

The Bayesian state of mind

Bayesian inference differs from more traditional statistical inference by preserving uncertainty. At first, this sounds like a bad statistical technique. Isn’t statistics all about deriving certainty from randomness? To reconcile this, we need to start thinking like Bayesians.

The Bayesian world-view interprets probability as a measure of believability in an event, that is, how confident we are in an event occurring. In fact, we will see in a moment that this is the natural interpretation of probability.

For this to be clearer, we consider an alternative interpretation of probability: Frequentists, who practice the more classical version of statistics, assume that probability is the long-run frequency of events (hence the bestowed title). For example, the probability of plane accidents under a frequentist philosophy is interpreted as the long-term frequency of plane accidents. This makes logical sense for many probabilities of events, but becomes more difficult to understand when events have no long-term frequency of occurrence. Consider: we often assign probabilities to outcomes of presidential elections, but the election itself only happens once! Frequentists get around this by invoking alternative realities and saying that across all these realities, the frequency of occurrences defines the probability.

Bayesians, on the other hand, have a more intuitive approach. Bayesians interpret a probability as a measure of belief, or confidence, in an event occurring. Simply put, a probability is a summary of an opinion. An individual who assigns a belief of 0 to an event has no confidence that the event will occur; conversely, assigning a belief of 1 implies that the individual is absolutely certain of the event occurring. Beliefs between 0 and 1 allow for weightings of other outcomes. This definition agrees with the probability of a plane accident example, for having observed the frequency of plane accidents, an individual’s belief should be equal to that frequency, excluding any outside information. Similarly, under this definition of probability being equal to beliefs, it is meaningful to speak about probabilities (beliefs) of presidential election outcomes: how confident are you that candidate A will win?

Notice in the paragraph above, I assigned the belief (probability) measure to an individual, not to Nature. This is very interesting, as this definition leaves room for conflicting beliefs between individuals. Again, this is appropriate for what naturally occurs: different individuals have different beliefs of events occurring, because they possess different information about the world. The existence of different beliefs does not imply that anyone is wrong. Consider the following examples demonstrating the relationship between individual beliefs and probabilities:

  • I flip a coin, and we both guess the result. We would both agree, assuming the coin is fair, that the probability of Heads is 1/2. Assume, then, that I peek at the coin. Now I know for certain what the result is: I assign probability 1.0 to either Heads or Tails (whichever it is). Now what is your belief that the coin is Heads? My knowledge of the outcome has not changed the coin’s results. Thus we assign different probabilities to the result.
  • Your code either has a bug in it or not, but we do not know for certain which is true, though we have a belief about the presence or absence of a bug.
  • A medical patient is exhibiting symptoms $x$, $y$ and $z$. There are a number of diseases that could be causing all of them, but only a single disease is present. A doctor has beliefs about which disease, but a second doctor may have slightly different beliefs.

This philosophy of treating beliefs as probability is natural to humans. We employ it constantly as we interact with the world and only see partial truths, but gather evidence to form beliefs. Alternatively, you have to be trained to think like a frequentist.

To align ourselves with traditional probability notation, we denote our belief about event $A$ as $P(A)$. We call this quantity the prior probability.

John Maynard Keynes, a great economist and thinker, said “When the facts change, I change my mind. What do you do, sir?” This quote reflects the way a Bayesian updates his or her beliefs after seeing evidence. Even — especially — if the evidence is counter to what was initially believed, the evidence cannot be ignored. We denote our updated belief as $P(A|X)$, interpreted as the probability of $A$ given the evidence $X$. We call the updated belief the posterior probability so as to contrast it with the prior probability. For example, consider the posterior probabilities (read: posterior beliefs) of the above examples, after observing some evidence $X$:

1. $P(A)$: the coin has a 50 percent chance of being Heads. $P(A|X)$: You look at the coin, observe a Heads has landed, denote this information $X$, and trivially assign probability 1.0 to Heads and 0.0 to Tails.

2. $P(A)$: This big, complex code likely has a bug in it. $P(A|X)$: The code passed all $X$ tests; there still might be a bug, but its presence is less likely now.

3. $P(A)$: The patient could have any number of diseases. $P(A|X)$: Performing a blood test generated evidence $X$, ruling out some of the possible diseases from consideration.

It’s clear that in each example we did not completely discard the prior belief after seeing new evidence $X$, but we re-weighted the prior to incorporate the new evidence (i.e. we put more weight, or confidence, on some beliefs versus others).

By introducing prior uncertainty about events, we are already admitting that any guess we make is potentially very wrong. After observing data, evidence, or other information, we update our beliefs, and our guess becomes less wrong. This is the alternative side of the prediction coin, where typically we try to be more right.

Bayesian Inference in Practice

If frequentist and Bayesian inference were programming functions, with inputs being statistical problems, then the two would be different in what they return to the user. The frequentist inference function would return a number, representing an estimate (typically a summary statistic like the sample average etc.), whereas the Bayesian function would return probabilities.

For example, in our debugging problem above, calling the frequentist function with the argument “My code passed all $X$ tests; is my code bug-free?” would return a YES. On the other hand, asking our Bayesian function “Often my code has bugs. My code passed all $X$ tests; is my code bug-free?” would return something very different: probabilities of YES and NO. The function might return:

YES, with probability 0.8; NO, with probability 0.2

This is very different from the answer the frequentist function returned. Notice that the Bayesian function accepted an additional argument: “Often my code has bugs”. This parameter is the prior. By including the prior parameter, we are telling the Bayesian function to include our belief about the situation. Technically this parameter in the Bayesian function is optional, but we will see excluding it has its own consequences.

Incorporating evidence

As we acquire more and more instances of evidence, our prior belief is washed out by the new evidence. This is to be expected. For example, if your prior belief is something ridiculous, like “I expect the sun to explode today”, and each day you are proved wrong, you would hope that any inference would correct you, or at least align your beliefs better. Bayesian inference will correct this belief.

Denote $N$ as the number of instances of evidence we possess. As we gather an infinite amount of evidence, say as $N \to \infty$, our Bayesian results (often) align with frequentist results. Hence for large $N$, statistical inference is more or less objective. On the other hand, for small $N$, inference is much more unstable: frequentist estimates have more variance and larger confidence intervals. This is where Bayesian analysis excels. By introducing a prior, and returning probabilities (instead of a scalar estimate), we preserve the uncertainty that reflects the instability of statistical inference of a small-$N$ dataset.

One may think that for large $N$, one can be indifferent between the two techniques since they offer similar inference, and might lean towards the computationally simpler frequentist methods. An individual in this position should consider the following quote by Andrew Gelman (2005)[1], before making such a decision:

Sample sizes are never large. If $N$ is too small to get a sufficiently-precise estimate, you need to get more data (or make more assumptions). But once $N$ is “large enough,” you can start subdividing the data to learn more (for example, in a public opinion poll, once you have a good estimate for the entire country, you can estimate among men and women, northerners and southerners, different age groups, etc.). $N$ is never enough because if it were “enough” you’d already be on to the next problem for which you need more data.

Are frequentist methods incorrect then?

No.

Frequentist methods are still useful or state-of-the-art in many areas. Tools such as least squares linear regression, LASSO regression, and expectation-maximization algorithms are all powerful and fast. Bayesian methods complement these techniques by solving problems that these approaches cannot, or by illuminating the underlying system with more flexible modeling.

A note on Big Data

Paradoxically, big data’s predictive analytic problems are actually solved by relatively simple algorithms [2][4]. Thus we can argue that big data’s prediction difficulty does not lie in the algorithm used, but instead in the computational difficulties of storage and execution on big data. (One should also consider Gelman’s quote from above and ask “Do I really have big data?”)

The much more difficult analytic problems involve medium data and, especially troublesome, really small data. Using a similar argument as Gelman’s above, if big data problems are big enough to be readily solved, then we should be more interested in the not-quite-big-enough datasets.

Our Bayesian framework

We are interested in beliefs, which can be interpreted as probabilities by thinking Bayesian. We have a prior belief in event $A$, beliefs formed by previous information, e.g., our prior belief about bugs being in our code before performing tests.

Secondly, we observe our evidence. To continue our buggy-code example: if our code passes $X$ tests, we want to update our belief to incorporate this. We call this new belief the posterior probability. Updating our belief is done via the following equation, known as Bayes’ Theorem, after its discoverer Thomas Bayes:

$$P(A|X) = \frac{P(X|A)\,P(A)}{P(X)} \propto P(X|A)\,P(A) \quad (\propto \text{ is “proportional to”})$$

The above formula is not unique to Bayesian inference: it is a mathematical fact with uses outside Bayesian inference. Bayesian inference merely uses it to connect the prior probability $P(A)$ with the updated posterior probability $P(A|X)$.
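As a quick numeric illustration, here is Bayes’ Theorem evaluated in Python; the prior and likelihood values are made up for this sketch (they happen to foreshadow the bug example later in this chapter):

p_A = 0.2               # prior P(A)
p_X_given_A = 1.0       # likelihood P(X|A)
p_X_given_not_A = 0.5   # assumed likelihood P(X|~A)
# Total probability: P(X) = P(X|A)P(A) + P(X|~A)P(~A)
p_X = p_X_given_A * p_A + p_X_given_not_A * (1 - p_A)
p_A_given_X = p_X_given_A * p_A / p_X   # posterior P(A|X)
print(p_A_given_X)                      # 0.333...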

Example: Mandatory coin-flip example

Every statistics text must contain a coin-flipping example, so I’ll use it here to get it out of the way. Suppose, naively, that you are unsure about the probability of heads in a coin flip (spoiler alert: it’s 50%). You believe there is some true underlying ratio, call it $p$, but have no prior opinion on what $p$ might be.

We begin to flip a coin, and record the observations: either $H$ or $T$. This is our observed data. An interesting question to ask is how our inference changes as we observe more and more data. More specifically, what do our posterior probabilities look like when we have little data, versus when we have lots of data?

Below we plot a sequence of updating posterior probabilities as we observe increasing amounts of data (coin flips).

In [1]:
"""
The book uses a custom matplotlibrc file, which provides the unique styles for
matplotlib plots. If executing this book, and you wish to use the book's
styling, provided are two options:
    1. Overwrite your own matplotlibrc file with the rc-file provided in the
       book's styles/ dir. See http://matplotlib.org/users/customizing.html
    2. Also in the styles is  bmh_matplotlibrc.json file. This can be used to
       update the styles in only this notebook. Try running the following code:

        import json
        s = json.load(open("../styles/bmh_matplotlibrc.json"))
        matplotlib.rcParams.update(s)

"""

# The code below can be passed over, as it is currently not important, plus it
# uses advanced topics we have not covered yet. LOOK AT PICTURE, MICHAEL!
%matplotlib inline
from IPython.core.pylabtools import figsize
import numpy as np
from matplotlib import pyplot as plt
figsize(11, 9)

import scipy.stats as stats

dist = stats.beta
n_trials = [0, 1, 2, 3, 4, 5, 8, 15, 50, 500]
data = stats.bernoulli.rvs(0.5, size=n_trials[-1])
x = np.linspace(0, 1, 100)

# For the already prepared, I'm using Binomial's conj. prior.
for k, N in enumerate(n_trials):
    sx = plt.subplot(len(n_trials)//2, 2, k+1)  # integer division for Python 3
    plt.xlabel("$p$, probability of heads") \
        if k in [0, len(n_trials)-1] else None
    plt.setp(sx.get_yticklabels(), visible=False)
    heads = data[:N].sum()
    y = dist.pdf(x, 1 + heads, 1 + N - heads)
    plt.plot(x, y, label="observe %d tosses,\n %d heads" % (N, heads))
    plt.fill_between(x, 0, y, color="#348ABD", alpha=0.4)
    plt.vlines(0.5, 0, 4, color="k", linestyles="--", lw=1)

    leg = plt.legend()
    leg.get_frame().set_alpha(0.4)
    plt.autoscale(tight=True)


plt.suptitle("Bayesian updating of posterior probabilities",
             y=1.02,
             fontsize=14)

plt.tight_layout()

The posterior probabilities are represented by the curves, and our uncertainty is proportional to the width of the curve. As the plot above shows, as we start to observe data our posterior probabilities start to shift and move around. Eventually, as we observe more and more data (coin-flips), our probabilities will tighten closer and closer around the true value of $p = 0.5$ (marked by a dashed line).

Notice that the plots are not always peaked at 0.5. There is no reason they should be: recall we assumed we did not have a prior opinion of what $p$ is. In fact, if we observe quite extreme data, say 8 flips and only 1 observed heads, our distribution would look very biased away from lumping around 0.5 (with no prior opinion, how confident would you feel betting on a fair coin after observing 8 tails and 1 head?). As more data accumulates, we would see more and more probability being assigned at $p = 0.5$, though never all of it.
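We can check this extreme case directly. With the same uniform Beta(1, 1) prior used in the plots above, the posterior after observing 1 head in 9 tosses is Beta(1 + 1, 1 + 8) (a sketch, reusing the stats import from the cell above):

posterior = stats.beta(1 + 1, 1 + 8)
print(posterior.mean())    # ~0.18, far below 0.5
print(posterior.cdf(0.5))  # P(p < 0.5) is roughly 0.99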

The next example is a simple demonstration of the mathematics of Bayesian inference.

Example: Bug, or just sweet, unintended feature?

Let $A$ denote the event that our code has no bugs in it. Let $X$ denote the event that the code passes all debugging tests. For now, we will leave the prior probability of no bugs as a variable, i.e. $P(A) = p$.

We are interested in $P(A|X)$, i.e. the probability of no bugs, given our debugging tests $X$. To use the formula above, we need to compute some quantities.

What is $P(X|A)$, i.e., the probability that the code passes $X$ tests given there are no bugs? Well, it is equal to 1, for code with no bugs will pass all tests.

$P(X)$ is a little bit trickier: the event $X$ can be divided into two possibilities, event $X$ occurring even though our code indeed has bugs (denoted $\sim A$, spoken not $A$), or event $X$ occurring without bugs ($A$). $P(X)$ can be represented as:

$$\begin{aligned}
P(X) &= P(X \text{ and } A) + P(X \text{ and } \sim A) \\
     &= P(X|A)P(A) + P(X|\sim A)P(\sim A) \\
     &= P(X|A)p + P(X|\sim A)(1-p)
\end{aligned}$$

We have already computed $P(X|A)$ above. On the other hand, $P(X|\sim A)$ is subjective: our code can pass tests but still have a bug in it, though the probability there is a bug present is reduced. Note this is dependent on the number of tests performed, the degree of complication in the tests, etc. Let’s be conservative and assign $P(X|\sim A) = 0.5$. Then

$$P(A|X) = \frac{1 \cdot p}{1 \cdot p + 0.5(1-p)} = \frac{2p}{1+p}$$

This is the posterior probability. What does it look like as a function of our prior, $p \in [0,1]$?

In [2]:
figsize(12.5, 4)
p = np.linspace(0, 1, 50)
plt.plot(p, 2*p/(1+p), color="#348ABD", lw=3)
#plt.fill_between(p, 2*p/(1+p), alpha=.5, facecolor=["#A60628"])
plt.scatter(0.2, 2*(0.2)/1.2, s=140, c="#348ABD")
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.xlabel("Prior, $P(A) = p$")
plt.ylabel("Posterior, $P(A|X)$, with $P(A) = p$")
plt.title("Are there bugs in my code?");

We can see the biggest gains if we observe the $X$ tests passed when the prior probability, $p$, is low. Let’s settle on a specific value for the prior. I’m a strong programmer (I think), so I’m going to give myself a realistic prior of 0.20, that is, there is a 20% chance that I write code bug-free. To be more realistic, this prior should be a function of how complicated and large the code is, but let’s pin it at 0.20. Then my updated belief that my code is bug-free is $2(0.2)/(1 + 0.2) = 0.33$.

Recall that the prior is a probability: $p$ is the prior probability that there are no bugs, so $1-p$ is the prior probability that there are bugs.

Similarly, our posterior is also a probability, with $P(A|X)$ the probability there is no bug given we saw all tests pass, hence $1-P(A|X)$ is the probability there is a bug given all tests passed. What does our posterior probability look like? Below is a chart of both the prior and the posterior probabilities.

In [3]:
figsize(12.5, 4)
colours = ["#348ABD", "#A60628"]

prior = [0.20, 0.80]
posterior = [1./3, 2./3]  # from 2p/(1+p) with p = 0.2, and its complement
plt.bar([0, .7], prior, alpha=0.70, width=0.25,
        color=colours[0], label="prior distribution",
        lw=3, edgecolor=colours[0])

plt.bar([0+0.25, .7+0.25], posterior, alpha=0.7,
        width=0.25, color=colours[1],
        label="posterior distribution",
        lw=3, edgecolor=colours[1])

plt.xticks([0.20, .95], ["Bugs Absent", "Bugs Present"])
plt.title("Prior and Posterior probability of bugs present")
plt.ylabel("Probability")
plt.legend(loc="upper left");

Notice that after we observed $X$ occur, the probability of bugs being absent increased. By increasing the number of tests, we can approach confidence (probability 1) that there are no bugs present.

This was a very simple example of Bayesian inference and Bayes rule. Unfortunately, the mathematics necessary to perform more complicated Bayesian inference only becomes more difficult, except for artificially constructed cases. We will later see that this type of mathematical analysis is actually unnecessary. First we must broaden our modeling tools. The next section deals with probability distributions. If you are already familiar, feel free to skip (or at least skim), but for the less familiar the next section is essential.


Probability Distributions

Let’s quickly recall what a probability distribution is: Let $Z$ be some random variable. Then associated with $Z$ is a probability distribution function that assigns probabilities to the different outcomes $Z$ can take. Graphically, a probability distribution is a curve where the probability of an outcome is proportional to the height of the curve. You can see examples in the first figure of this chapter.

We can divide random variables into three classifications:

  • $Z$ is discrete: Discrete random variables may only assume values on a specified list. Things like populations, movie ratings, and number of votes are all discrete random variables. Discrete random variables become clearer when we contrast them with…
  • $Z$ is continuous: Continuous random variables can take on arbitrarily exact values. For example, temperature, speed, time, and color are all modeled as continuous variables because you can progressively make the values more and more precise.
  • $Z$ is mixed: Mixed random variables assign probabilities to both discrete and continuous random variables, i.e. they are a combination of the above two categories.

Discrete Case

If $Z$ is discrete, then its distribution is called a probability mass function, which measures the probability that $Z$ takes on the value $k$, denoted $P(Z=k)$. Note that the probability mass function completely describes the random variable $Z$, that is, if we know the mass function, we know how $Z$ should behave. There are popular probability mass functions that consistently appear: we will introduce them as needed, but let’s introduce the first very useful probability mass function. We say $Z$ is Poisson-distributed if:

$$P(Z = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$$

$\lambda$ is called a parameter of the distribution, and it controls the distribution’s shape. For the Poisson distribution, $\lambda$ can be any positive number. By increasing $\lambda$, we add more probability to larger values, and conversely by decreasing $\lambda$ we add more probability to smaller values. One can describe $\lambda$ as the intensity of the Poisson distribution.

Unlike $\lambda$, which can be any positive number, the value $k$ in the above formula must be a non-negative integer, i.e., $k$ must take on values 0, 1, 2, and so on. This is very important, because if you wanted to model a population you could not make sense of populations with 4.25 or 5.612 members.

If a random variable $Z$ has a Poisson mass distribution, we denote this by writing

$$Z \sim \text{Poi}(\lambda)$$

One useful property of the Poisson distribution is that its expected value is equal to its parameter, i.e.:

$$E[Z|\lambda] = \lambda$$
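As a quick sanity check of this identity (a sketch; any positive $\lambda$ works here):

import scipy.stats as stats
lam = 4.25
samples = stats.poisson.rvs(lam, size=100000)
print(samples.mean())           # ~4.25, matching E[Z | lam] = lam
print(stats.poisson.mean(lam))  # exactly 4.25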

We will use this property often, so it’s useful to remember. Below, we plot the probability mass distribution for different $\lambda$ values. The first thing to notice is that by increasing $\lambda$, we add more probability of larger values occurring. Second, notice that although the graph ends at 15, the distributions do not. They assign positive probability to every non-negative integer.

In [4]:
figsize(12.5, 4)

import scipy.stats as stats
a = np.arange(16)
poi = stats.poisson
lambda_ = [1.5, 4.25]
colours = ["#348ABD", "#A60628"]

plt.bar(a, poi.pmf(a, lambda_[0]), color=colours[0],
        label=r"$\lambda = %.1f$" % lambda_[0], alpha=0.60,
        edgecolor=colours[0], lw=3)

plt.bar(a, poi.pmf(a, lambda_[1]), color=colours[1],
        label=r"$\lambda = %.1f$" % lambda_[1], alpha=0.60,
        edgecolor=colours[1], lw=3)

plt.xticks(a + 0.4, a)
plt.legend()
plt.ylabel("probability of $k$")
plt.xlabel("$k$")
plt.title("Probability mass function of a Poisson random variable; differing \
$\lambda$ values");

Continuous Case

Instead of a probability mass function, a continuous random variable has a probability density function. This might seem like unnecessary nomenclature, but the density function and the mass function are very different creatures. An example of a continuous random variable is a random variable with exponential density. The density function for an exponential random variable looks like this:

$$f_Z(z|\lambda) = \lambda e^{-\lambda z}, \quad z \ge 0$$

Like a Poisson random variable, an exponential random variable can take on only non-negative values. But unlike a Poisson variable, the exponential can take on any non-negative value, including non-integral values such as 4.25 or 5.612401. This property makes it a poor choice for count data, which must be an integer, but a great choice for time data, temperature data (measured in Kelvins, of course), or any other precise and positive variable. The graph below shows two probability density functions with different $\lambda$ values.

When a random variable $Z$ has an exponential distribution with parameter $\lambda$, we say $Z$ is exponential and write

$$Z \sim \text{Exp}(\lambda)$$

Given a specific $\lambda$, the expected value of an exponential random variable is equal to the inverse of $\lambda$, that is:

$$E[Z|\lambda] = \frac{1}{\lambda}$$
In [5]:
a = np.linspace(0, 4, 100)
expo = stats.expon
lambda_ = [0.5, 1]

for l, c in zip(lambda_, colours):
    plt.plot(a, expo.pdf(a, scale=1./l), lw=3,
             color=c, label=r"$\lambda = %.1f$" % l)
    plt.fill_between(a, expo.pdf(a, scale=1./l), color=c, alpha=.33)

plt.legend()
plt.ylabel("PDF at $z$")
plt.xlabel("$z$")
plt.ylim(0,1.2)
plt.title("Probability density function of an Exponential random variable;\
 differing $\lambda$");

But what is $\lambda$?

This question is what motivates statistics. In the real world, $\lambda$ is hidden from us. We see only $Z$, and must go backwards to try and determine $\lambda$. The problem is difficult because there is no one-to-one mapping from $Z$ to $\lambda$. Many different methods have been created to solve the problem of estimating $\lambda$, but since $\lambda$ is never actually observed, no one can say for certain which method is best!

Bayesian inference is concerned with beliefs about what $\lambda$ might be. Rather than try to guess $\lambda$ exactly, we can only talk about what $\lambda$ is likely to be by assigning a probability distribution to $\lambda$.

This might seem odd at first. After all, $\lambda$ is fixed; it is not (necessarily) random! How can we assign probabilities to values of a non-random variable? Ah, we have fallen for our old, frequentist way of thinking. Recall that under Bayesian philosophy, we can assign probabilities if we interpret them as beliefs. And it is entirely acceptable to have beliefs about the parameter $\lambda$.

Example: Inferring behaviour from text-message data

Let’s try to model a more interesting example, one that concerns the rate at which a user sends and receives text messages:

You are given a series of daily text-message counts from a user of your system. The data, plotted over time, appears in the chart below. You are curious to know if the user’s text-messaging habits have changed over time, either gradually or suddenly. How can you model this? (This is in fact my own text-message data. Judge my popularity as you wish.)

In [6]:
figsize(12.5, 3.5)
count_data = np.loadtxt("data/txtdata.csv")
n_count_data = len(count_data)
plt.bar(np.arange(n_count_data), count_data, color="#348ABD")
plt.xlabel("Time (days)")
plt.ylabel("count of text-msgs received")
plt.title("Did the user's texting habits change over time?")
plt.xlim(0, n_count_data);

Before we start modeling, see what you can figure out just by looking at the chart above. Would you say there was a change in behaviour during this time period?

How can we start to model this? Well, as we have conveniently already seen, a Poisson random variable is a very appropriate model for this type of count data. Denoting day $i$’s text-message count by $C_i$,

$$C_i \sim \text{Poisson}(\lambda)$$

We are not sure what the value of the $\lambda$ parameter really is, however. Looking at the chart above, it appears that the rate might become higher late in the observation period, which is equivalent to saying that $\lambda$ increases at some point during the observations. (Recall that a higher value of $\lambda$ assigns more probability to larger outcomes. That is, there is a higher probability of many text messages having been sent on a given day.)

How can we represent this observation mathematically? Let’s assume that on some day during the observation period (call it $\tau$), the parameter $\lambda$ suddenly jumps to a higher value. So we really have two $\lambda$ parameters: one for the period before $\tau$, and one for the rest of the observation period. In the literature, a sudden transition like this would be called a switchpoint:

$$\lambda = \begin{cases} \lambda_1 & \text{if } t < \tau \\ \lambda_2 & \text{if } t \ge \tau \end{cases}$$

If, in reality, no sudden change occurred and indeed $\lambda_1 = \lambda_2$, then the $\lambda$s’ posterior distributions should look about equal.

We are interested in inferring the unknown $\lambda$s. To use Bayesian inference, we need to assign prior probabilities to the different possible values of $\lambda$. What would be good prior probability distributions for $\lambda_1$ and $\lambda_2$? Recall that $\lambda$ can be any positive number. As we saw earlier, the exponential distribution provides a continuous density function for positive numbers, so it might be a good choice for modeling $\lambda_i$. But recall that the exponential distribution takes a parameter of its own, so we’ll need to include that parameter in our model. Let’s call that parameter $\alpha$.

$$\lambda_1 \sim \text{Exp}(\alpha) \\ \lambda_2 \sim \text{Exp}(\alpha)$$

$\alpha$ is called a hyper-parameter or parent variable. In literal terms, it is a parameter that influences other parameters. Our initial guess at $\alpha$ does not influence the model too strongly, so we have some flexibility in our choice. A good rule of thumb is to set the exponential parameter equal to the inverse of the average of the count data. Since we’re modeling $\lambda$ using an exponential distribution, we can use the expected-value identity shown earlier to get:

$$\frac{1}{N}\sum_{i=0}^{N} C_i \approx E[\lambda\,|\,\alpha] = \frac{1}{\alpha}$$

An alternative, and something I encourage the reader to try, would be to have two priors: one for each $\lambda_i$. Creating two exponential distributions with different $\alpha$ values reflects our prior belief that the rate changed at some point during the observations.

What about $\tau$? Because of the noisiness of the data, it’s difficult to pick out a priori when $\tau$ might have occurred. Instead, we can assign a uniform prior belief to every possible day. This is equivalent to saying

$$\tau \sim \text{DiscreteUniform}(1, 70) \\ \Rightarrow P(\tau = k) = \frac{1}{70}$$

So after all this, what does our overall prior distribution for the unknown variables look like? Frankly, it doesn’t matter. What we should understand is that it’s an ugly, complicated mess involving symbols only a mathematician could love. And things will only get uglier the more complicated our models become. Regardless, all we really care about is the posterior distribution.

We next turn to PyMC3, a Python library for performing Bayesian analysis that is undaunted by the mathematical monster we have created.

Introducing our first hammer: PyMC3

PyMC3 is a Python library for programming Bayesian analysis [3]. It is a fast, well-maintained library. The only unfortunate part is that its documentation is lacking in certain areas, especially those that bridge the gap between beginner and hacker. One of this book’s main goals is to solve that problem, and also to demonstrate why PyMC3 is so cool.

We will model the problem above using PyMC3. This type of programming is called probabilistic programming, an unfortunate misnomer that invokes ideas of randomly-generated code and has likely confused and frightened users away from this field. The code is not random; it is probabilistic in the sense that we create probability models using programming variables as the model’s components. Model components are first-class primitives within the PyMC3 framework.

B. Cronin [5] has a very motivating description of probabilistic programming:

Another way of thinking about this: unlike a traditional program, which only runs in the forward directions, a probabilistic program is run in both the forward and backward direction. It runs forward to compute the consequences of the assumptions it contains about the world (i.e., the model space it represents), but it also runs backward from the data to constrain the possible explanations. In practice, many probabilistic programming systems will cleverly interleave these forward and backward operations to efficiently home in on the best explanations.

Because of the confusion engendered by the term probabilistic programming, I’ll refrain from using it. Instead, I’ll simply say programming, since that’s what it really is.

PyMC3 code is easy to read. The only novel thing should be the syntax. Simply remember that we are representing the model’s components ($\tau, \lambda_1, \lambda_2$) as variables.

In [7]:
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    alpha = 1.0/count_data.mean()  # Recall count_data is the
                                   # variable that holds our txt counts
    lambda_1 = pm.Exponential("lambda_1", alpha)
    lambda_2 = pm.Exponential("lambda_2", alpha)
    
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data - 1)
Applied log-transform to lambda_1 and added transformed lambda_1_log_ to model.
Applied log-transform to lambda_2 and added transformed lambda_2_log_ to model.

In the code above, we create the PyMC3 variables corresponding to $\lambda_1$ and $\lambda_2$. We assign them to PyMC3’s stochastic variables, so-called because they are treated by the back end as random number generators.

In [8]:
with model:
    idx = np.arange(n_count_data) # Index
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)

This code creates a new function lambda_, but really we can think of it as a random variable: the random variable $\lambda$ from above. The switch() function assigns lambda_1 or lambda_2 as the value of lambda_, depending on what side of tau we are on. The values of lambda_ up until tau are lambda_1 and the values afterwards are lambda_2.

Note that because lambda_1, lambda_2 and tau are random, lambda_ will be random. We are not fixing any variables yet.
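For intuition, here is a pure-NumPy sketch of what switch() computes for one fixed draw of the random variables (the values below are made up for illustration):

# Hypothetical fixed draws, just to illustrate switch()'s semantics:
tau_fixed, lambda_1_fixed, lambda_2_fixed = 45, 18.0, 23.0
idx = np.arange(n_count_data)
lambda_fixed = np.where(tau_fixed > idx, lambda_1_fixed, lambda_2_fixed)
# lambda_fixed[:45] is all 18.0; lambda_fixed[45:] is all 23.0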

In [9]:
with model:
    observation = pm.Poisson("obs", lambda_, observed=count_data)

The variable observation combines our data, count_data, with our proposed data-generation scheme, given by the variable lambda_, through the observed keyword.

The code below will be explained in Chapter 3, but I show it here so you can see where our results come from. One can think of it as a learning step. The machinery being employed is called Markov Chain Monte Carlo (MCMC), which I also delay explaining until Chapter 3. This technique returns thousands of random variables from the posterior distributions of $\lambda_1$, $\lambda_2$ and $\tau$. We can plot a histogram of the random variables to see what the posterior distributions look like. Below, we collect the samples (called traces in the MCMC literature) into histograms.

In [10]:
### Mysterious code to be explained in Chapter 3.
with model:
    step = pm.Metropolis()
    trace = pm.sample(10000, tune=5000, step=step)
100%|██████████| 10000/10000 [00:02<00:00, 4511.50it/s]
In [11]:
lambda_1_samples = trace['lambda_1']
lambda_2_samples = trace['lambda_2']
tau_samples = trace['tau']
In [12]:
figsize(12.5, 10)
#histogram of the samples:

ax = plt.subplot(311)
ax.set_autoscaley_on(False)

plt.hist(lambda_1_samples, histtype='stepfilled', bins=30, alpha=0.85,
         label=r"posterior of $\lambda_1$", color="#A60628", density=True)
plt.legend(loc="upper left")
plt.title(r"""Posterior distributions of the variables
    $\lambda_1,\;\lambda_2,\;\tau$""")
plt.xlim([15, 30])
plt.xlabel(r"$\lambda_1$ value")

ax = plt.subplot(312)
ax.set_autoscaley_on(False)
plt.hist(lambda_2_samples, histtype='stepfilled', bins=30, alpha=0.85,
         label=r"posterior of $\lambda_2$", color="#7A68A6", density=True)
plt.legend(loc="upper left")
plt.xlim([15, 30])
plt.xlabel(r"$\lambda_2$ value")

plt.subplot(313)
w = 1.0 / tau_samples.shape[0] * np.ones_like(tau_samples)
plt.hist(tau_samples, bins=n_count_data, alpha=1,
         label=r"posterior of $\tau$",
         color="#467821", weights=w, rwidth=2.)
plt.xticks(np.arange(n_count_data))

plt.legend(loc="upper left")
plt.ylim([0, .75])
plt.xlim([35, len(count_data)-20])
plt.xlabel(r"$\tau$ (in days)")
plt.ylabel("probability");

Interpretation

Recall that Bayesian methodology returns a distribution. Hence we now have distributions to describe the unknown $\lambda$s and $\tau$. What have we gained? Immediately, we can see the uncertainty in our estimates: the wider the distribution, the less certain our posterior belief should be. We can also see what the plausible values for the parameters are: $\lambda_1$ is around 18 and $\lambda_2$ is around 23. The posterior distributions of the two $\lambda$s are clearly distinct, indicating that it is indeed likely that there was a change in the user’s text-message behaviour.

What other observations can you make? If you look at the original data again, do these results seem reasonable?

Notice also that the posterior distributions for the $\lambda$s do not look like exponential distributions, even though our priors for these variables were exponential. In fact, the posterior distributions are not really of any form that we recognize from the original model. But that’s OK! This is one of the benefits of taking a computational point of view. If we had instead done this analysis using mathematical approaches, we would have been stuck with an analytically intractable (and messy) distribution. Our use of a computational approach makes us indifferent to mathematical tractability.

Our analysis also returned a distribution for $\tau$. Its posterior distribution looks a little different from the other two because it is a discrete random variable, so it doesn’t assign probabilities to intervals. We can see that near day 45, there was a 50% chance that the user’s behaviour changed. Had no change occurred, or had the change been gradual over time, the posterior distribution of $\tau$ would have been more spread out, reflecting that many days were plausible candidates for $\tau$. By contrast, in the actual results we see that only three or four days make any sense as potential transition points.
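These switchpoint probabilities can be read directly off the posterior samples (a sketch):

# Posterior probability that the switchpoint was day 45:
print((tau_samples == 45).mean())
# Posterior probabilities of all sampled switchpoint days:
days, counts = np.unique(tau_samples, return_counts=True)
print(dict(zip(days, counts / len(tau_samples))))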

Why would I want samples from the posterior, anyways?

We will deal with this question for the remainder of the book, and it is an understatement to say that it will lead us to some amazing results. For now, let’s end this chapter with one more example.

We’ll use the posterior samples to answer the following question: what is the expected number of texts at day $t$, $0 \le t \le 70$? Recall that the expected value of a Poisson variable is equal to its parameter $\lambda$. Therefore, the question is equivalent to what is the expected value of $\lambda$ at time $t$?

In the code below, let $i$ index samples from the posterior distributions. Given a day $t$, we average over all possible $\lambda_i$ for that day $t$, using $\lambda_i = \lambda_{1,i}$ if $t < \tau_i$ (that is, if the behaviour change has not yet occurred), else we use $\lambda_i = \lambda_{2,i}$.

In [13]:
figsize(12.5, 5)
# tau_samples, lambda_1_samples, lambda_2_samples contain
# N samples from the corresponding posterior distribution
N = tau_samples.shape[0]
expected_texts_per_day = np.zeros(n_count_data)
for day in range(0, n_count_data):
    # ix is a bool index of all tau samples corresponding to
    # the switchpoint occurring prior to value of 'day'
    ix = day < tau_samples
    # Each posterior sample corresponds to a value for tau.
    # for each day, that value of tau indicates whether we're "before"
    # (in the lambda1 "regime") or
    #  "after" (in the lambda2 "regime") the switchpoint.
    # by taking the posterior sample of lambda1/2 accordingly, we can average
    # over all samples to get an expected value for lambda on that day.
    # As explained, the "message count" random variable is Poisson distributed,
    # and therefore lambda (the poisson parameter) is the expected value of
    # "message count".
    expected_texts_per_day[day] = (lambda_1_samples[ix].sum()
                                   + lambda_2_samples[~ix].sum()) / N


plt.plot(range(n_count_data), expected_texts_per_day, lw=4, color="#E24A33",
         label="expected number of text-messages received")
plt.xlim(0, n_count_data)
plt.xlabel("Day")
plt.ylabel("Expected # text-messages")
plt.title("Expected number of text-messages received")
plt.ylim(0, 60)
plt.bar(np.arange(len(count_data)), count_data, color="#348ABD", alpha=0.65,
        label="observed texts per day")

plt.legend(loc="upper left");

Our analysis shows strong support for believing the user’s behavior did change ($\lambda_1$ would have been close in value to $\lambda_2$ had this not been true), and that the change was sudden rather than gradual (as demonstrated by $\tau$’s strongly peaked posterior distribution). We can speculate what might have caused this: a cheaper text-message rate, a recent weather-to-text subscription, or perhaps a new relationship. (In fact, the 45th day corresponds to Christmas, and I moved away to Toronto the next month, leaving a girlfriend behind.)

Exercises

1. Using lambda_1_samples and lambda_2_samples, what is the mean of the posterior distributions of $\lambda_1$ and $\lambda_2$?

In [14]:
#type your code here.

2. What is the expected percentage increase in text-message rates? Hint: compute the mean of lambda_1_samples/lambda_2_samples. Note that this quantity is very different from lambda_1_samples.mean()/lambda_2_samples.mean().

In [15]:
#type your code here.

3. What is the mean of $\lambda_1$ given that we know $\tau$ is less than 45? That is, suppose we have been given new information that the change in behaviour occurred prior to day 45. What is the expected value of $\lambda_1$ now? (You do not need to redo the PyMC3 part. Just consider all instances where tau_samples < 45.)

In [16]:
#type your code here.

References


Medical Device Clinical Trials: What You Need To Know

Introduction

Medical device research is important in its own right, as well as part of overall pharmaceutical research. Medical devices serve as simple tools in the aid of diagnostic testing or as alternative life-saving options in treating certain health conditions. Examples range from items as simple as elastic bandages and tongue depressors to more complex pieces like blood screening instruments that reveal the presence of HIV or other diseases, or heart stents that save people’s lives. This paper will cover the differences between medical devices and pharmaceutical products, outline the clinical research process as compared to the pharmaceutical industry, and describe some commonly used procedures for analyzing medical device data.

Differences between Medical Devices and Pharmaceutical Industries

The literature notes a number of differences between the medical device and pharmaceutical industries (Campbell, 2006). A medical device is anything that is neither a drug nor a biologic product. Medical devices usually work physically, while pharmaceutical products usually work chemically or biologically. Medical devices can be therapeutic, diagnostic or something else, whereas pharmaceutical products are usually therapeutic. Medical devices are invented, while drugs are usually discovered.

Medical devices can be changed during clinical development and once on the market a newer, improved version may be in development. Thus, the life cycle of a medical device may only be as short as a couple of years. In contrast, drugs are usually on the market for many years. Medical devices are approved through the Premarket Approval (PMA) application process and a single confirmatory study is often sufficient for approval. In contrast, drugs are approved through the New Drug Application (NDA) process and drug development is characterized by Phases I through IV clinical trials. There are numerous medical device companies registered with the FDA, while in comparison there are only relatively few pharmaceutical companies. Medical device companies are usually small (the median size is less than 50 employees), whereas pharmaceutical companies tend to be large.

Approval Process for Devices

The FDA approval process for medical devices differs from that for drugs from the pharmaceutical industry. Medical devices are submitted for approval to the Center for Devices and Radiological Health (CDRH) or the Center for Biologics Evaluation and Research (CBER) at the FDA, while drugs from pharmaceutical companies are submitted for approval to the Center for Drug Evaluation and Research (CDER) or CBER. Not all devices need to go through controlled clinical trials to gain regulatory approval. If a device needs a confirmatory study to support a premarket approval (PMA), the study often relies not on randomized concurrent controls but on historical controls showing evidence that the device is “safe and effective.” This confirmatory study is usually enough to support a PMA application, while pharmaceutical applications generally require two adequate, well-controlled, confirmatory clinical trials.

In an effort to ensure safety, medical device regulations are continuously under criticism. Various scientific groups are conducting their own research to propose a tougher approval process for the wide range of devices that have been recalled for failing to perform in thousands of patients, causing numerous injuries. But the medical device industry and its allies argue that “more regulations slow innovation, harm patients and cost jobs.” While the FDA maintains continuous awareness of the safety issues surrounding medical devices and keeps its regulations in check, the innovation and advancement of medical devices move so fast that regulatory changes and improvements face challenges and protests.

Medical Device Classification

Medical devices are classified into Class I, II and III. In terms of regulatory requirements, Class I is controlled the least while Class III is controlled the most. Class I devices are generally simple devices that pose minimal risk to users, like enemas, bedpans and elastic bandages. Most Class I devices are exempt from Premarket Notification 510(k). A 510(k) must establish that the device is substantially equivalent to one legally in commercial distribution in the United States before May 28, 1976, or to a device that has been determined by the FDA to be substantially equivalent. Class II devices pose a moderate level of risk, like intravenous administration sets, sutures and inflatable blood pressure cuffs; most Class II devices require Premarket Notification 510(k). Class III devices, on the other hand, are high-risk devices that may cause a significant risk of illness or injury, or devices found not substantially equivalent to Class I and II devices through the 510(k) process. Some examples are implantable pacemakers, blood vessel stents and breast implants. Most Class III devices require the Premarket Approval (PMA) process, which is more involved and includes the submission of clinical data to support claims made for the device.

Medical Device Development

The development of a medical device follows a different route than that of a drug. While clinical trials on drugs focus on dose-response studies, medical device clinical trials give attention to prototype development. Drug development follows an extensive Phase I, II, III and IV clinical trialing process to test for safety, efficacy and toxicity, whereas medical devices have feasibility, pilot and pivotal study models. Some medical device research involves substantial bench and animal testing for reliability and biocompatibility, as with implants, but there are no toxicity studies on devices like the Phase I or animal studies required for pharmaceutical research. Pilot and feasibility studies on medical devices are considered first-in-man studies. Device development is iterative, and designs may be refined or improved as development progresses. While user feedback, adverse events or difficulties in deploying or delivering a device can all lead to changes to the device, second- or third-generation designs do not always require a new clinical trial. Bridging the new to the old design may require additional bench studies or a small confirmatory post-market study.

Statistical Analysis

There are two types of studies in medical device research — reproducibility and clinical utility (Smoak, 2009). To prove the accuracy and precision of a device, a reproducibility study is conducted. For example, a qualitative diagnostic assay uses hit rates, while a quantitative diagnostic assay utilizes coefficient of variation and precision analysis using linear mixed models (proc mixed in SAS®) to verify reproducibility. On the other hand, to demonstrate the real-life use of a device in clinical practice, a clinical utility study is performed. For example, a diagnostic assay may be used to monitor subjects given either a treatment or placebo in a clinical trial. Typical analyses might include measures of sensitivity, specificity, positive predictive value and negative predictive value. Since diagnostic studies for medical devices can be much shorter in duration than pharmaceutical studies, SAS programmers may have less time to program (Smoak, 2008a).

Measures of Diagnostic Accuracy.
The accuracy of any test is measured by comparing the results from a diagnostic test (positive or negative) to the true condition (presence or absence of disease) of the patient. The two basic measures are sensitivity and specificity. Sensitivity is the ability of a test to detect the disease status or condition when it is truly present, i.e., it is the probability of a positive test result given that the patient has the disease or condition of interest. Specificity is the ability of a test to exclude the condition or disease in patients who do not have the condition or the disease, i.e., it is the probability of a negative test result given that the patient does not have the disease or condition of interest. In clinical practice, it is also essential to know how good the test is at predicting the true positives, i.e., the probability that the test will give the correct diagnosis, through its predictive values. The positive predictive value (PPV) is the probability that a patient has the disease or condition given that the test results are positive, and the negative predictive value (NPV) is the probability that a patient does not have the disease or condition given that the test results are indeed negative. A sketch computing all four measures from a 2x2 table appears below.
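All four measures follow directly from the counts in a 2x2 table of test result versus true condition. A sketch in Python, with hypothetical counts:

tp, fp = 90, 25    # test positive: true positives, false positives
fn, tn = 10, 875   # test negative: false negatives, true negatives

sensitivity = tp / (tp + fn)   # P(test+ | disease)    -> 0.90
specificity = tn / (tn + fp)   # P(test- | no disease) -> ~0.97
ppv = tp / (tp + fp)           # P(disease | test+)    -> ~0.78
npv = tn / (tn + fn)           # P(no disease | test-) -> ~0.99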

Handling Missing Data.
One of the main challenges in clinical trial analysis is addressing how to handle missing data. Missing data may be caused by patients dropping out or withdrawing their consent, patients who are lost to follow-up due to relocation or their living conditions, or centers that close even before the study is completed. Since missing data can result in biased treatment comparisons and affect the interpretation of study results, it is important to run sensitivity analyses to evaluate the robustness of study results. In medical device clinical trials, one of the methods used to handle missing data is tipping-point analysis. A tipping-point analysis replaces the missing values with a range of assumed values and finds the values at which the resulting p-value of the hypothesis test becomes equal to or larger than the prespecified significance level. These values, called tipping points, may correspond to implausibly poor outcomes, which aids clinical reviewers in judging the robustness of the treatment effect in the study.
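A minimal sketch of the idea, assuming a two-sample comparison with a handful of missing outcomes in one arm (the data, sample sizes and imputation range below are all hypothetical):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment = rng.normal(1.0, 1.0, 40)  # observed treatment outcomes
control = rng.normal(0.0, 1.0, 40)    # observed control outcomes
n_missing = 5                         # missing treatment outcomes

# Impute all missing values at a common assumed value and scan downward:
for value in np.arange(1.0, -6.5, -1.0):
    imputed = np.concatenate([treatment, np.full(n_missing, value)])
    p = stats.ttest_ind(imputed, control).pvalue
    print(f"imputed value {value:5.1f}: p = {p:.4f}")
# The first value at which p >= 0.05 is the tipping point; if p stays
# below 0.05 over the whole plausible range, the result is robust.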

Propensity Score Analysis.
Propensity score analysis is a versatile statistical method used mainly in observational studies for improving treatment comparisons by adjusting for a relatively large number of potentially confounding covariates. A propensity score is the conditional probability of a patient receiving the active treatment rather than the control, given a collection of observed covariates. The purpose of a propensity score analysis is to attempt to simultaneously balance many covariates in the two treatment groups, in an effort to reduce bias. There has lately been an increased interest in applying this method to nonrandomized medical device clinical studies, which can present statistical and regulatory issues in both the design and analysis of study results. A high degree of statistical expertise is required in handling issues like pre-specification of clinically relevant covariates to be measured, suitable patient populations, planning of sample size in the context of propensity score methodology, handling missing covariates in generating propensity scores, and assessing the success of the propensity score method by evaluating treatment group overlap in terms of the distributions of propensity scores. In general, devices go through continuous improvement in short intervals. It is not uncommon for a clinical trial to start with one device and end with an improved version of the device. Because of the knowledge accumulated over the years on some devices (e.g., pacemakers), it is possible to establish an objective performance criterion that is then imposed on a new device for the same purpose. In addition, the accumulated experience on the control has led many device companies to propose hierarchical models when designing and analyzing a device trial. The latter has led to the FDA guidance on the use of Bayesian statistics in medical device clinical trials (2010).
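As a sketch, propensity scores can be estimated with a logistic regression of treatment assignment on the observed covariates, then checked for overlap between groups; the data and model here are hypothetical:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                        # observed covariates
treated = rng.binomial(1, 1/(1 + np.exp(-X[:, 0])))  # treatment assignment

model = LogisticRegression().fit(X, treated)
propensity = model.predict_proba(X)[:, 1]  # P(active treatment | covariates)

# Assess overlap of the propensity distributions before matching/stratifying:
print(propensity[treated == 1].mean(), propensity[treated == 0].mean())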

Bayesian Clinical Trial.
Bayesian statistics is an approach for learning from evidence as it accumulates. While information from previous studies may serve as a supplemental idea in traditional (frequentist) statistical methods, it is not part of the formal analysis. On the contrary, the Bayesian approach uses this prior information and combines it with the current information about the data to conduct the analysis. The Bayesian idea takes into account the prior information and the trial results as part of a continual data flow in which new and up-to-date inferences are made every time new data become available. An FDA document is available detailing the design and analysis of clinical trials for medical devices that use Bayesian statistical methods (Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials). Because of the mechanism and evolutionary development of medical devices, good prior information is often available. Through the Bayesian approach, this good information may be incorporated into the statistical analysis of a trial. In some situations, the prior information for a device may be a justification for a smaller-sized or shorter-duration pivotal trial. Additionally, the mechanism of action of a medical device is usually physical, which results in local, not systemic, device effects that can sometimes be predictable from prior information.

Adaptive Design.
Adaptive designs use accumulating data to determine how to adjust certain aspects of a trial according to a pre-specified plan, the most common being the notion of early stopping of a trial. To allow for this possibility of early stopping, one or more interim analyses are performed prior to the final analysis. The plan is to assess whether the accumulated evidence at the interim is sufficient to draw a suitable inference and make a sound decision. Another useful application of adaptive design is the ability to change the sample size during the course of the clinical trial. The need for sample size re-estimation comes about because all sample size calculations make key assumptions about the primary study outcome. Adaptive trial designs can sometimes be easier to implement using Bayesian methods than frequentist methods. By adhering to the Likelihood Principle, a Bayesian approach can offer flexibility in the design and analysis of adaptive trials. From design to analysis of medical device data, varied technical expertise may be required. While most medical device data are acquired over a shorter duration of time and take less time for programmers to program, more sophisticated designs and analyses require highly trained statisticians and programmers, especially when handling adaptive designs and Bayesian clinical trials. These designs require extensive pre-planning and model-building, from the prior information to mathematical modeling and combining the information being gathered.
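A naive sketch of a single interim look, splitting the overall significance level across looks Bonferroni-style (real trials use proper group-sequential boundaries such as O'Brien-Fleming; all numbers here are hypothetical):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha_total = 0.05
looks = [50, 100]                          # interim at n=50, final at n=100
alpha_per_look = alpha_total / len(looks)  # naive Bonferroni split

device = rng.normal(0.4, 1.0, 100)   # hypothetical outcomes on device arm
control = rng.normal(0.0, 1.0, 100)  # hypothetical outcomes on control arm

for n in looks:
    p = stats.ttest_ind(device[:n], control[:n]).pvalue
    print(f"n={n}: p={p:.4f}, stop early: {p < alpha_per_look}")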

Conclusion
The medical device industry continues to grow and to become more important. Its clinical research is essential for assessing the safety and effectiveness of the many medical devices on the market or in development. It is also an important element of pharmaceutical research, for example in devices used to deliver drugs or in diagnostic imaging used to monitor therapies. Medical devices differ from pharmaceutical products in the FDA approval process, the pace and duration of studies, and the types of studies and corresponding statistical analyses employed.


REFERENCES
Campbell G. “The Role of Statistics in Medical Devices – The Contrast with Pharmaceuticals.” Biopharmaceutical Report 14:1-8, 2006.

Mandrekar JN, Mandrekar SJ. “Statistical Methods in Diagnostic Medicine using SAS® Software.” Proceedings of the SAS® Users Group International, April 2005. < http://www2.sas.com/proceedings/sugi30/211-30.pdf >

Meier B. “Study of Medical Device Rules Is Attacked, Unseen.” The New York Times, July 2011.

“Medical Devices Industry Assessment.” Medical Market Fact Book, 2008. < http://www.trade.gov/td/health/Medical%20Device%20Industry%20Assessment%20FINAL%20II%203-24-10.pdf >

Smoak C. “Medical Device and Diagnostic Industry 101.” 2010. < http://www.pharmasug.org/cd/papers/IB/IB01.pdf >

Yan X, et al. “Missing data handling methods in medical device clinical trials.” Journal of Biopharmaceutical Statistics, Nov 2009; 19(6):1085-98.

Yu Shu, et al. “Tipping-point Analysis in Medical Device Clinical Trials.” 4th Annual FDA/MTLI Medical Device and IVD Statistics Workshop, April 2011. < http://www.advamedmtli.org/download/file/Shu_Tipping-point%20Analysis%20Day1.pdf >

Yue LQ. “Statistical and regulatory issues with the application of propensity score analysis to nonrandomized medical device clinical studies.” Journal of Biopharmaceutical Statistics, 2007; 17(1):1-13.

Medical Device Design Control Implementation Summary List

1.      Identify Total Design Control Requirements
Review specific design control requirements, determine their source, and identify implementation methods.

·         Design Control

·         US – Food, Drug, and Cosmetic Act & 21 CFR Part 820

·         EU – Medical Device Directive (MDD) & EN ISO 13485:2012

·         Canada – Canadian Medical Device Regulations (CMDR) & ISO 13485:2003

·         Conformity Assessment Paths

·         US – Premarket Submissions

·         EU – MDD Annex II Technical Files & Design Dossiers

·         Canada – Medical Device License

·         Identify Total Design Control Requirements

2.      Developing a Project Plan
Analyze a small example project, identify its stages, and develop a Work Breakdown Structure (WBS) and a Gantt chart for the project. A Critical Path Method sketch follows this list.

·         Planning Stages

·         Interfaces

·         Project Management Techniques

·         Action List

·         Work Breakdown Structure (WBS)

·         Gantt Chart

·         Critical Path Method (CPM)

·         Program Evaluation and Review Technique (PERT)

·         Developing the Design History File (DHF)
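As an illustration of the Critical Path Method listed above, the sketch below computes a project duration and one critical path; the design-project stages and durations are hypothetical.

# Hypothetical design-project tasks: duration in days and predecessors,
# listed in topological order
tasks = {
    "inputs":       (10, []),
    "design":       (20, ["inputs"]),
    "verification": (15, ["design"]),
    "validation":   (12, ["design"]),
    "transfer":     (5,  ["verification", "validation"]),
}

# Forward pass: earliest finish time of each task
finish = {}
for name, (duration, predecessors) in tasks.items():
    start = max((finish[p] for p in predecessors), default=0)
    finish[name] = start + duration

print("Project duration:", max(finish.values()), "days")

# Backward walk: recover one critical path by always following the
# predecessor that finishes last
path, task = [], max(finish, key=finish.get)
while True:
    path.append(task)
    predecessors = tasks[task][1]
    if not predecessors:
        break
    task = max(predecessors, key=lambda p: finish[p])
print("Critical path:", " -> ".join(reversed(path)))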

3.      Developing and Resolving Input Requirements
Develop a set of design inputs for an example product and use the results to identify missing, incomplete, or ambiguous requirements. A trace matrix sketch follows this list.

·         Requirements for the Procedure

·         Resolving design input issues (incomplete, ambiguous, or conflicting requirements)

·         Identifying the design input requirements

·         Performance

·         Functional

·         Safety

·         Regulatory

·         Market

·         Starting a trace matrix

·         Developing and Resolving Input Requirements
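The sketch below starts a trace matrix as a small table linking each design input to its category, design output, and verification, then checks for gaps. The requirement IDs and links are invented for illustration.

import pandas as pd

# Hypothetical trace matrix rows: input -> category -> output -> verification
trace = pd.DataFrame([
    ("IN-001 max weight 300 g",  "Performance", "OUT-014 housing spec",      "TEST-007 weigh 30 units"),
    ("IN-002 IEC 60601-1",       "Safety",      "OUT-021 electrical design", "TEST-012 safety testing"),
    ("IN-003 510(k) predicate",  "Regulatory",  "OUT-030 labeling",          "DOC-REVIEW-03"),
    ("IN-004 2-year shelf life", "Performance", None,                        None),
], columns=["design_input", "category", "design_output", "verification"])

# Gap check: any input not yet traced to an output or a verification?
gaps = trace[trace[["design_output", "verification"]].isna().any(axis=1)]
print(trace.to_string(index=False))
print("Untraced inputs:", len(gaps))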

4.      Design Output Completeness
Review design outputs for an example product and identify essential requirements, acceptance criteria and gaps in the requirements.

·         Requirements for the Procedure

·         Total finished design output

·         The device

·         Packaging and labeling

·         Device Master Record (DMR)

·         Acceptance criteria

·         Essential product characteristics

·         Continuing the trace matrix

5.      Identifying and Resolving Problems
Design reviews systematically examine a design to evaluate its adequacy and capability, with the intent of identifying problems. Participants critique a design review to determine whether it is sufficient.

·         Requirements for the Procedure

·         What needs to be covered

·         Who needs to attend

·         How to document results

·         Integrating risk management

·         Design Review at each Stage

·         Creating and Closing Action Items

·         Identifying and Resolving Problems

6.      Design Verification Methods
Examine paired design inputs and design outputs and determine the best tool for design verification. In some cases the analysis is extended to look at specific aspects of the tools; an FMEA risk-priority sketch follows this list.

·         Requirements for the Procedure

·         Design Verification Tools

·         Failure Modes and Effects Analysis (FMEA)

·         Fault Tree Analysis (FTA)

·         Inspections and Tests

·         Document Review

·         Alternate Calculations

·         Similar Designs

·         The sample size question

·         Continuing the trace matrix

·         Exercise – Design Verification Methods
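As a sketch of the FMEA tool named above, the snippet below ranks failure modes by Risk Priority Number (RPN = severity × occurrence × detection). The failure modes, ratings, and the action threshold are hypothetical.

# Hypothetical failure modes with 1-10 ratings for
# severity, occurrence, and detection
failure_modes = [
    ("battery depletes early", 8, 3, 4),
    ("connector corrosion",    6, 5, 2),
    ("firmware lockup",        9, 2, 7),
]

# Rank by RPN and flag modes above an (arbitrary) action threshold
for mode, sev, occ, det in sorted(
        failure_modes, key=lambda m: m[1] * m[2] * m[3], reverse=True):
    rpn = sev * occ * det
    flag = "  <-- mitigate first" if rpn >= 100 else ""
    print(f"{mode:25s} RPN = {rpn}{flag}")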

7.      Examining a Design Validation Plan
Critique a design validation plan that starts with user needs and intended uses. The plan uses production equivalents and simulated use conditions.

·         Requirements for the Procedure

·         Initial Products or Equivalents

·         Defined conditions or simulation

·         Software Validation

·         Risk Management (ISO 14971:2007)

·         Usability Engineering

·         Continuing the trace matrix

·         Examining a Design Validation Plan

8.      Determining When a Process Must be Validated
Some production processes require process validation, while others do not. You'll analyze processes transferred to production and document whether they require process validation; a decision-logic sketch follows this list.

·         Requirements for the Procedure

·         Process Controls, 820.70(a)

·         Purchasing Data, 820.50(b)

·         Process Validation, 820.75

·         Determining When a Process Must be Validated
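The sketch below encodes the core decision of 820.75: a process whose output cannot be fully verified by subsequent inspection and test must be validated. The process entries are hypothetical.

# Hypothetical processes and whether their output is fully verifiable
processes = [
    ("machining of housing", True),
    ("sterilization",        False),
    ("welding of enclosure", False),
]

for name, fully_verifiable in processes:
    decision = ("verify each lot by inspection and test"
                if fully_verifiable
                else "validate the process per 820.75")
    print(f"{name:22s} -> {decision}")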

9.      Classify Changes as a Design Change or a Production Process Change

QSIT informs the FDA Investigator that production and process changes can be design changes. This exercise gives attendees an opportunity to classify changes and provides insight into the decisions to be made in the QMS; a decision-flow sketch follows this list.

·         Requirements for the Procedure

·         Design change interrelationships – the five important considerations

·         When a production change is a design change

·         Does the design change create a new Device Identifier?

·         Does the design change require an updated 510(k)?

·         Does the design change impact the Risk Management File?

·         Is the design change an enhancement or a recall?

·         The design change flow chart shows the picture

·         Design change records

·         Classify Changes as a Design Change or a Production Process Change
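As a sketch of the design change flow chart, the function below walks the questions from the list above and returns the resulting actions. The ordering of the questions, the answers, and the action wording are hypothetical simplifications.

def classify_change(affects_design_output, new_device_identifier,
                    needs_updated_510k, impacts_risk_file):
    """Return the follow-up actions implied by a proposed change."""
    if not affects_design_output:
        return ["production process change: handle under process controls"]
    actions = ["design change: update the Design History File"]
    if new_device_identifier:
        actions.append("assign a new Device Identifier")
    if needs_updated_510k:
        actions.append("prepare an updated 510(k)")
    if impacts_risk_file:
        actions.append("update the Risk Management File")
    return actions

# Hypothetical change: affects design output, needs a new 510(k),
# and impacts the Risk Management File
for action in classify_change(True, False, True, True):
    print("-", action)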