Hypothesis Testing for Gangsters

Okay. Okay. OKAY. Look. I know you have a problem. You've been screwed by someone and now want your money back. Totally agree.

But first take a deep breath and relax - you don't want to get into bigger trouble. Let's do it another way. I want to help you go one step further and do it like a PRO. And believe me, this makes a huge difference.

So go, grab your drink, and read these 5 tips.

How to do this

Read each step carefully. At the end of each one, you will find what you should have after completing it.

  1. Formulate the hypothesis you want to validate
    A null hypothesis (H_{0}) is the statement we want to test. Unless we find sufficient evidence against it, there is no reason to reject it.

    A drug dealer claims that his cocaine is 90% pure.

    The null hypothesis is (H_{0}\colon\ p = 0.9).

    An alternative hypothesis (H_{1}) is a statement that automatically becomes "true" (not rejected) if the null hypothesis gets discarded.

    A customer doubts the drug's purity. He claims that it contains more than 10% additives. The alternative hypothesis can be (H_{1}\colon\ p < 0.9).

    After this step, you should have formulated (H_{0}) and (H_{1}).

  2. Choose the test statistic
    Our overall aim is to test the null hypothesis. We have to assume that it is true and then look for arguments to demolish it. Yeah.

    In more scientific terms, we have to come up with the probability distribution that holds if the null hypothesis is correct.

    A customer bought 15 decks of the drug. After hosting a big party he realized that ONLY 11 decks met the norm guaranteed by the dealer (the test statistic). Remembering the wise words of the dealer, his test distribution can be (X \sim B(15; 0.9)). Someone will have a problem.

    After this step, you should have figured out the test statistic (based on experience) and the test distribution.
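
To see what (X \sim B(15; 0.9)) actually looks like, here is a minimal sketch in Python using only the standard library. The helper name `binom_pmf` and the variable names are made up for this illustration; the numbers come from the customer's story.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_decks = 15     # decks bought by the customer
p_claimed = 0.9  # the dealer's claim, i.e. the null hypothesis
observed = 11    # decks that met the norm (the test statistic)

# Probability of each possible number of good decks if the dealer is honest.
distribution = {k: binom_pmf(k, n_decks, p_claimed) for k in range(n_decks + 1)}

for k, prob in distribution.items():
    marker = "  <- observed" if k == observed else ""
    print(f"P(X = {k:2d}) = {prob:.4f}{marker}")
```

Most of the probability sits at 13 to 15 good decks, which is why 11 already looks suspicious.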

  3. Choose a critical region (one-tail or two-tail test)
    Right now we have the probability distribution of our test statistic, but we still need to choose for which values the null hypothesis gets rejected (critical region) and for which it does not (acceptance region).

    We use the significance level (\alpha): a probability threshold below which an outcome counts as unlikely enough under the null hypothesis that we agree to reject it.

    The customer has chosen a significance level of (\alpha = 5\%), meaning that the critical region (where we reject the null hypothesis) can be described by a cut-off (c) such that (P(X < c) < 0.05).

    Depending on the form of (H_{1}) we can also specify whether the critical region is one-tailed or two-tailed.

    A one-tailed critical region occurs when the alternative hypothesis is expressed with an inequality. For example, if (H_{1}\colon p < p_{0}), where (p_{0}) is the value claimed in (H_{0}), we should use the left one-tailed critical region, and for (H_{1}\colon p > p_{0}) the right one-tailed one.

    When (H_{1}) is expressed with the (\neq) sign we are dealing with a two-tailed critical region. In this case, the critical region is placed in both tails of the distribution, where each side corresponds to a probability of (\frac{\alpha}{2}).

    Because the alternative hypothesis is (H_{1}\colon\ p < 0.9), the scammed customer is dealing with a left one-tailed critical region.

    After this step, you should have specified the significance level (\alpha) and know whether the critical region is one-tailed or two-tailed.
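
A rough sketch of how the left-tail cut-off could be found for (X \sim B(15; 0.9)) and (\alpha = 5\%), again in plain Python. The function names `prob_below` and `left_cutoff` are hypothetical, and the sketch assumes the convention used above: reject when (X < c), with (c) as large as possible while (P(X < c) < \alpha).

```python
from math import comb


def prob_below(c: int, n: int, p: float) -> float:
    """P(X < c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c))


def left_cutoff(n: int, p: float, alpha: float) -> int:
    """Largest c such that P(X < c) < alpha (assumes alpha > 0)."""
    c = 0
    # Push the cut-off to the right while the left tail stays below alpha.
    while c <= n and prob_below(c + 1, n, p) < alpha:
        c += 1
    return c


alpha = 0.05
cutoff = left_cutoff(15, 0.9, alpha)
print(f"Reject H0 when X < {cutoff}")
print(f"P(X < {cutoff}) = {prob_below(cutoff, 15, 0.9):.4f}")
```

With these numbers the sketch should report a cut-off of 11, so 10 or fewer good decks land in the critical region and 11 sits just outside it. For a two-tailed test the same search would be run on each tail with (\frac{\alpha}{2}).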

  4. Calculate the probability (p-value)
    The p-value is the probability of getting the same (or worse) results, assuming the null hypothesis is true.

    Its value depends on two things:
    • the form of the alternative hypothesis (H_{1}) (one or two tails),
    • the value of the test statistic (based on the test distribution).

    In the case of our customer the test statistic is 11 (decks of pure drugs) and the critical region is located in the left tail. Because "the same or worse" includes 11 itself, the formula for the p-value is (P(X \leq 11)). Taking into consideration (X \sim B(15; 0.9)), its value is (P(X \leq 11) \approx 0.0556). He calculated this with a short snippet.

    After this step, you should have obtained the p-value.
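
A minimal sketch of such a calculation, assuming plain Python and the standard library. The function name `left_tail_p_value` is made up for this example. It sums the binomial probabilities of 11 or fewer good decks under (X \sim B(15; 0.9)).

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def left_tail_p_value(observed: int, n: int, p: float) -> float:
    """P(X <= observed): results as bad as the observed one, or worse."""
    return sum(binom_pmf(k, n, p) for k in range(observed + 1))


p_value = left_tail_p_value(observed=11, n=15, p=0.9)
print(f"p-value = P(X <= 11) = {p_value:.4f}")  # prints 0.0556
```

If SciPy happens to be installed, `scipy.stats.binom.cdf(11, 15, 0.9)` should give the same number.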

  5. Make a decision
    In this last step, we finally decide whether the null hypothesis gets rejected or not (i.e. whether the dealer was right or not).

    The null hypothesis gets rejected if the test statistic falls into the critical region, which is the same as the p-value dropping below the significance level. For example, if the critical region is in the left tail, (H_{0}) gets rejected if (P_{value} < \alpha).

    The customer has to give up on his hypothesis (H_{1}). In this case the p-value ((P_{value} \approx 0.0556)) is greater than the significance level (\alpha = 0.05), which means there is not enough evidence to reject (H_{0}): the drug dealer's claim stands. DAMN.

    After this step, you finally know whether there are reasons to reject (H_{0}).
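
The decision rule itself fits in a few lines. This is only a sketch under the assumptions of the story (left-tailed test, (X \sim B(15; 0.9)), 11 good decks, (\alpha = 0.05)); the function name `decide` is hypothetical.

```python
from math import comb


def decide(p_value: float, alpha: float) -> str:
    """Reject H0 only when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# p-value for the customer's case: P(X <= 11) with X ~ B(15, 0.9).
p_value = sum(comb(15, k) * 0.9**k * 0.1 ** (15 - k) for k in range(12))

verdict = decide(p_value, alpha=0.05)
print(f"p-value = {p_value:.4f}, alpha = 0.05 -> {verdict}")
```

Note that "fail to reject" is weaker than "proven true": the evidence was just not strong enough at the chosen level.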

Q&A

Question: What value of significance level should I choose?

Answer: It all depends on how sure you want to be that you are not making a mistake when rejecting a null hypothesis. The significance level is the largest chance you accept of rejecting (H_{0}) when it is actually true. For example, choosing (\alpha = 1\%) gives you more certainty that your decision to reject (H_{0}) was correct than (\alpha = 5\%).
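
To make that trade-off concrete, here is a small sketch that reuses the customer's setup ((X \sim B(15; 0.9)), left-tailed test) and reports, for a few significance levels, the real chance of rejecting a true (H_{0}). The names `binom_cdf` and `type_one_error` are made up for this illustration, and the sketch assumes we reject when the count is at or below the largest cut-off whose tail probability does not exceed (\alpha).

```python
from math import comb


def binom_cdf(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ B(n, p); an empty sum (c < 0) gives 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def type_one_error(n: int, p: float, alpha: float) -> float:
    """Chance of rejecting a true H0 with a left-tailed test at level alpha."""
    c = -1
    # Grow the critical region while its probability stays within alpha.
    while c < n and binom_cdf(c + 1, n, p) <= alpha:
        c += 1
    return binom_cdf(c, n, p)


for alpha in (0.10, 0.05, 0.01):
    risk = type_one_error(15, 0.9, alpha)
    print(f"alpha = {alpha:.2f} -> P(reject a true H0) = {risk:.4f}")
```

The smaller the (\alpha), the smaller the critical region and the lower the risk of accusing an honest dealer. The price is that a dishonest one gets away more easily.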

Summary

I have to admit it. I'm a bit scared. You have received a powerful tool. A tool that helps you prove that you're RIGHT in many cases.

But please, remember about others who still might need some help. Share it with them, and make them your debtors.

Norbert
