Hypothesis Testing for Gangsters

Okay. Okay. OKAY. Look. I know you have a problem. You've been screwed by someone and now want your money back. Totally agree.

But first take a big breath and relax - you don't want to get into bigger trouble. Let's do it another way. I want to help to go one step further and do it like a PRO. And believe this makes a huge difference.

So go, grab your drink, and read this 5 tips.

How to do this

Read each step carefully. After the end, you will find what should you have after accomplishing it.

  1. Formulate hypothesis you want to validate
    A null hypothesis (H_{0}) is a statement we want to validate. Unless we will find sufficient evidence, there will be no reasons to reject it.

    A drug dealer states that cocaine is pure in 90%.

    A null hypothesis is (H_{0}\colon\ p = 0.9)An alternative hypothesis (H_{1}) is a statement that automatically becomes "true" (not rejected) if null hypothesis gets discarded.

    A customer doubts drug's purity. He states that it contains more than 10% additives. The alternative hypothesis can be (H_{1}\colon\ p < 0.9)

    After this step, you should have formulated (H_{0}) and (H_{1})

  2. Choose test statisticsOur overall aim is to validate the null hypothesis. We have to assure that it is true and then look for arguments to demolish it. Yeah.In more scientific speech we have to come up with probabilistic distribution ensuring that null hypothesis is correct.

    A customer bought 15 decks of a drug. After hosting a big party he realized that ONLY 11 decks were meeting the norm guaranteed by the dealer (test statistics). Remembering the wise words of a dealer, his test distribution can be (X \sim B(15; 0.9)). Someone will have a problem.

    After this step, you should have figured the test statistics (based on the experience) and the test distribution

  3. Choose a critical region (one-tail or two-tail test)Right now we have our probability distribution of test statistics, but still need to choose which values the null hypothesis get rejected (critical region) and for which accepted (acceptance region).We use a term of significance level (\alpha) which is a parameter describing certain probability, that for an event the likelihood of it's occurrence is small enough to agree that the null hypothesis gets rejected.

    A customer have chosen a value of significance level (\alpha = 5\%) meaning that the critical region (when we reject the null hypothesis) can be described as: (P(X < c) < 0.05)

    Depending on the form of (H_{1}) we can also specify whether the critical region is one-tailed or two-tailed.

    One-tail critical region occurs when the alternative hypothesis is expressed with inequities. For example if (H_{1}\colon p <\ c ) we should use left one-tailed critical region, and for (H_{1}\colon p >\ c right one-tailed.

    When the (H_{1}) is expressed with the (\neq) sign we are dealing with two-tail critical region. In this case, the critical region is placed in both tails of the distribution, where each side corresponds to the (\frac{\alpha}{2} ) probability.

    Because the alternative hypothesis is (H_{1}\colon\ p < 0.9 ) the scammed customer is dealing with one-tailed critical region.

    After this step, you should specify the significance level (\alpha ) and know whether the critical region is one-tailed or two-tailed.

  4. Calculate the probability (p-value)P-value is a probability of getting the same (or worse) results from the perspective of a null hypothesis.It's value depend on two things:
    • form of alternative hypothesis (H_{1}) (one or two tails),
    • a value of test statistics (based on the test distribution)

    In the case of our customer the test statistics is 11 (doses of pure drugs) and the critical region is located in left tail. The formula for p-value is ( P(X < 11) ). Taking into consideration (X \sim B(15; 0.9) ) it's value is (P(X < 11) = 0.55). To calculate this he used this snippet.

    After this step, you should obtain p-value

  5. Make a decisionIn this last step, we are finally deciding if the null hypothesis gets rejected or not (i.e. dealer was right or not).The null hypothesis will get rejected if the p-value will get into critical region.For example if the critical region is in the left tail the (H_{0}) will get rejected if ( \alpha < P_{value}).

    Customer has to reject his hypothesis (H_{1} ). In this case the P-value (( P_{value} = 0.055)) is greater than the significance level ( \alpha = 0.05 ), which means that the drug dealer was right ( H_{0}) is true). DAMN.

    After this step you finally know if there are reasons to reject (H_{0})


Question: What value of significance level should I choose?

Answer: It all depends on how sure you want to be that you are making no mistake when rejecting a null hypothesis. For example, choosing ( \alpha = 1\% ) gives you more certainty that your decision about rejecting ( H_{0}) was correct than ( \alpha = 5\%).


I have to admit it. I'm a bit scared. You have received a powerful tool. Tool that help to prove you that you're RIGHT in many cases.

But please, remember about other that still might need some help. Share it with them, and make them your debtors.

How to perform networking match-making?


Networking events are becoming more and more popular. People are attending them expecting to find someone capable of solving their concerns.

The problem of telling "who should talk to who" is relatively easy when the number of attendees is low. In this case, conveyors can do the matching manually. However when the number of people (or a variety of skills) increases things beginning to complicate.

Desired scenario

Let's consider the following plot:

Before an event, each person fills-in the form describing his skills and needs

For example:

Later on, each person receives the information telling which people are somehow especially valuable and should talk to.


The main question is: "Who should each person talk to?"

To estimate the difficulty of the problem let's assume that there are (n) people and each one receives only three recommendations (k=3).

This gives a total number of combinations expressed with:

n \cdot C_{n-1}^{k}

That is estimated like follows:

It is obvious that the number of possible combinations is growing faster for the bigger amount of participants.

In naive approach (i.e. brute-force) each combination should be evaluated and compared with the rest (lots of computation).

Additionally, to compare the solutions (to know whether they are good or bad) they have to be somehow measured - this leads to a concept of a fitness function.

Fitness function

A fitness function (f) is generally a function taking a possible combination as an argument and returning a numeric value.

For example a fitness function for a very bad combination (lot's of mismatches) will have a very low score.

In our case the function can return values in the range from 0 (worst) to 1 (best):

0 \leq f(combination) \leq 1

In this case, a fitness function will check two things:

  • do the matched candidates have skills matching our needs?
  • how often is the analyzing person being recommended to others?

The last conditions deal with a problem of rare competencies. There is a possible scenario with a person with rare skills, and lot's of others who are needing it.

Finally, each condition has a weight assigned to it. A fact of fulfilling needs will be more important than the popularity.

 f = 0.7 \cdot needsFulfiled + 0.3 \cdot popularity

Having a way of evaluating the solution we can proceed with the algorithm.


To approach this problem a variation of an evolutionary algorithm (EA) will be used. EA are sort of meta-heuristics based on Darwinian principles of evolution.

Intuitively their workflow looks like follows:

Description of process:

  1. Randomly assign 3 people to each participant
  2. Randomly pick-up two participant
  3. Make the connection if the second one is interesting for the first one (cross-over)
  4. Randomly pick-up two participant and connect them (mutation, happens very rarely)
  5. Go to step 2

Each course from 2-5 is called an epoch or generation. An epoch is represented by the possible solution (list of all participants with their matchings) that can be also represented using a fitness function (average of all individuals).

In step 3, we are randomly selecting two people (cross-over) and trying to match the second one to the first one. A possible individual is being put either as the first, second or third match. In all cases, the overall fitness function is calculated. If a better solution than the current one if found - the matching is performed.

Step 4 represent a mutation - a very rare possibility of accepting a worse solution. The main idea is to introduce some diversity into the solution.

Each generation is evolved with a cross-over followed by a mutation:

The whole source code (written in Groovy) is available here.


The testing was performed in 3 cases:

  • 20 people,
  • 50 people,
  • 100 people

In each one, the initial population (skills, needs, and matches) was randomly generated. In real case scenario, you will have to load this data from other sources (like Excel, DB, or CSV file). Also, there was an upper limit of 5000 generations (all took about 5 seconds to complete).

After looking at the plot you can see some interesting facts:

  1. All cases are starting random solution, which is generally bad (the worst one is for 50 participants ( f \approx 0.52),
  2. All cases are getting very close the perfect matching after 3000 generations,
  3. There was one case of mutation (red line drop, near 5000-th generation),
  4. Learning is slower when the greater number of participants


This experiment concludes that evolutionary algorithm provides a very efficient way of solving match-making problems. They are easy to implement and very extendible to custom restrictions and limitations.

The usage of EA can also provide an extra value for generating online recommendations during the event. It's common that some of the participants are absent which is disturbing others expectations.