How to perform networking match-making?

Overview

Networking events are becoming more and more popular. People are attending them expecting to find someone capable of solving their concerns.

The problem of telling "who should talk to who" is relatively easy when the number of attendees is low. In this case, conveyors can do the matching manually. However when the number of people (or a variety of skills) increases things beginning to complicate.

Desired scenario

Let's consider the following plot:

Before an event, each person fills-in the form describing his skills and needs

For example:

Later on, each person receives the information telling which people are somehow especially valuable and should talk to.

Problem

The main question is: "Who should each person talk to?"

To estimate the difficulty of the problem let's assume that there are (n) people and each one receives only three recommendations (k=3).

This gives a total number of combinations expressed with:

n \cdot C_{n-1}^{k}

That is estimated like follows:

It is obvious that the number of possible combinations is growing faster for the bigger amount of participants.

In naive approach (i.e. brute-force) each combination should be evaluated and compared with the rest (lots of computation).

Additionally, to compare the solutions (to know whether they are good or bad) they have to be somehow measured - this leads to a concept of a fitness function.

Fitness function

A fitness function (f) is generally a function taking a possible combination as an argument and returning a numeric value.

For example a fitness function for a very bad combination (lot's of mismatches) will have a very low score.

In our case the function can return values in the range from 0 (worst) to 1 (best):

0 \leq f(combination) \leq 1

In this case, a fitness function will check two things:

  • do the matched candidates have skills matching our needs?
  • how often is the analyzing person being recommended to others?

The last conditions deal with a problem of rare competencies. There is a possible scenario with a person with rare skills, and lot's of others who are needing it.

Finally, each condition has a weight assigned to it. A fact of fulfilling needs will be more important than the popularity.

 f = 0.7 \cdot needsFulfiled + 0.3 \cdot popularity

Having a way of evaluating the solution we can proceed with the algorithm.

Algorithm

To approach this problem a variation of an evolutionary algorithm (EA) will be used. EA are sort of meta-heuristics based on Darwinian principles of evolution.

Intuitively their workflow looks like follows:

Description of process:

  1. Randomly assign 3 people to each participant
  2. Randomly pick-up two participant
  3. Make the connection if the second one is interesting for the first one (cross-over)
  4. Randomly pick-up two participant and connect them (mutation, happens very rarely)
  5. Go to step 2

Each course from 2-5 is called an epoch or generation. An epoch is represented by the possible solution (list of all participants with their matchings) that can be also represented using a fitness function (average of all individuals).

In step 3, we are randomly selecting two people (cross-over) and trying to match the second one to the first one. A possible individual is being put either as the first, second or third match. In all cases, the overall fitness function is calculated. If a better solution than the current one if found - the matching is performed.

Step 4 represent a mutation - a very rare possibility of accepting a worse solution. The main idea is to introduce some diversity into the solution.

Each generation is evolved with a cross-over followed by a mutation:

The whole source code (written in Groovy) is available here.

Results

The testing was performed in 3 cases:

  • 20 people,
  • 50 people,
  • 100 people

In each one, the initial population (skills, needs, and matches) was randomly generated. In real case scenario, you will have to load this data from other sources (like Excel, DB, or CSV file). Also, there was an upper limit of 5000 generations (all took about 5 seconds to complete).

After looking at the plot you can see some interesting facts:

  1. All cases are starting random solution, which is generally bad (the worst one is for 50 participants ( f \approx 0.52),
  2. All cases are getting very close the perfect matching after 3000 generations,
  3. There was one case of mutation (red line drop, near 5000-th generation),
  4. Learning is slower when the greater number of participants

Conclusion

This experiment concludes that evolutionary algorithm provides a very efficient way of solving match-making problems. They are easy to implement and very extendible to custom restrictions and limitations.

The usage of EA can also provide an extra value for generating online recommendations during the event. It's common that some of the participants are absent which is disturbing others expectations.