Today’s puzzle - معماى امروز

spinhead

Elite Member
Oct 24, 2002
2,124
201
United States of Amnesia
This sounds like a home work problem in machine learning or information theory or some field like that. My suggestion would be to define information as entropy and ask the question (place splitting thresholds) so that after splitting the set you gain maximum information (greatest reduction in entropy).

PS.
Entropy is defined as:

E = -\sum (1/p_i) * log(p_i)
 

IEI

Administrator
Staff member
Nov 10, 2002
14,508
3,341
This sounds like a home work problem in machine learning or information theory or some field like that. My suggestion would be to define information as entropy and ask the question (place splitting thresholds) so that after splitting the set you gain maximum information (greatest reduction in entropy).

PS.
Entropy is defined as:

E = -\sum (1/p_i) * log(p_i)
You might be able to formulate this as a likelihood or entropy but this might be of your interest:

1646774172908.png
 

spinhead

Elite Member
Oct 24, 2002
2,124
201
United States of Amnesia
Cool. It would be interesting to see if the entropy approach would be equivalent to their solutions. The entropy reduction strategy is pretty standard these days in building decision trees in machine learning.
P.S. If Huffman's method is not the same and the entropy method, it would be worth investigating to see if there are any advantages in using Huffman's method for building decision trees.
 
Dec 30, 2014
899
356
This question is a bit vague. There are two different ways to interpret this:

1- What is the "no more than" number of questions needed regardless of the number picked.

2- What is the average number of questions needed.

I may be wrong on this, but in case one, the answer is the same : 4. The reason is simple: As the previous paragraph in the question states, if a person picks a number between 0-9, you would need no more than 4 questions. it could be less, depending on the number picked, but will be no more than 4. Now just imagine that instead of a person picking the number, you are picking the number out of the deck. Yes, the probability of picking a specific number will be different, but once the number is picked, it is a number, and you will need no more than 4 questions.

2- Case 2 is when you are looking at the "expected" number of questions you have to ask. The question phrases it a bit vaguely as it says "minimize the number of questions you probably have to ask. This then related to the "expected" or probability weighed number of questions. In this case, one will need to sum up the probabilities for each number, i.e. 9: 9/45 8: 8/45 7.... and then arrange them in two buckets that will give more or less even probabilities, and then repeat again. Each time phrasing the question in a way that you have two buckets of equal (or near equal) probabilities.

So for example, for the first question, you will ask is the number 9, 8, or 6? The probability of YES is 23/45. The probability of being NO is 22/45. Next questions will have similar approach.
 

IEI

Administrator
Staff member
Nov 10, 2002
14,508
3,341
This question is a bit vague. There are two different ways to interpret this:

1- What is the "no more than" number of questions needed regardless of the number picked.

2- What is the average number of questions needed.

I may be wrong on this, but in case one, the answer is the same : 4. The reason is simple: As the previous paragraph in the question states, if a person picks a number between 0-9, you would need no more than 4 questions. it could be less, depending on the number picked, but will be no more than 4. Now just imagine that instead of a person picking the number, you are picking the number out of the deck. Yes, the probability of picking a specific number will be different, but once the number is picked, it is a number, and you will need no more than 4 questions.

2- Case 2 is when you are looking at the "expected" number of questions you have to ask. The question phrases it a bit vaguely as it says "minimize the number of questions you probably have to ask. This then related to the "expected" or probability weighed number of questions. In this case, one will need to sum up the probabilities for each number, i.e. 9: 9/45 8: 8/45 7.... and then arrange them in two buckets that will give more or less even probabilities, and then repeat again. Each time phrasing the question in a way that you have two buckets of equal (or near equal) probabilities.

So for example, for the first question, you will ask is the number 9, 8, or 6? The probability of YES is 23/45. The probability of being NO is 22/45. Next questions will have similar approach.
Pursuit number 2.
 

spinhead

Elite Member
Oct 24, 2002
2,124
201
United States of Amnesia
You might be able to formulate this as a likelihood or entropy but this might be of your interest:

View attachment 45282
I believe there are (2^(n-1) - 1) ways to divide a set of n objects into two nonempty disjoint sets:

Example 1: For n=3, there are (2^2-1)=3 ways:
{1} U {2,3}
{2} U {1,3}
{3} U {1,2}

Example 2: For n=4, there are (2^3 - 1) = 7 ways:

{1} U {2,3,4}
{2} U {1,3,4}
{3} U {1,2,4}
{4} U {1,2,3}
{1,2} U {3,4}
{1,3} U {2,4}
{1,4} U {2,3}

So for n=9, there are (2^8 - 1)=255 ways. So one of the possible 255 questions could be:

Is the number in {1, 3, 7} or {2, 4, 5, 6, 8, 9}?

I believe the issue here is that the number of questions grows exponentially!!!!

So, while for n=9, it may be easy to compute the entropy reduction for ALL 255 questions and choose the one which maximizes the entropy reduction. This would be the brute force approach and not the most efficient.

It appears that Huffman came of with the most efficient method for doing this, which would matter a lot for large n's.
 

IEI

Administrator
Staff member
Nov 10, 2002
14,508
3,341
I believe there are (2^(n-1) - 1) ways to divide a set of n objects into two nonempty disjoint sets:

Example 1: For n=3, there are (2^2-1)=3 ways:
{1} U {2,3}
{2} U {1,3}
{3} U {1,2}

Example 2: For n=4, there are (2^3 - 1) = 7 ways:

{1} U {2,3,4}
{2} U {1,3,4}
{3} U {1,2,4}
{4} U {1,2,3}
{1,2} U {3,4}
{1,3} U {2,4}
{1,4} U {2,3}

So for n=9, there are (2^8 - 1)=255 ways. So one of the possible 255 questions could be:

Is the number in {1, 3, 7} or {2, 4, 5, 6, 8, 9}?

I believe the issue here is that the number of questions grows exponentially!!!!

So, while for n=9, it may be easy to compute the entropy reduction for ALL 255 questions and choose the one which maximizes the entropy reduction. This would be the brute force approach and not the most efficient.

It appears that Huffman came of with the most efficient method for doing this, which would matter a lot for large n's.
I love the way you thinking, of course, Entropy and information coding are directly related.
 

spinhead

Elite Member
Oct 24, 2002
2,124
201
United States of Amnesia
I love the way you thinking, of course, Entropy and information coding are directly related.
Thanks. I have a feeling if the probabilities are order, such as in this case: p1<p2<p3<...<p9
Then the maximum (expected) reduction in entropy would be achieved by one of the 8 questions:
{1} vs {2 ... 9}
{1,2} vs {3 ... 9}
(1,2,3} vs {4 ... 9}
etc.

But this is conjecture and has to be proven. If true, then out of the possible 255 question, we only have to try 8 and see which was results in the maximum expected reduction in entropy.

I'm just thinking out loud here.