Maximise the number of new encounters between guests at a meeting - sql

I am hosting a series of meetings. There are 24 guests. Two or three times each meeting, we break out into different subgroups of 2 to 6 people. I would like to maximise the number of new encounters, so I am looking for an algorithm to help me make new matches until everyone has met everyone else.
My current idea is to record the data in Google Sheets and then use the QUERY function to analyse the data. (QUERY's syntax is very similar to SQL.)
Here's the 'Round1' table:
subgroup1 | subgroup2
==========|==========
Adam | Edith
Ben | Fran
Chris | Gary
Dave |
And the table for 'Round2' looks like this:
subgroup1 | subgroup2 | subgroup3
==========|============|===========
Adam | Ben | Dave
Gary | Fran | Edith
Chris | |
What I want to do is consume that data and output a chart like this which shows me who has met whom:
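      | Adam  | Ben   | Chris | Dave  | Edith | Fran  | Gary
======|=======|=======|=======|=======|=======|=======|======
Adam  | X     | TRUE  | TRUE  | TRUE  | FALSE | FALSE | TRUE
Ben   | TRUE  | X     | TRUE  | TRUE  | FALSE | TRUE  | FALSE
Chris | TRUE  | TRUE  | X     | TRUE  | FALSE | FALSE | TRUE
Dave  | etc...
Edith | etc...
Fran  | etc...
Gary  | etc...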
Could anyone help me to think through how I can use QUERY/SQL to turn those input tables into that output chart?

If you look in cell B3 on the new tab on your shared sheet called MK.Help, you'll find this single formula:
=ARRAYFORMULA(IF((B2:2="")+(ROW(A3:A)>COLUMN(B2:2)),,COUNTIF(QUERY(TRANSPOSE('One table'!A:E),,9),"*"&A3:A&"*"&B2:2&"*")+COUNTIF(QUERY(TRANSPOSE('One table'!A:E),,9),"*"&B2:2&"*"&A3:A&"*")))
That gives you the counts for a heat map based on the tab called 'One table' and nothing else. It will self-populate indefinitely as you add groups to that tab.
Is that what you're going for?

I have come up with an answer, with a very laborious process! Maybe if I explain it here, someone can help me think how to make this simpler.
I start with the example data in this sheet.
In the original question I described one table per round, but I don't care in which round someone met, so I can combine all that data into one table. So in this table, each row shows one of the subgroups:
Then I do a long sequence of match(), transpose(), filter(), join(), split(), until I get this table which shows who has met whom:
From there, it is not so hard to generate the output chart I was looking for.
This works! But it is not very elegant. I would love to make this simpler.
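For comparison, here is a minimal Python sketch of the same idea outside Sheets. It assumes the combined table has already been read into a list with one entry per subgroup. The function name `met_matrix` and the variable names are made up for illustration; the data is the Round1/Round2 example from the question.

```python
from itertools import combinations

def met_matrix(subgroups):
    """Return (names, met) where met[a][b] is True if a and b ever shared a subgroup."""
    names = sorted({name for group in subgroups for name in group})
    met = {a: {b: False for b in names} for a in names}
    for group in subgroups:
        # Every pair inside one subgroup counts as an encounter, in both directions.
        for a, b in combinations(group, 2):
            met[a][b] = True
            met[b][a] = True
    return names, met

# One row per subgroup; the round a subgroup belongs to does not matter.
subgroups = [
    ["Adam", "Ben", "Chris", "Dave"],  # Round1, subgroup1
    ["Edith", "Fran", "Gary"],         # Round1, subgroup2
    ["Adam", "Gary", "Chris"],         # Round2, subgroup1
    ["Ben", "Fran"],                   # Round2, subgroup2
    ["Dave", "Edith"],                 # Round2, subgroup3
]

names, met = met_matrix(subgroups)
for a in names:
    print(a, ["X" if a == b else met[a][b] for b in names])
```

Pairs that are still False are the ones to prioritise when planning the next set of subgroups.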

Related

End user column sort - PowerBi

I want to know whether there is a way for the end user to custom sort a column.
For example, we can add a conditional sort column like the one below and use the 'Sort by column' option to custom sort a column. Is there a way to add a parameter in M so that the end user can switch between different sort orders for the column? For example if he
Zone |Sort
North | 1
South | 2
Central | 3
East | 4
West | 5
There is a new Dynamic M query parameter feature in preview in the Power BI March update, but I couldn't make it work.
If it cannot be achieved via parameters then what would be the best approach?

Vlookup to Make a list?

This site has been super helpful, thank you to everyone who has answered my questions. Here is the next one I am working on. Not sure if I should use vlookup, hlookup, a combination of both or something else.
So I have a list of teams with lineups
Team | Player
A | Sam
A | Chris
A | Tom
A | Scott
B | Mark
B | Dan
B | Greg
B | Ben
C | Sara
C | Beth
C | Luara
C | Britt
On a separate page I am trying to fill in a line up "IF" a team is selected.
For reference this is the current formula I have been trying:
=IFERROR(INDEX('Team LineUps'!$B:$B,Match(0,COUNTIF($C$16,IF('Team LineUps'!$A:$A=$C$16,'Team LineUps'!$B:$B,$C$16)),0)),"")
This gets me the first player on the list for a team. If I change the 0 to a 1, it gets me the last player on the team. How can I get the entire list (players 1-4), if that is possible at all? Or is it only a "TRUE" or "FALSE"?
Answer:
Use a QUERY.
Formula:
=QUERY('Team LineUps'!A2:B13, "SELECT B WHERE A='"&B4&"'")
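To make it clear what the QUERY does (return every Player row whose Team matches the selected value, not just the first match), here is a small Python sketch of the same filter. The function name `players_for` and the list `lineups` are made up for illustration; the data is the table from the question.

```python
# (team, player) rows, in sheet order.
lineups = [
    ("A", "Sam"), ("A", "Chris"), ("A", "Tom"), ("A", "Scott"),
    ("B", "Mark"), ("B", "Dan"), ("B", "Greg"), ("B", "Ben"),
    ("C", "Sara"), ("C", "Beth"), ("C", "Luara"), ("C", "Britt"),
]

def players_for(team, rows):
    """Return all players whose team equals the selected team, keeping sheet order."""
    return [player for row_team, player in rows if row_team == team]

print(players_for("B", lineups))
```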

Returning a value based on multiple conditions in excel

Consider the following data:
Item | Overall | Individual | newColumn
A | Fail | Pass | blank
A | Fail | Fail | blank
B | Fail | Pass | issue
B | Fail | Pass | issue
C | Pass | Pass | blank
I have the logic built out for the first 3 columns already. There are two levels of fails in this data:
overall, and
individual.
If any of the individual fail, the overall fails. Sometimes the overall can fail even though all the individuals are fine. This logic is already built out.
I am trying to find a formula for the newColumn. If all the individuals are a pass for a given item (example item B), but the overall is still a fail, the cell should return the text "issue". It is ok if it returns issue twice, not sure if you can non-dupe that part. I've tried various forms of countifs/and/ors and creating columns that count distinct values but I always find a scenario where it will break the logic.
Try this:
=IF(COUNTIFS($A$2:$A$6,A2,$C$2:$C$6,"Fail"),"blank",IF(B2="Fail","Issue","blank"))
As required
If you add a new column with the formula:
=IF(B2="Fail",IF(COUNTIFS(A:A,A2,C:C,"fail")=0,"issue",""),"")
Then this should work on the assumptions:
For each item, if one of the Overall values is "Fail", they are all "Fail"
The only two possible values are "Pass" and "Fail" for columns B & C
If you require the word blank instead of a blank cell then use:
=IF(B2="Fail",IF(COUNTIFS(A:A,A2,C:C,"fail")=0,"issue","blank"),"blank")
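The rule behind these formulas is: an item gets "issue" only when its Overall is a fail and none of its Individual rows is a fail. Here is a hedged Python sketch of that rule, using the sample data from the question. The function name `flag_issues` is made up for illustration, and the comparison is lowercased because COUNTIFS matches text case-insensitively.

```python
# (item, overall, individual) rows from the question.
rows = [
    ("A", "Fail", "Pass"),
    ("A", "Fail", "Fail"),
    ("B", "Fail", "Pass"),
    ("B", "Fail", "Pass"),
    ("C", "Pass", "Pass"),
]

def flag_issues(rows):
    """Return one newColumn value per row: 'issue' or 'blank'."""
    # Items that have at least one failing individual row (the COUNTIFS part).
    items_with_fail = {item for item, _, individual in rows
                       if individual.lower() == "fail"}
    return [
        "issue" if overall.lower() == "fail" and item not in items_with_fail
        else "blank"
        for item, overall, _ in rows
    ]

print(flag_issues(rows))
```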

Value to table header in Pentaho

Hi, I'm quite new to Pentaho Spoon and I have a problem:
I have a table like this:
model | type | color| q
--1---| --1-- | blue | 1
--1---| --2-- | blue | 2
--1---| --1-- | red | 1
--1---| --2-- | red | 3
--2---| --1-- | blue | 4
--2---| --2-- | blue | 5
And I would like to create a single table (to export in csv or excel) for each model grouped by type with the value of the group as header and as value the q value:
table-1.csv
type | blue | red
--1--| -1-- | -1-
--2--| -2-- | -3-
table-2.csv
type | blue
--1--| -4-
--2--| -5-
I tried with row denormalizer but nothing.
Any suggestion?
Typically it's helpful to see what you have done in order to offer help, but I know how counterintuitive the "help" on this step is.
Make sure you sort the rows on Model and Type before sending them to the denormalizer step. Then give this a try:
As for splitting the output into files, there are a few ways to handle that. Take a look at the Switch/Case step using the Model field.
Also, if you haven't found them already, take a look at the sample files that come with the PDI download. They should be in ...pdi-ce-6.1.0.1-196\data-integration\samples. They can be more helpful than the online documentation sometimes.
Row denormalizer can't be used here if the number of colors is unknown. You also can't define text output fields dynamically.
There are a few ways that I can see without using Java and JS steps. One of them is based on the following idea: we can prepare rows with two columns:
Row Model
type|blue|red 1
1|1|1 1
2|2|3 1
type|blue 2
1|4 2
2|5 2
Then we can prepare filename for each row using Model field and then easily output all rows using text output where file name is taken from filename field. In this case all records will be exported into two files without additional efforts.
Here you can find sample transformation: copy-paste me into new transformation
Please note that it's a sample solution that works only with csv. Also it works only if you have the same number of colors for each type inside model. It's just a hint how to use spoon, it's not a complete solution.
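To show what the prepared rows look like, here is a rough Python sketch of the same idea outside Spoon: build one header line plus one line per type for each model, so that each line can then be written to a file named after its model. This is not Pentaho code, the function name `pivot_by_model` is made up, and like the sample transformation it assumes every type inside a model has the same colors.

```python
from collections import defaultdict

# (model, type, color, q) rows from the question.
rows = [
    (1, 1, "blue", 1), (1, 2, "blue", 2),
    (1, 1, "red", 1),  (1, 2, "red", 3),
    (2, 1, "blue", 4), (2, 2, "blue", 5),
]

def pivot_by_model(rows):
    """Return {model: [text lines]} with colors turned into columns."""
    cells = defaultdict(dict)   # model -> {(type, color): q}
    colors = defaultdict(list)  # model -> colors in first-seen order
    types = defaultdict(list)   # model -> types in first-seen order
    for model, type_, color, q in rows:
        cells[model][(type_, color)] = q
        if color not in colors[model]:
            colors[model].append(color)
        if type_ not in types[model]:
            types[model].append(type_)
    output = {}
    for model in cells:
        lines = ["|".join(["type"] + colors[model])]  # header row
        for type_ in types[model]:
            values = [str(cells[model].get((type_, c), "")) for c in colors[model]]
            lines.append("|".join([str(type_)] + values))
        output[model] = lines
    return output

for model, lines in pivot_by_model(rows).items():
    print(f"table-{model}.csv")
    print("\n".join(lines))
```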

Nearest Neighbor Search on large database table - SQL and/or ArcGis

Sorry for posting something that's probably obvious, but I don't have much database experience. Any help would be greatly appreciated - but remember, I'm a beginner :-)
I have a table like this:
Table.fruit
ID type Xcoordinate Ycoordinate Taste Fruitiness
1 Apple 3 3 Good 1,5
2 Orange 5 4 Bad 2,9
3 Apple 7 77 Medium 1,4
4 Banana 4 69 Bad 9,5
5 Pear 9 15 Medium 0,1
6 Apple 3 38 Good -5,8
7 Apple 1 4 Good 3
8 Banana 15 99 Bad 6,8
9 Pear 298 18789 Medium 10,01
… … … … … …
1000 Apple 1344 1388 Bad 5
… … … … … …
1958 Banana 759 1239 Good 1
1959 Banana 3 4 Medium 5,2
I need:
A table that gives me
The n (eg.: n=5) closest points to EACH point in the original table, including distance
Table.5nearest (please note that the distances are fake). So the resulting table has ID1, ID2 and distance between ID1 and ID2 (can't post images yet, unfortunately).
ID.Fruit1 ID.Fruit2 Distance
1 1959 1
1 7 2
1 2 2
1 5 30
1 14 50
2 1959 1
2 1 2
… … …
1000 1958 400
1000 Xxx Xxx
… … …
How can I do this (ideally with SQL/database management) or in ArcGis or similar? Any ideas?
Unfortunately, my table contains 15000 records, so the resulting table will have 75000 records if I choose n=5.
Any suggestions GREATLY appreciated.
EDIT:
Thank you very much for your comments and suggestions so far. Let me expand on it a little:
The first proposed method is sort of a brute-force scan of the whole table rendering huge filesizes or, likely, crashes, correct?
Now, the fruit is just a dummy; the real table contains a fixed ID, nominal attributes ("fruit types" etc.), X and Y spatial columns (in Gauss-Krueger) and some numeric attributes.
Now, I guess there is a way to code a "bounding box" into this, so that the distance calculation is done only for my point in question (let's say 1) and every other point within a square of a certain edge length around it. I can (remotely) imagine coding or querying for that, but how do I get the script to do that for EVERY point in my ID column? The way I understand it, this should either create a "subtable" for each record/point in my "Table.Fruit", containing all points within the square around that record/point with a distance field added, or one big new table ("Table.5nearest"). I hope this makes some kind of sense. Any ideas? Thanks again.
To get all the distances between all fruit is fairly straightforward. In Access SQL (although you may need to add parentheses everywhere to get it to work :P):
select fruit1.id,
       fruit2.id,
       sqr(((fruit2.xcoordinate - fruit1.xcoordinate)^2) + ((fruit2.ycoordinate - fruit1.ycoordinate)^2)) as distance
from fruit as fruit1
inner join fruit as fruit2
    on fruit2.id <> fruit1.id
order by distance;
I don't know if Access has the necessary sophistication to limit this to the "top n" records for each fruit; so this query, on your recordset, will return 225 million records (or, more likely, crash while trying)!
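If doing this outside the database is an option, here is a minimal Python sketch of the "top n per point" step. It is still a brute-force scan (every point is compared with every other point), but it only keeps the n best candidates per point, so the 225 million row result is never materialised. The function name `n_nearest` and the dictionary layout are assumptions for illustration; the sample coordinates are taken from the question's table.

```python
import heapq
from math import hypot

def n_nearest(points, n=5):
    """points: {id: (x, y)}. Return (id1, id2, distance) rows, n per id1."""
    result = []
    for id1, (x1, y1) in points.items():
        # Lazily generate (distance, id2) for every other point.
        candidates = (
            (hypot(x2 - x1, y2 - y1), id2)
            for id2, (x2, y2) in points.items()
            if id2 != id1
        )
        # nsmallest keeps only n candidates in memory at a time.
        for distance, id2 in heapq.nsmallest(n, candidates):
            result.append((id1, id2, distance))
    return result

# A few rows from Table.fruit: id -> (Xcoordinate, Ycoordinate).
points = {1: (3, 3), 2: (5, 4), 5: (9, 15), 7: (1, 4), 1959: (3, 4)}

for row in n_nearest(points, n=2):
    print(row)
```

For 15000 points this is about 225 million distance calculations, which is slow in pure Python but does not need much memory. A bounding box or a spatial index would cut down the number of comparisons.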
Thank you for your comments so far; in the meantime, I have gone for a pre-fabricated solution, an add-in for ArcGis called Hawth's Tools. This really works like a breeze to find the n closest neighbors to any point feature with an x and y value. So I hope it can help someone with similar problems and questions.
However, it leaves me with a more database-related issue now. Do you have an idea how I can get any DBMS (preferably Access), to give me a list of all my combinations? That is, if I have a point feature with 15000 fruits arranged in space, how do I get all "pure banana neighborhoods" (apple, lemon, etc.) and all other combinations?
Cheers and best wishes.