QBO3 Matrix "All" values - fuzzy-logic

I have a QBO3 Matrix with a single Investor input, and a bunch of outputs:
| Investor | Output1 | Output2 | Output3 |
| -------- | ------- | ------- | ------- |
| 1,2,3 | A | B | C |
| 4,5 | D | E | F |
| 6,7,8 | D | B | G |
...
We now want to add a new "LoanType" input.
Is there a way to have the Investor Code field count for 'ALL'?
If I leave the field blank and select ‘Do Not Match’ will that apply to anything but NULL in that field?
Is there a way to only account for the Investors already listed in the Matrix without having to list them all in the Investor Code field?

TL;DR: use input weights to get your desired result.
Assuming you added a LoanType input with the following rows:
| Investor | LoanType | Output1 | Output2 | Output3 |
| -------- | -------- | ------- | ------- | ------- |
| 1,2,3 | | A | B | C | row 1
| 4,5 | | D | E | F | row 2
| 6,7,8 | | D | B | G | row 3
| | Jumbo | H | I | F | row 4
...
Is there a way to have the Investor field count for 'ALL'?
Yes. Row 4 has no Investor specified, so all Jumbo loans would consider row 4 a possible match, regardless of investor.
If I leave the field blank and select ‘Do Not Match’ will that apply
to anything but NULL in that field?
Yes, but this is not recommended. If you do this, it will match a NULL investor, and it will match any "valid" investor value. This is equivalent to simply not entering anything at all.
Is there a way to only account for the Investors already listed in the
Matrix without having to list them all in the Investor Code field?
Yes, using weights.
I'm inferring from your question that if you have:
Investor = 27 (where Investor 27 does not appear in any other rows in the Matrix)
LoanType = "Jumbo"
you want row 4, but if you have:
Investor = 8 (where Investor 8 appears in another row in the Matrix), and
LoanType = "Jumbo"
you want to match row 3.
If this assumption is correct, you just need to set the Investor input weight to be higher than the LoanType weight. For example:
Investor.Weight = 10
LoanType.Weight = 5
In this scenario, given the inputs Investor = 8 and LoanType = "Jumbo", you would have:
| Investor | LoanType | Output1 | Output2 | Output3 | Weight |
| -------- | -------- | ------- | ------- | ------- | ------ |
| 6,7,8 | | D | B | G | 10 |
| | Jumbo | H | I | F | 5 |
| 1,2,3 | | A | B | C | 0 |
| 4,5 | | D | E | F | 0 |
...
Thus, your Investor match outweighs your LoanType match.
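To make the scoring concrete, here is a hypothetical Python model of weighted matrix matching; the row structure, weight values, and disqualify-on-mismatch rule are assumptions for illustration, not QBO3's actual implementation:

```python
# Each row scores the sum of the weights of the inputs it matches.
# A blank input matches anything but contributes no weight; a row whose
# specified Investor/LoanType doesn't match is disqualified outright.
WEIGHTS = {"Investor": 10, "LoanType": 5}

rows = [
    {"Investor": {"1", "2", "3"}, "LoanType": None,    "Outputs": ("A", "B", "C")},
    {"Investor": {"4", "5"},      "LoanType": None,    "Outputs": ("D", "E", "F")},
    {"Investor": {"6", "7", "8"}, "LoanType": None,    "Outputs": ("D", "B", "G")},
    {"Investor": None,            "LoanType": "Jumbo", "Outputs": ("H", "I", "F")},
]

def score(row, investor, loan_type):
    total = 0
    if row["Investor"] is not None:
        if investor not in row["Investor"]:
            return -1  # explicit mismatch disqualifies the row
        total += WEIGHTS["Investor"]
    if row["LoanType"] is not None:
        if loan_type != row["LoanType"]:
            return -1
        total += WEIGHTS["LoanType"]
    return total

def best_match(investor, loan_type):
    candidates = [(score(r, investor, loan_type), r) for r in rows]
    candidates = [(s, r) for s, r in candidates if s >= 0]
    return max(candidates, key=lambda sr: sr[0])

# Investor 8 matches row 3 (weight 10) over the Jumbo row (weight 5):
print(best_match("8", "Jumbo")[1]["Outputs"])   # ('D', 'B', 'G')
# Investor 27 only matches the Jumbo row:
print(best_match("27", "Jumbo")[1]["Outputs"])  # ('H', 'I', 'F')
```

The higher Investor weight is what makes row 3 beat the wildcard Jumbo row whenever the investor is explicitly listed.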
Lastly, if you had a rare use case where Investor = 2 and LoanType = "Jumbo" should result in Jumbo results, you can just add a row for that use case:
| Investor | LoanType | Output1 | Output2 | Output3 |
| -------- | -------- | ------- | ------- | ------- |
| 1,2,3 | | A | B | C | row 1
| 4,5 | | D | E | F | row 2
| 6,7,8 | | D | B | G | row 3
| | Jumbo | H | I | F | row 4
| 2 | Jumbo | H | I | F | row 5
...

Related

Inserting set of rows for every ID in another table

This is the initial table (just a part of a larger table, where Article IDs can vary); the database is MS SQL.
| ArticleID | GroupID |
| --------- | ------- |
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
| 4 | NULL |
Set of rows that should be entered for each ArticleID looks something like this:
| GroupID |
| ------- |
| A |
| B |
| C |
| D |
Result table should look something like this:
| ArticleID | GroupID |
| --------- | ------- |
| 1 | NULL |
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | NULL |
| 2 | A |
| 2 | B |
| 2 | C |
| 2 | D |
| 3 | NULL |
| 3 | A |
| 3 | B |
| 3 | C |
| 3 | D |
| 4 | NULL |
| 4 | A |
| 4 | B |
| 4 | C |
| 4 | D |
Any suggestions on how to insert these rows efficiently?
Thanks a lot for your suggestions.
Regards
This is a cross join between two sets.
with a as (
    select * from (values (1),(2),(3),(4)) v(ArticleId)
), g as (
    select * from (values (null),('A'),('B'),('C'),('D')) v(GroupId)
)
select *
from a cross join g;
To insert into the original table you could do:
with g as (select * from(values('A'),('B'),('C'),('D'))v(GroupId))
insert into t
select t.ArticleId, g.GroupId
from t cross join g;
See Example Fiddle
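If you want to sanity-check the pattern outside SQL Server, the same statement runs against SQLite via Python's stdlib `sqlite3` module; the table and column names below mirror the answer's hypothetical schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (ArticleId int, GroupId text)")
con.executemany("insert into t values (?, null)", [(1,), (2,), (3,), (4,)])

# Cross join the existing ArticleIds with the fixed set of groups,
# then insert the combinations back into the same table.
con.execute("""
    with g(GroupId) as (values ('A'), ('B'), ('C'), ('D'))
    insert into t
    select t.ArticleId, g.GroupId
    from t cross join g
""")

rows = con.execute("select count(*) from t").fetchone()[0]
print(rows)  # 4 original NULL rows + 4*4 new rows = 20
```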

How can a list be stored in a cell of a pandas DataFrame?

I have some dataframes (df, tmp_df and final_df) and I want to put two columns of tmp_df into two different cells of final_df as lists. My code and dataframes are presented below; however, the loop part is not working correctly. Other questions on Stack Overflow (and elsewhere) answer this when a dictionary of the lists is available from the beginning of the program. But here, the tmp_df dataframe changes during the for loop, and at each iteration suitable prices are calculated. Also, the most related data are found and must be placed in the related cell of final_df.
import pandas as pd

df = pd.read_csv('myfile.csv')
tmp_df = pd.DataFrame()
final_df = pd.DataFrame()
tmp_df = df[df['Type'] == True]
cnt = 0
for c in tmp_df['Category']:
    #################
    # Apply some calculations and call different methods to do some changes on Price column of tmp_df.
    #################
    final_df.at[cnt, 'Data'] = list(set(tmp_sub['Data']))
    final_df['Category'], final_df['Acceptable'], final_df['Rank'], final_df['Price'] = \
        tmp_df['Rank'], list(tmp_sub['Price'])
    cnt += 1
df:
| Data | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| 30275 | A | Yes | 1 | 52787 |
| 35881 | C | No | 2 | 14804 |
| 28129 | C | Yes | 3 | 180543|
| 30274 | D | No | 2 | 8066 |
| 30351 | D | Yes | 3 | 273478|
| 35886 | A | Yes | 2 | 10808 |
| 39900 | C | Yes | 1 | 21893 |
| 35887 | A | No | 2 | 2244 |
| 35883 | A | Yes | 1 | 10066 |
| 35856 | D | Yes | 3 | 19011 |
| 35986 | C | No | 2 | 6895 |
| 30350 | D | No | 3 | 5243 |
| 28129 | C | Yes | 1 | 112859|
| 31571 | C | Yes | 1 | 20701 |
tmp_df:
| Data | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| 30275 | A | Yes | 1 | 52787 |
| 38129 | C | Yes | 3 | 180543|
| 30351 | D | Yes | 3 | 273478|
| 35886 | A | Yes | 2 | 10808 |
| 39900 | C | Yes | 1 | 21893 |
| 35883 | A | Yes | 1 | 10066 |
| 35856 | D | Yes | 3 | 19011 |
| 28129 | C | Yes | 1 | 112859|
| 31571 | C | Yes | 1 | 20701 |
The prices in the final dataframe (final_df) are changed because of the calculations over the tmp_df. Now, what should I do if I want the following result?
final_df:
| Data | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| [30275,35886,35883] | A | Yes | [1,2]| 195543|
| [28129,39900,38129,31571] | C | Yes | [1,3]| 210089|
| [30351,35856] | D | Yes | 3 | 113859|
You can aggregate with list, and for Price use another aggregation function, e.g. sum, mean, ...:
# generate custom function for Price
def func(x):
    return x.sum()

d = {'Data': list, 'Rank': lambda x: list(set(x)), 'Price': func}
final_df = (tmp_df.groupby(['Category','Acceptable'], as_index=False)
                  .agg(d)
                  .reindex(tmp_df.columns, axis=1))

d = {'Data': list, 'Rank': lambda x: list(set(x)), 'Price': 'max'}
final_df = (tmp_df.groupby(['Category','Acceptable'], as_index=False)
                  .agg(d)
                  .reindex(tmp_df.columns, axis=1))
print(final_df)
Data Category Acceptable Rank Price
0 [30275, 35886, 35883] A Yes [1, 2] 52787
1 [38129, 39900, 28129, 31571] C Yes [1, 3] 180543
2 [30351, 35856] D Yes [3] 273478
Solution with custom function:
def func1(x):
    return x.sum()

def f(x):
    a = list(x['Data'])
    b = list(set(x['Rank']))
    c = func1(x['Price'])
    return pd.Series({'Data': a, 'Rank': b, 'Price': c})

final_df = (tmp_df.groupby(['Category','Acceptable'])
                  .apply(f)
                  .reset_index()
                  .reindex(tmp_df.columns, axis=1))
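As a sanity check of the grouping logic without pandas, here is a stdlib-only sketch over the question's tmp_df rows, using the 'max' Price aggregation that the answer prints (the row data is copied from the tmp_df table above):

```python
from collections import defaultdict

# (Data, Category, Rank, Price) rows from tmp_df; Acceptable is 'Yes' for all.
rows = [
    (30275, 'A', 1, 52787), (38129, 'C', 3, 180543), (30351, 'D', 3, 273478),
    (35886, 'A', 2, 10808), (39900, 'C', 1, 21893),  (35883, 'A', 1, 10066),
    (35856, 'D', 3, 19011), (28129, 'C', 1, 112859), (31571, 'C', 1, 20701),
]

groups = defaultdict(list)
for data, cat, rank, price in rows:
    groups[cat].append((data, rank, price))

# Data -> list, Rank -> unique sorted list, Price -> max (like agg(d) above)
agg = {cat: ([d for d, r, p in g],
             sorted({r for d, r, p in g}),
             max(p for d, r, p in g))
       for cat, g in sorted(groups.items())}
for cat, (data, ranks, price) in agg.items():
    print(cat, data, ranks, price)
```

This reproduces the same per-category lists and max prices as the pandas output shown above.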

SQL Group related strings when overlap (transitive relationship?)

I have the following result sets:
Those values come from a relational table of
| ProductId | GroupId |
| --------- | ------- |
| 1 | 4 |
| 2 | 4 |
| 2 | 5 |
| 3 | 4 |
| 3 | 5 |

| CategoryId | ProductId |
| ---------- | --------- |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
All of the following "Id" values come from the category of those products.
Example 1: Example 2: Example 3:
|Id |Group| |Id |Group | |Id |Group |
----------- --------------- ---------------
| 1 | 4 | | 1 | 4,5 | | 1 | 3,5 |
| 1 | 4,5 | | 1 | 3,4,5,6 | | 1 | 3,4,5,6 |
| 1 | 5,7 | | 1 | 5,7 | | 1 | 4,5 |
----------- --------------- ---------------
I need to process those tables to get the following results
Result 1: Result 2: Result 3:
|Id |Group| |Id |Group | |Id |Group |
----------- --------------- ---------------
| 1 | 4,5 | | 1 | 3,4,5,6 | | 1 | 3,4,5,6 |
| 1 | 4,5 | | 1 | 3,4,5,6 | | 1 | 3,4,5,6 |
| 1 | 5,7 | | 1 | 5,7 | | 1 | 3,4,5,6 |
----------- --------------- ---------------
Explanation: those columns indicate where the price of some item should be placed, and all related prices should be in the same table if possible. So when a group can be joined with another, it should result in empty spaces for the columns that weren't originally for that product, so:
Using example 1 this is the final result:
| G4 | G5 |
--------------------
Product1 | 10 | |
Product2 | | 15 |
Product3 | 14 | 18 |
--------------------
| G5 | G7 |
--------------------
Product1 | 10 | 25 |
Product2 | | 15 |
--------------------
Using the example 3 this is the final result:
| G3 | G4 | G5 | G6 |
------------------------------
Product1 | 10 | | 15 | 20 |
Product2 | | | 17 | |
Product3 | 14 | 18 | | |
------------------------------
But I'm completely clueless on how to do those group joins (the empty spaces in the result set are not a problem).
I don't fully understand the problem yet, but this might help you determine which sets of groups are subsets of larger ones. As best I can tell, that's the transitive relationship you indicated in the title.
select t1.id as id1, t2.id as id2
from T t1 full outer join T t2 on t2.grp = t1.grp
group by t1.id, t2.id
having count(distinct t1.grp) < count(distinct t2.grp) and count(t1.id) = count(*)
Now that you have an adjacency list or a hierarchy you could try some approaches as in this question to find all the "maximal" or top-level sets: Finding a Top Level Parent in SQL
If you have a limit on the number of possible groups then Gordon's answer there may be sufficient without all the recursive complications.
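Outside of SQL, the "widen each group-set to the largest set that contains it" step is straightforward to express; here is a plain-Python sketch over the question's Example 3 data (the function name `widen` is mine, not from the question):

```python
# Rows from Example 3: each (Id, group-set) pair.
rows = [(1, {3, 5}), (1, {3, 4, 5, 6}), (1, {4, 5})]

def widen(rows):
    """Replace each group-set with the largest set (same Id) containing it."""
    out = []
    for rid, grp in rows:
        best = max((g for r, g in rows if r == rid and grp <= g), key=len)
        out.append((rid, best))
    return out

for rid, grp in widen(rows):
    print(rid, sorted(grp))  # every row widens to {3, 4, 5, 6}, as in Result 3
```

Applied to Example 1, the same function leaves {5,7} untouched, since it is not a subset of any other row's set, matching Result 1.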

Joining to tables while linking with a crosswalk table

I am trying to write a query to de-identify one of my tables. To make distinct IDs for people, I used name, age, and sex. However, in my main table the data has been collected for years, and the sex code changed from 1 meaning male and 2 meaning female to M meaning male and F meaning female. To make this uniform in my distinct-individuals table, I used a crosswalk table to convert the sex code into the correct format before placing it into the distinct patients table.
I am now trying to write the query to match the distinct patient IDs to their correct rows from the main table. The issue is that the sex code for some rows has changed. I know I could use an update statement on my main table and change all of the 1s and 2s to M and F. However, I was wondering if there is a way to match the old sex codes to the new ones so I would not have to make the update. I did not know if there was a way to join the main and distinct-IDs tables in the query while using the sex code table to convert the sex codes again. Below are the example tables I am currently using.
This is my main table that I want to de identify
----------------------------
| Name | age | sex | Toy |
----------------------------
| Stacy| 30 | 1 | Bat |
| Sue | 21 | 2 | Ball |
| Jim | 25 | 1 | Ball |
| Stacy| 30 | M | Ball |
| Sue | 21 | F | glove |
| Stacy| 18 | F | glove |
----------------------------
Sex code crosswalk table
-------------------
| SexOld | SexNew |
-------------------
| M | M |
| F | F |
| 1 | M |
| 2 | F |
-------------------
This is the table I used to populate IDs for people I found to be distinct in my main table
--------------------------
| ID | Name | age | sex |
--------------------------
| 1 | Stacy| 30 | M |
| 2 | Jim | 25 | M |
| 3 | Stacy| 18 | F |
| 4 | Sue | 21 | F |
--------------------------
This is what I want my de-identified table to look like
---------------
| ID | Toy |
---------------
| 1 | Bat |
| 4 | Ball |
| 2 | Ball |
| 1 | Ball |
| 4 | glove |
| 3 | glove |
---------------
select c.ID, a.Toy
from maintable a
left join sexcodecrosswalk b on b.sexold = a.sex
left join peopleids c on c.Name = a.Name and c.age = a.age and c.Sex = b.sexNew
Here's a demonstration that this works:
http://sqlfiddle.com/#!3/a2d26/1
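In case the fiddle link goes stale, the same three-way join can be reproduced with Python's stdlib `sqlite3`; the table and column names below follow the answer's query:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table maintable (Name text, age int, sex text, Toy text);
create table sexcodecrosswalk (sexold text, sexnew text);
create table peopleids (ID int, Name text, age int, sex text);
insert into maintable values
  ('Stacy',30,'1','Bat'), ('Sue',21,'2','Ball'), ('Jim',25,'1','Ball'),
  ('Stacy',30,'M','Ball'), ('Sue',21,'F','glove'), ('Stacy',18,'F','glove');
insert into sexcodecrosswalk values ('M','M'),('F','F'),('1','M'),('2','F');
insert into peopleids values
  (1,'Stacy',30,'M'), (2,'Jim',25,'M'), (3,'Stacy',18,'F'), (4,'Sue',21,'F');
""")

# The crosswalk normalizes each row's sex code before matching on peopleids.
result = con.execute("""
    select c.ID, a.Toy
    from maintable a
    left join sexcodecrosswalk b on b.sexold = a.sex
    left join peopleids c on c.Name = a.Name and c.age = a.age
                         and c.sex = b.sexnew
""").fetchall()
print(sorted(result))
# [(1, 'Ball'), (1, 'Bat'), (2, 'Ball'), (3, 'glove'), (4, 'Ball'), (4, 'glove')]
```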

Map column data to matching rows

I have a sheet like this:
| A | B | C | D | E | F | G | H | ...
---------------------------------
| a | 1 | | b | 2 | | c | 7 |
---------------------------------
| b | 2 | | c | 8 | | b | 4 |
---------------------------------
| c |289| | a | 3 | | a |118|
---------------------------------
| d | 6 | | e | 3 | | e |888|
---------------------------------
| e | 8 | | d |111| | d |553|
---------------------------------
I want the sheet to become like this:
| A | B | C | D | E | F | G | H | ...
---------------------------------
| a | 1 | 3 |118| | | | |
---------------------------------
| b | 2 | 2 | 4 | | | | |
---------------------------------
| c |289| 8 | 7 | | | | |
---------------------------------
| d | 6 |111|553| | | | |
---------------------------------
| e | 8 | 3 |888| | | | |
---------------------------------
Col A, Col D and Col G have letters, which are unique, and the column next to each holds the weights.
To make it even more clear,
| A | B |
---------
| a | 1 |
---------
| b | 2 |
---------
| c |289|
...
are the weights of a,b,c... in January
Similarly | D | E | are weights of a,b,c... in July and | G | H | are weights of a,b,c... in December
I need to put them side-by-side for comparison, the thing is they are NOT in order.
How do I approach this?
UPDATE
There are thousands of a,b,c, aa, bb, cc, aaa, avb, as, saf, sfa etc.. and some of them MAY be present in January (Col A) and not in July (Col D)
Something like this:
Sub Squeeze()
    [c1:c5] = Application.Index([E1:E5], Evaluate("IF(A1:A5<>"""",MATCH(A1:A5,D1:D5,0),A1:A5)"), 1)
    [d1:d5] = Application.Index([H1:H5], Evaluate("IF(A1:A5<>"""",MATCH(A1:A5,G1:G5,0),A1:A5)"), 1)
    [e1:h5].ClearContents
End Sub
Explanation of first line
Application.Index([E1:E5], Evaluate("IF(A1:A5<>"""",MATCH(A1:A5,D1:D5,0),A1:A5)"), 1)
The MATCH returns a VBA array of the positions (5 of them) of A1:A5 within D1:D5
INDEX then returns the corresponding values from E1:E5
So to use the key column of A1:A100 against M1:M100, with values in N1:N100:
Application.Index([N1:N100], Evaluate("IF(A1:A100<>"""",MATCH(A1:A100,M1:M100,0),A1:A100)"), 1)
Extend as necessary: sort D:E by D ascending, sort G:H by G ascending, then delete columns G, F, D and C. If you want VBA, do this with Record Macro turned on.
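For comparison, the align-by-key idea behind both answers is easy to express outside Excel; here is a plain-Python sketch using dictionaries keyed on the letters, with the data copied from the example sheet:

```python
# One lookup per period: letter -> weight (columns A/B, D/E, G/H).
jan = {"a": 1, "b": 2, "c": 289, "d": 6, "e": 8}
jul = {"b": 2, "c": 8, "a": 3, "e": 3, "d": 111}
dec = {"c": 7, "b": 4, "a": 118, "e": 888, "d": 553}

# Keys may be missing in a period (per the update), so use .get, which
# yields None for a letter absent from that month.
table = [(k, jan.get(k), jul.get(k), dec.get(k)) for k in sorted(jan)]
for row in table:
    print(row)  # ('a', 1, 3, 118), ('b', 2, 2, 4), ...
```

A dict lookup plays the same role as MATCH/INDEX here: order in the source columns no longer matters.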