I'm having a hard time looping in SQL. The table I'm using, resources, references itself through a parent column, and I want to apply the following logic while creating the view, but I'm stuck...
Since I'm using a left join between my auth_permissions table and the resources table, we are always looking at a row of type 'table':
- for the current row in the resources table, which is always of type 'table'
- give me that row's parent, which points at another resources row (resources[4] in my example)
- go to the row whose id equals that parent; it is of type 'schema', so output its 'name' column
- then go to that row's parent, which is of type 'database', and output its 'name' column
What my current SQL query looks like:
CREATE OR REPLACE VIEW dev.permissions AS
SELECT
    left(rls.name, length(rls.name) - 5) AS "users",
    res.name AS "table",
    ap.p_create AS "insert",
    ap.p_read AS "select",
    ap.p_update AS "update",
    ap.p_delete AS "delete"
FROM dev.auth_permissions ap
LEFT JOIN dev.roles rls ON rls.id = ap.role_id
LEFT JOIN dev.resources res ON res.id = ap.resource_id
The result of my SQL query
users table insert select update delete
student midterm false true false false
student midterm false true false false
teacher midterm true true true true
teacher midterm false true true true
sub midterm false true false false
sub midterm false true false false
This is what I want my view to look like / the final SQL query result
users database schema table insert select update delete
student PrivateDatabase English midterm false true false false
student PrivateDatabase Math midterm false true false false
teacher PrivateDatabase English midterm true true true true
teacher PrivateDatabase Math midterm false true true true
sub PrivateDatabase English midterm false true false false
sub PrivateDatabase Math midterm false true false false
The tables I'm using to create the view look like this:
roles Table
id code name description
1 245 student null
2 411 teacher null
3 689 sub null
resources Table, whose parent column references the table itself so that I can track the parent of a table or schema. A database has no parent, since it's at the top of the hierarchy
id origin type name parent
1 PrivateDatabase database summer2020 null
2 PrivateDatabase schema English 1
3 PrivateDatabase table midterm 2
4 PrivateDatabase schema Math 1
5 PrivateDatabase table midterm 4
auth_permissions Table, where role_id references the roles table and resource_id references the resources table
id role_id resource_id p_create p_read p_update p_delete
1 1 3 false true false false
2 1 5 false true false false
3 2 3 true true true true
4 2 5 true true true true
5 3 3 false true false false
6 3 5 false true false false
THANK YOU IN ADVANCE! Any feedback is appreciated!
You need to join the resources table twice more in your query to fetch the schema and database names. In each case you join the id to the previous instantiation's parent value:
CREATE OR REPLACE VIEW permissions AS
SELECT
    rls.name AS "users",
    res3.name AS "database",
    res2.name AS "schema",
    res.name AS "table",
    ap.p_create AS "insert",
    ap.p_read AS "select",
    ap.p_update AS "update",
    ap.p_delete AS "delete"
FROM auth_permissions ap
LEFT JOIN roles rls ON rls.id = ap.role_id
LEFT JOIN resources res ON res.id = ap.resource_id
LEFT JOIN resources res2 ON res2.id = res.parent
LEFT JOIN resources res3 ON res3.id = res2.parent
Sample query:
SELECT *
FROM permissions
Output (for your sample data)
users database schema table insert select update delete
student summer2020 English midterm false true false false
student summer2020 Math midterm false true false false
teacher summer2020 Math midterm true true true true
teacher summer2020 English midterm true true true true
sub summer2020 Math midterm false true false false
sub summer2020 English midterm false true false false
Note I've assumed you want the database name, not its origin. If you really want the origin value, just change res3.name in the query to res3.origin.
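If the hierarchy ever grows deeper than database > schema > table, chained self-joins stop scaling; a recursive CTE can walk the parent chain to any depth instead. A sketch, assuming PostgreSQL (suggested by CREATE OR REPLACE VIEW and the dev schema); the chain name is made up:
WITH RECURSIVE chain AS (
    -- anchor: every resource starts a chain pointing at itself
    SELECT id, parent, type, name
    FROM dev.resources
    UNION ALL
    -- step: keep the original id, move up to the parent's row
    SELECT c.id, r.parent, r.type, r.name
    FROM chain c
    JOIN dev.resources r ON r.id = c.parent
)
SELECT
    rls.name AS "users",
    MAX(CASE WHEN ch.type = 'database' THEN ch.name END) AS "database",
    MAX(CASE WHEN ch.type = 'schema' THEN ch.name END) AS "schema",
    MAX(CASE WHEN ch.type = 'table' THEN ch.name END) AS "table",
    ap.p_create AS "insert",
    ap.p_read AS "select",
    ap.p_update AS "update",
    ap.p_delete AS "delete"
FROM dev.auth_permissions ap
LEFT JOIN dev.roles rls ON rls.id = ap.role_id
JOIN chain ch ON ch.id = ap.resource_id
GROUP BY ap.id, rls.name, ap.p_create, ap.p_read, ap.p_update, ap.p_delete;
Grouping by ap.id keeps one output row per permission while the MAX(CASE ...) picks each ancestor's name out of its chain.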
Related
I have a df with several columns which have only True/False values.
I want to create another column whose value tells me which columns have a True value.
Here's an example:
index  bol_1  bol_2  bol_3  criteria
1      True   False  False  bol_1
2      False  True   False  bol_2
3      True   True   False  [bol_1, bol_2]
My objective is to know which rows have True values (at least one), and which columns are responsible for those True values. I want to be able to do some basic statistics on this new column, e.g. how many rows have bol_1 as the only column with a True value.
Use DataFrame.select_dtypes for the boolean columns, convert the column names to an array, and filter the True values in a list comprehension:
# keep only the boolean columns
df1 = df.select_dtypes(bool)
cols = df1.columns.to_numpy()
# for each row, keep the names of the columns that hold True
df['criteria'] = [list(cols[x]) for x in df1.to_numpy()]
print (df)
bol_1 bol_2 bol_3 criteria
1 True False False [bol_1]
2 False True False [bol_2]
3 True True False [bol_1, bol_2]
If performance is not important, use DataFrame.apply:
df['criteria'] = df1.apply(lambda x: list(cols[x]), axis=1)
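With the list column in place, the basic statistics mentioned in the question follow naturally. A minimal sketch (bol_1 and criteria come from the example; has_true and only_bol_1 are hypothetical helper names):
# rows with at least one True value
has_true = df['criteria'].str.len() > 0
print(has_true.sum())

# rows where bol_1 is the only column holding a True value
only_bol_1 = df['criteria'].apply(lambda c: c == ['bol_1'])
print(only_bol_1.sum())

# frequency of each combination of True columns
print(df['criteria'].str.join(', ').value_counts())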
A possible solution:
# skip the first (index) column when matching names to values
df.assign(criteria=df.apply(lambda x: list(
    df.columns[1:][x[1:] == True]), axis=1))
Output:
index bol_1 bol_2 bol_3 criteria
0 1 True False False [bol_1]
1 2 False True False [bol_2]
2 3 True True False [bol_1, bol_2]
How can I get the desired output below without using if statements and without checking row by row?
import pandas as pd

test = pd.DataFrame()
test['column1'] = [True, True, False]
test['column2'] = [False, True, False]
index column1 column2
0 True False
1 True True
2 False False
desired output:
index column1 column2 column3
0 True False False
1 True True True
2 False False False
Your help is much appreciated.
Thank you in advance.
Use DataFrame.all to test whether all values in a row are True:
test['column3'] = test.all(axis=1)
If you need to restrict which columns are tested, add the subset ['column1','column2']:
test['column3'] = test[['column1','column2']].all(axis=1)
If you only want to test two columns, it is also possible to use & for bitwise AND:
test['column3'] = test['column1'] & test['column2']
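Putting it together, a minimal runnable sketch using the sample frame from the question:
import pandas as pd

test = pd.DataFrame()
test['column1'] = [True, True, False]
test['column2'] = [False, True, False]

# True only where every column in the row is True
test['column3'] = test.all(axis=1)
print(test)
#    column1  column2  column3
# 0     True    False    False
# 1     True     True     True
# 2    False    False    False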
I want to check whether my customer owns an iPhone, iPad, or Macbook.
I am not looking for how many iPhone or iPad or Macbook one customer has, I just want to identify if they own any of them, or if they own any two, or if they own all three.
I am thinking of using a CASE WHEN expression, and here is the query:
select customer
, case when AppleProduct = 'iPhone' THEN 'TRUE' ELSE 'FALSE' END AS Owns_iPhone
, case when AppleProduct = 'iPad' THEN 'TRUE' ELSE 'FALSE' END AS Owns_iPad
, case when AppleProduct = 'Macbook' THEN 'TRUE' ELSE 'FALSE' END AS Owns_Macbook
from Apple_Product_Ownership
This is the result that I am trying to get
customer | Owns_iPhone | Owns_iPad | Owns_Macbook
X TRUE TRUE FALSE
Y FALSE TRUE TRUE
Z TRUE FALSE FALSE
But this is what I am getting instead
customer | Owns_iPhone | Owns_iPad | Owns_Macbook
X TRUE FALSE FALSE
X FALSE TRUE FALSE
Y FALSE TRUE FALSE
Y FALSE FALSE TRUE
Z TRUE FALSE FALSE
You are looking for conditional aggregation. I would phrase your query as:
select
customer,
logical_or(AppleProduct = 'iPhone') Owns_iPhone,
logical_or(AppleProduct = 'iPad') Owns_iPad,
logical_or(AppleProduct = 'Macbook') Owns_Macbook
from Apple_Product_Ownership
group by customer
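Note that logical_or is an aggregate from BigQuery; if your engine does not have it, the same conditional aggregation can be written portably, e.g. with MAX over CASE (a sketch using the table and column names from the question; it relies on 'TRUE' sorting after 'FALSE'):
select
    customer,
    max(case when AppleProduct = 'iPhone' then 'TRUE' else 'FALSE' end) as Owns_iPhone,
    max(case when AppleProduct = 'iPad' then 'TRUE' else 'FALSE' end) as Owns_iPad,
    max(case when AppleProduct = 'Macbook' then 'TRUE' else 'FALSE' end) as Owns_Macbook
from Apple_Product_Ownership
group by customer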
I have a df
Name Param_1 Param_2 Param_3
John True False False
Mary False False False
Peter True False False
Linda False False True
I want to create two new dataframes based on the True or False values across the range of columns (Param_1, Param_2 and Param_3). Something like this...
df_1 =
Name Param_1 Param_2 Param_3
John True False False
Peter True False False
Linda False False True
df_2 =
Name Param_1 Param_2 Param_3
Mary False False False
however, I won't know the names of Param_1 etc. each time the code is run, so I want to use positional indexing (slicing). In this case, [:, 1:]
I have seen examples of selecting rows based on the values within one column, when the column has a known name, but not by slicing across multiple columns by position.
I tried ais_gdf.iloc[ais_gdf[:, 1:].isin(False)] but that didn't work.
Any help is appreciated.
Use DataFrame.iloc to select the columns for the mask and test whether at least one value per row is True with DataFrame.any, then filter by boolean indexing; for df_2, invert the mask with ~:
m = ais_gdf.iloc[:, 1:].any(axis=1)
#alternative for selecting only boolean columns
#m = ais_gdf.select_dtypes(bool).any(axis=1)
#alternative for column names containing Param
#m = ais_gdf.filter(like='Param').any(axis=1)
df_1 = ais_gdf[m]
df_2 = ais_gdf[~m]
print (df_1)
Name Param_1 Param_2 Param_3
0 John True False False
2 Peter True False False
3 Linda False False True
print (df_2)
Name Param_1 Param_2 Param_3
1 Mary False False False
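For reference, a self-contained sketch reproducing this split end to end (the ais_gdf name comes from the question; the data is the sample df above):
import pandas as pd

ais_gdf = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Linda'],
    'Param_1': [True, False, True, False],
    'Param_2': [False, False, False, False],
    'Param_3': [False, False, False, True],
})

# mask is True where any column after the first holds True
m = ais_gdf.iloc[:, 1:].any(axis=1)
df_1 = ais_gdf[m]   # at least one True
df_2 = ais_gdf[~m]  # all False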
Use select_dtypes to get the boolean columns. Create a mask of the rows where every boolean column is False, then use this mask to filter (note that df1/df2 here are the reverse of the question's df_1/df_2):
mask = (~df.select_dtypes(bool)).all(axis=1)
df1 = df[mask]
Name Param_1 Param_2 Param_3
1 Mary False False False
df2 = df[~mask]
Name Param_1 Param_2 Param_3
0 John True False False
2 Peter True False False
3 Linda False False True
I'm using Pandas DataFrames. I'm looking to identify all rows where both columns A and B == True, then set column C to True for all the points on either side of that intersection where only A or B is still True, but not the other. For example:
A B C
0 False False False
1 True False True
2 True True True
3 True True True
4 False True True
5 False False False
6 True False False
7 True False False
I can find the direct overlaps quite easily:
df.loc[(df['A'] == True) & (df['B'] == True), 'C'] = True
... however this does not take into account the rows on either side of the overlap.
I considered creating column 'C' in this way, then grouping each column:
grp_a = df.loc[(df['A'] == True), 'A'].groupby(df['A'].astype('int').diff().ne(0).cumsum())
grp_b = df.loc[(df['B'] == True), 'B'].groupby(df['B'].astype('int').diff().ne(0).cumsum())
grp_c = df.loc[(df['C'] == True), 'C'].groupby(df['C'].astype('int').diff().ne(0).cumsum())
From there I thought to iterate over the indexes in grp_c.indices and test the indices in grp_a and grp_b against those, find the min/max index of A and B and update column C. This feels like an inefficient way of getting to the result I want though.
Ideas?
Try this:
#Input df just columns 'A' and 'B'
df = df[['A','B']]
df['C'] = df.assign(C=df.min(1)).groupby((df[['A','B']].max(1) == 0).cumsum())['C']\
.transform('max').mask(df.max(1)==0, False)
print(df)
Output:
A B C
0 False False False
1 True False True
2 True True True
3 True True True
4 False True True
5 False False False
6 True False False
7 True False False
Explanation:
First, create column 'C' as the row-wise minimum; since the columns are boolean, this assigns True to C only where both A and B are True. Next, using
df[['A','B']].max(1) == 0
0 True
1 False
2 False
3 False
4 False
5 True
6 False
7 False
dtype: bool
we can find all of the records where A and B are both False. Then we use cumsum to create a running count of those False/False records, which groups the rows: each False/False record increments the count and labels everything up to the next False/False record.
(df[['A','B']].max(1) == 0).cumsum()
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
dtype: int32
Let's group the dataframe, with the newly assigned column C, by this grouping created with cumsum, and take the maximum value of column C within each group. So if a group has a True/True record, True is assigned to all the records in that group. Lastly, use mask to turn the False/False boundary records back to False.
df.assign(C=df.min(1)).groupby((df[['A','B']].max(1) == 0).cumsum())['C']\
.transform('max').mask(df.max(1)==0, False)
0 False
1 True
2 True
3 True
4 True
5 False
6 False
7 False
Name: C, dtype: bool
Finally, assign that series to df['C'], overwriting the temporary C assigned inside the statement.
df['C'] = df.assign(C=df.min(1)).groupby((df[['A','B']].max(1) == 0).cumsum())['C']\
.transform('max').mask(df.max(1)==0, False)
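For completeness, a self-contained sketch reproducing the example end to end (data taken from the question):
import pandas as pd

df = pd.DataFrame({
    'A': [False, True, True, True, False, False, True, True],
    'B': [False, False, True, True, True, False, False, False],
})

# group id increments at every all-False row, so the rows between
# two all-False rows share a group
group_id = (df[['A', 'B']].max(axis=1) == 0).cumsum()

# within each group, propagate True if the group contains an A&B row,
# then force the all-False boundary rows back to False
df['C'] = (df.min(axis=1).groupby(group_id).transform('max')
             .mask(df.max(axis=1) == 0, False))
print(df)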