Identifying products that a customer owns - sql

I want to check whether my customer owns an iPhone, iPad, or Macbook.
I am not looking for how many iPhone or iPad or Macbook one customer has, I just want to identify if they own any of them, or if they own any two, or if they own all three.
I am thinking of using the CASE WHEN function and here is the query:
select customer
, case when AppleProduct = 'iPhone' THEN 'TRUE' ELSE 'FALSE' END AS Owns_iPhone
, case when AppleProduct = 'iPad' THEN 'TRUE' ELSE 'FALSE' END AS Owns_iPad
, case when AppleProduct = 'Macbook' THEN 'TRUE' ELSE 'FALSE' END AS Owns_Macbook
from Apple_Product_Ownership
This is the result that I am trying to get
customer | Owns_iPhone | Owns_iPad | Owns_Macbook
X TRUE TRUE FALSE
Y FALSE TRUE TRUE
Z TRUE FALSE FALSE
But this is what I am getting instead
customer | Owns_iPhone | Owns_iPad | Owns_Macbook
X TRUE FALSE FALSE
X FALSE TRUE FALSE
Y FALSE TRUE FALSE
Y FALSE FALSE TRUE
Z TRUE FALSE FALSE

You are looking for conditional aggregation. I would phrase your query as:
select
customer,
logical_or(AppleProduct = 'iPhone') Owns_iPhone,
logical_or(AppleProduct = 'iPad') Owns_iPad,
logical_or(AppleProduct = 'Macbook') Owns_Macbook,
from Apple_Product_Ownership
group by customer
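As far as I know, logical_or is a BigQuery aggregate, so it may not be available in other engines. A more portable way to write the same conditional aggregation is MAX over a CASE expression. Below is a minimal sketch using Python's built-in sqlite3 module; the in-memory table and its sample rows are assumptions made up to match the expected result in the question.

```python
import sqlite3

# In-memory table with one row per (customer, product) pair (assumed sample data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Apple_Product_Ownership (customer TEXT, AppleProduct TEXT)")
conn.executemany(
    "INSERT INTO Apple_Product_Ownership VALUES (?, ?)",
    [("X", "iPhone"), ("X", "iPad"), ("Y", "iPad"), ("Y", "Macbook"), ("Z", "iPhone")],
)

# MAX(CASE ...) is 1 if at least one row in the group matches, otherwise 0
query = """
SELECT customer,
       MAX(CASE WHEN AppleProduct = 'iPhone'  THEN 1 ELSE 0 END) AS Owns_iPhone,
       MAX(CASE WHEN AppleProduct = 'iPad'    THEN 1 ELSE 0 END) AS Owns_iPad,
       MAX(CASE WHEN AppleProduct = 'Macbook' THEN 1 ELSE 0 END) AS Owns_Macbook
FROM Apple_Product_Ownership
GROUP BY customer
ORDER BY customer
"""
result = {
    customer: (bool(iphone), bool(ipad), bool(macbook))
    for customer, iphone, ipad, macbook in conn.execute(query)
}
print(result)
```

The GROUP BY collapses the rows to one per customer, which is the step the original CASE-only query was missing.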

Related

Using if statement in string_agg function - PostgreSQL

The query is as follows
WITH notes AS (
SELECT 891090 Order_ID, False customer_billing, false commander, true agent
UNION ALL
SELECT 891091, false, true, true
UNION ALL
SELECT 891091, true, false, false)
SELECT
n.order_id,
string_Agg(distinct CASE
WHEN n.customer_billing = TRUE THEN 'AR (Customer Billing)'
WHEN n.commander = TRUE THEN 'AP (Commander)'
WHEN n.agent = TRUE THEN 'AP (Agent)'
ELSE NULL
END,', ') AS finance
FROM notes n
WHERE
n.order_id = 891091 AND (n.customer_billing = TRUE or n.commander = TRUE or n.agent = TRUE)
GROUP BY ORDER_ID
As you can see there are two records with order_id as 891091.
First 891091 record has commander and agent set as true
Second 891091 record has customer_billing set as true
Since switch case is used, it considers only the first true value and returns commander and does not consider agent.
So the output becomes
order_id finance
891091 AP (Commander), AR (Customer Billing)
dbfiddle.uk Example
I need all the true values in the record to be considered so that the output becomes
order_id finance
891091 AP (Commander), AP (Agent), AR (Customer Billing)
My initial thought is that using if statement instead of case statement may fix this. I am not sure how to do this inside string_agg function
How to achieve this?
EDIT 1:
The answer specified below works almost fine. But the issue is that the comma separated values are not distinct
Here is the updated fiddle
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=9647d92870e3944516172eda83a8ac6e
You can consider splitting your case into separate ones and using array to collect them. Then you can use array_to_string to format:
WITH notes AS (
SELECT 891090 Order_ID, False customer_billing, false commander, true agent UNION ALL
SELECT 891091, false, true, true UNION ALL
SELECT 891091, true, true, false),
tmp as (
SELECT
n.order_id id,
array_agg(
ARRAY[
CASE WHEN n.customer_billing = TRUE THEN 'AR (Customer Billing)' END,
CASE WHEN n.commander = TRUE THEN 'AP (Commander)' END,
CASE WHEN n.agent = TRUE THEN 'AP (Agent)' END
]) AS finance_array
FROM notes n
WHERE
n.order_id = 891091 AND (n.customer_billing = TRUE or n.commander = TRUE or n.agent = TRUE)
GROUP BY ORDER_ID )
select id, array_to_string(array(select distinct e from unnest(finance_array) as a(e)), ', ')
from tmp;
Here is db_fiddle.
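To make the idea behind the separate CASE expressions easier to see, here is a rough plain-Python sketch of the same logic: every flag is tested on its own (rather than stopping at the first match), and labels are de-duplicated across the rows of an order. The function name finance_labels and the LABELS tuple are hypothetical names used only for this illustration; the rows are the ones from the answer's CTE.

```python
# (order_id, customer_billing, commander, agent), same rows as the answer's CTE
notes = [
    (891090, False, False, True),
    (891091, False, True, True),
    (891091, True, True, False),
]

# One label per flag column, in column order (hypothetical helper constant)
LABELS = ("AR (Customer Billing)", "AP (Commander)", "AP (Agent)")


def finance_labels(rows, order_id):
    """Collect every label whose flag is true, keeping each label only once."""
    seen = []
    for oid, *flags in rows:
        if oid != order_id:
            continue
        # Each flag is checked independently, unlike a single CASE,
        # which stops at the first branch that matches.
        for flag, label in zip(flags, LABELS):
            if flag and label not in seen:
                seen.append(label)
    return ", ".join(seen)


finance = finance_labels(notes, 891091)
print(finance)
```

Unlike select distinct in the SQL version, this sketch keeps the labels in first-seen order; SQL makes no ordering promise unless you add an ORDER BY.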

Looping in SQL while creating a view

I'm having a hard time looping in SQL. The resources table I'm using references itself through a parent column, and I want to apply the following logic while creating the view, but I'm stuck.
Since I'm using a left join between my auth_permissions table and resources table, we are always looking at a row of type 'table':
- for the current row in table resources which is always of type 'table'
- give me the row's parent which is in position resources[4]
- then go to the row where id is equal to resources[4] which is of type 'schema'. spit out the 'name' column
- then go to that row's parent which is of type 'database' and spit out the 'name' column
What my current sql query looks like
CREATE OR REPLACE VIEW dev.permissions
AS select
left(rls.name, length(rls.name) - 5) AS "users",
res.name as "table",
ap.p_create as "insert",
ap.p_read as "select",
ap.p_update as "update",
ap.p_delete as "delete"
FROM dev.auth_permissions ap
LEFT JOIN dev.roles rls ON rls.id = ap.role_id
LEFT JOIN dev.resources res ON res.id = ap.resource_id
The result of my SQL query
users table insert select update delete
student midterm false true false false
student midterm false true false false
teacher midterm true true true true
teacher midterm false true true true
sub midterm false true false false
sub midterm false true false false
This is what I want my view to look like / the final SQL query result
users database schema table insert select update delete
student PrivateDatabase English midterm false true false false
student PrivateDatabase Math midterm false true false false
teacher PrivateDatabase English midterm true true true true
teacher PrivateDatabase Math midterm false true true true
sub PrivateDatabase English midterm false true false false
sub PrivateDatabase Math midterm false true false false
The tables I'm using to create the view look like so
roles Table
id code name description
1 245 student null
2 411 teacher null
3 689 sub null
resources Table, the parent column references itself so that I can track the parent of a table or schema. But a database has no parent since it's at the top of the hierarchy
id origin type name parent
1 PrivateDatabase database summer2020 null
2 PrivateDatabase schema English 1
3 PrivateDatabase table midterm 2
4 PrivateDatabase schema Math 1
5 PrivateDatabase table midterm 4
auth_permissions Table, role_id references the roles table and resource_id references the resources table
id role_id resource_id p_create p_read p_update p_delete
1 1 3 false true false false
2 1 5 false true false false
3 2 3 true true true true
4 2 5 true true true true
5 3 3 false true false false
6 3 5 false true false false
THANK YOU IN ADVANCE! Any feedback is appreciated!
You need to join the resources table twice more in your query to fetch the schema and database names. In each case you join the id to the previous instance's parent value:
CREATE OR REPLACE VIEW permissions AS
SELECT
rls.name AS "users",
res3.name as "database",
res2.name as "schema",
res.name as "table",
ap.p_create as "insert",
ap.p_read as "select",
ap.p_update as "update",
ap.p_delete as "delete"
FROM auth_permissions ap
LEFT JOIN roles rls ON rls.id = ap.role_id
LEFT JOIN resources res ON res.id = ap.resource_id
LEFT JOIN resources res2 ON res2.id = res.parent
LEFT JOIN resources res3 ON res3.id = res2.parent
Sample query:
SELECT *
FROM permissions
Output (for your sample data)
users database schema table insert select update delete
student summer2020 English midterm false true false false
student summer2020 Math midterm false true false false
teacher summer2020 Math midterm true true true true
teacher summer2020 English midterm true true true true
sub summer2020 Math midterm false true false false
sub summer2020 English midterm false true false false
Note I've assumed you want the database name, not its origin. If you really want the origin value, just change res3.name in the query to res3.origin.
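If you want to try the chain of self-joins outside your database, here is a small sketch using Python's built-in sqlite3 module. It is only an approximation of the view above: the permission columns are left out for brevity, the aliases tbl, sch and db are names chosen for this example, and the ORDER BY is added so the row order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE roles (id INTEGER, name TEXT);
CREATE TABLE resources (id INTEGER, type TEXT, name TEXT, parent INTEGER);
CREATE TABLE auth_permissions (id INTEGER, role_id INTEGER, resource_id INTEGER);

INSERT INTO roles VALUES (1, 'student'), (2, 'teacher'), (3, 'sub');
INSERT INTO resources VALUES
  (1, 'database', 'summer2020', NULL),
  (2, 'schema',   'English',    1),
  (3, 'table',    'midterm',    2),
  (4, 'schema',   'Math',       1),
  (5, 'table',    'midterm',    4);
INSERT INTO auth_permissions VALUES
  (1, 1, 3), (2, 1, 5), (3, 2, 3), (4, 2, 5), (5, 3, 3), (6, 3, 5);
""")

# Each extra join climbs one level: table -> schema -> database
rows = conn.execute("""
SELECT rls.name, db.name, sch.name, tbl.name
FROM auth_permissions ap
LEFT JOIN roles     rls ON rls.id = ap.role_id
LEFT JOIN resources tbl ON tbl.id = ap.resource_id
LEFT JOIN resources sch ON sch.id = tbl.parent
LEFT JOIN resources db  ON db.id  = sch.parent
ORDER BY ap.id
""").fetchall()

for row in rows:
    print(row)
```

This works because the hierarchy has a fixed depth of three levels. For a hierarchy of unknown depth you would probably need a recursive CTE instead.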

Pandas True False Matching

For this table:
I would like to generate the 'desired_output' column. One way to achieve this may be:
All the True values from col_1 are transferred straight across to desired_output (red arrow)
In desired_output, place a True value above any existing True value (green arrow)
Code I have tried:
df['desired_output']=df.col_1.apply(lambda x: True if x.shift()==True else False)
Thank you
You can use | (bitwise OR) to combine the original column with its values shifted up one row by Series.shift:
import pandas as pd
d = {"col1":[False,True,True,True,False,True,False,False,True,False,False,False]}
df = pd.DataFrame(d)
df['new'] = df.col1 | df.col1.shift(-1)
print (df)
col1 new
0 False True
1 True True
2 True True
3 True True
4 False True
5 True True
6 False False
7 False True
8 True True
9 False False
10 False False
11 False False
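One detail worth knowing: on the last row, shift(-1) has nothing to pull up and produces a missing value, which can turn the shifted column into object dtype. A small sketch that sidesteps this by passing fill_value=False to Series.shift; the five-element series is made-up sample data, not the table from the question.

```python
import pandas as pd

# Made-up sample column for illustration
col = pd.Series([False, True, False, False, True])

# shift(-1) moves every value up one row; fill_value=False pads the
# now-empty last row so the result stays a plain boolean Series
above = col.shift(-1, fill_value=False)

# A row is True if it was True already, or if the row below it is True
result = col | above
print(result.tolist())
```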
Try this:
df['desired_output'] = df['col_1']
# OR each row with the row below it, so a True also appears above every existing True
df.loc[df.index[:-1], 'desired_output'] = df.col_1.values[:-1] | df.col_1.values[1:]
print(df)
If the values are stored as strings ('True' / 'False') rather than booleans:
Input:
col_1
0 True
1 True
2 False
3 True
4 True
5 False
6 False
7 True
8 False
Code:
df['desired'] = df['col_1']
# copy each 'True' up to the row above it; skip i == 0, which has no row above
for i, e in enumerate(df['col_1']):
    if e == 'True' and i > 0:
        df.at[i-1, 'desired'] = e
df
Output:
col_1 desired
0 True True
1 True True
2 False True
3 True True
4 True True
5 False False
6 False True
7 True True
8 False False

Pandas selecting rows based on bool values over a range of columns by position

I have a df
Name Param_1 Param_2 Param_3
John True False False
Mary False False False
Peter True False False
Linda False False True
I want to create two new dataframes based on the True or False values across the range of columns (Param_1, Param_2 and Param_3). Something like this...
df_1 =
Name Param_1 Param_2 Param_3
John True False False
Peter True False False
Linda False False True
df_2 =
Name Param_1 Param_2 Param_3
Mary False False False
However, I won't know the names of Param_1 etc. each time the code is run, so I want to use positional indexing (slicing). In this case, [:, 1:]
I have seen examples of selecting rows based on the values within one column, when the column has a known name, but not by slicing across multiple columns by position.
I tried ais_gdf.iloc[ais_gdf[:, 1:].isin(False)] but that didn't work.
Any help is appreciated.
Use DataFrame.iloc to select the columns for the mask and test whether each row has at least one True with DataFrame.any, then filter by boolean indexing; for df_2, invert the mask with ~:
m = ais_gdf.iloc[:, 1:].any(axis=1)
#alternative: select only the boolean columns
#m = ais_gdf.select_dtypes(bool).any(axis=1)
#alternative: select columns whose names contain Param
#m = ais_gdf.filter(like='Param').any(axis=1)
df_1 = ais_gdf[m]
df_2 = ais_gdf[~m]
print (df_1)
Name Param_1 Param_2 Param_3
0 John True False False
2 Peter True False False
3 Linda False False True
print (df_2)
Name Param_1 Param_2 Param_3
1 Mary False False False
Use select_dtypes to get boolean columns. Create a mask and then use this mask to filter.
mask = (~df.select_dtypes(bool)).all(axis=1)
df1 = df[mask]
Name Param_1 Param_2 Param_3
1 Mary False False False
df2 = df[~mask]
Name Param_1 Param_2 Param_3
0 John True False False
2 Peter True False False
3 Linda False False True
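For anyone who wants to run the split end to end, here is a self-contained sketch on the question's sample data that relies only on column positions. The names has_true, df_1 and df_2 are chosen for this example.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["John", "Mary", "Peter", "Linda"],
    "Param_1": [True, False, True, False],
    "Param_2": [False, False, False, False],
    "Param_3": [False, False, False, True],
})

# Every column from position 1 onward, whatever it is called
has_true = df.iloc[:, 1:].any(axis=1)

df_1 = df[has_true]    # rows with at least one True
df_2 = df[~has_true]   # rows where every flag is False

print(df_1)
print(df_2)
```

Because the mask and its inverse are complementary, every row of the original frame lands in exactly one of the two results.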

Find the min/max of rows with overlapping column values, create new column to represent the full range of both

I'm using Pandas DataFrames. I'm looking to identify all rows where both columns A and B are True, then mark in column C all the points on either side of that intersection where only A or B is still True but not the other. For example:
A B C
0 False False False
1 True False True
2 True True True
3 True True True
4 False True True
5 False False False
6 True False False
7 True False False
I can find the direct overlaps quite easily:
df.loc[(df['A'] == True) & (df['B'] == True), 'C'] = True
... however this does not take into account the overlap need.
I considered creating column 'C' in this way, then grouping each column:
grp_a = df.loc[(df['A'] == True), 'A'].groupby(df['A'].astype('int').diff().ne(0).cumsum())
grp_b = df.loc[(df['B'] == True), 'B'].groupby(df['B'].astype('int').diff().ne(0).cumsum())
grp_c = df.loc[(df['C'] == True), 'C'].groupby(df['C'].astype('int').diff().ne(0).cumsum())
From there I thought to iterate over the indexes in grp_c.indices and test the indices in grp_a and grp_b against those, find the min/max index of A and B and update column C. This feels like an inefficient way of getting to the result I want though.
Ideas?
Try this:
#Input df just columns 'A' and 'B'; .copy() avoids a SettingWithCopyWarning on the assignment below
df = df[['A','B']].copy()
df['C'] = df.assign(C=df.min(1)).groupby((df[['A','B']].max(1) == 0).cumsum())['C']\
.transform('max').mask(df.max(1)==0, False)
print(df)
Output:
A B C
0 False False False
1 True False True
2 True True True
3 True True True
4 False True True
5 False False False
6 True False False
7 True False False
Explanation:
First, create column 'C' as the row-wise minimum; this assigns True to C where both A and B are True. Next, using
df[['A','B']].max(1) == 0
0 True
1 False
2 False
3 False
4 False
5 True
6 False
7 False
dtype: bool
We can find all of the records where A and B are both False. Then we use cumsum to create a running count of those False/False records. This gives us a grouping: each False/False record starts a new group, which runs until the next False/False record increments the count.
(df[['A','B']].max(1) == 0).cumsum()
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
dtype: int32
Let's group the dataframe with the newly assigned column C by this grouping created with cumsum. Then take the maximum value of column C from that group. So, if the group has a True True record, assign True to all the records in that group. Lastly, use mask to turn the first False False record back to False.
df.assign(C=df.min(1)).groupby((df[['A','B']].max(1) == 0).cumsum())['C']\
.transform('max').mask(df.max(1)==0, False)
0 False
1 True
2 True
3 True
4 True
5 False
6 False
7 False
Name: C, dtype: bool
And, assign that series to df['C'] overwriting the temporarily assigned C in the statement.
df['C'] = df.assign(C=df.min(1)).groupby((df[['A','B']].max(1) == 0).cumsum())['C']\
.transform('max').mask(df.max(1)==0, False)
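If the one-liner is hard to follow, the same steps can be sketched in plain Python without pandas. This is only an illustration of the logic described above, using the sample data from the question; the variable names are chosen for this example.

```python
from itertools import accumulate

# Columns A and B from the question's example
A = [False, True, True, True, False, False, True, True]
B = [False, False, True, True, True, False, False, False]

# Row-wise min of two booleans: True only where both are True
both = [a and b for a, b in zip(A, B)]

# Row-wise max == 0: True only where both are False
neither = [not (a or b) for a, b in zip(A, B)]

# cumsum over the False/False markers gives one group id per block
group_ids = list(accumulate(int(n) for n in neither))

# A group "has an overlap" if any of its rows has both A and B True
has_overlap = {}
for gid, flag in zip(group_ids, both):
    has_overlap[gid] = has_overlap.get(gid, False) or flag

# Spread the group's result to every row, then mask the False/False rows
C = [has_overlap[gid] and not n for gid, n in zip(group_ids, neither)]
print(C)
```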