Merge where definition exists - sql

I have 3 tables I'm working with here. ATTRIBUTE_MAP, GROUP_DEFINITIONS, and GROUP_MAP.
ATTRIBUTE_MAP contains the CUST_ID and the associated ATTRIBUTE_ID.
GROUP_DEFINITIONS defines a group. Its columns are GROUP_ID, ATTRIBUTE_1, VALUE_1, ATTRIBUTE_2, VALUE_2, ATTRIBUTE_3, VALUE_3. A group consists of 1 to 3 attributes with values. For example, an attribute could be 'State' with its value being 'New York'. Values can also be null for boolean attributes like 'Owns Car'.
GROUP_MAP simply maps the CUST_ID to a GROUP_ID.
Now, I'm trying to write a script that will look at ATTRIBUTE_MAP and see if a customer falls into one of the groups defined in GROUP_DEFINITIONS. If the customer does, then insert/update a row in GROUP_MAP with the CUST_ID and GROUP_ID. The part I'm having trouble with is matching the attribute values.
Here is what I have so far:
merge GROUP_MAP gm using
( select am.CUST_ID
,am.ATTRIBUTE_ID
,am.START_DATE
,gd.GROUP_ID
,gd.ATTRIBUTE_1
,gd.VALUE_1
,gd.ATTRIBUTE_2
,gd.VALUE_2
,gd.ATTRIBUTE_3
,gd.VALUE_3
from ATTRIBUTE_MAP am, GROUP_DEFINITIONS gd ) src
on gm.GROUP_ID=src.GROUP_ID
AND gm.CUST_ID=src.CUST_ID
when not matched then -- create association in GROUP_MAP
insert (CUST_ID, GROUP_ID, FROM_DATE)
values (src.CUST_ID, src.GROUP_ID, src.START_DATE);
Am I approaching this correctly? I'm guessing I need to improve the nested select statement in my merge so that it properly joins ATTRIBUTE_MAP and GROUP_DEFINITIONS, and then go from there. Any help/suggestions would be appreciated.
Here's an example for reference:
ATTRIBUTE_MAP:
+---------+--------------+------------+
| CUST_ID | ATTRIBUTE_ID | VALUE |
+---------+--------------+------------+
| 50 | 1 | 'New York' |
+---------+--------------+------------+
| 50 | 2 | |
+---------+--------------+------------+
GROUP_DEFINITIONS:
+----------+-------------+------------+-------------+---------+-------------+---------+
| GROUP_ID | ATTRIBUTE_1 | VALUE_1 | ATTRIBUTE_2 | VALUE_2 | ATTRIBUTE_3 | VALUE_3 |
+----------+-------------+------------+-------------+---------+-------------+---------+
| 10 | 1 | 'New York' | 2 | | | |
+----------+-------------+------------+-------------+---------+-------------+---------+
| 20 | 2 | | | | | |
+----------+-------------+------------+-------------+---------+-------------+---------+
and so the script should generate (in GROUP_MAP):
+---------+----------+--------+
| CUST_ID | GROUP_ID | DATE |
+---------+----------+--------+
| 50 | 10 | *date* |
+---------+----------+--------+
| 50 | 20 | *date* |
+---------+----------+--------+

I could be totally off, but it looks like your inner select needs to be something like this. If I understand what you are trying to do, this will return a unique list of CUST_ID, GROUP_ID, START_DATE where all of the customer attributes match all of the group attributes. Just wrote this fast, so might have some errors, but it might get you going the right direction.
with gd as (
SELECT GROUP_ID, ATTRIBUTE_1 as ATTRIBUTE_ID, VALUE_1 as VALUE from GROUP_DEFINITIONS
UNION
SELECT GROUP_ID, ATTRIBUTE_2, VALUE_2 from GROUP_DEFINITIONS
UNION
SELECT GROUP_ID, ATTRIBUTE_3, VALUE_3 from GROUP_DEFINITIONS
)
MERGE GROUP_MAP gm
USING
(
SELECT am.CUST_ID, gd.GROUP_ID, am.START_DATE
FROM ATTRIBUTE_MAP am
JOIN gd
ON am.ATTRIBUTE_ID = gd.ATTRIBUTE_ID AND coalesce(am.VALUE, '') = coalesce(gd.VALUE, '')
join (select GROUP_ID, count(*) as ATTR_COUNT from gd where ATTRIBUTE_ID is NOT NULL group by GROUP_ID) as gc
on gd.GROUP_ID = gc.GROUP_ID
GROUP BY am.CUST_ID, gd.GROUP_ID, am.START_DATE
HAVING count(am.ATTRIBUTE_ID) = max(gc.ATTR_COUNT)
) src
ON gm.GROUP_ID = src.GROUP_ID
AND gm.CUST_ID = src.CUST_ID
WHEN NOT MATCHED
THEN -- create association in GROUP_MAP
INSERT(CUST_ID,
GROUP_ID,
FROM_DATE) VALUES
(src.CUST_ID, src.GROUP_ID, src.START_DATE);
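For anyone who wants to try the "all attributes of the group must match" logic without a SQL Server instance, here is a minimal sketch using Python's sqlite3 module. It is not the MERGE above: SQLite has no MERGE, so an INSERT ... WHERE NOT EXISTS stands in for the WHEN NOT MATCHED branch, and SQLite's IS operator is used as the null-safe comparison instead of coalesce. The START_DATE column on ATTRIBUTE_MAP is taken from the question's query, customer 60 is made-up test data, and picking MIN(START_DATE) per customer/group is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ATTRIBUTE_MAP (CUST_ID INT, ATTRIBUTE_ID INT, VALUE TEXT, START_DATE TEXT);
CREATE TABLE GROUP_DEFINITIONS (GROUP_ID INT,
    ATTRIBUTE_1 INT, VALUE_1 TEXT,
    ATTRIBUTE_2 INT, VALUE_2 TEXT,
    ATTRIBUTE_3 INT, VALUE_3 TEXT);
CREATE TABLE GROUP_MAP (CUST_ID INT, GROUP_ID INT, FROM_DATE TEXT);

INSERT INTO ATTRIBUTE_MAP VALUES
    (50, 1, 'New York', '2020-01-01'),
    (50, 2, NULL,       '2020-01-01'),
    (60, 1, 'Boston',   '2020-01-01');  -- hypothetical customer that matches no group
INSERT INTO GROUP_DEFINITIONS VALUES
    (10, 1, 'New York', 2, NULL, NULL, NULL),
    (20, 2, NULL, NULL, NULL, NULL, NULL);
""")

MATCH_SQL = """
WITH gd AS (  -- unpivot the three attribute/value pairs, skipping unused slots
    SELECT GROUP_ID, ATTRIBUTE_1 AS ATTRIBUTE_ID, VALUE_1 AS VALUE
    FROM GROUP_DEFINITIONS WHERE ATTRIBUTE_1 IS NOT NULL
    UNION
    SELECT GROUP_ID, ATTRIBUTE_2, VALUE_2
    FROM GROUP_DEFINITIONS WHERE ATTRIBUTE_2 IS NOT NULL
    UNION
    SELECT GROUP_ID, ATTRIBUTE_3, VALUE_3
    FROM GROUP_DEFINITIONS WHERE ATTRIBUTE_3 IS NOT NULL
),
need AS (  -- how many attributes each group requires
    SELECT GROUP_ID, COUNT(*) AS ATTR_COUNT FROM gd GROUP BY GROUP_ID
),
matched AS (  -- customers whose matching attributes cover the whole group
    SELECT am.CUST_ID, gd.GROUP_ID, MIN(am.START_DATE) AS START_DATE
    FROM ATTRIBUTE_MAP am
    JOIN gd ON am.ATTRIBUTE_ID = gd.ATTRIBUTE_ID
           AND am.VALUE IS gd.VALUE          -- IS treats NULL as equal to NULL
    JOIN need ON need.GROUP_ID = gd.GROUP_ID
    GROUP BY am.CUST_ID, gd.GROUP_ID, need.ATTR_COUNT
    HAVING COUNT(DISTINCT am.ATTRIBUTE_ID) = need.ATTR_COUNT
)
INSERT INTO GROUP_MAP (CUST_ID, GROUP_ID, FROM_DATE)
SELECT m.CUST_ID, m.GROUP_ID, m.START_DATE
FROM matched m
WHERE NOT EXISTS (SELECT 1 FROM GROUP_MAP gm
                  WHERE gm.CUST_ID = m.CUST_ID AND gm.GROUP_ID = m.GROUP_ID)
"""

# Running it twice shows that existing associations are not inserted again.
for _ in range(2):
    conn.execute(MATCH_SQL)

rows = conn.execute(
    "SELECT CUST_ID, GROUP_ID, FROM_DATE FROM GROUP_MAP ORDER BY GROUP_ID"
).fetchall()
print(rows)
```

Filtering the unused attribute slots inside the CTE keeps the required-attribute count honest, so a one-attribute group like 20 only needs one match.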

If I understood the problem correctly this should do it:
Please note that I have used GETDATE() because I do not have the [START_DATE] field; you will need to substitute it in the code.
SAMPLE DATA:
CREATE TABLE #ATTRIBUTE_MAP(CUST_ID INT,
ATTRIBUTE_ID INT,
VALUE VARCHAR(20));
INSERT INTO #ATTRIBUTE_MAP
VALUES
(50, 1, 'New York'),
(50, 2, NULL);
CREATE TABLE #GROUP_DEFINITIONS(GROUP_ID INT,
ATTRIBUTE_1 INT,
VALUE_1 VARCHAR(20),
ATTRIBUTE_2 INT,
VALUE_2 VARCHAR(20),
ATTRIBUTE_3 INT,
VALUE_3 VARCHAR(20));
INSERT INTO #GROUP_DEFINITIONS
VALUES
(10, 1, 'New York', 2, NULL, NULL, NULL),
(20, 2, NULL, NULL, NULL, NULL, NULL);
CREATE TABLE #GROUP_MAP(CUST_ID INT,
GROUP_ID INT,
[FROM_DATE] DATE);
QUERY:
MERGE #GROUP_MAP gm
USING
(SELECT DISTINCT
am.CUST_ID,
CAST(GETDATE() AS DATE) AS [START_DATE], --<-- you will need to change this
gd.GROUP_ID
FROM #ATTRIBUTE_MAP am
INNER JOIN
(
SELECT GROUP_ID,
ATTRIBUTE_1 AS ATTRIBUTE_ID,
VALUE_1
FROM #GROUP_DEFINITIONS
UNION ALL
SELECT GROUP_ID,
ATTRIBUTE_2,
VALUE_2
FROM #GROUP_DEFINITIONS
UNION ALL
SELECT GROUP_ID,
ATTRIBUTE_3,
VALUE_3
FROM #GROUP_DEFINITIONS) gd ON am.ATTRIBUTE_ID = gd.ATTRIBUTE_ID) src
ON gm.GROUP_ID = src.GROUP_ID
AND gm.CUST_ID = src.CUST_ID
WHEN NOT MATCHED
THEN -- create association in GROUP_MAP
INSERT(CUST_ID,
GROUP_ID,
FROM_DATE) VALUES
(src.CUST_ID, src.GROUP_ID, src.START_DATE);
VERIFY RESULT:
SELECT CUST_ID , GROUP_ID , FROM_DATE
FROM #GROUP_MAP;

Related

Finding only rows with non-duplicated values within a window partition

I want to look at why some descriptions are different for the same permit id. Here's the table (I'm using Snowflake):
create or replace table permits (permit varchar(255), description varchar(255));
// dupe permits, dupe descriptions, throw out
INSERT INTO permits VALUES ('1', 'abc');
INSERT INTO permits VALUES ('1', 'abc');
// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('2', 'def1');
INSERT INTO permits VALUES ('2', 'def2');
INSERT INTO permits VALUES ('2', 'def3');
// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('3', NULL);
INSERT INTO permits VALUES ('3', 'ghi1');
// unique permit, throw out
INSERT INTO permits VALUES ('5', 'xyz');
What I want is to query this table and get out only the sets of rows that have duplicate permit ids but different descriptions.
The output I want is this:
+---------+-------------+
| PERMIT | DESCRIPTION |
+---------+-------------+
| 2 | def1 |
| 2 | def2 |
| 2 | def3 |
| 3 | |
| 3 | ghi1 |
+---------+-------------+
I've tried this:
with with_dupe_counts as (
select
count(permit) over (partition by permit order by permit) as permit_dupecount,
count(description) over (partition by permit order by permit) as description_dupecount,
permit,
description
from permits
)
select *
from with_dupe_counts
where permit_dupecount > 1
and description_dupecount > 1
Which gives me permits 1 and 2 and counts descriptions whether they are unique or not:
+------------------+-----------------------+--------+-------------+
| PERMIT_DUPECOUNT | DESCRIPTION_DUPECOUNT | PERMIT | DESCRIPTION |
+------------------+-----------------------+--------+-------------+
| 2 | 2 | 1 | abc |
| 2 | 2 | 1 | abc |
| 3 | 3 | 2 | def1 |
| 3 | 3 | 2 | def2 |
| 3 | 3 | 2 | def3 |
+------------------+-----------------------+--------+-------------+
What I think would work would be
count(unique description) over (partition by permit order by permit) as description_dupecount
But as I'm realizing there are lots of things that don't work in window functions. This question isn't necessarily "how do I get count(unique x) to work in a window function" because I don't know if that is the best way to solve this.
A simple group by I don't think will work because I want to get the original rows back.
One method uses min() and max() and count():
select *
from (select p.*,
min(description) over (partition by permit) as min_d,
max(description) over (partition by permit) as max_d,
count(description) over (partition by permit) as cnt_d,
count(*) over (partition by permit) as cnt,
count(permit) over (partition by permit order by permit) as permit_dupecount
from permits p
)
where min_d <> max_d or cnt_d <> cnt;
I would just use exists:
select p.*
from permits p
where exists (
select 1
from permits p1
where p1.permit = p.permit and p1.description <> p.description
)
To handle the null values, we can use the standard null-safe comparison IS DISTINCT FROM, which Snowflake supports:
select p.*
from permits p
where exists (
select 1
from permits p1
where
p1.permit = p.permit
and p1.description is distinct from p.description
)
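To check the null-safe EXISTS approach against the sample data without a Snowflake account, here is a small sketch using Python's sqlite3 module. One assumption to flag: older SQLite versions do not accept IS DISTINCT FROM, so the sketch uses SQLite's IS NOT operator, which compares NULLs the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (permit TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO permits VALUES (?, ?)",
    [
        ("1", "abc"), ("1", "abc"),                    # same description twice: drop
        ("2", "def1"), ("2", "def2"), ("2", "def3"),   # differing descriptions: keep
        ("3", None), ("3", "ghi1"),                    # NULL vs value counts as different: keep
        ("5", "xyz"),                                  # single row: drop
    ],
)

rows = conn.execute("""
    SELECT p.permit, p.description
    FROM permits p
    WHERE EXISTS (
        SELECT 1
        FROM permits p1
        WHERE p1.permit = p.permit
          AND p1.description IS NOT p.description   -- null-safe "is different from"
    )
    ORDER BY p.permit, p.description
""").fetchall()
print(rows)
```

SQLite sorts NULL before other values in ascending order, which is why the NULL row of permit 3 comes first in that group.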
Should work
SELECT DISTINCT p1.permit, p1.description
FROM permits p1
JOIN permits p2 ON p1.permit = p2.permit
WHERE p1.description != p2.description OR (p1.description IS NULL AND p2.description IS NOT NULL) OR (p1.description IS NOT NULL AND p2.description IS NULL)
This is my go to:
with x as (
select permit, count(distinct description) cnt
from permits p1
group by permit
having cnt > 1
)
select p.*
from x
join permits p
on x.permit = p.permit;

How do I coalesce NULLs across multiple rows in BigQuery?

I have the following table:
Date |event_number| customer_id1 | customer_age | customer_gender
10/01/2020 | 1 | abc | NULL | NULL
10/01/2020 | 2 | abc | NULL | male
10/01/2020 | 3 | abc | 45 | NULL
10/01/2020 | 1 | def | 30 | NULL
I want to run a SQL query each day to look for new combinations of customer_id1, customer_age, customer_gender.
Output should look like this:
query_run_time | customer_id1 | customer_age | customer gender
11/01/2020 | abc | 45 | male
11/01/2020 | def | 30 | NULL
Query run time is the date the query was run. If the combination (customer_id1, customer_age, customer_gender) is already in the table, I don't want to insert the row.
Thanks
You can use window functions to assign row numbers within each customer, then merge the two subqueries on those row numbers, e.g. like this:
SELECT COALESCE(a.customer_id, b.customer_id) as customer_id
, customer_age
, customer_gender
FROM (
SELECT customer_id, customer_age
, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY customer_age ) AS row_no
FROM customer_event
WHERE customer_age IS NOT NULL
) a
FULL JOIN (
SELECT customer_id, customer_gender
, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY customer_gender ) AS row_no
FROM customer_event
WHERE customer_gender IS NOT NULL
) b ON b.customer_id = a.customer_id
AND b.row_no = a.row_no
ORDER BY COALESCE(a.customer_id, b.customer_id)
, COALESCE(a.row_no, b.row_no)
Schema and Test Data
CREATE TABLE customer_event (
event_number INT NOT NULL,
customer_id VARCHAR(10) NOT NULL,
customer_age INT,
customer_gender VARCHAR(10)
);
INSERT INTO customer_event VALUES
( 1, 'abc', NULL, NULL ),
( 2, 'abc', NULL, 'male' ),
( 3, 'abc', 45 , NULL ),
( 4, 'abc', 50 , 'female' ),
( 5, 'abc', 27 , NULL ),
( 1, 'def', 30 , NULL );
Output
customer_id customer_age customer_gender
abc 27 female
abc 45 male
abc 50 (null)
def 30 (null)
The above is from testing with PostgreSQL 9.6 on SQL Fiddle.
Use aggregate functions with GROUP BY:
SELECT query_run_time, customer_id, MAX(customer_age) customer_age,
MAX(customer_gender) customer_gender
FROM tbl
GROUP BY query_run_time, customer_id
Output
query_run_time | customer_id1 | customer_age | customer gender
11/01/2020 | abc | 45 | male
11/01/2020 | def | 30 | NULL
I suspect that what you really want is the most recent value for each column. Here is one method:
select date, customer_id1,
array_agg(customer_age ignore nulls order by event_number desc limit 1)[safe_ordinal(1)] as age,
array_agg(customer_gender ignore nulls order by event_number desc limit 1)[safe_ordinal(1)] as gender
from t
group by date, customer_id1;
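The "most recent non-null value per column" idea can be tried outside BigQuery too. The following is a rough sketch using Python's sqlite3 module; it swaps the array_agg expression for correlated subqueries with ORDER BY ... LIMIT 1, which is a different technique that gives the same answer on the sample rows. The table name events and the column name event_date are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_date TEXT, event_number INT, customer_id1 TEXT,
        customer_age INT, customer_gender TEXT
    )
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        ("2020-01-10", 1, "abc", None, None),
        ("2020-01-10", 2, "abc", None, "male"),
        ("2020-01-10", 3, "abc", 45, None),
        ("2020-01-10", 1, "def", 30, None),
    ],
)

rows = conn.execute("""
    SELECT c.customer_id1,
           -- latest event that actually carries an age
           (SELECT a.customer_age FROM events a
             WHERE a.customer_id1 = c.customer_id1 AND a.customer_age IS NOT NULL
             ORDER BY a.event_number DESC LIMIT 1) AS customer_age,
           -- latest event that actually carries a gender
           (SELECT g.customer_gender FROM events g
             WHERE g.customer_id1 = c.customer_id1 AND g.customer_gender IS NOT NULL
             ORDER BY g.event_number DESC LIMIT 1) AS customer_gender
    FROM (SELECT DISTINCT customer_id1 FROM events) c
    ORDER BY c.customer_id1
""").fetchall()
print(rows)
```

A customer with no non-null value for a column simply gets NULL there, as with def's gender.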

Get records having the same value in 2 columns but a different value in a 3rd column

I am having trouble writing a query that will return all records where 2 columns have the same value but a different value in a 3rd column. I am looking for the records where the Item_Type and Location_ID are the same, but the Sub_Location_ID is different.
The table looks like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 1 | 00001 | 20 | 78 |
| 2 | 00001 | 110 | 124 |
| 3 | 00001 | 110 | 124 |
| 4 | 00002 | 3 | 18 |
| 5 | 00002 | 3 | 25 |
+---------+-----------+-------------+-----------------+
The result I am trying to get would look like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 4 | 00002 | 3 | 18 |
| 5 | 00002 | 3 | 25 |
+---------+-----------+-------------+-----------------+
I have been trying to use the following query:
SELECT *
FROM Table1
WHERE Item_Type IN (
SELECT Item_Type
FROM Table1
GROUP BY Item_Type
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
)
But it returns all records with the same Item_Type and a different Sub_Location_ID, not all records with the same Item_Type AND Location_ID but a different Sub_Location_ID.
This should do the trick...
-- some test data...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
BEGIN DROP TABLE #TestData; END;
CREATE TABLE #TestData (
Item_ID INT NOT NULL PRIMARY KEY,
Item_Type CHAR(5) NOT NULL,
Location_ID INT NOT NULL,
Sub_Location_ID INT NOT NULL
);
INSERT #TestData (Item_ID, Item_Type, Location_ID, Sub_Location_ID) VALUES
(1, '00001', 20, 78),
(2, '00001', 110, 124),
(3, '00001', 110, 124),
(4, '00002', 3, 18),
(5, '00002', 3, 25);
-- adding a covering index will eliminate the sort operation...
CREATE NONCLUSTERED INDEX ix_indexname ON #TestData (Item_Type, Location_ID, Sub_Location_ID, Item_ID);
-- the actual solution...
WITH
cte_count_group AS (
SELECT
td.Item_ID,
td.Item_Type,
td.Location_ID,
td.Sub_Location_ID,
cnt_grp_2 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID),
cnt_grp_3 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID, td.Sub_Location_ID)
FROM
#TestData td
)
SELECT
cg.Item_ID,
cg.Item_Type,
cg.Location_ID,
cg.Sub_Location_ID
FROM
cte_count_group cg
WHERE
cg.cnt_grp_2 > 1
AND cg.cnt_grp_3 < cg.cnt_grp_2;
You can use exists:
select t.*
from Table1 t
where exists (select 1
from Table1 t1
where t.Item_Type = t1.Item_Type and
t.Location_ID = t1.Location_ID and
t.Sub_Location_ID <> t1.Sub_Location_ID
);
SQL Server has no row-value (vector) IN, so you can emulate it with a little trick, assuming '#' is an illegal character for Item_Type:
SELECT *
FROM Table1
WHERE Item_Type+'#'+Cast(Location_ID as varchar(20)) IN (
SELECT Item_Type+'#'+Cast(Location_ID as varchar(20))
FROM Table1
GROUP BY Item_Type, Location_ID
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
);
The downside is that the expression in the WHERE clause is non-sargable.
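One way to keep the GROUP BY / HAVING idea while avoiding the string concatenation is to join back to the grouped (Item_Type, Location_ID) pairs, so the columns are compared directly. A minimal sketch with Python's sqlite3 module follows; it only shows the query shape on the sample data and says nothing about how SQL Server would plan it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Table1 (
        Item_ID INTEGER PRIMARY KEY, Item_Type TEXT,
        Location_ID INT, Sub_Location_ID INT
    )
""")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        (1, "00001", 20, 78),
        (2, "00001", 110, 124),
        (3, "00001", 110, 124),
        (4, "00002", 3, 18),
        (5, "00002", 3, 25),
    ],
)

rows = conn.execute("""
    SELECT t.Item_ID, t.Item_Type, t.Location_ID, t.Sub_Location_ID
    FROM Table1 t
    JOIN (
        -- pairs that occur with more than one distinct sub-location
        SELECT Item_Type, Location_ID
        FROM Table1
        GROUP BY Item_Type, Location_ID
        HAVING COUNT(DISTINCT Sub_Location_ID) > 1
    ) g ON g.Item_Type = t.Item_Type
       AND g.Location_ID = t.Location_ID
    ORDER BY t.Item_ID
""").fetchall()
print(rows)
```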
I think you can use exists:
select t1.*
from table1 t1
where exists (select 1
from table1 tt1
where tt1.Item_Type = t1.Item_Type and
tt1.Location_ID = t1.Location_ID and
tt1.Sub_Location_ID <> t1.Sub_Location_ID
);

SQL Server - select every combination

We have two tables below, I am trying to write a query that will select EVERY Purchase for EVERY person on the team. For example, it should show PersonA being associated to PurchaseID 1 and 2 because they are on the same Team as TeamA.
Is this possible? I thought a cross join would work but it seemed to bring back too many columns. I am running SQL Server.
Thank you
Purchases
| PurchaseID | PersonID |
|------------ |---------- |
| 1 | TeamA |
| 2 | TeamA |
| 3 | PersonA |
| 4 | PersonB |
| 5 | TeamB |
Teams
| TeamID | PersonID |
|-------- |---------- |
| 1 | PersonA |
| 1 | TeamA |
| 1 | PersonC |
| 2 | PersonB |
| 2 | TeamB |
Expected results (when filtered on PurchaseID 1):
| PurchaseID | PersonID |
|------------ |---------- |
| 1 | TeamA |
| 1 | PersonA |
| 1 | PersonC |
Your data structure is a little odd, but I think I understand what you want.
If PersonA made a purchase, and PersonA is on TeamA, then everyone on TeamA should be shown as being associated with the purchase, right? Like "I bought these doughnuts for my team, so everyone on my team gets a doughnut".
What you're going to want is to join Purchases to Teams on PersonID, as you probably guessed. But then use CROSS APPLY, which behaves like an inline table-valued function evaluated for each row, to return all the people on the same team as the person in the "current row".
I used two common table expressions to represent your tables so I could run it. You'll just want the SELECT part:
with Purchases as (
select 1 as PurchaseID, 'TeamA' as PersonID
union select 2 as PurchaseID, 'TeamA' as PersonID
union select 3 as PurchaseID, 'PersonA' as PersonID
union select 4 as PurchaseID, 'PersonB' as PersonID
union select 5 as PurchaseID, 'TeamB' as PersonID
)
, Teams as (
select 1 as TeamID, 'PersonA' as PersonID
union select 1 as TeamID, 'TeamA' as PersonID
union select 1 as TeamID, 'PersonC' as PersonID
union select 2 as TeamID, 'PersonB' as PersonID
union select 2 as TeamID, 'TeamB' as PersonID
)
select Purchases.PurchaseID
, EveryTeamMember.PersonID
from Purchases
join Teams
on Teams.PersonID = Purchases.PersonID
cross apply (
select PersonID
from Teams InnerTable
where InnerTable.TeamID = Teams.TeamID
) as EveryTeamMember
where Purchases.PurchaseID = 1
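Since the applied subquery here is just a lookup on TeamID, the same rows can also be produced by joining Teams to itself. Below is a small sketch of that variant using Python's sqlite3 module (SQLite has no CROSS APPLY, so this is the self-join form, not the query above); the aliases buyer and member are made up for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Purchases (PurchaseID INT, PersonID TEXT);
CREATE TABLE Teams (TeamID INT, PersonID TEXT);
INSERT INTO Purchases VALUES
    (1, 'TeamA'), (2, 'TeamA'), (3, 'PersonA'), (4, 'PersonB'), (5, 'TeamB');
INSERT INTO Teams VALUES
    (1, 'PersonA'), (1, 'TeamA'), (1, 'PersonC'), (2, 'PersonB'), (2, 'TeamB');
""")

rows = conn.execute("""
    SELECT p.PurchaseID, member.PersonID
    FROM Purchases p
    JOIN Teams buyer  ON buyer.PersonID = p.PersonID    -- team of the purchaser
    JOIN Teams member ON member.TeamID  = buyer.TeamID  -- everyone on that team
    WHERE p.PurchaseID = 1
    ORDER BY member.PersonID
""").fetchall()
print(rows)
```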
If you are looking to get all team members when the PersonID starts with Team, then I think you should do a CROSS APPLY over all PersonIDs that start with Team and UNION (not UNION ALL) the single-person purchases:
DECLARE @Purchases TABLE (
PurchaseID INT,
PersonID Varchar(50)
)
INSERT INTO @Purchases(PersonID,PurchaseID) VALUES ('TeamA', 1);
INSERT INTO @Purchases(PersonID,PurchaseID) VALUES ('TeamA', 2);
INSERT INTO @Purchases(PersonID,PurchaseID) VALUES ('PersonA', 3);
INSERT INTO @Purchases(PersonID,PurchaseID) VALUES ('PersonB', 4);
INSERT INTO @Purchases(PersonID,PurchaseID) VALUES ('TeamB', 5);
DECLARE @Teams TABLE (
TeamID INT,
PersonID Varchar(50)
)
INSERT INTO @Teams(PersonID,TeamID) VALUES ('PersonA', 1);
INSERT INTO @Teams(PersonID,TeamID) VALUES ('TeamA', 1);
INSERT INTO @Teams(PersonID,TeamID) VALUES ('PersonC', 1);
INSERT INTO @Teams(PersonID,TeamID) VALUES ('PersonB', 2);
INSERT INTO @Teams(PersonID,TeamID) VALUES ('TeamB', 2);
-- team purchases: expand to every member of the purchasing team
SELECT T1.PurchaseID,TeamPersons.PersonID
FROM @Purchases T1
INNER JOIN @Teams T2
ON T2.PersonID = T1.PersonID AND T1.PersonID LIKE 'Team%'
CROSS APPLY (
SELECT PersonID
FROM @Teams T3
WHERE T3.TeamID = T2.TeamID
) AS TeamPersons
UNION
-- single-person purchases: keep as they are
SELECT T1.PurchaseID
, T1.PersonID
FROM @Purchases T1
WHERE T1.PersonID NOT LIKE 'Team%'

Finding value with most common condition

I am having a lot of trouble with PostgreSQL trying to figure out how to find the most common value that fits a specific criteria. The ID is the ID number of the book, meaning repeating numbers means there are multiple copies of the book.
I have 2 tables here:
Table A:
=====+===================
ID | Condition
-------------------------
1 | Taken
1 |
1 | Taken
1 |
2 | Taken
3 | Taken
3 |
3 | Taken
3 | Taken
4 |
4 | Taken
etc.
Table B:
=====+===================
ID | Name
-------------------------
1 | BookA
2 | BookB
3 | BookC
4 | BookD
etc.
What I need is to simply find which book has the most copies taken and simply print out the name of the book. In this case all I need is:
BookC
The problem is that I can't figure out how to count how many copies of each individual ID are taken. I tried using a temp table, something like this:
CREATE TEMP TABLE MostCommon AS
(SELECT ID
FROM TableA
WHERE SUM(CASE WHEN Condition>0 then 1 else 0 END)
)
SELECT NAME FROM TableB, MostCommon WHERE
MostCommon.ID = TableB.ID;
But it either throws an error or simply doesn't give me what I need. Any help would be greatly appreciated.
OK, so firstly I assumed that your column and table names are case sensitive, which means you must use double quote marks. To print the most "taken" book name with the number of "taken" copies, you can use the simple aggregate count(), then order the output descending, and at the end limit the output to 1 row, like:
SELECT
b."ID",
b."Name",
count(*) as takenCount
FROM "TableA" a
JOIN "TableB" b ON a."ID" = b."ID"
WHERE a."Condition" = 'Taken'
GROUP BY b."ID", b."Name"
ORDER BY 3 DESC
LIMIT 1;
-- count the taken copies per book, then keep the book(s) with the highest count
CREATE TEMP TABLE MostCommon AS
(SELECT id, count(*) AS book_taken FROM tableA WHERE condition = 'Taken' GROUP BY id);
SELECT name FROM tableB t2 JOIN MostCommon mc ON mc.id = t2.id WHERE mc.book_taken = (SELECT max(book_taken) FROM MostCommon);
To make the data sensible (i.e. no duplicate records), I have to change the schema a little.
CREATE TABLE book_condition (
created TIMESTAMP,
book_id INTEGER,
condition VARCHAR,
PRIMARY KEY (created, book_id));
INSERT INTO book_condition (created, book_id, condition)
VALUES
('2016-01-01 08:30', 1, 'Taken'),
('2016-01-01 08:35', 1, ''),
('2016-01-01 08:40', 1, 'Taken'),
('2016-01-01 08:45', 1, ''),
('2016-01-01 08:50', 2, 'Taken'),
('2016-01-01 08:55', 3, 'Taken'),
('2016-01-01 09:00', 3, ''),
('2016-01-01 09:05', 3, 'Taken'),
('2016-01-01 09:10', 3, 'Taken'),
('2016-01-01 09:15', 4, ''),
('2016-01-01 09:20', 4, 'Taken');
CREATE TABLE book (
book_id INTEGER,
name VARCHAR,
PRIMARY KEY (book_id));
INSERT INTO book (book_id, name)
VALUES
(1, 'BookA'),
(2, 'BookB'),
(3, 'BookC'),
(4, 'BookD');
Then, the question breaks down into:
How many copies of each book were ever taken?
SELECT
book_id,
COUNT(book_id) AS total_taken
FROM book_condition
WHERE
condition = 'Taken'
GROUP BY book_id
;
book_id | total_taken
---------+-------------
1 | 2
2 | 1
3 | 3
4 | 1
(4 rows)
How to rank records by total_taken value?
SELECT
book_id,
total_taken,
RANK() OVER (
ORDER BY total_taken DESC
) AS total_taken_rank
FROM (
SELECT
book_id,
COUNT(book_id) AS total_taken
FROM book_condition
WHERE
condition = 'Taken'
GROUP BY book_id
) AS bt
ORDER BY total_taken_rank ASC
;
book_id | total_taken | total_taken_rank
---------+-------------+------------------
3 | 3 | 1
1 | 2 | 2
2 | 1 | 3
4 | 1 | 3
(4 rows)
How to get the name of the book in a query result containing its key (id) value?
SELECT
b.book_id,
b.name,
bt.total_taken,
RANK() OVER (
ORDER BY bt.total_taken DESC NULLS LAST -- books never taken have NULL here and should rank last
) AS total_taken_rank
FROM
book AS b
LEFT JOIN (
SELECT
book_id,
COUNT(book_id) AS total_taken
FROM book_condition
WHERE
condition = 'Taken'
GROUP BY book_id
) AS bt
USING (book_id)
ORDER BY
total_taken_rank ASC,
book_id ASC
;
book_id | name | total_taken | total_taken_rank
---------+-------+-------------+------------------
3 | BookC | 3 | 1
1 | BookA | 2 | 2
2 | BookB | 1 | 3
4 | BookD | 1 | 3
(4 rows)
How to get only the highest-ranking records in the result?
SELECT
br.book_id,
br.name,
br.total_taken
FROM (
SELECT
b.book_id,
b.name,
bt.total_taken,
RANK() OVER (
ORDER BY bt.total_taken DESC NULLS LAST -- books never taken have NULL here and should rank last
) AS total_taken_rank
FROM
book AS b
LEFT JOIN (
SELECT
book_id,
COUNT(book_id) AS total_taken
FROM book_condition
WHERE
condition = 'Taken'
GROUP BY book_id
) AS bt
USING (book_id)
) AS br
WHERE
total_taken_rank = 1
;
book_id | name | total_taken
---------+-------+-------------
3 | BookC | 3
(1 row)
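If window functions are not available, the same "top count, ties included" result can be reached by comparing each book's count to the maximum count. Here is a rough sketch with Python's sqlite3 module, reusing the book and book_condition tables from this answer; it is an alternative formulation, not the RANK() query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book_condition (created TEXT, book_id INTEGER, condition TEXT,
                             PRIMARY KEY (created, book_id));
CREATE TABLE book (book_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO book VALUES (1, 'BookA'), (2, 'BookB'), (3, 'BookC'), (4, 'BookD');
INSERT INTO book_condition VALUES
    ('2016-01-01 08:30', 1, 'Taken'), ('2016-01-01 08:35', 1, ''),
    ('2016-01-01 08:40', 1, 'Taken'), ('2016-01-01 08:45', 1, ''),
    ('2016-01-01 08:50', 2, 'Taken'), ('2016-01-01 08:55', 3, 'Taken'),
    ('2016-01-01 09:00', 3, ''),      ('2016-01-01 09:05', 3, 'Taken'),
    ('2016-01-01 09:10', 3, 'Taken'), ('2016-01-01 09:15', 4, ''),
    ('2016-01-01 09:20', 4, 'Taken');
""")

rows = conn.execute("""
    WITH taken AS (  -- taken copies per book
        SELECT book_id, COUNT(*) AS total_taken
        FROM book_condition
        WHERE condition = 'Taken'
        GROUP BY book_id
    )
    SELECT b.name, t.total_taken
    FROM taken t
    JOIN book b ON b.book_id = t.book_id
    WHERE t.total_taken = (SELECT MAX(total_taken) FROM taken)  -- keeps ties
    ORDER BY b.name
""").fetchall()
print(rows)
```

Unlike LIMIT 1, this returns every book that shares the highest count.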