Select Distinct with MAX() using SQL Server

I am trying to get all distinct account numbers from 3 tables in SQL Server, but my approach doesn't seem to work. Any suggestions?
SELECT distinct account, max(date_added) as date_added FROM table_one group by account
union
SELECT distinct account, max(date_added) as date_added, FROM table_two group by account
union
SELECT distinct account, max(date_added) as date_added, FROM table_three group by account order by account asc

1
SELECT account, MAX(date_added) AS date_added
FROM table_one
GROUP BY account
UNION
SELECT account, MAX(date_added) AS date_added
FROM table_two
GROUP BY account
UNION
SELECT account, MAX(date_added) AS date_added
FROM table_three
GROUP BY account
ORDER BY account ASC
2
SELECT account, MAX(date_added) AS date_added
FROM (
SELECT account, date_added
FROM table_one
UNION ALL
SELECT account, date_added
FROM table_two
UNION ALL
SELECT account, date_added
FROM table_three
) t
GROUP BY account
ORDER BY account ASC
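
The difference between the two options is easiest to see when the same account appears in more than one table. The sketch below is not part of the original answer: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the three sample rows are invented for illustration. Option 1 takes the maximum per table and then unions the results, so an account can still come back once per table; option 2 combines the rows first and returns one row per account.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one   (account INTEGER, date_added TEXT);
    CREATE TABLE table_two   (account INTEGER, date_added TEXT);
    CREATE TABLE table_three (account INTEGER, date_added TEXT);
    -- Hypothetical data: account 1 appears in all three tables.
    INSERT INTO table_one   VALUES (1, '2015-01-01');
    INSERT INTO table_two   VALUES (1, '2015-02-01');
    INSERT INTO table_three VALUES (1, '2015-03-01');
""")

# Option 1: MAX per table, then UNION. The (account, date) pairs differ,
# so UNION cannot collapse them into one row.
per_table = conn.execute("""
    SELECT account, MAX(date_added) AS date_added FROM table_one GROUP BY account
    UNION
    SELECT account, MAX(date_added) AS date_added FROM table_two GROUP BY account
    UNION
    SELECT account, MAX(date_added) AS date_added FROM table_three GROUP BY account
    ORDER BY account ASC
""").fetchall()

# Option 2: combine all rows first, then take MAX per account.
combined = conn.execute("""
    SELECT account, MAX(date_added) AS date_added
    FROM (
        SELECT account, date_added FROM table_one
        UNION ALL
        SELECT account, date_added FROM table_two
        UNION ALL
        SELECT account, date_added FROM table_three
    ) t
    GROUP BY account
    ORDER BY account ASC
""").fetchall()

print(len(per_table))  # three rows, all for account 1
print(combined)        # one row for account 1 with its latest date
```

If each account should appear exactly once with its latest date across all tables, option 2 is the form that guarantees it.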

Generate a set of data combining the results then get the max date for each account. This generates an inline view from which we can get a distinct account and max date added.
SELECT account, max(date_Added) as Date_Added from (
SELECT account, date_Added FROM table_one
union
SELECT account, date_added FROM table_two
union
SELECT account, date_added FROM table_three) B
Group by account

If you want a list with all accounts and each account only appears once, then you could do something like:
select distinct account from table_one
union
select distinct account from table_two
union
select distinct account from table_three
By using union you will not get duplicate rows, and since you are only selecting account you will get each account only once.
It is however unclear what you are doing with the date_added column.

select account, MAX(date_added)
FROM (
SELECT account, date_added FROM table_one
union
SELECT account, date_added FROM table_two
union
SELECT account, date_added FROM table_three
) X
group by account
order by account asc
This query unions the data from all three tables into one recordset, so you can work with it as if it were one table. Imagine that you have 3 accounts, 1, 2 and 3, and the following data for them is spread across those 3 tables:
Table One
Account | Date Added
--------+-----------
1 | 01-01-2015
2 | 01-02-2015
Table Two
Account | Date Added
--------+-----------
3 | 01-03-2015
1 | 01-02-2015
Table Three
Account | Date Added
--------+-----------
2 | 01-04-2015
3 | 01-05-2015
After the union of the records from all three tables we get the following 'table' (actually it is a recordset stored in memory):
Union Data
Account | Date Added
--------+-----------
1 | 01-01-2015
2 | 01-02-2015
3 | 01-03-2015
1 | 01-02-2015
2 | 01-04-2015
3 | 01-05-2015
Then we just select the account and the latest date_added for each distinct account found within this set, so we get the following results:
Result Data
Account | Date Added
--------+-----------
1 | 01-02-2015
2 | 01-04-2015
3 | 01-05-2015
Feel free to ask any other questions you get related to this.

Related

Counting unique combinations of values across multiple columns regardless of order?

I have a table that looks a bit like this:
Customer_ID | Offer_1 | Offer_2 | Offer_3
------------|---------|---------|--------
111 | A01 | 001 | B01
222 | A01 | B01 | 001
333 | A02 | 001 | B01
I want to write a query to figure out how many unique combinations of offers there are in the table, regardless of what order the offers appear in.
So in the example above there are two unique combinations: customers 111 & 222 both have the same three offers so they count as one unique combination, and then customer 333 is the only customer to have the three offers that they have. So the desired output of the query would be 2.
For some additional context:
The customer_ID column is in integer format, and all the offer columns are in varchar format.
There are 12 offer columns and over 3 million rows in the actual table, with over 100 different values in the offer columns. I simplified the example to better illustrate what I'm trying to do, but any solution needs to scale to this amount of possible combinations.
I can concatenate all of the offer columns together and then run a count distinct statement on the result, but this doesn't account for customers who have the same unique combination of offers but ordered differently (like customers 111 & 222 in the example above).
Does anyone know how to solve this problem please?
Assuming the character / doesn't show up in any of the offer names, you can do:
select count(distinct offer_combo) as distinct_offers
from (
select listagg(offer, '/') within group (order by offer) as offer_combo
from (
select customer_id, offer_1 as offer from t
union all select customer_id, offer_2 from t
union all select customer_id, offer_3 from t
) x
group by customer_id
) y
Result:
DISTINCT_OFFERS
---------------
2
See running example at db<>fiddle.
One way to do it would be to union all the offers into one column, then use select distinct listagg... to get the combinations of offers. Try this:
with u as
(select Customer_ID, Offer_1 as Offer from table_name union all
select Customer_ID, Offer_2 as Offer from table_name union all
select Customer_ID, Offer_3 as Offer from table_name)
select distinct listagg(Offer, ',') within group(order by Offer) from u
group by Customer_ID
Fiddle
The solution without UNION ALLs. It should have better performance.
/*
WITH MYTAB (Customer_ID, Offer_1, Offer_2, Offer_3) AS
(
VALUES
(111, 'A01', '001', 'B01')
, (222, 'A01', 'B01', '001')
, (333, 'A02', '001', 'B01')
)
*/
SELECT COUNT (DISTINCT LIST)
FROM
(
SELECT LISTAGG (V.Offer, '|') WITHIN GROUP (ORDER BY V.Offer) LIST
FROM MYTAB T
CROSS JOIN TABLE (VALUES T.Offer_1, T.Offer_2, T.Offer_3) V (Offer)
GROUP BY T.CUSTOMER_ID
)
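
All three answers rely on the same idea: turn each customer's offers into a canonical, order-independent key (sorted, then joined with a separator that never occurs in an offer name), and count the distinct keys. As a rough illustration outside the database, here is that idea in plain Python using the three example rows from the question. The helper name `offer_key` is made up for this sketch, and the '/' separator carries the same assumption as the first answer.

```python
# Example rows from the question: (customer_id, offer_1, offer_2, offer_3).
rows = [
    (111, "A01", "001", "B01"),
    (222, "A01", "B01", "001"),
    (333, "A02", "001", "B01"),
]


def offer_key(offers):
    """Return an order-independent key: offers sorted, joined with '/'.

    Assumes '/' never appears inside an offer name.
    """
    return "/".join(sorted(offers))


# One key per customer; the set keeps each distinct combination once.
distinct_combos = {offer_key(offers) for _customer_id, *offers in rows}
distinct_count = len(distinct_combos)

print(sorted(distinct_combos))
print(distinct_count)
```

Customers 111 and 222 produce the same key, which is why the count comes out as 2 rather than 3.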

Calculate account balance history in PostgreSQL

I am trying to get a balance history on the account using SQL. My table in PostgreSQL looks like this:
id sender_id recipient_id amount_money
--- ----------- ---------------------- -----------------
1 1 2 60.00
2 1 2 15.00
3 2 1 35.00
so the user with id number 2 currently has 40 dollars in his account.
I would like to get this result using sql:
[60, 75, 40]
Is it possible to do something like this using sql in postgres?
To get a rolling balance, you can SUM the amounts (up to and including the current row) based on whether the id was the recipient or sender:
SELECT id, sender_id, recipient_id, amount_money,
SUM(CASE WHEN recipient_id = 2 THEN amount_money
WHEN sender_id = 2 THEN -amount_money
END) OVER (ORDER BY id) AS balance
FROM transactions
Output:
id sender_id recipient_id amount_money balance
1 1 2 60.00 60.00
2 1 2 15.00 75.00
3 2 1 35.00 40.00
If you want an array, you can use array_agg with the above query as a derived table:
SELECT array_agg(balance)
FROM (
SELECT SUM(CASE WHEN recipient_id = 2 THEN amount_money
WHEN sender_id = 2 THEN -amount_money
END) OVER (ORDER BY id) AS balance
FROM transactions
) t
Output:
[60,75,40]
Demo on dbfiddle
If you want to be more sophisticated and support balances for multiple accounts, you need to split the initial data into account ids, adding when the id is the recipient and subtracting when the sender. You can use CTEs to generate the appropriate data:
WITH trans AS (
SELECT id, sender_id AS account_id, -amount_money AS amount
FROM transactions
UNION ALL
SELECT id, recipient_id AS account_id, amount_money AS amount
FROM transactions
),
balances AS (
SELECT id, account_id, ABS(amount),
SUM(amount) OVER (PARTITION BY account_id ORDER BY id) AS balance
FROM trans
)
SELECT account_id, ARRAY_AGG(balance) AS bal_array
FROM balances
GROUP BY account_id
Output:
account_id bal_array
1 [-60,-75,-40]
2 [60,75,40]
Demo on dbfiddle
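
For anyone who wants to try the running-balance query without a PostgreSQL instance, here is a small sketch using Python's sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the first version with window functions). The function name `balance_history` and the REAL column type are choices made for this sketch, and the WHERE clause is an addition so that transactions not involving the account are skipped instead of repeating the previous balance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id           INTEGER PRIMARY KEY,
        sender_id    INTEGER,
        recipient_id INTEGER,
        amount_money REAL
    );
    INSERT INTO transactions VALUES
        (1, 1, 2, 60.00),
        (2, 1, 2, 15.00),
        (3, 2, 1, 35.00);
""")


def balance_history(connection, account_id):
    """Return the running balance of one account, ordered by transaction id."""
    cursor = connection.execute(
        """
        SELECT SUM(CASE WHEN recipient_id = :acct THEN amount_money
                        WHEN sender_id    = :acct THEN -amount_money
                   END) OVER (ORDER BY id) AS balance
        FROM transactions
        WHERE :acct IN (sender_id, recipient_id)  -- only this account's rows
        ORDER BY id
        """,
        {"acct": account_id},
    )
    return [row[0] for row in cursor]


print(balance_history(conn, 2))
print(balance_history(conn, 1))
```

Building the list in Python replaces array_agg here, since SQLite has no array type.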

SQL: Take maximum value, but if a field is missing for a particular ID, ignore all values

This is somewhat difficult to explain...(this is using SQL Assistant for Teradata, which I'm not overly familiar with).
ID creation_date completion_date Difference
123 5/9/2016 5/16/2016 7
123 5/14/2016 5/16/2016 2
456 4/26/2016 4/30/2016 4
456 (null) 4/30/2016 (null)
789 3/25/2016 3/31/2016 6
789 3/1/2016 3/31/2016 30
An ID may have more than one creation_date, but it will always have the same completion_date. If the creation_date is populated for all records for an ID, I want to return the record with the most recent creation_date. However, if ANY creation_date for a given ID is missing, I want to ignore all records associated with this ID.
Given the data above, I would want to return:
ID creation_date completion_date Difference
123 5/14/2016 5/16/2016 2
789 3/25/2016 3/31/2016 6
No records are returned for 456 because the second record has a missing creation_date. The record with the most recent creation_date is returned for 123 and 789.
Any help would be greatly appreciated. Thanks!
Depending on your database, here's one option using row_number to get the max date per group. You can then filter those results with not exists to check against null values:
select *
from (
select *,
row_number() over (partition by id order by creation_date desc) rn
from yourtable
) t
where rn = 1 and not exists (
select 1
from yourtable t2
where t2.creation_date is null and t.id = t2.id
)
row_number is a window function that is supported in many databases. MySQL did not support window functions before version 8.0, but in older versions you can achieve the same result using user-defined variables.
Here is a more generic version using conditional aggregation:
select t.*
from yourtable t
join (select id, max(creation_date) max_creation_date
from yourtable
group by id
having count(case when creation_date is null then 1 end) = 0
) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
SQL Fiddle Demo
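
The conditional-aggregation version can be checked against the sample data from the question. This is only a sketch: it runs on SQLite through Python's sqlite3 module, not on Teradata, stores the dates as ISO-8601 text so that MAX compares them correctly, and leaves out the Difference column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (id INTEGER, creation_date TEXT, completion_date TEXT);
    INSERT INTO yourtable VALUES
        (123, '2016-05-09', '2016-05-16'),
        (123, '2016-05-14', '2016-05-16'),
        (456, '2016-04-26', '2016-04-30'),
        (456, NULL,         '2016-04-30'),
        (789, '2016-03-25', '2016-03-31'),
        (789, '2016-03-01', '2016-03-31');
""")

# HAVING drops every id that has at least one NULL creation_date;
# the join then keeps only the row carrying the latest creation_date.
latest_complete = conn.execute("""
    SELECT t.id, t.creation_date, t.completion_date
    FROM yourtable t
    JOIN (SELECT id, MAX(creation_date) AS max_creation_date
          FROM yourtable
          GROUP BY id
          HAVING COUNT(CASE WHEN creation_date IS NULL THEN 1 END) = 0
         ) t2 ON t.id = t2.id AND t.creation_date = t2.max_creation_date
    ORDER BY t.id
""").fetchall()

for row in latest_complete:
    print(row)
```

ID 456 is absent from the output because one of its rows has no creation_date.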

How to remove duplicate accounts in SQL?

I am using SQL Server 2008 and I was wondering how to remove duplicate customers either from the table or exclude it in my query. An Account_ID can only have 1 product associated with it. And the account with the most recent purchase date is what should be showing. An example is below:
Account_ID, Account_Purchase, Purchase_Date
1 Product 1 1/1/2016
2 Product 1 1/2/2016
3 Product 2 1/5/2016
1 Product 3 3/12/2016
4 Product 3 1/5/2016
Ideally I would only see:
Account_ID, Account_Purchase, Purchase_Date
2 Product 1 1/2/2016
3 Product 2 1/5/2016
1 Product 3 3/12/2016
4 Product 3 1/5/2016
This should not show up because it is not the most recent purchase from account 1
Account_ID, Account_Purchase, Purchase_Date
1 Product 1 1/1/2016
Thank you all for help, folks!
Simply get the latest Purchase_Date per Account_ID using MAX and GROUP BY, then inner join that result back to the table to pick up the other columns.
SELECT TABLE_NAME.* FROM TABLE_NAME
INNER JOIN(
SELECT Account_ID, MAX(Purchase_Date) AS Purchase_Date
FROM TABLE_NAME
GROUP BY Account_ID
) LatestPurchases
ON TABLE_NAME.Account_ID = LatestPurchases.Account_ID
AND TABLE_NAME.Purchase_Date = LatestPurchases.Purchase_Date
Try the query below, replacing TABLENAME with your table name:
WITH CTE
AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Purchase_Date DESC) AS RN
FROM TABLENAME
)
SELECT
*
FROM CTE
WHERE RN = 1
Here is another query
SELECT
t.Account_id,
t.Account_Purchase,
t.Purchase_Date
FROM
tablename t
WHERE
t.Purchase_Date = (SELECT MAX(Purchase_date) FROM Tablename WHERE Account_ID = t.Account_ID)
ORDER BY
t.Purchase_Date DESC
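
To see the "latest row per account" pattern on the question's data, here is a sketch of the MAX/GROUP BY join run on SQLite through Python's sqlite3 module, not on SQL Server 2008. The table name `purchases` is a placeholder, and the dates are stored as ISO-8601 text so that MAX orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (
        Account_ID       INTEGER,
        Account_Purchase TEXT,
        Purchase_Date    TEXT
    );
    INSERT INTO purchases VALUES
        (1, 'Product 1', '2016-01-01'),
        (2, 'Product 1', '2016-01-02'),
        (3, 'Product 2', '2016-01-05'),
        (1, 'Product 3', '2016-03-12'),
        (4, 'Product 3', '2016-01-05');
""")

# Find each account's latest purchase date, then join back for the details.
latest = conn.execute("""
    SELECT p.Account_ID, p.Account_Purchase, p.Purchase_Date
    FROM purchases p
    INNER JOIN (
        SELECT Account_ID, MAX(Purchase_Date) AS Purchase_Date
        FROM purchases
        GROUP BY Account_ID
    ) LatestPurchases
        ON  p.Account_ID    = LatestPurchases.Account_ID
        AND p.Purchase_Date = LatestPurchases.Purchase_Date
    ORDER BY p.Account_ID
""").fetchall()

for row in latest:
    print(row)
```

One difference between the approaches is worth keeping in mind: if an account has two purchases on the same latest date, the join and the correlated subquery return both rows, while the ROW_NUMBER version returns exactly one.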

How to find distinct users in multiple tables

I have a table called users that holds user ids, as well as a few tables like cloud_storage_a, cloud_storage_b and cloud_storage_c. If a user exists in cloud_storage_a, that means they are connected to cloud storage a. A user can exist in many cloud storages too. Here's an example:
users table:
user_id | address | name
-------------------------------
123 | 23 Oak Ave | Melissa
333 | 18 Robson Rd | Steve
421 | 95 Ottawa St | Helen
555 | 12 Highland | Amit
192 | 39 Anchor Rd | Oliver
cloud_storage_a:
user_id
-------
421
333
cloud_storage_b:
user_id
-------
555
cloud_storage_c:
user_id
-------
192
555
Etc.
I want to create a query that grabs all users connected on any cloud storage. So for this example, users 421, 333, 555, 192 should be returned. I'm guessing this is some sort of join but I'm not sure which one.
You are close. Instead of a JOIN that merges tables next to each other based on a key, you want to use a UNION, which stacks recordsets/tables on top of each other.
SELECT user_id FROM cloud_storage_a
UNION
SELECT user_id FROM cloud_storage_b
UNION
SELECT user_id FROM cloud_storage_c
Using the keyword UNION here will give you distinct user_ids across all three tables. If you switched that to UNION ALL you would no longer get distinct values, which has its advantages in other situations (not here, obviously).
Edited to add:
If you wanted to bring in user address you could use this thing as a subquery and join into your user table:
SELECT
subunion.user_id,
users.address
FROM
users
INNER JOIN
(
SELECT user_id FROM cloud_storage_a
UNION
SELECT user_id FROM cloud_storage_b
UNION
SELECT user_id FROM cloud_storage_c
) subunion ON
users.user_id = subunion.user_id
That union will need to grow as you add more cloud_storage_N tables. All in all, it's not a great database design. You would be much better off with a single cloud_storage table that has a column identifying which storage each row belongs to (a, b, c, ..., N).
Then your UNION query would just be SELECT DISTINCT user_id FROM cloud_storage; and you would never need to edit it again.
With the current design you need to combine an unknown (?) number of cloud_storage_X tables.
You'd better change your schema to the following:
storage:
user_id cloud
------- -----
421 a
333 a
555 b
192 c
555 c
Then the query is as simple as this:
select distinct user_id
from storage;
select u.* from users u,
cloud_storage_a csa,
cloud_storage_b csb,
cloud_storage_c csc
where u.user_id = csa.user_id or u.user_id = csb.user_id or u.user_id = csc.user_id
You should simplify your schema to handle this type of queries.
To get columns from your users table for all (distinct) qualifying users:
SELECT * -- or whatever you need
FROM users u
WHERE EXISTS (SELECT 1 FROM cloud_storage_a WHERE user_id = u.user_id) OR
EXISTS (SELECT 1 FROM cloud_storage_b WHERE user_id = u.user_id) OR
EXISTS (SELECT 1 FROM cloud_storage_c WHERE user_id = u.user_id);
To just get all user_id values and nothing else, @JNevill's UNION query looks good. You could join the result of this to users to the same effect:
SELECT u.* -- or whatever you need
FROM users u
JOIN (
SELECT user_id FROM cloud_storage_a
UNION
SELECT user_id FROM cloud_storage_b
UNION
SELECT user_id FROM cloud_storage_c
) c USING (user_id);
But that's probably slower.
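
As a quick check that the UNION and EXISTS approaches agree, the sketch below loads the example data from the question into an in-memory SQLite database through Python's sqlite3 module and runs both. The question does not name its database engine, so SQLite is only a convenient stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, address TEXT, name TEXT);
    CREATE TABLE cloud_storage_a (user_id INTEGER);
    CREATE TABLE cloud_storage_b (user_id INTEGER);
    CREATE TABLE cloud_storage_c (user_id INTEGER);
    INSERT INTO users VALUES
        (123, '23 Oak Ave',   'Melissa'),
        (333, '18 Robson Rd', 'Steve'),
        (421, '95 Ottawa St', 'Helen'),
        (555, '12 Highland',  'Amit'),
        (192, '39 Anchor Rd', 'Oliver');
    INSERT INTO cloud_storage_a VALUES (421), (333);
    INSERT INTO cloud_storage_b VALUES (555);
    INSERT INTO cloud_storage_c VALUES (192), (555);
""")

# UNION stacks the three id lists and removes duplicates (555 appears twice).
union_ids = [row[0] for row in conn.execute("""
    SELECT user_id FROM cloud_storage_a
    UNION
    SELECT user_id FROM cloud_storage_b
    UNION
    SELECT user_id FROM cloud_storage_c
    ORDER BY user_id
""")]

# EXISTS filters the users table directly, so each user appears at most once.
exists_ids = [row[0] for row in conn.execute("""
    SELECT u.user_id
    FROM users u
    WHERE EXISTS (SELECT 1 FROM cloud_storage_a WHERE user_id = u.user_id)
       OR EXISTS (SELECT 1 FROM cloud_storage_b WHERE user_id = u.user_id)
       OR EXISTS (SELECT 1 FROM cloud_storage_c WHERE user_id = u.user_id)
    ORDER BY u.user_id
""")]

print(union_ids)
print(exists_ids)
```

User 123 is in no storage table and is left out by both queries, while user 555, who is in two storages, is returned once.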