Handling nested case statements in Redshift

I am writing a Redshift query that requires the use of multiple case statements.
Pretext:
Customers can be associated with more than one organization, such as sweet or salt.
Ask:
Customers associated with the 'SWEETS' organization must be picked first; if no affiliation with 'SWEETS' is available, then we have to take the id of the organization where flag = 1.
I have to use a case statement in Redshift to derive the result.
There are three different tables: a customer table, an organization table, and a third table that determines how customers are associated with organizations.
The code that I have tried is below, but after executing it I still get two organization ids instead of a single id, which should be that of the sweet org.
SELECT customer_id
, organization_id
FROM customer_details AS customer
LEFT JOIN organization AS org
ON customer.customer_id
AND organization_id = CASE WHEN organization_id IN (SELECT organization_id
FROM organization_type
WHERE organization_type = 'SWEET')
THEN organization_id
ELSE org.organization_id END

You can use window functions:
select customer_id, organization_id
from (select c.customer_id, o.organization_id,
row_number() over (partition by c.customer_id order by o.organization_type = 'SWEET' desc) as seqnum
from customer_details c left join
organization o
on c.customer_id = o.customer_id
) co
where seqnum = 1;
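The question also asks for a fallback to the organization where flag = 1. A minimal sketch of that ordering follows, run through Python's built-in sqlite3 module so it can be tried locally (this assumes a SQLite build with window functions, 3.25 or later). The single `customer_org` table and its columns are hypothetical stand-ins for the real schema, which the question does not show, and the CASE expression would need checking against Redshift itself.

```python
import sqlite3


def pick_organization(rows):
    """rows: (customer_id, organization_id, organization_type, flag) tuples.

    Returns one (customer_id, organization_id) pair per customer, preferring
    the SWEET organization and falling back to the one with flag = 1.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical flattened table; the real schema is not shown in the question.
    conn.execute(
        "CREATE TABLE customer_org ("
        "customer_id INTEGER, organization_id INTEGER, "
        "organization_type TEXT, flag INTEGER)"
    )
    conn.executemany("INSERT INTO customer_org VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT customer_id, organization_id
        FROM (SELECT customer_id, organization_id,
                     ROW_NUMBER() OVER (
                         PARTITION BY customer_id
                         ORDER BY CASE WHEN organization_type = 'SWEET' THEN 0
                                       WHEN flag = 1 THEN 1
                                       ELSE 2 END,
                                  organization_id
                     ) AS seqnum
              FROM customer_org) AS ranked
        WHERE seqnum = 1
        ORDER BY customer_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    (1, 10, "SWEET", 0),  # customer 1 has a SWEET organization
    (1, 11, "SALT", 1),
    (2, 12, "SALT", 0),   # customer 2 has none, so flag = 1 decides
    (2, 13, "SPICE", 1),
]
picked = pick_organization(sample)
print(picked)  # [(1, 10), (2, 13)]
```

The trailing `organization_id` in the ORDER BY is only a tie-breaker so that the result is deterministic when several rows share the same priority.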

Related

INNER JOIN of pageviews, contacts and companies - duplicated entries

In short: 3 table inner join duplicates records
I have data in BigQuery in 3 tables:
Pageviews with columns:
timestamp
user_id
title
path
Contacts with columns:
website_user_id
email
company_id
Companies with columns:
id
name
I want to display all recorded pageviews and, if the user and/or company is known, display that data next to the pageview.
First, I join contact and pageviews data (SQL is generated by Metabase business intelligence tool):
SELECT
`analytics.pageviews`.`timestamp` AS `timestamp`,
`analytics.pageviews`.`title` AS `title`,
`analytics.pageviews`.`path` AS `path`,
`Contacts`.`email` AS `email`
FROM `analytics.pageviews`
INNER JOIN `analytics.contacts` `Contacts` ON `analytics.pageviews`.`user_id` = `Contacts`.`website_user_id`
ORDER BY `timestamp` DESC
It works as expected and I can see pageviews attributed to known contacts.
Next, I'd like to show pageviews of contacts with a known company, together with the company name:
SELECT
`analytics.pageviews`.`timestamp` AS `timestamp`,
`analytics.pageviews`.`title` AS `title`,
`analytics.pageviews`.`path` AS `path`,
`Contacts`.`email` AS `email`,
`Companies`.`name` AS `name`
FROM `analytics.pageviews`
INNER JOIN `analytics.contacts` `Contacts` ON `analytics.pageviews`.`user_id` = `Contacts`.`website_user_id`
INNER JOIN `analytics.companies` `Companies` ON `Contacts`.`company_id` = `Companies`.`id`
ORDER BY `timestamp` DESC
With this query I would expect to see only pageviews where associated contact AND company are known (just another column for company name). The problem is, I get duplicate rows for every pageview (sometimes 5, sometimes 20 identical rows).
I want to avoid selecting DISTINCT timestamps because it can lead to excluding valid pageviews from different users but with identical timestamp.
How should I approach this?

Your description sounds like you have duplicates in companies. This is easy to test for:
select c.id, count(*)
from `analytics.companies` c
group by c.id
having count(*) >= 2;
You can get the details using window functions:
select c.*
from (select c.*, count(*) over (partition by c.id) as cnt
from `analytics.companies` c
) c
where cnt >= 2
order by cnt desc, id;
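To see why duplicated company rows multiply the pageviews, here is a small sketch using Python's sqlite3 module rather than BigQuery. The sample rows are made up, and joining against `SELECT DISTINCT id, name` is just one possible workaround, which only helps when the duplicate rows are fully identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pageviews (user_id INTEGER, path TEXT);
    CREATE TABLE contacts (website_user_id INTEGER, email TEXT, company_id INTEGER);
    CREATE TABLE companies (id INTEGER, name TEXT);
    INSERT INTO pageviews VALUES (1, '/home');
    INSERT INTO contacts VALUES (1, 'a@example.com', 7);
    -- The same company recorded three times (made-up sample data).
    INSERT INTO companies VALUES (7, 'Acme'), (7, 'Acme'), (7, 'Acme');
""")

join_sql = """
    SELECT p.path, c.email, co.name
    FROM pageviews p
    JOIN contacts c ON p.user_id = c.website_user_id
    JOIN {companies} co ON c.company_id = co.id
"""

# Joining the raw table repeats the pageview once per duplicate company row.
raw_rows = conn.execute(join_sql.format(companies="companies")).fetchall()

# Joining a de-duplicated derived table returns the pageview once.
dedup_rows = conn.execute(
    join_sql.format(companies="(SELECT DISTINCT id, name FROM companies)")
).fetchall()

# The check from the answer: ids that occur more than once.
duplicate_ids = conn.execute(
    "SELECT id, COUNT(*) FROM companies GROUP BY id HAVING COUNT(*) >= 2"
).fetchall()
conn.close()

print(len(raw_rows), len(dedup_rows), duplicate_ids)  # 3 1 [(7, 3)]
```

If the duplicates differ in other columns, cleaning the companies table itself is likely the better fix.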

How to SELECT with several conditions? (WHERE ... AND ... IN (...))

For example: in my database I have 3 tables: COMPANY, COUPON and COMPANY_COUPON.
The COMPANY table has the fields ID and NAME, the COUPON table has ID, TITLE and TYPE, and the COMPANY_COUPON table has the IDs of the companies and the IDs of the coupons that they own.
So, in Java, to get all coupons of a company I use the command:
SELECT coupon_id FROM company_coupon WHERE company_id = ?
And I put the result into a Collection.
But I need a way to get all coupons of the company by type,
something like:
SELECT * FROM company_coupon WHERE company_id = 1 AND coupon_id = (SELECT * FROM coupon WHERE type = camping)
Of course this one does not work, but I'm looking for something like that.
I know that I can get all coupons of the company, put them into a Collection, and then delete all coupons that do not match the specified type, but is there any way to do this in the database with SQL commands?

You might want to use WHERE IN here:
SELECT *
FROM COMPANY_COUPON
WHERE COMPANY_ID = 1 AND COUPON_ID IN (SELECT ID FROM COUPON WHERE TYPE = 'CAMPING');
You could also use EXISTS, which is probably the best way to write your logic:
SELECT cc.*
FROM COMPANY_COUPON cc
WHERE
cc.COMPANY_ID = 1 AND
EXISTS (SELECT 1 FROM COUPON c WHERE c.TYPE = 'CAMPING' AND c.ID = cc.COUPON_ID);
Using EXISTS might outperform doing a join between the two tables, because the database can stop as soon as it finds the very first match.
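As a quick check that the IN and EXISTS forms return the same coupons, here is a sketch using Python's sqlite3 module with made-up rows. The `?` placeholders play the same role as the bound parameters of a Java PreparedStatement; which form is faster depends on the database and its optimizer, so this only demonstrates equivalence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coupon (id INTEGER PRIMARY KEY, title TEXT, type TEXT);
    CREATE TABLE company_coupon (company_id INTEGER, coupon_id INTEGER);
    -- Made-up sample data.
    INSERT INTO coupon VALUES (1, 'Tent sale', 'CAMPING'),
                              (2, 'Pizza deal', 'FOOD'),
                              (3, 'Lake trip', 'CAMPING');
    INSERT INTO company_coupon VALUES (1, 1), (1, 2), (2, 3);
""")

in_sql = """
    SELECT coupon_id FROM company_coupon
    WHERE company_id = ? AND coupon_id IN (SELECT id FROM coupon WHERE type = ?)
    ORDER BY coupon_id
"""
exists_sql = """
    SELECT cc.coupon_id FROM company_coupon cc
    WHERE cc.company_id = ?
      AND EXISTS (SELECT 1 FROM coupon c WHERE c.type = ? AND c.id = cc.coupon_id)
    ORDER BY cc.coupon_id
"""

params = (1, "CAMPING")  # company 1, coupons of type CAMPING
in_ids = [row[0] for row in conn.execute(in_sql, params)]
exists_ids = [row[0] for row in conn.execute(exists_sql, params)]
conn.close()

print(in_ids, exists_ids)  # [1] [1]
```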

Use only one column with the IN operator, and quote the string literal:
SELECT *
FROM COMPANY_COUPON
WHERE COMPANY_ID = 1
AND COUPON_ID IN (SELECT ID FROM COUPON WHERE TYPE = 'CAMPING')

I think you just want a join:
SELECT cc.COUPON_ID
FROM COMPANY_COUPON cc JOIN
COUPON c
ON cc.COUPON_ID = c.ID
WHERE cc.COMPANY_ID = ? AND c.TYPE = ?;

How to filter out duplicate records caused by a JOIN in SQL?

I have a simple query that returns a list of phone numbers for a given customer. The users are able to search for a specific customer by entering any part of their address. The customer can have multiple phone numbers and addresses.
Here is an example of my query:
SELECT
ROW_NUMBER() OVER(PARTITION BY Customer.CustomerNumber ORDER BY PhoneNumber.PhoneNumber) RowNumber,
Customer.CustomerNumber,
PhoneNumber.PhoneNumber
FROM Customer
JOIN PhoneNumber ON PhoneNumber.CustomerId = Customer.Id
JOIN CustomerAddress on CustomerAddress.CustomerId = Customer.Id
Here is what this query is producing when I have a customer that has two phone numbers and two addresses:
RowNumber CustomerNumber PhoneNumber
1 1 111-111-1111
2 1 222-222-2222
3 1 111-111-1111
4 1 222-222-2222
My desired result would be something along these lines:
RowNumber CustomerNumber PhoneNumber
1 1 111-111-1111
2 1 222-222-2222
I can only produce the desired result above when I remove the join on the address table.
A user should be able to look up a customer by entering in any part of the address (ex: I want to show any user that has an address in Phoenix). While it isn't displayed in the result, it still should be filterable.
I think I am able to do something like this:
SELECT *, ROW_NUMBER() OVER(PARTITION BY Test.CustomerNumber ORDER BY Test.PhoneNumber) RowNumber
FROM
(
SELECT
DISTINCT
CustomerNumber,
PhoneNumber.PhoneNumber
FROM Customer
JOIN PhoneNumber ON PhoneNumber.CustomerId = Customer.Id
JOIN CustomerAddress ON CustomerAddress.CustomerId = Customer.Id
) Test

You're not selecting any columns from the CustomerAddress table, so joining that table has only the effects of suppressing results for customers with no address and duplicating results for those with multiple addresses.
If you don't want either of those effects then don't join CustomerAddress. If you want only the former then you'd be better off with a different approach, such as
SELECT
ROW_NUMBER() OVER(PARTITION BY Customer.CustomerNumber
ORDER BY PhoneNumber.PhoneNumber) RowNumber,
Customer.CustomerNumber,
PhoneNumber.PhoneNumber
FROM
Customer
JOIN PhoneNumber ON PhoneNumber.CustomerId = Customer.Id
WHERE
Customer.Id IN (SELECT CustomerId from CustomerAddress)
Presuming that Customer.Id is a primary key, that should produce duplicates only if duplicate phone numbers for a given customer are recorded in the PhoneNumber table.

Removing the join on address might affect the ability of users to search by address. If that is the problem, change the query to use exists:
SELECT ROW_NUMBER() OVER (PARTITION BY c.CustomerNumber ORDER BY pn.PhoneNumber) as RowNumber,
c.CustomerNumber,
pn.PhoneNumber
FROM Customer c JOIN
PhoneNumber pn
ON pn.CustomerId = c.Id
WHERE EXISTS (SELECT 1
FROM CustomerAddress ca
WHERE ca.CustomerId = c.Id AND
ca.address like '%#ADDRESS%' -- this is just an example of searching logic
);
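The difference between joining the address table and filtering with EXISTS can be sketched as follows, using Python's sqlite3 module and made-up rows. The `Address` column and the `%Phoenix%` pattern are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, CustomerNumber INTEGER);
    CREATE TABLE PhoneNumber (CustomerId INTEGER, PhoneNumber TEXT);
    CREATE TABLE CustomerAddress (CustomerId INTEGER, Address TEXT);
    -- One customer with two phone numbers and two matching addresses.
    INSERT INTO Customer VALUES (1, 1);
    INSERT INTO PhoneNumber VALUES (1, '111-111-1111'), (1, '222-222-2222');
    INSERT INTO CustomerAddress VALUES (1, '1 Main St, Phoenix'),
                                       (1, '9 Oak Ave, Phoenix');
""")

# Joining the address table yields one row per (phone, address) pair.
joined = conn.execute("""
    SELECT c.CustomerNumber, pn.PhoneNumber
    FROM Customer c
    JOIN PhoneNumber pn ON pn.CustomerId = c.Id
    JOIN CustomerAddress ca ON ca.CustomerId = c.Id
    WHERE ca.Address LIKE ?
""", ("%Phoenix%",)).fetchall()

# EXISTS only tests for a matching address, so each phone appears once.
filtered = conn.execute("""
    SELECT c.CustomerNumber, pn.PhoneNumber
    FROM Customer c
    JOIN PhoneNumber pn ON pn.CustomerId = c.Id
    WHERE EXISTS (SELECT 1 FROM CustomerAddress ca
                  WHERE ca.CustomerId = c.Id AND ca.Address LIKE ?)
    ORDER BY pn.PhoneNumber
""", ("%Phoenix%",)).fetchall()
conn.close()

print(len(joined), filtered)
# 4 [(1, '111-111-1111'), (1, '222-222-2222')]
```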

Change your window function to partition by customer and phone number in a sub query. I use this all the time to filter dupes. Just set the partition by to your unique key.
Select * from (
SELECT
ROW_NUMBER() OVER(PARTITION BY Customer.CustomerNumber, PhoneNumber.PhoneNumber ORDER BY PhoneNumber.PhoneNumber) RowNumber,
Customer.CustomerNumber,
PhoneNumber.PhoneNumber
FROM Customer
JOIN PhoneNumber ON PhoneNumber.CustomerId = Customer.Id
JOIN CustomerAddress on CustomerAddress.CustomerId = Customer.Id) Deduped
Where RowNumber = 1

Select all customers loyal to one company?

I've got tables:
TABLE | COLUMNS
----------+----------------------------------
CUSTOMER | C_ID, C_NAME, C_ADDRESS
SHOP | S_ID, S_NAME, S_ADDRESS, S_COMPANY
ORDER | S_ID, C_ID, O_DATE
I want to select the ids of all customers who ordered only from shops of one company, 'Samsung' ('LG', 'HP', ...; it doesn't really matter, it's dynamic).
I've come up with only one solution, but I consider it ugly:
( SELECT DISTINCT c_id FROM order JOIN shop USING(s_id) WHERE s_company = "Samsung" )
EXCEPT
( SELECT DISTINCT c_id FROM order JOIN shop USING(s_id) WHERE s_company != "Samsung" );
The two queries are the same except for the reversed operator. Isn't there an aggregate method that solves this kind of query better?
I mean, there could be millions of orders (I don't really have orders; I've got something that occurs more often).
Is it efficient to select thousands of orders and then compare them to hundreds of thousands of orders from a different company? I know that it compares sorted things, so it's O( m + n + sort(n) + sort(m) ). But that's still large for millions of records, isn't it?
And one more question: how could I select all customer values (name, address)? How can I join them? Can I just do
SELECT CUSTOMER.* FROM CUSTOMER JOIN ( (SELECT...) EXCEPT (SELECT...) ) USING (C_ID);
Disclaimer: This question isn't homework. It's preparation for an exam and a desire to do things more effectively. My solution would be accepted at the exam, but I like efficient programming.

I like to approach this type of question using group by and a having clause. You can get the list of customers using:
select o.c_id
from orders o join
shops s
on o.s_id = s.s_id
group by o.c_id
having min(s.s_company) = max(s.s_company);
If you care about the particular company, then:
having min(s.s_company) = max(s.s_company) and
max(s.s_company) = 'Samsung'
If you want full customer information, you can join the customers table back in.
Whether this works better than the except version is something that would have to be tested on your system.
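A small sketch comparing the HAVING form with the EXCEPT form from the question, run through Python's sqlite3 module on made-up rows. The table names `orders` and `shop` are assumptions (ORDER is a reserved word in most databases), and this shows only that the two queries agree, not how they perform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (s_id INTEGER PRIMARY KEY, s_company TEXT);
    CREATE TABLE orders (s_id INTEGER, c_id INTEGER);
    INSERT INTO shop VALUES (1, 'Samsung'), (2, 'Samsung'), (3, 'LG');
    -- Customer 100 buys only from Samsung shops, 200 from both, 300 only from LG.
    INSERT INTO orders VALUES (1, 100), (2, 100), (1, 200), (3, 200), (3, 300);
""")

having_sql = """
    SELECT o.c_id
    FROM orders o JOIN shop s ON o.s_id = s.s_id
    GROUP BY o.c_id
    HAVING MIN(s.s_company) = MAX(s.s_company) AND MAX(s.s_company) = ?
    ORDER BY o.c_id
"""
except_sql = """
    SELECT c_id FROM orders JOIN shop USING (s_id) WHERE s_company = ?
    EXCEPT
    SELECT c_id FROM orders JOIN shop USING (s_id) WHERE s_company <> ?
    ORDER BY c_id
"""

company = "Samsung"
having_ids = [row[0] for row in conn.execute(having_sql, (company,))]
except_ids = [row[0] for row in conn.execute(except_sql, (company, company))]
conn.close()

print(having_ids, except_ids)  # [100] [100]
```

The HAVING version reads the joined rows once, while the EXCEPT version builds two sets; actual cost still depends on indexes and the optimizer.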

How about a query that avoids Min and Max? Start from the orders joined to their shops:
select o.C_ID, s.S_COMPANY
from orders o
join shop s
on s.S_ID = o.S_ID
group by o.C_ID, s.S_COMPANY;
Now we have a distinct list of customers and all the companies they shopped at. The loyal customers will be the ones who only appear once in the list.
select C_ID
from Q1
group by C_ID
having count(*) = 1;
Join back to the first query to get the company:
with
Q1 as(
select o.C_ID, s.S_COMPANY
from orders o
join shop s
on s.S_ID = o.S_ID
group by o.C_ID, s.S_COMPANY
),
Q2 as(
select C_ID
from Q1
group by C_ID
having count(*) = 1
)
select Q1.C_ID, Q1.S_COMPANY
from Q1
join Q2
on Q2.C_ID = Q1.C_ID;
Now you have a list of loyal customers and the one company each is loyal to.

Find the latest date of two tables with matching primary keys

I have two tables, each with primary keys for different people and the contact dates in each category. I am trying to find the most recent contact date for each person, regardless of which table it is in. For example:
CustomerService columns: CustomerKey, DateContacted
CustomerOutreach columns: CustomerKey, DateContacted
And I'm just trying to find the very latest date for each person.

Use something like this.
You need to combine the two tables, which you can do with a union. A customer can still appear in several rows, so group by CustomerKey and take the MAX of DateContacted.
SELECT * INTO #TEMP FROM (
SELECT
CustomerKey
, DateContacted
FROM CustomerService CS
UNION
SELECT
CustomerKey
, DateContacted
FROM CustomerOutreach CO
) AS Combined
SELECT
CustomerKey
, MAX(DateContacted) AS LatestDateContacted
FROM #TEMP
GROUP BY
CustomerKey
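The same idea can be tried without a temp table. The sketch below uses Python's sqlite3 module with made-up rows and stores dates as ISO-8601 text, which is an assumption that makes MAX compare them correctly as strings; a real date column would not need that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerService (CustomerKey INTEGER, DateContacted TEXT);
    CREATE TABLE CustomerOutreach (CustomerKey INTEGER, DateContacted TEXT);
    -- Made-up sample data; customer 3 appears only in CustomerOutreach.
    INSERT INTO CustomerService VALUES (1, '2015-01-10'), (1, '2015-03-02'),
                                       (2, '2015-02-01');
    INSERT INTO CustomerOutreach VALUES (1, '2015-02-20'), (3, '2015-04-15');
""")

# Stack both tables, then keep the latest date per customer.
latest = conn.execute("""
    SELECT CustomerKey, MAX(DateContacted) AS LatestDateContacted
    FROM (SELECT CustomerKey, DateContacted FROM CustomerService
          UNION ALL
          SELECT CustomerKey, DateContacted FROM CustomerOutreach) AS Combined
    GROUP BY CustomerKey
    ORDER BY CustomerKey
""").fetchall()
conn.close()

print(latest)
# [(1, '2015-03-02'), (2, '2015-02-01'), (3, '2015-04-15')]
```

UNION ALL is used here because MAX ignores repeated values anyway, so removing duplicates first is unnecessary. Customers that exist in only one of the two tables are still returned.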

Join your tables on primary keys and make a conditional projection.
Select cs.CustomerKey,
CASE WHEN cs.DateContacted <= co.DateContacted
THEN co.DateContacted
ELSE cs.DateContacted END
from CustomerService cs inner join CustomerOutreach co
on cs.CustomerKey = co.CustomerKey

I would do something like this.
Select b.customerKey, b.dateContacted
from (
select a.customerKey, a.DateContacted, Row_Number() over (Partition by a.customerKey order by a.DateContacted desc) as RN
from (
Select c.customerKey,
-- take the service date when it is later or when there is no outreach date
case when (o.DateContacted is null or s.DateContacted > o.DateContacted) then s.DateContacted else o.DateContacted end as DateContacted
from Customer c
left outer join customerService s on c.customerKey = s.customerKey
left outer join customerOutreach o on c.customerKey = o.customerKey
where s.customerKey is not null or o.customerKey is not null
)a
)b
where b.RN = 1
This solution should take care of preventing the case of having duplicates if both tables have the same max DateContacted.
http://sqlfiddle.com/#!3/ca968/1
