SQL: How to remove duplicate rows created by CASE WHEN statement

I have two tables: (A) customers of the gym and (B) customers of the restaurant. I want to create an indicator in table (A) to indicate the customers who have been to both the gym and the restaurant on the same day. In accomplishing this, I used the following SQL script, but it created duplicate rows:
SELECT *,
CASE WHEN a.GymDate = b.RestaurantDate THEN 'Meal + Gym on the same day'
ELSE 'Gym Only' END AS 'Meal+Gym'
FROM Table_A a
LEFT JOIN Table_B b
ON a.customerid = b.customerid;
May I know how to keep only Table_A, but with the addition of the 'Meal+Gym' Indicator? Thanks!

A case expression does not generate rows; it is your join that is generating the duplicate rows. You could add the date predicate to the join condition and merely check for the existence of a matching record, e.g.
SELECT *,
CASE WHEN b.customerid IS NOT NULL THEN 'Meal + Gym on the same day'
ELSE 'Gym Only'
END AS [Meal+Gym]
FROM Table_A a
LEFT JOIN Table_B b
ON a.customerid = b.customerid
AND a.GymDate = b.RestaurantDate;
If Table_B is not unique per customer/date, you may need to do something like this to prevent duplicates:
SELECT *,
CASE WHEN r.RestaurantVisit IS NOT NULL THEN 'Meal + Gym on the same day'
ELSE 'Gym Only'
END AS [Meal+Gym]
FROM Table_A a
OUTER APPLY
( SELECT TOP 1 1
FROM Table_B b
WHERE a.customerid = b.customerid
AND a.GymDate = b.RestaurantDate
) AS r (RestaurantVisit);
N.B. While single quotes work for column aliases, they are not a good habit, because they make your column aliases indistinguishable from string literals other than by context. Even if this is clear to you, it probably isn't to other people, and since there's about a 10:1 ratio of reading to writing code, writing code that is easy to read is important. As such, I've used square brackets for your column name instead.
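To see where the extra rows come from, here is a minimal sketch that runs both joins against an in-memory SQLite database through Python's `sqlite3` module. The sample rows are made up for illustration, and SQLite stands in for whatever engine the question uses, so treat it as a demonstration of the idea and not as the exact behaviour of your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (customerid INTEGER, GymDate TEXT);
    CREATE TABLE Table_B (customerid INTEGER, RestaurantDate TEXT);
    -- Hypothetical data: customer 1 has one gym visit and two restaurant
    -- visits (one on the gym day); customer 2 never visited the restaurant.
    INSERT INTO Table_A VALUES (1, '2020-01-01'), (2, '2020-01-01');
    INSERT INTO Table_B VALUES (1, '2020-01-01'), (1, '2020-01-05');
""")

# Join on customerid only: customer 1 matches both restaurant rows.
join_on_id = conn.execute("""
    SELECT a.customerid,
           CASE WHEN a.GymDate = b.RestaurantDate
                THEN 'Meal + Gym on the same day' ELSE 'Gym Only' END
    FROM Table_A a
    LEFT JOIN Table_B b ON a.customerid = b.customerid
    ORDER BY a.customerid, b.RestaurantDate
""").fetchall()

# Date predicate moved into the join: only same-day restaurant rows match.
join_on_id_and_date = conn.execute("""
    SELECT a.customerid,
           CASE WHEN b.customerid IS NOT NULL
                THEN 'Meal + Gym on the same day' ELSE 'Gym Only' END
    FROM Table_A a
    LEFT JOIN Table_B b
      ON a.customerid = b.customerid AND a.GymDate = b.RestaurantDate
    ORDER BY a.customerid
""").fetchall()

print(len(join_on_id), len(join_on_id_and_date))
```

With this sample data the first query returns one row per matching restaurant visit, while the second returns one row per gym visit.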

I would start with a table of customers, so you get an indicator for customers who have been to neither the gym nor a restaurant.
Then:
select c.*,
(case when exists (select 1
from table_a a join
table_b b
on a.customerid = b.customerid and
a.GymDate = b.RestaurantDate
where a.customerid = c.customerid
)
then 1 else 0
end) as same_day_gym_restaurant_flag
from customers c;

You can use CASE WHEN EXISTS instead of the LEFT JOIN:
SELECT *,
CASE WHEN EXISTS (
SELECT 1 FROM Table_B b
WHERE a.customerid = b.customerid
AND a.GymDate = b.RestaurantDate)
THEN 'Meal + Gym on the same day'
ELSE 'Gym Only'
END AS 'Meal+Gym'
FROM Table_A a
This assumes that you don't need any data from Table_B in the results.
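As a rough check of the EXISTS approach, the sketch below uses Python's `sqlite3` module with made-up rows in which Table_B contains two restaurant visits for the same customer on the same day. SQLite and the sample data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (customerid INTEGER, GymDate TEXT);
    CREATE TABLE Table_B (customerid INTEGER, RestaurantDate TEXT);
    INSERT INTO Table_A VALUES (1, '2020-01-01'), (2, '2020-01-01');
    -- Hypothetical data: customer 1 ate twice on the gym day.
    INSERT INTO Table_B VALUES (1, '2020-01-01'), (1, '2020-01-01');
""")

# EXISTS only asks whether a matching row exists, so duplicate rows in
# Table_B cannot multiply the rows coming from Table_A.
rows = conn.execute("""
    SELECT a.customerid, a.GymDate,
           CASE WHEN EXISTS (SELECT 1 FROM Table_B b
                             WHERE a.customerid = b.customerid
                               AND a.GymDate = b.RestaurantDate)
                THEN 'Meal + Gym on the same day' ELSE 'Gym Only'
           END AS "Meal+Gym"
    FROM Table_A a
    ORDER BY a.customerid
""").fetchall()

for row in rows:
    print(row)
```

Each gym visit appears exactly once, regardless of how many restaurant rows match it.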

Related

How can I exclude users who have only purchase with car?

There is this SQL(from Django) query:
SELECT "id", "name"
FROM "polls_client" INNER JOIN "polls_purchases"
ON ("id" = "client_id")
WHERE "polls_purchases"."product" IN ('car', 'bike')
We need to select the users whose only purchase records are for 'car'. I want to do this in a single query to the database. How do I do this?
You can group by client and set the condition in the HAVING clause:
SELECT pc.id, pc.name
FROM polls_client pc INNER JOIN polls_purchases pp
ON pc.id = pp.client_id
GROUP BY pc.id, pc.name
HAVING SUM(CASE WHEN pp.product <> 'car' THEN 1 ELSE 0 END) = 0
We need to select from query users who have purchase records only 'car'.
The simplest, most efficient method should be not exists:
SELECT c.*
FROM "polls_client" c
WHERE NOT EXISTS (SELECT 1
FROM "polls_purchases" pp
WHERE c."id" = pp."client_id" AND pp."product" <> 'car'
);
In particular, this can take advantage of an index on polls_purchases(client_id, product).
I would also dissuade you from using double quotes for identifiers. They only serve to clutter queries.
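The two answers are not quite interchangeable. A small sketch with Python's `sqlite3` module and invented sample clients suggests the difference: the HAVING version goes through an INNER JOIN, so a client with no purchases at all never reaches the GROUP BY, whereas NOT EXISTS keeps that client. The data and the use of SQLite are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE polls_client (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE polls_purchases (client_id INTEGER, product TEXT);
    -- Hypothetical clients: 1 bought only a car, 2 bought a car and a bike,
    -- 3 has no purchases at all.
    INSERT INTO polls_client VALUES (1, 'car only'), (2, 'car and bike'), (3, 'no purchases');
    INSERT INTO polls_purchases VALUES (1, 'car'), (2, 'car'), (2, 'bike');
""")

# GROUP BY / HAVING: count non-car purchases per client and require zero.
having_ids = [r[0] for r in conn.execute("""
    SELECT pc.id
    FROM polls_client pc
    INNER JOIN polls_purchases pp ON pc.id = pp.client_id
    GROUP BY pc.id, pc.name
    HAVING SUM(CASE WHEN pp.product <> 'car' THEN 1 ELSE 0 END) = 0
    ORDER BY pc.id
""")]

# NOT EXISTS: keep clients for whom no non-car purchase exists.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT c.id
    FROM polls_client c
    WHERE NOT EXISTS (SELECT 1 FROM polls_purchases pp
                      WHERE c.id = pp.client_id AND pp.product <> 'car')
    ORDER BY c.id
""")]

print(having_ids, not_exists_ids)
```

If clients without any purchase should be excluded from the NOT EXISTS version, one option is to add an `EXISTS` check for a 'car' purchase alongside it.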

Match two tables based on minimum dates efficiently

I have two tables: one that contains quarterly data and one that contains daily data. I would like to join the two tables such that, for each day in the daily data, the quarterly data for that quarter is selected and returned daily. I am working with Postgres 9.3.
The current query is as follows:
select
a.ID,
a.datadate,
b.*,
case when a.datadate = b.rdq then 1 else 0 end as VALID
from proj_data a, proj_rat b
where a.id = b.id
and b.rdq = (select min(rdq)
from proj_rat c
where a.id = c.id and a.datadate >= c.rdq);
But it is excruciatingly slow and I need to do this for several thousand IDs. Can anyone suggest a more efficient solution?
This eliminates the need for a subquery in the where clause:
select
ID,
a.datadate,
b.*,
(a.datadate = b.rdq)::integer as VALID
from
proj_data a
inner join
(
select distinct on (id, rdq) *
from proj_rat
order by id, rdq
) b using(id)
where a.datadate >= b.rdq;

How can I get join to work with conditions

I have a table TimeIntervals with a relationship to Breaks, which in turn has a relationship to DeletedBreaks.
What I'm trying to do is retrieve all rows (Breaks) that have a given TimeIntervals id and no deleted break for a given date.
That is, if a break has no row in the DeletedBreaks table for its break id, or if there is a row with that break's id but not for the given date, then that break should be returned.
The following is not working, but you might understand what I'm trying to do:
SELECT B.*
FROM Breaks B
JOIN TimeIntervals T
ON B.TimeIntervalId = T.Id
JOIN DeletedBreaks DB
ON (
(
DB.BreakId = B.Id
AND DB.DeletedDate <> '2014-10-13'
)
OR DB.BreakId IS NULL
)
AND (T.Id = 2)
Use a LEFT JOIN to your DeletedBreaks table instead of an inner join, since you don't want to drop Break records just because the DeletedBreaks ID is null.
To keep breaks that have no DeletedBreaks row, or a DeletedBreaks row for a different day, test for that in the WHERE clause:
SELECT B.*
FROM Breaks B
JOIN TimeIntervals T
ON B.TimeIntervalId = T.Id
LEFT JOIN DeletedBreaks DB ON
DB.BreakId = B.Id
WHERE
(DB.DeletedDate <> '2014-10-13'
OR DB.BreakId IS NULL)
AND T.Id = 2
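The sketch below runs the LEFT JOIN query on invented rows using Python's `sqlite3` module, next to a NOT EXISTS variant that is not part of the answer above. The variant only matters under an assumption the question does not confirm: that one break can have several DeletedBreaks rows. In that case the LEFT JOIN version still returns a break deleted on the given date as long as it also has a row for another date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TimeIntervals (Id INTEGER PRIMARY KEY);
    CREATE TABLE Breaks (Id INTEGER PRIMARY KEY, TimeIntervalId INTEGER);
    CREATE TABLE DeletedBreaks (BreakId INTEGER, DeletedDate TEXT);
    INSERT INTO TimeIntervals VALUES (2), (3);
    -- Hypothetical breaks: 10 never deleted, 11 deleted on the given date,
    -- 12 deleted on another date, 13 belongs to another interval,
    -- 14 deleted on both the given date and another date.
    INSERT INTO Breaks VALUES (10, 2), (11, 2), (12, 2), (13, 3), (14, 2);
    INSERT INTO DeletedBreaks VALUES
        (11, '2014-10-13'), (12, '2014-10-14'),
        (14, '2014-10-13'), (14, '2014-10-14');
""")

# The LEFT JOIN + WHERE query from the answer.
left_join_ids = [r[0] for r in conn.execute("""
    SELECT B.Id
    FROM Breaks B
    JOIN TimeIntervals T ON B.TimeIntervalId = T.Id
    LEFT JOIN DeletedBreaks DB ON DB.BreakId = B.Id
    WHERE (DB.DeletedDate <> '2014-10-13' OR DB.BreakId IS NULL)
      AND T.Id = 2
    ORDER BY B.Id
""")]

# Alternative sketch: exclude a break if any deletion exists for the date.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT B.Id
    FROM Breaks B
    JOIN TimeIntervals T ON B.TimeIntervalId = T.Id
    WHERE T.Id = 2
      AND NOT EXISTS (SELECT 1 FROM DeletedBreaks DB
                      WHERE DB.BreakId = B.Id
                        AND DB.DeletedDate = '2014-10-13')
    ORDER BY B.Id
""")]

print(left_join_ids, not_exists_ids)
```

If each break has at most one DeletedBreaks row, both queries should return the same breaks.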

COUNT() columns by a specific value

I want to make a query on a SQL Compact 4.0 DB table with two COUNT() columns. The first column shall count all rows ( COUNT(*) ) and the second one shall only count a row when the decimal value of a specific column is greater than or equal to 3.0.
I got this far:
SELECT COUNT(a.number) AS Participant, COUNT(b.specificColumn) AS Approved
FROM person AS a
LEFT OUTER JOIN test AS b
ON b.number = a.number
This way the second COUNT() will obviously count every row that has a non-NULL value, regardless of what the value is.
I don't think you can do it using a count. Try using a case statement. Not tested:
SELECT COUNT(a.number) AS Participant,
SUM(case when b.specificColumn >= 3 then 1 else 0 end) as Approved
FROM person AS a
LEFT OUTER JOIN test AS b
ON b.number = a.number
SELECT COUNT(a.number) AS Participant,
SUM(CASE WHEN b.specificColumn IS NULL THEN 0
WHEN b.specificColumn >= 3 THEN 1
ELSE 0 END) AS Approved
FROM person AS a
LEFT OUTER JOIN test AS b
ON b.number = a.number
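COUNT can also do this in standard SQL, because it skips NULLs: a CASE without an ELSE yields NULL for rows that fail the condition. The sketch below compares both forms on invented data using Python's `sqlite3` module. It runs on SQLite, so whether SQL Compact 4.0 accepts the same expression is an assumption that would need checking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (number INTEGER PRIMARY KEY);
    CREATE TABLE test (number INTEGER, specificColumn REAL);
    -- Hypothetical data: person 4 has no test row at all.
    INSERT INTO person VALUES (1), (2), (3), (4);
    INSERT INTO test VALUES (1, 3.0), (2, 2.5), (3, 4.5);
""")

participants, approved_count, approved_sum = conn.execute("""
    SELECT COUNT(a.number) AS Participant,
           -- CASE without ELSE returns NULL, which COUNT ignores.
           COUNT(CASE WHEN b.specificColumn >= 3 THEN 1 END) AS ApprovedByCount,
           SUM(CASE WHEN b.specificColumn >= 3 THEN 1 ELSE 0 END) AS ApprovedBySum
    FROM person AS a
    LEFT OUTER JOIN test AS b ON b.number = a.number
""").fetchone()

print(participants, approved_count, approved_sum)
```

Both forms give the same count here; the explicit `IS NULL` branch in the second answer is not needed, since a NULL value already fails the `>= 3` comparison.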

Joining to the same table in SQL - SQL Server 2008

I have the following table:
ID    Type     Description   IDOfSystem
----  -------  ------------  ----------
1000  Company  Company Item  NULL
1010  System   System Item   NULL
1020  Company  NULL          1010
I have System and Company items. I need to write a select query that gets all the company items and system items, with one exception: if a company item has a value in IDOfSystem, I need to exclude that system item and use the system item's description for the company item.
So, given the above table, the SQL select should return rows 1000 and 1020 (the latter with "System Item" as the description).
If 1020 didn't exist, I'd simply get 1000 and 1010.
I guess I can break this up into multiple queries and do a UNION. I tried to do a left outer join on the same table but couldn't get the description from the system row.
Any help?
SELECT ID, Type, Description
FROM MyTable AS A
WHERE IDOfSystem IS NULL AND NOT EXISTS (SELECT *
FROM MyTable AS B
WHERE B.IDOfSystem = A.ID)
UNION ALL
SELECT A.ID, A.Type, B.Description
FROM MyTable AS A INNER JOIN MyTable AS B ON A.IDOfSystem = B.ID
WHERE A.IDOfSystem IS NOT NULL
What I'm doing is first selecting all rows that don't have a referenced system and aren't used as some other row's system.
Then I'm doing a union with another query that finds all rows with a referenced system, joining in the system to grab its description.
SELECT
Companies.ID
,Companies.Type
,COALESCE(Systems.Description, Companies.Description) as Description
FROM YourTable Companies
LEFT OUTER JOIN YourTable Systems on Systems.ID = Companies.IDOfSystem
WHERE NOT EXISTS
(
SELECT * FROM YourTable T3 WHERE T3.IDOfSystem = Companies.ID
)
Here it is running on SEDE.
This approach uses a self join to look up the corresponding system's description. A separate subquery filters out the referenced systems.
select yt1.Id
, yt1.Type
, coalesce(yt2.Description, yt1.Description) as Description
from YourTable yt1
left join
YourTable yt2
on yt1.type = 'Company'
and yt2.type = 'System'
and yt2.ID = yt1.IDOfSystem
where yt1.type in ('System', 'Company')
and not exists
(
select *
from YourTable yt3
where yt1.type = 'System'
and yt3.type = 'Company'
and yt1.ID = yt3.IDOfSystem
)
Working example at SE Data.
I'm sure there are better ways of doing this, but try:
SELECT A.Id, A.Type, ISNULL(A.Description,B.Description) Description, A.IDOfsystem
FROM YourTable A
LEFT JOIN YourTable B
ON A.IDOFSystem = B.ID
WHERE A.ID NOT IN (SELECT IDOfsystem FROM YourTable WHERE IDOfsystem IS NOT NULL)
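To check the self-join idea against the sample rows from the question, here is a minimal sketch using Python's `sqlite3` module. It runs the COALESCE / NOT EXISTS query on SQLite instead of SQL Server 2008, which is an assumption made only so the example is easy to run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, Type TEXT,
                            Description TEXT, IDOfSystem INTEGER);
    -- The three rows from the question.
    INSERT INTO YourTable VALUES
        (1000, 'Company', 'Company Item', NULL),
        (1010, 'System',  'System Item',  NULL),
        (1020, 'Company', NULL,           1010);
""")

QUERY = """
    SELECT Companies.ID, Companies.Type,
           COALESCE(Systems.Description, Companies.Description) AS Description
    FROM YourTable Companies
    LEFT OUTER JOIN YourTable Systems ON Systems.ID = Companies.IDOfSystem
    -- Drop any row that some other row points to through IDOfSystem.
    WHERE NOT EXISTS (SELECT * FROM YourTable T3
                      WHERE T3.IDOfSystem = Companies.ID)
    ORDER BY Companies.ID
"""

with_reference = conn.execute(QUERY).fetchall()

# Remove the referencing company row and run the same query again.
conn.execute("DELETE FROM YourTable WHERE ID = 1020")
without_reference = conn.execute(QUERY).fetchall()

print(with_reference)
print(without_reference)
```

The first result matches the expected rows 1000 and 1020, and once row 1020 is gone the query falls back to rows 1000 and 1010.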