SQL: Removing an entire group in GROUP BY when any row matches a CASE condition

I am using PostgreSQL. I have two tables: users and usertasks.
users has the following fields: userid, username
usertasks has the following fields: taskid, taskdate, userid
userid and taskid are the primary keys of these tables.
I want to find all users who have not done any tasks between two specific dates.
My query is this:
Select u.userid
from users u
left join usertasks ut
on ut.userid=u.userid
group by u.userid
having count(case when ut.taskdate >= '2015-7-1' AND ut.taskdate <= '2015-10-6' then null else 1 end)>=1
Problem:
Some users have done tasks between those dates and also tasks outside them, so those users are returned as well. I don't want those users, because they have done tasks between those dates; I am interested only in users who have not done any tasks between the two dates.
It would work if the entire group could be skipped whenever any of the tasks in that group falls between those two dates.
So basically, I want a query equivalent to this:
(Select u.userid
from users u)
except
(Select u.userid
from users u
left join usertasks ut
on ut.userid=u.userid
where ut.taskdate >= '2015-7-1' AND ut.taskdate <= '2015-10-6'
group by u.userid)
The above query returns the right result, but it's slower, which is why I can't use EXCEPT or NOT IN.
SAMPLE DATA:
Create tables:
create table users( userid serial PRIMARY KEY, username VARCHAR(20) );
create table usertasks(taskid serial PRIMARY KEY, userid BIGINT, taskdate DATE );
Insert data:
INSERT INTO users (username) VALUES
('a'),
('b'),
('c'),
('d'),
('e'),
('f');
INSERT INTO usertasks (userid,taskdate) VALUES
(1,'2015-11-5'),
(1,'2015-10-1'),
(2,'2015-9-2'),
(3,'2015-9-2'),
(4,'2015-9-2'),
(5,'2015-9-2'),
(6,'2015-11-2');
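According to the requirement, userid 1 should not be returned, because one of its tasks (2015-10-1) falls between the two dates; with this data only userid 6 should be returned.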

I would be inclined to use NOT EXISTS:
select u.*
from users u
where not exists (select 1
from usertasks ut
where ut.userid = u.userid and
ut.taskdate >= '2015-07-01' and
ut.taskdate <= '2015-10-06'
);
This seems like a direct translation of your requirement.
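A minimal sketch of the NOT EXISTS query against the question's sample data, run through Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, which is an assumption: SQLite has no DATE type, so the dates are stored as zero-padded 'YYYY-MM-DD' text, which compares correctly as strings.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; dates are zero-padded
# 'YYYY-MM-DD' text so that string comparison orders them correctly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE usertasks (taskid INTEGER PRIMARY KEY, userid INTEGER, taskdate TEXT);
    INSERT INTO users (username) VALUES ('a'), ('b'), ('c'), ('d'), ('e'), ('f');
    INSERT INTO usertasks (userid, taskdate) VALUES
        (1, '2015-11-05'), (1, '2015-10-01'), (2, '2015-09-02'),
        (3, '2015-09-02'), (4, '2015-09-02'), (5, '2015-09-02'),
        (6, '2015-11-02');
""")

# NOT EXISTS: keep a user only if none of their tasks falls inside the range.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT u.userid
    FROM users u
    WHERE NOT EXISTS (SELECT 1
                      FROM usertasks ut
                      WHERE ut.userid = u.userid
                        AND ut.taskdate >= '2015-07-01'
                        AND ut.taskdate <= '2015-10-06')
    ORDER BY u.userid
""")]
print(not_exists_ids)
```

Only userid 6 comes back: users 1 to 5 each have at least one task inside the range.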
That said, this query:
Select u.userid
from users u left join
usertasks ut
on ut.userid = u.userid
group by u.userid
having sum(case when ut.taskdate >= '2015-7-1' AND ut.taskdate <= '2015-10-6' then 1 else 0 end) = 0;
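Should return the same result, as long as the CASE has else 0: without it, a user with no tasks in the range sums only NULLs, SUM() returns NULL rather than 0, and NULL = 0 is never true.
I think the problem with your query is the having clause:
having count(case when ut.taskdate >= '2015-7-1' AND ut.taskdate <= '2015-10-6' then null else 1 end) >= 1
This counts the tasks outside the date range, so it is true for any user with at least one such task, even if the same user also has tasks inside the range; that is why user 1 is returned. Because of the left join, a user with no tasks at all gets a single row whose ut fields are NULL; the comparison is then not true, so the else branch is used and that user counts as 1 (which happens to be the right outcome here). The SUM() version instead counts the tasks inside the range and keeps only users for whom that count is 0.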
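To see how the HAVING variants differ, here is a sketch using Python's sqlite3 as a stand-in for PostgreSQL, with the question's sample data plus one hypothetical user 'g' (userid 7) that has no tasks. It runs the question's COUNT condition, the SUM condition without an ELSE, and the SUM condition with ELSE 0.

```python
import sqlite3

# Question's sample data plus a hypothetical user 'g' (userid 7) with no
# tasks, to exercise the LEFT JOIN / NULL case. SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE usertasks (taskid INTEGER PRIMARY KEY, userid INTEGER, taskdate TEXT);
    INSERT INTO users (username) VALUES ('a'), ('b'), ('c'), ('d'), ('e'), ('f'), ('g');
    INSERT INTO usertasks (userid, taskdate) VALUES
        (1, '2015-11-05'), (1, '2015-10-01'), (2, '2015-09-02'),
        (3, '2015-09-02'), (4, '2015-09-02'), (5, '2015-09-02'),
        (6, '2015-11-02');
""")

IN_RANGE = "ut.taskdate >= '2015-07-01' AND ut.taskdate <= '2015-10-06'"

def ids_with_having(having):
    """Run the LEFT JOIN / GROUP BY query with the given HAVING condition."""
    sql = f"""
        SELECT u.userid
        FROM users u LEFT JOIN usertasks ut ON ut.userid = u.userid
        GROUP BY u.userid
        HAVING {having}
        ORDER BY u.userid
    """
    return [row[0] for row in conn.execute(sql)]

# Question's HAVING: true if the user has at least one task OUTSIDE the range.
original = ids_with_having(f"count(CASE WHEN {IN_RANGE} THEN NULL ELSE 1 END) >= 1")
# SUM without ELSE: users with no in-range task sum only NULLs, giving NULL.
sum_no_else = ids_with_having(f"sum(CASE WHEN {IN_RANGE} THEN 1 END) = 0")
# SUM with ELSE 0: counts tasks INSIDE the range, keeps users where it is 0.
sum_else_zero = ids_with_having(f"sum(CASE WHEN {IN_RANGE} THEN 1 ELSE 0 END) = 0")

print(original, sum_no_else, sum_else_zero)
```

The question's condition returns users 1, 6 and 7; SUM without ELSE returns nobody; SUM with ELSE 0 returns 6 and 7, the same users NOT EXISTS would return on this data.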

Related

Join 2 tables on foreign key while using count() in SQL

So I have two tables, user (id, name) and friend (user_id, friend_id).
I want to use SELECT to create one table with "name" from the USER table, "id" as the foreign key for the two tables, and the count of friend_id as the number of friends each user has.
Here is my code:
SELECT name, id, (SELECT count(friend_id) as number
FROM friend
GROUP BY user_id)
FROM user
ORDER BY number DESC
I'm wondering what's the problem with these lines. Thank you!
You can use a subquery to calculate the count.
SELECT name, id, COALESCE(f.Count, 0) AS friend_count
FROM user u
LEFT JOIN (
SELECT user_id, COUNT(DISTINCT friend_id) AS Count
FROM friend
GROUP BY user_id
) f ON f.user_id = u.id
ORDER BY friend_count DESC
I used a LEFT JOIN so that if a user doesn't have a row in friend, the query still returns a row for that user with a friend count of 0 (thanks to COALESCE). I also added DISTINCT so that a friend listed more than once is counted only once; it might not be necessary, especially if you have a UNIQUE index on (user_id, friend_id).
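A small check of the derived-table answer, with Python's sqlite3 as a stand-in database and made-up rows (the names alice, bob and carol are hypothetical). carol has no friends, and alice's friendship with user 3 is stored twice.

```python
import sqlite3

# Hypothetical data: carol has no rows in friend; (1, 3) is duplicated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friend (user_id INTEGER, friend_id INTEGER);
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO friend VALUES (1, 2), (1, 3), (1, 3), (2, 1);
""")

# Aggregate friend first, then LEFT JOIN so users without friends survive.
rows = conn.execute("""
    SELECT name, id, COALESCE(f.Count, 0) AS friend_count
    FROM user u
    LEFT JOIN (
        SELECT user_id, COUNT(DISTINCT friend_id) AS Count
        FROM friend
        GROUP BY user_id
    ) f ON f.user_id = u.id
    ORDER BY friend_count DESC
""").fetchall()
print(rows)
```

alice gets 2 (the duplicate is counted once), bob 1, and carol 0 rather than disappearing or showing NULL.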
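Just add a WHERE clause so the subquery counts only the current user's friends, and remove the GROUP BY: the subquery then covers a single user_id (one user with one or more friends, as your diagram shows). Also put the alias on the subquery itself, so that ORDER BY can see it:
SELECT name, id, (SELECT count(friend_id)
FROM friend
WHERE friend.user_id = user.id) AS number
FROM user
ORDER BY number DESC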
I think this will work for your purpose:
CREATE TABLE #user(
id VARCHAR(22),
[name] VARCHAR(255)
)
CREATE TABLE #friend(
user_id VARCHAR(22),
friend_id VARCHAR(22)
)
--No GROUP BY in the correlated subquery, so a user without friends gets 0 instead of NULL
SELECT name, id, (SELECT COUNT(f.friend_id)
FROM #friend f
WHERE f.user_id = u.id) as number
FROM #user u
ORDER BY number DESC
--Same query with join:
SELECT u.[name], u.id, COUNT(f.friend_id) number
FROM #user u
LEFT JOIN #friend f ON f.user_id = u.id
GROUP BY u.[name], u.id
ORDER BY number DESC
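Whether a correlated scalar subquery has a GROUP BY decides between 0 and NULL for a user with no friends. A sketch with Python's sqlite3 (standing in for SQL Server) and hypothetical rows, comparing the subquery with and without GROUP BY:

```python
import sqlite3

# Hypothetical data; carol has no rows in friend.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friend (user_id INTEGER, friend_id INTEGER);
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO friend VALUES (1, 2), (1, 3), (2, 1);
""")

def counts(subquery):
    """Map each user's name to the value the correlated subquery returns."""
    sql = f"SELECT u.name, ({subquery}) FROM user u ORDER BY u.id"
    return dict(conn.execute(sql).fetchall())

# With GROUP BY, a user without friends has no group, so the subquery returns
# no row and the outer query sees NULL; the inner COALESCE never runs for it.
with_group_by = counts(
    "SELECT COALESCE(COUNT(f.friend_id), 0) FROM friend f "
    "WHERE f.user_id = u.id GROUP BY f.user_id")
# Without GROUP BY, COUNT over zero rows still yields one row containing 0.
without_group_by = counts(
    "SELECT COUNT(f.friend_id) FROM friend f WHERE f.user_id = u.id")
print(with_group_by, without_group_by)
```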

SQL Oracle - Multiple count queries on multiple tables

Maybe for some people it looks very simple, but I just can't get it.
My tables are:
CREATE TABLE USERS (user_ID number PRIMARY KEY, username varchar2(32), password varchar2(32));
CREATE TABLE VIDEOS (video_ID number PRIMARY KEY, title varchar(64), description varchar(128));
CREATE TABLE VIEWS (view_ID number PRIMARY KEY, user_ID number, video_ID number);
CREATE TABLE FAVORITES (fav_ID number PRIMARY KEY, user_ID number, video_ID number);
I've created these separate queries:
SELECT u.username AS "Username", count(*) AS "Views"
FROM Views v, Videos vd, Users u
WHERE v.user_id = u.user_id
AND v.video_id = vd.video_id
GROUP BY u.username
SELECT u.username AS "Username", count(*) AS "Favorites"
FROM Favorites f, Videos vd, Users u
WHERE f.user_id = u.user_id
AND f.video_id = vd.video_id
GROUP BY u.username
I want a single query that shows something like this:
Username Views Favorites
-------------------------------
Person1 12 1
Person2 234 21
...
I googled a bunch of similar questions but couldn't get any of them to work.
So any help is greatly appreciated.
You are on the right track: you have two queries and want to see their results together. You could combine them with a full outer join to get the results you are looking for:
with fav_counts
as (
SELECT u.username
, count(*) AS favorites
FROM Favorites f
JOIN Videos vd
ON f.video_id = vd.video_id
JOIN Users u
ON f.user_id = u.user_id
GROUP BY u.username
)
,view_counts
as (SELECT u.username
, count(*) AS views
FROM Views v
JOIN Videos vd
ON v.video_id = vd.video_id
JOIN Users u
ON v.user_id = u.user_id
GROUP BY u.username
)
-- unquoted aliases so f.username etc. resolve; COALESCE instead of ISNULL (not an Oracle function)
select coalesce(f.username, v.username) as username
,f.favorites
,v.views
from fav_counts f
full outer join view_counts v
on f.username = v.username
Since you know your data better, you could optimize the query further. For example, it could be a rule that a user who has set a favourite has also viewed the video. If that is true, you can write a better query that computes both counts in a single block, instead of two blocks combined with a full outer join.
Aggregate separately in Views:
select user_id, count(*) counter
from Views
group by user_id
and Favorites
select user_id, count(*) counter
from Favorites
group by user_id
and finally LEFT join Users to the above queries:
select u.username,
coalesce(v.counter, 0) Views,
coalesce(f.counter, 0) Favorites
from users u
left join (
select user_id, count(*) counter
from Views
group by user_id
) v on v.user_id = u.user_id
left join (
select user_id, count(*) counter
from Favorites
group by user_id
) f on f.user_id = u.user_id
I used LEFT joins because there may be users who have not viewed any video or have no favorites. In either case, COALESCE() returns 0 instead of null.
The table Videos is not needed.
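A sketch of why the answer aggregates Views and Favorites separately before joining, using Python's sqlite3 in place of Oracle and hypothetical rows. The joined-first query is included only for comparison; it does not come from the answers.

```python
import sqlite3

# Hypothetical data: ann has 3 views and 2 favorites, ben 1 view and no
# favorites, cat no views and 1 favorite. Videos is left out, as suggested.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE views (view_id INTEGER PRIMARY KEY, user_id INTEGER, video_id INTEGER);
    CREATE TABLE favorites (fav_id INTEGER PRIMARY KEY, user_id INTEGER, video_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
    INSERT INTO views (user_id, video_id) VALUES (1, 10), (1, 11), (1, 12), (2, 10);
    INSERT INTO favorites (user_id, video_id) VALUES (1, 10), (1, 11), (3, 12);
""")

# Aggregate each table separately, then LEFT JOIN the totals to users.
separate = conn.execute("""
    SELECT u.username,
           COALESCE(v.counter, 0) AS views,
           COALESCE(f.counter, 0) AS favorites
    FROM users u
    LEFT JOIN (SELECT user_id, COUNT(*) counter FROM views GROUP BY user_id) v
           ON v.user_id = u.user_id
    LEFT JOIN (SELECT user_id, COUNT(*) counter FROM favorites GROUP BY user_id) f
           ON f.user_id = u.user_id
    ORDER BY u.username
""").fetchall()

# For comparison: joining both detail tables first multiplies the rows,
# so ann's 3 views x 2 favorites produce 6 rows and both counts become 6.
joined_first = conn.execute("""
    SELECT u.username, COUNT(v.view_id), COUNT(f.fav_id)
    FROM users u
    LEFT JOIN views v ON v.user_id = u.user_id
    LEFT JOIN favorites f ON f.user_id = u.user_id
    GROUP BY u.username
    ORDER BY u.username
""").fetchall()
print(separate, joined_first)
```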

How to replace LEFT outer join with INNER join in SQL?

I have a view on which I need to create a clustered index. The problem is that, to get a clustered index, the view must not contain any LEFT or RIGHT outer joins, so I want to replace the LEFT OUTER JOIN with an INNER JOIN. One way I can think of is to insert a dummy row with, say, -1 in the right table. Then, even though not all Ids from the left table match Ids in the right table, the INNER JOIN should still return every row from the left table, because we have inserted -1 in the right table and we are using IsNULL(u.UserId,-1). But somehow this approach is not working.
create table Users(
UserId int,
UserName nvarchar(255)
)
insert into Users values(1,'sid429')
insert into Users values(2,'ru654')
insert into Users values(3,'dick231')
create table managers
(
caseId int,
CaseName nvarchar(255),
UserId int
)
insert into managers values (100,'Case1',1)
insert into managers values (101,'Case2',2)
insert into managers values (-1,NULL,-1)
select username from users u inner join managers m on m.UserId=IsNULL(u.UserId,-1)
I can't speak to the indexing part, but I think you could replace the LEFT JOIN with an INNER JOIN plus UNION ALL:
select username from users u inner join managers m on m.UserId= u.UserId
UNION ALL
select username from users u WHERE NOT EXISTS (SELECT 1 FROM managers m WHERE m.UserId = u.UserId)
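A sketch that checks the INNER JOIN + UNION ALL rewrite against a plain LEFT JOIN on the question's data, with SQLite (via Python's sqlite3) standing in for SQL Server. IFNULL replaces T-SQL's ISNULL; that substitution is an assumption of the sketch. It also reruns the question's dummy-row attempt.

```python
import sqlite3

# The question's data, including the dummy -1 row in managers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (UserId INTEGER, UserName TEXT);
    CREATE TABLE managers (caseId INTEGER, CaseName TEXT, UserId INTEGER);
    INSERT INTO users VALUES (1, 'sid429'), (2, 'ru654'), (3, 'dick231');
    INSERT INTO managers VALUES (100, 'Case1', 1), (101, 'Case2', 2), (-1, NULL, -1);
""")

def names(sql):
    """Return the sorted first column of the query result."""
    return sorted(row[0] for row in conn.execute(sql))

left_join = names("""
    SELECT username FROM users u LEFT JOIN managers m ON m.UserId = u.UserId""")

# INNER JOIN for the matching users, UNION ALL the users with no match.
inner_union = names("""
    SELECT username FROM users u INNER JOIN managers m ON m.UserId = u.UserId
    UNION ALL
    SELECT username FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM managers m WHERE m.UserId = u.UserId)""")

# The question's attempt: u.UserId is never NULL, so IFNULL changes nothing
# and dick231 (UserId 3) still finds no matching row.
dummy_row = names("""
    SELECT username FROM users u
    INNER JOIN managers m ON m.UserId = IFNULL(u.UserId, -1)""")
print(left_join, inner_union, dummy_row)
```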
IsNull(u.UserId,-1) doesn't seem right: u.UserId is never null, because the missing data is in the managers table. In this case u.UserId always has a value, but m.UserId might not, so IsNull(u.UserId, -1) won't work.
I'm intrigued to see a better answer, but I don't think you can do that. I think you eventually need to substitute -1 conditionally when the value doesn't exist in the other table, like this:
select username from users u
inner join managers m on m.UserId =
case when not exists(select * from managers where UserId = u.UserId)
then -1 else u.UserId end
This has the desired effect, but looking at the execution plan, won't help your performance issue.
You can replace a LEFT OUTER JOIN with an INNER JOIN if you add the missing values in the related table.
It has not worked for you because you have added a -1 value, but the non-matching value in your INNER JOIN is 3 (the UserId of dick231), not NULL or -1.
You can do this at runtime with a UNION; there is no need to permanently create those values as you tried to do (by inserting that -1 row):
with expanded_managers as (
select CaseId, CaseName, UserId
from managers
union
select null, null, UserId
from users
where not exists (select * from managers where managers.UserId = users.UserId)
)
select UserName, CaseName
from users
inner join expanded_managers on expanded_managers.UserId = users.UserId
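A sketch of the runtime-expansion idea with Python's sqlite3 standing in for SQL Server, checking that the INNER JOIN against expanded_managers returns the same (UserName, CaseName) rows as a LEFT JOIN on the question's users and cases:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (UserId INTEGER, UserName TEXT);
    CREATE TABLE managers (CaseId INTEGER, CaseName TEXT, UserId INTEGER);
    INSERT INTO users VALUES (1, 'sid429'), (2, 'ru654'), (3, 'dick231');
    INSERT INTO managers VALUES (100, 'Case1', 1), (101, 'Case2', 2);
""")

# Add a (NULL, NULL, UserId) row at query time for every unmatched user,
# so that a plain INNER JOIN behaves like the LEFT JOIN below.
expanded = conn.execute("""
    WITH expanded_managers AS (
        SELECT CaseId, CaseName, UserId FROM managers
        UNION
        SELECT NULL, NULL, UserId FROM users
        WHERE NOT EXISTS (SELECT * FROM managers
                          WHERE managers.UserId = users.UserId)
    )
    SELECT UserName, CaseName
    FROM users
    INNER JOIN expanded_managers ON expanded_managers.UserId = users.UserId
    ORDER BY UserName
""").fetchall()

left_join = conn.execute("""
    SELECT UserName, CaseName
    FROM users LEFT JOIN managers ON managers.UserId = users.UserId
    ORDER BY UserName
""").fetchall()
print(expanded)
```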
If you only need username, it can be simpler (this relies on the dummy -1 row in managers):
select distinct username from users u inner join managers m on m.UserId = u.UserId OR m.UserId = -1
I have cleaned up this part a bit. I had to guess the logical model, since you did not specify any constraints.
create table Users (
UserId int not null
, UserName nvarchar(255) not null
, constraint pk_users primary key (UserId)
, constraint ak_users unique (UserName)
);
create table Cases (
CaseId int not null
, CaseName nvarchar(255) not null
, UserId int not null
, constraint pk_cases primary key (CaseId)
, constraint ak_cases unique (CaseName)
, constraint fk_cases foreign key (UserId)
references Users (UserId)
);
insert into Users values(1,'sid429') ;
insert into Users values(2,'ru654') ;
insert into Users values(3,'dick231');
insert into Cases values (100,'Case1',1);
insert into Cases values (101,'Case2',2);
This is mostly self-explanatory, but you have to understand that candidate keys (unique) for the result are: {UserID, CaseId}, {UserName, CaseName}, {UserID, CaseName}, {UserName, CaseId}. Not sure if you were expecting that.
with
R_00 as (
select UserId from Users
except
select UserId from Cases
)
select u.UserId
, u.UserName
, c.CaseId
, c.CaseName
from Users as u
join Cases as c on u.UserId = c.UserId
union
select u.UserId
, u.UserName
, (-1) as CaseId
, 'n/a' as CaseName
from Users as u
join R_00 as r on r.UserId = u.UserID
;
Another version of this, similar to other examples in the post.
select u.UserId
, u.UserName
, c.CaseId
, c.CaseName
from Users as u
join Cases as c on u.UserId = c.UserId
union
select u.UserId
, u.UserName
, (-1) as CaseId
, 'n/a' as CaseName
from Users as u
where not exists (select 1 from Cases as c where c.UserId = u.userId)
;
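A sketch, using Python's sqlite3 in place of the original database, that runs both cleaned-up versions (the EXCEPT-based R_00 CTE and the NOT EXISTS variant) on the sample rows and compares their results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (
        UserId INTEGER NOT NULL PRIMARY KEY,
        UserName TEXT NOT NULL UNIQUE);
    CREATE TABLE Cases (
        CaseId INTEGER NOT NULL PRIMARY KEY,
        CaseName TEXT NOT NULL UNIQUE,
        UserId INTEGER NOT NULL REFERENCES Users (UserId));
    INSERT INTO Users VALUES (1, 'sid429'), (2, 'ru654'), (3, 'dick231');
    INSERT INTO Cases VALUES (100, 'Case1', 1), (101, 'Case2', 2);
""")

# Users that do have cases (shared by both versions).
MATCHED = """
    SELECT u.UserId, u.UserName, c.CaseId, c.CaseName
    FROM Users AS u JOIN Cases AS c ON u.UserId = c.UserId
"""

# Version 1: users without cases found with EXCEPT (the R_00 CTE).
with_except = conn.execute(f"""
    WITH R_00 AS (SELECT UserId FROM Users EXCEPT SELECT UserId FROM Cases)
    {MATCHED}
    UNION
    SELECT u.UserId, u.UserName, -1 AS CaseId, 'n/a' AS CaseName
    FROM Users AS u JOIN R_00 AS r ON r.UserId = u.UserId
    ORDER BY 1
""").fetchall()

# Version 2: the same rows found with NOT EXISTS.
with_not_exists = conn.execute(f"""
    {MATCHED}
    UNION
    SELECT u.UserId, u.UserName, -1 AS CaseId, 'n/a' AS CaseName
    FROM Users AS u
    WHERE NOT EXISTS (SELECT 1 FROM Cases AS c WHERE c.UserId = u.UserId)
    ORDER BY 1
""").fetchall()
print(with_except)
```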

SQL find records with duplicate email and date of birth

I'm trying to write a query to find any duplicate records in my database. I want to find all records (not the count) where the EmailAddress AND DateofBirth (both columns) already exist on another record.
Account tbl contains the EmailAddress.
User tbl contains the DateOfBirth
Join on AccountID
The following query selects records where the EmailAddress exists in another record OR the DateOfBirth exists in another record, but I'm unable to combine the two conditions. If I'm correct so far, the 'and' on line 7 acts more like an 'or' in my case..?
select a.AccountName, a.EmailAddress, u.DateOfBirth from Account as a
join [User] as u
on a.AccountID = u.AccountID
where a.EmailAddress in (
select EmailAddress from Account group by EmailAddress having count(*) > 1
)
and
DateOfBirth in(
select DateOfBirth from [User] group by DateOfBirth having count(*) > 1
)
order by u.DateOfBirth, a.EmailAddress
For example, this may produce 50 records. If I look through them, I find 5 records all with the matching EmailAddress, however only 4 of them have the same DateOfBirth. The 5th record is displaying due to another record in the database with the same DateOfBirth but different EmailAddress.
I'd like to find only those records who have both the matching email and dob.
Thanks as always, please ask if you require a further description.
Regards
Json
Using your approach, you can use exists:
select a.AccountName, a.EmailAddress, u.DateOfBirth
from Account as a join
[User] as u
on a.AccountID = u.AccountID
where exists (select EmailAddress
from Account a2 join
[User] u2
on a2.AccountID = u2.AccountID
where a2.EmailAddress = a.EmailAddress and
u2.DateOfBirth = u.DateOfBirth
group by EmailAddress
having count(*) > 1
)
order by u.DateOfBirth, a.EmailAddress;
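A sketch reproducing the problem and the correlated EXISTS fix with Python's sqlite3 standing in for SQL Server (SQLite accepts the [User] bracket quoting for compatibility). The five accounts are hypothetical: alpha and beta share both email and date of birth, while gamma shares only the email with them and only the date of birth with delta.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (AccountID INTEGER PRIMARY KEY, AccountName TEXT, EmailAddress TEXT);
    CREATE TABLE [User] (AccountID INTEGER, DateOfBirth TEXT);
    INSERT INTO Account VALUES
        (1, 'alpha', 'a@example.com'), (2, 'beta', 'a@example.com'),
        (3, 'gamma', 'a@example.com'), (4, 'delta', 'd@example.com'),
        (5, 'epsilon', 'e@example.com');
    INSERT INTO [User] VALUES
        (1, '1990-01-01'), (2, '1990-01-01'), (3, '1985-05-05'),
        (4, '1985-05-05'), (5, '1970-01-01');
""")

def names(sql):
    """Return the first column of the query result as a list."""
    return [row[0] for row in conn.execute(sql)]

# The question's query: the two IN tests are checked independently.
two_in_tests = names("""
    SELECT a.AccountName FROM Account AS a
    JOIN [User] AS u ON a.AccountID = u.AccountID
    WHERE a.EmailAddress IN (SELECT EmailAddress FROM Account
                             GROUP BY EmailAddress HAVING count(*) > 1)
      AND u.DateOfBirth IN (SELECT DateOfBirth FROM [User]
                            GROUP BY DateOfBirth HAVING count(*) > 1)
    ORDER BY a.AccountName""")

# Correlated EXISTS: the (email, DOB) pair itself must occur more than once.
pair_exists = names("""
    SELECT a.AccountName FROM Account AS a
    JOIN [User] AS u ON a.AccountID = u.AccountID
    WHERE EXISTS (SELECT a2.EmailAddress
                  FROM Account AS a2 JOIN [User] AS u2
                    ON a2.AccountID = u2.AccountID
                  WHERE a2.EmailAddress = a.EmailAddress
                    AND u2.DateOfBirth = u.DateOfBirth
                  GROUP BY a2.EmailAddress
                  HAVING count(*) > 1)
    ORDER BY a.AccountName""")
print(two_in_tests, pair_exists)
```

The question's query also returns gamma, the kind of "5th record" described in the question; the EXISTS version returns only alpha and beta.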
A better way is to use window/analytic functions:
select AccountName, EmailAddress, DateOfBirth
from (select a.AccountName, a.EmailAddress, u.DateOfBirth,
count(*) over (partition by a.EmailAddress, u.DateOfBirth) as cnt
from Account as a join
[User] as u
on a.AccountID = u.AccountID
) ua
where cnt > 1
order by DateOfBirth, EmailAddress;
Join the two tables on the account id.
Group by email and date
Show only those entries which have count(*) > 1 (using the HAVING expression).
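In MySQL (I have no MS SQL Server available at the moment), this can be done with:
SELECT email, birth, count(*) AS cnt FROM a JOIN b ON a.account = b.account
GROUP BY email, birth
HAVING count(*) > 1;
This returns one row per duplicated (email, birth) pair; join it back to a and b if you need the individual records. (Selecting * here would be rejected when ONLY_FULL_GROUP_BY is enabled, which is the default in MySQL 5.7 and later.)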
I used the following commands to set up the tables a and b:
create table a (
account int primary key auto_increment,
email text
);
create table b (
account int,
birth date,
constraint foreign key (account) references a (account)
);
insert into a (email) values ("email1"), ("email1"), ("email2"), ("email2");
insert into b values (1, "2000-01-01"), (2, "2000-01-01"), (3, "2000-01-01"), (4, "2000-01-02");
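The three steps above, run in SQLite through Python's sqlite3 on the answer's own sample rows, ported from MySQL (AUTOINCREMENT instead of auto_increment, single-quoted strings). The final join-back query, which lists the individual duplicate accounts, is an addition for illustration.

```python
import sqlite3

# The answer's tables a and b, ported to SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (account INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT);
    CREATE TABLE b (account INTEGER, birth TEXT,
                    FOREIGN KEY (account) REFERENCES a (account));
    INSERT INTO a (email) VALUES ('email1'), ('email1'), ('email2'), ('email2');
    INSERT INTO b VALUES (1, '2000-01-01'), (2, '2000-01-01'),
                         (3, '2000-01-01'), (4, '2000-01-02');
""")

# Join, group by (email, birth), keep groups with count(*) > 1.
groups = conn.execute("""
    SELECT email, birth, count(*) FROM a JOIN b ON a.account = b.account
    GROUP BY email, birth
    HAVING count(*) > 1
""").fetchall()

# Joining the duplicate groups back lists the individual accounts.
accounts = [row[0] for row in conn.execute("""
    SELECT a.account
    FROM a JOIN b ON a.account = b.account
    JOIN (SELECT email, birth FROM a JOIN b ON a.account = b.account
          GROUP BY email, birth HAVING count(*) > 1) d
      ON d.email = a.email AND d.birth = b.birth
    ORDER BY a.account
""")]
print(groups, accounts)
```

Only (email1, 2000-01-01) occurs twice, so accounts 1 and 2 are the duplicates; account 3 shares the date but not the email.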

query with count subquery, inner join and group

I'm definitely a noob with SQL, and I've been busting my head trying to write a complex query with the following table structure in PostgreSQL:
CREATE TABLE reports
(
reportid character varying(20) NOT NULL,
userid integer NOT NULL,
reporttype character varying(40) NOT NULL
);
CREATE TABLE users
(
userid serial NOT NULL,
username character varying(20) NOT NULL
);
The objective of the query is to fetch the amount of report types per user and display it in one column. There are three different types of reports.
A simple query with group-by will solve the problem but display it in different rows:
select count(*) as Amount,
u.username,
r.reporttype
from reports r,
users u
where r.userid=u.userid
group by u.username,r.reporttype
order by u.username
SELECT
username,
(
SELECT
COUNT(*)
FROM reports
WHERE users.userid = reports.userid AND reports.reporttype = 'Type1'
) As Type1,
(
SELECT
COUNT(*)
FROM reports
WHERE users.userid = reports.userid AND reports.reporttype = 'Type2'
) As Type2,
(
SELECT
COUNT(*)
FROM reports
WHERE users.userid = reports.userid AND reports.reporttype = 'Type3'
) As Type3
FROM
users
WHERE
EXISTS(
SELECT
NULL
FROM
reports
WHERE
users.userid = reports.userid
)
SELECT
u.username,
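COUNT(CASE r.reporttype WHEN '1' THEN 1 END) AS type1Qty,
COUNT(CASE r.reporttype WHEN '2' THEN 1 END) AS type2Qty,
COUNT(CASE r.reporttype WHEN '3' THEN 1 END) AS type3Qty
FROM reports r
INNER JOIN users u ON r.userid = u.userid
GROUP BY u.username
If your server's SQL dialect requires the ELSE branch to be present in CASE expressions, add ELSE NULL before every END. Because reporttype is a character column, the literals must be quoted; '1', '2' and '3' are placeholders for your three report types, so replace them with the values actually stored in reporttype.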
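A sketch of the conditional-aggregation answer with Python's sqlite3 standing in for PostgreSQL; the rows and the report type values 'Type1' to 'Type3' are hypothetical.

```python
import sqlite3

# Hypothetical data: ann has two Type1 reports and one Type3, ben one Type2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE reports (reportid TEXT NOT NULL, userid INTEGER NOT NULL,
                          reporttype TEXT NOT NULL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO reports VALUES ('r1', 1, 'Type1'), ('r2', 1, 'Type1'),
                               ('r3', 1, 'Type3'), ('r4', 2, 'Type2');
""")

# COUNT ignores NULLs, so each CASE counts only the rows of one type.
per_type = conn.execute("""
    SELECT u.username,
           COUNT(CASE r.reporttype WHEN 'Type1' THEN 1 END) AS type1Qty,
           COUNT(CASE r.reporttype WHEN 'Type2' THEN 1 END) AS type2Qty,
           COUNT(CASE r.reporttype WHEN 'Type3' THEN 1 END) AS type3Qty
    FROM reports r
    INNER JOIN users u ON r.userid = u.userid
    GROUP BY u.username
    ORDER BY u.username
""").fetchall()
print(per_type)
```

Each user ends up on a single row with one column per report type, instead of one row per (user, type) pair.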
If you're looking for the "amount of report types per user", you'll expect to see a number against each user: 1, 2 or 3, given that there are three different types of reports. You won't expect to see the reporttype itself (it is only counted, not displayed), so you don't need reporttype in either the SELECT or the GROUP BY part of the query.
Instead, use COUNT(DISTINCT r.reporttype) to count the number of different reporttypes that are used by each user.
SELECT
COUNT(DISTINCT r.reporttype) as Amount
,u.username
FROM users u
INNER JOIN reports r
ON r.userid=u.userid
GROUP BY
u.username
ORDER BY u.username
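The COUNT(DISTINCT ...) version on the same kind of hypothetical data, again with Python's sqlite3 standing in for PostgreSQL:

```python
import sqlite3

# Hypothetical data: ann uses Type1 (twice) and Type3, ben only Type2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE reports (reportid TEXT NOT NULL, userid INTEGER NOT NULL,
                          reporttype TEXT NOT NULL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO reports VALUES ('r1', 1, 'Type1'), ('r2', 1, 'Type1'),
                               ('r3', 1, 'Type3'), ('r4', 2, 'Type2');
""")

# COUNT(DISTINCT ...) counts how many different report types each user has.
amounts = conn.execute("""
    SELECT COUNT(DISTINCT r.reporttype) AS Amount, u.username
    FROM users u
    INNER JOIN reports r ON r.userid = u.userid
    GROUP BY u.username
    ORDER BY u.username
""").fetchall()
print(amounts)
```

ann's two Type1 reports count as one type, so she gets 2 and ben gets 1.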