query with count subquery, inner join and group - sql

I'm definitely a noob with SQL, and I've been racking my brain trying to write a complex query against the following table structure in PostgreSQL:
CREATE TABLE reports
(
reportid character varying(20) NOT NULL,
userid integer NOT NULL,
reporttype character varying(40) NOT NULL
);
CREATE TABLE users
(
userid serial NOT NULL,
username character varying(20) NOT NULL
);
The objective of the query is to fetch the amount of report types per user and display it in one column. There are three different types of reports.
A simple query with group-by will solve the problem but display it in different rows:
select count(*) as Amount,
u.username,
r.reporttype
from reports r,
users u
where r.userid=u.userid
group by u.username,r.reporttype
order by u.username

SELECT
username,
(
SELECT
COUNT(*)
FROM reports
WHERE users.userid = reports.userid AND reports.reporttype = 'Type1'
) As Type1,
(
SELECT
COUNT(*)
FROM reports
WHERE users.userid = reports.userid AND reports.reporttype = 'Type2'
) As Type2,
(
SELECT
COUNT(*)
FROM reports
WHERE users.userid = reports.userid AND reports.reporttype = 'Type3'
) As Type3
FROM
users
WHERE
EXISTS(
SELECT
NULL
FROM
reports
WHERE
users.userid = reports.userid
)

SELECT
u.username,
COUNT(CASE r.reporttype WHEN 'Type1' THEN 1 END) AS type1Qty,
COUNT(CASE r.reporttype WHEN 'Type2' THEN 1 END) AS type2Qty,
COUNT(CASE r.reporttype WHEN 'Type3' THEN 1 END) AS type3Qty
FROM reports r
INNER JOIN users u ON r.userid = u.userid
GROUP BY u.username
If your server's SQL dialect requires the ELSE branch to be present in CASE expressions, add ELSE NULL before every END.
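As a rough check of the conditional-count idea above, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. The sample rows, the usernames, and the 'Type1'/'Type2'/'Type3' labels are assumptions made for illustration; the question does not give the actual report type values.

```python
import sqlite3

# In-memory database with made-up sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE reports (
    reportid   TEXT NOT NULL,
    userid     INTEGER NOT NULL,
    reporttype TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO reports VALUES
    ('r1', 1, 'Type1'), ('r2', 1, 'Type1'), ('r3', 1, 'Type3'),
    ('r4', 2, 'Type2');
""")

# CASE yields 1 for a matching row and NULL otherwise;
# COUNT ignores NULLs, so each column counts one report type.
report_counts = conn.execute("""
SELECT u.username,
       COUNT(CASE r.reporttype WHEN 'Type1' THEN 1 END) AS type1Qty,
       COUNT(CASE r.reporttype WHEN 'Type2' THEN 1 END) AS type2Qty,
       COUNT(CASE r.reporttype WHEN 'Type3' THEN 1 END) AS type3Qty
FROM reports r
INNER JOIN users u ON r.userid = u.userid
GROUP BY u.username
ORDER BY u.username
""").fetchall()

print(report_counts)
```

Each user comes back as a single row with one count per type, which is the one-row-per-user layout the question asks for.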

If you're looking for the "amount of report types per user", you'll be expecting to see a number, either 1, 2 or 3 (given that there are three different types of reports), against each user. You won't be expecting the reporttype itself (it is only counted, not displayed), so you don't need reporttype in either the SELECT or the GROUP BY part of the query.
Instead, use COUNT(DISTINCT r.reporttype) to count the number of different reporttypes that are used by each user.
SELECT
COUNT(DISTINCT r.reporttype) as Amount
,u.username
FROM users u
INNER JOIN reports r
ON r.userid=u.userid
GROUP BY
u.username
ORDER BY u.username
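To show how this reading differs from counting reports per type, here is a small sketch of the COUNT(DISTINCT ...) query run against SQLite from Python. The rows and names are invented for illustration only.

```python
import sqlite3

# In-memory database with made-up sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE reports (
    reportid   TEXT NOT NULL,
    userid     INTEGER NOT NULL,
    reporttype TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO reports VALUES
    ('r1', 1, 'Type1'), ('r2', 1, 'Type1'), ('r3', 1, 'Type3'),
    ('r4', 2, 'Type2');
""")

# COUNT(DISTINCT ...) counts how many different report types each user has,
# regardless of how many reports of each type exist.
distinct_types = conn.execute("""
SELECT COUNT(DISTINCT r.reporttype) AS Amount,
       u.username
FROM users u
INNER JOIN reports r ON r.userid = u.userid
GROUP BY u.username
ORDER BY u.username
""").fetchall()

print(distinct_types)
```

Here alice has three reports but only two distinct types, so her Amount is 2 rather than 3.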

Related

How to sum up max values from another table with some filtering

I have 3 tables
User Table
id | Name
1 | Mike
2 | Sam
Score Table
id | UserId | CourseId | Score
1 | 1 | 1 | 5
2 | 1 | 1 | 10
3 | 1 | 2 | 5
Course Table
id | Name
1 | Course 1
2 | Course 2
What I'm trying to return is rows for each user to display user id and user name along with the sum of the maximum score per course for that user
In the example tables the output I'd like to see is
Result
User_Id | User_Name | Total_Score
1 | Mike | 15
2 | Sam | 0
The SQL I've tried so far is:
select TOP(3) u.Id as User_Id, u.UserName as User_Name, SUM(maxScores) as Total_Score
from Users as u,
(select MAX(s.Score) as maxScores
from Scores as s
inner join Courses as c
on s.CourseId = c.Id
group by s.UserId, c.Id
) x
group by u.Id, u.UserName
I want to use a HAVING clause to link Users to Scores after the GROUP BY in the subquery, but I get an exception saying:
The multi-part identifier "u.Id" could not be bound
It works if I hard-code a user id in the HAVING clause I want to add, but it needs to be dynamic and I'm stuck on how to do this.
What would be the correct way to structure the query?
You were close: you just needed to return s.UserId from the subquery and join the subquery to your Users table correctly. (I've joined in the reverse order to yours because to me it's more logical to start with the base data and then join on more details as required.) Take note of the scope of aliases: aliases defined inside your subquery are not available in the outer query.
select u.Id as [User_Id], u.UserName as [User_Name]
, sum(maxScore) as Total_Score
from (
select s.UserId, max(s.Score) as maxScore
from Scores as s
inner join Courses as c on s.CourseId = c.Id
group by s.UserId, c.Id
) as x
inner join Users as u on u.Id = x.UserId
group by u.Id, u.UserName;
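One thing to watch: with an inner join, a user with no scores (Sam in the example) drops out of the result, while the desired output lists Sam with 0. A possible variant, sketched here with SQLite from Python, starts from Users, uses a LEFT JOIN, and wraps the sum in COALESCE. The column name UserName follows the queries above (the table listing calls it Name), and the join to Courses is left out on the assumption that every Scores.CourseId refers to an existing course.

```python
import sqlite3

# In-memory database loaded with the example rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users  (Id INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE Scores (Id INTEGER PRIMARY KEY, UserId INTEGER,
                     CourseId INTEGER, Score INTEGER);
INSERT INTO Users  VALUES (1, 'Mike'), (2, 'Sam');
INSERT INTO Scores VALUES (1, 1, 1, 5), (2, 1, 1, 10), (3, 1, 2, 5);
""")

# Inner query: best score per user and course.
# Outer query: sum those maxima per user; LEFT JOIN keeps users without
# scores, and COALESCE turns their NULL sum into 0.
totals = conn.execute("""
SELECT u.Id AS User_Id,
       u.UserName AS User_Name,
       COALESCE(SUM(x.maxScore), 0) AS Total_Score
FROM Users AS u
LEFT JOIN (
    SELECT s.UserId, s.CourseId, MAX(s.Score) AS maxScore
    FROM Scores AS s
    GROUP BY s.UserId, s.CourseId
) AS x ON x.UserId = u.Id
GROUP BY u.Id, u.UserName
ORDER BY u.Id
""").fetchall()

print(totals)
```

The same LEFT JOIN plus COALESCE shape should carry over to SQL Server unchanged, though this sketch has only been reasoned through against SQLite.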

SQL: Removing entire group when case is something in GROUP BY

I am using PostgreSQL. I have two tables: users and usertasks.
users has the following fields: userid, username
usertasks has the following fields: taskid, taskdate, userid
userid and taskid are the primary keys of these tables.
I want to find all users who have not made any tasks between two specific dates.
My query is this:
Select u.userid
from users u
left join usertasks ut
on ut.userid=u.userid
group by u.userid
having count(case when ut.task_date >= '2015-7-1' AND ut.task_date <= '2015-10-6' then null else 1 end)>=1
Problem:
Some users have done tasks both inside and outside that date range, so they are returned as well. I don't want those users, because they have done tasks between those dates; I am interested only in users who have not done any tasks between the two dates.
It would work if entire group can be skipped if any of the tasks in that group are done between those two dates.
So basically, I want a query equivalent to this:
(Select u.userid
from users u)
except
(Select u.userid
from users u
left join usertasks ut
on ut.userid=u.userid
where ut.task_date >= '2015-7-1' AND ut.task_date <= '2015-10-6'
group by u.userid)
The above query returns the right result, but it's slower, which is why I can't use EXCEPT or NOT IN.
SAMPLE DATA:
Create tables:
create table users( userid serial PRIMARY KEY, username VARCHAR(20) );
create table usertasks(taskid serial PRIMARY KEY, userid BIGINT, taskdate DATE );
Insert data:
INSERT INTO users (username) VALUES
('a'),
('b'),
('c'),
('d'),
('e'),
('f');
INSERT INTO usertasks (userid,taskdate) VALUES
(1,'2015-11-5'),
(1,'2015-10-1'),
(2,'2015-9-2'),
(3,'2015-9-2'),
(4,'2015-9-2'),
(5,'2015-9-2'),
(6,'2015-11-2');
Given this data, the query should not return userid 1.
I would be inclined to use NOT EXISTS:
select u.*
from users u
where not exists (select 1
from usertasks ut
where ut.userid = u.userid and
ut.task_date >= '2015-07-01' and
ut.task_date <= '2015-10-06'
);
This seems like a direct translation of your requirement.
That said, this query:
Select u.userid
from users u left join
usertasks ut
on ut.userid = u.userid
group by u.userid
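having sum(case when ut.task_date >= '2015-7-1' AND ut.task_date <= '2015-10-6' then 1 else 0 end) = 0;
should return the same result. I think the problem with your query is the having clause:
having count(case when ut.task_date >= '2015-7-1' AND ut.task_date <= '2015-10-6' then null else 1 end) >= 1
This counts every row that falls outside the date range, so a user passes the filter as soon as they have a single task outside the range, even if they also have tasks inside it. Because of the left join, users with no tasks also end up with a count of 1. Why? Because the ut fields will be NULL, which fails the comparison, so the else branch is used. The SUM() version doesn't have this shortcoming: it counts only the in-range rows (the else 0 keeps the sum from being NULL) and keeps a group only when that count is 0.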
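To compare the two approaches on the question's sample data, here is a minimal sketch using SQLite from Python. Several things are assumptions: the date column is called task_date as in the queries (the CREATE TABLE statement spells it taskdate), dates are stored as zero-padded ISO strings so that SQLite's text comparison orders them correctly, and user 'g' with no tasks at all is an extra row added only to exercise the left-join case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE usertasks (taskid INTEGER PRIMARY KEY, userid INTEGER, task_date TEXT);
-- 'g' is not in the question's data; it is added to cover users with no tasks.
INSERT INTO users (username) VALUES ('a'), ('b'), ('c'), ('d'), ('e'), ('f'), ('g');
INSERT INTO usertasks (userid, task_date) VALUES
    (1, '2015-11-05'), (1, '2015-10-01'),
    (2, '2015-09-02'), (3, '2015-09-02'),
    (4, '2015-09-02'), (5, '2015-09-02'),
    (6, '2015-11-02');
""")

# Approach 1: NOT EXISTS, a direct translation of the requirement.
not_exists_ids = [row[0] for row in conn.execute("""
SELECT u.userid
FROM users u
WHERE NOT EXISTS (SELECT 1
                  FROM usertasks ut
                  WHERE ut.userid = u.userid
                    AND ut.task_date >= '2015-07-01'
                    AND ut.task_date <= '2015-10-06')
ORDER BY u.userid
""")]

# Approach 2: LEFT JOIN + HAVING; ELSE 0 keeps the sum from being NULL.
having_ids = [row[0] for row in conn.execute("""
SELECT u.userid
FROM users u
LEFT JOIN usertasks ut ON ut.userid = u.userid
GROUP BY u.userid
HAVING SUM(CASE WHEN ut.task_date >= '2015-07-01'
                 AND ut.task_date <= '2015-10-06' THEN 1 ELSE 0 END) = 0
ORDER BY u.userid
""")]

print(not_exists_ids, having_ids)
```

Both queries return only the users with no task inside the range; user 1 is excluded because one of its two tasks falls on 2015-10-01.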

Alias table that's actually a pivoted one?

I have this query:
SELECT
A.USERID,
A.NAME,
PVT.PHONE, -- PROBABLY A CASE STATEMENT ON NULL WILL GO HERE...
PVT.ADDRESS -- ON HERE AS WELL...
FROM
USERS A
-- I NEED TO CREATE A PIVOT TABLE HERE WITH THE ALIAS OF 'PVT' ON TABLE 'B'
B Contents:
UserID | PHONE | ADDRESS | TYPE
1 | 444-555-2222 | XXXXXXX | PHONE
1 | XXXXXXX | 66 Nowhere | NOTADDRESS
I want, on the same row, the user's phone by getting B.PHONE if TYPE = 'PHONE'.
I also want, on the same row, the user's address by getting B.ADDRESS content if TYPE = 'ADDRESS'.
As you see in the table dump above, I don't have a record matching the user ID AND TYPE = 'ADDRESS'
So I would need to show a blank or 'No address' in the main SELECT which will show the phone, but on the same row, blank or 'No address'.
I don't want to create an INNER JOIN because if there are no matching UserID's in B, the query will not return the info that I have in table A for that user.
Also, a LEFT JOIN will create two rows, which I don't want.
I think a pivoted table used as an alias would do it, but I don't know how to create such an alias.
Any ideas ?
How about using conditional aggregation?
SELECT A.USERID, A.NAME,
B.PHONE, B.ADDRESS
FROM USERS A LEFT JOIN
(SELECT UserId, MAX(CASE WHEN TYPE = 'PHONE' THEN PHONE END) as PHONE,
MAX(CASE WHEN TYPE = 'ADDRESS' THEN ADDRESS END) as ADDRESS
FROM B
GROUP BY UserId
) B
ON B.UserId = A.UserId;
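The question also asks for a blank or 'No address' when no matching row exists. One way to get that, sketched below with SQLite from Python, is to wrap the pivoted columns in COALESCE. The user names 'Ann' and 'Ben', the second user with no rows in B, the derived-table alias P, and the 'No phone' placeholder are all made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE USERS (USERID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE B (UserID INTEGER, PHONE TEXT, ADDRESS TEXT, TYPE TEXT);
-- 'Ann' and 'Ben' are hypothetical; Ben has no rows in B at all.
INSERT INTO USERS VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO B VALUES
    (1, '444-555-2222', 'XXXXXXX', 'PHONE'),
    (1, 'XXXXXXX', '66 Nowhere', 'NOTADDRESS');
""")

# The derived table collapses B to one row per user; the LEFT JOIN keeps
# users without any B rows, and COALESCE supplies the placeholder text.
contact_rows = conn.execute("""
SELECT A.USERID,
       A.NAME,
       COALESCE(P.PHONE, 'No phone')     AS PHONE,
       COALESCE(P.ADDRESS, 'No address') AS ADDRESS
FROM USERS A
LEFT JOIN (
    SELECT UserID,
           MAX(CASE WHEN TYPE = 'PHONE'   THEN PHONE   END) AS PHONE,
           MAX(CASE WHEN TYPE = 'ADDRESS' THEN ADDRESS END) AS ADDRESS
    FROM B
    GROUP BY UserID
) P ON P.UserID = A.USERID
ORDER BY A.USERID
""").fetchall()

print(contact_rows)
```

Because the grouping happens before the join, each user appears exactly once, which avoids the two-row result a plain LEFT JOIN on B would give.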
If you have to use PIVOT, then you'd need the pivot in a subquery and left join to it:
SELECT
A.USERID,
A.NAME,
PVT.PHONE,
PVT.[ADDRESS]
FROM
Users A
LEFT JOIN (SELECT *
FROM
(SELECT
UserID,
[Type],
(CASE [Type] WHEN 'PHONE' THEN PHONE WHEN 'ADDRESS' THEN [Address] END) Info
FROM UserInfo) AS UI
PIVOT (
MAX(Info)
FOR [Type] IN ([PHONE], [ADDRESS])
) P
) PVT ON A.UserID = PVT.UserID
This gives you pretty much the same execution plan as the conditional aggregation query, but not as easy on the eyes.

SQL find records with duplicate email and date of birth

I'm trying to write a query to find any duplicate records in my database. I want to find all records (not the count) where the EmailAddress AND DateofBirth (both columns) already exist on another record.
Account tbl contains the EmailAddress.
User tbl contains the DateOfBirth
Join on AccountID
The following query selects records where the EmailAddress exists in another record OR the DateOfBirth exists in another record, but I'm unable to combine the two conditions. If I'm correct so far, the 'and' on line 7 acts more like an 'or' in my case..?
select a.AccountName, a.EmailAddress, u.DateOfBirth from Account as a
join [User] as u
on a.AccountID = u.AccountID
where a.EmailAddress in (
select EmailAddress from Account group by EmailAddress having count(*) > 1
)
and
DateOfBirth in(
select DateOfBirth from [User] group by DateOfBirth having count(*) > 1
)
order by u.DateOfBirth, a.EmailAddress
For example, this may produce 50 records. If I look through them, I find 5 records all with the matching EmailAddress, however only 4 of them have the same DateOfBirth. The 5th record is displaying due to another record in the database with the same DateOfBirth but different EmailAddress.
I'd like to find only those records who have both the matching email and dob.
Thanks as always, please ask if you require a further description.
Regards
Json
Using your approach, you can use exists:
select a.AccountName, a.EmailAddress, u.DateOfBirth
from Account as a join
[User] as u
on a.AccountID = u.AccountID
where exists (select a2.EmailAddress
from Account a2 join
[User] u2
on a2.AccountID = u2.AccountID
where a2.EmailAddress = a.EmailAddress and
u2.DateOfBirth = u.DateOfBirth
group by a2.EmailAddress
having count(*) > 1
)
)
order by u.DateOfBirth, a.EmailAddress;
A better way is to use window/analytic functions:
select AccountName, EmailAddress, DateOfBirth
from (select a.AccountName, a.EmailAddress, u.DateOfBirth,
count(*) over (partition by a.EmailAddress, u.DateOfBirth) as cnt
from Account as a join
[User] as u
on a.AccountID = u.AccountID
) ua
where cnt > 1
order by DateOfBirth, EmailAddress;
Join the two tables on the account id.
Group by email and date
Show only those entries which have count(*) > 1 (using the HAVING expression).
In MySQL (I have no MS SQL server available at the moment), this can be done with:
SELECT * FROM a JOIN b ON a.account = b.account
GROUP BY email, birth
HAVING count(*) > 1;
Where I used the following commands to setup the tables a and b:
create table a (
account int primary key auto_increment,
email text
);
create table b (
account int,
birth date,
constraint foreign key (account) references a (account)
);
insert into a (email) values ("email1"), ("email1"), ("email2"), ("email2");
insert into b values (1, "2000-01-01"), (2, "2000-01-01"), (3, "2000-01-01"), (4, "2000-01-02");
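The GROUP BY query above returns one row per duplicated (email, birth) pair, while the question asks for every matching record. One possible way to get all of them, sketched here with SQLite from Python, is to join the base rows back to the grouped pairs. It reuses the tables a and b and the sample rows from this answer; the aliases a2, b2 and d are introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (account INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE b (account INTEGER REFERENCES a (account), birth TEXT);
INSERT INTO a VALUES (1, 'email1'), (2, 'email1'), (3, 'email2'), (4, 'email2');
INSERT INTO b VALUES (1, '2000-01-01'), (2, '2000-01-01'),
                     (3, '2000-01-01'), (4, '2000-01-02');
""")

# d holds each (email, birth) pair that occurs more than once;
# joining back to a and b lists every record that shares such a pair.
duplicates = conn.execute("""
SELECT a.account, a.email, b.birth
FROM a
JOIN b ON a.account = b.account
JOIN (
    SELECT a2.email, b2.birth
    FROM a AS a2
    JOIN b AS b2 ON a2.account = b2.account
    GROUP BY a2.email, b2.birth
    HAVING COUNT(*) > 1
) AS d ON d.email = a.email AND d.birth = b.birth
ORDER BY a.account
""").fetchall()

print(duplicates)
```

Accounts 3 and 4 share an email but not a birth date, and accounts 1 to 3 share a birth date but not all the same email, so only accounts 1 and 2 qualify.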

Slow SQL view using several subqueries

There is probably a much better way to create these views. I have limited SQL experience so this is the way I designed it, I am hoping some of you SQL gurus can point me in a more efficient direction.
I essentially have 3 tables (sometimes 4) in my view, here is the essential structure:
Table USER
USER_ID | EMAIL | PASSWORD | CREATED_DATE
(Indexes: USER_ID)
Table USER_META
ID | USER_ID | NAME | VALUE
(Indexes: ID,USER_ID,NAME)
Table USER_SCORES
ID | USER_ID | GAME_ID | SCORE | CREATED_DATE
(Indexes: ID,USER_ID)
All the tables use the first ID column as an auto-increment primary key.
The second table "USER_META" is where I keep all the contact info and other misc. Primarily it is first_name,last_name, street,city, etc. - Depending on the user this could be 4 items or 140, which is why I use this table instead of having 150 columns in my USER table.
For reports, searching and editing I need about 20 values from USER_META, so I have views that look like this:
View V_USR_META
select USER_ID,EMAIL,
(select VALUE from USER_META
where NAME = 'FIRST_NAME' and USER_ID = u.USER_ID) as first_name,
(select VALUE from USER_META
where NAME = 'LAST_NAME' and USER_ID = u.USER_ID) as last_name,
(select VALUE from USER_META
where NAME = 'CITY' and USER_ID = u.USER_ID) as city,
(select VALUE from USER_META
where NAME = 'STATE' and USER_ID = u.USER_ID) as state,
(select VALUE from USER_META
where NAME = 'ZIP' and USER_ID = u.USER_ID) as zip,
/* 10 more selects for different meta values here */
(select max(SCORE) from USER_SCORES
where USER_ID = u.USER_ID) as high_score,
(select top (1) CREATED_DATE from USER_SCORES
where USER_ID = u.USER_ID
order by id desc) as last_game
from USER u
This gets pretty slow, and there are actually many more subqueries; this is just to illustrate the query. I also have to query a few other tables to get miscellaneous info about the user.
I use the view when searching for a user, searches use name or userid or email or score, etc. I also use it to populate the user information screen when I present all the data in one place.
So - Is there a better way to write the view?
An alternative to all of those correlated subqueries would be to use max with case:
select u.USER_ID,
u.EMAIL,
max(case when um.name = 'FIRST_NAME' then um.value end) first_name,
max(case when um.name = 'LAST_NAME' then um.value end) last_name
...
from USER u
left join USER_META um
on u.user_id = um.user_id
group by u.user_id, u.email
Then you could add the user_scores results:
select u.USER_ID,
u.EMAIL,
max(case when um.name = 'FIRST_NAME' then um.value end) first_name,
max(case when um.name = 'LAST_NAME' then um.value end) last_name
...,
max(us.score) maxscore,
max(us.created_date) maxcreateddate
from USER u
left join USER_META um
on u.user_id = um.user_id
left join USER_SCORES us
on u.user_id = us.user_id
group by u.user_id, u.email
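Joining USER_META and USER_SCORES in the same GROUP BY multiplies the rows per user (every meta row pairs with every score row). MAX is unaffected by the repeats, but the intermediate result grows. A possible variant, sketched below with SQLite from Python, aggregates each table in its own derived table before joining. The sample rows are invented, only two meta values are pivoted, and last_game uses MAX(CREATED_DATE), which matches the original "latest id" logic only if dates increase with ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "USER" (USER_ID INTEGER PRIMARY KEY, EMAIL TEXT);
CREATE TABLE USER_META (ID INTEGER PRIMARY KEY, USER_ID INTEGER,
                        NAME TEXT, VALUE TEXT);
CREATE TABLE USER_SCORES (ID INTEGER PRIMARY KEY, USER_ID INTEGER,
                          GAME_ID INTEGER, SCORE INTEGER, CREATED_DATE TEXT);
-- Made-up rows; user 2 has neither meta values nor scores.
INSERT INTO "USER" VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO USER_META (USER_ID, NAME, VALUE) VALUES
    (1, 'FIRST_NAME', 'Ada'), (1, 'LAST_NAME', 'Lovelace');
INSERT INTO USER_SCORES (USER_ID, GAME_ID, SCORE, CREATED_DATE) VALUES
    (1, 1, 10, '2020-01-01'), (1, 2, 30, '2020-02-01');
""")

# Each derived table is reduced to one row per user before the join,
# so the outer query needs no GROUP BY and no rows are multiplied.
user_rows = conn.execute("""
SELECT u.USER_ID, u.EMAIL,
       m.first_name, m.last_name,
       s.high_score, s.last_game
FROM "USER" u
LEFT JOIN (
    SELECT USER_ID,
           MAX(CASE WHEN NAME = 'FIRST_NAME' THEN VALUE END) AS first_name,
           MAX(CASE WHEN NAME = 'LAST_NAME'  THEN VALUE END) AS last_name
    FROM USER_META
    GROUP BY USER_ID
) m ON m.USER_ID = u.USER_ID
LEFT JOIN (
    SELECT USER_ID,
           MAX(SCORE)        AS high_score,
           MAX(CREATED_DATE) AS last_game
    FROM USER_SCORES
    GROUP BY USER_ID
) s ON s.USER_ID = u.USER_ID
ORDER BY u.USER_ID
""").fetchall()

print(user_rows)
```

Whether this is faster than the single GROUP BY depends on the data and the indexes, so it is worth comparing execution plans on the real tables.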
WITH Meta AS (
SELECT USER_ID
,FIRST_NAME
,LAST_NAME
,CITY
,STATE
,ZIP
FROM (
SELECT USER_ID, NAME, VALUE -- leave out ID so PIVOT groups by USER_ID only
FROM USER_META
) AS src
PIVOT (
MAX(VALUE) FOR NAME IN (FIRST_NAME, LAST_NAME, CITY, STATE, ZIP)
) AS p
)
,MaxScores AS (
SELECT USER_ID
,MAX(SCORE) AS Score
FROM USER_SCORES
GROUP BY USER_ID
)
,LastGames AS (
SELECT USER_ID
,MAX(CREATED_DATE) AS GameDate
FROM USER_SCORES
GROUP BY USER_ID
)
SELECT USER.USER_ID
,USER.EMAIL
,Meta.FIRST_NAME
,Meta.LAST_NAME
,Meta.CITY
,Meta.STATE
,Meta.ZIP
,MaxScores.Score
,LastGames.GameDate
FROM USER
INNER JOIN Meta
ON USER.USER_ID = Meta.USER_ID
LEFT JOIN MaxScores
ON USER.USER_ID = MaxScores.USER_ID
LEFT JOIN LastGames
ON USER.USER_ID = LastGames.USER_ID