Select count field of rows that are related to another table - sql

I've been struggling with this issue for a long time and don't know how to solve it. It's hard for me to describe, so please be patient. There are two tables:
Table "Users"
UserId PK
Gender
Table "Forms"
FormId PK
UserId1 FK
UserId2 FK
Type
Forms are always related to two users, but not all users have related forms. Now I want to count users per gender, but only those users who have related forms.
So as a result, I want to have something like this:
# | Gender | GenderCount
1 | male | 43
2 | female | 12
3 | trans | 2
I tried the following SQL script, but the result isn't distinct (the sum of all GenderCount values is greater than the actual number of users):
SELECT u.Gender AS 'Gender', COUNT(u.Gender) AS 'GenderCount'
FROM Users u, Forms f
WHERE ((f.UserId1 = u.UserId)
OR (f.UserId2 = u.UserId))
AND (Type = 'Foo')
GROUP BY Gender
ORDER BY GenderCount
DESC
Any tips to solve this?

Let's take a look at what you want:
How many of each gender answered any form?
Note: each user should only be counted once, no matter how many forms they've filled out.
Phrased like this, the answer becomes fairly obvious, at least in pseudo-code:
SELECT
u.Gender,
COUNT(u.Gender)
FROM
Users u
WHERE
[User has answered a form]
GROUP BY
u.Gender
The easiest way to determine if a user has answered a form depends on the specific flavour of SQL being used. You'll need to use a subquery. There are a couple of options for how to access it.
IN is the most common method:
SELECT
u.Gender Gender,
COUNT(u.Gender) GenderCount
FROM
Users u
WHERE
u.UserId IN (
SELECT f.UserId1 user_id FROM Forms f WHERE Type = 'Foo'
UNION
SELECT f.UserId2 user_id FROM Forms f WHERE Type = 'Foo'
)
GROUP BY
Gender
ORDER BY
GenderCount DESC
Where available, EXISTS is more natural to read, and is sometimes faster:
SELECT
u.Gender Gender,
COUNT(u.Gender) GenderCount
FROM
Users u
WHERE
EXISTS(
SELECT '1'
FROM Forms f
WHERE
(f.UserId1 = u.UserId OR f.UserId2 = u.UserId)
AND Type = 'Foo'
)
GROUP BY
Gender
ORDER BY
GenderCount DESC
Regarding speed: the query optimiser will often convert IN to EXISTS where possible, to avoid selecting extra rows unnecessarily. However, because two columns are involved, you need either an OR or a UNION, and neither tends to play nicely with indexes, so the two forms may perform about the same in this case.
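To see the double counting concretely, here is a minimal sketch that runs the question's join and the EXISTS rewrite side by side. It assumes an in-memory SQLite database driven through Python's sqlite3 module as a stand-in for the real DBMS, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Gender TEXT);
    CREATE TABLE Forms (FormId INTEGER PRIMARY KEY, UserId1 INTEGER,
                        UserId2 INTEGER, Type TEXT);
    -- Hypothetical sample data: user 4 has no forms at all.
    INSERT INTO Users VALUES (1, 'male'), (2, 'female'), (3, 'male'), (4, 'female');
    INSERT INTO Forms VALUES (1, 1, 2, 'Foo'), (2, 1, 3, 'Foo'), (3, 2, 3, 'Bar');
""")

# The question's join: one output row per (user, form) match.
join_counts = dict(conn.execute("""
    SELECT u.Gender, COUNT(u.Gender)
    FROM Users u, Forms f
    WHERE (f.UserId1 = u.UserId OR f.UserId2 = u.UserId) AND f.Type = 'Foo'
    GROUP BY u.Gender
"""))

# The EXISTS rewrite: each user row is tested once, so it is counted at most once.
exists_counts = dict(conn.execute("""
    SELECT u.Gender, COUNT(u.Gender)
    FROM Users u
    WHERE EXISTS (SELECT 1 FROM Forms f
                  WHERE (f.UserId1 = u.UserId OR f.UserId2 = u.UserId)
                    AND f.Type = 'Foo')
    GROUP BY u.Gender
"""))

print(join_counts)    # user 1 is on two 'Foo' forms, so the join counts them twice
print(exists_counts)
```

With this data the join reports three males although only two distinct male users have a 'Foo' form, which is the symptom described in the question.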

SELECT u1.Gender AS 'Gender', COUNT(*) AS 'GenderCount'
FROM
Users u1
INNER JOIN
(SELECT DISTINCT u.UserId
FROM
Users u
INNER JOIN Forms f ON ((f.UserId1 = u.UserId)
OR (f.UserId2 = u.UserId))
AND (f.Type = 'Foo')) T ON T.UserId = u1.UserId
GROUP BY Gender
ORDER BY GenderCount DESC

Skip the join which is generating multiple rows per user:
SELECT Gender, COUNT(Gender) AS 'GenderCount'
FROM Users
WHERE UserId IN (SELECT UserId1 FROM Forms WHERE Type = 'Foo'
UNION
SELECT UserId2 FROM Forms WHERE Type = 'Foo')
GROUP BY Gender
ORDER BY GenderCount DESC
Or if you prefer to avoid a UNION (which is perfectly valid in this scenario BTW) you can use OR like this:
SELECT Gender, COUNT(Gender) AS 'GenderCount'
FROM Users
WHERE UserId IN (SELECT UserId1 FROM Forms WHERE Type = 'Foo')
OR UserId IN (SELECT UserId2 FROM Forms WHERE Type = 'Foo')
GROUP BY Gender
ORDER BY GenderCount DESC
As others have pointed out, there are ways to do this using a JOIN as well. However, a JOIN adds needless complexity for the DBMS engine as it will first need to match up the rows, and then reduce to DISTINCT values.

You should use
count(distinct u.UserId)
that way users only get counted once: count(distinct field_name) counts the number of unique values contained in field_name, so counting distinct on the primary key gives you the number of unique users, which is what you're looking for.
Also, instead of joining, you probably would be better off using an in clause like this
select Gender, count(distinct UserId) as GenderCount
from Users
where UserId in (select UserId1 from Forms) or UserId in (select UserId2 from Forms)
group by Gender
It's probably also going to be slightly faster.
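As a rough check of the count(distinct ...) suggestion, the sketch below keeps the join from the question and only changes the aggregate. It assumes SQLite via Python's sqlite3 module and made-up sample rows; the Type = 'Foo' filter is carried over from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Gender TEXT);
    CREATE TABLE Forms (FormId INTEGER PRIMARY KEY, UserId1 INTEGER,
                        UserId2 INTEGER, Type TEXT);
    -- Hypothetical sample data: user 1 appears on two 'Foo' forms.
    INSERT INTO Users VALUES (1, 'male'), (2, 'female'), (3, 'male'), (4, 'female');
    INSERT INTO Forms VALUES (1, 1, 2, 'Foo'), (2, 1, 3, 'Foo'), (3, 2, 3, 'Bar');
""")

# The join still produces duplicate rows per user, but COUNT(DISTINCT ...)
# collapses them back to one per UserId within each gender group.
rows = conn.execute("""
    SELECT u.Gender, COUNT(DISTINCT u.UserId) AS GenderCount
    FROM Users u
    JOIN Forms f ON f.UserId1 = u.UserId OR f.UserId2 = u.UserId
    WHERE f.Type = 'Foo'
    GROUP BY u.Gender
    ORDER BY GenderCount DESC
""").fetchall()

print(rows)
```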

Related

Join 2 tables on foreign key while using count() in SQL

So I have two tables: Please see the ER diagram here
I want to use SELECT to create one table with "name" from the USER table, "id" as the foreign key for the two tables, and the count of friend_id as the number of friends each user has.
Here is my code:
SELECT name, id, (SELECT count(friend_id) as number
FROM friend
GROUP BY user_id)
FROM user
ORDER BY number DESC
I'm wondering what's the problem with these lines. Thank you!
You can use a subquery to calculate the count.
SELECT name, id, COALESCE(f.Count, 0) AS friend_count
FROM user u
LEFT JOIN (
SELECT user_id, COUNT(DISTINCT friend_id) AS Count
FROM friend
GROUP BY user_id
) f ON f.user_id = u.id
ORDER BY friend_count DESC
I used a LEFT JOIN so that a user without any row in friend still gets a row, with a friend count of 0 (thanks to COALESCE). I also added DISTINCT so that a friend who appears more than once is counted only once; this might not be necessary, especially if you have a UNIQUE INDEX set up on the columns (user_id, friend_id).
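The behaviour described above can be sketched on a tiny data set. This assumes SQLite through Python's sqlite3 module, invented sample rows, and an inner alias named cnt (chosen here only to avoid any clash with the COUNT function name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friend (user_id INTEGER, friend_id INTEGER);
    -- Hypothetical data: Cy has no friends, and (1, 3) is stored twice.
    INSERT INTO "user" VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO friend VALUES (1, 2), (1, 3), (1, 3), (2, 1);
""")

rows = conn.execute("""
    SELECT u.name, u.id, COALESCE(f.cnt, 0) AS friend_count
    FROM "user" u
    LEFT JOIN (SELECT user_id, COUNT(DISTINCT friend_id) AS cnt
               FROM friend
               GROUP BY user_id) f ON f.user_id = u.id
    ORDER BY friend_count DESC
""").fetchall()

print(rows)
```

Cy still appears, with a count of 0, and the duplicated (1, 3) row does not inflate Ann's count.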
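Add a WHERE clause so that the subquery counts friends for the current user only, and remove the GROUP BY: the subquery must return a single value per user row, and according to your diagram one user id maps to one or more friends.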
SELECT name, id, (SELECT count(friend_id)
FROM friend
WHERE user_id = user.id) AS number
FROM user
ORDER BY number DESC
I think this will be correct for your purpose:
CREATE TABLE #user(
id VARCHAR(22),
[name] VARCHAR(255),
)
CREATE TABLE #friend(
user_id VARCHAR(22),
friend_id VARCHAR(22)
)
SELECT name, id, (SELECT COALESCE(COUNT(friend_id), 0)
FROM #friend f
WHERE f.user_id = u.id) as number
FROM #user u
ORDER BY number DESC
--Same query with join:
SELECT u.[name], u.id, COALESCE(COUNT(f.friend_id),0) number
FROM #user u
LEFT JOIN #friend f ON f.user_id = u.id
GROUP BY u.[name], u.id
ORDER BY number DESC

H2 making one select from 2

I have 3 tables: users, courses, and a course relation table. I want to get the users who aren't on a specific course. I figure I need to somehow merge 2 selects with a right join. How could I make one select from 2 selects?
SELECT ID, NAME, LASTNAME, ROLE FROM COURSERELATION JOIN USERS ON
ID_USER = ID WHERE ID_COURSE = ?
RIGTH JOIN
SELECT ID, NAME, LASTNAME, ROLE from COURSERELATION JOIN USERS ON
ID_USER = ID WHERE ID_COURSE != ?
You need to extract the users for whom no record exists for the specific course. You can filter the rows using a NOT EXISTS clause over a subquery.
Please try below query:
SELECT u.ID,
u.NAME,
u.LASTNAME,
u.ROLE
FROM USERS u
WHERE NOT EXISTS (SELECT 1
FROM COURSERELATION s
WHERE s.id_user = u.id
AND s.id_course = 'YOUR_COURSE_ID_HERE' )
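To illustrate why filtering with ID_COURSE != ? is not enough, here is a small sketch comparing that approach with NOT EXISTS. It assumes SQLite via Python's sqlite3 module (the question is about H2, but the SQL involved is plain), with invented rows and a hypothetical course id of 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USERS (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE COURSERELATION (ID_USER INTEGER, ID_COURSE INTEGER);
    -- Hypothetical data: user 1 is on courses 10 and 20, user 2 on 20, user 3 on none.
    INSERT INTO USERS VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO COURSERELATION VALUES (1, 10), (2, 20), (1, 20);
""")
course_id = 10

# Naive filter: keeps any user with at least one *other* course,
# so user 1 slips through and user 3 (no courses) is lost.
naive = [r[0] for r in conn.execute("""
    SELECT DISTINCT u.ID
    FROM COURSERELATION c JOIN USERS u ON c.ID_USER = u.ID
    WHERE c.ID_COURSE != ?
    ORDER BY u.ID
""", (course_id,))]

# NOT EXISTS: keeps exactly the users with no row for the given course.
not_on_course = [r[0] for r in conn.execute("""
    SELECT u.ID
    FROM USERS u
    WHERE NOT EXISTS (SELECT 1 FROM COURSERELATION s
                      WHERE s.ID_USER = u.ID AND s.ID_COURSE = ?)
    ORDER BY u.ID
""", (course_id,))]

print(naive, not_on_course)
```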

How to write a Sql query for retrieving one row from duplicate rows?

I have a User table with many users. Some users share the same first name and last name, but only one of them will have an active status. My requirement: if the user is unique, I need that user regardless of status, but if the user is duplicated, I need the record with the active status.
How can I achieve this in SQL Server?
Sorry for the confusion. Here is an example of the User table:
My result table should be:
Here Steve Jordan has 2 records, so I need the record with status 1,
and for records with a distinct first name and last name I need all of them regardless of status.
Note: I have a user id as primary key, but I am joining on first name and last name because the other table doesn't have the user id.
SELECT UserId, FirstName, LastName, Status FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY FirstName, LastName
ORDER BY Status DESC) AS rowNum
FROM [User]
) u
WHERE u.rowNum = 1
This essentially groups by first and last name, orders by Status so that active records have higher priority, and takes only one row for each unique first/last name combination. This ensures that each unique first/last name combination is in the result set only once, and if there are multiples, the active one is the one returned. If a name combination has multiples but none of them is active, then only one is returned, chosen arbitrarily.
Ideally, you should have the User ID PK in both tables, as this is much stronger relationally.
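A quick way to try the ROW_NUMBER() approach is sketched below. It assumes SQLite 3.25 or later (the first version with window functions) through Python's sqlite3 module rather than SQL Server, a table named Users to avoid bracket quoting, and invented rows modelled on the Steve Jordan example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, FirstName TEXT,
                        LastName TEXT, Status INTEGER);
    -- Hypothetical data: Steve Jordan is duplicated, Ann Lee is unique and inactive.
    INSERT INTO Users VALUES (1, 'Steve', 'Jordan', 0),
                             (2, 'Steve', 'Jordan', 1),
                             (3, 'Ann',   'Lee',    0);
""")

# Number the rows within each name pair, active (Status = 1) first,
# then keep only the first row of every pair.
rows = conn.execute("""
    SELECT UserId, FirstName, LastName, Status
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY FirstName, LastName
                                    ORDER BY Status DESC) AS rowNum
          FROM Users) u
    WHERE u.rowNum = 1
    ORDER BY UserId
""").fetchall()

print(rows)
```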
EDIT: A bit more complex, but this should give you what you're looking for.
SELECT *
FROM YourOtherTable A
JOIN [User] B
ON A.FirstName = B.FirstName AND A.LastName = B.LastName
LEFT JOIN
(
SELECT FirstName, LastName FROM [User] GROUP BY FirstName, LastName HAVING COUNT(FirstName) = 1
) C
ON B.FirstName = C.FirstName AND B.LastName = C.LastName
WHERE B.Status = 1 OR C.FirstName IS NOT NULL
I didn't fully understand your question, but going by your subject line, it seems you want the active record when a record is duplicated.
select T.* from yourTable T INNER JOIN (select user, count(*) cnt FROM yourTable GROUP BY user) A ON A.user=T.user
WHERE A.cnt>1 and T.status='A';
If that isn't your requirement, please share your table structure and expected output so I can understand it better.

Slow SQL view using several subqueries

There is probably a much better way to create these views. I have limited SQL experience so this is the way I designed it, I am hoping some of you SQL gurus can point me in a more efficient direction.
I essentially have 3 tables (sometimes 4) in my view, here is the essential structure:
Table USER
USER_ID | EMAIL | PASSWORD | CREATED_DATE
(Indexes: USER_ID)
Table USER_META
ID | USER_ID | NAME | VALUE
(Indexes: ID,USER_ID,NAME)
Table USER_SCORES
ID | USER_ID | GAME_ID | SCORE | CREATED_DATE
(Indexes: ID,USER_ID)
All the tables use the first ID column as an auto-increment primary key.
The second table "USER_META" is where I keep all the contact info and other misc. Primarily it is first_name,last_name, street,city, etc. - Depending on the user this could be 4 items or 140, which is why I use this table instead of having 150 columns in my USER table.
For reports, searching and editing I need about 20 values from USER_META, so I have views that look like this:
View V_USR_META
select USER_ID,EMAIL,
(select VALUE from USER_META
where NAME = 'FIRST_NAME' and USER_ID = u.USER_ID) as first_name,
(select VALUE from USER_META
where NAME = 'LAST_NAME' and USER_ID = u.USER_ID) as last_name,
(select VALUE from USER_META
where NAME = 'CITY' and USER_ID = u.USER_ID) as city,
(select VALUE from USER_META
where NAME = 'STATE' and USER_ID = u.USER_ID) as state,
(select VALUE from USER_META
where NAME = 'ZIP' and USER_ID = u.USER_ID) as zip,
/* 10 more selects for different meta values here */
(select max(SCORE) from USER_SCORES
where USER_ID = u.USER_ID) as high_score,
(select top (1) CREATED_DATE from USER_SCORES
where USER_ID = u.USER_ID
order by id desc) as last_game
from USER u
This gets pretty slow, and there are actually many more subqueries; this is just to illustrate the query. I also have to query a few other tables to get miscellaneous info about the user.
I use the view when searching for a user, searches use name or userid or email or score, etc. I also use it to populate the user information screen when I present all the data in one place.
So - Is there a better way to write the view?
An alternative to all of those correlated subqueries would be to use max with case:
select u.USER_ID,
u.EMAIL,
max(case when um.name = 'FIRST_NAME' then um.value end) first_name,
max(case when um.name = 'LAST_NAME' then um.value end) last_name
...
from USER u
left join USER_META um
on u.user_id = um.user_id
group by u.user_id, u.email
Then you could add the user_scores results:
select u.USER_ID,
u.EMAIL,
max(case when um.name = 'FIRST_NAME' then um.value end) first_name,
max(case when um.name = 'LAST_NAME' then um.value end) last_name
...,
max(us.score) maxscore,
max(us.created_date) maxcreateddate
from USER u
left join USER_META um
on u.user_id = um.user_id
left join USER_SCORES us
on u.user_id = us.user_id
group by u.user_id, u.email
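The max(case ...) technique can be tried on a cut-down schema, as in the sketch below. It assumes SQLite via Python's sqlite3 module instead of SQL Server, only two meta keys, and made-up rows; the column list is shortened for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "USER" (USER_ID INTEGER PRIMARY KEY, EMAIL TEXT);
    CREATE TABLE USER_META (ID INTEGER PRIMARY KEY, USER_ID INTEGER,
                            NAME TEXT, VALUE TEXT);
    CREATE TABLE USER_SCORES (ID INTEGER PRIMARY KEY, USER_ID INTEGER,
                              SCORE INTEGER);
    -- Hypothetical data: user 2 has no last name and no scores.
    INSERT INTO "USER" VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
    INSERT INTO USER_META (USER_ID, NAME, VALUE) VALUES
        (1, 'FIRST_NAME', 'Ann'), (1, 'LAST_NAME', 'Lee'), (2, 'FIRST_NAME', 'Bob');
    INSERT INTO USER_SCORES (USER_ID, SCORE) VALUES (1, 50), (1, 70);
""")

# Each CASE yields the value for one key and NULL otherwise;
# MAX ignores the NULLs and collapses the rows to one per user.
rows = conn.execute("""
    SELECT u.USER_ID, u.EMAIL,
           MAX(CASE WHEN um.NAME = 'FIRST_NAME' THEN um.VALUE END) AS first_name,
           MAX(CASE WHEN um.NAME = 'LAST_NAME'  THEN um.VALUE END) AS last_name,
           MAX(us.SCORE) AS high_score
    FROM "USER" u
    LEFT JOIN USER_META um ON um.USER_ID = u.USER_ID
    LEFT JOIN USER_SCORES us ON us.USER_ID = u.USER_ID
    GROUP BY u.USER_ID, u.EMAIL
    ORDER BY u.USER_ID
""").fetchall()

print(rows)
```

Joining both child tables multiplies the intermediate rows (meta rows times score rows per user). That is harmless for MAX, but it would distort aggregates such as COUNT or SUM.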
WITH Meta AS (
SELECT USER_ID
,FIRST_NAME
,LAST_NAME
,CITY
,STATE
,ZIP
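FROM (SELECT USER_ID, NAME, VALUE FROM USER_META) AS src -- exclude ID, otherwise PIVOT groups by it and returns one row per meta row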
PIVOT (
MAX(VALUE) FOR NAME IN (FIRST_NAME, LAST_NAME, CITY, STATE, ZIP)
) AS p
)
,MaxScores AS (
SELECT USER_ID
,MAX(SCORE) AS Score
FROM USER_SCORES
GROUP BY USER_ID
)
,LastGames AS (
SELECT USER_ID
,MAX(CREATED_DATE) AS GameDate
FROM USER_SCORES
GROUP BY USER_ID
)
SELECT USER.USER_ID
,USER.EMAIL
,Meta.FIRST_NAME
,Meta.LAST_NAME
,Meta.CITY
,Meta.STATE
,Meta.ZIP
,MaxScores.Score
,LastGames.GameDate
FROM USER
INNER JOIN Meta
ON USER.USER_ID = Meta.USER_ID
LEFT JOIN MaxScores
ON USER.USER_ID = MaxScores.USER_ID
LEFT JOIN LastGames
ON USER.USER_ID = LastGames.USER_ID

What's wrong with this MySQL query? SELECT * AS `x`, how to use x again later?

The following MySQL query:
select `userID` as uID,
(select `siteID` from `users` where `userID` = uID) as `sID`,
from `actions`
where `sID` in (select `siteID` from `sites` where `foo` = "bar")
order by `timestamp` desc limit 100
…returns an error:
Unknown column 'sID' in 'IN/ALL/ANY subquery'
I don't understand what I'm doing wrong here. The sID thing is not supposed to be a column, but the 'alias' (what is this called?) I created by executing (select siteID from users where userID = uID) as sID. And it’s not even inside the IN subquery.
Any ideas?
Edit: @Roland: Thanks for your comment. I have three tables, actions, users and sites. The table actions contains a userID field, which corresponds to an entry in the users table. Every user in this table (users) has a siteID.
I'm trying to select the latest actions from the actions table, and link them to the users and sites table to find out who performed those actions, and on which site. Hope that makes sense :)
You either need to enclose it into a subquery:
SELECT *
FROM (
SELECT userID as uID, (select siteID from users where userID = actions.userID) as sID, timestamp
FROM actions
) q
WHERE sID IN (select siteID from sites where foo = "bar")
ORDER BY
timestamp DESC
LIMIT 100
or, better, rewrite it as a JOIN:
SELECT a.userId, u.siteID
FROM actions a
JOIN users u
ON u.userID = a.userID
WHERE siteID IN
(
SELECT siteID
FROM sites
WHERE foo = 'bar'
)
ORDER BY
timestamp DESC
LIMIT 100
Create the following indexes:
actions (timestamp)
users (userId)
sites (foo, siteID)
The column alias is not established until the query processor finishes the SELECT clause and builds the first intermediate result set, so it cannot be referenced in WHERE. It can only be referenced in clauses that operate on that intermediate result set (MySQL accepts it in GROUP BY, HAVING and ORDER BY). If you want to use it this way, put the alias inside a subquery; it will then be in the result set generated by the subquery, and therefore accessible to the outer query. To illustrate:
(This is not the simplest way to do this query but it illustrates how to establish and use a column alias from a subquery)
select a.userID as uID, z.Sid
from actions a
Join (select userID, siteID as sID from users) Z
On Z.userID = a.userID
where Z.sID in (select siteID from sites where foo = "bar")
order by timestamp desc limit 100
Try the following:
SELECT
a.userID as uID
,u.siteID as sID
FROM
actions as a
INNER JOIN
users as u ON u.userID=a.userID
WHERE
u.siteID IN (SELECT siteID FROM sites WHERE foo = 'bar')
ORDER BY
a.timestamp DESC
LIMIT 100
I think the reason for the error is that the alias isn't available to the WHERE clause; MySQL does let you reference it in HAVING:
select `userID` as uID,
(select `siteID` from `users` where `userID` = uID) as `sID`
from `actions`
HAVING `sID` in (select `siteID` from `sites` where `foo` = "bar")
order by `timestamp` desc limit 100
Though I also agree with the other answers that your query could be better structured.
Try the following
SELECT
a.userID as uID
,u.siteID as sID
FROM
actions as a
INNER JOIN
users as u ON u.userID = a.userID
INNER JOIN
sites as s ON u.siteID = s.siteID
WHERE
s.foo = 'bar'
ORDER BY
a.timestamp DESC
LIMIT 100
If you wish to use a field from the select section later you can try a subselect
SELECT One,
Two,
One + Two as Three
FROM (
SELECT 1 AS One,
2 as Two
) sub
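Applied to the actions/users/sites tables from the question, the same derived-table idea might look like the sketch below. It assumes SQLite via Python's sqlite3 module rather than MySQL, with invented rows. SQLite is more lenient than MySQL about aliases in WHERE, so this only demonstrates that the wrapped form works, not the original error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (userID INTEGER PRIMARY KEY, siteID INTEGER);
    CREATE TABLE sites   (siteID INTEGER PRIMARY KEY, foo TEXT);
    CREATE TABLE actions (userID INTEGER, timestamp INTEGER);
    -- Hypothetical data: only site 100 has foo = 'bar'.
    INSERT INTO users   VALUES (1, 100), (2, 200);
    INSERT INTO sites   VALUES (100, 'bar'), (200, 'baz');
    INSERT INTO actions VALUES (1, 10), (2, 20), (1, 30);
""")

# The inner query defines the aliases uID, sID and ts; the outer query
# can then filter and sort on them like ordinary columns.
rows = conn.execute("""
    SELECT uID, sID, ts
    FROM (SELECT a.userID AS uID,
                 (SELECT u.siteID FROM users u WHERE u.userID = a.userID) AS sID,
                 a.timestamp AS ts
          FROM actions a) q
    WHERE sID IN (SELECT siteID FROM sites WHERE foo = 'bar')
    ORDER BY ts DESC
    LIMIT 100
""").fetchall()

print(rows)
```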
I don't know whether this was possible 11 years ago, but I found HAVING the easiest way (note that it has to come before ORDER BY and LIMIT):
select `userID` as uID,
(select `siteID` from `users` where `userID` = uID) as `sID`
from `actions`
HAVING `sID` in (select `siteID` from `sites` where `foo` = "bar")
order by `timestamp` desc limit 100