`INTERSECT` does not return anything from two tables, although each query returns values fine on its own

I'm not sure what I am doing wrong here. I haven't touched SQL queries for several years, and the MSSQL query language is a bit strange to me, but after 30 minutes of googling I still cannot find the answer.
Problem
I have two queries that work perfectly fine:
SELECT COUNT(*) AS 'NumberOfAccounts' FROM Accounts
SELECT COUNT(*) AS 'NumberOfUsers' FROM Users
I need to get this information in one go in my API response since I don't want to execute two statements. How can I combine them into one query so that it returns a table as follows:
+------------------+---------------+
| NumberOfAccounts | NumberOfUsers |
+------------------+---------------+
| 10 | 16 |
+------------------+---------------+
What I have tried
UNION
SELECT COUNT(*) AS 'NumberOfAccounts' FROM Accounts UNION SELECT COUNT(*) AS 'NumberOfUsers' FROM Users
This gives me the results of both tables, but it pushes both values into the NumberOfAccounts column, so I cannot parse the result.
+------------------+
| NumberOfAccounts |
+------------------+
| 10 |
| 16 |
+------------------+
INTERSECT
SELECT COUNT(*) AS 'NumberOfAccounts' FROM Accounts INTERSECT SELECT COUNT(*) AS 'NumberOfUsers' FROM Users
This just gives me an empty result with only the NumberOfAccounts column in it.

You can just put these as subqueries in a select:
SELECT (SELECT COUNT(*) FROM Accounts) as NumberOfAccounts,
(SELECT COUNT(*) FROM Users) as NumberOfUsers
In SQL Server, no FROM clause is needed.
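A minimal sketch of the scalar-subquery approach above, run against an in-memory SQLite database rather than SQL Server (SQLite also accepts a SELECT without a FROM clause). The table contents and the helper name `count_both` are assumptions made for illustration only.

```python
import sqlite3

def count_both(conn):
    # Two independent scalar subqueries produce one row with two columns.
    return conn.execute(
        """
        SELECT (SELECT COUNT(*) FROM Accounts) AS NumberOfAccounts,
               (SELECT COUNT(*) FROM Users)    AS NumberOfUsers
        """
    ).fetchone()

# Hypothetical sample data: 10 accounts and 16 users, as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Accounts (id) VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO Users (id) VALUES (?)", [(i,) for i in range(1, 17)])

print(count_both(conn))  # (10, 16)
```

Because each subquery is evaluated on its own, the same shape should extend to a third or fourth count without touching the others.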

UNION is the wrong usage here. Union will "merge" rows of identical tables (or identical selects) and not columns.
One solution might be:
SELECT AccountCount, UserCount FROM
(SELECT COUNT(*) AS AccountCount, 1 AS Id FROM Accounts) AS a
JOIN
(SELECT COUNT(*) AS UserCount, 1 as Id FROM Users) AS u ON (a.Id = u.Id)
Be aware of the artificial surrogate key 1 you need to insert to join both sub-selects together.
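To see why the two attempts in the question behaved as they did, here is a small sketch using an in-memory SQLite database (an assumption; the question targets SQL Server, but the set operators behave the same way here). The helper `run_set_operator` and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Accounts VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO Users VALUES (?)", [(i,) for i in range(1, 17)])

def run_set_operator(conn, operator):
    """Combine the two COUNT(*) queries with UNION or INTERSECT."""
    cur = conn.execute(
        "SELECT COUNT(*) AS NumberOfAccounts FROM Accounts "
        f"{operator} "
        "SELECT COUNT(*) AS NumberOfUsers FROM Users"
    )
    # Column names of a compound query come from the first SELECT.
    columns = [d[0] for d in cur.description]
    return columns, sorted(cur.fetchall())

# UNION stacks the rows: two rows, one column named after the first SELECT.
print(run_set_operator(conn, "UNION"))
# INTERSECT keeps only rows present in both results; 10 and 16 differ, so nothing remains.
print(run_set_operator(conn, "INTERSECT"))
```

INTERSECT would return a row only if both tables happened to contain the same number of rows, which is why it looked empty.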

For completeness' sake, with UNION ALL you'd do:
SELECT 'NumberOfAccounts' AS what, COUNT(*) AS howmany FROM accounts
UNION ALL
SELECT 'NumberOfUsers' AS what, COUNT(*) AS howmany FROM users;
which results in
+------------------+---------+
| what | howmany |
+------------------+---------+
| NumberOfAccounts | 10 |
| NumberOfUsers | 16 |
+------------------+---------+

And another variation:
WITH cte AS
(
SELECT COUNT(*) AS cntAccounts, 0 AS cntUsers FROM accounts
UNION ALL
SELECT 0 AS cntAccounts, COUNT(*) AS cntUsers FROM users
)
SELECT
SUM(cntAccounts) AS NumberOfAccounts
,SUM(cntUsers ) AS NumberOfUsers
FROM cte

If you want (need) better performance you can get the row counts from the following query which uses sys.dm_db_partition_stats to get the row counts:
SELECT (
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('Accounts')
AND (index_id=0 or index_id=1)) NumberOfAccounts,
(
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('Users')
AND (index_id=0 or index_id=1)) NumberOfUsers

Related

Split record into 2 records with distinct values based on a unique id

I have a table with some IDs that correspond to duplicate data that i would like to get rid of. They are linked by a groupid number. Currently my data looks like this:
|GroupID|NID1 |NID2 |
|S1 |644763|643257|
|T2 |4759 |84689 |
|W3 |96676 |585876|
In order for the software to run, I need the data in the following format:
|GroupID|NID |
|S1 |644763|
|S1 |643257|
|T2 |4759 |
|T2 |84689 |
|W3 |96676 |
|W3 |585876|
Thank you for your time.
You want union all:
select groupid, nid1 as nid
from t
union all -- use "union" instead if you don't want duplicate rows
select groupid, nid2
from t;
In Oracle 12C+, you can use lateral joins:
select t.groupid, v.nid
from t cross apply
(select t.nid1 as nid from dual union all
select t.nid2 as nid from dual
) v;
This is more efficient than union all because it only scans the table once.
You can also express this as:
select t.groupid,
(case when n.n = 1 then t.nid1 when n.n = 2 then t.nid2 end) as nid
from t cross join
(select 1 as n from dual union all select 2 from dual) n;
A little more complicated, but still only one scan of the table.
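The UNION ALL and CROSS JOIN variants above can be compared side by side. This is only a sketch on an in-memory SQLite database (so `dual` is dropped, and `cross apply` is not shown because SQLite does not support it); the table name `t` follows the answer, and the helper `unpivot` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (groupid TEXT, nid1 INTEGER, nid2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("S1", 644763, 643257), ("T2", 4759, 84689), ("W3", 96676, 585876)],
)

# Variant 1: read the table twice, once per source column.
UNION_ALL_SQL = """
SELECT groupid, nid1 AS nid FROM t
UNION ALL
SELECT groupid, nid2 FROM t
"""

# Variant 2: read the table once and duplicate each row against a two-row helper.
CROSS_JOIN_SQL = """
SELECT t.groupid,
       CASE WHEN n.n = 1 THEN t.nid1 WHEN n.n = 2 THEN t.nid2 END AS nid
FROM t CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) AS n
"""

def unpivot(conn, sql):
    # Sort in Python so the comparison does not depend on row order.
    return sorted(conn.execute(sql).fetchall())

print(unpivot(conn, UNION_ALL_SQL))
print(unpivot(conn, CROSS_JOIN_SQL))
```

Both queries should yield the same six rows; which one is faster depends on the engine and the table size.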

SQL Sort table by number of items in common

I have 3 tables, user, artist and a join table.
For a particular user, I'd like to order the rest of the user table by the number of artists they have in common with that user in the join table, or potentially just find the n other users who have the most in common with them.
For example in the table:
userID | artistID
-------|----------
1 | 1
1 | 2
2 | 1
2 | 2
3 | 1
I'd want to get that the ordering for user1 would be (2,3) because user2 shares both artist1 and artist2 with user1, whereas user3 only shares artist1.
Is there a way to get this from SQL?
Thanks
Assuming that you always know the user ID you want to check against, you can also do the following:
SELECT user, count(*) as in_common
FROM user_artist
WHERE
user<>1 AND
artist IN (SELECT artist FROM user_artist WHERE user=1)
GROUP BY user
ORDER BY in_common DESC;
This avoids a join, which might perform better on a large table. Your example is on SQL Fiddle here
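A runnable sketch of the IN-subquery ranking above, using the sample rows from the question in an in-memory SQLite database. The column names `userID` and `artistID` follow the question's table rather than the answer's `user` and `artist`, and the function name `rank_by_common_artists` plus the tie-breaking sort on `userID` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_artist (userID INTEGER, artistID INTEGER)")
conn.executemany(
    "INSERT INTO user_artist VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)],
)

def rank_by_common_artists(conn, user_id):
    """Other users ordered by how many artists they share with user_id."""
    return conn.execute(
        """
        SELECT userID, COUNT(*) AS in_common
        FROM user_artist
        WHERE userID <> ?
          AND artistID IN (SELECT artistID FROM user_artist WHERE userID = ?)
        GROUP BY userID
        ORDER BY in_common DESC, userID
        """,
        (user_id, user_id),
    ).fetchall()

print(rank_by_common_artists(conn, 1))  # [(2, 2), (3, 1)]
```

Users with no artist in common do not appear at all with this query; the self-join with LEFT JOIN is the variant to reach for if they should be listed with a count of 0.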
You can do this with a self-join and aggregation:
select ua.userID, count(ua1.artistID) as numInCommonWithUser1
from userartists ua left join
userartists ua1
on ua.artistID = ua1.artistID and ua1.userID = 1
group by ua.userID
order by numInCommonWithUser1 desc;
If you know the user ID you are going to check, then this query will meet your requirement and also perform very well.
SELECT ua1.user, count(*) as all_Common
FROM user_artist ua1
WHERE
(
Select count(*)
From user_artist ua2
Where ua2.user=1
AND ua2.artist=ua1.artist
)>0
AND ua1.user <> 1
GROUP BY ua1.user
ORDER BY all_Common DESC;
Let me know if you have any questions!

Slightly different greatest-n-per-group

I have read this comment which explains the greatest-n-per-group problem and its solution. Unfortunately, I am facing a slightly different problem, and I am failing to find a solution for it.
Let's suppose I have a table with some basic info regarding users. Due to implementation, this info may or may not repeat itself:
+----+-------------------+----------------+---------------+
| id | user_name | user_name_hash | address |
+----+-------------------+----------------+---------------+
| 1 | peter_jhones | 0xFF321345 | Some Av |
| 2 | sally_whiterspoon | 0x98AB5454 | Certain St |
| 3 | mark_jackobson | 0x0102AB32 | Some Av |
| 4 | mark_jackobson | 0x0102AB32 | Particular St |
+----+-------------------+----------------+---------------+
As you can see, mark_jackobson appears twice, although its address is different in each appearance.
Every now and then, an ETL process queries new user_names and fetches the most recent record of each. Afterwards, it stores the user_name_hash in a table to signal that it has already imported that user_name:
+----------------+
| user_name_hash |
+----------------+
| 0xFF321345 |
| 0x98AB5454 |
+----------------+
Everything begins with the following query:
SELECT DISTINCT user_name_hash
FROM my_table
EXCEPT
SELECT user_name_hash
FROM my_hash_table
This way, I am able to select the new hashes from my table. Since I need to query the most recent occurrence of a hash, I wrap it as a sub-query:
SELECT MAX(id)
FROM my_table
WHERE user_name_hash IN (
SELECT DISTINCT user_name_hash
FROM my_table
EXCEPT
SELECT user_name_hash
FROM my_hash_table)
GROUP BY user_name_hash
Perfect! With the ids of my new users, I can query the addresses as follows:
SELECT
address,
user_name_hash
FROM my_table
WHERE Id IN (
SELECT MAX(id)
FROM my_table
WHERE user_name_hash IN (
SELECT DISTINCT user_name_hash
FROM my_table
EXCEPT
SELECT user_name_hash
FROM my_hash_table)
GROUP BY user_name_hash)
From my perspective, the above query works, but it does not seem optimal. Reading this comment, I noticed I could query the same data using joins. Since I am failing to write the desired query, could anyone help me out and point me in the right direction?
This is the query I have attempted, without success.
SELECT
tb1.address,
tb1.user_name_hash
FROM my_table tb1
INNER JOIN my_table tb2
ON tb1.user_name_hash = tb2.user_name_hash
LEFT JOIN my_hash_table ht
ON tb1.user_name_hash = ht.user_name_hash AND tb1.id > tb2.id
WHERE ht.user_name_hash IS NULL;
Thanks in advance.
EDIT > I am working with PostgreSQL
I believe you are looking for something like this:
SELECT
address,
user_name_hash
FROM my_table t1
JOIN (
SELECT MAX(id) maxid
FROM my_table t2
WHERE NOT EXISTS (
SELECT 1
FROM my_hash_table t3
WHERE t2.user_name_hash = t3.user_name_hash
)
GROUP BY user_name_hash
) t ON t1.ID = t.maxid
I'm using NOT EXISTS instead of EXCEPT since it is more clear to the optimizer.
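The NOT EXISTS query above can be tried on the question's sample rows. This sketch uses an in-memory SQLite database instead of PostgreSQL (an assumption), stores the hashes as text, and wraps the query in a hypothetical helper `newest_unimported`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, user_name TEXT, "
    "user_name_hash TEXT, address TEXT)"
)
conn.execute("CREATE TABLE my_hash_table (user_name_hash TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "peter_jhones", "0xFF321345", "Some Av"),
        (2, "sally_whiterspoon", "0x98AB5454", "Certain St"),
        (3, "mark_jackobson", "0x0102AB32", "Some Av"),
        (4, "mark_jackobson", "0x0102AB32", "Particular St"),
    ],
)
conn.executemany(
    "INSERT INTO my_hash_table VALUES (?)", [("0xFF321345",), ("0x98AB5454",)]
)

def newest_unimported(conn):
    """Latest row per hash, restricted to hashes not yet in my_hash_table."""
    return conn.execute(
        """
        SELECT address, user_name_hash
        FROM my_table t1
        JOIN (
            SELECT MAX(id) AS maxid
            FROM my_table t2
            WHERE NOT EXISTS (
                SELECT 1 FROM my_hash_table t3
                WHERE t2.user_name_hash = t3.user_name_hash
            )
            GROUP BY user_name_hash
        ) t ON t1.id = t.maxid
        """
    ).fetchall()

print(newest_unimported(conn))  # [('Particular St', '0x0102AB32')]
```

Only mark_jackobson's hash is missing from the hash table, and of his two rows the one with the larger id is returned.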
You can get better performance using a left outer join (to get the newest records not already imported) and then computing the max id for these records (subquery in the HAVING clause).
SELECT t1.address,
t1.user_name_hash,
MAX(id) AS maxid
FROM my_table t1
LEFT JOIN my_hash_table th ON t1.user_name_hash = th.user_name_hash
WHERE th.user_name_hash IS NULL
GROUP BY t1.address,
t1.user_name_hash
HAVING MAX(id) = (SELECT MAX(t2.id)
FROM my_table t2
WHERE t2.user_name_hash = t1.user_name_hash)

How can you get a histogram of counts from a join table without using a subquery?

I have a lot of tables that look like this: (id, user_id, object_id). I am often interested in the question "how many users have one object? how many have two? etc." and would like to see the distribution.
The obvious answer to this looks like:
select x.ucount, count(*)
from (select count(*) as ucount from objects_users group by user_id) as x
group by x.ucount
order by x.ucount;
This produces results like:
ucount | count
-------|-------
1 | 15
2 | 17
3 | 23
4 | 104
5 | 76
7 | 12
Using a subquery here feels inelegant to me and I'd like to figure out how to produce the same result without. Further, if the question you're trying to ask is slightly more complicated it gets messy passing more information out of the subquery. For example, if you want the data further grouped by the user's creation date:
select
x.ucount,
(select cdate from users where id = x.user_id) as cdate,
count(*)
from (
select user_id, count(*) as ucount
from objects_users group by user_id
) as x
group by cdate, x.ucount
order by cdate, x.ucount;
Is there some way to avoid the explosion of subqueries? I suppose in the end my objection is aesthetic, but it makes the queries hard to read and hard to write.
I think a subquery is exactly the appropriate way to do this, regardless of your RDBMS. Why would it be inelegant?
For the second query, just join the users table like this:
SELECT
x.ucount,
u.cdate,
COUNT(*)
FROM (
SELECT
user_id,
COUNT(*) AS ucount
FROM objects_users
GROUP BY user_id
) AS x
LEFT JOIN users AS u
ON x.user_id = u.id
GROUP BY u.cdate, x.ucount
ORDER BY u.cdate, x.ucount
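A small sketch of the joined version above on an in-memory SQLite database. The sample users, dates, and the helper name `histogram_by_cdate` are made up for illustration; only the query shape comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, cdate TEXT)")
conn.execute(
    "CREATE TABLE objects_users (id INTEGER PRIMARY KEY, user_id INTEGER, object_id INTEGER)"
)
# Hypothetical data: users 1 and 2 were created on the same day and own two
# objects each; user 3 was created a day later and owns one object.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "2020-01-01"), (2, "2020-01-01"), (3, "2020-01-02")],
)
conn.executemany(
    "INSERT INTO objects_users VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 11), (3, 2, 10), (4, 2, 12), (5, 3, 10)],
)

def histogram_by_cdate(conn):
    """Rows of (objects per user, creation date, number of such users)."""
    return conn.execute(
        """
        SELECT x.ucount, u.cdate, COUNT(*)
        FROM (
            SELECT user_id, COUNT(*) AS ucount
            FROM objects_users
            GROUP BY user_id
        ) AS x
        LEFT JOIN users AS u ON x.user_id = u.id
        GROUP BY u.cdate, x.ucount
        ORDER BY u.cdate, x.ucount
        """
    ).fetchall()

print(histogram_by_cdate(conn))  # [(2, '2020-01-01', 2), (1, '2020-01-02', 1)]
```

The inner query still does the per-user count; the join simply replaces the correlated lookup of `cdate`.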

return count 0 with mysql group by

database table like this
============================
= suburb_id | value
= 1 | 2
= 1 | 3
= 2 | 4
= 3 | 5
query is
SELECT COUNT(suburb_id) AS total, suburb_id
FROM suburbs
where suburb_id IN (1,2,3,4)
GROUP BY suburb_id
However, when I run this query, it doesn't give COUNT(suburb_id) = 0 when suburb_id = 4.
Because the suburbs table has no row with suburb_id 4, I want this query to return 0 for suburb_id = 4, like
============================
= total | suburb_id
= 2 | 1
= 1 | 2
= 1 | 3
= 0 | 4
A GROUP BY needs rows to work with, so if you have no rows for a certain category, you are not going to get the count. Think of the where clause as limiting down the source rows before they are grouped together. The where clause is not providing a list of categories to group by.
What you could do is write a query to select the categories (suburbs) then do the count in a subquery. (I'm not sure what MySQL's support for this is like)
Something like:
SELECT
s.suburb_id,
(select count(*) from suburb_data d where d.suburb_id = s.suburb_id) as total
FROM
suburb_table s
WHERE
s.suburb_id in (1,2,3,4)
(MSSQL, apologies)
This:
SELECT id, COUNT(suburb_id)
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
) ids
LEFT JOIN
suburbs s
ON s.suburb_id = ids.id
GROUP BY
id
or this:
SELECT id,
(
SELECT COUNT(*)
FROM suburbs
WHERE suburb_id = id
)
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
) ids
This article compares performance of the two approaches: Aggregates: subqueries vs. GROUP BY, though it does not matter much in your case, as you are querying only 4 records.
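The LEFT JOIN variant above, sketched on the question's sample rows with an in-memory SQLite database standing in for MySQL (an assumption). The helper name `counts_with_zeroes` and the added ORDER BY are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suburbs (suburb_id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO suburbs VALUES (?, ?)", [(1, 2), (1, 3), (2, 4), (3, 5)]
)

def counts_with_zeroes(conn):
    """Count rows per requested id, keeping ids that have no rows at all."""
    return conn.execute(
        """
        SELECT ids.id, COUNT(s.suburb_id) AS total
        FROM (
            SELECT 1 AS id
            UNION ALL SELECT 2
            UNION ALL SELECT 3
            UNION ALL SELECT 4
        ) AS ids
        LEFT JOIN suburbs s ON s.suburb_id = ids.id
        GROUP BY ids.id
        ORDER BY ids.id
        """
    ).fetchall()

print(counts_with_zeroes(conn))  # [(1, 2), (2, 1), (3, 1), (4, 0)]
```

COUNT(s.suburb_id) ignores the NULL produced by the unmatched LEFT JOIN row, which is what turns the missing suburb into a 0; COUNT(*) would report 1 there instead.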
Query:
select case
when total is null then 0
else total
end as total_with_zeroes,
suburb_id
from (SELECT COUNT(suburb_id) AS total, suburb_id
FROM suburbs
where suburb_id IN (1,2,3,4)
GROUP BY suburb_id) as dt
@geofftnz's solution works great if all conditions are simple like in this case. But I just had to solve a similar problem to generate a report where each column in the report is a different query. When you need to combine results from several select statements, then something like this might work.
You may have to programmatically create this query. Using left joins allows the query to return rows even if there are no matches to suburb_id with a given id. If your db supports it (which most do), you can use IFNULL to replace null with 0:
select IFNULL(a.count,0), IFNULL(b.count,0), IFNULL(c.count,0), IFNULL(d.count,0)
from (select 1 as one) base -- single driver row so every left join has something to attach to
left join (select count(suburb_id) as count from suburbs where suburb_id=1 group by suburb_id) a on 1=1
left join (select count(suburb_id) as count from suburbs where suburb_id=2 group by suburb_id) b on 1=1
left join (select count(suburb_id) as count from suburbs where suburb_id=3 group by suburb_id) c on 1=1
left join (select count(suburb_id) as count from suburbs where suburb_id=4 group by suburb_id) d on 1=1;
The nice thing about this is that (if needed) each "left join" can use slightly different (possibly fairly complex) query.
Disclaimer: for large data sets, this type of query might not perform very well (I don't write enough sql to know without investigating further), but at least it should give useful results ;-)