Merging queries with common group by parameters - sql

I have three SQL queries built, which act on a set of about eight tables. Each of the three has the same parameter in its group by clause. I'm trying to merge the three queries, to generate a single output table, though everything I try seems to break the working base queries.
I would like an output that looks like;
[heading] [query1 output column] [query2 output column] [query3 output column]
with the [heading] column being the existing parameter used in the group by clause for the three existing queries.
I realize I'm demonstrating my newbness here, though I'm stuck, and have gone around in the same erroneous circle way too many times....
EDIT *
I thought the code snip would be confusing, though here's a slightly abbreviated version;
SELECT
TITLE_COUNT.index,
COUNT (TITLE_COUNT.TTOTAL) AS TITLES,
COUNT (AUTHOR_COUNT.TTOTAL) AS AUTHOR
FROM(
SELECT
index, title, date,
COUNT (*) AS TTOTAL
FROM (SELECT DISTINCT index, title, date FROM TableP P) P
GROUP BY
index, title, date
ORDER BY
index
) AS TITLE_COUNT,
(
SELECT
index,
COUNT (*) AS TTOTAL
FROM (SELECT DISTINCT index, FROM TableM M) M
GROUP BY
index
ORDER BY
index
) AS AUTHOR_COUNT
WHERE
TITLE_COUNT.index = AUTHOR_COUNT.index
GROUP BY
TITLE_COUNT.index
;
My problem with the above is, the count columns in the output tables have been multiplied. For example, stand along the queries give me something like
TITLE
index count
001 12
002 10
003 15
AUTHOR
index count
001 2
002 4
003 6
Though my query above results in
001 24
002 40
003 90

Don't double count:
SELECT
TITLE_COUNT.index,
TITLE_COUNT.TTOTAL AS TITLES,
AUTHOR_COUNT.TTOTAL AS AUTHOR
FROM(
SELECT
index, title, date,
COUNT (*) AS TTOTAL
FROM (SELECT DISTINCT index, title, date FROM TableP P) P
GROUP BY
index, title, date
ORDER BY
index
) AS TITLE_COUNT
FULL OUTER JOIN (
SELECT
index,
COUNT (*) AS TTOTAL
FROM (SELECT DISTINCT index, FROM TableM M) M
GROUP BY
index
ORDER BY
index
) AS AUTHOR_COUNT
ON TITLE_COUNT.index = AUTHOR_COUNT.index
GROUP BY
TITLE_COUNT.index

Related

SQL - Count Results of 2 Columns

I have the following table which contains ID's and UserId's.
ID UserID
1111 11
1111 300
1111 51
1122 11
1122 22
1122 3333
1122 45
I'm trying to count the distinct number of 'IDs' so that I get a total, but I also need to get a total of ID's that have also seen the that particular ID as well... To get the ID's, I've had to perform a subquery within another table to get ID's, I then pass this into the main query... Now I just want the results to be displayed as follows.
So I get a Total No for ID and a Total Number for Users ID - Also would like to add another column to get average as well for each ID
TotalID Total_UserID Average
2 7 3.5
If Possible I would also like to get an average as well, but not sure how to calculate that. So I would need to count all the 'UserID's for an ID add them altogether and then find the AVG. (Any Advice on that caluclation would be appreciated.)
Current Query.
SELECT DISTINCT(a.ID)
,COUNT(b.UserID)
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
GROUP BY a.ID
Which then Lists all the IDs and COUNT's all the USERID.. I would like a total of both columns. I've tried warpping the query in a
SELECT COUNT(*) FROM (
but this only counts the ID's which is great, but how do I count the USERID column as well
You seem to want this:
SELECT COUNT(DISTINCT a.ID), COUNT(b.UserID),
COUNT(b.UserID) * 1.0 / COUNT(DISTINCT a.ID)
FROM a INNER JOIN
b
ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999);
Note: DISTINCT is not a function. It applies to the whole row, so it is misleading to put an expression in parentheses after it.
Also, the GROUP BY is unnecessary.
The 1.0 is because SQL Server does integer arithmetic and this is a simple way to convert a number to a decimal form.
You can use
SELECT COUNT(DISTINCT a.ID) ...
to count all distinct values
Read details here
I believe you want this:
select TotalID,
Total_UserID,
sum(Total_UserID+TotalID) as Total,
Total_UserID/TotalID as Average
from (
SELECT (DISTINCT a.ID) as TotalID
,COUNT(b.UserID) as Total_UserID
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
) x

SQL query with grouping and MAX

I have a table that looks like the following but also has more columns that are not needed for this instance.
ID DATE Random
-- -------- ---------
1 4/12/2015 2
2 4/15/2015 2
3 3/12/2015 2
4 9/16/2015 3
5 1/12/2015 3
6 2/12/2015 3
ID is the primary key
Random is a foreign key but i am not actually using table it points to.
I am trying to design a query that groups the results by Random and Date and select the MAX Date within the grouping then gives me the associated ID.
IF i do the following query
select top 100 ID, Random, MAX(Date) from DateBase group by Random, Date, ID
I get duplicate Randoms since ID is the primary key and will always be unique.
The results i need would look something like this
ID DATE Random
-- -------- ---------
2 4/15/2015 2
4 9/16/2015 3
Also another question is there could be times where there are many of the same date. What will MAX do in that case?
You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE s.random = t.random
AND s.date > t.date)
This will select only those who doesn't have a bigger date for corresponding random value.
Can also be done using IN() :
SELECT * FROM YourTable t
WHERE (t.random,t.date) in (SELECT s.random,max(s.date)
FROM YourTable s
GROUP BY s.random)
Or with a join:
SELECT t.* FROM YourTable t
INNER JOIN (SELECT s.random,max(s.date) as max_date
FROM YourTable s
GROUP BY s.random) tt
ON(t.date = tt.max_date and s.random = t.random)
In SQL Server you could do something like the following,
select a.* from DateBase a inner join
(select Random,
MAX(dt) as dt from DateBase group by Random) as x
on a.dt =x.dt and a.random = x.random
This method will work in all versions of SQL as there are no vendor specifics (you'll need to format the dates using your vendor specific syntax)
You can do this in two stages:
The first step is to work out the max date for each random:
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
Now you can join back onto your table to get the max ID for each combination:
SELECT MAX(e.ID) AS ID
,e.DateField AS DateField
,e.Random
FROM Example AS e
INNER JOIN (
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
) data
ON data.MaxDateField = e.DateField
AND data.Random = e.Random
GROUP BY DateField, Random
SQL Fiddle example here: SQL Fiddle
To answer your second question:
If there are multiples of the same date, the MAX(e.ID) will simply choose the highest number. If you want the lowest, you can use MIN(e.ID) instead.

Compare data from query result to different table data in PostgreSQL

STEP 1
Select data1,name,phone,address from dummyTable limit 4;
From above query, I will get the following result for example:
data1 | name | phone | address
fgh | hjk | 567...| CA
ghjkk | jkjii| 555...| NY
Now, after having the above result I am suppose to match data1 records that I got from above query to existing another table in a database called existingTable which has a same column called data1 in it. If the result above gives data1 value as 'fgh' so I take that 'fgh' and compare with that existingtable column called data1.
STEP 2
Next, after I am finished comparing, I need to apply some condition as follows:
if((results.data1.value).equals(existingTable.data1.value))
then count --
else
count++
So by above condition I am trying to explain, that if the value I got from the result is matched then I do count decrement by 1 and if not then count is incremented by 1.
Summary
I basically wanted to achieve this in one single query, is it possible using PostgreSQL?
I think you can translate that to a simple query:
SELECT d.data1, d.name, d.phone, d.address
, count(*) - 2 * count(e.data1)
FROM (
SELECT data1, name, phone, address
FROM dummytable
-- ORDER BY ???
LIMIT 4
) d
LEFT JOIN existingtable e USING (data1)
GROUP BY d.data1, d.name, d.phone, d.address;
The major ingredient is the LEFT [OUTER] JOIN. Follow the link to the manual.
count(*) counts all rows from dummytable.
count(e.data1) only counts rows from existingtable where a matching data1 exists (count() does not count NULL values). I subtract that twice to match your formula.
About ORDER BY: There is no natural order in a database table. You need to order by something to get predictable results.
If there can be duplicates in existingtable but you want to count every distinct data1 only once, eliminate dupes before you join or use an EXISTS semi-join:
SELECT data1, name, phone, address
, count(*) - 2 * count(EXISTS (
SELECT 1 FROM existingtable e
WHERE e.data1 = d.data1) OR NULL)
FROM (
SELECT data1, name, phone, address
FROM dummytable
-- ORDER BY ???
LIMIT 4
) d
GROUP BY data1, name, phone, address;
The last count works because (TRUE OR NULL) IS TRUE, but (FALSE OR NULL) IS NULL.

return count 0 with mysql group by

database table like this
============================
= suburb_id | value
= 1 | 2
= 1 | 3
= 2 | 4
= 3 | 5
query is
SELECT COUNT(suburb_id) AS total, suburb_id
FROM suburbs
where suburb_id IN (1,2,3,4)
GROUP BY suburb_id
however, while I run this query, it doesn't give COUNT(suburb_id) = 0 when suburb_id = 0
because in suburbs table, there is no suburb_id 4, I want this query to return 0 for suburb_id = 4, like
============================
= total | suburb_id
= 2 | 1
= 1 | 2
= 1 | 3
= 0 | 4
A GROUP BY needs rows to work with, so if you have no rows for a certain category, you are not going to get the count. Think of the where clause as limiting down the source rows before they are grouped together. The where clause is not providing a list of categories to group by.
What you could do is write a query to select the categories (suburbs) then do the count in a subquery. (I'm not sure what MySQL's support for this is like)
Something like:
SELECT
s.suburb_id,
(select count(*) from suburb_data d where d.suburb_id = s.suburb_id) as total
FROM
suburb_table s
WHERE
s.suburb_id in (1,2,3,4)
(MSSQL, apologies)
This:
SELECT id, COUNT(suburb_id)
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
) ids
LEFT JOIN
suburbs s
ON s.suburb_id = ids.id
GROUP BY
id
or this:
SELECT id,
(
SELECT COUNT(*)
FROM suburb
WHERE suburb_id = id
)
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
) ids
This article compares performance of the two approaches:
Aggregates: subqueries vs. GROUP BY
, though it does not matter much in your case, as you are querying only 4 records.
Query:
select case
when total is null then 0
else total
end as total_with_zeroes,
suburb_id
from (SELECT COUNT(suburb_id) AS total, suburb_id
FROM suburbs
where suburb_id IN (1,2,3,4)
GROUP BY suburb_id) as dt
#geofftnz's solution works great if all conditions are simple like in this case. But I just had to solve a similar problem to generate a report where each column in the report is a different query. When you need to combine results from several select statements, then something like this might work.
You may have to programmatically create this query. Using left joins allows the query to return rows even if there are no matches to suburb_id with a given id. If your db supports it (which most do), you can use IFNULL to replace null with 0:
select IFNULL(a.count,0), IFNULL(b.count,0), IFNULL(c.count,0), IFNULL(d.count,0)
from (select count(suburb_id) as count from suburbs where id=1 group by suburb_id) a,
left join (select count(suburb_id) as count from suburbs where id=2 group by suburb_id) b on a.suburb_id=b.suburb_id
left join (select count(suburb_id) as count from suburbs where id=3 group by suburb_id) c on a.suburb_id=c.suburb_id
left join (select count(suburb_id) as count from suburbs where id=4 group by suburb_id) d on a.suburb_id=d.suburb_id;
The nice thing about this is that (if needed) each "left join" can use slightly different (possibly fairly complex) query.
Disclaimer: for large data sets, this type of query might have not perform very well (I don't write enough sql to know without investigating further), but at least it should give useful results ;-)

Select values in SQL that do not have other corresponding values except those that i search for

I have a table in my database:
Name | Element
1 2
1 3
4 2
4 3
4 5
I need to make a query that for a number of arguments will select the value of Name that has on the right side these and only these values.
E.g.:
arguments are 2 and 3, the query should return only 1 and not 4 (because 4 also has 5). For arguments 2,3,5 it should return 4.
My query looks like this:
SELECT name FROM aggregations WHERE (element=2 and name in (select name from aggregations where element=3))
What do i have to add to this query to make it not return 4?
A simple way to do it:
SELECT name
FROM aggregations
WHERE element IN (2,3)
GROUP BY name
HAVING COUNT(element) = 2
If you want to add more, you'll need to change both the IN (2,3) part and the HAVING part:
SELECT name
FROM aggregations
WHERE element IN (2,3,5)
GROUP BY name
HAVING COUNT(element) = 3
A more robust way would be to check for everything that isn't not in your set:
SELECT name
FROM aggregations
WHERE NOT EXISTS (
SELECT DISTINCT a.element
FROM aggregations a
WHERE a.element NOT IN (2,3,5)
AND a.name = aggregations.name
)
GROUP BY name
HAVING COUNT(element) = 3
It's not very efficient, though.
Create a temporary table, fill it with your values and query like this:
SELECT name
FROM (
SELECT DISTINCT name
FROM aggregations
) n
WHERE NOT EXISTS
(
SELECT 1
FROM (
SELECT element
FROM aggregations aii
WHERE aii.name = n.name
) ai
FULL OUTER JOIN
temptable tt
ON tt.element = ai.element
WHERE ai.element IS NULL OR tt.element IS NULL
)
This is more efficient than using COUNT(*), since it will stop checking a name as soon as it finds the first row that doesn't have a match (either in aggregations or in temptable)
This isn't tested, but usually I would do this with a query in my where clause for a small amount of data. Note that this is not efficient for large record counts.
SELECT ag1.Name FROM aggregations ag1
WHERE ag1.Element IN (2,3)
AND 0 = (select COUNT(ag2.Name)
FROM aggregatsions ag2
WHERE ag1.Name = ag2.Name
AND ag2.Element NOT IN (2,3)
)
GROUP BY ag1.name;
This says "Give me all of the names that have the elements I want, but have no records with elements I don't want"