How can I write this SQL in a better way? - sql

This is my query, and I'm trying to write it in a better way.
The task: calculate the average number of languages per country in each region.
CREATE TABLE region3 AS SELECT regions.name, count(country_languages.country_id) FROM regions
RIGHT OUTER JOIN countries on countries.region_id = regions.region_id
RIGHT OUTER JOIN country_languages on countries.country_id = country_languages.country_id
GROUP BY regions.name;
CREATE TABLE region2 AS SELECT regions.name, count(countries.country_id) FROM regions
RIGHT OUTER JOIN countries on countries.region_id = regions.region_id
GROUP BY regions.name;
CREATE TABLE regions_new AS SELECT region2.name, region2.count as total_countries, region3.count as langs from region2
LEFT OUTER JOIN region3 on region2.name = region3.name;
SELECT name, ROUND(langs::decimal/total_countries, 1) as avg_lang_count_per_country from regions_new ORDER BY avg_lang_count_per_country DESC;
This is how it should look.

I guess it is as simple as:
SELECT regions.name, AVG(country_language_count) AS average_country_language_count
FROM regions
JOIN (
SELECT countries.region_id, COUNT(*) AS country_language_count
FROM countries
JOIN country_languages on countries.country_id = country_languages.country_id
GROUP BY countries.country_id, countries.region_id
) AS subquery1 ON regions.region_id = subquery1.region_id
GROUP BY regions.name
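To check the subquery approach above on something concrete, here is a minimal sketch that runs it against an in-memory SQLite database. The sample rows are invented for illustration, and the table layout is only assumed from the column names used in the question.

```python
import sqlite3

# Invented sample data; the schema is assumed from the column names in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE countries (country_id INTEGER PRIMARY KEY, region_id INTEGER);
CREATE TABLE country_languages (country_id INTEGER, language TEXT);
INSERT INTO regions VALUES (1, 'North'), (2, 'South');
INSERT INTO countries VALUES (10, 1), (11, 1), (12, 1), (20, 2), (21, 2);
INSERT INTO country_languages VALUES
    (10, 'a'), (10, 'b'), (10, 'c'),                       -- country 10: 3 languages
    (11, 'a'),                                             -- country 11: 1 language
    (20, 'a'), (20, 'b'),                                  -- country 20: 2 languages
    (21, 'a'), (21, 'b'), (21, 'c'), (21, 'd'), (21, 'e'); -- country 21: 5 languages
-- country 12 has no languages at all
""")

# Inner query: one row per country with its language count.
# Outer query: average of those counts per region.
avg_rows = conn.execute("""
SELECT regions.name, AVG(country_language_count) AS average_country_language_count
FROM regions
JOIN (
    SELECT countries.region_id, COUNT(*) AS country_language_count
    FROM countries
    JOIN country_languages ON countries.country_id = country_languages.country_id
    GROUP BY countries.country_id, countries.region_id
) AS subquery1 ON regions.region_id = subquery1.region_id
GROUP BY regions.name
ORDER BY regions.name
""").fetchall()

print(avg_rows)  # [('North', 2.0), ('South', 3.5)]
```

Note that country 12 has no rows in country_languages, so the inner join leaves it out and North averages over two countries rather than three. If such countries should count as zero, the inner query would need a LEFT JOIN and COUNT(country_languages.country_id) instead.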

Related

How to Include Zero in a COUNT() Aggregate?

I have three tables. I join them, filter with WHERE, and use GROUP BY with COUNT, but I cannot get the countries with zero results in the output. I am still lost.
Here is the SQLfiddle
http://sqlfiddle.com/#!4/e330ec/7
CURRENT OUTPUT
(UKD) 3
(EUR) 2
(USA) 2
(CHE) 1
EXPECTED OUTPUT
(UKD) 3
(EUR) 2
(IND) 0
(LAO) 0
(USA) 2
(CHE) 1
You can use a RIGHT JOIN as suggested in another answer or you can reorder your joins and use a LEFT JOIN:
SELECT
C.COUNTRY_CODE,
COUNT(GAME_TYPE)
FROM
COUNTRY_TABLE C
LEFT JOIN PLAYER_TABLE P ON P.COUNTRY_ID = C.COUNTRY_ID
LEFT JOIN PLAYER_GAME_TYPE G ON P.PLAYER_ID = G.PLAYER_ID
WHERE
G.GAME_TYPE = 'GOLF'
OR G.GAME_TYPE IS NULL
GROUP BY
C.COUNTRY_CODE;
Note the inclusion of OR G.GAME_TYPE IS NULL in the WHERE clause -- if you only have G.GAME_TYPE = 'GOLF', then desired results will be filtered out after the joins.
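One caveat that may be worth checking against your own data: the OR ... IS NULL filter keeps countries with no players (or players with no game rows), but a country whose players only play other games still disappears, because its joined rows have a non-NULL GAME_TYPE that is not 'GOLF'. A minimal sketch with invented sample rows in an in-memory SQLite database:

```python
import sqlite3

# Invented sample data using the table names from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COUNTRY_TABLE (COUNTRY_ID INTEGER PRIMARY KEY, COUNTRY_CODE TEXT);
CREATE TABLE PLAYER_TABLE (PLAYER_ID INTEGER PRIMARY KEY, COUNTRY_ID INTEGER);
CREATE TABLE PLAYER_GAME_TYPE (PLAYER_ID INTEGER, GAME_TYPE TEXT);
INSERT INTO COUNTRY_TABLE VALUES (1, 'USA'), (2, 'IND'), (3, 'LAO');
INSERT INTO PLAYER_TABLE VALUES (100, 1), (101, 1), (200, 2);
INSERT INTO PLAYER_GAME_TYPE VALUES
    (100, 'GOLF'), (101, 'GOLF'),
    (200, 'CHESS');  -- IND's only player plays something other than golf
""")

or_is_null_rows = conn.execute("""
SELECT C.COUNTRY_CODE, COUNT(G.GAME_TYPE)
FROM COUNTRY_TABLE C
LEFT JOIN PLAYER_TABLE P ON P.COUNTRY_ID = C.COUNTRY_ID
LEFT JOIN PLAYER_GAME_TYPE G ON P.PLAYER_ID = G.PLAYER_ID
WHERE G.GAME_TYPE = 'GOLF' OR G.GAME_TYPE IS NULL
GROUP BY C.COUNTRY_CODE
ORDER BY C.COUNTRY_CODE
""").fetchall()

print(or_is_null_rows)  # [('LAO', 0), ('USA', 2)]
```

LAO (no players) is reported with 0, but IND is missing because its single joined row carries 'CHESS'. Moving the condition into the ON clause of the join avoids this.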
As an alternative, you can apply the following steps:
convert the second LEFT JOIN to a RIGHT JOIN (the missing country codes are in COUNTRY_TABLE, which sits on the right side of that join)
turn the filter G.GAME_TYPE = 'GOLF' from the WHERE clause into a match condition by moving it into the ON clause
such as
SELECT C.COUNTRY_CODE, COUNT(GAME_TYPE)
FROM PLAYER_TABLE P
LEFT JOIN PLAYER_GAME_TYPE G
ON P.PLAYER_ID = G.PLAYER_ID
RIGHT JOIN COUNTRY_TABLE C
ON P.COUNTRY_ID = C.COUNTRY_ID
AND G.GAME_TYPE = 'GOLF'
GROUP BY C.COUNTRY_CODE;
Simply changing the join order of the tables solves the problem:
SELECT C.COUNTRY_CODE, COUNT(GAME_TYPE)
FROM COUNTRY_TABLE C -- get all countries
LEFT JOIN PLAYER_TABLE P ON P.COUNTRY_ID = C.COUNTRY_ID -- join all players
LEFT JOIN PLAYER_GAME_TYPE G ON P.PLAYER_ID = G.PLAYER_ID AND G.GAME_TYPE = 'GOLF' -- join only GOLF games
GROUP BY C.COUNTRY_CODE;
This is your query:
SELECT C.COUNTRY_CODE, COUNT(GAME_TYPE)
FROM PLAYER_TABLE P
LEFT JOIN PLAYER_GAME_TYPE G ON P.PLAYER_ID = G.PLAYER_ID
LEFT JOIN COUNTRY_TABLE C ON P.COUNTRY_ID = C.COUNTRY_ID
WHERE G.GAME_TYPE = 'GOLF'
GROUP BY C.COUNTRY_CODE;
This query seems to try to select all players even when they are not golfers. This doesn't work, however, as WHERE G.GAME_TYPE = 'GOLF' removes all outer joined rows, so you end up with an inner join (all players who play golf). Finally, you outer join the countries table, which would give you players that don't belong to a country. Is this intended? I don't think so.
What you want is countries, so select from countries. Then properly outer join players and types in order to count them.
SELECT c.country_code, COUNT(g.game_type) as golfers_in_country
FROM country_table c
LEFT JOIN player_table p ON p.country_id = c.country_id
LEFT JOIN player_game_type g ON g.player_id = p.player_id AND g.game_type = 'GOLF'
GROUP BY c.country_code
ORDER BY c.country_code;
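The difference between filtering in WHERE and filtering in ON can be seen side by side in a small sketch. It uses an in-memory SQLite database with invented sample rows; only the table and column names come from the question.

```python
import sqlite3

# Invented sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country_table (country_id INTEGER PRIMARY KEY, country_code TEXT);
CREATE TABLE player_table (player_id INTEGER PRIMARY KEY, country_id INTEGER);
CREATE TABLE player_game_type (player_id INTEGER, game_type TEXT);
INSERT INTO country_table VALUES (1, 'USA'), (2, 'IND'), (3, 'LAO');
INSERT INTO player_table VALUES (100, 1), (101, 1), (200, 2);
INSERT INTO player_game_type VALUES (100, 'GOLF'), (101, 'GOLF'), (200, 'CHESS');
""")

# Filter in WHERE: rows produced by the outer joins are discarded afterwards.
filter_in_where = conn.execute("""
SELECT c.country_code, COUNT(g.game_type)
FROM country_table c
LEFT JOIN player_table p ON p.country_id = c.country_id
LEFT JOIN player_game_type g ON g.player_id = p.player_id
WHERE g.game_type = 'GOLF'
GROUP BY c.country_code
ORDER BY c.country_code
""").fetchall()

# Filter in ON: every country survives; non-golf rows simply do not match.
filter_in_on = conn.execute("""
SELECT c.country_code, COUNT(g.game_type)
FROM country_table c
LEFT JOIN player_table p ON p.country_id = c.country_id
LEFT JOIN player_game_type g ON g.player_id = p.player_id AND g.game_type = 'GOLF'
GROUP BY c.country_code
ORDER BY c.country_code
""").fetchall()

print(filter_in_where)  # [('USA', 2)]
print(filter_in_on)     # [('IND', 0), ('LAO', 0), ('USA', 2)]
```

COUNT(g.game_type) ignores NULLs, which is why the countries without golfers show 0 rather than 1.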
You can use a CTE to make this more readable. It is longer, but makes the intention crystal-clear. Structuring one's queries like this helps avoid mistakes.
WITH golfers AS
(
SELECT *
FROM player_table
WHERE player_id IN
(
SELECT player_id
FROM player_game_type
WHERE game_type = 'GOLF'
)
)
SELECT c.country_code, COUNT(g.player_id) as golfers_in_country
FROM country_table c
LEFT JOIN golfers g ON g.country_id = c.country_id
GROUP BY c.country_code
ORDER BY c.country_code;

WHERE clause with JOIN SQL

SELECT AVG(score) AS avg_score, st.name
FROM firstTable AS ft
LEFT JOIN secondTable AS st
ON ft.dog_id = st.dog_id
WHERE (SELECT COUNT(ft.dog_id) FROM firstTable) > 1
GROUP BY dog_id
The WHERE clause doesn't seem to do anything. Why is that? I'm essentially trying to output the average score only for the dogs that appear more than once in the first table.
You should use an INNER join since you want only dogs that match in both tables and add the condition in the HAVING clause:
SELECT AVG(ft.score) AS avg_score, st.name
FROM secondTable AS st INNER JOIN firstTable AS ft
ON ft.dog_id = st.dog_id
GROUP BY st.dog_id
HAVING COUNT(*) > 1;
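WHERE is evaluated row by row before grouping, while HAVING filters the groups after the aggregates are computed, which is why the row-count condition belongs in HAVING. A minimal sketch with invented dogs and scores, run in an in-memory SQLite database:

```python
import sqlite3

# Invented sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE firstTable (dog_id INTEGER, score REAL);
CREATE TABLE secondTable (dog_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO secondTable VALUES (1, 'Rex'), (2, 'Fido'), (3, 'Bella');
INSERT INTO firstTable VALUES
    (1, 8.0), (1, 6.0),            -- Rex appears twice
    (2, 9.0),                      -- Fido appears once
    (3, 4.0), (3, 5.0), (3, 6.0);  -- Bella appears three times
""")

having_rows = conn.execute("""
SELECT AVG(ft.score) AS avg_score, st.name
FROM secondTable AS st INNER JOIN firstTable AS ft
ON ft.dog_id = st.dog_id
GROUP BY st.dog_id
HAVING COUNT(*) > 1
ORDER BY st.name
""").fetchall()

print(having_rows)  # [(5.0, 'Bella'), (7.0, 'Rex')]
```

Fido has a single row in firstTable, so his group is dropped by the HAVING condition.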

Multiple joins with group by (Sum)

When using multiple JOINs, I want to get the sum of some columns from the joined tables.
SELECT
A.*,
SUM(C.purchase_price) AS purchase_total,
SUM(D.sales_price) AS sales_total,
B.user_name
FROM
PROJECT AS A
LEFT JOIN
USER AS B ON A.user_idx = B.user_idx
LEFT JOIN
PURCHASE AS C ON A.project_idx = C.project_idx
LEFT JOIN
SALES AS D ON A.project_idx = D.project_idx
GROUP BY
????
You need to aggregate in subqueries first, as follows:
SELECT A.project_idx,
A.project_name,
A.project_category,
C.purchase_price AS purchase_total,
D.sales_price AS sales_total,
B.user_name
FROM PROJECT AS A
LEFT JOIN USER AS B ON A.user_idx = B.user_idx
LEFT JOIN (select project_idx, sum(purchase_price) as purchase_price
from PURCHASE group by project_idx ) AS C ON A.project_idx = C.project_idx
LEFT JOIN (select project_idx, sum(sales_price) as sales_price
from SALES group by project_idx) AS D ON A.project_idx = D.project_idx
I am not sure, but you may be able to use an INNER JOIN between PROJECT and USER instead of a LEFT JOIN.
SELECT A.project_idx,
A.project_name,
A.project_category,
C.purchase_total,
D.sales_total,
B.user_name
FROM PROJECT AS A
LEFT JOIN USER AS B ON A.user_idx = B.user_idx
LEFT JOIN (select project_idx, sum(purchase_price) as purchase_total
from PURCHASE group by project_idx ) AS C ON A.project_idx = C.project_idx
LEFT JOIN (select project_idx, sum(sales_price) as sales_total
from SALES group by project_idx) AS D ON A.project_idx = D.project_idx
This is working correctly on MS-SQL Server.
Thanks to Popeye
You are attempting to aggregate over two unrelated dimensions, and that throws off all the calculations.
Correlated subqueries are an alternative:
SELECT p.*,
(SELECT SUM(pu.purchase_price)
FROM PURCHASE pu
WHERE p.project_idx = pu.project_idx
) as purchase_total,
(SELECT SUM(s.sales_price)
FROM SALES s
WHERE p.project_idx = s.project_idx
) as sales_total,
u.user_name
FROM PROJECT p LEFT JOIN
USER u
ON p.user_idx = u.user_idx ;
Note that this uses meaningful table aliases so the query is easier to read. Arbitrary letters are really no better (and perhaps worse) than using the entire table name.
Correlated subqueries avoid the outer aggregation as well -- and let you select all the columns from the first table, which is what you want. They also often have better performance with the right indexes.
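To see how joining two unrelated child tables throws off the sums, here is a small sketch in an in-memory SQLite database. The rows are invented, and the tables are cut down to the columns needed for the illustration.

```python
import sqlite3

# Invented sample data; only the columns needed for the illustration are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (project_idx INTEGER PRIMARY KEY, project_name TEXT);
CREATE TABLE purchase (project_idx INTEGER, purchase_price INTEGER);
CREATE TABLE sales (project_idx INTEGER, sales_price INTEGER);
INSERT INTO project VALUES (1, 'alpha');
INSERT INTO purchase VALUES (1, 10), (1, 20);           -- true total: 30
INSERT INTO sales VALUES (1, 100), (1, 200), (1, 300);  -- true total: 600
""")

# Naive version: 2 purchases x 3 sales = 6 joined rows, so each value repeats.
naive = conn.execute("""
SELECT p.project_idx, SUM(pu.purchase_price), SUM(s.sales_price)
FROM project p
LEFT JOIN purchase pu ON pu.project_idx = p.project_idx
LEFT JOIN sales s ON s.project_idx = p.project_idx
GROUP BY p.project_idx
""").fetchall()

# Correlated subqueries: each total is computed on its own table.
correlated = conn.execute("""
SELECT p.project_idx,
       (SELECT SUM(pu.purchase_price) FROM purchase pu
        WHERE pu.project_idx = p.project_idx) AS purchase_total,
       (SELECT SUM(s.sales_price) FROM sales s
        WHERE s.project_idx = p.project_idx) AS sales_total
FROM project p
""").fetchall()

print(naive)       # [(1, 90, 1200)]
print(correlated)  # [(1, 30, 600)]
```

In the naive query every purchase row is paired with every sales row of the same project, so purchases are counted three times and sales twice.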

Nested 'Where'?

I have a table named Actor, with only a column for City (CityId). I want to return the number of actors in a particular State (StateId). The catch, however, is that I have separate tables for City, County, and finally State (City has CountyId, County has StateId). How do I do this in a T-SQL query?
I have a solution that involves nested Select statements, something like:
SELECT COUNT(1)
FROM Actor a
WHERE a.CityId IN
(SELECT CityId FROM City WHERE CountyId IN...)
...but is there a more efficient way to do this? Thanks
You can use this query to get your output:
SELECT COUNT(a.ActorId)
FROM Actor a
INNER JOIN City c ON a.cityId = c.cityId
INNER JOIN County con ON c.countyId = con.countyId
INNER JOIN STATE s ON con.stateId = s.stateId
GROUP BY s.stateId
Use JOINs to query your data; I am using INNER JOIN here.
Assuming that you have CountryId in your City table, you can do it the following way.
If you don't have CountryId in your City table, you have to add one more INNER JOIN to reach the State table.
SELECT COUNT(1) FROM Actor a INNER JOIN
City b ON a.CityId = b.CityId
WHERE b.CountryId IN (...)
You can join the tables you have and then use a GROUP BY clause to find the total number of actors per state.
I guessed the column names; replace them with the actual names in your database.
SELECT State.StateId,
Count(ActorId) AS Total
FROM ACTOR
INNER JOIN City ON Actor.CityId = City.CityId
INNER JOIN County ON County.CountyId = City.CountyId
INNER JOIN State ON State.StateId = County.StateId
GROUP BY State.StateId
Assuming the relation names, you can do something like this with joins:
select s.ID, s.Name, count(*)
from Actors a
inner join Cities c on c.ID = a.CityID
inner join County cn on cn.ID = c.CountyID
inner join State s on s.ID = cn.StateID
group by s.ID, s.Name
If you only need the StateId you don't even need to join with states, this will do:
select cn.StateID, count(*)
from Actors a
inner join Cities c on c.ID = a.CityID
inner join County cn on cn.ID = c.CountyID
group by cn.StateID
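As a rough illustration of the join chain, the sketch below builds the four tables in an in-memory SQLite database and counts actors both per state and for one particular state. The table and column names follow the question (Actor, City, County, State), ActorId is an assumed key column, and the rows are invented.

```python
import sqlite3

# Invented sample data; ActorId is an assumed key column not named in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE State (StateId INTEGER PRIMARY KEY);
CREATE TABLE County (CountyId INTEGER PRIMARY KEY, StateId INTEGER);
CREATE TABLE City (CityId INTEGER PRIMARY KEY, CountyId INTEGER);
CREATE TABLE Actor (ActorId INTEGER PRIMARY KEY, CityId INTEGER);
INSERT INTO State VALUES (1), (2);
INSERT INTO County VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO City VALUES (100, 10), (101, 11), (200, 20);
INSERT INTO Actor VALUES (1, 100), (2, 100), (3, 101), (4, 200);
""")

# Actors per state, walking Actor -> City -> County.
per_state = conn.execute("""
SELECT cn.StateId, COUNT(*) AS Total
FROM Actor a
INNER JOIN City c ON c.CityId = a.CityId
INNER JOIN County cn ON cn.CountyId = c.CountyId
GROUP BY cn.StateId
ORDER BY cn.StateId
""").fetchall()

# Actors in one particular state, passed as a bound parameter.
in_state_1 = conn.execute("""
SELECT COUNT(*)
FROM Actor a
INNER JOIN City c ON c.CityId = a.CityId
INNER JOIN County cn ON cn.CountyId = c.CountyId
WHERE cn.StateId = ?
""", (1,)).fetchone()[0]

print(per_state)   # [(1, 3), (2, 1)]
print(in_state_1)  # 3
```

Since County already carries StateId, the State table is only needed when columns such as the state name are selected.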

Query extensibility with WHERE EXISTS with a large table

The following query is designed to find the number of people who went to a hospital, the total number of people who went to a hospital, and then divide those two to find a percentage. The table Claims has more than two million rows and has the correct non-clustered index on patientid, admissiondate, and dischargedate. The query runs quickly enough, but I'm interested in how I could make it more usable. I would like to be able to add another code in the line where (hcpcs.hcpcs ='97001') and have the change in percentRehabNotHomeHealth be reflected in another column. Is this possible without writing a big, fat join statement where I join the results of the two queries together? I know that by adding the extra column the math won't look right, but I'm not worried about that at the moment.
desired sample output: http://imgur.com/BCLrd
select h.hospitalname
,count(*) as visitCounts
,hospitalcounts
,round(count(*)/cast(hospitalcounts as float) *100,2) as percentRehabNotHomeHealth
from Patient p
inner join statecounties as sc on sc.countycode = p.countycode
and sc.statecode = p.statecode
inner join hospitals as h on h.npi=p.hospitalnpi
inner join
--this join adds the hospitalCounts column
(
select h.hospitalname, count(*) as hospitalCounts
from hospitals as h
inner join patient as p on p.hospitalnpi=h.npi
where p.statecode='21' and h.statecode='21'
group by h.hospitalname
) as t on t.hospitalname=h.hospitalname
--this where exists clause gives the visitCounts column
where h.stateCode='21' and p.statecode='21'
and exists
(
select distinct p2.patientid
from Patient as p2
inner join Claims as c on c.patientid = p2.patientid
and c.admissiondate = p2.admissiondate
and c.dischargedate = p2.dischargedate
inner join hcpcs on hcpcs.hcpcs=c.hcpcs
inner join hospitals as h on h.npi=p2.hospitalnpi
where (hcpcs.hcpcs ='97001' or hcpcs.hcpcs='9339' or hcpcs.hcpcs='97002')
and p2.patientid=p.patientid
)
and hospitalcounts > 10
group by h.hospitalname, t.hospitalcounts
having count(*)>10
You might look into CTE (Common Table Expressions) to get what you need. It would allow you to get summarized data and join that back to the detail on a common key. As an example I modified your join on the subquery to be a CTE.
;with hospitalCounts as (
select h.hospitalname, count(*) as hospitalCounts
from hospitals as h
inner join patient as p on p.hospitalnpi=h.npi
where p.statecode='21' and h.statecode='21'
group by h.hospitalname
)
select h.hospitalname
,count(*) as visitCounts
,hospitalcounts
,round(count(*)/cast(hospitalcounts as float) *100,2) as percentRehabNotHomeHealth
from Patient p
inner join statecounties as sc on sc.countycode = p.countycode
and sc.statecode = p.statecode
inner join hospitals as h on h.npi=p.hospitalnpi
inner join hospitalCounts as t on t.hospitalname=h.hospitalname
--this where exists clause gives the visitCounts column
where h.stateCode='21' and p.statecode='21'
and exists
(
select p2.patientid
from Patient as p2
inner join Claims as c on c.patientid = p2.patientid
and c.admissiondate = p2.admissiondate
and c.dischargedate = p2.dischargedate
inner join hcpcs on hcpcs.hcpcs=c.hcpcs
inner join hospitals as h on h.npi=p2.hospitalnpi
where (hcpcs.hcpcs ='97001' or hcpcs.hcpcs='9339' or hcpcs.hcpcs='97002')
and p2.patientid=p.patientid
)
and hospitalcounts > 10
group by h.hospitalname, t.hospitalcounts
having count(*)>10
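The CTE pattern can be tried out on a much smaller scale. The sketch below is a simplified stand-in, not the original schema: the tables are cut down to three hypothetical ones (hospitals, patient, claims), the state filter and the > 10 thresholds are dropped, and the rows are invented. It runs in an in-memory SQLite database.

```python
import sqlite3

# Simplified, hypothetical schema and invented rows; not the asker's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hospitals (npi INTEGER PRIMARY KEY, hospitalname TEXT);
CREATE TABLE patient (patientid INTEGER PRIMARY KEY, hospitalnpi INTEGER);
CREATE TABLE claims (patientid INTEGER, hcpcs TEXT);
INSERT INTO hospitals VALUES (1, 'General'), (2, 'Mercy');
INSERT INTO patient VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2);
INSERT INTO claims VALUES
    (1, '97001'), (1, '97002'),  -- patient 1 has two matching claims
    (3, '12345'),                -- patient 3 has only an unrelated code
    (5, '97002');
""")

cte_rows = conn.execute("""
WITH hospital_counts AS (
    -- summary: total patients per hospital
    SELECT h.npi, COUNT(*) AS total_patients
    FROM hospitals AS h
    INNER JOIN patient AS p ON p.hospitalnpi = h.npi
    GROUP BY h.npi
)
SELECT h.hospitalname,
       COUNT(*) AS visit_count,
       t.total_patients,
       ROUND(COUNT(*) * 100.0 / t.total_patients, 2) AS percent_rehab
FROM patient AS p
INNER JOIN hospitals AS h ON h.npi = p.hospitalnpi
INNER JOIN hospital_counts AS t ON t.npi = h.npi
WHERE EXISTS (
    -- detail: keep patients with at least one claim for the listed codes
    SELECT 1 FROM claims AS c
    WHERE c.patientid = p.patientid
      AND c.hcpcs IN ('97001', '97002')
)
GROUP BY h.hospitalname, t.total_patients
ORDER BY h.hospitalname
""").fetchall()

print(cte_rows)  # [('General', 1, 4, 25.0), ('Mercy', 1, 2, 50.0)]
```

Because EXISTS only tests for a match, patient 1 is counted once even though two of that patient's claims qualify.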