SQL: Select inside select? - sql

I have a table of car accidents in a major city.
The table, accident_table, has the following columns:
id, caseno, date_of_occurrence, street, iucr, primary_type,
description, district, community_area, year, updated_on
I want to write a query that finds the street with the most accidents in each district. (I take the count for a street to be the number of accidents that happened on that street.)
Here is what I have:
SELECT DISTINCT on (street)
street,
district
FROM
(
SELECT
count(street) as street_cnt,
street,
district
FROM accident_table
)
WHERE street_count = (SELECT max(street_cnt))
It did not give me a syntax error, but it timed out, so I guess it took too long to run.
What's wrong, and how can I fix it?
Thanks,
Philip

First aggregate to get the count of accidents for each street. Then use the rank() window function to rank the streets within a district by the count of accidents in them. Then only select the ones that were ranked at the top.
SELECT x.district,
x.street,
x.accidents
FROM (SELECT a.district,
a.street,
count(*) accidents,
rank() OVER (PARTITION BY a.district
ORDER BY count(*) DESC) r
FROM accident_table a
GROUP BY a.district,
a.street) x
WHERE x.r = 1;
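To see the three steps on concrete data, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module (assuming a bundled SQLite of 3.25 or later, which is needed for window functions). The rows are invented, the table carries only the columns the query needs, and the aggregation is pulled out into a CTE purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accident_table (id INTEGER PRIMARY KEY, district TEXT, street TEXT)"
)
# Invented rows: D2 has a tie between Elm St and Pine Rd.
conn.executemany(
    "INSERT INTO accident_table (district, street) VALUES (?, ?)",
    [
        ("D1", "Main St"), ("D1", "Main St"), ("D1", "Oak Ave"),
        ("D2", "Elm St"), ("D2", "Elm St"), ("D2", "Pine Rd"), ("D2", "Pine Rd"),
    ],
)

top_streets = conn.execute(
    """
    WITH per_street AS (          -- step 1: accidents per (district, street)
        SELECT district, street, count(*) AS accidents
        FROM accident_table
        GROUP BY district, street
    )
    SELECT district, street, accidents
    FROM (SELECT per_street.*,
                 rank() OVER (PARTITION BY district
                              ORDER BY accidents DESC) AS r  -- step 2: rank within district
          FROM per_street) x
    WHERE r = 1                   -- step 3: keep only the top-ranked streets
    ORDER BY district, street
    """
).fetchall()

for row in top_streets:
    print(row)
```

With these rows, D1 returns Main St, while D2 returns both Elm St and Pine Rd, because rank() gives tied streets the same rank. If exactly one street per district is wanted, row_number() could be swapped in, at the cost of an arbitrary pick among ties.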

Your code looks like Postgres. In that database, you can express this without a subquery:
SELECT DISTINCT ON (a.district)
a.district, a.street, COUNT(*) as accidents
FROM accident_table a
GROUP BY a.district, a.street
ORDER BY a.district, COUNT(*) DESC;
That said, your problem is performance, which is probably not affected by subqueries. An index on accident_table(district, street) might help performance.

Improve distinct query performance

Any idea how we can speed up this query (maybe with some pre-aggregation)?
SELECT p.segment, country, count(distinct userid)
from pixel_data_opt p
WHERE country in ('US')
and segment is not null
GROUP BY p.segment, country;
I tried the following, but it didn't help:
select segment, country,sum(cnt)
from
(SELECT p.segment, country, userid,count(*) as cnt
from pixel_data_opt p
WHERE country in ('US')
and segment is not null
GROUP BY p.segment, country,userid
)
group by 1,2;
There's nothing wrong with your first query. The filter could have been written as where country = 'US', but the optimizer (as far as Oracle is concerned) is smart enough to figure that out.
Is the country column indexed? If not, index it.
Also, gather statistics on the table.
It would probably help if you posted some more information, e.g. the number of rows involved and the explain plan, since it shows figures that mean something.
For this query:
SELECT p.segment, country, count(distinct userid)
FROM pixel_data_opt p
WHERE country in ('US') AND
segment is not null
GROUP BY p.segment, country;
You want an index on the table. There are several approaches. One reasonable choice is: pixel_data_opt(country, segment, userid).
I would suggest rewriting the query as:
SELECT p.segment, 'US' as country, count(distinct userid)
FROM pixel_data_opt p
WHERE country in ('US') AND
segment is not null
GROUP BY p.segment;
and using the above index.
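One thing worth checking in the pre-aggregated attempt from the question: summing the per-user row counts gives the number of rows, not the number of distinct users. Below is a small sketch with Python's sqlite3 module and invented rows (the index name is made up too) that compares the three variants. It only demonstrates the results, not the performance, which depends on the real database and data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pixel_data_opt (segment TEXT, country TEXT, userid INTEGER)")
# Invented rows: user 1 appears twice in segment s1.
conn.executemany(
    "INSERT INTO pixel_data_opt VALUES (?, ?, ?)",
    [("s1", "US", 1), ("s1", "US", 1), ("s1", "US", 2),
     ("s2", "US", 3), (None, "US", 4), ("s1", "DE", 5)],
)
# Index suggested in the answer; the name is hypothetical.
conn.execute(
    "CREATE INDEX idx_pixel_country_segment_user "
    "ON pixel_data_opt (country, segment, userid)"
)

# Direct form, as suggested in the answer.
direct = conn.execute(
    """
    SELECT segment, 'US' AS country, count(DISTINCT userid)
    FROM pixel_data_opt
    WHERE country = 'US' AND segment IS NOT NULL
    GROUP BY segment
    ORDER BY segment
    """
).fetchall()

# Pre-aggregation that preserves the meaning: one row per user, then count rows.
two_step = conn.execute(
    """
    SELECT segment, country, count(*)
    FROM (SELECT segment, country, userid
          FROM pixel_data_opt
          WHERE country = 'US' AND segment IS NOT NULL
          GROUP BY segment, country, userid)
    GROUP BY segment, country
    ORDER BY segment
    """
).fetchall()

# The question's variant: sums row counts, so repeated users are counted again.
summed = conn.execute(
    """
    SELECT segment, country, sum(cnt)
    FROM (SELECT segment, country, userid, count(*) AS cnt
          FROM pixel_data_opt
          WHERE country = 'US' AND segment IS NOT NULL
          GROUP BY segment, country, userid)
    GROUP BY segment, country
    ORDER BY segment
    """
).fetchall()

print(direct, two_step, summed)
```

Here direct and two_step agree (2 users in s1, 1 in s2), while summed reports 3 for s1 because user 1 has two rows.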

Get the first 3 elements for each value of an attribute SQL

I'm using the mondial database that can be queried here http://www.semwebtech.org/sqlfrontend/
I am trying to get the 3 most practiced religions on each continent. I've come up with this:
select religion.name, sum(religion.percentage) as total, continent
from religion join encompasses on religion.country = encompasses.country
group by name, continent order by continent, total DESC
This gives me a list of religions for each continent, ordered by popularity, but how do I get the first 3 results for each continent?
I have looked up cursors, but I don't see how to apply them to my case, and it looks like there is a simple answer.
I would use window functions:
select rc.*
from (select r.name, sum(r.percentage) as total, e.continent,
row_number() over (partition by e.continent order by sum(r.percentage) desc) as seqnum
from religion r join
encompasses e
on r.country = e.country
group by r.name, e.continent
) rc
where seqnum <= 3;
Having explained how to modify your query to answer the question, let me now point out that your query is wrong. The sum of the percentage of people is not the same as the total number of people. For instance, I think Bhutan is pretty close to 100% Buddhist (Buddhism is the state religion). But there are more Buddhists in India (~0.7% Buddhist, according to one source).
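For anyone who wants to try the top-3-per-group pattern without the mondial database, here is a rough sketch using Python's sqlite3 module (SQLite 3.25+ assumed). The table and column names follow the question, but every row is a placeholder (countries A to C, continents X and Y, religions r1 to r4), and the aggregation sits in a CTE only to keep the window step easy to read. It still sums percentages, so the caveat above applies to it as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE religion (country TEXT, name TEXT, percentage REAL)")
conn.execute("CREATE TABLE encompasses (country TEXT, continent TEXT)")
# Placeholder data, not real statistics.
conn.executemany("INSERT INTO encompasses VALUES (?, ?)",
                 [("A", "X"), ("B", "X"), ("C", "Y")])
conn.executemany(
    "INSERT INTO religion VALUES (?, ?, ?)",
    [("A", "r1", 50), ("A", "r2", 30), ("A", "r3", 15), ("A", "r4", 5),
     ("B", "r1", 10), ("B", "r2", 60), ("B", "r4", 30),
     ("C", "r1", 100)],
)

top3 = conn.execute(
    """
    WITH totals AS (
        SELECT e.continent, r.name, sum(r.percentage) AS total
        FROM religion r
        JOIN encompasses e ON r.country = e.country
        GROUP BY e.continent, r.name
    )
    SELECT continent, name, total, seqnum
    FROM (SELECT totals.*,
                 row_number() OVER (PARTITION BY continent
                                    ORDER BY total DESC) AS seqnum
          FROM totals) rc
    WHERE seqnum <= 3          -- keep the first three rows of each continent
    ORDER BY continent, seqnum
    """
).fetchall()

for row in top3:
    print(row)
```

On continent X the totals are r2 = 90, r1 = 60, r4 = 35 and r3 = 15, so r3 is the one that gets cut; continent Y has a single row, which is kept.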

SQL: find duplicates, with a different field

I have to find duplicates in an Access table, where one field is different.
I'll try to explain. Assume I have this data set:
ID  Country  CountryB  Customer
===============================
1   Italy    Austria   James
2   Italy    Austria   James
3   USA      Austria   James
I have to find all the records with duplicated CountryB and Customer, but with different Country.
For instance, with the data above, the ID 1 and 2 are NOT duplicated (as they are from the same Country), while 1 and 3 (or 2 and 3) are.
The "best" query I got is the following one:
SELECT COUNT(*), CountryB, Customer FROM
(SELECT MIN(ID) as MinID, Country, CountryB, Customer FROM myTable GROUP BY Country, CountryB, Customer)
GROUP BY CountryB, Customer
HAVING COUNT(*)>1
I'm not sure if this is the smartest option, anyhow.
Furthermore, since I need to "mark" all the duplicates, I have to do something more, like this:
SELECT ID, a.Country, a.CountryB, a.Customer FROM myTable a
INNER JOIN
(
SELECT COUNT(*), CountryB, Customer FROM
(SELECT MIN(ID) as MinID, Country, CountryB, Customer FROM myTable GROUP BY Country, CountryB, Customer)
GROUP BY CountryB, Customer
HAVING COUNT(*)>1
) dt
ON a.Country=dt.Country and a.CountryB=dt.CountryB and a.Customer=dt.Customer
Any suggestion on this approach is greatly appreciated.
I finally found a solution.
The correct solution is in the answer to "SELECT DISTINCT HAVING Count unique conditions",
adapted, since I'm using Access 2010, with the version from "Count Distinct in a Group By aggregate function in Access 2007 SQL".
Therefore, in my example table above, I can use this query to find duplicate records:
SELECT CountryB, Customer, Count(cd.Country)
FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
GROUP BY CountryB, Customer
HAVING COUNT(*) > 1
or this query to find all the IDs of the duplicated records:
SELECT ID FROM myTable a INNER JOIN
(
SELECT CountryB, Customer, Count(cd.Country)
FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
GROUP BY CountryB, Customer
HAVING COUNT(*) > 1
) dt
ON a.CountryB=dt.CountryB AND a.Customer=dt.Customer
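As a quick check of both queries, here is a sketch that loads the sample rows into an in-memory SQLite database from Python. SQLite stands in for Access here, which is an assumption about portability rather than something tested in Access. Rows 4 and 5 are extra, invented rows, added to confirm that a pair sharing the same Country is not flagged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Country TEXT, "
    "CountryB TEXT, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (1, "Italy", "Austria", "James"),
        (2, "Italy", "Austria", "James"),
        (3, "USA", "Austria", "James"),
        (4, "Italy", "France", "Anna"),   # invented: same Country twice,
        (5, "Italy", "France", "Anna"),   # so this pair must not be reported
    ],
)

# Groups of (CountryB, Customer) that occur with more than one distinct Country.
dup_groups = conn.execute(
    """
    SELECT CountryB, Customer, COUNT(*) AS countries
    FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
    GROUP BY CountryB, Customer
    HAVING COUNT(*) > 1
    """
).fetchall()

# IDs of every record that belongs to such a group.
dup_ids = conn.execute(
    """
    SELECT a.ID
    FROM myTable a
    INNER JOIN (
        SELECT CountryB, Customer
        FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
        GROUP BY CountryB, Customer
        HAVING COUNT(*) > 1
    ) dt ON a.CountryB = dt.CountryB AND a.Customer = dt.Customer
    ORDER BY a.ID
    """
).fetchall()

print(dup_groups, dup_ids)
```

The first query reports only the Austria/James group, with 2 distinct countries, and the second returns IDs 1, 2 and 3.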

postgis postgres count and group by column for ST_Distance function

This SQL produces the following:
SELECT city FROM travel_logs ORDER BY ST_Distance(travel_logs.start_point, ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)'))
"Tshopo"
"Tshopo"
"Mongala"
"Haut-Komo"
This SQL produces the following:
SELECT city, count(*) AS count FROM travel_logs GROUP BY travel_logs.start_point, city ORDER BY ST_Distance(travel_logs.start_point, ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)'))
"Tshopo";1
"Tshopo";1
"Mongala";1
"Haut-Komo";1
Basically, I want a result that groups by city and shows the number of times each city occurs, something like this:
"Tshopo";2 <--- it's summed up correctly
"Mongala";1
"Haut-Komo";1
I'm not an expert on joins or subqueries. Would they help? Thanks in advance.
This worked for me:
select city, count(*) as count
from
(SELECT city FROM travel_logs ORDER BY ST_Distance(travel_logs.start_point, ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)'))
) as subquery_travel_logs_nearest group by city
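Simple, plain SQL without a sub-query. Because the rows are grouped by city, the distance in ORDER BY has to be aggregated; min() orders each city by its nearest start point:
SELECT city, count(*)
FROM travel_logs
GROUP BY city
ORDER BY min(ST_Distance(start_point,
ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)')));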
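The group-then-order shape can be tried without PostGIS. In the sketch below, which uses Python's sqlite3 module, a plain numeric column distance_m is a hypothetical stand-in for the ST_Distance(...) result, and its values are invented. Each city is ordered by the smallest distance among its rows, which is one reasonable reading of "nearest first" once rows are grouped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# distance_m is a made-up stand-in for the ST_Distance(...) expression.
conn.execute("CREATE TABLE travel_logs (city TEXT, distance_m REAL)")
conn.executemany(
    "INSERT INTO travel_logs VALUES (?, ?)",
    [("Tshopo", 10), ("Tshopo", 12), ("Mongala", 20), ("Haut-Komo", 30)],
)

city_counts = conn.execute(
    """
    SELECT city, count(*) AS count
    FROM travel_logs
    GROUP BY city
    ORDER BY min(distance_m)   -- nearest row of each city decides the order
    """
).fetchall()

for row in city_counts:
    print(row)
```

This gives Tshopo with a count of 2, followed by Mongala and Haut-Komo with 1 each, matching the result asked for in the question.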

SQL return based on Aggregate Function

I have this query that works, but it returns information for all cities. I only want the row with the largest city population for each country, but aggregate functions can't be used in the WHERE clause. How can I limit my results to one per country?
SELECT lab6.country.name, max(lab6.city.population) AS largest_pop
FROM lab6.country, lab6.city
WHERE lab6.country.country_code = lab6.city.country_code
GROUP BY lab6.country.name, lab6.city.name
PostgreSQL supports window functions, which you can take advantage of here.
SELECT countryName, cityName, largest_pop
FROM
(
SELECT a.name countryName,
b.name cityName,
b.population AS largest_pop,
DENSE_RANK() OVER (PARTITION BY a.name
ORDER BY b.population DESC) rn
FROM lab6.country a, lab6.city b
WHERE a.country_code = b.country_code
) x
WHERE rn = 1
See the PostgreSQL documentation on window functions.
Maybe I'm misunderstanding but do you just want to return the largest city in each country?
If so, you can simply group by country, instead of by country and city. You'll need to include the attribute that identifies a country, and the name of that country, in your GROUP BY clause. Your query will end up looking like:
SELECT lab6.country.name AS cName, max(lab6.city.population) AS largest_pop
FROM lab6.country, lab6.city
WHERE lab6.country.country_code = lab6.city.country_code
GROUP BY lab6.country.country_code, lab6.country.name
If you want to also include the name of the largest city, you'll first need to decide what to do if there are multiple largest cities (countries where there are two or more cities with the same, largest, population). I'm going to assume you're okay with including them all. In that case, you can simply do a sub-query in your FROM clause, joined on cities with the same population:
SELECT lc.cName, lab6.city.name, lc.largest_pop
FROM (
SELECT lab6.country.country_code AS cCode,
lab6.country.name AS cName,
max(lab6.city.population) AS largest_pop
FROM lab6.country, lab6.city
WHERE lab6.country.country_code = lab6.city.country_code
GROUP BY lab6.country.country_code, lab6.country.name
) AS lc
JOIN lab6.city ON lc.cCode = lab6.city.country_code
WHERE lab6.city.population = lc.largest_pop
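To compare the two approaches when there is a tie, here is a small sketch with Python's sqlite3 module (SQLite 3.25+ assumed for DENSE_RANK). The lab6 schema prefix is dropped, table aliases are used instead, and all rows are invented; Country B deliberately has two cities with the same top population.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (country_code TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE city (name TEXT, country_code TEXT, population INTEGER)")
# Invented rows; B1 and B2 tie for largest city in Country B.
conn.executemany("INSERT INTO country VALUES (?, ?)",
                 [("AA", "Country A"), ("BB", "Country B")])
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [("A1", "AA", 100), ("A2", "AA", 50), ("B1", "BB", 70), ("B2", "BB", 70)],
)

# Approach 1: aggregate per country, then join back on the max population.
via_join = conn.execute(
    """
    SELECT lc.cName, ci.name, lc.largest_pop
    FROM (SELECT co.country_code AS cCode,
                 co.name AS cName,
                 max(c.population) AS largest_pop
          FROM country co
          JOIN city c ON co.country_code = c.country_code
          GROUP BY co.country_code, co.name) AS lc
    JOIN city ci ON lc.cCode = ci.country_code
    WHERE ci.population = lc.largest_pop
    ORDER BY lc.cName, ci.name
    """
).fetchall()

# Approach 2: rank cities within each country and keep rank 1.
via_rank = conn.execute(
    """
    SELECT countryName, cityName, largest_pop
    FROM (SELECT co.name AS countryName,
                 c.name AS cityName,
                 c.population AS largest_pop,
                 DENSE_RANK() OVER (PARTITION BY co.name
                                    ORDER BY c.population DESC) AS rn
          FROM country co
          JOIN city c ON co.country_code = c.country_code) x
    WHERE rn = 1
    ORDER BY countryName, cityName
    """
).fetchall()

print(via_join)
print(via_rank)
```

Both queries return A1 for Country A and both B1 and B2 for Country B, so on this data they agree, ties included.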