LIMITING MAX VALUES IN SQL - sql

I am completely rewriting this question because I just can't crack it.
IBM DB2 SQL
(from a Chicago Crime Dataset)
Which community area is most crime prone?
When I use this code, it does correctly count and sort the data:
select community_area_number as community_area_number, count(community_area_number) as total_area_crime
from chicago_crime_data
group by community_area_number
order by total_area_crime desc;
The problem is that it lists all the data in descending order, but no matter what MAX statement I use, either in the select or in the order by clause, it won't show just the max values.
The max value is 43, so I would like it to show both community_area_number values that have 43.
Instead it shows the entire list.
Also, yes, I understand I can just do a LIMIT 2, but that would be cheating since I manually checked that there are 2 max values; if the data changed or I didn't know that, it wouldn't solve anything.
Thanks in advance.

What you would be looking for is the standard SQL clause FETCH FIRST ... WITH TIES:
select community_area_number, count(*) as total_area_crime
from chicago_crime_data
group by community_area_number
order by total_area_crime desc
fetch first row with ties;
Unfortunately, though, DB2 doesn't support WITH TIES in FETCH FIRST.
The classic way (that is, before we had the window functions RANK and DENSE_RANK) is to use a subquery: get the maximum value, then get all rows with that maximum. I am using a CTE (aka WITH clause) here in order not to have to write everything twice.
with counted as
(
select community_area_number, count(*) as total_area_crime
from chicago_crime_data
group by community_area_number
)
select community_area_number, total_area_crime
from counted
where total_area_crime = (select max(total_area_crime) from counted);
(Please note that this is a mere COUNT(*), because we want to count rows per community_area_number.)
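A minimal sketch of the CTE-plus-MAX approach, run against SQLite through Python's sqlite3 module rather than DB2, so it only illustrates the query shape. The sample rows are made up for the example and are not taken from the Chicago dataset.

```python
import sqlite3

# Hypothetical miniature of the table: one row per reported crime.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chicago_crime_data (id INTEGER, community_area_number INTEGER)"
)
areas = [25, 25, 25, 23, 23, 23, 8, 8, 44]  # areas 25 and 23 tie with 3 rows each
conn.executemany(
    "INSERT INTO chicago_crime_data VALUES (?, ?)",
    list(enumerate(areas, start=1)),
)

query = """
WITH counted AS (
    SELECT community_area_number, COUNT(*) AS total_area_crime
    FROM chicago_crime_data
    GROUP BY community_area_number
)
SELECT community_area_number, total_area_crime
FROM counted
WHERE total_area_crime = (SELECT MAX(total_area_crime) FROM counted)
ORDER BY community_area_number
"""
top_areas = conn.execute(query).fetchall()
print(top_areas)  # [(23, 3), (25, 3)]
```

Both tied areas come back without hard-coding how many ties exist, which is the point of comparing against the maximum instead of limiting the row count.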

As @topsail mentioned, you could use a rank function.
From the table you have above, you could do the following:
SELECT t.* FROM
(
SELECT table1.*,
RANK() OVER (Order by Total_Area_Crime DESC) rnk
from
table1
)t
WHERE t.rnk = 1
So your full query should look something like this:
WITH cte AS (
SELECT COMMUNITY_AREA_NUMBER,
COUNT(COMMUNITY_AREA_NUMBER) AS TOTAL_AREA_CRIME
FROM CHICAGO_CRIME_DATA
GROUP BY COMMUNITY_AREA_NUMBER
)
SELECT t.* FROM
(
SELECT cte.*,
RANK() OVER (ORDER BY TOTAL_AREA_CRIME DESC) rnk
FROM
cte
) t
WHERE t.rnk = 1
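To see how RANK treats ties, here is a small sketch using SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later, which is an assumption about the local install). The data is invented for illustration, and DB2 may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chicago_crime_data (community_area_number INTEGER)")
# Made-up sample: areas 25 and 23 have two rows each, area 8 has one.
conn.executemany(
    "INSERT INTO chicago_crime_data VALUES (?)",
    [(25,), (25,), (23,), (23,), (8,)],
)

ranked = conn.execute("""
    WITH cte AS (
        SELECT community_area_number, COUNT(*) AS total_area_crime
        FROM chicago_crime_data
        GROUP BY community_area_number
    )
    SELECT community_area_number,
           total_area_crime,
           RANK() OVER (ORDER BY total_area_crime DESC) AS rnk
    FROM cte
    ORDER BY rnk, community_area_number
""").fetchall()
print(ranked)  # [(23, 2, 1), (25, 2, 1), (8, 1, 3)]

# Filtering on rnk = 1 keeps every area that shares the top count.
winners = [row[:2] for row in ranked if row[2] == 1]
print(winners)  # [(23, 2), (25, 2)]
```

Tied rows share rank 1 and the next rank is 3, so filtering on rank 1 returns all top areas regardless of how many there are.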

It turns out the professor did want us to use the Limit command.
Here is the final answer:
SELECT COMMUNITY_AREA_NUMBER, COUNT(ID) AS CRIMES_RECORDED
FROM CHICAGO_CRIME_DATA
GROUP BY COMMUNITY_AREA_NUMBER
ORDER BY CRIMES_RECORDED DESC LIMIT 1;
thanks to all those who responded :D

Related

SQL - return column value based on max column count

I have an SQL query where I can look up the maximum occurrence of a column. It works fine for what I needed, but now need to expand on this to only return the name of the column.
There's likely a way to improve my original query but I haven't been able to work it out.
select COMMUNITY_AREA_NUMBER, count(*) as CRIMES
from CHICAGO_CRIME_DATA
group by COMMUNITY_AREA_NUMBER
order by CRIMES desc
limit 1
My basic question is, how can I re-write this to only return the COMMUNITY_AREA_NUMBER?
I've tried to restructure as:
select COMMUNITY_AREA_NUMBER from CHICAGO_CRIME_DATA
where count(*) as CRIMES group by COMMUNITY_AREA_NUMBER order by CRIMES desc limit 1
I've also tried to incorporate a MAX function, but I don't seem to be getting anywhere with it.
edit:
Apparently I've just learned something about inner queries. I needed to keep my original query and create an outer query that requests something from that result.
select COMMUNITY_AREA_NUMBER from
(select COMMUNITY_AREA_NUMBER, count(*) as CRIMES
from CHICAGO_CRIME_DATA
group by COMMUNITY_AREA_NUMBER
order by CRIMES desc
limit 1)
Is this what you want?
SELECT COMMUNITY_AREA_NUMBER
FROM CHICAGO_CRIME_DATA
GROUP BY COMMUNITY_AREA_NUMBER
ORDER BY COUNT(*) DESC
LIMIT 1;
This returns a single community number having the most crime records. I have simply moved COUNT(*) from the select clause, where you don't want it, to the ORDER BY clause.
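A quick sketch of that idea in SQLite through Python's sqlite3 module, with invented rows. The extra tiebreaker in the ORDER BY is an addition for the example so the result is deterministic if two areas share the top count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chicago_crime_data (community_area_number INTEGER)")
# Made-up sample: area 25 has three rows, area 8 two, area 23 one.
conn.executemany(
    "INSERT INTO chicago_crime_data VALUES (?)",
    [(25,), (25,), (25,), (23,), (8,), (8,)],
)

# The aggregate appears only in ORDER BY, so the result has a single column.
top_area = conn.execute("""
    SELECT community_area_number
    FROM chicago_crime_data
    GROUP BY community_area_number
    ORDER BY COUNT(*) DESC, community_area_number
    LIMIT 1
""").fetchone()
print(top_area)  # (25,)
```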

How to work around "Correlated subqueries that reference other tables are not supported" without using a join

I am trying to work with the public BigQuery dataset bigquery-public-data.austin_crime.crime. My goal is an output with three columns: the description (of the crime), the count for that description, and the top district for that particular description (crime).
I am able to get the first two columns with this query:
select
a.description,
count(*) as district_count
from `bigquery-public-data.austin_crime.crime` a
group by description order by district_count desc
I was hoping to get that done with one query, so I tried the following to add a third column showing the top district for each description (crime):
select
a.description,
count(*) as district_count,
(
select district from
( select
district, rank() over(order by COUNT(*) desc) as rank
FROM `bigquery-public-data.austin_crime.crime`
where description = a.description
group by district
) where rank = 1
) as top_District
from `bigquery-public-data.austin_crime.crime` a
group by description
order by district_count desc
The error I am getting is this: "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN."
I think I can do that with joins. Does someone have a better solution, possibly one that does not use a join?
Below is for BigQuery Standard SQL
#standardSQL
SELECT description,
ANY_VALUE(district_count) AS district_count,
STRING_AGG(district ORDER BY cnt DESC LIMIT 1) AS top_district
FROM (
SELECT description, district,
COUNT(1) OVER(PARTITION BY description) AS district_count,
COUNT(1) OVER(PARTITION BY description, district) AS cnt
FROM `bigquery-public-data.austin_crime.crime`
)
GROUP BY description
-- ORDER BY district_count DESC
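The same join-free idea can be sketched with ROW_NUMBER instead of STRING_AGG. This version runs on SQLite through Python's sqlite3 module with a tiny invented `crime` table, so it is only an illustration of the pattern and has not been run against BigQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crime (description TEXT, district TEXT)")
# Made-up rows standing in for the public dataset.
conn.executemany("INSERT INTO crime VALUES (?, ?)", [
    ("THEFT", "A"), ("THEFT", "A"), ("THEFT", "B"),
    ("BURGLARY", "B"), ("BURGLARY", "B"), ("BURGLARY", "C"), ("BURGLARY", "B"),
])

query = """
WITH per_district AS (
    -- one row per (description, district) with its count
    SELECT description, district, COUNT(*) AS cnt
    FROM crime
    GROUP BY description, district
),
ranked AS (
    SELECT description, district,
           SUM(cnt) OVER (PARTITION BY description) AS description_count,
           ROW_NUMBER() OVER (
               PARTITION BY description ORDER BY cnt DESC, district
           ) AS rn
    FROM per_district
)
SELECT description, description_count, district AS top_district
FROM ranked
WHERE rn = 1
ORDER BY description_count DESC
"""
top_districts = conn.execute(query).fetchall()
print(top_districts)  # [('BURGLARY', 4, 'B'), ('THEFT', 3, 'A')]
```

Because everything is computed from one scan of the table with window functions, no correlated subquery is needed.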

can we get totalcount and last record from postgresql

I have a table with 23 records. I am trying to get the total count of records and also the last record in a single query, something like this:
select count(*) ,(m order by createdDate) from music m ;
Is there any way to pull out only the last record as well as the total count in PostgreSQL?
This can be done using window functions
select *
from (
select m.*,
row_number() over (order by createddate desc) as rn,
count(*) over () as total_count
from music m
) t
where rn = 1;
Another option would be to use a scalar sub-query and combine it with a limit clause:
select *,
(select count(*) from music) as total_count
from music
order by createddate desc
limit 1;
Depending on the indexes, your memory configuration and the table definition, this might be faster than the two window functions.
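Both options can be compared side by side in a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, and the `music` columns other than `createddate` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout; only createddate is named in the question.
conn.execute("CREATE TABLE music (id INTEGER, title TEXT, createddate TEXT)")
conn.executemany("INSERT INTO music VALUES (?, ?, ?)", [
    (1, "first", "2020-01-01"),
    (2, "second", "2020-02-01"),
    (3, "third", "2020-03-01"),
])

# Option 1: window functions, then keep the newest row.
window_version = conn.execute("""
    SELECT id, title, total_count
    FROM (
        SELECT m.*,
               ROW_NUMBER() OVER (ORDER BY createddate DESC) AS rn,
               COUNT(*) OVER () AS total_count
        FROM music m
    ) AS t
    WHERE rn = 1
""").fetchall()

# Option 2: scalar subquery for the count, LIMIT for the newest row.
subquery_version = conn.execute("""
    SELECT id, title, (SELECT COUNT(*) FROM music) AS total_count
    FROM music
    ORDER BY createddate DESC
    LIMIT 1
""").fetchall()

print(window_version)    # [(3, 'third', 3)]
print(subquery_version)  # [(3, 'third', 3)]
```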
No, it's not possible to do what is being asked; SQL does not function that way. The second you ask for a count(), SQL changes the level of your data to an aggregation. The only way to do what you are asking is to do the count() and the order by in separate queries.
Another solution using windowing functions and no subquery:
SELECT DISTINCT count(*) OVER w, last_value(m) OVER w
FROM music m
WINDOW w AS (ORDER BY createddate RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
The point here is that last_value applies on partitions defined by windows and not on groups defined by GROUP BY.
I did not perform any tests, but I suspect my solution is the least efficient of the three posted so far. It is, however, also the closest to your example query.

Selecting top results from SQL Count query, including table join - Oracle

I have this query currently, which selects the top "number of pickups" in descending order. I need to filter only the top 10 rows/highest numbers though. How can I do this?
I have tried adding 'WHERE ROWNUM <= 10' at the bottom, to no avail.
SELECT customer.company_name, COUNT (item.pickup_reference) as "Number of Pickups"
FROM customer
JOIN item ON (customer.reference_no=item.pickup_reference)
GROUP BY customer.company_name, item.pickup_reference
ORDER BY COUNT (customer.company_name) DESC;
Thanks for any help!
You need to subquery it for the rownum to work.
SELECT *
FROM
(
SELECT customer.company_name, COUNT (item.pickup_reference) as "Number of Pickups"
FROM customer
JOIN item ON (customer.reference_no=item.pickup_reference)
GROUP BY customer.company_name, item.pickup_reference
ORDER BY COUNT (customer.company_name) DESC
)
WHERE rownum <= 10
You could alternatively use ranking functions, but given the relative simplicity of this, I'm not sure whether I would.
The solution using RANK is something like this:
select company_name, number_of_pickups from (
select customer.company_name, COUNT (item.pickup_reference) as number_of_pickups,
rank() over ( order by count(item.pickup_reference) desc) rnk
from customer
JOIN item ON (customer.reference_no=item.pickup_reference)
group by customer.company_name, item.pickup_reference )
where rnk <= 10
order by number_of_pickups desc
Using ROWNUM in the same query level as the ORDER BY doesn't give the expected result, because it takes the first 10 rows before they are ordered and only then sorts them. That is why the ROWNUM filter has to be applied in an outer query around the ordered result.
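One practical difference between the two approaches is how ties are handled. The sketch below uses SQLite through Python's sqlite3 module, which has no ROWNUM, so LIMIT stands in for the row-limiting variant. The tables are cut down to the columns used in the question, the rows are invented, and the grouping is simplified to company name only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (reference_no INTEGER, company_name TEXT);
    CREATE TABLE item (pickup_reference INTEGER);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cargo'), (4, 'Dune');
    INSERT INTO item VALUES (1), (1), (1), (2), (2), (3), (3), (4);
""")

# Shared CTE: pickups per company (Acme 3, Bolt 2, Cargo 2, Dune 1).
pickups_cte = """
WITH pickups AS (
    SELECT c.company_name, COUNT(i.pickup_reference) AS number_of_pickups
    FROM customer c
    JOIN item i ON c.reference_no = i.pickup_reference
    GROUP BY c.company_name
)
"""

# Rank-based top 2: tied companies at the cutoff are all included.
top_by_rank = conn.execute(pickups_cte + """
    SELECT company_name, number_of_pickups
    FROM (
        SELECT company_name, number_of_pickups,
               RANK() OVER (ORDER BY number_of_pickups DESC) AS rnk
        FROM pickups
    ) AS r
    WHERE rnk <= 2
    ORDER BY number_of_pickups DESC, company_name
""").fetchall()

# Row-limited top 2: exactly two rows, ties at the cutoff are cut off.
top_by_limit = conn.execute(pickups_cte + """
    SELECT company_name, number_of_pickups
    FROM pickups
    ORDER BY number_of_pickups DESC, company_name
    LIMIT 2
""").fetchall()

print(top_by_rank)   # [('Acme', 3), ('Bolt', 2), ('Cargo', 2)]
print(top_by_limit)  # [('Acme', 3), ('Bolt', 2)]
```

Which behaviour is wanted depends on whether "top 10" means exactly ten rows or every row whose count is among the ten highest.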

MAX on columns generated by SUM and GROUP BY

I'm trying to get the MAX on a column which is generated dynamically using the SUM statement. The SUM statement is used together with the 'GROUP by' syntax.
This is the original query, however it needs to be modified to work with grouping, sums and of course MAX.
SELECT SUM(video_plays) AS total_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ASC
As you can see SUM is adding all the values inside video_plays as total_video_plays..
But I SIMPLY want to get the MAX of total_video_plays
My attempts are below, however they do not work..
SELECT SUM(video_plays) AS MAX(total_video_plays)
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ASC
How would you get the MAX on a column made dynamically without using subqueries - Because the above is already placed within one.
Something like
SELECT SUM(video_plays) AS total_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id`
ORDER BY total_video_plays DESC
LIMIT 1
Hat Tip OMG Ponies for proper MySQL dialect.
You can not do what you're asking without a subquery, because you can't run two aggregate functions, one on top of the other.
Will this work for you?
SELECT MAX(total_video_plays) FROM (
SELECT SUM(video_plays) AS total_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ) AS t
It contains a subquery, but maybe not in the sense you were thinking.
This works for me.
select video_id, sum(video_plays) as sum_video_plays
from (
select video_id, video_plays
, row_number() over (partition by video_id
order by video_id desc) as rn
from video_statistics
) as T
where rn = 1
group by video_id;
can't you just do this?:
SELECT video_id, max(video_plays) AS max_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ASC
There's an example here:
http://dev.mysql.com/doc/refman/5.1/de/select.html
SELECT user, MAX(salary) FROM users
GROUP BY user HAVING MAX(salary) > 10;
EDIT: second attempt, albeit using a subquery:
select max(sum_video_plays)
from (
SELECT video_id, sum(video_plays) AS sum_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id`
) x
Your outer query may well be selecting from a much smaller set, and (depending on your data distribution etc.) may be quite performant.
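The two working approaches from this thread (MAX over a grouped subquery, and ORDER BY with LIMIT 1) can be checked against each other in a small sketch. It runs on SQLite through Python's sqlite3 module rather than MySQL, with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video_statistics (video_id INTEGER, video_plays INTEGER)")
# Made-up sample: per-video sums are 15, 27 and 1.
conn.executemany("INSERT INTO video_statistics VALUES (?, ?)", [
    (1, 10), (1, 5), (2, 7), (2, 20), (3, 1),
])

# MAX applied to the grouped sums via a derived table.
max_via_subquery = conn.execute("""
    SELECT MAX(sum_video_plays)
    FROM (
        SELECT video_id, SUM(video_plays) AS sum_video_plays
        FROM video_statistics
        GROUP BY video_id
    ) AS x
""").fetchone()

# Same value without a subquery: sort the sums and keep the first.
max_via_limit = conn.execute("""
    SELECT SUM(video_plays) AS total_video_plays
    FROM video_statistics
    GROUP BY video_id
    ORDER BY total_video_plays DESC
    LIMIT 1
""").fetchone()

print(max_via_subquery)  # (27,)
print(max_via_limit)     # (27,)
```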