SQL obtaining items ranked by their count(*) - sql

I have been attempting the following query for a while- not sure how to approach this issue I'm having.
I need to obtain bands that cover the second most styles of music - including all equal bands if there is a tie for second. For example for the table band_style,
Band_id | Style
--------+-----------
1       | Rock
2       | Pop
1       | Punk
3       | Classical
1       | Metal
2       | Rock
4       | Pop
4       | Rap
The returned result should be
Band_id | Num_styles
--------+-----------
2       | 2
4       | 2
My initial attempt at a solution:
SELECT band_id, COUNT(*) AS num_styles FROM band_style
GROUP BY band_id HAVING COUNT(*) <
(SELECT MAX(c) FROM
(SELECT COUNT(band_id) AS c
FROM band_style
GROUP BY band_id));
So this gives me the count for all the bands with fewer styles than the maximum. Now I'd like to take ALL rows that have the maximum value of this query. I do not want to use rownum or limit because, from what I've experienced, they don't handle ties well. I am also wondering if there is a way to wrap this in another MAX function, but I don't really see how.
Any help with this issue would be appreciated. It would also be useful to know whether the approach can be applied to the 3rd, 4th highest, etc.
(Using Oracle/SQLPlus)
Assuming this is a large data file and we do not necessarily know what the "second highest count" is.
UPDATE: this almost works: it gets all bands with fewer than the max number of styles. But calling MAX doesn't seem to be working, as the table returned still has all values of NUM except the max.
WITH data AS (
SELECT band_id, COUNT(*) AS NUM FROM band_style GROUP BY band_id HAVING COUNT (*) <
(SELECT MAX(c) FROM
(SELECT COUNT(band_id) AS c
FROM band_style
GROUP BY band_id)))
SELECT data.band_id, data.NUM FROM data
INNER JOIN ( SELECT band_id m, MAX(NUM) n
FROM data GROUP BY band_id
) t
ON t.band_id = data.band_id
AND t.NUM = data.NUM;

If you have to stick with a MySQL version that lacks window functions, this will be much more difficult. But on Oracle (which you say you are using) or MariaDB this should work:
with data as (
select
band_id, count(*) styles,
dense_rank() over (order by count(*) desc) place
from
table1 group by band_id)
select * from data where place=2
http://sqlfiddle.com/#!4/dc3f6/12
Your friend here is the window function dense_rank.
The output is:
BAND_ID STYLES PLACE
2 2 2
4 2 2
To avoid a misunderstanding: in this example place 2 just happens to coincide with a styles count of 2. Here is a second example:
http://sqlfiddle.com/#!4/2be32/3
Now the styles count is different from the place id.
BAND_ID STYLES PLACE
4 3 2
This illustrates that you do not need to know the second-highest count beforehand; dense_rank works it out from the data.
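To try this without an Oracle instance, here is a minimal sketch that runs the same dense_rank idea against the question's sample rows in an in-memory SQLite database (this assumes a Python build bundling SQLite 3.25 or later, which added window functions). The helper name bands_at_place and the two-step CTE are my own choices for illustration; passing 3 or 4 instead of 2 answers the "3rd, 4th highest" follow-up.

```python
import sqlite3

# Sample rows copied from the band_style table in the question.
ROWS = [(1, "Rock"), (2, "Pop"), (1, "Punk"), (3, "Classical"),
        (1, "Metal"), (2, "Rock"), (4, "Pop"), (4, "Rap")]


def bands_at_place(place):
    """Return (band_id, num_styles) for every band whose style count
    is the place-th highest distinct count, ties included."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE band_style (band_id INTEGER, style TEXT)")
    conn.executemany("INSERT INTO band_style VALUES (?, ?)", ROWS)
    query = """
        WITH counts AS (
            SELECT band_id, COUNT(*) AS num_styles
            FROM band_style
            GROUP BY band_id
        ),
        ranked AS (
            SELECT band_id,
                   num_styles,
                   DENSE_RANK() OVER (ORDER BY num_styles DESC) AS place
            FROM counts
        )
        SELECT band_id, num_styles
        FROM ranked
        WHERE place = ?
        ORDER BY band_id
    """
    result = conn.execute(query, (place,)).fetchall()
    conn.close()
    return result


print(bands_at_place(2))  # bands with the second most styles
print(bands_at_place(3))  # bands with the third most styles
```

Because dense_rank gives tied counts the same place and leaves no gaps, place = 2 always means "second-highest distinct count", however many bands share first place.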

Related

Is there a way to display the first two results of each unique id?

I work in healthcare. In a Postgres database, we have a table of member IDs and dates. I'm trying to pull the latest two dates for each member ID.
Simplified sample data:
A 1
B 1
B 2
C 1
C 5
C 7
D 1
D 2
D 3
D 4
Desired result:
A 1
B 1
B 2
C 1
C 5
D 1
D 2
I get a strong feeling this is for a homework assignment and would recommend that you look into partitioning and specifically rank() function by yourself first before looking at my solution.
Moreover, you have not specified how you received the initial result you provided, so I'll have to assume you just did select letter_column, number_column from my_table; to achieve the result.
So, what you actually want here is partition the initial query result into groups by the letter_column and select the first two rows in each. rank() function lets you assign each row a number, counting within groups:
select letter_column,
number_column,
rank() over (partition by letter_column order by number_column) as rank
from my_table;
Since rank() is a window function, you can't reference it in the WHERE clause of the same query, so you'll have to wrap this query in another one that keeps only the rows where rank is at most 2 (i.e. filters out everything ranked over 2):
with ranked_results as (select letter_column,
number_column,
rank() over (partition by letter_column order by number_column asc) as rank
from my_table mt)
select letter_column,
number_column
from ranked_results
where rank < 3;
Here's an SQLFiddle to play around: http://sqlfiddle.com/#!15/e90744/1/0
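One caveat with rank(): if two rows tie inside a group, both get the same rank, so "rank < 3" can return more than two rows for that member. If exactly two rows per group are wanted, row_number() is a common substitute. Below is a minimal sketch on the question's sample data, run through Python's sqlite3 module (assuming SQLite 3.25+ for window functions); the function name first_n_per_group is just an illustrative helper.

```python
import sqlite3

# Sample data from the question: (letter_column, number_column).
ROWS = [("A", 1), ("B", 1), ("B", 2), ("C", 1), ("C", 5), ("C", 7),
        ("D", 1), ("D", 2), ("D", 3), ("D", 4)]


def first_n_per_group(n):
    """Return at most n rows per letter, smallest numbers first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (letter_column TEXT, number_column INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)
    query = """
        WITH numbered AS (
            SELECT letter_column,
                   number_column,
                   ROW_NUMBER() OVER (PARTITION BY letter_column
                                      ORDER BY number_column) AS rn
            FROM my_table
        )
        SELECT letter_column, number_column
        FROM numbered
        WHERE rn <= ?
        ORDER BY letter_column, number_column
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


for row in first_n_per_group(2):
    print(row)
```

To get the latest two values per group instead of the first two, order the window by number_column DESC.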
Hope this helps!

How to consecutively count everything greater than or equal to itself in SQL?

Let's say I have a table that contains the Equipment ID of each piece of equipment, along with its Equipment Type and Equipment Age. How can I do a Count Distinct of the Equipment IDs that have at least a given Equipment Age?
For example, let's say this is all the data we have:
equipment_type | equipment_id | equipment_age
---------------+--------------+--------------
Screwdriver    | A123         | 1
Screwdriver    | A234         | 2
Screwdriver    | A345         | 2
Screwdriver    | A456         | 2
Screwdriver    | A567         | 3
I would like the output to be:
equipment_type | equipment_age | count_of_equipment_at_least_this_age
---------------+---------------+-------------------------------------
Screwdriver    | 1             | 5
Screwdriver    | 2             | 4
Screwdriver    | 3             | 1
Reason is there are 5 screwdrivers that are at least 1 day old, 4 screwdrivers at least 2 days old and only 1 screwdriver at least 3 days old.
So far I have only been able to count the equipment that falls within each equipment_age (with the query shown below), but not "at least that equipment_age".
SELECT
equipment_type,
equipment_age,
COUNT(DISTINCT equipment_id) as count_of_equipments
FROM equipment_table
GROUP BY 1, 2
Consider below join-less solution
select distinct
equipment_type,
equipment_age,
count(*) over equipment_at_least_this_age as count_of_equipment_at_least_this_age
from equipment_table
window equipment_at_least_this_age as (
partition by equipment_type
order by equipment_age
range between current row and unbounded following
)
If applied to the sample data in your question, the output matches the desired result shown there.
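A runnable sketch of the same window-frame idea, using Python's sqlite3 module in place of the original engine (assuming SQLite 3.25+; the window is written inline rather than as a named window, and the function name is mine). The key point is the RANGE frame: it starts at the first row with the current age, peers included, and extends to the end of the partition.

```python
import sqlite3

# Sample data from the question.
ROWS = [("Screwdriver", "A123", 1), ("Screwdriver", "A234", 2),
        ("Screwdriver", "A345", 2), ("Screwdriver", "A456", 2),
        ("Screwdriver", "A567", 3)]


def count_at_least_each_age():
    """For each (type, age), count rows of that type with age >= that age."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE equipment_table ("
                 "equipment_type TEXT, equipment_id TEXT, equipment_age INTEGER)")
    conn.executemany("INSERT INTO equipment_table VALUES (?, ?, ?)", ROWS)
    query = """
        SELECT DISTINCT
               equipment_type,
               equipment_age,
               COUNT(*) OVER (
                   PARTITION BY equipment_type
                   ORDER BY equipment_age
                   RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
               ) AS count_at_least_this_age
        FROM equipment_table
        ORDER BY equipment_type, equipment_age
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in count_at_least_each_age():
    print(row)
```

Note that COUNT(*) here counts rows, so it only equals a distinct count of equipment_id when each ID appears once per type, as in the sample data.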
Use a self join approach:
SELECT
e1.equipment_type,
e1.equipment_age,
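-- DISTINCT is needed: several e1 rows can share an age, and each one repeats the matching e2 rows
COUNT(DISTINCT e2.equipment_id) AS count_of_equipments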
FROM equipment_table e1
INNER JOIN equipment_table e2
ON e2.equipment_type = e1.equipment_type AND
e2.equipment_age >= e1.equipment_age
GROUP BY 1, 2
ORDER BY 1, 2;
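A small sketch of this self join on the question's sample data, contrasting COUNT(*) with COUNT(DISTINCT e2.equipment_id). It runs on SQLite through Python's sqlite3 module, and the helper name run_self_join is only for illustration. Because three rows share equipment_age 2, a plain COUNT(*) counts every matching e2 row once per e1 row and inflates that group.

```python
import sqlite3

# Sample data from the question.
ROWS = [("Screwdriver", "A123", 1), ("Screwdriver", "A234", 2),
        ("Screwdriver", "A345", 2), ("Screwdriver", "A456", 2),
        ("Screwdriver", "A567", 3)]

QUERY_TEMPLATE = """
    SELECT e1.equipment_type,
           e1.equipment_age,
           {count_expr} AS count_of_equipments
    FROM equipment_table e1
    INNER JOIN equipment_table e2
            ON e2.equipment_type = e1.equipment_type
           AND e2.equipment_age >= e1.equipment_age
    GROUP BY e1.equipment_type, e1.equipment_age
    ORDER BY e1.equipment_type, e1.equipment_age
"""


def run_self_join(count_expr):
    """Run the self join with the given aggregate expression."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE equipment_table ("
                 "equipment_type TEXT, equipment_id TEXT, equipment_age INTEGER)")
    conn.executemany("INSERT INTO equipment_table VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY_TEMPLATE.format(count_expr=count_expr)).fetchall()
    conn.close()
    return result


plain = run_self_join("COUNT(*)")
distinct = run_self_join("COUNT(DISTINCT e2.equipment_id)")
print("COUNT(*):       ", plain)
print("COUNT(DISTINCT):", distinct)
```

Only the DISTINCT variant reproduces the 5 / 4 / 1 counts the question asks for.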
GROUP BY restricts the scope of COUNT to the rows in the group, i.e. it will not let you reach other rows (rows with equipment_age greater than that of the current group). So you need a subquery or windowing functions to get those. One way:
SELECT
equipment_type,
equipment_age,
(Select COUNT(*)
from equipment_table cnt
where cnt.equipment_type = a.equipment_type
AND cnt.equipment_age >= a.equipment_age
) as count_of_equipments
FROM equipment_table a
GROUP BY 1, 2, 3
I am not sure if your environment supports this syntax, though. If not, let us know and we will find another way.

Case Statement for multiple criteria

I would like to ignore some of the results of my query because, for all intents and purposes, they are duplicates. Because of the way the request was made, we need to use this hierarchy, and although we are seeing different Company_Name values, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and WHERE clauses that are not germane to this question.
Results:
    Customer_Name_Count   Company_Name         Total_Sales
-------------------------------------------------------------
1   3                     Blockbuster          1,000
2   6                     Jimmy's Bar          1,500
3   6                     Jimmy's Restaurant   1,500
4   9                     Impala Hotel         2,000
5   12                    Sports Drink         2,500
In the above set, we can see that numbers 2 & 3 have the same count and the same total_sales number and similar company names. Is there a way to create a case statement that takes these 3 factors into consideration and then drops one or the other for Jimmy's enterprises? The other issue is that this has to be variable as there are other instances where this happens. And I would only want this to happen if the count and sales number match each other with a similar name in the company name.
Desired result:
    Customer_Name_Count   Company_Name         Total_Sales
--------------------------------------------------------------
1   3                     Blockbuster          1,000
2   6                     Jimmy's Bar          1,500
3   9                     Impala Hotel         2,000
4   12                    Sports Drink         2,500
It looks like the other answers are accurate, based on the assumption that the Company_IDs are the same for both.
If the Company_IDs are different for Jimmy's Bar and Jimmy's Restaurant, then you can use something like this. I suggest you get functional users involved and do some data clean-up, or else you'll be maintaining this every time the issue arises:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END
Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.
You need a function that returns a common name for the companies, and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable
You can try using the ROW_NUMBER window function to number the rows within each (Customer_Name_Count, Total_Sales) pair, then keep only rn = 1:
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
) t1
) t2
WHERE rn = 1
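Here is a minimal sketch of that ROW_NUMBER de-duplication, run on SQLite via Python's sqlite3 module (SQLite 3.25+ assumed). To keep it short, the already aggregated rows from the question are loaded into a hypothetical table named agg_results instead of repeating the GROUP BY over some_table.

```python
import sqlite3

# Aggregated rows as shown in the question's result set
# (agg_results is a hypothetical stand-in for the grouped subquery).
ROWS = [(3, "Blockbuster", 1000), (6, "Jimmy's Bar", 1500),
        (6, "Jimmy's Restaurant", 1500), (9, "Impala Hotel", 2000),
        (12, "Sports Drink", 2500)]


def dedupe_by_count_and_sales():
    """Keep one row per (customer_name_count, total_sales) pair:
    the alphabetically first company name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE agg_results ("
                 "customer_name_count INTEGER, company_name TEXT, "
                 "total_sales INTEGER)")
    conn.executemany("INSERT INTO agg_results VALUES (?, ?, ?)", ROWS)
    query = """
        SELECT customer_name_count, company_name, total_sales
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_name_count, total_sales
                       ORDER BY company_name
                   ) AS rn
            FROM agg_results
        ) AS numbered
        WHERE rn = 1
        ORDER BY total_sales, company_name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in dedupe_by_count_and_sales():
    print(row)
```

Be aware that this partitions only on the count and the sales total; it never compares the names, so two unrelated companies that happen to share both numbers would also be collapsed into one row.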

Count how many times the car goes off route

I have a question. I'm working on a GPS project.
When a car is on route, we save value = 1 in our database, but when it goes off route we save value = 0. How can I determine the number of times the car went off route with an SQL query?
This is an example of our table:
If you look at the picture, value 1 means the car is on route and 0 means it is off route. I want to count the groups of values; for example, my result would be:
Off Route = 2 (times)
There are several ways to do this. Depending on your database, there may be an easier version with window functions such as lead and lag. However, this should be generic with exists:
select count(y1.id)
from yourtable y1
where y1.value = 0 and exists (
select 1
from yourtable y2
where y2.id = y1.id - 1 and y2.value = 1)
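For databases that do support window functions, the lag variant mentioned above could look like the sketch below, run on SQLite (3.25+) through Python's sqlite3 module. The rows are hypothetical, since the original table was only posted as an image; the IDs deliberately have gaps and one off-route stretch spans two rows, to show that lag does not depend on consecutive IDs and counts each on-to-off transition once.

```python
import sqlite3

# Hypothetical sample data (id, value): 1 = on route, 0 = off route.
ROWS = [(10, 1), (20, 1), (30, 0), (40, 0), (50, 1), (60, 0), (70, 1)]


def count_off_route_events(rows):
    """Count rows where value is 0 and the previous row (by id) was 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (id INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO yourtable VALUES (?, ?)", rows)
    query = """
        WITH flagged AS (
            SELECT id,
                   value,
                   LAG(value) OVER (ORDER BY id) AS prev_value
            FROM yourtable
        )
        SELECT COUNT(*)
        FROM flagged
        WHERE value = 0 AND prev_value = 1
    """
    (count,) = conn.execute(query).fetchone()
    conn.close()
    return count


print(count_off_route_events(ROWS))
```

Like the exists version, this does not count a log that starts off route, because the first row has no previous value to compare against.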
;with cteBase as (
Select *,RowNr = Row_Number() over (Order by ID) From YourTable
)
Select A.*
From cteBase A
Join cteBase B on (A.RowNr=B.RowNr+1 and A.Value=0 and B.Value=1)
Returns
ID Value RowNr
4 0 4
6 0 6
I didn't want to assume the ID was incremental, so I used Row_Number(). If it is incremental, @sgeddes's answer works just as well.
If what you need is just the number of times Value is 1 as on Route and 0 as Off Route, try the following query:
select
sum(Value) as [On_route],
sum(abs(Value-1)) as [Off_Route]
from GPSTable
abs(Value-1) essentially converts 0 to 1 and 1 to 0, which is easy enough to check.

SQL getting data from 2 tables

I've got a tricky (at least for me it's tricky) question, I want to arrange data by comment count. My first table is called all_comments which has these columns (more but not essential):
comment, target_id
My second table is called our_videos which has these columns (more but not essential):
id, title
I want to get the count of all comments that have target_id same as id on 2nd table and arrange that data by comment count. Here is example of what I want:
TABLE #1:
id target_id
----------------
1 3
2 5
3 5
4 3
5 3
TABLE #2:
id title
-----------
1 "test"
2 "another-test"
3 "testing"
5 "......"
This is basically saying that the row in the 2nd table with id 3 has 3 comments and the row with id 5 has 2 comments. I want to arrange the data by this comment count and get a result like this:
RESULT:
id title
----------------
3 "testing"
5 "......"
1 "test"
2 "another-test"
If I missed any important info needed for this question just ask, thanks for help, peace :)
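It is a very simple query, and you should definitely look at an SQL tutorial.
A naive variant would be:
-- count(comments.target_id) counts only matched comments; count(*) would report 1 for videos with none
select videos.id, videos.title, count(comments.target_id) as comment_count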
from videos
left outer join
comments
on (videos.id = comments.target_id)
group by videos.id, videos.title
order by comment_count desc
This version has some performance problems because you have to group by title as well; to speed it up, we usually do the following:
select videos.id, videos.title, q.cnt as comment_count
from videos
left outer join
(
select target_id, count(*) as cnt
from comments
group by target_id
) as q
on videos.id = q.target_id
order by q.cnt DESC
select videos.id, videos.title, isnull(cnt, 0) as cnt
from videos
left outer join
(select target_id, count(*) as cnt
from comments
group by target_id) as cnts
on videos.id = cnts.target_Id
order by isnull(cnt, 0) desc, videos.title
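A runnable sketch of the pre-aggregated left join on the question's sample data, using SQLite through Python's sqlite3 module. COALESCE stands in for the SQL Server specific isnull, the tables are called videos and comments as in the answers (the question names them our_videos and all_comments), and the tie-break on id is my own choice to reproduce the order the question asks for.

```python
import sqlite3

# Sample data from the question.
COMMENTS = [(1, 3), (2, 5), (3, 5), (4, 3), (5, 3)]  # (id, target_id)
VIDEOS = [(1, "test"), (2, "another-test"), (3, "testing"), (5, "......")]


def videos_by_comment_count():
    """Return (id, title, comment_count), most commented first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER, target_id INTEGER)")
    conn.execute("CREATE TABLE videos (id INTEGER, title TEXT)")
    conn.executemany("INSERT INTO comments VALUES (?, ?)", COMMENTS)
    conn.executemany("INSERT INTO videos VALUES (?, ?)", VIDEOS)
    query = """
        SELECT videos.id,
               videos.title,
               COALESCE(cnts.cnt, 0) AS comment_count
        FROM videos
        LEFT OUTER JOIN (
            SELECT target_id, COUNT(*) AS cnt
            FROM comments
            GROUP BY target_id
        ) AS cnts
          ON videos.id = cnts.target_id
        ORDER BY comment_count DESC, videos.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in videos_by_comment_count():
    print(row)
```

Videos without comments have no matching row in the subquery, so the left join yields NULL for cnt and COALESCE turns it into 0.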
Some systems will let you write this even though sorting is not strictly supposed to happen on a column not included in the output. I don't necessarily recommend it, but I might argue it's the most straightforward.
select id, title from videos
order by (select count(*) from comments where target_id = videos.id) desc, title
If you don't mind having it in the output it's a quick change:
select id, title,
(select count(*) from comments where target_id = videos.id) as comment_count
from videos
order by comment_count desc, title
SQL generally has a lot of options.