Compare the same id with 2 values in string in one table - sql

I have a table like this:
id status grade
123 Overall A
123 Current B
234 Overall B
234 Current D
345 Overall C
345 Current A
How can I display how many ids satisfy the following condition?
The grades are ordered A > B > C > D > F,
and the Overall grade must be greater than or equal to the Current grade.
Do I need to use CASE to convert each grade to a number first,
e.g. A = 4, B = 3, C = 2, D = 1, F = 0?
In the table above, only id 345 does not meet the condition. How can I produce the tables below?
qty_pass_the_condition qty_fail_the_condition total_ids
2 1 3
and
fail_id
345
Thanks.

Because the grades are sequential (alphabetical order matches the grade order), you can rank them with ORDER BY grade DESC to turn each grade into a number. For the first result, you can do something like this:
select
sum(case when GradeRankO >= GradeRankC then 1 else 0 end) AS
qty_pass_the_condition,
sum(case when GradeRankO < GradeRankC then 1 else 0 end) AS
qty_fail_the_condition,
count(*) AS total_ids
from
(
select * from (
select Id,Status,
Rank() over (partition by Id order by grade desc) GradeRankO
from YourTable
) as a where Status='Overall'
) as b
inner join
(
select * from (
select Id,Status,
Rank() over (partition by Id order by grade desc) GradeRankC
from YourTable
) as a where Status='Current'
) as c on b.Id=c.Id
For the second result, you can use the query below:
select
b.Id fail_id
from
(
select * from (
select Id,Status,
Rank() over (partition by Id order by grade desc) GradeRankO
from YourTable
) as a where Status='Overall'
) as b
inner join
(
select * from (
select Id,Status,
Rank() over (partition by Id order by grade desc) GradeRankC
from YourTable
) as a where Status='Current'
) as c on b.Id=c.Id
where GradeRankO < GradeRankC
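To see why the ranking trick works, here is a minimal sketch that runs the same idea against the question's sample rows in an in-memory SQLite database through Python's sqlite3 module. Within each id, ORDER BY grade DESC gives rank 1 to the alphabetically later (worse) grade, so the Overall row passes when its rank is greater than or equal to the Current row's rank. The CTE name ranked and the function name rank_summary are made up for this example, and it assumes SQLite 3.25 or later (for window functions) and exactly one Overall and one Current row per id.

```python
import sqlite3

# Sample rows from the question; the table name YourTable is a placeholder.
ROWS = [
    (123, "Overall", "A"), (123, "Current", "B"),
    (234, "Overall", "B"), (234, "Current", "D"),
    (345, "Overall", "C"), (345, "Current", "A"),
]

def rank_summary(rows):
    """Return (qty_pass, qty_fail, total_ids) using RANK() per id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (id INTEGER, status TEXT, grade TEXT)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    result = con.execute("""
        WITH ranked AS (
            SELECT id, status,
                   RANK() OVER (PARTITION BY id ORDER BY grade DESC) AS rnk
            FROM YourTable
        )
        SELECT SUM(CASE WHEN o.rnk >= c.rnk THEN 1 ELSE 0 END),
               SUM(CASE WHEN o.rnk <  c.rnk THEN 1 ELSE 0 END),
               COUNT(*)
        FROM ranked AS o
        JOIN ranked AS c ON c.id = o.id
        WHERE o.status = 'Overall' AND c.status = 'Current'
    """).fetchone()
    con.close()
    return result

print(rank_summary(ROWS))
```

Note that the rank is only meaningful inside one id, so this approach compares the two rows of an id with each other and does not produce a global numeric grade.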

You can use simple conditional aggregation for this; there is no need for window functions.
An id passes when its Overall grade is less than or equal to its Current grade, with "less than" meaning A-Z order. This works because the grade letters A, B, C, D, F already sort alphabetically from best to worst.
Then aggregate again over the whole result: qty_pass_the_condition is simply the number of non-null values in Pass, and qty_fail_the_condition is the total count minus that.
SELECT
qty_pass_the_condition = COUNT(t.Pass),
qty_fail_the_condition = COUNT(*) - COUNT(t.Pass),
total_ids = COUNT(*)
FROM (
SELECT
t.id,
Pass = CASE WHEN MIN(CASE WHEN t.status = 'Overall' THEN t.grade END) <=
MIN(CASE WHEN t.status = 'Current' THEN t.grade END)
THEN 1 END
FROM YourTable t
GROUP BY
t.id
) t;
To query the actual failed IDs, simply use a HAVING clause:
SELECT
t.id
FROM YourTable t
GROUP BY
t.id
HAVING MIN(CASE WHEN t.status = 'Overall' THEN t.grade END) >
MIN(CASE WHEN t.status = 'Current' THEN t.grade END);
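If the grade letters did not happen to sort alphabetically, the CASE mapping suggested in the question (A = 4 down to F = 0) could be combined with the same conditional aggregation. The following is only a sketch under that assumption, run against an in-memory SQLite database with Python's sqlite3 module; the names SCORED, grade_summary, overall_score and current_score are invented for the example.

```python
import sqlite3

# Sample rows from the question; the table name YourTable is a placeholder.
ROWS = [
    (123, "Overall", "A"), (123, "Current", "B"),
    (234, "Overall", "B"), (234, "Current", "D"),
    (345, "Overall", "C"), (345, "Current", "A"),
]

# One row per id, with both grades mapped to numbers (A=4 ... F=0).
SCORED = """
    SELECT id,
           MAX(CASE WHEN status = 'Overall' THEN score END) AS overall_score,
           MAX(CASE WHEN status = 'Current' THEN score END) AS current_score
    FROM (
        SELECT id, status,
               CASE grade WHEN 'A' THEN 4 WHEN 'B' THEN 3 WHEN 'C' THEN 2
                          WHEN 'D' THEN 1 WHEN 'F' THEN 0 END AS score
        FROM YourTable
    ) AS g
    GROUP BY id
"""

def grade_summary(rows):
    """Return ((qty_pass, qty_fail, total_ids), [failed ids])."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (id INTEGER, status TEXT, grade TEXT)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    counts = con.execute(f"""
        SELECT SUM(CASE WHEN overall_score >= current_score THEN 1 ELSE 0 END),
               SUM(CASE WHEN overall_score <  current_score THEN 1 ELSE 0 END),
               COUNT(*)
        FROM ({SCORED}) AS s
    """).fetchone()
    failed = [row[0] for row in con.execute(f"""
        SELECT id FROM ({SCORED}) AS s
        WHERE overall_score < current_score
        ORDER BY id
    """)]
    con.close()
    return counts, failed

print(grade_summary(ROWS))
```

With the numeric mapping the comparison direction matches the wording of the question (Overall >= Current), at the cost of having to list every grade in the CASE expression.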

Related

Counting number of orders depending on city

I have a temp table that is being created, we will say that column 1 is an order_id, and column 2 is user_id, column 3 is start_date, column 4 is end_date and column 5 is city.
order_id user_id Start_date end_date city
101 1 200001 200101 X
101 2 200101 200110 y
101 3 200110 200112 z
101 3 200112 200210 z
I want to count by city the number of order_ids that moved out of it to another city and in another column the number of order_ids that moved into it from another city.
I would like it to come out as a table, like this:
city moved_out_orders moved_into_orders
x 1 0
y 1 1
z 0 1
You can do:
with
x as (
select a.city as from_city, b.city as to_city
from t a
join t b on a.order_id = b.order_id
and a.city <> b.city
and a.end_date = b.start_date
),
o (city, cnt) as (
select from_city, count(*) from x group by from_city
),
i (city, cnt) as (
select to_city, count(*) from x group by to_city
)
select
coalesce(i.city, o.city) as city,
coalesce(o.cnt, 0) as moved_out_orders,
coalesce(i.cnt, 0) as moved_into_orders
from i
full join o on o.city = i.city
Hmmm . . . I think you just want to enumerate the rows for each order and then discard the highest and lowest for each count:
select city,
sum(case when seqnum_desc > 1 then 1 else 0 end) as moved_out,
sum(case when seqnum_asc > 1 then 1 else 0 end) as moved_in
from (select t.*,
row_number() over (partition by order_id order by start_date) as seqnum_asc,
row_number() over (partition by order_id order by start_date desc) as seqnum_desc
from t
) t
group by city;
EDIT:
You appear to have adjacent rows in the same city. Seems strange, but instead you can use lead() and lag():
select city,
sum(case when next_city <> city then 1 else 0 end) as moved_out,
sum(case when prev_city <> city then 1 else 0 end) as moved_in
from (select t.*,
lag(city) over (partition by order_id order by start_date) as prev_city,
lead(city) over (partition by order_id order by start_date) as next_city
from t
) t
group by city;
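As a quick check of the lead()/lag() version, here is a small sketch that loads the sample rows into an in-memory SQLite database via Python's sqlite3 module and runs the same logic. It assumes SQLite 3.25 or later (for window functions), writes the city of the first row in lower case so that it matches the expected output, and uses a made-up function name, city_moves.

```python
import sqlite3

# Sample rows from the question (city 'X' written as 'x' to match the output).
ROWS = [
    (101, 1, 200001, 200101, "x"),
    (101, 2, 200101, 200110, "y"),
    (101, 3, 200110, 200112, "z"),
    (101, 3, 200112, 200210, "z"),
]

def city_moves(rows):
    """Return (city, moved_out, moved_in) rows, ordered by city."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (order_id INTEGER, user_id INTEGER, "
        "start_date INTEGER, end_date INTEGER, city TEXT)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute("""
        SELECT city,
               SUM(CASE WHEN next_city <> city THEN 1 ELSE 0 END) AS moved_out,
               SUM(CASE WHEN prev_city <> city THEN 1 ELSE 0 END) AS moved_in
        FROM (
            SELECT city,
                   LAG(city)  OVER (PARTITION BY order_id ORDER BY start_date) AS prev_city,
                   LEAD(city) OVER (PARTITION BY order_id ORDER BY start_date) AS next_city
            FROM t
        ) AS s
        GROUP BY city
        ORDER BY city
    """).fetchall()
    con.close()
    return result

print(city_moves(ROWS))
```

The first and last rows of an order have a NULL neighbour, and a comparison with NULL is not true, so they fall into the ELSE 0 branch. Two adjacent rows in the same city (the two z rows) are not counted as a move either.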

Retrieve the minimum value of a column from the max value of another column

I have a hive table t1 that looks like this:
ID Score1 score2
1 4 11
1 5 12
1 5 13
2 3 14
2 3 15
2 2 12
2 2 11
3 6 10
3 6 11
3 6 12
I want for each ID, to select the max value of score1, and if the max value exists more than once, then from the rows that contain max(score1) I want to get min(score2).
So, I want the minimum score2 of the maximum score1 rows, the results should be something like this
ID Score1 score2
1 5 12
2 3 14
3 6 10
Most of the ideas I have turn this to be a very complicated query, and I think there is a simple solution for it that I am not able to find yet.
Any ideas?
Use window functions:
select t.*
from (select t.*,
row_number() over (partition by id order by score1 desc, score2 asc) as seqnum
from t1 t
) t
where seqnum = 1;
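Here is a small sketch of the row_number() idea against the sample data, using an in-memory SQLite database through Python's sqlite3 module instead of Hive. It orders each ID's rows by score1 descending and score2 ascending, so row number 1 is the maximum score1 with the minimum score2 among its ties. The function name best_rows is invented for the example, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

# Sample rows from the question: (ID, score1, score2).
ROWS = [
    (1, 4, 11), (1, 5, 12), (1, 5, 13),
    (2, 3, 14), (2, 3, 15), (2, 2, 12), (2, 2, 11),
    (3, 6, 10), (3, 6, 11), (3, 6, 12),
]

def best_rows(rows):
    """Return one (id, max score1, min score2 among the max rows) per id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id INTEGER, score1 INTEGER, score2 INTEGER)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT id, score1, score2
        FROM (
            SELECT id, score1, score2,
                   ROW_NUMBER() OVER (
                       PARTITION BY id
                       ORDER BY score1 DESC, score2 ASC
                   ) AS seqnum
            FROM t1
        ) AS s
        WHERE seqnum = 1
        ORDER BY id
    """).fetchall()
    con.close()
    return result

print(best_rows(ROWS))
```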
Try:
SELECT Z.ID, Z.SCORE1, MIN(C.SCORE2) AS SCORE2
FROM
(SELECT A.ID, A.SCORE1 FROM
YOUR_TABLE A
INNER JOIN
(SELECT ID, MAX(SCORE1) AS SCORE1 FROM YOUR_TABLE GROUP BY ID) B
ON A.ID = B.ID AND A.SCORE1 = B.SCORE1
GROUP BY A.ID, A.SCORE1
HAVING COUNT(*)>1 -- keeps only IDs whose maximum SCORE1 occurs more than once
) Z
INNER JOIN YOUR_TABLE C
ON Z.ID = C.ID AND Z.SCORE1 = C.SCORE1
GROUP BY Z.ID, Z.SCORE1;
You can do this with window functions (note that QUALIFY is not standard SQL and is not available in every engine):
SELECT ID, score1, MIN(score2) AS score2
FROM (
SELECT score1, score2, ID
FROM (
SELECT score1, score2, ID
FROM MyTable
QUALIFY RANK() OVER(PARTITION BY ID ORDER BY score1 DESC) = 1
) src
QUALIFY COUNT(*) OVER(PARTITION BY ID) > 1
) src
GROUP BY 1,2
Sorry, writing this from my phone...can't format it well, there may be syntax errors too.
select t.id, min(t.score2)
from table1 t
inner join (
select id, max(score1) maxscore1 from table1 group by id
) d on t.id = d.id and t.score1 = d.maxscore1
group by t.id
having count(*) > 1 -- if the max value exists more than once
An alternative query, if the database supports analytic (window) functions, is:
select
id, min(score2)
from (
select id, score1, score2
, max(score1) over(partition by id) max_score1
from table1
) d
where score1 = max_score1
group by
id
having count(*) > 1 -- if the max value exists more than once
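Several of the answers above filter on "the max value exists more than once", which drops any ID whose maximum score1 appears only once. If such IDs should still be returned (an assumption about the intent; the sample data contains no such case), the join against the per-ID maximum is enough on its own, because MIN(score2) over a single row is just that row's score2. A sketch using Python's sqlite3 module with an in-memory database; the function name max_then_min and the extra rows for ID 4 are made up for illustration.

```python
import sqlite3

# Sample rows from the question plus a hypothetical ID 4 whose
# maximum score1 (9) occurs only once.
ROWS = [
    (1, 4, 11), (1, 5, 12), (1, 5, 13),
    (2, 3, 14), (2, 3, 15), (2, 2, 12), (2, 2, 11),
    (3, 6, 10), (3, 6, 11), (3, 6, 12),
    (4, 9, 7), (4, 1, 2),
]

def max_then_min(rows):
    """Per id: the maximum score1, and the minimum score2 among those rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id INTEGER, score1 INTEGER, score2 INTEGER)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT t.id, t.score1, MIN(t.score2) AS score2
        FROM t1 AS t
        JOIN (
            SELECT id, MAX(score1) AS max_score1
            FROM t1
            GROUP BY id
        ) AS d ON d.id = t.id AND d.max_score1 = t.score1
        GROUP BY t.id, t.score1
        ORDER BY t.id
    """).fetchall()
    con.close()
    return result

print(max_then_min(ROWS))
```

This version uses no window functions, so it should carry over to engines that lack them.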

selecting the highest count for a categorical variable when grouping

I have the following table:
custID Cat
1 A
1 B
1 B
1 B
1 C
2 A
2 A
2 C
3 B
3 C
4 A
4 C
4 C
4 C
What I need is the most efficient way to aggregate by CustID in such a manner that I obtain the most frequent category (cat), the second most frequent and the third. The output of the above should be
custID most_freq 2nd_most_freq 3rd_most_freq
1 B A C
2 A C Null
3 B C Null
4 C A Null
When there is a tie in the count I do not really care what is first and what is second. For example for customer 1 2nd most freq and 3rd most freq could be swapped because each of them occur 1 time only.
Any sql would be fine, preferable hive sql.
Thank you
Try using GROUP BY twice with dense_rank() to sort according to the count of each cat. I'm not 100% sure, but I guess it should work in Hive as well.
select custId,
max(case when t.rn = 1 then cat end) as [most freq],
max(case when t.rn = 2 then cat end) as [2nd most freq],
max(case when t.rn = 3 then cat end) as [3rd most freq]
from
(
select custId, cat, dense_rank() over (partition by custId order by count(*) desc) rn
from your_table
group by custId, cat
) t
group by custId
Following the comments, here is a slightly modified solution that conforms to Hive SQL:
select custId,
max(case when t.rn = 1 then cat else null end) as most_freq,
max(case when t.rn = 2 then cat else null end) as 2nd_most_freq,
max(case when t.rn = 3 then cat else null end) as 3rd_most_freq
from
(
select custId, cat, dense_rank() over (partition by custId order by ct desc) rn
from (
select custId, cat, count(*) ct
from your_table
group by custId, cat
) your_table_with_counts
) t
group by custId
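One thing to watch with dense_rank(): categories with the same count share a rank, so for customer 1 (A and C both occur once) they would both get rank 2 and the third column would stay NULL. Since the question says ties may be broken arbitrarily, a variant with row_number() and a deterministic tie-break on cat gives every category its own position. This is a sketch of that variant only, run in an in-memory SQLite database via Python's sqlite3 module rather than Hive; the function name top_three is invented, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

# Sample rows from the question: (custID, Cat).
ROWS = [
    (1, "A"), (1, "B"), (1, "B"), (1, "B"), (1, "C"),
    (2, "A"), (2, "A"), (2, "C"),
    (3, "B"), (3, "C"),
    (4, "A"), (4, "C"), (4, "C"), (4, "C"),
]

def top_three(rows):
    """Return (custID, most, 2nd most, 3rd most frequent cat) per customer."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (custID INTEGER, cat TEXT)")
    con.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    result = con.execute("""
        SELECT custID,
               MAX(CASE WHEN rn = 1 THEN cat END) AS most_freq,
               MAX(CASE WHEN rn = 2 THEN cat END) AS second_most_freq,
               MAX(CASE WHEN rn = 3 THEN cat END) AS third_most_freq
        FROM (
            SELECT custID, cat,
                   ROW_NUMBER() OVER (
                       PARTITION BY custID
                       ORDER BY ct DESC, cat   -- tie-break on cat is arbitrary
                   ) AS rn
            FROM (
                SELECT custID, cat, COUNT(*) AS ct
                FROM your_table
                GROUP BY custID, cat
            ) AS counts
        ) AS ranked
        GROUP BY custID
        ORDER BY custID
    """).fetchall()
    con.close()
    return result

for row in top_three(ROWS):
    print(row)
```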
SELECT journal, count(*) as frequency
FROM ${hiveconf:TNHIVE}
WHERE journal IS NOT NULL
GROUP BY journal
ORDER BY frequency DESC
LIMIT 5;

How to filter out the first and last entry from a table using RANK?

I've this data:
Id Date Value
'a' 2000 55
'a' 2001 3
'a' 2012 2
'a' 2014 5
'b' 1999 10
'b' 2014 110
'b' 2015 8
'c' 2011 4
'c' 2012 33
I want to filter out the first and the last value (when the table is sorted on the Date column), and only keep the other values. In case there are only two entries, nothing is returned. (Example for Id = 'c')
ID Date Value
'a' 2001 3
'a' 2012 2
'b' 2014 110
I tried to use order by (RANK() OVER (PARTITION BY [Id] ORDER BY Date ...)) in combination with this article (http://blog.sqlauthority.com/2008/03/02/sql-server-how-to-retrieve-top-and-bottom-rows-together-using-t-sql/) but I can't get it to work.
[UPDATE]
All three answers seem fine. But I'm not a SQL expert, so my question is: which one performs fastest if the table has around 800000 rows and there are no indexes on any column?
You can use row_number twice to determine the min and max dates and then filter accordingly:
with cte as (
select id, [date], value,
row_number() over (partition by id order by [date]) minrn,
row_number() over (partition by id order by [date] desc) maxrn
from data
)
select id, [date], value
from cte
where minrn != 1 and maxrn != 1
Here's another approach using min and max for this without needing to use a ranking function:
with cte as (
select id, min([date]) mindate, max([date]) maxdate
from data
group by id
)
select *
from data d
where not exists (
select 1
from cte c
where d.id = c.id and d.[date] in (c.mindate, c.maxdate))
Here is a similar solution with row_number and count :
SELECT id,
dat,
value
FROM (SELECT *,
ROW_NUMBER()
OVER(
partition BY id
ORDER BY dat) rnk,
COUNT(*)
OVER (
partition BY id) cnt
FROM #table) t
WHERE rnk NOT IN( 1, cnt )
You can do this with EXISTS:
SELECT *
FROM Table1 a
WHERE EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date < a.Date
)
AND EXISTS (SELECT 1
FROM Table1 b
WHERE a.ID = b.ID
AND b.Date > a.Date
)
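To check that the row_number/count form and the EXISTS form agree on the sample data, here is a small sketch that runs both in an in-memory SQLite database through Python's sqlite3 module. The table name data and column name dat are borrowed from the answers above, the function name middle_rows is made up, and SQLite 3.25 or later is assumed for the window-function query. It says nothing about relative performance on SQL Server.

```python
import sqlite3

# Sample rows from the question: (Id, Date, Value).
ROWS = [
    ("a", 2000, 55), ("a", 2001, 3), ("a", 2012, 2), ("a", 2014, 5),
    ("b", 1999, 10), ("b", 2014, 110), ("b", 2015, 8),
    ("c", 2011, 4), ("c", 2012, 33),
]

# Drop the rows numbered 1 and cnt (first and last) within each id.
WINDOW_SQL = """
    SELECT id, dat, value
    FROM (
        SELECT id, dat, value,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY dat) AS rnk,
               COUNT(*)     OVER (PARTITION BY id)              AS cnt
        FROM data
    ) AS t
    WHERE rnk NOT IN (1, cnt)
    ORDER BY id, dat
"""

# Keep a row only if the same id has both an earlier and a later row.
EXISTS_SQL = """
    SELECT id, dat, value
    FROM data AS a
    WHERE EXISTS (SELECT 1 FROM data AS b WHERE b.id = a.id AND b.dat < a.dat)
      AND EXISTS (SELECT 1 FROM data AS b WHERE b.id = a.id AND b.dat > a.dat)
    ORDER BY id, dat
"""

def middle_rows(rows, sql):
    """Run one of the queries above against a fresh in-memory table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (id TEXT, dat INTEGER, value INTEGER)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result

print(middle_rows(ROWS, WINDOW_SQL))
print(middle_rows(ROWS, EXISTS_SQL))
```

The two forms can differ when an id has duplicate dates at either end: the EXISTS form removes every row that shares the minimum or maximum date, while the row_number form removes exactly one row from each end.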

Sorting sub-queries in SQL

I've been trying to get a query working in SQL Server 2012, and I'm almost certain I am overcomplicating it.
I have a table which stores an order number, item numbers (multiple per order), status codes (multiple per item) and a timestamp
So basically something like this
Order Item Status
1 1 1
1 1 2
2 1 1
2 1 2
2 1 3
3 1 3
3 2 1
3 2 2
In my query (using this table as the example), I need to see one entry for each order and item, showing only the highest available status, BUT not if the highest status is 3.
So in this case, I'd want to see
Order Item Status
1 1 2
3 2 2
The issue I had is that the query itself works... but it returns the FIRST status code it finds. Not the highest one. So I end up with
Order Item Status
1 1 1
3 2 1
Here's the full expanded code snippet
with summary as (
select a.order_no as order_no, a.item_no as item_no, a.timestamp as timestamp,
max(a.status_code) as status_code, row_number() over (partition by order_no
order by item_no asc) as rn
from db.ordhist a
where a.order_no > 120400000 and a.order_no < 120800000
and a.timestamp < Dateadd(DD,-3,GETDATE() )
and a.status_code >= 133
and not exists (
select b.order_no, b.item_no
from db.ordhist b
where b.status_code in (137,170,201,999)
and b.order_no = a.order_no
and b.item_no = a.item_no)
and not exists (
select c.order_no
from db.ordhist c
where c.status_code = 6
and c.order_no = a.order_no)
group by a.order_no, a.item_no, a.timestamp)
select * from summary where rn = 1
I think you don't need ROW_NUMBER just use a GROUP BY with HAVING MAX([Status])<>3:
SELECT [Order],[Item],MAX([Status])
FROM Table_Name
GROUP BY [Order],[Item]
HAVING MAX([Status])<>3
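A quick way to convince yourself that the GROUP BY / HAVING form returns the expected two rows is to run it on the example table. The sketch below does that in an in-memory SQLite database with Python's sqlite3 module; the table name order_status, the column names order_no, item_no and status, and the function name highest_status are placeholders chosen for the example (Order is a reserved word, so it is avoided here).

```python
import sqlite3

# Example rows from the question: (Order, Item, Status).
ROWS = [
    (1, 1, 1), (1, 1, 2),
    (2, 1, 1), (2, 1, 2), (2, 1, 3),
    (3, 1, 3),
    (3, 2, 1), (3, 2, 2),
]

def highest_status(rows):
    """Highest status per (order, item), skipping pairs that reached 3."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE order_status (order_no INTEGER, item_no INTEGER, status INTEGER)"
    )
    con.executemany("INSERT INTO order_status VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT order_no, item_no, MAX(status) AS status
        FROM order_status
        GROUP BY order_no, item_no
        HAVING MAX(status) <> 3
        ORDER BY order_no, item_no
    """).fetchall()
    con.close()
    return result

print(highest_status(ROWS))
```

The HAVING filter is applied after grouping, so an order/item pair is excluded as a whole once any of its rows has status 3, given that 3 is the highest status in this example.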
Ok I think I may have answered my own question.... by removing the grouping within the "summary" and doing grouping in the final results query instead
-- Produced (or higher) but not Delivery Noted --
with summary as (
select a.order_no as order_no, a.item_no as item_no, a.timestamp as timestamp, a.status_code as status_code
from db.ordhist a
where a.order_no > 120400000 and a.order_no < 120800000
and a.timestamp < Dateadd(DD,-3,GETDATE() )
and a.status_code >= 133
and not exists (
select b.order_no, b.item_no
from db.ordhist b
where b.status_code in (137,170,201,999)
and b.order_no = a.order_no
and b.item_no = a.item_no)
and not exists (
select c.order_no
from db.ordhist c
where c.status_code = 6
and c.order_no = a.order_no))
select order_no, item_no, timestamp, max(status_code) as status_code from summary
group by order_no, item_no, timestamp
order by status_code