I have a temp table with five columns: order_id, user_id, start_date, end_date and city.
order_id user_id Start_date end_date city
101 1 200001 200101 X
101 2 200101 200110 y
101 3 200110 200112 z
101 3 200112 200210 z
For each city, I want to count the order_ids that moved out of it to another city and, in a second column, the order_ids that moved into it from another city.
I would like the result to come out as a table, like this:
city moved_out_orders moved_into_orders
x 1 0
y 1 1
z 0 1
You can do:
with
x as (
select a.city as from_city, b.city as to_city
from t a
join t b on a.order_id = b.order_id
and a.city <> b.city
and a.end_date = b.start_date
),
o (city, cnt) as (
select from_city, count(*) from x group by from_city
),
i (city, cnt) as (
select to_city, count(*) from x group by to_city
)
select
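coalesce(i.city, o.city) as city,
coalesce(o.cnt, 0) as moved_out_orders, -- 0 instead of NULL when no order left the city
coalesce(i.cnt, 0) as moved_into_orders -- 0 instead of NULL when no order entered the city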
from i
full join o on o.city = i.city
Hmmm . . . I think you just want to enumerate the rows for each order and then discard the highest and lowest for each count:
select city,
sum(case when seqnum_desc > 1 then 1 else 0 end) as moved_out,
sum(case when seqnum_asc > 1 then 1 else 0 end) as moved_in
from (select t.*,
row_number() over (partition by order_id order by start_date) as seqnum_asc,
row_number() over (partition by order_id order by start_date desc) as seqnum_desc
from t
) t
group by city;
EDIT:
You appear to have adjacent rows in the same city. Seems strange, but instead you can use lead() and lag():
select city,
sum(case when next_city <> city then 1 else 0 end) as moved_out,
sum(case when prev_city <> city then 1 else 0 end) as moved_in
from (select t.*,
lag(city) over (partition by order_id order by start_date) as prev_city,
lead(city) over (partition by order_id order by start_date) as next_city
from t
) t
group by city;
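To check the lead()/lag() version against the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The table name t and the column names order_id and start_date are taken from the question; using SQLite (3.25 or later, for window functions) and adding an order by city are assumptions made only for this demonstration.

```python
import sqlite3

# In-memory SQLite database; table `t` and its columns follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (order_id int, user_id int, start_date int, end_date int, city text)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [
        (101, 1, 200001, 200101, "X"),
        (101, 2, 200101, 200110, "y"),
        (101, 3, 200110, 200112, "z"),
        (101, 3, 200112, 200210, "z"),
    ],
)

# A row counts as "moved out" when the next row of the same order is in a
# different city, and as "moved in" when the previous row is. Comparisons
# against NULL (first/last row of an order) fall through to the else branch.
query = """
select city,
       sum(case when next_city <> city then 1 else 0 end) as moved_out,
       sum(case when prev_city <> city then 1 else 0 end) as moved_in
from (select t.*,
             lag(city)  over (partition by order_id order by start_date) as prev_city,
             lead(city) over (partition by order_id order by start_date) as next_city
      from t
     ) t
group by city
order by city
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data the two adjacent z rows do not count as a move, which is the case the row_number() version gets wrong.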
I have a table like this:
id   status   grade
123  Overall  A
123  Current  B
234  Overall  B
234  Current  D
345  Overall  C
345  Current  A
How can I display how many ids satisfy this condition:
The grades are ordered A > B > C > D > F,
and the Overall grade must be greater than or equal to the Current grade.
Do I need to use CASE to convert the grade to a number first?
e.g. A = 4, B = 3, C = 2, D = 1, F = 0
In the table above, only 345 does not meet the condition. How can I produce the two tables below?
qty_pass_the_condition  qty_fail_the_condition  total_ids
2                       1                       3
and
fail_id
345
Thanks.
Because the grade letters are sequential, you can rank the two grades within each id with ORDER BY grade DESC to turn them into numbers you can compare. For the first result you can do something like this:
select
sum(case when GradeRankO >= GradeRankC then 1 else 0 end) AS
qty_pass_the_condition,
sum(case when GradeRankO < GradeRankC then 1 else 0 end) AS
qty_fail_the_condition,
count(*) AS total_ids
from
(
select * from (
select Id,Status,
Rank() over (partition by Id order by grade desc) GradeRankO
from YourTable
) as a where Status='Overall'
) as b
inner join
(
select * from (
select Id,Status,
Rank() over (partition by Id order by grade desc) GradeRankC
from YourTable
) as a where Status='Current'
) as c on b.Id=c.Id
For the second one you can do this:
select
b.Id fail_id
from
(
select * from (
select Id,Status,
Rank() over (partition by Id order by grade desc) GradeRankO
from YourTable
) as a where Status='Overall'
) as b
inner join
(
select * from (
select Id,Status,
Rank() over (partition by Id order by grade desc) GradeRankC
from YourTable
) as a where Status='Current'
) as c on b.Id=c.Id
where GradeRankO < GradeRankC
You can use pretty simple conditional aggregation for this; there is no need for window functions.
First group by id. An id is a Pass when its Overall grade is less than or equal to its Current grade, with "less than" taken in A-Z order (so A, the best grade, sorts lowest).
Then aggregate again over the whole result: qty_pass_the_condition is simply the number of non-nulls in Pass, and qty_fail_the_condition is the total minus that.
SELECT
qty_pass_the_condition = COUNT(t.Pass),
qty_fail_the_condition = COUNT(*) - COUNT(t.Pass),
total_ids = COUNT(*)
FROM (
SELECT
t.id,
Pass = CASE WHEN MIN(CASE WHEN t.status = 'Overall' THEN t.grade END) <=
MIN(CASE WHEN t.status = 'Current' THEN t.grade END)
THEN 1 END
FROM YourTable t
GROUP BY
t.id
) t;
To query the actual failed IDs, simply use a HAVING clause:
SELECT
t.id
FROM YourTable t
GROUP BY
t.id
HAVING MIN(CASE WHEN t.status = 'Overall' THEN t.grade END) >
MIN(CASE WHEN t.status = 'Current' THEN t.grade END);
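As a quick check of the conditional-aggregation approach, the sketch below loads the question's six rows into an in-memory SQLite database from Python and runs both queries. SQLite is an assumption here, so the T-SQL "alias = expression" form is written with AS instead, and the alias passed is a name chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table YourTable (id int, status text, grade text)")
conn.executemany(
    "insert into YourTable values (?, ?, ?)",
    [
        (123, "Overall", "A"), (123, "Current", "B"),
        (234, "Overall", "B"), (234, "Current", "D"),
        (345, "Overall", "C"), (345, "Current", "A"),
    ],
)

# Inner query: one row per id, passed = 1 when Overall <= Current in A-Z
# order, otherwise NULL. Outer query: count(passed) skips the NULLs.
summary = conn.execute("""
select count(passed)            as qty_pass_the_condition,
       count(*) - count(passed) as qty_fail_the_condition,
       count(*)                 as total_ids
from (select id,
             case when min(case when status = 'Overall' then grade end) <=
                       min(case when status = 'Current' then grade end)
                  then 1 end as passed
      from YourTable
      group by id) t
""").fetchone()

# Failed ids: the same comparison, inverted, in a HAVING clause.
failed = [r[0] for r in conn.execute("""
select id
from YourTable
group by id
having min(case when status = 'Overall' then grade end) >
       min(case when status = 'Current' then grade end)
""")]

print(summary, failed)
```

No CASE mapping of letters to numbers is needed because the letters already sort in grade order.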
Let this be the table that is provided.
PID  TID  Type  Freq
1    1    A     3
1    1    A     2
1    1    A     1
1    1    B     3
1    2    A     4
1    2    B     5
I want to write a query to get an output like this.
PID  TID  Type  Max_Freq_1  Max_Freq_2
1    1    A     3           2
1    1    B     3           NULL
1    2    A     4           NULL
1    2    B     5           NULL
That is, given a combination of PID, TID, Type, what is the highest and second-highest frequency? If there aren't a sufficient number of entries in the table, then put second highest as NULL
If your database can use the window functions, then the top 2 Freq can be calculated via the DENSE_RANK function.
SELECT PID, TID, Type
, MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1
, MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM
(
SELECT PID, TID, Type, Freq
, DENSE_RANK() OVER (PARTITION BY PID, TID, Type ORDER BY Freq DESC) AS Rnk
FROM YourTable t
) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
pid  tid  type  max_freq_1  max_freq_2
1    1    A     3           2
1    1    B     3           null
1    2    A     4           null
1    2    B     5           null
If window functions such as DENSE_RANK aren't available, then try this self-join instead.
SELECT PID, TID, Type
, MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1
, MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM
(
SELECT t1.PID, t1.TID, t1.Type, t1.Freq
, COUNT(DISTINCT t2.Freq) AS Rnk
FROM YourTable t1
LEFT JOIN YourTable t2
ON t2.PID = t1.PID
AND t2.TID = t1.TID
AND t2.Type = t1.Type
AND t2.Freq >= t1.Freq
GROUP BY t1.PID, t1.TID, t1.Type, t1.Freq
) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
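To see that the self-join fallback agrees with the DENSE_RANK version, the following sketch runs both against the question's sample rows in an in-memory SQLite database from Python. SQLite is an assumption made for the demonstration, and the explicit column aliases in the subqueries are added here only so the outer query can refer to them unambiguously.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table YourTable (PID int, TID int, Type text, Freq int)")
conn.executemany(
    "insert into YourTable values (?, ?, ?, ?)",
    [(1, 1, "A", 3), (1, 1, "A", 2), (1, 1, "A", 1),
     (1, 1, "B", 3), (1, 2, "A", 4), (1, 2, "B", 5)],
)

# Version 1: rank the frequencies inside each (PID, TID, Type) group.
window_sql = """
select PID, TID, Type,
       max(case when Rnk = 1 then Freq end) as Max_Freq_1,
       max(case when Rnk = 2 then Freq end) as Max_Freq_2
from (select PID, TID, Type, Freq,
             dense_rank() over (partition by PID, TID, Type order by Freq desc) as Rnk
      from YourTable) q
group by PID, TID, Type
order by PID, TID, Type
"""

# Version 2: no window functions. The rank of a row is the number of
# distinct frequencies in its group that are greater than or equal to it.
join_sql = """
select PID, TID, Type,
       max(case when Rnk = 1 then Freq end) as Max_Freq_1,
       max(case when Rnk = 2 then Freq end) as Max_Freq_2
from (select t1.PID as PID, t1.TID as TID, t1.Type as Type, t1.Freq as Freq,
             count(distinct t2.Freq) as Rnk
      from YourTable t1
      left join YourTable t2
        on t2.PID = t1.PID and t2.TID = t1.TID
       and t2.Type = t1.Type and t2.Freq >= t1.Freq
      group by t1.PID, t1.TID, t1.Type, t1.Freq) q
group by PID, TID, Type
order by PID, TID, Type
"""

window_rows = conn.execute(window_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(window_rows)
```

Both versions treat equal frequencies as one rank, so a group whose top value appears twice still reports the next distinct value as the second highest.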
This is what I came up with on PostgreSQL. Using a window function like row_number is the easiest way to get the result you want.
with t as (
select *, row_number() over (partition by pid, tid, "type" order by freq desc) as r
from test_so
)
select pid, tid, "type",
max(case when r = 1 then freq end) as "highest",
max(case when r = 2 then freq end) as "second_highest"
from t
group by pid, tid, "type"
I want to calculate unique rankings but I get duplicate rankings
Here's my attempt:
SELECT
TG.EMPCODE,
DENSE_RANK() OVER (ORDER BY TS.COUNT_DEL DESC, TG.COUNT_TG DESC) AS YOUR_RANK
FROM
(SELECT
EmpCode,
SUM(CASE WHEN Tgenerate = 1 THEN 1 ELSE 0 END) AS COUNT_TG
FROM
TBLTGENERATE1
GROUP BY
EMPCODE) TG
INNER JOIN
(SELECT
EMP_CODE,
SUM(CASE WHEN STATUS = 'DELIVERED' THEN 1 ELSE 0 END) AS COUNT_DEL
FROM
TBLSTAT
GROUP BY
EMP_CODE) TS ON TG.EMPCODE = TS.EMP_CODE;
The output I get is like this:
EID Rank
---------
102 1
105 2
101 2
103 3
106 4
105 and 101 get the same rank.
How do I calculate a unique ranking?
Use ROW_NUMBER() instead of DENSE_RANK():
SELECT TG.EMPCODE,
ROW_NUMBER() OVER (ORDER BY TS.COUNT_DEL DESC, TG.COUNT_TG DESC) AS YOUR_RANK
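Ties will then be given sequential rankings. The order among tied rows is arbitrary unless you add a tie-breaker, such as TG.EMPCODE, to the end of the ORDER BY.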
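The difference between the two functions is easiest to see side by side. The sketch below uses an in-memory SQLite database from Python; the emp_counts table and its numbers are made up for illustration (the question does not show the underlying counts), and emp_code is added to the ROW_NUMBER ordering as an assumed tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated counts standing in for the TG/TS join in the
# question; employees 101 and 105 are tied on both counts.
conn.execute("create table emp_counts (emp_code int, count_del int, count_tg int)")
conn.executemany(
    "insert into emp_counts values (?, ?, ?)",
    [(101, 4, 3), (102, 5, 5), (103, 2, 1), (105, 4, 3)],
)

rows = conn.execute("""
select emp_code,
       dense_rank() over (order by count_del desc, count_tg desc)           as dense_rnk,
       row_number() over (order by count_del desc, count_tg desc, emp_code) as unique_rnk
from emp_counts
order by unique_rnk
""").fetchall()

for row in rows:
    print(row)
```

DENSE_RANK gives the tied employees the same value, while ROW_NUMBER always hands out 1, 2, 3, ... and uses the extra emp_code key to decide which tied row comes first.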
I have the following table:
custID Cat
1 A
1 B
1 B
1 B
1 C
2 A
2 A
2 C
3 B
3 C
4 A
4 C
4 C
4 C
What I need is the most efficient way to aggregate by CustID in such a manner that I obtain the most frequent category (cat), the second most frequent and the third. The output of the above should be
custID  most freq  2nd most freq  3rd most freq
1       B          A              C
2       A          C              Null
3       B          C              Null
4       C          A              Null
When there is a tie in the count I do not really care what is first and what is second. For example for customer 1 2nd most freq and 3rd most freq could be swapped because each of them occur 1 time only.
Any SQL would be fine, preferably Hive SQL.
Thank you
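Try grouping twice and using row_number() to order the categories by their count within each customer. row_number() is used rather than dense_rank() because tied categories must get distinct positions; with dense_rank() customer 1's A and C would both land on rank 2 and the third column would stay empty. I'm not 100% sure, but I think it should work in Hive as well.
select custId,
max(case when t.rn = 1 then cat end) as [most freq],
max(case when t.rn = 2 then cat end) as [2nd most freq],
max(case when t.rn = 3 then cat end) as [3rd most freq]
from
(
-- cat is added to the ordering only to make ties deterministic
select custId, cat, row_number() over (partition by custId order by count(*) desc, cat) rn
from your_table
group by custId, cat
) t
group by custId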
Following the comments, here is a slightly modified version that conforms to Hive SQL. It uses row_number() so that categories with equal counts still get distinct positions:
select custId,
max(case when t.rn = 1 then cat else null end) as most_freq,
max(case when t.rn = 2 then cat else null end) as 2nd_most_freq,
max(case when t.rn = 3 then cat else null end) as 3rd_most_freq
from
(
select custId, cat, row_number() over (partition by custId order by ct desc, cat) rn
from (
select custId, cat, count(*) ct
from your_table
group by custId, cat
) your_table_with_counts
) t
group by custId
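Here is a small sketch that runs the count-then-rank idea on the question's data in an in-memory SQLite database from Python. It is not Hive; SQLite is assumed only so the example is self-contained. It uses row_number() with cat as a tie-breaker so that categories with equal counts each get their own position, and the output column names are chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (custId int, cat text)")
conn.executemany(
    "insert into your_table values (?, ?)",
    [(1, "A"), (1, "B"), (1, "B"), (1, "B"), (1, "C"),
     (2, "A"), (2, "A"), (2, "C"),
     (3, "B"), (3, "C"),
     (4, "A"), (4, "C"), (4, "C"), (4, "C")],
)

# Step 1: count rows per (custId, cat).
# Step 2: number the categories per customer, most frequent first.
# Step 3: pivot positions 1..3 into columns.
rows = conn.execute("""
select custId,
       max(case when rn = 1 then cat end) as most_freq,
       max(case when rn = 2 then cat end) as second_most_freq,
       max(case when rn = 3 then cat end) as third_most_freq
from (select custId, cat,
             row_number() over (partition by custId order by ct desc, cat) as rn
      from (select custId, cat, count(*) as ct
            from your_table
            group by custId, cat) c
     ) t
group by custId
order by custId
""").fetchall()

for row in rows:
    print(row)
```

For customer 1 the tie between A and C is broken alphabetically, which is fine since the question says the order among ties does not matter.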
SELECT journal, count(*) as frequency
FROM ${hiveconf:TNHIVE}
WHERE journal IS NOT NULL
GROUP BY journal
ORDER BY frequency DESC
LIMIT 5;
I've been trying to get a query working in SQL 2012, and I'm almost certain I am overcomplicating it.
I have a table which stores an order number, item numbers (multiple per order), status codes (multiple per item) and a timestamp
So basically something like this
Order Item Status
1 1 1
1 1 2
2 1 1
2 1 2
2 1 3
3 1 3
3 2 1
3 2 2
In my query (using this table as the example), I need to see one entry for each order and item, showing only the highest available status... BUT not if that status is 3
So in this case, I'd want to see
Order Item Status
1 1 2
3 2 2
The issue I had is that the query itself works... but it returns the FIRST status code it finds. Not the highest one. So I end up with
Order Item Status
1 1 1
3 2 1
Here's the full expanded code snippet
with summary as (
select a.order_no as order_no, a.item_no as item_no, a.timestamp as timestamp,
max(a.status_code) as status_code, row_number() over (partition by order_no
order by item_no asc) as rn
from db.ordhist a
where a.order_no > 120400000 and a.order_no < 120800000
and a.timestamp < Dateadd(DD,-3,GETDATE() )
and a.status_code >= 133
and not exists (
select b.order_no, b.item_no
from db.ordhist b
where b.status_code in (137,170,201,999)
and b.order_no = a.order_no
and b.item_no = a.item_no)
and not exists (
select c.order_no
from db.ordhist c
where c.status_code = 6
and c.order_no = a.order_no)
group by a.order_no, a.item_no, a.timestamp)
select * from summary where rn = 1
I don't think you need ROW_NUMBER; just use a GROUP BY with HAVING MAX([Status])<>3:
SELECT [Order],[Item],MAX([Status])
FROM Table_Name
GROUP BY [Order],[Item]
HAVING MAX([Status])<>3
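To confirm that this returns the two rows the question asks for, the sketch below runs the same GROUP BY / HAVING pattern on the example data in an in-memory SQLite database from Python. The table name ordhist and the column names order_no, item_no and status_code are borrowed from the question's real query (Order is a reserved word); SQLite itself is an assumption for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table ordhist (order_no int, item_no int, status_code int)")
conn.executemany(
    "insert into ordhist values (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2),
     (2, 1, 1), (2, 1, 2), (2, 1, 3),
     (3, 1, 3),
     (3, 2, 1), (3, 2, 2)],
)

# One row per (order, item) with its highest status; groups whose highest
# status is 3 are dropped by the HAVING clause.
rows = conn.execute("""
select order_no, item_no, max(status_code) as status_code
from ordhist
group by order_no, item_no
having max(status_code) <> 3
order by order_no, item_no
""").fetchall()

for row in rows:
    print(row)
```

The filter has to sit in HAVING rather than WHERE: a WHERE status_code <> 3 would only hide the status-3 rows and let order 2 through with a highest status of 2.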
Ok I think I may have answered my own question.... by removing the grouping within the "summary" and doing grouping in the final results query instead
-- Produced (or higher) but not Delivery Noted --
with summary as (
select a.order_no as order_no, a.item_no as item_no, a.timestamp as timestamp, a.status_code as status_code
from db.ordhist a
where a.order_no > 120400000 and a.order_no < 120800000
and a.timestamp < Dateadd(DD,-3,GETDATE() )
and a.status_code >= 133
and not exists (
select b.order_no, b.item_no
from db.ordhist b
where b.status_code in (137,170,201,999)
and b.order_no = a.order_no
and b.item_no = a.item_no)
and not exists (
select c.order_no
from db.ordhist c
where c.status_code = 6
and c.order_no = a.order_no))
select order_no, item_no, timestamp, max(status_code) as status_code from summary
group by order_no, item_no, timestamp
order by status_code