Case statement for HIVE platform - hive

I have a table with the following columns:
ID
Scheduled Date
Status
Target Date
I need to extract the 'Status' corresponding to the minimum 'Scheduled Date' for each ID. If that is not available, I need to extract the status corresponding to the minimum 'Target Date' for that ID.
Sample data:
ID | Scheduled_Date | Status | Target_Date
1 | 12/11/2017 | Completed | 12/11/2017
1 | 12/12/2017 | Completed | 12/12/2017
2 | 12/13/2017 | Completed | 12/13/2017
3 | 12/14/2017 | Pending | 12/14/2017
3 | 12/15/2017 | Pending | 12/15/2017
4 | | Confirmed | 12/18/2017
4 | | Confirmed | 12/19/2017
5 | 12/14/2017 | Completed | 12/14/2017
5 | 12/15/2017 | Pending | 12/15/2017
Can you please correct the code that I am trying to write?
SELECT ID,
CASE WHEN ID IS NOT NULL THEN
CASE WHEN MIN(SCHEDULED_DATE) IS NOT NULL
THEN STATUS
ELSE
END
CASE WHEN MIN(TARGET_DATE) IS NOT NULL
THEN STATUS
ELSE ''
END
FROM FIRST_STATUS

Try this query.
SELECT id,
status
FROM yourtable t
WHERE COALESCE (Scheduled_Date,
Target_Date) IN
(SELECT MIN(COALESCE (Scheduled_Date,Target_Date))
FROM yourtable i
WHERE i.ID = t.id
GROUP BY i.ID);
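To try the COALESCE/correlated-subquery answer outside Hive, here is a minimal sketch that runs it on the sample data in an in-memory SQLite database through Python's sqlite3 module (SQLite as a stand-in for Hive is an assumption). The dates are rewritten as ISO yyyy-MM-dd strings so that MIN() on text matches chronological order; MM/dd/yyyy strings would sort wrongly across years. The ORDER BY id was added only to make the output deterministic.

```python
import sqlite3

# Sample data from the question; dates rewritten as ISO strings (assumption)
# so that MIN() on TEXT compares them chronologically.
rows = [
    (1, "2017-12-11", "Completed", "2017-12-11"),
    (1, "2017-12-12", "Completed", "2017-12-12"),
    (2, "2017-12-13", "Completed", "2017-12-13"),
    (3, "2017-12-14", "Pending", "2017-12-14"),
    (3, "2017-12-15", "Pending", "2017-12-15"),
    (4, None, "Confirmed", "2017-12-18"),
    (4, None, "Confirmed", "2017-12-19"),
    (5, "2017-12-14", "Completed", "2017-12-14"),
    (5, "2017-12-15", "Pending", "2017-12-15"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (ID INTEGER, Scheduled_Date TEXT, Status TEXT, Target_Date TEXT)"
)
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", rows)

# Keep the rows whose effective date equals the per-ID minimum of
# COALESCE(Scheduled_Date, Target_Date), as in the answer.
query = """
SELECT id, status
FROM yourtable t
WHERE COALESCE(Scheduled_Date, Target_Date) IN
      (SELECT MIN(COALESCE(Scheduled_Date, Target_Date))
       FROM yourtable i
       WHERE i.ID = t.id
       GROUP BY i.ID)
ORDER BY id
"""
result = conn.execute(query).fetchall()
print(result)
```

If two rows of the same ID share the minimum date, this query returns both of them.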

Use the row_number() analytic function:
select id,
status
from
(
select id,
status,
row_number() over(partition by id order by nvl(Scheduled_Date,Target_Date)) rn
from yourtable t
)s
where rn=1
;
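Both answers rank rows by a single effective date, COALESCE/NVL(Scheduled_Date, Target_Date). That matches the stated rule when an ID has either all or none of its Scheduled_Date values filled in. If the rule is read strictly (use the earliest Scheduled_Date whenever one exists, otherwise the earliest Target_Date), rows that have a Scheduled_Date can be sorted first inside the window. Below is a minimal sketch with SQLite 3.25+ standing in for Hive (assumption), using a hypothetical ID 6 to show where the two orderings differ.

```python
import sqlite3

rows = [
    (1, "2017-12-11", "Completed", "2017-12-11"),
    (1, "2017-12-12", "Completed", "2017-12-12"),
    (4, None, "Confirmed", "2017-12-18"),
    (4, None, "Confirmed", "2017-12-19"),
    # Hypothetical ID 6: its unscheduled row has an earlier Target_Date
    # than its only scheduled row.
    (6, None, "Confirmed", "2017-12-01"),
    (6, "2017-12-20", "Pending", "2017-12-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (ID INTEGER, Scheduled_Date TEXT, Status TEXT, Target_Date TEXT)"
)
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", rows)


def first_status(order_by):
    """Return (id, status) of the first row per ID under the given ordering.

    order_by is interpolated directly, so pass trusted constant strings only.
    """
    sql = f"""
        SELECT id, status
        FROM (SELECT id, status,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY {order_by}) AS rn
              FROM yourtable) s
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


# Ordering used by the answers: one effective date per row.
coalesce_order = first_status("COALESCE(Scheduled_Date, Target_Date)")
# Strict reading: rows with a Scheduled_Date first, then by date.
strict_order = first_status(
    "CASE WHEN Scheduled_Date IS NULL THEN 1 ELSE 0 END, Scheduled_Date, Target_Date"
)
print(coalesce_order)
print(strict_order)
```

The CASE expression and ROW_NUMBER() are also available in Hive, so the same ORDER BY can be written inside the answer's over() clause.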

Related

Check for condition in GROUP BY?

Take this example data:
ID Status Date
1 Pending 2/10/2020
2 Pending 2/10/2020
3 Pending 2/10/2020
2 Pending 2/10/2020
2 Pending 2/10/2020
1 Complete 2/15/2020
I need an SQL statement that groups all the data but brings back only the current status. So for ID 1 the GROUP BY needs a condition that returns only the Complete row, while still returning the Pending rows for IDs 2 and 3.
I am not 100% sure how to write the condition for this.
Maybe something like:
SELECT ID, Status, Date
FROM table
GROUP BY ID, Status, Date
ORDER BY ID
The problem with this is the resulting data would look like:
ID Status Date
1 Pending 2/10/2020
1 Complete 2/15/2020
2 Pending 2/10/2020
3 Pending 2/10/2020
But I need:
ID Status Date
1 Complete 2/15/2020
2 Pending 2/10/2020
3 Pending 2/10/2020
What can I do to check for the Complete status so that only Complete is returned by the GROUP BY?
GROUP BY only the ID column, and use MIN() to choose Complete over Pending (it sorts first alphabetically).
SELECT ID, MIN(Status)
FROM table
GROUP BY ID
ORDER BY ID
To use Date as the 'latest row' indicator instead, you can do this (T-SQL):
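-- table variable (@, not #) holding the sample rows
DECLARE @Src TABLE (
ID int,
Status varchar(20),
Date Date
)
-- ISO date literals are not affected by DATEFORMAT/language settings
INSERT @Src VALUES
(1, 'Pending' ,'2020-02-10'),
(1, 'Complete' ,'2020-02-15'),
(2, 'Pending' ,'2020-02-10'),
(3, 'Pending' ,'2020-02-10');
-- ROW_NUMBER() = 1 marks the latest row per ID; WITH TIES keeps all of them
SELECT TOP 1 WITH TIES *
FROM @Src
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date DESC)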
Result:
ID Status Date
----------- -------------------- ----------
1 Complete 2020-02-15
2 Pending 2020-02-10
3 Pending 2020-02-10
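SELECT TOP 1 WITH TIES is T-SQL syntax. As a minimal portable sketch (assumption: Python's sqlite3 with SQLite 3.25+ as the database), here are both answers on the question's data, including the duplicate rows for ID 2: the MIN(Status) query as written, and the latest-row-per-ID idea expressed as ROW_NUMBER() filtered on rn = 1 instead of TOP 1 WITH TIES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (ID INTEGER, Status TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [
        (1, "Pending", "2020-02-10"),
        (2, "Pending", "2020-02-10"),
        (3, "Pending", "2020-02-10"),
        (2, "Pending", "2020-02-10"),
        (2, "Pending", "2020-02-10"),
        (1, "Complete", "2020-02-15"),
    ],
)

# Approach 1: MIN(Status) relies on 'Complete' < 'Pending' alphabetically.
by_min = conn.execute(
    "SELECT ID, MIN(Status) FROM src GROUP BY ID ORDER BY ID"
).fetchall()

# Approach 2: number the rows per ID from newest to oldest and keep rn = 1.
latest = conn.execute(
    """
    SELECT ID, Status, Date
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date DESC) AS rn
          FROM src) s
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()

print(by_min)
print(latest)
```

The MIN() trick only works while the desired status happens to sort first; the date-based version does not depend on the status names.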

SQL JOIN - retrieve MAX DateTime from second table and the first DateTime after previous MAX for other value

I have an issue with creating a proper SQL expression.
I have a table TICKET with a column TICKETID:
TICKETID
1000
1001
I then have a table STATUSHISTORY from which I need to retrieve the last time (maximum time) the ticket entered VENDOR status (the last VENDOR status) and when it exited VENDOR status. By exiting VENDOR status I mean the next INPROG status, but only the first INPROG after that VENDOR status; the status after VENDOR is always INPROG. It is also possible that a ticket has no VENDOR status at all in STATUSHISTORY (then nulls should be returned), but INPROG always exists: it can come before and also after the VENDOR status, if the ticket is no longer in VENDOR status.
Here is the example of STATUSHISTORY.
ID TICKETID STATUS DATETIME
1 1000 INPROG 01.01.2017 10:00
2 1000 VENDOR 02.01.2017 10:00
3 1000 INPROG 03.01.2017 10:00
4 1000 VENDOR 04.01.2017 10:00
5 1000 INPROG 05.01.2017 10:00
6 1000 HOLD 06.01.2017 10:00
7 1000 INPROG 07.01.2017 10:00
8 1001 INPROG 02.02.2017 10:00
9 1001 VENDOR 03.02.2017 10:00
10 1001 INPROG 04.02.2017 10:00
11 1001 VENDOR 05.02.2017 10:00
So the result of querying the TICKET table joined with STATUSHISTORY should be:
ID VENDOR_ENTERED VENDOR_EXITED
1000 04.01.2017 10:00 05.01.2017 10:00
1001 05.02.2017 10:00 null
This is because for ID 1000 the last VENDOR status was at 04.01.2017 and the first INPROG status after it was at 05.01.2017, while for ID 1001 the last VENDOR status was at 05.02.2017 and no INPROG status has happened after it yet.
If VENDOR did not exist then both columns should be null in result.
I am really stuck with this, trying different JOINs but without any progress.
Thank you in advance if you can help me.
You can do this with window functions. First, assign a "vendor" group to each status record, using a cumulative sum that counts the number of "vendor" records on or before each record.
Then aggregate the records to get one record per "vendor" group, and use row numbers to get the most recent group. So:
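with vg as (
-- one row per "vendor" group: when it was entered and the first INPROG after it
select ticketid,
min(datetime) as vendor_entered,
min(case when status = 'INPROG' then datetime end) as vendor_exited
from (select sh.*,
sum(case when status = 'VENDOR' then 1 else 0 end) over (partition by ticketid order by datetime) as grp
from statushistory sh
) sh
where grp > 0 -- rows before the first VENDOR status are not a vendor group
group by ticketid, grp
)
-- start from TICKET so that tickets without a VENDOR status return nulls
select t.ticketid, vg.vendor_entered, vg.vendor_exited
from ticket t
left join (select vg.*,
row_number() over (partition by ticketid order by vendor_entered desc) as seqnum
from vg
) vg
on vg.ticketid = t.ticketid
and vg.seqnum = 1;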
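Here is a minimal sketch of the group-and-rank idea run against the sample data, using Python's sqlite3 with SQLite 3.25+ as a stand-in for DB2 (assumption). Dates are stored as ISO strings so they sort chronologically, and ticket 1002 is a hypothetical ticket that never entered VENDOR status. Rows before a ticket's first VENDOR status fall into group 0 and are filtered out, and the query starts from TICKET with a LEFT JOIN so that tickets without a VENDOR status come back with nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (ticketid INTEGER)")
conn.execute(
    "CREATE TABLE statushistory (id INTEGER, ticketid INTEGER, status TEXT, datetime TEXT)"
)
conn.executemany("INSERT INTO ticket VALUES (?)", [(1000,), (1001,), (1002,)])
conn.executemany(
    "INSERT INTO statushistory VALUES (?, ?, ?, ?)",
    [
        (1, 1000, "INPROG", "2017-01-01 10:00"),
        (2, 1000, "VENDOR", "2017-01-02 10:00"),
        (3, 1000, "INPROG", "2017-01-03 10:00"),
        (4, 1000, "VENDOR", "2017-01-04 10:00"),
        (5, 1000, "INPROG", "2017-01-05 10:00"),
        (6, 1000, "HOLD", "2017-01-06 10:00"),
        (7, 1000, "INPROG", "2017-01-07 10:00"),
        (8, 1001, "INPROG", "2017-02-02 10:00"),
        (9, 1001, "VENDOR", "2017-02-03 10:00"),
        (10, 1001, "INPROG", "2017-02-04 10:00"),
        (11, 1001, "VENDOR", "2017-02-05 10:00"),
        # Hypothetical ticket 1002 that never entered VENDOR status.
        (12, 1002, "INPROG", "2017-03-01 10:00"),
    ],
)

query = """
WITH vg AS (
    -- the running count of VENDOR rows numbers each vendor group
    SELECT ticketid,
           MIN(datetime) AS vendor_entered,
           MIN(CASE WHEN status = 'INPROG' THEN datetime END) AS vendor_exited
    FROM (SELECT sh.*,
                 SUM(CASE WHEN status = 'VENDOR' THEN 1 ELSE 0 END)
                     OVER (PARTITION BY ticketid ORDER BY datetime) AS grp
          FROM statushistory sh) sh
    WHERE grp > 0
    GROUP BY ticketid, grp
)
SELECT t.ticketid, vg.vendor_entered, vg.vendor_exited
FROM ticket t
LEFT JOIN (SELECT vg.*,
                  ROW_NUMBER() OVER (PARTITION BY ticketid
                                     ORDER BY vendor_entered DESC) AS seqnum
           FROM vg) vg
  ON vg.ticketid = t.ticketid AND vg.seqnum = 1
ORDER BY t.ticketid
"""
result = conn.execute(query).fetchall()
print(result)
```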
You can aggregate to get max time, then join onto all of the date values higher than that time, and then re-aggregate:
select a.TicketID,
a.VENDOR_ENTERED,
min( EXIT_TIME ) as VENDOR_EXITED
from (
select TicketID,
max( DATETIME ) as VENDOR_ENTERED
from StatusHistory
where Status = 'VENDOR'
group by TicketID
) as a
left join
(
select TicketID,
DATETIME as EXIT_TIME
from StatusHistory
where Status = 'INPROG'
) as b
on a.TicketID = b.TicketID
and EXIT_TIME >= a.VENDOR_ENTERED
group by a.TicketID,
a.VENDOR_ENTERED
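A minimal runnable sketch of this aggregate-then-join query on the sample data, with Python's sqlite3 standing in for DB2 (assumption) and dates stored as ISO strings so that MAX() and >= compare chronologically. An ORDER BY was added only for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StatusHistory (ID INTEGER, TicketID INTEGER, Status TEXT, DATETIME TEXT)"
)
conn.executemany(
    "INSERT INTO StatusHistory VALUES (?, ?, ?, ?)",
    [
        (1, 1000, "INPROG", "2017-01-01 10:00"),
        (2, 1000, "VENDOR", "2017-01-02 10:00"),
        (3, 1000, "INPROG", "2017-01-03 10:00"),
        (4, 1000, "VENDOR", "2017-01-04 10:00"),
        (5, 1000, "INPROG", "2017-01-05 10:00"),
        (6, 1000, "HOLD", "2017-01-06 10:00"),
        (7, 1000, "INPROG", "2017-01-07 10:00"),
        (8, 1001, "INPROG", "2017-02-02 10:00"),
        (9, 1001, "VENDOR", "2017-02-03 10:00"),
        (10, 1001, "INPROG", "2017-02-04 10:00"),
        (11, 1001, "VENDOR", "2017-02-05 10:00"),
    ],
)

query = """
SELECT a.TicketID, a.VENDOR_ENTERED, MIN(EXIT_TIME) AS VENDOR_EXITED
FROM (SELECT TicketID, MAX(DATETIME) AS VENDOR_ENTERED   -- last VENDOR entry
      FROM StatusHistory
      WHERE Status = 'VENDOR'
      GROUP BY TicketID) AS a
LEFT JOIN (SELECT TicketID, DATETIME AS EXIT_TIME        -- candidate exits
           FROM StatusHistory
           WHERE Status = 'INPROG') AS b
  ON a.TicketID = b.TicketID
 AND EXIT_TIME >= a.VENDOR_ENTERED                       -- only INPROG after it
GROUP BY a.TicketID, a.VENDOR_ENTERED
ORDER BY a.TicketID
"""
result = conn.execute(query).fetchall()
print(result)
```

Because the query starts from the VENDOR rows, a ticket with no VENDOR status produces no row at all; left joining TICKET to this result would return it with nulls.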
DB2 is not supported in SQL Fiddle, but the query uses only standard SQL.

SQL select specific group from table

I have a table named trades like this:
id trade_date trade_price trade_status seller_name
1 2015-01-02 150 open Alex
2 2015-03-04 500 close John
3 2015-04-02 850 close Otabek
4 2015-05-02 150 close Alex
5 2015-06-02 100 open Otabek
6 2015-07-02 200 open John
I want to sum up trade_price grouped by seller_name, but only for sellers whose last trade_status (by trade_date) was 'open'. That is:
sum_trade_price seller_name
700 John
950 Otabek
The rows where seller_name is Alex are skipped because the last trade_status was 'close'.
I can get the desired output with a nested select:
SELECT SUM(t1.trade_price), t1.seller_name
FROM trades t1
WHERE (SELECT t2.trade_status FROM trades t2
WHERE t2.seller_name = t1.seller_name
ORDER BY t2.trade_date DESC LIMIT 1) = 'open'
GROUP BY t1.seller_name
But the above query takes more than 1 minute to execute (I have approximately 100K rows).
Is there another way to handle it?
I am using PostgreSQL.
I would approach this with window functions:
SELECT SUM(t.trade_price), t.seller_name
FROM (SELECT t.*,
FIRST_VALUE(trade_status) OVER (PARTITION BY seller_name ORDER BY trade_date desc) as last_trade_status
FROM trades t
) t
WHERE last_trade_status <> 'close'
GROUP BY t.seller_name;
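Here is the FIRST_VALUE() answer as a minimal runnable sketch on the sample data, using Python's sqlite3 with SQLite 3.25+ in place of PostgreSQL (assumption). Ordering the window by trade_date DESC makes the first value in each seller's partition that seller's most recent status, and an ORDER BY was added for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id INTEGER, trade_date TEXT, trade_price INTEGER, "
    "trade_status TEXT, seller_name TEXT)"
)
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2015-01-02", 150, "open", "Alex"),
        (2, "2015-03-04", 500, "close", "John"),
        (3, "2015-04-02", 850, "close", "Otabek"),
        (4, "2015-05-02", 150, "close", "Alex"),
        (5, "2015-06-02", 100, "open", "Otabek"),
        (6, "2015-07-02", 200, "open", "John"),
    ],
)

query = """
SELECT SUM(t.trade_price), t.seller_name
FROM (SELECT t.*,
             -- newest status per seller, repeated on every row of that seller
             FIRST_VALUE(trade_status) OVER (PARTITION BY seller_name
                                             ORDER BY trade_date DESC) AS last_trade_status
      FROM trades t) t
WHERE last_trade_status <> 'close'
GROUP BY t.seller_name
ORDER BY t.seller_name
"""
result = conn.execute(query).fetchall()
print(result)
```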
This should perform reasonably with an index on seller_name
select
sum(trade_price) as sum_trade_price,
seller_name
from
trades
inner join
(
select distinct on (seller_name) seller_name, trade_status
from trades
order by seller_name, trade_date desc
) s using (seller_name)
where s.trade_status = 'open'
group by seller_name
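DISTINCT ON is PostgreSQL-specific: it keeps the first row of each seller_name group according to the ORDER BY. To make that rule concrete, here is a plain-Python sketch on the sample data. It illustrates the semantics of the derived table s and the final sum, not how PostgreSQL executes the query.

```python
from itertools import groupby
from operator import itemgetter

# (id, trade_date, trade_price, trade_status, seller_name) from the question
trades = [
    (1, "2015-01-02", 150, "open", "Alex"),
    (2, "2015-03-04", 500, "close", "John"),
    (3, "2015-04-02", 850, "close", "Otabek"),
    (4, "2015-05-02", 150, "close", "Alex"),
    (5, "2015-06-02", 100, "open", "Otabek"),
    (6, "2015-07-02", 200, "open", "John"),
]

# ORDER BY seller_name, trade_date DESC: sort by date descending first, then
# by seller; Python's sort is stable, so dates stay descending per seller.
ordered = sorted(trades, key=itemgetter(1), reverse=True)
ordered.sort(key=itemgetter(4))

# DISTINCT ON (seller_name): keep only the first row of each seller group.
last_status = {
    seller: next(group)[3] for seller, group in groupby(ordered, key=itemgetter(4))
}

# Join back on seller_name, keep sellers whose last status is 'open', and sum.
sums = {}
for _id, _date, price, _status, seller in trades:
    if last_status[seller] == "open":
        sums[seller] = sums.get(seller, 0) + price

print(last_status)
print(sums)
```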

SQL query to group by data but with order by clause

I have a table booking with the following data:
GUEST_NO HOTEL_NO DATE_FROM DATE_TO ROOM_NO
1 1 2015-05-07 2015-05-08 103
1 1 2015-05-11 2015-05-12 104
1 1 2015-05-14 2015-05-15 103
1 1 2015-05-17 2015-05-20 101
2 2 2015-05-01 2015-05-02 204
2 2 2015-05-04 2015-05-05 203
2 2 2015-05-17 2015-05-22 202
What I want is the following result:
1) It should show Guest_no, Hotel_no, Room_no, and a count column with the number of times that combination of the three columns is repeated.
So the output should look like:
GUEST_NO HOTEL_NO ROOM_NO Count
1 1 103 2
1 1 104 1
1 1 101 1
2 2 204 1
etc. But I want the result ordered, e.g. by bk.date_to desc.
My query is below; it shows the count, but when I add ORDER BY it does not work:
select bk.guest_no, bk.hotel_no, bk.room_no,
count(bk.guest_no+bk.hotel_no+bk.room_no) as noOfTimesRoomBooked
from booking bk
group by bk.guest_no, bk.hotel_no, bk.room_no, bk.date_to
order by bk.date_to desc
With ORDER BY added, the result is different: because I order by the date_to column, I also have to add that column to the GROUP BY clause, which ends up giving a different result, as below:
GUEST_NO HOTEL_NO ROOM_NO Count
1 1 103 1
1 1 104 1
1 1 103 1
1 1 101 1
2 2 204 1
Which is not the output I want.
I want these four columns, ordered by date_to descending, with the count being the number of repetitions of the first three columns.
I think a good way to do this would be grouping by guest_no, hotel_no and room_no, and sorting by the maximum (i.e. most recent) booking date in each group.
SELECT
guest_no,
hotel_no,
room_no,
COUNT(1) AS BookingCount
FROM
booking
GROUP BY
guest_no,
hotel_no,
room_no
ORDER BY
MAX(date_to) DESC;
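A minimal runnable sketch of this GROUP BY / ORDER BY MAX(date_to) query on the question's data, using Python's sqlite3 as a stand-in for SQL Server (assumption). Ordering by an aggregate is allowed because MAX(date_to) yields one value per group, so date_to does not need to be in the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking (guest_no INTEGER, hotel_no INTEGER, "
    "date_from TEXT, date_to TEXT, room_no INTEGER)"
)
conn.executemany(
    "INSERT INTO booking VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2015-05-07", "2015-05-08", 103),
        (1, 1, "2015-05-11", "2015-05-12", 104),
        (1, 1, "2015-05-14", "2015-05-15", 103),
        (1, 1, "2015-05-17", "2015-05-20", 101),
        (2, 2, "2015-05-01", "2015-05-02", 204),
        (2, 2, "2015-05-04", "2015-05-05", 203),
        (2, 2, "2015-05-17", "2015-05-22", 202),
    ],
)

query = """
SELECT guest_no, hotel_no, room_no, COUNT(1) AS BookingCount
FROM booking
GROUP BY guest_no, hotel_no, room_no
ORDER BY MAX(date_to) DESC   -- most recent booking of each combination first
"""
result = conn.execute(query).fetchall()
print(result)
```

Room 103 for guest 1 is counted twice and sorted by its later booking (2015-05-15).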
Maybe this is what you're looking for?
select
guest_no,
hotel_no,
room_no,
count(*) as Count
from
booking
group by
guest_no,
hotel_no,
room_no
order by
min(date_to) desc
Or maybe max() instead of min(). SQL Fiddle: http://sqlfiddle.com/#!6/e684c/3
You could try this.
select t.* from
(
select bk.guest_no, bk.hotel_no, bk.room_no, bk.date_to,
count(*) as noOfTimesBooked from booking bk
group by bk.guest_no, bk.hotel_no, bk.room_no, bk.date_to
) t
order by t.date_to
You will also have to select date_to and then group the result by it.
With a GROUP BY clause, SQL Server only lets you ORDER BY columns that are grouped or aggregated. So you can compute the latest date_to for each group in a subquery and use ORDER BY in the outer query.
SELECT * FROM
(select bk.guest_no,bk.hotel_no,bk.room_no
,count(bk.guest_no+bk.hotel_no+bk.room_no) as noOfTimesRoomBooked,
-- latest date_to for this guest/hotel/room combination
(SELECT MAX(CK.date_to) FROM booking CK
WHERE CK.guest_no=bk.guest_no AND CK.hotel_no=bk.hotel_no
AND CK.room_no=bk.room_no) AS DATEBOOK
from booking bk
group by bk.guest_no,bk.hotel_no,bk.room_no) A
ORDER BY DATEBOOK DESC
This might help you.

Find count from specific table by specific filter sql server

I have a table like this:
id candid candname status date time location jobcode
1 12 hhhhhhhhhh Introduce 2014-05-21 14:0 NewYork 10JN
3 12 hhhhhhhhhh Reject 2014-05-21 15:0 AM London 10JN
4 12 hhhhhhhhhh Interview 2014-05-21 15:0 PM Chicago 10JN
5 11 Pinky Bare Introduce 2014-05-21 65:6 India 10JN
6 11 Pinky Bare Interview 2014-05-21 4:56 AM 10JN
7 13 chetan Tae Introduce 2014-05-21 4:54 AM Nagpur faOl
8 13 chetan Tae Interview 2014-05-21 3:45 Pune faOl
9 14 manisha mane Introduce 2014-05-21 3:33 PM Pune faOl
10 18 ranju gondane Introduce 2014-05-28 3:44 Nagpur AQW-06
12 18 ranju gondane Interview 2014-05-28 5:45 45454 AQW-06
13 18 ranju gondane Reject 2014-05-28 43:43 rsds AQW-06
14 19 vandanna rai Introduce 2014-05-28 7:7 yyyr AQW-06
If I use this query:
SELECT COUNT(*) FROM [tablename]
WHERE
(jobcode='AQW-06')
AND
([status] <> 'Interview' AND [status] <> 'Reject'
AND
[status] <> 'ON-Hold' AND [status] <> 'Hire')
I get a count of 2 for Introduce candidates.
But if a candidate is interviewed after being introduced, they should not be counted as Introduce.
I want the count of introduced, interviewed and rejected candidates for a specific jobcode.
Please help me for this.
You can try
select status, count(*)
from [tablename]
where jobcode = 'AQW-06'
group by status
Edit: you could try something like this:
select count(x.candid) numofcandidates, x.statusnum
from
(select candid, max(case when status = 'Reject' then 3
when status = 'Interview' then 2
when status = 'Introduce' then 1 end) statusnum
from [tablename] t
where jobcode = 'AQW-06'
group by candid) x
group by x.statusnum;
What I did is "translate" each status into a number, so that the highest status of each candidate is the one used. All you need to do then is "translate" statusnum back into the status values of your table. In my opinion, it would be better to store a statusnum in the table directly.
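Translating statusnum back to text can be done in the same query with a second CASE. Here is a minimal sketch using Python's sqlite3 as the database (assumption). The table name candidates is hypothetical, and only the columns the query needs are included from the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (id INTEGER, candid INTEGER, status TEXT, jobcode TEXT)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?, ?)",
    [
        (1, 12, "Introduce", "10JN"),
        (3, 12, "Reject", "10JN"),
        (4, 12, "Interview", "10JN"),
        (10, 18, "Introduce", "AQW-06"),
        (12, 18, "Interview", "AQW-06"),
        (13, 18, "Reject", "AQW-06"),
        (14, 19, "Introduce", "AQW-06"),
    ],
)

query = """
SELECT CASE x.statusnum WHEN 1 THEN 'Introduce'   -- number back to text
                        WHEN 2 THEN 'Interview'
                        WHEN 3 THEN 'Reject' END AS status,
       COUNT(x.candid) AS numofcandidates
FROM (SELECT candid,
             -- highest status reached by each candidate
             MAX(CASE WHEN status = 'Reject' THEN 3
                      WHEN status = 'Interview' THEN 2
                      WHEN status = 'Introduce' THEN 1 END) AS statusnum
      FROM candidates
      WHERE jobcode = 'AQW-06'
      GROUP BY candid) x
GROUP BY x.statusnum
ORDER BY x.statusnum
"""
result = conn.execute(query).fetchall()
print(result)
```

Candidate 18 reached Reject and candidate 19 is still at Introduce, so each status is counted once.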
Try this:
;with reftable as
(select 1 'key', 'Introduce' 'val'
union
select 2 'key', 'Interview' 'val'
union
select 3 'key', 'Reject' 'val'
),
cte as
(select e.candid, e.[status], row_number() over (partition by e.candid order by r.[key] desc) rn
from yourtable e
inner join reftable r on e.[status] = r.val
where e.[status] in ('Introduce','Interview','Reject')
and e.jobcode = 'AQW-06')
select [status], count([status])
from cte
where rn = 1
group by [status]
Basically, we assign a numeric value to your text status to allow sorting. In the over clause, we sort by this numeric value in descending order to get the highest status of a candidate as you describe. Then, we just count the number of occurrences of each status.
Note that you can extend this to include values for status like 'Hire'. To do this, you will need to add it to the list in reftable with appropriate numeric value, and also add it to the filter in cte.
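A minimal runnable sketch of the lookup-table approach, using Python's sqlite3 with SQLite 3.25+ as the database (assumption). The table name candidates is hypothetical, the lookup column is named sort_key instead of key, and the status value is 'Reject', the spelling used in the sample data. The inner join to reftable already restricts the rows to the listed statuses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (id INTEGER, candid INTEGER, status TEXT, jobcode TEXT)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?, ?)",
    [
        (1, 12, "Introduce", "10JN"),
        (3, 12, "Reject", "10JN"),
        (4, 12, "Interview", "10JN"),
        (10, 18, "Introduce", "AQW-06"),
        (12, 18, "Interview", "AQW-06"),
        (13, 18, "Reject", "AQW-06"),
        (14, 19, "Introduce", "AQW-06"),
    ],
)

query = """
WITH reftable(sort_key, val) AS (
    VALUES (1, 'Introduce'), (2, 'Interview'), (3, 'Reject')
),
cte AS (
    -- rank each candidate's rows so the highest status gets rn = 1
    SELECT e.candid, e.status,
           ROW_NUMBER() OVER (PARTITION BY e.candid ORDER BY r.sort_key DESC) AS rn
    FROM candidates e
    INNER JOIN reftable r ON e.status = r.val
    WHERE e.jobcode = 'AQW-06'
)
SELECT status, COUNT(status)
FROM cte
WHERE rn = 1
GROUP BY status
ORDER BY status
"""
result = conn.execute(query).fetchall()
print(result)
```

Adding a status such as 'Hire' only needs one more row in the VALUES list.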
I want the count of Introduce, interviewed, rejected candidates of specific jobcode
The query below will return the results you need:
SELECT SUM(t.IsIntroduction) AS CountOfIntroductions,
SUM(t.IsInterview) AS CountOfInterviews,
SUM(t.IsRejected) AS CountOfRejections
FROM (
SELECT id,
CASE WHEN Status = 'Introduce' THEN 1 ELSE 0 END AS IsIntroduction,
CASE WHEN Status = 'Interview' THEN 1 ELSE 0 END AS IsInterview,
CASE WHEN Status = 'Reject' THEN 1 ELSE 0 END AS IsRejected
FROM [Tablename]
WHERE JobCode = 'AQW-06'
) AS t
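A minimal runnable sketch of this conditional-aggregation query, using Python's sqlite3 as the database (assumption; candidates is a hypothetical table name). Note that it counts every matching row rather than each candidate's latest status, so candidate 18, who was introduced, interviewed and then rejected, is counted once in each column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (id INTEGER, candid INTEGER, status TEXT, jobcode TEXT)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?, ?)",
    [
        (1, 12, "Introduce", "10JN"),
        (3, 12, "Reject", "10JN"),
        (4, 12, "Interview", "10JN"),
        (10, 18, "Introduce", "AQW-06"),
        (12, 18, "Interview", "AQW-06"),
        (13, 18, "Reject", "AQW-06"),
        (14, 19, "Introduce", "AQW-06"),
    ],
)

query = """
SELECT SUM(t.IsIntroduction) AS CountOfIntroductions,
       SUM(t.IsInterview) AS CountOfInterviews,
       SUM(t.IsRejected) AS CountOfRejections
FROM (SELECT id,
             -- one 0/1 flag per status, summed in the outer query
             CASE WHEN status = 'Introduce' THEN 1 ELSE 0 END AS IsIntroduction,
             CASE WHEN status = 'Interview' THEN 1 ELSE 0 END AS IsInterview,
             CASE WHEN status = 'Reject' THEN 1 ELSE 0 END AS IsRejected
      FROM candidates
      WHERE jobcode = 'AQW-06') AS t
"""
result = conn.execute(query).fetchone()
print(result)
```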