Get Status value based on timesheet date - sql

I have an Assets table that has an audit log recording each change to an asset's status, so it looks something like this:
AssetId  CapexStatus  Date
-------  -----------  ----------
AM706    1            2017-02-03
AM706    0            2017-02-07
AM706    1            2017-02-10
I then have a timesheet table which has the AssetId and a transaction date on it. I basically want to pull the Capex Status out of the AssetLog table based on the AssetId, taking the Capex Status that was current at the time of the transaction date. For example, if the transaction date is 8th Feb then the Capex Status should be "0".
Timesheet table
TimesheetId  AssetId  TimesheetDate
-----------  -------  -------------
1            AM706    2017-02-01
2            AM706    2017-02-08
3            AM706    2017-02-12

I think something like this might do it:
select
t.*,
a.CapexStatus
from
TimeSheet t
outer apply (Select top 1 * from AssetLog al
where
al.AssetID = t.AssetID
and al.Date < t.TimesheetDate
order by al.Date desc) a

-- A view defined with TOP 1 returns a single row for the whole table,
-- so look up the latest status per timesheet row with a correlated subquery.
select a.AssetId, a.TimesheetDate,
(select top 1 al.CapexStatus
from AssetLog al
where al.Date <= a.TimesheetDate and al.AssetId = a.AssetId
order by al.Date desc) as capex
from TimeSheet a
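The "latest log entry on or before the timesheet date" lookup can be checked against the sample data from the question. This is only a minimal sketch using Python's standard sqlite3 module, which is an assumption on my part since the question targets SQL Server: SQLite has no TOP or OUTER APPLY, so ORDER BY ... DESC LIMIT 1 inside a correlated subquery stands in for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AssetLog (AssetId TEXT, CapexStatus INTEGER, Date TEXT);
    INSERT INTO AssetLog VALUES
        ('AM706', 1, '2017-02-03'),
        ('AM706', 0, '2017-02-07'),
        ('AM706', 1, '2017-02-10');
    CREATE TABLE TimeSheet (TimesheetId INTEGER, AssetId TEXT, TimesheetDate TEXT);
    INSERT INTO TimeSheet VALUES
        (1, 'AM706', '2017-02-01'),
        (2, 'AM706', '2017-02-08'),
        (3, 'AM706', '2017-02-12');
""")

# For each timesheet row, pick the most recent log entry dated on or
# before the timesheet date. ISO-formatted dates compare correctly as text.
query = """
    SELECT t.TimesheetId,
           (SELECT al.CapexStatus
              FROM AssetLog al
             WHERE al.AssetId = t.AssetId
               AND al.Date <= t.TimesheetDate
             ORDER BY al.Date DESC
             LIMIT 1) AS CapexStatus
      FROM TimeSheet t
     ORDER BY t.TimesheetId
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Timesheet 1 predates every log entry, so its status comes back as NULL, which is the same behaviour OUTER APPLY gives in SQL Server.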

Related

Count the number of transactions per month for an individual group by date Hive

I have a table of customer transactions where each item purchased by a customer is stored as one row. So, for a single transaction there can be multiple rows in the table. I have another col called visit_date.
There is a category column called cal_month_nbr which ranges from 1 to 12 based on which month transaction occurred.
The data looks like below
Id   visit_date   Cal_month_nbr
---  ----------   -------------
1    01/01/2020   1
1    01/02/2020   1
1    01/01/2020   1
2    02/01/2020   2
1    02/01/2020   2
1    03/01/2020   3
3    03/01/2020   3
First, I want to know how many times each customer visits per month, based on visit_date, i.e. I want the output below:
id   cal_month_nbr   visit_per_month
---  -------------   ---------------
1    1               2
1    2               1
1    3               1
2    2               1
3    3               1
Second, I want the average visit frequency per month for each id, i.e.:
id   Avg_freq_per_month
---  ------------------
1    1.33
2    1
3    1
I tried the query below, but it counts each item as one transaction:
select avg(count_e) as num_visits_per_month,individual_id
from
(
select r.individual_id, cal_month_nbr, count(*) as count_e
from
ww_customer_dl_secure.cust_scan
GROUP by
r.individual_id, cal_month_nbr
order by count_e desc
) as t
group by individual_id
I would appreciate any help, guidance or suggestions
You can divide the total number of rows by the number of months (note that this still counts every item row):
select individual_id,
count(*) / count(distinct cal_month_nbr)
from ww_customer_dl_secure.cust_scan c
group by individual_id;
If you want the average number of distinct visit days per month, then:
select individual_id,
count(distinct visit_date) / count(distinct cal_month_nbr)
from ww_customer_dl_secure.cust_scan c
group by individual_id;
Actually, Hive may not be efficient at calculating count(distinct), so multiple levels of aggregation might be faster:
select individual_id, avg(num_visit_days)
from (select individual_id, cal_month_nbr, count(*) as num_visit_days
from (select distinct individual_id, visit_date, cal_month_nbr
from ww_customer_dl_secure.cust_scan c
) iv
group by individual_id, cal_month_nbr
) ic
group by individual_id;
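To see that the multi-level aggregation gives the averages asked for in the question, here is a small sketch that runs the same query shape on the sample rows. It uses Python's standard sqlite3 module rather than Hive (an assumption made purely so the example is runnable), and the table is named cust_scan without the schema prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust_scan (individual_id INTEGER, visit_date TEXT, cal_month_nbr INTEGER);
    INSERT INTO cust_scan VALUES
        (1, '01/01/2020', 1), (1, '01/02/2020', 1), (1, '01/01/2020', 1),
        (2, '02/01/2020', 2), (1, '02/01/2020', 2),
        (1, '03/01/2020', 3), (3, '03/01/2020', 3);
""")

# Innermost level: collapse item rows to one row per customer per visit day.
# Middle level: count visit days per customer per month.
# Outer level: average those monthly counts per customer.
query = """
    SELECT individual_id, AVG(num_visit_days) AS avg_freq
      FROM (SELECT individual_id, cal_month_nbr, COUNT(*) AS num_visit_days
              FROM (SELECT DISTINCT individual_id, visit_date, cal_month_nbr
                      FROM cust_scan) iv
             GROUP BY individual_id, cal_month_nbr) ic
     GROUP BY individual_id
     ORDER BY individual_id
"""
avg_visits = dict(conn.execute(query).fetchall())
print(avg_visits)
```

Customer 1 has two visit days in month 1 and one each in months 2 and 3, so the average is 4/3, which matches the 1.33 in the expected output.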

Check for condition in GROUP BY?

Take this example data:
ID Status Date
1 Pending 2/10/2020
2 Pending 2/10/2020
3 Pending 2/10/2020
2 Pending 2/10/2020
2 Pending 2/10/2020
1 Complete 2/15/2020
I need an SQL statement that will group all the data but bring back the current status. So for ID 1 the group by needs a condition that returns only the Complete row, while still returning the Pending rows for IDs 2 and 3.
I am not 100% sure how to write the condition for this.
Maybe something like:
SELECT ID, Status, Date
FROM table
GROUP BY ID, Status, Date
ORDER BY ID
The problem with this is the resulting data would look like:
ID Status Date
1 Pending 2/10/2020
1 Complete 2/15/2020
2 Pending 2/10/2020
3 Pending 2/10/2020
But I need:
ID Status Date
1 Complete 2/15/2020
2 Pending 2/10/2020
3 Pending 2/10/2020
What can I do to check for the Completed status so I can only return Completed in the group by?
GROUP BY only the ID column and use MIN() to choose Complete before Pending. This works because 'Complete' sorts before 'Pending' alphabetically.
SELECT ID, MIN(Status)
FROM table
GROUP BY ID
ORDER BY ID
To use Date as 'last row indicator', you can:
DECLARE @Src TABLE (
ID int,
Status varchar(20),
Date Date
)
INSERT @Src VALUES
(1, 'Pending' ,'2/10/2020'),
(1, 'Complete' ,'2/15/2020'),
(2, 'Pending' ,'2/10/2020'),
(3, 'Pending' ,'2/10/2020');
SELECT TOP 1 WITH TIES *
FROM @Src
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date DESC)
Result:
ID Status Date
----------- -------------------- ----------
1 Complete 2020-02-15
2 Pending 2020-02-10
3 Pending 2020-02-10
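The two approaches can be compared side by side. This is a rough sketch with Python's standard sqlite3 module (an assumption, since the answers use SQL Server syntax); SQLite has no TOP 1 WITH TIES, so the date-based variant is written as a correlated MAX(Date) subquery instead, and the dates are stored in ISO format so they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Src (ID INTEGER, Status TEXT, Date TEXT);
    INSERT INTO Src VALUES
        (1, 'Pending',  '2020-02-10'),
        (1, 'Complete', '2020-02-15'),
        (2, 'Pending',  '2020-02-10'),
        (3, 'Pending',  '2020-02-10');
""")

# Approach 1: MIN(Status) relies on 'Complete' sorting before 'Pending'.
by_min = conn.execute(
    "SELECT ID, MIN(Status) FROM Src GROUP BY ID ORDER BY ID"
).fetchall()

# Approach 2: keep the row whose Date is the latest one for its ID.
by_date = conn.execute("""
    SELECT s.ID, s.Status, s.Date
      FROM Src s
     WHERE s.Date = (SELECT MAX(s2.Date) FROM Src s2 WHERE s2.ID = s.ID)
     ORDER BY s.ID
""").fetchall()

print(by_min)
print(by_date)
```

The MIN() version breaks as soon as a status is added that sorts before 'Complete'. The date-based version does not depend on the status names, but it returns more than one row for an ID if several rows share the latest date.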

How to flag active customers who have at least one transaction per month?

Objective is to create a flag for active customers.
An active customer is someone who has at least one transaction every month.
Time frame - May 2018 to May 2019
Data is at transaction level
-------------------------------------
txn_id | txn_date | name | amount
-------------------------------------
101 2018-05-01 ABC 100
102 2018-05-02 ABC 200
-------------------------------------
output should be like this -
----------------
name | flag
----------------
ABC active
BCF inactive
You can use aggregation to get the active customers:
select name
from t
where txn_date >= '2018-05-01' and txn_date < '2019-06-01'
group by name
having count(distinct last_day(txn_date)) = 13 -- all months accounted for
EDIT:
If you want a flag, just move the condition to a case expression:
select name,
(case when count(distinct case when txn_date >= '2018-05-01' and txn_date < '2019-06-01' then last_day(txn_date) end) = 13
then 'active' else 'inactive'
end) as flag
from t
group by name;  -- one row per customer
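Here is a small runnable sketch of the flag query, assuming Python's standard sqlite3 module. SQLite has no last_day(), so strftime('%Y-%m', txn_date) is used as the month key instead; the transaction rows beyond the two shown in the question are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (txn_id INTEGER, txn_date TEXT, name TEXT, amount INTEGER)")

# Hypothetical data: ABC transacts on the 1st of every month from
# 2018-05 through 2019-05 (13 months); BCF transacts only twice.
months = [(2018, m) for m in range(5, 13)] + [(2019, m) for m in range(1, 6)]
rows = [(100 + i, f"{y}-{m:02d}-01", "ABC", 100) for i, (y, m) in enumerate(months)]
rows += [(201, "2018-05-03", "BCF", 50), (202, "2018-07-09", "BCF", 75)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Count the distinct months with a transaction inside the window;
# rows outside the window become NULL and are ignored by COUNT(DISTINCT ...).
query = """
    SELECT name,
           CASE WHEN COUNT(DISTINCT CASE
                         WHEN txn_date >= '2018-05-01' AND txn_date < '2019-06-01'
                         THEN strftime('%Y-%m', txn_date) END) = 13
                THEN 'active' ELSE 'inactive' END AS flag
      FROM t
     GROUP BY name
     ORDER BY name
"""
flags = dict(conn.execute(query).fetchall())
print(flags)
```

Putting the date filter inside the CASE rather than in WHERE keeps customers with no transactions in the window in the result, flagged as inactive.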

SQL JOIN - retrieve MAX DateTime from second table and the first DateTime after previous MAX for other value

I have an issue creating the right SQL expression.
I have a table TICKET with a column TICKETID:
TICKETID
1000
1001
I then have a table STATUSHISTORY from which I need to retrieve, for each ticket, the last (maximum) time it entered VENDOR status and the time it exited that status. By exiting VENDOR status I mean the first INPROG status after that last VENDOR status; the status following VENDOR is always INPROG. It is also possible that no VENDOR status exists for an ID in STATUSHISTORY (then nulls should be returned). INPROG always exists; it can occur before VENDOR and also after it, once the ID is no longer in VENDOR status.
Here is the example of STATUSHISTORY.
ID TICKETID STATUS DATETIME
1 1000 INPROG 01.01.2017 10:00
2 1000 VENDOR 02.01.2017 10:00
3 1000 INPROG 03.01.2017 10:00
4 1000 VENDOR 04.01.2017 10:00
5 1000 INPROG 05.01.2017 10:00
6 1000 HOLD 06.01.2017 10:00
7 1000 INPROG 07.01.2017 10:00
8 1001 INPROG 02.02.2017 10:00
9 1001 VENDOR 03.02.2017 10:00
10 1001 INPROG 04.02.2017 10:00
11 1001 VENDOR 05.02.2017 10:00
So the result when doing the query from TICKET table and doing the JOIN with table STATUSHISTORY should be:
ID VENDOR_ENTERED VENDOR_EXITED
1000 04.01.2017 10:00 05.01.2017 10:00
1001 05.02.2017 10:00 null
Because for ID 1000 last VENDOR status was at 04.01.2017 and the first INPROG status after the VENDOR status for that ID was at 05.01.2017 while for ID 1001 the last VENDOR status was at 05.02.2017 and after that INPROG status did not happen yet.
If VENDOR did not exist then both columns should be null in result.
I am really stuck with this, trying different JOINs but without any progress.
Thank you in advance if you can help me.
You can do this with window functions. First, assign a "vendor" group to the tickets. You can do this using a cumulative sum counting the number of "vendor" records on or before each record.
Then, aggregate the records to get one record per "vendor" group. And use row numbers to get the most recent records. So:
with vg as (
select ticketid, grp,
-- grp = 0 holds rows before the first VENDOR, so both columns stay null there
min(case when status = 'VENDOR' then datetime end) as vendor_entered,
min(case when status = 'INPROG' and grp > 0 then datetime end) as vendor_exited
from (select sh.*,
sum(case when status = 'VENDOR' then 1 else 0 end) over (partition by ticketid order by datetime) as grp
from statushistory sh
) sh
group by ticketid, grp
)
select vg.ticketid, vg.vendor_entered, vg.vendor_exited
from (select vg.*,
row_number() over (partition by ticketid order by grp desc) as seqnum
from vg
) vg
where seqnum = 1;
You can aggregate to get max time, then join onto all of the date values higher than that time, and then re-aggregate:
select a.TicketID,
a.VENDOR_ENTERED,
min( EXIT_TIME ) as VENDOR_EXITED
from (
select TicketID,
max( DATETIME ) as VENDOR_ENTERED
from StatusHistory
where Status = 'VENDOR'
group by TicketID
) as a
left join
(
select TicketID,
DATETIME as EXIT_TIME
from StatusHistory
where Status = 'INPROG'
) as b
on a.TicketID = b.TicketID
and EXIT_TIME >= a.VENDOR_ENTERED
group by a.TicketID,
a.VENDOR_ENTERED
DB2 is not supported in SQLfiddle, but a standard SQL example can be found here.
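The aggregate-then-join approach can be tried on the sample data with the sketch below. It assumes Python's standard sqlite3 module rather than DB2, reads the question's dates as dd.mm.yyyy and stores them in ISO format so they compare correctly as text, and adds a hypothetical ticket 1002 that never entered VENDOR status. It also starts from the Ticket table with a LEFT JOIN so that tickets without any VENDOR row still appear with nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ticket (TicketID INTEGER);
    INSERT INTO Ticket VALUES (1000), (1001), (1002);
    CREATE TABLE StatusHistory (ID INTEGER, TicketID INTEGER, Status TEXT, DateTime TEXT);
    INSERT INTO StatusHistory VALUES
        (1,  1000, 'INPROG', '2017-01-01 10:00'),
        (2,  1000, 'VENDOR', '2017-01-02 10:00'),
        (3,  1000, 'INPROG', '2017-01-03 10:00'),
        (4,  1000, 'VENDOR', '2017-01-04 10:00'),
        (5,  1000, 'INPROG', '2017-01-05 10:00'),
        (6,  1000, 'HOLD',   '2017-01-06 10:00'),
        (7,  1000, 'INPROG', '2017-01-07 10:00'),
        (8,  1001, 'INPROG', '2017-02-02 10:00'),
        (9,  1001, 'VENDOR', '2017-02-03 10:00'),
        (10, 1001, 'INPROG', '2017-02-04 10:00'),
        (11, 1001, 'VENDOR', '2017-02-05 10:00'),
        (12, 1002, 'INPROG', '2017-03-01 10:00');
""")

# a: last VENDOR time per ticket; b: every INPROG time.
# Joining b on times at or after VENDOR_ENTERED and taking MIN gives the exit.
query = """
    SELECT t.TicketID, a.VENDOR_ENTERED, MIN(b.EXIT_TIME) AS VENDOR_EXITED
      FROM Ticket t
      LEFT JOIN (SELECT TicketID, MAX(DateTime) AS VENDOR_ENTERED
                   FROM StatusHistory
                  WHERE Status = 'VENDOR'
                  GROUP BY TicketID) a
        ON a.TicketID = t.TicketID
      LEFT JOIN (SELECT TicketID, DateTime AS EXIT_TIME
                   FROM StatusHistory
                  WHERE Status = 'INPROG') b
        ON b.TicketID = a.TicketID
       AND b.EXIT_TIME >= a.VENDOR_ENTERED
     GROUP BY t.TicketID, a.VENDOR_ENTERED
     ORDER BY t.TicketID
"""
result = conn.execute(query).fetchall()
print(result)
```

Ticket 1001 is still in VENDOR status, so its exit time is NULL, and ticket 1002 gets NULL in both columns.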

MAX on group returns multiple values with same date but different times

I have followed many of the excellent pieces of advice on this site about selecting the MAX from a group of rows.
I have a history file and I only want the top date and comments for each project number. I am creating a derived table in a Boxi universe from this information. It all goes pretty well but if there are two entries for the same day but with different times they are both returned. This duplicates that entry on the subsequent report. Is there some way to make the MAX command go down to the time level of the date field?
Database is SQL Server 2005
-------------Sql used for derived table
Select
Projectno, Comment, CreatedOn
from
ReportHistory
Where
ReportHistory.ItemName=('ProjectCode1')
and
CreatedOn in(Select max(CreatedOn) FROM ReportHistory group by Projectno)
-------------------Example database
Projectno Comment Created on
1 Started 2013-01-04 11:04:00
2 Late 2013-01-06 11:22:00
3 Late 2013-01-07 11:06:00
1 On Time 2013-01-08 11:01:00 *these two both get selected*
1 Late 2013-01-08 12:05:00 *these two both get selected*
3 Back on schedule 2013-01-08 14:20:00
2 Still overdue 2013-01-09 09:01:00
MAX on a DATETIME data type does of course take the time into account; that is not what's wrong with your query. The problem is that you are not ensuring that the max value for CreatedOn belongs to the correct ProjectNo. You could use analytical functions for this:
;WITH CTE AS
(
SELECT Projectno,
Comment,
CreatedOn,
ROW_NUMBER() OVER(PARTITION BY ProjectNo ORDER BY CreatedOn DESC) RN
FROM ReportHistory
WHERE ReportHistory.ItemName = 'ProjectCode1'
)
SELECT Projectno, Comment, CreatedOn
FROM CTE
WHERE RN = 1
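The difference between matching against any project's maximum and matching against the row's own project can be reproduced with a small sketch. It assumes Python's standard sqlite3 module instead of SQL Server 2005, leaves out the ItemName column for brevity, and adds one hypothetical row for a project 4 whose latest timestamp equals an older timestamp of project 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ReportHistory (Projectno INTEGER, Comment TEXT, CreatedOn TEXT);
    INSERT INTO ReportHistory VALUES
        (1, 'Started',          '2013-01-04 11:04:00'),
        (2, 'Late',             '2013-01-06 11:22:00'),
        (3, 'Late',             '2013-01-07 11:06:00'),
        (1, 'On Time',          '2013-01-08 11:01:00'),
        (1, 'Late',             '2013-01-08 12:05:00'),
        (3, 'Back on schedule', '2013-01-08 14:20:00'),
        (2, 'Still overdue',    '2013-01-09 09:01:00'),
        (4, 'On Time',          '2013-01-08 11:01:00');
""")

# Uncorrelated IN: a row qualifies if its timestamp is the max of ANY project.
uncorrelated = conn.execute("""
    SELECT Projectno, Comment
      FROM ReportHistory
     WHERE CreatedOn IN (SELECT MAX(CreatedOn) FROM ReportHistory GROUP BY Projectno)
     ORDER BY Projectno, CreatedOn
""").fetchall()

# Correlated subquery: a row qualifies only if it is the max of ITS OWN project.
correlated = conn.execute("""
    SELECT h.Projectno, h.Comment
      FROM ReportHistory h
     WHERE h.CreatedOn = (SELECT MAX(h2.CreatedOn)
                            FROM ReportHistory h2
                           WHERE h2.Projectno = h.Projectno)
     ORDER BY h.Projectno
""").fetchall()

print(uncorrelated)
print(correlated)
```

In the uncorrelated version project 1 appears twice, because its 11:01 row happens to equal project 4's maximum. The correlated version, like the ROW_NUMBER() query, returns one row per project here.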
Query for when no project has two rows with the same timestamp:
SELECT h.Projectno,
h.Comment,
h.[Created on]
FROM ReportHistory h
WHERE h.[Created on] =(Select max(h2.[Created on])
FROM ReportHistory h2
WHERE h2.Projectno = h.Projectno )
ORDER BY h.Projectno
Result:
| PROJECTNO | COMMENT | CREATED ON |
-----------------------------------------------------------------
| 1 | Late | January, 08 2013 12:05:00+0000 |
| 2 | Still overdue | January, 09 2013 09:01:00+0000 |
| 3 | Back on schedule | January, 08 2013 14:20:00+0000 |
Query for when a project can have several rows with the same timestamp:
SELECT h.Projectno,
MAX(h.Comment) AS Comment,
h.[Created on]
FROM ReportHistory h
WHERE h.[Created on] =(Select max(h2.[Created on])
FROM ReportHistory h2
WHERE h2.Projectno = h.Projectno )
GROUP BY h.Projectno,
h.[Created on]
ORDER BY h.Projectno
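I think you get duplicates when the dates of different projects are identical.
For example, add the row (4, 'On Time', '2013-01-08 11:01:00') to your data.
Your query then returns both 2013-01-08 rows for project 1, because the 11:01 row matches the maximum of project 4.
To get only the latest row per project, compare each row against the maximum of its own project: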
SELECT *
FROM ReportHistory t
WHERE t.ItemName=('ProjectCode1')
AND EXISTS (
SELECT 1
FROM ReportHistory
WHERE projectNo = t.projectNo
GROUP BY projectNo
HAVING MAX(CreatedOn) = t.CreatedOn
)