Building a historic table in HiveQL with an expire date


I need to build some kind of history from a table
with an id and a date.
Every date marks a change, and the newest date is
the active one.
Table historic_trans:
+----+------------+
| id | trans_date |
+----+------------+
| 22 | 20170510   |
| 22 | 20170502   |
| 22 | 20170412   |
+----+------------+
I want to build a history table where the newest row
is marked as active by adding an expiredate column set to '99991231'.
I can easily find the active rows with:
select id, max(trans_date)trans_date, '99991231' as Expiredate, 'yes' as active
from historic_trans
where id = '22'
group by id
But for the inactive rows I also need to set expiredate to the trans_date
of the next (newer) row:
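+----+------------+------------+--------+
| id | trans_date | Expiredate | active |
+----+------------+------------+--------+
| 22 | 20170510   | 99991231   | yes    |
| 22 | 20170502   | 20170510   | no     |
| 22 | 20170412   | 20170502   | no     |
+----+------------+------------+--------+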
That way, expiredate reflects when each transaction was superseded.
Can this be accomplished in pure HQL/SQL?
I have been playing with the following code, but I am stuck:
select historic_trans.id, historic_trans.trans_date,
case when aktiv.Expiredate = '99991231' then aktiv.Expiredate
else aktiv.Expiredate
end as Expiredate
from historic_trans
left outer join
(
select id, max(trans_date)trans_date, '99991231' as Expiredate, 'yes' as active
from historic_trans
where id = '22'
group by id
) aktiv on aktiv.id = historic_trans.id and aktiv.trans_date = historic_trans.trans_date
where historic_trans.id = '22'
Any suggestions?

select id
,trans_date
,lag (trans_date,1,date '9999-12-31') over w as Expiredate
,case when row_number () over w = 1 then 'yes' else 'no' end as active
from historic_trans
window w as (partition by id order by trans_date desc)
;
+----+------------+------------+--------+
| id | trans_date | expiredate | active |
+----+------------+------------+--------+
| 22 | 2017-05-10 | 9999-12-31 | yes |
| 22 | 2017-05-02 | 2017-05-10 | no |
| 22 | 2017-04-12 | 2017-05-02 | no |
+----+------------+------------+--------+
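To see the LAG/ROW_NUMBER idea run end to end, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Hive. It assumes SQLite 3.25 or later (the release that added window functions), keeps trans_date as a 'YYYYMMDD' string as in the question, and writes the window inline rather than as a named WINDOW clause. The helper name build_history is hypothetical.

```python
import sqlite3


def build_history(rows):
    """Return (id, trans_date, expiredate, active) rows, newest first per id.

    rows: iterable of (id, trans_date) with trans_date as 'YYYYMMDD' strings.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE historic_trans (id INTEGER, trans_date TEXT)")
    con.executemany("INSERT INTO historic_trans VALUES (?, ?)", rows)
    query = """
        SELECT id,
               trans_date,
               -- in descending order the previous row is the next newer change;
               -- the newest row has no previous row, so it gets the default '99991231'
               LAG(trans_date, 1, '99991231')
                   OVER (PARTITION BY id ORDER BY trans_date DESC) AS expiredate,
               CASE WHEN ROW_NUMBER()
                         OVER (PARTITION BY id ORDER BY trans_date DESC) = 1
                    THEN 'yes' ELSE 'no' END AS active
        FROM historic_trans
        ORDER BY id, trans_date DESC
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


history = build_history([(22, "20170510"), (22, "20170502"), (22, "20170412")])
for row in history:
    print(row)
```

Because the window is ordered newest first, LAG looks one row "back" toward the newer date, which is exactly the date that expires the current row.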

Related

SQL Query Inactive Users with last end date

This is a follow-up to a question I asked about a year ago (Old thread).
The answers I got then have worked fine, but I have now discovered that I need to tweak the query to get the latest end date for the inactive users.
So again, here's a quick example table of users: some are active, some are inactive, and some have several periods of employment.
When someone is re-employed, a new row is added for that employment period.
The username always stays the same.
I want to find the users who are disabled and don't have an active employment; if a user has several periods of employment, I want the one with the latest end date. One row per username, with all the columns.
The database is SQL Server 2016.
Example table:
+----------+-----------+--------+-----------+----------+
| username | name      | active | Job title | enddate  |
+----------+-----------+--------+-----------+----------+
| 1111     | Jane Doe  | 1      | CIO       | 1/3/2022 |
| 1111     | Jane Doe  | 0      | Janitor   | 1/2/2018 |
| 1112     | Bob Doe   | 1      | Coder     | NULL     |
| 1113     | James Doe | 0      | Coder     | 1/3/2018 |
| 1114     | Ray Doe   | 1      | Manager   | NULL     |
| 1114     | Ray Doe   | 0      | Clerk     | 2/2/2019 |
| 1115     | Emma Doe  | 1      | Waiter    | NULL     |
| 1116     | Sarah Doe | 0      | Greeter   | 3/4/2016 |
| 1116     | Sarah Doe | 0      | Trainer   | 4/5/2019 |
+----------+-----------+--------+-----------+----------+
So for user 1116 I would ideally get one row with enddate 4/5/2019
The query I use from the answers in the old thread is this one:
;WITH NonActiveDisabledUsers AS
(
SELECT DISTINCT
U.username
FROM
UserEmployment AS U
WHERE
U.active = 0 AND
NOT EXISTS (SELECT 'no current active employment'
FROM UserEmployment AS C
WHERE U.username = C.username AND
C.active = 1 AND
(C.enddate IS NULL OR C.enddate >= CONVERT(DATE, GETDATE())))
)
SELECT
R.*
FROM
NonActiveDisabledUsers AS N
CROSS APPLY (
SELECT TOP 1 -- Just 1 record
U.*
FROM
UserEmployment AS U
WHERE
N.username = U.username AND
U.active = 0
ORDER BY
U.enddate DESC -- Determine which record should we display
) AS R
This gives me the right users and employment status, but not the latest end date, since it returns the first result for user 1116.

We can use conditional aggregation with a window aggregate to get the number of active rows for this user.
We then filter to only inactive, and row-number the result by enddate taking the first row per group:
SELECT
username,
name,
active,
[Job title],
enddate
FROM (
SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY username ORDER BY enddate DESC)
FROM (
SELECT *,
CountOfActive = COUNT(CASE WHEN
Active = 1 AND
(enddate IS NULL OR enddate >= CONVERT(DATE, GETDATE())) THEN 1 END
) OVER (PARTITION BY username)
FROM UserEmployment
) AS t
WHERE CountOfActive = 0
) AS t
WHERE rn = 1;
Note that this row numbering does not account for NULLs in enddate, which sort last in descending order. To treat a NULL enddate as the latest one, you would need a conditional ordering:
ROW_NUMBER() OVER (PARTITION BY username ORDER BY CASE WHEN enddate IS NULL THEN 0 ELSE 1 END ASC, enddate DESC)
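A minimal sketch of the count-then-rank approach, run against SQLite through Python's sqlite3 module rather than SQL Server (assumption: SQLite 3.25+). The example dates are stored as ISO strings, assuming the question's dates are M/D/YYYY, and a fixed, hypothetical "today" parameter replaces CONVERT(DATE, GETDATE()) so the result is reproducible. The table and column names user_employment and job_title are adapted for illustration.

```python
import sqlite3

# Example rows from the question; dates converted to ISO strings, assuming M/D/YYYY.
ROWS = [
    (1111, "Jane Doe", 1, "CIO", "2022-01-03"),
    (1111, "Jane Doe", 0, "Janitor", "2018-01-02"),
    (1112, "Bob Doe", 1, "Coder", None),
    (1113, "James Doe", 0, "Coder", "2018-01-03"),
    (1114, "Ray Doe", 1, "Manager", None),
    (1114, "Ray Doe", 0, "Clerk", "2019-02-02"),
    (1115, "Emma Doe", 1, "Waiter", None),
    (1116, "Sarah Doe", 0, "Greeter", "2016-03-04"),
    (1116, "Sarah Doe", 0, "Trainer", "2019-04-05"),
]

QUERY = """
SELECT username, name, active, job_title, enddate
FROM (
    -- Rank the remaining rows per user: NULL enddate first, then latest enddate.
    SELECT *, ROW_NUMBER() OVER (
               PARTITION BY username
               ORDER BY CASE WHEN enddate IS NULL THEN 0 ELSE 1 END, enddate DESC) AS rn
    FROM (
        -- Attach each user's number of currently active employments to every row.
        SELECT *, COUNT(CASE WHEN active = 1
                              AND (enddate IS NULL OR enddate >= :today)
                             THEN 1 END) OVER (PARTITION BY username) AS count_of_active
        FROM user_employment
    ) AS t
    WHERE count_of_active = 0
) AS t2
WHERE rn = 1
ORDER BY username
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_employment "
            "(username INTEGER, name TEXT, active INTEGER, job_title TEXT, enddate TEXT)")
con.executemany("INSERT INTO user_employment VALUES (?, ?, ?, ?, ?)", ROWS)
inactive = con.execute(QUERY, {"today": "2020-06-01"}).fetchall()
print(inactive)
```

With the fixed date 2020-06-01, user 1111 still counts as active (end date 2022-01-03), so only 1113 and 1116 remain, and 1116 is reported with the Trainer row.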

Hmmm . . . I think you can just get the most recent record and check that it is not active:
select ue.*
from (select ue.*,
row_number() over (partition by username
order by active desc, enddate desc
) as seqnum
from UserEmployment ue
) ue
where seqnum = 1 and active = 0;
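A minimal sketch of the "latest record is not active" idea in SQLite via Python's sqlite3 module (assumption: SQLite 3.25+, example dates stored as ISO strings assuming M/D/YYYY). Unlike the count-based query, this version does not compare enddate with the current date, so it gives the same answer only when the active flag is kept up to date.

```python
import sqlite3

# Example rows from the question; dates converted to ISO strings, assuming M/D/YYYY.
ROWS = [
    (1111, "Jane Doe", 1, "CIO", "2022-01-03"),
    (1111, "Jane Doe", 0, "Janitor", "2018-01-02"),
    (1112, "Bob Doe", 1, "Coder", None),
    (1113, "James Doe", 0, "Coder", "2018-01-03"),
    (1114, "Ray Doe", 1, "Manager", None),
    (1114, "Ray Doe", 0, "Clerk", "2019-02-02"),
    (1115, "Emma Doe", 1, "Waiter", None),
    (1116, "Sarah Doe", 0, "Greeter", "2016-03-04"),
    (1116, "Sarah Doe", 0, "Trainer", "2019-04-05"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_employment "
            "(username INTEGER, name TEXT, active INTEGER, job_title TEXT, enddate TEXT)")
con.executemany("INSERT INTO user_employment VALUES (?, ?, ?, ?, ?)", ROWS)

latest_inactive = con.execute("""
    SELECT username, name, active, job_title, enddate
    FROM (SELECT ue.*,
                 -- active rows sort first; among inactive rows the latest enddate wins
                 ROW_NUMBER() OVER (PARTITION BY username
                                    ORDER BY active DESC, enddate DESC) AS seqnum
          FROM user_employment AS ue) AS ue
    -- if even the top-ranked row is inactive, the user has no active employment
    WHERE seqnum = 1 AND active = 0
    ORDER BY username
""").fetchall()
print(latest_inactive)
```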

Here is a variant that returns each qualifying username with its latest end date (it does not return the other columns):
SELECT username, MAX(enddate) AS latest_enddate
FROM UserEmployment as t1
WHERE
t1.active = 0 AND
NOT EXISTS (select username from UserEmployment as t2 WHERE t2.active = 1 AND
(t2.enddate IS NULL OR t2.enddate >= CONVERT(DATE, GETDATE())) AND
t1.username = t2.username)
GROUP BY username

SQL logic to determine unsold inventory and corresponding available dates (Available to sell)

I am looking for advice on how to write SQL for SQL Server that will show the inventory available to sell and the date on which that inventory will be available. I can easily determine whether we have inventory that is available immediately, but I can't wrap my head around the logic needed to determine future available quantities.
In the table below, the +/- column represents the weekly inbound vs. outbound quantity, and Qty_Available is a rolling SUM() OVER (PARTITION BY ...) of the +/- column. I was able to get the immediately available quantity with this simple logic:
Case when Min(X.Qty_Available) > 0 Then Min(X.Qty_Available) else 0 END
AS Immediate_available_Qty
Table:
+-------------+---------------+---------------+------+---------------+
| Item Number | Item Name | week_end_date | +/- | Qty_Available |
+-------------+---------------+---------------+------+---------------+
| 123456 | Fidget Widget | 7/13/2019 | 117 | 117 |
| 123456 | Fidget Widget | 7/20/2019 | 49 | 166 |
| 123456 | Fidget Widget | 7/27/2019 | -7 | 159 |
| 123456 | Fidget Widget | 8/3/2019 | -12 | 147 |
| 123456 | Fidget Widget | 8/10/2019 | -1 | 146 |
| 123456 | Fidget Widget | 8/17/2019 | 45 | 191 |
| 123456 | Fidget Widget | 8/24/2019 | -1 | 190 |
| 123456 | Fidget Widget | 8/31/2019 | -1 | 189 |
| 123456 | Fidget Widget | 9/7/2019 | 50 | 239 |
+-------------+---------------+---------------+------+---------------+
My desired results of this query would be as follows:
+-----------+-----+
| Output | Qty |
+-----------+-----+
| 7/13/2019 | 117 |
| 7/20/2019 | 29 |
| 8/17/2019 | 43 |
+-----------+-----+
The second availability is determined by subtracting the first available quantity of 117 from each value in the Qty_Available column and finding the new minimum. If the new minimum is zero, find the next continuously positive run of data (one that runs all the way to the end of the data). Repeat for the third available quantity, and then stop.
I was considering a recursive CTE (RCTE), but I don't want to go down that rabbit hole if there is a better way to tackle this, and I'm not even sure an RCTE would work for this problem.

This should return your expected result:
SELECT Item_Number, Min(week_end_date) AS [Output], Sum("+/-") AS Qty
FROM
(
SELECT *
-- put a positive value plus all following negative values in the same group
-- using a Cumulative Sum over 0/1
,Sum(CASE WHEN "+/-" > 0 THEN 1 ELSE 0 end)
Over (PARTITION BY Item_Number
ORDER BY week_end_date
ROWS UNBOUNDED PRECEDING) AS grp
FROM my_table
) AS dt
WHERE grp <= 3 -- only the 1st 3 groups
GROUP BY Item_Number, grp
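A minimal sketch of this grouping trick on the sample data, run in SQLite via Python's sqlite3 module instead of SQL Server (assumption: SQLite 3.25+). The week_end_date values are rewritten as ISO strings so they sort correctly, and the alias available_from is chosen for illustration. It reproduces the sample output; whether it matches the question's subtract-the-minimum rule when outbound weeks outweigh the preceding inbound week is not covered here.

```python
import sqlite3

# Weekly rows for item 123456 from the question; dates rewritten as ISO strings.
WEEKS = [
    ("2019-07-13", 117), ("2019-07-20", 49), ("2019-07-27", -7),
    ("2019-08-03", -12), ("2019-08-10", -1), ("2019-08-17", 45),
    ("2019-08-24", -1), ("2019-08-31", -1), ("2019-09-07", 50),
]

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE my_table (Item_Number INTEGER, week_end_date TEXT, "+/-" INTEGER)')
con.executemany("INSERT INTO my_table VALUES (123456, ?, ?)", WEEKS)

availability = con.execute('''
    SELECT Item_Number, MIN(week_end_date) AS available_from, SUM("+/-") AS qty
    FROM (
        SELECT *,
               -- running count of positive weeks: each positive week starts a new group,
               -- and the negative weeks after it fall into the same group
               SUM(CASE WHEN "+/-" > 0 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY Item_Number ORDER BY week_end_date
                         ROWS UNBOUNDED PRECEDING) AS grp
        FROM my_table
    ) AS dt
    WHERE grp <= 3  -- only the first three groups
    GROUP BY Item_Number, grp
    ORDER BY grp
''').fetchall()
print(availability)
```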

So here's what I came up with. I know this is poor, but I didn't want to leave this thread high and dry, and maybe I can get more insight into a better path. I've never had any real training, so I don't know what I don't know.
I ended up writing the result into a temp table, altering the commented-out section in table "A", and then running it into a temp table again.
Select
F.Upc,
F.name,
F.Week_end_date as First_Available_Date,
E.Qty_Available_1
From
(
Select Distinct
D.Upc,
D.name,
Case When Min(D.Rolling_Qty_Available) Over ( PARTITION BY D.upc) < 1 then 0 else
Min(D.Rolling_Qty_Available) Over ( PARTITION BY D.upc) END as Qty_Available_1,
Case When Max(D.Look_up_Ref) Over ( PARTITION BY D.upc) = 0 then '-1000' else
Max(D.Look_up_Ref) Over ( PARTITION BY D.upc) END as Look_up_Ref_1
From
(
Select
A.Upc,
A.name,
A.Week_end_Date,
A.Rolling_Qty_Available,
CASE WHEN
C.Max_Row = A.Row_num and A.[Rolling_Qty_Available] >1 THEN 1
ELSE
CASE WHEN
Sum(A.Calc_Row_Thing) OVER (Partition by A.UPC Order by A.Row_Num DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND Current ROW
) = (C.Max_Row - A.Row_num + 1)
THEN
C.Max_Row - A.Row_num + 1
ELSE 0 END
END as Look_up_Ref
FROM (
Select
G.Upc,
G.Name,
G.Week_End_Date,
G.Row_num,
G.Calc_Row_Thing,
G.Rolling_Qty_Available
--CASE When (G.Rolling_Qty_Available -
--isnull(H.Qty_Available_1,0)) > 0 then 1 else - 0 END as
--Calc_Row_Thing,
From [dbo].[ATS_item_detail_USA_vw] as G
--Left Join [dbo].[tmp_ats_usa_qty_1] as H on G.upc = H.upc
) AS A --Need to subtract QTY 1 out of here and below
join (
SELECT
B.upc,
Max(Row_num) AS Max_Row
FROM [dbo].[ATS_item_detail_USA_vw] AS B
GROUP BY B.upc
) as C on A.upc = C.upc
) as D
GROUP BY
D.Upc,
D.name,
D.Rolling_Qty_Available,
D.Look_up_Ref
HAVING Max(D.Look_up_Ref) > 1
) as E
Left join
(
SELECT
A.Upc,
A.name,
A.Week_end_Date,
A.Rolling_Qty_Available,
CASE WHEN
C.Max_Row = A.Row_num and A.[Rolling_Qty_Available] >1 THEN 1
ELSE
CASE WHEN
Sum(A.Calc_Row_Thing) OVER (Partition by A.UPC Order by A.Row_Num DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND Current ROW
) = (C.Max_Row - A.Row_num + 1)
THEN
C.Max_Row - A.Row_num + 1
ELSE 0 END
END as Look_up_Ref
From (
Select
G.Upc,
G.Name,
G.Week_End_Date,
G.Row_num,
G.Calc_Row_Thing,
G.Rolling_Qty_Available
--CASE When (G.Rolling_Qty_Available -
--isnull(H.Qty_Available_1,0)) > 0 then 1 else - 0 END as
--Calc_Row_Thing,
From [dbo].[ATS_item_detail_USA_vw] as G
--Left Join [dbo].[tmp_ats_usa_qty_1] as H on G.upc = H.upc
) as A --subtract qty_1 out the start qty 2 calc
join (
SELECT
B.upc,
Max(Row_num) as Max_Row
FROM [dbo].[ATS_item_detail_USA_vw] as B
GROUP BY B.upc
) AS C ON A.upc = C.upc
) AS F ON E.upc = F.upc and E.Look_up_Ref_1 = F.Look_up_Ref

Building Monthly aggregate from Daily table

I have written the code below to count unique customers over the last 30 days. How would I repurpose it to count unique customers per month, keyed by the month start date? I am trying to build a monthly aggregate from table_billing, which has a daily grain. Can you please guide me?
select 'context.processingDate' as rptg_dt, COALESCE(item_type,'ALL_ITEMS') as item_type, unique_customers
from (select
(case when item_type_code in ('A') then 'Books'
when item_type_code in ('B','C') then 'Toys'
else 'Fruits' end
) as item_type,
count(distinct person_id) as unique_customers
from table_billing
where rptg_dt between cast('context.processingDate' as date format 'YYYY-MM-DD')-30
AND cast('context.processingDate' as date format 'YYYY-MM-DD')
and item_type_code in ('A','B','C','D','E')
group by CUBE(1)
) a;
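Desired output:
+--------------------+-----------+------------------+
| Monthly Start Date | Item Type | Unique Customers |
+--------------------+-----------+------------------+
| 5/1/14             | Books     | 100              |
| 5/1/14             | Toys      | 80               |
| 5/1/14             | Fruits    | 25               |
| 5/1/14             | ALL_ITEMS | 175              |
| 6/1/14             | Books     | 80               |
| 6/1/14             | Toys      | 60               |
| 6/1/14             | Fruits    | 40               |
| 6/1/14             | ALL_ITEMS | 95               |
+--------------------+-----------+------------------+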
I am looking to re-write this query as follows:
select 'context.processingDate' as month_start_dt,
COALESCE(item_type,'ALL_ITEMS') as item_type, unique_customers
from (select
(case when item_type_code in ('A') then 'Books'
when item_type_code in ('B','C') then 'Toys'
else 'Fruits' end
) as item_type,
count(distinct person_id) as unique_customers
from table_billing
where month_start_dt = cast('context.processingDate' as date format'YYYY-MM-DD') and item_type_code in ('A','B','C','D','E') group by CUBE(1)) a;
How would I tweak the query to make this possible? Thank you!
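One way to get a row per month is to truncate each daily rptg_dt to the first day of its month and group by that value, instead of filtering on a single date. The sketch below is an illustration in SQLite via Python's sqlite3 module, under several assumptions: SQLite has no CUBE, so the ALL_ITEMS total is produced with UNION ALL; date(rptg_dt, 'start of month') stands in for whatever month-truncation function your platform provides; and the sample rows are made up.

```python
import sqlite3

# Hypothetical daily-grain sample rows: (rptg_dt, person_id, item_type_code).
SAMPLE = [
    ("2014-05-03", 1, "A"),
    ("2014-05-20", 1, "A"),
    ("2014-05-21", 2, "B"),
    ("2014-05-22", 1, "D"),
    ("2014-06-02", 3, "C"),
    ("2014-06-15", 2, "E"),
    ("2014-06-30", 2, "Z"),  # excluded by the item_type_code filter
]

QUERY = """
WITH typed AS (
    SELECT date(rptg_dt, 'start of month') AS month_start_dt,
           CASE WHEN item_type_code IN ('A') THEN 'Books'
                WHEN item_type_code IN ('B', 'C') THEN 'Toys'
                ELSE 'Fruits' END AS item_type,
           person_id
    FROM table_billing
    WHERE item_type_code IN ('A', 'B', 'C', 'D', 'E')
)
SELECT month_start_dt, item_type, COUNT(DISTINCT person_id) AS unique_customers
FROM typed
GROUP BY month_start_dt, item_type
UNION ALL
-- Stand-in for the CUBE total row: distinct customers across all item types.
SELECT month_start_dt, 'ALL_ITEMS', COUNT(DISTINCT person_id)
FROM typed
GROUP BY month_start_dt
ORDER BY 1, 2
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_billing (rptg_dt TEXT, person_id INTEGER, item_type_code TEXT)")
con.executemany("INSERT INTO table_billing VALUES (?, ?, ?)", SAMPLE)
monthly = con.execute(QUERY).fetchall()
print(monthly)
```

The ALL_ITEMS row is computed separately rather than summed from the per-type rows, because one customer can buy several item types in the same month.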

Remove duplicate rows query result except for one in Microsoft SQL Server?

How can I remove rows with a duplicate month from a Microsoft SQL Server table?
For example, take the following query:
SELECT * FROM Cash WHERE Id = '2' AND TransactionDate between '2014/07/01' AND '2015/02/28'
and the query result is:
+----+-------------------------+
|Id | TransactionDate |
+----+-------------------------+
| 2 | 2014-07-22 00:00:00.000 |
| 2 | 2014-08-09 00:00:00.000 |
| 2 | 2014-08-25 00:00:00.000 |
| 2 | 2014-08-29 00:00:00.000 |
| 2 | 2015-01-27 00:00:00.000 |
| 2 | 2015-01-28 00:00:00.000 |
+----+-------------------------+
How can I remove the duplicate months so that only one row is returned for each month, like this:
+----+-------------------------+
|Id | TransactionDate |
+----+-------------------------+
| 2 | 2014-07-22 00:00:00.000 |
| 2 | 2014-08-09 00:00:00.000 |
| 2 | 2015-01-27 00:00:00.000 |
+----+-------------------------+

You can do it with the help of ROW_NUMBER.
This shows which rows you are going to keep (those with firstTrans = 1):
SELECT id,transactionDate, ROW_NUMBER() OVER ( PARTITION BY YEAR(TransactionDate ),MONTH(TransactionDate ) ORDER BY TransactionDate ) firstTrans
FROM Cash
WHERE Id = '2' AND
TransactionDate between '2014/07/01' AND '2015/02/28'
You can delete the other rows with a CTE.
with myCTE (id,transactionDate, firstTrans) AS (
SELECT id,transactionDate, ROW_NUMBER() OVER ( PARTITION BY YEAR(TransactionDate ),MONTH(TransactionDate ) ORDER BY TransactionDate ) firstTrans
FROM Cash
WHERE Id = '2' AND
TransactionDate between '2014/07/01' AND '2015/02/28'
)
delete from myCTE where firstTrans <> 1
This keeps only one transaction for each month of each year.
EDIT:
To only return the rows you want (without deleting anything), filter on the row number:
select id, transactionDate from (SELECT id,transactionDate, ROW_NUMBER() OVER ( PARTITION BY YEAR(TransactionDate ),MONTH(TransactionDate ) ORDER BY TransactionDate ) firstTrans
FROM Cash
WHERE Id = '2' AND
TransactionDate between '2014/07/01' AND '2015/02/28') AS t where firstTrans = 1
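Here is a minimal runnable sketch of the ROW_NUMBER approach in SQLite through Python's sqlite3 module (assumption: SQLite 3.25+). SQLite has no YEAR()/MONTH() functions and cannot DELETE through a CTE, so the sketch partitions on strftime('%Y-%m', ...) and deletes by rowid instead. The column name transaction_date is adapted for illustration, and the partition also includes id so that the same month for different Ids is not collapsed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cash (id INTEGER, transaction_date TEXT)")
con.executemany("INSERT INTO cash VALUES (2, ?)", [
    ("2014-07-22",), ("2014-08-09",), ("2014-08-25",),
    ("2014-08-29",), ("2015-01-27",), ("2015-01-28",),
])

# Number the rows within each (id, year-month); the earliest date gets 1.
RANKED = """
    SELECT rowid AS rid, id, transaction_date,
           ROW_NUMBER() OVER (PARTITION BY id, strftime('%Y-%m', transaction_date)
                              ORDER BY transaction_date) AS first_trans
    FROM cash
    WHERE id = 2 AND transaction_date BETWEEN '2014-07-01' AND '2015-02-28'
"""

# Read-only variant: return only the first row of each month.
kept = con.execute(
    f"SELECT id, transaction_date FROM ({RANKED}) AS t "
    "WHERE first_trans = 1 ORDER BY transaction_date").fetchall()

# Destructive variant: delete every row that is not the first in its month.
con.execute(f"DELETE FROM cash WHERE rowid IN "
            f"(SELECT rid FROM ({RANKED}) AS t WHERE first_trans > 1)")
remaining = con.execute(
    "SELECT id, transaction_date FROM cash ORDER BY transaction_date").fetchall()
print(kept, remaining)
```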

When you run this query you will get the highest Id for each month in each year.
SELECT MAX(<IdColumn>) AS Id, YEAR(<DateColumn>) AS YE, MONTH(<DateColumn>) AS MO FROM <YourTable>
GROUP BY YEAR(<DateColumn>), MONTH(<DateColumn>)
If needed, you can later delete the rows whose Id is not returned by this query.

Select only the first row per month
SELECT *
FROM Cash c
WHERE c.Id = '2'
AND c.TransactionDate between '2014/07/01' AND '2015/02/28'
AND NOT EXISTS ( SELECT 'a'
FROM Cash c2
WHERE c2.Id = c.Id
AND YEAR(c2.TransactionDate) * 100 + MONTH(c2.TransactionDate) = YEAR(c.TransactionDate) * 100 + MONTH(c.TransactionDate)
AND c2.TransactionDate < c.TransactionDate
)
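A minimal sketch of the correlated NOT EXISTS approach in SQLite via Python's sqlite3 module. SQLite lacks YEAR()/MONTH(), so strftime('%Y-%m', ...) plays the role of the YEAR * 100 + MONTH key, and the column name transaction_date is adapted for illustration. A row survives only if no earlier row exists for the same Id and month; if two rows share the exact earliest timestamp, both are kept.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cash (id INTEGER, transaction_date TEXT)")
con.executemany("INSERT INTO cash VALUES (2, ?)", [
    ("2014-07-22",), ("2014-08-09",), ("2014-08-25",),
    ("2014-08-29",), ("2015-01-27",), ("2015-01-28",),
])

first_per_month = con.execute("""
    SELECT c.id, c.transaction_date
    FROM cash AS c
    WHERE c.id = 2
      AND c.transaction_date BETWEEN '2014-07-01' AND '2015-02-28'
      -- keep a row only if no earlier row exists for the same id and month
      AND NOT EXISTS (
          SELECT 1
          FROM cash AS c2
          WHERE c2.id = c.id
            AND strftime('%Y-%m', c2.transaction_date) = strftime('%Y-%m', c.transaction_date)
            AND c2.transaction_date < c.transaction_date)
    ORDER BY c.transaction_date
""").fetchall()
print(first_per_month)
```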

Select distinct records with Min Date from two tables with Left Join

I'm trying to retrieve all distinct AccountIds, along with the earliest InsertDate for each. Occasionally the AccountId is not known (-1); although those transactions may be distinct, I want to bucket all of the '-1's into their own group.
This is what I have attempted so far along with the schemas.
CREATE TABLE #tmpResults (
Trans Varchar(12),
AccountId Varchar(50),
EarlyDate DateTime DEFAULT getdate(), CardType Varchar(16))
insert #tmpResults
select [Trans] = convert(varchar(12),'CashSale')
, [AccountId] = b.AccountId
, [EarlyDate] = min(b.InsertDate)
, case when c.name LIKE '%VISA%' then 'VISA'
when c.name LIKE '%MasterCard%' then 'MasterCard'
when c.name LIKE '%AMEX%' then 'AMEX'
else 'Other'
end as [CardType]
from TransBatch b
left join CardVer_3 c WITH (NOLOCK) ON c.Id = B.BatchId
left join TransBatch b2
on (b.accountid = b2.accountid and (b.InsertDate > b2.InsertDate or b.InsertDate = b2.InsertDate))
and b2.accountid is NULL
group by b.accountid, b.InsertDate,c.name
order by b.accountid DESC
select * from #tmpResults
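The tables look like this:
TransBatch:
+----------+---------+-------------------------+-----------+---------------+
| RecordId | BatchId | InsertDate              | AccountId | AccNameHolder |
+----------+---------+-------------------------+-----------+---------------+
| 6676     | 11      | 2012-11-01 05:19:04.000 | 12345     | Account1      |
| 6677     | 11      | 2012-11-01 05:19:04.000 | 12345     | Account1      |
| 6678     | 11      | 2012-11-01 05:19:04.000 | 55555     | Account2      |
| 6679     | 11      | 2012-11-01 05:19:04.000 | -1        | NULL          |
| 6680     | 12      | 2012-11-02 05:20:04.000 | 12345     | Account1      |
| 6681     | 12      | 2012-11-02 05:20:04.000 | 55555     | Account2      |
| 6682     | 13      | 2012-11-04 06:20:04.000 | 44444     | Account3      |
| 6683     | 14      | 2012-11-05 05:30:04.000 | 44444     | Account3      |
| 6684     | 14      | 2012-11-05 05:31:04.000 | -1        | NULL          |
+----------+---------+-------------------------+-----------+---------------+
CardVer_3:
+---------+------------+
| BatchId | Name       |
+---------+------------+
| 11      | MasterCard |
| 12      | Visa       |
| 13      | AMEX       |
| 14      | GoCard     |
+---------+------------+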
This will be an intermediate table; the final output is planned to look like the attached.

Gordon, I made some very minor changes to your suggestion and believe I now have the correct output: http://www.sqlfiddle.com/#!3/cfbc3/7/0. Thank you very much. I'm not at all familiar with window functions, so I'm going to brush up on them.
The code is here:
select 'CashSale' as [Trans],
AccountId,
min(InsertDateTime),
(case when name LIKE '%VISA%' then 'VISA'
when name LIKE '%MasterCard%' then 'MasterCard'
when name LIKE '%AMEX%' then 'AMEX'
else 'Other'
end) as [CardType]
from (select AccountId, InsertDateTime, c.name,
row_number() over (partition by AccountId order by insertDateTime asc) as seqnum
from TransBatch b left join
CardVer_3 c WITH (NOLOCK) ON c.batchId = B.BatchId
) t
where seqnum = 1
group by t.accountid, t.name
The next step is to dump this into a temp table and try to get the output looking like the attached Excel screenshot.

It sounds like you are trying to get the full record for the earliest insert date. For this, you want to use window functions:
select 'CashSale' as Trans,
AccountId,
InsertDate,
(case when name LIKE '%VISA%' then 'VISA'
when name LIKE '%MasterCard%' then 'MasterCard'
when name LIKE '%AMEX%' then 'AMEX'
else 'Other'
end) as [CardType]
from (select b.AccountId, b.InsertDate, c.name,
row_number() over (partition by b.AccountId order by b.InsertDate asc) as seqnum
from TransBatch b left join
CardVer_3 c WITH (NOLOCK)
ON c.BatchId = b.BatchId
) t
where seqnum = 1
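A minimal sketch of the ROW_NUMBER approach on the sample TransBatch and CardVer_3 rows, run in SQLite via Python's sqlite3 module (assumption: SQLite 3.25+). It orders by InsertDate ascending so that seqnum = 1 is the earliest row per AccountId, joins on CardVer_3.BatchId, and adds RecordId as a tie-breaker (an addition for deterministic results). Timestamps drop the .000 milliseconds, and AccountId is stored as text, as in the #tmpResults table. SQLite's LIKE is case-insensitive for ASCII, so 'Visa' would match '%VISA%'.

```python
import sqlite3

TRANS_BATCH = [
    (6676, 11, "2012-11-01 05:19:04", "12345", "Account1"),
    (6677, 11, "2012-11-01 05:19:04", "12345", "Account1"),
    (6678, 11, "2012-11-01 05:19:04", "55555", "Account2"),
    (6679, 11, "2012-11-01 05:19:04", "-1", None),
    (6680, 12, "2012-11-02 05:20:04", "12345", "Account1"),
    (6681, 12, "2012-11-02 05:20:04", "55555", "Account2"),
    (6682, 13, "2012-11-04 06:20:04", "44444", "Account3"),
    (6683, 14, "2012-11-05 05:30:04", "44444", "Account3"),
    (6684, 14, "2012-11-05 05:31:04", "-1", None),
]
CARD_VER_3 = [(11, "MasterCard"), (12, "Visa"), (13, "AMEX"), (14, "GoCard")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TransBatch (RecordId INTEGER, BatchId INTEGER, "
            "InsertDate TEXT, AccountId TEXT, AccNameHolder TEXT)")
con.execute("CREATE TABLE CardVer_3 (BatchId INTEGER, Name TEXT)")
con.executemany("INSERT INTO TransBatch VALUES (?, ?, ?, ?, ?)", TRANS_BATCH)
con.executemany("INSERT INTO CardVer_3 VALUES (?, ?)", CARD_VER_3)

earliest = con.execute("""
    SELECT 'CashSale' AS Trans, AccountId, InsertDate,
           CASE WHEN name LIKE '%VISA%' THEN 'VISA'
                WHEN name LIKE '%MasterCard%' THEN 'MasterCard'
                WHEN name LIKE '%AMEX%' THEN 'AMEX'
                ELSE 'Other' END AS CardType
    FROM (SELECT b.AccountId, b.InsertDate, c.Name AS name,
                 -- all -1 rows share one partition, so they form their own bucket
                 ROW_NUMBER() OVER (PARTITION BY b.AccountId
                                    ORDER BY b.InsertDate ASC, b.RecordId) AS seqnum
          FROM TransBatch AS b
          LEFT JOIN CardVer_3 AS c ON c.BatchId = b.BatchId) AS t
    WHERE seqnum = 1
    ORDER BY AccountId
""").fetchall()
print(earliest)
```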
I'm taking a guess that "CashSale" means that the credit card did not match. The TransId is then either the recordId or "CashSale".
