New Column repeating value based on max value from another column - hive

I have a table 'mytable' which has results similar to the below:
currenttime          racetype  raceid
2018-01-01 03:15:00  gold      22
2018-01-01 04:15:00  silver    22
2019-01-01 04:15:00  bronze    22
2017-01-02 11:44:00  platinum  22
I am trying to create another column based on the max currenttime. It should grab the racetype value from the row with the max currenttime and repeat that value for all rows in the new column, similar to the below:
currenttime          racetype  raceid  besttype
2018-01-01 03:15:00  gold      22      bronze
2018-01-01 04:15:00  silver    22      bronze
2019-01-01 04:15:00  bronze    22      bronze
2017-01-02 11:44:00  platinum  22      bronze
And if there are other raceids it should do the same for those, e.g.:
currenttime          racetype  raceid  besttype
2018-01-01 03:15:00  gold      22      bronze
2018-01-01 04:15:00  silver    22      bronze
2019-01-01 04:15:00  bronze    22      bronze
2017-01-02 11:44:00  platinum  22      bronze
2011-01-01 03:15:00  gold      09      silver
2022-01-01 04:15:00  silver    09      silver
2002-01-01 04:15:00  bronze    09      silver
Currently I have this query:
select mt.raceid, tt.racetype,
       MAX(tt.currenttime) OVER (PARTITION BY mt.raceid)
from mytable mt
join tabletwo tt on mt.id = tt.id
where mt.raceid = 22
This query is not producing the expected results; it is outputting:
raceid  racetype  col0
22      gold      2019-01-01 04:15:00
22      silver    2019-01-01 04:15:00
22      platinum  2019-01-01 04:15:00
22      bronze    2019-01-01 04:15:00
How can I achieve the expected results shown in the second and third examples above?

Use first_value analytic function:
select currenttime, racetype, raceid,
first_value(racetype) over(partition by raceid order by currenttime desc) as besttype
from mytable
Or last_value, which needs an explicit frame clause because the default frame only extends to the current row:
select currenttime, racetype, raceid,
       last_value(racetype) over(partition by raceid order by currenttime
                                 rows between unbounded preceding and unbounded following) as besttype
from mytable
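A minimal way to sanity-check the first_value approach is to run it against the question's sample data. The sketch below uses Python's bundled sqlite3 (SQLite 3.25+ supports these window functions); the table and column names follow the question, but the setup itself is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data copied from the question (raceid 09 stored as integer 9)
conn.executescript("""
CREATE TABLE mytable (currenttime TEXT, racetype TEXT, raceid INTEGER);
INSERT INTO mytable VALUES
  ('2018-01-01 03:15:00', 'gold',     22),
  ('2018-01-01 04:15:00', 'silver',   22),
  ('2019-01-01 04:15:00', 'bronze',   22),
  ('2017-01-02 11:44:00', 'platinum', 22),
  ('2011-01-01 03:15:00', 'gold',      9),
  ('2022-01-01 04:15:00', 'silver',    9);
""")
# first_value over the partition, newest currenttime first, gives the
# racetype of the latest row repeated across the whole partition
rows = conn.execute("""
SELECT currenttime, racetype, raceid,
       first_value(racetype) OVER (PARTITION BY raceid
                                   ORDER BY currenttime DESC) AS besttype
FROM mytable
ORDER BY raceid, currenttime
""").fetchall()
```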


SQL - Aggregate timeseries table (HourOfDay, Val) to average value of HourOfDay by weekday (e.g. avg of Mondays 10:00-11:00, 11:00-12:00, ..., Tue ...)

So far I have made an SQL query that provides me with a table containing the number of customers handled for each hour of the day, given an arbitrary start and end datetime value (from the Grafana interface). The result might span many weeks. My goal is to implement an hourly heatmap by weekday with averaged values.
How do I aggregate those customers per hour to show the average value of that hour per weekday?
So let's say I get 24 values per day over 19 days. How do I aggregate so I get 24 values for each Mon, Tue, Wed, Thu, Fri, Sat, Sun, with each hour representing the average value for those days?
Also, only use data from full weeks, so strip leading and trailing days that are not part of a fully represented week (so the same number of individual weekdays contributes to each average value).
Here is a segment of what my SQL query returns so far (hour of each day, number of customers):
...
2021-12-13 11:00:00 | 0
2021-12-13 12:00:00 | 3
2021-12-13 13:00:00 | 4
2021-12-13 14:00:00 | 4
2021-12-13 15:00:00 | 7
2021-12-13 16:00:00 | 17
2021-12-13 17:00:00 | 12
2021-12-13 18:00:00 | 18
2021-12-13 19:00:00 | 15
2021-12-13 20:00:00 | 8
2021-12-13 21:00:00 | 10
2021-12-13 22:00:00 | 1
2021-12-13 23:00:00 | 0
2021-12-14 00:00:00 | 0
2021-12-14 01:00:00 | 0
2021-12-14 02:00:00 | 0
2021-12-14 03:00:00 | 0
2021-12-14 04:00:00 | 0
2021-12-14 05:00:00 | 0
2021-12-14 06:00:00 | 0
2021-12-14 07:00:00 | 0
2021-12-14 08:00:00 | 0
2021-12-14 09:00:00 | 0
2021-12-14 10:00:00 | 12
2021-12-14 11:00:00 | 12
2021-12-14 12:00:00 | 19
2021-12-14 13:00:00 | 11
2021-12-14 14:00:00 | 11
2021-12-14 15:00:00 | 12
2021-12-14 16:00:00 | 9
2021-12-14 17:00:00 | 2
...
So (schematically, example data) startDate 2021-12-10 11:00 to endDate 2021-12-31 17:00
-------------------------------
...
Mon 2021-12-13 12:00 | 3
Mon 2021-12-13 13:00 | 4
Mon 2021-12-13 14:00 | 4
...
Mon 2021-12-20 12:00 | 1
Mon 2021-12-20 13:00 | 6
Mon 2021-12-20 14:00 | 2
...
Mon 2021-12-27 12:00 | 2
Mon 2021-12-27 13:00 | 2
Mon 2021-12-27 14:00 | 3
...
-------------------------------
into this:
strip leading Fri 10th, Sat 11th, Sun 12th
strip trailing Tue 28th, Wed 29th, Thu 30th, Fri 31st
average hours per weekday
-------------------------------
...
Mon 12:00 | 2
Mon 13:00 | 4
Mon 14:00 | 3
...
Tue 12:00 | x
Tue 13:00 | y
Tue 14:00 | z
...
-------------------------------
My approach so far:
WITH CustomersPerHour AS (
    SELECT dateadd(hour, datediff(hour, 0, Systemdatum), 0) AS DayHour, COUNT(*) AS C
    FROM CustomerList
    WHERE CustomerID > 0
      AND Datum BETWEEN '2021-12-10T11:00:00Z' AND '2021-12-31T17:00:00Z'
      AND EntryID IN (62, 65)
      AND CustomerID IN (SELECT * FROM udf_getActiveUsers())
    GROUP BY dateadd(hour, datediff(hour, 0, Systemdatum), 0)
)
-- add null values on missing data / insert missing hours
SELECT DATEDIFF(second, '1970-01-01', dt.Date) AS time, C AS Customers
FROM dbo.udf_generateHoursTable('2021-12-03T18:14:56Z', '2022-03-13T18:14:56Z') AS dt
LEFT JOIN CustomersPerHour cPh ON dt.Date = cPh.DayHour
ORDER BY time ASC
Hi, the simplest solution is to do just what you wrote in your example: create a custom base for the aggregation.
So the first step is to prepare your data in an aggregated table with date & hour precision and the customer count.
Then create the base.
This is an example of the basic idea:
-- EXAMPLE
SELECT
    DATENAME(WEEKDAY, GETDATE()) + ' ' + CAST(DATEPART(HOUR, GETDATE()) AS varchar(8)) + ':00'
-- OUTPUT: Sunday 21:00
You can concatenate the data like this and then use it in the GROUP BY clause. Adjust this query for your use case:
SELECT
    DATENAME(WEEKDAY, <DATETIME_COL>) + ' ' + CAST(DATEPART(HOUR, <DATETIME_COL>) AS varchar(8)) + ':00' AS base,
    SUM(...) AS sum_of_whatever,
    AVG(...) AS avg_of_whatever
FROM <YOUR_AGG_TABLE>
GROUP BY DATENAME(WEEKDAY, <DATETIME_COL>) + ' ' + CAST(DATEPART(HOUR, <DATETIME_COL>) AS varchar(8)) + ':00'
This creates the base exactly as you wanted.
You can use the same logic to create other desired aggregation bases.
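The answer above targets SQL Server. As a hedged sketch of the same group-by-weekday-and-hour idea in a portable form, here is the equivalent in SQLite (via Python's sqlite3), where DATENAME/DATEPART become strftime. The sample rows are invented: two Mondays sharing the same hour, plus one Tuesday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented per-hour counts: two Mondays at 12:00 and one Tuesday at 12:00
conn.executescript("""
CREATE TABLE CustomersPerHour (DayHour TEXT, C INTEGER);
INSERT INTO CustomersPerHour VALUES
  ('2021-12-13 12:00:00', 3),
  ('2021-12-20 12:00:00', 1),
  ('2021-12-14 12:00:00', 19);
""")
# Group by (weekday, hour) and average the counts across weeks;
# strftime('%w') yields 0=Sunday .. 6=Saturday
rows = conn.execute("""
SELECT strftime('%w', DayHour) AS weekday,
       strftime('%H', DayHour) AS hour,
       AVG(C) AS avg_customers
FROM CustomersPerHour
GROUP BY weekday, hour
ORDER BY weekday, hour
""").fetchall()
```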

unstack start and end date to month

cust_id  start       end         subs_price_p_month
1        2019-01-01  2019-12-10  50.00
1        2020-02-03  2020-08-05  39.99
2        2019-12-11  2020-11-08  29.99
I would like to "unstack" the table above, so that each row contains the subs price for 1 month:
cust_id  month       subs_price_p_month
1        2019-01-01  50.00
1        2019-02-01  50.00
1        2019-03-01  50.00
....
1        2019-12-01  50.00
1        2020-02-01  39.99
1        2020-03-01  39.99
1        2020-04-01  39.99
....
1        2020-08-01  39.99
2        2019-12-01  29.99
2        2020-01-01  29.99
2        2020-02-01  29.99
...
2        2020-11-01  29.99
Text explanation:
Customer ID 1 has 2 subscriptions with different prices. The first one runs from 1 January 2019 until 10 December 2019, the second one from 3 February 2020 to 5 August 2020.
Customer ID 2 has only 1 subscription, from December 2019 to November 2020.
I want each row to represent 1 customer ID and 1 month, for easier data manipulation.
generate_series() generates the sequence of dates that you need. However, it is tricky to get the date arithmetic just right for your results.
You seem to want:
select t.cust_id, yyyymm, t.subs_price_p_month
from t cross join lateral
     generate_series(date_trunc('month', startd),
                     date_trunc('month', endd),
                     interval '1 month'
                    ) gs(yyyymm);
However, if there are multiple rows within the same month, you would get duplicates. This question does not clarify what to do in that case. If you need to handle that case, I would suggest asking a new question.
Here is a db<>fiddle.
Use generate_series with an interval of 1 month and the range from start to end (note that end is a reserved word in PostgreSQL, so it must be quoted or renamed, as in the next example), e.g.:
SELECT
    cust_id,
    generate_series("start", "end", interval '1 month'),
    subs_price_p_month
FROM t;
cust_id | generate_series | subs_price_p_month
---------+------------------------+--------------------
1 | 2019-01-01 00:00:00+01 | 50.00
1 | 2019-02-01 00:00:00+01 | 50.00
1 | 2019-03-01 00:00:00+01 | 50.00
1 | 2019-04-01 00:00:00+02 | 50.00
1 | 2019-05-01 00:00:00+02 | 50.00
1 | 2019-06-01 00:00:00+02 | 50.00
1 | 2019-07-01 00:00:00+02 | 50.00
...
Perhaps even formatting the dates to 'Month YYYY' would better display your result set:
SELECT
cust_id,
to_char(generate_series(start_,end_,interval '1 month'),'Month YYYY')
subs_price_p_month
FROM t;
cust_id | to_char | subs_price_p_month
---------+----------------+--------------------
1 | January 2019 | 50.00
1 | February 2019 | 50.00
1 | March 2019 | 50.00
1 | April 2019 | 50.00
1 | May 2019 | 50.00
1 | June 2019 | 50.00
1 | July 2019 | 50.00
...
Demo: db<>fiddle
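For databases without generate_series, the same month expansion can be sketched with a recursive CTE. The example below is an assumption rather than part of either answer: it uses SQLite via Python's sqlite3 and the renamed start_/end_ columns from the second answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two of the question's subscriptions; columns renamed to dodge reserved words
conn.executescript("""
CREATE TABLE t (cust_id INTEGER, start_ TEXT, end_ TEXT, subs_price_p_month REAL);
INSERT INTO t VALUES
  (1, '2019-01-01', '2019-12-10', 50.00),
  (2, '2019-12-11', '2020-11-08', 29.99);
""")
# Seed each subscription with its first month, then keep adding one month
# while the next month still starts on or before the end date
rows = conn.execute("""
WITH RECURSIVE months(cust_id, month, end_, price) AS (
  SELECT cust_id, date(start_, 'start of month'), end_, subs_price_p_month
  FROM t
  UNION ALL
  SELECT cust_id, date(month, '+1 month'), end_, price
  FROM months
  WHERE date(month, '+1 month') <= end_
)
SELECT cust_id, month, price FROM months ORDER BY cust_id, month
""").fetchall()
```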

How to find the Effective end Date for the below table using select statement only

This is the actual table:
EMID  ENAME  DEPT_NO  EFDT
101   ANUJ   10       1/1/2018
101   ANUJ   11       1/1/2020
101   ANUJ   12       5/1/2020
102   KUNAL  12       1/1/2019
102   KUNAL  14       1/1/2020
102   KUNAL  15       5/1/2020
103   AJAY   11       1/1/2018
103   AJAY   12       1/1/2020
104   RAJAT  10       1/1/2018
104   RAJAT  12       1/1/2020
This is desired output:
EMID  ENAME  DEPTNO  EFDT      EF_ENDT
101   ANUJ   10      1/1/2018  12/31/2019
101   ANUJ   11      1/1/2020  4/30/2020
101   ANUJ   12      5/1/2020  NULL
102   KUNAL  12      1/1/2019  12/31/2019
102   KUNAL  14      1/1/2020  4/30/2020
102   KUNAL  15      5/1/2020  NULL
103   AJAY   11      1/1/2018  12/31/2019
103   AJAY   12      1/1/2020  NULL
104   RAJAT  10      1/1/2018  12/31/2019
104   RAJAT  12      1/1/2020  NULL
The EF_ENDT needs to be populated using the select statement only.
How can we do this?
Ideally the code should be generic across all databases.
Basically, you want lead() and then to subtract one day. The standard SQL for this is:
select t.*,
       lead(efdt) over (partition by emid order by efdt) - interval '1 day' as ef_endt
from t;
Date/time functions vary significantly among databases, but all provide some method for subtracting one day. You'll probably have to adapt this to your particular (unstated) database.
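As one concrete adaptation (an assumption, since the question names no database), here is the lead()-minus-one-day pattern in SQLite via Python's sqlite3, where the interval arithmetic becomes date(..., '-1 day'), run against one employee from the sample table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One employee's department history, dates in ISO format for easy sorting
conn.executescript("""
CREATE TABLE emp (emid INTEGER, ename TEXT, dept_no INTEGER, efdt TEXT);
INSERT INTO emp VALUES
  (101, 'ANUJ', 10, '2018-01-01'),
  (101, 'ANUJ', 11, '2020-01-01'),
  (101, 'ANUJ', 12, '2020-05-01');
""")
# lead() fetches the next effective date; date(..., '-1 day') steps back one
# day; the last row has no successor, so its end date stays NULL
rows = conn.execute("""
SELECT emid, dept_no, efdt,
       date(lead(efdt) OVER (PARTITION BY emid ORDER BY efdt), '-1 day') AS ef_endt
FROM emp
ORDER BY emid, efdt
""").fetchall()
```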

LAG / OVER / PARTITION / ORDER BY using conditions - SQL Server 2017

I have a table that looks like this:
Date        AccountID  Amount
2018-01-01  123        12
2018-01-06  123        150
2018-02-14  123        11
2018-05-06  123        16
2018-05-16  123        200
2018-06-01  123        18
2018-06-15  123        17
2018-06-18  123        110
2018-06-30  123        23
2018-07-01  123        45
2018-07-12  123        116
2018-07-18  123        60
This table has multiple dates and IDs, along with multiple Amounts. For each individual row, I want to grab the last Date on which Amount was over a specific value for that specific AccountID. I have been trying to use LAG( Date, 1 ) in combination with several variations of CASE and OVER ( PARTITION BY AccountID ORDER BY Date ) statements, but I've had no luck. Ultimately, this is what I would like my SELECT statement to return:
Date        AccountID  Amount  LastOverHundred
2018-01-01  123        12      NULL
2018-01-06  123        150     2018-01-06
2018-02-14  123        11      2018-01-06
2018-05-06  123        16      2018-01-06
2018-05-16  123        200     2018-05-16
2018-06-01  123        18      2018-05-16
2018-06-15  123        17      2018-05-16
2018-06-18  123        110     2018-06-18
2018-06-30  123        23      2018-06-18
2018-07-01  123        45      2018-06-18
2018-07-12  123        116     2018-07-12
2018-07-18  123        60      2018-07-12
Any help with this would be greatly appreciated.
Use a cumulative conditional max():
select t.*,
max(case when amount > 100 then date end) over (partition by accountid order by date) as lastoverhundred
from t;
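To see why the cumulative conditional max() works, here is a runnable sketch against a subset of the sample rows, using Python's sqlite3 (the Date column is renamed d here purely for convenience): for each row, max() only sees rows up to and including the current one, and the CASE turns sub-threshold amounts into NULLs that max() ignores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Four rows from the question's sample; column Date renamed to d
conn.executescript("""
CREATE TABLE t (d TEXT, accountid INTEGER, amount INTEGER);
INSERT INTO t VALUES
  ('2018-01-01', 123, 12),
  ('2018-01-06', 123, 150),
  ('2018-02-14', 123, 11),
  ('2018-05-16', 123, 200);
""")
# Cumulative max of the date, but only where amount exceeded 100;
# rows before the first such date get NULL
rows = conn.execute("""
SELECT d, accountid, amount,
       max(CASE WHEN amount > 100 THEN d END)
         OVER (PARTITION BY accountid ORDER BY d) AS lastoverhundred
FROM t
ORDER BY d
""").fetchall()
```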

grouping a table with different dates

I have a table like below,
SalesId  ItemId  DateSale    USDVal
ABC      01A     2018-04-01  52
ABC      01B     2018-04-01  300
ABC      01C     2018-04-01  12
ABC      01D     2018-04-01  62
ABC      01A     2018-03-23  66
MNB      01A     2018-01-01  584
MNB      01A     2018-02-20  320
MNB      01F     2018-02-20  5
I want to write a query that selects the last date for each SalesId and shows those records, so the result would look something like below.
Result
SalesId  ItemId  DateSale    USDVal
ABC      01A     2018-04-01  52
ABC      01B     2018-04-01  300
ABC      01C     2018-04-01  12
ABC      01D     2018-04-01  62
MNB      01A     2018-02-20  320
MNB      01F     2018-02-20  5
In SQL Server, the fastest way is often a correlated subquery:
select t.*
from t
where t.datesale = (select max(t2.datesale) from t t2 where t2.salesid = t.salesid);
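A quick sketch of the correlated-subquery approach against a subset of the sample data (SQLite via Python's sqlite3 here rather than SQL Server, but the query is identical): each row survives only if its datesale equals the maximum datesale for its salesid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the question's sample rows
conn.executescript("""
CREATE TABLE t (salesid TEXT, itemid TEXT, datesale TEXT, usdval INTEGER);
INSERT INTO t VALUES
  ('ABC', '01A', '2018-04-01', 52),
  ('ABC', '01A', '2018-03-23', 66),
  ('MNB', '01A', '2018-01-01', 584),
  ('MNB', '01A', '2018-02-20', 320),
  ('MNB', '01F', '2018-02-20', 5);
""")
# Keep a row only when its date matches the latest date for that SalesId
rows = conn.execute("""
SELECT t.* FROM t
WHERE t.datesale = (SELECT max(t2.datesale) FROM t t2
                    WHERE t2.salesid = t.salesid)
ORDER BY salesid, itemid
""").fetchall()
```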