Below is my data, where I am looking to generate the sum of revenue on a per-month basis using the columns event_time and price.
+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+
| oct_data.event_time | oct_data.event_type | oct_data.product_id | oct_data.category_id | oct_data.category_code | oct_data.brand | oct_data.price | oct_data.user_id | oct_data.user_session |
+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+
| 2019-10-01 00:00:00 UTC | cart | 5773203 | 1487580005134238553 | | runail | 2.62 | 463240011 | 26dd6e6e-4dac-4778-8d2c-92e149dab885 |
| 2019-10-01 00:00:03 UTC | cart | 5773353 | 1487580005134238553 | | runail | 2.62 | 463240011 | 26dd6e6e-4dac-4778-8d2c-92e149dab885 |
| 2019-10-01 00:00:07 UTC | cart | 5881589 | 2151191071051219817 | | lovely | 13.48 | 429681830 | 49e8d843-adf3-428b-a2c3-fe8bc6a307c9 |
| 2019-10-01 00:00:07 UTC | cart | 5723490 | 1487580005134238553 | | runail | 2.62 | 463240011 | 26dd6e6e-4dac-4778-8d2c-92e149dab885 |
| 2019-10-01 00:00:15 UTC | cart | 5881449 | 1487580013522845895 | | lovely | 0.56 | 429681830 | 49e8d843-adf3-428b-a2c3-fe8bc6a307c9 |
| 2019-10-01 00:00:16 UTC | cart | 5857269 | 1487580005134238553 | | runail | 2.62 | 430174032 | 73dea1e7-664e-43f4-8b30-d32b9d5af04f |
| 2019-10-01 00:00:19 UTC | cart | 5739055 | 1487580008246412266 | | kapous | 4.75 | 377667011 | 81326ac6-daa4-4f0a-b488-fd0956a78733 |
| 2019-10-01 00:00:24 UTC | cart | 5825598 | 1487580009445982239 | | | 0.56 | 467916806 | 2f5b5546-b8cb-9ee7-7ecd-84276f8ef486 |
| 2019-10-01 00:00:25 UTC | cart | 5698989 | 1487580006317032337 | | | 1.27 | 385985999 | d30965e8-1101-44ab-b45d-cc1bb9fae694 |
| 2019-10-01 00:00:26 UTC | view | 5875317 | 2029082628195353599 | | | 1.59 | 474232307 | 445f2b74-5e4c-427e-b7fa-6e0a28b156fe |
+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+
I have used the query below, but the sum does not seem to be computed. Please suggest the best approach to generate the desired output.
select date_format(event_time,'MM') as Month,
sum(price) as Monthly_Revenue
from oct_data_new
group by date_format(event_time,'MM')
order by Month;
Note: the event_time field is in TIMESTAMP format.
First convert the timestamp to date and then apply date_format():
select date_format(cast(event_time as date),'MM') as Month,
sum(price) as Monthly_Revenue
from oct_data_new
group by date_format(cast(event_time as date),'MM')
order by Month;
This will work if all the dates are in the same year.
If not, then you should also group by year.
Your code should work -- unless you are using an old version of Hive. date_format() has accepted a timestamp argument since 1.1.2 -- released in early 2016. That said, I would strongly suggest that you include the year:
select date_format(event_time, 'yyyy-MM') as Month,
sum(price) as Monthly_Revenue
from oct_data_new
group by date_format(event_time, 'yyyy-MM')
order by Month;
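For the ten sample rows above (all from October 2019), this query returns a single row; the figure below is just the hand-computed sum of the sample prices:

+----------+------------------+
| Month    | Monthly_Revenue  |
+----------+------------------+
| 2019-10  | 32.69            |
+----------+------------------+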
I have different tables with data. In some tables the data is loaded quarterly, in others monthly, daily, etc.
Every table has a ReportedDate column. What I would like to do is filter only the last N periods; if the period is days, for example, the last 3 days. The problem is that I cannot use GETDATE() - 3, because the data is loaded only on workdays, not on holidays and weekends.
I have tried ROW_NUMBER() with PARTITION BY ReportedDate, but it runs really slowly.
I would appreciate suggestions.
A sample of a table:
+-----------+-----------------------------+
| Indicator | ReportedDate |
+-----------+-----------------------------+
| 0.2917 | 2020-08-12 00:00:00.0000000 |
| 0.261919 | 2020-08-13 00:00:00.0000000 |
| 0.259211 | 2020-08-14 00:00:00.0000000 |
| 0.201075 | 2020-08-17 00:00:00.0000000 |
| 0.250153 | 2020-08-18 00:00:00.0000000 |
| 0.333093 | 2020-08-19 00:00:00.0000000 |
| 0.976495 | 2020-08-20 00:00:00.0000000 |
| 0.759739 | 2020-08-21 00:00:00.0000000 |
| 1.17279 | 2020-08-24 00:00:00.0000000 |
| 0.285365 | 2020-08-25 00:00:00.0000000 |
+-----------+-----------------------------+
SELECT *
FROM (SELECT Indicator, ReportedDate, ROW_NUMBER() OVER(PARTITION BY ReportedDate ORDER BY ReportedDate desc) as periods
FROM indicatorTable) a
where periods <= 2
Another example - table with stock prices:
+--------+--------+-------------------------+
| Ticker | Price | Date |
+--------+--------+-------------------------+
| AAPL | 116.03 | 2020-11-25 00:00:00.000 |
| AAPL | 115.17 | 2020-11-24 00:00:00.000 |
| AAPL | 113.85 | 2020-11-23 00:00:00.000 |
| AAPL | 117.34 | 2020-11-20 00:00:00.000 |
| AAPL | 118.64 | 2020-11-19 00:00:00.000 |
| AAPL | 118.03 | 2020-11-18 00:00:00.000 |
| AAPL | 119.39 | 2020-11-17 00:00:00.000 |
| AAPL | 120.3 | 2020-11-16 00:00:00.000 |
| AAPL | 119.26 | 2020-11-13 00:00:00.000 |
| AAPL | 119.21 | 2020-11-12 00:00:00.000 |
| IBM | 124.2 | 2020-11-25 00:00:00.000 |
| IBM | 124.42 | 2020-11-24 00:00:00.000 |
| IBM | 120.09 | 2020-11-23 00:00:00.000 |
| IBM | 116.94 | 2020-11-20 00:00:00.000 |
| IBM | 117.18 | 2020-11-19 00:00:00.000 |
| IBM | 116.77 | 2020-11-18 00:00:00.000 |
| IBM | 117.7 | 2020-11-17 00:00:00.000 |
| IBM | 118.36 | 2020-11-16 00:00:00.000 |
| IBM | 116.85 | 2020-11-13 00:00:00.000 |
| IBM | 114.5 | 2020-11-12 00:00:00.000 |
| MSFT | 213.87 | 2020-11-25 00:00:00.000 |
| MSFT | 213.86 | 2020-11-24 00:00:00.000 |
| MSFT | 210.11 | 2020-11-23 00:00:00.000 |
| MSFT | 210.39 | 2020-11-20 00:00:00.000 |
| MSFT | 212.42 | 2020-11-19 00:00:00.000 |
| MSFT | 211.08 | 2020-11-18 00:00:00.000 |
| MSFT | 214.46 | 2020-11-17 00:00:00.000 |
| MSFT | 217.23 | 2020-11-16 00:00:00.000 |
| MSFT | 216.51 | 2020-11-13 00:00:00.000 |
| MSFT | 215.44 | 2020-11-12 00:00:00.000 |
+--------+--------+-------------------------+
What I want is to take the results for the last two periods, in this case:
+--------+--------+-------------------------+
| Ticker | Price | Date |
+--------+--------+-------------------------+
| AAPL | 116.03 | 2020-11-25 00:00:00.000 |
| AAPL | 115.17 | 2020-11-24 00:00:00.000 |
| IBM | 124.2 | 2020-11-25 00:00:00.000 |
| IBM | 124.42 | 2020-11-24 00:00:00.000 |
| MSFT | 213.87 | 2020-11-25 00:00:00.000 |
| MSFT | 213.86 | 2020-11-24 00:00:00.000 |
+--------+--------+-------------------------+
Use dense_rank() instead of row_number():
SELECT *
FROM (SELECT Indicator, ReportedDate,
             DENSE_RANK() OVER (ORDER BY ReportedDate DESC) AS periods  -- no PARTITION BY needed: rank across all rows
      FROM indicatorTable) a
WHERE periods <= 2
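For the stock-price table the same pattern produces exactly the desired last-two-periods output, since DENSE_RANK() gives every row sharing a date the same rank (stockPrices is an assumed table name here):

SELECT Ticker, Price, [Date]
FROM (SELECT Ticker, Price, [Date],
             DENSE_RANK() OVER (ORDER BY [Date] DESC) AS periods  -- rows with the same date share a rank
      FROM stockPrices) a
WHERE periods <= 2
ORDER BY Ticker, [Date] DESC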
What if you simply take the top N rows ordered by ReportedDate? For example:
declare @t table (Indicator decimal(37,12), ReportedDate datetime)

insert into @t (Indicator, ReportedDate)
values (0.2917,   '2020-08-12')
     , (0.261919, '2020-08-13')
     , (0.259211, '2020-08-14')
     , (0.201075, '2020-08-17')
     , (0.250153, '2020-08-18')
     , (0.333093, '2020-08-19')
     , (0.976495, '2020-08-20')
     , (0.759739, '2020-08-21')
     , (1.17279,  '2020-08-24')
     , (0.285365, '2020-08-25')

select top 3 *
from @t
order by ReportedDate desc
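The TOP approach works for a single series; for the multi-ticker stock table, a sketch along the same lines (again assuming the table is called stockPrices) would take the distinct dates first and join back:

select s.*
from stockPrices s
join (select distinct top 2 [Date]    -- the two most recent trading dates
      from stockPrices
      order by [Date] desc) d
  on s.[Date] = d.[Date]
order by s.Ticker, s.[Date] desc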
I have a table, and I have already created the lead values for the next date within each product cluster. In addition, I created a delta value that displays the difference between Date and LeadDate.
+---------+------------+------------+------------+
| Product | Date | LeadDate | delta_days |
+---------+------------+------------+------------+
| A | 2018-01-15 | 2018-01-23 | 8 |
| A | 2018-01-23 | 2018-02-19 | 27 |
| A | 2018-02-19 | 2017-05-25 | -270 |
| B | 2017-05-25 | 2017-05-30 | 5 |
| B | 2017-05-30 | 2016-01-01 | -515 |
| C | 2016-01-01 | 2016-01-02 | 1 |
| C | 2016-01-02 | 2016-01-03 | 1 |
| C | 2016-01-03 | NULL | NULL |
+---------+------------+------------+------------+
What I want to do is update the last record of each product cluster, setting LeadDate and delta_days to NULL. How should I do this?
This is my goal:
+---------+------------+------------+------------+
| Product | Date | LeadDate | delta_days |
+---------+------------+------------+------------+
| A | 2018-01-15 | 2018-01-23 | 8 |
| A | 2018-01-23 | 2018-02-19 | 27 |
| A | 2018-02-19 | NULL | NULL |
| B | 2017-05-25 | 2017-05-30 | 5 |
| B | 2017-05-30 | NULL | NULL |
| C | 2016-01-01 | 2016-01-02 | 1 |
| C | 2016-01-02 | 2016-01-03 | 1 |
| C | 2016-01-03 | NULL | NULL |
+---------+------------+------------+------------+
LAG/LEAD accept a default value that is returned when the next/previous value can't be found:
LAG (scalar_expression [,offset] [,default])
OVER ( [ partition_by_clause ] order_by_clause )
Just specify that you want the [default] to be NULL in your code to produce your lead column.
In your code (a guess, since we don't have it):
SELECT [date],
       LEAD([date], 1, NULL) OVER (PARTITION BY Product ORDER BY [date]) AS your_new_col
FROM yourTable  -- placeholder; the question doesn't include the real table name
IMO, this is better than running an actual update, since it stays correct dynamically if you insert a new record that changes the existing order of your records.
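A fuller sketch of that select-side approach, computing both derived columns on the fly (t_products is a hypothetical table name, guessed from the sample):

SELECT Product,
       [Date],
       LEAD([Date]) OVER (PARTITION BY Product ORDER BY [Date]) AS LeadDate,  -- defaults to NULL on the last row of each cluster
       DATEDIFF(day, [Date],
                LEAD([Date]) OVER (PARTITION BY Product ORDER BY [Date])) AS delta_days  -- NULL when LeadDate is NULL
FROM t_products  -- hypothetical table name

Partitioning by Product also avoids the negative deltas (-270, -515) in the sample, which come from the lead crossing a product boundary.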
You can use an updatable CTE with the last_value() window function:
with updatable as (
      select *,
             last_value([Date]) over (
                 partition by Product
                 order by [Date]
                 rows between unbounded preceding and unbounded following  -- without this frame, last_value() returns the current row's own date
             ) as last_val
      from t_products  -- "table" is a reserved word; substitute your actual table name
     )
update updatable
    set LeadDate = null, delta_days = null
    where [Date] = last_val;
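Equivalently, and arguably more simply, max() over the partition gives the same last date without needing a frame clause (same hypothetical table name):

with updatable as (
    select *, max([Date]) over (partition by Product) as last_val
    from t_products  -- hypothetical table name
)
update updatable
set LeadDate = null, delta_days = null
where [Date] = last_val;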
I have a table with campaign data and need to get a list of the 'spend_perc' min and max values while grouping by client_id AND the timing of these campaigns.
sample data being:
camp_id | client_id | start_date | end_date | spend_perc
7257 | 35224 | 2017-01-16 | 2017-02-11 | 100.05
7284 | 35224 | 2017-01-16 | 2017-02-11 | 101.08
7308 | 35224 | 2017-01-16 | 2017-02-11 | 101.3
7309 | 35224 | 2017-01-16 | 2017-02-11 | 5.8
6643 | 35224 | 2017-02-08 | 2017-02-24 | 79.38
6645 | 35224 | 2017-02-08 | 2017-02-24 | 6.84
6648 | 35224 | 2017-02-08 | 2017-02-24 | 100.01
6649 | 78554 | 2017-02-09 | 2017-02-27 | 2.5
6650 | 78554 | 2017-02-09 | 2017-02-27 | 18.5
6651 | 78554 | 2017-02-09 | 2017-02-27 | 98.5
What I'm trying to get is the rows with the min and max 'spend_perc' values for each client_id AND within the same campaign timing (identical start_date/end_date):
camp_id | client_id | start_date | end_date | spend_perc
7308 | 35224 | 2017-01-16 | 2017-02-11 | 101.3
7309 | 35224 | 2017-01-16 | 2017-02-11 | 5.8
6645 | 35224 | 2017-02-08 | 2017-02-24 | 6.84
6648 | 35224 | 2017-02-08 | 2017-02-24 | 100.01
6649 | 78554 | 2017-02-09 | 2017-02-27 | 2.5
6651 | 78554 | 2017-02-09 | 2017-02-27 | 98.5
Something like this?
with a as
 (select camp_id, client_id, start_date, end_date, spend_perc,
         max(spend_perc) over (partition by client_id, start_date, end_date) as max_sp,
         min(spend_perc) over (partition by client_id, start_date, end_date) as min_sp
  from tn
 )
select camp_id, client_id, start_date, end_date, spend_perc
from a
where spend_perc in (max_sp, min_sp)
order by client_id, start_date, end_date, spend_perc
I think you will want to get rid of the camp_id field because that will be meaningless in this case. So you want something like:
SELECT client_id, start_date, end_date,
min(spend_perc) as min_spend_perc, max(spend_perc) as max_spend_perc
FROM mytable
GROUP BY client_id, start_date, end_date;
Group by the criteria you want to, and select min and max as columns per unique combination of these values (i.e. per row).
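If you do need the original rows, camp_id included, as in the desired output above, a sketch that ranks each group from both ends and keeps the extremes would also work (mytable as the same placeholder name):

SELECT camp_id, client_id, start_date, end_date, spend_perc
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY client_id, start_date, end_date
                                ORDER BY spend_perc ASC)  AS rn_min,  -- 1 = cheapest in the group
             ROW_NUMBER() OVER (PARTITION BY client_id, start_date, end_date
                                ORDER BY spend_perc DESC) AS rn_max   -- 1 = most expensive in the group
      FROM mytable t) x
WHERE rn_min = 1 OR rn_max = 1
ORDER BY client_id, start_date, end_date, spend_perc;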
Good people at StackOverflow,
please be so kind as to provide some help...
So what we have here is, let's say, a table of sorts containing phone calls from customers to some contact center (HelpDesk or whatever).
+------------+------------+------------------+---------+-----------+-----------+
| DateD      | DateM      | Date_Time        | EMPL_ID | PHONE_NO  | FIRST_REP |
+------------+------------+------------------+---------+-----------+-----------+
| 2016-12-12 | 2016-12-01 | 2016-12-12 15:55 | 16652   | 123456789 | First     |
| 2016-12-22 | 2016-12-01 | 2016-12-22 10:42 | 18178   | 123456789 | First     |
| 2016-12-22 | 2016-12-01 | 2016-12-22 10:54 | 112981  | 123456789 | Repeat    |
| 2016-12-22 | 2016-12-01 | 2016-12-22 10:57 | 18179   | 123456789 | Repeat    |
| 2016-12-23 | 2016-12-01 | 2016-12-23 12:27 | 16653   | 123456789 | Repeat    |
| 2017-01-05 | 2017-01-01 | 2017-01-05 15:20 | 17896   | 123456789 | First     |
| 2017-01-11 | 2017-01-01 | 2017-01-11 15:48 | 17909   | 123456789 | Repeat    |
| 2017-01-18 | 2017-01-01 | 2017-01-18 10:07 | 18175   | 123456789 | Repeat    |
| 2016-12-03 | 2016-12-01 | 2016-12-03 20:32 | 17745   | 111222333 | First     |
| 2016-12-21 | 2016-12-01 | 2016-12-21 18:47 | 10982   | 111222333 | First     |
| 2016-12-22 | 2016-12-01 | 2016-12-22 15:53 | 17820   | 111222333 | Repeat    |
| 2016-12-28 | 2016-12-01 | 2016-12-28 13:07 | 15976   | 111222333 | Repeat    |
| 2016-12-29 | 2016-12-01 | 2016-12-29 21:35 | 17896   | 111222333 | Repeat    |
| 2016-12-29 | 2016-12-01 | 2016-12-29 21:46 | 15498   | 111222333 | Repeat    |
| 2017-01-02 | 2017-01-01 | 2017-01-02 16:24 | 13117   | 111222333 | Repeat    |
+------------+------------+------------------+---------+-----------+-----------+
What I would like to do is figure out how many calls are repeated, meaning that the customer called again.
Now the tricky part: a repeated call is defined as a call that originated from the 'first call' and is repeated consecutively within a span of 7 days from each interaction after the first call. For instance:
+------------+------------+------------------+---------+-----------+-----------+
| DateM      | DateD      | Date_Time        | EMPL_ID | PHONE_NO  | FIRST_REP |
+------------+------------+------------------+---------+-----------+-----------+
| 2016-12-01 | 2016-12-12 | 2016-12-12 15:55 | 16652   | 123456789 | First     |
| 2016-12-01 | 2016-12-22 | 2016-12-22 10:42 | 18178   | 123456789 | First     |
| 2016-12-01 | 2016-12-22 | 2016-12-22 10:54 | 112981  | 123456789 | Repeat    |
| 2016-12-01 | 2016-12-22 | 2016-12-22 10:57 | 18179   | 123456789 | Repeat    |
| 2016-12-01 | 2016-12-23 | 2016-12-23 12:27 | 16653   | 123456789 | Repeat    |
| 2017-01-01 | 2017-01-05 | 2017-01-05 15:20 | 17896   | 123456789 | First     |
| 2017-01-01 | 2017-01-11 | 2017-01-11 15:48 | 17909   | 123456789 | Repeat    |
| 2017-01-01 | 2017-01-18 | 2017-01-18 10:07 | 18175   | 123456789 | Repeat    |
+------------+------------+------------------+---------+-----------+-----------+
we've got:
the 1st row, which is a First call with no repeated calls,
the 2nd row, which is a First call with 3 repeated calls, as every interaction falls within 7 days of the previous one, starting from the First call,
the 3rd First call (2017-01-05), which has 2 repeated calls, just like above.
In other words, the employee with ID 16652 (1st row) generated 0 repeated calls, while the employee with ID 18178 generated 3 repeated calls.
Finally it would be great to have some method that would allow to create an output like this:
| DateM | DateD | Date_Time |EMP_ID | PHONE_NO |FIRST_REP | DateM_REP | DateD_REP | Date_Time_REP | EMP_ID_REP | PHONE_NO_REP | FIRST_REP_REP
|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|2016-12-01 | 2016-12-12 | 2016-12-12 15:55 | 16652 | 123456789 | First | null | null | null | null | null | null
|2016-12-01 | 2016-12-22 | 2016-12-22 10:42 | 18178 | 123456789 | First | 2016-12-01 | 2016-12-22 | 2016-12-22 10:54 | 112981 | 123456789 | Repeat
|2016-12-01 | 2016-12-22 | 2016-12-22 10:42 | 18178 | 123456789 | First | 2016-12-01 | 2016-12-22 | 2016-12-22 10:57 | 18179 | 123456789 | Repeat
|2016-12-01 | 2016-12-22 | 2016-12-22 10:42 | 18178 | 123456789 | First | 2016-12-01 | 2016-12-23 | 2016-12-23 12:27 | 16653 | 123456789 | Repeat
|2017-01-01 | 2017-01-05 | 2017-01-05 15:20 | 17896 | 123456789 | First | 2017-01-01 | 2017-01-11 | 2017-01-11 15:48 | 17909 | 123456789 | Repeat
|2017-01-01 | 2017-01-05 | 2017-01-05 15:20 | 17896 | 123456789 | First | 2017-01-01 | 2017-01-18 | 2017-01-18 10:07 | 18175 | 123456789 | Repeat
Please help; I'm not that good at writing CTEs, and I imagine this is the kind of problem that has the potential of being solved with one.
Much obliged,
LuKI.
edit:
CREATE TABLE t_calls
(
[DateM] date,
[DateD] date,
[Date_Time] datetime2(7),
[EMPL_ID] int,
[INTERACTION_ID] numeric(25,0),
[PHONE_NO] numeric(9,0),
[FIRST_REP] varchar(10)
)
Insert Into t_calls
([DateM],[DateD],[Date_Time],[EMPL_ID],[INTERACTION_ID],[PHONE_NO],[FIRST_REP])
Values
('2016-12-01 00:00:00','2016-12-12 00:00:00','2016-12-12 15:55:36',16652,340680165,123456789,'First')
,('2016-12-01 00:00:00','2016-12-22 00:00:00','2016-12-22 10:42:45',18178,343736497,123456789,'First')
,('2016-12-01 00:00:00','2016-12-22 00:00:00','2016-12-22 10:54:46',112981,343750151,123456789,'Repeat')
,('2016-12-01 00:00:00','2016-12-22 00:00:00','2016-12-22 10:57:29',18179,343750151,123456789,'Repeat')
,('2016-12-01 00:00:00','2016-12-23 00:00:00','2016-12-23 12:27:56',16653,344071359,123456789,'Repeat')
,('2017-01-01 00:00:00','2017-01-05 00:00:00','2017-01-05 15:20:47',17896,347063121,123456789,'First')
,('2017-01-01 00:00:00','2017-01-11 00:00:00','2017-01-11 15:48:20',17909,348429965,123456789,'Repeat')
,('2017-01-01 00:00:00','2017-01-18 00:00:00','2017-01-18 10:07:45',18175,350243945,123456789,'Repeat')
,('2016-12-01 00:00:00','2016-12-03 00:00:00','2016-12-03 20:32:37',17745,338392721,111222333,'First')
,('2016-12-01 00:00:00','2016-12-21 00:00:00','2016-12-21 18:47:12',10982,343633967,111222333,'First')
,('2016-12-01 00:00:00','2016-12-22 00:00:00','2016-12-22 15:53:59',17820,343885389,111222333,'Repeat')
,('2016-12-01 00:00:00','2016-12-28 00:00:00','2016-12-28 13:07:19',15976,344944219,111222333,'Repeat')
,('2016-12-01 00:00:00','2016-12-29 00:00:00','2016-12-29 21:35:44',17896,345396945,111222333,'Repeat')
,('2016-12-01 00:00:00','2016-12-29 00:00:00','2016-12-29 21:46:43',15498,345398005,111222333,'Repeat')
,('2017-01-01 00:00:00','2017-01-02 00:00:00','2017-01-02 16:24:12',13117,346045147,111222333,'Repeat')
If I understand your question, you want to know, for every call, how many calls were repeated within 7 days.
SELECT
     a.date_time
    ,a.EMPL_ID
    ,a.phone_no
    ,count(b.phone_no) as repeat_calls                           --count any non-null field
    ,min(b.date_time) as first_repeat_call_at
FROM t_calls a
LEFT JOIN t_calls b
    ON a.phone_no = b.phone_no                                   --same phone
    AND datediff(d, a.date_time, b.date_time) between 0 AND 6   --a repeat comes in within today + 6 days
    AND a.date_time < b.date_time                                --only later calls; prevents self-join
GROUP BY
     a.date_time
    ,a.EMPL_ID
    ,a.phone_no
For any call with 0 repeats there'll be nothing to join, and thus nothing to count, so repeat_calls = 0 and first_repeat_call_at is NULL.
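Run against the sample INSERTs above, the first rows for phone 123456789 should come out like this (counts worked out by hand from the data, matching the expectations in the question):

+---------------------+---------+-----------+--------------+----------------------+
| date_time           | EMPL_ID | phone_no  | repeat_calls | first_repeat_call_at |
+---------------------+---------+-----------+--------------+----------------------+
| 2016-12-12 15:55:36 | 16652   | 123456789 | 0            | NULL                 |
| 2016-12-22 10:42:45 | 18178   | 123456789 | 3            | 2016-12-22 10:54:46  |
+---------------------+---------+-----------+--------------+----------------------+

Note that this counts repeats for every call, not only for the 'First' ones; adding WHERE a.FIRST_REP = 'First' would restrict it if that's what you need.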