max(sum(field)) query in Hive/SQL

I have a table with lots of transactions for users across a month.
I need to take the hour from each day where Sum(cost) is at its highest.
I've tried MAX(SUM(Cost)) but get an error.
How would I go about doing this please?
Here is some sample data:
+-------------+------+----------+------+
| user id | hour | date | Cost |
+-------------+------+----------+------+
| 343252 | 13 | 20170101 | 21.5 |
| 32532532 | 13 | 20170101 | 22.5 |
| 35325325 | 13 | 20170101 | 30.5 |
| 325325325 | 13 | 20170101 | 10 |
| 64643643 | 12 | 20170101 | 22 |
| 643643643 | 12 | 20170101 | 31 |
| 436325234 | 13 | 20170101 | 15 |
| 213213213 | 13 | 20170101 | 12 |
| 53265436436 | 17 | 20170101 | 19 |
+-------------+------+----------+------+
Expected Output:
I need just one row per day, showing the total cost of the 'most expensive' hour. In this case, 13:00 has a total cost of 111.5.

select hr
      ,dt
      ,total_cost
from  (select dt
             ,hr
             ,sum(cost) as total_cost
             ,row_number () over
              (
                  partition by dt
                  order by sum(cost) desc
              ) as rn
       from mytable
       group by dt,hr
      ) t
where rn = 1
+----+------------+------------+
| hr | dt | total_cost |
+----+------------+------------+
| 13 | 2017-01-01 | 111.5 |
+----+------------+------------+
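MAX(SUM(cost)) fails because aggregate calls cannot be nested at the same query level; the answer above instead sums per (dt, hr) and ranks those sums with a window function. Below is a minimal sketch of that idea run against the question's sample rows in an in-memory SQLite database (3.25+ for window functions), assumed here as a stand-in for Hive. For portability the grouping is done in an inner query and the ranking one level up, which should be equivalent to the single-level form above; the table name mytable and columns dt, hr follow the answer.

```python
import sqlite3

# Sample rows from the question: (user_id, hr, dt, cost)
ROWS = [
    (343252, 13, "20170101", 21.5),
    (32532532, 13, "20170101", 22.5),
    (35325325, 13, "20170101", 30.5),
    (325325325, 13, "20170101", 10),
    (64643643, 12, "20170101", 22),
    (643643643, 12, "20170101", 31),
    (436325234, 13, "20170101", 15),
    (213213213, 13, "20170101", 12),
    (53265436436, 17, "20170101", 19),
]

# Inner query: one row per (dt, hr) with the summed cost.
# Middle query: number the hours of each day by total cost, highest first.
# Outer filter: keep only the top-ranked hour per day.
QUERY = """
SELECT dt, hr, total_cost
FROM (
    SELECT dt, hr, total_cost,
           ROW_NUMBER() OVER (PARTITION BY dt ORDER BY total_cost DESC) AS rn
    FROM (
        SELECT dt, hr, SUM(cost) AS total_cost
        FROM mytable
        GROUP BY dt, hr
    ) hourly
) ranked
WHERE rn = 1
ORDER BY dt
"""


def busiest_hours(rows):
    """Return (dt, hr, total_cost) for the most expensive hour of each day."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE mytable (user_id INTEGER, hr INTEGER, dt TEXT, cost REAL)")
        con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(busiest_hours(ROWS))
```

If two hours tie for the highest total, ROW_NUMBER() keeps an arbitrary one of them; RANK() would keep both.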

Try this:
select AVG(hour) as 'Hour',date as 'Date',sum(cost) as 'TotalCost' from dbo.Table_3 group by date

Related

SQL (Redshift) get start and end values for consecutive data in a given column

I have a table that holds the subscription state of each user on every day. The data looks like this:
+------------+------------+--------------+
| account_id | date | current_plan |
+------------+------------+--------------+
| 1 | 2019-08-01 | free |
| 1 | 2019-08-02 | free |
| 1 | 2019-08-03 | yearly |
| 1 | 2019-08-04 | yearly |
| 1 | 2019-08-05 | yearly |
| ... | | |
| 1 | 2020-08-02 | yearly |
| 1 | 2020-08-03 | free |
| 2 | 2019-08-01 | monthly |
| 2 | 2019-08-02 | monthly |
| ... | | |
| 2 | 2019-08-31 | monthly |
| 2 | 2019-09-01 | free |
| ... | | |
| 2 | 2019-11-26 | free |
| 2 | 2019-11-27 | monthly |
| ... | | |
| 2 | 2019-12-27 | monthly |
| 2 | 2019-12-28 | free |
+------------+------------+--------------+
I would like to have a table that gives the start and end dates of each subscription. It would look something like this:
+------------+------------+------------+-------------------+
| account_id | start_date | end_date | subscription_type |
+------------+------------+------------+-------------------+
| 1 | 2019-08-03 | 2020-08-02 | yearly |
| 2 | 2019-08-01 | 2019-08-31 | monthly |
| 2 | 2019-11-27 | 2019-12-27 | monthly |
+------------+------------+------------+-------------------+
I started by using the LAG window function with a bunch of WHERE conditions to grab the "state changes", but this makes it difficult to see when customers float in and out of subscriptions, and I'm not sure this is the best method.
with plan_changes as (
select *, LAG(current_plan) OVER (PARTITION BY account_id ORDER BY date ASC) AS previous_plan
, LAG(date) OVER (PARTITION BY account_id ORDER BY date ASC) AS previous_plan_date
from data
)
SELECT *
FROM plan_changes
where (current_plan = 'free' and previous_plan in ('monthly', 'yearly'))
This is a gaps-and-islands problem. I think a difference of row numbers works:
select account_id,
       min(date) as start_date,
       max(date) as end_date,
       current_plan as subscription_type
from (select d.*,
             row_number() over (partition by account_id order by date) as seqnum,
             row_number() over (partition by account_id, current_plan order by date) as seqnum_2
      from data d
     ) d
where current_plan <> 'free'
group by account_id, current_plan, (seqnum - seqnum_2);
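To check the difference-of-row-numbers idea on data small enough to trace by hand, here is a minimal sketch that runs the query in SQLite (3.25+) from Python. The shortened daily rows are hypothetical, not the question's data. Within one account, seqnum - seqnum_2 stays constant along a run of consecutive rows on the same plan and grows whenever rows on another plan interrupt that run, so grouping by it separates each subscription period.

```python
import sqlite3

# Hypothetical, shortened daily data: (account_id, date, current_plan)
ROWS = [
    (1, "2019-08-01", "free"),
    (1, "2019-08-02", "free"),
    (1, "2019-08-03", "yearly"),
    (1, "2019-08-04", "yearly"),
    (1, "2019-08-05", "yearly"),
    (1, "2019-08-06", "free"),
    (2, "2019-08-01", "monthly"),
    (2, "2019-08-02", "monthly"),
    (2, "2019-08-03", "free"),
    (2, "2019-08-04", "free"),
    (2, "2019-08-05", "monthly"),
    (2, "2019-08-06", "free"),
]

QUERY = """
SELECT account_id,
       MIN(date) AS start_date,
       MAX(date) AS end_date,
       current_plan AS subscription_type
FROM (
    SELECT d.*,
           -- position of the row within its account
           ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY date) AS seqnum,
           -- position of the row within its account AND plan
           ROW_NUMBER() OVER (PARTITION BY account_id, current_plan ORDER BY date) AS seqnum_2
    FROM data d
) d
WHERE current_plan <> 'free'
-- constant within each run of consecutive rows that share a plan
GROUP BY account_id, current_plan, seqnum - seqnum_2
ORDER BY account_id, start_date
"""


def subscription_periods(rows):
    """Return (account_id, start_date, end_date, subscription_type) per paid run."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE data (account_id INTEGER, date TEXT, current_plan TEXT)")
        con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in subscription_periods(ROWS):
        print(row)
```

The window functions are computed in the inner query over all rows, including the free ones; filtering out 'free' before numbering would merge separate paid periods.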

SQL - Calculate number of occurrences of previous day?

For each day, I want to count the people who also had an occurrence on the previous day, but I'm not sure how to do this.
Sample Table:
| ID | Date |
+----+-----------+
| 1 | 1/10/2020 |
| 1 | 1/11/2020 |
| 2 | 2/20/2020 |
| 3 | 2/20/2020 |
| 3 | 2/21/2020 |
| 4 | 2/23/2020 |
| 4 | 2/24/2020 |
| 5 | 2/22/2020 |
| 5 | 2/23/2020 |
| 5 | 2/24/2020 |
+----+-----------+
Desired Output:
| Date | Count |
+-----------+-------+
| 1/11/2020 | 1 |
| 2/21/2020 | 1 |
| 2/23/2020 | 1 |
| 2/24/2020 | 2 |
+-----------+-------+
Edit: Added desired output. The count should be of unique IDs, not of date occurrences. For example, ID 5 could appear in the table 10 times on 2/23/2020 and 2/24/2020, but it should still count as "1".
Use lag():
select date, count(*)
from (select t.*, lag(date) over (partition by id order by date) as prev_date
from t
) t
where prev_date = dateadd(day, -1, date)
group by date;
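Here is a minimal sketch of the same LAG() approach, run in SQLite from Python against the question's sample rows. SQLite has no dateadd(), so date(dt, '-1 day') is used instead; the dates are rewritten as ISO yyyy-mm-dd strings so they sort and compare correctly, and the column is renamed dt to keep it apart from the date() function. These are assumptions of the sketch, not part of the answer. COUNT(DISTINCT id) makes the "count each ID once" requirement from the edit explicit.

```python
import sqlite3

# Sample rows from the question, with ISO dates: (id, dt)
ROWS = [
    (1, "2020-01-10"), (1, "2020-01-11"),
    (2, "2020-02-20"),
    (3, "2020-02-20"), (3, "2020-02-21"),
    (4, "2020-02-23"), (4, "2020-02-24"),
    (5, "2020-02-22"), (5, "2020-02-23"), (5, "2020-02-24"),
]

QUERY = """
SELECT dt, COUNT(DISTINCT id) AS cnt
FROM (
    SELECT t.*,
           -- the same ID's previous occurrence date
           LAG(dt) OVER (PARTITION BY id ORDER BY dt) AS prev_dt
    FROM t
) t
-- keep rows whose previous occurrence was exactly one day earlier
WHERE prev_dt = date(dt, '-1 day')
GROUP BY dt
ORDER BY dt
"""


def repeat_visitors(rows):
    """Return (dt, count of IDs that also occurred on the previous day)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (id INTEGER, dt TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(repeat_visitors(ROWS))
```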

Find rows with adjoining date ranges and accumulate their durations

My PostgreSQL database stores school vacations, public holidays and weekend dates so that parents can plan their vacations. School vacations are often adjacent to weekends or public holidays. I want to display the total number of non-school days for a school vacation, including any adjoining weekend or public holiday.
Example Data
locations
SELECT id, name, is_federal_state
FROM locations
WHERE is_federal_state = true;
| id | name | is_federal_state |
|----|-------------------|------------------|
| 2 | Baden-Württemberg | true |
| 3 | Bayern | true |
holiday_or_vacation_types
SELECT id, name FROM holiday_or_vacation_types;
| id | name |
|----|-----------------------|
| 1 | Herbst |
| 8 | Wochenende |
"Herbst" is German for "autumn" and "Wochenende" is German for "weekend".
periods
SELECT id, starts_on, ends_on, holiday_or_vacation_type_id
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id | starts_on | ends_on | holiday_or_vacation_type_id |
|-----|--------------|--------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 8 |
Task
I want to select all periods where location_id equals 2 and calculate the duration of each period in days. That can be done with this SQL query:
SELECT id, starts_on, ends_on,
(ends_on - starts_on + 1) AS duration,
holiday_or_vacation_type_id
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id | starts_on | ends_on | duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 8 |
Anyone looking at the calendar would see that ids 670 (weekend), 532 (fall vacation) and 533 (fall vacation) adjoin each other, so together they form a 6-day vacation period. So far I compute this in a separate program, but that takes quite a lot of resources (the actual table contains some 500,000 entries).
Problem 1
Which SQL query would produce the following output (it adds a real_duration column)? Is that even possible with SQL?
| id | starts_on | ends_on | duration | real_duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 6 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 6 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 6 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 2 | 8 |
Problem 2
Is it possible to list the adjoining periods in a part_of_range field? This would be the result. Can that be done with SQL?
| id | starts_on | ends_on | duration | part_of_range | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 670,532,533 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 670,532,533 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 670,532,533 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | | 8 |
This is a gaps-and-islands problem. Here you can use lag() to find where each island starts and then a cumulative count of those starts to number the islands.
The final step is an aggregation using window functions:
SELECT p.*,
       (MAX(ends_on) OVER (PARTITION BY location_id, grp)
        - MIN(starts_on) OVER (PARTITION BY location_id, grp)) + 1 AS real_duration,
       ARRAY_AGG(p.id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM (SELECT p.*,
             COUNT(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
                 OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
      FROM (SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id,
                   LAG(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
            FROM periods
           ) p
     ) p;
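Here is a minimal sketch of the same lag() plus running-count technique against the question's location 2 rows, run in SQLite from Python. Because SQLite has no date subtraction, julianday() does the day arithmetic, SUM(CASE ...) stands in for COUNT(*) FILTER (...), and GROUP_CONCAT stands in for array_agg; these substitutions are assumptions of the sketch, not part of the PostgreSQL answer. Like the answer, it compares each period only with the previous period's end, so it assumes periods at one location do not overlap.

```python
import sqlite3

# Rows from the question: (id, location_id, starts_on, ends_on, holiday_or_vacation_type_id)
ROWS = [
    (670, 2, "2019-10-26", "2019-10-27", 8),
    (532, 2, "2019-10-28", "2019-10-30", 1),
    (533, 2, "2019-10-31", "2019-10-31", 1),
    (671, 2, "2019-11-02", "2019-11-03", 8),
    (672, 2, "2019-11-09", "2019-11-10", 8),
    (673, 2, "2019-11-16", "2019-11-17", 8),
]

QUERY = """
WITH flagged AS (
    SELECT p.*,
           LAG(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
    FROM periods p
),
grouped AS (
    SELECT f.*,
           -- running count of "a gap precedes this period" flags = island number
           SUM(CASE WHEN julianday(prev_eo) < julianday(starts_on) - 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
    FROM flagged f
)
SELECT id,
       CAST(julianday(ends_on) - julianday(starts_on) + 1 AS INTEGER) AS duration,
       CAST(MAX(julianday(ends_on)) OVER (PARTITION BY location_id, grp)
            - MIN(julianday(starts_on)) OVER (PARTITION BY location_id, grp) + 1
            AS INTEGER) AS real_duration,
       GROUP_CONCAT(id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM grouped
ORDER BY starts_on
"""


def vacation_islands(rows):
    """Return (id, duration, real_duration, part_of_range) for each period."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE periods (id INTEGER, location_id INTEGER, starts_on TEXT,"
            " ends_on TEXT, holiday_or_vacation_type_id INTEGER)"
        )
        con.executemany("INSERT INTO periods VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in vacation_islands(ROWS):
        print(row)
```

The order of ids inside part_of_range is not guaranteed without an ORDER BY in the window; the same caveat applies to array_agg in the PostgreSQL version.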

How to partition by a customized sum value?

I have a table with the following columns: customer_id, event_date_time
I'd like to figure out how many times a customer triggers an event every 12 hours from the start of an event. In other words, aggregate the time between events for up to 12 hours by customer.
For example, if a customer triggers an event (in order) at noon, 1:30pm, 5pm, 2am, and 3pm, I would want to return the noon, 2am, and 3pm record.
I've written this query:
select
cust_id,
event_datetime,
nvl(24*(event_datetime - lag(event_datetime) over (partition BY cust_id ORDER BY event_datetime)),0) as difference
from
tbl
I feel like I'm close with this. Is there a way to add something like
over (partition BY cust_id, sum(difference)<12 ORDER BY event_datetime)
EDIT: I'm adding some sample data:
+---------+-----------------+-------------+---+
| cust_id | event_datetime | DIFFERENCE | X |
+---------+-----------------+-------------+---+
| 1 | 6/20/2015 23:35 | 0 | x |
| 1 | 6/21/2015 0:09 | 0.558611111 | |
| 1 | 6/21/2015 0:49 | 0.667777778 | |
| 1 | 6/21/2015 1:30 | 0.688333333 | |
| 1 | 6/21/2015 9:38 | 8.133055556 | |
| 1 | 6/21/2015 10:09 | 0.511111111 | |
| 1 | 6/21/2015 10:45 | 0.600555556 | |
| 1 | 6/21/2015 11:09 | 0.411111111 | |
| 1 | 6/21/2015 11:32 | 0.381666667 | |
| 1 | 6/21/2015 11:55 | 0.385 | x |
| 1 | 6/21/2015 12:18 | 0.383055556 | |
| 1 | 6/21/2015 12:23 | 0.074444444 | |
| 1 | 6/22/2015 10:01 | 21.63527778 | x |
| 1 | 6/22/2015 10:24 | 0.380555556 | |
| 1 | 6/22/2015 10:46 | 0.373611111 | |
+---------+-----------------+-------------+---+
The "x" are the records that should be pulled since they're the first records in the 12 hour block.
If I understand correctly, you want the first record in each 12-hour block where the blocks of time are defined by the first event time.
If so, you need to modify your query to get the difference from the first time for each customer. The rest is just arithmetic. The query would look something like this:
with t as (
      select cust_id, event_datetime,
             24 * (event_datetime - min(event_datetime) over (partition by cust_id)) as difference
      from tbl
     )
select t.*
from (select t.*,
             row_number() over (partition by cust_id, floor(difference / 12)
                                order by difference) as seqnum
      from t
     ) t
where seqnum = 1;
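Here is a minimal sketch of this fixed-block approach against the question's sample rows, run in SQLite from Python. julianday() differences replace Oracle's date subtraction, and CAST(difference / 12 AS INTEGER) replaces floor(), which is equivalent here because difference is never negative; both are assumptions of the sketch. The table name tbl follows the question.

```python
import sqlite3

# Sample events for customer 1 from the question (24-hour clock)
ROWS = [(1, ts) for ts in [
    "2015-06-20 23:35", "2015-06-21 00:09", "2015-06-21 00:49",
    "2015-06-21 01:30", "2015-06-21 09:38", "2015-06-21 10:09",
    "2015-06-21 10:45", "2015-06-21 11:09", "2015-06-21 11:32",
    "2015-06-21 11:55", "2015-06-21 12:18", "2015-06-21 12:23",
    "2015-06-22 10:01", "2015-06-22 10:24", "2015-06-22 10:46",
]]

QUERY = """
WITH t AS (
    SELECT cust_id, event_datetime,
           -- hours elapsed since this customer's first event
           24 * (julianday(event_datetime)
                 - MIN(julianday(event_datetime)) OVER (PARTITION BY cust_id)) AS difference
    FROM tbl
)
SELECT cust_id, event_datetime
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               -- block 0 = hours [0, 12), block 1 = [12, 24), ...
               PARTITION BY cust_id, CAST(difference / 12 AS INTEGER)
               ORDER BY difference
           ) AS seqnum
    FROM t
) numbered
WHERE seqnum = 1
ORDER BY cust_id, event_datetime
"""


def first_event_per_block(rows):
    """Return (cust_id, event_datetime) for the first event in each 12-hour block."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tbl (cust_id INTEGER, event_datetime TEXT)")
        con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(first_event_per_block(ROWS))
```

The blocks here are fixed 12-hour windows counted from each customer's first event. If a new block should instead start at the first event after the previous block ends, the calculation becomes iterative (for example a recursive CTE), and the two definitions can pick different rows on other data.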

How to calculate running total (month to date) in SQL Server 2008

I'm trying to calculate a month-to-date total using SQL Server 2008.
I'm trying to generate a month-to-date count at the level of activities and representatives. Here are the results I want to generate:
| REPRESENTATIVE_ID | MONTH | WEEK | TOTAL_WEEK_ACTIVITY_COUNT | MONTH_TO_DATE_ACTIVITIES_COUNT |
|-------------------|-------|------|---------------------------|--------------------------------|
| 40 | 7 | 7/08 | 1 | 1 |
| 40 | 8 | 8/09 | 1 | 1 |
| 40 | 8 | 8/10 | 1 | 2 |
| 41 | 7 | 7/08 | 2 | 2 |
| 41 | 8 | 8/08 | 4 | 4 |
| 41 | 8 | 8/09 | 3 | 7 |
| 41 | 8 | 8/10 | 1 | 8 |
From the following tables:
ACTIVITIES_FACT table
+-------------------+------+-----------+
| Representative_ID | Date | Activity |
+-------------------+------+-----------+
| 41 | 8/03 | Call |
| 41 | 8/04 | Call |
| 41 | 8/05 | Call |
+-------------------+------+-----------+
LU_TIME table
+-------+-----------------+--------+
| Month | Date | Week |
+-------+-----------------+--------+
| 8 | 8/01 | 8/08 |
| 8 | 8/02 | 8/08 |
| 8 | 8/03 | 8/08 |
| 8 | 8/04 | 8/08 |
| 8 | 8/05 | 8/08 |
+-------+-----------------+--------+
I'm not sure how to do this: I keep running into problems with multiple-counting or aggregations not being allowed in subqueries.
A running total is the summation of a sequence of numbers which is
updated each time a new number is added to the sequence, simply by
adding the value of the new number to the running total.
I think he wants a month-to-date running total for each Representative_ID, so a simple GROUP BY week isn't enough. He probably wants Month_To_Date_Activities_Count to be updated at the end of every week.
This query gives a running total (month to end-of-week date), ordered by Representative_ID, Week:
SELECT a.Representative_ID, l.month, l.Week, Count(*) AS Total_Week_Activity_Count
,(SELECT count(*)
FROM ACTIVITIES_FACT a2
INNER JOIN LU_TIME l2 ON a2.Date = l2.Date
AND a.Representative_ID = a2.Representative_ID
WHERE l2.week <= l.week
AND l2.month = l.month) Month_To_Date_Activities_Count
FROM ACTIVITIES_FACT a
INNER JOIN LU_TIME l ON a.Date = l.Date
GROUP BY a.Representative_ID, l.Week, l.month
ORDER BY a.Representative_ID, l.Week
| REPRESENTATIVE_ID | MONTH | WEEK | TOTAL_WEEK_ACTIVITY_COUNT | MONTH_TO_DATE_ACTIVITIES_COUNT |
|-------------------|-------|------|---------------------------|--------------------------------|
| 40 | 7 | 7/08 | 1 | 1 |
| 40 | 8 | 8/09 | 1 | 1 |
| 40 | 8 | 8/10 | 1 | 2 |
| 41 | 7 | 7/08 | 2 | 2 |
| 41 | 8 | 8/08 | 4 | 4 |
| 41 | 8 | 8/09 | 3 | 7 |
| 41 | 8 | 8/10 | 1 | 8 |
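The correlated subquery above recounts the raw activity rows for every output row. An alternative that should also work on SQL Server 2008 (CTEs are available there, while SUM() OVER (ORDER BY ...) running totals only arrived in SQL Server 2012) is to aggregate per week first and then sum the weekly counts with a correlated subquery. Below is a minimal sketch of that variant in SQLite from Python; the lowercase table and column names and the sample rows are hypothetical, and weeks are stored as ISO dates so that `<=` compares them chronologically.

```python
import sqlite3

# Hypothetical rows: (representative_id, date, activity) and (month, date, week)
ACTIVITIES = [
    (40, "2013-08-01", "Chat"),
    (41, "2013-08-02", "Call"),
    (41, "2013-08-03", "Call"),
    (41, "2013-08-09", "Call"),
]
LU_TIME = [
    (8, "2013-08-01", "2013-08-08"),
    (8, "2013-08-02", "2013-08-08"),
    (8, "2013-08-03", "2013-08-08"),
    (8, "2013-08-09", "2013-08-15"),
    (8, "2013-08-10", "2013-08-15"),
]

QUERY = """
WITH weekly AS (
    -- one row per representative and week, counted once
    SELECT a.representative_id, l.month, l.week, COUNT(*) AS week_count
    FROM activities_fact a
    JOIN lu_time l ON l.date = a.date
    GROUP BY a.representative_id, l.month, l.week
)
SELECT w.representative_id, w.month, w.week, w.week_count,
       -- running total of this month's weeks up to and including this one
       (SELECT SUM(w2.week_count)
        FROM weekly w2
        WHERE w2.representative_id = w.representative_id
          AND w2.month = w.month
          AND w2.week <= w.week) AS month_to_date
FROM weekly w
ORDER BY w.representative_id, w.week
"""


def month_to_date(activities, lu_time):
    """Return (rep, month, week, week_count, month_to_date) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE activities_fact (representative_id INTEGER, date TEXT, activity TEXT)")
        con.execute("CREATE TABLE lu_time (month INTEGER, date TEXT, week TEXT)")
        con.executemany("INSERT INTO activities_fact VALUES (?, ?, ?)", activities)
        con.executemany("INSERT INTO lu_time VALUES (?, ?, ?)", lu_time)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in month_to_date(ACTIVITIES, LU_TIME):
        print(row)
```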
As I understand your question:
SELECT af.Representative_ID
, lt.Week
, COUNT(af.Activity) AS Qnt
FROM ACTIVITIES_FACT af
INNER JOIN LU_TIME lt ON lt.Date = af.date
GROUP BY af.Representative_ID, lt.Week
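| Representative_ID | Week                    | Month_To_Date_Activities_Count |
|-------------------|-------------------------|--------------------------------|
| 41                | 2013-08-01 00:00:00.000 | 1                              |
| 41                | 2013-08-08 00:00:00.000 | 3                              |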
USE tempdb;
GO
IF OBJECT_ID('tempdb..#ACTIVITIES_FACT','U') IS NOT NULL DROP TABLE #ACTIVITIES_FACT;
CREATE TABLE #ACTIVITIES_FACT
(
Representative_ID INT NOT NULL
,Date DATETIME NULL
, Activity VARCHAR(500) NULL
)
IF OBJECT_ID('tempdb..#LU_TIME','U') IS NOT NULL DROP TABLE #LU_TIME;
CREATE TABLE #LU_TIME
(
Month INT
,Date DATETIME
,Week DATETIME
)
INSERT INTO #ACTIVITIES_FACT(Representative_ID,Date,Activity)
VALUES
(41,'7/31/2013','Chat')
,(41,'8/03/2013','Call')
,(41,'8/04/2013','Call')
,(41,'8/05/2013','Call')
INSERT INTO #LU_TIME(Month,Date,Week)
VALUES
(8,'7/31/2013','8/01/2013')
,(8,'8/01/2013','8/08/2013')
,(8,'8/02/2013','8/08/2013')
,(8,'8/03/2013','8/08/2013')
,(8,'8/04/2013','8/08/2013')
,(8,'8/05/2013','8/08/2013')
--Begin Query
SELECT AF.Representative_ID
,LU.Week
,COUNT(*) AS Month_To_Date_Activities_Count
FROM #ACTIVITIES_FACT AS AF
INNER JOIN #LU_TIME AS LU
ON AF.Date = LU.Date
Group By AF.Representative_ID
,LU.Week