I have a table like so:
Unique #   Cost     Date
12352      2165.5   2022-01-01 12:20
35256      2360.5   2022-01-01 12:20
12352      3254.0   2022-01-04 18:20
35256      3460.5   2022-01-04 18:20
But I am trying to get the dates as column names (something like a pivot table):
Unique #   2022-01-01   2022-01-04   ...
12352      2165.5       3254.0       --
35256      2360.5       3460.5       --
I understand this is fairly easy to do in Pandas using groupby and unstack, but I am trying to achieve it in SQL. Here is my attempt:
WITH cte AS (
    SELECT 12352 unique_id, 2165.5 cost, '2022-01-01 12:20'::TIMESTAMP dates, dates::DATE day_agg UNION ALL
    SELECT 35256 unique_id, 2360.5 cost, '2022-01-01 12:20'::TIMESTAMP dates, dates::DATE day_agg UNION ALL
    SELECT 12352 unique_id, 3254.0 cost, '2022-01-04 18:20'::TIMESTAMP dates, dates::DATE day_agg UNION ALL
    SELECT 35256 unique_id, 3460.5 cost, '2022-01-04 18:20'::TIMESTAMP dates, dates::DATE day_agg
),
cte2 AS (
    SELECT unique_id, day_agg, SUM(cost) AS daily_cost
    FROM cte
    GROUP BY unique_id, day_agg
)
SELECT *
FROM cte2
PIVOT (SUM(daily_cost) FOR day_agg IN ('2022-01-01', '2022-01-04'));
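As a cross-check, the same pivot can be done with plain conditional aggregation (one SUM(CASE ...) per pivoted date), which sidesteps engine-specific PIVOT syntax and any type mismatch between the IN-list literals and DAY_AGG. A minimal sketch run through SQLite from Python, using the question's sample rows (table and column names are assumptions from the question, not a real schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE costs (unique_id INTEGER, cost REAL, dates TEXT);
INSERT INTO costs VALUES
  (12352, 2165.5, '2022-01-01 12:20'),
  (35256, 2360.5, '2022-01-01 12:20'),
  (12352, 3254.0, '2022-01-04 18:20'),
  (35256, 3460.5, '2022-01-04 18:20');
""")

# Conditional aggregation: one SUM(CASE ...) column per pivoted date.
rows = con.execute("""
SELECT unique_id,
       SUM(CASE WHEN date(dates) = '2022-01-01' THEN cost END) AS "2022-01-01",
       SUM(CASE WHEN date(dates) = '2022-01-04' THEN cost END) AS "2022-01-04"
FROM costs
GROUP BY unique_id
ORDER BY unique_id
""").fetchall()
print(rows)  # [(12352, 2165.5, 3254.0), (35256, 2360.5, 3460.5)]
```

The drawback, of course, is that each pivoted date must be spelled out by hand, exactly as in the PIVOT IN-list.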
I have this table:
book_name   borrow_date   return_date
A           2022-08-01    2022-08-03
B           2022-08-03    2022-09-01
C           2022-08-15    2022-09-25
D           2022-09-15    2022-09-18
E           2022-09-17    2022-10-15
And a table of the first date of each month:
summary_month
2022-08-01
2022-09-01
2022-10-01
I would like to count how many books are on loan in each summary_month. The result I am looking for is:
summary_month   count_book   list_book
2022-08-01      3            A,B,C
2022-09-01      4            B,C,D,E
2022-10-01      1            E
I am stuck: I am only able to aggregate them based on the borrow date, with:
count(distinct case when summary_month = date_trunc(borrow_date, month) then book_name end) count_book
Is it possible to get the result I am hoping for? Any help and advice would be much appreciated. Thank you.
Consider the option below:
select summary_month,
count(distinct book_name) as count_book,
string_agg(book_name) as list_book
from your_table, unnest(generate_date_array(
date_trunc(borrow_date, month),
date_trunc(return_date, month),
interval 1 month)
) as summary_month
group by summary_month
If applied to the sample data in your question, the output matches your expected result.
Something like this can work:
with
input as (
select 'A' book_name, cast('2022-08-01' as date) borrow_date , cast('2022-08-03' as date) return_date union all
select 'B', '2022-08-03', '2022-09-01' union all
select 'C', '2022-08-15', '2022-09-25' union all
select 'D', '2022-09-15', '2022-09-18' union all
select 'E', '2022-09-17', '2022-10-15'
),
list_month as (
select distinct
* except(days_borrowed),
date_trunc(days_borrowed, month) as month
from input,
unnest(generate_date_array(borrow_date, return_date)) as days_borrowed
)
select
month,
count(distinct book_name) as count_distinct_book,
string_agg(distinct book_name) as book_name_list
from list_month
group by 1
order by 1
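The core idea in both answers is the same: expand each borrow/return interval into one row per calendar month it touches, then group by month. That expansion can be sketched outside the database in a few lines of Python (sample data copied from the question; the month-increment helper is my own, not part of any library):

```python
from datetime import date

# book_name -> (borrow_date, return_date), from the question's sample data
books = {"A": (date(2022, 8, 1), date(2022, 8, 3)),
         "B": (date(2022, 8, 3), date(2022, 9, 1)),
         "C": (date(2022, 8, 15), date(2022, 9, 25)),
         "D": (date(2022, 9, 15), date(2022, 9, 18)),
         "E": (date(2022, 9, 17), date(2022, 10, 15))}

def month_floor(d):
    """Equivalent of date_trunc(d, month)."""
    return date(d.year, d.month, 1)

summary = {}
for name, (borrow, ret) in sorted(books.items()):
    # Walk month by month from the borrow month to the return month,
    # mirroring generate_date_array(..., interval 1 month).
    m = month_floor(borrow)
    while m <= month_floor(ret):
        summary.setdefault(m, []).append(name)
        m = date(m.year + m.month // 12, m.month % 12 + 1, 1)

for m, names in sorted(summary.items()):
    print(m, len(names), ",".join(names))
```

Running this prints 3/A,B,C for August, 4/B,C,D,E for September and 1/E for October, matching the expected result in the question.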
Here is the data I'm working with:
Accountid   Month
123         08/01/2021
123         09/01/2021
123         03/01/2022
123         04/01/2022
123         05/01/2022
123         06/01/2022
I'm trying to insert into a new table where the data looks like this:
Accountid   Start Month   End Month
123         08/01/2021    09/01/2021
123         03/01/2022    06/01/2022
I'm not sure how to split the rows at the gap and group by account id in this case.
Thanks in advance
In Oracle 12c and later you may also use MATCH_RECOGNIZE for gaps-and-islands problems, to define the grouping rules (islands) in a more readable and natural way.
select *
from input_
match_recognize(
  partition by accountid
  order by month asc
  measures
    first(month) as start_month,
    last(month) as end_month
  /* any month followed by any number of consecutive months */
  pattern (any_ next*)
  define
    /* "next" is the month right after the previous one */
    next as months_between(month, prev(month)) = 1
)
ACCOUNTID   START_MONTH   END_MONTH
123         2021-08-01    2021-09-01
123         2022-03-01    2022-06-01
That's a gaps-and-islands problem; one option to solve it is:
Sample data:
with test (accountid, month) as
  (select 123, date '2021-01-08' from dual union all
   select 123, date '2021-01-09' from dual union all
   select 123, date '2021-01-03' from dual union all
   select 123, date '2021-01-04' from dual union all
   select 123, date '2021-01-05' from dual union all
   select 123, date '2021-01-06' from dual
  ),
Query begins here:
temp as
  (select accountid, month,
          to_char(month, 'J') - row_number() over
            (partition by accountid order by month) as diff
   from test
  )
select accountid,
       min(month) as start_month,
       max(month) as end_month
from temp
group by accountid, diff
order by accountid, start_month;

ACCOUNTID  START_MONTH  END_MONTH
---------  -----------  ----------
      123  03/01/2021   06/01/2021
      123  08/01/2021   09/01/2021
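The row-number difference trick can be sketched outside the database too. In this Python sketch, a linear month counter plays the role of to_char(month, 'J') (or months_between) and the list index plays row_number(): consecutive months share the same difference, so grouping by it yields the islands. Data is the question's sample:

```python
from datetime import date
from itertools import groupby

months = [date(2021, 8, 1), date(2021, 9, 1), date(2022, 3, 1),
          date(2022, 4, 1), date(2022, 5, 1), date(2022, 6, 1)]

def month_index(d):
    """A linear month counter, so consecutive months differ by exactly 1."""
    return d.year * 12 + d.month

# Consecutive months share the same (month_index - row_number) difference,
# so each distinct difference value identifies one island.
islands = []
for _, grp in groupby(enumerate(months), key=lambda t: month_index(t[1]) - t[0]):
    island = [m for _, m in grp]
    islands.append((island[0], island[-1]))

print(islands)
```

This prints the two (start_month, end_month) islands from the question: 2021-08 to 2021-09 and 2022-03 to 2022-06.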
Although related to MS SQL Server, have a look at Introduction to Gaps and Islands Analysis; it should be interesting reading for you, I presume.
I have a query as below:
SELECT
"2022-05-10 00:00:00 UTC" AS date_,
COUNT(salesId) AS total-sales
FROM
`project1.sales.sales-growth`
WHERE
(promoDate BETWEEN "2022-05-10 00:00:00 UTC"
AND "2022-05-11 00:00:00 UTC")
OR
(purchaseDate BETWEEN "2022-05-10 00:00:00 UTC"
AND "2022-05-11 00:00:00 UTC")
Which shows the total sales for a particular date (2022-05-10) as below:
date_ total-sales
2022-05-10 560
I am wondering how I can change the query to show the sales per day for every day of May (desired output):
date_ total-sales
2022-05-01 567
2022-05-02 687
2022-05-03 878
... ...
2022-05-31 500
One option: generate a date array for the target time range, group by those dates and compare those dates in the WHERE clause with your two date columns.
With an assumed table of yours:
WITH your_table AS
(
SELECT TIMESTAMP("2022-05-01 15:30:00+00") AS promoDate, NULL AS purchaseDate, 1 AS salesId
UNION ALL
SELECT NULL AS promoDate, TIMESTAMP("2022-05-01 18:30:00+00") AS purchaseDate, 1 AS salesId
UNION ALL
SELECT TIMESTAMP("2022-05-02 15:30:00+00") AS promoDate, NULL AS purchaseDate, 1 AS salesId
UNION ALL
SELECT TIMESTAMP("2022-05-03 15:30:00+00") AS promoDate, NULL AS purchaseDate, 1 AS salesId
UNION ALL
SELECT TIMESTAMP("2022-05-04 15:30:00+00") AS promoDate, NULL AS purchaseDate, 1 AS salesId
UNION ALL
SELECT NULL AS promoDate, TIMESTAMP("2022-05-04 18:30:00+00") AS purchaseDate, 1 AS salesId
)
SELECT
date_,
COUNT(salesId) AS total_sales
FROM
UNNEST(GENERATE_DATE_ARRAY("2022-05-01", "2022-05-31")) AS date_, your_table
WHERE
date_ = EXTRACT(DATE FROM promoDate)
OR
date_ = EXTRACT(DATE FROM purchaseDate)
GROUP BY
date_
Output:
Row   date_        total_sales
1     2022-05-01   2
2     2022-05-02   1
3     2022-05-03   1
4     2022-05-04   2
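The grouping itself is straightforward once each row is mapped to a calendar day. A small Python sketch of the same logic over the assumed sample rows (each row has exactly one of promoDate/purchaseDate set, as in the sample above):

```python
from datetime import datetime

# Assumed sample rows mirroring the CTE above: (promoDate, purchaseDate, salesId)
rows = [
    ("2022-05-01 15:30:00", None, 1),
    (None, "2022-05-01 18:30:00", 1),
    ("2022-05-02 15:30:00", None, 1),
    ("2022-05-03 15:30:00", None, 1),
    ("2022-05-04 15:30:00", None, 1),
    (None, "2022-05-04 18:30:00", 1),
]

totals = {}
for promo, purchase, _sales_id in rows:
    # EXTRACT(DATE FROM ...) equivalent: truncate the timestamp to its day.
    ts = promo or purchase
    day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date().isoformat()
    totals[day] = totals.get(day, 0) + 1

print(sorted(totals.items()))
```

Note that in the SQL version a single row matching on both promoDate and purchaseDate would still be counted once, because the two conditions are combined with OR in one WHERE clause.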
Can SQL produce the list of dates for the last 15 days in a single query?
We can get today's date with
select current_date()
We can also get the date from 15 days ago with
select date_add(current_date(), -15)
But how can we show the list of all dates in the last 15 days?
For example the output is
2020-05-17,
2020-05-18,
2020-05-19,
2020-05-20,
2020-05-21,
2020-05-22,
2020-05-23,
2020-05-24,
2020-05-25,
2020-05-26,
2020-05-27,
2020-05-28,
2020-05-29,
2020-05-30,
2020-05-31
In Hive or Spark-SQL:
select date_add(date_add(current_date, -15), s.i) as dt
from (select posexplode(split(space(15), ' ')) as (i, x)) s
Result:
2020-05-18
2020-05-19
2020-05-20
2020-05-21
2020-05-22
2020-05-23
2020-05-24
2020-05-25
2020-05-26
2020-05-27
2020-05-28
2020-05-29
2020-05-30
2020-05-31
2020-06-01
2020-06-02
WITH
cte AS ( SELECT 1 num UNION ALL SELECT 2 UNION ALL ... UNION ALL SELECT 15 )
SELECT DATEADD(CURRENT_DATE(), -num)
FROM cte;
Or, for example
WITH
cte1 AS ( SELECT 1 num UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 ),
cte2 AS ( SELECT 0 num
UNION ALL SELECT 1
UNION ALL SELECT 2 )
SELECT DATEADD(CURRENT_DATE(), -cte1.num - cte2.num * 5)
FROM cte1, cte2;
I have a base table like below:
score_upd (Upd_dt,Url,Score) AS (
SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
)
Upd_dt URL Score
2019-07-26 A x
2019-07-26 B alpha
2019-08-01 A y
2019-08-01 B beta
2019-08-03 A z
2019-08-03 B gamma
And I want to create a table at the daily, per-URL level, filling the new rows with the most recent previous date's value; the result should look like below:
score_upd (Upd_dt,Url,Score) AS (
SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-07-27','A','x'
UNION ALL SELECT DATE '2019-07-27','B','alpha'
UNION ALL SELECT DATE '2019-07-28','A','x'
UNION ALL SELECT DATE '2019-07-28','B','alpha'
UNION ALL SELECT DATE '2019-07-29','A','x'
UNION ALL SELECT DATE '2019-07-29','B','alpha'
UNION ALL SELECT DATE '2019-07-30','A','x'
UNION ALL SELECT DATE '2019-07-30','B','alpha'
UNION ALL SELECT DATE '2019-07-31','A','x'
UNION ALL SELECT DATE '2019-07-31','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-02','A','y'
UNION ALL SELECT DATE '2019-08-02','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
UNION ALL SELECT DATE '2019-08-04','A','z'
UNION ALL SELECT DATE '2019-08-04','B','gamma'
UNION ALL SELECT DATE '2019-08-05','A','z'
UNION ALL SELECT DATE '2019-08-05','B','gamma'
)
Which looks like:
Upd_dt URL Score
2019-07-26 A x
2019-07-26 B alpha
2019-07-27 A x
2019-07-27 B alpha
2019-07-28 A x
2019-07-28 B alpha
2019-07-29 A x
2019-07-29 B alpha
2019-07-30 A x
2019-07-30 B alpha
2019-07-31 A x
2019-07-31 B alpha
2019-08-01 A y
2019-08-01 B beta
2019-08-02 A y
2019-08-02 B beta
2019-08-03 A z
2019-08-03 B gamma
2019-08-04 A z
2019-08-04 B gamma
2019-08-05 A z
2019-08-05 B gamma
.
.
.
My current process: I built a daily dimension table from 7/26/2019 through today with:
SELECT CAST(slice_time AS DATE) dates
FROM testcalendar mtc
TIMESERIES slice_time AS '1 day'
OVER (ORDER BY CAST(mtc.dates AS TIMESTAMP));
so I get:
Dates
2019-07-26
2019-07-27
2019-07-28
2019-07-29
.
.
.
2019-10-12 (today)
I was thinking I could use a function such as "interpolate previous value" when joining my first table to this date dimension, to fill the missing days with the values from the most recent previous date, but it failed: the result didn't generate rows for the missing days.
Please let me know if anyone has a better idea on this.
Thanks!
A warning before we start: only store a "daily photograph" when it really, really is necessary. In my past, I once ended up with 364 redundant rows per year, because the values only changed once a year. In Vertica, that costs license capacity, plus CPU and clock time for joining and grouping.
But, for the rest: good start. You could, however, apply the TIMESERIES clause without having to build a calendar.
The trick is to "extrapolate" manually what you can INTERPOLATE automatically.
Add an in-line 'padding' table that contains the newest value per URL, but carries CURRENT_DATE instead of the newest actual date, using Vertica's peculiar analytic limit clause: LIMIT 1 OVER (PARTITION BY url ORDER BY upd_dt DESC).
Then UNION ALL that padding table with your input, and apply the TIMESERIES clause to the result of that union.
Like so:
WITH
-- your input ...
score_upd (Upd_dt,Url,Score) AS (
SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
)
-- real WITH clause would start here ...
,
-- newest row per Url, just with current date
pad_newest AS (
SELECT
CURRENT_DATE
, url
, score
FROM score_upd
LIMIT 1 OVER(PARTITION BY url ORDER BY upd_dt DESC)
)
,
with_newest AS (
SELECT
*
FROM score_upd
UNION ALL
SELECT *
FROM pad_newest
)
SELECT
ts_dt::DATE AS upd_dt
, url AS url
, TS_FIRST_VALUE(score) AS score
FROM with_newest
TIMESERIES ts_dt AS '1 day' OVER (
PARTITION BY url ORDER BY upd_dt::TIMESTAMP
)
ORDER BY 1,2
;
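For readers without Vertica at hand, the carry-forward behaviour of TS_FIRST_VALUE over a '1 day' TIMESERIES can be sketched in plain Python. The sample data is the question's; the fixed end date is a stand-in for CURRENT_DATE (the role of the pad_newest row above):

```python
from datetime import date, timedelta

# (upd_dt, url, score) rows from the question's sample data
updates = [
    (date(2019, 7, 26), "A", "x"), (date(2019, 7, 26), "B", "alpha"),
    (date(2019, 8, 1), "A", "y"),  (date(2019, 8, 1), "B", "beta"),
    (date(2019, 8, 3), "A", "z"),  (date(2019, 8, 3), "B", "gamma"),
]
end = date(2019, 8, 5)  # stand-in for CURRENT_DATE

# Index the updates per URL by day.
scores = {}
for upd, url, score in updates:
    scores.setdefault(url, {})[upd] = score

# Walk day by day per URL, carrying the last seen score forward
# (what TS_FIRST_VALUE does within each one-day time slice).
filled = []
for url, by_day in sorted(scores.items()):
    day, last = min(by_day), None
    while day <= end:
        last = by_day.get(day, last)
        filled.append((day, url, last))
        day += timedelta(days=1)

print(filled[:3])
```

Each URL ends up with one row per day from its first update through the end date, exactly the daily-URL grid the question asks for.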