Databricks: replicate columns - dataframe

Suppose I have the following DataFrame:
YEAR MONTH Value
2019 JAN 100
2019 JAN 200
2019 MAR 400
2019 MAR 100
And I pivot it, grouping by YEAR (df.groupBy().pivot()....):
YEAR JAN MAR
2019 300 500
But I also want one column for every month of the year, even when there is no data for that month,
which means I would like to have:
YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
2019 300 0 500 0 0 0 0 0 0 0 0 0
Thanks
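
One way to get every month as a column is to pivot against a fixed list of months instead of the months that happen to appear in the data, and to fill the gaps with 0. In PySpark this is usually written by passing the month list as the second argument of pivot() and filling nulls afterwards, along the lines of df.groupBy("YEAR").pivot("MONTH", months).sum("Value").na.fill(0). The sketch below shows the same idea in plain Python so it runs without a cluster; MONTHS and pivot_by_year are illustrative names, not part of any library.

```python
from collections import defaultdict

# Fixed column order; assumed to match the spelling used in the MONTH column.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def pivot_by_year(rows, months=MONTHS):
    """Sum Value per (YEAR, MONTH); months with no rows stay at 0."""
    totals = defaultdict(lambda: dict.fromkeys(months, 0))
    for year, month, value in rows:
        totals[year][month] += value
    # One list per year, ordered like `months`.
    return {year: [by_month[m] for m in months]
            for year, by_month in totals.items()}


rows = [
    (2019, "JAN", 100),
    (2019, "JAN", 200),
    (2019, "MAR", 400),
    (2019, "MAR", 100),
]
print(pivot_by_year(rows))
```

Listing the pivot values explicitly also fixes the column order, which otherwise depends on the data.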

Related

SQL group by 7am to 7am

How do I simply group by a 24 hour interval from 7am to 7am in a manner similar to:
select format(t_stamp,'yyyy-MMM')
from mytable
group by format(t_stamp,'yyyy-MMM')
If the input is like
3,Wed Mar 23 20:40:40 EDT 2022
3,Wed Mar 23 20:40:39 EDT 2022
4,Wed Mar 23 03:36:10 EDT 2022
3,Wed Mar 22 15:46:44 EST 2022
3,Tue Mar 22 04:16:52 EST 2022
4,Sat Mar 22 03:13:08 EDT 2022
3,Sat Mar 22 03:13:05 EDT 2022
4,Sat Mar 21 04:10:36 EDT 2022
the output should be like
6, Mar 23
7, Mar 22
10, Mar 21
4, Mar 20
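
A common way to approach this is to shift every timestamp back by 7 hours and then group by the calendar date of the shifted value, so that anything before 7am falls into the previous day. In SQL Server-style syntax that would be something like grouping by CAST(DATEADD(hour, -7, t_stamp) AS date). The sketch below checks the idea in Python against the sample input. It assumes the first column is summed, drops the weekday names and time zone abbreviations, and treats the timestamps as plain wall-clock times; bucket_day and sum_by_shifted_day are illustrative names.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question, rewritten as (value, "YYYY-MM-DD HH:MM:SS").
rows = [
    (3, "2022-03-23 20:40:40"),
    (3, "2022-03-23 20:40:39"),
    (4, "2022-03-23 03:36:10"),
    (3, "2022-03-22 15:46:44"),
    (3, "2022-03-22 04:16:52"),
    (4, "2022-03-22 03:13:08"),
    (3, "2022-03-22 03:13:05"),
    (4, "2022-03-21 04:10:36"),
]


def bucket_day(ts, start_hour=7):
    """Return the date of the 24-hour window (start_hour to start_hour) containing ts."""
    return (ts - timedelta(hours=start_hour)).date()


def sum_by_shifted_day(rows, start_hour=7):
    """Sum the values per 7am-to-7am window."""
    totals = defaultdict(int)
    for value, stamp in rows:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        totals[bucket_day(ts, start_hour)] += value
    return dict(totals)


for day, total in sorted(sum_by_shifted_day(rows).items(), reverse=True):
    print(total, day.strftime("%b %d"))
```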

SQL Custom unique Ordering with repeated sequence

I have a datetime column (data type timestamp without time zone) named time. I can best explain my issue with an example.
I have the following data in this column (timestamps prettified for this example):
ID TIME
1 1 Mar 2022 - 1PM
2 1 Mar 2022 - 2PM
3 1 Mar 2022 - 1PM
4 1 Mar 2022 - 3PM
5 1 Mar 2022 - 2PM
6 2 Mar 2022 - 2PM
7 2 Mar 2022 - 1PM
8 2 Mar 2022 - 3PM
9 2 Mar 2022 - 1PM
10 1 Mar 2022 - 3PM
11 2 Mar 2022 - 2PM
12 2 Mar 2022 - 3PM
13 3 Mar 2022 - 4PM
14 3 Mar 2022 - 3PM
15 3 Mar 2022 - 3PM
16 3 Mar 2022 - 4PM
If I do ORDER BY time, I get the following result:
ID TIME
1 1 Mar 2022 - 1PM
3 1 Mar 2022 - 1PM
2 1 Mar 2022 - 2PM
5 1 Mar 2022 - 2PM
4 1 Mar 2022 - 3PM
10 1 Mar 2022 - 3PM
7 2 Mar 2022 - 1PM
9 2 Mar 2022 - 1PM
6 2 Mar 2022 - 2PM
11 2 Mar 2022 - 2PM
8 2 Mar 2022 - 3PM
12 2 Mar 2022 - 3PM
14 3 Mar 2022 - 3PM
15 3 Mar 2022 - 3PM
13 3 Mar 2022 - 4PM
16 3 Mar 2022 - 4PM
But I want the result in this way:
ID TIME
1 1 Mar 2022 - 1PM
2 1 Mar 2022 - 2PM
4 1 Mar 2022 - 3PM
13 3 Mar 2022 - 4PM
3 1 Mar 2022 - 1PM
5 1 Mar 2022 - 2PM
10 1 Mar 2022 - 3PM
16 3 Mar 2022 - 4PM
7 2 Mar 2022 - 1PM
6 2 Mar 2022 - 2PM
8 2 Mar 2022 - 3PM
9 2 Mar 2022 - 1PM
11 2 Mar 2022 - 2PM
12 2 Mar 2022 - 3PM
14 3 Mar 2022 - 3PM
13 3 Mar 2022 - 4PM
As you can see, the first 4 rows have unique timestamps, and the sequence should repeat based on the time (1PM, 2PM, 3PM).
How can we do this in SQL? I'm using PostgreSQL as my DB and Rails for my backend.
EDIT:
Have added more context to example to explain my scenario.
One approach is to use the ROW_NUMBER window function together with REPLACE:
SELECT time
FROM (
    SELECT *,
           REPLACE(time, 'PM', '') val,
           ROW_NUMBER() OVER (PARTITION BY REPLACE(time, 'PM', '')) rn
    FROM T
) t1
ORDER BY rn, val
For example, to repeat the sequence of the column a:
with tbl(a, othercol) as
(
    SELECT 1,1 UNION ALL
    SELECT 1,2 UNION ALL
    SELECT 1,3 UNION ALL
    SELECT 2,4 UNION ALL
    SELECT 2,5 UNION ALL
    SELECT 2,6 UNION ALL
    SELECT 3,7 UNION ALL
    SELECT 3,8 UNION ALL
    SELECT 3,9
),
cte as (
    -- number the rows within each value of a; ordering by othercol keeps the numbering deterministic
    SELECT *, row_number() over(partition by a order by othercol) rn
    from tbl
)
select a, othercol
from cte
order by rn, a
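
To make the effect of ordering by rn first easier to see, here is a small Python sketch of the same idea on the tbl(a, othercol) sample: number the rows inside each group, then sort by that number before the group key. The function name interleave is illustrative, and the sketch assumes rows inside a group are numbered in othercol order.

```python
from collections import defaultdict


def interleave(rows):
    """Mimic ROW_NUMBER() OVER (PARTITION BY a ORDER BY othercol), then ORDER BY rn, a."""
    counters = defaultdict(int)
    numbered = []
    for a, othercol in sorted(rows):
        counters[a] += 1                      # rn within the partition for this a
        numbered.append((counters[a], a, othercol))
    numbered.sort()                           # ORDER BY rn, a
    return [(a, othercol) for _, a, othercol in numbered]


tbl = [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (2, 6), (3, 7), (3, 8), (3, 9)]
print(interleave(tbl))
```

Each pass over rn = 1, 2, 3 yields one full 1, 2, 3 sequence of a, which is the repeating pattern the question asks for.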
The problem you have at hand is a direct result of not choosing the correct data type for the values you store.
To get the sorting correct, you need to convert the string to a proper time value. There is no to_time() function in Postgres, but you can convert it to a timestamp then cast it to a time:
order by to_timestamp("time", 'hham')::time
You should fix your database design and convert that column to a proper time type, which will also prevent storing invalid values ('3 in the afternoon' or '128foo') in that column.

SQL Calculate field based on three rows

How could I calculate a field based on values from previous and next rows?
I have this list of users with a date (month and year) and a field indicating whether the user has 1+ purchases in that month-year:
id_user Date Has_purchases Active
15678 Jan 2021 0 1
15678 feb 2021 1 1
15678 mar 2021 0 1
15678 Apr 2021 0 1
15678 may 2021 0 0
15678 jun 2021 0 1
15678 jul 2021 0 1
15678 Aug 2021 1 1
15678 sep 2021 0 1
15678 oct 2021 0 1
15678 nov 2021 0 1
15678 Dec 2021 1 1
I need to calculate whether the user was active on a given date (month-year). An active user is defined as a user who has at least one purchase in the last 3 months.
E.g. user 15678 is 'active' in March because the user has purchases in February; the same user is inactive in May because there are no purchases in March and April, and also none in June and July.
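
Reading the sample, the Active column appears consistent with a window of two months before and two months after each row: a month is active if any Has_purchases value in that window is 1. That window is an inference from the example, not something the question states outright. In SQL it could be expressed with a window frame such as MAX(Has_purchases) OVER (PARTITION BY id_user ORDER BY <date> ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING). The Python sketch below checks the rule against the sample; active_flags is an illustrative name, and it assumes one row per consecutive month for a single user.

```python
def active_flags(has_purchases, before=2, after=2):
    """Flag each month as active if any month in [i - before, i + after] has a purchase."""
    flags = []
    for i in range(len(has_purchases)):
        window = has_purchases[max(0, i - before): i + after + 1]
        flags.append(1 if any(window) else 0)
    return flags


# Has_purchases for user 15678, Jan 2021 through Dec 2021.
has_purchases = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(active_flags(has_purchases))
```

If only past months should count, setting after=0 turns this into a strict "last 3 months" rule, which would give different flags for January, June, July and November in this sample.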

ordering a column based on calculated value

I have the values below in a column:
Q1 2018
Q2 2018
Q3 2018
Q4 2018
feb 2018
mar 2018
Q1 2019
Q2 2019
Q3 2019
jan 2018
sep 2018
dec 2018
jan 2019
feb 2019
mar 2019
These values are calculated from some parameters. For some data the value is a month, and for other data it is a quarter.
Is there any way to order them using ORDER BY when all the values are in the same column, so that both the monthly and the quarterly values are sorted?
The output should be like
Q1 2018
Q2 2018
Q3 2018
Q4 2018
Q1 2019
Q2 2019
Q3 2019
jan 2018
feb 2018
mar 2018
sep 2018
dec 2018
jan 2019
feb 2019
mar 2019
I think this does what you want:
order by len(col),          -- put the quarters first
         substr(col, 4),    -- order by year
         (case when col not like 'Q%'
               then to_date(col, 'MON YYYY')
          end),             -- order months by date
         col                -- order quarters by quarter
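
The ordering above amounts to a three-part sort key: quarters before months, then the year, then the position within the year. Here is a minimal Python sketch of that key, checked against the sample values; sort_key and MONTHS are illustrative names, and labels are assumed to look like "Q1 2018" or "feb 2018".

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def sort_key(label):
    """Return (kind, year, position): quarters (kind 0) sort before months (kind 1)."""
    period, year = label.split()
    if period.upper().startswith("Q"):
        return (0, int(year), int(period[1:]))
    return (1, int(year), MONTHS.index(period.lower()) + 1)


values = [
    "Q1 2018", "Q2 2018", "Q3 2018", "Q4 2018", "feb 2018", "mar 2018",
    "Q1 2019", "Q2 2019", "Q3 2019", "jan 2018", "sep 2018", "dec 2018",
    "jan 2019", "feb 2019", "mar 2019",
]
for label in sorted(values, key=sort_key):
    print(label)
```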

Extract values from IN parameter and store it in local variables in postgreSQL

I have an IN parameter, :listDate:String, that I pass to my proc; it contains dynamic values separated by commas.
This :listDate:String is passed to the proc from the Java front end and contains the values selected by the user.
There can be many combinations
Case 1 listDate can have
1. 30 Dec 2013 to 05 Jan 2014
so v_start_date is 30 Dec 2013 and
v_end_date is 05 Jan 2014
Case 2 listdate can have
30 Dec 2013 to 05 Jan 2014,
06 Jan 2014 to 12 Jan 2014
so v_start_date is 30 Dec 2013 and
v_end_date is 12 Jan 2014
Case 3 listDate can have
06 Jan 2014 to 12 Jan 2014,
13 Jan 2014 to 19 Jan 2014,
20 Jan 2014 to 26 Jan 2014
so v_start_date is 06 Jan 2014 and
v_end_date is 26 Jan 2014
How can I extract the values as shown above into v_start_date and v_end_date ?
I was able to resolve it by using
select substring(listDate from '............$') as v_end_date,
substring(listDate from '^...............') as v_start_date
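
The fixed-width patterns above depend on the exact length of the first and last date in the string. As an alternative that does not rely on character counts, the parsing logic can be sketched as: split on commas, split each range on " to ", then take the earliest start and the latest end. The Python below illustrates that logic only; date_bounds is a hypothetical helper, and it assumes every range is written as "DD Mon YYYY to DD Mon YYYY" with English month abbreviations.

```python
from datetime import datetime

DATE_FORMAT = "%d %b %Y"  # e.g. "30 Dec 2013"


def date_bounds(list_date):
    """Return (v_start_date, v_end_date) for a comma-separated list of 'X to Y' ranges."""
    starts, ends = [], []
    for part in list_date.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma or blank entry
        start_text, end_text = part.split(" to ")
        starts.append(datetime.strptime(start_text.strip(), DATE_FORMAT).date())
        ends.append(datetime.strptime(end_text.strip(), DATE_FORMAT).date())
    return min(starts), max(ends)


case_3 = "06 Jan 2014 to 12 Jan 2014, 13 Jan 2014 to 19 Jan 2014, 20 Jan 2014 to 26 Jan 2014"
print(date_bounds(case_3))
```

Using min and max rather than the first and last entries means the result does not depend on the ranges arriving in order.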