Divide data from other days by data from one particular day - sql

I am a bit stuck on one problem for a few hours now.
Let`s say I have a table with the following data:
month outstanding
01/05/2012 35 678 956
02/05/2012 33 678 956
03/05/2012 31 678 956
04/05/2012 27 678 956
05/05/2012 24 678 956
i need to get the ratio of say, day 05/05/2012 results to the first day of that month
E.G. Outstanding of05/05/2012 divided by outstanding 01/05/2012 (24 678 956/35 678 956)
What function should i use?
Tried doing over partition by / by result of to_char(trunc(trunc(a.date_,'MM'), 'MM'),'DD-MM-YYYY')
Didnt seem to work for me

create table temp (month date , outstanding number);
insert into temp values(to_date('01/05/2012','dd/mm/yyyy'),35678956);
insert into temp values(to_date('02/05/2012','dd/mm/yyyy'),33678956);
insert into temp values(to_date('03/05/2012','dd/mm/yyyy'),31678956);
insert into temp values(to_date('04/05/2012','dd/mm/yyyy'),27678956);
insert into temp values(to_date('05/05/2012','dd/mm/yyyy'),24678956);
insert into temp values(to_date('01/06/2012','dd/mm/yyyy'),44678956);
insert into temp values(to_date('02/06/2012','dd/mm/yyyy'),41678956);
The FIRST_VALUE analytic function picks the first record from the partition after doing the ORDER BY
SELECT month
,outstanding
,outstanding/(FIRST_VALUE(outstanding)
OVER (PARTITION BY to_char(month,'mm')
ORDER BY month
)) as ratio
FROM temp
ORDER BY month;
OUTPUT
MONTH OUTSTANDING RATIO
--------- ----------- ----------
01-MAY-12 35678956 1
02-MAY-12 33678956 .943944548
03-MAY-12 31678956 .887889096
04-MAY-12 27678956 .775778193
05-MAY-12 24678956 .691695015
01-JUN-12 44678956 1
02-JUN-12 41678956 .932854295
7 rows selected.
SQLFIDDLE link

Try this:
SELECT t1.month,
t1.outstanding / t2.outstanding o2
FROM your_table t1
INNER JOIN
(SELECT *
FROM your_table
WHERE trunc(MONTH, 'mm') = MONTH) t2 ON trunc(t1.MONTH, 'mm') = t2.MONTH

Related

Unpivoting for large dataset and greater number of unique columns

The pivot and unpivot functions in snowflake are not efficient for processing 30+ unique columns into row based.
Use case : I have 35 different month columns which needs to be rows based , another 35 columns will be quantity for the corresponding month .
So at the and there will be 2 columns(one for month data and another for quantity) for 70 unique columns
there would be aggregation of quantity based on month
But unpivoting is not at all efficient. The below query is scanning 15 GB of data from the main table used
select part_num ,concat(date_part(year, dates),'-',date_part(month, dates)) as month_year,
sum(quantity) as quantities
from table_name
unpivot(dates for cols in (month_1, 30 other uniue cols)),
unpivot(quantity for cols in (qunatity_1, 30 other uniue cols)),
group by part_num, month_year
Is there any other approach to unpivot large dataset.
Thanks
Alternative approach could be using conditional aggregation:
with cte as (
select part_num
,concat(date_part(year, dates),'-',date_part(month, dates)) as month_year
,sum(quantity) as quantities
from table_name
group by part_num, month_year
)
SELECT part_num
-- lowest date
,'2020-01' AS "2020-01"
,MAX(IFF(month_year='2020-01', quantities, NULL) AS "quantities_2020-01"
-- next date
,...
-- last date
,'2022-04' AS "2022-04"
,MAX(IFF(month_year='2022-04', quantities, NULL) AS "quantities_2022-04"
FROM cte
GROUP BY part_num;
Version using single GROUP BY and TO_VARCHAR with format:
SELECT part_num
-- lowest date
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM'),'2020-01',NULL) AS "2020-01"
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2020-01',quantities,NULL) AS "quantities_2020-01"
-- next date
,...
-- last date
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM'),'2022-04',NULL) AS "2022-04"
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2022-04',quantities,NULL) AS "quantities_2022-04"
FROM table_name
GROUP BY part_num;
So if we get some example DATA and test is what is happening is what is wanted..
Here is a trival and tiny CTE worth of data
with table_name(part_num, month_1, month_2, month_3, qunatity_1, qunatity_2, qunatity_3) as (
select * from values
(1, '2022-01-01'::date, '2022-02-01'::date, '2022-03-01'::date, 4, 5, 6)
)
now pointing your SQL at it (after making it compile)
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,sum(quantity) as quantities
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
unpivot(quantity for quan in (qunatity_1, qunatity_2, qunatity_3))
group by part_num, month_year
gives:
PART_NUM
MONTH_YEAR
QUANTITIES
1
2022-01
15
1
2022-02
15
1
2022-03
15
which is not what I think you are after.
If we look at the un aggregated rows:
PART_NUM
MONTH
DATES
QUAN
QUANTITY
1
MONTH_1
2022-01-01
QUNATITY_1
4
1
MONTH_1
2022-01-01
QUNATITY_2
5
1
MONTH_1
2022-01-01
QUNATITY_3
6
1
MONTH_2
2022-02-01
QUNATITY_1
4
1
MONTH_2
2022-02-01
QUNATITY_2
5
1
MONTH_2
2022-02-01
QUNATITY_3
6
1
MONTH_3
2022-03-01
QUNATITY_1
4
1
MONTH_3
2022-03-01
QUNATITY_2
5
1
MONTH_3
2022-03-01
QUNATITY_3
6
we are getting a cross join, which is not what I believe you are wanting.
my understanding is you want a relationship between month (1-35) and quantity (1-35)
thus a mix like:
PART_NUM
MONTH
DATES
QUAN
QUANTITY
1
MONTH_1
2022-01-01
QUNATITY_1
4
1
MONTH_2
2022-02-01
QUNATITY_2
5
1
MONTH_3
2022-03-01
QUNATITY_3
6
Guessed Answer:
My guess at what you really are wanting is:
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,array_construct(qunatity_1, qunatity_2, qunatity_3)[split_part(month,'_',2)::number - 1] as qunatity
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
order by 1,2;
which gives (for the same above CTE data):
PART_NUM
MONTH_YEAR
QUNATITY
1
2022-01
4
1
2022-02
5
1
2022-03
6
Another way to way to get than guessed answer:
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,sum(iff(split_part(month,'_',2)=split_part(q_name,'_',2), q_val, null)) as qunatity
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
unpivot(q_val for q_name in (qunatity_1, qunatity_2, qunatity_3))
group by 1,2
order by 1,2;
which uses the double unpivot, so might be slow, but then only aggregates the values if they match. Which feels somewhat almost as gross as the build an array, to rip it apart, but that version is not needing to do large joins, just some per row grossness.
Assuming your data is already aggregated at part_num level, you could divide and conquer like this
with year_month as
(select a.part_num, b.index+1 as month_num, left(b.value,7) as year_month
from my_table a,table(flatten(input=>array_construct(m1,m2,m3...))) b),
quantities as
(select a.part_num, b.index+1 as month_num, b.value::int as quantity
from my_table a,table(flatten(input=>array_construct(q1,q2,q3...))) b)
select a.part_num, a.year_month, b.quantity
from year_month a
join quantities b on a.part_num=b.part_num and a.month_num=b.month_num

How to calculate average monthly number of some action in some perdion in Teradata SQL?

I have table in Teradata SQL like below:
ID trans_date
------------------------
123 | 2021-01-01
887 | 2021-01-15
123 | 2021-02-10
45 | 2021-03-11
789 | 2021-10-01
45 | 2021-09-02
And I need to calculate average monthly number of transactions made by customers in a period between 2021-01-01 and 2021-09-01, so client with "ID" = 789 will not be calculated because he made transaction later.
In the first month (01) were 2 transactions
In the second month was 1 transaction
In the third month was 1 transaction
In the nineth month was 1 transactions
So the result should be (2+1+1+1) / 4 = 1.25, isn't is ?
How can I calculate it in Teradata SQL? Of course I showed you sample of my data.
SELECT ID, AVG(txns) FROM
(SELECT ID, TRUNC(trans_date,'MON') as mth, COUNT(*) as txns
FROM mytable
-- WHERE condition matches the question but likely want to
-- use end date 2021-09-30 or use mth instead of trans_date
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id, mth) mth_txn
GROUP BY id;
Your logic translated to SQL:
--(2+1+1+1) / 4
SELECT id, COUNT(*) / COUNT(DISTINCT TRUNC(trans_date,'MON')) AS avg_tx
FROM mytable
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id;
You should compare to Fred's answer to see which is more efficent on your data.

Count distinct customers who bought in previous period and not in next period Bigquery

I have a dataset in bigquery which contains order_date: DATE and customer_id.
order_date | CustomerID
2019-01-01 | 111
2019-02-01 | 112
2020-01-01 | 111
2020-02-01 | 113
2021-01-01 | 115
2021-02-01 | 119
I try to count distinct customer_id between the months of the previous year and the same months of the current year. For example, from 2019-01-01 to 2020-01-01, then from 2019-02-01 to 2020-02-01, and then who not bought in the same period of next year 2020-01-01 to 2021-01-01, then 2020-02-01 to 2021-02-01.
The output I am expect
order_date| count distinct CustomerID|who not buy in the next period
2020-01-01| 5191 |250
2020-02-01| 4859 |500
2020-03-01| 3567 |349
..........| .... |......
and the next periods shouldn't include the previous.
I tried the code below but it works in another way
with customers as (
select distinct date_trunc(date(order_date),month) as dates,
CUSTOMER_WID
from t
where date(order_date) between '2018-01-01' and current_date()-1
)
select
dates,
customers_previous,
customers_next_period
from
(
select dates,
count(CUSTOMER_WID) as customers_previous,
count(case when customer_wid_next is null then 1 end) as customers_next_period,
from (
select prev.dates,
prev.CUSTOMER_WID,
next.dates as next_dates,
next.CUSTOMER_WID as customer_wid_next
from customers as prev
left join customers
as next on next.dates=date_add(prev.dates,interval 1 year)
and prev.CUSTOMER_WID=next.CUSTOMER_WID
) as t2
group by dates
)
order by 1,2
Thanks in advance.
If I understand correctly, you are trying to count values on a window of time, and for that I recommend using window functions - docs here and here a great article explaining how it works.
That said, my recommendation would be:
SELECT DISTINCT
periods,
COUNT(DISTINCT CustomerID) OVER 12mos AS count_customers_last_12_mos
FROM (
SELECT
order_date,
FORMAT_DATE('%Y%m', order_date) AS periods,
customer_id
FROM dataset
)
WINDOW 12mos AS ( # window of last 12 months without current month
PARTITION BY periods ORDER BY periods DESC
ROWS BETWEEN 12 PRECEEDING AND 1 PRECEEDING
)
I believe from this you can build some customizations to improve the aggregations you want.
You can generate the periods using unnest(generate_date_array()). Then use joins to bring in the customers from the previous 12 months and the next 12 months. Finally, aggregate and count the customers:
select period,
count(distinct c_prev.customer_wid),
count(distinct c_next.customer_wid)
from unnest(generate_date_array(date '2020-01-01', date '2021-01-01', interval '1 month')) period join
customers c_prev
on c_prev.order_date <= period and
c_prev.order_date > date_add(period, interval -12 month) left join
customers c_next
on c_next.customer_wid = c_prev.customer_wid and
c_next.order_date > period and
c_next.order_date <= date_add(period, interval 12 month)
group by period;

SQL - Get column value based on another column's average between select rows

I've got a table something like..
[DateValueField][Hour][Value]
2014-09-01 1 200
...
2014-09-01 24 400
2014-09-02 1 220
...
2014-09-02 24 200
...
I need the same value for each DateValueField based on the average Value for Hour between 6-12 for example but have that display for all hours, not just 6-12. For instance...
[DateValueField][Hour][Value]
2014-09-01 1 300
...
2014-09-01 24 300
2014-09-02 1 190
...
2014-09-02 24 190
...
Query I'm trying is...
select DateValueField, Hour,
(select avg(Value) as Value from MyTable where Hour
between 6 and 12) as Value from MyTable
where DateValueField between '2014' and '2015'
group by DateValueField, Hour
order by DateValueField, Hour
But it gives me the Value as an average of ALL Values but I need it averaged out for that particular day between the hours I specify.
I'd appreciate some help/advice. Thanks!
You can use a derived table to get the average value between hours 6 and 12 grouped by date and then join that to your original table
select t1.DateValueField, t1.Hour, t2.avg_value
from MyTable t1
join (
select DateValueField, avg(Value) avg_value
from MyTable
where hour between 6 and 12
group by DateValueField
) t2 on t2.DateValueField = t1.DateValueField
order by t1.DateValueField, t1.Hour
Note: You may want to use a left join if some of your dates don't have values between hours 6 and 12 but you still want to retrieve all rows from MyTable.

Calculate MonthlyVolume - PIVOT SQL

I am using SQL server 2008 and I have following Table with millions of rows...Here are few sample records
Serial_Num ReadingDate M_Counter Dyn_Counter
XYZ 3/15/2014 100 190
XYZ 4/18/2014 140 240
XYZ 5/18/2014 200 380
ABC 3/12/2014 45 40
ABC 4/19/2014 120 110
ABC 5/21/2014 130 155
This table will always have only one reading for each month and no missing months....
and I would like calculate M_Counter and Dyn_Counter values for each month, For an example XYZ -> May month calculated counter value should be 60 = 200 (05/18/2014 value) - 140 (04/18/2014 value). I would like to insert data into another table in following way.
CalculatedYear CalculatedMonth Serial_Num M_Counter_Calc Dyn_Counter_Calc
2014 4 XYZ 40 50
2014 5 XYZ 60 140
2014 4 ABC 75 70
2014 5 ABC 10 45
Any help really appreciated!
If you're using MS SQL, something like this should work. The concept is to sort the dataset based on Serial_Num and ReadingDate. Add a sequential Row ID and store into a temp table. Join the table onto itself such that you match up the current row with the previous row where the serial numbers still match. If there wasn't a prior month's reading, the value will be null. We use Isnull( x, 0) to account for this when doing the calculations.
declare #Temp1 table
(
RowID int,
Serial_Num varchar(3),
ReadingDate datetime,
M_Counter int,
Dyn_Counter int
)
insert into #Temp1
select ROW_NUMBER() over (order by Serial_Num, ReadingDate), *
from MyTable T
select
Year(T1.ReadingDate) As CalculatedYear,
Month(T1.ReadingDate) as CalculatedMonth,
T1.Serial_Num,
T1.M_Counter - ISNULL(T2.M_Counter,0) as Calculated_M_Counter,
T1.Dyn_Counter - isnull(T2.Dyn_Counter,0) as Calculated_Dyn_Counter
from #Temp1 T1
left outer join #Temp1 T2 on T1.RowID = T2.RowID + 1 and T1.Serial_Num = T2.Serial_Num
order by T1.Serial_Num, Year(T1.ReadingDate), Month(T1.ReadingDate)