Calculate exact month-difference between two dates - sql

CREATE TABLE inventory
(
id SERIAL PRIMARY KEY,
inventory_date DATE,
product_name VARCHAR(255),
product_value VARCHAR(255)
);
INSERT INTO inventory (inventory_date, product_name, product_value)
VALUES ('2020-10-19', 'Product_A', '400'),
('2020-10-22', 'Product_B', '400'),
('2020-11-20', 'Product_C', '900'),
('2020-11-25', 'Product_D', '300');
Expected result:
product_name | months_in_inventory
-------------+--------------------
Product_A | 2
Product_B | 1
Product_C | 1
Product_D | 0
I want to calculate months_in_inventory as the difference between a fixed_date and the inventory_date.
In the example the fixed_date is '2020-12-20' and I am using it in my query.
So far I am able to calculate the difference in days:
SELECT
iv.product_name,
'2020-12-20'::date - MAX(iv.inventory_date::date) AS days_in_inventory
FROM
inventory iv
GROUP BY
1
ORDER BY
1;
However, I could not figure out how to change it to a difference in months. Do you have any idea?
NOTE
I know that one way to approach this would be extracting the month from the fixed_date and the inventory_date and subtracting the two numbers. However, this would not give me the correct result because I need the difference to be exact, based on the full dates.
For example, Product_B has been in inventory for only 1 month because 2020-10-22 is not yet two full months before 2020-12-20.

You can use age(). If the value is always less than 12 months, then one method is:
SELECT iv.product_name,
extract(month from age('2020-12-20'::date, MAX(iv.inventory_date::date))) AS months_in_inventory
FROM inventory iv
GROUP BY 1
ORDER BY 1;
A more accurate calculation takes the year into account:
SELECT iv.product_name,
(extract(year from age('2020-12-20'::date, MAX(iv.inventory_date::date))) * 12 +
extract(month from age('2020-12-20'::date, MAX(iv.inventory_date::date)))
) AS months_in_inventory
FROM inventory iv
GROUP BY 1
ORDER BY 1;
Here is a db<>fiddle.
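The same whole-month logic can be sketched outside the database. This is a minimal Python illustration, not the PostgreSQL implementation of age(): the helper name full_months_between is made up for this example, it assumes start is not later than end, and it may differ from age() for end-of-month edge cases. It counts the calendar months between the two dates and drops the last one when the day-of-month has not been reached yet.

```python
from datetime import date


def full_months_between(start: date, end: date) -> int:
    """Count whole calendar months elapsed from start to end (start <= end assumed)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # If end's day-of-month has not reached start's day-of-month,
    # the last month is incomplete and must not be counted.
    if end.day < start.day:
        months -= 1
    return months


fixed_date = date(2020, 12, 20)

# Sample rows from the question.
inventory = {
    "Product_A": date(2020, 10, 19),
    "Product_B": date(2020, 10, 22),
    "Product_C": date(2020, 11, 20),
    "Product_D": date(2020, 11, 25),
}

months_in_inventory = {
    name: full_months_between(inv_date, fixed_date)
    for name, inv_date in inventory.items()
}
print(months_in_inventory)
```

For the four sample rows this reproduces the expected result from the question (2, 1, 1, 0).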

Related

Unpivoting for large dataset and greater number of unique columns

The pivot and unpivot functions in Snowflake are not efficient for turning 30+ unique columns into rows.
Use case: I have 35 different month columns that need to become rows, and another 35 columns hold the quantity for the corresponding month.
So in the end there will be 2 columns (one for the month and another for the quantity) in place of the 70 unique columns.
The quantity will then be aggregated by month.
But unpivoting is not at all efficient. The query below scans 15 GB of data from the main table:
select part_num ,concat(date_part(year, dates),'-',date_part(month, dates)) as month_year,
sum(quantity) as quantities
from table_name
unpivot(dates for cols in (month_1, 30 other unique cols)),
unpivot(quantity for cols in (qunatity_1, 30 other unique cols)),
group by part_num, month_year
Is there any other approach to unpivot a large dataset?
Thanks
An alternative approach could be conditional aggregation:
with cte as (
select part_num
,concat(date_part(year, dates),'-',date_part(month, dates)) as month_year
,sum(quantity) as quantities
from table_name
group by part_num, month_year
)
SELECT part_num
-- lowest date
,'2020-01' AS "2020-01"
,MAX(IFF(month_year='2020-01', quantities, NULL)) AS "quantities_2020-01"
-- next date
,...
-- last date
,'2022-04' AS "2022-04"
,MAX(IFF(month_year='2022-04', quantities, NULL)) AS "quantities_2022-04"
FROM cte
GROUP BY part_num;
Version using single GROUP BY and TO_VARCHAR with format:
SELECT part_num
-- lowest date
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2020-01','2020-01',NULL)) AS "2020-01"
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2020-01',quantities,NULL)) AS "quantities_2020-01"
-- next date
,...
-- last date
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2022-04','2022-04',NULL)) AS "2022-04"
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2022-04',quantities,NULL)) AS "quantities_2022-04"
FROM table_name
GROUP BY part_num;
So let's take some example data and test whether what is happening is what is wanted.
Here is a trivial and tiny CTE worth of data:
with table_name(part_num, month_1, month_2, month_3, qunatity_1, qunatity_2, qunatity_3) as (
select * from values
(1, '2022-01-01'::date, '2022-02-01'::date, '2022-03-01'::date, 4, 5, 6)
)
Now pointing your SQL at it (after making it compile):
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,sum(quantity) as quantities
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
unpivot(quantity for quan in (qunatity_1, qunatity_2, qunatity_3))
group by part_num, month_year
gives:
PART_NUM | MONTH_YEAR | QUANTITIES
---------+------------+-----------
1        | 2022-01    | 15
1        | 2022-02    | 15
1        | 2022-03    | 15
which is not what I think you are after.
If we look at the unaggregated rows:
PART_NUM | MONTH   | DATES      | QUAN       | QUANTITY
---------+---------+------------+------------+---------
1        | MONTH_1 | 2022-01-01 | QUNATITY_1 | 4
1        | MONTH_1 | 2022-01-01 | QUNATITY_2 | 5
1        | MONTH_1 | 2022-01-01 | QUNATITY_3 | 6
1        | MONTH_2 | 2022-02-01 | QUNATITY_1 | 4
1        | MONTH_2 | 2022-02-01 | QUNATITY_2 | 5
1        | MONTH_2 | 2022-02-01 | QUNATITY_3 | 6
1        | MONTH_3 | 2022-03-01 | QUNATITY_1 | 4
1        | MONTH_3 | 2022-03-01 | QUNATITY_2 | 5
1        | MONTH_3 | 2022-03-01 | QUNATITY_3 | 6
We are getting a cross join, which I do not believe is what you want.
My understanding is that you want a one-to-one relationship between month (1-35) and quantity (1-35),
thus a result like:
PART_NUM | MONTH   | DATES      | QUAN       | QUANTITY
---------+---------+------------+------------+---------
1        | MONTH_1 | 2022-01-01 | QUNATITY_1 | 4
1        | MONTH_2 | 2022-02-01 | QUNATITY_2 | 5
1        | MONTH_3 | 2022-03-01 | QUNATITY_3 | 6
Guessed Answer:
My guess at what you really are wanting is:
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,array_construct(qunatity_1, qunatity_2, qunatity_3)[split_part(month,'_',2)::number - 1] as qunatity
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
order by 1,2;
which gives (for the same above CTE data):
PART_NUM | MONTH_YEAR | QUNATITY
---------+------------+---------
1        | 2022-01    | 4
1        | 2022-02    | 5
1        | 2022-03    | 6
Another way to get that guessed answer:
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,sum(iff(split_part(month,'_',2)=split_part(q_name,'_',2), q_val, null)) as qunatity
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
unpivot(q_val for q_name in (qunatity_1, qunatity_2, qunatity_3))
group by 1,2
order by 1,2;
This uses the double unpivot, so it might be slow, but it only aggregates the values whose month and quantity suffixes match. That feels almost as gross as building an array just to rip it apart, but the array version does not need large joins, only some per-row grossness.
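The difference between the cross product and the positional pairing can be sketched in a few lines of Python. This is only an illustration of the logic on the single sample row from the CTE, not of how Snowflake executes UNPIVOT; the variable names cross and paired are made up for this example.

```python
from collections import defaultdict
from datetime import date

# The single sample row from the CTE (column names keep the original spelling).
row = {
    "part_num": 1,
    "month_1": date(2022, 1, 1), "month_2": date(2022, 2, 1), "month_3": date(2022, 3, 1),
    "qunatity_1": 4, "qunatity_2": 5, "qunatity_3": 6,
}
n = 3  # number of month/quantity column pairs

# Double unpivot: every month column meets every quantity column,
# so each month ends up with the sum of all quantities.
cross = defaultdict(int)
for i in range(1, n + 1):
    for j in range(1, n + 1):
        key = (row["part_num"], row[f"month_{i}"].strftime("%Y-%m"))
        cross[key] += row[f"qunatity_{j}"]

# Positional pairing: month_i is matched only with qunatity_i.
paired = [
    (row["part_num"], row[f"month_{i}"].strftime("%Y-%m"), row[f"qunatity_{i}"])
    for i in range(1, n + 1)
]

print(dict(cross))
print(paired)
```

The first structure reproduces the 15-per-month result of the double unpivot, while the second matches the guessed answer (4, 5, 6).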
Assuming your data is already aggregated at the part_num level, you could divide and conquer like this:
with year_month as
(select a.part_num, b.index+1 as month_num, left(b.value,7) as year_month
from my_table a,table(flatten(input=>array_construct(m1,m2,m3...))) b),
quantities as
(select a.part_num, b.index+1 as month_num, b.value::int as quantity
from my_table a,table(flatten(input=>array_construct(q1,q2,q3...))) b)
select a.part_num, a.year_month, b.quantity
from year_month a
join quantities b on a.part_num=b.part_num and a.month_num=b.month_num

Count distinct customers who bought in previous period and not in next period Bigquery

I have a dataset in BigQuery which contains order_date (DATE) and customer_id.
order_date | CustomerID
2019-01-01 | 111
2019-02-01 | 112
2020-01-01 | 111
2020-02-01 | 113
2021-01-01 | 115
2021-02-01 | 119
I am trying to count distinct customer_id values between a month of the previous year and the same month of the current year: for example, from 2019-01-01 to 2020-01-01, then from 2019-02-01 to 2020-02-01. I then want to count who did not buy in the same period of the next year: 2020-01-01 to 2021-01-01, then 2020-02-01 to 2021-02-01.
The output I expect:
order_date| count distinct CustomerID|who not buy in the next period
2020-01-01| 5191 |250
2020-02-01| 4859 |500
2020-03-01| 3567 |349
..........| .... |......
The next periods should not include the previous ones.
I tried the code below, but it works differently:
with customers as (
select distinct date_trunc(date(order_date),month) as dates,
CUSTOMER_WID
from t
where date(order_date) between '2018-01-01' and current_date()-1
)
select
dates,
customers_previous,
customers_next_period
from
(
select dates,
count(CUSTOMER_WID) as customers_previous,
count(case when customer_wid_next is null then 1 end) as customers_next_period,
from (
select prev.dates,
prev.CUSTOMER_WID,
next.dates as next_dates,
next.CUSTOMER_WID as customer_wid_next
from customers as prev
left join customers
as next on next.dates=date_add(prev.dates,interval 1 year)
and prev.CUSTOMER_WID=next.CUSTOMER_WID
) as t2
group by dates
)
order by 1,2
Thanks in advance.
If I understand correctly, you are trying to count values over a window of time, and for that I recommend using window functions.
That said, my recommendation would be:
SELECT DISTINCT
periods,
COUNT(DISTINCT customer_id) OVER last_12mos AS count_customers_last_12_mos
FROM (
SELECT
order_date,
FORMAT_DATE('%Y%m', order_date) AS periods,
customer_id
FROM dataset
)
WINDOW last_12mos AS ( # window of last 12 months without current month
PARTITION BY periods ORDER BY periods DESC
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
)
I believe from this you can build some customizations to improve the aggregations you want.
You can generate the periods using unnest(generate_date_array()). Then use joins to bring in the customers from the previous 12 months and the next 12 months. Finally, aggregate and count the customers:
select period,
count(distinct c_prev.customer_wid),
count(distinct c_next.customer_wid)
from unnest(generate_date_array(date '2020-01-01', date '2021-01-01', interval 1 month)) period join
customers c_prev
on c_prev.order_date <= period and
c_prev.order_date > date_add(period, interval -12 month) left join
customers c_next
on c_next.customer_wid = c_prev.customer_wid and
c_next.order_date > period and
c_next.order_date <= date_add(period, interval 12 month)
group by period;
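The windowing logic of the join-based query can be checked on the six sample rows with a small Python sketch. It is an illustration under assumptions, not BigQuery code: the helpers add_months and period_stats are made up here, periods are assumed to fall on the first of a month, and the second number is computed as the customers from the previous 12 months who do not appear in the next 12 months (the SQL above counts the ones who do appear, so the two are complements).

```python
from datetime import date

# Sample rows from the question: (order_date, customer_id).
orders = [
    (date(2019, 1, 1), 111), (date(2019, 2, 1), 112),
    (date(2020, 1, 1), 111), (date(2020, 2, 1), 113),
    (date(2021, 1, 1), 115), (date(2021, 2, 1), 119),
]


def add_months(d: date, n: int) -> date:
    """Shift d by n months; assumes the day-of-month exists in the target month."""
    idx = d.year * 12 + (d.month - 1) + n
    return d.replace(year=idx // 12, month=idx % 12 + 1)


def period_stats(period: date) -> tuple[int, int]:
    """Return (customers in the previous 12 months, of those: not buying in the next 12)."""
    prev_start = add_months(period, -12)
    next_end = add_months(period, 12)
    prev = {c for d, c in orders if prev_start < d <= period}
    nxt = {c for d, c in orders if period < d <= next_end}
    return len(prev), len(prev - nxt)


stats = {p: period_stats(p) for p in (date(2020, 1, 1), date(2020, 2, 1))}
print(stats)
```

The half-open windows (exclusive start, inclusive end) follow the comparison operators used in the join conditions above.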

SQL query: get total values for each month

I have a table that stores the number of fruits sold on each date.
CREATE TABLE data
(
code VARCHAR2(50) NOT NULL,
amount NUMBER(5) NOT NULL,
DATE VARCHAR2(50) NOT NULL
);
Sample data
code |amount| date
------+------+------------
apple | 1 | 01/01/2010
apple | 2 | 02/02/2010
orange| 3 | 03/03/2010
orange| 4 | 04/04/2010
I need to write a query that lists how many apples and oranges were sold in January and February.
--total apple for jan
select sum(amount) from mg.drum d where date >='01/01/2010' and date < '01/02/2010' and code = 'apple';
--total apple for feb
select sum(amount) from mg.drum d where date >='01/02/2010' and date < '01/03/2010' and code = 'apple';
--total orange for jan
select sum(amount) from mg.drum d where date >='01/01/2010' and date < '01/02/2010' and code = 'orange';
--total orange for feb
select sum(amount) from mg.drum d where date >='01/02/2010' and date < '01/03/2010' and code = 'orange';
If I need to calculate this for more months and more fruits, it gets tedious. Is there a shorter query to write?
Can I at least combine the months into 1 query, so that 1 query gets the total for each month for 1 fruit?
You can use conditional aggregation such as:
SELECT TO_CHAR("date",'MM/YYYY') AS "Month/Year",
SUM( CASE WHEN code = 'apple' THEN amount END ) AS apple_sold,
SUM( CASE WHEN code = 'orange' THEN amount END ) AS orange_sold
FROM data
WHERE "date" BETWEEN date'2010-01-01' AND date'2010-02-28'
GROUP BY TO_CHAR("date",'MM/YYYY')
Note that date is a reserved keyword, so it cannot be used as a column name unless it is quoted.
select sum(amount), //date.month
from mg.drum
group by //date.month
Here, //date.month stands for an expression that returns the month number or name.
If you are dealing with months, then you should include the year as well. I would recommend:
SELECT TRUNC(date, 'MON') as yyyymm, code,
SUM(amount)
FROM t
GROUP BY TRUNC(date, 'MON'), code;
You can add a WHERE clause if you want only some dates or codes.
This will return a separate row for each row that has data. That is pretty close to the results from your four queries -- but this does not return 0 values.
select to_char(date_col,'MONTH') as month, code, sum(amount)
from mg.drum
group by to_char(date_col,'MONTH'), code
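The grouping that these queries perform (one total per month and fruit code) can be sketched in Python. This is only an illustration of the aggregation, not Oracle code; the rows are modeled on the question's sample data with the fruit code spelled "apple", and the dd/mm/yyyy reading of the date strings is an assumption.

```python
from collections import defaultdict
from datetime import datetime

# (code, amount, date string); dd/mm/yyyy format is assumed.
sales = [
    ("apple", 1, "01/01/2010"),
    ("apple", 2, "02/02/2010"),
    ("orange", 3, "03/03/2010"),
    ("orange", 4, "04/04/2010"),
]

# One running total per (year-month, code), like GROUP BY month, code.
totals = defaultdict(int)
for code, amount, day in sales:
    month = datetime.strptime(day, "%d/%m/%Y").strftime("%Y-%m")
    totals[(month, code)] += amount

for (month, code), amount in sorted(totals.items()):
    print(month, code, amount)
```

As with the GROUP BY queries, a month/code combination without any sales simply has no entry rather than a 0.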

Avoid cartesian product using sum

I want to sum up the stake from tickets table, grouping it by customer_id and date_trunc('day') from bonus table.
The problem is that rows are being multiplied and I don't know how to solve it.
https://www.db-fiddle.com/f/yWCvFamMAY9uGtoZupiAQ/4
CREATE TABLE tickets (
ticket_id integer,
customer_id integer,
stake integer,
reg_date date
);
CREATE TABLE bonus (
bonus_id integer,
customer_id integer,
reg_date date
);
insert into tickets
values
(1,100, 12,'2019-01-10 11:00'),
(2,100, 10,'2019-01-10 12:00'),
(3,100, 30,'2019-01-10 13:00'),
(4,100, 10,'2019-01-11 14:00'),
(5,100, 15,'2019-01-11 15:00'),
(6,102, 25,'2019-01-10 10:00'),
(7,102, 25,'2019-01-10 11:10'),
(8,102, 13,'2019-01-11 12:40'),
(9,102, 9,'2019-01-12 15:00'),
(10,102, 7,'2019-01-13 18:00'),
(13,103, 15,'2019-01-12 19:00'),
(14,103, 11,'2019-01-12 22:00'),
(15,103, 11,'2019-01-14 02:00'),
(16,103, 11,'2019-01-14 10:00')
;
insert into bonus
values
(200,100,'2019-01-10 05:00'),
(201,100,'2019-01-10 06:00'),
(202,100,'2019-01-10 15:00'),
(203,100,'2019-01-10 15:50'),
(204,100,'2019-01-10 16:10'),
(205,100,'2019-01-10 16:15'),
(206,100,'2019-01-10 16:22'),
(207,100,'2019-01-11 10:10'),
(208,100,'2019-01-11 16:10'),
(209,102,'2019-01-10 10:00'),
(210,102,'2019-01-10 11:00'),
(211,102,'2019-01-10 12:00'),
(212,102,'2019-01-10 13:00'),
(213,103,'2019-01-11 11:00'),
(214,103,'2019-01-11 18:00'),
(215,103,'2019-01-12 15:00'),
(216,103,'2019-01-12 16:00'),
(217,103,'2019-01-14 02:00');
select
customer_id,
date_trunc('day', b.reg_date),
sum(t.stake)
from tickets t
join bonus b using (customer_id)
where date_trunc('day', b.reg_date) = date_trunc('day', t.reg_date)
group by 1,2
order by 1
Output for customer 102 should be:
102,2019-01-10, 50
OK, I think you want the sum of the stake column in the tickets table for those (customer_id, reg_date) pairs that also appear in the bonus table, and none of this depends on bonus_id. Am I right? The (customer_id, reg_date) pairs in bonus are duplicated, so you need a DISTINCT on them before joining to the summed data from tickets. The complete SQL and result are below:
with stake_sum as (
select
customer_id,
reg_date,
sum(stake)
from
tickets
group by
customer_id,
reg_date
)
,bonus_date_distinct as (
select
distinct customer_id,
reg_date
from
bonus
)
select
a.*
from
stake_sum a
join
bonus_date_distinct b on a.customer_id = b.customer_id and a.reg_date = b.reg_date
order by customer_id, reg_date;
customer_id | reg_date | sum
-------------+------------+-----
100 | 2019-01-10 | 52
100 | 2019-01-11 | 25
102 | 2019-01-10 | 50
103 | 2019-01-12 | 26
103 | 2019-01-14 | 22
(5 rows)
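Why the plain join multiplies the rows, and why aggregating first avoids it, can be sketched in Python on the rows for customer 102. This is an illustration of the logic only; the timestamps are already truncated to the day, and the variable names naive, stake_sum, bonus_days and fixed are made up for this example.

```python
from collections import defaultdict

# Rows for customer 102, with reg_date truncated to the day.
tickets = [  # (ticket_id, customer_id, stake, day)
    (6, 102, 25, "2019-01-10"), (7, 102, 25, "2019-01-10"),
    (8, 102, 13, "2019-01-11"), (9, 102, 9, "2019-01-12"),
    (10, 102, 7, "2019-01-13"),
]
bonus = [  # (bonus_id, customer_id, day)
    (209, 102, "2019-01-10"), (210, 102, "2019-01-10"),
    (211, 102, "2019-01-10"), (212, 102, "2019-01-10"),
]

# Naive join: each ticket is repeated once per matching bonus row.
naive = defaultdict(int)
for _, t_cust, stake, t_day in tickets:
    for _, b_cust, b_day in bonus:
        if t_cust == b_cust and t_day == b_day:
            naive[(t_cust, t_day)] += stake

# Fix: sum the stakes first, then keep only the distinct (customer, day) pairs from bonus.
stake_sum = defaultdict(int)
for _, cust, stake, day in tickets:
    stake_sum[(cust, day)] += stake
bonus_days = {(cust, day) for _, cust, day in bonus}
fixed = {key: total for key, total in stake_sum.items() if key in bonus_days}

print(dict(naive))
print(fixed)
```

The naive join counts each of the two tickets four times (once per bonus row), while the aggregated version yields the expected 50.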

Hourly sum of values

I have a table with the following structure and sample data:
STORE_ID | INS_TIME | TOTAL_AMOUNT
---------+----------+-------------
2        | 07:46:01 | 20
3        | 19:20:05 | 100
4        | 12:40:21 | 87
5        | 09:05:08 | 5
6        | 11:30:00 | 12
6        | 14:22:07 | 100
I need to get the hourly sum of TOTAL_AMOUNT for each STORE_ID.
I tried the following query, but I don't know if it's correct.
SELECT STORE_ID, SUM(TOTAL_AMOUNT) , HOUR(INS_TIME) as HOUR FROM VENDAS201302
WHERE MINUTE(INS_TIME) <=59
GROUP BY HOUR,STORE_ID
ORDER BY INS_TIME;
Not sure why you are not considering different days here. You could get the hourly sum using the DATEPART() function in SQL Server, as below:
SELECT STORE_ID, SUM(TOTAL_AMOUNT) HOURLY_SUM
FROM t1
GROUP BY STORE_ID, datepart(hour,convert(datetime,INS_TIME))
ORDER BY STORE_ID
SELECT STORE_ID,
HOUR(INS_TIME) as HOUR_OF_TIME,
SUM(TOTAL_AMOUNT) as AMOUNT_SUM
FROM VENDAS201302
GROUP BY STORE_ID, HOUR_OF_TIME
ORDER BY INS_TIME;
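The grouping by store and hour can be sketched in Python on the sample rows from the question. This is an illustration of the aggregation only, not MySQL or SQL Server code; it assumes INS_TIME is an HH:MM:SS string and, like the queries above, ignores the day.

```python
from collections import defaultdict

# Sample rows from the question: (STORE_ID, INS_TIME, TOTAL_AMOUNT).
rows = [
    (2, "07:46:01", 20),
    (3, "19:20:05", 100),
    (4, "12:40:21", 87),
    (5, "09:05:08", 5),
    (6, "11:30:00", 12),
    (6, "14:22:07", 100),
]

# One running total per (store, hour), like GROUP BY STORE_ID, HOUR(INS_TIME).
hourly = defaultdict(int)
for store_id, ins_time, amount in rows:
    hour = int(ins_time[:2])  # the first two characters are the hour
    hourly[(store_id, hour)] += amount

for (store_id, hour), total in sorted(hourly.items()):
    print(store_id, hour, total)
```

Store 6 appears twice in the output because its two sales fall in different hours.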