BigQuery: new column - sql

I have the following table structure
+----+-------------+------------+
| id | transaction | time |
+----+-------------+------------+
| 1 | 10 | 01.01.2018 |
| 1 | 20 | 10.01.2018 |
| 2 | 20 | 05.01.2018 |
| 2 | 30 | 15.01.2018 |
| 2 | 5 | 03.02.2018 |
+----+-------------+------------+
What I want to do now is calculate the sum of transaction for each id. However, I would like a rolling sum for each period of time, say each month, separately. So I would like to end up with something like:
+----+-------+-------+
| id | sum_1 | sum_2 |
+----+-------+-------+
| 1 | 30 | 30 |
| 2 | 50 | 55 |
+----+-------+-------+
So that means I would like to group time monthly and calculate the sum for each id up to that point. So it's not a classic partition, I assume. Of course I could compute each period separately and then join, but as I have quite a lot of monthly or maybe weekly partitions, this might not be feasible. Maybe someone has an idea.

Below is an example for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 10 transaction, '01.01.2018' time UNION ALL
SELECT 1, 20, '10.01.2018' UNION ALL
SELECT 2, 20, '05.01.2018' UNION ALL
SELECT 2, 30, '15.01.2018' UNION ALL
SELECT 2, 5, '03.02.2018'
)
SELECT id, month,
SUM(transactions) OVER(PARTITION BY id ORDER BY month) rolling_transactions
FROM (
SELECT id,
DATE_TRUNC(PARSE_DATE('%d.%m.%Y', time), MONTH) month,
SUM(transaction) transactions
FROM `project.dataset.table`
GROUP BY id, month
)
ORDER BY id, month
with the result:
| Row | id | month      | rolling_transactions |
| --- | -- | ---------- | -------------------- |
| 1   | 1  | 2018-01-01 | 30                   |
| 2   | 2  | 2018-01-01 | 50                   |
| 3   | 2  | 2018-02-01 | 55                   |
A flattened result like this is preferable because it scales to any number of months, weeks, or whatever other time period you need; you can then pivot the result further in your application.
Note: for the weekly case, just change MONTH to WEEK in DATE_TRUNC.
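The two steps of the query above (aggregate per id and month, then accumulate per id in month order) can be checked outside the database with a minimal Python sketch. The function name `rolling_monthly_sums` and the tuple layout are assumptions made for illustration; they are not part of the original answer.

```python
from collections import defaultdict
from datetime import datetime


def rolling_monthly_sums(rows):
    """rows: iterable of (id, transaction, 'dd.mm.yyyy') tuples (hypothetical layout)."""
    # Step 1: total per (id, month), mirroring the inner GROUP BY id, month.
    monthly = defaultdict(int)
    for id_, amount, time in rows:
        month = datetime.strptime(time, "%d.%m.%Y").strftime("%Y-%m")
        monthly[(id_, month)] += amount

    # Step 2: running total per id in month order, mirroring
    # SUM(transactions) OVER (PARTITION BY id ORDER BY month).
    running = defaultdict(int)
    result = []
    for (id_, month), total in sorted(monthly.items()):
        running[id_] += total
        result.append((id_, month, running[id_]))
    return result


rows = [
    (1, 10, "01.01.2018"),
    (1, 20, "10.01.2018"),
    (2, 20, "05.01.2018"),
    (2, 30, "15.01.2018"),
    (2, 5, "03.02.2018"),
]
print(rolling_monthly_sums(rows))
```

For the sample rows this yields the same three (id, month, running total) rows as the query result, with the month shown as YYYY-MM instead of a truncated date.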

Related

recursive moving average with sql

Suppose we have the sample table created by the script at the end of this question.
What I need is:
First iteration: calculate the moving average over the last 5 days, including the last day: (2+1+2+3+4)/5 = 2.4. "Save" this result; it becomes the prediction for the next day.
Second iteration: calculate the moving average over the last 5 days again, where the last day's basal_sell is the value calculated in the previous iteration: (1+2+3+4+2.4)/5 = 2.48.
..
And so on. The recursion stops at a given future day, for example 2022-12-09.
Desired output for the future day 2022-12-09:
| date_ | art_id | basal_sell |
| ------------| -----------|------------|
| 2022-12-01 | 1 | 2 |
| 2022-12-02 | 1 | 1 |
| 2022-12-03 | 1 | 2 |
| 2022-12-04 | 1 | 3 |
| 2022-12-05 | 1 | 4 |
| 2022-12-06 | 1 | 2.4 |
| 2022-12-07 | 1 | 2.48 |
| 2022-12-08 | 1 | 2.776 |
| 2022-12-09 | 1 | 2.9312 |
This is a partial problem; the real problem has many art_ids, but I think the idea behind this partial problem will solve the bigger one with a few small changes.
What I have in mind:
I thought of a recursive CTE whose recursive part unions the temporary table with the new row that I calculated.
Something like:
with MiCte as (
select *
from sells
union all
(
select * from MiCte
)
union
(
select dateadd(day, 1, date_), art_id, basal_sell
from(
select top 1 c.date_, c.art_id,
AVG(c.basal_sell) OVER (partition by c.art_id
ORDER BY c.date_
rows BETWEEN 4 PRECEDING AND current row) basal_sell
from MiCte c
order by c.date_ desc
) as tmp
)
) select * from MiCte
Obviously, if I have more than one art_id I have to take that into account when taking the top 1 (I still couldn't work out how to solve that).
The example table:
CREATE TABLE sells
(date_ DATETIME,
art_id int,
basal_sell int)
;
INSERT INTO sells
(date_, art_id , basal_sell)
VALUES ('2022-12-1', 1, 2),
('2022-12-2', 1, 1),
('2022-12-3', 1, 2),
('2022-12-4', 1, 3),
('2022-12-5', 1, 4);
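The iteration described in the question (average the last 5 values, append the result, repeat) can be sketched in plain Python to confirm the expected numbers. This is only an illustration of the arithmetic for a single art_id, not a SQL solution; the name `forecast_moving_average` and its parameters are hypothetical.

```python
def forecast_moving_average(history, steps, window=5):
    """Extend `history` by `steps` values, each the mean of the last `window` values.

    Assumption: values are ordered by date and belong to one art_id.
    """
    values = list(history)
    for _ in range(steps):
        recent = values[-window:]
        # Each new prediction feeds into the next iteration's window.
        values.append(sum(recent) / len(recent))
    return values


# basal_sell for 2022-12-01 .. 2022-12-05, then 4 predicted days up to 2022-12-09
series = forecast_moving_average([2, 1, 2, 3, 4], steps=4)
print([round(v, 4) for v in series[5:]])
```

With several art_ids, the same function would be applied to each art_id's history separately.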

How do I summarize sales data in SQL by month for last 24 months?

I have a large number of rows with sales for different products on various days.
I want to retrieve the sum for each product per month, for the last 24 months.
How do I write a WHERE clause that selects the last 24 months (based on the latest date in the table, not the current date)?
How is the result summarized and shown by month instead of by individual days like 2018-01-24?
**Sample Data Table**
| SalesDate | Product | SLSqty |
| 2018-01-24 | Product A | 25 |
| 2019-06-10 | Product B | 10 |
| 2019-10-07 | Product C | 4 |
| 2020-03-05 | Product A | 20 |
| 2021-09-01 | Product A | 50 |
| 2021-09-01 | Product B | 10 |
| 2021-09-02 | Product C | 3 |
| 2021-09-04 | Product A | 50 |
| 2021-09-07 | Product B | 10 |
**Expected Result**
| SalesMONTH | Product | SLSqty |
| 2019-10-31 | Product C | 4 |
| 2020-03-31 | Product A | 20 |
| 2021-09-30 | Product A | 100 |
| 2021-09-30 | Product B | 20 |
| 2021-09-30 | Product C | 3 |
I would create a variable that stores the latest date in your table. Then you can use the variable in your WHERE clause.
IF OBJECT_ID('TEMPDB..#TEMP') IS NOT NULL
DROP TABLE #TEMP
CREATE TABLE #TEMP(
[SalesDate] DATE
,[product] NVARCHAR(20)
,[SLSqty] INT
)
INSERT INTO #TEMP([SalesDate],[product],[SLSqty])
VALUES('2018-01-24','Product A',25)
,('2019-06-10','Product B',10)
,('2019-10-07','Product C',4 )
,('2020-03-05','Product A',20)
,('2021-09-01','Product A',50)
,('2021-09-01','Product B',10)
,('2021-09-02','Product C',3 )
,('2021-09-04','Product A',50)
,('2021-09-07','Product B',10)
DECLARE @DATEVAR AS DATE = (SELECT MAX(#TEMP.SalesDate) FROM #TEMP)
The last line declares the variable. If you select @DATEVAR, you get a single date defined by the SELECT statement (2021-09-07 for the sample data).
Then you use it in the WHERE clause. Since you want the 24 months prior to the latest date, I would use the DATEDIFF(MONTH,,) function in the WHERE clause. It returns an integer number of months, and you simply constrain it to be 24 or less.
SELECT #TEMP.SalesDate
,#TEMP.product
,#TEMP.SLSqty
,DATEDIFF(MONTH,#TEMP.SalesDate,@DATEVAR) [# of months Diff]
FROM #TEMP
WHERE DATEDIFF(MONTH,#TEMP.SalesDate,@DATEVAR) <= 24
Now you have to aggregate the sales grouped by year-month and product.
I compute year-month as an integer like 202109 (Sept. 2021).
SELECT --#TEMP.SalesDate --(YOU HAVE TO TAKE THIS OUT FOR THE GROUP BY)
YEAR(#TEMP.SalesDate)*100+MONTH(#TEMP.SalesDate) [year-month for GROUP BY]
,#TEMP.product
,SUM(#TEMP.SLSqty) SLSqty
-- ,DATEDIFF(MONTH,#TEMP.SalesDate,@DATEVAR) [# of months Diff] --(YOU HAVE TO TAKE THIS OUT FOR THE GROUP BY)
FROM #TEMP
WHERE DATEDIFF(MONTH,#TEMP.SalesDate,@DATEVAR) <= 24
GROUP BY YEAR(#TEMP.SalesDate)*100+MONTH(#TEMP.SalesDate)
,#TEMP.product
Here is an Oracle SQL version (using date literals so the sample data does not depend on the session's NLS date format):
With data ( SalesDate,Product,SLSqty)as(
Select date '2018-01-24','Product A',25 from dual union all
Select date '2019-06-10','Product B',10 from dual union all
Select date '2019-10-07','Product C',4 from dual union all
Select date '2020-03-05','Product A',20 from dual union all
Select date '2021-09-01','Product A',50 from dual union all
Select date '2021-09-01','Product B',10 from dual union all
Select date '2021-09-02','Product C',3 from dual union all
Select date '2021-09-04','Product A',50 from dual union all
Select date '2021-09-07','Product B',10 from dual),
theLatest(SalesDate) as(
select max(SalesDate) from data
)
select to_char(d.SalesDate,'YYYY-MM'),d.Product, sum(SLSqty)
from data d
Join theLatest on d.SalesDate >= add_months(theLatest.SalesDate,-24)
group by to_char(d.SalesDate,'YYYY-MM'),d.Product
order by to_char(d.SalesDate,'YYYY-MM')
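The logic shared by both answers (find the latest date in the table, step back 24 months, then total per year-month and product) can be sketched in Python to check the expected result. The helper names `months_back` and `monthly_totals` are hypothetical, and the cutoff here is an exact calendar date as in the Oracle query; the DATEDIFF(MONTH, ...) filter counts month boundaries instead, so it can admit a few extra days at the start of the window.

```python
import calendar
from collections import defaultdict
from datetime import date


def months_back(d, months):
    """Shift d back by `months` calendar months, clamping the day to the month's end."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def monthly_totals(rows, months=24):
    """rows: list of (sales_date, product, qty). Returns sorted ((year-month, product), total)."""
    latest = max(sales_date for sales_date, _, _ in rows)  # latest date in the data, not today
    cutoff = months_back(latest, months)
    totals = defaultdict(int)
    for sales_date, product, qty in rows:
        if sales_date >= cutoff:
            totals[(sales_date.strftime("%Y-%m"), product)] += qty
    return sorted(totals.items())


sales = [
    (date(2018, 1, 24), "Product A", 25),
    (date(2019, 6, 10), "Product B", 10),
    (date(2019, 10, 7), "Product C", 4),
    (date(2020, 3, 5), "Product A", 20),
    (date(2021, 9, 1), "Product A", 50),
    (date(2021, 9, 1), "Product B", 10),
    (date(2021, 9, 2), "Product C", 3),
    (date(2021, 9, 4), "Product A", 50),
    (date(2021, 9, 7), "Product B", 10),
]
for key, total in monthly_totals(sales):
    print(key, total)
```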

SQL Query to apply a command to multiple rows

I am new to SQL and am stuck trying to write a statement similar to a 'for loop' in other languages. I want to keep only the rows of the table where, for all rows sharing the same attribute1, attribute2 = attribute3, without using functions.
For example:
| Year | Month | Day|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 4 |
| 2 | 3 | 3 |
| 2 | 4 | 4 |
| 3 | 4 | 4 |
| 3 | 4 | 4 |
| 3 | 4 | 4 |
I would only want the row
| Year | Month | Day|
|:---- |:------:| -----:|
| 3 | 4 | 4 |
because it is the only one where month and day are equal for all of the values of year they share.
So far I have
select year, month, day from dates
where month=day
but I am unsure how to apply the constraint across all rows of a year.
-- month/day need to appear in aggregate functions (since they are not in the GROUP BY clause),
-- but the HAVING clause ensure we only have 1 month/day value (per year) here, so MIN/AVG/SUM/... would all work too
SELECT year, MAX(month), MAX(day)
FROM my_table
GROUP BY year
HAVING COUNT(DISTINCT (month, day)) = 1;
| year | max | max |
| 3 | 4 | 4 |
So one way would be
select distinct [year], [month], [day]
from [Table] t
where [month]=[day]
and not exists (
select * from [Table] x
where t.[year]=x.[year] and t.[month] <> x.[month] and t.[day] <> x.[day]
)
And another way would be
select distinct [year], [month], [day] from (
select *,
Lead([month],1) over(partition by [year] order by [month])m2,
Lead([day],1) over(partition by [year] order by [day])d2
from [table]
)x
where [month]=m2 and [day]=d2
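To check what the expected answer means on the sample data, here is a small Python sketch of the grouping idea: collect the distinct (month, day) pairs per year and keep a year only when there is exactly one pair and its month equals its day. Combining the asker's month = day filter with the single-pair check is an assumption about the intended rule, and the function name is hypothetical.

```python
from collections import defaultdict


def years_with_single_equal_pair(rows):
    """rows: iterable of (year, month, day). Returns [(year, month, day), ...]."""
    pairs = defaultdict(set)
    for year, month, day in rows:
        pairs[year].add((month, day))

    result = []
    for year, seen in sorted(pairs.items()):
        # Like HAVING COUNT(DISTINCT (month, day)) = 1, plus the month = day check.
        if len(seen) == 1:
            month, day = next(iter(seen))
            if month == day:
                result.append((year, month, day))
    return result


dates = [
    (1, 1, 1), (1, 2, 2), (1, 4, 4),
    (2, 3, 4), (2, 3, 3), (2, 4, 4),
    (3, 4, 4), (3, 4, 4), (3, 4, 4),
]
print(years_with_single_equal_pair(dates))
```

On the sample rows only year 3 survives, which matches the single row the question asks for.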

calculate running total with a date range in where clause using oracle sql

I have a table and I want to compute the running total with a date range in the WHERE clause:
+----------------+------------------+----------+
| Transaction ID | Transaction Date | Quantity |
+----------------+------------------+----------+
| 1 | 03-May-20 | 3 |
| 2 | 06-May-20 | 5 |
| 3 | 05-Jun-20 | 10 |
| 4 | 06-Jul-20 | 2 |
| 5 | 07-Aug-20 | 8 |
+----------------+------------------+----------+
Now my Oracle SQL query is
select
transaction_id,
transaction_date,
sum(quantity) over (order by transaction_date) as running_total
from table
where transaction_date >= '03-May-20' and transaction_date <= '06-Jul-20'
After I run the query above I got the result below
+----------------+------------------+----------+---------------+
| Transaction ID | Transaction Date | Quantity | Running Total |
+----------------+------------------+----------+---------------+
| 1 | 03-May-20 | 3 | 28 |
| 2 | 06-May-20 | 5 | 33 |
| 3 | 05-Jun-20 | 10 | 43 |
| 4 | 06-Jul-20 | 2 | 45 |
+----------------+------------------+----------+---------------+
Which is wrong because the output I want is:
+----------------+------------------+----------+---------------+
| Transaction ID | Transaction Date | Quantity | Running Total |
+----------------+------------------+----------+---------------+
| 1 | 03-May-20 | 3 | 3 |
| 2 | 06-May-20 | 5 | 8 |
| 3 | 05-Jun-20 | 10 | 18 |
| 4 | 06-Jul-20 | 2 | 20 |
+----------------+------------------+----------+---------------+
Please help: what query should I use to compute the running total with a date range in the WHERE clause?
I tried to use the to_date function, but I got the same result as above (second table). Please see the query below:
select
transaction_id,
transaction_date,
sum(quantity) over (order by transaction_date) as running_total
from table
where transaction_date between TO_DATE('2019-03-01', 'YYYY-MM-DD') AND TO_DATE('2020-10-25', 'YYYY-MM-DD')
Thanks,
Output you want (sample data in lines #1 - 7; query begins at line #8).
SQL> with test (transaction_id, transaction_date, quantity) as
2 (select 1, date '2020-05-03', 3 from dual union all
3 select 2, date '2020-05-06', 5 from dual union all
4 select 3, date '2020-06-05', 10 from dual union all
5 select 4, date '2020-07-06', 2 from dual union all
6 select 5, date '2020-08-07', 8 from dual
7 )
8 select transaction_id,
9 transaction_date,
10 quantity,
11 sum(quantity) over (order by transaction_date) running_total
12 From test
13 where transaction_date between date '2020-05-03' and date '2020-07-06'
14 order by transaction_id;
TRANSACTION_ID TRANSACTIO QUANTITY RUNNING_TOTAL
-------------- ---------- ---------- -------------
1 03.05.2020 3 3
2 06.05.2020 5 8
3 05.06.2020 10 18
4 06.07.2020 2 20
SQL>
The reason for your problem? I presume it is the fact that you're comparing dates to strings ('03-May-20' is a string, not a date). Oracle tries to convert datatypes implicitly; sometimes it succeeds, sometimes not. Your 03-May-20 might be the 3rd of May 2020 or the 20th of May 2003, for example.
Always keep control over your data. Use dates where necessary, either with the to_date function and an appropriate date format mask, or with a date literal (as I did).
Your query is "correct" but your results are not: they contain the expected Running Total, but it starts from the final Total (including transaction id 5). You ran a different SQL statement.
Note that you shouldn't be storing dates as strings or comparing them to strings. That said, Oracle will be saving your bacon if your current NLS date format setting (which is very easy to change) matches your string input. You would get completely different results if you were ordering by a string; your results show the expected transaction date values, so one would assume that the comparison has worked. You should still fix your handling of dates, but that is not the problem here.
Something else is going on. The data is being summed correctly in terms of dates, so the date ordering does not appear to be an issue.
The total sum is starting at the overall sum minus the first value. So, the results look like the result of this query:
select transaction_id, transaction_date, quantity,
total_quantity + sum(quantity) over (order by transaction_date)as running_total
From (select t.*,
sum(quantity) over () - first_value(quantity) over (order by transaction_date) as total_quantity
from test t
) t
where transaction_date between date '2020-05-03' and date '2020-07-06'
order by transaction_date;
I cannot explain why this would be happening. It might be a bug in Oracle. Or it might be the result of an oversimplification of your query when you asked the question.
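As a sanity check on the expected numbers, here is a minimal Python sketch of what the original query should do: apply the date filter first, then accumulate the quantity in date order, which is the order in which a WHERE clause and a window function are evaluated. The function name `running_total` is hypothetical, and the sketch assumes no two rows share a transaction date (with ties, the default window frame would give tied rows the same total).

```python
from datetime import date
from itertools import accumulate


def running_total(rows, start, end):
    """rows: list of (transaction_id, transaction_date, quantity)."""
    # The WHERE clause is applied before the window function sees any rows.
    kept = sorted((r for r in rows if start <= r[1] <= end), key=lambda r: r[1])
    totals = accumulate(qty for _, _, qty in kept)
    return [(tid, d, qty, total) for (tid, d, qty), total in zip(kept, totals)]


transactions = [
    (1, date(2020, 5, 3), 3),
    (2, date(2020, 5, 6), 5),
    (3, date(2020, 6, 5), 10),
    (4, date(2020, 7, 6), 2),
    (5, date(2020, 8, 7), 8),
]
for row in running_total(transactions, date(2020, 5, 3), date(2020, 7, 6)):
    print(row)
```

The running totals come out as 3, 8, 18, 20; transaction 5 is excluded by the filter and never contributes to the sum.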

Oracle SQL Conditional Arithmetic

I have a data set that lists an employee id, code, hours and wages. Any one employee can have one row of either OT1 or OT2, or one row of each. The short of it is that I need to sum all of the wages, but if they have both codes, take only the hours for OT1. Then I want to divide the total wages by those hours. Example data:
+--------+------+-------+--------+
| ID     | CODE | HOURS | AMOUNT |
+--------+------+-------+--------+
| 123456 | OT1  |    10 |     80 |
| 789000 | OT1  |     8 |    120 |
| 789000 | OT2  |     8 |     60 |
| 654111 | OT2  |     4 |     40 |
+--------+------+-------+--------+
I'm attempting to add a new column that divides the amount by the hours, and to remove the code column so that each employee is summed into a single record. The catch is, if the employee has both OT1 and OT2, I don't want to sum the hours; I just want the hours from OT1. That logic manually applied to my previous example:
+--------+-------+--------+---------+
| ID     | HOURS | AMOUNT | AVERAGE |
+--------+-------+--------+---------+
| 123456 |    10 |     80 |       8 |
| 789000 |     8 |    180 |    22.5 |
| 654111 |     4 |     40 |      10 |
+--------+-------+--------+---------+
You can do this using conditional aggregation:
select id,
coalesce(sum(case when code = 'OT1' then hours end),
sum(hours)
) as hours,
sum(amount) as amount,
(sum(amount) /
coalesce(sum(case when code = 'OT1' then hours end),
sum(hours)
)
) as average
from t
group by id
order by id;
This method explicitly combines values from multiple rows, so it should work as expected if there are duplicates.
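The conditional aggregation can be traced by hand with a short Python sketch: per employee, sum the OT1 hours separately, fall back to all hours when there are no OT1 rows (the COALESCE), and divide the summed amount by the chosen hours. The function name `summarize` and the dictionary keys are hypothetical.

```python
def summarize(rows):
    """rows: iterable of (id, code, hours, amount). Returns [(id, hours, amount, average)]."""
    by_id = {}
    for emp_id, code, hours, amount in rows:
        entry = by_id.setdefault(emp_id, {"ot1_hours": None, "all_hours": 0, "amount": 0})
        if code == "OT1":
            # Like SUM(CASE WHEN code = 'OT1' THEN hours END): stays None without OT1 rows.
            entry["ot1_hours"] = (entry["ot1_hours"] or 0) + hours
        entry["all_hours"] += hours
        entry["amount"] += amount

    result = []
    for emp_id, e in sorted(by_id.items()):
        # COALESCE(ot1_hours, all_hours)
        hours = e["ot1_hours"] if e["ot1_hours"] is not None else e["all_hours"]
        result.append((emp_id, hours, e["amount"], e["amount"] / hours))
    return result


overtime = [
    (123456, "OT1", 10, 80),
    (789000, "OT1", 8, 120),
    (789000, "OT2", 8, 60),
    (654111, "OT2", 4, 40),
]
for row in summarize(overtime):
    print(row)
```

Employee 789000 ends up with 8 hours (OT1 only), an amount of 180, and an average of 22.5, as in the expected table.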
You get the hours for the first code with Oracle's KEEP FIRST:
select
id,
min(hours) keep (dense_rank first order by code) as hours,
sum(amount) as amount,
round(sum(amount) / min(hours) keep (dense_rank first order by code), 2) as average
from mytable
group by id
order by id;