Postgresql: Calculate monthly running total with some months missing - sql

I have a table like this:
| acct| month | total |
-------------------------------
| 123 | 02 | 100
| 123 | 03 | 100
| 123 | 04 | 100
| 123 | 06 | 100
| 123 | 07 | 100
I want to get a running total grouped by acct for each month. However as shown above the table does not have a record for month 5 (basically nothing changed in month 5), but I still want to create a row for month 5 that will be the same as the previous month 4 so that the result looks like:
| acct| month | total |
-------------------------------
| 123 | 02 | 100
| 123 | 03 | 200
| 123 | 04 | 300
| 123 | 05 | 300
| 123 | 06 | 400
| 123 | 07 | 500
Is there any way to do this in PostgreSQL? I've explored using over and partition as described in "Calculating Cumulative Sum in PostgreSQL", but that covers only the case where all months are present.

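Assuming you really want a cumulative sum with missing months, use generate_series() to generate the months, then a left join and a cumulative sum. Cross joining the series with the list of accounts keeps acct filled in for months that have no row (the ::int cast assumes month may be stored as text such as '05'):
select a.acct, gs.mon, sum(t.total) over (partition by a.acct order by gs.mon) as total
from (select distinct acct from t) a cross join
generate_series(2, 7, 1) gs(mon) left join
t
on t.acct = a.acct and t.month::int = gs.mon
order by a.acct, gs.mon;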
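To check the idea end to end, here is a minimal sketch that runs the same pattern against the sample data. It uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, so generate_series() is replaced by a recursive CTE; the function name running_totals and the integer month column are assumptions made for the example.

```python
import sqlite3

def running_totals(rows, first_month, last_month):
    """Return (acct, month, running_total) for every month in the range."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (acct INTEGER, month INTEGER, total INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE months(mon) AS (   -- stand-in for generate_series()
            SELECT ?
            UNION ALL
            SELECT mon + 1 FROM months WHERE mon < ?
        )
        SELECT a.acct, m.mon,
               SUM(t.total) OVER (PARTITION BY a.acct ORDER BY m.mon) AS running_total
        FROM (SELECT DISTINCT acct FROM t) AS a
        CROSS JOIN months AS m
        LEFT JOIN t ON t.acct = a.acct AND t.month = m.mon
        ORDER BY a.acct, m.mon
    """
    result = con.execute(query, (first_month, last_month)).fetchall()
    con.close()
    return result

# Sample data from the question: month 5 is missing.
sample = [(123, 2, 100), (123, 3, 100), (123, 4, 100), (123, 6, 100), (123, 7, 100)]
for row in running_totals(sample, 2, 7):
    print(row)
```

The missing month contributes a NULL total, which SUM() ignores, so month 5 simply repeats the running total of month 4.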

Related

How to get greatest Date record and multiply to that dynamically

Hello everyone, I want to get the record with the greatest Date and multiply its amount dynamically. Below is a sample of my table structure, and here is a DB fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=cde3fdc07915a2e8c23195be646c5a20
+-----+-------------+-----------+--------+----------------+
| ID | Sequence Id | Date | Amount | Frequency |
+-----+-------------+-----------+--------+----------------+
| 123 | 1 | 01-Jan-20 | 50 | Monthly |
| 123 | 2 | 01-Feb-20 | 50 | Monthly |
| 123 | 3 | 01-Mar-20 | 150 | Monthly |
| 123 | 4 | 01-Apr-20 | 200 | Monthly |
| 123 | 5 | 01-May-20 | 510 | Monthly |
| 123 | 1 | 01-Jan-20 | 510 | Quarterly |
| 123 | 2 | 01-Apr-20 | 300 | Quarterly |
| 123 | 1 | 01-Jan-20 | 600 | Semi-Annually |
+-----+-------------+-----------+--------+----------------+
I want to retrieve data dynamically with the help of a filter and multiply the amount according to Frequency: take the record with the greatest Date, then multiply its amount by 12 if the frequency is Monthly, by 4 if it is Quarterly, or by 2 if it is Semi-Annually.
Ex. 1. If we run a query that selects ID, Rent from Table where Date is greater than or equal to 01-Jan-2020 and less than or equal to 01-May-2020 and Frequency equals Monthly, then the output should be:
+-----+-------------+
| ID | Rent |
+-----+-------------+
| 123 | 6,120 |
+-----+-------------+
2. If we run the same query with Date greater than or equal to 01-Jan-2020 and less than or equal to 01-May-2020 and Frequency equal to Quarterly, then the output should be:
+-----+-------------+
| ID | Rent |
+-----+-------------+
| 123 | 1200 |
+-----+-------------+
3. If we run the same query with Date greater than or equal to 01-Jan-2020 and less than or equal to 01-May-2020 and Frequency equal to Semi-Annually, then the output should be:
+-----+-------------+
| ID | Rent |
+-----+-------------+
| 123 | 1200 |
+-----+-------------+
If you want this over multiple ids and frequencies at once, then you can use row_number() like so:
select id,
amount * case frequency
when 'Monthly' then 12
when 'Quarterly' then 4
when 'Semi-Annually' then 2
end as rent
from (
select t.*, row_number() over(partition by id, frequency order by StartDate desc) rn
from table1 t
where StartDate between date '2020-01-01' and date '2020-05-01' and frequency = 'Monthly'
) t
where rn = 1
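The row_number() approach can be tried against the sample rows with the sketch below. It runs on SQLite through Python's sqlite3 module rather than Oracle, and the table name payments, the column start_date, and the helper annual_rent are all assumptions made for the example; dates are stored as ISO strings so that BETWEEN compares them correctly.

```python
import sqlite3

RENT_SQL = """
    SELECT id,
           amount * CASE frequency
                        WHEN 'Monthly' THEN 12
                        WHEN 'Quarterly' THEN 4
                        WHEN 'Semi-Annually' THEN 2
                    END AS rent
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (PARTITION BY id, frequency
                                  ORDER BY start_date DESC) AS rn
        FROM payments AS p
        WHERE start_date BETWEEN ? AND ? AND frequency = ?
    ) AS latest
    WHERE rn = 1          -- keep only the most recent row per id/frequency
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payments (id INTEGER, sequence_id INTEGER, "
            "start_date TEXT, amount INTEGER, frequency TEXT)")
con.executemany("INSERT INTO payments VALUES (?, ?, ?, ?, ?)", [
    (123, 1, "2020-01-01", 50, "Monthly"),
    (123, 2, "2020-02-01", 50, "Monthly"),
    (123, 3, "2020-03-01", 150, "Monthly"),
    (123, 4, "2020-04-01", 200, "Monthly"),
    (123, 5, "2020-05-01", 510, "Monthly"),
    (123, 1, "2020-01-01", 510, "Quarterly"),
    (123, 2, "2020-04-01", 300, "Quarterly"),
    (123, 1, "2020-01-01", 600, "Semi-Annually"),
])

def annual_rent(frequency, start="2020-01-01", end="2020-05-01"):
    """Annualised amount of the latest record in the date range."""
    return con.execute(RENT_SQL, (start, end, frequency)).fetchall()

for freq in ("Monthly", "Quarterly", "Semi-Annually"):
    print(freq, annual_rent(freq))
```

Dropping the frequency filter from the inner query would return one row per id and frequency in a single pass, which is the case row_number() is really meant for.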

SQL | Display all rows where date is not current month

I have two tables
In one table there are my employees and when they changed the Department
In the second table there is my current date
Employee Table
+------------------+--------+-------------+-----------------+
| Personal Number | Salary | Department | MonthWhenJoined |
+------------------+--------+-------------+-----------------+
| 224 | 1000 | HR | 03 |
| 224 | 1500 | R&D | 07 |
| 578 | 1200 | Sales | 04 |
| 578 | 2000 | Engineering | 09 |
| 694 | 1400 | R&D | 04 |
| 694 | 1500 | Sales | 08 |
+------------------+--------+-------------+-----------------+
Table with current Date
+------------+-----+-------+------+
| Date | Day | Month | Year |
+------------+-----+-------+------+
| 01.09.2019 | 01 | 09 | 2019 |
+------------+-----+-------+------+
Now I want to only see all Employee that have no 'MonthWhenJoined' equal to the current Month.
So the Result would be something like this
+------------------+--------+-------------+-----------------+
| Personal Number | Salary | Department | MonthWhenJoined |
+------------------+--------+-------------+-----------------+
| 224 | 1000 | HR | 03 |
| 224 | 1500 | R&D | 07 |
| 694 | 1400 | R&D | 04 |
| 694 | 1500 | Sales | 08 |
+------------------+--------+-------------+-----------------+
I know it cannot be that hard, but I can't figure it out …
Thank you for your help!
In this case I would join the two tables on the month and exclude every employee that has a match:
SELECT * FROM Employee
WHERE personalNumber NOT IN
(SELECT e.personalNumber
FROM Employee e
JOIN currentDate d ON e.MonthWhenJoined = d.month)
Alternatively, with a subquery instead of the join:
SELECT *
FROM Employee
WHERE PersonalNumber NOT IN (
SELECT PersonalNumber
FROM Employee
WHERE MonthWhenJoined = (
SELECT Month
FROM currentDate
)
)
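As a quick check of the NOT IN approach against the sample data, the sketch below uses SQLite through Python's sqlite3 module. The snake_case table and column names (employee, current_month, cur_month) are assumptions chosen to avoid spaces and reserved words. For contrast it also runs a row-by-row anti-join, which keeps the other row of an employee who changed department in the current month.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (personal_number INTEGER, salary INTEGER,
                           department TEXT, month_when_joined TEXT);
    INSERT INTO employee VALUES
        (224, 1000, 'HR', '03'), (224, 1500, 'R&D', '07'),
        (578, 1200, 'Sales', '04'), (578, 2000, 'Engineering', '09'),
        (694, 1400, 'R&D', '04'), (694, 1500, 'Sales', '08');
    CREATE TABLE current_month (cur_date TEXT, cur_day TEXT,
                                cur_month TEXT, cur_year TEXT);
    INSERT INTO current_month VALUES ('2019-09-01', '01', '09', '2019');
""")

# Exclude every employee that has at least one row in the current month.
per_employee = con.execute("""
    SELECT * FROM employee
    WHERE personal_number NOT IN (
        SELECT e.personal_number
        FROM employee AS e
        JOIN current_month AS c ON e.month_when_joined = c.cur_month)
    ORDER BY personal_number, month_when_joined
""").fetchall()

# Row-by-row anti-join: drops only the rows whose month matches.
per_row = con.execute("""
    SELECT e.* FROM employee AS e
    LEFT JOIN current_month AS c ON c.cur_month = e.month_when_joined
    WHERE c.cur_month IS NULL
    ORDER BY e.personal_number, e.month_when_joined
""").fetchall()

print(per_employee)
print(per_row)
```

Only the first query reproduces the result table in the question, because employee 578 has to disappear entirely rather than lose just the month 09 row.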
Simple,
SELECT
E.[Personal Number],
E.[Salary],
E.[Department],
E.[MonthWhenJoined]
FROM
[someSchema].[Employee] E
LEFT JOIN
[someSchema].[CurrentDate] C
ON C.[Month] = E.[MonthWhenJoined]
WHERE
C.[Date] IS NULL;
Of course, there is no way to tell whether that month was in the same year as the current date.
Simple:
SELECT * FROM employee WHERE `MonthWhenJoined` NOT IN(SELECT `Month` FROM date WHERE `Year` = YEAR(CURDATE()));

Filtering after a group by produces a different outcome than MySQL

I have the following table from which I try to extract all cust_id who have bought an item for the first time in January.
I found a way with MySQL but I'm working with Hive and it doesn't work
Consider this table:
| cust_id | created | year | month | item |
|---------|---------------------|------|-------|------|
| 100 | 2017-01-01 19:20:00 | 2017 | 01 | ABC |
| 100 | 2017-01-01 19:20:00 | 2017 | 01 | DEF |
| 100 | 2017-01-08 22:45:00 | 2017 | 01 | GHI |
| 100 | 2017-08-03 08:01:00 | 2017 | 08 | JKL |
| 100 | 2017-01-01 21:23:00 | 2017 | 01 | MNO |
| 130 | 2016-12-06 06:42:00 | 2016 | 12 | PQR |
| 140 | 2017-01-21 15:01:00 | 2017 | 01 | STU |
| 130 | 2017-01-29 13:20:00 | 2017 | 01 | VWX |
| 140 | 2017-04-10 09:15:00 | 2017 | 04 | YZZ |
With the following query, it works:
SELECT
cust_id,
year,
month,
MIN(STR_TO_DATE(created, '%Y-%m-%d %H:%i:%s')) AS min_date
FROM
t1
GROUP BY
cust_id
HAVING
year = '2017'
AND
month= '01'
And it returns this table:
| cust_id | year | month | min_date |
|---------|------|-------|---------------------|
| 100 | 2017 | 01 | 2017-01-01 19:20:00 |
| 140 | 2017 | 01 | 2017-01-21 15:01:00 |
But in Hive, I cannot filter the fields year and month with HAVING if they have not been grouped by previously. In other words, the previous query fails.
Instead, the following runs but doesn't produce the expected result:
SELECT
cust_id,
year,
month,
MIN(unix_timestamp(created, 'yyyy-MM-dd HH:mm:ss')) AS min_date
FROM
t1
GROUP BY
cust_id, year, month
HAVING
year = '2017'
AND
month= '01'
cust_id 130 shows up even though its first purchase happened in December 2016:
| cust_id | year | month | min_date |
|---------|------|-------|---------------------|
| 100 | 2017 | 01 | 2017-01-01 19:20:00 |
| 130 | 2017 | 01 | 2017-01-29 13:20:00 |
| 140 | 2017 | 01 | 2017-01-21 15:01:00 |
Here is the fiddle : SQL fiddle
Thank you
Your MySQL query doesn't really work, even if it runs. Never have "bare" columns in the select, having, or order by of an aggregation query: every column that is not in the group by should be the argument to an aggregation function. In your case, year and month fall into this category.
What you appear to want in either database is something like this:
SELECT cust_id
FROM t1
GROUP BY cust_id
HAVING MIN(created) >= '2017-01-01' AND
MIN(created) < '2017-02-01';
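The HAVING MIN(created) filter can be verified on the sample rows with the sketch below. It uses SQLite through Python's sqlite3 module as a stand-in for both MySQL and Hive, and stores created as ISO-formatted text so that string comparison orders the timestamps correctly; both choices are assumptions of the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (cust_id INTEGER, created TEXT, item TEXT)")
con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [
    (100, "2017-01-01 19:20:00", "ABC"),
    (100, "2017-01-01 19:20:00", "DEF"),
    (100, "2017-01-08 22:45:00", "GHI"),
    (100, "2017-08-03 08:01:00", "JKL"),
    (100, "2017-01-01 21:23:00", "MNO"),
    (130, "2016-12-06 06:42:00", "PQR"),
    (140, "2017-01-21 15:01:00", "STU"),
    (130, "2017-01-29 13:20:00", "VWX"),
    (140, "2017-04-10 09:15:00", "YZZ"),
])

# Group by customer only, then filter on the aggregated first purchase date.
first_time_in_january = [row[0] for row in con.execute("""
    SELECT cust_id
    FROM t1
    GROUP BY cust_id
    HAVING MIN(created) >= '2017-01-01'
       AND MIN(created) <  '2017-02-01'
    ORDER BY cust_id
""")]

print(first_time_in_january)
```

Customer 130 is excluded because its earliest purchase is in December 2016, which is exactly what grouping by year and month hides.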

How can I get average of groupings in a table and store the result back to the original table in SQL

I have the following table:
| Country | Month | Revenue |
|---------|-------|---------|
| US | Jan | 100 |
| US | Feb | 200 |
| US | Mar | 300 |
| Canada | Jan | 200 |
| Canada | Feb | 400 |
| Canada | Mar | 500 |
I need to get average revenue per country and store this value back to the original table to get the following output:
| Country | Month | Revenue | Average |
|---------|-------|---------|---------|
| US | Jan | 100 | 200 |
| US | Feb | 200 | 200 |
| US | Mar | 300 | 200 |
| Canada | Jan | 200 | 366.6 |
| Canada | Feb | 400 | 366.6 |
| Canada | Mar | 500 | 366.6 |
What is the best way to accomplish this in SQL? Is it better to use partition by?
The best way to do this uses window functions:
select t.*, avg(revenue) over (partition by country) as avg_revenue
from t;
To actually do the computation and store it back requires an update. Although there are other methods, the following is standard SQL:
update t
set average = (select avg(revenue) from t t2 where t.country = t2.country);
EDIT:
In T-SQL, you can somewhat improve the performance by doing:
with toupdate as (
select t.*,
avg(t.revenue) over (partition by t.country) as new_average
from t
)
update toupdate
set average = new_average;
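Both the read-only window query and the correlated update can be tried on the sample data with the sketch below. It runs on SQLite through Python's sqlite3 module, so the T-SQL updatable CTE is not covered; the average column is assumed to exist already and to start out as NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (country TEXT, month TEXT, revenue REAL, average REAL)")
con.executemany(
    "INSERT INTO t (country, month, revenue) VALUES (?, ?, ?)",
    [("US", "Jan", 100), ("US", "Feb", 200), ("US", "Mar", 300),
     ("Canada", "Jan", 200), ("Canada", "Feb", 400), ("Canada", "Mar", 500)],
)

# Read-only variant: compute the per-country average next to each row.
windowed = con.execute("""
    SELECT country, month, revenue,
           AVG(revenue) OVER (PARTITION BY country) AS avg_revenue
    FROM t
    ORDER BY rowid
""").fetchall()

# Persisting variant: correlated subquery, one average per country.
con.execute("""
    UPDATE t
    SET average = (SELECT AVG(t2.revenue) FROM t AS t2
                   WHERE t2.country = t.country)
""")
stored = con.execute(
    "SELECT country, month, average FROM t ORDER BY rowid").fetchall()

for row in stored:
    print(row)
```

A stored average goes stale as soon as revenue changes, so the window query is usually the safer choice unless the value really has to live in the table.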

SQL Query to display % Change

I have a database with sample data represented by Table 1 below. How do I write an SQL query to display them in either Table 2 or Table 3 format?
Table 1
Date       | Value
-----------+------
19/12/2011 | 60
20/12/2011 | 49
21/12/2011 | 42
22/12/2011 | 57
23/12/2011 | 88
24/12/2011 | 18
25/12/2011 | 38
26/12/2011 | 16
27/12/2011 | 66
28/12/2011 | 21
29/12/2011 | 79
30/12/2011 | 7
31/12/2011 | 16
01/01/2012 | 39
02/01/2012 | 17
03/01/2012 | 86
04/01/2012 | 55
05/01/2012 | 82
06/01/2012 | 0
07/01/2012 | 9
08/01/2012 | 46
Table 2
Year | Week | Total Value | % Change
-----+------+-------------+---------
2012 | 1    | 295         | 656.41%
2012 | 0    | 39          | -80.98%
2011 | 52   | 205         | -41.76%
2011 | 51   | 352         |
Table 3
Year | Week | SUM1 | Year | Week | SUM2 | % Change
-----+------+------+------+------+------+---------
2012 | 1    | 295  | 2012 | 0    | 39   | 656.41%
2012 | 0    | 39   | 2011 | 52   | 205  | -80.98%
2011 | 52   | 205  | 2011 | 51   | 352  | -41.76%
2011 | 51   | 352  |
My preference would be to run 1 query to aggregate Table 1 to the year/week level and then do the "% change" in another language, depending on your environment. However, if you truly needed a SQL-only solution, you could do something like this.
create table t1 as
select year(Date) as year, week(Date) as week, sum(Value) as totalvalue
from table1
group by year(Date), week(Date)
;
select a.year, a.week, a.totalvalue,
(a.totalvalue-b.totalvalue)/b.totalvalue as pct_change
from (
select year, week, totalvalue,
-- the sample data has a week 0, so only week 0 wraps to the previous year
case when week>0 then week-1 else 52 end as prevweek,
case when week>0 then year else year-1 end as prevyear
from t1
) a
left outer join t1 b
on a.prevweek=b.week and a.prevyear =b.year
;
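The first option mentioned above, aggregating in SQL and computing the change in another language, might look like the sketch below. It uses SQLite through Python's sqlite3 module, with strftime('%W') as the week number (Monday-based, with week 0 for the days before the first Monday), which happens to reproduce the weeks in Table 2. The column name obs_date and the helper with_pct_change are assumptions, and each week is compared with the previous week present in the data rather than with the strict calendar predecessor.

```python
import sqlite3

daily = [
    ("2011-12-19", 60), ("2011-12-20", 49), ("2011-12-21", 42),
    ("2011-12-22", 57), ("2011-12-23", 88), ("2011-12-24", 18),
    ("2011-12-25", 38), ("2011-12-26", 16), ("2011-12-27", 66),
    ("2011-12-28", 21), ("2011-12-29", 79), ("2011-12-30", 7),
    ("2011-12-31", 16), ("2012-01-01", 39), ("2012-01-02", 17),
    ("2012-01-03", 86), ("2012-01-04", 55), ("2012-01-05", 82),
    ("2012-01-06", 0), ("2012-01-07", 9), ("2012-01-08", 46),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (obs_date TEXT, value INTEGER)")
con.executemany("INSERT INTO table1 VALUES (?, ?)", daily)

# Step 1: one query that aggregates to the year/week level.
weekly = con.execute("""
    SELECT CAST(strftime('%Y', obs_date) AS INTEGER) AS year,
           CAST(strftime('%W', obs_date) AS INTEGER) AS week,
           SUM(value) AS total_value
    FROM table1
    GROUP BY year, week
    ORDER BY year, week
""").fetchall()

# Step 2: percentage change against the previous row, done in Python.
def with_pct_change(rows):
    out, prev_total = [], None
    for year, week, total in rows:
        pct = None
        if prev_total:  # skips the first row and avoids division by zero
            pct = round((total - prev_total) / prev_total * 100, 2)
        out.append((year, week, total, pct))
        prev_total = total
    return out

report = with_pct_change(weekly)
for row in report:
    print(row)
```

Sorting by year and week before comparing sidesteps the year-boundary arithmetic entirely, at the cost of silently bridging any week that has no data.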