PostgreSQL: Split one row into 4 rows with SQL

I want to split a single row into 4 rows using a SQL script.
A query gives me the year, the quarter, the month and a value. Now I would also like to output the week of the month (1-4) without having to add it as a column of the table.
Likewise, the value should be divided by four.
Thus, from this result:
year | quarter | month | value
2016 | 1 | 1 | 78954
This result:
year | quarter | month | week | value
2016 | 1 | 1 | 1 | 19738,5
2016 | 1 | 1 | 2 | 19738,5
2016 | 1 | 1 | 3 | 19738,5
2016 | 1 | 1 | 4 | 19738,5
I have no idea how I could implement this.
I hope someone can help me.
Best regards

You could do it with a cartesian join (the_table stands in for your source table):
SELECT a.year, a.quarter, a.month, b.week, a.value / 4 AS value
FROM the_table a, (SELECT UNNEST(ARRAY[1, 2, 3, 4]) AS week) b

Just use union all:
select year, quarter, month, 1 as week, value / 4 as value
from the_table
union all
select year, quarter, month, 2 as week, value / 4 as value
from the_table
union all
select year, quarter, month, 3 as week, value / 4 as value
from the_table
union all
select year, quarter, month, 4 as week, value / 4 as value
from the_table

You can also use `generate_series()` for that:
select t.year, t.quarter, t.month, w.week, t.value / 4
from the_table t
cross join generate_series(1,4) as w(week)
order by t.year, t.quarter, w.week;
Using generate_series() is more flexible if you need to change the number of repeated rows - although "weeks per month" doesn't really need that flexibility.
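For instance, a hypothetical variant where a month has to be split into 5 weeks instead of 4 - only the series bound and the divisor change:
select t.year, t.quarter, t.month, w.week, t.value / 5.0 as value
from the_table t
cross join generate_series(1, 5) as w(week)
order by t.year, t.quarter, w.week;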

Or you can do it in a very scientific-looking way :-)
WITH series as (select generate_series(1,4,1) as week ),
data as (SELECT 2016 as year, 1 as quarter, 1 as month, 78954 as value)
SELECT d.year, d.quarter, d.month, s.week, d.value/(SELECT count(*) FROM series)::numeric
FROM data d JOIN series s ON true

Related

How to calculate occurrence depending on months/years

My table looks like that:
ID | Start | End
1 | 2010-01-02 | 2010-01-04
1 | 2010-01-22 | 2010-01-24
1 | 2011-01-31 | 2011-02-02
2 | 2012-05-02 | 2012-05-08
3 | 2013-01-02 | 2013-01-03
4 | 2010-09-15 | 2010-09-20
4 | 2010-09-30 | 2010-10-05
I'm looking for a way to count the number of occurrences for each ID per year and month.
But what is important: if a record's End date falls in the month following its Start date (within the same year), then the occurrence should be counted for both months [e.g. ID 1 in the 3rd row has a situation like that, so the occurrence for this ID should be +1 for January and +1 for February].
So I'd like to have it in this way:
Year | Month | Id | Occurrence
2010 | 01 | 1 | 2
2010 | 09 | 4 | 2
2010 | 10 | 4 | 1
2011 | 01 | 1 | 1
2011 | 02 | 1 | 1
2012 | 05 | 2 | 1
2013 | 01 | 3 | 1
I created only this for now...
CREATE TABLE IF NOT EXISTS counts AS
(SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source)
And I don't know how to move with that further. I'd appreciate your help.
I'm using Spark SQL.
Try the following strategy to achieve this:
Note:
I have created a few intermediate tables. If you wish, you can use sub-queries or CTEs instead, depending on your permissions.
I have taken care of the 2 scenarios you mentioned (whether to count 1 occurrence or 2), as you explained.
Query:
First, create a table with a flag that tells whether the start and end dates fall in the same year and month (1 means YES, 2 means NO):
/* Creating a table with flags whether to count the occurrences once or twice */
CREATE TABLE flagged as
(
SELECT *,
CASE
WHEN Year_st = Year_end and Month_st = Month_end then 1
WHEN Year_st = Year_end and Month_st <> Month_end then 2
Else 0
end as flag
FROM
(
SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source
) as calc
)
Now the flag in the above table will be 1 if the year and month are the same for start and end, and 2 if the month differs. You can add more flag categories if you have more scenarios.
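For instance, a purely hypothetical third category for ranges that cross a year boundary (not present in your sample data) would just extend the CASE expression in the flagged table above:
/* Hypothetical extension: flag 3 for ranges crossing a year boundary */
CASE
WHEN Year_st = Year_end and Month_st = Month_end then 1
WHEN Year_st = Year_end and Month_st <> Month_end then 2
WHEN Year_st <> Year_end then 3
Else 0
end as flag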
Second, count the occurrences for flag 1. Since year and month are the same for flag 1, we can take either date; I have taken the start:
/* Counting occurrences only for flag 1 */
CREATE TABLE flg1 as (
SELECT distinct id, year_st, month_st, count(*) as occurrence
FROM flagged
where flag=1
GROUP BY id, year_st, month_st
)
Similarly, count the occurrences for flag 2. Since the month differs between the two dates, we UNION them before counting so that both dates end up in the same column:
/* Counting occurrences only for flag 2 */
CREATE TABLE flg2 as
(
SELECT distinct id, year_dt, month_dt, count(*) as occurrence
FROM
(
select ID, year_st as year_dt, month_st as month_dt FROM flagged where flag=2
UNION
SELECT ID, year_end as year_dt, month_end as month_dt FROM flagged where flag=2
) as unioned
GROUP BY id, year_dt, month_dt
)
Finally, we just have to SUM the occurrences from both flags. Note that we use UNION ALL here to combine the two tables. This is very important because we need to count duplicates as well:
/* UNIONING both the final tables and summing the occurrences */
SELECT distinct year, month, id, SUM(occurrence) as occurrence
FROM
(
SELECT distinct id, year_st as year, month_st as month, occurrence
FROM flg1
UNION ALL
SELECT distinct id, year_dt as year, month_dt as month, occurrence
FROM flg2
) as fin_unioned
GROUP BY id, year, month
ORDER BY year, month, id, occurrence desc
The output of the above query will match your expected output. I know this is not an optimized approach, yet it works correctly. I will update if I come across a more optimized strategy. Comment if you have questions.
db<>fiddle link here
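As the note above mentions, the intermediate tables can be folded into CTEs if permissions allow. An untested sketch of the same logic as a single statement (column names follow the answer above):
WITH calc AS (
SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source
),
per_month AS (
-- one row per (id, year, month) that a record touches
SELECT id, Year_St AS year, Month_St AS month FROM calc
UNION ALL
SELECT id, Year_End AS year, Month_End AS month FROM calc
WHERE Year_St <> Year_End OR Month_St <> Month_End
)
SELECT year, month, id, COUNT(*) AS occurrence
FROM per_month
GROUP BY year, month, id
ORDER BY year, month, id;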
Not sure if this works in Spark SQL.
But if the ranges aren't longer than one month, then just add the extras to the count via a UNION ALL.
The extras are those rows whose end falls in a later month than their start.
SELECT YearOcc, MonthOcc, Id
, COUNT(*) as Occurrence
FROM
(
SELECT Id
, YEAR(CAST(Start AS DATE)) as YearOcc
, MONTH(CAST(Start AS DATE)) as MonthOcc
FROM source
UNION ALL
SELECT Id
, YEAR(CAST(End AS DATE)) as YearOcc
, MONTH(CAST(End AS DATE)) as MonthOcc
FROM source
WHERE MONTH(CAST(Start AS DATE)) < MONTH(CAST(End AS DATE))
) q
GROUP BY YearOcc, MonthOcc, Id
ORDER BY YearOcc, MonthOcc, Id
YearOcc | MonthOcc | Id | Occurrence
------: | -------: | -: | ---------:
2010 | 1 | 1 | 2
2010 | 9 | 4 | 2
2010 | 10 | 4 | 1
2011 | 1 | 1 | 1
2011 | 2 | 1 | 1
2012 | 5 | 2 | 1
2013 | 1 | 3 | 1
db<>fiddle here

Aggregate functions based on current Row value

I am working with data similar to below,
week | product | sale
1 | ABC | 2
1 | ABC | 1
2 | ABC | 1
3 | ABC | 5
4 | ABC | 1
2 | DEF | 5
Let us say that is my Orders table, named tblOrders. Now, in each row, I want to aggregate the total sales from the previous week for that product - for instance, if I am on week 2 of product "ABC", I need to show the aggregated sales amount of week 1 for product ABC. So the output should look something like below:
week | product | sale | ProductPreviousWeekSales
1 | ABC | 2 | 0
1 | ABC | 1 | 0
2 | ABC | 1 | 3
3 | ABC | 5 | 1
4 | ABC | 1 | 5
2 | DEF | 5 | 0
I was originally thinking I could solve this using aggregates and window functions, but that doesn't seem to be the case. Another thought was to use a conditional aggregate - something like sum(case when x = currentRow.x then sale else 0 end) - but that wouldn't work either.
Here is the SQLFiddle for above sample - http://sqlfiddle.com/#!18/890b7/2
Note: I also need to calculate a similar value for the last 4 weeks, so I am trying to avoid sub-queries or multiple joins (if possible), as the data set I am working with is very large and I don't want to add too much performance overhead with this change.
Here is one approach which first aggregates your table in a separate CTE and uses LAG to find the previous week's amount, for each week and product:
WITH cte AS (
SELECT week, product,
LAG(SUM(sale)) OVER (PARTITION BY product ORDER BY week) AS lag_total_sales
FROM yourTable
GROUP BY week, product
)
SELECT t1.week, t1.product, t1.sale,
COALESCE(t2.lag_total_sales, 0) AS ProductPreviousWeekSales
FROM yourTable t1
INNER JOIN cte t2
ON t2.week = t1.week AND
t2.product = t1.product
ORDER BY
t1.product,
t1.week;
Demo
DISCLAIMER
The query I am showing below doesn't work in SQL Server, unfortunately. Up to SQL Server version 2019 the DBMS lacks full support of the RANGE clause that is essential for the query to work. Running the query in SQL Server results in
Msg 4194 Level 16 State 1 Line 1 RANGE is only supported with UNBOUNDED and CURRENT ROW window frame delimiters.
I am not deleting this answer, because this is standard SQL and the approach may help future readers. It runs fine in a lot of DBMS, and maybe a future version of SQL Server will be able to deal with this, too. I've added demos to show that it runs in PostgreSQL, MySQL and Oracle, but fails in SQL Server 2019.
ORIGINAL ANSWER
Your query shown in the fiddle (select a.*, sum(sale) over(partition by product) ProductPreviousWeekSales from tblOrder a) is merely lacking the appropriate windowing clause. As you are dealing with ties here (more than one row per product and week) this needs to be a RANGE clause:
select a.*,
sum(sale) over(partition by product
order by week range between 1 preceding and 1 preceding
) as ProductPreviousWeekSales
from tblOrder a
order by product, week;
(Use COALESCE if you want to see a zero instead of NULL.)
Demos:
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=149eddbff82500d539b2c615f4167cff
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a8453970efac08ad69275914910bb13e
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=64ed21150142caa0acb7f8c7ca7d9022
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=149eddbff82500d539b2c615f4167cff
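Combining the COALESCE suggestion with a wider frame also covers the "last 4 weeks" figure mentioned in the question. A sketch in standard SQL (the same SQL Server limitation applies; ProductLast4WeeksSales is just an illustrative name):
select a.*,
coalesce(sum(sale) over(partition by product
order by week range between 1 preceding and 1 preceding
), 0) as ProductPreviousWeekSales,
coalesce(sum(sale) over(partition by product
order by week range between 4 preceding and 1 preceding
), 0) as ProductLast4WeeksSales
from tblOrder a
order by product, week;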
You can do it as follows:
; WITH cteorder AS
(
SELECT DISTINCT product, week FROM dbo.tblOrder
)
SELECT
cte.*,
SUM(ISNULL(b.sale,0)) ProductPreviousWeekSales
from tblOrder a
INNER JOIN cteorder cte ON cte.product = a.product AND cte.week = a.week
LEFT JOIN dbo.tblOrder b ON b.product = cte.product AND b.week = (a.week-1)
GROUP BY cte.product,
cte.week
You can run it from: Fiddle
You need to select from TblOrders twice: once grouping by week and product and summing the sales, and a second time as a row-by-row scan against TblOrders, left-joined with the grouping query on the same product and the week offset by 1.
If the join fails, the sales value from the joined grouping query is NULL. You can put in 0 instead of NULL using COALESCE(), but ISNULL() is likely to be faster, as it has a fixed number of parameters, while COALESCE() has a variable argument list, which comes at a certain cost.
WITH
tblorders(wk,product,sales) AS (
SELECT 1,'ABC',2
UNION ALL SELECT 1,'ABC',1
UNION ALL SELECT 2,'ABC',1
UNION ALL SELECT 3,'ABC',5
UNION ALL SELECT 4,'ABC',1
UNION ALL SELECT 2,'DEF',5
)
,
grp AS (
SELECT
wk
, product
, SUM(sales) AS sales
FROM tblorders
GROUP BY
wk
, product
)
SELECT
o.wk
, o.product
, o.sales
, ISNULL(g.sales,0) AS productpreviousweeksales
FROM tblorders o
LEFT
JOIN grp g
ON o.wk - 1 = g.wk
AND o.product= g.product
ORDER BY 2,1
;
wk | product | sales | productpreviousweeksales
----+---------+-------+--------------------------
1 | ABC | 2 | 0
1 | ABC | 1 | 0
2 | ABC | 1 | 3
3 | ABC | 5 | 1
4 | ABC | 1 | 5
2 | DEF | 5 | 0

Is it possible to do projection in Google Big Query?

I have a query (due to restrictions, it is using Legacy SQL) that produces a column that is the rolling average of the last 3 days of sales (excluding today):
SELECT
id, date, sales, AVG(sales) OVER (PARTITION BY id ORDER BY date RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING) AS projected_sale
FROM tableA
tableA
+-------+---------+---------+
| id | date | sales |
+-------+---------+---------+
| 1 | 01-01-17| 5 |
| 1 | 01-02-17| 6 |
| 1 | 01-03-17| 7 |
| 1 | 01-04-17| 10 |
+-------+---------+---------+
The query produces
+-------+---------+---------+--------------+
| id | date | sales |projected_sale|
+-------+---------+---------+--------------+
| 1 | 01-01-17| 5 | . |
| 1 | 01-02-17| 6 | . |
| 1 | 01-03-17| 7 | . |
| 1 | 01-04-17| 10 | 6 |
+-------+---------+---------+--------------+
Since the average excludes the current row, theoretically I can project the sale for 01-05-17 using the sales from 01-02 to 01-04. However, since tableA doesn't actually have an entry with date 01-05-17, my query stops at 01-04-17 as the last row.
Is what I am trying to do possible in Big Query?
Thank you
First, I think using RANGE is incorrect here - it should be ROWS instead
Anyway, below is an example for BigQuery Legacy SQL that demonstrates how to achieve the result you need.
#legacySQL
SELECT
id, dt, sales,
AVG(sales) OVER (
PARTITION BY id ORDER BY dt
ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
) AS projected_sale
FROM tableA, (SELECT 1 id, '01-05-17' dt, 0 sales)
As you can see, you are simply adding (via UNION ALL - a comma in Legacy SQL) that missing day. Of course, you can transform this so that it adds such a missing row for all ids (see the sketch at the end of this answer).
Nevertheless - hope this is a good starting point for you.
You can test / play with it using dummy data as in your question
#legacySQL
SELECT
id, dt, sales,
AVG(sales) OVER (
PARTITION BY id ORDER BY dt
ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
) AS projected_sale
FROM (
SELECT * FROM
(SELECT 1 id, '01-01-17' dt, 5 sales),
(SELECT 1 id, '01-02-17' dt, 6 sales),
(SELECT 1 id, '01-03-17' dt, 7 sales),
(SELECT 1 id, '01-04-17' dt, 10 sales)
) tableA, (SELECT 1 id, '01-05-17' dt, 0 sales)
with this result:
Row id dt sales projected_sale
1 1 01-01-17 5 null
2 1 01-02-17 6 5.0
3 1 01-03-17 7 5.5
4 1 01-04-17 10 6.0
5 1 01-05-17 0 7.0
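Following up on the note about adding the missing row for all ids rather than hard-coding id 1, something along these lines should work - an untested sketch, still with the next day '01-05-17' hard-coded and assuming the date column is called dt as in the snippets above:
#legacySQL
SELECT
id, dt, sales,
AVG(sales) OVER (
PARTITION BY id ORDER BY dt
ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
) AS projected_sale
FROM tableA, (
SELECT id, '01-05-17' AS dt, 0 AS sales
FROM (SELECT id FROM tableA GROUP BY id)
)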

SQL group by personalised condition

Hi, I have a table as below:
+--------+--------+
| day | amount|
+--------+---------
| 2 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 3 | 3 |
| 4 | 3 |
+--------+--------+
Now I want something like this: the sum for days 1-2 as row one, the sum for days 1-3 as row 2, and so on.
+--------+--------+
| day | amount|
+--------+---------
| 1-2 | 11 |
| 1-3 | 14 |
| 1-4 | 17 |
+--------+--------+
Could anyone offer some help? Thanks!
with data as(
select 2 day, 2 amount from dual union all
select 1 day, 3 amount from dual union all
select 1 day, 4 amount from dual union all
select 2 day, 2 amount from dual union all
select 3 day, 3 amount from dual union all
select 4 day, 3 amount from dual)
select distinct day, sum(amount) over (order by day range unbounded preceding) cume_amount
from data
order by 1;
DAY CUME_AMOUNT
---------- -----------
1 7
2 11
3 14
4 17
If you are using Oracle, you can do something like the above.
Assuming the day range in the left column always starts from "1-", what you need is a query doing a cumulative sum on the grouped table (dayWiseSum below). Since it needs to be accessed twice, I'd put it into a temporary table.
CREATE TEMPORARY TABLE dayWiseSum AS
(SELECT day, SUM(amount) AS amount FROM table1 GROUP BY day ORDER BY day);
SELECT CONCAT("1-", t1.day) AS day, SUM(t2.amount) AS amount
FROM dayWiseSum t1
INNER JOIN dayWiseSum t2 ON t1.day >= t2.day
-- ">=" also yields a "1-1" row; add WHERE t1.day > 1 to drop it
GROUP BY t1.day
ORDER BY t1.day;
DROP TABLE dayWiseSum;
Here's a fiddle to test with:
http://sqlfiddle.com/#!9/c1656/1/0
Note: Since sqlfiddle isn't allowing CREATE statements, I've replaced dayWiseSum with its query there. Also, I've used the "Text to DDL" option to paste the exact text of the table from your question to generate the create table query :)
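If you are on MySQL 8+ (or any database with window functions), the temporary table can be avoided entirely. A sketch, assuming the same table1 as in the fiddle:
-- cumulative sum over the per-day totals, then drop the "1-1" row
SELECT CONCAT('1-', day) AS day, amount
FROM (
SELECT day, SUM(SUM(amount)) OVER (ORDER BY day) AS amount
FROM table1
GROUP BY day
) t
WHERE day > 1;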

Get Monthly Totals from Running Totals

I have a table in a SQL Server 2008 database with two columns that hold running totals called Hours and Starts. Another column, Date, holds the date of a record. The dates are sporadic throughout any given month, but there's always a record for the last hour of the month.
For example:
ContainerID | Date | Hours | Starts
1 | 2010-12-31 23:59 | 20 | 6
1 | 2011-01-15 00:59 | 23 | 6
1 | 2011-01-31 23:59 | 30 | 8
2 | 2010-12-31 23:59 | 14 | 2
2 | 2011-01-18 12:59 | 14 | 2
2 | 2011-01-31 23:59 | 19 | 3
How can I query the table to get the total number of hours and starts for each month between two specified years? (In this case 2011 and 2013.) I know that I need to take the values from the last record of one month and subtract the values from the last record of the previous month. I'm having a hard time coming up with a good way to do this in SQL, however.
As requested, here are the expected results:
ContainerID | Date | MonthlyHours | MonthlyStarts
1 | 2011-01-31 23:59 | 10 | 2
2 | 2011-01-31 23:59 | 5 | 1
Try this:
SELECT c1.ContainerID,
c1.Date,
c1.Hours - c3.Hours AS "MonthlyHours",
c1.Starts - c3.Starts AS "MonthlyStarts"
FROM Containers c1
-- c2: a later row in the same month as c1; c1 is that month's last row only when no such c2 exists
LEFT OUTER JOIN Containers c2 ON
c1.ContainerID = c2.ContainerID
AND datediff(MONTH, c1.Date, c2.Date)=0
AND c2.Date > c1.Date
-- c3: a row in the month before c1's month, for the same container
LEFT OUTER JOIN Containers c3 ON
c1.ContainerID = c3.ContainerID
AND datediff(MONTH, c1.Date, c3.Date)=-1
-- c4: a later row in c3's month; c3 is that month's last row only when no such c4 exists
LEFT OUTER JOIN Containers c4 ON
c3.ContainerID = c4.ContainerID
AND datediff(MONTH, c3.Date, c4.Date)=0
AND c4.Date > c3.Date
WHERE
c2.ContainerID is null
AND c4.ContainerID is null
AND c3.ContainerID is not null
ORDER BY c1.ContainerID, c1.Date
Using a recursive CTE and some 'creative' JOIN conditions, you can fetch the next month's value for each ContainerID:
WITH CTE_PREP AS
(
--RN will be 1 for last row in each month for each container
--MonthRank will be sequential number for each subsequent month (to increment easier)
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY ContainerID, YEAR(Date), MONTH(DATE) ORDER BY Date DESC) RN
,DENSE_RANK() OVER (ORDER BY YEAR(Date),MONTH(Date)) MonthRank
FROM Table1
)
, RCTE AS
(
--"Zero row", last row in decembar 2010 for each container
SELECT *, Hours AS MonthlyHours, Starts AS MonthlyStarts
FROM CTE_Prep
WHERE YEAR(date) = 2010 AND MONTH(date) = 12 AND RN = 1
UNION ALL
--for each next row just join on MonthRank + 1
SELECT t.*, t.Hours - r.Hours, t.Starts - r.Starts
FROM RCTE r
INNER JOIN CTE_Prep t ON r.ContainerID = t.ContainerID AND r.MonthRank + 1 = t.MonthRank AND t.Rn = 1
)
SELECT ContainerID, Date, MonthlyHours, MonthlyStarts
FROM RCTE
WHERE Date >= '2011-01-01' --to eliminate "zero row"
ORDER BY ContainerID
SQLFiddle DEMO (I have added some data for February and March in order to test on different lengths of months)
Old version fiddle
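For completeness: on SQL Server 2012 or later (not the 2008 instance in the question), LAG() makes this considerably shorter. A sketch assuming the same Containers table name as in the first answer; the earliest month per container comes out as NULL and can be filtered on Date as above:
WITH month_end AS (
-- keep only the last reading of each month per container
SELECT ContainerID, Date, Hours, Starts,
ROW_NUMBER() OVER (PARTITION BY ContainerID, YEAR(Date), MONTH(Date)
ORDER BY Date DESC) AS rn
FROM Containers
)
SELECT ContainerID, Date,
Hours - LAG(Hours) OVER (PARTITION BY ContainerID ORDER BY Date) AS MonthlyHours,
Starts - LAG(Starts) OVER (PARTITION BY ContainerID ORDER BY Date) AS MonthlyStarts
FROM month_end
WHERE rn = 1
ORDER BY ContainerID, Date;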