Split rows to columns by certain number - sql

I have the following rows in the table:
dates
------------
"2021-01-02"
"2021-01-03"
"2021-01-11"
"2021-01-14"
...
I know that these rows represent date ranges: the first row is a range start, the next row is a range end, the row after that is a range start again, and so on (the number of rows is even).
Is there a way to select from such a table as:
range_start | range_end
-------------+-------------
"2021-01-02" "2021-01-03"
"2021-01-11" "2021-01-14"
... ...
?
PostgreSQL version is 10.17

Use conditional aggregation with row_number():
select min(dates) as range_start, max(dates) as range_end
from (select t.*, row_number() over (order by dates) as seqnum
      from t
     ) t
group by ceiling(seqnum / 2.0)
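Since each range start is immediately followed by its end, a lead()-based variant should also work on PostgreSQL 10 (a minimal sketch, assuming the same table t and column dates as above):
select dates as range_start, next_date as range_end
from (select dates,
             lead(dates) over (order by dates) as next_date,
             row_number() over (order by dates) as seqnum
      from t
     ) s
where seqnum % 2 = 1;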

This should work:
WITH ordered_table AS (
    SELECT
        dates,
        ROW_NUMBER() OVER (ORDER BY dates) AS creation_order
    FROM your_table
)
SELECT
    t1.dates AS range_start,
    t2.dates AS range_end
FROM ordered_table t1
INNER JOIN ordered_table t2 ON t2.creation_order = t1.creation_order + 1
WHERE (t1.creation_order % 2) = 1 -- counting from 1
For more details:
WITH in PG
row_number in PG

Select duplicate rows based on time difference and occurrence count

I have a table in which some records with the same farsi_pelak field have been added (detected) more than once within a few seconds.
That happened because of an application bug which has since been fixed.
Now I need to select, and then delete, the duplicate rows which were added at the same time (± a few seconds).
And this is my query:
SELECT TOP 100 PERCENT
    y.id, y.farsi_pelak, y.detection_date_p, y.detection_time
FROM dbo._tbl_detection y
INNER JOIN
    (SELECT TOP 100 PERCENT
         farsi_pelak, detection_date_p
     FROM dbo._tbl_detection
     WHERE camera_id = 2
     GROUP BY farsi_pelak, detection_date_p
     HAVING COUNT(farsi_pelak) > 1) dt
    ON y.farsi_pelak = dt.farsi_pelak
   AND y.detection_date_p = dt.detection_date_p
ORDER BY farsi_pelak, detection_date_p DESC
But I can't calculate the time difference, because my detection_time field must not appear in the GROUP BY.
If you use SQL Server 2012 or later, you can use the LAG function to get the values from the "previous" row.
Then calculate the difference between adjacent timestamps and find those rows where this difference is small.
WITH CTE AS
(
    SELECT
        id
        ,farsi_pelak
        ,detection_date_p
        ,detection_time
        ,LAG(detection_time) OVER (PARTITION BY farsi_pelak
                                   ORDER BY detection_date_p, detection_time) AS prev_detection_time
    FROM dbo._tbl_detection
)
,CTE_Diff AS
(
    SELECT
        id
        ,farsi_pelak
        ,detection_date_p
        ,detection_time
        ,prev_detection_time
        ,DATEDIFF(second, prev_detection_time, detection_time) AS diff
    FROM CTE
)
SELECT
    id
    ,farsi_pelak
    ,detection_date_p
    ,detection_time
    ,prev_detection_time
    ,diff
FROM CTE_Diff
WHERE diff <= 10
;
When you run this query and verify that it returns only the rows that you want to delete, you can change the last SELECT to DELETE:
WITH CTE AS
(
    SELECT
        id
        ,farsi_pelak
        ,detection_date_p
        ,detection_time
        ,LAG(detection_time) OVER (PARTITION BY farsi_pelak
                                   ORDER BY detection_date_p, detection_time) AS prev_detection_time
    FROM dbo._tbl_detection
)
,CTE_Diff AS
(
    SELECT
        id
        ,farsi_pelak
        ,detection_date_p
        ,detection_time
        ,prev_detection_time
        ,DATEDIFF(second, prev_detection_time, detection_time) AS diff
    FROM CTE
)
DELETE FROM CTE_Diff
WHERE diff <= 10
;
I guess you need ROW_NUMBER to check the time, as below, keeping the earliest detection in each group and discarding the rest (rows with rn greater than 1):
SELECT *
FROM (
    SELECT y.id, y.farsi_pelak,
           y.detection_date_p, y.detection_time,
           ROW_NUMBER() OVER (PARTITION BY y.farsi_pelak, y.detection_date_p
                              ORDER BY y.detection_time) AS rn
    FROM dbo._tbl_detection y
) t
WHERE rn > 1
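If that identifies the right rows, the same idea can be run as a delete through an updatable CTE (a sketch, assuming you keep the earliest detection_time per farsi_pelak and detection_date_p):
;WITH ranked AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY farsi_pelak, detection_date_p
                              ORDER BY detection_time) AS rn
    FROM dbo._tbl_detection
)
DELETE FROM ranked
WHERE rn > 1;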

Counting ID's for correct creation date time

I need to get the number of user IDs for each month, but a user should only be counted in a month if that month is the user's minimum (first) month.
So if customer A has a min(day) of 04/18, they are counted in that month/year only.
My table looks like:
monthyear | id
----------+-----
02/18     | A32
04/19     | T39
05/19     | T39
04/19     | Y95
01/18     | A32
12/19     | I99
11/18     | OPT
09/19     | TT8
I was doing something like:
SELECT day, id,
       SUM(CASE WHEN month = MIN(day) THEN 1 ELSE 0 END)
FROM testtable
GROUP BY 1
But I'm not sure how to restrict that per user ID, so that each user is counted only where their min(day) equals the row's day.
Goal table to be:
monthyear | count
----------+------
01/18     | 1
02/18     | 0
11/18     | 1
04/19     | 2
05/19     | 0
09/19     | 1
12/19     | 1
Use window functions. Let me assume that your monthyear is really yearmonth, so it sorts correctly:
SELECT yearmonth, COUNT(*) as numstarts
FROM (SELECT tt.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY yearmonth) as seqnum
      FROM testtable tt
     ) tt
WHERE seqnum = 1
GROUP BY yearmonth;
If you do have the absurd format of month-year, then you can use string manipulations. These depend on the database, but something like this:
SELECT monthyear, COUNT(*) as numstarts
FROM (SELECT tt.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY RIGHT(monthyear, 2), LEFT(monthyear, 2)) as seqnum
      FROM testtable tt
     ) tt
WHERE seqnum = 1
GROUP BY monthyear;
I assumed that you have a column that's a real date (the use of min() requires it). You can do it by selecting the minimal date for each id (subquery t2) and then counting only the rows that connect through the LEFT JOIN, so if there is no connection you get zeros for those dates, or monthyear values as you have in your data.
select
    t1.monthyear
    ,count(t2.id) as cnt
from testtable t1
left join (
    select
        min(date) as date
        ,id
    from testtable
    group by id
) t2
    on t2.date = t1.date
   and t2.id = t1.id
group by t1.monthyear
You are looking for the number of new users each month, yes?
Here is one way to do it.
Note that I had to use TO_DATE and TO_CHAR to make sure the month/year text strings sort correctly. If you use real DATE columns, that is unnecessary.
An additional complexity was adding in the empty months (months with zero new users). Ideally that would not be done with a SELECT DISTINCT on the base table to get all months.
create table x (
monthyear varchar2(20),
id varchar2(10)
);
insert into x values('02/18', 'A32');
insert into x values('04/19', 'T39');
insert into x values('05/19', 'T39');
insert into x values('04/19', 'Y95');
insert into x values('01/18', 'A32');
insert into x values('12/19', 'I99');
insert into x values('11/18', 'OPT');
insert into x values('09/19', 'TT8');
And the query:
with allmonths as (
    select distinct monthyear from x
),
firstmonths as (
    select id, to_char(min(to_date(monthyear, 'MM/YY')), 'MM/YY') monthyear
    from x
    group by id
),
firstmonthcounts as (
    select monthyear, count(*) cnt
    from firstmonths
    group by monthyear
)
select am.monthyear, nvl(fmc.cnt, 0) as newusers
from allmonths am
left join firstmonthcounts fmc on am.monthyear = fmc.monthyear
order by to_date(am.monthyear, 'MM/YY');
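As a check against the sample rows, this returns the goal table: 01/18 → 1, 02/18 → 0, 11/18 → 1, 04/19 → 2, 05/19 → 0, 09/19 → 1, 12/19 → 1.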

Value for a column is the sum of the next 4 values - SQL

ITEM  LOCATION  QTY  WEEK
A     X         30   1
A     X         35   2
A     X         40   3
A     X         0    4
A     X         10   5
A     X         19   6
I need to create a new column with a computation like:
ITEM  LOCATION  QTY  WEEK  NEW_COLUMN
A     X         30   1     AVG(WEEK2(qty)+WEEK3(qty)+WEEK4(qty)+WEEK5(qty))
A     X         35   2     AVG(WEEK3(qty)+WEEK4(qty)+WEEK5(qty)+WEEK6(qty))
and similarly for all the rows...
The window of 4 weeks is fixed; it won't change.
The first week will have the average of the next 4 weeks, i.e. weeks 2, 3, 4 and 5: avg(35+40+0+10).
The 2nd week will have the average of the next 4 weeks, i.e. weeks 3, 4, 5 and 6: avg(40+0+10+19).
I tried to bucket the rows based on the week number, say weeks 1-4 as bucket 1 and weeks 5-8 as bucket 2, and aggregate per bucket, but then I get the same average for every row in a bucket, i.e. the same value for line items 1, 2, 3 and 4.
Joining to the same table with a clause restricting the Weeks to be within your range should work. You'll have to decide what the right answer is for the last weeks (which won't have 4 weeks afterwards) and either COALESCE the right answer or INNER JOIN them out.
SELECT T.Item, T.Location, T.Week, AVG(N.Qty) AS New_Column
FROM Table T
LEFT OUTER JOIN Table N
    ON T.Item = N.Item
   AND T.Location = N.Location
   AND N.Week BETWEEN (T.Week + 1) AND (T.Week + 4)
GROUP BY T.Item, T.Location, T.Week
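Note that with the LEFT OUTER JOIN the last week has no following rows, so its AVG(N.Qty) comes back NULL, and the three weeks before it average over fewer than 4 values; that is the COALESCE / INNER JOIN decision mentioned above.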
Some of the other answers work fine, but with SQL Server 2012 it should be really easy:
SELECT *, New_Column = (SUM(Qty) OVER (ORDER BY Week ROWS BETWEEN 1 FOLLOWING AND 4 FOLLOWING) * 1.0) / 4
FROM Table1
Demo: SQL Fiddle
If it's by item and location then just add PARTITION BY:
SELECT *, New_Column = (SUM(Qty) OVER (PARTITION BY Item, Location ORDER BY Week ROWS BETWEEN 1 FOLLOWING AND 4 FOLLOWING) * 1.0) / 4
FROM Table1
To filter out records that don't have 4 subsequent records, you could use LEAD() for filtering:
;WITH cte AS (
    SELECT *,
           New_Column = (SUM(Qty) OVER (PARTITION BY Item, Location ORDER BY Week ROWS BETWEEN 1 FOLLOWING AND 4 FOLLOWING) * 1.0) / 4,
           Lead4Col = LEAD(Week, 4) OVER (PARTITION BY Item, Location ORDER BY Week)
    FROM Table1
)
SELECT *
FROM cte
WHERE Lead4Col IS NOT NULL
You could also use COUNT(Qty) OVER (PARTITION BY Item, Location ORDER BY Week ROWS BETWEEN 1 FOLLOWING AND 4 FOLLOWING) = 4 instead of LEAD() to filter to rows where 4 subsequent weeks exist.
Edit: I think you actually want to exclude this week from the calculation, so adjusted slightly.
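As a check against the sample data: for week 1 the frame covers weeks 2-5, so New_Column = (35 + 40 + 0 + 10) / 4 = 21.25.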
You can self-join to the same table 4 times:
select t0.item, t0.location, t0.qty, t0.week,
       (t1.qty + t2.qty + t3.qty + t4.qty) / 4.0
from [table] t0
left join [table] t1 on t0.item = t1.item and t0.location = t1.location
                    and t1.week = t0.week + 1
left join [table] t2 on t0.item = t2.item and t0.location = t2.location
                    and t2.week = t0.week + 2
left join [table] t3 on t0.item = t3.item and t0.location = t3.location
                    and t3.week = t0.week + 3
left join [table] t4 on t0.item = t4.item and t0.location = t4.location
                    and t4.week = t0.week + 4
You can simplify those joins if you have a better key available for the table.
Try this query:
SELECT
    T1.ITEM,
    T1.LOCATION,
    T1.WEEK,
    MAX(T1.QTY) AS QTY,
    AVG(T2.QTY) AS NEW_COLUMN
FROM TBL T1 LEFT JOIN TBL T2
    ON T1.ITEM = T2.ITEM AND T1.LOCATION = T2.LOCATION
   AND T2.WEEK > T1.WEEK AND T2.WEEK < T1.WEEK + 5
GROUP BY T1.ITEM, T1.LOCATION, T1.WEEK
Almost the same as earlier, but instead of SUM()/4 it's better to use AVG.
I also multiply by 1.0 to make a decimal value from qty, because if it stays an integer you lose the fractional part after the AVG operation.
SELECT *,
       new_column = AVG(qty * 1.0) OVER (
           PARTITION BY item, location
           ORDER BY week ROWS BETWEEN 1 FOLLOWING AND 4 FOLLOWING
       )
FROM table1
with x as
    (select *, lead(qty) over (partition by item, location order by week) as next_1 from tablename)
, y as
    (select *, lead(next_1) over (partition by item, location order by week) as next_2 from x)
, z as
    (select *, lead(next_2) over (partition by item, location order by week) as next_3 from y)
, w as
    (select *, lead(next_3) over (partition by item, location order by week) as next_4 from z)
select item, location, qty, week, (next_1 + next_2 + next_3 + next_4) / 4.0 as new_column from w
This uses chained CTEs (not recursive ones). The lead function selects the next row's value; because each CTE reads the previous CTE's new column, every step reaches one week further ahead, so by the fourth CTE you have all of the next 4 weeks' values. Then you just take the average.
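When LEAD accepts an offset argument (SQL Server 2012+, PostgreSQL), the same result can be sketched in a single pass over the same tablename:
SELECT item, location, qty, week,
       (LEAD(qty, 1) OVER (PARTITION BY item, location ORDER BY week)
      + LEAD(qty, 2) OVER (PARTITION BY item, location ORDER BY week)
      + LEAD(qty, 3) OVER (PARTITION BY item, location ORDER BY week)
      + LEAD(qty, 4) OVER (PARTITION BY item, location ORDER BY week)) / 4.0 AS new_column
FROM tablename;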

group rows in plain sql

I have a Table with columns Date and Number, like so:
date      Number
1-1-2012  1
1-2-2012  1
1-3-2012  2
1-4-2012  1
I want to write a SQL query that groups the rows with the same Number and takes the minimum date. The grouping may only occur when the value of Number is the same as in the previous/next row. So the result is:
date      Number
1-1-2012  1
1-3-2012  2
1-4-2012  1
try this:
WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY [DATE]) -
           ROW_NUMBER() OVER (PARTITION BY NUMBER ORDER BY [DATE]) AS ROW_NUM
    FROM TABLE1
)
SELECT NUMBER, MIN(DATE) AS DATE
FROM CTE
GROUP BY ROW_NUM, NUMBER
ORDER BY DATE
SQL fiddle demo
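To see why the difference of the two ROW_NUMBER values identifies each run (the classic gaps-and-islands trick), trace the sample data:
date      Number  rn_overall  rn_per_Number  ROW_NUM
1-1-2012  1       1           1              0
1-2-2012  1       2           2              0
1-3-2012  2       3           1              2
1-4-2012  1       4           3              1
Each consecutive run of the same Number gets a constant (ROW_NUM, NUMBER) pair, so grouping by both yields one row per run, and MIN(DATE) picks its first date.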
SELECT Number, MIN(date)
FROM table
GROUP BY Number
ORDER BY Number
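Note: this plain GROUP BY merges the two non-adjacent Number = 1 runs into one row (1-1-2012, 1), so it does not match the desired result above.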
Since your requirement is a bit more specific, how about this? I have not checked it myself, but it is something that might work, considering your requirement:
SELECT date, Number FROM (
    SELECT Number,
           (SELECT MIN(date) FROM table t2 WHERE t1.date <> t2.date AND t1.Number = t2.Number) AS date
    FROM table t1
) AS a
GROUP BY Number, date

Calculate arithmetic return from a table of values

I've created a table of index price levels (eg, S&P 500) that I'd like to calculate the daily return of. Table structure looks like this:
Date        Value
2009-07-02  880.167341
2009-07-03  882.235134
2009-07-06  881.338052
2009-07-07  863.731494
2009-07-08  862.458985
I'd like to calculate the daily arithmetic return (ie, percentage return) of the index, defined as:
Daily Return = P(2)/P(1) - 1
Where P represents the index value in this case. Given the input table presented above, the desired output would look like this:
Date        Return
2009-07-03   0.002349318
2009-07-06  -0.001016829
2009-07-07  -0.019977077
2009-07-08  -0.001473269
It occurs to me that a self join would work, but I'm not sure of the best way to increment the date on the second table to account for weekends.
Any thoughts on the best way to go about this?
WITH cteRank AS (
    SELECT [Date], Value,
           ROW_NUMBER() OVER (ORDER BY [Date]) AS RowNum
    FROM YourTable
)
SELECT c1.[Date], c1.Value / c2.Value - 1 AS [Return]
FROM cteRank c1
INNER JOIN cteRank c2 ON c1.RowNum - 1 = c2.RowNum
WHERE c1.RowNum > 1
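As a check against the sample data, the 2009-07-03 row gives 882.235134 / 880.167341 - 1 ≈ 0.002349318, matching the desired output.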
A simple CROSS APPLY:
SELECT Tlater.Date, (Tlater.Value / TPrev2.Value) - 1
FROM MyTable Tlater
CROSS APPLY
(
    SELECT TOP 1 TPrev.Value
    FROM MyTable TPrev
    WHERE TPrev.Date < Tlater.Date
    ORDER BY TPrev.Date DESC
) TPrev2
Note: this becomes trivial in Denali (SQL Server 2012) with LAG (untested, may need a CTE)
SELECT
    [Date],
    (Value / (LAG(Value) OVER (ORDER BY [Date]))) - 1
FROM MyTable
Or
;WITH cPairs AS
(
    SELECT
        [Date],
        Value AS Curr,
        LAG(Value) OVER (ORDER BY [Date]) AS Prev
    FROM MyTable
)
SELECT
    [Date],
    (Curr / Prev) - 1
FROM cPairs
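Note that with LAG the first row has Prev = NULL, so its return comes back as NULL rather than the row being dropped; add WHERE Prev IS NOT NULL to match the desired output exactly.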
If you're using 2005+, you can use the ROW_NUMBER function combined with a CTE:
;WITH RowNums AS
(
    SELECT *, ROW_NUMBER() OVER (ORDER BY date) AS RN
    FROM table
)
SELECT r1.*, r1.Value / r.Value - 1 AS [Return]
FROM RowNums r
INNER JOIN RowNums r1 ON r.RN = r1.RN - 1