Split one row into multiple rows in a SQL Server table

Please see the attached screenshot. I'm trying to figure out how we can achieve that using SQL Server.
Thanks.

You can achieve this using a recursive CTE.
For example (I assumed your date column is in MM/DD/YYYY format):
;with orig_src as
(
    select CAST('01/01/2018' AS DATETIME) As Dt, 'Alpha' Name, 3 Freq
    UNION ALL
    select CAST('12/01/2018' AS DATETIME) As Dt, 'Beta' Name, 2 Freq
), freq_cte as
(
    --start/anchor row
    select dt, name, 1 freq_new, Freq rn from orig_src
    --recursion
    union all
    select DATEADD(MONTH, 1, a.dt), a.name, 1 freq_new, a.rn - 1 from freq_cte a
    --terminator/constraint for recursion
    where a.rn - 1 <> 0
)
select convert(varchar, dt, 101) dt, name, freq_new from freq_cte
order by 2, 1
order by 2,1
The way this recursive logic works: first we get all the rows from the table in the anchor part of the CTE (freq_cte); then the CTE calls itself recursively, adding one month and decrementing rn (the original Freq) each time, until the terminator condition is met, i.e. when rn - 1 = 0.
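The recursion can be sanity-checked outside SQL Server, e.g. on SQLite, which also supports recursive CTEs; `DATEADD(MONTH, 1, dt)` becomes SQLite's `date(dt, '+1 month')` modifier, but the anchor/recursion/terminator structure is unchanged:

```python
import sqlite3

# The row-splitting recursion, adapted to SQLite: each source row
# expands into Freq rows, one month apart.
conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH RECURSIVE orig_src(dt, name, freq) AS (
        SELECT '2018-01-01', 'Alpha', 3
        UNION ALL
        SELECT '2018-12-01', 'Beta', 2
    ),
    freq_cte(dt, name, freq_new, rn) AS (
        SELECT dt, name, 1, freq FROM orig_src        -- anchor rows
        UNION ALL
        SELECT date(dt, '+1 month'), name, 1, rn - 1  -- step one month forward
        FROM freq_cte
        WHERE rn - 1 <> 0                             -- stop when freq is used up
    )
    SELECT dt, name FROM freq_cte ORDER BY name, dt
""").fetchall()
print(rows)
```

'Alpha' (Freq 3) comes back as three monthly rows and 'Beta' (Freq 2) as two, matching what the SQL Server query produces.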

Related

Oracle get rank for only latest date

I have origin table A:
dt         c1  value
2022/10/1  1   1
2022/10/2  1   2
2022/10/3  1   3
2022/10/1  2   4
2022/10/2  2   6
2022/10/3  2   5
Currently I got the latest dt's percent_rank by:
select *
from (
    select *,
           percent_rank() over (partition by c1 order by value) as prank
    from A
) as pt
where pt.dt = DATE '2022-10-03'
Demo: https://www.db-fiddle.com/f/rXynTaD5nmLqFJdjDSCZpL/0
The expected result looks like:
dt         c1  value  prank
2022/10/3  1   3      1
2022/10/3  2   5      0.5
Which means that at 2022-10-3, the latest value's percent_rank over the group's history is 100% in group c1 = 1, while in group c1 = 2 it is 50%.
But this SQL will sort every partition, which I believe has O(n log n) time complexity.
I just need the latest date's rank, and I thought I could get it by computing count(last_value > value) / count(*), which costs O(n).
Any suggestions?
Rather than hard-coding the maximum date, you can use the ROW_NUMBER() analytic function:
SELECT *
FROM (
    SELECT t.*,
           PERCENT_RANK() OVER (PARTITION BY c1 ORDER BY value) AS prank,
           ROW_NUMBER() OVER (PARTITION BY c1 ORDER BY dt DESC) AS rn
    FROM table_name t
) t
WHERE rn = 1
Which, for the sample data:
CREATE TABLE table_name (dt, c1, value) AS
SELECT DATE '2022-10-01', 1, 1 FROM DUAL UNION ALL
SELECT DATE '2022-10-02', 1, 2 FROM DUAL UNION ALL
SELECT DATE '2022-10-03', 1, 3 FROM DUAL UNION ALL
SELECT DATE '2022-10-01', 2, 4 FROM DUAL UNION ALL
SELECT DATE '2022-10-02', 2, 6 FROM DUAL UNION ALL
SELECT DATE '2022-10-03', 2, 5 FROM DUAL;
Outputs:
DT                   C1  VALUE  PRANK  RN
2022-10-03 00:00:00  1   3      1      1
2022-10-03 00:00:00  2   5      .5     1
fiddle
But this SQL will sort every partition, which I believe has O(n log n) time complexity.
Whatever you do you will need to iterate over the entire result-set.
I just need the latest date's rank and I thought I could do that by calculating count(last_value > value)/count().
Then you will need to find the last value, which (unless you are hardcoding the last date) will involve an index- or table-scan over all the values in each partition plus a sort, and then finding the count of greater values will require a second index- or table-scan. You can profile both solutions, but I expect you would find that using analytic functions is at least as efficient as, if not better than, trying to use aggregation functions.
For example:
SELECT c1,
       dt,
       value,
       ( SELECT ( COUNT(CASE WHEN value <= t.value THEN 1 END) - 1 )
                / ( COUNT(*) - 1 )
         FROM table_name c
         WHERE c.c1 = t.c1
       ) AS prank
FROM table_name t
WHERE dt = DATE '2022-10-03'
It is going to access the table twice, and you are likely to find that the I/O costs of table access far outweigh any potential savings from using a different method. Moreover, if you look at the explain plan (fiddle) you will see the query is still performing an aggregate sort, so there are no cost savings, only additional costs, from this method.
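As a cross-check, both approaches can be run on SQLite (which also implements PERCENT_RANK); the table name `t` and the `* 1.0` factor to force floating-point division are adaptations for that engine:

```python
import sqlite3

# Compare the window-function answer with the count-based formula
# (COUNT(value <= x) - 1) / (COUNT(*) - 1) on the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t(dt TEXT, c1 INT, value INT);
    INSERT INTO t VALUES
        ('2022-10-01', 1, 1), ('2022-10-02', 1, 2), ('2022-10-03', 1, 3),
        ('2022-10-01', 2, 4), ('2022-10-02', 2, 6), ('2022-10-03', 2, 5);
""")
window = dict(conn.execute("""
    SELECT c1, prank FROM (
        SELECT c1,
               PERCENT_RANK() OVER (PARTITION BY c1 ORDER BY value) AS prank,
               ROW_NUMBER()   OVER (PARTITION BY c1 ORDER BY dt DESC) AS rn
        FROM t)
    WHERE rn = 1
"""))
counted = dict(conn.execute("""
    SELECT t.c1,
           (SELECT (COUNT(CASE WHEN value <= t.value THEN 1 END) - 1) * 1.0
                   / (COUNT(*) - 1)
            FROM t c
            WHERE c.c1 = t.c1) AS prank
    FROM t
    WHERE dt = '2022-10-03'
"""))
print(window, counted)  # both give {1: 1.0, 2: 0.5}
```

Both queries return the same ranks, which is the equivalence the answer argues for.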
Try this (note the derived table must group by c1, not dt, to get one MaxDate per group):
select t.c1, t.dt, t.value
from TABLENAME t
inner join (
    select c1, max(dt) as MaxDate
    from TABLENAME
    group by c1
) tm on t.c1 = tm.c1 and t.dt = tm.MaxDate
order by t.dt desc;
Or as simple as
SELECT * from TABLENAME ORDER BY dt DESC;
I fiddled with it a bit; it is almost the same answer as the one MT0 already posted.
select dt, c1, val, prank * 100 as percent_rank
from (
    select t1.*,
           percent_rank() over (partition by c1 order by val) as prank,
           row_number() over (partition by c1 order by dt desc) as rn
    from t1
)
where rn = 1;
result
DT C1 VAL PERCENT_RANK
2022-10-03 1 3 100
2022-10-03 2 5 50
http://sqlfiddle.com/#!4/ec60a/23
I used ROW_NUMBER() = 1 to get the latest date, and also expressed percent_rank as a percentage.
Is this what you desire?

sql - single line per distinct value in a given column

Is there a way, using SQL (in BigQuery more specifically), to get one line per unique value in a given column?
I know this is possible using a sequence of UNION queries, with as many UNIONs as there are distinct values in the column of interest, but I'm wondering if there is a better way to do it.
You can use row_number():
select t.* except (seqnum)
from (select t.*, row_number() over (partition by col order by col) as seqnum
from t
) t
where seqnum = 1;
This returns an arbitrary row. You can control which row by adjusting the order by.
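For readers without a BigQuery project handy, the ROW_NUMBER pattern can be tried on SQLite; the `EXCEPT (seqnum)` projection is replaced by listing the columns explicitly, and ordering the partition by `id` (rather than by `col` itself) makes which row survives deterministic:

```python
import sqlite3

# The ROW_NUMBER dedup pattern: one surviving row per distinct col.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t(id INT, col INT);
    INSERT INTO t VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3);
""")
rows = conn.execute("""
    SELECT id, col FROM (
        SELECT id, col,
               ROW_NUMBER() OVER (PARTITION BY col ORDER BY id) AS seqnum
        FROM t)
    WHERE seqnum = 1
    ORDER BY col
""").fetchall()
print(rows)  # [(1, 1), (4, 2), (6, 3)]
```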
Another fun solution in BigQuery uses structs:
select array_agg(t limit 1)[ordinal(1)].*
from t
group by col;
You can add an order by (order by X limit 1) if you want a particular row.
Here is just a more nicely formatted version:
select tab.* except(seqnum)
from (
select *, row_number() over (partition by column_x order by column_x) as seqnum
from `project.dataset.table`
) as tab
where seqnum = 1
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
You can test, play with above using dummy data as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 col UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 4, 2 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 6, 3
)
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
with result
Row id col
1 1 1
2 4 2
3 6 3

How to use window functions to write SQL where the value of the current row depends on the previous row

I have a table with these values:
val1
1
2
3
4
I want the following as output:
1.00
2.50
4.25
6.12
Each output value is computed as val1 + 0.5 * (the previous row's output),
so e.g.:
row with 2 ---> output is computed as 2 + 0.5 * 1.00 = 2.50
row with 3 ---> output is computed as 3 + 0.5 * 2.50 = 4.25
When I use the following SQL window function
SELECT *
,val1+SUM(0.50*val1) OVER (ORDER BY val1 ROWS between 1 PRECEDING and 1 PRECEDING) AS r
FROM #a1
I get output as
1.00
2.500
4.000
5.500
This can be done with a recursive cte.
with rownums as (
    select val, row_number() over (order by val) as rnum
    from tbl
)
/* This is the recursive cte */
, cte as (
    select val, rnum, cast(val as float) as new_val from rownums where rnum = 1
    union all
    select r.val, r.rnum, r.val + 0.5 * c.new_val
    from cte c
    join rownums r on c.rnum = r.rnum - 1
)
/* End Recursive cte */
select val, new_val
from cte
Sample Demo
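The same recursive CTE runs almost unchanged on SQLite (CAST to REAL instead of FLOAT), which makes it easy to confirm the expected 1.00, 2.50, 4.25, 6.12 sequence:

```python
import sqlite3

# The answer's recursive CTE: each new_val = val + 0.5 * previous new_val.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl(val INT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(1,), (2,), (3,), (4,)])
rows = conn.execute("""
    WITH RECURSIVE rownums AS (
        SELECT val, ROW_NUMBER() OVER (ORDER BY val) AS rnum
        FROM tbl
    ),
    cte AS (
        SELECT val, rnum, CAST(val AS REAL) AS new_val
        FROM rownums WHERE rnum = 1
        UNION ALL
        SELECT r.val, r.rnum, r.val + 0.5 * c.new_val
        FROM cte c
        JOIN rownums r ON c.rnum = r.rnum - 1
    )
    SELECT new_val FROM cte ORDER BY rnum
""").fetchall()
print([r[0] for r in rows])  # [1.0, 2.5, 4.25, 6.125]
```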
This is called exponential averaging. You can do it with some sort of power function, say it is called power() (this might differ among databases).
The following will work -- but I'm not sure what happens if the sequences get long. Note that this has an id column to specify the ordering:
with t as (
select 1 as id, 1 as val union all
select 2, 2 union all select 3, 3 union all select 4, 4
)
select t.*,
       ( sum(p_seqnum * val) over (order by id) ) / p_seqnum as new_val
from (
    select t.*,
           row_number() over (order by id desc) as seqnum,
           power(cast(0.5 as float), row_number() over (order by id desc)) as p_seqnum
    from t
) t;
Here is a rextester for Postgres. Here is a SQL Fiddle for SQL Server.
This works because exponential averaging is "memory-less". If this were not true, you would need a recursive CTE, and that can be much more expensive.
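To see why the power() trick agrees with the recurrence, both can be computed directly in plain Python: the recurrence out[i] = val[i] + 0.5 * out[i-1] expands to the closed form out[i] = sum over j of 0.5^(i-j) * val[j], and that sum is exactly what the running SUM of p_seqnum * val divided by the row's own p_seqnum evaluates to:

```python
# Recurrence vs. closed form behind the power() trick.
vals = [1, 2, 3, 4]

# recurrence: out[i] = vals[i] + 0.5 * out[i-1]
out, prev = [], 0.0
for v in vals:
    prev = v + 0.5 * prev
    out.append(prev)

# closed form: out[i] = sum_j 0.5**(i - j) * vals[j]
closed = [sum(0.5 ** (i - j) * vals[j] for j in range(i + 1))
          for i in range(len(vals))]

print(out)     # [1.0, 2.5, 4.25, 6.125]
print(closed)  # identical
```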

Add A,B,C letters to Duplicate values

I need big help from you I am using sql server 2008 and I want to get the output using sql query.
I have a following data in the table.
Id Code
-----------------
1 01012
2 01012
3 01012
4 01012
5 01013
6 01013
7 01014
I need Following output
Id Code
-----------------
1 01012
2 01012A
3 01012B
4 01012C
5 01013
6 01013A
7 01014
You can use ROW_NUMBER. When Rn = 1, retain the original Code; otherwise append A, B, and so on.
To determine which letter to append, the formula is CHAR(65 + Rn - 2).
WITH CTE AS (
    SELECT *,
           Rn = ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Id)
    FROM tbl
)
SELECT
    Id,
    Code = CASE
               WHEN Rn = 1 THEN Code
               ELSE Code + CHAR(65 + Rn - 2)
           END
FROM CTE
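SQLite's char() behaves like SQL Server's CHAR for ASCII code points, so the CHAR(65 + Rn - 2) trick can be verified there (|| replaces + for string concatenation):

```python
import sqlite3

# ROW_NUMBER + char(65 + rn - 2): rn = 2 -> 'A', rn = 3 -> 'B', ...
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl(Id INT, Code TEXT);
    INSERT INTO tbl VALUES
        (1, '01012'), (2, '01012'), (3, '01012'), (4, '01012'),
        (5, '01013'), (6, '01013'), (7, '01014');
""")
rows = conn.execute("""
    WITH cte AS (
        SELECT Id, Code,
               ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Id) AS rn
        FROM tbl)
    SELECT Id,
           CASE WHEN rn = 1 THEN Code ELSE Code || char(65 + rn - 2) END
    FROM cte
    ORDER BY Id
""").fetchall()
print(rows)
```

This reproduces the expected output: the first occurrence of each Code is unchanged, and duplicates get A, B, C suffixes in Id order.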
SQL Server 2012+ Solution
Can be adapted to 2008 by replacing CONCAT with + and CHOOSE with CASE.
Data:
CREATE TABLE #tab(ID INT, Code VARCHAR(100));
INSERT INTO #tab
SELECT 1, '01012'
UNION ALL SELECT 2, '01012'
UNION ALL SELECT 3, '01012'
UNION ALL SELECT 4, '01012'
UNION ALL SELECT 5, '01013'
UNION ALL SELECT 6, '01013'
UNION ALL SELECT 7, '01014';
Query:
WITH cte AS
(
SELECT ID, Code,
[rn] = ROW_NUMBER() OVER(PARTITION BY Code ORDER BY id)
FROM #tab
)
SELECT
ID,
Code = CONCAT(Code, CHOOSE(rn, '', 'A', 'B', 'C', 'D', 'E', 'F')) -- next letters
FROM cte;
LiveDemo
select
    case
        when rownum > 1 then code + char(65 + rownum - 2)
        else code
    end as code,
    id
from (
    select *,
           ROW_NUMBER() over (partition by code order by id) as rownum
    from #tab
) c

Optimize select query (inner select + group)

My current version is :
SELECT DT, AVG(DP_H2O) AS Tx,
(SELECT AVG(Abs_P) / 1000000 AS expr1
FROM dbo.BACS_MinuteFlow_1
WHERE (DT =
(SELECT MAX(DT) AS Expr1
FROM dbo.BACS_MinuteFlow_1
WHERE DT <= dbo.BACS_KongPrima.DT ))
GROUP BY DT) AS Px
FROM dbo.BACS_KongPrima
GROUP BY DT
but it works very slowly.
Basically, in the inner select I'm selecting the maximum time nearest to my time, then grouping by this nearest time.
Are there possible optimizations? Maybe I can join it somehow, but the trouble is I'm not sure how to group by this nearest date.
Thank you
You could try to rearrange it to use the code below with a CROSS APPLY. I'm not sure if this will improve performance, but generally I try to avoid correlated subqueries in the select list, and SQL Server is pretty good at optimising the APPLY operator.
WITH Bacs_MinuteFlow_1 (Abs_P, DT) AS
(
    SELECT 5.3, '2011/10/10'
    UNION SELECT 6.2, '2011/10/10'
    UNION SELECT 7.8, '2011/10/10'
    UNION SELECT 5.0, '2011/03/10'
    UNION SELECT 4.3, '2011/03/10'
),
BACS_KongPrima (DP_H2O, DT) AS
(
    SELECT 2.3, '2011/10/15'
    UNION SELECT 2.6, '2011/10/15'
    UNION SELECT 10.2, '2011/03/15'
)
SELECT DT, AVG(DP_H2O) AS Tx,
       a.Px
FROM BACS_KongPrima
CROSS APPLY
(
    SELECT AVG(Abs_P) / 1000000 AS Px
    FROM Bacs_MinuteFlow_1
    WHERE DT =
        (SELECT MAX(DT) AS maxdt
         FROM Bacs_MinuteFlow_1
         WHERE DT <= BACS_KongPrima.DT)
) a
GROUP BY DT, a.Px
Cheers
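Since the APPLY here yields a single value per outer row, on engines without APPLY the same shape collapses back to a correlated scalar subquery; here is a sketch on SQLite with the answer's sample data (table names mf/kp and ISO date strings are stand-ins):

```python
import sqlite3

# The APPLY returns one scalar per outer row, so it can be phrased as a
# correlated scalar subquery on engines without APPLY.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mf(Abs_P REAL, DT TEXT);   -- stands in for BACS_MinuteFlow_1
    INSERT INTO mf VALUES (5.3, '2011-10-10'), (6.2, '2011-10-10'),
                          (7.8, '2011-10-10'), (5.0, '2011-03-10'),
                          (4.3, '2011-03-10');
    CREATE TABLE kp(DP_H2O REAL, DT TEXT);  -- stands in for BACS_KongPrima
    INSERT INTO kp VALUES (2.3, '2011-10-15'), (2.6, '2011-10-15'),
                          (10.2, '2011-03-15');
""")
rows = conn.execute("""
    SELECT DT, AVG(DP_H2O) AS Tx,
           (SELECT AVG(Abs_P) / 1000000
            FROM mf
            WHERE mf.DT = (SELECT MAX(DT) FROM mf WHERE mf.DT <= kp.DT)) AS Px
    FROM kp
    GROUP BY DT
    ORDER BY DT
""").fetchall()
print(rows)
```

Each kp date picks up the average Abs_P of the nearest earlier mf date, which is the same pairing the CROSS APPLY computes.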