SQL Server: how to merge or group up rows?

In T-SQL, given input data such as
+------+------+--------+------+------+------+--------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 |
+------+------+--------+------+------+------+--------+------+
| 1 | 30 | 1.0000 | desc | NULL | NULL | NULL | NULL |
| 31 | 60 | 2.0000 | desc | NULL | NULL | NULL | NULL |
| 61 | 90 | 1.0000 | desc | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | 1 | 30 | 1.5000 | desc |
| NULL | NULL | NULL | NULL | 1 | 30 | 2.5000 | desc |
| NULL | NULL | NULL | NULL | 1 | 30 | 1.1000 | desc |
+------+------+--------+------+------+------+--------+------+
How can I obtain this output:
+------+------+--------+------+------+------+--------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 |
+------+------+--------+------+------+------+--------+------+
| 1 | 30 | 1.0000 | desc | 1 | 30 | 1.5000 | desc |
| 31 | 60 | 2.0000 | desc | 1 | 30 | 2.5000 | desc |
| 61 | 90 | 1.0000 | desc | 1 | 30 | 1.1000 | desc |
+------+------+--------+------+------+------+--------+------+
Rows 4, 5 and 6 from input "merge up" in order to get the desired output.
This should also work in case the total number of rows is not even.

Here is a solution. It does not work if the right half of the given table has more rows than the left, because the LEFT JOIN keeps only rows that exist on the left side. You can see the approach and modify it to handle that case:
-- A #temp table is created with CREATE TABLE (DECLARE ... TABLE is only valid for @table variables)
CREATE TABLE #temp1 ( col1 INT, col2 INT, col3 DECIMAL(10,4), col4 NVARCHAR(20), col5 INT, col6 INT, col7 DECIMAL(10,4), col8 NVARCHAR(20) )
INSERT INTO #temp1 (col1,col2,col3,col4) VALUES
(1,30,1,'desc'),
(31,60,1,'desc'),
(61,90,1,'desc'),
(81,120,1,'desc')
INSERT INTO #temp1 (col5,col6,col7,col8) VALUES
(1,30,1.5,'desc'),
(1,30,2.5,'desc'),
(1,30,1.1,'desc')
SELECT col1,col2,col3,col4,col5,col6,col7,col8 FROM
(
    -- Number the left-half rows by col1 (col5 is NULL on these rows, so it cannot order them)
    SELECT col1,col2,col3,col4,ROW_NUMBER() OVER(ORDER BY col1) AS RowNumber FROM #temp1
    WHERE col1 IS NOT NULL
) t1 LEFT JOIN
(
    -- The right-half rows all share col5 = 1 here, so their numbering is arbitrary
    -- unless the table has another column that defines their order
    SELECT col5,col6,col7,col8,ROW_NUMBER() OVER(ORDER BY col5) AS RowNumber FROM #temp1
    WHERE col5 IS NOT NULL
) t2 ON t1.RowNumber = t2.RowNumber
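To cover the case where either half may have more rows, one option is to pair the halves on a row number and keep unmatched rows from both sides. The following is only a sketch under stated assumptions: it runs against SQLite through Python's sqlite3 module as a stand-in for SQL Server, the `id` column is a hypothetical addition that makes the row order deterministic, and the `nums` CTE emulates a full outer join. In T-SQL the same pairing could probably be written as a FULL OUTER JOIN on the two row numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE temp1 (
    id   INTEGER PRIMARY KEY,  -- hypothetical ordering column, not in the original table
    col1 INT, col2 INT, col3 REAL, col4 TEXT,
    col5 INT, col6 INT, col7 REAL, col8 TEXT
);
-- Two left-half rows and three right-half rows: the right half is longer on purpose
INSERT INTO temp1 (col1, col2, col3, col4) VALUES
    (1, 30, 1.0, 'desc'), (31, 60, 2.0, 'desc');
INSERT INTO temp1 (col5, col6, col7, col8) VALUES
    (1, 30, 1.5, 'desc'), (1, 30, 2.5, 'desc'), (1, 30, 1.1, 'desc');
""")

query = """
WITH l AS (  -- left half, numbered in insertion order
    SELECT col1, col2, col3, col4, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM temp1 WHERE col1 IS NOT NULL
),
r AS (       -- right half, numbered in insertion order
    SELECT col5, col6, col7, col8, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM temp1 WHERE col5 IS NOT NULL
),
nums AS (    -- every row number that occurs on either side
    SELECT rn FROM l
    UNION
    SELECT rn FROM r
)
SELECT l.col1, l.col2, l.col3, l.col4, r.col5, r.col6, r.col7, r.col8
FROM nums
LEFT JOIN l ON l.rn = nums.rn
LEFT JOIN r ON r.rn = nums.rn
ORDER BY nums.rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The third output row has NULLs in col1 to col4, because the left half ran out of rows first.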

Multiple "level" conditions on partition by SQL

I have to populate a Teradata table from another source, which can be simplified as follows:
+------+------+------------+------------+
| Col1 | Col2 | Col3 | Col4 |
+------+------+------------+------------+
| 1234 | 0 | 01/01/2009 | 01/04/2019 |
| 1234 | 3 | 01/01/2010 | 01/05/2020 |
| 2345 | 1 | 20/02/2013 | 01/04/2019 |
| 2345 | 0 | 20/02/2013 | 01/04/2018 |
| 2345 | 2 | 31/01/2009 | 01/04/2017 |
| 3456 | 0 | 01/01/2009 | 01/04/2019 |
| 3456 | 1 | 01/01/2015 | 01/04/2019 |
| 3456 | 1 | 01/01/2015 | 01/05/2017 |
| 3456 | 3 | 01/01/2015 | 01/04/2019 |
+------+------+------------+------------+
Col1 is duplicated in the source, so we have rules to select the right row (Col1 must be unique in the final result).
For each value in Col1:
If the value is duplicated, select the row with the most recent date in Col3.
If (and only if) it is still duplicated, select the row with Col2 = 1.
If it is still duplicated, select the row with the most recent date in Col4.
Given the previous table, we should get the following result:
+------+------+------------+------------+
| Col1 | Col2 | Col3 | Col4 |
+------+------+------------+------------+
| 1234 | 3 | 01/01/2010 | 01/05/2020 |
| 2345 | 1 | 20/02/2013 | 01/04/2019 |
| 3456 | 1 | 01/01/2015 | 01/04/2019 |
+------+------+------------+------------+
I started using PARTITION BY to group the occurrences of each value in col 3, but I have no good idea how to apply the conditions to each partition in a SQL query.
Thank you for your help
You can use QUALIFY in Teradata to simplify the syntax:
SELECT col1, col2, col3, col4
FROM mytable
QUALIFY ROW_NUMBER() OVER(
    PARTITION BY col1 -- Group rows by "col1" values
    ORDER BY col3 DESC, CASE WHEN col2 = 1 THEN 1 ELSE 2 END, col4 DESC -- Order rows
) = 1 -- Get "first" row in each group
Otherwise, this is the same as the row_number() answer.
You can use row_number():
select t.*
from (select t.*,
             row_number() over (partition by col1
                                order by col3 desc,
                                         (case when col2 = 1 then 1 else 2 end),
                                         col4 desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
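The three-level ordering can be checked against the sample rows from the question. This is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in for Teradata; the table name `t` follows the answer above, and storing the dates as ISO `YYYY-MM-DD` text (so that they sort chronologically) is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INT, col2 INT, col3 TEXT, col4 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        # Dates from the question, rewritten from DD/MM/YYYY to ISO text
        (1234, 0, "2009-01-01", "2019-04-01"),
        (1234, 3, "2010-01-01", "2020-05-01"),
        (2345, 1, "2013-02-20", "2019-04-01"),
        (2345, 0, "2013-02-20", "2018-04-01"),
        (2345, 2, "2009-01-31", "2017-04-01"),
        (3456, 0, "2009-01-01", "2019-04-01"),
        (3456, 1, "2015-01-01", "2019-04-01"),
        (3456, 1, "2015-01-01", "2017-05-01"),
        (3456, 3, "2015-01-01", "2019-04-01"),
    ],
)

query = """
SELECT col1, col2, col3, col4
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY col1
               ORDER BY col3 DESC,                             -- rule 1: latest col3
                        CASE WHEN col2 = 1 THEN 1 ELSE 2 END,  -- rule 2: prefer col2 = 1
                        col4 DESC                              -- rule 3: latest col4
           ) AS seqnum
    FROM t
) AS ranked
WHERE seqnum = 1
ORDER BY col1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each later sort key only matters when the earlier keys tie, which is how the "if still duplicated" rules are expressed in a single ORDER BY.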

Round down to nearest of Multiple of N

I have a SQL table as follows:
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a | 3 | d1 | 10 |
| a | 6 | d2 | 15 |
| b | 2 | d2 | 8 |
| b | 30 | d1 | 50 |
+------+------+------+------+
I would like to transform the above table into the one below, where the transformation is
col4 = col4 - (col4 % min(col2) group by col1)
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a | 3 | d1 | 9 |
| a | 6 | d2 | 15 |
| b | 2 | d2 | 8 |
| b | 30 | d1 | 50 |
+------+------+------+------+
I could read the table in application code and do the transformation manually, but I was wondering whether it is possible to offload the transformation to SQL.
Just run a simple select query for this:
select col1, col2, col3,
col4 - (col4 % min(col2) over (partition by col1))
from t;
There is no need to actually modify the table.
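As a quick check of the windowed MIN approach against the sample data, here is a small sketch run on SQLite through Python's sqlite3 module (a stand-in engine chosen for the example; the original question does not name one). The `AS col4` alias is an addition so the computed column keeps its name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INT, col3 TEXT, col4 INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("a", 3, "d1", 10), ("a", 6, "d2", 15), ("b", 2, "d2", 8), ("b", 30, "d1", 50)],
)

# MIN(col2) OVER (PARTITION BY col1) yields the group minimum on every row,
# so col4 is rounded down to the nearest multiple of that minimum.
rows = conn.execute("""
SELECT col1, col2, col3,
       col4 - (col4 % MIN(col2) OVER (PARTITION BY col1)) AS col4
FROM t
ORDER BY col1, col2
""").fetchall()

for row in rows:
    print(row)
```

For group `a` the minimum of col2 is 3, so 10 becomes 9 and 15 stays 15; for group `b` the minimum is 2, so 8 and 50 are unchanged.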
You can use a multi-table UPDATE to achieve your desired result, joining your table to a table of MIN(col2) values:
UPDATE table1
SET col4 = col4 - (col4 % t2.col2min)
FROM (SELECT col1, MIN(col2) AS col2min
FROM table1
GROUP BY col1) t2
WHERE table1.col1 = t2.col1
Output:
col1 col2 col3 col4
a 3 d1 9
a 6 d2 15
b 2 d2 8
b 30 d1 50

Oracle SQL conditional ranking

In my query, I am doing multiple types of ranking, and for one of the ranking types I want to rank the row only if a certain column is not null. Otherwise I don't want the ranking to happen.
For example here's a sample table:
+------+------------+------------+--------+--------+
| col1 | col2 | col3 | rank 1 | rank 2 |
+------+------------+------------+--------+--------+
| a | 2018-01-20 | 2018-03-04 | 2 | 2 |
| a | 2018-01-24 | 2018-04-04 | 1 | 1 |
| b | 2018-01-02 | 2018-05-03 | 1 | 1 |
| c | 2017-01-02 | 2017-05-08 | 3 | 2 |
| d | 2016-05-24 | null | 1 | null |
| c | 2018-02-05 | 2018-05-03 | 2 | 1 |
| c | 2018-07-28 | null | 1 | null |
+------+------------+------------+--------+--------+
rank 1 is calculated correctly based on partition by col1 order by col2 desc.
rank 2 should be calculated the same way, but only when col3 is not null; otherwise it should be null.
How can I achieve both ranks in a single query? I tried to use a case statement for rank 2, but it skips the ranking when col3 is null.
If I understand correctly, you can try to use CASE WHEN with the SUM window function.
The CASE WHEN checks whether col3 is not null: if so, it accumulates a running count; otherwise it displays NULL.
CREATE TABLE T(
col1 VARCHAR(5),
col2 DATE,
col3 DATE
);
INSERT INTO T VALUES ( 'a' , to_date('2018-01-20','YYYY-MM-DD') , to_date('2018-03-04','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'a' , to_date('2018-01-24','YYYY-MM-DD') , to_date('2018-04-04','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'b' , to_date('2018-01-02','YYYY-MM-DD') , to_date('2018-05-03','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'c' , to_date('2017-01-02','YYYY-MM-DD') , to_date('2017-05-08','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'd' , TO_DATE('2016-05-24','YYYY-MM-DD') , null);
INSERT INTO T VALUES ( 'c' , TO_DATE('2018-02-05','YYYY-MM-DD') , to_date('2018-05-03','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'c' , TO_DATE('2018-07-28','YYYY-MM-DD') , null);
Query 1:
select t1.*,
rank() OVER(partition by col1 order by col2 desc) rank1,
(CASE WHEN COL3 IS NOT NULL THEN
SUM(CASE WHEN COL3 IS NOT NULL THEN 1 ELSE 0 END) OVER(partition by col1 order by col2 desc)
ELSE
NULL
END) rank2
FROM T t1
Results:
| COL1 | COL2 | COL3 | RANK1 | RANK2 |
|------|----------------------|----------------------|-------|--------|
| a | 2018-01-24T00:00:00Z | 2018-04-04T00:00:00Z | 1 | 1 |
| a | 2018-01-20T00:00:00Z | 2018-03-04T00:00:00Z | 2 | 2 |
| b | 2018-01-02T00:00:00Z | 2018-05-03T00:00:00Z | 1 | 1 |
| c | 2018-07-28T00:00:00Z | (null) | 1 | (null) |
| c | 2018-02-05T00:00:00Z | 2018-05-03T00:00:00Z | 2 | 1 |
| c | 2017-01-02T00:00:00Z | 2017-05-08T00:00:00Z | 3 | 2 |
| d | 2016-05-24T00:00:00Z | (null) | 1 | (null) |
I think you might want:
select count(col3) over (partition by col1 order by col2 desc)
Note that this is equivalent to row_number() rather than rank(). For your data these are equivalent.
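One detail worth checking: on its own, the running `count(col3)` returns 0 (not NULL) on rows where col3 is null, so it may need a CASE wrapper to match the requested output. The sketch below compares the two on the sample data, using SQLite through Python's sqlite3 module as a stand-in for Oracle; storing the dates as ISO text and the column name `running_count` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a", "2018-01-20", "2018-03-04"),
        ("a", "2018-01-24", "2018-04-04"),
        ("b", "2018-01-02", "2018-05-03"),
        ("c", "2017-01-02", "2017-05-08"),
        ("d", "2016-05-24", None),
        ("c", "2018-02-05", "2018-05-03"),
        ("c", "2018-07-28", None),
    ],
)

query = """
SELECT col1, col2,
       RANK() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS rank1,
       -- running count of non-null col3 values; 0 on a leading null row
       COUNT(col3) OVER (PARTITION BY col1 ORDER BY col2 DESC) AS running_count,
       -- same count, but blanked out on rows where col3 is null
       CASE WHEN col3 IS NOT NULL
            THEN COUNT(col3) OVER (PARTITION BY col1 ORDER BY col2 DESC)
       END AS rank2
FROM t
ORDER BY col1, col2 DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The `rank2` column reproduces the "rank 2" values from the question's table, while `running_count` shows what the bare window count produces.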

Group by random column in ms access

I need something like this in MS ACCESS SQL
SELECT
ID,
col1,
col2,
random(col3)
FROM
table
GROUP BY
ID,
col1,
col2
NOTE:
I want to remove duplicates choosing random value of col3.
INPUT:
+----+------+------+------+
| Id | col1 | col2 | col3 |
+----+------+------+------+
| 1 | A | B | 7 |
+----+------+------+------+
| 1 | A | B | 10 |
+----+------+------+------+
RESULT:
+----+------+------+------+
| Id | col1 | col2 | col3 |
+----+------+------+------+
| 1 | A | B | 7 |
+----+------+------+------+
REQUERY:
+----+------+------+------+
| Id | col1 | col2 | col3 |
+----+------+------+------+
| 1 | A | B | 10 |
+----+------+------+------+

remove null values and merge sql server 2008 r2

I have a table (TestTable) as follows
PK | COL1 | COL2 | COL3
1 | 3 | NULL | NULL
2 | 3 | 43 | 1.5
3 | 4 | NULL | NULL
4 | 4 | NULL | NULL
5 | 4 | 48 | 10.5
6 | NULL | NULL | NULL
7 | NULL | NULL | NULL
8 | NULL | NULL | NULL
9 | 5 | NULL | NULL
10 | 5 | NULL | NULL
11 | 5 | 55 | 95
I would like a result as follows
PK | COL1 | COL2 | COL3
1 | 3 | 43 | 1.5
2 | 4 | 48 | 10.5
3 | 5 | 55 | 95
You can do this, but it won't give you a sequential number for the PK:
SELECT
PK,
MAX(Col1) AS Col1,
MAX(Col2) AS Col2,
MAX(Col3) AS Col3
FROM TestTable
WHERE Col1 IS NOT NULL
AND Col2 IS NOT NULL
AND COL3 IS NOT NULL
GROUP BY PK;
| PK | COL1 | COL2 | COL3 |
|----|------|------|------|
| 2 | 3 | 43 | 1.5 |
| 5 | 4 | 48 | 10.5 |
| 11 | 5 | 55 | 95 |
If you want to generate a rownumber for the column pk, you can do this:
WITH CTE
AS
(
SELECT
PK,
MAX(Col1) AS Col1,
MAX(Col2) AS Col2,
MAX(Col3) AS Col3
FROM TestTable
WHERE Col1 IS NOT NULL
AND Col2 IS NOT NULL
AND COL3 IS NOT NULL
GROUP BY PK
), Ranked
AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY PK) AS RN
FROM CTE
)
SELECT RN AS PK, Col1, COL2, COL3 FROM Ranked
This will give you:
| PK | COL1 | COL2 | COL3 |
|----|------|------|------|
| 1 | 3 | 43 | 1.5 |
| 2 | 4 | 48 | 10.5 |
| 3 | 5 | 55 | 95 |
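Because PK is already unique, the GROUP BY and MAX calls do not change the result, so the filter and the renumbering can also be written in one step. The following is a sketch of that simplification rather than the answer's exact query; it uses SQLite through Python's sqlite3 module as a stand-in for SQL Server 2008 R2, and the alias `new_pk` is a name chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestTable (PK INTEGER PRIMARY KEY, COL1 INT, COL2 INT, COL3 REAL)"
)
conn.executemany(
    "INSERT INTO TestTable VALUES (?, ?, ?, ?)",
    [
        (1, 3, None, None), (2, 3, 43, 1.5),
        (3, 4, None, None), (4, 4, None, None), (5, 4, 48, 10.5),
        (6, None, None, None), (7, None, None, None), (8, None, None, None),
        (9, 5, None, None), (10, 5, None, None), (11, 5, 55, 95.0),
    ],
)

# Keep only fully populated rows, then renumber them in PK order.
rows = conn.execute("""
SELECT ROW_NUMBER() OVER (ORDER BY PK) AS new_pk, COL1, COL2, COL3
FROM TestTable
WHERE COL1 IS NOT NULL
  AND COL2 IS NOT NULL
  AND COL3 IS NOT NULL
ORDER BY new_pk
""").fetchall()

for row in rows:
    print(row)
```

The window function is evaluated after the WHERE filter, which is why the surviving rows are numbered 1, 2, 3 without gaps.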
This can be obtained in two steps like so:
1st step: Get rid of unnecessary rows:
delete from testTable
where Col1 is null
or Col2 is null
or Col3 is null
2nd step: Set the correct PK values using a CTE (update test table):
;with sanitizeCTE
as(
select ROW_NUMBER() over (order by PK) as PK,
Col1, Col2, Col3
from testTable
)
update t
set t.PK = CTE.PK
from testTable t
join sanitizeCTE cte
on t.Col1 = cte.Col1
and t.Col2 = cte.Col2
and t.Col3 = cte.Col3
Tested here: http://sqlfiddle.com/#!3/91e86/1