Let's say I have a table with these rows:
Id Value
----------
1 a
1 b
1 c
1 d
1 e
1 f
and the expected result should be,
Id Value1 Value2
-------------------
1 a b
1 c d
1 e f
I am very confused here.
Ok, there's definitely a simpler way to do this, but this works:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Value)
FROM dbo.YourTable
)
SELECT Id,
MIN(CASE WHEN RN % 2 = 1 THEN Value END) Value1,
MIN(CASE WHEN RN % 2 = 0 THEN Value END) Value2
FROM CTE
GROUP BY Id,
RN - ((RN - 1) % 2);
This is the result:
╔════╦════════╦════════╗
║ Id ║ Value1 ║ Value2 ║
╠════╬════════╬════════╣
║ 1 ║ a ║ b ║
║ 1 ║ c ║ d ║
║ 1 ║ e ║ f ║
╚════╩════════╩════════╝
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS RowNum
FROM YourTable
)
SELECT
c1.id
, c1.value as Value1
, c2.value as Value2
FROM cte c1
-- pair each odd-numbered row with the next row of the same id
LEFT JOIN cte c2 ON c1.id = c2.id AND c1.RowNum = c2.RowNum - 1
WHERE c1.RowNum % 2 = 1
select Id
,min(Value) as Value1
,max(Value) as Value2
from (select Id,Value
,(row_number () over
(partition by Id order by Value)+1)/2 as group_id
from mytable as t
) t
group by Id
,group_id
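For anyone who wants to try the pairing logic without a SQL Server instance, here is a minimal sketch that runs the same idea on SQLite through Python's sqlite3 module. The function name pair_values is made up for this illustration, the table is the YourTable placeholder used in the answers, and it assumes the bundled SQLite is 3.25 or newer (window function support). SQL Server syntax differs slightly, so treat it as a check of the logic and not as any answer's exact code.

```python
import sqlite3


def pair_values(rows):
    """Pair consecutive values per Id: (1st, 2nd), (3rd, 4th), ...

    `rows` is a list of (Id, Value) tuples. Hypothetical helper for illustration.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (Id INTEGER, Value TEXT)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)
    query = """
        WITH CTE AS (
            SELECT Id, Value,
                   ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Value) AS RN
            FROM YourTable
        )
        SELECT Id,
               MIN(CASE WHEN RN % 2 = 1 THEN Value END) AS Value1,
               MIN(CASE WHEN RN % 2 = 0 THEN Value END) AS Value2
        FROM CTE
        -- integer division: rows 1,2 -> group 1; rows 3,4 -> group 2; ...
        GROUP BY Id, (RN + 1) / 2
        ORDER BY Id, Value1
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pair_values([(1, v) for v in "abcdef"]):
        print(row)
```

With the six sample rows this returns the three pairs from the question. If an Id has an odd number of values, the last row gets NULL (None in Python) in Value2.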
Related
I have a query that calculates a sum over a partition, giving me a running total by category.
This part works well. Now I would like the sum of only the top 50% of each partition.
Maybe a table example will show it:
╔═══════╦══════════════════════════╦════════════════════════════╗
║ col_1 ║ sum of partition by ║ sum of 50% of partition by ║
╠═══════╬══════════════════════════╬════════════════════════════╣
║ 1 ║ 36 (this is 1+2+3+...8) ║ 10 (1+2+3+4) ║
╠═══════╬══════════════════════════╬════════════════════════════╣
║ 2 ║ 35 (this is 2+3+4+....8) ║ 9 (2+3+4) ║
╠═══════╬══════════════════════════╬════════════════════════════╣
║ 3 ║ 34 ║ 7 (3+4) ║
╠═══════╬══════════════════════════╬════════════════════════════╣
║ 4 ║ 33 ║ 4 ║
╠═══════╬══════════════════════════╬════════════════════════════╣
║ 5 ║ 32 ║ null ║
╠═══════╬══════════════════════════╬════════════════════════════╣
║ 6 ║ 31 ║ null ║
╠═══════╬══════════════════════════╬════════════════════════════╣
║ 7 ║ 30 ║ null ║
╠═══════╬══════════════════════════╬════════════════════════════╣
║ 8 ║ 29 ║ null ║
╚═══════╩══════════════════════════╩════════════════════════════╝
Right now I'm doing:
sum(col_) over(partition by <another col> order by <a third col>) as [sum of partition by ]
Later I need to add another column with the same calculation over the top 25%, so you get the idea.
You can use conditional logic by enumerating the rows and filtering. The following uses standard SQL syntax:
select x,
sum(x) over (order by x desc),
sum(x) filter (where seqnum <= 0.5 * cnt) over (order by x desc),
sum(x) filter (where seqnum <= 0.25 * cnt) over (order by x desc)
from (select x, count(*) over () as cnt,
row_number() over (order by x) as seqnum
from generate_series(1, 8, 1) gs(x)
) x
order by x;
Here is a db<>fiddle.
Although filter is part of standard SQL, few databases besides Postgres support it. The logic can easily be replaced with sum(case . . .).
Here is a db<>fiddle using SQL Server instead. The corresponding code is:
with gs as (
select 1 as x union all
select x + 1 from gs where x < 8
)
select x,
sum(x) over (order by x desc),
sum(case when seqnum <= 0.5 * cnt then x end) over (order by x desc),
sum(case when seqnum <= 0.25 * cnt then x end) over (order by x desc)
from (select x, count(*) over () as cnt,
row_number() over (order by x) as seqnum
from gs
) x
order by x;
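As a quick way to check the numbers, here is a sketch of the sum(case ...) variant running on SQLite through Python's sqlite3 module. The function name running_sums and the column aliases total, top_half and top_quarter are invented for this example, and it assumes SQLite 3.25 or newer for window functions. It generates the values 1..n with a recursive CTE, as the SQL Server version does.

```python
import sqlite3


def running_sums(n=8):
    """Return (x, total, top_half, top_quarter) for x = 1..n.

    Each sum runs from the largest x down to the current row; the
    conditional sums only count rows in the lowest 50% / 25% by rank.
    """
    con = sqlite3.connect(":memory:")
    query = """
        WITH RECURSIVE gs(x) AS (
            SELECT 1
            UNION ALL
            SELECT x + 1 FROM gs WHERE x < ?
        )
        SELECT x,
               SUM(x) OVER (ORDER BY x DESC) AS total,
               SUM(CASE WHEN seqnum <= 0.5 * cnt THEN x END)
                   OVER (ORDER BY x DESC) AS top_half,
               SUM(CASE WHEN seqnum <= 0.25 * cnt THEN x END)
                   OVER (ORDER BY x DESC) AS top_quarter
        FROM (SELECT x,
                     COUNT(*) OVER () AS cnt,                 -- rows in total
                     ROW_NUMBER() OVER (ORDER BY x) AS seqnum -- rank of this row
              FROM gs) AS t
        ORDER BY x
    """
    result = con.execute(query, (n,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in running_sums(8):
        print(row)
```

For n = 8 the first row is (1, 36, 10, 3), and rows with x of 5 or more get NULL in the conditional columns because no row in their window satisfies the CASE condition.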
Using a table of events, I need to return the date and type for:
the first event
the most recent (non-null) event
The most recent event could have null values, in which case the query needs to return the most recent non-null values instead.
I found a few articles as well as posts here on SO that are similar (maybe even identical) but am not able to decode or understand the solution - i.e.
Fill null values with last non-null amount - Oracle SQL
https://www.itprotoday.com/sql-server/last-non-null-puzzle
https://koukia.ca/common-sql-problems-filling-null-values-with-preceding-non-null-values-ad538c9e62a6
Table is as follows - there are additional columns, but I am only including 3 for the sake of simplicity. Also note that the first Type and Date could be null. In this case returning null is desired.
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║
║ A ║ Update ║ 2019-04-02 ║
║ A ║ null ║ null ║
╚═══════╩════════╩════════════╝
The output should be:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║ Update ║ 2019-04-02 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
The first method I tried was to join the table to itself using a subquery that finds the MIN and MAX dates using case statements:
select
Email,
max(case when T1.Date = T2.Min_Date then T1.Type end) as FirstType,
max(case when T1.Date = T2.Min_Date then T1.Date end) as FirstDate,
max(case when T1.Date = T2.Max_Date then T1.Type end) as LastType,
max(case when T1.Date = T2.Max_Date then T1.Date end) as LastDate,
from
T1
join
(select
EmailAddress,
max(Date) as Max_Date,
min(Date) as Min_Date
from
Table1
group by
Email
) T2
on
T1.Email = T2.Email
group by
T1.Email
This seemed to work for the MIN values, but the MAX values would return null.
To solve the problem of returning the last non-null value, I attempted this:
select
EmailAddress,
max(Date) over (partition by EmailAddress rows unbounded preceding) as LastDate,
max(Type) over (partition by EmailAddress rows unbounded preceding) as LastType
from
T1
group by
EmailAddress,
Date,
Type
However, this gives a result of 3 rows, instead of 1.
I'll admit I don't quite understand analytic functions since I have not had to deal with them at length. Any help would be greatly appreciated.
Edit:
The aforementioned example is an accurate representation of what the data could look like, however the below example is the exact sample data that I am using.
Sample:
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║
║ A ║ null ║ null ║
╚═══════╩════════╩════════════╝
Desired Outcome:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║ Create ║ 2019-04-01 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
Additional Use-Case:
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ null ║ null ║
║ A ║ Create ║ 2019-04-01 ║
╚═══════╩════════╩════════════╝
Desired Outcome:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ null ║ null ║ Create ║ 2019-04-01 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
Use window functions and conditional aggregation:
select t.email,
max(case when seqnum = 1 then type end) as first_type,
max(case when seqnum = 1 then date end) as first_date,
max(case when seqnum_nonull = 1 and type is not null then type end) as last_type,
max(case when seqnum_nonull = 1 and type is not null then date end) as last_date
from (select t.*,
row_number() over (partition by email order by date) as seqnum,
-- order by date desc so that seqnum_nonull = 1 marks the most recent non-null event
row_number() over (partition by email, (case when type is null then 1 else 2 end) order by date desc) as seqnum_nonull
from t
) t
group by t.email;
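Here is a hedged, runnable sketch of the same window-function-plus-conditional-aggregation idea, using SQLite through Python's sqlite3 module. The function name first_and_last is invented for the example, dates are stored as ISO strings, and the second row number is ordered by date descending so that rank 1 is the latest non-null event. One thing to keep in mind: in SQLite, as in SQL Server, NULLs sort first in ascending order, so a row with a NULL date becomes the "first" row.

```python
import sqlite3


def first_and_last(rows):
    """rows: list of (email, type, date) tuples; type/date may be None.

    Returns (email, first_type, first_date, last_type, last_date) per email.
    Hypothetical helper for illustration only.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (email TEXT, type TEXT, date TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT email,
               MAX(CASE WHEN seqnum = 1 THEN type END) AS first_type,
               MAX(CASE WHEN seqnum = 1 THEN date END) AS first_date,
               MAX(CASE WHEN seqnum_nonnull = 1 AND type IS NOT NULL
                        THEN type END) AS last_type,
               MAX(CASE WHEN seqnum_nonnull = 1 AND type IS NOT NULL
                        THEN date END) AS last_date
        FROM (SELECT t.*,
                     -- earliest row per email (NULL dates sort first)
                     ROW_NUMBER() OVER (PARTITION BY email
                                        ORDER BY date) AS seqnum,
                     -- latest row per email, ranked separately for
                     -- null and non-null types
                     ROW_NUMBER() OVER (PARTITION BY email, (type IS NULL)
                                        ORDER BY date DESC) AS seqnum_nonnull
              FROM t) AS s
        GROUP BY email
        ORDER BY email
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(first_and_last([("A", "Create", "2019-04-01"),
                          ("A", "Update", "2019-04-02"),
                          ("A", None, None)]))
```

With the three-row sample, the last type and date come out as Update and 2019-04-02, while the first type and date are NULL because the NULL-dated row sorts first. If the NULL row should not count as the first event, the ordering of seqnum would need an explicit rule for NULL dates.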
As Spark SQL window functions support the NULLS LAST|FIRST syntax, you could use that and then specify a pivot with multiple aggregates for rn values 1 and 2. I could do with seeing some more sample data, but this works for your dataset:
%sql
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp;
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
)
SELECT *
FROM cte
PIVOT ( MAX(date), MAX(type) FOR rn In ( 1, 2 ) )
Rename the columns by supplying your required parts in the query, eg
-- Pivot and rename columns
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
)
SELECT *
FROM cte
PIVOT ( MAX(date) AS Date, MAX(type) AS Type FOR rn In ( 1 First, 2 Last ) )
Alternately supply a column list, eg
-- Pivot and rename columns
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
), cte2 AS
(
SELECT *
FROM cte
PIVOT ( MAX(date) AS Date, MAX(type) AS Type FOR rn In ( 1 First, 2 Last ) )
)
SELECT *
FROM cte2 AS (Email, FirstDate, FirstType, LastDate, LastType)
This simple query uses ROW_NUMBER to assign a row number to the dataset ordered by the date column, but using the NULLS LAST syntax to ensure null rows appear last in the numbering. The PIVOT then converts the rows to columns.
I have a select statement:
SELECT ID, A, B, C, D
FROM MyTable
GROUP BY ID, A, B, C, D
HAVING D >= '14/06/2013'
AND D <= '17/06/2013'
which shows this:
ID | A | B | C | D
--------------------------------------------
11 | 1370 | 0 | 0 | 14/06/2013
11 | 1370 | 100 | 0 | 15/06/2013
11 | 1470 | 400 | 0 | 16/06/2013
11 | 1870 | 0 | 300 | 17/06/2013
The result I want is:
ID | min of D| Sum(B) | Sum(C) | max of D| MIN(D)
11 | 1370 | 500 | 300 | 1870 | 14/06/2013
How do I do that in SQL Server?
Here is a way (assuming SQL Server 2005+):
;WITH CTE AS
(
SELECT *,
RN1 = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY D DESC),
RN2 = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY D)
FROM YourTable
WHERE D >= '20130614'
AND D <= '20130617'
)
SELECT ID,
MIN(CASE WHEN RN2 = 1 THEN A END) [min of D],
SUM(B) [Sum(B)],
SUM(C) [Sum(C)],
MIN(CASE WHEN RN1 = 1 THEN A END) [max of D],
MIN(D) [Min(D)]
FROM CTE
GROUP BY ID
Results:
╔════╦══════════╦════════╦════════╦══════════╦════════════╗
║ ID ║ MIN OF D ║ SUM(B) ║ SUM(C) ║ MAX OF D ║ MIN(D) ║
╠════╬══════════╬════════╬════════╬══════════╬════════════╣
║ 11 ║ 1370 ║ 500 ║ 300 ║ 1870 ║ 2013-06-14 ║
╚════╩══════════╩════════╩════════╩══════════╩════════════╝
And here is an sqlfiddle with a demo of this.
You can do that by a JOIN
SELECT T.ID ,
MAX(G.B) AS [SUM(B)],
MAX(G.C) AS [SUM(C)],
MAX(MINI)AS [MIN(D)] ,
MAX(CASE WHEN T.D = G.MINI THEN T.A ELSE NULL END ) AS [MIN OF D],
MAX(CASE WHEN T.D = G.MAXI THEN T.A ELSE NULL END ) AS [MAX OF D]
FROM TEST T
JOIN ( SELECT ID , SUM(B) B ,SUM(C) C ,MIN(D) AS MINI ,MAX(D) AS MAXI
FROM test
WHERE D >= '06/14/2013'
AND D <= '06/17/2013'
GROUP BY ID ) G ON G.ID = T.ID
GROUP BY T.ID
SQL Fiddle demo HERE
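The ROW_NUMBER approach can be tried out with a small sketch on SQLite via Python's sqlite3 module. The function name summarize is made up, dates are stored as ISO strings (so string comparison matches date order), and SQLite 3.25 or newer is assumed. It picks A from the earliest and latest row per ID and sums B and C over the date range.

```python
import sqlite3


def summarize(rows, start, end):
    """rows: (ID, A, B, C, D) tuples with D as 'YYYY-MM-DD' text.

    Returns (ID, A at earliest D, SUM(B), SUM(C), A at latest D, MIN(D))
    for rows with start <= D <= end. Hypothetical helper for illustration.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE YourTable (ID INTEGER, A INTEGER, B INTEGER, C INTEGER, D TEXT)"
    )
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        WITH CTE AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY ID ORDER BY D DESC) AS RN1,
                   ROW_NUMBER() OVER (PARTITION BY ID ORDER BY D) AS RN2
            FROM YourTable
            WHERE D >= ? AND D <= ?
        )
        SELECT ID,
               MIN(CASE WHEN RN2 = 1 THEN A END) AS a_at_min_d,  -- earliest row
               SUM(B) AS sum_b,
               SUM(C) AS sum_c,
               MIN(CASE WHEN RN1 = 1 THEN A END) AS a_at_max_d,  -- latest row
               MIN(D) AS min_d
        FROM CTE
        GROUP BY ID
        ORDER BY ID
    """
    result = con.execute(query, (start, end)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    sample = [
        (11, 1370, 0, 0, "2013-06-14"),
        (11, 1370, 100, 0, "2013-06-15"),
        (11, 1470, 400, 0, "2013-06-16"),
        (11, 1870, 0, 300, "2013-06-17"),
    ]
    print(summarize(sample, "2013-06-14", "2013-06-17"))
```

On the four sample rows this gives (11, 1370, 500, 300, 1870, '2013-06-14'), matching the result table above.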
I have a table with about 100000 records. I need to update some fields like this.
For example this is my table
id name
1 sss
2 bbb
3 ccc
4 avg
5 bbb
6 bbb
7 sss
8 mmm
9 avg
After executing the script I need to get
id name
1 sss
2 bbb
3 ccc
4 avg
5 bbb-5
6 bbb-6
7 sss-7
8 mmm
9 avg-9
How can I do that?
By using CTE
WITH greaterRecord
AS
(
SELECT id,
name,
ROW_NUMBER() OVER(PARTITION BY name ORDER BY id) RN
FROM TableName
)
UPDATE greaterRecord
SET name = name + '-' + CAST(id AS VARCHAR(10))
WHERE RN > 1
SQLFiddle Demo
Here is an alternative that does not need window functions: join against the minimum ID per name (written with SQL Server's UPDATE ... FROM syntax)
UPDATE a
SET a.Name = a.Name + '-' + CAST(ID AS VARCHAR(10))
FROM tableName a
LEFT JOIN
(
SELECT MIN(ID) min_ID, name
FROM tableName
GROUP BY name
) b ON a.name = b.name AND
a.ID = b.Min_ID
WHERE b.Name IS NULL
SQLFiddle Demo
OUTPUT after the update statement has been executed
╔════╦═══════╗
║ ID ║ NAME ║
╠════╬═══════╣
║ 1 ║ sss ║
║ 2 ║ bbb ║
║ 3 ║ ccc ║
║ 4 ║ avg ║
║ 5 ║ bbb-5 ║
║ 6 ║ bbb-6 ║
║ 7 ║ sss-7 ║
║ 8 ║ mmm ║
║ 9 ║ avg-9 ║
╚════╩═══════╝
This should do:
;WITH CTE AS
(
SELECT id,
name,
RN = ROW_NUMBER() OVER(PARTITION BY name ORDER BY id)
FROM YourTable
)
UPDATE CTE
SET name = name + '-' + CAST(id AS VARCHAR(8))
WHERE RN > 1
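SQLite cannot update through a CTE the way SQL Server does, so the following sketch (run through Python's sqlite3 module) uses the same ROW_NUMBER ranking inside an IN subquery instead. The function name suffix_duplicates is invented for the example, TableName is the placeholder from the answers, and SQLite 3.25 or newer is assumed. It is meant only to show the ranking logic, not as a drop-in replacement for the T-SQL above.

```python
import sqlite3


def suffix_duplicates(rows):
    """rows: list of (id, name). Appends '-<id>' to every name that is
    not the first occurrence (lowest id) of that name.
    Hypothetical helper for illustration.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TableName (id INTEGER PRIMARY KEY, name TEXT)")
    con.executemany("INSERT INTO TableName VALUES (?, ?)", rows)
    con.execute(
        """
        UPDATE TableName
        SET name = name || '-' || id
        WHERE id IN (
            SELECT id
            FROM (SELECT id,
                         ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
                  FROM TableName)
            WHERE rn > 1   -- every occurrence after the first
        )
        """
    )
    result = con.execute("SELECT id, name FROM TableName ORDER BY id").fetchall()
    con.close()
    return result


if __name__ == "__main__":
    names = ["sss", "bbb", "ccc", "avg", "bbb", "bbb", "sss", "mmm", "avg"]
    for row in suffix_duplicates(list(enumerate(names, start=1))):
        print(row)
```

Run on the nine sample rows, ids 5, 6, 7 and 9 get the suffix and the other rows keep their names.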
I have a table named Table1 as shown below:
ID AccountNo Trn_cd
1 123456 P
2 123456 R
3 123456 P
4 12345 P
5 111 R
6 111 R
7 5625 P
I would like to display the records whose AccountNo appears more than once (duplicates) and whose trn_cd values include both P and R.
In this case the output should look like this:
ID AccountNo Trn_cd
1 123456 P
2 123456 R
3 123456 P
I wrote this SQL, but it does not give the result I want:
select * from Table1
where AccountNo IN
(select accountno from table1
where trn_cd = 'P' or trn_cd = 'R'
group by AccountNo having count(*) > 1)
The result is below; AccountNo 111 shouldn't appear because there is no trn_cd P for 111:
ID AccountNo Trn_cd
1 123456 P
2 123456 R
3 123456 P
5 111 R
6 111 R
Any idea?
Use aggregation for this. To get the account numbers:
select accountNo
from table1
group by accountNo
having count(*) > 1 and
sum(case when trn_cd = 'P' then 1 else 0 end) > 0 and
sum(case when trn_cd = 'R' then 1 else 0 end) > 0
To get the account information, use a join or in statement:
select t.*
from table1 t
where t.accountno in (select accountNo
from table1
group by accountNo
having count(*) > 1 and
sum(case when trn_cd = 'P' then 1 else 0 end) > 0 and
sum(case when trn_cd = 'R' then 1 else 0 end) > 0
)
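A small sketch of the conditional-aggregation filter, with the GROUP BY written out and the codes P and R from the question, running on SQLite through Python's sqlite3 module. The function name accounts_by_conditional_sums is made up for this example; the table and column names follow the question.

```python
import sqlite3


def accounts_by_conditional_sums(rows):
    """rows: list of (ID, AccountNo, Trn_cd).

    Returns the account numbers that occur more than once and have at
    least one 'P' and at least one 'R'. Hypothetical helper.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (ID INTEGER, AccountNo INTEGER, Trn_cd TEXT)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT AccountNo
        FROM Table1
        GROUP BY AccountNo
        HAVING COUNT(*) > 1
           AND SUM(CASE WHEN Trn_cd = 'P' THEN 1 ELSE 0 END) > 0  -- has a P
           AND SUM(CASE WHEN Trn_cd = 'R' THEN 1 ELSE 0 END) > 0  -- has an R
        ORDER BY AccountNo
    """
    result = [account for (account,) in con.execute(query)]
    con.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, 123456, "P"), (2, 123456, "R"), (3, 123456, "P"),
        (4, 12345, "P"), (5, 111, "R"), (6, 111, "R"), (7, 5625, "P"),
    ]
    print(accounts_by_conditional_sums(sample))
```

On the sample data only account 123456 qualifies; account 111 is dropped because it has no P row.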
This problem is called Relational Division.
This can be solved by filtering for the records that contain P or R, grouping them by AccountNo, and keeping only the accounts where COUNT(DISTINCT Trn_CD) = 2.
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT AccountNo
FROM TableName
WHERE Trn_CD IN ('P','R')
GROUP BY AccountNo
HAVING COUNT(DISTINCT Trn_CD) = 2
) b ON a.AccountNO = b.AccountNo
SQLFiddle Demo
SQL of Relational Division
OUTPUT
╔════╦═══════════╦════════╗
║ ID ║ ACCOUNTNO ║ TRN_CD ║
╠════╬═══════════╬════════╣
║ 1 ║ 123456 ║ P ║
║ 2 ║ 123456 ║ R ║
║ 3 ║ 123456 ║ P ║
╚════╩═══════════╩════════╝
For faster performance, add an INDEX on column AccountNo.
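To see the COUNT(DISTINCT ...) form of the relational division in action, here is a minimal sketch on SQLite via Python's sqlite3 module. The function name rows_with_both_codes is hypothetical, and the table is named Table1 as in the question instead of the tableName placeholder.

```python
import sqlite3


def rows_with_both_codes(rows):
    """rows: list of (ID, AccountNo, Trn_cd).

    Returns every row whose account has both a 'P' and an 'R' record.
    Hypothetical helper for illustration.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (ID INTEGER, AccountNo INTEGER, Trn_cd TEXT)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT a.ID, a.AccountNo, a.Trn_cd
        FROM Table1 a
        INNER JOIN (
            SELECT AccountNo
            FROM Table1
            WHERE Trn_cd IN ('P', 'R')
            GROUP BY AccountNo
            HAVING COUNT(DISTINCT Trn_cd) = 2   -- both codes present
        ) b ON a.AccountNo = b.AccountNo
        ORDER BY a.ID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, 123456, "P"), (2, 123456, "R"), (3, 123456, "P"),
        (4, 12345, "P"), (5, 111, "R"), (6, 111, "R"), (7, 5625, "P"),
    ]
    for row in rows_with_both_codes(sample):
        print(row)
```

The sample data returns the three rows of account 123456. Because both codes must be present, any matching account necessarily has at least two rows, so no separate COUNT(*) > 1 check is needed.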