SQL - sort of SUM with varchar - sql

have a (weird) table looking like this
ID Version Value1 Value2 Value3
1 1 Shaft
1 2 steel xy
2 1 Knife somethins
2 3 Super
Want to merge, need to have this result, by using Value from the highest Version, that has content:
ID Value1 Value2 Value3
1 Shaft steel xy
2 Super Knife somethin
as far as I know Group using Max(Version) would bring the NULL values of highest Version row.
something like SUM?

Second try... There are probably shorter and nicer solutions, but it should work:
with
v1 as
(
select w1.id, w1.value1 from weird w1
where w1.value1 is not null
and w1.version=(select max(w11.version) from weird w11 where w11.id=w1.id and w11.value1 is not null)
),
v2 as
(
select w2.id, w2.value2 from weird w2
where w2.value2 is not null
and w2.version=(select max(w22.version) from weird w22 where w22.id=w2.id and w22.value2 is not null)
),
v3 as
(
select w3.id, w3.value3 from weird w3
where w3.value3 is not null
and w3.version=(select max(w33.version) from weird w33 where w33.id=w3.id and w33.value3 is not null)
)
select v1.id, v1.value1, v2.value2, v3.value3
from v1, v2, v3
where v1.id=v2.id and v1.id=v3.id;

We can use UNPIVOT and PIVOT creatively to construct the data you want:
declare #t table (ID int not null, Version int not null, Value1 varchar(20) null,
Value2 varchar(20) null, Value3 varchar(20) null)
insert into #t(ID,Version,Value1,Value2,Value3) values
(1,1,'Shaft',null,null),
(1,2,null,'steel','xy'),
(2,1,null,'Knife','somethins'),
(2,3,'Super',null,null)
;With Numberable as (
select *,ROW_NUMBER() OVER (PARTITION BY ID,Val ORDER BY Version desc) rn
from #t t
unpivot (tdata for Val in (Value1,Value2,Value3)) u
), Selected as (
select ID,tdata,Val
from Numberable where rn = 1
)
select
*
from Selected s
pivot (MAX(tdata) for Val in (Value1,Value2,Value3)) u
The UNPIVOT automatically removes the NULLs. The ROW_NUMBER() identifies the values we want to keep. The Selected CTE hides the columns we no longer need so that the PIVOT creates the final result we want:
ID Value1 Value2 Value3
----------- -------------------- -------------------- --------------------
1 Shaft steel xy
2 Super Knife somethins
(I'm using MAX in the pivot but that's just to satisfy the optimizer. Because we've only selected one row for each ID, Val combination, we know that at most one value will be selected to appear in a final position in the grid formed by the pivot)
The above does make the assumption that Value1,Value2 and Value3 all have the same, or at least compatible, data types.

You can rank the values with row_number. The following query first builds such ranks. rn1 is built per id and value1 is null/not null in the descending order of the version. So per ID we get #1 for the last null value and the last filled value. Later we use rn1 = 1 to get the maximum of the two, which is the last filled value. Same for rn2/value2 and rn3/value3.
select
id,
min(case when rn1 = 1 then value1 end) as value1,
min(case when rn2 = 1 then value2 end) as value2,
min(case when rn3 = 1 then value3 end) as value3
from
(
select
id, value1, value2, value3,
row_number() over (partition by id, case when value1 is null then 0 else 1 end order by version desc) as rn1,
row_number() over (partition by id, case when value2 is null then 0 else 1 end order by version desc) as rn2,
row_number() over (partition by id, case when value3 is null then 0 else 1 end order by version desc) as rn3
from mytable
) ranked
group by id
order by id;

Used CASE WHEN to SELECT max(version) where value is not null and not blank and then joinedwith the original table on those versions. You can see it in action in link provided below the query
Use this query.
Select distinct a.*, b.value1, c.value2, d.value3
from
(
Select id, max(case when (value1 is not null and value1 <> ' ') then version else 0 end) as ver1,
max(case when (value2 is not null and value2 <> ' ') then version else 0 end) as ver2,
max(case when (value3 is not null and value3 <> ' ') then version else 0 end) as ver3
from
your_table
group by id
) a
inner join
your_table b,
your_table c,
your_table d
where (a.ver1=b.version and a.id=b.id)
and (a.ver2=c.version and a.id=c.id)
and (a.ver3=d.version and a.id=d.id)
See it in action here at this link

Related

TSQL - Rank Values but return Column Names

I have a table of data points that I need ranked by column.
I have about 500k Id's and 25 columns (as of now).
I'd like to make the query dynamic so that any added columns will not require a code change.
For each ID, I want to find column names of the top 3 values.
The results should be:
[Id]
[Rank1]
[Rank2]
[Rank3]
27807745
Value3
Value2
Value8
96448378
Value6
Value5
Value1
etc
My first attempt was to create a joined table of Id's and column names:
[Id]
[Value1]
[Value]
27807745
Value1
NULL
27807745
Value2
NULL
27807745
Value3
NULL
27807745
Value4
NULL
27807745
Value5
NULL
Then run a looped update, then sequence by Id, Value DESC.
This gets me there but is taking over 2 hours to complete.
I looked at PIVOT and UNPIVOT but both want to return the values, not the column names.
Is there any other way to do this?
Here is an option that will dynamically unpivot your data WITHOUT using Dynamic SQL
Example or dbFiddle
Select A.ID
,B.*
From YourTable A
Cross Apply (
Select Rank1 = max(case when Rnk=1 then [Key] end)
,Rank2 = max(case when Rnk=2 then [Key] end)
,Rank3 = max(case when Rnk=3 then [Key] end)
From (
Select [Key]
,Value
,Rnk = row_number() over (order by convert(int,value) desc)
From OpenJson((Select A.* For JSON Path,Without_Array_Wrapper ))
Where [Key] Not IN( 'ID','Other','Columns','ToExclude')
) B1
) B

sql transformation - rows to column

i have been trying to solve this one image
my initial idea is like this
select name,
CASE
when count(name) = 1 then get first distinct value
when count(name) = 2 then get first distinct value
else get first distinct value
END as val1,
CASE
when count(name) = 1 then null
when count(name) = 2 then get second distinct value
else get second distinct value
END as val2,
CASE
when count(name) = 1 then null
when count(name) = 2 then null
else get third distinct value
END as val3
into desired_table
from source_table
group by name
is my attempt feasible? if so, how do i access the first, second and third distinct values?
use pivot . Your output table was incorrect. The correct form is available in db<>fiddle.
select name,x as value1,y as value2,z as value3
from
(
select *
from t1
) as SourceTable
pivot
(
max(value) for value in(x,y,z)
) as PivotTable
demo in db<>fiddle
You can use conditional aggregation along with row_number():
select name,
max(case when seqnum = 1 then value end) as value_1,
max(case when seqnum = 2 then value end) as value_2,
max(case when seqnum = 3 then value end) as value_3
into desired_table
from (select s.*,
row_number() over (partition by name order by value) as seqnum
from source_table s
) s
group by name;

How to Get row values as columns in SQL?

I have a table Test with two columns.
Id Value
1 A
1 B
1 C
I want to get the result like below,
Id Value1 Value2 value3
1 A B C
How can I done this in SQL Server.
This is a pivot, but you don't have a column for the pivoting. row_number() can provide that. I usually use conditional aggregations for this.
select id,
max(case when seqnum = 1 then value end) as value1,
max(case when seqnum = 2 then value end) as value2,
max(case when seqnum = 3 then value end) as value3
from (select t.*,
row_number() over (partition by id order by (select null)) as seqnum
from t
) t
group by id;
Note that SQL tables represent unordered sets. So, there is no information about ordering and the values could be in any order. If a column does specify the ordering, then include that in the order by rather than select null.

SQL / Hive Select first rows with certain column value

Consider following 3 column table structure:
id, b_time, b_type
id is a string, there will be multiple rows with the same id in the table.
b_time is timestamp and b_type can have any one of 2 possible values - 'A' or 'B'.
I want to select all the rows that fulfill one of the 2 conditions, priority wise:
For all ids, select the row with highest timestamp, where b_type='A'.
If for an id, there are no rows where b_type='A', select the row with highest timestamp, irrespective of the b_type value.
Please suggest the sql query which should tackle this problem(even if it requires creation of temporary intermediate tables).
Figured out a simple and intuitive way to do this:
SELECT * FROM
(SELECT id
, b_time
, b_type
, ROW_NUMBER() OVER (PARTITION BY id ORDER BY b_type ASC,b_time DESC) AS RN
FROM your_table
)
WHERE RN = 1
with nottypea as (select id, max(b_time) as mxtime
from tablename
group by id
having sum(case when b_type = 'A' then 1 else 0 end) = 0)
, typea as (select id, max(b_time) as mxtime
from tablename
group by id
having sum(case when b_type = 'A' then 1 else 0 end) >= 1)
select id,mxtime,'B' as typ from nottypea
union all
select id,mxtime,'A' as typ from typea

find row number by group in SQL server table with duplicated rows

I need to count the row number by group in a table with some duplications.
Table:
id va1ue1 value2
1 3974 39
1 3974 39
1 972 5
1 972 10
SQL:
select id, value1, value2, COUNT(*) cnt
FROM table
group by id, value1, value2
having COUNT(*) > 1
The code only count the duplicated rows.
I need:
id, value1, value2
1 972 5
1 972 10
I do not need to count the duplicated rows, I only need the rows that value1 has more than one distinct values in value2 column.
Thanks
Use DISTINCT:
select id, value1, count(distinct value2) cnt
from table
group by id, value1
having count(distinct value2) > 1
If you want detais then:
select * from table t1
cross apply(select cnt from(
select count(distinct value2) cnt
from table t2
where t1.id = t2.id and t1.value1 = t2.value1) t
where cnt > 1)ca
In SQL Server 2008, you can use a trick to count distinct values using window functions. You might find this a nice solution:
select t.id, t.value1, t.value2
from (select t.*, sum(case when seqnum = 1 then 1 else 0 end) over (partition by value1) as numvals
from (select t.*, row_number() over (partition by value1, value2 order by (select null)) as seqnum
from table t
) t
) t
where numvals > 1;
Try it this way without a GROUP BY:
select id, value1, value2
FROM table AS T1
where 1 < (
select COUNT(*)
FROM table AS T2
where T1.value1 = T2.value1)
Try this
;WITH CTE
AS ( SELECT id ,
value1 ,
value2 ,
COUNT(*) cnt
FROM table
GROUP BY id ,
value1 ,
value2
HAVING COUNT(*) > 1
)
SELECT *
FROM table1
WHERE value1 IN ( SELECT value1
FROM CTE )
Simply use a NOT after HAVING, which precisely gets you the rows which are NOT duplicated.
select id, value1, value2
FROM [table]
group by id, value1, value2
having NOT COUNT(*) > 1
Fiddle here.
If you want the actual rows from the table, not just the qualifying id, value1 pairs, you could do this:
WITH discrepancies AS (
SELECT,
id,
value1,
value2,
distinctcount = COUNT(DISTINCT value2) OVER (PARTITION BY id, value1)
FROM
dbo.atable
)
SELECT
id,
value1,
value2
FROM
discrepancies
WHERE
distinctcount > 1
;
if SQL Server 2008 supported COUNT(DISTINCT ...) with an OVER clause.
Basically, it would be the same idea as Giorgi Nakeuri's one, more or less, except you would not be hitting the table more than once.
Alas, there is no support for COUNT(DISTINCT ...) OVER ... in SQL Server so far. Still, you can use a different method, which will still allow you to touch the table just once and return detail rows nevertheless:
WITH discrepancies AS (
SELECT,
id,
value1,
value2,
minvalue2 = MIN(value2) OVER (PARTITION BY id, value1),
maxvalue2 = MAX(value2) OVER (PARTITION BY id, value1)
FROM
dbo.atable
)
SELECT
id,
value1,
value2
FROM
discrepancies
WHERE
minvalue2 <> maxvalue2
;
The idea here is to get MIN(value2) and MAX(value2) per each id, value1 and to see if those differ. If they do, that means you have a discrepancy in this id, value1 subset and you want that row to be returned.
The method takes advantage of aggregates with an OVER clause to avoid a self-join, and that is precisely the reason why the table is accessed just once here.