Rearrange data grouping and creating new columns in SQL Server - sql

I have a table like this:
email      id
albert     1
jped       2
rufus      3
rufuscomp  3
cousruf    3
peter      4
peter2     4
clarisse   5
johan      6
john       7
And I would like to obtain a table like this:
id  email_1   email_2    email_3
1   albert    NULL       NULL
2   jped      NULL       NULL
3   rufus     rufuscomp  cousruf
4   peter     peter2     NULL
5   clarisse  NULL       NULL
6   johan     NULL       NULL
7   john      NULL       NULL
I would like to do this in SQL. The algorithm should find the maximum number of repetitions in company_id to determine how many columns the new table will have, and then rearrange all the values.
I found the question "SQL - Grouping creating new columns", but its solution does not work in SQL Server.

I want to share the solution I found. First I used the ROW_NUMBER() window function to number the emails within each id, then I followed this article exactly: https://www.sqlshack.com/dynamic-pivot-tables-in-sql-server/
So the final code is this one:
SELECT
    *
FROM
    (
        SELECT
            [id],
            [number_of_row],
            [email]
        FROM
            (
                SELECT
                    id,
                    CONCAT(
                        'Email_',
                        ROW_NUMBER() OVER (
                            PARTITION BY id
                            ORDER BY email
                        )
                    ) AS number_of_row,
                    email
                FROM
                    original_table
            ) AS table_2
    ) AS table_3
PIVOT (
    MAX([email]) FOR [number_of_row] IN (
        [Email_1],
        [Email_2],
        [Email_3]
    )
) AS pivot_table
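For anyone who wants to try the idea without a SQL Server instance, here is a minimal sketch of the same row-numbering step followed by conditional aggregation instead of PIVOT. It is not the query above: it assumes Python's built-in sqlite3 module with a SQLite build that supports window functions (3.25 or later), and the three output columns are still hard-coded. Note that ordering by email numbers the emails alphabetically within each id, so id 3 comes out as cousruf, rufus, rufuscomp rather than in the order shown in the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (email TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO original_table VALUES (?, ?)",
    [
        ("albert", 1), ("jped", 2), ("rufus", 3), ("rufuscomp", 3),
        ("cousruf", 3), ("peter", 4), ("peter2", 4), ("clarisse", 5),
        ("johan", 6), ("john", 7),
    ],
)

# Number the emails inside each id, then spread them over fixed columns.
query = """
WITH numbered AS (
    SELECT id,
           email,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY email) AS rn
    FROM original_table
)
SELECT id,
       MAX(CASE WHEN rn = 1 THEN email END) AS email_1,
       MAX(CASE WHEN rn = 2 THEN email END) AS email_2,
       MAX(CASE WHEN rn = 3 THEN email END) AS email_3
FROM numbered
GROUP BY id
ORDER BY id
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Missing positions come back as None (NULL), which matches the shape of the desired table.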

Related

Average and sort by this based on other conditional columns in a table

I have a table in SQL Server 2017 like below:
Name Rank1 Rank2 Rank3 Rank4
Jack null 1 1 3
Mark null 3 2 2
John null 2 3 1
What I need to do is to add an average rank column then rank those names based on those scores. We ignore null ranks. Expected output:
Name Rank1 Rank2 Rank3 Rank4 AvgRank FinalRank
Jack null 1 1 3 1.66 1
Mark null 3 2 2 2.33 3
John null 2 3 1 2 2
My query now looks like this:
;with cte as (
select *, AvgRank= (Rank1+Rank2+Rank3+Rank4)/#NumOfRankedBy
from mytable
)
select *, FinalRank = row_number() over (order by AvgRank)
from cte
I am stuck at finding the value of #NumOfRankedBy, which should be 3 in our case because Rank1 is null for all.
What is the best way to approach such an issue?
Thanks.
Your conundrum stems from the fact that your table is not normalised and you are treating data (Rank) as structure (columns).
You should have a table for Ranks where each rank is a row; then your query is easy.
You can unpivot your columns into rows and then make use of AVG, which ignores NULL values:
select *, FinalRank = row_number() over (order by AvgRank)
from mytable
cross apply (
    select avg(r * 1.0) as AvgRank
    from (values (Rank1), (Rank2), (Rank3), (Rank4)) as v(r)
) as a;
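The unpivot-then-average idea can be checked with a small sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later for ROW_NUMBER) and uses UNION ALL for the unpivot, because SQLite has no CROSS APPLY; the CTE names are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (Name TEXT, Rank1 REAL, Rank2 REAL, Rank3 REAL, Rank4 REAL)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        ("Jack", None, 1, 1, 3),
        ("Mark", None, 3, 2, 2),
        ("John", None, 2, 3, 1),
    ],
)

# Turn the four rank columns into rows; AVG then skips the NULL entries,
# so the divisor is the number of non-NULL ranks for each name.
query = """
WITH unpivoted AS (
    SELECT Name, Rank1 AS r FROM mytable
    UNION ALL SELECT Name, Rank2 FROM mytable
    UNION ALL SELECT Name, Rank3 FROM mytable
    UNION ALL SELECT Name, Rank4 FROM mytable
),
averaged AS (
    SELECT Name, AVG(r) AS AvgRank
    FROM unpivoted
    GROUP BY Name
)
SELECT Name,
       AvgRank,
       ROW_NUMBER() OVER (ORDER BY AvgRank) AS FinalRank
FROM averaged
ORDER BY FinalRank
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

With the sample data this ranks Jack first, John second and Mark third, in line with the expected output in the question.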

How to select all duplicate rows except original one?

Let's say I have a table
CREATE TABLE names (
id SERIAL PRIMARY KEY,
name CHARACTER VARYING
);
with data
id name
-------------
1 John
2 John
3 John
4 Jane
5 Jane
6 Jane
I need to select all duplicate rows by name except the original one. So in this case I need the result to be this:
id name
-------------
2 John
3 John
5 Jane
6 Jane
How do I do that in Postgresql?
You can use ROW_NUMBER() to identify the 'original' records and filter them out. Here is a method using a cte:
WITH Nums AS (
    SELECT id,
           name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id ASC) AS RN
    FROM names
)
SELECT *
FROM Nums
WHERE RN <> 1; -- Filter out rows numbered 1, the 'originals'
SELECT *
FROM names
WHERE id NOT IN (SELECT MIN(id) FROM names GROUP BY name);
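Both approaches can be compared side by side in a quick sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later for ROW_NUMBER) rather than PostgreSQL, so SERIAL is replaced by INTEGER PRIMARY KEY; the queries themselves use only standard SQL.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO names (name) VALUES (?)",
    [("John",), ("John",), ("John",), ("Jane",), ("Jane",), ("Jane",)],
)

# Variant 1: number the rows per name and drop the first of each group.
with_row_number = conn.execute("""
    WITH nums AS (
        SELECT id,
               name,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
        FROM names
    )
    SELECT id, name FROM nums WHERE rn > 1 ORDER BY id
""").fetchall()

# Variant 2: exclude the smallest id of each name.
with_not_in = conn.execute("""
    SELECT id, name
    FROM names
    WHERE id NOT IN (SELECT MIN(id) FROM names GROUP BY name)
    ORDER BY id
""").fetchall()

print(with_row_number)
print(with_not_in)
```

Both variants treat the row with the lowest id as the 'original', so they return the same rows here.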

Custom Sort Based On Referenced Records

Please consider these data:
Id F1 F2 Ref_ID
-------------------------------------------
1 Nima 35 Null
2 Eli 33 Null
3 Arian 5 1
4 Ava 1 1
5 Arsha 3 2
6 Rozhan 30 1
7 Zhina 20 2
I want to sort this table like this result:
Id F1 F2 Ref_ID
-------------------------------------------
1 Nima 35 Null
3 Arian 5 1
4 Ava 1 1
6 Rozhan 30 1
2 Eli 33 Null
5 Arsha 3 2
7 Zhina 20 2
The referenced records should be placed under the record they reference, ordered by Id ascending.
How can I do this using LINQ or SQL? Thanks.
In SQL you can sort by COALESCE (or ISNULL) of Ref_ID and Id, so that each parent and the rows referencing it share the same sort key.
Add an IIF or a CASE WHEN as a second key to make sure the parent comes first within its group.
SELECT Id, F1, F2, Ref_ID
FROM YourTable
ORDER BY COALESCE(Ref_ID, Id), IIF(Ref_ID IS NULL, 0, 1), Id;
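Here is a simpler solution (it works when each parent row has a smaller Id than the rows that reference it, as in the sample data):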
SELECT
    Id,
    F1,
    F2,
    Ref_ID
FROM
    #Table
ORDER BY
    ISNULL(Ref_ID, Id), Id
Using LINQ, you can do it like this:
from i in data
orderby i.Ref_ID ?? i.Id
select i;
Another solution is to add extra columns in the query and sort on them:
select t.*
from (
    select Id,
           F1,
           F2,
           Ref_ID,
           coalesce(Ref_ID, Id) as RefID_or_Id,
           iif(Ref_ID is null, 0, 1) as Ref_ID_0_or_1
    from YourTable
) t
order by t.RefID_or_Id,
         t.Ref_ID_0_or_1,
         t.Id
If your table is large, you should test which of the solutions here performs best for you.
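The three-key ordering discussed in the answers above can be verified against the sample data with a short sketch. It assumes Python's built-in sqlite3 module in place of SQL Server, so IIF is written as a CASE expression; the table name YourTable is taken from the first answer.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (Id INTEGER, F1 TEXT, F2 INTEGER, Ref_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, "Nima", 35, None), (2, "Eli", 33, None), (3, "Arian", 5, 1),
        (4, "Ava", 1, 1), (5, "Arsha", 3, 2), (6, "Rozhan", 30, 1),
        (7, "Zhina", 20, 2),
    ],
)

# Key 1 groups each parent with the rows that reference it,
# key 2 puts the parent first, key 3 orders the children by Id.
query = """
SELECT Id, F1, F2, Ref_ID
FROM YourTable
ORDER BY COALESCE(Ref_ID, Id),
         CASE WHEN Ref_ID IS NULL THEN 0 ELSE 1 END,
         Id
"""
ordered_ids = [row[0] for row in conn.execute(query)]
print(ordered_ids)
```

The middle key only matters when a child can have a smaller Id than its parent; with the sample data the two-key version gives the same order.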

SQL Server get distinct counts of name by each ID

I have a dataset like this:
ID NAME
1 Aaron
2 Theon
3 Jon Snow
4 Jon Snow
4 Dany
5 Arya
5 Robert
5 Tyrion
I need to add a new column to this that shows the output based on the number of distinct names per ID. So expected output would be:
ID NAME Mapping
1 Aaron 1
2 Theon 1
3 Jon Snow 1
4 Jon Snow 2
4 Dany 2
5 Arya 3
5 Robert 3
5 Tyrion 3
I am confused about how to achieve this since I have tried a case statement where count(distinct(name)) does not return the right values.
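You may try using COUNT as an analytic function. Note that COUNT(*) counts rows rather than distinct names, so it matches the expected output only when no name repeats within an ID, as in the sample data:
SELECT
    ID,
    Name,
    COUNT(*) OVER (PARTITION BY ID) AS Mapping
FROM yourTable
ORDER BY
    ID;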
Another approach, which gets the COUNT of DISTINCT Name for each ID:
SELECT *,
       (SELECT COUNT(DISTINCT NAME)
        FROM #table T
        WHERE T1.id = T.id) AS Mapping
FROM #table T1
You can simply use the query below:
SELECT COUNT(DISTINCT NAME)
FROM YOUR_TABLE
GROUP BY ID
Thanks
Another method (specific to SQL Server; elsewhere use INNER JOIN LATERAL):
SELECT *
FROM #table f1
CROSS APPLY
(
    SELECT COUNT(*) AS Nb
    FROM #table f2
    WHERE f2.ID = f1.ID
) f3
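The difference between counting rows and counting distinct names per ID only shows up when a name repeats within an ID. The sketch below illustrates this with a windowed distinct count built from two DENSE_RANK calls, a technique that is not used in the answers above. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later), a table named t, names that are never NULL, and one hypothetical duplicate row (5, 'Arya') added to the question's data.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, NAME TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "Aaron"), (2, "Theon"), (3, "Jon Snow"), (4, "Jon Snow"),
        (4, "Dany"), (5, "Arya"), (5, "Robert"), (5, "Tyrion"),
        (5, "Arya"),  # hypothetical duplicate, not in the question's data
    ],
)

# Ascending dense rank + descending dense rank - 1 equals the number of
# distinct names in the partition (valid when NAME contains no NULLs).
query = """
SELECT ID,
       NAME,
       DENSE_RANK() OVER (PARTITION BY ID ORDER BY NAME)
     + DENSE_RANK() OVER (PARTITION BY ID ORDER BY NAME DESC) - 1 AS Mapping,
       COUNT(*) OVER (PARTITION BY ID) AS RowsPerId
FROM t
ORDER BY ID, NAME
"""
mapping_rows = conn.execute(query).fetchall()
for row in mapping_rows:
    print(row)
```

For ID 5 the Mapping column stays at 3 while RowsPerId becomes 4, which is the case where COUNT(*) OVER (PARTITION BY ID) stops matching the expected output.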

In SQL, find duplicates in one column with unique values for another column

So I have a table of aliases linked to record ids. I need to find duplicate aliases with unique record ids. To explain better:
ID Alias Record ID
1 000123 4
2 000123 4
3 000234 4
4 000123 6
5 000345 6
6 000345 7
The result of a query on this table should be something to the effect of
000123 4 6
000345 6 7
Indicating that both record 4 and 6 have an alias of 000123 and both record 6 and 7 have an alias of 000345.
I was looking into using GROUP BY but if I group by alias then I can't select record id and if I group by both alias and record id it will only return the first two rows in this example where both columns are duplicates. The only solution I've found, and it's a terrible one that crashed my server, is to do two different selects for all the data and then join them
ON [T_1].[ALIAS] = [T_2].[ALIAS] AND NOT [T_1].[RECORD_ID] = [T_2].[RECORD_ID]
Are there any solutions out there that would work better? As in, not crash my server when run on a few hundred thousand records?
It looks as if you have two requirements:
Identify all aliases that have more than one record id, and
List the record ids for these aliases horizontally.
The first is a lot easier to do than the second. Here's some SQL that ought to get you where you want with the first:
WITH A -- Get a list of unique combinations of Alias and [Record ID]
AS (
    SELECT DISTINCT
        Alias
        , [Record ID]
    FROM T1
)
, B -- Get a list of all those Alias values that have more than one [Record ID] associated
AS (
    SELECT Alias
    FROM A
    GROUP BY
        Alias
    HAVING COUNT(*) > 1
)
SELECT A.Alias
    , A.[Record ID]
FROM A
JOIN B
    ON A.Alias = B.Alias
Now, as for the second. If you're satisfied with the data in this form:
Alias Record ID
000123 4
000123 6
000345 6
000345 7
... you can stop there. Otherwise, things get tricky.
The PIVOT command will not necessarily help you, because it's trying to solve a different problem than the one you have.
I am assuming that you can't necessarily predict how many duplicate Record ID values you have per Alias, and thus don't know how many columns you'll need.
If you have only two, then displaying each of them in a column becomes a relatively trivial exercise. If you have more, I'd urge you to consider whether the destination for these records (a report? A web page? Excel?) might be able to do a better job of displaying them horizontally than SQL Server can do in returning them arranged horizontally.
Perhaps what you want is just the min() and max() of RecordId:
select Alias, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having min(RecordId) <> max(RecordId)
You can also count the number of distinct values, using count(distinct):
select Alias, count(distinct RecordId) as NumRecordIds, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having count(DISTINCT RecordID) > 1;
This will give all repeated values:
select Alias, count(RecordId) as NumRecordIds
from yourTable t
group by Alias
having count(RecordId) <> count(distinct RecordId);
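The min/max approach can be run against the question's sample data as a quick check. This sketch assumes Python's built-in sqlite3 module in place of the real server, and the table and column names (aliases, alias, record_id) are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aliases (id INTEGER PRIMARY KEY, alias TEXT, record_id INTEGER)"
)
conn.executemany(
    "INSERT INTO aliases (alias, record_id) VALUES (?, ?)",
    [
        ("000123", 4), ("000123", 4), ("000234", 4),
        ("000123", 6), ("000345", 6), ("000345", 7),
    ],
)

# One pass over the table: keep aliases that map to more than one
# distinct record id and show the smallest and largest of them.
query = """
SELECT alias,
       MIN(record_id) AS first_record_id,
       MAX(record_id) AS last_record_id
FROM aliases
GROUP BY alias
HAVING COUNT(DISTINCT record_id) > 1
ORDER BY alias
"""
shared_aliases = conn.execute(query).fetchall()
print(shared_aliases)
```

Because this is a single grouped scan rather than a self-join, it avoids pairing every row with every other row that has the same alias. It only shows two record ids per alias, so an alias shared by three or more records would need a different presentation.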
I agree with Ann L's answer but would like to show how you can use window functions with CTEs, as you may prefer the readability.
(On how to pivot horizontally, I again agree with Ann.)
create temporary table things (
    id serial primary key,
    alias varchar,
    record_id int
);
insert into things (alias, record_id) values
    ('000123', 4),
    ('000123', 4),
    ('000234', 4),
    ('000123', 6),
    ('000345', 6),
    ('000345', 7);
with
things_with_distinct_aliases_and_record_ids as (
    select distinct on (alias, record_id)
        id,
        alias,
        record_id
    from things
),
things_with_unique_record_id_counts_per_alias as (
    select *,
        COUNT(*) OVER (PARTITION BY alias) as unique_record_ids_count
    from things_with_distinct_aliases_and_record_ids
)
select * from things_with_unique_record_id_counts_per_alias
where unique_record_ids_count > 1;
The first CTE gets all the unique alias/record id combinations. E.g.
id | alias | record_id
----+--------+-----------
1 | 000123 | 4
4 | 000123 | 6
3 | 000234 | 4
5 | 000345 | 6
6 | 000345 | 7
The second CTE simply creates a new column for the above and adds the count of record ids for each alias. This allows you to filter only those aliases which have more than one record id associated with them.
id | alias | record_id | unique_record_ids_count
----+--------+-----------+-------------------------
1 | 000123 | 4 | 2
4 | 000123 | 6 | 2
3 | 000234 | 4 | 1
5 | 000345 | 6 | 2
6 | 000345 | 7 | 2
SELECT A.CitationId,B.CitationId, A.CitationName, A.LoaderID, A.PrimaryReferenceLoaderID,B.SecondaryReference1LoaderID, A.SecondaryReference1LoaderID, A.SecondaryReference2LoaderID,
A.SecondaryReference3LoaderID, A.SecondaryReference4LoaderID, A.CreatedOn, A.LastUpdatedOn
FROM CitationMaster A, CitationMaster B
WHERE A.PrimaryReferenceLoaderID= B.SecondaryReference1LoaderID and Isnull(A.PrimaryReferenceLoaderID,'') != '' and Isnull(B.SecondaryReference1LoaderID,'') !=''