SQL Server pivot values with different data types

I am trying to pivot values of different types in SQL Server 2016, but I could not find a way to pivot different data types.
The first table is the initial structure. The second table is the desired shape.
I tried the following SQL code to pivot my values:
SELECT
[id] AS [id],
FIRSTNAME,
LASTNAME,
BIRTHDATE,
ADDRESS,
FLAG,
NUMBER
FROM (
SELECT
[cm].[key] AS [id],
[cm].[column] AS [column],
[cm].[string] AS [string],
[cm].[bit] AS [bit],
[cm].[xml] AS [xml],
[cm].[number] AS [number],
[cm].[date] AS [date]
FROM [cmaster] AS [cm]
) AS [t]
PIVOT (
MAX([string]) --!?!?
FOR [column] IN (
FIRSTNAME,
LASTNAME,
BIRTHDATE,
ADDRESS,
FLAG,
NUMBER
)
) AS [p]

I think your best bet is to use conditional aggregation, e.g.
SELECT cm.id,
FIRSTNAME = MAX(CASE WHEN cm.[property] = 'firstname' THEN cm.[string] END),
LASTNAME = MAX(CASE WHEN cm.[property] = 'lastname' THEN cm.[string] END),
BIRTHDATE = MAX(CASE WHEN cm.[property] = 'birthdate' THEN cm.[date] END),
FLAG = CONVERT(BIT, MAX(CASE WHEN cm.[property] = 'flag' THEN CONVERT(TINYINT, cm.[boolean]) END)),
NUMBER = MAX(CASE WHEN cm.[property] = 'number' THEN cm.[integer] END)
FROM cmaster AS cm
GROUP BY cm.id;
As you can see, though, the query becomes very tightly coupled to your EAV model, which is one reason EAV is considered an SQL antipattern. The alternative is to build a single value column in your subquery and pivot on that, but you have to convert everything to a single data type and you lose a bit of type safety:
SELECT id, FIRSTNAME, LASTNAME, BIRTHDATE, ADDRESS, FLAG, NUMBER
FROM (
SELECT id = cm.[key],
[column] = cm.[column],
Value = CASE cm.type
WHEN 'NVARCHAR' THEN cm.string
WHEN 'DATETIME' THEN CONVERT(NVARCHAR(MAX), cm.date, 112)
WHEN 'XML' THEN CONVERT(NVARCHAR(MAX), cm.xml)
WHEN 'BIT' THEN CONVERT(NVARCHAR(MAX), cm.boolean)
WHEN 'INT' THEN CONVERT(NVARCHAR(MAX), cm.integer)
END
FROM cmaster AS cm
) AS t
PIVOT
(
MAX(Value)
FOR [column] IN (FIRSTNAME, LASTNAME, BIRTHDATE, ADDRESS, FLAG, NUMBER)
) AS p;

To produce the result you want, first bring the data into one format that is compatible with all the data types; VARCHAR is ideal for that. Then prepare the base table with a simple SELECT query and PIVOT the result.
In the last projection you can, if you want, convert the data back to the original types.
This query can also be written dynamically so that the result keeps up as records are added. Here I provide the static answer for your data. If you need a more generic dynamic answer, let me know and I can post it here.
--data insert scripts I used:
CREATE TABLE First_Table
(
[id] int,
[column] VARCHAR(10),
[string] VARCHAR(20),
[bit] BIT,
[xml] [xml],
[number] INT,
[date] DATE
)
SELECT GETDATE()
INSERT INTO First_Table VALUES(1, 'FIRST NAME', 'JOHN' , NULL, NULL, NULL, NULL)
INSERT INTO First_Table VALUES(1, 'LAST NAME', 'DOE' , NULL, NULL, NULL, NULL)
INSERT INTO First_Table VALUES(1, 'BIRTH DATE', NULL , NULL, NULL, NULL, '1985-02-25')
INSERT INTO First_Table VALUES(1, 'ADDRESS', NULL , NULL, 'SDFJDGJOKGDGKPDGKPDKGPDKGGKGKG', NULL, NULL)
INSERT INTO First_Table VALUES(1, 'FLAG', NULL , 1, NULL, NULL, NULL)
INSERT INTO First_Table VALUES(1, 'NUMBER', NULL , NULL, NULL, 20, NULL)
SELECT
PIVOTED.* FROM
(
--MAKING THE BASE TABLE FOR PIVOT
SELECT
[id]
,[column] AS [COLUMN]
, CASE WHEN [column] = 'FIRST NAME' then [string]
WHEN [column] = 'LAST NAME' then [string]
WHEN [column] = 'BIRTH DATE' then CAST([date] AS VARCHAR(100))
WHEN [column] = 'ADDRESS' then CAST([xml] AS VARCHAR(100))
WHEN [column] = 'FLAG' then CAST([bit] AS VARCHAR(100))
else CAST([number] AS VARCHAR(100)) END AS [VALUE]
FROM First_Table
) AS [P]
PIVOT
(
MIN ([P].[VALUE])
FOR [column] in ([FIRST NAME],[LAST NAME],[BIRTH DATE],[ADDRESS],[FLAG],[NUMBER])
) AS PIVOTED
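The optional last step mentioned above, converting the pivoted strings back to typed columns, could be sketched roughly as follows. This is only an illustration under assumptions: the table variable @src is a hypothetical stand-in for the base table (already reduced to one string value per row), and TRY_CONVERT (SQL Server 2012 and later) is used so that an unexpected value becomes NULL rather than raising an error.

```sql
-- Hypothetical stand-in for the base table: one string value per (id, column) pair.
DECLARE @src TABLE ([id] INT, [column] VARCHAR(10), [value] VARCHAR(100));

INSERT INTO @src ([id], [column], [value])
VALUES (1, 'FIRST NAME', 'JOHN'),
       (1, 'LAST NAME',  'DOE'),
       (1, 'BIRTH DATE', '1985-02-25'),
       (1, 'FLAG',       '1'),
       (1, 'NUMBER',     '20');

-- Pivot on the string column, then convert each pivoted column back
-- to the type it had before it was flattened to VARCHAR.
SELECT p.[id],
       p.[FIRST NAME],
       p.[LAST NAME],
       TRY_CONVERT(DATE, p.[BIRTH DATE]) AS [BIRTH DATE],
       TRY_CONVERT(BIT,  p.[FLAG])       AS [FLAG],
       TRY_CONVERT(INT,  p.[NUMBER])     AS [NUMBER]
FROM @src
PIVOT
(
    MIN([value])
    FOR [column] IN ([FIRST NAME], [LAST NAME], [BIRTH DATE], [FLAG], [NUMBER])
) AS p;
```

The string columns need no conversion; only the columns that were cast to VARCHAR in the base query are converted back.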

SQL:
SELECT
            ID,
            FIRSTNAME,
            ...,
            FLAG = CAST (FLAG AS INT),
            ...
FROM
            (
            SELECT
                        *
            FROM
                        (
                        SELECT
                                    f.ID,
                                    f.PROPERTY,
                                    f.STRING + f."INTEGER" + f.DATETIME + f.BOLLEAN + f.XML AS COLS
                        FROM
                                    FIRSTTBL f)
            PIVOT(
                        min(COLS) FOR PROPERTY IN
                                    (
                                    'firstname' AS firstname,
                                    'lastname' AS lastname,
                                    'birthdate' AS birthdate,
                                    'address' AS address,
                                    'flag' AS flag,
                                    'number' AS "NUMBER"
                                    )
                        )
            )
According to the original table, there is exactly one non-null value among the STRING, INTEGER, DATETIME, BOLLEAN and XML columns in any row, so we just need to take the first non-null value and assign it to the corresponding new column. The transposition itself is not difficult with the PIVOT function, except that SQL requires each column to have a consistent type. So we first convert the combined column values into a string, perform the row-to-column transposition, and then convert the strings back to the proper types. When there are many columns the SQL statement gets tricky, and dynamic requirements are even harder to achieve.
Yet it is easy to write the code using the open-source esProc SPL:
 
     A
1    =connect("MSSQL")
2    =A1.query@x("SELECT * FROM FIRSTTBL")
3    =A2.pivot(ID;PROPERTY,~.array().m(4:).ifn();"firstname":"FIRSTNAME","lastname":"LASTNAME","birthdate":"BIRTHDAY","address":"ADDRESS","flag":"FLAG","number":"NUMBER")
SPL does not require that data in the same column have consistent type. It is easy for it to maintain the original data types while performing the transposition.

Related

I need to unpivot columns to rows where pairs of columns stay together in the results

The following article comes close, but I can't make the leap to my need: Unpivot pairs of associated columns to rows
IF OBJECT_ID ('dbo.tst_CrossApply') IS NOT NULL
DROP TABLE dbo.tst_CrossApply;
create table dbo.tst_CrossApply
(
GivenDay varchar(32) null,
OtherData varchar(32) null,
CODRPL varchar(32) null,
COD varchar (32) null,
BODRPL varchar(32) null,
BOD varchar (32) null,
)
go
insert into dbo.tst_CrossApply values ( 'Day1','OtherData1','<', '5','', '10')
insert into dbo.tst_CrossApply values ( 'Day2','OtherData2', '', '20','<', '30')
go
SELECT * FROM dbo.tst_CrossApply
SELECT t.[GivenDay],t.[OtherData],v.[RPL],v.[Result]
FROM [dbo].[tst_CrossApply] t
CROSS APPLY (VALUES ([CODRPL], [COD]),([BODRPL], [BOD])) v ([RPL],[Result])
The script above returns the desired result except for the needed 'Parameter' column.
I can get this column using UNPIVOT, but then I lose the pairing of the RPL and Result columns.
In my database there are several 'OtherData' columns, and several pairs of columns to CROSS APPLY and/or UNPIVOT.
The desired output includes the Parameter column, which holds the heading of the second column of each pair.
Any help is appreciated.
You're close. See the "Unpivoting" example linked in the next thread.
SELECT t.[GivenDay]
, t.[OtherData]
, v.[Param]
, v.[RPL]
, v.[Result]
FROM [dbo].[tst_CrossApply] t
CROSS APPLY (
VALUES ('COD', [CODRPL], [COD])
, ('BOD', [BODRPL], [BOD])
) v ([Param], [RPL],[Result])
Update 2022-03-02
I'm not aware of a simple alternative using UNPIVOT. The closest I could get was more convoluted than just using CROSS APPLY
SELECT cod.GivenDay, cod.OtherData, cod.Param, cod.RPL, cod.Result
FROM (
SELECT GivenDay, OtherData, COD, CODRPL AS RPL
FROM [dbo].[tst_CrossApply] t
) pvt
UNPIVOT
(
Result FOR Param IN (COD)
) AS cod
UNION ALL
SELECT bod.GivenDay, bod.OtherData, bod.Param, bod.RPL, bod.Result
FROM (
SELECT GivenDay, OtherData, BOD, BODRPL AS RPL
FROM [dbo].[tst_CrossApply] t
) pvt
UNPIVOT
(
Result FOR Param IN (BOD)
) AS bod
ORDER BY GivenDay, OtherData, Param

Outer Apply with case statement and Xpath inside returns error

My table has 3 columns:
EmployeeId
Active
Data
Data is an XML column that holds subjects and marks. I need to extract the mark and subject from the XML, but when the employee is inactive, I need to return a single row with NULLs even though the XML has a Subject and Mark.
Below is the query I tried; it fails with the following error:
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
SELECT eo.employeeid,
Result.marks.[Subject],
Result.marks.[Mark]
FROM employee eo
OUTER apply (SELECT CASE
WHEN eo.active = 0 THEN (SELECT NULL AS 'Subject',
NULL AS 'Mark')
ELSE ((SELECT f.n.value('@Subject', 'varchar(100)')
AS
'Subject',
f.n.value('@Mark', 'int')
AS
'Mark'
FROM eo.data.nodes('(/Employee/Results)') AS
F(n)))
END AS marks) Result
DECLARE @employee TABLE
(
employeeid INT IDENTITY(1,1),
active BIT,
data XML
);
INSERT INTO @employee(active, data)
VALUES
(1, N'<Employee><Results Subject="subject 1" Mark="111"/></Employee>'),
(1, N'<Employee><Results Subject="subject 2" Mark="222"/></Employee>'),
(0, N'<Employee><Results Subject="subject 3" Mark="333"/></Employee>');
SELECT
eo.employeeid, eo.active,
f.n.value('@Subject', 'varchar(100)') AS 'Subject',
f.n.value('@Mark', 'int') AS 'Mark'
FROM
(
SELECT employeeid, active,
CASE active WHEN 1 THEN data END AS data
FROM @employee
) AS eo
OUTER APPLY eo.data.nodes('Employee/Results') AS F(n);
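The error in the question comes from the CASE expression: CASE returns a single scalar value, so a subquery inside it may only select one column. A possible alternative to nulling out the XML first is to leave the XML alone and filter inside the APPLY. The sketch below assumes the same @employee table variable as above, with a reduced set of sample rows; the alias r is purely illustrative.

```sql
DECLARE @employee TABLE
(
    employeeid INT IDENTITY(1,1),
    active     BIT,
    data       XML
);

INSERT INTO @employee (active, data)
VALUES (1, N'<Employee><Results Subject="subject 1" Mark="111"/></Employee>'),
       (0, N'<Employee><Results Subject="subject 3" Mark="333"/></Employee>');

-- For inactive employees the inner query returns no rows, so OUTER APPLY
-- still keeps the employee row and fills Subject and Mark with NULL.
SELECT eo.employeeid,
       r.[Subject],
       r.[Mark]
FROM @employee AS eo
OUTER APPLY
(
    SELECT f.n.value('@Subject', 'varchar(100)') AS [Subject],
           f.n.value('@Mark', 'int')             AS [Mark]
    FROM eo.data.nodes('Employee/Results') AS f(n)
    WHERE eo.active = 1
) AS r;
```

Both variants rely on the same behaviour: when the applied row set is empty, OUTER APPLY returns the outer row once with NULLs in the applied columns.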

Improve SQL Query to find redundant data

The following shows my sample dataset:
PatientID PatientName
XXX-037070002 Riger, Jens^Wicki
XXX-037070002 Riger^Wicki
XXX-10052 Weier,Nicole^Peggy
XXX-10052 Weier,Nicole^Peppy
XXX-23310 Rodem^Sieglinde
XXX-23310 Sauberger, Birgit^Finja
XXX-23343 Je, Ronny^Wilma
XXX-23343 Jer, Ronny^Wilma
XXX-2349 Kel,Andy^Juka
XXX-2349 Kel^Juka
XXX-2998 Hel, Frank
XXX-2998 Hel,Frank^Fenris
XXX-3188 Mey, Marion
XXX-3188 Mey, Marion^Paula
XXX-3188 Schulz^Roma
XXX-3218 Böntgen-Simnet,Dr. Regine^Cara
XXX-3218 Simnet,Dr. Regine^Cara
XXX-3826 Mertes, Bernd Uwe^Ellie
XXX-3826 Mertes,Bernd^Ellie
XXX-3826 Mertes^Ellie
This is the query I got from my last request:
with d as
(
select distinct
patid,
patname
from dicomstudys
)
select *
from d
where d.patid in
(
select d.patid
from d
group by d.patid
having count(*) > 1
)
Now I want to adjust the query so that only the following data is output:
PatientID PatientName
XXX-23310 Rodem^Sieglinde
XXX-23310 Sauberger, Birgit^Finja
XXX-23343 Je, Ronny^Wilma
XXX-23343 Jer, Ronny^Wilma
XXX-3188 Mey, Marion
XXX-3188 Mey, Marion^Paula
XXX-3188 Schulz^Roma
XXX-3218 Böntgen-Simnet,Dr. Regine^Cara
XXX-3218 Simnet,Dr. Regine^Cara
Last names are separated from the rest of the name by either a ',' or a '^'. If the last names are the same for the same PatientID, I don't want those rows displayed. I tried fiddling with a sub-select that combines CHARINDEX and other functions, but my knowledge of SQL syntax is too limited for a request of this complexity.
Please also note that XXX-3188 has two rows with the same last name but also another row with a completely different PatientName, so it needs to be in the output.
Try this:
DECLARE @DataSource TABLE
(
[ID] VARCHAR(32)
,[Name] VARCHAR(256)
);
INSERT INTO @DataSource ([ID], [Name])
VALUES ('XXX-037070002', 'Riger, Jens^Wicki')
,('XXX-037070002', 'Riger^Wicki')
,('XXX-10052', 'Weier,Nicole^Peggy')
,('XXX-10052', 'Weier,Nicole^Peppy')
,('XXX-23310', 'Rodem^Sieglinde')
,('XXX-23310', 'Sauberger, Birgit^Finja')
,('XXX-23343', 'Je, Ronny^Wilma')
,('XXX-23343', 'Jer, Ronny^Wilma')
,('XXX-2349', 'Kel,Andy^Juka')
,('XXX-2349', 'Kel^Juka')
,('XXX-2998', 'Hel, Frank')
,('XXX-2998', 'Hel,Frank^Fenris')
,('XXX-3188', 'Mey, Marion')
,('XXX-3188', 'Mey, Marion^Paula')
,('XXX-3188', 'Schulz^Roma')
,('XXX-3218', 'Böntgen-Simnet,Dr. Regine^Cara')
,('XXX-3218', 'Simnet,Dr. Regine^Cara')
,('XXX-3826', 'Mertes, Bernd Uwe^Ellie')
,('XXX-3826', 'Mertes,Bernd^Ellie')
,('XXX-3826', 'Mertes^Ellie');
WITH DataSource AS
(
SELECT [ID]
,[Name]
,COUNT(*) OVER (PARTITION BY [ID], LTRIM(RTRIM(SUBSTRING([Name], 0, CHARINDEX(',', REPLACE([Name], '^', ',')))))) AS [ID_Name_Count]
,COUNT(*) OVER (PARTITION BY [ID]) AS [ID_Count]
,LTRIM(RTRIM(SUBSTRING([Name], 0, CHARINDEX(',', REPLACE([Name], '^', ','))))) AS [FamilyName]
FROM @DataSource
)
SELECT [ID]
,[Name]
FROM DataSource
-- AND binds tighter than OR; the parentheses only make that explicit
WHERE ([ID_Name_Count] = 1
AND [ID_Count] = 2)
OR [ID] IN
(
SELECT [ID]
FROM DataSource
GROUP BY [ID]
HAVING COUNT(DISTINCT [FamilyName]) > 1
);
The solution is pretty easy. Here are the interesting parts:
replace the ^ with , in order to simplify the last name extraction
extract the last name and calculate the counts based on ID and last name
in the final select, check for unique ID/last name pairs with an ID count equal to 2, and add IDs with more than one unique family name (your special case)
You can try something like that:
Test data
drop table if exists #Patient;
create table #Patient (
PatientID varchar(20),
PatientName varchar(50)
);
insert into #Patient(PatientID,PatientName)
values ('XXX-037070002' ,'Riger, Jens^Wicki'),
('XXX-037070002' ,'Riger^Wicki'),
('XXX-10052' ,'Weier,Nicole^Peggy'),
('XXX-10052' ,'Weier,Nicole^Peppy'),
('XXX-23310' ,'Rodem^Sieglinde'),
('XXX-23310' ,'Sauberger, Birgit^Finja'),
('XXX-23343' ,'Je, Ronny^Wilma'),
('XXX-23343' ,'Jer, Ronny^Wilma'),
('XXX-2349' ,'Kel,Andy^Juka'),
('XXX-2349' ,'Kel^Juka'),
('XXX-2998' ,'Hel, Frank'),
('XXX-2998' ,'Hel,Frank^Fenris'),
('XXX-3188' ,'Mey, Marion'),
('XXX-3188' ,'Mey, Marion^Paula'),
('XXX-3188' ,'Schulz^Roma'),
('XXX-3218' ,'Böntgen-Simnet,Dr. Regine^Cara'),
('XXX-3218' ,'Simnet,Dr. Regine^Cara'),
('XXX-3826' ,'Mertes, Bernd Uwe^Ellie'),
('XXX-3826' ,'Mertes,Bernd^Ellie'),
('XXX-3826' ,'Mertes^Ellie');
My solution
with q1 as (
select
PatientID,
PatientName,
case when CHARINDEX(',',REPLACE( PatientName, '^',',')) > 0
then LEFT(PatientName,CHARINDEX(',',REPLACE( PatientName, '^',','))-1)
else PatientName end as FullName
from #Patient
) ,
q2 as (
select PatientID
from q1
group by PatientID having COUNT(1) > 1 and COUNT(DISTINCT FullName) > 1 )
select t.PatientID,t.PatientName
from #Patient t join q2 on t.PatientID = q2.PatientID;

How do I return the column name in table where a null value exists?

I have a table of more than 2 million rows and over 100 columns. I need to run a query that checks if there are any null values in any row or column of the table and return an ID number where there is a null. I've thought about doing the following, but I was wondering if there is a more concise way of checking this?
SELECT [ID]
from [TABLE_NAME]
where
[COLUMN_1] is null
or [COLUMN_2] is null
or [COLUMN_3] is null or etc.
Your method is fine. If your challenge is writing out the where statement, then you can run a query like this:
select column_name+' is null or '
from information_schema.columns c
where c.table_name = 'table_name'
Then copy the results into a query window and use them for building the query.
I used SQL Server syntax for the query, because it looks like you are using SQL Server. Most databases support the INFORMATION_SCHEMA tables, but the syntax for string concatenation varies among databases. Remember to remove the final or at the end of the last comparison.
You can also copy the column list into Excel and use Excel formulas to create the list.
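If the copy-and-paste step is the inconvenient part, the predicate can also be assembled in one statement. The following is only a sketch under assumptions: it needs STRING_AGG, which is available from SQL Server 2017, the table name TABLE_NAME and the key column ID are the placeholders from the question, and the table is assumed to live in the default schema.

```sql
DECLARE @table SYSNAME = N'TABLE_NAME';   -- placeholder table name from the question
DECLARE @where NVARCHAR(MAX);
DECLARE @sql   NVARCHAR(MAX);

-- Build "[COLUMN_1] IS NULL OR [COLUMN_2] IS NULL OR ..." for the nullable columns only.
-- The CAST to NVARCHAR(MAX) avoids STRING_AGG's 8000-byte limit on wide tables.
SELECT @where = STRING_AGG(CAST(QUOTENAME(c.COLUMN_NAME) + N' IS NULL' AS NVARCHAR(MAX)), N' OR ')
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_NAME = @table
  AND c.IS_NULLABLE = 'YES';

SET @sql = N'SELECT [ID] FROM ' + QUOTENAME(@table) + N' WHERE ' + @where + N';';

-- Inspect the generated statement first; @sql is NULL when no nullable column was found.
SELECT @sql AS GeneratedQuery;

-- Uncomment to run it once the generated text looks right.
-- EXEC sys.sp_executesql @sql;
```

Because the separator is placed only between items, there is no trailing OR to remove by hand.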
You can use something similar to the following:
declare @T table
(
ID int,
Name varchar(10),
Age int,
City varchar(10),
Zip varchar(10)
)
insert into @T values
(1, 'Alex', 32, 'Miami', NULL),
(2, NULL, 24, NULL, NULL)
;with xmlnamespaces('http://www.w3.org/2001/XMLSchema-instance' as ns)
select ID,
(
select *
from @T as T2
where T1.ID = T2.ID
for xml path('row'), elements xsinil, type
).value('count(/row/*[@ns:nil = "true"])', 'int') as NullCount
from @T as T1

Column conflicts with the type of other columns in the unpivot list

I'm unpivoting sys.[views] into key/value pairs to compare with values on another server for consistency testing, but I'm running into the following error:
Msg 8167, Level 16, State 1, Line 51
The type of column "type" conflicts with the type of other columns specified in the UNPIVOT list.
Query:
SELECT
sourceUnpivoted.idServer,
sourceUnpivoted.sourceServerName,
sourceUnpivoted.name,
sourceUnpivoted.columnName,
sourceUnpivoted.columnValue
FROM (
SELECT
CAST('1' AS VARCHAR(255)) AS idServer,
CAST('thisOne' AS VARCHAR(255)) AS sourceServerName,
CAST('theDatabase' AS VARCHAR(255)) AS sourceDatabaseName,
CAST(name AS VARCHAR(255)) AS name,
CAST(object_id AS VARCHAR(255)) AS object_id,
CAST(principal_id AS VARCHAR(255)) AS principal_id,
CAST(schema_id AS VARCHAR(255)) AS schema_id,
CAST(parent_object_id AS VARCHAR(255)) AS parent_object_id,
CAST(type AS VARCHAR(255)) AS type,
CAST(type_desc AS VARCHAR(255)) AS type_desc,
CAST(create_date AS VARCHAR(255)) AS create_date,
CAST(lock_escalation_desc AS VARCHAR(255)) AS lock_escalation_desc
...
FROM noc_test.dbo.stage_sysTables
) AS databaseTables
UNPIVOT (
columnValue FOR columnName IN (
object_id,
principal_id,
schema_id,
parent_object_id,
type,
type_desc,
create_date,
lock_escalation_desc
)
) AS sourceUnpivoted
Why does this not like [type], [type_desc] and [lock_escalation_desc]?
I've also tried CONVERT(VARCHAR(255), type) AS type.
It's actually a collation issue. I can resolve it by changing these lines:
CAST([type] collate database_default AS VARCHAR(255)) AS [type],
CAST(type_desc collate database_default AS VARCHAR(255)) AS type_desc,
CAST(create_date AS VARCHAR(255)) AS create_date,
CAST(lock_escalation_desc collate database_default AS VARCHAR(255)) AS lock_escalation_desc
The specific issue is that name is collated as Latin1_General_CI_AS, whereas the other 3 columns you mentioned are collated as Latin1_General_CI_AS_KS_WS (At least, on my machine, I'm not sure what it would be like on a server/database with different default collation).
This is one solution for this type of error.
1: Create this table:
CREATE TABLE People
(
PersonId int,
Firstname varchar(50),
Lastname varchar(25)
)
2: Then insert some rows:
INSERT INTO People VALUES (1, 'Jim', 'Smith');
INSERT INTO People VALUES (2, 'Jane', 'Jones');
INSERT INTO People VALUES (3, 'Bob', 'Unicorn');
3: Running the script below produces this error:
Msg 8167, Level 16, State 1, Line 3
The type of column "Lastname" conflicts with the type of other columns specified in the UNPIVOT list.
SELECT PersonId, ColumnName, Value
FROM People
unpivot(Value FOR ColumnName IN (FirstName, LastName)) unpiv;
4: The solution is to use a subquery that first casts the Lastname column to the same length as Firstname:
SELECT PersonId, ColumnName, Value
FROM (
SELECT personid, firstname, cast(lastname AS VARCHAR(50)) lastname
FROM People
) d
unpivot(Value FOR ColumnName IN (FirstName, LastName)) unpiv;
Ran into this same error and I just made all the columns in the table of the same data type - I had a mix of int, varchar, nvarchar of various lengths. Once I converted all the columns in my table to the same type - nvarchar(255) it worked perfectly.
The PIVOT/UNPIVOT clause is sensitive to the ANSI Padding Status of the column (right-click -> properties in SSMS) as well as the type, size and collation. Try specifying SET ANSI_PADDING ON|OFF in the session before adding or recreating the column in question so it matches the others in the PIVOT/UNPIVOT clause.
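To see which of these attributes actually differ, the catalog views can be queried before changing anything. This is a diagnostic sketch, not a fix: dbo.People and its column names are taken from the example in the earlier answer and stand in for whatever table feeds your UNPIVOT list.

```sql
-- List type, length, collation and ANSI padding for the columns in the UNPIVOT list.
-- Any column whose row differs from the others is a candidate for an explicit
-- CAST (and COLLATE) in the derived table.
SELECT c.name            AS column_name,
       t.name            AS type_name,
       c.max_length,
       c.collation_name,
       c.is_ansi_padded
FROM sys.columns AS c
JOIN sys.types   AS t
  ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.People')   -- hypothetical table from the example above
  AND c.name IN (N'Firstname', N'Lastname')
ORDER BY c.column_id;
```

For the People example this should show the same type and collation but different max_length values (50 versus 25), which matches the cast used in step 4 above.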
I had the same issue. I fixed it by right-clicking the column header and selecting change type "using locale".