Convert SQL table from Long Data to Wide Data


I have an SQL table that contains survey data. Unfortunately, the survey questions are long data instead of wide data (I have a column that has all the question numbers and I need the question numbers to be their own columns).
I have:
question_id | question_format_id | response      | format_type  | question
63          | 8                  | synchronous   | likert       | How will you attend class
64          | 3                  | COVID concern | short answer | Reason for selection
I want:
63                        | 64
8                         | 3
synchronous               | COVID concern
likert                    | short answer
How will you attend class | Reason for selection
Is that possible?
The reason is that I'm trying to do reporting on students that select a certain answer for question 63. However, as soon as I put a filter on the responses for the questions, I no longer see both questions.
Any suggestions would be appreciated.

First apply the UNPIVOT clause, and then the PIVOT clause, for example:
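SELECT [1] AS col1, [2] AS col2
FROM
(
    SELECT *
    FROM ( SELECT CAST(question_id AS VARCHAR(MAX)) AS question_id,
                  CAST(question_format_id AS VARCHAR(MAX)) AS question_format_id,
                  -- UNPIVOT requires every listed column to have the same type and length,
                  -- so the text columns are cast too (use NVARCHAR(MAX) for Unicode data)
                  CAST(response AS VARCHAR(MAX)) AS response,
                  CAST(format_type AS VARCHAR(MAX)) AS format_type,
                  CAST(question AS VARCHAR(MAX)) AS question,
                  ROW_NUMBER() OVER (ORDER BY question_id) AS rn
           FROM [tab] ) t
    UNPIVOT
    (val FOR col IN
        (question_id, question_format_id, response, format_type, question)
    ) AS upt
) ut
PIVOT
(
    MAX(val) FOR rn IN ([1],[2])
) AS pt
-- PIVOT output has no guaranteed order, so restore the original field order
ORDER BY CASE col WHEN 'question_id' THEN 1
                  WHEN 'question_format_id' THEN 2
                  WHEN 'response' THEN 3
                  WHEN 'format_type' THEN 4
                  ELSE 5 END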
This transposes the values vertically: each source row becomes a column.
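SQLite has no PIVOT or UNPIVOT, but the same unpivot-then-pivot idea can be sketched in portable SQL. The minimal example below runs through Python's built-in sqlite3 module: UNION ALL plays the role of UNPIVOT (every value cast to TEXT, mirroring the VARCHAR(MAX) casts) and MAX(CASE ...) plays the role of PIVOT. The table name tab and the rows come from the question; the ord column that preserves the original field order is an addition of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (question_id INTEGER, question_format_id INTEGER, "
    "response TEXT, format_type TEXT, question TEXT)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?)",
    [
        (63, 8, "synchronous", "likert", "How will you attend class"),
        (64, 3, "COVID concern", "short answer", "Reason for selection"),
    ],
)

TRANSPOSE_SQL = """
WITH numbered AS (
    -- rn = 1, 2, ... becomes the output column number
    SELECT *, ROW_NUMBER() OVER (ORDER BY question_id) AS rn FROM tab
),
unpivoted AS (
    -- unpivot: one branch per source column, all values as TEXT;
    -- ord remembers which source column each value came from
    SELECT rn, 1 AS ord, CAST(question_id AS TEXT) AS val FROM numbered
    UNION ALL SELECT rn, 2, CAST(question_format_id AS TEXT) FROM numbered
    UNION ALL SELECT rn, 3, response FROM numbered
    UNION ALL SELECT rn, 4, format_type FROM numbered
    UNION ALL SELECT rn, 5, question FROM numbered
)
-- pivot: spread rn = 1 and rn = 2 into two columns, one output row per ord
SELECT MAX(CASE WHEN rn = 1 THEN val END) AS col1,
       MAX(CASE WHEN rn = 2 THEN val END) AS col2
FROM unpivoted
GROUP BY ord
ORDER BY ord
"""

rows = conn.execute(TRANSPOSE_SQL).fetchall()
for row in rows:
    print(row)
```

With more source rows you would add one MAX(CASE ...) per row number, which is the same "columns must be known in advance" limitation that PIVOT has.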

Related

Can I use string_split with enforcing combination of labels?

So I have the following table:
Id Name Label
---------------------------------------
1 FirstTicket bike|motorbike
2 SecondTicket bike
3 ThirdTicket e-bike|motorbike
4 FourthTicket car|truck
I want to use string_split function to identify rows that have both bike and motorbike labels.
So the desired output in my example will be just the first row:
Id Name Label
--------------------------------------
1 FirstTicket bike|motorbike
Currently, I am using the following query, but it returns rows 1, 2 and 3. I only want the first. Is it possible?
SELECT Id, Name, Label FROM tickets
WHERE EXISTS (
SELECT * FROM STRING_SPLIT(Label, '|')
WHERE value IN ('bike', 'motorbike')
)

You can use APPLY and aggregate:
SELECT t.Id, t.Name, t.Label
FROM tickets t CROSS APPLY
STRING_SPLIT(t.Label, '|') t1
WHERE t1.value IN ('bike', 'motorbike')
GROUP BY t.Id, t.Name, t.Label
HAVING COUNT(DISTINCT t1.value) = 2;
However, this design breaks normalization rules: you should store the labels in a separate table linked to tickets.
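To try the split-and-aggregate query outside SQL Server, here is a minimal sketch using Python's sqlite3 module. SQLite has no STRING_SPLIT or CROSS APPLY, so a recursive CTE stands in for the split (an assumption of this sketch, not part of the answer); the GROUP BY ... HAVING COUNT(DISTINCT ...) = 2 filter is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (Id INTEGER, Name TEXT, Label TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        (1, "FirstTicket", "bike|motorbike"),
        (2, "SecondTicket", "bike"),
        (3, "ThirdTicket", "e-bike|motorbike"),
        (4, "FourthTicket", "car|truck"),
    ],
)

# The recursive CTE emulates STRING_SPLIT: each step emits the text before the
# first '|' as part, and carries the remainder forward in rest.
QUERY = """
WITH RECURSIVE split(Id, part, rest) AS (
    SELECT Id, NULL, Label || '|' FROM tickets
    UNION ALL
    SELECT Id,
           substr(rest, 1, instr(rest, '|') - 1),
           substr(rest, instr(rest, '|') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT t.Id, t.Name, t.Label
FROM tickets t
JOIN split s ON s.Id = t.Id
WHERE s.part IN ('bike', 'motorbike')
GROUP BY t.Id, t.Name, t.Label
HAVING COUNT(DISTINCT s.part) = 2   -- both labels must be present
"""

matches = conn.execute(QUERY).fetchall()
print(matches)
```

'e-bike' is a separate token after the split, so ThirdTicket only matches one of the two labels and is filtered out by the HAVING clause.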

You could just use string functions for this:
select t.*
from mytable t
where
'|' + label + '|' like '%|bike|%'
and '|' + label + '|' like '%|motorbike|%'
I would expect this to be more efficient than other methods that split and aggregate.
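The delimiter-wrapping trick can be checked with a minimal sketch in Python's sqlite3 module. The only change from the T-SQL above is that SQLite concatenates strings with || instead of +; the table name mytable is taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "FirstTicket", "bike|motorbike"),
        (2, "SecondTicket", "bike"),
        (3, "ThirdTicket", "e-bike|motorbike"),
        (4, "FourthTicket", "car|truck"),
    ],
)

# Wrapping the list in '|' puts a delimiter on both sides of every label,
# including the first and last, so '%|bike|%' cannot match inside 'e-bike'
# or 'motorbike'.
QUERY = """
SELECT id FROM mytable
WHERE '|' || label || '|' LIKE '%|bike|%'
  AND '|' || label || '|' LIKE '%|motorbike|%'
ORDER BY id
"""

ids = [row[0] for row in conn.execute(QUERY)]
print(ids)
```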
Please note, however, that you should really consider fixing your data model. Instead of storing delimited lists, you should have a separate table to represent the relation between tickets and labels, with one row per ticket/label tuple. Storing delimited lists in a database column is a well-known SQL antipattern that should be avoided at all costs (hard to maintain, hard to query, hard to enforce data integrity, inefficient, ...). You can have a look at this famous SO post for more on this topic.
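A minimal sketch of that normalized design, using Python's sqlite3 module; the table names tickets and ticket_labels are hypothetical. With one row per ticket/label pair, finding tickets that carry all of the wanted labels needs no string handling at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized schema: one row per ticket/label pair.
conn.executescript("""
CREATE TABLE tickets (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE ticket_labels (
    ticket_id INTEGER NOT NULL REFERENCES tickets(id),
    label     TEXT    NOT NULL,
    PRIMARY KEY (ticket_id, label)   -- no duplicate labels per ticket
);
INSERT INTO tickets VALUES (1, 'FirstTicket'), (2, 'SecondTicket'),
                           (3, 'ThirdTicket'), (4, 'FourthTicket');
INSERT INTO ticket_labels VALUES (1, 'bike'), (1, 'motorbike'), (2, 'bike'),
                                 (3, 'e-bike'), (3, 'motorbike'),
                                 (4, 'car'), (4, 'truck');
""")

# "Has all of these labels": filter to the wanted labels, then require one
# matching row per wanted label. The primary key makes COUNT(*) count
# distinct labels.
wanted = ("bike", "motorbike")
placeholders = ",".join("?" * len(wanted))
QUERY = f"""
SELECT t.id, t.name
FROM tickets t
JOIN ticket_labels tl ON tl.ticket_id = t.id
WHERE tl.label IN ({placeholders})
GROUP BY t.id, t.name
HAVING COUNT(*) = ?
"""

rows = conn.execute(QUERY, (*wanted, len(wanted))).fetchall()
print(rows)
```

The label list is passed as bound parameters, so the same query works for any number of required labels.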

Yogesh beat me to it; my solution is similar but with a HUGE performance improvement worth pointing out. We'll start with this sample data:
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#tickets','U') IS NOT NULL DROP TABLE #tickets;
CREATE TABLE #tickets (Id INT, [Name] VARCHAR(50), Label VARCHAR(1000));
INSERT #tickets (Id, [Name], Label)
VALUES
(1,'FirstTicket' , 'bike|motorbike'),
(2,'SecondTicket', 'bike'),
(3,'ThirdTicket' , 'e-bike|motorbike'),
(4,'FourthTicket', 'car|truck'),
(5,'FifthTicket', 'motorbike|bike');
Now the original and much improved version:
-- Original
SELECT t.id, t.[Name], t.Label
FROM #tickets AS t
CROSS APPLY STRING_SPLIT(t.Label, '|') t1
WHERE t1.[value] IN ('bike', 'motorbike')
GROUP BY t.id, t.[Name], t.Label
HAVING COUNT(DISTINCT t1.[value]) = 2;
-- Improved Version Leveraging APPLY to avoid a sort
SELECT t.Id, t.[Name], t.Label
FROM #tickets AS t
CROSS APPLY
(
SELECT 1
FROM STRING_SPLIT(t.Label,'|') AS split
WHERE split.[value] IN ('bike','motorbike')
HAVING COUNT(*) = 2
) AS isMatch(TF);
Comparing the estimated costs in the two execution plans, the "sortless" version is about 4.36 times cheaper than the original. In reality the gap is larger, because the first version is not just sorting: it sorts on three columns, an int and two (n)varchars. Because sorting costs grow as N * LOG(N), the original query slows down faster than linearly as you throw more rows at it.

SQL Server - If a field has already been pivoted, how do I pivot again by another field? Is the DB design correct?

I have raw data like this:
Title Question Answer AnswerRemark
----------------------------------------
ACCCode1 Q1 Y NULL
ACCCode1 Q2 N 6
ACCCode1 Q3 Y Workout
As you can see, the field "AnswerRemark" is free text for "Answer"; some answers do not require a remark.
I can simply pivot the question and answer like:
Title Q1 Q2 Q3
ACCCode1 Y N Y
My desired result:
Title Q1 R1 Q2 R2 Q3 R3
ACCCode1 Y NULL N 6 Y Workout
Is that possible? I cannot figure out how to achieve this; pivoting on Answer is not a good idea, as it has many combinations.
Any suggestion?

Use conditional aggregation:
SELECT Title,
MAX(CASE WHEN Question='Q1' THEN Answer END) as Q1 ,
MAX(CASE WHEN Question='Q1' THEN AnswerRemark END) as R1 ,
MAX(CASE WHEN Question='Q2' THEN Answer END) as Q2 ,
MAX(CASE WHEN Question='Q2' THEN AnswerRemark END) as R2 ,
MAX(CASE WHEN Question='Q3' THEN Answer END) as Q3 ,
MAX(CASE WHEN Question='Q3' THEN AnswerRemark END) as R3
FROM [tablename]
GROUP BY Title
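Here is the conditional-aggregation query run against the question's three sample rows, as a minimal sketch using Python's sqlite3 module. The table name answers is a stand-in for [tablename]; the SQL itself is portable and unchanged apart from that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (Title TEXT, Question TEXT, Answer TEXT, AnswerRemark TEXT)"
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?, ?)",
    [
        ("ACCCode1", "Q1", "Y", None),
        ("ACCCode1", "Q2", "N", "6"),
        ("ACCCode1", "Q3", "Y", "Workout"),
    ],
)

# Each CASE keeps only the row for one question (NULL elsewhere), and MAX
# collapses the group to that single value, or NULL if the value is missing.
QUERY = """
SELECT Title,
       MAX(CASE WHEN Question = 'Q1' THEN Answer END)       AS Q1,
       MAX(CASE WHEN Question = 'Q1' THEN AnswerRemark END) AS R1,
       MAX(CASE WHEN Question = 'Q2' THEN Answer END)       AS Q2,
       MAX(CASE WHEN Question = 'Q2' THEN AnswerRemark END) AS R2,
       MAX(CASE WHEN Question = 'Q3' THEN Answer END)       AS Q3,
       MAX(CASE WHEN Question = 'Q3' THEN AnswerRemark END) AS R3
FROM answers
GROUP BY Title
"""

row = conn.execute(QUERY).fetchone()
print(row)
```

Because each (Title, Question) pair occurs once here, MAX just picks that value. If a question can be answered more than once per Title, MAX silently keeps only the largest answer.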

Using Pivot we get the result
;With cte(Title, Question,Answer,AnswerRemark)
AS
(
SELECT 'ACCCode1','Q1','Y',NULL UNION ALL
SELECT 'ACCCode1','Q2','N','6' UNION ALL
SELECT 'ACCCode1','Q3','Y','Workout' UNION ALL
SELECT 'ACCCode1','Q2','N','7' UNION ALL
SELECT 'ACCCode1','Q1','Y',NULL UNION ALL
SELECT 'ACCCode1','Q3','N','9' UNION ALL
SELECT 'ACCCode1','Q1','N','4' UNION ALL
SELECT 'ACCCode1','Q2','N','Workout' UNION ALL
SELECT 'ACCCode1','Q4','N','2' UNION ALL
SELECT 'ACCCode1','Q3','Y','Workout' UNION ALL
SELECT 'ACCCode1','Q1','N','1' UNION ALL
SELECT 'ACCCode1','Q4','Y',NULL
)
SELECT *,'Remark'+CAST(ROW_NUMBER()OVER(ORDER BY (SELECT 1))AS varchar(10)) AS Question2
, ROW_NUMBER()OVER(PArtition by Question Order by Question ) AS Seq
INTO #t FROM cte
Using dynamic SQL, since the columns are not static:
DECLARE @DyColumn1 NVARCHAR(MAX),
        @DyColumn2 NVARCHAR(MAX),
        @Sql NVARCHAR(MAX),
        @MaxDyColumn1 NVARCHAR(MAX),
        @MaxDyColumn2 NVARCHAR(MAX),
        @CombineColumn NVARCHAR(MAX)
SELECT @DyColumn1=STUFF((SELECT DISTINCT ', '+QUOTENAME(Question) FROM #t FOR XML PATH ('')),1,1,'')
SELECT @DyColumn2=STUFF((SELECT ', '+QUOTENAME(Question2) FROM #t FOR XML PATH ('')),1,1,'')
SELECT @MaxDyColumn1=STUFF((SELECT DISTINCT ', '+'MAX('+QUOTENAME(Question)+') AS '+QUOTENAME(Question) FROM #t FOR XML PATH ('')),1,1,'')
SELECT @MaxDyColumn2=STUFF((SELECT ', '+'MAX('+QUOTENAME(Question2)+') AS '+QUOTENAME(Question2) FROM #t FOR XML PATH ('')),1,1,'')
SELECT @CombineColumn=STUFF((SELECT DISTINCT ', '+QUOTENAME(Question)+','+QUOTENAME(Question2) FROM #t FOR XML PATH ('')),1,1,'')
SET @Sql='SELECT Title,'+@CombineColumn+' FROM
(
SELECT Title,'+@MaxDyColumn1+','+@MaxDyColumn2+' FROM
(
SELECT * FROM #t
) AS SRC
PIVOT
(
MAX(Answer) FOR Question IN('+@DyColumn1+')
) AS Pvt1
PIVOT
(
MAX(AnswerRemark) FOR Question2 IN('+@DyColumn2+')
) AS Pvt2
GROUP BY Title
) dt
'
PRINT @Sql
EXEC(@Sql)
Result
Title Q1 Remark1 Q1 Remark2 Q1 Remark3 Q1 Remark4 Q2 Remark5 Q2 Remark6 Q2 Remark7 Q3 Remark8 Q3 Remark9 Q3 Remark10 Q4 Remark11 Q4 Remark12
ACCCode1 Y NULL Y 1 Y 4 Y NULL N 6 N Workout N 7 Y Workout Y Workout Y 9 Y NULL Y 2

I don't know how big your data is, or how many questions are possible. A more generic Q&A structure done at the presentation layer would be far better, but for your specific request a more correct design would be a 3NF table. This will allow you to create a primary key that is highly optimised and create a secondary index by question type id. All your keys are now IDs which are far faster to search and match than strings:
Account Codes
AccID - AccName - columns for other data related to accounts
Stores each account you have.
Questions
QuestionID - QuestionName
List of possible questions, one row for every question you have, Q1, Q2 etc. You could add question categories here to exploit any commonality you have, e.g. if you have different surveys with the same set of questions, you could put them in one category and easily then query the below.
Results
AccId, QuestionID, Result, Result Remark
Contains one row for every question asked.
The query for your result still uses PIVOT, but now you can select the list of columns to use from a variable or with dynamic SQL, which means you can control it somewhat better, and the query itself should perform better.
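The three-table layout could look like the following minimal sketch in Python's sqlite3 module. The table and column names follow the Account Codes / Questions / Results outline but are otherwise hypothetical, and because SQLite has no PIVOT, the "columns from the questions table" step builds a conditional-aggregation query instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical 3NF layout: accounts, questions, and one results row per answer.
conn.executescript("""
CREATE TABLE accounts  (AccID INTEGER PRIMARY KEY, AccName TEXT NOT NULL);
CREATE TABLE questions (QuestionID INTEGER PRIMARY KEY, QuestionName TEXT NOT NULL);
CREATE TABLE results (
    AccID        INTEGER NOT NULL REFERENCES accounts(AccID),
    QuestionID   INTEGER NOT NULL REFERENCES questions(QuestionID),
    Result       TEXT,
    ResultRemark TEXT,
    PRIMARY KEY (AccID, QuestionID)
);
CREATE INDEX ix_results_question ON results(QuestionID);  -- secondary index by question
INSERT INTO accounts  VALUES (1, 'ACCCode1');
INSERT INTO questions VALUES (1, 'Q1'), (2, 'Q2'), (3, 'Q3');
INSERT INTO results   VALUES (1, 1, 'Y', NULL), (1, 2, 'N', '6'), (1, 3, 'Y', 'Workout');
""")

# Generate one answer column and one remark column per question. QuestionID
# is an integer key, so inlining it into the SQL text is safe here.
cols = []
for (qid,) in conn.execute("SELECT QuestionID FROM questions ORDER BY QuestionID"):
    qid = int(qid)
    cols.append(f"MAX(CASE WHEN r.QuestionID = {qid} THEN r.Result END) AS Q{qid}")
    cols.append(f"MAX(CASE WHEN r.QuestionID = {qid} THEN r.ResultRemark END) AS R{qid}")

sql = (
    f"SELECT a.AccName AS AccName, {', '.join(cols)} "
    "FROM results r JOIN accounts a ON a.AccID = r.AccID "
    "GROUP BY a.AccID, a.AccName"
)
cur = conn.execute(sql)
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(header)
print(rows)
```

Adding a question is then a single INSERT into questions; the generated query picks it up without any schema change.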
With that said, if you have any knowledge about your data whatsoever you can use it to make a static query which can then be indexed. Examples are here of this query: SQL Server 2005 Pivot on Unknown Number of Columns. You can then set the column names if required using the AS syntax, which unfortunately would require dynamic sql again (Change column name while using PIVOT SQL Server 2008).
By the way, what you are trying to do is work with denormalised data, which is what NoSQL is good for. SQL Server gives you great help, but you have to have some structure to your data.
If you aren't working for SurveyMonkey and dealing with millions of variations, I'd seriously consider making a table specific to each round of questions you get: denormalise it, add an explicit column for each question, and then your entire logic becomes just select * from surveyxyztable where accountid = abc.

Multiple rows to a single row in SQL

I have a table as below:
I want a query to print output as below:

Note: Please do not downvote. I know the rules for posting answers, but for questions like this there's no way to post a short answer. I posted it only to help those who want to find out how to achieve this but do not expect a ready-to-use solution.
I'd suggest reading these articles:
PIVOT on two or more fields in SQL Server
Pivoting on multiple columns - SQL Server
Pivot two or more columns in SQL Server 2005

First UNPIVOT, then PIVOT. If the number of rows for each Pod_ID is not always 3, you need to use dynamic SQL. A basic sample:
SELECT *
FROM (
    SELECT Pod_ID,
           Purs + CASE WHEN RN-1 = 0 THEN '' ELSE CAST(RN-1 as nvarchar(10)) END as Purs,
           [Values]
    FROM (
        SELECT Pod_ID,
               Pur_Qty, --All columns that will be UNPIVOTed must be same datatype
               Pur_Price,
               CONVERT(int, CONVERT(char(8), ETD_Date, 112)) ETD_Date, -- that is why I convert the date to an int (yyyymmdd)
               ROW_NUMBER() OVER (PARTITION BY Pod_ID ORDER BY (SELECT 1)) as RN -- numbering restarts for each Pod_ID
        FROM YourTable
    ) as p1
    UNPIVOT (
        [Values] FOR [Purs] IN(Pur_Qty, Pur_Price, ETD_Date)
    ) as unpvt
) as p2
PIVOT (
    MAX([Values]) FOR Purs IN (Pur_Qty,Pur_Price,ETD_Date,Pur_Qty1,Pur_Price1,ETD_Date1,Pur_Qty2,Pur_Price2,ETD_Date2)
) as pvt
Will bring you:
Pod_ID Pur_Qty Pur_Price ETD_Date Pur_Qty1 Pur_Price1 ETD_Date1 Pur_Qty2 Pur_Price2 ETD_Date2
F8E2F614-75BC-4E46-B7F8-18C7FC4E5397 24 22 20160820 400 33 20160905 50 44 20160830
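As a cross-check, here is a minimal sketch in Python's sqlite3 module that builds the same wide row with ROW_NUMBER() and conditional aggregation instead of UNPIVOT/PIVOT. The sample rows are read back from the result above; storing dates as ISO text and ordering by SQLite's implicit rowid (insertion order) are assumptions of the sketch, since the original table definition is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (Pod_ID TEXT, Pur_Qty INTEGER, Pur_Price INTEGER, ETD_Date TEXT)"
)
pod = "F8E2F614-75BC-4E46-B7F8-18C7FC4E5397"
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [(pod, 24, 22, "2016-08-20"), (pod, 400, 33, "2016-09-05"), (pod, 50, 44, "2016-08-30")],
)

# One group of three columns per row number, named Pur_Qty, Pur_Qty1, Pur_Qty2, ...
cols = []
for n in (1, 2, 3):
    suffix = "" if n == 1 else str(n - 1)
    for c in ("Pur_Qty", "Pur_Price", "ETD_Date"):
        cols.append(f"MAX(CASE WHEN n.rn = {n} THEN n.{c} END) AS {c}{suffix}")

# PARTITION BY restarts the numbering for every Pod_ID.
QUERY = f"""
WITH numbered AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Pod_ID ORDER BY rowid) AS rn
    FROM YourTable
)
SELECT n.Pod_ID, {", ".join(cols)}
FROM numbered AS n
GROUP BY n.Pod_ID
"""

cur = conn.execute(QUERY)
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(header)
print(rows)
```

Because conditional aggregation reads each column directly, the values keep their own types and no common-datatype cast is needed, unlike with UNPIVOT.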

Sorting twice on same column

I have a bit of a weird question, given to me by a client.
He has a list of data, with a date between parentheses like so:
Foo (14/08/2012)
Bar (15/08/2012)
Bar (16/09/2012)
Xyz (20/10/2012)
However, he wants the list to be displayed as follows:
Foo (14/08/2012)
Bar (16/09/2012)
Bar (15/08/2012)
Xyz (20/10/2012)
(notice that the second Bar has moved up one position)
So the logic is that the list has to be sorted by date ascending, EXCEPT when two rows have the same name ('Bar'). Rows with the same name must be sorted with the LATEST date at the top, while otherwise keeping the overall sort order.
Is this even remotely possible? I've experimented with a lot of ORDER BY clauses, but couldn't find the right one. Does anyone have an idea?
I should have specified that this data comes from a table in a sql server database (the Name and the date are in two different columns). So I'm looking for a SQL-query that can do the sorting I want.
(I've dumbed this example down quite a bit, so if you need more context, don't hesitate to ask)

This works, I think
declare @t table (data varchar(50), date datetime)
insert @t
values
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
select t.*
from @t t
inner join (select data, COUNT(*) cg, MAX(date) as mg from @t group by data) tc
on t.data = tc.data
order by case when cg>1 then mg else date end, date desc
produces
data date
---------- -----------------------
Foo 2012-08-14 00:00:00.000
Bar 2012-09-16 00:00:00.000
Bar 2012-08-15 00:00:00.000
Xyz 2012-10-20 00:00:00.000

A way with better performance than any of the other posted answers is to do it entirely in the ORDER BY, without a JOIN or a CTE:
DECLARE @t TABLE (myData varchar(50), myDate datetime)
INSERT INTO @t VALUES
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
SELECT *
FROM @t t1
ORDER BY (SELECT MIN(t2.myDate) FROM @t t2 WHERE t2.myData = t1.myData), t1.myDate DESC
This does exactly what you request and will work with any indexes and much better with larger amounts of data than any of the other answers.
Additionally it's much more clear what you're actually trying to do here, rather than masking the real logic with the complexity of a join and checking the count of joined items.
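The correlated-subquery ORDER BY can be tried directly with Python's sqlite3 module; this minimal sketch uses the same sample rows, stored as ISO date strings (an assumption, since SQLite has no datetime type) so that text order equals date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (myData TEXT, myDate TEXT)")  # ISO dates sort correctly as text
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Foo", "2012-08-14"), ("Bar", "2012-08-15"),
     ("Bar", "2012-09-16"), ("Xyz", "2012-10-20")],
)

# First key: the earliest date of the row's name group, so all rows with the
# same name sort together at their group's first date.
# Second key: the row's own date, newest first, inside that group.
QUERY = """
SELECT myData, myDate
FROM t AS t1
ORDER BY (SELECT MIN(t2.myDate) FROM t AS t2 WHERE t2.myData = t1.myData),
         t1.myDate DESC
"""

rows = conn.execute(QUERY).fetchall()
for r in rows:
    print(r)
```

If two different names share the same earliest date, their rows can interleave; adding myData as a middle sort key would keep each group together.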

This one uses analytic functions to perform the sort; it only needs to read your table once.
The inner query finds gaps, where the name changes. These gaps are used to identify groups in the next query, and the outer query does the final sorting by these groups.
I have tried it here (SQL Fiddle) with extended test-data.
SELECT name, dat
FROM (
SELECT name, dat, SUM(gap) over(ORDER BY dat, name) AS grp
FROM (
SELECT name, dat,
CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name THEN 0 ELSE 1 END AS gap
FROM t
) x
) y
ORDER BY grp, dat DESC
Extended test-data
('Bar','2012-08-12'),
('Bar','2012-08-11'),
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-08-16'),
('Bar','2012-09-17'),
('Xyz','2012-10-20')
Result
Bar 2012-08-12
Bar 2012-08-11
Foo 2012-08-14
Bar 2012-09-17
Bar 2012-08-16
Bar 2012-08-15
Xyz 2012-10-20
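The LAG/SUM gaps-and-islands query is portable enough to run unchanged in SQLite 3.25 or later (which added window functions). This minimal sketch uses Python's sqlite3 module with the extended test data above, stored as ISO date strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, dat TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Bar", "2012-08-12"), ("Bar", "2012-08-11"), ("Foo", "2012-08-14"),
     ("Bar", "2012-08-15"), ("Bar", "2012-08-16"), ("Bar", "2012-09-17"),
     ("Xyz", "2012-10-20")],
)

# gap = 1 whenever the name differs from the previous row in date order;
# the running SUM of gap then numbers each run ("island") of equal names.
QUERY = """
SELECT name, dat
FROM (
    SELECT name, dat, SUM(gap) OVER (ORDER BY dat, name) AS grp
    FROM (
        SELECT name, dat,
               CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name
                    THEN 0 ELSE 1 END AS gap
        FROM t
    ) x
) y
ORDER BY grp, dat DESC
"""

rows = conn.execute(QUERY).fetchall()
for r in rows:
    print(r)
```

The first row has no predecessor, so LAG returns NULL, the comparison is not true, and that row correctly starts island 1.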

I think that this works, including the case I asked about in the comments:
declare @t table (data varchar(50), [date] datetime)
insert @t
values
('Foo','20120814'),
('Bar','20120815'),
('Bar','20120916'),
('Xyz','20121020')
; With OuterSort as (
select *,ROW_NUMBER() OVER (ORDER BY [date] asc) as rn from @t
)
--Now we need to find contiguous ranges of the same data value, and the min and max row number for such a range
, Islands as (
select data,rn as rnMin,rn as rnMax from OuterSort os where not exists (select * from OuterSort os2 where os2.data = os.data and os2.rn = os.rn - 1)
union all
select i.data,rnMin,os.rn
from
Islands i
inner join
OuterSort os
on
i.data = os.data and
i.rnMax = os.rn-1
), FullIslands as (
select
data,rnMin,MAX(rnMax) as rnMax
from Islands
group by data,rnMin
)
select
*
from
OuterSort os
inner join
FullIslands fi
on
os.rn between fi.rnMin and fi.rnMax
order by
fi.rnMin asc,os.rn desc
It works by first computing the initial ordering in the OuterSort CTE. Then, using two CTEs (Islands and FullIslands), we compute the parts of that ordering in which the same data value appears in adjacent rows. Having done that, we can compute the final ordering by any value that all adjacent values will have (such as the lowest row number of the "island" that they belong to), and then within an "island", we use the reverse of the originally computed sort order.
Note, though, that this may not be very efficient for large data sets. On the sample data it requires 4 table scans of the base table, as well as a spool.

Try something like...
ORDER BY CASE date
WHEN '14/08/2012' THEN 1
WHEN '16/09/2012' THEN 2
WHEN '15/08/2012' THEN 3
WHEN '20/10/2012' THEN 4
END
In MySQL, you can do:
ORDER BY FIELD(date, '14/08/2012', '16/09/2012', '15/08/2012', '20/10/2012')
In Postgres, you can create a function FIELD and do:
CREATE OR REPLACE FUNCTION field(anyelement, anyarray) RETURNS numeric AS $$
SELECT
COALESCE((SELECT i
FROM generate_series(1, array_upper($2, 1)) gs(i)
WHERE $2[i] = $1),
0);
$$ LANGUAGE SQL STABLE
If you do not want to use CASE, you can try to find an implementation of the FIELD function for SQL Server.

SQL Pivot Command

I am looking for some help on designing a simple pivot so that I can link it into other parts of my queries.
My data looks like this:
Items Table
Below is my table if I run Select * from items
ITEM Weight
12345 10
12345 11
654321 50
654321 20
654321 100
There are hundreds of items in this table, but each item code will only ever have a maximum of 3 weight records.
I want the desired output
ITEM Weight_1 Weight_2 Weight_3
12345 10 11 null
654321 50 20 100
Would appreciate any suggestions,
I have played around with pivots but each subsequent item puts the weights into weight 4,5,6,7,etc
instead of starting at weight1 for each item.
Thanks
Update
Below is what I have used so far,
SELECT r.*
FROM (SELECT 'weight' + CAST(ROW_NUMBER() OVER (ORDER BY regtime ASC) AS VARCHAR(10)) AS line,
             id,
             weight
      FROM items it) AS o
PIVOT (MIN([weight]) FOR line IN (weight1, weight2, weight3)) AS r

You were almost there! You were only missing the PARTITION BY clause in OVER:
SELECT r.*
FROM (SELECT 'weight' + CAST(ROW_NUMBER() OVER (PARTITION BY id ORDER BY regtime ASC) AS VARCHAR(10)) AS line,
             id,
             weight
      FROM items it) AS o
PIVOT (MIN([weight]) FOR line IN (weight1, weight2, weight3)) AS r
When you PARTITION BY ID, the row numbers restart for each different ID.
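The effect of PARTITION BY can be seen in this minimal sketch using Python's sqlite3 module with the question's sample weights. SQLite has no PIVOT, so conditional aggregation spreads the numbered rows into columns, and the items table as shown has no regtime column, so SQLite's implicit rowid (insertion order) stands in for it; both are assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (ITEM INTEGER, Weight INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(12345, 10), (12345, 11), (654321, 50), (654321, 20), (654321, 100)],
)

# PARTITION BY ITEM restarts rn at 1 for each item, so every item's first
# weight lands in Weight_1 rather than continuing at 4, 5, 6, ...
QUERY = """
WITH numbered AS (
    SELECT ITEM, Weight,
           ROW_NUMBER() OVER (PARTITION BY ITEM ORDER BY rowid) AS rn
    FROM items
)
SELECT ITEM,
       MAX(CASE WHEN rn = 1 THEN Weight END) AS Weight_1,
       MAX(CASE WHEN rn = 2 THEN Weight END) AS Weight_2,
       MAX(CASE WHEN rn = 3 THEN Weight END) AS Weight_3
FROM numbered
GROUP BY ITEM
ORDER BY ITEM
"""

rows = conn.execute(QUERY).fetchall()
for r in rows:
    print(r)
```

Removing PARTITION BY ITEM reproduces the problem from the question: 654321's weights get row numbers 3, 4 and 5, so only its first weight lands in Weight_3 and the rest fall outside the three columns.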
Update
You do not need dynamic pivot, since you will always have 3 weights. But, if you ever need dynamic number of columns, take a look at some of the examples here:
SQL Server PIVOT perhaps?
Pivot data in T-SQL
How do I build a summary by joining to a single table with SQL Server?

You will need a value to form the columns, which I generate with ROW_NUMBER. The outcome is what you want. The only drawback I see with PIVOT is that you need to know the number of columns in advance. I use a similar method, but build up the SELECT as dynamic SQL so I can insert my columns.
EDIT: updated to show columns as weight1, weight2, etc.
create table #temp (Item int, Weight int)
insert into #temp (Item, Weight)
Values (12345, 10),
(12345, 11),
(654321, 50),
(654321, 20),
(654321, 200)
SELECT *
FROM (SELECT Item,
Weight,
'weight' + cast(Row_number()
OVER (partition by Item order by item) as varchar(10)) as seq
FROM #temp) as Src
PIVOT ( MAX(Weight) FOR Seq IN ([Weight1], [Weight2], [Weight3]) ) as PVT

MySQL
Whenever you need a pivot, use GROUP_CONCAT: it outputs a comma-separated list of the values you need.
Once you get used to working with it, it's a great tool.
SELECT item, GROUP_CONCAT(weight) as weights FROM table1
GROUP BY item
See: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
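SQLite also has a group_concat aggregate with the same default comma separator, so the idea can be tried in a minimal sketch with Python's sqlite3 module (table1 and the rows are taken from the question's sample data). Note that the result is one string per item rather than separate columns, and the element order inside each list is not guaranteed, so the sketch sorts the parsed values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (item INTEGER, weight INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(12345, 10), (12345, 11), (654321, 50), (654321, 20), (654321, 100)],
)

rows = conn.execute(
    "SELECT item, GROUP_CONCAT(weight) AS weights FROM table1 GROUP BY item ORDER BY item"
).fetchall()

# Each weights value is a single string such as '10,11'; split it to use the
# numbers again. Sorting removes any dependence on concatenation order.
parsed = {item: sorted(int(w) for w in weights.split(",")) for item, weights in rows}
print(parsed)
```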
TSQL aka SQL-server
There are many questions on this, because T-SQL supports a PIVOT keyword.
See:
Transact SQL Query-Pivot-SQL
Pivot data in T-SQL
