Transposing an sql result so that one column goes onto multiple columns - sql

I'm trying to get data out of a table for a survey in a particular format. However, all my attempts seem to hang the DB because of too many joins or queries that are too heavy on the DB.
My data looks like this:
id, user, question_id, answer_id,
1, 1, 1, 1
3, 1, 3, 15
4, 2, 1, 2
5, 2, 2, 12
6, 2, 3, 20
There are roughly 250,000 rows and each user has about 30 rows. I want the result to look like:
user0, q1, q2, q3
1, 1, NULL, 15
2, 2, 12, 20
So that each user has one row in the result, each with a separate column for each answer.
I'm using Postgres but answers in any SQL language would be appreciated as I could translate to Postgres.
EDIT: I also need to be able to deal with users not answering questions, i.e. in the example above q2 for user 1.

Consider the following demo:
CREATE TEMP TABLE qa (id int, usr int, question_id int, answer_id int);
INSERT INTO qa VALUES
(1,1,1,1)
,(2,1,2,9)
,(3,1,3,15)
,(4,2,1,2)
,(5,2,2,12)
,(6,2,3,20);
SELECT *
FROM crosstab('
SELECT usr::text
,question_id
,answer_id
FROM qa
ORDER BY 1,2')
AS ct (
usr text
,q1 int
,q2 int
,q3 int);
Result:
usr | q1 | q2 | q3
-----+----+----+----
1 | 1 | 9 | 15
2 | 2 | 12 | 20
(2 rows)
user is a reserved word. Don't use it as column name! I renamed it to usr.
You need to install the additional module tablefunc which provides the function crosstab(). Note that this operation is strictly per database.
In PostgreSQL 9.1 or later you can simply run:
CREATE EXTENSION tablefunc;
For older versions you would execute a shell script supplied in your contrib directory. In Debian, for PostgreSQL 8.4, that would be:
psql mydb -f /usr/share/postgresql/8.4/contrib/tablefunc.sql

Erwin's answer is good until a missing answer for a user shows up. I'm going to assume that you have a users table with one row per user and a questions table with one row per question.
select usr, question_id
from users u inner join questions q on 1=1
order by 1,2
This statement will create a row for every user/question, and be in the same order. Turn it into a subquery and left join it to your data...
select a.usr, a.question_id, qa.answer_id
from
(select usr, question_id
from users u inner join questions q on 1=1
)a
left join qa on qa.usr = a.usr and qa.question_id = a.question_id
order by 1,2
Plug that into Erwin's crosstab statement and give him credit for the answer :P
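The "one row per user/question" step can be tried out in isolation. Below is a minimal sketch that runs the same cross join plus left join from Python against an in-memory SQLite database. SQLite and the contents of the users and questions tables are assumptions made only to keep the example self-contained; the thread itself targets Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (usr INTEGER PRIMARY KEY);                -- assumed lookup table
CREATE TABLE questions (question_id INTEGER PRIMARY KEY);    -- assumed lookup table
CREATE TABLE qa (id INTEGER, usr INTEGER, question_id INTEGER, answer_id INTEGER);
INSERT INTO users VALUES (1), (2);
INSERT INTO questions VALUES (1), (2), (3);
-- Data from the question: user 1 never answered question 2.
INSERT INTO qa VALUES (1,1,1,1), (3,1,3,15), (4,2,1,2), (5,2,2,12), (6,2,3,20);
""")

# Every user/question pair, with NULL where no answer exists.
grid = conn.execute("""
SELECT a.usr, a.question_id, qa.answer_id
FROM (SELECT u.usr, q.question_id
      FROM users u CROSS JOIN questions q) a
LEFT JOIN qa ON qa.usr = a.usr AND qa.question_id = a.question_id
ORDER BY 1, 2
""").fetchall()

# Fold the ordered grid into one list of answers per user.
pivot = {}
for usr, question_id, answer_id in grid:
    pivot.setdefault(usr, []).append(answer_id)

print(pivot)
```

Because every user now contributes exactly one row per question, the positions line up and the missing answer shows up as a NULL (None in Python) instead of shifting later answers to the left.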

I implemented a truly dynamic function to handle this problem without having to hard-code any specific number of questions or use external modules/extensions. It is also much simpler to use than crosstab().
You can find it here: https://github.com/jumpstarter-io/colpivot
Example that solves this particular problem:
begin;
create temp table qa (id int, usr int, question_id int, answer_id int);
insert into qa values
(1,1,1,1)
,(2,1,2,9)
,(3,1,3,15)
,(4,2,1,2)
,(5,2,2,12)
,(6,2,3,20);
select colpivot('_output', $$
select usr, ('q' || question_id::text) question_id, answer_id from qa
$$, array['usr'], array['question_id'], '#.answer_id', null);
select * from _output;
rollback;
Result:
usr | 'q1' | 'q2' | 'q3'
-----+------+------+------
1 | 1 | 9 | 15
2 | 2 | 12 | 20
(2 rows)

Related

Get Ids from constant list for which there are no rows in corresponding table

Let's say I have a table Vehicles(Id, Name) with the values below:
1 Car
2 Bike
3 Bus
and a constant list of Ids:
1, 2, 3, 4, 5
I want to write a query returning Ids from above list for which there are no rows in Vehicles table. In the above example it should return:
4, 5
But when I add new row to Vehicles table:
4 Plane
It should return only:
5
And similarly, when from the first version of Vehicle table I remove the third row (3, Bus) my query should return:
3, 4, 5
I tried with exist operator but it doesn't provide me correct results:
select top v.Id from Vehicle v where Not Exists ( select v2.Id from Vehicle v2 where v.id = v2.id and v2.id in ( 1, 2, 3, 4, 5 ))
You need to treat your "list" as a dataset, and then use the EXISTS:
SELECT V.I
FROM (VALUES(1),(2),(3),(4),(5))V(I) --Presumably this would be a table (type parameter),
--or a delimited string split into rows
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable YT
WHERE YT.YourColumn = V.I);
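A minimal sketch of the same "treat the list as a dataset" idea, run from Python against an in-memory SQLite database. SQLite is an assumption made to keep the example self-contained (the answer above is written for SQL Server), and the helper name missing_ids is hypothetical.

```python
import sqlite3

def missing_ids(conn, candidate_ids):
    """Return the candidate IDs that have no row in Vehicles.

    Assumes candidate_ids is a non-empty list of integers.
    """
    # One "(?)" row per candidate, so the list becomes a small inline table.
    placeholders = ", ".join("(?)" for _ in candidate_ids)
    sql = f"""
        WITH wanted(i) AS (VALUES {placeholders})
        SELECT i
        FROM wanted
        WHERE NOT EXISTS (SELECT 1 FROM Vehicles v WHERE v.Id = wanted.i)
        ORDER BY i
    """
    return [row[0] for row in conn.execute(sql, candidate_ids)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Vehicles (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Vehicles VALUES (?, ?)",
                 [(1, "Car"), (2, "Bike"), (3, "Bus")])

before = missing_ids(conn, [1, 2, 3, 4, 5])
conn.execute("INSERT INTO Vehicles VALUES (4, 'Plane')")
after = missing_ids(conn, [1, 2, 3, 4, 5])
print(before, after)
```

The IDs are bound as parameters rather than concatenated into the SQL text, so the list can come from user input without changing the query shape.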
Please try the following solution.
It is using EXCEPT set operator.
Set Operators - EXCEPT and INTERSECT (Transact-SQL)
-- DDL and sample data population, start
DECLARE @Vehicles TABLE (ID INT PRIMARY KEY, vehicleType VARCHAR(30));
INSERT INTO @Vehicles (ID, vehicleType) VALUES
(1, 'Car'),
(2, 'Bike'),
(3, 'Bus');
-- DDL and sample data population, end
DECLARE @vehicleList VARCHAR(20) = '1, 2, 3, 4, 5'
, @separator CHAR(1) = ',';
SELECT TRIM(value) AS missingID
FROM STRING_SPLIT(@vehicleList, @separator)
EXCEPT
SELECT ID FROM @Vehicles;
Output
+-----------+
| missingID |
+-----------+
| 4 |
| 5 |
+-----------+
In SQL we store our values in tables. We therefore store your list in a table.
It is then simple to work with it and we can easily find the information wanted.
I fully agree that other functions can solve the problem, but it is better to design the database so that basic SQL suffices. It will run faster, be easier to maintain, and scale to a table of a million rows without any problems. When we add the 4th mode of transport we don't have to modify anything else.
CREATE TABLE vehicules(
id int, name varchar(25));
INSERT INTO vehicules VALUES
(1 ,'Car'),
(2 ,'Bike'),
(3 ,'Bus');
CREATE TABLE ids (iid int);
INSERT INTO ids VALUES
(1),(2),(3),(4),(5);
CREATE VIEW unknownIds AS
SELECT iid unknown_id FROM ids
LEFT JOIN vehicules
ON iid = id
WHERE id IS NULL;
SELECT * FROM unknownIds;
| unknown_id |
| ---------: |
| 4 |
| 5 |
INSERT INTO vehicules VALUES (4,'Plane');
SELECT * FROM unknownIds;
| unknown_id |
| ---------: |
| 5 |
db<>fiddle here

SQL - List all pages in between record while maintaining ID key

I'm trying to come up with a useful way to list all pages between the first and last page of a document as new rows while maintaining the ID number as a key, or cross reference. I have a few ways of getting the pages in between, but I'm not exactly sure how to maintain the key in a programmatic way.
Example Input:
First Page Last Page ID
ABC_001 ABC_004 1
ABC_005 ABC_005 2
ABC_006 ABC_010 3
End Result:
All Pages ID
ABC_001 1
ABC_002 1
ABC_003 1
ABC_004 1
ABC_005 2
ABC_006 3
ABC_007 3
ABC_008 3
ABC_009 3
ABC_010 3
Any help is much appreciated. I'm using SQL mgmt studio.
One approach would be to set up a numbers table, that contains a list of numbers that you may possibly find in the column content:
CREATE TABLE numbers( idx INTEGER);
INSERT INTO numbers VALUES(1);
INSERT INTO numbers VALUES(2);
...
INSERT INTO numbers VALUES(10);
Now, assuming that all page values have 7 characters, with the last 3 being digits, we can JOIN the original table with the numbers table to generate the missing records:
SELECT
CONCAT(
SUBSTRING(t.First_Page, 1, 4),
REPLICATE('0', 3 - LEN(n.idx)),
n.idx
) AS [ALl Pages],
t.id
FROM
mytable t
INNER JOIN numbers n
ON n.idx >= CAST(SUBSTRING(t.First_Page, 5, 3) AS int)
AND n.idx <= CAST(SUBSTRING(t.Last_Page, 5, 3) AS int)
This demo on DB Fiddle with your sample data returns:
ALl Pages | id
:-------- | -:
ABC_001 | 1
ABC_002 | 1
ABC_003 | 1
ABC_004 | 1
ABC_005 | 2
ABC_006 | 3
ABC_007 | 3
ABC_008 | 3
ABC_009 | 3
ABC_010 | 3
To find all pages from First Page to Last Page per Book ID, CAST your page numbers from STRING to INTEGER, then add +1 to each page number until you reach the Last Page.
First, turn your original table into a table variable with the Integer data types using a TRY_CAST.
DECLARE @Book TABLE (
[ID] INT
,[FirstPage] INT
,[LastPage] INT
)
INSERT INTO @Book
SELECT [ID]
,TRY_CAST(RIGHT([FirstPage], 3) AS int) AS [FirstPage]
,TRY_CAST(RIGHT([LastPage], 3) AS int) AS [LastPage]
FROM [YourOriginalTable]
Set the maximum page that your pages will increment to using a variable. This will cap your results at the correct number of pages. Otherwise your table would have many more rows than you need.
DECLARE @LastPage INT
SELECT @LastPage = MAX([LastPage]) FROM @Book
Turning a three-column table (ID, First Page, Last Page) into a two-column table (ID, Page) will require an UNPIVOT.
We're tucking that UNPIVOT into a CTE (Common Table Expression): basically a smart version of a temporary table (like a #TempTable or @TableVariable), but one which you can only use once and which is a little more efficient in certain circumstances.
In addition to the UNPIVOT of your [FirstPage] and [LastPage] columns into a tall table, we're going to append every other combination of page number per ID using a UNION ALL.
;WITH BookCTE AS (
SELECT [ID]
,[Page]
FROM (SELECT [ID]
,[FirstPage]
,[LastPage]
FROM @Book) AS bp
UNPIVOT
(
[Page] FOR [Pages] IN ([FirstPage], [LastPage])
) AS up
UNION ALL
SELECT [ID], [Page] + 1 FROM BookCTE WHERE [Page] + 1 < @LastPage
)
Now that your data is held in a table format using a CTE with all combinations of [ID] and [Page] up to the maximum page in your @Book table, it's time to join your CTE with the @Book table.
SELECT DISTINCT
cte.ID
,cte.Page
FROM BookCTE AS cte
INNER JOIN @Book AS bk
ON bk.ID = cte.ID
WHERE cte.Page <= bk.[LastPage]
ORDER BY
cte.ID
,cte.Page
OPTION (MAXRECURSION 10000)
See also:
How to generate a range of numbers between two numbers (I based my code off of @Jayvee's answer)
Assigning variables using SET vs SELECT
SQL Server UNPIVOT
SQL Server CTE Basics
Recursive CTEs Explained
Note: will update with re-integrating string portion of FirstPage and LastPage (which I assume is based on book title). Stand by.
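For comparison, here is a compact sketch of the recursive idea that also keeps the string prefix. It is run from Python against an in-memory SQLite database rather than SQL Server, which is an assumption made to keep the example self-contained. The table name docs and its column names are hypothetical, and the sketch relies on the same layout assumption as the answers above: a 4-character prefix followed by a 3-digit page number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (first_page TEXT, last_page TEXT, id INTEGER);
INSERT INTO docs VALUES
  ('ABC_001', 'ABC_004', 1),
  ('ABC_005', 'ABC_005', 2),
  ('ABC_006', 'ABC_010', 3);
""")

rows = conn.execute("""
WITH RECURSIVE pages(id, prefix, n, last_n) AS (
    -- Anchor: the first page of every document, split into prefix and number.
    SELECT id,
           substr(first_page, 1, 4),
           CAST(substr(first_page, 5) AS INTEGER),
           CAST(substr(last_page, 5) AS INTEGER)
    FROM docs
    UNION ALL
    -- Step: add one page at a time until the document's own last page.
    SELECT id, prefix, n + 1, last_n
    FROM pages
    WHERE n < last_n
)
SELECT prefix || printf('%03d', n) AS page, id
FROM pages
ORDER BY id, n
""").fetchall()

for page, doc_id in rows:
    print(page, doc_id)
```

Carrying each document's own last page through the recursion means no global maximum and no DISTINCT are needed, because every row is generated exactly once.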

Get values from 2 tables based on ID

Consider below tables
Job Table
JobID AnswerID UserID
1 1,2 1
2 2,3 2
3 1,3 3
Answer Table
AnswerID Answer QuestionID
1 Clean 1
2 Install 1
3 Other 2
For this I need to get the result as below
JobID Answer UserID
1 Clean,Install 1
2 Install,Other 2
3 Clean,Other 3
Please help to write MSSQL query for this.
You are storing a list of ids as a comma separated list. This is a really bad idea for several reasons:
Storing numbers as strings is a bad idea.
You cannot define foreign key relationships.
SQL does not have great support for strings.
Any attempt to join to the original table will be inefficient, because of the type conversion.
Such a structure violates the idea that a column contains a single value.
There is a proper way to store lists in a relational database. It is called a "table". You want a junction table with one row per job and answer. I would call it JobAnswers.
With the proper data structure, your query would be trivial.
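To illustrate, here is a minimal sketch of the junction-table design, run from Python against an in-memory SQLite database. SQLite and its group_concat() aggregate are assumptions made to keep the example self-contained; on SQL Server 2017 or later the corresponding aggregate would be STRING_AGG. The JobAnswers table follows the naming suggested above, and the Job table layout is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Job (JobID INTEGER PRIMARY KEY, UserID INTEGER);
CREATE TABLE Answer (AnswerID INTEGER PRIMARY KEY, Answer TEXT, QuestionID INTEGER);
-- Junction table: one row per (job, answer) pair instead of a '1,2' string.
CREATE TABLE JobAnswers (
    JobID    INTEGER REFERENCES Job(JobID),
    AnswerID INTEGER REFERENCES Answer(AnswerID),
    PRIMARY KEY (JobID, AnswerID)
);
INSERT INTO Job VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO Answer VALUES (1, 'Clean', 1), (2, 'Install', 1), (3, 'Other', 2);
INSERT INTO JobAnswers VALUES (1, 1), (1, 2), (2, 2), (2, 3), (3, 1), (3, 3);
""")

rows = conn.execute("""
SELECT j.JobID, group_concat(a.Answer, ',') AS Answers, j.UserID
FROM Job j
JOIN JobAnswers ja ON ja.JobID = j.JobID
JOIN Answer a ON a.AnswerID = ja.AnswerID
GROUP BY j.JobID, j.UserID
ORDER BY j.JobID
""").fetchall()

for job_id, answers, user_id in rows:
    print(job_id, answers, user_id)
```

Note that SQLite does not promise the order of items inside group_concat() unless an ordering is specified, so the answers within one job may come back in either order.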
Although I agree with Gordon Linoff I do understand we have no control over what we inherit from previous developers.
Here is what you need to do:
Sample data
CREATE TABLE #temp
(
JobID INT, AnswerID VARCHAR(10), UserID INT
);
INSERT INTO #temp
VALUES
(1, '1,2', 1
),
(2, '2,3', 2
),
(3, '1,3', 3
);
CREATE TABLE #temp2
(
AnswerID INT, Answer VARCHAR(10), QuestionID INT
);
INSERT INTO #temp2
VALUES
(1, 'Clean', 1
),
(2, 'Install', 1
),
(3, 'Other', 2
);
Query:
SELECT #temp.JobID,
(
SELECT #temp2.Answer
FROM #temp2
WHERE #temp2.AnswerID = SUBSTRING(#temp.AnswerID, 1, CHARINDEX(',', #temp.AnswerID)-1)
)+','+
(
SELECT #temp2.Answer
FROM #temp2
WHERE #temp2.AnswerID = SUBSTRING(#temp.AnswerID, CHARINDEX(',', #temp.AnswerID)+1, LEN(#temp.AnswerID)-1)
) AS Answer,
#temp.UserID
FROM #temp;
You can try to use subqueries like this:
SELECT job.jobID,
(SELECT answer.answer FROM answer WHERE answer.answerID IN (job.answerID)) as answers,
job.userID
FROM job

Need to find average value across multi-level nested SQL query in Oracle

This one's a bit of a mess, and there's probably some far superior way of doing this but we just need the information for some reports we're working on.
So, we have a bunch of projects; each project has a bunch of tasks and each task has a document type ID associated with it. A project can belong to one or more workgroups.
We want to analyze projects that have at least one task of doc type x, and then see how many workgroups it has. I can do that with:
select distinct T.PROJECTID,
(select COUNT(*) from TPM_PROJECTWORKGROUPS where PROJECTID=T.PROJECTID) as NumWorkgroups
from TPM_TASK T
where T.DOCUMENTTYPEID=17
Now, we want to see the average number of workgroups across these projects. So I can do:
select AVG(NumWorkgroups) FROM (
select distinct T.PROJECTID,
(select COUNT(*) from TPM_PROJECTWORKGROUPS where PROJECTID=T.PROJECTID) as NumWorkgroups
from TPM_TASK T
where T.DOCUMENTTYPEID=17
)
However, we want to run this same query across all the document types (there's about 200 of them). I can't find a way to do this without copying and pasting the query 200 times. I've tried:
select DOCUMENTTYPEID,
(select AVG(NumWorkgroups) FROM (
select distinct T.PROJECTID,
(select COUNT(*) from TPM_PROJECTWORKGROUPS where PROJECTID=T.PROJECTID) as NumWorkgroups
from TPM_TASK T
where T.DOCUMENTTYPEID=DT.DOCUMENTTYPEID
))
from TPM_DOCUMENTTYPE DT
However, I get the error:
ORA-00904: "TPM_DOCUMENTTYPE"."DOCUMENTTYPEID": invalid identifier
I believe because DT is out of scope more than one level down in a nested query. Is there a better way to do this query?
Update for Justin:
Here's a sample schema:
create table Test_Projects (
id number primary key
)
create table Test_Tasks (
id number primary key,
project number,
doctype number
)
create table Test_Workgroups (
id number primary key,
workgroup number,
project number
)
With some sample data:
insert into Test_Projects VALUES (1) --Create projects 1 and 2
insert into Test_Projects VALUES (2)
insert into Test_Tasks VALUES (1, 1, 5) --Project 1 has two tasks, doc types 5 and 6
insert into Test_Tasks VALUES (2, 1, 6)
insert into Test_Tasks VALUES (3, 2, 6) --Project 2 has one task, doc type 6
insert into Test_Workgroups VALUES (1, 1, 1) --Project 1 belongs to workgroups 1 and 2
insert into Test_Workgroups VALUES (2, 2, 1)
insert into Test_Workgroups VALUES (3, 2, 2) --Project 2 belongs to workgroup 2
We need to know the average number of workgroups that a project with a task of type x belongs to.
For example, doc type 5 has only project 1 which has 2 workgroups, so the average is 2. Doc type 6 has 2 projects (1 and 2) - 1 has 2 workgroups and 2 has one workgroup - so the average is 1.5.
We need to list all doc types and the average number of workgroups in each.
I'd expect this query to return:
DOCTYPE AverageWorkgroups
------- -----------------
5 2
6 1.5
Thanks for the sample data. That makes it much clearer.
I believe this does what you want (I'm including the calculations for the number of projects and the number of workgroups in the output as well just because that made my testing easier)
select t.doctype,
count(distinct p.id) numProjects,
count(*) numWorkgroups,
count(*)/ count( distinct p.id) avgNumWorkgroups
from test_projects p,
test_tasks t,
test_workgroups w
where p.id = t.project
and p.id = w.project
group by t.doctype;
DOCTYPE NUMPROJECTS NUMWORKGROUPS AVGNUMWORKGROUPS
---------- ----------- ------------- ----------------
6 2 3 1.5
5 1 2 2
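The expected numbers can be checked against the sample data with a small script. The sketch below runs from Python against an in-memory SQLite database, which is an assumption made for self-containment since the thread targets Oracle. It also uses a slightly different formulation than the query above: the (doctype, project) pairs are de-duplicated first and then averaged over per-project workgroup counts, so a project with several tasks of the same doc type is counted once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test_Projects (id INTEGER PRIMARY KEY);
CREATE TABLE Test_Tasks (id INTEGER PRIMARY KEY, project INTEGER, doctype INTEGER);
CREATE TABLE Test_Workgroups (id INTEGER PRIMARY KEY, workgroup INTEGER, project INTEGER);
INSERT INTO Test_Projects VALUES (1), (2);
INSERT INTO Test_Tasks VALUES (1, 1, 5), (2, 1, 6), (3, 2, 6);
INSERT INTO Test_Workgroups VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
""")

rows = conn.execute("""
SELECT dp.doctype, AVG(wc.n) AS avg_workgroups
FROM (SELECT DISTINCT doctype, project FROM Test_Tasks) dp          -- one row per doc type and project
JOIN (SELECT project, COUNT(*) AS n
      FROM Test_Workgroups
      GROUP BY project) wc                                          -- workgroups per project
  ON wc.project = dp.project
GROUP BY dp.doctype
ORDER BY dp.doctype
""").fetchall()

for doctype, avg_workgroups in rows:
    print(doctype, avg_workgroups)
```

Because of the inner join, projects that belong to no workgroup at all are left out of the average in this sketch; a left join with COALESCE would be needed to count them as zero.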

SQL query assistance with bridge table

I'm working with an existing database and trying to write a SQL query to get out all the account information, including permission levels. This is for a security audit. We want to dump all of this information out in a readable fashion to make it easy to compare. My problem is that there is a bridge/link table for the permissions, so there are multiple records per user. I want to get back results with all the permissions for one user on one line. Here is an example:
Table_User:
UserId UserName
1 John
2 Joe
3 James
Table_UserPermissions:
UserId PermissionId Rights
1 10 1
1 11 2
1 12 3
2 11 2
2 12 3
3 10 2
PermissionId links to a table with the name of the permission and what it does. Rights is a level such as 1 = view, 2 = modify, etc.
What I get back from a basic query for User 1 is:
UserId UserName PermissionId Rights
1 John 10 1
1 John 11 2
1 John 12 3
What I would like is something like this:
UserId UserName Permission1 Rights1 Permission2 Right2 Permission3 Right3
1 John 10 1 11 2 12 3
Ideally I would like this for all users.
The closest thing I've found is the Pivot function in SQL Server 2005.
The problem with this from what I can tell is that I need to name each column for each user and I'm not sure how to get the rights level. With real data I have about 130 users and 40 different permissions.
Is there another way with just sql that I can do this?
You could do something like this:
select u.userid, u.username
, max(case when p.permissionid=10 then p.rights end) as permission10_rights
, max(case when p.permissionid=11 then p.rights end) as permission11_rights
, max(case when p.permissionid=12 then p.rights end) as permission12_rights
from Table_User u
left join Table_UserPermissions p on p.userid = u.userid
group by u.userid, u.username;
You have to explicitly add a similar max(...) column for each permissionid.
If you were using MySQL, I would suggest you use group_concat() like below.
select Table_User.UserId, UserName,
group_concat(PermissionId) as PermIdList,
group_concat(Rights SEPARATOR ',') as RightsList
from Table_User join Table_UserPermissions on
Table_User.UserId = Table_UserPermissions.UserId
GROUP BY Table_User.UserId, UserName
This would return
UserId UserName PermIdList RightsList
1 John 10,11,12 1,2,3
A quick google search for 'mssql group_concat' revealed a couple different stored procedures (I), (II) for MSSQL that can achieve the same behavior.
Short answer:
No.
You can't dynamically add columns to your query.
Remember, SQL is a set-based language. You query sets and join sets together.
What you're digging out is a recursive list, and you're requiring that the list be strung together horizontally rather than vertically.
You can, sorta, fake it, with a set of self joins, but in order to do that, you have to know all possible permissions before you write the query...which is what the other suggestions have proposed.
You can also pull the recordset back into a different language and then iterate through that to generate the proper columns.
Something like:
SELECT Table_User.userID, userName, permissionid, rights
FROM Table_User
LEFT JOIN Table_UserPermissions ON Table_User.userID =Table_UserPermissions.userID
ORDER BY userName
And then display all the permissions for each user using something like (Python):
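# recordset rows are (userID, userName, permissionid, rights), ordered by user
userID = recordset[0][0]
username = recordset[0][1]
user_permissions = []
for row in recordset:
    if userID != row[0]:
        # A new user starts: print the previous user's permissions and reset
        printUserPermissions(username, user_permissions)
        user_permissions = []
        username = row[1]
        userID = row[0]
    user_permissions.append((row[2], row[3]))
# Print the last user, who is never followed by a change of userID
printUserPermissions(username, user_permissions)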
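If the goal is the wide one-row-per-user layout from the question rather than a printout, the same loop can build the rows directly. The sketch below is one possible way to do it with itertools.groupby. The function name flatten_permissions and the sample recordset are hypothetical, and it assumes that all rows for one user are adjacent in the recordset.

```python
from itertools import groupby
from operator import itemgetter

def flatten_permissions(recordset):
    """Turn (user_id, user_name, permission_id, rights) rows into one wide row per user.

    Each result row is [user_id, user_name, perm1, rights1, perm2, rights2, ...],
    padded with None so that all rows have the same width.
    """
    wide_rows = []
    for (user_id, user_name), group in groupby(recordset, key=itemgetter(0, 1)):
        row = [user_id, user_name]
        for _, _, permission_id, rights in group:
            # A LEFT JOIN yields None for users without any permissions.
            if permission_id is not None:
                row.extend([permission_id, rights])
        wide_rows.append(row)
    width = max((len(r) for r in wide_rows), default=0)
    return [r + [None] * (width - len(r)) for r in wide_rows]

# Sample data from the question, ordered by user.
recordset = [
    (1, "John", 10, 1), (1, "John", 11, 2), (1, "John", 12, 3),
    (2, "Joe", 11, 2), (2, "Joe", 12, 3),
    (3, "James", 10, 2),
]
wide = flatten_permissions(recordset)
for row in wide:
    print(row)
```

The number of columns is decided at run time from the data, which is exactly the part that plain SQL cannot do without dynamic SQL.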
You could create a temporary table_flatuserpermissions of:
UserID
PermissionID1
Rights1
PermissionID2
Rights2
...etc to as many permission/right combinations as you need
Insert records to this table from Table_user with all permission & rights fields null.
Update records on this table from table_userpermissions - first record insert and set PermissionID1 & Rights1, Second record for a user update PermissionsID2 & Rights2, etc.
Then you query this table to generate your report.
Personally, I'd just stick with the UserId, UserName, PermissionID, Rights columns you have now.
Maybe substitute in some text for PermissionID and Rights instead of the numeric values.
Maybe sort the table by PermissionID, User instead of User, PermissionID so the auditor could check the users on each permission type.
If it's acceptable, a strategy I've used, both for designing and/or implementation, is to dump the query unpivoted into either Excel or Access. Both have much friendlier UIs for pivoting data, and a lot more people are comfortable in that environment.
Once you have a design you like, then it's easier to think about how to duplicate it in TSQL.
It seems like the pivot function was designed for situations where you can use an aggregate function on one of the fields. Like if I wanted to know how much revenue each sales person made for company x. I could sum up the price field from a sales table. I would then get the sales person and how much revenue in sales they have. For the permissions though it doesn't make sense to sum/count/etc up the permissionId field or the Rights field.
You may want to look at the following example on creating cross-tab queries in SQL:
http://www.databasejournal.com/features/mssql/article.php/3521101/Cross-Tab-reports-in-SQL-Server-2005.htm
It looks like there are new operations that were included as part of SQL Server 2005 called PIVOT and UNPIVOT
For this type of data transformation you will need to perform both an UNPIVOT and then a PIVOT of the data. If you know the values that you want to transform, then you can hard-code the query using a static pivot, otherwise you can use dynamic sql.
Create tables:
CREATE TABLE Table_User
([UserId] int, [UserName] varchar(5))
;
INSERT INTO Table_User
([UserId], [UserName])
VALUES
(1, 'John'),
(2, 'Joe'),
(3, 'James')
;
CREATE TABLE Table_UserPermissions
([UserId] int, [PermissionId] int, [Rights] int)
;
INSERT INTO Table_UserPermissions
([UserId], [PermissionId], [Rights])
VALUES
(1, 10, 1),
(1, 11, 2),
(1, 12, 3),
(2, 11, 2),
(2, 12, 3),
(3, 10, 2)
;
Static PIVOT:
select *
from
(
select userid,
username,
value,
col + '_'+ cast(rn as varchar(10)) col
from
(
select u.userid,
u.username,
p.permissionid,
p.rights,
row_number() over(partition by u.userid
order by p.permissionid, p.rights) rn
from table_user u
left join Table_UserPermissions p
on u.userid = p.userid
) src
unpivot
(
value
for col in (permissionid, rights)
) unpiv
) src
pivot
(
max(value)
for col in (permissionid_1, rights_1,
permissionid_2, rights_2,
permissionid_3, rights_3)
) piv
order by userid
See SQL Fiddle with Demo
Dynamic PIVOT:
If you have an unknown number of permissionids and rights, then you can use dynamic sql:
DECLARE
@query AS NVARCHAR(MAX),
@colsPivot as NVARCHAR(MAX)
select @colsPivot = STUFF((SELECT ','
+ quotename(c.name +'_'+ cast(t.rn as varchar(10)))
from
(
select row_number() over(partition by u.userid
order by p.permissionid, p.rights) rn
from table_user u
left join Table_UserPermissions p
on u.userid = p.userid
) t
cross apply sys.columns as C
where C.object_id = object_id('Table_UserPermissions') and
C.name not in ('UserId')
group by c.name, t.rn
order by t.rn
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query
= 'select *
from
(
select userid,
username,
value,
col + ''_''+ cast(rn as varchar(10)) col
from
(
select u.userid,
u.username,
p.permissionid,
p.rights,
row_number() over(partition by u.userid
order by p.permissionid, p.rights) rn
from table_user u
left join Table_UserPermissions p
on u.userid = p.userid
) src
unpivot
(
value
for col in (permissionid, rights)
) unpiv
) x1
pivot
(
max(value)
for col in ('+ @colsPivot +')
) p
order by userid'
exec(@query)
See SQL Fiddle with demo
The result for both is:
| USERID | USERNAME | PERMISSIONID_1 | RIGHTS_1 | PERMISSIONID_2 | RIGHTS_2 | PERMISSIONID_3 | RIGHTS_3 |
---------------------------------------------------------------------------------------------------------
| 1 | John | 10 | 1 | 11 | 2 | 12 | 3 |
| 2 | Joe | 11 | 2 | 12 | 3 | (null) | (null) |
| 3 | James | 10 | 2 | (null) | (null) | (null) | (null) |
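On engines without PIVOT and UNPIVOT, the same positional layout can be approximated with row_number() plus conditional aggregation. The sketch below is that swapped-in technique, not the UNPIVOT/PIVOT query above. It runs from Python against an in-memory SQLite database (an assumption for self-containment; window functions need SQLite 3.25 or later), and MAX_SLOTS is a hypothetical constant giving the largest number of permissions any user has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_User (UserId INTEGER, UserName TEXT);
CREATE TABLE Table_UserPermissions (UserId INTEGER, PermissionId INTEGER, Rights INTEGER);
INSERT INTO Table_User VALUES (1, 'John'), (2, 'Joe'), (3, 'James');
INSERT INTO Table_UserPermissions VALUES
  (1, 10, 1), (1, 11, 2), (1, 12, 3), (2, 11, 2), (2, 12, 3), (3, 10, 2);
""")

MAX_SLOTS = 3  # assumed upper bound on permissions per user

# One pair of output columns per slot: permissionid_N and rights_N.
slot_columns = ",\n".join(
    f"MAX(CASE WHEN rn = {i} THEN PermissionId END) AS permissionid_{i},\n"
    f"MAX(CASE WHEN rn = {i} THEN Rights END) AS rights_{i}"
    for i in range(1, MAX_SLOTS + 1)
)

sql = f"""
SELECT UserId, UserName,
{slot_columns}
FROM (
    -- Number each user's permissions 1, 2, 3, ... in a stable order.
    SELECT u.UserId, u.UserName, p.PermissionId, p.Rights,
           ROW_NUMBER() OVER (PARTITION BY u.UserId
                              ORDER BY p.PermissionId, p.Rights) AS rn
    FROM Table_User u
    LEFT JOIN Table_UserPermissions p ON p.UserId = u.UserId
) src
GROUP BY UserId, UserName
ORDER BY UserId
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Only integers generated by the script are interpolated into the SQL text here, so building the column list with an f-string does not open the query to injection.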