Rearrange Dataset - sql

I am working on some survey data and was wondering if I could rearrange it to make it more usable. The answers are coded 1-5, and I would like the preferred table to count the answers by value, grouped by question.
Original table:
year | month | customer_id | survey | q1 | q2 | q3 | q4 | q5 | q6 ----> q29
-----|-------|-------------|--------|----|----|----|----|----|---
2016 | Oct | ABC12345678 | 1 | 1 | 2 | 3 | 1 | 2 | 3
2016 | Oct | DEF12345678 | 1 | 2 | 1 | 4 | 2 | 1 | 1
2016 | Oct | GHI12345678 | 1 | 4 | 2 | 1 | 1 | 3 | 2
2016 | Oct | JKL12345678 | 1 | 2 | 3 | 2 | 4 | 1 | 3
2016 | Oct | MNO12345678 | 1 | 5 | 2 | 3 | 1 | 2 | 3
2016 | Oct | PQR12345678 | 1 | 3 | 4 | 4 | 2 | 4 | 4
2016 | Oct | STU12345678 | 1 | 1 | 5 | 3 | 1 | 2 | 5
2016 | Oct | VWX12345678 | 1 | 2 | 2 | 4 | 2 | 1 | 1
Preferred Table:
Year | Month | Survey | Question | 1 | 2 | 3 | 4 | 5 |
-----|-------|--------|----------|----|----|----|----|----|
2016 | Oct | 1 | q1 | 80 | 45 | 25 | 63 | 89 |
2016 | Oct | 1 | q2 | 65 | 75 | 35 | 53 | 69 |
I can do this with a basic SELECT query, but doing it for every question would take 29 unions, and there must be a quicker way.
Regards,
Neil

This is what I would use until someone posts a better solution:
use tempdb;
create table #tempsurvey (year int, month varchar(32), customer_id varchar(32), survey int, [q1] int, [q2] int, [q3] int, [q4] int, [q5] int, [q6] int, [q7] int, [q8] int, [q9] int, [q10] int, [q11] int, [q12] int, [q13] int, [q14] int, [q15] int, [q16] int, [q17] int, [q18] int, [q19] int, [q20] int, [q21] int, [q22] int, [q23] int, [q24] int, [q25] int, [q26] int, [q27] int, [q28] int, [q29] int);
insert into #tempsurvey values (2016,'Oct', 'ABC12345678', 1, 1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2);
insert into #tempsurvey values (2016,'Oct', 'DEF12345678', 1, 4,5,1,4,5,1,4,5,1,4,5,1,4,5,1,4,5,1,4,5,1,4,5,1,4,5,1,4,5);
with cte as (
select t.[year], t.[month], t.customer_id, t.survey, x.question, x.answer
from #tempsurvey t
cross apply (values ('q1',q1) ,('q2',q2) ,('q3',q3) ,('q4',q4) ,('q5',q5) ,('q6',q6) ,('q7',q7) ,('q8',q8) ,('q9',q9) ,('q10',q10) ,('q11',q11) ,('q12',q12) ,('q13',q13) ,('q14',q14) ,('q15',q15) ,('q16',q16) ,('q17',q17) ,('q18',q18) ,('q19',q19) ,('q20',q20) ,('q21',q21) ,('q22',q22) ,('q23',q23) ,('q24',q24) ,('q25',q25) ,('q26',q26) ,('q27',q27) ,('q28',q28) ,('q29',q29))
as x (Question,Answer)
)
select [year], [month], [survey], question,
[1]=sum(case when answer=1 then 1 else 0 end),
[2]=sum(case when answer=2 then 1 else 0 end),
[3]=sum(case when answer=3 then 1 else 0 end),
[4]=sum(case when answer=4 then 1 else 0 end),
[5]=sum(case when answer=5 then 1 else 0 end)
from cte
group by [year], [month], [survey], question;
drop table #tempsurvey;
Brad Schulz on cross apply: http://bradsruminations.blogspot.com/search/label/CROSS%20APPLY

Sean is correct.
It will go like this:
with subquery as (
select year, month, survey, question, tempVal from #table
unpivot
(tempVal for question in (q1, q2, q3, q4, q5, q6, q7, ..., q29)) as up
)
select year, month, survey, question,
sum(case when tempVal = 1 then 1 else 0 end) as a1,
sum(case when tempVal = 2 then 1 else 0 end) as a2,
sum(case when tempVal = 3 then 1 else 0 end) as a3,
sum(case when tempVal = 4 then 1 else 0 end) as a4,
sum(case when tempVal = 5 then 1 else 0 end) as a5
from subquery
group by year, month, survey, question
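Both answers above do the same two things: turn every qN column into its own (question, answer) row, then count the answers per value within each group. A minimal Python sketch of that logic follows. It can help to sanity-check a query's output on a small sample. The rows are the first three from the question, truncated to q1..q3, and the function name `count_answers` is made up for illustration.

```python
from collections import Counter, defaultdict

# First three sample rows from the question, truncated to q1..q3 for brevity.
rows = [
    {"year": 2016, "month": "Oct", "survey": 1, "q1": 1, "q2": 2, "q3": 3},
    {"year": 2016, "month": "Oct", "survey": 1, "q1": 2, "q2": 1, "q3": 4},
    {"year": 2016, "month": "Oct", "survey": 1, "q1": 4, "q2": 2, "q3": 1},
]
QUESTIONS = ["q1", "q2", "q3"]


def count_answers(rows, questions):
    """Unpivot each row into (question, answer) pairs, then count per value."""
    counts = defaultdict(Counter)
    for row in rows:
        for q in questions:  # unpivot step: one pair per question column
            key = (row["year"], row["month"], row["survey"], q)
            counts[key][row[q]] += 1
    # Conditional-aggregation step: one column per answer value 1..5.
    return {key: [c[v] for v in range(1, 6)] for key, c in counts.items()}


result = count_answers(rows, QUESTIONS)
for key in sorted(result):
    print(key, result[key])
```

Each printed list holds the counts for answers 1 through 5, matching the `[1]`..`[5]` (or `a1`..`a5`) columns of the queries.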

Related

Generate multiple record from existing records based on interval columns [from and to]

I have two score types, M and B, in column 3 (S_TYPE). If the type is M, the score in column 6 (SCORE) is either S (scored) or SB (bonus scored). Every interval [FROM_HRS - TO_HRS] of a type B row must have a corresponding SB row of type M, so an interval covered by a type B row cannot have a score of S for type M. Several records were unfortunately captured incorrectly, as seen in the table below.
CREATE TABLE SCORE_TBL
(
ID int IDENTITY(1,1) PRIMARY KEY,
PERSONID_FK int NOT NULL,
S_TYPE varchar(50) NULL,
FROM_HRS int NULL,
TO_HRS int NULL,
SCORE varchar(50) NULL,
);
INSERT INTO SCORE_TBL(PERSONID_FK,S_TYPE,FROM_HRS,TO_HRS,SCORE)
VALUES
(1, 'M' , 0,20, 'S'),
(1, 'B',6, 8, 'B'),
(2, 'B',0, 2, 'B'),
(2, 'M',0,20, 'S'),
(2, 'B', 10,13, 'B'),
(2, 'B', 18,20, 'B'),
(2, 'M', 13,18, 'S');
| ID | PERSONID_FK |S_TYPE| FROM_HRS | TO_HRS | SCORE |
|----|-------------|------|----------|--------|-------|
| 1 | 1 | M | 0 | 20 | S |
| 2 | 1 | B | 6 | 8 | B |
| 3 | 2 | B | 0 | 2 | B |
| 4 | 2 | M | 0 | 20 | S |
| 5 | 2 | B | 10 | 13 | B |
| 6 | 2 | B | 18 | 20 | B |
| 7 | 2 | M | 13 | 18 | S |
I want the data to look like this
| ID | PERSONID_FK |S_TYPE| FROM_HRS | TO_HRS | SCORE |
|----|-------------|------|----------|--------|-------|
| 1 | 1 | M | 0 | 6 | S |
| 2 | 1 | M | 6 | 8 | SB |
| 3 | 1 | B | 6 | 8 | B |
| 4 | 1 | M | 8 | 20 | S |
| 5 | 2 | B | 0 | 2 | B |
| 6 | 2 | M | 0 | 2 | SB |
| 7 | 2 | M | 2 | 10 | S |
| 8 | 2 | B | 10 | 13 | B |
| 9 | 2 | M | 10 | 13 | SB |
| 10 | 2 | M | 13 | 18 | S |
| 11 | 2 | B | 18 | 20 | B |
| 12 | 2 | M | 18 | 20 | SB |
Any ideas on how to generate this data with a SQL Server SELECT statement?
The tricky part here is that an interval might need to be split into several pieces, like 0..20 for person 2.
Window functions to the rescue. This query illustrates what you need to do:
WITH
deltas AS (
SELECT personid_fk, hrs, sum(delta_s) as delta_s, sum(delta_b) as delta_b
FROM (SELECT personid_fk, from_hrs as hrs,
case when score = 'S' then 1 else 0 end as delta_s,
case when score = 'B' then 1 else 0 end as delta_b
FROM score_tbl
UNION ALL
SELECT personid_fk, to_hrs as hrs,
case when score = 'S' then -1 else 0 end as delta_s,
case when score = 'B' then -1 else 0 end as delta_b
FROM score_tbl) _
GROUP BY personid_fk, hrs
),
running AS (
SELECT personid_fk, hrs as from_hrs,
lead(hrs) over (partition by personid_fk order by hrs) as to_hrs,
sum(delta_s) over (partition by personid_fk order by hrs) running_s,
sum(delta_b) over (partition by personid_fk order by hrs) running_b
FROM deltas
)
SELECT personid_fk, 'M' as s_type, from_hrs, to_hrs,
case when running_b > 0 then 'SB' else 'S' end as score
FROM running
WHERE running_s > 0
UNION ALL
SELECT personid_fk, s_type, from_hrs, to_hrs, score
FROM score_tbl
WHERE s_type = 'B'
ORDER BY personid_fk, from_hrs;
Step by step:
deltas is the union of two passes over score_tbl, one for the start and one for the end of each score/bonus interval, creating a timeline of +1/-1 events summed per person and hour
running computes the running totals of those deltas over time and pairs each event with the next one (lead), yielding the split intervals in which score/bonus are active
the final query converts the running totals into score codes (S or SB) and unions in the bonus intervals, which are passed through unchanged
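The three steps can be traced outside the database. The following Python sketch mirrors the query on the sample rows from the question. It is an illustration of the same sweep, not a replacement for the SQL, and the function name `split_intervals` is made up here.

```python
from collections import defaultdict

# (personid_fk, s_type, from_hrs, to_hrs, score) rows from the question.
score_tbl = [
    (1, "M", 0, 20, "S"),
    (1, "B", 6, 8, "B"),
    (2, "B", 0, 2, "B"),
    (2, "M", 0, 20, "S"),
    (2, "B", 10, 13, "B"),
    (2, "B", 18, 20, "B"),
    (2, "M", 13, 18, "S"),
]


def split_intervals(rows):
    # deltas: +1 at each interval start, -1 at each end, per (person, hour).
    deltas = defaultdict(lambda: [0, 0])  # (person, hrs) -> [delta_s, delta_b]
    for person, _s_type, start, end, score in rows:
        if score not in ("S", "B"):
            continue
        idx = 0 if score == "S" else 1
        deltas[(person, start)][idx] += 1
        deltas[(person, end)][idx] -= 1

    # running: running totals per person, each event paired with the next one.
    events = sorted(deltas)  # ordered by person, then hour
    running = defaultdict(lambda: [0, 0])  # person -> [running_s, running_b]
    out = []
    for i, (person, hrs) in enumerate(events):
        d_s, d_b = deltas[(person, hrs)]
        totals = running[person]
        totals[0] += d_s
        totals[1] += d_b
        nxt = events[i + 1] if i + 1 < len(events) else None
        # Keep the slice only while a score interval is active (running_s > 0).
        if nxt is not None and nxt[0] == person and totals[0] > 0:
            code = "SB" if totals[1] > 0 else "S"
            out.append((person, "M", hrs, nxt[1], code))

    # Final step: bonus rows pass through unchanged.
    out += [(p, t, f, e, s) for p, t, f, e, s in rows if t == "B"]
    return sorted(out, key=lambda r: (r[0], r[2]))


result = split_intervals(score_tbl)
for row in result:
    print(row)
```

On the sample data this yields 12 rows, with person 2's 0..20 interval split at hours 2, 10, 13 and 18.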

Pivot table in SQL but keep measure names in column

I'm having trouble pivoting a table correctly.
My input is this raw data table:
+------+---------+------------+----------+
| YEAR | FACULTY | ADMISSIONS | DROPOUTS |
+------+---------+------------+----------+
| 2018 | LAW | 15 | 2 |
| 2019 | LAW | 18 | 4 |
| 2020 | LAW | 11 | 1 |
| 2018 | MATH | 19 | 1 |
| 2019 | MATH | 17 | 6 |
| 2020 | MATH | 24 | 5 |
+------+---------+------------+----------+
I want to pivot the years into columns, but I also want to keep the measure names (admissions and dropouts) as row labels. E.g. I want a table like this:
+---------+------------+------+------+------+
| FACULTY | MEASURE | 2018 | 2019 | 2020 |
+---------+------------+------+------+------+
| LAW | ADMISSIONS | 15 | 18 | 11 |
| LAW | DROPOUTS | 2 | 4 | 1 |
| MATH | ADMISSIONS | 19 | 17 | 24 |
| MATH | DROPOUTS | 1 | 6 | 5 |
+---------+------------+------+------+------+
I can pivot years using:
SELECT *
FROM
(
SELECT FACULTY, [YEAR], ADMISSIONS
FROM mytable
) AS src
PIVOT (SUM(ADMISSIONS)
FOR [YEAR] IN ([2018], [2019], [2020])
) AS p;
But I need to pivot both measures and still get the measure names column. Any ideas?
That's unpivoting, then pivoting. If your database supports lateral joins and values(), you can do:
select
t.faculty,
x.measure,
sum(case when t.year = 2018 then x.value end) value_2018,
sum(case when t.year = 2019 then x.value end) value_2019,
sum(case when t.year = 2020 then x.value end) value_2020
from mytable t
cross apply (values ('ADMISSIONS', t.admissions), ('DROPOUTS', t.dropouts)) as x(measure, value)
group by t.faculty, x.measure
I would unpivot using apply (assuming you are using SQL Server) and reaggregate:
select t.faculty, v.measure,
max(case when year = 2018 then val end) as [2018],
max(case when year = 2019 then val end) as [2019],
max(case when year = 2020 then val end) as [2020]
from t cross apply
(values ('ADMISSIONS', ADMISSIONS), ('DROPOUTS', DROPOUTS)
) v(measure, val)
group by t.faculty, v.measure
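To see why this is "unpivot, then pivot", here is a small Python sketch of the two stages on the question's data. The function name `unpivot_then_pivot` is hypothetical. The sketch assigns rather than sums each cell, which assumes one source row per faculty and year, as in the sample.

```python
from collections import defaultdict

# (year, faculty, admissions, dropouts) rows from the question.
data = [
    (2018, "LAW", 15, 2), (2019, "LAW", 18, 4), (2020, "LAW", 11, 1),
    (2018, "MATH", 19, 1), (2019, "MATH", 17, 6), (2020, "MATH", 24, 5),
]
YEARS = [2018, 2019, 2020]


def unpivot_then_pivot(rows):
    # Unpivot: one (faculty, measure, year, value) tuple per measure column.
    long_rows = []
    for year, faculty, admissions, dropouts in rows:
        long_rows.append((faculty, "ADMISSIONS", year, admissions))
        long_rows.append((faculty, "DROPOUTS", year, dropouts))
    # Pivot: group by (faculty, measure) and spread the years into columns.
    table = defaultdict(dict)
    for faculty, measure, year, value in long_rows:
        table[(faculty, measure)][year] = value
    return {key: [cols.get(y) for y in YEARS] for key, cols in sorted(table.items())}


result = unpivot_then_pivot(data)
for (faculty, measure), values in result.items():
    print(faculty, measure, values)
```

The intermediate `long_rows` list plays the role of the rows produced by `cross apply (values ...)`, and the final dictionary corresponds to the `group by` with one `case` column per year.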

How to count value by type and convert column to row

I have a table "tbTest1" like this:
q1 | q2 | q3 | type
---+----+----+-----------
3 | 2 | 2 | Student
2 | 2 | 3 | Student
3 | 1 | 1 | Alumni
1 | 1 | 3 | Student
1 | 3 | 2 | Alumni
Now I want to convert "tbTest1" into a table like this, showing how many 1's, 2's or 3's were given by Student for 'q1', 'q2' and 'q3':
q | 1 | 2 | 3
---+---+---+---
q1 | 1 | 1 | 1
q2 | 1 | 2 | 0
q3 | 0 | 1 | 2
You can use conditional aggregation:
select v.q,
sum(case when val = 1 then 1 else 0 end) as val_1,
sum(case when val = 2 then 1 else 0 end) as val_2,
sum(case when val = 3 then 1 else 0 end) as val_3
from tbTest1 t cross apply
(values ('q1', t.q1), ('q2', t.q2), ('q3', t.q3)) v(q, val)
where t.type = 'Student'
group by v.q;

Need a select query to get the output as shown below?

I have a SQL table as shown below:
| Loc | Date | Id | Sts |
-------------------------
| Hyd | 15-01-2016 | 1 | A |
| Vjd | 16-01-2016 | 2 | B |
| Viz | 15-01-2016 | 3 | C |
| Hyd | 15-03-2016 | 4 | A |
| Vjd | 15-03-2016 | 5 | B |
| Viz | 15-03-2016 | 6 | C |
| Hyd | 15-03-2016 | 4 | A |
| Vjd | 15-05-2016 | 5 | B |
| Viz | 15-05-2016 | 6 | C |
And I need output like this:
| Loc | Jan-16    | Mar-16    | May-16    |
|     | A | B | C | A | B | C | A | B | C |
|-----|---|---|---|---|---|---|---|---|---|
| Hyd | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| Vjd | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| Viz | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
Can anyone help me out, please?
Thanks in advance.
You will basically need to aggregate based on CASE expressions, like this:
CREATE TABLE #table (loc VARCHAR(3), [date] DATE, id INT, sts CHAR(1));
INSERT INTO #table SELECT 'Hyd', '20160115', 1, 'A';
INSERT INTO #table SELECT 'Vjd', '20160116', 2, 'B';
INSERT INTO #table SELECT 'Viz', '20160115', 3, 'C';
INSERT INTO #table SELECT 'Hyd', '20160315', 4, 'A';
INSERT INTO #table SELECT 'Vjd', '20160315', 5, 'B';
INSERT INTO #table SELECT 'Viz', '20160315', 6, 'C';
INSERT INTO #table SELECT 'Hyd', '20160315', 4, 'A';
INSERT INTO #table SELECT 'Vjd', '20160515', 5, 'B';
INSERT INTO #table SELECT 'Viz', '20160515', 6, 'C';
SELECT
loc,
COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 1 AND sts = 'A' THEN 1 END) AS Jan_A,
COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 1 AND sts = 'B' THEN 1 END) AS Jan_B,
COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 1 AND sts = 'C' THEN 1 END) AS Jan_C,
COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 3 AND sts = 'A' THEN 1 END) AS Mar_A,
COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 3 AND sts = 'B' THEN 1 END) AS Mar_B,
COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 3 AND sts = 'C' THEN 1 END) AS Mar_C,
COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 5 AND sts = 'A' THEN 1 END) AS May_A,
COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 5 AND sts = 'B' THEN 1 END) AS May_B,
COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 5 AND sts = 'C' THEN 1 END) AS May_C
FROM
#table
GROUP BY
loc;
Results:
loc Jan_A Jan_B Jan_C Mar_A Mar_B Mar_C May_A May_B May_C
Hyd 1 0 0 2 0 0 0 0 0
Viz 0 0 1 0 0 1 0 0 1
Vjd 0 1 0 0 1 0 0 1 0
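The nine COUNT(CASE ...) columns are one count per (month, status) pair within each location. As a cross-check of the results above, here is a short Python sketch of the same crosstab on the sample rows. The names `MONTHS`, `STATUSES` and `crosstab` are made up for this example, and the id column is left out because the counts do not use it.

```python
from collections import Counter
from datetime import date

# (loc, date, sts) rows from the question; id is omitted as it is not counted.
rows = [
    ("Hyd", date(2016, 1, 15), "A"), ("Vjd", date(2016, 1, 16), "B"),
    ("Viz", date(2016, 1, 15), "C"), ("Hyd", date(2016, 3, 15), "A"),
    ("Vjd", date(2016, 3, 15), "B"), ("Viz", date(2016, 3, 15), "C"),
    ("Hyd", date(2016, 3, 15), "A"), ("Vjd", date(2016, 5, 15), "B"),
    ("Viz", date(2016, 5, 15), "C"),
]
MONTHS = [(2016, 1), (2016, 3), (2016, 5)]  # Jan, Mar, May 2016
STATUSES = ["A", "B", "C"]


def crosstab(rows):
    # One counter bucket per (loc, year, month, status) combination.
    counts = Counter((loc, d.year, d.month, sts) for loc, d, sts in rows)
    locs = sorted({loc for loc, _d, _sts in rows})
    # Column order matches the query: Jan_A, Jan_B, Jan_C, Mar_A, ..., May_C.
    return {
        loc: [counts[(loc, y, m, s)] for (y, m) in MONTHS for s in STATUSES]
        for loc in locs
    }


result = crosstab(rows)
for loc, values in result.items():
    print(loc, values)
```

Because the column list is built from `MONTHS` and `STATUSES`, adding a month or a status only means extending those lists. In SQL the equivalent would be generating the CASE columns dynamically.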

Return all records if more than 2/3 satisfy a value

I have a table representing multiple transactions by customers on any given day. I need to return all transactions for a customer if two thirds or more of that customer's transactions were cash rather than credit card.
In the example below I want to return all transactions of customers 1 and 4, as they are the only customers with two thirds or more of their transactions in cash:
+----------------+-------------+-----------------+------------------+
| Transaction ID | CustomerNum | TransactionType | TransactionValue |
+----------------+-------------+-----------------+------------------+
| 1 | 1 | Cash | 11 |
| 2 | 1 | Card | 12 |
| 3 | 1 | Cash | 13 |
| 4 | 2 | Cash | 14 |
| 5 | 2 | Card | 15 |
| 6 | 3 | Cash | 15 |
| 7 | 3 | Card | 11 |
| 8 | 3 | Cash | 12 |
| 9 | 3 | Card | 13 |
| 10 | 4 | Cash | 14 |
| 11 | 4 | Cash | 15 |
| 12 | 4 | Cash | 15 |
+----------------+-------------+-----------------+------------------+
This seems to work with the sample data:
create table #t (TranID int not null,CustomerNum int not null,
TranType varchar(17) not null,TranValue decimal(18,0) not null);
insert into #t(TranID,CustomerNum,TranType,TranValue) values
( 1,1,'Cash',11), ( 2,1,'Card',12), ( 3,1,'Cash',13),
( 4,2,'Cash',14), ( 5,2,'Card',15),
( 6,3,'Cash',15), ( 7,3,'Card',11), ( 8,3,'Cash',12), ( 9,3,'Card',13),
(10,4,'Cash',14), (11,4,'Cash',15), (12,4,'Cash',15)
;With Counted as (
select *,
COUNT(*) OVER (PARTITION BY CustomerNum) as cnt,
SUM(CASE WHEN TranType='Cash' THEN 1 ELSE 0 END)
OVER (PARTITION BY CustomerNum) as cashcnt
from #t
)
select * from Counted
where cashcnt * 3 >= cnt * 2
I've gone with simple multiplication at the end to keep all of the maths as integers and avoid having to think about float/decimal and the representation of 2/3.
Result:
TranID CustomerNum TranType TranValue cnt cashcnt
----------- ----------- ----------------- ----------- ----------- -----------
1 1 Cash 11 3 2
2 1 Card 12 3 2
3 1 Cash 13 3 2
10 4 Cash 14 3 3
11 4 Cash 15 3 3
12 4 Cash 15 3 3
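The integer comparison `cashcnt * 3 >= cnt * 2` is the cross-multiplied form of `cashcnt / cnt >= 2/3`, so no fractional arithmetic is involved. A minimal Python sketch of the same filter on the sample data, with the hypothetical helper name `mostly_cash`:

```python
from collections import Counter

# (TranID, CustomerNum, TranType, TranValue) rows from the question.
transactions = [
    (1, 1, "Cash", 11), (2, 1, "Card", 12), (3, 1, "Cash", 13),
    (4, 2, "Cash", 14), (5, 2, "Card", 15),
    (6, 3, "Cash", 15), (7, 3, "Card", 11), (8, 3, "Cash", 12), (9, 3, "Card", 13),
    (10, 4, "Cash", 14), (11, 4, "Cash", 15), (12, 4, "Cash", 15),
]


def mostly_cash(rows):
    # Per-customer totals, like the two windowed aggregates in the query.
    cnt = Counter(cust for _tid, cust, _ttype, _val in rows)
    cashcnt = Counter(cust for _tid, cust, ttype, _val in rows if ttype == "Cash")
    # cashcnt / cnt >= 2/3, cross-multiplied to stay in integers.
    keep = {cust for cust in cnt if cashcnt[cust] * 3 >= cnt[cust] * 2}
    return [row for row in rows if row[1] in keep]


kept = mostly_cash(transactions)
print([row[0] for row in kept])
```

The boundary case is customer 1, with exactly 2 of 3 transactions in cash: 2 * 3 >= 3 * 2 holds exactly, whereas a rounded-up decimal threshold such as 0.6667 would exclude that customer.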
Try this:
select t.*
from (select customernum
from transactions
group by customernum
having sum(case when TransactionType = 'Cash' then 1.0 else 0.0 end) / sum(1.0) > 0.6666) c
join transactions t on t.customernum = c.customernum