Rearrange Dataset - sql
I am working on some survey data and was wondering if I could rearrange it to make it more usable. Each answer is a value from 1 to 5, and I would like the preferred table to count the answers by value, grouped by question.
Original table:
year | month | customer_id | survey | q1 | q2 | q3 | q4 | q5 | q6 ----> q29
-----|-------|-------------|--------|----|----|----|----|----|---
2016 | Oct | ABC12345678 | 1 | 1 | 2 | 3 | 1 | 2 | 3
2016 | Oct | DEF12345678 | 1 | 2 | 1 | 4 | 2 | 1 | 1
2016 | Oct | GHI12345678 | 1 | 4 | 2 | 1 | 1 | 3 | 2
2016 | Oct | JKL12345678 | 1 | 2 | 3 | 2 | 4 | 1 | 3
2016 | Oct | MNO12345678 | 1 | 5 | 2 | 3 | 1 | 2 | 3
2016 | Oct | PQR12345678 | 1 | 3 | 4 | 4 | 2 | 4 | 4
2016 | Oct | STU12345678 | 1 | 1 | 5 | 3 | 1 | 2 | 5
2016 | Oct | VWX12345678 | 1 | 2 | 2 | 4 | 2 | 1 | 1
Preferred Table:
Year | Month | Survey | Question | 1 | 2 | 3 | 4 | 5 |
-----|-------|--------|----------|----|----|----|----|----|
2016 | Oct | 1 | q1 | 80 | 45 | 25 | 63 | 89 |
2016 | Oct | 1 | q2 | 65 | 75 | 35 | 53 | 69 |
I can do this with a basic SELECT query, but doing it for every question would take 29 unions, and there must be a quicker way.
Regards,
Neil
This is what I would use until someone posts a better solution:
use tempdb;
create table #tempsurvey (year int, month varchar(32), customer_id varchar(32), survey int, [q1] int, [q2] int, [q3] int, [q4] int, [q5] int, [q6] int, [q7] int, [q8] int, [q9] int, [q10] int, [q11] int, [q12] int, [q13] int, [q14] int, [q15] int, [q16] int, [q17] int, [q18] int, [q19] int, [q20] int, [q21] int, [q22] int, [q23] int, [q24] int, [q25] int, [q26] int, [q27] int, [q28] int, [q29] int);
insert into #tempsurvey values (2016,'Oct', 'ABC12345678', 1, 1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2);
insert into #tempsurvey values (2016,'Oct', 'DEF12345678', 1, 4,5,1,4,5,1,4,5,1,4,5,1,4,5,1,4,5,1,4,5,1,4,5,1,4,5,1,4,5);
with cte as (
select t.[year], t.[month], t.customer_id, t.survey, x.question, x.answer
from #tempsurvey t
cross apply (values ('q1',q1) ,('q2',q2) ,('q3',q3) ,('q4',q4) ,('q5',q5) ,('q6',q6) ,('q7',q7) ,('q8',q8) ,('q9',q9) ,('q10',q10) ,('q11',q11) ,('q12',q12) ,('q13',q13) ,('q14',q14) ,('q15',q15) ,('q16',q16) ,('q17',q17) ,('q18',q18) ,('q19',q19) ,('q20',q20) ,('q21',q21) ,('q22',q22) ,('q23',q23) ,('q24',q24) ,('q25',q25) ,('q26',q26) ,('q27',q27) ,('q28',q28) ,('q29',q29))
as x (Question,Answer)
)
select [year], [month], [survey], question, [1]=sum(case when answer=1 then 1 else 0 end), [2]=sum(case when answer=2 then 1 else 0 end), [3]=sum(case when answer=3 then 1 else 0 end), [4]=sum(case when answer=4 then 1 else 0 end), [5]=sum(case when answer=5 then 1 else 0 end)
from cte
group by [year], [month], [survey], question;
drop table #tempsurvey;
Brad Schulz on cross apply: http://bradsruminations.blogspot.com/search/label/CROSS%20APPLY
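To sanity-check what the cross apply query should return, here is a rough Python sketch of the same two steps (unpivot each row into question/answer pairs, then count per answer value), run over the eight sample rows from the question. It only covers q1 to q6 because those are the only columns shown; the names `SAMPLE`, `QUESTIONS` and `count_answers` are made up for this illustration and are not part of the answer.

```python
from collections import Counter

QUESTIONS = ("q1", "q2", "q3", "q4", "q5", "q6")

# (year, month, survey, answers to q1..q6) for the eight sample rows in the question.
SAMPLE = [
    (2016, "Oct", 1, (1, 2, 3, 1, 2, 3)),
    (2016, "Oct", 1, (2, 1, 4, 2, 1, 1)),
    (2016, "Oct", 1, (4, 2, 1, 1, 3, 2)),
    (2016, "Oct", 1, (2, 3, 2, 4, 1, 3)),
    (2016, "Oct", 1, (5, 2, 3, 1, 2, 3)),
    (2016, "Oct", 1, (3, 4, 4, 2, 4, 4)),
    (2016, "Oct", 1, (1, 5, 3, 1, 2, 5)),
    (2016, "Oct", 1, (2, 2, 4, 2, 1, 1)),
]


def count_answers(rows, questions=QUESTIONS, scale=(1, 2, 3, 4, 5)):
    """Return {(year, month, survey, question): [count of 1s, ..., count of 5s]}."""
    tallies = {}
    for year, month, survey, answers in rows:
        # "Unpivot": one (question, answer) pair per column,
        # which is the role of CROSS APPLY (VALUES ...) in the query.
        for question, answer in zip(questions, answers):
            key = (year, month, survey, question)
            tallies.setdefault(key, Counter())[answer] += 1
    # "Pivot": one list entry per answer value; a value nobody chose counts as 0.
    return {key: [tally[v] for v in scale] for key, tally in tallies.items()}


counts = count_answers(SAMPLE)
for key in sorted(counts):
    print(key, counts[key])
```

For q1 the sample answers are 1, 2, 4, 2, 5, 3, 1, 2, so the q1 row should come out as 2, 3, 1, 1, 1, which is what the SQL should report for the same eight rows.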
Sean is correct.
It will go like this:
with subquery as (
select year, month, survey, question, tempVal from #table
unpivot
(tempVal for question in (q1, q2, q3, q4, q5, q6, q7, ..., q29)) as up
)
select year, month, survey, question,
sum(case when tempVal = 1 then 1 else 0 end) as a1,
sum(case when tempVal = 2 then 1 else 0 end) as a2,
sum(case when tempVal = 3 then 1 else 0 end) as a3,
sum(case when tempVal = 4 then 1 else 0 end) as a4,
sum(case when tempVal = 5 then 1 else 0 end) as a5
from subquery
group by year, month, survey, question
Related
Generate multiple record from existing records based on interval columns [from and to]
I have 2 types of score [M, B] in column 3. If a type is M, then the score is either an S [scored] or SB [bonus scored] in column 6. Every interval [from_hrs - to_hrs] for a type B must have a corresponding SB for type M; thus, an interval for a type B cannot have a score of S for a type M. I have several records that were unfortunately captured as seen in the table below.

CREATE TABLE SCORE_TBL (
    ID int IDENTITY(1,1) PRIMARY KEY,
    PERSONID_FK int NOT NULL,
    S_TYPE varchar(50) NULL,
    FROM_HRS int NULL,
    TO_HRS int NULL,
    SCORE varchar(50) NULL
);

INSERT INTO SCORE_TBL(PERSONID_FK,S_TYPE,FROM_HRS,TO_HRS,SCORE) VALUES
    (1, 'M', 0, 20, 'S'),
    (1, 'B', 6, 8, 'B'),
    (2, 'B', 0, 2, 'B'),
    (2, 'M', 0, 20, 'S'),
    (2, 'B', 10, 13, 'B'),
    (2, 'B', 18, 20, 'B'),
    (2, 'M', 13, 18, 'S');

| ID | PERSONID_FK | S_TYPE | FROM_HRS | TO_HRS | SCORE |
|----|-------------|--------|----------|--------|-------|
| 1  | 1           | M      | 0        | 20     | S     |
| 2  | 1           | B      | 6        | 8      | B     |
| 3  | 2           | B      | 0        | 2      | B     |
| 4  | 2           | M      | 0        | 20     | S     |
| 5  | 2           | B      | 10       | 13     | B     |
| 6  | 2           | B      | 18       | 20     | B     |
| 7  | 2           | M      | 13       | 18     | S     |

I want the data to look like this:

| ID | PERSONID_FK | S_TYPE | FROM_HRS | TO_HRS | SCORE |
|----|-------------|--------|----------|--------|-------|
| 1  | 1           | M      | 0        | 6      | S     |
| 2  | 1           | M      | 6        | 8      | SB    |
| 3  | 1           | B      | 6        | 8      | B     |
| 4  | 1           | M      | 8        | 20     | S     |
| 5  | 2           | B      | 0        | 2      | B     |
| 6  | 2           | M      | 0        | 2      | SB    |
| 7  | 2           | M      | 2        | 10     | S     |
| 8  | 2           | B      | 10       | 13     | B     |
| 9  | 2           | M      | 10       | 13     | SB    |
| 10 | 2           | M      | 13       | 18     | S     |
| 11 | 2           | B      | 18       | 20     | B     |
| 12 | 2           | M      | 18       | 20     | SB    |

Any ideas on how to generate this data with a SQL Server SELECT statement?
The tricky part here is that an interval might need to be split into several pieces, like 0..20 for person 2. Window functions to the rescue. This query illustrates what you need to do:

WITH deltas AS (
    SELECT personid_fk, hrs, sum(delta_s) as delta_s, sum(delta_b) as delta_b
    FROM (SELECT personid_fk, from_hrs as hrs,
                 case when score = 'S' then 1 else 0 end as delta_s,
                 case when score = 'B' then 1 else 0 end as delta_b
          FROM score_tbl
          UNION ALL
          SELECT personid_fk, to_hrs as hrs,
                 case when score = 'S' then -1 else 0 end as delta_s,
                 case when score = 'B' then -1 else 0 end as delta_b
          FROM score_tbl) _
    GROUP BY personid_fk, hrs
),
running AS (
    SELECT personid_fk, hrs as from_hrs,
           lead(hrs) over (partition by personid_fk order by hrs) as to_hrs,
           sum(delta_s) over (partition by personid_fk order by hrs) running_s,
           sum(delta_b) over (partition by personid_fk order by hrs) running_b
    FROM deltas
)
SELECT personid_fk, 'M' as s_type, from_hrs, to_hrs,
       case when running_b > 0 then 'SB' else 'S' end as score
FROM running
WHERE running_s > 0
UNION ALL
SELECT personid_fk, s_type, from_hrs, to_hrs, score
FROM score_tbl
WHERE s_type = 'B'
ORDER BY personid_fk, from_hrs;

Step by step:
deltas is the union of two passes over score_tbl, one for the start and one for the end of each score/bonus interval, creating a timeline of +1/-1 events.
running calculates the running total of the deltas over time, yielding split intervals where score/bonus are active.
The final query just converts the score codes and unions in the bonus intervals (which are passed through unchanged).
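The three steps can also be traced outside the database. The following is only a sketch in plain Python that mirrors the deltas and running CTEs for the M rows (the B rows are passed through unchanged by the query, so they are left out here). The function name `split_m_intervals` and the tuple layout are assumptions made for this illustration.

```python
from collections import defaultdict

# (personid_fk, s_type, from_hrs, to_hrs, score) rows from the question.
SCORE_TBL = [
    (1, "M", 0, 20, "S"),
    (1, "B", 6, 8, "B"),
    (2, "B", 0, 2, "B"),
    (2, "M", 0, 20, "S"),
    (2, "B", 10, 13, "B"),
    (2, "B", 18, 20, "B"),
    (2, "M", 13, 18, "S"),
]


def split_m_intervals(rows):
    """Return the split M rows as (person, 'M', from_hrs, to_hrs, score)."""
    # Step 1 ("deltas"): +1 where an interval opens, -1 where it closes,
    # summed per (person, hour). Index 0 tracks S, index 1 tracks B.
    deltas = defaultdict(lambda: [0, 0])
    for person, _s_type, start, end, score in rows:
        if score == "S":
            col = 0
        elif score == "B":
            col = 1
        else:
            continue  # other score codes contribute nothing, as in the SQL
        deltas[(person, start)][col] += 1
        deltas[(person, end)][col] -= 1

    # Step 2 ("running"): running totals per person, ordered by hour.
    events = sorted(deltas)  # sorted by person, then hour
    running = {}
    pieces = []
    for i, (person, hrs) in enumerate(events):
        run_s, run_b = running.get(person, (0, 0))
        run_s += deltas[(person, hrs)][0]
        run_b += deltas[(person, hrs)][1]
        running[person] = (run_s, run_b)
        # lead(hrs): the next event of the same person, if there is one.
        if i + 1 == len(events) or events[i + 1][0] != person:
            continue
        to_hrs = events[i + 1][1]
        # Step 3: keep pieces where an S interval is active; mark bonus overlap.
        if run_s > 0:
            pieces.append((person, "M", hrs, to_hrs, "SB" if run_b > 0 else "S"))
    return pieces


m_rows = split_m_intervals(SCORE_TBL)
for row in m_rows:
    print(row)
```

On the sample data this yields three M pieces for person 1 (0-6 S, 6-8 SB, 8-20 S) and five for person 2, matching the M rows of the desired output.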
Pivot table in SQL but keep measure names in column
I'm having trouble pivoting a table correctly. My input is this raw data table:

+------+---------+------------+----------+
| YEAR | FACULTY | ADMISSIONS | DROPOUTS |
+------+---------+------------+----------+
| 2018 | LAW     | 15         | 2        |
| 2019 | LAW     | 18         | 4        |
| 2020 | LAW     | 11         | 1        |
| 2018 | MATH    | 19         | 1        |
| 2019 | MATH    | 17         | 6        |
| 2020 | MATH    | 24         | 5        |
+------+---------+------------+----------+

I want to pivot the years into columns, but I also want to keep the measures for admissions and dropouts as row names. E.g. I want a table like this:

+---------+------------+------+------+------+
| FACULTY | MEASURE    | 2018 | 2019 | 2020 |
+---------+------------+------+------+------+
| LAW     | ADMISSIONS | 15   | 18   | 11   |
| LAW     | DROPOUTS   | 2    | 4    | 1    |
| MATH    | ADMISSIONS | 19   | 17   | 24   |
| MATH    | DROPOUTS   | 1    | 6    | 5    |
+---------+------------+------+------+------+

I can pivot years using:

SELECT *
FROM (
    SELECT FACULTY, YEAR, ADMISSIONS, DROPOUTS
    FROM TABLE
)
PIVOT (SUM(ADMISSIONS) FOR YEAR IN (2018, 2019, 2020))

But I need to pivot both measures and still get the measure names column. Any ideas?
That's unpivoting, then pivoting. If your database supports lateral joins and values(), you can do:

select t.faculty, x.measure,
    sum(case when t.year = 2018 then x.value end) value_2018,
    sum(case when t.year = 2019 then x.value end) value_2019,
    sum(case when t.year = 2020 then x.value end) value_2020
from mytable t
cross apply (values ('ADMISSIONS', t.admissions), ('DROPOUTS', t.dropouts)) as x(measure, value)
group by t.faculty, x.measure
I would unpivot using apply (assuming you are using SQL Server) and reaggregate:

select t.faculty, v.measure,
       max(case when year = 2018 then val end) as [2018],
       max(case when year = 2019 then val end) as [2019],
       max(case when year = 2020 then val end) as [2020]
from t cross apply
     (values ('ADMISSIONS', ADMISSIONS), ('DROPOUTS', DROPOUTS)
     ) v(measure, val)
group by t.faculty, v.measure
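If you want to try the unpivot-then-pivot idea without a SQL Server instance, the sketch below runs it in SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so a UNION ALL over the two measures stands in for the unpivot step; that substitution, the table name `enrolment`, and the variable names are assumptions of this example rather than anything from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrolment (year INT, faculty TEXT, admissions INT, dropouts INT)"
)
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?, ?, ?)",
    [
        (2018, "LAW", 15, 2), (2019, "LAW", 18, 4), (2020, "LAW", 11, 1),
        (2018, "MATH", 19, 1), (2019, "MATH", 17, 6), (2020, "MATH", 24, 5),
    ],
)

QUERY = """
WITH unpivoted AS (
    -- Unpivot: one row per (faculty, year, measure).
    SELECT faculty, year, 'ADMISSIONS' AS measure, admissions AS val FROM enrolment
    UNION ALL
    SELECT faculty, year, 'DROPOUTS', dropouts FROM enrolment
)
-- Pivot: conditional aggregation puts each year in its own column.
SELECT faculty, measure,
       MAX(CASE WHEN year = 2018 THEN val END) AS "2018",
       MAX(CASE WHEN year = 2019 THEN val END) AS "2019",
       MAX(CASE WHEN year = 2020 THEN val END) AS "2020"
FROM unpivoted
GROUP BY faculty, measure
ORDER BY faculty, measure
"""

pivoted = conn.execute(QUERY).fetchall()
for row in pivoted:
    print(row)
```

With only two measures the UNION ALL stays short; with many measures the apply/values form from the answers is the more compact way to write the same unpivot.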
How to count value by type and convert column to row
I have a table "tbTest1" like this:

q1 | q2 | q3 | type
---+----+----+-----------
 3 |  2 |  2 | Student
 2 |  2 |  3 | Student
 3 |  1 |  1 | Alumni
 1 |  1 |  3 | Student
 1 |  3 |  2 | Alumni

Now I want to convert "tbTest1" into a table like this, showing how many 1's, 2's or 3's were given by Student for 'q1', 'q2' and 'q3':

q  | 1 | 2 | 3
---+---+---+---
q1 | 1 | 1 | 1
q2 | 1 | 2 | 0
q3 | 0 | 1 | 2
You can use conditional aggregation:

select v.q,
       sum(case when val = 1 then 1 else 0 end) as val_1,
       sum(case when val = 2 then 1 else 0 end) as val_2,
       sum(case when val = 3 then 1 else 0 end) as val_3
from tbTest1 t cross apply
     (values ('q1', t.q1), ('q2', t.q2), ('q3', t.q3)) v(q, val)
where t.type = 'Student'
group by v.q;
Need a select query to get the output shown below
I have a SQL table as shown below:

| Loc | Date       | Id | Sts |
|-----|------------|----|-----|
| Hyd | 15-01-2016 | 1  | A   |
| Vjd | 16-01-2016 | 2  | B   |
| Viz | 15-01-2016 | 3  | C   |
| Hyd | 15-03-2016 | 4  | A   |
| Vjd | 15-03-2016 | 5  | B   |
| Viz | 15-03-2016 | 6  | C   |
| Hyd | 15-03-2016 | 4  | A   |
| Vjd | 15-05-2016 | 5  | B   |
| Viz | 15-05-2016 | 6  | C   |

And I need output like this:

| Loc | Jan-16    | Mar-16    | May-16    |
|     | A | B | C | A | B | C | A | B | C |
|-----|---|---|---|---|---|---|---|---|---|
| Hyd | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| Vjd | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| Viz | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |

Can anyone help me out please? Thanks in advance.
You will basically need to aggregate based on CASE expressions, like this:

DECLARE @table TABLE (loc VARCHAR(3), [date] DATE, id INT, sts CHAR(1));
INSERT INTO @table SELECT 'Hyd', '20160115', 1, 'A';
INSERT INTO @table SELECT 'Vjd', '20160116', 2, 'B';
INSERT INTO @table SELECT 'Viz', '20160115', 3, 'C';
INSERT INTO @table SELECT 'Hyd', '20160315', 4, 'A';
INSERT INTO @table SELECT 'Vjd', '20160315', 5, 'B';
INSERT INTO @table SELECT 'Viz', '20160315', 6, 'C';
INSERT INTO @table SELECT 'Hyd', '20160315', 4, 'A';
INSERT INTO @table SELECT 'Vjd', '20160515', 5, 'B';
INSERT INTO @table SELECT 'Viz', '20160515', 6, 'C';

SELECT loc,
    COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 1 AND sts = 'A' THEN 1 END) AS Jan_A,
    COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 1 AND sts = 'B' THEN 1 END) AS Jan_B,
    COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 1 AND sts = 'C' THEN 1 END) AS Jan_C,
    COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 3 AND sts = 'A' THEN 1 END) AS Mar_A,
    COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 3 AND sts = 'B' THEN 1 END) AS Mar_B,
    COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 3 AND sts = 'C' THEN 1 END) AS Mar_C,
    COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 5 AND sts = 'A' THEN 1 END) AS May_A,
    COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 5 AND sts = 'B' THEN 1 END) AS May_B,
    COUNT(CASE WHEN YEAR([date]) = 2016 AND MONTH([date]) = 5 AND sts = 'C' THEN 1 END) AS May_C
FROM @table
GROUP BY loc;

Results:

loc  Jan_A  Jan_B  Jan_C  Mar_A  Mar_B  Mar_C  May_A  May_B  May_C
Hyd  1      0      0      2      0      0      0      0      0
Viz  0      0      1      0      0      1      0      0      1
Vjd  0      1      0      0      1      0      0      1      0
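Typing one COUNT(CASE ...) column per month/status pair gets tedious as the lists grow. As a rough sketch, the column list can be generated with a few lines of Python and pasted into the query. The helper name `count_case_columns` and the `MONTH_ABBR` tuple are invented for this example, and the values are interpolated straight into SQL text, so this is only reasonable for trusted, hard-coded inputs like the ones here.

```python
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def count_case_columns(year, months, statuses):
    """Build one COUNT(CASE ...) column per (month, status) pair."""
    columns = []
    for month in months:
        label = MONTH_ABBR[month - 1]  # 1 -> 'Jan', 3 -> 'Mar', ...
        for sts in statuses:
            columns.append(
                f"COUNT(CASE WHEN YEAR([date]) = {year} "
                f"AND MONTH([date]) = {month} "
                f"AND sts = '{sts}' THEN 1 END) AS {label}_{sts}"
            )
    # Join with a comma and newline so the text can be pasted after "SELECT loc,".
    return ",\n    ".join(columns)


select_list = count_case_columns(2016, [1, 3, 5], ["A", "B", "C"])
print(select_list)
```

For the three months and three statuses in the question this produces the same nine columns, Jan_A through May_C, that were written out by hand.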
Return all records if more than 2/3 satisfy a value
I have a table representing multiple transactions by customers in any given day. I need to return all transactions per customer if two thirds or more of the transactions per customer were cash instead of credit card. In the example below I want to return all transactions of customers 1 and 4, as they were the only customers to have two thirds or more of their transactions as cash:

+----------------+-------------+-----------------+------------------+
| Transaction ID | CustomerNum | TransactionType | TransactionValue |
+----------------+-------------+-----------------+------------------+
| 1              | 1           | Cash            | 11               |
| 2              | 1           | Card            | 12               |
| 3              | 1           | Cash            | 13               |
| 4              | 2           | Cash            | 14               |
| 5              | 2           | Card            | 15               |
| 6              | 3           | Cash            | 15               |
| 7              | 3           | Card            | 11               |
| 8              | 3           | Cash            | 12               |
| 9              | 3           | Card            | 13               |
| 10             | 4           | Cash            | 14               |
| 11             | 4           | Cash            | 15               |
| 12             | 4           | Cash            | 15               |
+----------------+-------------+-----------------+------------------+
This seems to work with the sample data:

declare @t table (TranID int not null, CustomerNum int not null,
                  TranType varchar(17) not null, TranValue decimal(18,0) not null)
insert into @t(TranID,CustomerNum,TranType,TranValue) values
( 1,1,'Cash',11),
( 2,1,'Card',12),
( 3,1,'Cash',13),
( 4,2,'Cash',14),
( 5,2,'Card',15),
( 6,3,'Cash',15),
( 7,3,'Card',11),
( 8,3,'Cash',12),
( 9,3,'Card',13),
(10,4,'Cash',14),
(11,4,'Cash',15),
(12,4,'Cash',15)

;With Counted as (
    select *,
        COUNT(*) OVER (PARTITION BY CustomerNum) as cnt,
        SUM(CASE WHEN TranType='Cash' THEN 1 ELSE 0 END) OVER (PARTITION BY CustomerNum) as cashcnt
    from @t
)
select * from Counted where cashcnt * 3 >= cnt * 2

I've gone with simple multiplication at the end to keep all of the maths as integers and avoid having to think about float/decimal and the representation of 2/3.

Result:

TranID      CustomerNum TranType          TranValue   cnt         cashcnt
----------- ----------- ----------------- ----------- ----------- -----------
1           1           Cash              11          3           2
2           1           Card              12          3           2
3           1           Cash              13          3           2
10          4           Cash              14          3           3
11          4           Cash              15          3           3
12          4           Cash              15          3           3
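The integer comparison can be checked by hand against the sample data. Below is a small Python sketch of the same test, cashcnt * 3 >= cnt * 2, applied per customer; the names `TRANSACTIONS` and `mostly_cash_customers` are made up for this illustration.

```python
from collections import defaultdict

# (TranID, CustomerNum, TranType, TranValue) rows from the question.
TRANSACTIONS = [
    (1, 1, "Cash", 11), (2, 1, "Card", 12), (3, 1, "Cash", 13),
    (4, 2, "Cash", 14), (5, 2, "Card", 15),
    (6, 3, "Cash", 15), (7, 3, "Card", 11), (8, 3, "Cash", 12), (9, 3, "Card", 13),
    (10, 4, "Cash", 14), (11, 4, "Cash", 15), (12, 4, "Cash", 15),
]


def mostly_cash_customers(rows):
    """Customers for whom at least two thirds of the transactions are cash."""
    cnt = defaultdict(int)      # total transactions per customer
    cashcnt = defaultdict(int)  # cash transactions per customer
    for _tran_id, customer, tran_type, _value in rows:
        cnt[customer] += 1
        if tran_type == "Cash":
            cashcnt[customer] += 1
    # cash / total >= 2 / 3, rewritten as an integer comparison.
    return sorted(c for c in cnt if cashcnt[c] * 3 >= cnt[c] * 2)


keep = mostly_cash_customers(TRANSACTIONS)
kept_ids = [row[0] for row in TRANSACTIONS if row[1] in keep]
print(keep, kept_ids)
```

Customer 1 sits exactly on the boundary (2 of 3 cash), which is where a rounded decimal threshold can go wrong: in Python, `2 / 3 >= 0.6667` is False, while `2 * 3 >= 3 * 2` is True.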
Try this:

select t.*
from (select customernum
      from transactions
      group by customernum
      having sum(case when TransactionType = 'Cash' then 1.0 else 0.0 end) / sum(1.0) > 0.6666) c
join transactions t on t.customernum = c.customernum