Unpivot grouped SQL Data - google-bigquery

I have a table [dataset] as follows:
Date        Person  Question            Answer  Answer Notes
-------------------------------------------------------------
2022/04/10  A       Topic A Question 1  Apple   Good
2022/04/10  A       Topic A Question 2  Banana  Bad
2022/04/10  A       Topic A Question 3
2022/04/10  A       Topic B Question 1  Dog     Red
2022/04/10  A       Topic B Question 2  Cat     Black
2022/04/10  A       Topic B Question 3  Horse   Blue
It may seem illogical, but I need the data in the following format:
Date        Person  Topic A        Topic A Notes  Topic B          Topic B Notes
---------------------------------------------------------------------------------
2022/04/10  A       Apple, Banana  Good, Bad      Dog, Cat, Horse  Red, Black, Blue
How do I achieve this? I have tried the following:
SELECT a.Date,
a.Person,
topA.Answer AS Topic A,
topA.Answer Notes AS Topic A Notes,
topB.Answer AS Topic B,
topB.Answer Notes AS Topic A Notes
FROM [Dataset] a
LEFT JOIN [Dataset] topA
ON a.Date = topA.Date
AND a.Person = topA.Person
AND topA.Question LIKE 'Topic A%'
LEFT JOIN [Dataset] topB
ON a.Date = topB.Date
AND a.Person= topB.Person
AND topB.Question LIKE 'Topic B%'

You might try the query below:
WITH parsed_dataset AS (
SELECT *,
REGEXP_EXTRACT(Question, r'^Topic (\w+)') AS Topic,
REGEXP_EXTRACT(Question, r'Question (\d+)$') AS Number
FROM Dataset
)
SELECT * FROM (
SELECT Date, Person, Topic,
ARRAY_AGG(Answer IGNORE NULLS ORDER BY Number) AS Answer,
ARRAY_AGG(Answer_Notes IGNORE NULLS ORDER BY Number) AS Note
FROM parsed_dataset
GROUP BY 1, 2, 3
) PIVOT (
ANY_VALUE(TRIM(FORMAT('%t', Answer), '[]')) Topic,
ANY_VALUE(TRIM(FORMAT('%t', Note), '[]')) Topic_Note FOR Topic IN ('A', 'B')
);

Consider the approach below:
select * from (
select * except(Question)
from your_table, unnest([struct(
regexp_extract(Question, r'^Topic (\w+)') as Topic,
regexp_extract(Question, r'Question (\w+)') as QuestionNum
)])
)
pivot (
string_agg(Answer, ', ' order by QuestionNum) as Topic,
string_agg(Answer_Notes, ', ' order by QuestionNum) as Topic_Notes
for Topic in ('A', 'B')
)
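If applied to the sample data in your question, the output is a single row for 2022/04/10 and person A, with the answers and notes of each topic joined by ', '.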
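For readers without BigQuery at hand, the same reshaping can be tried locally. The following is only a sketch in Python with the standard sqlite3 module, not the BigQuery query above: it swaps PIVOT for one CASE expression per output column and string_agg for SQLite's group_concat. The table name dataset, the column name Answer_Notes, and treating the empty cells of Topic A Question 3 as NULL are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset "
    "(Date TEXT, Person TEXT, Question TEXT, Answer TEXT, Answer_Notes TEXT)"
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?, ?)",
    [
        ("2022/04/10", "A", "Topic A Question 1", "Apple", "Good"),
        ("2022/04/10", "A", "Topic A Question 2", "Banana", "Bad"),
        ("2022/04/10", "A", "Topic A Question 3", None, None),  # assumed NULL
        ("2022/04/10", "A", "Topic B Question 1", "Dog", "Red"),
        ("2022/04/10", "A", "Topic B Question 2", "Cat", "Black"),
        ("2022/04/10", "A", "Topic B Question 3", "Horse", "Blue"),
    ],
)

# One CASE per output column; group_concat skips the NULLs that the CASE
# yields for rows of the other topic (and for unanswered questions).
query = """
SELECT Date, Person,
       group_concat(CASE WHEN Question LIKE 'Topic A%' THEN Answer END, ', ')       AS Topic_A,
       group_concat(CASE WHEN Question LIKE 'Topic A%' THEN Answer_Notes END, ', ') AS Topic_A_Notes,
       group_concat(CASE WHEN Question LIKE 'Topic B%' THEN Answer END, ', ')       AS Topic_B,
       group_concat(CASE WHEN Question LIKE 'Topic B%' THEN Answer_Notes END, ', ') AS Topic_B_Notes
FROM dataset
GROUP BY Date, Person
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Note that SQLite does not promise the order of values inside group_concat here, whereas the BigQuery versions order explicitly by question number, so treat the ordering in this sketch as incidental.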

Related

What's the best way to group SQL results by items batch?

For example, I have a simple table books:
author    book
-----------------
Author-A  Book-A1
Author-A  Book-A2
Author-B  Book-B1
Author-C  Book-C1
Author-C  Book-C2
And I need to count books by each author, so I'll write:
select author, count(*) from books
group by author
# Author-A = 2
# Author-B = 1
# Author-C = 2
But now I need to count books by groups of authors:
groupA = ['Author-A', 'Author-C'],
groupB = ['Author-B']
select authorGroup, count(*) from books
group by {
case author in groupA -> 'groupA'
case author in groupB -> 'groupB'
} as authorGroup
# ['Author-A', 'Author-C'] = 4
# ['Author-B'] = 1
These groups can be different and come from another module.
What's the best way to write this request? Maybe without a union such as:
select author as 'groupA', count(*) from books
where author in { groupA }
union
select author as 'groupB', count(*) from books
where author in { groupB }
because there could be a lot of groups in one request (~20-30).
The problem is that these groups can be absolutely dynamic: I can request ['Author-A', 'Author-B'] in one request as one group and ['Author-B', 'Author-C'] in another.
For example, the group is not something like author's country or genre. It can be totally dynamic.

The usual way is to JOIN on to a mapping table, which can be an in-line-view if need be (though I recommend an actual table, which can be indexed).
WITH
author_group AS
(
SELECT 'Author-A' AS author, 'Group-A' AS group_label
UNION ALL
SELECT 'Author-B' AS author, 'Group-B' AS group_label
UNION ALL
SELECT 'Author-C' AS author, 'Group-A' AS group_label
)
SELECT
author_group.group_label,
COUNT(*)
FROM
books
INNER JOIN
author_group
ON author_group.author = books.author
GROUP BY
author_group.group_label
Similar results can be achieved with CASE expressions, but it doesn't scale very well...
WITH
mapped_author AS
(
SELECT
*,
CASE author
WHEN 'Author-A' THEN 'Group-A'
WHEN 'Author-B' THEN 'Group-B'
WHEN 'Author-C' THEN 'Group-A'
END
AS author_group
FROM
books
)
SELECT
author_group,
COUNT(*)
FROM
mapped_author
GROUP BY
author_group
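Since the groups in the question arrive dynamically from another module, the mapping-table idea above can be filled at request time. Here is a rough sketch in Python with the standard sqlite3 module; the function name count_books_by_group and the dict-of-lists input shape are hypothetical, and a temporary table stands in for the in-line view.

```python
import sqlite3


def count_books_by_group(conn, groups):
    """Count books per group; groups maps a label to a list of authors
    (hypothetical input shape, e.g. handed over by another module)."""
    conn.execute("CREATE TEMP TABLE author_group (author TEXT, group_label TEXT)")
    pairs = [(author, label) for label, authors in groups.items() for author in authors]
    # Values are bound as parameters, so author names never end up in the SQL text
    conn.executemany("INSERT INTO author_group VALUES (?, ?)", pairs)
    try:
        return conn.execute(
            """
            SELECT ag.group_label, COUNT(*)
            FROM books b
            INNER JOIN author_group ag ON ag.author = b.author
            GROUP BY ag.group_label
            ORDER BY ag.group_label
            """
        ).fetchall()
    finally:
        conn.execute("DROP TABLE author_group")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (author TEXT, book TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("Author-A", "Book-A1"),
        ("Author-A", "Book-A2"),
        ("Author-B", "Book-B1"),
        ("Author-C", "Book-C1"),
        ("Author-C", "Book-C2"),
    ],
)

counts = count_books_by_group(
    conn, {"groupA": ["Author-A", "Author-C"], "groupB": ["Author-B"]}
)
print(counts)
```

One thing the join gives you that a CASE expression does not: an author may sit in several groups at once (say ['Author-A', 'Author-B'] and ['Author-B', 'Author-C'] in the same request), because each mapping row contributes its own joined rows.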

First you need a derived table that shows which group each author is in.
Then you just count.
Like this:
select distinct a.group_auth, count(a.book) over (partition by a.group_auth)
from
(select
case when b.author in ('Author-A', 'Author-C') then 'groupA'
when b.author in ('Author-B') then 'groupB'
end as group_auth,
b.book as book
from books b
) as a
;

Query with different categories

I want to check whether there is more than one QuestionCategory on one day in the table contentment. In my case people don't need to answer questions with different categories on one day. I can make a trigger out of this.
The contentment table: employeeid, questionid, date, score
The question table: questionid, questioncat, question
Data in the contentment table: 1, 1, 11-18-2018, 4
Data in the question table:
1, Work, How is your job?
2, Work, Are you happy with your job?
I have something like this:
select c.questionid, date
from contentment c
join question q
on c.questionid= q.questionid
group by c.questionid, date
having count(questioncat) >= 2
But this query only counts whether a questionid appears two or more times in this table, not whether there are two different question categories in this table.
I use SQL Server.
So if someone wants to insert this:
insert into contentment values (1, 2, '11-18-2018', null)
(null because the employee still needs to give a score)
The query needs to return this questionid and date (2 and 11-18-2018), because it is the same questioncat "Work" on the same day 11-18-2018.

You need to add DISTINCT:
select c.questionid, date
from contentment c
join question q
on c.questionid= q.questionid
group by c.questionid, date
having count(DISTINCT questioncat) >= 2;
-- counting only different questioncat

Your question is hard to follow, but I think you want employees that have more than one question category in a given day. If so:
select c.employeeid, c.date, count(distinct q.questioncat)
from contentment c join
question q
on c.questionid = q.questionid
group by c.employeeid, c.date
having count(distinct q.questioncat) >= 2;
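To see the effect of count(distinct ...) on a small data set, here is a sketch in Python with the standard sqlite3 module. The rows for employee 2, the third question with category Health, and the ISO date strings are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question (questionid INTEGER, questioncat TEXT, question TEXT)")
conn.execute(
    "CREATE TABLE contentment (employeeid INTEGER, questionid INTEGER, date TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO question VALUES (?, ?, ?)",
    [
        (1, "Work", "How is your job?"),
        (2, "Work", "Are you happy with your job?"),
        (3, "Health", "How do you feel?"),  # hypothetical extra category
    ],
)
conn.executemany(
    "INSERT INTO contentment VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2018-11-18", 4),
        (1, 2, "2018-11-18", None),  # same category (Work) on the same day
        (2, 1, "2018-11-18", 3),     # hypothetical employee with two categories
        (2, 3, "2018-11-18", 5),
    ],
)

# Only employee/day combinations touching two or more DISTINCT categories survive
multi_category = conn.execute(
    """
    SELECT c.employeeid, c.date, COUNT(DISTINCT q.questioncat)
    FROM contentment c
    JOIN question q ON c.questionid = q.questionid
    GROUP BY c.employeeid, c.date
    HAVING COUNT(DISTINCT q.questioncat) >= 2
    """
).fetchall()
print(multi_category)
```

Employee 1 answered two questions that day, but both are in category Work, so only employee 2 is reported. A plain count(questioncat) would have reported both.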

SQL Results show up in both queries

I'm trying to return results where people have signed a particular survey. However, I'm having issues returning survey answers when they have previously answered Survey 1 and therefore show in both Survey 1 & Survey 2.
How do I ensure survey answers only appear once, by selecting the most recent survey results, so that they do not show in both surveys?
The results in italics (store 2147380 below) represent a duplicate record for a store which has answered both surveys, but I only want the most recently answered survey to appear. In this instance they should only appear in Survey 1, as it is the most recent.
CODE
go
use [database]
--Select Outlets that have answers to Survey 1
(select distinct activityanswers.CustomerNumber as Outlet, 'Survey 1' as
'Survey Program', max(answereddate) as 'Last Answered Date'
from dbo.activityanswers
where activityid in (select id from activitys where ActivityGroupId =
'1061293')
group by customernumber
)
--Select Outlets that have answers to Survey 2
(select distinct activityanswers.CustomerNumber as Outlet, 'Survey 2' as
'Survey Program', max(answereddate) as 'Last Answered Date'
from dbo.activityanswers
where activityid in (select id from activitys where ActivityGroupId =
'1061294')
group by customernumber
)
Survey 1 RESULTS
Store Survey AnswerTime
1285939 Survey 1 2018-08-27 10:13:57.000
1348372 Survey 1 2018-08-27 09:21:18.000
2142522 Survey 1 2018-08-27 15:26:29.000
2147380 Survey 1 2018-08-24 22:26:49.000
Survey 2 RESULTS
Store Survey AnswerTime
2147380 Survey 2 2018-08-24 21:58:59.000
2641188 Survey 2 2018-08-27 11:39:31.000

You can get the result with a Single SQL Statement. Try using Row_Number or Rank Function.
Something like this might work
;WITH CTE
AS
(
SELECT
RN =ROW_NUMBER() OVER(PARTITION BY ANS.CustomerNumber ORDER BY ANS.answereddate DESC,
ACT.ActivityGroupId ASC),
ANS.CustomerNumber as Outlet,
CASE ACT.ActivityGroupId
WHEN '1061293' THEN 'Survey 1'
ELSE 'Survey 2' END as 'Survey Program',
ANS.answereddate as 'Last Answered Date'
FROM dbo.ActivityAnswers ANS
INNER JOIN Activitys ACT
ON ANS.activityid = ACT.ID
WHERE ACT.ActivityGroupId IN
(
'1061293',
'1061294'
)
)
SELECT
*
FROM CTE
WHERE RN = 1
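The ROW_NUMBER idea can be checked against the sample rows from the question. This is a sketch in Python with the standard sqlite3 module (window functions need SQLite 3.25 or newer), not SQL Server; the activity ids 1 and 2 are invented to link the answers to the two ActivityGroupId values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activitys (id INTEGER, ActivityGroupId TEXT)")
conn.execute(
    "CREATE TABLE activityanswers (CustomerNumber INTEGER, activityid INTEGER, answereddate TEXT)"
)
# Hypothetical activity ids: 1 belongs to Survey 1, 2 belongs to Survey 2
conn.executemany("INSERT INTO activitys VALUES (?, ?)", [(1, "1061293"), (2, "1061294")])
conn.executemany(
    "INSERT INTO activityanswers VALUES (?, ?, ?)",
    [
        (1285939, 1, "2018-08-27 10:13:57"),
        (1348372, 1, "2018-08-27 09:21:18"),
        (2142522, 1, "2018-08-27 15:26:29"),
        (2147380, 1, "2018-08-24 22:26:49"),
        (2147380, 2, "2018-08-24 21:58:59"),  # same store, older Survey 2 answer
        (2641188, 2, "2018-08-27 11:39:31"),
    ],
)

# Rank each store's answers newest first, then keep only the newest one
latest = conn.execute(
    """
    WITH ranked AS (
        SELECT ans.CustomerNumber AS Outlet,
               CASE act.ActivityGroupId
                    WHEN '1061293' THEN 'Survey 1' ELSE 'Survey 2' END AS SurveyProgram,
               ans.answereddate AS LastAnsweredDate,
               ROW_NUMBER() OVER (PARTITION BY ans.CustomerNumber
                                  ORDER BY ans.answereddate DESC) AS rn
        FROM activityanswers ans
        JOIN activitys act ON act.id = ans.activityid
        WHERE act.ActivityGroupId IN ('1061293', '1061294')
    )
    SELECT Outlet, SurveyProgram, LastAnsweredDate
    FROM ranked
    WHERE rn = 1
    ORDER BY Outlet
    """
).fetchall()
for row in latest:
    print(row)
```

Each store appears once, and store 2147380 is listed under Survey 1 only, because that answer is the more recent of its two.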

I think you should filter after grouping, try something like this:
select a.CustomerNumber as Outlet, a.Last_Answered_Date from (
select CustomerNumber, max(answereddate) as 'Last_Answered_Date'
from activityanswers
group by customernumber) a
join activityanswers b on a.CustomerNumber = b.CustomerNumber and a.[Last_Answered_Date] = b.answereddate
where b.activityid in (select id from activitys where ActivityGroupId = '1061294')

You can do it by checking whether a particular customer has a newer answer or not (not exists subquery). That way, you can also eliminate the need to group by.
select
CustomerNumber as Outlet,
'Survey 1' as 'Survey Program',
answereddate as 'Last Answered Date'
from dbo.activityanswers a
where activityid in (
select id from activitys where ActivityGroupId = '1061293')
and not exists (
select 1 from dbo.activityanswers b
where b.CustomerNumber = a.CustomerNumber
and b.answereddate > a.answereddate)

Maybe a sub-query would serve your purpose:
SELECT
Outlet
, SurveyProgram
, LastAnsweredDate
FROM (
SELECT
ans.CustomerNumber Outlet
, CASE WHEN ActivityGroupId = '1061293' THEN 'Survey 1' ELSE 'Survey 2' END SurveyProgram
, answereddate LastAnsweredDate
, ROW_NUMBER() OVER(PARTITION BY ans.CustomerNumber ORDER BY answereddate DESC) RN
FROM
activityanswers ans
LEFT JOIN activitys act ON act.ID = ans.activityid AND ActivityGroupId IN('1061293', '1061294')
) D
WHERE
RN = 1
I have replaced the IN() with a LEFT JOIN, which I think is a better approach for your query. The query returns rows for both Survey 1 and Survey 2, so ROW_NUMBER is used to filter them. The rows are ordered by answereddate DESC, so the most recent datetime comes first, and taking the first row for each CustomerNumber gives you the most recent record. This also leaves the query more flexible to extend.

SQL Server - If the field had been pivot, how to pivot again by another field? Is that the DB design correct?

I have a raw data like
Title Question Answer AnswerRemark
----------------------------------------
ACCCode1 Q1 Y NULL
ACCCode1 Q2 N 6
ACCCode1 Q3 Y Workout
As you can see, the field "AnswerRemark" is free text accompanying "Answer"; some answers do not require a remark.
I can simply pivot the question and answer like:
Title Q1 Q2 Q3
AccessCode1 Y N Y
My desired Result will be
Title Q1 R1 Q2 R2 Q3 R3
AccessCode1 Y NULL N 6 Y Workout
Is that possible? I cannot figure out how to achieve this; pivoting on the Answer is not a good idea, as it has many combinations.
Any suggestions?

Use Conditional Aggregation :
SELECT Title,
MAX(CASE WHEN Question='Q1' THEN Answer END) as Q1 ,
MAX(CASE WHEN Question='Q1' THEN AnswerRemark END) as R1 ,
MAX(CASE WHEN Question='Q2' THEN Answer END) as Q2 ,
MAX(CASE WHEN Question='Q2' THEN AnswerRemark END) as R2 ,
MAX(CASE WHEN Question='Q3' THEN Answer END) as Q3 ,
MAX(CASE WHEN Question='Q3' THEN AnswerRemark END) as R3
FROM [tablename]
GROUP BY Title
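If the list of questions is known only at run time, the same conditional aggregation can be generated instead of typed out. A minimal sketch in Python with the standard sqlite3 module, assuming a table named answers holding the raw data from the question; the question names are bound as parameters and the Qn/Rn aliases are built from integers only, so nothing user-supplied is concatenated into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (Title TEXT, Question TEXT, Answer TEXT, AnswerRemark TEXT)"
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?, ?)",
    [
        ("ACCCode1", "Q1", "Y", None),
        ("ACCCode1", "Q2", "N", "6"),
        ("ACCCode1", "Q3", "Y", "Workout"),
    ],
)

questions = ["Q1", "Q2", "Q3"]  # could come from SELECT DISTINCT Question

# Two generated columns per question: the answer (Qn) and its remark (Rn)
columns = ", ".join(
    f"MAX(CASE WHEN Question = ? THEN Answer END) AS Q{i}, "
    f"MAX(CASE WHEN Question = ? THEN AnswerRemark END) AS R{i}"
    for i in range(1, len(questions) + 1)
)
params = [q for q in questions for _ in range(2)]  # each name is used twice

wide = conn.execute(
    f"SELECT Title, {columns} FROM answers GROUP BY Title", params
).fetchall()
print(wide)
```

With the three sample rows this produces one row per Title, with a NULL (None) remark for Q1.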

Using Pivot we get the result
;With cte(Title, Question,Answer,AnswerRemark)
AS
(
SELECT 'ACCCode1','Q1','Y',NULL UNION ALL
SELECT 'ACCCode1','Q2','N','6' UNION ALL
SELECT 'ACCCode1','Q3','Y','Workout' UNION ALL
SELECT 'ACCCode1','Q2','N','7' UNION ALL
SELECT 'ACCCode1','Q1','Y',NULL UNION ALL
SELECT 'ACCCode1','Q3','N','9' UNION ALL
SELECT 'ACCCode1','Q1','N','4' UNION ALL
SELECT 'ACCCode1','Q2','N','Workout' UNION ALL
SELECT 'ACCCode1','Q4','N','2' UNION ALL
SELECT 'ACCCode1','Q3','Y','Workout' UNION ALL
SELECT 'ACCCode1','Q1','N','1' UNION ALL
SELECT 'ACCCode1','Q4','Y',NULL
)
SELECT *,'Remark'+CAST(ROW_NUMBER()OVER(ORDER BY (SELECT 1))AS varchar(10)) AS Question2
, ROW_NUMBER()OVER(PArtition by Question Order by Question ) AS Seq
INTO #t FROM cte
Using Dynamic Sql where the columns are not static
DECLARE @DyColumn1 Nvarchar(max),
@DyColumn2 Nvarchar(max),
@Sql Nvarchar(max),
@MAxDyColumn1 Nvarchar(max),
@MAxDyColumn2 Nvarchar(max),
@CombineColumn Nvarchar(max)
SELECT @DyColumn1=STUFF((SELECT DISTINCT ', '+QUOTENAME(Question) FROM #t FOR XML PATH ('')),1,1,'')
SELECT @DyColumn2=STUFF((SELECT ', '+QUOTENAME(Question2) FROM #t FOR XML PATH ('')),1,1,'')
SELECT @MAxDyColumn1=STUFF((SELECT DISTINCT ', '+'MAX('+QUOTENAME(Question)+') AS '+QUOTENAME(Question) FROM #t FOR XML PATH ('')),1,1,'')
SELECT @MAxDyColumn2=STUFF((SELECT ', '+'MAX('+QUOTENAME(Question2)+') AS '+QUOTENAME(Question2) FROM #t FOR XML PATH ('')),1,1,'')
SELECT @CombineColumn=STUFF((SELECT DISTINCT ', '+QUOTENAME(Question)+','+QUOTENAME(Question2) FROM #t FOR XML PATH ('')),1,1,'')
SET @Sql='SELECT Title,'+@CombineColumn+' From
(
SELECT Title,'+@MAxDyColumn1+','+@MAxDyColumn2+' FRom
(
SELECT * FROM #t
)AS SRC
PIVOT
(
MAX(Answer) FOR Question IN('+@DyColumn1+')
) AS Pvt1
PIVOT
(
MAX(AnswerRemark) FOR Question2 IN('+@DyColumn2+')
) AS Pvt2
GROUP BY Title
)dt
'
PRINT @Sql
EXEC(@Sql)
Result
Title Q1 Remark1 Q1 Remark2 Q1 Remark3 Q1 Remark4 Q2 Remark5 Q2 Remark6 Q2 Remark7 Q3 Remark8 Q3 Remark9 Q3 Remark10 Q4 Remark11 Q4 Remark12
ACCCode1 Y NULL Y 1 Y 4 Y NULL N 6 N Workout N 7 Y Workout Y Workout Y 9 Y NULL Y 2

I don't know how big your data is, or how many questions are possible. A more generic Q&A structure done at the presentation layer would be far better, but for your specific request a more correct design would be a 3NF table. This will allow you to create a primary key that is highly optimised and create a secondary index by question type id. All your keys are now IDs which are far faster to search and match than strings:
Account Codes
AccID - AccName - columns for other data related to accounts
Stores each account you have.
Questions
QuestionID - QuestionName
List of possible questions, one row for every question you have, Q1, Q2 etc. You could add question categories here to exploit any commonality you have, e.g. if you have different surveys with the same set of questions, you could put them in one category and easily then query the below.
Results
AccId, QuestionID, Result, Result Remark
Contains one row for every question asked.
The query for your result still uses pivot, but now you can select the list of columns to use from a variable or dynamic SQL syntax, which means you can control it somewhat better and the query itself should be better.
With that said, if you have any knowledge about your data whatsoever you can use it to make a static query which can then be indexed. Examples are here of this query: SQL Server 2005 Pivot on Unknown Number of Columns. You can then set the column names if required using the AS syntax, which unfortunately would require dynamic sql again (Change column name while using PIVOT SQL Server 2008).
By the way, what you are trying to do is specifically dealing with denormalised data, which is what NoSQL is good for. SQL Server gives you great help, but you have to have some structure to your data.
If you aren't working for Survey Monkey and dealing with millions of variations, I'd seriously look at whether you can just make a table specific to each round of questions you get, simply denormalise it by adding an explicit column for each question, and then make your entire logic just a select * from surveyxyztable where accountid = abc.

SQL query MAX(SUM(..)) [closed]

Closed 8 years ago.
Table Structure:
Article(
model int(key),
year int(key),
author varchar(key),
num int)
num: number of articles written during the year
Find all the authors who, in at least one year, wrote the maximal number of articles (relative to all the other authors).
I tried:
SELECT author FROM Article,
(SELECT year,max(sumnum) s FROM
(SELECT year,author,SUM(num) sumnum FROM Article GROUP BY year,author)
GROUP BY year) AS B WHERE Article.year=B.year and Article.num=B.s;
Is this the right answer?
Thanks.

You might want to try a self-JOIN to get what you are looking for:
SELECT Main.author
FROM Article AS Main
INNER JOIN (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
) AS SumMain
ON SumMain.year = Main.year
AND SumMain.author = Main.author
GROUP BY Main.author
HAVING SUM(Main.num) = MAX(SumMain.sumnum)
;
This would guarantee (as it is ANSI) you are getting the MAX of the SUMmed nums and only bringing back results for what you need. Keep in mind I only JOINed on those two fields because of the information provided ... if you have a unique ID you can JOIN on, or you require more specificity to get a 1-to-1 match, adjust accordingly.
Depending on what DBMS you are using, it can be simplified one of two ways:
SELECT author
FROM (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
HAVING SUM(num) = MAX(sumnum)
) AS Main
;
Some DBMSes allow you to do multiple aggregate functions, and this could work there.
If your DBMS allows you to do OLAP functions, you can do something like this:
SELECT author
FROM (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
) AS Main
QUALIFY (
ROW_NUMBER() OVER (
PARTITION BY author
,year
ORDER BY sumnum DESC
) = 1
)
;
Which would limit the result set to only the highest sumnum, although you may need more parameters to handle things if you wanted the year to be involved (you are GROUPing by it, only reason I bring it up).
Hope this helps!

You mention this is for homework and made a valid attempt, albeit an incorrect one.
This is under a premise (unclear since no sample data) that the model column is like an auto-increment, and there is only going to be one entry per author per year and never multiple records for the same author within the same year. Ex:
model year author num
===== ==== ====== ===
1 2013 A 15
2 2013 C 18
3 2013 X 17
4 2014 A 16
5 2014 B 12
6 2014 C 16
7 2014 X 18
8 2014 Y 18
So the result expected is highest article count in 2013 = 18 and would only return author "C". In 2014, highest article count is 18 and would return authors "X" and "Y"
First, get a query of what was the maximum number of articles written...
select
year,
max( num ) as ArticlesPerYear
from
Article
GROUP BY
year
This would give you one record per year, and the maximum number of articles published... so if you had data for years 2010-2014, you would at MOST have 5 records returned. Now, it is as simple as joining this to the original table that had the matching year and articles
select
A2.*
from
( select
year,
max( num ) as ArticlesPerYear
from
Article
GROUP BY
year ) PreQuery
JOIN Article A2
on PreQuery.Year = A2.Year
AND PreQuery.ArticlesPerYear = A2.num

I suggest a CTE
WITH maxyear AS
(SELECT year, max(num) AS max_articles
FROM article
GROUP BY year)
SELECT DISTINCT author
FROM article a
JOIN maxyear m
ON a.year=m.year AND a.num=m.max_articles;
and compare that in performance to a partition, which is another way
SELECT DISTINCT author FROM
(SELECT author, rank()
OVER (PARTITION BY year ORDER BY num DESC) AS r
FROM article) AS subq
WHERE r = 1;
I think some RDBMS will let you put HAVING rank()=1 on the subquery and then you don't need to nest queries.
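Both variants can be compared on the sample data from the other answer (one row per author per year). The sketch below uses Python with the standard sqlite3 module, which needs SQLite 3.25 or newer for rank(); it only checks that the two queries agree and says nothing about their relative performance on a real RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Article (model INTEGER, year INTEGER, author TEXT, num INTEGER)")
conn.executemany(
    "INSERT INTO Article VALUES (?, ?, ?, ?)",
    [
        (1, 2013, "A", 15), (2, 2013, "C", 18), (3, 2013, "X", 17),
        (4, 2014, "A", 16), (5, 2014, "B", 12), (6, 2014, "C", 16),
        (7, 2014, "X", 18), (8, 2014, "Y", 18),
    ],
)

# Variant 1: per-year maximum in a CTE, joined back to the table
cte_query = """
WITH maxyear AS (
    SELECT year, MAX(num) AS max_articles FROM Article GROUP BY year
)
SELECT DISTINCT a.author
FROM Article a
JOIN maxyear m ON a.year = m.year AND a.num = m.max_articles
ORDER BY a.author
"""

# Variant 2: rank within each year; ties share rank 1, so co-leaders are kept
rank_query = """
SELECT DISTINCT author
FROM (
    SELECT author, RANK() OVER (PARTITION BY year ORDER BY num DESC) AS r
    FROM Article
) AS subq
WHERE r = 1
ORDER BY author
"""

cte_authors = [row[0] for row in conn.execute(cte_query)]
rank_authors = [row[0] for row in conn.execute(rank_query)]
print(cte_authors, rank_authors)
```

Both lists contain C (top in 2013) plus X and Y (tied at the top in 2014). If an author can have several rows in the same year, num would first have to be summed per year and author before either variant is applied.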
