I have a survey and I want to do aggregations based on the demographic data that come with their responses. However, the survey doesn't automatically format the data for that purpose.
For example, let's say we have a three question survey:
What is your Eye Color? (Demographic Question)
What is your Hair Color? (Demographic Question)
What is your Salary?
The table below is the raw survey data.
UserID | Question | Answer
1 | Eye_Color | Brown
1 | Hair_Color | Black
1 | Salary | $100
2 | Eye_Color | Blue
2 | Hair_Color | Blond
2 | Salary | $150
I want to format my data so that I can eventually GROUP BY any question that contains "Color". Thus, I need to dynamically create a column for every question that has "Color" in it.
UserID | Question | Answer | Eye_Color | Hair_Color
1 | Salary | $100 | Brown | Black
2 | Salary | $150 | Blue | Blond
What SQL query can I use to do this dynamically? I thought about windowing, but I'm sure there is more to it. Also, I am using Google BigQuery for the database.
Thanks!
You might consider the dynamic SQL below, which left-joins the demographic information to the questions and answers for each user.
Note that I've added one more question for UserID 1 in your survey data.
CREATE TEMP TABLE responses AS
SELECT 1 UserID, 'Eye_Color' Question, 'Brown' Answer UNION ALL
SELECT 1 UserID, 'Hair_Color' Question, 'Black' Answer UNION ALL
SELECT 1 UserID, 'Salary' Question, '$100' Answer UNION ALL
SELECT 1 UserID, 'Car' Question, 'QM5' Answer UNION ALL
SELECT 2 UserID, 'Eye_Color' Question, 'Blue' Answer UNION ALL
SELECT 2 UserID, 'Hair_Color' Question, 'Blond' Answer UNION ALL
SELECT 2 UserID, 'Salary' Question, '$150' Answer;
EXECUTE IMMEDIATE FORMAT("""
SELECT * FROM (
-- Questions and answers except demographic information
SELECT * FROM responses WHERE Question NOT LIKE '%%Color%%'
) LEFT JOIN (
-- Demographic information
SELECT * FROM (
SELECT * FROM responses
) PIVOT (ANY_VALUE(Answer) FOR Question IN ('%s'))
) USING (UserID);
""", (SELECT STRING_AGG(DISTINCT Question, "','") FROM responses WHERE Question LIKE '%Color%'));
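The mechanics of the dynamic SQL above can be tried locally with a minimal sketch in Python's built-in sqlite3 module, assuming SQLite as a stand-in for BigQuery. SQLite has neither PIVOT nor EXECUTE IMMEDIATE, so the sketch builds the query string in Python and uses one MAX(CASE ...) per "Color" question (conditional aggregation) in place of PIVOT. The helpers `quote_ident` and `pivot_color_questions` are hypothetical names used only for illustration.

```python
import sqlite3

def quote_ident(name):
    # Double-quote an identifier for SQLite, escaping any embedded quotes.
    return '"' + name.replace('"', '""') + '"'

def pivot_color_questions(conn):
    # Step 1: discover the demographic ("Color") questions, like the STRING_AGG subquery.
    cols = [row[0] for row in conn.execute(
        "SELECT DISTINCT Question FROM responses "
        "WHERE Question LIKE '%Color%' ORDER BY Question")]
    # Step 2: one conditional aggregate per question stands in for BigQuery's PIVOT.
    pivot_list = ["UserID"] + [
        f"MAX(CASE WHEN Question = ? THEN Answer END) AS {quote_ident(c)}" for c in cols]
    select_list = ["r.UserID", "r.Question", "r.Answer"] + [
        f"d.{quote_ident(c)}" for c in cols]
    # Step 3: left-join the pivoted demographics onto the remaining questions.
    sql = (
        f"SELECT {', '.join(select_list)} FROM responses AS r "
        f"LEFT JOIN (SELECT {', '.join(pivot_list)} FROM responses GROUP BY UserID) AS d "
        "ON r.UserID = d.UserID "
        "WHERE r.Question NOT LIKE '%Color%' "
        "ORDER BY r.UserID, r.Question")
    # The ? placeholders appear only in the pivot subquery, in the same order as cols.
    return conn.execute(sql, cols).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (UserID INTEGER, Question TEXT, Answer TEXT)")
conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", [
    (1, "Eye_Color", "Brown"), (1, "Hair_Color", "Black"), (1, "Salary", "$100"),
    (1, "Car", "QM5"), (2, "Eye_Color", "Blue"), (2, "Hair_Color", "Blond"),
    (2, "Salary", "$150")])
rows = pivot_color_questions(conn)
print(rows)
```

As in the answer, the column list is discovered first and the SQL is generated from it. Question names are inlined as quoted identifiers, so in real use they should come from trusted data.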
See also:
EXECUTE IMMEDIATE - https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#execute_immediate
%%Color%%: inside FORMAT(), a literal % must be written as %%; see "PARSE_DATE not working in FORMAT() in BigQuery"
This question has probably been asked somewhere, but I can't seem to phrase the search well enough to find an accurate answer.
I'm querying a Postgres DB with quite a few joins, and the results look something like this:
WON | name | item
1 | Joe | A
1 | Joe | B
2 | Smith | A
So there is one row per entry, and I need to get the result back like this instead:
WON | name | item
1 | Joe | A, B
2 | Smith | A
This can be done either in the query or in NodeJS. The query returns hundreds to thousands of rows, so fetching one distinct WON (e.g. WON 1), searching the DB for all entries that match it, and repeating for the rest isn't feasible. This may be better done in Node/JavaScript, but I'm somewhat new to it: what would be a (somewhat) efficient way to do this?
If there IS a way to do this in the query itself then that would be my preference though.
Thanks
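The client-side option can be done in a single pass over the result set, with no extra query per WON. The asker mentions Node. This sketch uses Python only to keep the examples in one language, and the function name `group_items` is hypothetical. The same dictionary-of-lists idea carries over to JavaScript.

```python
def group_items(rows):
    # rows: (won, name, item) tuples in the shape the joined query returns.
    grouped = {}  # (won, name) -> list of items; dicts preserve insertion order
    for won, name, item in rows:
        grouped.setdefault((won, name), []).append(item)
    return [(won, name, ", ".join(items)) for (won, name), items in grouped.items()]

rows = [(1, "Joe", "A"), (1, "Joe", "B"), (2, "Smith", "A")]
print(group_items(rows))  # [(1, 'Joe', 'A, B'), (2, 'Smith', 'A')]
```

Each row is visited once, so the cost grows linearly with the size of the result set.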
A SQL approach:
SELECT won, name
,STRING_AGG(item, ',' ORDER BY item) AS items
FROM myTable
GROUP BY won, name
ORDER BY won, name
You can use GROUP BY and string_agg to concatenate rows, something like this:
Create table:
CREATE TABLE test
(
won int,
name character varying(255),
item character varying(255)
);
insert into test (won, name, item) values (1,'Joe', 'A'),(1, 'Joe', 'B'),(2, 'Smith', 'A');
And do this in the query:
select won, name, string_agg(item, ',') from test group by won, name order by won
See this example in sqlFiddle
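The following is a runnable approximation of this answer. It assumes SQLite (through Python's sqlite3) as a stand-in for PostgreSQL, with SQLite's `group_concat(X, separator)` playing the role of `string_agg`. Unlike the PostgreSQL version with ORDER BY, SQLite does not guarantee the order of items inside each list, so the check below is order-insensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (won INTEGER, name TEXT, item TEXT)")
conn.executemany("INSERT INTO test (won, name, item) VALUES (?, ?, ?)",
                 [(1, "Joe", "A"), (1, "Joe", "B"), (2, "Smith", "A")])
# group_concat(item, ',') concatenates the items of each (won, name) group.
rows = conn.execute(
    "SELECT won, name, group_concat(item, ',') AS items "
    "FROM test GROUP BY won, name ORDER BY won").fetchall()
for won, name, items in rows:
    print(won, name, items)
```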
I have a table that looks like:
Event ID | Name
1 | Bob
1 | Steve
1 | Tom
2 | Bob
3 | Steve
3 | Tom
There are thousands of event IDs, and tens of unique names. I'd like an SQL query to return the following table:
Event ID | Names
1 | Bob, Steve, Tom
2 | Bob
3 | Steve, Tom
I'm looking for an aggregate function like SUM() or AVG(), except that it concatenates strings instead of doing arithmetic.
EDIT: I'm stuck using MS Access on this one.
EDIT 2: I realize that this would be trivial in a client language, but I'm trying to see if I can get an all-SQL solution.
If you are using SQL Server, you could do something like this:
SQL Fiddle Example
WITH x AS
(
SELECT event_id FROM users
GROUP BY event_id
)
SELECT x.event_id,
name = STUFF((SELECT ',' + name
FROM users WHERE event_id = x.event_id
FOR XML PATH('')), 1, 1, '')
FROM x
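To illustrate what the STUFF/FOR XML PATH idiom does, here is a Python sketch that mimics it step by step. The correlated subquery yields a string like `,Bob,Steve,Tom` for an event, and `STUFF(s, 1, 1, '')` deletes the single leading comma. The helper names `stuff` and `names_per_event` are hypothetical. Sorting the names is an added assumption for deterministic output, because the T-SQL subquery has no ORDER BY.

```python
from itertools import groupby

def stuff(s, start, length, replacement):
    # Mimics T-SQL STUFF: delete `length` characters at 1-based `start`, insert `replacement`.
    return s[:start - 1] + replacement + s[start - 1 + length:]

def names_per_event(rows):
    # rows: (event_id, name) pairs; sorting groups them by event and orders the names.
    result = []
    for event_id, group in groupby(sorted(rows), key=lambda r: r[0]):
        # Like SELECT ',' + name ... FOR XML PATH(''): e.g. ",Bob,Steve,Tom"
        concatenated = "".join("," + name for _, name in group)
        # STUFF(concatenated, 1, 1, '') strips the leading comma.
        result.append((event_id, stuff(concatenated, 1, 1, "")))
    return result

events = [(1, "Bob"), (1, "Steve"), (1, "Tom"), (2, "Bob"), (3, "Steve"), (3, "Tom")]
print(names_per_event(events))  # [(1, 'Bob,Steve,Tom'), (2, 'Bob'), (3, 'Steve,Tom')]
```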
You don't mention which DBMS you are using but in MySQL it's pretty easy:
SELECT EventId, GROUP_CONCAT(Name)
FROM MyTable
GROUP BY EventId
In SQL Server it's a little trickier. The solution I typically see requires FOR XML PATH. You can find a good article on how to do this here.
In PostgreSQL this would be:
select EventId, string_agg(name, ',') as names
from the_table
group by EventId;
If you want the names sorted in the list:
select EventId, string_agg(name, ',' order by name) as names
from the_table
group by EventId
So I have 3 tables: Recommendation, Article and User.
Recommendation has 4 columns:
id | integer
article_id | integer
user_id | integer
submit_time | integer
Article has 3 columns:
id | integer
title
url
I need to obtain a list of all articles, while also annotating each row with a new recommended column, which is 1 if the user in question has recommended the article or 0 if not. There shouldn't be any duplicate Article in the result, and I need it ordered by the Recommendation's submit_time column.
This is on PostgreSQL 9.1.8.
SELECT DISTINCT ON(t.title) t.title,
t.id, t.url,
MAX(recommended) as recommended
FROM (
SELECT submitter_article.title as title,
submitter_article.id as id,
submitter_article.url as url,
1 as recommended
FROM submitter_article, submitter_recommendation
WHERE submitter_recommendation.user_id=?
AND submitter_recommendation.article_id=submitter_article.id
UNION ALL
SELECT submitter_article.title as title,
submitter_article.id as id,
submitter_article.url as url,
0 as recommended
FROM submitter_article
) as t
GROUP BY t.title, t.id, t.url, recommended
I'm passing a user ID as the ? parameter.
I've been trying to do this for a while but can't figure it out. The queries I come up with either return all recommended values as 0, or return duplicate Article rows (one with recommended=0 and the other with recommended=1).
Any ideas?
You don't need a subquery; CASE will do. DISTINCT ON is useless if you also use GROUP BY, and you should use explicit joins instead of implicit joins. This query should get you started:
SELECT DISTINCT ON (sa.title) sa.title, sa.id, sa.url,
(CASE
WHEN sr.id IS NULL THEN 0
ELSE 1
END) AS recommended
FROM submitter_article AS sa
LEFT JOIN submitter_recommendation AS sr ON sa.id=sr.article_id
AND sr.user_id=?
ORDER BY sa.title,sr.submit_time DESC;
But there are still some things I'm not sure about. Can you have two articles with the same title but different IDs? In that case you could select the one with the earlier or later recommendation submit_time, but what if there are no recommendations? You need logic for how to select distinct rows and for how to order things at the end.
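The key point of the query above is that `sr.user_id = ?` sits in the ON clause of the LEFT JOIN rather than in WHERE. Below is a minimal sketch contrasting the two placements, assuming Python's sqlite3 as a stand-in for PostgreSQL. SQLite has no DISTINCT ON, so the sketch assumes at most one recommendation per user and article. The sample rows and the function names `recommended_for` and `where_filtered` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submitter_article (id INTEGER PRIMARY KEY, title TEXT, url TEXT);
    CREATE TABLE submitter_recommendation (
        id INTEGER PRIMARY KEY, article_id INTEGER, user_id INTEGER, submit_time INTEGER);
    INSERT INTO submitter_article VALUES (1, 'A', '/a'), (2, 'B', '/b'), (3, 'C', '/c');
    INSERT INTO submitter_recommendation VALUES (1, 2, 7, 100), (2, 3, 7, 200), (3, 1, 8, 300);
""")

FLAG = "CASE WHEN sr.id IS NULL THEN 0 ELSE 1 END"

def recommended_for(conn, user_id):
    # User filter in ON: articles this user did not recommend survive with recommended = 0.
    return conn.execute(
        f"SELECT sa.id, sa.title, {FLAG} AS recommended "
        "FROM submitter_article AS sa "
        "LEFT JOIN submitter_recommendation AS sr "
        "ON sa.id = sr.article_id AND sr.user_id = ? "
        "ORDER BY sr.submit_time DESC, sa.title", (user_id,)).fetchall()

def where_filtered(conn, user_id):
    # Same filter in WHERE: NULL-extended rows fail it, so the LEFT JOIN acts like an inner join.
    return conn.execute(
        f"SELECT sa.id, sa.title, {FLAG} AS recommended "
        "FROM submitter_article AS sa "
        "LEFT JOIN submitter_recommendation AS sr ON sa.id = sr.article_id "
        "WHERE sr.user_id = ? "
        "ORDER BY sr.submit_time DESC, sa.title", (user_id,)).fetchall()

print(recommended_for(conn, 7))  # [(3, 'C', 1), (2, 'B', 1), (1, 'A', 0)]
print(where_filtered(conn, 7))   # [(3, 'C', 1), (2, 'B', 1)]
```

SQLite treats NULL as smaller than any value, so unrecommended articles sort last under DESC. PostgreSQL sorts NULLs first under DESC by default, so add NULLS LAST there if you want the same order.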
I have a table which has the following columns and values
ID TYPE NAME
1 MAJOR RAM
2 MAJOR SHYAM
3 MAJOR BHOLE
4 MAJOR NATHA
5 MINOR JOHN
6 MINOR SMITH
My requirement is to write a stored procedure (or SQL query) that returns the same result set, except that there is a blank line after the TYPE changes from one type to another (MAJOR, MINOR).
MAJOR RAM
MAJOR SHYAM
MAJOR BHOLE
MAJOR NATHA

MINOR JOHN
MINOR SMITH
I use this query to add the blank line, but the result is not sorted by ID:
select TYPE, NAME from (
select
TYPE as P1,
1 as P2,
ID,
TYPE,
NAME
from EMP
union all
select distinct
TYPE,
2,
'',
'',
N''
from EMP
) Report
order by P1, P2
go
How can I sort the data by ID?
Thanks in advance
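One possible fix for the query above is to keep P1 and P2 as the leading sort keys, add ID as the last one, and give the separator rows a NULL ID. This is sketched here with Python's sqlite3 as a stand-in for SQL Server, reusing the question's EMP rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ID INTEGER, TYPE TEXT, NAME TEXT)")
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [
    (1, "MAJOR", "RAM"), (2, "MAJOR", "SHYAM"), (3, "MAJOR", "BHOLE"),
    (4, "MAJOR", "NATHA"), (5, "MINOR", "JOHN"), (6, "MINOR", "SMITH")])
rows = conn.execute("""
    SELECT TYPE, NAME FROM (
        SELECT TYPE AS P1, 1 AS P2, ID, TYPE, NAME FROM EMP
        UNION ALL
        -- one separator row per TYPE; a NULL ID keeps the column numeric
        SELECT DISTINCT TYPE, 2, NULL, '', '' FROM EMP
    ) AS Report
    ORDER BY P1, P2, ID  -- ID as the last key restores the original row order
""").fetchall()
for row in rows:
    print(row)
```

This relies on MAJOR sorting before MINOR alphabetically. If the groups should instead appear in the order of their first ID, P1 would need to be replaced by a per-group key such as the group's MIN(ID).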
Yes, yes, don't do this, but here's the query to do it, assuming SQL Server 2008 R2. On other versions or RDBMSs, you can achieve the same functionality by writing two separate queries combined with UNION ALL.
Query
; WITH DEMO (id, [type], [name]) AS
(
SELECT 1,'MAJOR','RAM'
UNION ALL SELECT 2,'MAJOR','SHYAM'
UNION ALL SELECT 3,'MAJOR','BHOLE'
UNION ALL SELECT 4,'MAJOR','NATHA'
UNION ALL SELECT 5,'MINOR','JOHN'
UNION ALL SELECT 6,'MINOR','SMITH'
)
, GROUPED AS
(
SELECT
D.[type]
, D.[name]
, ROW_NUMBER() OVER (ORDER BY D.[type] ASC, D.[name] DESC) AS order_key
FROM
DEMO D
GROUP BY
--grouping sets were introduced in SQL Server 2008
-- http://msdn.microsoft.com/en-us/library/bb510427.aspx
GROUPING SETS
(
[type]
, ([type], [name])
)
)
SELECT
CASE WHEN G.[name] IS NULL THEN NULL ELSE G.[type] END AS [type]
, G.[name]
FROM
GROUPED G
ORDER BY
G.order_key
Results
If you don't like the NULLs, use COALESCE to turn them into empty strings.
type name
MAJOR SHYAM
MAJOR RAM
MAJOR NATHA
MAJOR BHOLE
NULL NULL
MINOR SMITH
MINOR JOHN
NULL NULL
I agree with billinkc.
But to a sequential mind like mine, a different approach comes naturally.
The approach is to use a cursor and insert the records into a temp table.
This table can have an INT column, let's say called "POSITION", that increments with every insert.
Check whether the grouping value (here, TYPE) changes, and add an empty row every time it does.
Finally make the SELECT order by "POSITION".
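The cursor-plus-POSITION steps above can be sketched in plain Python. The sketch walks the rows in ID order, appends a separator row whenever TYPE changes, and numbers every appended row so that the final read can order by POSITION. A Python list stands in for the temp table, and the function name `with_blank_rows` is hypothetical.

```python
def with_blank_rows(rows):
    # rows: (id, type, name) tuples; the list `temp` stands in for the temp table.
    temp = []
    position = 0
    previous_type = None
    for _id, row_type, name in sorted(rows):  # cursor ordered by ID
        if previous_type is not None and row_type != previous_type:
            position += 1
            temp.append((position, "", ""))  # empty row when the group changes
        position += 1
        temp.append((position, row_type, name))
        previous_type = row_type
    # Final SELECT ... ORDER BY POSITION
    return [(row_type, name) for _, row_type, name in sorted(temp)]

emp = [(1, "MAJOR", "RAM"), (2, "MAJOR", "SHYAM"), (3, "MAJOR", "BHOLE"),
       (4, "MAJOR", "NATHA"), (5, "MINOR", "JOHN"), (6, "MINOR", "SMITH")]
for row in with_blank_rows(emp):
    print(row)
```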
My context was:
An interface that dynamically adjusts to what the user needs; one of its screens shows a payment table grouped by provider, built with the approach mentioned above.
I decided to manage this from the database and skip maintenance of the screen on the client side, because every provider has different payment terms.
Hope this helps, and let's keep an open mind and avoid saying "don't do this" or "this is not what SQL was designed for".
OK, so I am writing a report against a third-party database in SQL Server 2005. For the most part it's normalized, except for one field in one table. They have a table of users (which includes groups). This table has a UserID field (PK), an IsGroup field (bit), and a Members field (text). The Members field holds a comma-separated list of all the members of the group or, if the row is not a group, a comma-separated list of the groups that member belongs to.
The question is: what is the best way to write a stored procedure that shows which users are in which groups? I have a function that parses the IDs out into a table. The best way I could come up with was to create a cursor that cycles through each group, parses out the user IDs, writes them to a temp table (with the group ID), and then selects from the temp table. Is that the best approach?
UserTable
Example:
ID|IsGroup|Name|Members
1|True|Admin|3
2|True|Power|3,4
3|False|Bob|1,3
4|False|Susan|2
5|True|Normal|6
6|False|Bill|5
I want my query to show:
GroupID|UserID
1|3
2|3
2|4
5|6
Hope that makes sense...
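The cursor approach described in the question can be sketched in Python. The sketch loops over the group rows, splits each Members string, and collects (GroupID, UserID) pairs in a list that stands in for the temp table. `str.split` stands in for the asker's parsing function, and the function name `group_memberships` is hypothetical.

```python
def group_memberships(user_table):
    # user_table rows: (id, is_group, name, members); the list stands in for the temp table.
    temp = []
    for row_id, is_group, _name, members in user_table:  # "cursor" over the rows
        if not is_group:
            continue  # only group rows list their members
        for part in members.split(","):  # plays the role of the ID-parsing function
            if part.strip():
                temp.append((row_id, int(part)))
    return sorted(temp)  # final SELECT from the temp table

user_table = [
    (1, True, "Admin", "3"), (2, True, "Power", "3,4"), (3, False, "Bob", "1,3"),
    (4, False, "Susan", "2"), (5, True, "Normal", "6"), (6, False, "Bill", "5"),
]
print(group_memberships(user_table))  # [(1, 3), (2, 3), (2, 4), (5, 6)]
```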
If you have (or could create) a separate table containing the groups, you could join it to the users table and match rows with the CHARINDEX function, padding your data with commas on both sides. I would test the performance of this method with some fairly extreme workloads before deploying it. However, it does have the advantage of being self-contained and simple. Note that rewriting the example as a cross join with a WHERE clause produces exactly the same execution plan as this one.
Example with data:
SELECT *
FROM (SELECT 1 AS ID,
'1,2,3' AS MEMBERS
UNION
SELECT 2,
'2'
UNION
SELECT 3,
'3,1'
UNION
SELECT 4,
'2,1') USERS
LEFT JOIN (SELECT '1' AS MEMBER
UNION
SELECT '2'
UNION
SELECT '3'
UNION
SELECT '4') GROUPS
ON CHARINDEX(',' + GROUPS.MEMBER + ',',',' + USERS.MEMBERS + ',') > 0
Results:
id members member
1 1,2,3 1
1 1,2,3 2
1 1,2,3 3
2 2 2
3 3,1 1
3 3,1 3
4 2,1 1
4 2,1 2
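Here is a runnable version of the comma-padding trick, applied to the UserTable data from the question. It assumes SQLite via Python's sqlite3 as a stand-in for SQL Server. SQLite's `instr(haystack, needle)` replaces `CHARINDEX(needle, haystack)` (note the swapped argument order). Instead of a separate groups table, this sketch joins the users table to itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, is_group INTEGER, name TEXT, members TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, 1, "Admin", "3"), (2, 1, "Power", "3,4"), (3, 0, "Bob", "1,3"),
    (4, 0, "Susan", "2"), (5, 1, "Normal", "6"), (6, 0, "Bill", "5")])
# Padding both sides with commas makes ',3,' match inside ',3,4,' but not inside ',13,'.
pairs = conn.execute("""
    SELECT g.id AS group_id, u.id AS user_id
    FROM users AS g
    JOIN users AS u
      ON instr(',' || g.members || ',', ',' || u.id || ',') > 0
    WHERE g.is_group = 1 AND u.is_group = 0
    ORDER BY g.id, u.id
""").fetchall()
print(pairs)  # [(1, 3), (2, 3), (2, 4), (5, 6)]
```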
Your technique will probably be the best method.