Transpose Rows and Columns after Aggregate Query - sql

I have a database which is a list of respondents to a 54-question survey in the following format...
Respondent ID  Q_1     Q_2  Q_3     ...  Q_54
1              5       3    [null]  ...  2
...
3000           [null]  3    3       ...  5
...and I have an aggregate query to get the count of respondents to each question...
Count_Q_1  Count_Q_2  Count_Q_3  ...  Count_Q_54
1547       602        2999       ...  1874
...and I am looking for a way to transpose the columns in the above query to get the following result...
Question Count_Respondents
Q_1 1547
Q_2 602
Q_3 2999
... ...
Q_54 1874
...is there any way for me to do this without 54 UNION queries (or multiple blocks of UNION queries that roll up to a master UNION query)?

No, there is not. Your data should have been normalized in the first place; Access doesn't support UNPIVOT or anything similar to it.
You can unpivot the data with many UNION queries, with VBA, or by moving the data to Excel, SQL Server, or another program and unpivoting it there.
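Since the UNION route is mechanical, the 54 branches do not have to be typed by hand. The following is a minimal sketch, assuming a hypothetical table name `Survey` and the `Q_1` ... `Q_54` column naming from the question: it generates the query text, which could then be pasted into a query editor. The demo runs the generated SQL against an in-memory SQLite table with only three questions to show the output shape; it is a stand-in, not Access itself.

```python
import sqlite3

def build_unpivot_sql(table: str, n_questions: int) -> str:
    """Build one SELECT per question column and join them with UNION ALL."""
    branches = [
        f"SELECT 'Q_{i}' AS Question, COUNT(Q_{i}) AS Count_Respondents FROM {table}"
        for i in range(1, n_questions + 1)
    ]
    return "\nUNION ALL\n".join(branches)

# Tiny stand-in for the survey table (hypothetical name and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Survey (RespondentID INTEGER, Q_1 INTEGER, Q_2 INTEGER, Q_3 INTEGER)"
)
conn.executemany(
    "INSERT INTO Survey VALUES (?, ?, ?, ?)",
    [(1, 5, 3, None), (2, None, 3, 3), (3, 4, None, None)],
)

# COUNT(column) skips NULLs, so each branch counts actual respondents.
sql = build_unpivot_sql("Survey", 3)
rows = conn.execute(sql).fetchall()
for question, count in rows:
    print(question, count)
```

Calling `build_unpivot_sql("Survey", 54)` would produce the full 54-branch statement for the real table.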

Related

Filling in missing column details in SQL table when similar rows are populated?

I have a (probably) very easy question about a SQL Server data issue. I have some test data with missing Customer IDs in certain rows, but I know that when the Details column is the same, the ID will be the same.
For example, the Customer for row 6 should be 3, since it has the same Details as rows 4 and 5.
Customer   Details  Date        Amount
1          40495BS  15/01/2022  300
1          40495BS  10/02/2022  250
2          83825NO  31/10/2021  100
3          90401HI  01/06/2022  525
3          90401HI  07/09/2022  130
(missing)  90401HI  09/05/2022  -130
4          17452RE  14/07/2022  125
Any ideas for a fix to return all the missing Customer IDs based on this logic?
Actually MAX() used as an analytic function might work well here:
SELECT MAX(Customer) OVER (PARTITION BY Details) AS Customer,
Details, Date, Amount
FROM yourTable;
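
As a rough check of the idea, here is a small sketch that loads the sample rows into an in-memory SQLite database and runs the same windowed `MAX()`. It assumes the missing Customer value is stored as NULL (which `MAX()` ignores) and that the SQLite build supports window functions (3.25 or later); SQL Server itself is not involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (Customer INTEGER, Details TEXT, Date TEXT, Amount INTEGER)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        (1, "40495BS", "15/01/2022", 300),
        (1, "40495BS", "10/02/2022", 250),
        (2, "83825NO", "31/10/2021", 100),
        (3, "90401HI", "01/06/2022", 525),
        (3, "90401HI", "07/09/2022", 130),
        (None, "90401HI", "09/05/2022", -130),  # assumed to be NULL
        (4, "17452RE", "14/07/2022", 125),
    ],
)

# MAX() over each Details partition copies the known ID onto every row.
filled = conn.execute(
    """
    SELECT MAX(Customer) OVER (PARTITION BY Details) AS Customer,
           Details, Date, Amount
    FROM yourTable
    ORDER BY Details, Date
    """
).fetchall()
for row in filled:
    print(row)
```

If the blanks were stored as empty strings rather than NULL, they would need converting first (for example with `NULLIF`), since `MAX()` only skips NULLs.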

SUM by Combination in SAS

I want to get from this table:
[ProductCode] [ClientNO] [Fund]
11 3 100
12 4 45
11 3 18
12 4 5
To this one:
[ProductCode] [ClientNO] [Fund]
11 3 118
12 4 50
So basically sum FUND when all the given variables match.
I'm almost there with this statement:
Proc sql;
create table SumByCombination as
select *, sum(Fund) as Total
from FundsData
group by ProductCode,ClientNO
;
quit;
But with this I get all the rows (duplicates) with a SUM column.
Edit: This is what I get.
[ProductCode] [ClientNO] [_SUM_]
11 3 118
12 4 50
11 3 118
12 4 50
I know this should be a no-brainer but I keep getting stuck.
What would be the easiest way to do this in PROC SQL? What about other methods?
Thanks
Stop using SELECT * in your queries. You should explicitly identify the columns that you want the SELECT to return.
SELECT * should very rarely, if ever, be used. Here it pulls in Fund, which is not in the GROUP BY, so PROC SQL remerges the sum back onto every original row instead of collapsing the groups.
This query returns your expected result:
select ProductCode
,ClientNO
,sum(Fund) as Total
from FundsData
group by
ProductCode
,ClientNO
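
To see the difference on the sample data, here is a short sketch that runs the grouped query against an in-memory SQLite copy of `FundsData`. SQLite stands in for PROC SQL here, so it only illustrates the grouping, not SAS's remerge behaviour; the `ORDER BY` is added purely to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FundsData (ProductCode INTEGER, ClientNO INTEGER, Fund INTEGER)")
conn.executemany(
    "INSERT INTO FundsData VALUES (?, ?, ?)",
    [(11, 3, 100), (12, 4, 45), (11, 3, 18), (12, 4, 5)],
)

# Only the grouping columns and the aggregate appear in the select list,
# so each (ProductCode, ClientNO) combination yields exactly one row.
totals = conn.execute(
    """
    SELECT ProductCode, ClientNO, SUM(Fund) AS Total
    FROM FundsData
    GROUP BY ProductCode, ClientNO
    ORDER BY ProductCode, ClientNO
    """
).fetchall()
for row in totals:
    print(row)
```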
You're using SAS, so do it the SAS way - PROC MEANS.
proc means data=fundsdata;
var fund;
class productcode clientno;
types productcode*clientno;
output out=sumbycombination sum(fund)=fund;
run;

Performing Comparisons "down" a table, not across rows

I have a SQL problem that I don't have the vocabulary to explain very well.
Here is a simplified table. My goal is to identify groups where the Tax_IDs are not equal. In this case, the query should return groups 1 and 3.
Group Line_ID Tax_ID
1 1001 11
1 1002 13
2 1003 17
2 1004 17
3 1005 23
3 1006 29
I can easily perform comparisons across rows, however I do not know how to perform comparisons "down" a table (here is really where my vocabulary fails me). I.e. what is the syntax that will cause SQL to compare Tax_ID values within groups?
Any help appreciated,
OB
The simplest way is to use group by with a having clause:
select "group"
from t
group by "group"
having min(tax_id) <> max(tax_id);
You can also phrase the having clause as:
having count(distinct tax_id) > 1;
However, count(distinct) is more expensive than just a min() or max() operation.
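
Both phrasings can be tried on the sample table with a small sketch like the one below. It assumes the table is named `t` as in the answer and uses an in-memory SQLite database as a stand-in for whatever engine is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a reserved word, so it is quoted as in the answer.
conn.execute('CREATE TABLE t ("group" INTEGER, line_id INTEGER, tax_id INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1001, 11), (1, 1002, 13), (2, 1003, 17),
     (2, 1004, 17), (3, 1005, 23), (3, 1006, 29)],
)

# A group has differing Tax_IDs exactly when its smallest and largest differ.
by_min_max = [r[0] for r in conn.execute(
    'SELECT "group" FROM t GROUP BY "group" '
    'HAVING MIN(tax_id) <> MAX(tax_id) ORDER BY "group"'
)]

# Equivalent phrasing with COUNT(DISTINCT ...).
by_distinct = [r[0] for r in conn.execute(
    'SELECT "group" FROM t GROUP BY "group" '
    'HAVING COUNT(DISTINCT tax_id) > 1 ORDER BY "group"'
)]

print(by_min_max, by_distinct)
```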

Cartesian product and WHERE clause issue

Take a simple table:
CREATE TABLE dbo.test
(
c1 INT
)
INSERT INTO test (c1) VALUES (1)
INSERT INTO test (c1) VALUES (2)
INSERT INTO test (c1) VALUES (3)
Next, calculate a SUM:
SELECT SUM(t1.c1) FROM test AS t1 , test AS t2
WHERE t2.c1 = 1
The output is 6. Simple and easy.
But if I run:
SELECT SUM(t1.c1), * FROM test AS t1 , test AS t2
WHERE t2.c1 = 1
The output is:
6 2 2
6 2 3
6 2 1
6 3 2
6 3 3
6 3 1
6 1 2
6 1 3
6 1 1
My question is: why does the second output not match the condition in the WHERE clause?
Looks like Sybase implements its own extensions to GROUP BY:
Through the following extensions, Sybase lifts restrictions on what
you can include or omit in the select list of a query that includes
group by.
The columns in the select list are not limited to the grouping columns
and columns used with the vector aggregates.
The columns specified by group by are not limited to those
non-aggregate columns in the select list.
However, the results of the extension are not always intuitive:
When you use the Transact-SQL extensions in complex queries that
include the where clause or joins, the results may become even more
difficult to understand.
How does this relate to your problem?
However, the way that Adaptive Server handles extra columns in the
select list and the where clause may seem contradictory. For example:
select type, advance, avg(price)
from titles
where advance > 5000
group by type
type advance
------------- --------- --------
business 5,000.00 2.99
business 5,000.00 2.99
business 10,125.00 2.99
business 5,000.00 2.99
mod_cook 0.00 2.99
mod_cook 15,000.00 2.99
popular_comp 7,000.00 21.48
popular_comp 8,000.00 21.48
popular_comp NULL 21.48
psychology 7,000.00 14.30
psychology 2,275.00 14.30
psychology 6,000.00 14.30
psychology 2,000.00 14.30
psychology 4,000.00 14.30
trad_cook 7,000.00 17.97
trad_cook 4,000.00 17.97
trad_cook 8,000.00 17.97
(17 rows affected)
It only seems as if the query is ignoring the where clause when you
look at the results for the advance (extended) column. Adaptive Server
still computes the vector aggregate using only those rows that satisfy
the where clause, but it also displays all rows for any extended
columns that you include in the select list. To further restrict these
rows from the results, you must use a having clause.
So, to give you the results you would expect, Sybase should allow you to do:
SELECT SUM(t1.c1), * FROM test AS t1 , test AS t2
WHERE t2.c1 = 1
HAVING t2.c1 = 1
The WHERE excludes non-matching rows from the SUM; the HAVING hides the displayed rows that don't match the condition.
Confusing, isn't it?
Instead, you'd probably be better off writing the query so that it doesn't require Sybase's GROUP BY extensions.
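
One way to avoid the extension, sketched here as an assumption rather than as the only option, is to compute the total in a scalar subquery so that the outer query has no aggregate at all and its WHERE clause filters the displayed rows in the usual way. The demo uses an in-memory SQLite database as a stand-in for Sybase; the aliases `a` and `b` and the column alias `total` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (c1 INTEGER)")
conn.executemany("INSERT INTO test (c1) VALUES (?)", [(1,), (2,), (3,)])

# The subquery produces the single total; the outer query is a plain
# (non-aggregate) join, so WHERE restricts the rows that are shown.
result = conn.execute(
    """
    SELECT (SELECT SUM(a.c1)
            FROM test AS a, test AS b
            WHERE b.c1 = 1) AS total,
           t1.c1, t2.c1
    FROM test AS t1, test AS t2
    WHERE t2.c1 = 1
    ORDER BY t1.c1
    """
).fetchall()
for row in result:
    print(row)
```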

Need help with complex sorting in SQL

I have a complex sorting problem with my SQL statement. I have a table with the following columns.
No Time Value
-- ---- -----
1 0900 ''
2 1030 ''
3 1020 ''
4 1010 ''
5 1100 ''
1 1015 'P'
2 1045 'P'
I want to sort this table by doing the following steps.
1) Select rows from the table where Value is '' (empty string) and sort them by No.
2) Select rows from the same table where Value is 'P' and sort them by Time.
3) Select each row from 2) and insert it into 1) by time.
The result should be something like this.
No Time Value
-- ---- -----
1 0900 ''
1 1015 'P'
2 1030 ''
3 1020 ''
4 1010 ''
2 1045 'P'
5 1100 ''
How can I do this in SQL?
Edit: thanks for comments.
On rereading, I don't think part 3 of your question makes sense. The result from step 1) is not sorted by time, so you cannot insert into it by time.
For example, in your example result, the second row has time 1015, which is between 0900 and 1030. But it could also go between the 1020 and 1010 rows further on?
Unfortunately, I don't think you can do this with a standard SQL query, and the reason is that your algorithm is not set-oriented. Your sample dataset illustrates this -- you have the first 'P' record showing up between the 0900 and 1030 records, but it would be just as appropriate to put it between the 1010 and 1045 records based on your criteria. If it's correct to have it in the position you show, you need to modify your condition to be something like "place each row from #2 between the first two rows in #1 that bracket it in time", where "first" is defined by the sorting criteria of #1.
The upshot is that this type of setup will likely force you into a cursor-based solution. You might be able to avoid this if you can identify a composite value to order upon, but based on what you have above I don't see what that might be.
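The row-by-row approach can be sketched outside the database. The following is only an illustration of the "first two rows that bracket it in time" rule under stated assumptions: times are zero-padded strings (so string comparison matches time order), a 'P' row that no adjacent pair brackets is appended at the end, and the function name `merge_by_bracket` is made up for the example.

```python
def merge_by_bracket(base, extras):
    """Insert each extra row after the first adjacent pair of base rows
    whose times bracket it. Rows are (No, Time, Value) tuples."""
    base = sorted(base, key=lambda r: r[0])      # step 1: sort by No
    extras = sorted(extras, key=lambda r: r[1])  # step 2: sort by Time
    slots = [[] for _ in base]                   # extras to emit after base[i]
    tail = []                                    # extras that nothing brackets
    for row in extras:
        for i in range(len(base) - 1):
            if base[i][1] <= row[1] <= base[i + 1][1]:
                slots[i].append(row)
                break
        else:
            tail.append(row)  # assumption: unbracketed rows go last
    merged = []
    for i, row in enumerate(base):
        merged.append(row)
        merged.extend(slots[i])
    return merged + tail

base_rows = [(1, "0900", ""), (2, "1030", ""), (3, "1020", ""),
             (4, "1010", ""), (5, "1100", "")]
p_rows = [(1, "1015", "P"), (2, "1045", "P")]

merged = merge_by_bracket(base_rows, p_rows)
for row in merged:
    print(row)
```

On the sample data this reproduces the ordering shown in the question, which suggests that rule is the intended one, though the question does not state it explicitly.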
It seems like you can express your need in a simpler way.
You always sort by time, then possibly by Value, then by No:
ORDER BY Time, Value, No
This sorts everything by time. For two identical times, the sort on Value is applied, and so on.
Alternatively, you could sort on a single number built by combining your criteria, but that would be more complex.