Treat Multiple Columns as 1 in SQL to Get Aggregates - sql

Is it possible to get counts, too? My UNIONs get me all distinct values across all 4 columns but now I need to know how many times each value appears across all 4 columns. Need to stay with stock SQL, if possible.
(SELECT DISTINCT classify1 AS classified FROM class) UNION
(SELECT DISTINCT classify2 AS classified FROM class) UNION
(SELECT DISTINCT classify3 AS classified FROM class) UNION
(SELECT DISTINCT classify4 AS classified FROM class)
ORDER BY classified
Returns:
A
B
C
D
E
F
H
Need:
A | 3
B | 3
C | 4
D | 3
E | 1
F | 1
H | 1

SQL Fiddle
SELECT a.classified, COUNT(*)
FROM
(
(SELECT classify1 AS classified FROM class) UNION ALL
(SELECT classify2 AS classified FROM class) UNION ALL
(SELECT classify3 AS classified FROM class) UNION ALL
(SELECT classify4 AS classified FROM class)) a
GROUP BY a.classified
Result
| CLASSIFIED | COLUMN_1 |
-------------------------
| A | 3 |
| B | 3 |
| C | 4 |
| D | 3 |
| E | 1 |
| F | 1 |
| H | 1 |
When you use DISTINCT you eliminate the extra 'A' in classify3

Use UNION ALL instead of UNION, the embed the result in a sub-query to perform your aggregate on.
SELECT
classified,
COUNT(*)
FROM
(
(SELECT classify1 AS classified FROM class) UNION ALL
(SELECT classify2 AS classified FROM class) UNION ALL
(SELECT classify3 AS classified FROM class) UNION ALL
(SELECT classify4 AS classified FROM class)
)
AS unified_data
GROUP BY
classified
ORDER BY
classified

Related

Multiple select from CTE with different number of rows in a StoredProcedure

How to do two select with joins from the cte's which returns total number of columns in the two selects?
I tried doing union but that appends to the same list and there is no way to differentiate for further use.
WITH campus AS
(SELECT DISTINCT CampusName, DistrictName
FROM dbo.file
),creditAcceptance AS
(SELECT CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal, COUNT(id) AS N
FROM dbo.file
WHERE (EligibilityStatusFinal LIKE 'Eligible%') AND (CollegeCreditEarnedFinal = 'Yes') AND (CollegeCreditAcceptedFinal = 'Yes')
GROUP BY CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal
),eligibility AS
(SELECT CampusName, EligibilityStatusFinal, COUNT(id) AS N, CollegeCreditAcceptedFinal
FROM dbo.file
WHERE (EligibilityStatusFinal LIKE 'Eligible%')
GROUP BY CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal
)
SELECT a.CampusName, c.[EligibilityStatusFinal], SUM(c.N) AS creditacceptCount
FROM campus as a FULL OUTER JOIN creditAcceptance as c ON a.CampusName=c.CampusName
WHERE (a.DistrictName = 'xy')
group by a.CampusName ,c.EligibilityStatusFinal
Union ALL
SELECT a.CampusName , b.[EligibilityStatusFinal], SUM(b.N) AS eligible
From Campus as a FULL OUTER JOIN eligibility as b ON a.CampusName = b.CampusName
WHERE (a.DistrictName = 'xy')
group by a.CampusName,b.EligibilityStatusFinal
Expected output:
+------------+------------------------+--------------------+
| CampusName | EligibilityStatusFinal | creditacceptCount |
+------------+------------------------+--------------------+
| M | G | 1 |
| E | NULL | NULL |
| A | G | 4 |
| B | G | 8 |
+------------+------------------------+--------------------+
+------------+------------------------+----------+
| CampusName | EligibilityStatusFinal | eligible |
+------------+------------------------+----------+
| A | G | 8 |
| C | G | 9 |
| A | T | 9 |
+------------+------------------------+----------+
As you can see here CTEs can be used in a single statement only, so you can't get the expected output with CTEs.
Here is an excerpt from Microsoft docs:
A CTE must be followed by a single SELECT, INSERT, UPDATE, or DELETE
statement that references some or all the CTE columns. A CTE can also
be specified in a CREATE VIEW statement as part of the defining SELECT
statement of the view.
You can use table variables (declare #campus table(...)) or temp tables (create table #campus (...)) instead.

SQL Match group of records to another group of records

Is there SQL statement to match up multiple records to an exact match of multiple records in another table?
Lets say I have table A
ID | List# | Item
1 | 5 | A
2 | 5 | C
3 | 5 | B
4 | 6 | A
5 | 6 | D
*I purposely made Items 'ABC' out of order as the order of the records I receive may be out of order.
Table B
ID | Group | Item
1 | AAA | A
2 | AAA | B
3 | AAA | C
4 | AAA | D
5 | BBB | A
6 | BBB | B
7 | BBB | C
8 | DDD | A
If looking at the first table, I would want List# 5 to return a match only for group 'BBB', as all (and only) three records match.
The simplest way is to aggregate into a string or array and join. Standard SQL supports listagg(), so you can do:
select a.list, b.list, a.items
from (select a.list, listagg(item, ',') within group (order by item) as items
from a
group by a.list
) a join
(select b.list, listagg(item, ',') within group (order by item) as items
from b
group by b.list
) b
on a.items = b.items;
Not all databases support listagg(). Many -- but not all -- have similar functionality. This is simpler than the "standard" SQL approach.
You can simulate database division. It's a little bit cumbersome but here it is:
with
x as (
select
from a
where a.list = 5
),
y as (
select grp, count(*) as cnt
from b
join x on x.item = b.item
group by grp
)
select grp
from y
where cnt = (select count(*) from x)

Redshift create all the combinations of any length for the values in one column

How can we create all the combinations of any length for the values in one column and return the distinct count of another column for that combination?
Table:
+------+--------+
| Type | Name |
+------+--------+
| A | Tom |
| A | Ben |
| B | Ben |
| B | Justin |
| C | Ben |
+------+--------+
Output Table:
+-------------+-------+
| Combination | Count |
+-------------+-------+
| A | 2 |
| B | 2 |
| C | 1 |
| AB | 3 |
| BC | 2 |
| AC | 2 |
| ABC | 3 |
+-------------+-------+
When the combination is only A, there are Tom and Ben so it's 2.
When the combination is only B, 2 distinct names so it's 2.
When the combination is A and B, 3 distinct names: Tom, Ben, Justin so it's 3.
I'm working in Amazon Redshift. Thank you!
NOTE: This answers the original version of the question which was tagged Postgres.
You can generate all combinations with this code
with recursive td as (
select distinct type
from t
),
cte as (
select td.type, td.type as lasttype, 1 as len
from td
union all
select cte.type || t.type, t.type as lasttype, cte.len + 1
from cte join
t
on 1=1 and t.type > cte.lasttype
)
You can then use this in a join:
with recursive t as (
select *
from (values ('a'), ('b'), ('c'), ('d')) v(c)
),
cte as (
select t.c, t.c as lastc, 1 as len
from t
union all
select cte.type || t.type, t.type as lasttype, cte.len + 1
from cte join
t
on 1=1 and t.type > cte.lasttype
)
select type, count(*)
from (select name, cte.type, count(*)
from cte join
t
on cte.type like '%' || t.type || '%'
group by name, cte.type
having count(*) = length(cte.type)
) x
group by type
order by type;
There is no way to generate all possible combinations (A, B, C, AB, AC, BC, etc) in Amazon Redshift.
(Well, you could select each unique value, smoosh them into one string, send it to a User-Defined Function, extract the result into multiple rows and then join it against a big query, but that really isn't something you'd like to attempt.)
One approach would be to create a table containing all possible combinations — you'd need to write a little program to do that (eg using itertools in Python). Then, you could join the data against that reasonably easy to get the desired result (eg IF 'ABC' CONTAINS '%A%').

Group by 3 columns: "Each group by expression must contain at least one column that is not an outer reference"

I know questions regarding this error message have been asked already, but I couldn't find any that really fit my problem.
I have a table with three columns (A,B,C) containing different values and I need to identify all the identical combination. For example out of "TABLE A" below:
| A | B | C |
| 1 | 2 | 3 |
| 1 | 3 | 3 |
| 1 | 2 | 3 |
| 2 | 2 | 2 |
| 1 | 3 | 3 |
... I would like too get "TABLE B" below:
| A | B | C | count |
| 1 | 2 | 3 | 1 |
| 1 | 3 | 3 | 1 |
| 2 | 2 | 2 | 1 |
(I need the last column "count" with 1 in each row for later usage)
When I try with "group by A,B,C" I get the error mentioned in the title. Any help would be greatly appreciated!
FYI, I don't think it really changes the matter, but "TABLE A" is obtained from an other table: "SOURCE_TABLE", thanks to a query of the type:
select (case when ... ),(case when ...),(case when ...) from SOURCE_TABLE
and I need to build "TABLE B" with only one query.
i think what you are after of is using distinct
select distinct A,B,C, 1 [count] -- where 1 is a static value for later use
from (select ... from sourcetable) X
Sounds like you have the right idea. My guess is that the error is occurring due to an outer reference in your CASE statements. If you wrapped your first query in another query, it may alleviate this issue. Try:
SELECT A, B, C, COUNT(*) AS [UniqueRowCount]
FROM (
SELECT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C FROM SOURCE_TABLE
) AS Subquery
GROUP BY A, B, C
After re-reading your question, it seems that you're not counting at all, just putting a "1" after each distinct row. If that's the case, then you can try:
SELECT DISTINCT A, B, C, [Count]
FROM (
SELECT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C, 1 AS [Count] FROM SOURCE_TABLE
) AS Subquery
Assuming your outer reference exceptions were occurring in only your aggregations, you should also simply try:
SELECT DISTINCT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C, 1 AS [Count] FROM SOURCE_TABLE

subquery to derive a value if all values of the column is the same for

Hi i have a situation here with Oracle SQL to come out with the sql result as the following :-
Company No of Employees Group Derived Field
a 1 x
b 1 x
c 2 y
d 1 y
so based on the group if all the company has same no of employees then i want the derived field to be
true else false.
So for group x , if company a and b has the same no of employees then derived field for
a and b would be true. As for c and d because the no of employees is different so the derived field
should be false.
any help would be appreciated. thanks
You want to use an analytic function. I think this is what you want:
select t.*,
(case when min(NumEmployees) over (partition by grp) =
max(NumEmployees) over (partition by grp)
then 1
else 0
end) as DerivedField
from table t;
Note: I usually represent booleans as 0 and 1.
Here is a purely sql solution. Assume the column names are company,number,ggroup,test.
for convenience create a view of table t as
create view gnums as select count(distinct number)as gnum,count(number) as num, ggroup from t group by ggroup;
The this select
select a.*,b.gnum = 1 and b.num <> 1 as test from t a,gnums b where a.ggroup = b.ggroup;
yields
id | company | number | ggroup | test
----+---------+--------+--------+------
1 | a | 1 | x | t
2 | b | 1 | x | t
3 | c | 2 | y | f
4 | d | 1 | y | f
5 | e | 2 | z | f
I added row e to show that any group with only one company should yield false, assuming that's what you intended.