Table.* notation does not work in a 'group by' query - sql

On an Oracle database, the Table.* notation does not work inside a 'select..group by..' query.
This query with no * works :
select A.id from TABLE_A A INNER JOIN TABLE_B B on A.id=B.aid group by A.id
This one with a * does not :
select A.* from TABLE_A A INNER JOIN TABLE_B B on A.id=B.aid group by A.id
The output is
00979. 00000 - "not a GROUP BY expression"
Why does this query not work? Is there a simple workaround?

Everything you select, except aggregate functions (MIN, MAX, SUM, AVG, COUNT...), must be in the GROUP BY clause.

Yes, there is a workaround.
Assuming that each id in A is unique, then you don't even need to use group by, just:
select * from A
where id in (
select id from b
);
If the ids are not unique in table A, then you can simulate the MySQL behavior with this query:
select * from A
where rowid in (
select min( a.rowid )
from a
join b on a.id = b.id
group by a.id
);
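To try the rowid workaround without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in: its rowid is an integer key rather than Oracle's ROWID, and the table contents are made up for illustration, but the query has the same shape.

```python
import sqlite3

def first_row_per_id():
    """Return one row of `a` per id that also appears in `b`."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER, name TEXT);   -- id is deliberately not unique
        CREATE TABLE b (id INTEGER);
        INSERT INTO a VALUES (1, 'x'), (1, 'y'), (2, 'z'), (3, 'w');
        INSERT INTO b VALUES (1), (1), (2);
    """)
    rows = con.execute("""
        SELECT * FROM a
        WHERE rowid IN (
            SELECT MIN(a.rowid)          -- keep the first physical row per id
            FROM a JOIN b ON a.id = b.id
            GROUP BY a.id
        )
        ORDER BY id
    """).fetchall()
    con.close()
    return rows

print(first_row_per_id())  # [(1, 'x'), (2, 'z')]
```

Which of the duplicate rows survives depends on MIN(rowid), so the choice is deterministic here but still arbitrary from the data's point of view.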
Here is a link to SQL Fiddle demo
Here is a link to MySql documentation where their extension to group by is explained: http://dev.mysql.com/doc/refman/5.1/en/group-by-extensions.html
Pay attention to this fragment:
You can use this feature to get better performance by avoiding
unnecessary column sorting and grouping. However, this is useful
primarily when all values in each nonaggregated column not named in
the GROUP BY are the same for each group. The server is free to choose
any value from each group, so unless they are the same, the values
chosen are indeterminate. Furthermore, the selection of values from
each group cannot be influenced by adding an ORDER BY clause. Sorting
of the result set occurs after values have been chosen, and ORDER BY
does not affect which values within each group the server chooses.

A group by expression must include every non-aggregated column you select. So, if the table has 3 columns (column1, column2 and column3), you have to group by all of them like this: group by Column1, Column2, Column3. The * means you select all the columns, so add all of them to the group by expression.
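A rough sketch of that "list every column" form, run against an in-memory SQLite database from Python. The column names column2 and column3 and the sample rows are assumptions for illustration. Note that SQLite is more permissive than Oracle and would not raise ORA-00979 for the original query, so this only shows the form Oracle accepts.

```python
import sqlite3

def grouped_rows():
    """Group by every selected column of table_a, as Oracle requires."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table_a (id INTEGER, column2 TEXT, column3 TEXT);
        CREATE TABLE table_b (aid INTEGER);
        INSERT INTO table_a VALUES (1, 'p', 'q'), (2, 'r', 's');
        INSERT INTO table_b VALUES (1), (1), (1), (2);
    """)
    rows = con.execute("""
        SELECT A.id, A.column2, A.column3
        FROM table_a A INNER JOIN table_b B ON A.id = B.aid
        GROUP BY A.id, A.column2, A.column3   -- every selected column is listed
        ORDER BY A.id
    """).fetchall()
    con.close()
    return rows

print(grouped_rows())  # [(1, 'p', 'q'), (2, 'r', 's')]
```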

Related

Snowflake, SQL where clause

I need to write query with where clause:
where
pl.ods_site_id in (select id from table1 where ...)
But if the subquery (on table1) returns no rows, the where clause should not filter anything (as if the condition were TRUE).
How can I do it? (I have snowflake SQL dialect)
You could include a second condition:
where pl.ods_site_id in (select id from table1 where ...) or
not exists (select id from table1 where ...)
This explicitly checks for the subquery returning no rows.
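A small sketch of how the IN ... OR NOT EXISTS condition behaves, using SQLite from Python as a stand-in for Snowflake. The helper name filter_sites and the sample ids are hypothetical, and the question's elided `where ...` filter on table1 is simply left out.

```python
import sqlite3

def filter_sites(table1_ids):
    """Return pl.ods_site_id values kept by the IN ... OR NOT EXISTS filter."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE pl (ods_site_id INTEGER)")
    con.execute("CREATE TABLE table1 (id INTEGER)")
    con.executemany("INSERT INTO pl VALUES (?)", [(1,), (5,)])
    con.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in table1_ids])
    rows = con.execute("""
        SELECT ods_site_id FROM pl
        WHERE pl.ods_site_id IN (SELECT id FROM table1)
           OR NOT EXISTS (SELECT id FROM table1)   -- true only when table1 is empty
        ORDER BY ods_site_id
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

print(filter_sites([5]))  # subquery matches one site
print(filter_sites([]))   # subquery is empty, so nothing is filtered out
print(filter_sites([7]))  # subquery has rows but none match
```

The third call shows the difference from the qualify-based approach: a non-empty subquery with no matching ids returns no rows at all here.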
If you are willing to use a join instead, Snowflake supports qualify clause which might come in handy here. You can run this on Snowflake to see how it works.
with
pl (ods_site_id) as (select 1 union all select 5),
table1 (id) as (select 5) --change this to 7 to test if it returns ALL on no match
select a.*
from pl a
left join table1 b on a.ods_site_id = b.id -- and other conditions you want to add
qualify b.id = a.ods_site_id --either match the join condition
or count(b.id) over () = 0; --or make sure there is 0 match from table1

Filter by count from another table

This query works fine only without WHERE, otherwise there is an error:
column "cnt" does not exist
SELECT
*,
(SELECT count(*)
FROM B
WHERE A.id = B.id) AS cnt
FROM A
WHERE cnt > 0
Use a subquery:
SELECT a.*
FROM (SELECT A.*,
(SELECT count(*)
FROM B
WHERE A.id = B.id
) AS cnt
FROM A
) a
WHERE cnt > 0;
Column aliases defined in the SELECT cannot be used by the WHERE (or other clauses) for that SELECT.
Or, if the id on a is unique, you can more simply do:
SELECT a.*, COUNT(B.id)
FROM A LEFT JOIN
B
ON A.id = B.id
GROUP BY A.id
HAVING COUNT(B.id) > 0;
Or, if you don't really need the count, then:
select a.*
from a
where exists (select 1 from b where b.id = a.id);
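For comparison, here is a sketch that runs the derived-table form and the EXISTS form side by side on made-up data, using SQLite from Python as a stand-in for Postgres. One caveat: SQLite tolerates a select alias in WHERE, so it will not reproduce the original "column does not exist" error. The label column is an assumption added so that A has more than one column.

```python
import sqlite3

def rows_with_matches():
    """Rows of `a` that have at least one match in `b`, computed two ways."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE b (id INTEGER);
        INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
        INSERT INTO b VALUES (1), (1), (3);
    """)
    # Wrap the aliased count in a derived table so the outer WHERE can see it.
    with_count = con.execute("""
        SELECT t.id, t.label, t.cnt
        FROM (SELECT a.*,
                     (SELECT count(*) FROM b WHERE a.id = b.id) AS cnt
              FROM a) t
        WHERE t.cnt > 0
        ORDER BY t.id
    """).fetchall()
    # If the count itself is not needed, EXISTS expresses the same filter.
    exists_only = con.execute("""
        SELECT a.id, a.label FROM a
        WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
        ORDER BY a.id
    """).fetchall()
    con.close()
    return with_count, exists_only

with_count, exists_only = rows_with_matches()
print(with_count)
print(exists_only)
```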
Assumptions:
You need all columns from A in the result, plus the count from B. That's what your demonstrated query does.
You only want rows with cnt > 0. That's what started your question after all.
Most or all B.id exist in A. That's the typical case and certainly true if a FK constraint on B.id references to A.id.
Solution
Faster, shorter, correct:
SELECT * -- !
FROM (SELECT id, count(*) AS cnt FROM B GROUP BY id) B
JOIN A USING (id) -- !
-- WHERE cnt > 0 -- this predicate is implicit now!
Major points
Aggregate before the join, that's typically (substantially) faster when processing the whole table or major parts of it. It also defends against problems if you join to more than one n-table. See:
Aggregate functions on multiple joined tables
You don't need to add the predicate WHERE cnt > 0 any more, that's implicit with the [INNER] JOIN.
You can simply write SELECT *, since the join only adds the column cnt to A.* when done with the USING clause - only one instance of the joining column(s) (id in the example) is added to the out columns. See:
How to drop one join key when joining two tables
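The points above can be tried out with a small sketch, again using SQLite from Python as a stand-in for Postgres (SQLite also supports JOIN ... USING). The label column and the sample rows are assumptions for illustration. The derived table groups by id before the join.

```python
import sqlite3

def aggregate_then_join():
    """Aggregate B per id first, then inner-join the counts to A."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE b (id INTEGER REFERENCES a(id));
        INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
        INSERT INTO b VALUES (1), (1), (3);
    """)
    cur = con.execute("""
        SELECT *
        FROM (SELECT id, count(*) AS cnt FROM b GROUP BY id) b
        JOIN a USING (id)      -- inner join: ids without rows in b drop out
    """)
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    con.close()
    return columns, sorted(rows, key=lambda r: r["id"])

columns, rows = aggregate_then_join()
print(columns)  # the join column should appear only once
print(rows)
```

No row for id 2 comes back, which is the implicit cnt > 0 predicate in action.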
Your added question in the comment
postgres really allows to have outside aggregate function attributes that are not behind group by?
That's only true if the PK column(s) is listed in the GROUP BY clause - which covers the whole row. Not the case for a UNIQUE or EXCLUSION constraint. See:
Return a grouped list with occurrences using Rails and PostgreSQL
SQL Fiddle demo (extended version of Gordon's demo).

how to do group by in HIVE

I'm writing a Hive query in which I need to group by a few fields; however, I also need to select some other fields besides those in the group by clause. That is,
select A,B,C from table_name GROUP BY A,B
HIVE complains and says Invalid table alias or column reference C. It requires me to put C in the GROUP BY part but that changes my logic. How can I resolve this issue?
HIVE-Select-statement-and-group-by-clause: a column that is selected but not listed in the group by must be wrapped in an aggregate function like count, sum etc.,
so there must be an aggregate calculation on column C.
example:
select A,B,count(C) as Total_C from table_name GROUP BY A,B;
select A,B,SUM(C) as Total_C from table_name GROUP BY A,B;
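To see what those two aggregate queries return, here is a sketch on made-up rows. It uses SQLite from Python rather than Hive, which is an assumption of convenience; the GROUP BY semantics shown are the standard ones.

```python
import sqlite3

def aggregate_c():
    """Count and sum column c for each (a, b) group."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table_name (a TEXT, b TEXT, c INTEGER);
        INSERT INTO table_name VALUES
            ('x', 'p', 10), ('x', 'p', 20), ('y', 'q', 30);
    """)
    rows = con.execute("""
        SELECT a, b, count(c) AS total_c, sum(c) AS sum_c
        FROM table_name
        GROUP BY a, b          -- c appears only inside aggregates
        ORDER BY a, b
    """).fetchall()
    con.close()
    return rows

print(aggregate_c())  # [('x', 'p', 2, 30), ('y', 'q', 1, 30)]
```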
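You have to join after group by. Aggregate in a subquery first, then join the result back to the table (with whichever join type you need) to pick up C:
select T1.*, t2.c
from (select a, b, count(*) as SomeAggFunc from table_name group by a, b) T1
join table_name t2 on T1.a = t2.a and T1.b = t2.b;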
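You can try using cluster by instead of group by. Note that cluster by only distributes and sorts the rows by the listed columns; it does not collapse them into groups, so every row is still returned: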
select A,B,C from table_name CLUSTER BY A,B

How do I get the following SQL statement to work and return a join while picking one string?

I have two tables, A and B. Table A has an id and a string, and table B has a pointer to an id in A and a number (float).
I want to select everything from table B, averaging the number grouped by the id in table A, while also showing the string from table A.
This doesn't work:
select a.id,b.id,avg(num),str from a,b where a.id=b.id;
It gives me an error about grouping str.
Any time you use an aggregate function, like avg(), every selected column that is not inside an aggregate must be listed in the GROUP BY clause:
SELECT a.id, b.id, avg(num), a.str FROM a, b WHERE a.id = b.id
GROUP BY a.id, b.id, a.str;
That's basically it. Except that you also need to clarify which table "str" comes from. Is it a or b?
(Answer updated as per comment.)
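A quick way to check the grouped query is to run it on a few sample rows. The sketch below uses SQLite from Python; the data is invented, and str is assumed to live in table a as the question describes.

```python
import sqlite3

def average_per_id():
    """Average b.num per id while carrying a.str along in the GROUP BY."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, str TEXT);
        CREATE TABLE b (id INTEGER, num REAL);
        INSERT INTO a VALUES (1, 'alpha'), (2, 'beta');
        INSERT INTO b VALUES (1, 1.0), (1, 3.0), (2, 10.0);
    """)
    rows = con.execute("""
        SELECT a.id, b.id, avg(b.num), a.str
        FROM a JOIN b ON a.id = b.id
        GROUP BY a.id, b.id, a.str   -- every non-aggregated column is listed
        ORDER BY a.id
    """).fetchall()
    con.close()
    return rows

print(average_per_id())  # [(1, 1, 2.0, 'alpha'), (2, 2, 10.0, 'beta')]
```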
Use a group by clause. To use aggregate functions together with plain columns, you have to group by those columns:
select a.id,b.id,avg(num),a.str from a,b where a.id=b.id group by a.id,b.id,a.str
Hope it will help.

How do I get a count of items in one column that match items in another column?

Assume I have two data tables and a linking table as such:
A        B        A_B_Link
-----    -----    -----
ID       ID       A_ID
Name     Name     B_ID
2 Questions:
1. I would like to write a query so that I have all of A's columns and a count of how many B's are linked to A, what is the best way to do this?
2. Is there a way to have a query return a row with all of the columns from A and a column containing all of linked names from B (maybe separated by some delimiter?)
Note that the query must return distinct rows from A, so a simple left outer join is not going to work here...I'm guessing I'll need nested select statements?
For your first question:
SELECT A.ID, A.Name, COUNT(ab.B_ID) AS bcount
FROM A LEFT JOIN A_B_Link ab ON (ab.A_ID = A.ID)
GROUP BY A.ID, A.Name;
This outputs one row per row of A, with the count of matching B's. Note that you must list all columns of A in the GROUP BY statement; there's no way to use a wildcard here.
An alternate solution is to use a correlated subquery, as @Ray Booysen shows:
SELECT A.*,
(SELECT COUNT(*) FROM A_B_Link
WHERE A_B_Link.A_ID = A.ID) AS bcount
FROM A;
This works, but correlated subqueries aren't very good for performance.
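Both approaches to the first question can be compared on a tiny data set. This is a sketch using SQLite from Python; the sample rows and the helper names make_db, counts_by_join and counts_by_subquery are hypothetical.

```python
import sqlite3

def make_db():
    """Build the A / B / A_B_Link schema from the question with sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE A (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE B (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE A_B_Link (A_ID INTEGER, B_ID INTEGER);
        INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
        INSERT INTO B VALUES (10, 'b10'), (11, 'b11');
        INSERT INTO A_B_Link VALUES (1, 10), (1, 11);
    """)
    return con

def counts_by_join(con):
    # LEFT JOIN keeps rows of A without links; COUNT(ab.B_ID) skips the NULLs.
    return con.execute("""
        SELECT A.ID, A.Name, COUNT(ab.B_ID) AS bcount
        FROM A LEFT JOIN A_B_Link ab ON ab.A_ID = A.ID
        GROUP BY A.ID, A.Name
        ORDER BY A.ID
    """).fetchall()

def counts_by_subquery(con):
    # Correlated subquery: one count per row of A, no GROUP BY needed.
    return con.execute("""
        SELECT A.*,
               (SELECT COUNT(*) FROM A_B_Link WHERE A_B_Link.A_ID = A.ID) AS bcount
        FROM A
        ORDER BY A.ID
    """).fetchall()

con = make_db()
print(counts_by_join(con))
print(counts_by_subquery(con))
```

On this data both queries return the same rows, including a count of 0 for the A row that has no links.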
For your second question, you need something like MySQL's GROUP_CONCAT() aggregate function. In MySQL, you can get a comma-separated list of B.Name per row of A like this:
SELECT A.*, GROUP_CONCAT(B.Name) AS bname_list
FROM A
LEFT OUTER JOIN A_B_Link ab ON (A.ID = ab.A_ID)
LEFT OUTER JOIN B ON (ab.B_ID = B.ID)
GROUP BY A.ID;
There's no easy equivalent in Microsoft SQL Server. Check here for another question on SO about this:
"Simulating group_concat MySQL function in MS SQL Server 2005?"
Or Google for 'microsoft SQL server "group_concat"' for a variety of other solutions.
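SQLite happens to ship a group_concat() aggregate as well, so the delimiter-separated list from the second question can be sketched from Python without a MySQL server. This is a stand-in, not MySQL itself: the sample rows are invented, and the order of names inside each list is not guaranteed, so the helper sorts them.

```python
import sqlite3

def linked_names():
    """Map each A.Name to the sorted list of linked B names."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE A (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE B (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE A_B_Link (A_ID INTEGER, B_ID INTEGER);
        INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
        INSERT INTO B VALUES (10, 'b10'), (11, 'b11');
        INSERT INTO A_B_Link VALUES (1, 10), (1, 11);
    """)
    rows = con.execute("""
        SELECT A.ID, A.Name, group_concat(B.Name) AS bname_list
        FROM A
        LEFT OUTER JOIN A_B_Link ab ON A.ID = ab.A_ID
        LEFT OUTER JOIN B ON ab.B_ID = B.ID
        GROUP BY A.ID, A.Name
        ORDER BY A.ID
    """).fetchall()
    con.close()
    # group_concat yields NULL (None) when a row of A has no linked B rows.
    return {name: sorted(names.split(",")) if names else []
            for _id, name, names in rows}

print(linked_names())
```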
For #1
SELECT AOuter.*,
(SELECT COUNT(*) FROM A_B_Link WHERE A_B_Link.A_ID = AOuter.ID)
FROM A as AOuter
SELECT A.*, COUNT(B_ID)
FROM A
LEFT JOIN A_B_Link ab ON ab.A_ID=A.ID