Let's assume I have the following tables in an Oracle database:
TBL_A with the columns ID, C_1, C_2, C_3, ..., C_20 (primary key: ID)
TBL_B with the columns ID, A_ID, C_1, C_2, C_3, ..., C_20 (foreign key A_ID references TBL_A.ID)
TBL_C, TBL_D etc. with the same generic layout as TBL_B
Now, I am trying to build a report that groups rows from TBL_A and, at the same time, aggregates data (sums, counts, min/max/avg values, etc.) from the additional tables (TBL_B, TBL_C, etc.) for rows that meet some additional criteria.
My problem probably boils down to how (if it's possible at all) to connect data from TBL_x in a subquery, if the primary query is based on a select from TBL_A using a GROUP BY clause, e.g. like this:
select a.c_1,
count(a.id) as cnt, -- number of matches in TBL_A for this group
(select count(*) from tbl_b b where b.a_id = a.id and b.c_1 = 2) as b_cnt,
(select sum(c_5) from tbl_c c where c.a_id = a.id and c.c_3 = 3) as c_sum
from tbl_a a
where ...
group by a.c_1;
Even though Oracle won't execute this code (ORA-00979: a.id is not a GROUP BY expression), I hope the purpose of the query is clear. In this case, I need a four-column result set with:
All distinct values of TBL_A.C_1.
The number of rows in TBL_A within this group.
The number of rows in TBL_B where C_1 = 2 and A_ID refers to any of the rows in TBL_A contained in this group.
The sum of C_5 of the rows in TBL_C where C_3 = 3 and A_ID refers to any of the rows in TBL_A contained in this group.
I know I could rewrite the subqueries, so that the group-by columns are repeated in the where clause, e.g. like this for just one of the columns:
select a.c_1,
(select count(*) from tbl_b b, tbl_a a2
where a2.c_1 = a.c_1 and b.a_id = a2.id and b.c_1 = 2) as b_cnt_2
from tbl_a a
where ...
group by a.c_1;
But in that case, I would have to repeat both the group-by columns and the outer where clause in every subquery. In reality the where clause is rather long, the group by clause has quite a few columns, and there are many subqueries referring to different tables with different relations to TBL_A, so the SQL statement would probably end up as a complete mess.
Is it really not possible in Oracle to use values from individual rows within a group in subqueries, as I tried in the first example (both b.a_id = a.id and b.a_id in a.id fail)? I also considered some tricks with listagg, but Oracle seems not to accept any aggregate functions in the subquery's where clause (ORA-00934: group function is not allowed here). I would understand the limitation in the outer where clause, but not why this is not allowed in the subquery's where clause.
I have tried to implement the query by joining the additional tables (TBL_B, TBL_C, etc) with outer joins instead of writing subqueries, but this will expand the result (creating several combinations of all involved tables) before grouping, so that the same row is considered more than once by the aggregation functions. E.g. having two rows in TBL_B referring to one row in TBL_A, count(a.id) would count the same row in TBL_A twice.
Anyone with an idea how to proceed?
Probably the simplest way is to use a subquery or CTE:
with a as (
select a.c_1, a.id,
count(a.id) as cnt, -- number of matches in TBL_A for this group
(select count(*) from tbl_b b where b.a_id = a.id and b.c_1 = 2) as b_cnt,
(select sum(c_5) from tbl_c c where c.a_id = a.id and c.c_3 = 3) as c_sum
from tbl_a a
where ...
group by a.id, a.c_1 -- c_1 is selected, so it must be grouped as well
)
select a.c_1,
sum(cnt) as cnt,
sum(b_cnt) as b_cnt,
sum(c_sum) as c_sum
from a
group by a.c_1;
This will work fine for most aggregation functions. If you have an avg(), then do the sum and count separately, and divide the totals for the average. If you have a count(distinct) then this will not work. Your question has neither of these.
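The avg() advice above can be sketched as follows. This is a minimal illustration, not the original Oracle setup: it runs against an in-memory SQLite database through Python's sqlite3 module, and the sample rows and the names per_a, c_cnt and c_avg are made up for the example. The point is that the outer query divides the summed totals, rather than averaging the per-id averages.

```python
import sqlite3

# Assumption: tiny made-up data set in SQLite, standing in for the Oracle tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (id INTEGER PRIMARY KEY, c_1 TEXT);
    CREATE TABLE tbl_c (id INTEGER PRIMARY KEY, a_id INTEGER, c_3 INTEGER, c_5 REAL);
    INSERT INTO tbl_a VALUES (1, 'x'), (2, 'x'), (3, 'y');
    INSERT INTO tbl_c VALUES
        (1, 1, 3, 10), (2, 1, 3, 20), (3, 2, 3, 60), (4, 3, 3, 5), (5, 3, 9, 100);
""")

# Inner step: one row per tbl_a.id carrying a sum and a count.
# Outer step: add up sums and counts per group, then divide once.
avg_rows = conn.execute("""
    WITH per_a AS (
        SELECT a.id, a.c_1,
               (SELECT SUM(c.c_5) FROM tbl_c c
                 WHERE c.a_id = a.id AND c.c_3 = 3) AS c_sum,
               (SELECT COUNT(c.c_5) FROM tbl_c c
                 WHERE c.a_id = a.id AND c.c_3 = 3) AS c_cnt
        FROM tbl_a a
    )
    SELECT c_1, SUM(c_sum) * 1.0 / SUM(c_cnt) AS c_avg
    FROM per_a
    GROUP BY c_1
    ORDER BY c_1
""").fetchall()

print(avg_rows)
```

For group 'x' the per-id averages are 15 and 60, so averaging them would give 37.5, while the true average over the three matching rows is 30.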
If I understand correctly what you're trying to do, you can compute the aggregates per a.id in derived tables, join those aggregates to tbl_a, and then aggregate again using sum:
select
a.c_1,
count(a.id),
coalesce(sum(b_count),0) b_cnt,
coalesce(sum(c_sum),0) c_sum
from tbl_a a
left join (
select a_id, count(*) b_count from tbl_b
where c_1 = 2
group by a_id
) b on b.a_id = a.id
left join (
select a_id, sum(c_5) c_sum from tbl_c
where c_3 = 3
group by a_id
) c on c.a_id = a.id
where ...
group by a.c_1
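To see why aggregating in derived tables matters, here is a small comparison sketch. It assumes an in-memory SQLite database (via Python's sqlite3) with made-up rows rather than the real Oracle tables; naive_rows joins the child tables directly, preagg_rows uses the derived-table form shown above.

```python
import sqlite3

# Assumption: made-up sample data in SQLite, standing in for the Oracle tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (id INTEGER PRIMARY KEY, c_1 TEXT);
    CREATE TABLE tbl_b (id INTEGER PRIMARY KEY, a_id INTEGER, c_1 INTEGER);
    CREATE TABLE tbl_c (id INTEGER PRIMARY KEY, a_id INTEGER, c_3 INTEGER, c_5 INTEGER);
    INSERT INTO tbl_a VALUES (1, 'x'), (2, 'x'), (3, 'y');
    INSERT INTO tbl_b VALUES (1, 1, 2), (2, 1, 2), (3, 2, 2), (4, 1, 5);
    INSERT INTO tbl_c VALUES (1, 1, 3, 10), (2, 1, 3, 20), (3, 3, 3, 7);
""")

# Direct joins: every tbl_b row pairs with every tbl_c row of the same tbl_a row,
# so counts and sums are inflated before GROUP BY runs.
naive_rows = conn.execute("""
    SELECT a.c_1, COUNT(a.id), COUNT(b.id), SUM(c.c_5)
    FROM tbl_a a
    LEFT JOIN tbl_b b ON b.a_id = a.id AND b.c_1 = 2
    LEFT JOIN tbl_c c ON c.a_id = a.id AND c.c_3 = 3
    GROUP BY a.c_1
    ORDER BY a.c_1
""").fetchall()

# Pre-aggregated derived tables: at most one row per a_id reaches the join.
preagg_rows = conn.execute("""
    SELECT a.c_1, COUNT(a.id),
           COALESCE(SUM(b.b_count), 0), COALESCE(SUM(c.c_sum), 0)
    FROM tbl_a a
    LEFT JOIN (SELECT a_id, COUNT(*) AS b_count FROM tbl_b
                WHERE c_1 = 2 GROUP BY a_id) b ON b.a_id = a.id
    LEFT JOIN (SELECT a_id, SUM(c_5) AS c_sum FROM tbl_c
                WHERE c_3 = 3 GROUP BY a_id) c ON c.a_id = a.id
    GROUP BY a.c_1
    ORDER BY a.c_1
""").fetchall()

print(naive_rows)
print(preagg_rows)
```

With this data the direct join reports 5 rows of TBL_A in group 'x' and a sum of 60, while the pre-aggregated version reports the intended 2 rows and a sum of 30.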
Related
I'm trying to select * from two tables (a and b) using a join (column a.id and b.id), given that the count of a column (b.owner) in b is lower than 3, i.e. a person's name can occur at most twice.
I've tried:
SELECT a.*, COUNT(b.owner) AS b_count
FROM a LEFT JOIN b on a.id = b.id
GROUP BY b.owner HAVING COUNT(b_count) <3
As I'm pretty new to SQL, I'm stuck here. How can I resolve this issue? The result should be all columns for owners who do not appear more than twice in the data.
The query you are trying to run does not work because columns are missing from the GROUP BY clause.
As you are outputting all columns from table a (with SELECT a.*), you need to include all of those columns in the GROUP BY clause, so that the database understands which fields to group by when it performs the required aggregation (in your case COUNT(b.owner)).
Example
Considering that your table a has 3 columns below:
CREATE TABLE persons (
id INTEGER,
name VARCHAR(50),
birthday DATE,
PRIMARY KEY (id)
);
.. and your table b looks like the following, referencing the first table:
CREATE TABLE sales (
id INTEGER,
person_id INTEGER,
sale_value DECIMAL,
PRIMARY KEY (id),
FOREIGN KEY (person_id) REFERENCES persons(id)
);
.. you should query it aggregating the COUNT() by those 3 columns:
SELECT a.id, a.name, a.birthday, COUNT(b.person_id) AS b_count
FROM persons a
LEFT JOIN sales b ON a.id = b.person_id
GROUP BY a.id, a.name, a.birthday
HAVING COUNT(b.person_id) < 3
Alternative
If the number of records in the 2nd table is not important to you, you could use a different "strategy" here that avoids the JOIN between the tables (useful when joining two huge tables) and avoids repeating all the columns from a in both the SELECT and the GROUP BY.
First identify the records that have fewer than 3 occurrences:
SELECT b.person_id
FROM sales b
GROUP BY b.person_id
HAVING COUNT(b.id) < 3;
.. and using it in the WHERE clause to retrieve all the columns from the 1st table only for the ids that resulted from the previous query:
SELECT a.*
FROM persons a
WHERE a.id IN (....other query here....);
.. the query then reads in the order in which it is evaluated, which is perhaps easier to visualize while you are getting more familiar with SQL:
SELECT a.*
FROM persons a
WHERE a.id IN (SELECT b.person_id
FROM sales b
GROUP BY b.person_id
HAVING COUNT(b.id) < 3);
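One difference between the two strategies is worth checking on your own data: the IN version only returns persons that appear in sales at all, whereas the LEFT JOIN version also returns persons with zero sales. The sketch below shows this with an in-memory SQLite database through Python's sqlite3; the rows and the NOT IN variant at the end are illustrative assumptions, not part of the answer above.

```python
import sqlite3

# Assumption: made-up rows; 'Cy' has no sales at all, 'Bob' has three.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES persons(id),
        sale_value REAL
    );
    INSERT INTO persons VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO sales (person_id, sale_value) VALUES (1, 10), (2, 5), (2, 6), (2, 7);
""")

def names(sql):
    """Run a query and return the first column as a plain list."""
    return [row[0] for row in conn.execute(sql)]

# LEFT JOIN + HAVING: persons without sales get a count of 0 and are kept.
left_join_names = names("""
    SELECT a.name FROM persons a
    LEFT JOIN sales b ON a.id = b.person_id
    GROUP BY a.id, a.name
    HAVING COUNT(b.person_id) < 3
    ORDER BY a.id
""")

# IN (...): persons without sales never show up in the subquery.
in_names = names("""
    SELECT a.name FROM persons a
    WHERE a.id IN (SELECT b.person_id FROM sales b
                   GROUP BY b.person_id HAVING COUNT(b.id) < 3)
    ORDER BY a.id
""")

# Hypothetical variant: exclude the heavy sellers instead, which keeps zero-sale persons.
# (person_id is NOT NULL here, so NOT IN is safe from NULL surprises.)
not_in_names = names("""
    SELECT a.name FROM persons a
    WHERE a.id NOT IN (SELECT b.person_id FROM sales b
                       GROUP BY b.person_id HAVING COUNT(b.id) >= 3)
    ORDER BY a.id
""")

print(left_join_names, in_names, not_in_names)
```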
In Standard SQL, you can use:
SELECT a.*, COUNT(b.owner) AS b_count
FROM a LEFT JOIN
b
ON a.id = b.id
GROUP BY a.id
HAVING COUNT(b.owner) < 3;
This may not work in all databases (and it assumes that a.id is unique/primary key). An alternative would be to use a correlated subquery:
SELECT a.*
FROM (SELECT a.*,
(SELECT COUNT(*)
FROM b
WHERE a.id = b.id
) as b_count
FROM a
) a
WHERE b_count < 3;
I am trying, without success, to write a SQL query that calculates a sum.
Let's say that we have:
table A with columns id and type
table B with columns id, a_id (relation to table A) and amount
I managed to calculate the number of records per type, as in the following example:
SELECT DISTINCT
type,
COUNT(A.id) OVER (PARTITION BY type) AS numOfRecords
FROM A;
How can I also calculate the sum of amounts per type (i.e. sum up all amounts from table B for each distinct type in A)?
Your query would normally be written as:
select type, count(*) as num_records
from A
group by type;
Then, you can incorporate b as:
select a.type, count(*) as num_records, sum(b.amount)
from A left join
(select a_id, sum(amount) as amount
from b
group by a_id
) b
on b.a_id = a.id
group by a.type;
You can also join and aggregate without a subquery, but this will throw off the count. To fix that, you can use count(distinct):
select a.type, count(distinct a.id) as num_records, sum(b.amount)
from A left join
b
on b.a_id = a.id
group by a.type;
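The effect of the plain join on the count can be seen in a small sketch. It assumes an in-memory SQLite database (Python's sqlite3) and made-up rows; the same query returns count(*), count(distinct a.id) and the sum side by side.

```python
import sqlite3

# Assumption: made-up rows; a.id = 1 has two rows in b, a.id = 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id), amount INTEGER);
    INSERT INTO a VALUES (1, 'p'), (2, 'p'), (3, 'q');
    INSERT INTO b (a_id, amount) VALUES (1, 10), (1, 20), (2, 5);
""")

# count(*) counts joined rows, count(distinct a.id) counts rows of a.
# sum(b.amount) stays correct because each row of b appears exactly once.
count_rows = conn.execute("""
    SELECT a.type, COUNT(*), COUNT(DISTINCT a.id), SUM(b.amount)
    FROM a
    LEFT JOIN b ON b.a_id = a.id
    GROUP BY a.type
    ORDER BY a.type
""").fetchall()

print(count_rows)
```

Note that this only holds with a single child table; once a second one-to-many table is joined the same way, the sums are multiplied as well, and pre-aggregating in a subquery is the safer form.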
I am trying to join 2 tables. Table_A has ~145k rows whereas Table_B has ~205k rows.
They have two columns in common (i.e. ISIN and date). However, when I execute this query:
SELECT A.*,
B.column_name
FROM Table_A A
JOIN
Table_B B ON A.date = B.date
WHERE A.isin = B.isin
I get a table with more than 147k rows. How is it possible? Shouldn't it return a table with at most ~145k rows?
What you are seeing indicates that, for some of the records in Table_A, there are several records in Table_B that satisfy the join conditions (equality on the (date, isin) tuple).
To exhibit these records, you can do:
select B.date, B.isin
from Table_A A
join Table_B B on A.date = B.date and A.isin = B.isin
group by B.date, B.isin
having count(*) > 1
It's up to you to define how to handle those duplicates. For example:
if the duplicates have different values in column column_name, then you can decide to pull out the maximum or minimum value
or use another column to filter on the top or lower record within the duplicates
if the duplicates are true duplicates, then you can use select distinct in a subquery to dedup them before joining
... other solutions are possible ...
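As an illustration of the first option in the list above (keeping the maximum value per key), here is a minimal sketch. It assumes an in-memory SQLite database through Python's sqlite3, lowercase table names, and three made-up rows; Table_B is reduced to one row per (isin, date) before the join.

```python
import sqlite3

# Assumption: made-up rows; ('US1', '2020-01-01') appears twice in table_b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (isin TEXT, date TEXT);
    CREATE TABLE table_b (isin TEXT, date TEXT, column_name INTEGER);
    INSERT INTO table_a VALUES ('US1', '2020-01-01'), ('US2', '2020-01-01');
    INSERT INTO table_b VALUES
        ('US1', '2020-01-01', 5), ('US1', '2020-01-01', 9), ('US2', '2020-01-01', 4);
""")

# Plain join: the duplicated key in table_b yields an extra output row.
plain_count = conn.execute("""
    SELECT COUNT(*)
    FROM table_a a
    JOIN table_b b ON a.date = b.date AND a.isin = b.isin
""").fetchone()[0]

# Deduplicate first: one row per (isin, date), keeping the maximum value.
dedup_rows = conn.execute("""
    SELECT a.isin, a.date, b.column_name
    FROM table_a a
    JOIN (SELECT isin, date, MAX(column_name) AS column_name
          FROM table_b
          GROUP BY isin, date) b
      ON a.date = b.date AND a.isin = b.isin
    ORDER BY a.isin
""").fetchall()

print(plain_count, dedup_rows)
```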
If you want one row per table A, then use outer apply:
SELECT A.*,
B.column_name
FROM Table_A a OUTER APPLY
(SELECT TOP (1) b.*
FROM Table_B b
WHERE A.date = B.date AND A.isin = B.isin
ORDER BY ? -- you can specify *which* row you want when there are duplicates
) b;
OUTER APPLY implements a lateral join. The TOP (1) ensures that at most one row is returned. The OUTER (as opposed to CROSS) ensures that nothing is filtered out. In this case, you could also phrase it as a correlated subquery.
All that said, your data does not seem to be what you really expect. You should figure out where the duplicates are coming from. The place to start is:
select b.date, b.isin, count(*)
from Table_B b
group by b.date, b.isin
having count(*) >= 2;
This will show you the duplicates, so you can figure out what to do about them.
The possibility of duplicates has already been discussed.
When millions of records are joined, a poor cardinality estimate can also lead to an inefficient execution plan (this affects performance, not which rows are returned).
For this, try changing the join order:
SELECT A.*,
B.column_name
FROM Table_A A
JOIN
Table_B B ON A.isin = B.isin
and
A.date = B.date
Also create a nonclustered index on both tables:
CREATE NONCLUSTERED INDEX isin_date_table_A ON Table_A (isin, date) INCLUDE (*Table_A);
*Table_A = comma-separated list of the Table_A columns that are required in the result set
CREATE NONCLUSTERED INDEX isin_date_table_B ON Table_B (isin, date) INCLUDE (column_name);
UPDATE STATISTICS Table_A;
UPDATE STATISTICS Table_B;
If you convert the DATE columns of both tables to the same format in the JOIN condition, you should get the expected result:
Select A.*, B.column_name
from Table_A A
join Table_B B on to_date(A.date,'DD-MON-YY') = to_date(B.date,'DD-MON-YY')
where A.isin = B.isin
This query works fine only without WHERE, otherwise there is an error:
column "cnt" does not exist
SELECT
*,
(SELECT count(*)
FROM B
WHERE A.id = B.id) AS cnt
FROM A
WHERE cnt > 0
Use a subquery:
SELECT a.*
FROM (SELECT A.*,
(SELECT count(*)
FROM B
WHERE A.id = B.id
) AS cnt
FROM A
) a
WHERE cnt > 0;
Column aliases defined in the SELECT cannot be used by the WHERE (or other clauses) for that SELECT.
Or, if the id on a is unique, you can more simply do:
SELECT a.*, COUNT(B.id)
FROM A LEFT JOIN
B
ON A.id = B.id
GROUP BY A.id
HAVING COUNT(B.id) > 0;
Or, if you don't really need the count, then:
select a.*
from a
where exists (select 1 from b where b.id = a.id);
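The three forms above can be compared side by side in a small sketch. It assumes an in-memory SQLite database via Python's sqlite3 and made-up rows; the name column is hypothetical, added only so that a has more than one column. Note that SQLite itself is lenient about aliases in WHERE, so this sketch only checks that the three rewrites return the same ids, not the original error.

```python
import sqlite3

# Assumption: made-up rows; id 2 has no match in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1), (1), (3);
""")

# 1. Wrap the query so the alias cnt is an ordinary column of the derived table.
wrapped_rows = conn.execute("""
    SELECT id, cnt
    FROM (SELECT a.*, (SELECT COUNT(*) FROM b WHERE a.id = b.id) AS cnt
          FROM a) t
    WHERE cnt > 0
    ORDER BY id
""").fetchall()

# 2. Join, group by the unique id, and filter with HAVING.
grouped_rows = conn.execute("""
    SELECT a.id, COUNT(b.id)
    FROM a LEFT JOIN b ON a.id = b.id
    GROUP BY a.id
    HAVING COUNT(b.id) > 0
    ORDER BY a.id
""").fetchall()

# 3. EXISTS, when the count itself is not needed.
exists_ids = [row[0] for row in conn.execute("""
    SELECT a.id FROM a
    WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
    ORDER BY a.id
""")]

print(wrapped_rows, grouped_rows, exists_ids)
```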
Assumptions:
You need all columns from A in the result, plus the count from B. That's what your demonstrated query does.
You only want rows with cnt > 0. That's what started your question after all.
Most or all B.id exist in A. That's the typical case and certainly true if a FK constraint on B.id references to A.id.
Solution
Faster, shorter, correct:
SELECT * -- !
FROM (SELECT id, count(*) AS cnt FROM B GROUP BY id) B
JOIN A USING (id) -- !
-- WHERE cnt > 0 -- this predicate is implicit now!
Major points
Aggregate before the join, that's typically (substantially) faster when processing the whole table or major parts of it. It also defends against problems if you join to more than one n-table. See:
Aggregate functions on multiple joined tables
You don't need to add the predicate WHERE cnt > 0 any more, that's implicit with the [INNER] JOIN.
You can simply write SELECT *, since the join only adds the column cnt to A.* when done with the USING clause - only one instance of the joining column(s) (id in the example) is added to the out columns. See:
How to drop one join key when joining two tables
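The points above can be checked with a small sketch. It is written against an in-memory SQLite database through Python's sqlite3 rather than PostgreSQL, on the assumption that USING behaves the same way for this query; the rows and the name column are made up.

```python
import sqlite3

# Assumption: made-up rows; id 2 has no match in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1), (1), (3);
""")

# Aggregate b first, then inner-join: ids without rows in b drop out on their own,
# and USING (id) keeps a single id column in the SELECT * output.
cur = conn.execute("""
    SELECT *
    FROM (SELECT id, COUNT(*) AS cnt FROM b GROUP BY id) b
    JOIN a USING (id)
    ORDER BY 1
""")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)
print(rows)
```

The output has the columns id, cnt and name, with id appearing only once, and no row for id 2.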
Your added question in the comment
postgres really allows to have outside aggregate function attributes that are not behind group by?
That's only true if the PK column(s) is listed in the GROUP BY clause - which covers the whole row. Not the case for a UNIQUE or EXCLUSION constraint. See:
Return a grouped list with occurrences using Rails and PostgreSQL
SQL Fiddle demo (extended version of Gordon's demo).
Usually we will select the field(s) in the SQL query. e.g.
SELECT A.id FROM Member A
But what if I want to add a column whose values correspond to the other selected field?
For example, I want to select the member ID from a member table, together with a COUNT of how many times that member appears in the rows of another table.
So how do I make a COUNT column that lines up with the select result?
If I understood you correctly, this is what you want:
SELECT A.id, count(B.MemberID)
FROM Member A
LEFT JOIN TableB B on A.id = B.MemberID
group by A.id
The LEFT JOIN will include records in A that do not have any corresponding records in B. Also, COUNT only counts non-null values, so you need to use it with B.MemberID. This way the count for records in A that do not have any corresponding records in B will be 0, since B.MemberID will be NULL.
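The difference between counting B.MemberID and counting rows can be seen in a minimal sketch. It assumes an in-memory SQLite database via Python's sqlite3 and two made-up members, one of which has no rows in TableB.

```python
import sqlite3

# Assumption: made-up rows; member 2 has no rows in TableB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Member (id INTEGER PRIMARY KEY);
    CREATE TABLE TableB (MemberID INTEGER REFERENCES Member(id));
    INSERT INTO Member VALUES (1), (2);
    INSERT INTO TableB VALUES (1), (1);
""")

# COUNT(B.MemberID) skips the NULL produced by the LEFT JOIN for member 2;
# COUNT(*) counts the joined row itself and would report 1 instead of 0.
member_counts = conn.execute("""
    SELECT A.id, COUNT(B.MemberID), COUNT(*)
    FROM Member A
    LEFT JOIN TableB B ON A.id = B.MemberID
    GROUP BY A.id
    ORDER BY A.id
""").fetchall()

print(member_counts)
```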
I agree with @Adrian's solution, but if there were many columns in the original SELECT list, they would all have to be listed in GROUP BY. I mean something like this:
SELECT
A.id,
A.name,
A.whatever,
...
COUNT(B.member_id)
FROM Member A
LEFT JOIN Member_Something B ON A.id = B.member_id
GROUP BY
A.id,
A.name,
A.whatever,
...
It is not always convenient, especially when the columns are actually expressions. You could take another approach instead:
SELECT
A.id,
A.name,
A.whatever,
...
COALESCE(B.member_count, 0)
FROM Member A
LEFT JOIN (
SELECT member_id, COUNT(*) AS member_count
FROM Member_Something
GROUP BY member_id
) B ON A.id = B.member_id
select member_id, count(*)
from table
group by member_id;