Redshift Query returning too many rows in aggregate join - sql

I am sure I must be missing something obvious. I am trying to line up two tables with different measurement data for analysis, and my counts are coming back enormously high when I join the two tables together.
Here are the correct counts from my table1
select line_item_id,sum(is_imp) as imps
from table1
where line_item_id=5993252
group by 1;
Here are the correct counts from table2
select cs_line_item_id,sum(grossImpressions) as cs_imps
from table2
where cs_line_item_id=5993252
group by 1;
When I join the tables together, my counts become inaccurate:
select a.line_item_id,sum(a.is_imp) as imps,sum(c.grossImpressions) as cs_imps
from table1 a join table2 c
ON a.line_item_id=c.cs_line_item_id
where a.line_item_id=5993252
group by 1;
I'm using aggregates, group by, filtering, so I'm not sure where I'm going wrong. Here is the schema for these tables:

select a.*, b.cs_imps as table2_imps from
(select line_item_id, sum(is_imp) as imps
from table1
group by 1) a
join
(select cs_line_item_id, sum(grossImpressions) as cs_imps
from table2
group by 1) b
on a.line_item_id = b.cs_line_item_id

You are generating a Cartesian product for each line_item_id. There are two relatively simple ways to solve this, one with a full join, the other with union all:
select line_item_id, sum(imps) as imps, sum(grossImpressions) as cs_imps
from ((select a.line_item_id, sum(a.is_imp) as imps, 0 as grossImpressions
from table1 a
where a.line_item_id = 5993252
group by a.line_item_id
) union all
(select c.cs_line_item_id, 0 as imps, sum(c.grossImpressions) as grossImpressions
from table2 c
where c.cs_line_item_id = 5993252
group by c.cs_line_item_id
)
) ac
group by line_item_id;
You can remove the where clause from the subqueries to get the totals for all line_item_ids. Note that this works even when one or the other table has no matching rows for a given line_item_id.
For performance, you really want to do the filtering before the group by.
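To make the fan-out concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Redshift. The sample rows (three in table1, two in table2) are made up for illustration, and the column names follow the question.

```python
import sqlite3

# Hypothetical sample data: 3 rows in table1 and 2 rows in table2 for one line item.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (line_item_id INTEGER, is_imp INTEGER);
    CREATE TABLE table2 (cs_line_item_id INTEGER, grossImpressions INTEGER);
    INSERT INTO table1 VALUES (5993252, 1), (5993252, 1), (5993252, 1);
    INSERT INTO table2 VALUES (5993252, 10), (5993252, 20);
""")

# Joining raw rows pairs every table1 row with every table2 row (3 x 2 = 6 rows),
# so each sum is multiplied by the other table's row count.
joined = conn.execute("""
    SELECT a.line_item_id, SUM(a.is_imp), SUM(c.grossImpressions)
    FROM table1 a JOIN table2 c ON a.line_item_id = c.cs_line_item_id
    GROUP BY a.line_item_id
""").fetchall()

# Aggregating each table on its own and stacking with UNION ALL avoids the fan-out.
stacked = conn.execute("""
    SELECT line_item_id, SUM(imps), SUM(cs_imps)
    FROM (
        SELECT line_item_id, SUM(is_imp) AS imps, 0 AS cs_imps
        FROM table1 GROUP BY line_item_id
        UNION ALL
        SELECT cs_line_item_id, 0, SUM(grossImpressions)
        FROM table2 GROUP BY cs_line_item_id
    ) AS ac
    GROUP BY line_item_id
""").fetchall()

print(joined)   # [(5993252, 6, 90)]  inflated: 3*2 and 30*3
print(stacked)  # [(5993252, 3, 30)]  the per-table sums
```

The inflated numbers are each true sum multiplied by the number of matching rows in the other table, which is why the error grows so quickly on real data.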

Related

Adding sum values from two different tables

How can I achieve this? T2 is linked with another table which contains order details like customer name, country and classification. They are joined with an inner join.
T1 is linked to T2 only via order code and order item.
Assuming that both tables report the same set of order numbers, we can try joining two subqueries each of which finds the sums in the respective tables:
SELECT
t1.ORDER_NUM,
t1.ORDER_ITEM,
t1.PRODUCED + t2.PRODUCED AS PRODUCED
FROM
(
SELECT ORDER_NUM, ORDER_ITEM, SUM(PRODUCED) AS PRODUCED
FROM table1
GROUP BY ORDER_NUM, ORDER_ITEM
) t1
INNER JOIN
(
SELECT ORDER_NUM, ORDER_ITEM, SUM(NET_IN - NET_OUT) AS PRODUCED
FROM table2
GROUP BY ORDER_NUM, ORDER_ITEM
) t2
ON t1.ORDER_NUM = t2.ORDER_NUM AND
t1.ORDER_ITEM = t2.ORDER_ITEM
ORDER BY
t1.ORDER_NUM,
t1.ORDER_ITEM;
Note that the above is not necessarily an ideal approach, because a given order/item combination in one table might not appear in the other table. A better approach would be to start the query with a reference table containing all orders and items. That failing, we could convert the above to a full outer join.
I think a simple approach is union all:
select ordernum, orderitem, sum(produced) as produced
from ((select ordernum, orderitem, produced
from table1
) union all
(select ordernum, orderitem, netout
from table2
)
) t12
group by ordernum, orderitem;
This has two advantages over pre-aggregating and using joins:
It keeps all order/item pairs, even those that appear in only one table.
If you add a where clause to the outer query, SQL Server is likely to "project" that into the subqueries.
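The first advantage can be checked with a small sketch. It uses Python's sqlite3 module instead of SQL Server, and the table contents and the net_in/net_out column names are assumptions made for the demonstration.

```python
import sqlite3

# Hypothetical data: order 2 exists only in table1, order 3 only in table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ordernum INTEGER, orderitem INTEGER, produced INTEGER);
    CREATE TABLE table2 (ordernum INTEGER, orderitem INTEGER,
                         net_in INTEGER, net_out INTEGER);
    INSERT INTO table1 VALUES (1, 1, 5), (1, 1, 7), (2, 1, 4);
    INSERT INTO table2 VALUES (1, 1, 10, 3), (3, 1, 8, 2);
""")

# Pre-aggregate each table, then inner join: pairs missing from either side vanish.
inner = conn.execute("""
    SELECT t1.ordernum, t1.orderitem, t1.produced + t2.produced
    FROM (SELECT ordernum, orderitem, SUM(produced) AS produced
          FROM table1 GROUP BY ordernum, orderitem) t1
    JOIN (SELECT ordernum, orderitem, SUM(net_in - net_out) AS produced
          FROM table2 GROUP BY ordernum, orderitem) t2
      ON t1.ordernum = t2.ordernum AND t1.orderitem = t2.orderitem
    ORDER BY 1, 2
""").fetchall()

# Stack the rows with UNION ALL, then aggregate once: every pair survives.
stacked = conn.execute("""
    SELECT ordernum, orderitem, SUM(produced)
    FROM (SELECT ordernum, orderitem, produced FROM table1
          UNION ALL
          SELECT ordernum, orderitem, net_in - net_out FROM table2) t12
    GROUP BY ordernum, orderitem
    ORDER BY 1, 2
""").fetchall()

print(inner)    # [(1, 1, 19)]
print(stacked)  # [(1, 1, 19), (2, 1, 4), (3, 1, 6)]
```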
Try the query below as well:
select t1.order_num, t1.order_item, sum(t1.produced) + (select coalesce(sum(t2.net_in - t2.net_out), 0) from t2 where t2.order_num = t1.order_num and t2.order_item = t1.order_item) as PRODUCED
from t1
group by t1.order_num, t1.order_item
If you only need a sum from the other table, you can use a scalar subquery like this one, which sums the relevant columns for the matching order and item.

Oracle SQL query optimization - getting counts based on a varchar field

Optimizing a query
I have a query getting data from one table and getting two counts from two other tables based
on a varchar field TYPE. I need to get count from TABLE2 where TYPE=TABLE1.TYPE and
count from TABLE3 where TYPE=TABLE1.TYPE
At this point I cannot create any indexes on those fields so I decided to use functions which brought my original query execution time
down to 5 seconds which is still too much. Any suggestions on how to further optimize my query?
SELECT a.ID,
a.FIELD1,
a.FIELD2,
a.TYPE,
GET_COUNT_1(a.TYPE) as COUNT1,
GET_COUNT_2(a.TYPE) as COUNT2
FROM TABLE1 a
my original query was:
SELECT a.ID,
a.FIELD1,
a.FIELD2,
a.TYPE,
(SELECT COUNT(*) FROM TABLE2 b WHERE b.TYPE=a.TYPE) as COUNT1,
(SELECT COUNT(*) FROM TABLE3 c WHERE c.TYPE=a.TYPE) as COUNT2
FROM TABLE1 a
If you do not have an index on table2(TYPE), a correlated subquery is deadly, because it repeatedly (once for each row of TABLE1) performs a FULL TABLE SCAN.
Apparently Oracle's scalar subquery caching, which could have saved you, did not kick in.
The function approach will not be much better, unless you implement some function result caching on your own.
But there is a simple solution: precalculate the counts in a subquery and join the result to TABLE1.
Note that you calculate the count only once for each type, not once for each row of TABLE1:
with cnt as
(select type, count(*) cnt
from table2 group by type),
cnt2 as
(select type, count(*) cnt
from table3 group by type)
select a.ID,
a.FIELD1,
a.FIELD2,
a.TYPE,
b.cnt cnt1,
c.cnt cnt2
from TABLE1 a
left outer join cnt b
on a.type = b.type
left outer join cnt2 c
on a.type = c.type
You will end with one FTS for each table, aggregation and outer join, which is the minimum you need to do.
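As a sanity check that the precalculated join returns the same counts as the correlated subqueries, here is a small sketch using Python's sqlite3 module (not Oracle, so it says nothing about the timing). The rows are made up, and the CTE names cnt2 and cnt3 are chosen for the example. One difference worth noting: the outer join yields NULL where a type has no rows in the other table, so the sketch wraps the counts in COALESCE to get 0.

```python
import sqlite3

# Hypothetical data: type 'C' is absent from TABLE2, type 'B' is absent from TABLE3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (ID INTEGER, TYPE TEXT);
    CREATE TABLE TABLE2 (TYPE TEXT);
    CREATE TABLE TABLE3 (TYPE TEXT);
    INSERT INTO TABLE1 VALUES (1, 'A'), (2, 'A'), (3, 'B'), (4, 'C');
    INSERT INTO TABLE2 VALUES ('A'), ('A'), ('B');
    INSERT INTO TABLE3 VALUES ('A'), ('C'), ('C'), ('C');
""")

# Original form: one correlated count per row of TABLE1.
correlated = conn.execute("""
    SELECT a.ID, a.TYPE,
           (SELECT COUNT(*) FROM TABLE2 b WHERE b.TYPE = a.TYPE) AS COUNT1,
           (SELECT COUNT(*) FROM TABLE3 c WHERE c.TYPE = a.TYPE) AS COUNT2
    FROM TABLE1 a
    ORDER BY a.ID
""").fetchall()

# Precalculated form: count once per type, then outer join the results.
precalculated = conn.execute("""
    WITH cnt2 AS (SELECT TYPE, COUNT(*) AS cnt FROM TABLE2 GROUP BY TYPE),
         cnt3 AS (SELECT TYPE, COUNT(*) AS cnt FROM TABLE3 GROUP BY TYPE)
    SELECT a.ID, a.TYPE, COALESCE(b.cnt, 0), COALESCE(c.cnt, 0)
    FROM TABLE1 a
    LEFT OUTER JOIN cnt2 b ON a.TYPE = b.TYPE
    LEFT OUTER JOIN cnt3 c ON a.TYPE = c.TYPE
    ORDER BY a.ID
""").fetchall()

print(precalculated)  # [(1, 'A', 2, 1), (2, 'A', 2, 1), (3, 'B', 1, 0), (4, 'C', 0, 3)]
```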
For your query, you want an index on table2(type).
The two subqueries are exactly the same, except for the table alias. If you really have two different tables, or if you are using different columns, then you'll want the appropriate index for that expression.

Count records only from left side of a LEFT JOIN

I'm building an Access query with a LEFT JOIN that, among other things, counts the number of unique sampleIDs present in the left table of the JOIN, and counts the aggregate number of specimens (bugs) present in the right table of the JOIN, both for a given group of samples (TripID). Here's the pertinent chunk of SQL code:
SELECT DISTINCT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t2.C1 + t2.C2)
AS Bugs FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
The trouble I'm having is that COUNT(t1.SampleID) is not giving me my desired result. My desired result is the number of unique SampleIDs present in t1 for a given TripID (let's say 7). Instead, what I get seems to be the number of rows in t2 for which the SampleID is contained within the given TripID group (let's say 77). How can I change this SQL query to get the desired number (7, not 77)?
Just aggregate t2 first, then join t1 to the aggregated result, like this:
SELECT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t3.Bugs) as Bugs
FROM tbl_Sample AS t1
LEFT Join (
SELECT t2.SampleID, SUM(t2.C1 + t2.C2) as Bugs
FROM tbl_Bugs as t2
GROUP BY SampleID) AS t3 ON t1.SampleID = t3.SampleID
GROUP BY t1.TripID
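Here is a rough sketch of why aggregating first fixes the sample count. It runs on Python's sqlite3 module rather than Access, and the rows are invented: trip 10 has two samples (one with three bug rows, one with a single bug row) and trip 20 has one sample with no bugs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Sample (SampleID INTEGER, TripID INTEGER);
    CREATE TABLE tbl_Bugs (SampleID INTEGER, C1 INTEGER, C2 INTEGER);
    INSERT INTO tbl_Sample VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO tbl_Bugs VALUES (1, 2, 3), (1, 1, 1), (1, 0, 4), (2, 5, 5);
""")

# Joining raw bug rows repeats each sample once per bug row, inflating the count.
direct = conn.execute("""
    SELECT t1.TripID, COUNT(t1.SampleID), SUM(t2.C1 + t2.C2)
    FROM tbl_Sample AS t1
    LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
    GROUP BY t1.TripID
    ORDER BY t1.TripID
""").fetchall()

# Collapsing tbl_Bugs to one row per sample first keeps one joined row per sample.
preaggregated = conn.execute("""
    SELECT t1.TripID, COUNT(t1.SampleID), SUM(t3.Bugs)
    FROM tbl_Sample AS t1
    LEFT JOIN (SELECT SampleID, SUM(C1 + C2) AS Bugs
               FROM tbl_Bugs
               GROUP BY SampleID) AS t3 ON t1.SampleID = t3.SampleID
    GROUP BY t1.TripID
    ORDER BY t1.TripID
""").fetchall()

print(direct)         # [(10, 4, 21), (20, 1, None)]
print(preaggregated)  # [(10, 2, 21), (20, 1, None)]
```

The bug totals agree in both versions; only the sample count changes, because the derived table has at most one row per SampleID.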
This is a tricky query, because you have different hierarchies. Here is one method:
select s.tripid, count(*) as numsamples,
(select sum(b.c1 + b.c2)
from tbl_Bugs b join
tbl_sample s2
on s2.sampleid = b.sampleid
where s2.tripid = s.tripid
) as numbugs
from tbl_sample s
group by s.tripid
You included a DISTINCT with a Group By. This is removing duplicates twice, which is unnecessarily complex. You can get rid of the DISTINCT.
I would have the count separate from what is going on in the group by.
SELECT dT.TripID
,(SELECT COUNT(DISTINCT(SampleID))
FROM tbl_Sample S
WHERE S.TripID = dT.TripID
) AS [Samples]
,dT.Bugs
FROM (
SELECT t1.TripID
,SUM(t2.C1 + t2.C2) AS Bugs
FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
) AS dT

How to do a SUM across two unrelated tables?

I'm trying to sum on two unrelated tables with postgres. With MySQL, I would do something like this :
SELECT SUM(table1.col1) AS sum_1, SUM(table2.col1) AS sum_2 FROM table1, table2
This should give me a table with two columns named sum_1 and sum_2. However, postgres doesn't give me any result for this query.
Any ideas?
SELECT (SELECT SUM(table1.col1) FROM table1) AS sum_1,
(SELECT SUM(table2.col1) FROM table2) AS sum_2;
You can also write it as:
SELECT t1.sum_c1, t1.sum_c2, t2.sum_t2_c1
FROM
(
SELECT SUM(col1) sum_c1,
SUM(col2) sum_c2
FROM table1
) t1
FULL OUTER JOIN
(
SELECT SUM(col1) sum_t2_c1
FROM table2
) t2 ON 1=1;
The FULL JOIN is used with a dud condition so that either subquery could produce no results (empty) without causing the greater query to have no result.
I don't think the query as you have written would have produced the result you expected to get, because it's doing a CROSS JOIN between table1 and table2, which would inflate each SUM by the count of rows in the other table. Note that if either table1/table2 is empty, the CROSS JOIN will cause X rows by 0 rows to return an empty result.
Look at this SQL Fiddle and compare the results.
To combine multiple aggregates from multiple tables, use CROSS JOIN:
SELECT sum_1, sum_2, sum_3, sum_4
FROM
(SELECT sum(col1) AS sum_1, sum(col2) AS sum_2 FROM table1) t1
CROSS JOIN
(SELECT sum(col3) AS sum_3, sum(col4) AS sum_4 FROM table2) t2
There is always exactly one row from either of the subqueries, even with no rows in the source tables. So a CROSS JOIN (or even just a lowly comma between the subqueries - being the not so easy to read shorthand for a cross join with lower precedence) is the simplest way.
Note that this produces a cross join between single aggregated rows, not a cross join between individual rows of multiple tables like your incorrect statement in the question would - thereby multiplying each other.
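The difference between the two kinds of cross join can be seen in a small sketch. It uses Python's sqlite3 module rather than Postgres, with made-up rows, and for brevity sums only col1 from each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER);
    CREATE TABLE table2 (col1 INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (10), (20);
""")

# Cross join of individual rows: 3 x 2 = 6 rows, so each sum is multiplied.
RAW = "SELECT SUM(table1.col1), SUM(table2.col1) FROM table1, table2"

# Cross join of two single-row aggregates: exactly one combined row.
AGGREGATED = """
    SELECT sum_1, sum_2
    FROM (SELECT SUM(col1) AS sum_1 FROM table1) t1
    CROSS JOIN (SELECT SUM(col1) AS sum_2 FROM table2) t2
"""

raw = conn.execute(RAW).fetchone()
aggregated = conn.execute(AGGREGATED).fetchone()
print(raw)         # (12, 90)
print(aggregated)  # (6, 30)

# With one table empty, the row-level cross join has no rows to sum at all,
# while each aggregate subquery still returns its single row.
conn.execute("DELETE FROM table2")
raw_empty = conn.execute(RAW).fetchone()
aggregated_empty = conn.execute(AGGREGATED).fetchone()
print(raw_empty)         # (None, None)
print(aggregated_empty)  # (6, None)
```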
I suggest something like the following, although I haven't tried it.
select sum1, sum2
from
(select sum(col1) sum1 from table1) t1,
(select sum(col1) sum2 from table2) t2;
The idea is to create two inline views, each with one row in it, and then do a cartesian join on these two single-row views.
SELECT SUM(table1_column1 + table2_column1)
FROM table1
JOIN table2
ON table1_id= table2_id
WHERE account_no='${account_no}'
Express-JS with PostgreSQL via postman API

Counts for distinct values in different tables where columns are common to separate tables

I have no idea if that title conveys what I want it to.
I have two tables containing phone records (one for each account) and I'd like to get call counts for the numbers that are common to each account. In other words:
Table 1
Number ...
8675309
8675309
8675310
8675310
8675312
Table 2
Number ...
8675309
8675309
8675309
8675310
8675311
Querying with something like:
SELECT DISTINCT table1.number, COUNT(table1.number), COUNT(table2.number) FROM table1, table2 WHERE table1.number = table2.number GROUP BY table1.number
would hopefully produce:
8675309|2|3
8675310|2|1
Instead, it currently produces something like:
8675309|6|6
8675310|2|2
It appears to be multiplying the count from each table. Presumably, this is because I'm not joining the tables the way I should for this goal. Or because by the time I ask for COUNT(table1.number) the tables have already been joined in some multiplicative way. Should I not be doing a JOIN and instead something that would read like: "where table2.number CONTAINS(table1.number)"?
Any tips?
One way is with subqueries:
SELECT t1.number, t1.table1Count, t2.table2Count
from (select number, count(*) table1Count
from table1
group by number) t1
inner join (select number, count(*) table2Count
from table2
group by number) t2
on t2.number = t1.number
This assumes that you only want to list numbers that appear in both tables. If you want to list all numbers that appear in one table and optionally the other, you'd use a left or right outer join; if you wanted all numbers that appeared in either or both tables, you'd use a full outer join.
Another and potentially more efficient way requires the presence of a single column that uniquely identifies each row in each table:
SELECT
t1.number
,count(distinct t1.PrimaryKeyValue) table1Count
,count(distinct t2.PrimaryKeyValue) table2Count
from table1 t1
inner join table2 t2
on t2.number = t1.number
group by t1.number
This makes the same assumptions as before, and can also be adjusted via outer joins.
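Both approaches can be tried against the sample numbers from the question. This is a sketch using Python's sqlite3 module; the id INTEGER PRIMARY KEY column is an assumption standing in for PrimaryKeyValue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, number INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, number INTEGER);
    INSERT INTO table1 (number)
        VALUES (8675309), (8675309), (8675310), (8675310), (8675312);
    INSERT INTO table2 (number)
        VALUES (8675309), (8675309), (8675309), (8675310), (8675311);
""")

# The query from the question: counts are taken after the rows have been paired up.
naive = conn.execute("""
    SELECT t1.number, COUNT(t1.number), COUNT(t2.number)
    FROM table1 t1, table2 t2
    WHERE t1.number = t2.number
    GROUP BY t1.number ORDER BY t1.number
""").fetchall()

# Approach 1: count inside each table first, then join the per-number counts.
subqueries = conn.execute("""
    SELECT t1.number, t1.table1Count, t2.table2Count
    FROM (SELECT number, COUNT(*) AS table1Count FROM table1 GROUP BY number) t1
    INNER JOIN (SELECT number, COUNT(*) AS table2Count FROM table2 GROUP BY number) t2
        ON t2.number = t1.number
    ORDER BY t1.number
""").fetchall()

# Approach 2: join the raw rows, but count distinct row identifiers per table.
distinct_keys = conn.execute("""
    SELECT t1.number, COUNT(DISTINCT t1.id), COUNT(DISTINCT t2.id)
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.number = t1.number
    GROUP BY t1.number ORDER BY t1.number
""").fetchall()

print(naive)          # [(8675309, 6, 6), (8675310, 2, 2)]
print(subqueries)     # [(8675309, 2, 3), (8675310, 2, 1)]
print(distinct_keys)  # [(8675309, 2, 3), (8675310, 2, 1)]
```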
One way is to use a couple of derived tables to compute your counts separately and then join them to produce your final summary:
select t1.number, t1.count1, t2.count2
from (select number, count(number) as count1 from table1 group by number) as t1
join (select number, count(number) as count2 from table2 group by number) as t2
on t1.number = t2.number
There are probably other ways but that should work and it is the first thing that came to mind.
You're getting your "multiplicative" effect pretty much for the reasons you suspect. If you have this:
table1(id,x)    table2(id,x)
------------    ------------
1, a            4, a
2, a            5, a
3, b            6, b
Then joining them on x will give you this:
1,a, 4,a
1,a, 5,a
2,a, 4,a
2,a, 5,a
...
Usually you could use a GROUP BY to sort out the duplicates but you can't do that because it would mess up your per-table counts.
Try this:
select tab1.number,tab1.num1,tab2.num2
from
(SELECT number, COUNT(number) as num1 from table1 group by number) as tab1
left join
(SELECT number, COUNT(number) as num2 from table2 group by number) as tab2
on tab1.number = tab2.number