How can I prevent cartesian product to avoid double counting


I am getting wrong data (cartesian product) when using inner join.
Table t1
PurchaseOrder CostID Amount
1 1 4
1 2 3
Table t2
PurchaseOrder OrderType ItemId
1 321 1
1 321 2
1 321 3
1 321 4
2 128 5
2 128 6
3 321 9
Required Output
PurchaseOrder Amount
1 7
My Output
PurchaseOrder Amount
1 28
I am trying to use inner join to get the output but not getting the right data.
Query:
CREATE TEMP TABLE t1
(
PurchaseOrder INT64,
CostID INT64,
Amount INT64
);
INSERT INTO t1
VALUES (1,1,4),(1,2,3);
SELECT *
FROM t1;
CREATE TEMP TABLE t2
(
PurchaseOrder INT64,
OrderType INT64
);
INSERT INTO t2
VALUES (1,321),(1,321),(1,321),(1,321);
SELECT *
FROM t2;
select t1.PurchaseOrder, sum(amount)
from t1 inner join t2 on t1.PurchaseOrder = t2.PurchaseOrder
where t2.OrderType = 321
group by t1.PurchaseOrder;

I think you just want to check whether a PurchaseOrder has an OrderType of 321; that check is all you use Table 2 for. The information needed to sum the amounts is already in Table 1, so there is no need to join and multiply its rows. I would write the query as follows:
select PurchaseOrder, sum(amount)
from t1
where exists
(select * from t2
where PurchaseOrder=t1.PurchaseOrder
and OrderType=321)
group by PurchaseOrder
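
To see the difference concretely, here is a minimal sketch that runs both queries against the sample data. It assumes an in-memory SQLite database (through Python's `sqlite3` module) as a stand-in for the original temp tables, so the types are `INTEGER` rather than `INT64`; the SQL logic is otherwise unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for the original temp tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (PurchaseOrder INTEGER, CostID INTEGER, Amount INTEGER);
    CREATE TABLE t2 (PurchaseOrder INTEGER, OrderType INTEGER);
    INSERT INTO t1 VALUES (1, 1, 4), (1, 2, 3);
    INSERT INTO t2 VALUES (1, 321), (1, 321), (1, 321), (1, 321);
""")

# Inner join: every t1 row is repeated once per matching t2 row (2 x 4 = 8 rows),
# so each amount is summed four times.
join_rows = conn.execute("""
    SELECT t1.PurchaseOrder, SUM(t1.Amount)
    FROM t1 INNER JOIN t2 ON t1.PurchaseOrder = t2.PurchaseOrder
    WHERE t2.OrderType = 321
    GROUP BY t1.PurchaseOrder
""").fetchall()

# EXISTS: t2 is only consulted as a yes/no check, so t1 rows are not multiplied.
exists_rows = conn.execute("""
    SELECT PurchaseOrder, SUM(Amount)
    FROM t1
    WHERE EXISTS (SELECT 1 FROM t2
                  WHERE t2.PurchaseOrder = t1.PurchaseOrder
                    AND t2.OrderType = 321)
    GROUP BY PurchaseOrder
""").fetchall()

print(join_rows)    # [(1, 28)]
print(exists_rows)  # [(1, 7)]
```

The join version returns 28 because 4 + 3 is counted once for each of the four matching rows in t2.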

Related

Return id when id value does not equal 0 at least once

I have 2 tables.
table_1
id | product
1 | a
2 | b
3 | c
4 | d
table_2
product_id | value
1 | 0
2 | 0
1 | 5
2 | 0
4 | 10
How can I return details from table_1 for ids that:
- are present in table_2 (table_1.id = table_2.product_id)
- do not have any associated value equal to 0 (for example id "1" should be excluded)
The correct result would be id "4", as none of its values equals zero.
I have tried the query below, but it also returns id "3", which is not present in table_2.
SELECT * FROM table_1
WHERE id NOT IN (
SELECT product_id FROM table_2
WHERE value = 0)

You can use two conditions:
SELECT t1.*
FROM table_1 t1
WHERE EXISTS (SELECT 1
FROM table_2 t2
WHERE t1.id = t2.product_id
) AND
NOT EXISTS (SELECT 1
FROM table_2 t2
WHERE t1.id = t2.product_id AND t2.value = 0
);
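
As a quick check of why both conditions are needed, the sketch below runs the question's NOT IN query and the two-condition query on the sample data. It assumes an in-memory SQLite database via Python's `sqlite3`; the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only to make the example runnable
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, product TEXT);
    CREATE TABLE table_2 (product_id INTEGER, value INTEGER);
    INSERT INTO table_1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO table_2 VALUES (1, 0), (2, 0), (1, 5), (2, 0), (4, 10);
""")

# NOT IN alone: id 3 slips through because it never appears in table_2 at all.
not_in_only = [row[0] for row in conn.execute("""
    SELECT id FROM table_1
    WHERE id NOT IN (SELECT product_id FROM table_2 WHERE value = 0)
    ORDER BY id
""")]

# EXISTS requires at least one table_2 row; NOT EXISTS forbids any zero value.
both_conditions = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM table_1 t1
    WHERE EXISTS (SELECT 1 FROM table_2 t2 WHERE t1.id = t2.product_id)
      AND NOT EXISTS (SELECT 1 FROM table_2 t2
                      WHERE t1.id = t2.product_id AND t2.value = 0)
    ORDER BY t1.id
""")]

print(not_in_only)      # [3, 4]
print(both_conditions)  # [4]
```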

The naive approach:
-- Step 1: Select product IDs to ignore
SELECT product_id
FROM table_2
WHERE value = 0
-- Step 2: Select product IDs to include
SELECT product_id
FROM table_2
WHERE product_id NOT IN ( -- Use the result of Step 1
SELECT product_id
FROM table_2
WHERE value = 0
)
-- Final query: Select products
SELECT *
FROM table_1
WHERE id IN ( -- Use the result of Step 2
SELECT product_id
FROM table_2
WHERE product_id NOT IN ( -- Use the result of Step 1
SELECT product_id
FROM table_2
WHERE value = 0
)
)

One option, using aggregation:
SELECT
t1.id,
t1.product
FROM table_1 t1
INNER JOIN table_2 t2
ON t1.id = t2.product_id
GROUP BY
t1.id,
t1.product
HAVING
COUNT(CASE WHEN t2.value = 0 THEN 1 END) = 0;
In order for the HAVING clause to return true, the product must not have had any zero value in the second table. Also, the inner join filters off any product which does not appear at all in the second table.
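
To make the HAVING condition easier to follow, this sketch prints the per-product counts that the condition is evaluated on. It assumes an in-memory SQLite database via Python's `sqlite3`, and the aliases `n_rows` and `n_zeros` are illustrative names, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only to make the example runnable
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, product TEXT);
    CREATE TABLE table_2 (product_id INTEGER, value INTEGER);
    INSERT INTO table_1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO table_2 VALUES (1, 0), (2, 0), (1, 5), (2, 0), (4, 10);
""")

# COUNT ignores NULLs, and the CASE yields NULL for non-zero values,
# so n_zeros is the number of zero-valued rows per product.
counts = conn.execute("""
    SELECT t1.id,
           COUNT(*) AS n_rows,
           COUNT(CASE WHEN t2.value = 0 THEN 1 END) AS n_zeros
    FROM table_1 t1
    INNER JOIN table_2 t2 ON t1.id = t2.product_id
    GROUP BY t1.id
    ORDER BY t1.id
""").fetchall()

print(counts)  # [(1, 2, 1), (2, 2, 2), (4, 1, 0)]
```

Only id 4 has `n_zeros = 0`, and id 3 is absent because the inner join finds no matching rows for it.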

You can get the ids you need to use for the IN clause, by grouping by product_id and putting the condition in the HAVING clause:
SELECT * FROM table_1
WHERE id IN (
SELECT product_id
FROM table_2
GROUP BY product_id
HAVING SUM(CASE WHEN value = 0 THEN 1 ELSE 0 END) = 0
)

SQL update with aggregate in WHERE clause

I'm trying to set the date column in table 1 to, say, '10/11/2012' when the sum of all amounts in table 2 related to that id (via fk_id) is 0. Here's what I mean:
FROM:
table 1
id date
1 10/11/2011
2
3 10/12/2011
table 2
fk_id amount
1 200
2 0
2 0
3 100
TO:
table 1
id date
1 10/11/2011
2 10/11/2012
3 10/12/2011
table 2
fk_id amount
1 200
2 0
2 0
3 100
This is what I have currently:
update table1
set date = '10/11/2012'
FROM table1 inner join table2 on table1.id = table2.fk_id
HAVING sum(table2.amount) = 0
Can someone help me out here?

UPDATE table1
SET date = '10/11/2012'
FROM table1
WHERE id IN (SELECT FK_ID FROM table2 GROUP BY FK_ID HAVING SUM(Amount)=0)
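
A minimal sketch of this IN-subquery form, run on the sample data. It assumes an in-memory SQLite database via Python's `sqlite3`, stores the dates as plain text, and drops the `FROM table1` clause, which the SQL Server form above uses but this sketch does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only to make the example runnable
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, "date" TEXT);
    CREATE TABLE table2 (fk_id INTEGER, amount INTEGER);
    INSERT INTO table1 VALUES (1, '10/11/2011'), (2, NULL), (3, '10/12/2011');
    INSERT INTO table2 VALUES (1, 200), (2, 0), (2, 0), (3, 100);
""")

# The aggregate lives in a grouped subquery; the UPDATE itself only filters on id.
conn.execute("""
    UPDATE table1
    SET "date" = '10/11/2012'
    WHERE id IN (SELECT fk_id
                 FROM table2
                 GROUP BY fk_id
                 HAVING SUM(amount) = 0)
""")

updated = conn.execute('SELECT id, "date" FROM table1 ORDER BY id').fetchall()
print(updated)  # [(1, '10/11/2011'), (2, '10/11/2012'), (3, '10/12/2011')]
```

Note that `SUM(amount) = 0` would also match an id whose amounts cancel out (for example 5 and -5), which may or may not be what is wanted.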

This should work:
UPDATE T1
SET [date] = '20121011'
FROM table1 T1
INNER JOIN (SELECT fk_id, SUM(amount) Amount
FROM table2
GROUP BY fk_id
HAVING SUM(amount) = 0) T2
ON T1.id = T2.fk_id

SQL: Outputting Multiple Rows When Joining From Same Table

My question is this: Is it possible to output multiple rows when joining from the same table?
With this code, for example, I would like the output to be 2 rows, one for each table. Instead, it gives me 1 row with all of the data.
SELECT t1.*, t2.*
FROM table t1
JOIN table t2
ON t2.id = t1.oldId
WHERE t1.id = '1'
UPDATE
Well the problem that I have with the UNION/UNION ALL is this: I don't know what the t1.oldId value is equal to. All I know is the id for t1. I am trying to avoid using 2 queries so is there a way I could do something like this:
SELECT t1.*
FROM table t1
WHERE t1.id = '1'
UNION
SELECT t2.*
FROM table t2
WHERE t2.id = t1.oldId
SAMPLE DATA
messages_users
id message_id user_id box thread_id latest_id
--------------------------------------------------------
8 1 1 1 NULL NULL
9 2 1 2 NULL 16
10 2 65 1 NULL 15
11 3 65 2 2 NULL
12 3 1 1 2 NULL
13 4 1 2 2 NULL
14 4 65 1 2 NULL
15 5 65 2 2 NULL
16 6 1 1 2 NULL
Query:
SELECT mu.id FROM messages_users mu
JOIN messages_users mu2 ON mu2.latest_id IS NOT NULL
WHERE mu.user_id = '1' AND mu2.user_id = '1' AND ((mu.box = '1'
AND mu.thread_id IS NULL AND mu.latest_id IS NULL) OR mu.id = mu2.latest_id)
This query fixes my problem. But it seems the answer to my question is to not use a JOIN but a UNION.

You mean one row for t1 and one row from t2?
You're looking for UNION, not JOIN.
select * from table where id = 1
union
select * from table where oldid = 1

If you are trying to multiply rows in a table, you need UNION ALL (not UNION):
select *
from ((select * from t) union all
(select * from t)
) t
I also sometimes use a cross join to do this:
select *
from t cross join
(select 1 as seqnum union all select 2) vals
The cross join explicitly multiplies the number of rows, in this case with a sequence number attached.

Well, since it's the same table, you could do:
SELECT t2.*
FROM table t1
JOIN table t2
ON t2.id = t1.oldId
OR t2.id = t1.id
WHERE t1.id = '1'
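
Here is a small sketch of that OR-join on made-up data. The table name `items`, the `name` column, and the three rows are hypothetical (the question does not show its table), and an in-memory SQLite database via Python's `sqlite3` is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only to make the example runnable
conn.executescript("""
    -- Hypothetical table: row 1 points at row 3 through oldId.
    CREATE TABLE items (id INTEGER, oldId INTEGER, name TEXT);
    INSERT INTO items VALUES (1, 3, 'current'), (2, NULL, 'other'), (3, NULL, 'previous');
""")

# t1 is pinned to id 1; t2 matches either that same row or the row it points to,
# so the result has one output row per matched row instead of one wide row.
rows = conn.execute("""
    SELECT t2.*
    FROM items t1
    JOIN items t2
      ON t2.id = t1.oldId
      OR t2.id = t1.id
    WHERE t1.id = 1
    ORDER BY t2.id
""").fetchall()

print(rows)  # [(1, 3, 'current'), (3, None, 'previous')]
```

If `oldId` is NULL for the starting row, only the starting row itself comes back.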

Get next minimum, greater than or equal to a given value for each group

given the following Table1:
RefID intVal SomeVal
----------------------
1 10 val01
1 20 val02
1 30 val03
1 40 val04
1 50 val05
2 10 val06
2 20 val07
2 30 val08
2 40 val09
2 50 val10
3 12 val11
3 14 val12
4 10 val13
5 100 val14
5 150 val15
5 1000 val16
and Table2 containing some RefIDs and intVals like
RefID intVal
-------------
1 11
1 28
2 9
2 50
2 51
4 11
5 1
5 150
5 151
need an SQL Statement to get the next greater intValue for each RefID and NULL if not found in Table1
following is the expected result
RefID intVal nextGt SomeVal
------------------------------
1 11 20 val02
1 28 30 val03
2 9 10 val06
2 50 50 val10
2 51 NULL NULL
4 11 NULL NULL
5 1 100 val14
5 150 150 val15
5 151 1000 val16
help would be appreciated !

Derived table a retrieves minimal values from table1 given refid and intVal from table2; outer query retrieves someValue only.
select a.refid, a.intVal, a.nextGt, table1.SomeVal
from
(
select table2.refid, table2.intval, min (table1.intVal) nextGt
from table2
left join table1
on table2.refid = table1.refid
and table2.intVal <= table1.intVal
group by table2.refid, table2.intval
) a
-- table1 is joined again to retrieve SomeVal
left join table1
on a.refid = table1.refid
and a.nextGt = table1.intVal

You can solve this using the ROW_NUMBER() function:
SELECT
RefID,
intVal,
NextGt,
SomeVal
FROM
(
SELECT
t2.RefID,
t2.intVal,
t1.intVal AS NextGt,
t1.SomeVal,
ROW_NUMBER() OVER (PARTITION BY t2.RefID, t2.intVal ORDER BY t1.intVal) AS rn
FROM
dbo.Table2 AS t2
LEFT JOIN dbo.Table1 AS t1 ON t1.RefID = t2.RefID AND t1.intVal >= t2.intVal
) s
WHERE
rn = 1
;
The derived table matches each Table2 row with all Table1 rows that have the same RefID and an intVal that is greater than or equal to Table2.intVal. Each subset of matches is ranked and the first row is returned by the main query.
The nested query uses an outer join, so that those Table2 rows that have no Table1 matches are still returned (with nulls substituted for the Table1 columns).
Alternatively you can use OUTER APPLY:
SELECT
t2.RefID,
t2.intVal,
t1.intVal AS NextGt,
t1.SomeVal
FROM
dbo.Table2 AS t2
OUTER APPLY
(
SELECT TOP (1)
t1.intVal,
t1.SomeVal
FROM
dbo.Table1 AS t1
WHERE
t1.RefID = t2.RefID
AND t1.intVal >= t2.intVal
ORDER BY
t1.intVal ASC
) AS t1
;
This method is arguably more straightforward: for each Table2 row, get all matches from Table1 based on the same set of conditions, sort the matches in the ascending order of Table1.intVal and take the topmost intVal.
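
For engines without OUTER APPLY, the same per-row lookup can be approximated with a correlated subquery. The sketch below is one such variant, not the answer's query: it assumes an in-memory SQLite database via Python's `sqlite3`, and it assumes that (RefID, intVal) is unique in Table1, since duplicates would produce duplicate output rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only to make the example runnable
conn.executescript("""
    CREATE TABLE Table1 (RefID INTEGER, intVal INTEGER, SomeVal TEXT);
    CREATE TABLE Table2 (RefID INTEGER, intVal INTEGER);
    INSERT INTO Table1 VALUES
        (1, 10, 'val01'), (1, 20, 'val02'), (1, 30, 'val03'), (1, 40, 'val04'), (1, 50, 'val05'),
        (2, 10, 'val06'), (2, 20, 'val07'), (2, 30, 'val08'), (2, 40, 'val09'), (2, 50, 'val10'),
        (3, 12, 'val11'), (3, 14, 'val12'), (4, 10, 'val13'),
        (5, 100, 'val14'), (5, 150, 'val15'), (5, 1000, 'val16');
    INSERT INTO Table2 VALUES
        (1, 11), (1, 28), (2, 9), (2, 50), (2, 51), (4, 11), (5, 1), (5, 150), (5, 151);
""")

# For each Table2 row, the subquery finds the smallest qualifying intVal;
# the LEFT JOIN then fetches that row, or NULLs when nothing qualifies.
result = conn.execute("""
    SELECT t2.RefID, t2.intVal, t1.intVal AS NextGt, t1.SomeVal
    FROM Table2 t2
    LEFT JOIN Table1 t1
      ON t1.RefID = t2.RefID
     AND t1.intVal = (SELECT MIN(x.intVal)
                      FROM Table1 x
                      WHERE x.RefID = t2.RefID
                        AND x.intVal >= t2.intVal)
    ORDER BY t2.RefID, t2.intVal
""").fetchall()

for row in result:
    print(row)
```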

This can be done with a join, a group by, a case expression, and a trick:
select t2.refid, t2.intval,
min(case when t1.intval >= t2.intval then t1.intval end) as min_greater_than_ref,
substring(min(case when t1.intval >= t2.intval
then right('00000000'+cast(t1.intval as varchar(255)), 8)+t1.SomeVal
end), 9, 1000) as SomeVal
from table2 t2 left join
table1 t1
on t1.refid = t2.refid
group by t2.refid, t2.intval
So, the trick is to prepend the integer value to SomeVal, zero-padding the integer value (in this case to 8 characters). You get something like: "00000020val02". The minimum on this column is based on the minimum of the integer. The final step is to extract the value.
For this example, I used SQL Server syntax for the concatenation. In other databases you might use CONCAT() or ||.
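
The effect of the zero-padding is easiest to see on the RefID 5 rows, where the integers have different lengths. This sketch assumes an in-memory SQLite database via Python's `sqlite3`, so it uses `printf('%08d', ...)`, `||` and `substr` in place of the SQL Server functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only to make the example runnable
conn.executescript("""
    CREATE TABLE Table1 (RefID INTEGER, intVal INTEGER, SomeVal TEXT);
    INSERT INTO Table1 VALUES (5, 100, 'val14'), (5, 150, 'val15'), (5, 1000, 'val16');
""")

unpadded_min, extracted = conn.execute("""
    SELECT
        -- Without padding, strings compare character by character,
        -- so '1000val16' sorts before '100val14'.
        MIN(intVal || SomeVal),
        -- With 8-digit padding, string order matches integer order;
        -- substr(..., 9) then drops the 8 padding characters.
        substr(MIN(printf('%08d', intVal) || SomeVal), 9)
    FROM Table1
    WHERE RefID = 5
""").fetchone()

print(unpadded_min)  # 1000val16
print(extracted)     # val14
```

The padding width has to be at least as large as the longest integer, otherwise the string ordering breaks again.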

little help with some tsql

Given following table:
rowId AccountId Organization1 Organization2
-----------------------------------------------
1 1 20 10
2 1 10 20
3 1 40 30
4 2 15 10
5 2 20 15
6 2 10 20
How do I identify the records where Organization2 doesn't exist in Organization1 for a particular account?
For instance, in the data above, my result would be a single record for AccountId 1, because the row 3 Organization2 value 30 doesn't exist in Organization1 for that account.

SELECT rowId, AccountId, Organization1, Organization2
FROM yourTable yt
WHERE NOT EXISTS (SELECT 1 FROM yourTable yt2 WHERE yt.AccountId = yt2.AccountId AND yt.Organization1 = yt2.Organization2)

There are two possible interpretations of your question. The first (where the Organization1 and Organization2 columns are not equal) is trivial:
SELECT AccountID FROM Table WHERE Organization1 <> Organization2
But I suspect you're asking the slightly more difficult interpretation (where Organization2 does not appear in ANY Organization1 value for the same account):
SELECT AccountID From Table T1 WHERE Organization2 NOT IN
(SELECT Organization1 FROM Table T2 WHERE T2.AccountID = T1.AccountID)
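
A minimal sketch of the second interpretation on the question's data. The table name `org_links` is a hypothetical stand-in (the question does not name its table, and `Table` is a reserved word), and an in-memory SQLite database via Python's `sqlite3` is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only to make the example runnable
conn.executescript("""
    -- org_links is a hypothetical name for the question's table.
    CREATE TABLE org_links (rowId INTEGER, AccountId INTEGER,
                            Organization1 INTEGER, Organization2 INTEGER);
    INSERT INTO org_links VALUES
        (1, 1, 20, 10), (2, 1, 10, 20), (3, 1, 40, 30),
        (4, 2, 15, 10), (5, 2, 20, 15), (6, 2, 10, 20);
""")

# For each row, compare its Organization2 with every Organization1
# value that belongs to the same account.
missing = conn.execute("""
    SELECT T1.rowId, T1.AccountId
    FROM org_links T1
    WHERE T1.Organization2 NOT IN (SELECT T2.Organization1
                                   FROM org_links T2
                                   WHERE T2.AccountId = T1.AccountId)
    ORDER BY T1.rowId
""").fetchall()

print(missing)  # [(3, 1)]
```

One caveat with NOT IN: if Organization1 can contain NULL, the comparison yields no rows for that account, so NOT EXISTS is usually the safer form in that case.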

Here is how you could do it:
Test data:
CREATE TABLE #T(rowid int, acc int, org1 int, org2 int)
INSERT #T
SELECT 1,1,10,10 UNION
SELECT 2,1,20,20 UNION
SELECT 3,1,40,30 UNION
SELECT 4,2,10,10 UNION
SELECT 5,2,15,15 UNION
SELECT 6,2,20,20
Then perform a self-join to discover missing org2:
SELECT
*
FROM #T T1
LEFT JOIN
#T T2
ON t1.org1 = t2.org2
AND t1.acc = t2.acc
WHERE t2.org1 IS NULL

SELECT
*
FROM
[YourTable]
WHERE
[Organization1] <> [Organization2] -- The '<>' is read "Does Not Equal".

Use left join as Noel Abrahams presented.
