SQL Joins with NOT IN displays incorrect data

I have 3 tables as below, and I need the rows where Expense.Expense_Code is not available in Income.Income_Code.
Table: Base
+----+-----------+----------------+
| ID | Reference | Reference_Name |
+----+-----------+----------------+
| 1 | 10000 | AAAA |
| 2 | 10001 | BBBB |
| 3 | 10002 | CCCC |
+----+-----------+----------------+
Table: Expense
+-----+---------+--------------+----------------+
| EID | BASE_ID | Expense_Code | Expense_Amount |
+-----+---------+--------------+----------------+
| 1 | 1 | I0001 | 25 |
| 2 | 1 | I0002 | 50 |
| 3 | 2 | I0003 | 75 |
+-----+---------+--------------+----------------+
Table: Income
+------+---------+-------------+------------+
| I_ID | BASE_ID | Income_Code | Income_Amt |
+------+---------+-------------+------------+
| 1 | 1 | I0001 | 10 |
| 2 | 1 | I0002 | 20 |
| 3 | 1 | I0003 | 30 |
+------+---------+-------------+------------+
SELECT DISTINCT Base.Reference,Expense.Expense_Code
FROM Base
JOIN Expense ON Base.ID = Expense.BASE_ID
JOIN Income ON Base.ID = Income.BASE_ID
WHERE Expense.Expense_Code IN ('I0001','I0002')
AND Income.Income_Code NOT IN ('I0001','I0002')
I expect no data to be returned.
However, I am getting the result below:
+-----------+--------------+
| REFERENCE | Expense_Code |
+-----------+--------------+
| 10000 | I0001 |
| 10000 | I0002 |
+-----------+--------------+
For Base.Reference 10000, the expense codes 'I0001' and 'I0002' are also available in the Income table, so I should not get any data.
Am I doing something wrong with the joins?
Thanks in advance for your help!

Your query never relates EXPENSE to INCOME directly; both are joined only to BASE. You need a condition that ties these two tables together to get the desired result. You can also use a NOT EXISTS clause. Prefer NOT EXISTS over NOT IN: if the compared column allows NULLs, a single NULL in the subquery makes NOT IN return no rows at all, whereas NOT EXISTS still behaves as expected.
SELECT * FROM BASE B
JOIN EXPENSE E ON B.ID=E.BASE_ID
WHERE NOT EXISTS (SELECT 1 FROM INCOME I WHERE I.BASE_ID=E.BASE_ID AND I.INCOME_CODE=E.EXPENSE_CODE)
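The difference between NOT EXISTS and NOT IN when NULLs are involved can be sketched with Python's built-in sqlite3 module, used here only as a convenient stand-in engine. The correlation on base_id plus code, and the extra income row with a NULL code, are assumptions made for illustration and are not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expense (eid INTEGER, base_id INTEGER, expense_code TEXT);
    CREATE TABLE income  (i_id INTEGER, base_id INTEGER, income_code TEXT);
    INSERT INTO expense VALUES (1, 1, 'I0001'), (2, 1, 'I0002'), (3, 2, 'I0003');
    INSERT INTO income  VALUES (1, 1, 'I0001'), (2, 1, 'I0002'), (3, 1, 'I0003');
""")

# Anti-join: expenses with no income row for the same base_id and code.
NOT_EXISTS_SQL = """
    SELECT e.base_id, e.expense_code
    FROM expense e
    WHERE NOT EXISTS (
        SELECT 1 FROM income i
        WHERE i.base_id = e.base_id AND i.income_code = e.expense_code
    )
    ORDER BY e.eid
"""

# Same intent written with NOT IN.
NOT_IN_SQL = """
    SELECT e.base_id, e.expense_code
    FROM expense e
    WHERE e.expense_code NOT IN (
        SELECT i.income_code FROM income i WHERE i.base_id = e.base_id
    )
    ORDER BY e.eid
"""

def run(sql):
    return conn.execute(sql).fetchall()

before_not_exists = run(NOT_EXISTS_SQL)
before_not_in = run(NOT_IN_SQL)

# Hypothetical extra row: an income entry for base 2 whose code is NULL.
conn.execute("INSERT INTO income VALUES (4, 2, NULL)")

after_not_exists = run(NOT_EXISTS_SQL)
after_not_in = run(NOT_IN_SQL)

print("without NULL:", before_not_exists, before_not_in)
print("with NULL:   ", after_not_exists, after_not_in)
```

Without the NULL row both queries agree. Once the NULL is present, the NOT IN comparison evaluates to unknown for base 2 and its row silently disappears, while NOT EXISTS keeps it.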

When the first join is performed, you end up with two rows that have ID 1, because the relationship between the tables is not one-to-one: every row of the first table is paired with each matching row of the second table. Like so:
(Image: output of the first join statement)
Then, when the second join is executed, the DBMS finds two rows with ID 1 in the first joined result (BASE + EXPENSE) and three in the third table (INCOME).
Again, since the relationship between the tables is not one-to-one, every row of the first joined result is paired with every matching row of INCOME, which gives 2 x 3 = 6 rows:
(Image: output of the second join statement)
Finally, the WHERE clause is applied and produces the output you see: only the rows with Income_Code I0003 survive, and those rows are paired with both I0001 and I0002. I highlighted the rows excluded by the WHERE clause:
(Image: output of the WHERE clause)
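Since the screenshots are not reproduced here, the row multiplication described above can be re-created with a small sketch. It loads the question's data into an in-memory SQLite database through Python's sqlite3 module (an assumption; the original engine is not stated) and keeps the income code visible in the output so the surviving rows are easy to recognise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base    (id INTEGER, reference INTEGER);
    CREATE TABLE expense (eid INTEGER, base_id INTEGER, expense_code TEXT);
    CREATE TABLE income  (i_id INTEGER, base_id INTEGER, income_code TEXT);
    INSERT INTO base    VALUES (1, 10000), (2, 10001), (3, 10002);
    INSERT INTO expense VALUES (1, 1, 'I0001'), (2, 1, 'I0002'), (3, 2, 'I0003');
    INSERT INTO income  VALUES (1, 1, 'I0001'), (2, 1, 'I0002'), (3, 1, 'I0003');
""")

# The two joins from the question: both tables hang off base only.
JOINS = """
    FROM base b
    JOIN expense e ON b.id = e.base_id
    JOIN income  i ON b.id = i.base_id
"""

# Every expense row of base 1 is paired with every income row of base 1.
joined = conn.execute(
    f"SELECT b.reference, e.expense_code, i.income_code {JOINS} "
    "ORDER BY e.expense_code, i.income_code"
).fetchall()

# The question's WHERE clause, applied to that joined result.
filtered = conn.execute(
    f"SELECT b.reference, e.expense_code, i.income_code {JOINS} "
    "WHERE e.expense_code IN ('I0001', 'I0002') "
    "AND i.income_code NOT IN ('I0001', 'I0002') "
    "ORDER BY e.expense_code"
).fetchall()

print(len(joined))   # 2 expense rows x 3 income rows for base 1
for row in filtered:
    print(row)       # each surviving row carries income code I0003
```

The filter on Income_Code is evaluated per joined row, not per expense code, which is why the unrelated I0003 income row is enough to keep both expense codes in the result.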

...I need data where Expense.Expense_Code Should not be available in Income.Income_Code
The following query will retrieve this data:
select b.*, e.*
from base b
join expense e on e.base_id = b.id
left join income i on i.base_id = e.base_id
and e.expense_code = i.income_code
where i.i_id is null
For reference the data script (slightly modified) is:
create table base (
id number(6),
reference number(6),
reference_name varchar2(10)
);
insert into base (id, reference, reference_name) values (1, 10000, 'AAAA');
insert into base (id, reference, reference_name) values (2, 10001, 'BBBB');
insert into base (id, reference, reference_name) values (3, 10002, 'CCCC');
create table expense (
eid number(6),
base_id number(6),
expense_code varchar2(10),
expense_amount number(6)
);
insert into expense (eid, base_id, expense_code, expense_amount) values (1, 1, 'I0001', 25);
insert into expense (eid, base_id, expense_code, expense_amount) values (2, 1, 'I0002', 50);
insert into expense (eid, base_id, expense_code, expense_amount) values (3, 1, 'I0003', 75);
insert into expense (eid, base_id, expense_code, expense_amount) values (4, 2, 'I0004', 101);
create table income (
i_id number(6),
base_id number(6),
income_code varchar2(10),
income_amt number(6)
);
insert into income (i_id, base_id, income_code, income_amt) values (1, 1, 'I0001', 10);
insert into income (i_id, base_id, income_code, income_amt) values (2, 1, 'I0002', 20);
insert into income (i_id, base_id, income_code, income_amt) values (3, 1, 'I0003', 30);
Result:
ID REFERENCE REFERENCE_NAME EID BASE_ID EXPENSE_CODE EXPENSE_AMOUNT
-- --------- -------------- --- ------- ------------ --------------
2 10,001 BBBB 4 2 I0004 101

Related

How to return columns from nested SELECT queries in the final table?

I have a three-layer nested query that works.
select PARTNER, BIRTHDT, XSEXM, XSEXF from "schema"."platform.view/table2" where partner IN
(select SID from "schema"."platform.view/table1" where TYPE='BB' and CLASS='yy' and ID IN
(select SID from "schema"."platform.view/table1" where TYPE='AA' and CLASS='zz' and ID IN ("one", "two")
))
I want the values ("one", "two") from table1 in the innermost query to be present in the final table returned.
I have tried to get it like this:
select t1.ID, t2.SID from "schema"."platform.view/table1" t1
OUTER APPLY (
select SID from "schema"."platform.view/table1" t2
where t2.TYPE='BB' and t2.CLASS='yy' and t2.ID IN t1.SID
)
where t1.TYPE='AA' and t1.CLASS='zz' and t1.ID IN ("one", "two")
There are three identifiers:
1. ID ( ONE, TWO, etc.)
2. intermediate SID ( 123, 124, etc) which is again searched as ID
3. Partner ID (P12, P13, etc) which maps to table2.
Sample Data:
table1:
| ID | SID | TYPE | CLASS |
|------|-----|------|-------|
| ONE | 123 | AA | zz |
| TWO | 124 | AA | zz |
| 123 | P12 | BB | yy |
| THRE | 125 | AA | zz |
| 124 | P13 | BB | yy |
| 125 | P14 | BB | yy |
| FOUR | 123 | AA | zz |
table2:
| PARTNER | BIRTHDT | XSEXM | XSEXF |
|---------|----------|-------|-------|
| P12 | 19900214 | X | |
| P13 | 19900713 | X | |
| P14 | 19900407 | | X |
Desired Output for Input ("ONE", "TWO", "THRE"):
| ID | PARTNER | BIRTHDT | XSEXM | XSEXF |
|-----|---------|----------|-------|-------|
| ONE | P12 | 19900214 | X | |
| TWO | P13 | 19900713 | X | |
| THRE| P14 | 19900407 | | X |
How to map this initial search value with its final result rows in this three layer nested statement?

Since you want to "carry" information out of your "inner" SELECTs, one option would be to "join back" the data at the final projection step, but that requires a 1:1 relationship that you could use for the join.
This is not the case here.
Instead of the WHERE ... IN (SELECT ID...) approach, use INNER JOINs.
These allow the same kind of filtering/selection but also give you the option to project any column of the two tables involved.
For your rather abstract statement (the column names need a lot of context knowledge to make sense, which is something you may want to fix by adding useful column aliases), this can look like so:
drop table tab1;
drop table tab2;
CREATE TABLE TAB1
("ID" varchar(6)
, "SID" varchar(5)
, "TYPE" varchar(6)
, "CLASS" varchar(7))
;
INSERT INTO TAB1
VALUES ('ONE', '123', 'AA', 'zz');
INSERT INTO TAB1
VALUES ('TWO', '124', 'AA', 'zz');
INSERT INTO TAB1
VALUES ('123', 'P12', 'BB', 'yy');
INSERT INTO TAB1
VALUES ('THRE', '125', 'AA', 'zz');
INSERT INTO TAB1
VALUES ('124', 'P13', 'BB', 'yy');
INSERT INTO TAB1
VALUES ('125', 'P14', 'BB', 'yy');
INSERT INTO TAB1
VALUES ('FOUR', '123', 'AA', 'zz');
select * from tab1;
CREATE TABLE TAB2
("PARTNER" varchar(9)
, "BIRTHDT" varchar(10)
, "XSEXM" varchar(7)
, "XSEXF" varchar(7))
;
INSERT INTO TAB2
VALUES ('P12', '19900214', 'X', NULL);
INSERT INTO TAB2
VALUES ('P13', '19900713', 'X', NULL);
INSERT INTO TAB2
VALUES ('P14', '19900407', NULL, 'X');
with id_sel as (
select SID, ID
from TAB1
where
TYPE='AA'
and CLASS='zz'
and ID IN ('ONE', 'TWO', 'THRE')
),
part_sel as (
select
t1.SID, id.ID orig_id
from
TAB1 t1
inner join id_sel id
on t1.id = id.sid
where
t1.TYPE='BB'
and t1.CLASS='yy'
)
select
part_sel.orig_id, t2.PARTNER, t2.BIRTHDT, t2.XSEXM, t2.XSEXF
from
TAB2 t2
inner join part_sel
on t2.partner = part_sel.sid;
ORIG_ID PARTNER BIRTHDT XSEXM XSEXF
ONE P12 19900214 X ?
TWO P13 19900713 X ?
THRE P14 19900407 ? X
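The same chain of lookups can also be written as one statement with a self-join instead of two CTEs. The sketch below is an alternative formulation, not the answer's exact query; it runs against SQLite through Python's sqlite3 module (an assumption, the question appears to target a different database), and the aliases a and b are hypothetical names for the two roles that table1 plays.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id TEXT, sid TEXT, type TEXT, class TEXT);
    CREATE TABLE tab2 (partner TEXT, birthdt TEXT, xsexm TEXT, xsexf TEXT);
    INSERT INTO tab1 VALUES
        ('ONE',  '123', 'AA', 'zz'), ('TWO', '124', 'AA', 'zz'),
        ('123',  'P12', 'BB', 'yy'), ('THRE', '125', 'AA', 'zz'),
        ('124',  'P13', 'BB', 'yy'), ('125', 'P14', 'BB', 'yy'),
        ('FOUR', '123', 'AA', 'zz');
    INSERT INTO tab2 VALUES
        ('P12', '19900214', 'X', NULL),
        ('P13', '19900713', 'X', NULL),
        ('P14', '19900407', NULL, 'X');
""")

rows = conn.execute("""
    SELECT a.id AS orig_id, t2.partner, t2.birthdt, t2.xsexm, t2.xsexf
    FROM tab1 a                                  -- rows holding the search value
    JOIN tab1 b  ON b.id = a.sid                 -- intermediate SID looked up as ID
                AND b.type = 'BB' AND b.class = 'yy'
    JOIN tab2 t2 ON t2.partner = b.sid           -- partner details
    WHERE a.type = 'AA' AND a.class = 'zz'
      AND a.id IN ('ONE', 'TWO', 'THRE')
    ORDER BY t2.partner
""").fetchall()

for row in rows:
    print(row)
```

Because every step is a join, the original search value (a.id) is still available in the final projection, which is exactly what the nested IN version loses.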

Get records having the same value in 2 columns but a different value in a 3rd column

I am having trouble writing a query that will return all records where 2 columns have the same value but a different value in a 3rd column. I am looking for the records where the Item_Type and Location_ID are the same, but the Sub_Location_ID is different.
The table looks like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 1 | 00001 | 20 | 78 |
| 2 | 00001 | 110 | 124 |
| 3 | 00001 | 110 | 124 |
| 4 | 00002 | 3 | 18 |
| 5 | 00002 | 3 | 25 |
+---------+-----------+-------------+-----------------+
The result I am trying to get would look like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 4 | 00002 | 3 | 18 |
| 5 | 00002 | 3 | 25 |
+---------+-----------+-------------+-----------------+
I have been trying to use the following query:
SELECT *
FROM Table1
WHERE Item_Type IN (
SELECT Item_Type
FROM Table1
GROUP BY Item_Type
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
)
But it returns all records with the same Item_Type and a different Sub_Location_ID, not all records with the same Item_Type AND Location_ID but a different Sub_Location_ID.

This should do the trick...
-- some test data...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
BEGIN DROP TABLE #TestData; END;
CREATE TABLE #TestData (
Item_ID INT NOT NULL PRIMARY KEY,
Item_Type CHAR(5) NOT NULL,
Location_ID INT NOT NULL,
Sub_Location_ID INT NOT NULL
);
INSERT #TestData (Item_ID, Item_Type, Location_ID, Sub_Location_ID) VALUES
(1, '00001', 20, 78),
(2, '00001', 110, 124),
(3, '00001', 110, 124),
(4, '00002', 3, 18),
(5, '00002', 3, 25);
-- adding a covering index will eliminate the sort operation...
CREATE NONCLUSTERED INDEX ix_indexname ON #TestData (Item_Type, Location_ID, Sub_Location_ID, Item_ID);
-- the actual solution...
WITH
cte_count_group AS (
SELECT
td.Item_ID,
td.Item_Type,
td.Location_ID,
td.Sub_Location_ID,
cnt_grp_2 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID),
cnt_grp_3 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID, td.Sub_Location_ID)
FROM
#TestData td
)
SELECT
cg.Item_ID,
cg.Item_Type,
cg.Location_ID,
cg.Sub_Location_ID
FROM
cte_count_group cg
WHERE
cg.cnt_grp_2 > 1
AND cg.cnt_grp_3 < cg.cnt_grp_2;

You can use exists:
select t.*
from Table1 t
where exists (select 1
from Table1 t1
where t.Item_Type = t1.Item_Type and
t.Location_ID = t1.Location_ID and
t.Sub_Location_ID <> t1.Sub_Location_ID
);

SQL Server has no row-value (multi-column) IN, so you can emulate it with a little trick, assuming '#' is a character that can never appear in Item_Type:
SELECT *
FROM Table1
WHERE Item_Type+'#'+Cast(Location_ID as varchar(20)) IN (
SELECT Item_Type+'#'+Cast(Location_ID as varchar(20))
FROM Table1
GROUP BY Item_Type, Location_ID
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
);
The downside is that the expression in the WHERE clause is non-sargable.

I think you can use exists:
select t1.*
from table1 t1
where exists (select 1
from table1 tt1
where tt1.Item_Type = t1.Item_Type and
tt1.Location_ID = t1.Location_ID and
tt1.Sub_Location_ID <> t1.Sub_Location_ID
);
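To check the EXISTS approach against the sample rows, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server (an assumption). It also includes a variant that joins to the grouped subquery on both columns; that variant is my addition rather than one of the answers, and it avoids the string concatenation mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        item_id INTEGER PRIMARY KEY,
        item_type TEXT,
        location_id INTEGER,
        sub_location_id INTEGER
    );
    INSERT INTO table1 VALUES
        (1, '00001', 20, 78), (2, '00001', 110, 124), (3, '00001', 110, 124),
        (4, '00002', 3, 18),  (5, '00002', 3, 25);
""")

# EXISTS: another row with the same type and location but a different sub-location.
exists_ids = [r[0] for r in conn.execute("""
    SELECT t1.item_id
    FROM table1 t1
    WHERE EXISTS (SELECT 1
                  FROM table1 tt1
                  WHERE tt1.item_type = t1.item_type
                    AND tt1.location_id = t1.location_id
                    AND tt1.sub_location_id <> t1.sub_location_id)
    ORDER BY t1.item_id
""")]

# Variant: join back to the (item_type, location_id) groups that hold
# more than one distinct sub-location.
grouped_ids = [r[0] for r in conn.execute("""
    SELECT t.item_id
    FROM table1 t
    JOIN (SELECT item_type, location_id
          FROM table1
          GROUP BY item_type, location_id
          HAVING COUNT(DISTINCT sub_location_id) > 1) g
      ON g.item_type = t.item_type AND g.location_id = t.location_id
    ORDER BY t.item_id
""")]

print(exists_ids, grouped_ids)
```

Both queries return items 4 and 5 for the sample data. Items 2 and 3 share a type and location but also share the sub-location, so they are correctly left out.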

Postgres - Query to select fields from multiple tables as columns

I have the following tables
Table 1 : Product
id name
1 Bread
2 Bun
3 Cake
Table 2: Expense Items
product| quantity
1 | 100
2 | 150
3 | 180
1 | 25
2 | 30
Table 3: Income Items
product| quantity
1 | 100
2 | 150
3 | 180
1 | 25
2 | 30
Now I want the results like this
product | sum of quantity of expenseitem | sum of quantity of income item
1 | 125 | 125
2 | 180 | 180
3 | 180 | 180
What is the query to get this result ?
Thanks

You can try using UNION ALL in a subquery, with the condition placed inside the aggregate function.
Schema (PostgreSQL v9.6)
CREATE TABLE Product(
id int,
name varchar(50)
);
INSERT INTO Product VALUES (1,'Bread');
INSERT INTO Product VALUES (2,'Bun');
INSERT INTO Product VALUES (3,'Cake');
CREATE TABLE ExpenseItems(
product int,
quantity int
);
INSERT INTO ExpenseItems VALUES (1,100);
INSERT INTO ExpenseItems VALUES (2,150);
INSERT INTO ExpenseItems VALUES (3,180);
INSERT INTO ExpenseItems VALUES (1,25);
INSERT INTO ExpenseItems VALUES (2,30);
CREATE TABLE IncomeItems(
product int,
quantity int
);
INSERT INTO IncomeItems VALUES (1,100);
INSERT INTO IncomeItems VALUES (2,150);
INSERT INTO IncomeItems VALUES (3,180);
INSERT INTO IncomeItems VALUES (1,25);
INSERT INTO IncomeItems VALUES (2,30);
Query #1
SELECT p.id,
SUM(CASE WHEN grp = 1 THEN quantity END) SUMExpenseItems,
SUM(CASE WHEN grp = 2 THEN quantity END) SUMIncomeItems
FROM (
SELECT product, quantity,1 grp
FROM ExpenseItems
UNION ALL
SELECT product, quantity,2
FROM IncomeItems
) t1 JOIN Product p on p.id = t1.product
GROUP BY p.id;
| id | sumexpenseitems | sumincomeitems |
| --- | --------------- | -------------- |
| 1 | 125 | 125 |
| 2 | 180 | 180 |
| 3 | 180 | 180 |
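It may help to see why the two item tables are stacked with UNION ALL rather than both joined to Product directly. The following sketch, run on SQLite through Python's sqlite3 module instead of PostgreSQL (an assumption, with snake_case table names chosen for illustration), compares the two shapes on the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER, name TEXT);
    CREATE TABLE expense_items (product INTEGER, quantity INTEGER);
    CREATE TABLE income_items  (product INTEGER, quantity INTEGER);
    INSERT INTO product VALUES (1, 'Bread'), (2, 'Bun'), (3, 'Cake');
    INSERT INTO expense_items VALUES (1, 100), (2, 150), (3, 180), (1, 25), (2, 30);
    INSERT INTO income_items  VALUES (1, 100), (2, 150), (3, 180), (1, 25), (2, 30);
""")

# Joining both detail tables to product pairs every expense row with every
# income row of the same product, so quantities are counted more than once.
naive = conn.execute("""
    SELECT p.id, SUM(e.quantity), SUM(i.quantity)
    FROM product p
    JOIN expense_items e ON e.product = p.id
    JOIN income_items  i ON i.product = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Stacking the rows with UNION ALL keeps one row per item; the CASE
# expressions then route each quantity to the right sum.
combined = conn.execute("""
    SELECT p.id,
           SUM(CASE WHEN t.grp = 1 THEN t.quantity END) AS sum_expense,
           SUM(CASE WHEN t.grp = 2 THEN t.quantity END) AS sum_income
    FROM (SELECT product, quantity, 1 AS grp FROM expense_items
          UNION ALL
          SELECT product, quantity, 2 FROM income_items) t
    JOIN product p ON p.id = t.product
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print("double join:", naive)
print("union all:  ", combined)
```

The double join inflates products 1 and 2, which have two rows in each table, while the UNION ALL version reproduces the expected sums.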

Similar to D-Shih's answer, PostgreSQL 9.4+ supports the FILTER clause for conditional aggregation in place of CASE expressions:
SELECT p.id,
SUM(quantity) FILTER (WHERE grp = 1) SUMExpenseItems,
SUM(quantity) FILTER (WHERE grp = 2) SUMIncomeItems
FROM
-- ...same union all query...
GROUP BY p.id

SELECT: check the column of the max row

Here are the rows returned by my first SELECT:
SELECT
user.id, analytic_youtube_demographic.age,
analytic_youtube_demographic.percent
FROM
`user`
INNER JOIN
analytic ON analytic.user_id = user.id
INNER JOIN
analytic_youtube_demographic ON analytic_youtube_demographic.analytic_id = analytic.id
Result:
---------------------------
| id | Age | Percent |
|--------------------------
| 1 |13-17| 19.6 |
| 1 |18-24| 38.4 |
| 1 |25-34| 22.5 |
| 1 |35-44| 11.5 |
| 1 |45-54| 5.3 |
| 1 |55-64| 1.6 |
| 1 |65+ | 1.2 |
| 2 |13-17| 10 |
| 2 |18-24| 10 |
| 2 |25-34| 25 |
| 2 |35-44| 5 |
| 2 |45-54| 25 |
| 2 |55-64| 5 |
| 2 |65+ | 20 |
---------------------------
The max value by user_id:
---------------------------
| id | Age | Percent |
|--------------------------
| 1 |18-24| 38.4 |
| 2 |45-54| 25 |
| 2 |25-34| 25 |
---------------------------
And I need to keep only the rows where Age is in ['25-34', '65+'].
In the end I must get:
-----------
| id |
|----------
| 2 |
-----------
I have tried to use MAX(analytic_youtube_demographic.percent), but I don't know how to filter on the age as well.
Thanks a lot for your help.

You can use the rank() function to identify the largest percentage values within each user's data set, and then a simple WHERE clause to get those entries that are both of the highest rank and belong to one of the specific demographics you're interested in. Since you can't use windowed functions like rank() in a WHERE clause, this is a two-step process with a subquery or a CTE. Something like this ought to do it:
-- Sample data from the question:
create table [user] (id bigint);
insert [user] values
(1), (2);
create table analytic (id bigint, [user_id] bigint);
insert analytic values
(1, 1), (2, 2);
create table analytic_youtube_demographic (analytic_id bigint, age varchar(32), [percent] decimal(5, 2));
insert analytic_youtube_demographic values
(1, '13-17', 19.6),
(1, '18-24', 38.4),
(1, '25-34', 22.5),
(1, '35-44', 11.5),
(1, '45-54', 5.3),
(1, '55-64', 1.6),
(1, '65+', 1.2),
(2, '13-17', 10),
(2, '18-24', 10),
(2, '25-34', 25),
(2, '35-44', 5),
(2, '45-54', 25),
(2, '55-64', 5),
(2, '65+', 20);
-- First, within the set of records for each user.id, use the rank() function to
-- identify the demographics with the highest percentage.
with RankedDataCTE as
(
select
[user].id,
youtube.age,
youtube.[percent],
[rank] = rank() over (partition by [user].id order by youtube.[percent] desc)
from
[user]
inner join analytic on analytic.[user_id] = [user].id
inner join analytic_youtube_demographic youtube on youtube.analytic_id = analytic.id
)
-- Now select only those records that are (a) of the highest rank within their
-- user.id and (b) either the '25-34' or the '65+' age group.
select
id,
age,
[percent]
from
RankedDataCTE
where
[rank] = 1 and
age in ('25-34', '65+');
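For readers without SQL Server at hand, the same two-step pattern (rank inside a CTE, filter outside it) can be tried on SQLite through Python's sqlite3 module. This is a sketch under some assumptions: the three tables are collapsed into a single hypothetical demographic table, and the bundled SQLite must be version 3.25 or newer for window functions to be available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demographic (user_id INTEGER, age TEXT, percent REAL);
    INSERT INTO demographic VALUES
        (1, '13-17', 19.6), (1, '18-24', 38.4), (1, '25-34', 22.5),
        (1, '35-44', 11.5), (1, '45-54', 5.3),  (1, '55-64', 1.6),
        (1, '65+', 1.2),
        (2, '13-17', 10), (2, '18-24', 10), (2, '25-34', 25),
        (2, '35-44', 5),  (2, '45-54', 25), (2, '55-64', 5),
        (2, '65+', 20);
""")

rows = conn.execute("""
    WITH ranked AS (
        -- Step 1: rank each user's age groups by percentage, ties share a rank.
        SELECT user_id, age, percent,
               RANK() OVER (PARTITION BY user_id ORDER BY percent DESC) AS rnk
        FROM demographic
    )
    -- Step 2: keep top-ranked rows that fall in the wanted age groups.
    SELECT user_id, age, percent
    FROM ranked
    WHERE rnk = 1 AND age IN ('25-34', '65+')
""").fetchall()

print(rows)
```

User 2 has a tie at 25 percent between '25-34' and '45-54'; because RANK() gives both rank 1, the '25-34' row survives the age filter, while user 1's top group ('18-24') does not.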

How to write the SQL to show the data in my case in Oracle?

I have two tables like this:
create table tbl1
(
id number,
role number
);
insert into tbl1 values (1, 1);
insert into tbl1 values (2, 3);
insert into tbl1 values (1, 3);
create table tbl2
(
role number,
meaning varchar(50)
);
insert into tbl2 values (1, 'changing data');
insert into tbl2 values (2, 'move file');
insert into tbl2 values (3, 'dance');
I want the sql result like the following -
id role_meaning is_permitted
1 changing data yes
1 move file no
1 dance yes
2 changing data no
2 move file no
2 dance yes
How can I do this? I have tried several methods, but I am not sure how to get this result.

You can use a partitioned outer join here.
Query 1:
select tbl1.id,
tbl2.meaning,
case when tbl1.role is NULL then 'no' else 'yes' end is_permitted
from tbl1
partition by (id) right outer join tbl2
on tbl1.role = tbl2.role
order by tbl1.id, tbl2.role
Results:
| ID | MEANING | IS_PERMITTED |
|----|---------------|--------------|
| 1 | changing data | yes |
| 1 | move file | no |
| 1 | dance | yes |
| 2 | changing data | no |
| 2 | move file | no |
| 2 | dance | yes |
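The PARTITION BY join syntax is Oracle-specific. As a rough portable equivalent, the same result can be produced by cross joining the distinct ids with all roles and then left joining the assignments. The sketch below is an alternative formulation, not the answer's query, and runs on SQLite through Python's sqlite3 module (an assumption); the aliases ids, r and t are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (id INTEGER, role INTEGER);
    CREATE TABLE tbl2 (role INTEGER, meaning TEXT);
    INSERT INTO tbl1 VALUES (1, 1), (2, 3), (1, 3);
    INSERT INTO tbl2 VALUES (1, 'changing data'), (2, 'move file'), (3, 'dance');
""")

rows = conn.execute("""
    SELECT ids.id,
           r.meaning,
           CASE WHEN t.role IS NULL THEN 'no' ELSE 'yes' END AS is_permitted
    FROM (SELECT DISTINCT id FROM tbl1) ids       -- one row per id
    CROSS JOIN tbl2 r                             -- every id paired with every role
    LEFT JOIN tbl1 t
           ON t.id = ids.id AND t.role = r.role   -- the assignment, if it exists
    ORDER BY ids.id, r.role
""").fetchall()

for row in rows:
    print(row)
```

The cross join builds the full id-by-role grid that the partitioned outer join generates implicitly, and the left join marks which cells of that grid are actually present in tbl1.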
