Easy way to get a single resultset from three identical tables? - sql

The DB I'm working with has three tables with identical column layouts: OPEX, NOPEX and CAPEX. I would like to query all three for items with a matching AssetId and get a single result set, so that I can process them all at the same time in my .NET code.
The twist is that I do need to know which table each row came from.
I know I can do this with a series of CASE expressions in the SELECT clause, perhaps using each table's ID column, where it is non-zero, to decide which table a row came from. But I would need one CASE per column, and the tables are pretty wide.
Is there some other way to solve this problem?

To get them into one set, you can combine UNION ALL with EXISTS checks. UNION ALL gives you a single result set containing rows from all three tables, and the two EXISTS checks in each branch keep only rows whose asset_id also appears in the other two tables. The literal table_name column added to each branch records which table a row came from, so no CASE expressions are needed.
SELECT *, 'OPEX' AS table_name
FROM OPEX o
WHERE EXISTS (
    SELECT 1
    FROM NOPEX n
    WHERE n.asset_id = o.asset_id)
AND EXISTS (
    SELECT 1
    FROM CAPEX c
    WHERE c.asset_id = o.asset_id)
UNION ALL
SELECT *, 'NOPEX' AS table_name
FROM NOPEX n
WHERE EXISTS (
    SELECT 1
    FROM OPEX o
    WHERE o.asset_id = n.asset_id)
AND EXISTS (
    SELECT 1
    FROM CAPEX c
    WHERE c.asset_id = n.asset_id)
UNION ALL
SELECT *, 'CAPEX' AS table_name
FROM CAPEX c
WHERE EXISTS (
    SELECT 1
    FROM OPEX o
    WHERE o.asset_id = c.asset_id)
AND EXISTS (
    SELECT 1
    FROM NOPEX n
    WHERE n.asset_id = c.asset_id)
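To see the literal-column trick in action, here is a minimal sketch that runs the query above through Python's built-in sqlite3 module. It assumes an in-memory SQLite database as a stand-in for the asker's SQL Server database, and a hypothetical two-column layout (asset_id, amount) with made-up rows. Only asset 1 exists in all three tables, so one row per table comes back, each tagged with its table_name.

```python
import sqlite3

# Hypothetical sample data: three tables with the same column layout.
# Asset 1 is in all three tables; assets 2 and 3 are not.
SETUP = """
CREATE TABLE OPEX  (asset_id INTEGER, amount INTEGER);
CREATE TABLE NOPEX (asset_id INTEGER, amount INTEGER);
CREATE TABLE CAPEX (asset_id INTEGER, amount INTEGER);
INSERT INTO OPEX  VALUES (1, 10), (2, 20);
INSERT INTO NOPEX VALUES (1, 30), (3, 40);
INSERT INTO CAPEX VALUES (1, 50), (2, 60);
"""

# The answer's query: each branch adds a literal table_name column.
QUERY = """
SELECT *, 'OPEX' AS table_name
FROM OPEX o
WHERE EXISTS (SELECT 1 FROM NOPEX n WHERE n.asset_id = o.asset_id)
  AND EXISTS (SELECT 1 FROM CAPEX c WHERE c.asset_id = o.asset_id)
UNION ALL
SELECT *, 'NOPEX' AS table_name
FROM NOPEX n
WHERE EXISTS (SELECT 1 FROM OPEX o WHERE o.asset_id = n.asset_id)
  AND EXISTS (SELECT 1 FROM CAPEX c WHERE c.asset_id = n.asset_id)
UNION ALL
SELECT *, 'CAPEX' AS table_name
FROM CAPEX c
WHERE EXISTS (SELECT 1 FROM OPEX o WHERE o.asset_id = c.asset_id)
  AND EXISTS (SELECT 1 FROM NOPEX n WHERE n.asset_id = c.asset_id)
"""


def tagged_rows():
    """Run the query and return its rows sorted by the table_name column."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        rows = con.execute(QUERY).fetchall()
    finally:
        con.close()
    # UNION ALL does not guarantee an order, so sort before using the rows.
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for row in tagged_rows():
        print(row)
```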
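You could also use INNER JOINs. Note that if an asset_id appears more than once in one of the joined tables, each matching row is repeated, which the EXISTS version avoids: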
SELECT c.*, 'CAPEX' AS table_name
FROM CAPEX c
INNER JOIN OPEX o
ON o.asset_id = c.asset_id
INNER JOIN NOPEX n
ON n.asset_id = c.asset_id
UNION ALL
SELECT o.*, 'OPEX' AS table_name
FROM OPEX o
INNER JOIN CAPEX c
ON c.asset_id = o.asset_id
INNER JOIN NOPEX n
ON n.asset_id = o.asset_id
UNION ALL
SELECT n.*, 'NOPEX' AS table_name
FROM NOPEX n
INNER JOIN OPEX o
ON o.asset_id = n.asset_id
INNER JOIN CAPEX c
ON c.asset_id = n.asset_id
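A quick way to see the duplicate-row caveat is to run just the OPEX branch both ways. This sketch again uses Python's sqlite3 module with an in-memory SQLite database as a stand-in, and hypothetical data in which asset 1 appears twice in NOPEX.

```python
import sqlite3

# Hypothetical data: asset 1 appears twice in NOPEX.
SETUP = """
CREATE TABLE OPEX  (asset_id INTEGER, amount INTEGER);
CREATE TABLE NOPEX (asset_id INTEGER, amount INTEGER);
CREATE TABLE CAPEX (asset_id INTEGER, amount INTEGER);
INSERT INTO OPEX  VALUES (1, 10);
INSERT INTO NOPEX VALUES (1, 30), (1, 31);
INSERT INTO CAPEX VALUES (1, 50);
"""

# Only the OPEX branch, written with INNER JOINs...
JOIN_BRANCH = """
SELECT o.*, 'OPEX' AS table_name
FROM OPEX o
INNER JOIN CAPEX c ON c.asset_id = o.asset_id
INNER JOIN NOPEX n ON n.asset_id = o.asset_id
"""

# ...and with EXISTS checks.
EXISTS_BRANCH = """
SELECT o.*, 'OPEX' AS table_name
FROM OPEX o
WHERE EXISTS (SELECT 1 FROM NOPEX n WHERE n.asset_id = o.asset_id)
  AND EXISTS (SELECT 1 FROM CAPEX c WHERE c.asset_id = o.asset_id)
"""


def compare_branches():
    """Return (rows from the JOIN branch, rows from the EXISTS branch)."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        joined = con.execute(JOIN_BRANCH).fetchall()
        filtered = con.execute(EXISTS_BRANCH).fetchall()
    finally:
        con.close()
    return joined, filtered


if __name__ == "__main__":
    joined, filtered = compare_branches()
    print("INNER JOIN:", joined)   # the OPEX row is repeated once per NOPEX match
    print("EXISTS:    ", filtered)  # the OPEX row appears once
```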

This is similar to dfundako's answer, but it determines up front which AssetIDs appear in all three tables, so there are fewer lookups against the indexes on the related tables:
;with cte as (
    select AssetID
    from (
        select distinct AssetID from Opex
        union all
        select distinct AssetID from Nopex
        union all
        select distinct AssetID from Capex
    ) as AssetIDs
    group by AssetID
    having count(AssetID) = 3
)
select 'Opex' as table_name, o.* from Opex as o
inner join cte
on o.AssetID = cte.AssetID
union all
select 'Nopex', n.* from Nopex as n
inner join cte
on n.AssetID = cte.AssetID
union all
select 'Capex', c.* from Capex as c
inner join cte
on c.AssetID = cte.AssetID
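Here is a minimal sketch of this CTE approach run through Python's sqlite3 module, assuming an in-memory SQLite database as a stand-in for SQL Server (the leading semicolon is a T-SQL habit and is dropped here) and hypothetical rows. Asset 1 is listed twice in Opex, which shows why each inner SELECT uses DISTINCT: without it, that asset would be counted four times and fail the HAVING check.

```python
import sqlite3

# Hypothetical data; asset 1 appears twice in Opex.
SETUP = """
CREATE TABLE Opex  (AssetID INTEGER, Amount INTEGER);
CREATE TABLE Nopex (AssetID INTEGER, Amount INTEGER);
CREATE TABLE Capex (AssetID INTEGER, Amount INTEGER);
INSERT INTO Opex  VALUES (1, 10), (1, 11), (2, 20);
INSERT INTO Nopex VALUES (1, 30), (3, 40);
INSERT INTO Capex VALUES (1, 50), (2, 60);
"""

QUERY = """
WITH cte AS (
    SELECT AssetID
    FROM (
        SELECT DISTINCT AssetID FROM Opex
        UNION ALL
        SELECT DISTINCT AssetID FROM Nopex
        UNION ALL
        SELECT DISTINCT AssetID FROM Capex
    ) AS AssetIDs
    GROUP BY AssetID
    HAVING COUNT(AssetID) = 3   -- present in all three tables
)
SELECT 'Opex' AS table_name, o.* FROM Opex AS o
INNER JOIN cte ON o.AssetID = cte.AssetID
UNION ALL
SELECT 'Nopex', n.* FROM Nopex AS n
INNER JOIN cte ON n.AssetID = cte.AssetID
UNION ALL
SELECT 'Capex', c.* FROM Capex AS c
INNER JOIN cte ON c.AssetID = cte.AssetID
"""


def rows_in_all_three():
    """Return (table_name, AssetID, Amount) rows for assets found in all three tables."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return sorted(con.execute(QUERY).fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    for row in rows_in_all_three():
        print(row)
```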

Related

From 2 tables, get rows unique to each table

Given two tables, I want to get every row whose value in a certain column (cName) is present in one table but not the other, and I want to do this for both tables. I found that a LEFT JOIN gives me the solution for one of the tables, so I wrote it both ways and combined the results with UNION. Is this a good way to do this, or is there a better way?
select *
from College C1 LEFT JOIN myTestTable T1 on C1.cName = T1.cName
where T1.cName IS NULL
UNION
select *
from myTestTable T1 LEFT JOIN College C1 on T1.cName = C1.cName
where C1.cName IS NULL
You can use a FULL JOIN with a WHERE clause:
SELECT *
FROM College C1 FULL JOIN
myTestTable T1
ON C1.cName = T1.cName
WHERE T1.cName IS NULL OR C1.cName IS NULL;
I prefer anti-joins (NOT EXISTS) to LEFT JOIN. For one thing, if cName is not unique, the LEFT JOIN produces multiple rows, which the UNION must then eliminate.
select * from College C1
WHERE NOT EXISTS (SELECT 1 FROM myTestTable T1 WHERE C1.cName = T1.cName)
UNION
select * from myTestTable T1
WHERE NOT EXISTS (SELECT 1 FROM College C1 WHERE T1.cName = C1.cName);
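The anti-join version can be tried with Python's sqlite3 module and an in-memory SQLite database. The table contents below are hypothetical, and both tables share a single cName column because UNION requires matching columns. 'MIT' appears twice in myTestTable, and UNION collapses it to one row.

```python
import sqlite3

# Hypothetical data: both tables share one column, as the UNION requires.
SETUP = """
CREATE TABLE College     (cName TEXT);
CREATE TABLE myTestTable (cName TEXT);
INSERT INTO College     VALUES ('Berkeley'), ('Stanford');
INSERT INTO myTestTable VALUES ('Berkeley'), ('MIT'), ('MIT');
"""

# Anti-join on each side, combined with UNION (which also removes duplicates).
QUERY = """
SELECT * FROM College C1
WHERE NOT EXISTS (SELECT 1 FROM myTestTable T1 WHERE C1.cName = T1.cName)
UNION
SELECT * FROM myTestTable T1
WHERE NOT EXISTS (SELECT 1 FROM College C1 WHERE T1.cName = C1.cName)
"""


def unique_names():
    """Return the rows whose cName exists in only one of the two tables."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return sorted(con.execute(QUERY).fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    print(unique_names())
```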
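If there is no index on cName, you'll get table scans with either the LEFT JOIN or the NOT EXISTS approach.
If both tables have the same columns and you want to compare entire rows rather than just cName, you could also use set operators (MINUS is Oracle syntax; SQL Server calls it EXCEPT):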
select * from College
union all
select * from myTestTable
MINUS ( select * from College intersect select * from myTestTable );
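SQLite has no MINUS, but its EXCEPT does the same job, so the set-operator version can be sketched with Python's sqlite3 module. The two-column tables and their rows are hypothetical. Because whole rows are compared, ('Stanford', 'CA') and ('Stanford', 'NY') are both reported even though their cName values match.

```python
import sqlite3

# Hypothetical data with two columns per table.
SETUP = """
CREATE TABLE College     (cName TEXT, state TEXT);
CREATE TABLE myTestTable (cName TEXT, state TEXT);
INSERT INTO College     VALUES ('Berkeley', 'CA'), ('Stanford', 'CA');
INSERT INTO myTestTable VALUES ('Berkeley', 'CA'), ('MIT', 'MA'), ('Stanford', 'NY');
"""

# SQLite spells MINUS as EXCEPT. Its compound operators group from left to
# right, so this computes
# (College UNION ALL myTestTable) EXCEPT (College INTERSECT myTestTable).
QUERY = """
SELECT * FROM College
UNION ALL
SELECT * FROM myTestTable
EXCEPT
SELECT * FROM (SELECT * FROM College INTERSECT SELECT * FROM myTestTable)
"""


def symmetric_difference():
    """Return rows present in exactly one of the two tables (whole-row comparison)."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return sorted(con.execute(QUERY).fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    print(symmetric_difference())
```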

selecting incremental data from multiple tables in Hive

I have five tables (A, B, C, D, E) in a Hive database, and I need to union the data from these tables based on logic over the column "id".
The condition is :
Select * from A
UNION
select * from B (only ids not in A)
UNION
select * from C (only ids not in A and B)
UNION
select * from D (only ids not in A, B and C)
UNION
select * from E (only ids not in A, B, C and D)
This data then has to be inserted into a final table.
One way is to create the target table (target), append the data for each UNION stage to it, and then join this table against the next stage.
This would be part of my .hql file:
insert into target
(select * from A
UNION
select B.* from
A
RIGHT OUTER JOIN B
on A.id=B.id
where ISNULL(A.id));
INSERT INTO target
select C.* from
target
RIGHT outer JOIN C
ON target.id=C.id
where ISNULL(target.id);
INSERT INTO target
select D.* from
target
RIGHT OUTER JOIN D
ON target.id=D.id
where ISNULL(target.id);
INSERT INTO target
select E.* from
target
RIGHT OUTER JOIN E
ON target.id=E.id
where ISNULL(target.id);
Is there a better way to make this happen? I assume we have to do the multiple joins/lookups anyway. I am looking for the best approach to achieve this in:
1) Hive with Tez
2) Spark SQL
Many thanks in advance.
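One option is to tag each row with the position of its source table (src), rank the rows within each id by that position, and keep only rank 1, i.e. the rows from the first table that contains that id. If id is unique within each table, then row_number can be used instead of rank. (The result also carries the helper columns src and rnk, which you would leave out of the final select list before inserting into target.)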
select *
from (select *
,rank () over
(
partition by id
order by src
) as rnk
from (
select 1 as src,* from a
union all select 2 as src,* from b
union all select 3 as src,* from c
union all select 4 as src,* from d
union all select 5 as src,* from e
) t
) t
where rnk = 1
;
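The ranking approach can be checked on a small scale with Python's sqlite3 module, assuming SQLite 3.25 or later (the first version with window functions) as a stand-in for Hive or Spark SQL, and hypothetical tables a to e with just id and val columns. Table d holds id 4 twice, so rank keeps both rows; row_number would keep only one of them.

```python
import sqlite3

# Hypothetical data: some ids appear in several tables; id 4 appears twice in d.
SETUP = """
CREATE TABLE a (id INTEGER, val TEXT);
CREATE TABLE b (id INTEGER, val TEXT);
CREATE TABLE c (id INTEGER, val TEXT);
CREATE TABLE d (id INTEGER, val TEXT);
CREATE TABLE e (id INTEGER, val TEXT);
INSERT INTO a VALUES (1, 'a1');
INSERT INTO b VALUES (1, 'b1'), (2, 'b2');
INSERT INTO c VALUES (2, 'c2'), (3, 'c3');
INSERT INTO d VALUES (3, 'd3'), (4, 'd4'), (4, 'd4x');
INSERT INTO e VALUES (4, 'e4'), (5, 'e5');
"""

# Tag rows with src, rank per id by src, keep rank 1, drop the helper columns.
QUERY = """
SELECT id, val
FROM (SELECT *,
             RANK() OVER (PARTITION BY id ORDER BY src) AS rnk
      FROM (SELECT 1 AS src, * FROM a
            UNION ALL SELECT 2 AS src, * FROM b
            UNION ALL SELECT 3 AS src, * FROM c
            UNION ALL SELECT 4 AS src, * FROM d
            UNION ALL SELECT 5 AS src, * FROM e) t
     ) t
WHERE rnk = 1
"""


def first_table_rows():
    """Return (id, val) rows taken from the first table that contains each id."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return sorted(con.execute(QUERY).fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    print(first_table_rows())
```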
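I think I would try to do this by first finding, for each id, the first table it appears in, and then pulling each table's rows only for the ids assigned to it: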
with ids as (
select id, min(which) as which
from (select id, 1 as which from a union all
select id, 2 as which from b union all
select id, 3 as which from c union all
select id, 4 as which from d union all
select id, 5 as which from e
) x
group by id
)
select a.*
from a join ids on a.id = ids.id and ids.which = 1
union all
select b.*
from b join ids on b.id = ids.id and ids.which = 2
union all
select c.*
from c join ids on c.id = ids.id and ids.which = 3
union all
select d.*
from d join ids on d.id = ids.id and ids.which = 4
union all
select e.*
from e join ids on e.id = ids.id and ids.which = 5;
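For MIN(which) to be computed per id, the ids CTE needs a GROUP BY id; this sketch includes it and runs the query through Python's sqlite3 module, with an in-memory SQLite database standing in for Hive or Spark SQL and the same kind of hypothetical five-table data (id 4 appears twice in d). It should return the same rows as the rank-based query above, including both id 4 rows from table d.

```python
import sqlite3

# Hypothetical data: some ids appear in several tables; id 4 appears twice in d.
SETUP = """
CREATE TABLE a (id INTEGER, val TEXT);
CREATE TABLE b (id INTEGER, val TEXT);
CREATE TABLE c (id INTEGER, val TEXT);
CREATE TABLE d (id INTEGER, val TEXT);
CREATE TABLE e (id INTEGER, val TEXT);
INSERT INTO a VALUES (1, 'a1');
INSERT INTO b VALUES (1, 'b1'), (2, 'b2');
INSERT INTO c VALUES (2, 'c2'), (3, 'c3');
INSERT INTO d VALUES (3, 'd3'), (4, 'd4'), (4, 'd4x');
INSERT INTO e VALUES (4, 'e4'), (5, 'e5');
"""

# ids: the first table (1..5) each id appears in; then each table keeps its ids.
QUERY = """
WITH ids AS (
    SELECT id, MIN(which) AS which
    FROM (SELECT id, 1 AS which FROM a UNION ALL
          SELECT id, 2 AS which FROM b UNION ALL
          SELECT id, 3 AS which FROM c UNION ALL
          SELECT id, 4 AS which FROM d UNION ALL
          SELECT id, 5 AS which FROM e) x
    GROUP BY id
)
SELECT a.* FROM a JOIN ids ON a.id = ids.id AND ids.which = 1
UNION ALL
SELECT b.* FROM b JOIN ids ON b.id = ids.id AND ids.which = 2
UNION ALL
SELECT c.* FROM c JOIN ids ON c.id = ids.id AND ids.which = 3
UNION ALL
SELECT d.* FROM d JOIN ids ON d.id = ids.id AND ids.which = 4
UNION ALL
SELECT e.* FROM e JOIN ids ON e.id = ids.id AND ids.which = 5
"""


def rows_by_first_table():
    """Return (id, val) rows taken from the first table that contains each id."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return sorted(con.execute(QUERY).fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    print(rows_by_first_table())
```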

Can criteria be re-used in a T-SQL query?

I have a SQL query that looks something like this:
SELECT
o.name,
o.type_id,
(SELECT COUNT(*) FROM a WHERE type_id = o.type_id AND id IN ('1, 2, 3 ... 1000')) AS count_a,
(SELECT COUNT(*) FROM b WHERE type_id = o.type_id AND id IN ('1, 2, 3 ... 1000')) AS count_b,
(SELECT COUNT(*) FROM c WHERE type_id = o.type_id AND id IN ('1, 2, 3 ... 1000')) AS count_c
FROM o
In the subqueries (count_a, count_b and count_c), the criteria specified in the IN clause are the same for each, but it's a REALLY long list of numbers (that aren't in fact sequential), and I'm concerned that:
a) I'm slowing the query down by making it too long
b) it's going to get too long and cause an error eventually
Is there a way to alias/reference that list of criteria (perhaps as a variable?) so that it can be re-used in each of the three places it appears in the query? Or am I worrying for nothing?
UPDATE
Given the suggestion of using a CTE, I have changed the query above to work like this for now:
WITH id_list AS (SELECT id FROM source WHERE id IN ('1, 2, 3 ... 1000'))
SELECT
o.name,
o.type_id,
(SELECT COUNT(*) FROM a WHERE type_id = o.type_id AND id IN (SELECT id FROM id_list)) AS count_a,
(SELECT COUNT(*) FROM b WHERE type_id = o.type_id AND id IN (SELECT id FROM id_list)) AS count_b,
(SELECT COUNT(*) FROM c WHERE type_id = o.type_id AND id IN (SELECT id FROM id_list)) AS count_c
FROM o
This cuts the overall length of the query down to about a third of what it was, and although the DB appears to take a couple of milliseconds longer to execute the query, at least I won't run into an error because the query is too long.
QUESTION: Is there a quick way to break a comma separated list of numbers (1, 2, 3 ... 1000) into a result set that could be used as the CTE?
You could use a common table expression (CTE) that lists the numbers once:
WITH Numbers AS
(
SELECT N
FROM (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 1000)
AS T(N)
)
SELECT
o.name,
o.type_id,
(SELECT COUNT(*) FROM a WHERE type_id = o.type_id AND id IN (SELECT N FROM Numbers)) AS count_a,
(SELECT COUNT(*) FROM b WHERE type_id = o.type_id AND id IN (SELECT N FROM Numbers)) AS count_b,
(SELECT COUNT(*) FROM c WHERE type_id = o.type_id AND id IN (SELECT N FROM Numbers)) AS count_c
FROM o
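Here is a sketch of the CTE idea run through Python's sqlite3 module against hypothetical tables o, a, b and c. SQLite does not accept the AS T(N) column-alias form on a derived table, so the sketch names the column on the CTE instead (Numbers(N)); the id list itself is shortened to 1, 2, 3.

```python
import sqlite3

# Hypothetical data: ids 7 and 9 are outside the list and must not be counted.
SETUP = """
CREATE TABLE o (name TEXT, type_id INTEGER);
CREATE TABLE a (id INTEGER, type_id INTEGER);
CREATE TABLE b (id INTEGER, type_id INTEGER);
CREATE TABLE c (id INTEGER, type_id INTEGER);
INSERT INTO o VALUES ('x', 1), ('y', 2);
INSERT INTO a VALUES (1, 1), (2, 1), (9, 1), (3, 2);
INSERT INTO b VALUES (2, 1), (3, 2), (3, 2);
INSERT INTO c VALUES (7, 1);
"""

# The id list is written once, in the CTE, and reused by all three subqueries.
QUERY = """
WITH Numbers(N) AS (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3)
SELECT o.name,
       o.type_id,
       (SELECT COUNT(*) FROM a WHERE type_id = o.type_id AND id IN (SELECT N FROM Numbers)) AS count_a,
       (SELECT COUNT(*) FROM b WHERE type_id = o.type_id AND id IN (SELECT N FROM Numbers)) AS count_b,
       (SELECT COUNT(*) FROM c WHERE type_id = o.type_id AND id IN (SELECT N FROM Numbers)) AS count_c
FROM o
ORDER BY o.name
"""


def counts_with_cte():
    """Return (name, type_id, count_a, count_b, count_c) for each row of o."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(counts_with_cte())
```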
If your ids are foreign keys into some other table, you could build a table variable from your list and then join to it repeatedly.
DECLARE @id_list TABLE (
id INT
)
INSERT INTO @id_list
SELECT id
FROM pk_location
WHERE id IN ('1, 2, 3 ... 1000')
SELECT
o.name,
o.type_id,
(SELECT COUNT(*) FROM a INNER JOIN @id_list i ON a.id = i.id WHERE type_id = o.type_id) AS count_a,
(SELECT COUNT(*) FROM b INNER JOIN @id_list i ON b.id = i.id WHERE type_id = o.type_id) AS count_b,
(SELECT COUNT(*) FROM c INNER JOIN @id_list i ON c.id = i.id WHERE type_id = o.type_id) AS count_c
FROM o
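SQLite has no table variables, so this sketch swaps in a temporary table to show the same idea: materialize the list once and join to it in each subquery. It uses Python's sqlite3 module and hypothetical tables o, a, b and c. The PRIMARY KEY on id_list is an assumption added here; it keeps duplicate ids out of the list, since a duplicate would be counted twice by the JOIN.

```python
import sqlite3

# Hypothetical data: ids 7 and 9 are outside the list and must not be counted.
SETUP = """
CREATE TABLE o (name TEXT, type_id INTEGER);
CREATE TABLE a (id INTEGER, type_id INTEGER);
CREATE TABLE b (id INTEGER, type_id INTEGER);
CREATE TABLE c (id INTEGER, type_id INTEGER);
INSERT INTO o VALUES ('x', 1), ('y', 2);
INSERT INTO a VALUES (1, 1), (2, 1), (9, 1), (3, 2);
INSERT INTO b VALUES (2, 1), (3, 2), (3, 2);
INSERT INTO c VALUES (7, 1);
-- Temporary table standing in for the T-SQL table variable.
CREATE TEMP TABLE id_list (id INTEGER PRIMARY KEY);
INSERT INTO id_list VALUES (1), (2), (3);
"""

QUERY = """
SELECT o.name,
       o.type_id,
       (SELECT COUNT(*) FROM a INNER JOIN id_list i ON a.id = i.id WHERE type_id = o.type_id) AS count_a,
       (SELECT COUNT(*) FROM b INNER JOIN id_list i ON b.id = i.id WHERE type_id = o.type_id) AS count_b,
       (SELECT COUNT(*) FROM c INNER JOIN id_list i ON c.id = i.id WHERE type_id = o.type_id) AS count_c
FROM o
ORDER BY o.name
"""


def counts_with_temp_table():
    """Return (name, type_id, count_a, count_b, count_c) for each row of o."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(counts_with_temp_table())
```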

How to compare two rows in SQL Server

I'm used to MySQL, where you can do this with no problems. I would like to run the following statement in SQL Server, but it doesn't recognize the column C_COUNT.
SELECT
A.customers AS CUSTOMERS,
(SELECT COUNT(ID) FROM Partners_customers B WHERE A.ID = B.PIID) AS C_COUNT
FROM Partners A
WHERE CUSTOMERS <> [C_COUNT]
Is it also possible to use arithmetic on these columns in the SELECT list, like:
SELECT (CUSTOMERS - C_COUNT) AS DIFFERENCE
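SQL Server does not let you reference a column alias from the SELECT list in the WHERE clause of the same query, because WHERE is evaluated before SELECT. You'll have to wrap the query in a derived table, something like this: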
SELECT *, Customers - C_COUNT "Difference"
FROM (
SELECT
A.customers AS CUSTOMERS,
(SELECT COUNT(ID)
FROM Partners_customers B WHERE A.ID = B.PIID)
AS C_COUNT FROM Partners A
) t
WHERE CUSTOMERS <> [C_COUNT]
Or, better yet, replace the correlated subquery with a join and an aggregate:
select A.customers, count(b.id)
FROM Partners A
LEFT JOIN Partners_customers B ON A.ID = B.PIID
Group By A.ID, A.customers
having a.customers <> count(b.id)
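The join-and-aggregate version can be sketched with Python's sqlite3 module and hypothetical rows. SQLite is more lenient than SQL Server about column aliases in WHERE, so this does not reproduce the original error; it only shows what the HAVING query returns, with the difference computed in the same SELECT. Partners 2 and 4 are the only ones whose stored customers value disagrees with the actual count.

```python
import sqlite3

# Hypothetical data: partner 3 and partner 4 have no rows in Partners_customers.
SETUP = """
CREATE TABLE Partners (ID INTEGER, customers INTEGER);
CREATE TABLE Partners_customers (ID INTEGER, PIID INTEGER);
INSERT INTO Partners VALUES (1, 2), (2, 1), (3, 0), (4, 1);
INSERT INTO Partners_customers VALUES (10, 1), (11, 1), (12, 2), (13, 2);
"""

# LEFT JOIN keeps partners with no customers; COUNT(B.ID) is then 0 for them.
QUERY = """
SELECT A.ID,
       A.customers,
       COUNT(B.ID) AS c_count,
       A.customers - COUNT(B.ID) AS difference
FROM Partners A
LEFT JOIN Partners_customers B ON A.ID = B.PIID
GROUP BY A.ID, A.customers
HAVING A.customers <> COUNT(B.ID)
ORDER BY A.ID
"""


def mismatched_partners():
    """Return (ID, customers, c_count, difference) where the counts disagree."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(mismatched_partners())
```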
WITH cte AS
(
SELECT
A.customers AS CUSTOMERS,
(SELECT COUNT(ID) FROM Partners_customers B WHERE A.ID = B.PIID) AS C_COUNT
FROM Partners A
)
SELECT
*,
(CUSTOMERS - C_COUNT) AS DIFFERENCE
FROM cte
WHERE CUSTOMERS <> C_COUNT
Completely untested (MINUS is Oracle syntax; in SQL Server use EXCEPT):
(select * from TabA
minus
select * from TabB) -- Rows in TabA not in TabB
union all
(
select * from TabB
minus
select * from TabA
) -- rows in TabB not in TabA

LEFT JOIN - How to join tables and include extra row even if you have right match

I have two tables
Table A
-------
ID
ProductName
Table B
-------
ID
ProductID
Size
I want to join these two tables
SELECT * FROM
(SELECT * FROM A) A
LEFT JOIN
(SELECT * FROM B) B
ON A.ID = B.ProductID
This is easy: I get every row from A, repeated once for each matching row in B, with NULL fields for B where there is no match.
But here is the tricky part: how can I get every row from A with NULL fields for table B even when there is a match, so that I get one extra row with NULL values in addition to all the matches?
SELECT A.*
, B3.ID
, B3.ProductID
, B3.Size
FROM A
LEFT JOIN
(
SELECT ProductID as MatchID
, ID
, ProductID
, Size
FROM B
UNION ALL
SELECT ID
, null
, null
, null
FROM A A2
) B3
ON A.ID = B3.MatchID
Instead of using UNION ALL in a subquery as suggested by others, you could also (and I would) use UNION ALL at the outer level, which keeps the query simpler:
SELECT A.ID, A.ProductName, B.ID, B.Size
FROM A
INNER JOIN B
ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL
FROM A
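Here is a minimal sketch of the outer-level UNION ALL run through Python's sqlite3 module, assuming hypothetical products and sizes. The ORDER BY is added only to make the output deterministic; SQLite sorts NULLs first, so each product's all-NULL row comes before its matches.

```python
import sqlite3

# Hypothetical data: product 1 has two sizes, product 2 has none.
SETUP = """
CREATE TABLE A (ID INTEGER, ProductName TEXT);
CREATE TABLE B (ID INTEGER, ProductID INTEGER, Size TEXT);
INSERT INTO A VALUES (1, 'Shirt'), (2, 'Hat');
INSERT INTO B VALUES (10, 1, 'S'), (11, 1, 'M');
"""

# Matches from the INNER JOIN, plus one all-NULL row per product in A.
QUERY = """
SELECT A.ID, A.ProductName, B.ID, B.Size
FROM A
INNER JOIN B ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL
FROM A
ORDER BY 1, 3
"""


def rows_with_extra_null():
    """Return (A.ID, ProductName, B.ID, Size) rows including the extra NULL rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in rows_with_extra_null():
        print(row)
```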
Since every join is now going to succeed, we can switch to an INNER JOIN:
SELECT
*
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size FROM B
UNION ALL
SELECT NULL,ID,NULL FROM A) B
ON
A.ID = B.ProductID
Now would be a very good time to switch to naming columns explicitly, rather than using SELECT *.
Or, if, as per @Andomar's comment, you need all of the B columns to be NULL:
SELECT
A.ID,A.ProductName,
B.ID,B.ProductID,B.Size
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size,ProductID as MatchID FROM B
UNION ALL
SELECT NULL,NULL,NULL,ID FROM A) B
ON
A.ID = B.MatchID