SQL alternative to sub-query in SELECT Item list

I have RDBMS tables and queries that are working perfectly. I have offloaded the data from the RDBMS to Hive tables. To run the existing queries on Hive, we first need to make them compatible with Hive.
Let's take the example below, which has a sub-query in the SELECT item list. It is syntactically valid and works fine on the RDBMS, but it will not work on Hive: as per the Hive manual, Hive supports subqueries only in the FROM and WHERE clauses.
Example 1:
SELECT t1.one
,(SELECT t2.two
FROM TEST2 t2
WHERE t1.one=t2.two) t21
,(SELECT t3.three
FROM TEST3 t3
WHERE t1.one=t3.three) t31
FROM TEST1 t1 ;
Example 2:
SELECT C.*
, CASE
WHEN EXISTS
(SELECT 1
FROM tblOrder O
INNER JOIN tblProduct P
ON O.Product_id = P.Product_id
WHERE O.customer_id = C.customer_id
AND P.Product_Type IN (2, 5, 6, 9)
)
THEN 1
ELSE 0
END AS My_Custom_Indicator
FROM tblCustomer C
INNER JOIN tblOtherStuff S
ON C.CustomerID = S.CustomerID ;
Example 3:
Select component_location_id, component_type_code,
( select clv.LOCATION_VALUE
from stg_dev.component_location_values clv
where identifier_code = 'AXLE'
and component_location_id = cl.component_location_id ) as AXLE,
( select clv.LOCATION_VALUE
from stg_dev.component_location_values clv
where identifier_code = 'SIDE'
and component_location_id = cl.component_location_id ) as SIDE
from stg_dev.component_locations cl ;
I want to know the possible alternatives to sub-queries in the SELECT item list, so that I can make these queries compatible with Hive. With that, I should be able to transform the existing queries into Hive format.
Any help and guidance is highly appreciated !

The query you provided could be transformed to a simple query with LEFT JOINs.
SELECT
t1.one, t2.two AS t21, t3.three AS t31
FROM
TEST1 t1
LEFT JOIN TEST2 t2
ON t1.one = t2.two
LEFT JOIN TEST3 t3
ON t1.one = t3.three
Since the subqueries have no further restrictions, the joins will return the same data, as long as each subquery returns at most one row for each row in TEST1.
Please note that your original query cannot handle 1..n relationships: in most DBMSs, a subquery in the SELECT list must return a result set with one column and at most one row. If TEST2 or TEST3 held several matching rows, the LEFT JOIN version would instead return one output row per match.
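The equivalence can be checked with a small sketch using Python's built-in sqlite3 module. SQLite is not Hive, and the sample rows are hypothetical, but the sketch shows that the scalar-subquery form and the LEFT JOIN form of Example 1 return the same rows when each subquery matches at most one row, and how the LEFT JOIN multiplies rows once a 1..n match appears.

```python
import sqlite3

# In-memory database with hypothetical sample rows for TEST1/TEST2/TEST3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEST1 (one INTEGER);
    CREATE TABLE TEST2 (two INTEGER);
    CREATE TABLE TEST3 (three INTEGER);
    INSERT INTO TEST1 VALUES (1), (2), (3);
    INSERT INTO TEST2 VALUES (1), (2);  -- no match for 3
    INSERT INTO TEST3 VALUES (1), (3);  -- no match for 2
""")

# Original form: scalar subqueries in the SELECT list (rejected by Hive).
scalar_sql = """
    SELECT t1.one,
           (SELECT t2.two FROM TEST2 t2 WHERE t1.one = t2.two) AS t21,
           (SELECT t3.three FROM TEST3 t3 WHERE t1.one = t3.three) AS t31
    FROM TEST1 t1
    ORDER BY t1.one
"""

# Rewritten form: LEFT JOINs keep TEST1 rows that have no match (NULLs instead).
join_sql = """
    SELECT t1.one, t2.two AS t21, t3.three AS t31
    FROM TEST1 t1
    LEFT JOIN TEST2 t2 ON t1.one = t2.two
    LEFT JOIN TEST3 t3 ON t1.one = t3.three
    ORDER BY t1.one
"""

scalar_rows = conn.execute(scalar_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(scalar_rows)             # [(1, 1, 1), (2, 2, None), (3, None, 3)]
print(join_rows == scalar_rows)  # True

# Add a second TEST2 row for one = 1 (a 1..n relationship):
# the LEFT JOIN now returns two rows for one = 1.
conn.execute("INSERT INTO TEST2 VALUES (1)")
dup_count = len(conn.execute(join_sql).fetchall())
print(dup_count)  # 4
```

If duplicates like this are possible in the real data, they have to be removed (for example with DISTINCT or GROUP BY in a derived table) before joining.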

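Based on the Hive manual, any of the following should work. Note that they use inner joins, so rows of TEST1 without a match in both TEST2 and TEST3 are dropped, whereas the original subqueries return NULL for them: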
SELECT t1.one, t2.two, t3.three
FROM TEST1 t1,TEST2 t2, TEST3 t3
WHERE t1.one=t2.two AND t1.one=t3.three;

SELECT t1.one, t2.two, t3.three
FROM TEST1 t1
INNER JOIN TEST2 t2 ON t1.one = t2.two
INNER JOIN TEST3 t3 ON t1.one = t3.three
WHERE t1.one = t2.two AND t1.one = t3.three;

SELECT t1.one,t2.two as t21,t3.three as t31 FROM TEST1 t1
INNER JOIN TEST2 t2 ON t1.one=t2.two
INNER JOIN TEST3 t3 ON t1.one=t3.three
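To see what the inner-join variants above do with unmatched rows, here is a minimal sketch with Python's sqlite3 module and the same kind of hypothetical sample data (SQLite rather than Hive). Only TEST1 rows that have a match in both tables survive. The scalar-subquery query from the question would also return (2, 2, NULL) and (3, NULL, 3).

```python
import sqlite3

# Hypothetical data: 2 has no match in TEST3, 3 has no match in TEST2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEST1 (one INTEGER);
    CREATE TABLE TEST2 (two INTEGER);
    CREATE TABLE TEST3 (three INTEGER);
    INSERT INTO TEST1 VALUES (1), (2), (3);
    INSERT INTO TEST2 VALUES (1), (2);
    INSERT INTO TEST3 VALUES (1), (3);
""")

inner_rows = conn.execute("""
    SELECT t1.one, t2.two AS t21, t3.three AS t31
    FROM TEST1 t1
    INNER JOIN TEST2 t2 ON t1.one = t2.two
    INNER JOIN TEST3 t3 ON t1.one = t3.three
""").fetchall()

# Rows 2 and 3 are gone because each lacks a match in one of the tables.
print(inner_rows)  # [(1, 1, 1)]
```

Use LEFT JOIN instead when those rows must be kept.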

Related

Combining 2 select statements

I have 2 SELECT statements with a common column, POL.SP_NUM, which I wish to combine. I am new to SQL and haven't the slightest clue how to go about it.
Query 1:
select POL.SP_NUM POL#
, POL.ASSET_NUM COV#
, count(distinct(POLX.ATTRIB_06)) COUNT_ADDENDA
, count(distinct(POLX.ATTRIB_07)) COUNT_CERT
, sum(POL.QTY) SI
from S_ASSET POL
, S_ASSET_X POLX
Where POL.ROW_ID = POLX.ROW_ID
and POL.SP_NUM in ('000','111','222')
group by
POL.SP_NUM
, POL.ASSET_NUM
Query 1 output:
POL# COV# COUNT_ADDENDA COUNT_CERT SI
000 856 2 0 1000
111 123 0 0 500
222 567 0 1 2000
Query 2:
select POL#, sum(DOCI)
from (
select POL.SP_NUM POL#, sum(Q.AMT + POL.AMT) DOCI
from S_ASSET POL
, S_QUOTE_ITEM Q
where POL.X_QUOTE_ID = Q.ROW_ID
and POL.SP_NUM in ('000','111','222')
group by POL.SP_NUM
UNION ALL
select POL.SP_NUM POL#, sum(QXM.AMT) DOCI
from S_ASSET POL
, S_QUOTE_ITEM Q
, S_QUOTE_ITEM_XM QXM
where POL.X_QUOTE_ID = Q.ROW_ID
and Q.ROW_ID = QXM.PAR_ROW_ID
and POL.SP_NUM in ('000','111','222')
group by POL.SP_NUM
)
group by POL#
Query 2 output:
POL# sum(DOCI)
000 90
111 0
222 10
Desired output:
POL# COV# COUNT_ADDENDA COUNT_CERT SI sum(DOCI)
000 856 2 0 1000 90
111 123 0 0 500 0
222 567 0 1 2000 10
Is there a better way to code this? Suggestions are welcome.
This is not an answer to the question, but a response to the request in the comments section to explain the join types.
INNER JOIN (or short: JOIN)
select * from t1 join t2 on t1.colx = t2.coly
only gives you matches. This is the most common join. You could replace the ON clause with a USING clause if the columns in the ON clause have the same names in both tables. That is sometimes useful for quickly writing a query, but I would generally not recommend USING.
LEFT OUTER JOIN (or short: LEFT JOIN)
select * from t1 left join t2 on t1.colx = t2.coly
gives you all t1 records, whether or not they have a match in t2. When a t1 record has one or more matches, you join these just as with an inner join. When a t1 record has no match in t2, you get the t1 record along with an empty t2 record (all columns are NULL, even the columns you used in the ON clause, which is t2.coly in the example above). In other words: you get all records you'd get with an inner join plus all t1 records that have no match in t2.
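Here is a small sketch of the difference, run through Python's sqlite3 module on hypothetical tables t1(colx) and t2(coly, val). The t1 value 2 has two matches, so both joins repeat it. The t1 value 3 has none, so only the left join keeps it, with every t2 column (including coly) NULL.

```python
import sqlite3

# Hypothetical data: colx 2 matches twice, colx 3 has no match in t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (colx INTEGER);
    CREATE TABLE t2 (coly INTEGER, val TEXT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 'a'), (2, 'b'), (2, 'c');
""")

inner = conn.execute(
    "SELECT t1.colx, t2.coly, t2.val FROM t1 JOIN t2 ON t1.colx = t2.coly "
    "ORDER BY t1.colx, t2.val").fetchall()
left = conn.execute(
    "SELECT t1.colx, t2.coly, t2.val FROM t1 LEFT JOIN t2 ON t1.colx = t2.coly "
    "ORDER BY t1.colx, t2.val").fetchall()

print(inner)  # [(1, 1, 'a'), (2, 2, 'b'), (2, 2, 'c')]
print(left)   # [(1, 1, 'a'), (2, 2, 'b'), (2, 2, 'c'), (3, None, None)]
```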
You can also use a RIGHT JOIN to keep t2 records that have no t1 match:
select * from t1 right join t2 on t1.colx = t2.coly
but many people regard this as less readable, so it is better not to use right outer joins and to simply swap the tables instead:
select * from t2 left join t1 on t1.colx = t2.coly
FULL OUTER JOIN (or short: FULL JOIN)
select * from t1 full outer join t2 on t1.colx = t2.coly
This gives you all records from both t1 and t2, whether or not they have a match in the other table. Again: you get all records you'd get with an inner join, plus all t1 records with no t2 match, plus all t2 records with no t1 match.
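When you have several full outer joins, the USING clause can come in handy, because the join column (product below) then appears only once, taken from whichever table has the row: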
select product, sum(p1.amount), sum(p2.amount), sum(p3.amount)
from p1
full outer join p2 using (product)
full outer join p3 using (product)
group by product;
CROSS JOIN
A cross join joins a table without any criteria, combining each of its records with each of the records already present. It is used to get all combinations and is usually followed by a left outer join:
select products.product_id, regions.region_id, count(sales.product_id)
from products
cross join regions
left join sales on sales.product_id = products.product_id
and sales.region_id = regions.region_id
group by products.product_id, regions.region_id
order by products.product_id, regions.region_id;
This gives you all possible combinations of products and regions and counts the sales for each. So you get a result record even for product/region combinations where nothing was sold (i.e. no entry in table sales), with a count of 0. Counting a sales column instead of using count(*) matters here: count(*) would also count the NULL-extended row and report 1 for such combinations.
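Here is a runnable sketch of the cross join / left join pattern using Python's sqlite3 module. The products, regions and sales rows are hypothetical. It compares COUNT over a sales column with COUNT(*). For combinations without sales, the left join produces one row whose sales columns are NULL. COUNT(sales.product_id) skips that row, but COUNT(*) counts it.

```python
import sqlite3

# Hypothetical data: product 1 sold twice in region 10, product 2 once in region 20.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER);
    CREATE TABLE regions  (region_id  INTEGER);
    CREATE TABLE sales    (product_id INTEGER, region_id INTEGER);
    INSERT INTO products VALUES (1), (2);
    INSERT INTO regions  VALUES (10), (20);
    INSERT INTO sales    VALUES (1, 10), (1, 10), (2, 20);
""")

rows = conn.execute("""
    SELECT products.product_id, regions.region_id,
           COUNT(sales.product_id) AS sales_count,  -- NULLs from unmatched rows are skipped
           COUNT(*)                AS naive_count   -- counts the NULL-extended row as 1
    FROM products
    CROSS JOIN regions
    LEFT JOIN sales ON sales.product_id = products.product_id
                   AND sales.region_id  = regions.region_id
    GROUP BY products.product_id, regions.region_id
    ORDER BY products.product_id, regions.region_id
""").fetchall()

for row in rows:
    print(row)
# (1, 10, 2, 2)
# (1, 20, 0, 1)
# (2, 10, 0, 1)
# (2, 20, 1, 1)
```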
NATURAL JOIN
looks at common column names to magically join tables. My simple advice: never use this join type.
ANTI JOIN
This is not actually a join type, but a way of using a join, namely an outer join. Here you want to get all records from a table except those that have a match. You achieve this by outer-joining the tables and then removing the matches in the WHERE clause.
select t1.*
from t1
left join t2 on t1.colx = t2.coly
where t2.coly is null;
This looks odd, because we have EXISTS (and IN) to check for existence:
select *
from t1
where not exists (select * from t2 where t2.coly = t1.colx);
So why would one obfuscate things and use the anti join pattern instead? It is a trick used on weak DBMSs. When a DBMS is written, joins are the most important thing, and its developers put all their effort into making them fast. They may neglect EXISTS and IN at first and only care about their performance later, so it may help to use a join technique (the anti join) instead. My recommendation: only use the anti join pattern when you run into performance issues with a straightforward query. In more than twenty years, I have never had to use anti joins. (It's good to have that option, though. And it's good to know about them, so as not to be confused when stumbling upon such a query some time :-)
You can join the queries:
select *
from (your query 1 here) query1
join (your query 2 here) query2 on query2.pol# = query1.pol#;
The same with WITH clauses:
with query1 as (your query 1 here),
query2 as (your query 2 here)
select *
from query1
join query2 on query2.pol# = query1.pol#;
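Here is a minimal sketch of the WITH-clause version, run through Python's sqlite3 module. The two tables are hypothetical and heavily simplified. The rows were chosen so that the result resembles the desired output in the question. Only the pattern carries over: aggregate each query separately, then join the two results on the policy number. The column is named pol rather than POL# because # is not valid in an unquoted SQLite identifier.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the sources of query 1 and query 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asset (sp_num TEXT, asset_num TEXT, qty INTEGER);
    CREATE TABLE doc   (sp_num TEXT, amt INTEGER);
    INSERT INTO asset VALUES ('000', '856', 600), ('000', '856', 400),
                             ('111', '123', 500), ('222', '567', 2000);
    INSERT INTO doc   VALUES ('000', 60), ('000', 30), ('111', 0), ('222', 10);
""")

rows = conn.execute("""
    WITH query1 AS (            -- one row per policy/coverage
        SELECT sp_num AS pol, asset_num AS cov, SUM(qty) AS si
        FROM asset
        GROUP BY sp_num, asset_num
    ),
    query2 AS (                 -- one row per policy
        SELECT sp_num AS pol, SUM(amt) AS doci
        FROM doc
        GROUP BY sp_num
    )
    SELECT query1.pol, query1.cov, query1.si, query2.doci
    FROM query1
    JOIN query2 ON query2.pol = query1.pol
    ORDER BY query1.pol
""").fetchall()

print(rows)
# [('000', '856', 1000, 90), ('111', '123', 500, 0), ('222', '567', 2000, 10)]
```

An inner join drops policies that appear in only one of the two queries. Use a left join from query1 if those should be kept.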

Which performs first WHERE clause or JOIN clause

Which clause performs first in a SELECT statement?
I have a question about SELECT queries in this regard.
Consider the example below:
SELECT *
FROM #temp A
INNER JOIN #temp B ON A.id = B.id
INNER JOIN #temp C ON B.id = C.id
WHERE A.Name = 'Acb' AND B.Name = C.Name
Does it first check the WHERE clause and then perform the INNER JOIN, or does it first perform the JOIN and then check the condition?
If it performs the JOIN first and then the WHERE condition, how can it apply further WHERE conditions for different JOINs?
The conceptual order of query processing is:
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
But this is just a conceptual order. In fact, the engine may decide to rearrange clauses. Here is a demonstration. Let's make 2 tables with 1,000,000 rows each:
CREATE TABLE test1 (id INT IDENTITY(1, 1), name VARCHAR(10))
CREATE TABLE test2 (id INT IDENTITY(1, 1), name VARCHAR(10))
;WITH cte AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) d FROM
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t1(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t2(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t3(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t4(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t5(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t6(n))
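INSERT INTO test1(name) SELECT 'a' FROM cte
-- a CTE is visible only to the statement that follows it, so fill test2 from test1
INSERT INTO test2(name) SELECT name FROM test1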
Now run 2 queries:
SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id AND t2.id = 100
WHERE t1.id > 1
SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id
WHERE t1.id = 1
Notice that the first query filters out most rows in the join condition, whereas the second query filters in the WHERE condition. Look at the produced plans:
1 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(100)
2 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(1)
This means that, when optimizing the first query, the engine decided to evaluate the join condition first to filter out rows. In the second query, it evaluated the WHERE clause first (deriving t2.id = 1 from t1.id = 1 and the join condition).
Logical order of query processing phases is:
FROM - Including JOINs
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
You can have as many conditions as you like, both on your JOINs and in your WHERE clause. For example:
Select * from #temp A
INNER JOIN #temp B ON A.id = B.id AND .... AND ...
INNER JOIN #temp C ON B.id = C.id AND .... AND ...
Where A.Name = 'Acb'
AND B.Name = C.Name
AND ....
You can refer to this description of join optimization:
SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2)
INNER JOIN T3 ON P2(T2,T3)
WHERE P(T1,T2,T3)
The nested-loop join algorithm would execute this query in the following manner:
FOR each row t1 in T1 {
FOR each row t2 in T2 such that P1(t1,t2) {
FOR each row t3 in T3 such that P2(t2,t3) {
IF P(t1,t2,t3) {
t:=t1||t2||t3; OUTPUT t;
}
}
}
}
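The pseudocode above can be written as a runnable Python generator. This is an illustration of the nested-loop idea only, not how any particular engine implements it. The #temp rows (id, Name) in the usage example are hypothetical. The join conditions P1 and P2 prune candidates inside the loops, and the WHERE predicate P is checked on each fully combined row.

```python
from typing import Callable, Iterator, Sequence

Row = tuple


def nested_loop_join(
    T1: Sequence[Row], T2: Sequence[Row], T3: Sequence[Row],
    P1: Callable[[Row, Row], bool],
    P2: Callable[[Row, Row], bool],
    P: Callable[[Row, Row, Row], bool],
) -> Iterator[Row]:
    for t1 in T1:
        for t2 in T2:
            if not P1(t1, t2):          # ON condition of the first INNER JOIN
                continue
            for t3 in T3:
                if not P2(t2, t3):      # ON condition of the second INNER JOIN
                    continue
                if P(t1, t2, t3):       # WHERE condition on the combined row
                    yield t1 + t2 + t3  # t := t1 || t2 || t3


# Usage with the question's query: #temp joined to itself as A, B and C.
temp = [(1, "Acb"), (2, "Xyz"), (3, "Acb")]
result = list(nested_loop_join(
    temp, temp, temp,
    P1=lambda a, b: a[0] == b[0],                       # A.id = B.id
    P2=lambda b, c: b[0] == c[0],                       # B.id = C.id
    P=lambda a, b, c: a[1] == "Acb" and b[1] == c[1],   # WHERE clause
))
print(result)  # [(1, 'Acb', 1, 'Acb', 1, 'Acb'), (3, 'Acb', 3, 'Acb', 3, 'Acb')]
```

A real optimizer is free to move parts of P into the loops. For example, it could test A.Name = 'Acb' right after picking t1. That is why the physical order can differ from the logical one.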
You can refer to MSDN:
The rows selected by a query are filtered first by the FROM clause
join conditions, then the WHERE clause search conditions, and then the
HAVING clause search conditions. Inner joins can be specified in
either the FROM or WHERE clause without affecting the final result.
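The quoted statement can be illustrated with a small sketch using Python's sqlite3 module on two hypothetical tables, a and b. This is SQLite rather than SQL Server, so the plans differ, but the results of these queries follow standard join semantics. For an inner join, putting the filter in ON, in WHERE, or writing the whole join in WHERE gives the same rows. For a left join, the placement changes the result.

```python
import sqlite3

# Hypothetical tables a and b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'Acb'), (2, 'Xyz');
    INSERT INTO b VALUES (1, 'p'), (2, 'q');
""")

queries = {
    "filter in ON":    "SELECT a.id, b.name FROM a JOIN b ON a.id = b.id AND a.name = 'Acb'",
    "filter in WHERE": "SELECT a.id, b.name FROM a JOIN b ON a.id = b.id WHERE a.name = 'Acb'",
    "join in WHERE":   "SELECT a.id, b.name FROM a, b WHERE a.id = b.id AND a.name = 'Acb'",
}
results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
print(results)  # every entry is [(1, 'p')]

# For an OUTER join the placement matters: a filter in ON only decides
# which b rows match, while a filter in WHERE removes whole result rows.
left_on = conn.execute(
    "SELECT a.id, b.name FROM a LEFT JOIN b ON a.id = b.id AND a.name = 'Acb' "
    "ORDER BY a.id").fetchall()
left_where = conn.execute(
    "SELECT a.id, b.name FROM a LEFT JOIN b ON a.id = b.id WHERE a.name = 'Acb' "
    "ORDER BY a.id").fetchall()
print(left_on)     # [(1, 'p'), (2, None)]
print(left_where)  # [(1, 'p')]
```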
You can also run SET SHOWPLAN_ALL ON before executing your query to display its execution plan, so that you can compare the plans of the two queries.
If you came to this page with a question about logical query processing, you really need to read this article on ITProToday by Itzik Ben-Gan.
Figure 3: Logical query processing order of query clauses
1 FROM
2 WHERE
3 GROUP BY
4 HAVING
5 SELECT
5.1 SELECT list
5.2 DISTINCT
6 ORDER BY
7 TOP / OFFSET-FETCH

LEFT JOIN on 3 tables to get a value

I'm trying to create a new interface for a database, but I don't know how to do what I want.
I have 3 tables :
- table1(id1, time, ...)
id11 ..
id12 ..
id13 ..
- table2(id2, price, ...)
id21 ..
id22 ..
id23 ..
- table1_table2(#id1, #id2, value)
id11, id22, 6
id11, id23, 10
id13, id22, 5
So I want to have something like this:
id11, id21, 0
id11, id22, 6
id11, id23, 10
id12, id21, 0
id12, id22, 0
id12, id23, 0
id13, id21, 0
id13, id22, 5
id13, id23, 0
I've tried many queries, but nothing worked.
Please help me ^^
EDIT: I'm using Access 2007 ( :'( ), and apparently it doesn't support CROSS JOIN...
I tried to use this: http://blog.jooq.org/2014/02/12/no-cross-join-in-ms-access/
but I still get a syntax error on the JOIN or the FROM...
EDIT 2: Here is my query (I'm French, so please don't mind the names ^^)
SELECT Chantier.id_chantier, Indicateur.id_indicateur, Indicateur_chantier.valeur
FROM ((Chantier INNER JOIN Indicateur ON (Chantier.id_chantier*0 = Indicateur.id_indicateur*0))
LEFT JOIN Indicateur_chantier ON ( (Chantier.id_chantier = Indicateur_chantier.id_chantier)
AND (Indicateur.id_indicateur = Indicateur_chantier.id_indicateur) ) )
You should first cross join table1 and table2 to produce their Cartesian product, and then left join table1_table2 to get the values where they exist:
SELECT t1.id1,t2.id2,ISNULL(t12.value,0)
FROM table1 t1
CROSS JOIN table2 t2
LEFT JOIN table1_table2 t12 on t12.id1=t1.id1 and t12.id2=t2.id2
Finally use ISNULL to replace null values with zeros.
The answer may vary by database; this works in SQL Server. You need a CROSS JOIN to get every combination of table1 and table2, then a LEFT JOIN to return the pairs that have values:
SELECT a.id1, b.id2, COALESCE(c.value,0)
FROM table1 a
CROSS JOIN table2 b
LEFT JOIN table1_table2 c
ON a.id1 = c.id1
AND b.id2 = c.id2
Pairs without values would return NULL, so you can use COALESCE() to return 0 instead.
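Here is a runnable check of the CROSS JOIN + LEFT JOIN + COALESCE approach. It uses Python's sqlite3 module (SQLite, not Access or SQL Server) with the sample ids and values from the question. It produces the nine rows the question asks for, with 0 for every pair that has no entry in table1_table2.

```python
import sqlite3

# Sample data taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 TEXT);
    CREATE TABLE table2 (id2 TEXT);
    CREATE TABLE table1_table2 (id1 TEXT, id2 TEXT, value INTEGER);
    INSERT INTO table1 VALUES ('id11'), ('id12'), ('id13');
    INSERT INTO table2 VALUES ('id21'), ('id22'), ('id23');
    INSERT INTO table1_table2 VALUES
        ('id11', 'id22', 6), ('id11', 'id23', 10), ('id13', 'id22', 5);
""")

rows = conn.execute("""
    SELECT a.id1, b.id2, COALESCE(c.value, 0) AS value  -- NULL -> 0 for missing pairs
    FROM table1 a
    CROSS JOIN table2 b                                  -- every (id1, id2) combination
    LEFT JOIN table1_table2 c
           ON a.id1 = c.id1 AND b.id2 = c.id2           -- attach the value if it exists
    ORDER BY a.id1, b.id2
""").fetchall()

for row in rows:
    print(row)
```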
In your question you say that Access "doesn't support CROSS JOIN". While it is true that Access SQL does not support
... FROM tableX CROSS JOIN tableY ...
you can perform a cross join in Access by simply using
... FROM tableX, tableY ...
In your case,
SELECT
crossjoin.id1,
crossjoin.id2,
Nz(table1_table2.value, 0) AS [value]
FROM
(
SELECT table1.id1, table2.id2
FROM table1, table2
) AS crossjoin
LEFT JOIN
table1_table2
ON table1_table2.id1 = crossjoin.id1
AND table1_table2.id2 = crossjoin.id2
ORDER BY crossjoin.id1, crossjoin.id2

SQL joining 4 tables issue

I have four tables:
T1
ID ID1 TITLE
1 100 TITLE1
2 100 TITLE2
3 100 TITLE3
T2
ID TEXT
1 LONG1
2 LONG2
T3
ID1 ID2
100 200
T4
ID4 ID2 SUBJECT
1 200 A
2 200 B
3 200 C
4 200 D
5 200 E
I want output in this result format:
TITLE TEXT SUBJECT
TITLE1 LONG1 A
TITLE2 LONG2 B
TITLE3 null C
null null D
null null E
So I wrote this query, but it gives me many more results than it should. For example, titles are displayed more than once, etc.
SELECT
t1.title,
t2.text,
t4.subject
FROM t1
LEFT OUTER JOIN t2 ON t1.id=t2.id
INNER JOIN t3 ON t1.id1=t3.id1
LEFT OUTER JOIN t4 ON t4.id2=t3.id2
WHERE
t1.id1=100
Thanks for help
Disclaimer: I don't work with DB2. After some browsing through documentation I have found that DB2 supports row_number() and full outer join, but I might easily be wrong.
To get rid of the n:m relationship, one has to build an additional key. In this case, a simple solution is to add a row number to each record in t1 and t4 and use it as a join condition. row_number() does just that: it numbers the rows within each group defined by PARTITION BY, in ascending sequence, in the order defined by ORDER BY.
As the numbers of records in t1 and t4 differ, and it is unknown which one will always have more records, I use a full outer join to join them.
You can see the test (SQL Server version) on SQL Fiddle.
select t1_rn.title,
t2.[text],
t4_rn.subject
from
(
select t1.id,
t1.title,
t1.id1,
t3.id2,
row_number() over(partition by t1.id1
order by id) rn
from t1
inner join t3
on t1.id1 = t3.id1
) t1_rn
full outer join
(
select t4.subject,
t3.id1,
t4.id2,
row_number() over(partition by t4.id2
order by id4) rn
from t4
inner join t3
on t4.id2 = t3.id2
) t4_rn
on t1_rn.id1 = t4_rn.id1
and t1_rn.id2 = t4_rn.id2
and t1_rn.rn = t4_rn.rn
left join t2
on t1_rn.id = t2.id
This kind of work should definitely be done on the presentation side of an application, but I believe the software you are using requires already prepared data.
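If the pairing can be done in the application after all, the row-number idea reduces to walking two ordered lists side by side. Here is a minimal Python sketch. It assumes the titles (with their T1.ID) and the subjects have already been fetched by two simple queries, each in ID order. The variable names are hypothetical and the data is the question's sample. itertools.zip_longest plays the role of the full outer join on the row number.

```python
from itertools import zip_longest

# Hypothetical pre-fetched rows, each list in ID order (sample data from the question).
titles = [(1, "TITLE1"), (2, "TITLE2"), (3, "TITLE3")]  # (T1.ID, TITLE) where ID1 = 100
subjects = ["A", "B", "C", "D", "E"]                    # T4.SUBJECT for the mapped ID2, by ID4
texts = {1: "LONG1", 2: "LONG2"}                        # T2: ID -> TEXT

rows = []
# zip_longest pairs the nth title with the nth subject and pads the shorter list with None.
for title_row, subject in zip_longest(titles, subjects):
    if title_row is None:
        rows.append((None, None, subject))
    else:
        t1_id, title = title_row
        rows.append((title, texts.get(t1_id), subject))  # TEXT is optional (left join)

for row in rows:
    print(row)
```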
Try this:
select t1.title,t2.text,t4.subject
from t4
left join t3
on t4.id2=t3.id2
left join t1
on t1.id1=t3.id1
left join t2
on t1.id=t2.id
where t1.id1=100
You should change your tables. Your last join is what does that to your output; just analyze your query: for every record from T1, you get every record from T4.
Joins, outer joins included, replicate rows whenever a key matches more than one row, instead of matching only the ones you need. To understand what the join types are and how you can use them, you may want to look at this:
http://blog.sqlauthority.com/2009/04/13/sql-server-introduction-to-joins-basic-of-joins/
You are looking for a list of subjects with the associated text and title, but this may not be unique; the title column contains more than one NULL. You want to drive the join from table 4 and get a list of subjects with the associated titles for each.
Looking at your output, it appears you want all subjects displayed. Knowing this, you should build everything off this table.
SELECT columns
FROM T4
Next build up your inner joins.
SELECT columns
FROM T4 subjectTable
INNER JOIN T3 mapTable
ON mapTable.ID2 = subjectTable.ID2
When happy with them, add on your optional columns with the outer join.
SELECT columns
FROM T4 subjectTable
INNER JOIN T3 mapTable
ON mapTable.ID2 = subjectTable.ID2
LEFT OUTER JOIN T2 textTable
ON textTable.ID = subjectTable.ID4
LEFT OUTER JOIN T1 titleTable
ON titleTable.ID1 = mapTable.ID1
WHERE
mapTable.ID1 = 100;

SQL Query to get the rows not part of join

I have the tables below.
create table test(id int, data1 int);
create table test1(id int, data2 int);
insert into test values(1,1);
insert into test1 values(2,2);
insert into test1 values(3,3);
insert into test1 values(1,1);
Now I want the rows of test1 that don't participate in the join, i.e. I want rows (2,2) and (3,3). I want to be able to do this in MySQL.
I don't want to use a subquery because of performance.
Thank you
Using LEFT JOIN/IS NULL:
SELECT t1.*
FROM TEST1 t1
LEFT JOIN TEST t ON t.id = t1.id
AND t.data1 = t1.data2
WHERE t.id IS NULL
Assuming the columns being joined on are not nullable, this is the fastest/most efficient method on MySQL. Otherwise, NOT IN/NOT EXISTS are better choices.
Using NOT EXISTS:
SELECT t1.*
FROM TEST1 t1
WHERE NOT EXISTS(SELECT NULL
FROM TEST t
WHERE t.id = t1.id
AND t.data1 = t1.data2)
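Both answers can be checked against the question's data with Python's sqlite3 module. SQLite is used here instead of MySQL, so this confirms only the results, not the relative performance. LEFT JOIN / IS NULL and NOT EXISTS both return exactly the test1 rows without a partner in test.

```python
import sqlite3

# Tables and rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test  (id INT, data1 INT);
    CREATE TABLE test1 (id INT, data2 INT);
    INSERT INTO test  VALUES (1, 1);
    INSERT INTO test1 VALUES (2, 2), (3, 3), (1, 1);
""")

# Anti join: keep test1 rows whose outer-joined test columns are NULL (no match).
left_join_is_null = conn.execute("""
    SELECT t1.* FROM test1 t1
    LEFT JOIN test t ON t.id = t1.id AND t.data1 = t1.data2
    WHERE t.id IS NULL
    ORDER BY t1.id
""").fetchall()

# Same result expressed directly with NOT EXISTS.
not_exists = conn.execute("""
    SELECT t1.* FROM test1 t1
    WHERE NOT EXISTS (SELECT NULL FROM test t
                      WHERE t.id = t1.id AND t.data1 = t1.data2)
    ORDER BY t1.id
""").fetchall()

print(left_join_is_null)  # [(2, 2), (3, 3)]
print(not_exists)         # [(2, 2), (3, 3)]
```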
Without using sub queries (even the EXISTS variety which I love) you'll need to do a left join and grab the records that didn't join, like so:
select a.* from test1 a
left join test b on a.id = b.id and a.data2 = b.data1
where b.id IS NULL
Perhaps something with a UNION?
select * from test as a
left outer join test1 as o on a.id = o.id
union all
select * from test as a
right outer join test1 as o on a.id = o.id
where a.id is null;
I assume what you want to achieve is an exclusion join.
http://www.xaprb.com/blog/2005/09/23/how-to-write-a-sql-exclusion-join/