How to express a join with an ON clause in relational algebra?


I don't know how to express the following SQL query (in fact, a HiveQL query) in relational algebra:
SELECT a0.subject
FROM (SELECT subject FROM table_name WHERE predicate = 'headOf' AND object = 'Department8') a0
JOIN (SELECT subject FROM table_name WHERE predicate = 'teacherOf' AND object = 'GraduateCourse6') a1
ON a0.subject = a1.subject;
What is the relational algebra expression for this query?
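No answer is included here, so the following is only a sketch of one standard way to write it, using σ (selection), π (projection), ρ (rename) and the theta-join ⋈ with the ON condition as its subscript. T abbreviates table_name, and a0, a1 are the subquery aliases from the query.

```latex
\begin{aligned}
A_0 &= \rho_{a0}\Big(\pi_{\mathit{subject}}\big(\sigma_{\mathit{predicate}=\text{'headOf'} \,\wedge\, \mathit{object}=\text{'Department8'}}(T)\big)\Big) \\
A_1 &= \rho_{a1}\Big(\pi_{\mathit{subject}}\big(\sigma_{\mathit{predicate}=\text{'teacherOf'} \,\wedge\, \mathit{object}=\text{'GraduateCourse6'}}(T)\big)\Big) \\
\mathit{Result} &= \pi_{a0.\mathit{subject}}\big(A_0 \bowtie_{a0.\mathit{subject} = a1.\mathit{subject}} A_1\big)
\end{aligned}
```

Because each side keeps only the subject attribute, under the set semantics of classical relational algebra this reduces to the intersection of the two projected selections. SQL and HiveQL evaluate with bag semantics, so duplicate counts in the actual query result can differ from that intersection.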

Related

What is the reason a FROM-subquery is not visible in the WHERE-subquery?

Consider:
SELECT DISTINCT student_id
FROM (SELECT * FROM Grades WHERE dept_id = 'MT') T
WHERE grade = (SELECT MAX(grade) FROM T);
Oracle complains that T in the subquery in WHERE is not an existing table. I know that I can easily work around this by using WITH, but still I want to understand. What is the rule of SQL that governs this case and the logic behind that rule?

I'm not sure the reason matters; that's how it is, and knowing why won't change that.
As SQL is declarative, there are rules about scope, order of execution, order of precedence, etc. Rules that enable the cost-based planner to generate the plan that will actually be executed. One such rule is that you can't evaluate two independent queries over the same set. Even if T were a real table, referencing it twice would bring it into the plan as two independent sets.
Instead you need a different way to express your requirement that's more in keeping with the language. One where you don't try to parse the same set multiple times.
For example, you can acquire two sets from the same expression in this manner...
WITH
T AS
(
SELECT * FROM Grades WHERE dept_id = 'MT'
)
SELECT DISTINCT student_id
FROM T
WHERE grade = (SELECT MAX(grade) FROM T);
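A minimal sketch to see the behaviour concretely, assuming SQLite (through Python's sqlite3 module) as a stand-in for Oracle and a made-up Grades table: the original query fails because the derived-table alias T is not visible as a table, while the CTE version runs.

```python
import sqlite3

# Made-up sample data for a Grades table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Grades (student_id INTEGER, dept_id TEXT, grade INTEGER);
    INSERT INTO Grades VALUES
        (1, 'MT', 90), (2, 'MT', 75), (3, 'MT', 90), (4, 'CS', 99);
""")

# Original form: T is only a derived-table alias, so the scalar
# subquery in WHERE cannot select FROM it.
broken = """
    SELECT DISTINCT student_id
    FROM (SELECT * FROM Grades WHERE dept_id = 'MT') T
    WHERE grade = (SELECT MAX(grade) FROM T)
"""
try:
    conn.execute(broken).fetchall()
    broken_error = None
except sqlite3.OperationalError as exc:
    broken_error = str(exc)  # SQLite reports that table T does not exist

# CTE form: T is a named query visible throughout the statement.
with_cte = """
    WITH T AS (SELECT * FROM Grades WHERE dept_id = 'MT')
    SELECT DISTINCT student_id
    FROM T
    WHERE grade = (SELECT MAX(grade) FROM T)
    ORDER BY student_id
"""
top_students = [row[0] for row in conn.execute(with_cte)]
print(broken_error)
print(top_students)  # [1, 3]
```

Student 4 has the highest grade overall but is in another department, so the CTE filter is applied before MAX is taken.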
Or, you could use window functions and let the engine determine how best to evaluate all the terms at minimum cost...
SELECT
*
FROM
(
SELECT
Grades.*,
MAX(grade) OVER () AS max_grade
FROM
Grades
WHERE
dept_id = 'MT'
)
T
WHERE
grade = max_grade
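The same window-function query run end to end, again assuming SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later) and the same made-up rows:

```python
import sqlite3

# Made-up sample data; only the 'MT' rows survive the inner WHERE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Grades (student_id INTEGER, dept_id TEXT, grade INTEGER);
    INSERT INTO Grades VALUES
        (1, 'MT', 90), (2, 'MT', 75), (3, 'MT', 90), (4, 'CS', 99);
""")

# MAX(grade) OVER () attaches the maximum of the filtered rows to every
# row, so the outer query only needs a plain comparison.
query = """
    SELECT student_id, grade, max_grade
    FROM (
        SELECT Grades.*, MAX(grade) OVER () AS max_grade
        FROM Grades
        WHERE dept_id = 'MT'
    ) T
    WHERE grade = max_grade
    ORDER BY student_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 90, 90), (3, 90, 90)]
```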
VERY LONG EDIT: Subjective and Objective arguments against the proposal
The suggestion is that sets defined in outer queries are usable as independent sets in inner queries.
SELECT DISTINCT
student_id
FROM
(
SELECT * FROM Grades WHERE dept_id = 'MT'
)
newSetDefinition
WHERE
grade = (SELECT MAX(grade) FROM newSetDefinition)
-----------------------------
Functionally Equivalent To...
-----------------------------
WITH
newSetDefinition
AS
(
SELECT * FROM Grades WHERE dept_id = 'MT'
)
SELECT DISTINCT
student_id
FROM
newSetDefinition
WHERE
grade = (SELECT MAX(grade) FROM newSetDefinition)
This implies that the following should also work...
SELECT DISTINCT
newSetDefinition.student_id
FROM
(
SELECT * FROM Grades WHERE dept_id = 'MT'
)
newSetDefinition
INNER JOIN
(
SELECT MAX(grade) AS maxGrade FROM newSetDefinition
)
newSetSummary
ON newSetSummary.maxGrade = newSetDefinition.grade
-----------------------------
Functionally Equivalent To...
-----------------------------
WITH
newSetDefinition
AS
(
SELECT * FROM Grades WHERE dept_id = 'MT'
)
SELECT DISTINCT
newSetDefinition.student_id
FROM
newSetDefinition
INNER JOIN
(
SELECT MAX(grade) AS maxGrade FROM newSetDefinition
)
newSetSummary
ON newSetSummary.maxGrade = newSetDefinition.grade
So far, so good...
With nested queries it becomes a little hazier: the following can't be represented accurately with top-level CTEs because of differing scope availability and naming collisions, so it becomes necessary to define CTEs inside sub-queries...
SELECT
*
FROM
(
SELECT * FROM table WHERE something = 1
)
smeg
INNER JOIN
(
SELECT
*
FROM
(
SELECT * FROM table WHERE somethingElse = 2
)
smeg
INNER JOIN
(
SELECT MAX(id) AS maxID FROM smeg
)
smegSummary
ON smegSummary.maxID = smeg.ID
)
smegSubSet
ON smegSubSet.parentID = smeg.ID
-----------------------------
Functionally Equivalent To...
-----------------------------
WITH
smeg AS
(
SELECT * FROM table WHERE something = 1
)
SELECT
*
FROM
smeg
INNER JOIN
(
WITH
smeg AS
(
SELECT * FROM table WHERE somethingElse = 2
)
SELECT
*
FROM
smeg
INNER JOIN
(
SELECT MAX(id) AS maxID FROM smeg
)
smegSummary
ON smegSummary.maxID = smeg.ID
)
smegSubSet
ON smegSubSet.parentID = smeg.ID
Okay, that works; it's just a bit untidy. CTEs exist to help avoid deep nesting, so needing nested CTE syntax is "messy", but even that's a subjective measure.
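A sketch of the nested-CTE shape, assuming SQLite via Python's sqlite3 (not every engine accepts a WITH clause inside a derived table) and a hypothetical table items standing in for the placeholder table, which is a reserved word. It shows the inner smeg shadowing the outer one inside the derived table; the rows are made up.

```python
import sqlite3

# Hypothetical table "items" replaces the placeholder name "table".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (ID INTEGER, parentID INTEGER,
                        something INTEGER, somethingElse INTEGER);
    INSERT INTO items VALUES
        (1, NULL, 1, 0), (2, NULL, 1, 0), (10, 2, 0, 2), (11, 1, 0, 2);
""")

query = """
    WITH smeg AS (SELECT * FROM items WHERE something = 1)
    SELECT smeg.ID AS outer_id, smegSubSet.ID AS child_id, smegSubSet.maxID
    FROM smeg
    INNER JOIN (
        -- This inner smeg shadows the outer one inside the derived table.
        WITH smeg AS (SELECT * FROM items WHERE somethingElse = 2)
        SELECT *
        FROM smeg
        INNER JOIN (SELECT MAX(ID) AS maxID FROM smeg) smegSummary
            ON smegSummary.maxID = smeg.ID
    ) smegSubSet
        ON smegSubSet.parentID = smeg.ID
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 11, 11)]
```

Inside the derived table, MAX(ID) is taken over the somethingElse = 2 rows (10 and 11), not over the outer smeg, which is what makes the two definitions independent.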
When you see a "set reference", you look to outer queries until you find a set with that alias; if none is found, you use the normal rules: CTEs, then tables/views in the current schema, then tables/views in the current database but a different schema, all taking permissions etc. into account.
Fine, fairly standard scoping rules.
But this next scenario is more objectively problematic...
SELECT
*
FROM
(
SELECT * FROM smeg WHERE something = 1
)
smeg
In current ANSI-SQL this is fine, provided there is a table with the name smeg.
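A quick check that this is legal under current scoping rules, assuming SQLite via Python's sqlite3, a hypothetical smeg table and made-up rows: the inner smeg resolves to the base table, and the outer alias only names the derived table.

```python
import sqlite3

# Hypothetical base table named smeg.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE smeg (id INTEGER, something INTEGER);
    INSERT INTO smeg VALUES (1, 1), (2, 0), (3, 1);
""")

# The inner "smeg" is looked up as a table; the outer "smeg" is just
# the alias given to the derived table.
rows = conn.execute("""
    SELECT *
    FROM (SELECT * FROM smeg WHERE something = 1) smeg
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 1), (3, 1)]
```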
In AlwaysLearning-SQL it's a circular reference. The "nearest" definition for smeg is the outer query. That "overrides" any tables or views named smeg. So, the inner query is now selecting from...itself...
There's an argument to say "just let it raise a circular reference error then".
But that breaks backward-compatibility.
Imagine if Oracle added this functionality to v13? All of a sudden queries that used to work start raising circular reference errors? Why? To make some sub-queries work like CTEs, under the presumption that doing so is helpful/convenient? To make some aspects of life "convenient" we broke some of your queries.
Breaking backwards compatibility happens. But only when the gains far outweigh the consequences.
In this case anything that can be done with your suggestion can be done with CTEs. And CTEs were added without breaking any legacy behaviours. And (subjectively/arguably) CTEs can do this in a manner that is more structured, more maintainable, simpler to read, easier to debug, etc.
I'm personally very happy that no-one has yet broken some queries to implement some very niche functionality.

Select data from multiple databases dependent on row value

I'm trying to select data across multiple databases. I'm able to join the databases; however, I'm not sure how to dynamically declare which database to query data from.
e.g.
SELECT
UID
,ACCT
,Comp
FROM db1.dbo.tbl1
JOIN db2.dbo.tbl1
ON db2.dbo.tbl1.uid=db1.dbo.tbl1.uid
The problem with this is that it does not take into account which comp is defined in db1 (comp = comp1, comp2 or comp3). Depending on the value of comp, the query should return results from a different database.
(I'm sure the following is wrong)
SELECT
uid
,acct
,comp
FROM
(case when comp='comp1'
then db2.dbo.tbl1
when comp='comp2'
then db3.dbo.tbl1
when comp='comp3'
then db2.dbo.tbl1
)
(Insert join clause)
(Insert where clause)

Consider a UNION query, with one branch per database. Table aliases are needed because both tables in each branch are named tbl1:
SELECT a.UID, ACCT, a.Comp
FROM db1.dbo.tbl1 AS a
JOIN db2.dbo.tbl1 AS b
ON b.uid = a.uid
WHERE a.comp IN ('comp1', 'comp3')
UNION
SELECT a.UID, ACCT, a.Comp
FROM db1.dbo.tbl1 AS a
JOIN db3.dbo.tbl1 AS c
ON c.uid = a.uid
WHERE a.comp = 'comp2'
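A runnable sketch of the UNION approach, assuming SQLite via Python's sqlite3: the three databases are attached in-memory databases (so names are db.table rather than db.dbo.table), acct is assumed to live in the per-company databases, and all rows are made up.

```python
import sqlite3

# Each "database" is an attached in-memory SQLite database.
conn = sqlite3.connect(":memory:")
for name in ("db1", "db2", "db3"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
conn.executescript("""
    CREATE TABLE db1.tbl1 (uid INTEGER, comp TEXT);
    CREATE TABLE db2.tbl1 (uid INTEGER, acct TEXT);
    CREATE TABLE db3.tbl1 (uid INTEGER, acct TEXT);
    INSERT INTO db1.tbl1 VALUES (1, 'comp1'), (2, 'comp2'), (3, 'comp3');
    INSERT INTO db2.tbl1 VALUES (1, 'A-db2'), (2, 'B-db2'), (3, 'C-db2');
    INSERT INTO db3.tbl1 VALUES (1, 'A-db3'), (2, 'B-db3'), (3, 'C-db3');
""")

# One branch per source database, each restricted to the comp values
# that map to it; UNION stitches the branches together.
query = """
    SELECT a.uid, b.acct, a.comp
    FROM db1.tbl1 AS a JOIN db2.tbl1 AS b ON b.uid = a.uid
    WHERE a.comp IN ('comp1', 'comp3')
    UNION
    SELECT a.uid, c.acct, a.comp
    FROM db1.tbl1 AS a JOIN db3.tbl1 AS c ON c.uid = a.uid
    WHERE a.comp = 'comp2'
    ORDER BY 1
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'A-db2', 'comp1'), (2, 'B-db3', 'comp2'), (3, 'C-db2', 'comp3')]
```

Each uid picks up its acct from exactly one database, chosen by its comp value, which is the effect the CASE-in-FROM attempt was after.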

Shared columns between two tables in DB2

My question is a duplicate of this SO question, but I want to do it for a DB2 database, while the other question was asked for SQL Server.
I have two tables, TABLE1 and TABLE2, in schema BP. I wish to find the names of the columns shared between these two tables.
There are many schemas on the DB server.
I don't see any generic answers there that would be applicable to all types of DBs.

I'm not sure why @SabirKhan's answer has so many sub-queries; just join the meta information to itself. An inner join ensures you only get column names that appear in both tables.
SELECT A.COLNAME AS DUP
FROM SYSCAT.COLUMNS A
JOIN SYSCAT.COLUMNS B ON A.COLNAME = B.COLNAME AND B.TABNAME='TABLE2' AND B.TABSCHEMA='BP'
WHERE A.TABNAME='TABLE1' AND A.TABSCHEMA='BP'
As for not seeing answers that work on all database platforms: you are correct, there are no such answers. Database platforms vary a lot.
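For comparison, a sketch of the same self-join on SQLite, where there is no SYSCAT.COLUMNS: the pragma_table_info() table-valued function (SQLite 3.16 or later) plays that role, schemas are ignored, and the table definitions are made up to mirror the DB2 example further down.

```python
import sqlite3

# Made-up tables mirroring bp.TABLE1 and bp.TABLE2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (in_both TEXT, common TEXT, only_in_t1 TEXT);
    CREATE TABLE TABLE2 (only_in_t2 TEXT, in_both TEXT, common TEXT);
""")

# Self-join of the column metadata: a name survives only if it
# appears in both tables.
shared = [row[0] for row in conn.execute("""
    SELECT a.name
    FROM pragma_table_info('TABLE1') AS a
    JOIN pragma_table_info('TABLE2') AS b ON a.name = b.name
    ORDER BY a.name
""")]
print(shared)  # ['common', 'in_both']
```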

A DB2 DBA at my office gave me this, and it's giving me correct results:
SELECT BP1.BP_COL FROM
(SELECT COLNAME AS BP_COL FROM SYSCAT.COLUMNS WHERE TABNAME='TABLE1' AND TABSCHEMA='BP' ) BP1
INNER JOIN
(SELECT COLNAME AS AR_COL FROM SYSCAT.COLUMNS WHERE TABNAME='TABLE2' AND TABSCHEMA='BP' ) BP2
ON BP1.BP_COL=BP2.AR_COL
WITH UR;

FWIW: For more than just the effective INTERSECT of the names, the following query also shows the names that are unmatched between the two tables. The qualified SYSCOLUMNS (or similar) catalog VIEW and its column names may be specific to the DB2 variant, so adjustments are likely required, but the following was successful, exactly as shown, using IBM DB2 for i 7.1 SQL:
Setup:
create table bp.TABLE1 (in_both char, common char, only_in_t1 char )
;
create table bp.TABLE2 ( only_in_t2 char, in_both char, common char)
;
Query of the columns:
SELECT t1_col, t2_col
from ( select char( column_name, 25) as t1_col
from syscolumns
where table_name = 'TABLE1' and table_schema='BP' ) as t1
full outer join
( select char( column_name, 25) as t2_col
from syscolumns
where table_name = 'TABLE2' and table_schema='BP' ) as t2
on t1_col = t2_col
; -- report from above query, with headings, follows [where a dash indicates NULL value]:
T1_COL      T2_COL
IN_BOTH     IN_BOTH
COMMON      COMMON
ONLY_IN_T1  -
-           ONLY_IN_T2

SQL joins and nested query returning different records

I'm currently working on a project which is supposed to take some arbitrary input data, do some natural language processing, and dynamically generate a corresponding SQL query. I also have a "reference" set of SQL queries which I can use to compare my SQL against to verify that the SQL generation is accurate.
This is one such SQL query that I've generated:
SELECT DISTINCT t0.airline_code
FROM (
SELECT airline.*
FROM airline, flight
WHERE (
( airline.airline_code = flight.airline_code )
AND
( flight.flight_days = 'DAILY' )
)
)
AS t0
INNER JOIN (
SELECT airline.*
FROM airline, flight, airport_service, city
WHERE (
( airline.airline_code = flight.airline_code )
AND
( flight.from_airport = airport_service.airport_code )
AND
( airport_service.city_code = city.city_code )
AND
( city.city_name = 'BOSTON' )
)
)
AS t1
ON t0.airline_code = t1.airline_code
INNER JOIN (
SELECT airline.*
FROM airline, flight, airport_service, city
WHERE (
( airline.airline_code = flight.airline_code )
AND
( flight.to_airport = airport_service.airport_code )
AND
( airport_service.city_code = city.city_code )
AND
( city.city_name = 'DALLAS' )
)
)
AS t2
ON t1.airline_code = t2.airline_code;
Running this returns the following rows:
airline_code
------------
AA
CO
HP
TW
DL
NW
UA
US
The reference SQL, however, returns slightly different results:
SELECT DISTINCT airline.airline_code
FROM airline
WHERE airline.airline_code IN
(SELECT flight.airline_code
FROM flight
WHERE (flight.flight_days = 'DAILY'
AND (flight.from_airport IN
(SELECT airport_service.airport_code
FROM airport_service
WHERE airport_service.city_code IN
(SELECT city.city_code
FROM city
WHERE city.city_name = 'BOSTON'))
AND flight.to_airport IN
(SELECT airport_service.airport_code
FROM airport_service
WHERE airport_service.city_code IN
(SELECT city.city_code
FROM city
WHERE city.city_name = 'DALLAS')))));
Result:
airline_code
------------
AA
DL
TW
UA
US
Obviously, the two are different in that the first one is using joins while the second one uses nested SQL statements. However, this doesn't appear to be causing any problems for the other generated SQL/reference SQL that I'm working with, which are structured similarly (the generated SQL uses joins, the reference SQL is nested).
I'm fairly new to SQL, and know next to nothing about databases, so I could be missing something stupidly obvious, but for the life of me, I can't see why the two SQL statements are returning different results. They seem functionally identical, as best as I can tell. Does anybody know what I'm doing wrong, and how I can fix the generated SQL to match the reference?
If it matters, I'm using Microsoft SQL Server 2012.

bksi is right, the problem is in the first query.
Look: the first derived table gets all airlines that have daily flights.
Then you INNER JOIN airlines that have flights from Boston. That means you have now selected airlines that have daily flights (from anywhere) AND flights (at any time) from Boston, but not necessarily daily flights from Boston.
The third join then gives you airlines that, at the same time, have daily flights, have flights from Boston and have flights to Dallas, where each condition may be satisfied by a different flight.
The second query, with nested statements, gives you only airlines with daily flights from Boston to Dallas.
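A small reproduction of the difference, assuming SQLite via Python's sqlite3 and a made-up dataset: airline XX has a daily flight that is not from Boston plus a non-daily Boston to Dallas flight. The generated query's shape accepts XX because each condition can be met by a different flight; a join formulation that keeps all three conditions on the same flight row (a hypothetical rewrite, not the reference query itself) rejects it.

```python
import sqlite3

# Made-up data: AA flies Boston->Dallas daily; XX flies daily only from
# New York, and Boston->Dallas only on weekends.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airline (airline_code TEXT);
    CREATE TABLE flight (airline_code TEXT, flight_days TEXT,
                         from_airport TEXT, to_airport TEXT);
    CREATE TABLE airport_service (city_code TEXT, airport_code TEXT);
    CREATE TABLE city (city_code TEXT, city_name TEXT);
    INSERT INTO airline VALUES ('AA'), ('XX');
    INSERT INTO city VALUES ('BOS', 'BOSTON'), ('DFW', 'DALLAS'), ('NYC', 'NEW YORK');
    INSERT INTO airport_service VALUES ('BOS', 'BOS'), ('DFW', 'DFW'), ('NYC', 'JFK');
    INSERT INTO flight VALUES
        ('AA', 'DAILY', 'BOS', 'DFW'),
        ('XX', 'DAILY', 'JFK', 'DFW'),
        ('XX', 'WEEKENDS', 'BOS', 'DFW');
""")

# Shape of the generated query: each condition is checked on its own
# (possibly different) flight, then the airlines are joined.
separate = """
    SELECT DISTINCT t0.airline_code
    FROM (SELECT airline.* FROM airline, flight
          WHERE airline.airline_code = flight.airline_code
            AND flight.flight_days = 'DAILY') AS t0
    JOIN (SELECT airline.* FROM airline, flight, airport_service, city
          WHERE airline.airline_code = flight.airline_code
            AND flight.from_airport = airport_service.airport_code
            AND airport_service.city_code = city.city_code
            AND city.city_name = 'BOSTON') AS t1
      ON t0.airline_code = t1.airline_code
    JOIN (SELECT airline.* FROM airline, flight, airport_service, city
          WHERE airline.airline_code = flight.airline_code
            AND flight.to_airport = airport_service.airport_code
            AND airport_service.city_code = city.city_code
            AND city.city_name = 'DALLAS') AS t2
      ON t1.airline_code = t2.airline_code
    ORDER BY 1
"""

# All three conditions applied to the SAME flight row.
same_flight = """
    SELECT DISTINCT flight.airline_code
    FROM flight
    JOIN airport_service AS dep ON flight.from_airport = dep.airport_code
    JOIN city AS dep_city ON dep.city_code = dep_city.city_code
    JOIN airport_service AS arr ON flight.to_airport = arr.airport_code
    JOIN city AS arr_city ON arr.city_code = arr_city.city_code
    WHERE flight.flight_days = 'DAILY'
      AND dep_city.city_name = 'BOSTON'
      AND arr_city.city_name = 'DALLAS'
    ORDER BY 1
"""
separate_codes = [r[0] for r in conn.execute(separate)]
same_codes = [r[0] for r in conn.execute(same_flight)]
print(separate_codes)  # ['AA', 'XX']
print(same_codes)      # ['AA']
```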

INNER JOIN vs IN

SELECT C.* FROM StockToCategory STC
INNER JOIN Category C ON STC.CategoryID = C.CategoryID
WHERE STC.StockID = @StockID
VS
SELECT * FROM Category
WHERE CategoryID IN
(SELECT CategoryID FROM StockToCategory WHERE StockID = @StockID)
Which is considered the correct (syntactically) and most performant approach, and why?
The syntax in the latter example seems more logical to me, but my assumption is that the JOIN will be faster.
I have looked at the query plans and haven't been able to decipher anything from them.

The two syntaxes serve different purposes. Using the JOIN syntax presumes you want something from both the StockToCategory and Category tables. If there are multiple entries in the StockToCategory table for the same stock and category, the Category table values will be repeated.
Using the IN predicate presumes that you want only items from Category whose ID meets some criteria. If a given CategoryId (assuming it is the PK of the Category table) exists multiple times in the StockToCategory table, it will only be returned once.
In your exact example, they will produce the same output as long as each (StockID, CategoryID) pair appears only once in StockToCategory; however, IMO the latter syntax makes your intent (only wanting categories) clearer.
BTW, there is yet a third syntax, similar to using IN:
Select *
From Category
Where Exists (
Select 1
From StockToCategory
Where StockToCategory.CategoryId = Category.CategoryId
And StockToCategory.StockID = @StockID
)
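A sketch of the duplicate-row difference, assuming SQLite via Python's sqlite3, made-up rows in which stock 7 is linked to category 1 twice, and the ? placeholder in place of @StockID:

```python
import sqlite3

# Made-up data: the (7, 1) link row is duplicated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE StockToCategory (StockID INTEGER, CategoryID INTEGER);
    INSERT INTO Category VALUES (1, 'Tools'), (2, 'Garden'), (3, 'Paint');
    INSERT INTO StockToCategory VALUES (7, 1), (7, 1), (7, 2), (8, 3);
""")
stock_id = 7

# JOIN: one output row per matching link row, so duplicates show up.
join_rows = conn.execute("""
    SELECT C.* FROM StockToCategory STC
    INNER JOIN Category C ON STC.CategoryID = C.CategoryID
    WHERE STC.StockID = ?
    ORDER BY C.CategoryID
""", (stock_id,)).fetchall()

# IN: each Category row is tested once, so it appears at most once.
in_rows = conn.execute("""
    SELECT * FROM Category
    WHERE CategoryID IN (SELECT CategoryID FROM StockToCategory WHERE StockID = ?)
    ORDER BY CategoryID
""", (stock_id,)).fetchall()

# EXISTS: same semantics as IN here.
exists_rows = conn.execute("""
    SELECT * FROM Category
    WHERE EXISTS (SELECT 1 FROM StockToCategory
                  WHERE StockToCategory.CategoryID = Category.CategoryID
                    AND StockToCategory.StockID = ?)
    ORDER BY CategoryID
""", (stock_id,)).fetchall()

print(join_rows)    # [(1, 'Tools'), (1, 'Tools'), (2, 'Garden')]
print(in_rows)      # [(1, 'Tools'), (2, 'Garden')]
print(exists_rows)  # [(1, 'Tools'), (2, 'Garden')]
```

Without the duplicated link row, all three queries return the same two categories.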

Syntactically (semantically too) these are both correct. In terms of performance they are effectively equivalent, in fact I would expect SQL Server to generate the exact same physical plans for these two queries.

I think these are just two ways to specify the same desired result.

For SQLite:
table device_group_folders contains 10 records
table device_groups contains ~100000 records
INNER JOIN: 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs INNER JOIN device_groups ON device_groups.parent = select_childs.uuid;
WHERE: 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs, device_groups WHERE device_groups.parent = select_childs.uuid;
IN: <1 ms
SELECT device_groups.uuid FROM device_groups WHERE device_groups.parent IN (WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT * FROM select_childs);
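The timings above are the answerer's measurements and are not reproduced here. This sketch, assuming SQLite via Python's sqlite3 and a made-up folder tree (F1 to F3 as a chain under the start folder, F9 unrelated), only checks that the three forms return the same rows:

```python
import sqlite3

# Made-up folder tree and device groups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device_group_folders (uuid TEXT, parent TEXT);
    CREATE TABLE device_groups (uuid TEXT, parent TEXT);
    INSERT INTO device_group_folders VALUES
        ('F1', NULL), ('F2', 'F1'), ('F3', 'F2'), ('F9', NULL);
    INSERT INTO device_groups VALUES
        ('G1', 'F1'), ('G2', 'F3'), ('G3', 'F9');
""")

# Recursive CTE collecting the start folder and all its descendants.
cte = """
    WITH RECURSIVE select_childs(uuid) AS (
        SELECT uuid FROM device_group_folders WHERE uuid = 'F1'
        UNION ALL
        SELECT device_group_folders.uuid
        FROM device_group_folders
        INNER JOIN select_childs ON parent = select_childs.uuid
    )
"""
join_q = cte + """
    SELECT device_groups.uuid FROM select_childs
    INNER JOIN device_groups ON device_groups.parent = select_childs.uuid
    ORDER BY 1
"""
where_q = cte + """
    SELECT device_groups.uuid FROM select_childs, device_groups
    WHERE device_groups.parent = select_childs.uuid
    ORDER BY 1
"""
in_q = ("SELECT device_groups.uuid FROM device_groups "
        "WHERE device_groups.parent IN (" + cte + " SELECT * FROM select_childs) "
        "ORDER BY 1")

results = {name: [r[0] for r in conn.execute(q)]
           for name, q in (("join", join_q), ("where", where_q), ("in", in_q))}
print(results)  # {'join': ['G1', 'G2'], 'where': ['G1', 'G2'], 'in': ['G1', 'G2']}
```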
