Do Hive joins accept inequality operators such as the less-than symbol?

I am trying to match these two tables only when grp_id_num from the left table falls between beg_group and end_group of the right table.
+-------------+
| grp_id_num  |
+-------------+
| XA0001      |
+-------------+
+--------------------------+--------------------------+----------------------+
| detail_lookup.beg_group  | detail_lookup.end_group  | detail_lookup.agent  |
+--------------------------+--------------------------+----------------------+
| XA0000                   | XZ9999                   | Exchange             |
| WW9988                   | WW9988                   | DEVINE ETHIER        |
| P00001                   | P99999                   | SHOP                 |
| 002359                   | 002359                   | LG                   |
+--------------------------+--------------------------+----------------------+
My select query:
select a.grp_id_num,b.beg_group,b.end_group,b.fundtype from
(select codesetkey as grp_id_num from codedetail limit 10)a
left outer join
detail_lookup b
on(a.grp_id_num >= b.beg_group and a.grp_id_num <= b.end_group)
Error:
Error: Error while compiling statement: FAILED: SemanticException [Error 10017]: Line 5:3 Both left and right aliases encountered in JOIN 'beg_group' (state=42000,code=10017)
Expected result:
XA0001, XA0000, XZ9999, Exchange
Can someone help me with this on Hive 1.1?

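Hive 1.1 only accepts equality conditions in the JOIN ... ON clause (non-equality join conditions are supported from Hive 2.2.0 onward), which is why the range comparison is rejected. As a workaround, apply the range test in a WHERE clause over a cross join and then add back the keys that matched nothing.
Can you try the query below?
WITH res1 AS
(
    SELECT codesetkey AS grp_id_num
    FROM codedetail
    ORDER BY codesetkey LIMIT 10 -- adding ORDER BY to get the first set of 10 records
), res2 AS
(
    SELECT a.grp_id_num,
           b.beg_group,
           b.end_group,
           b.fundtype
    FROM res1 a,
         detail_lookup b -- the lookup table holds beg_group and end_group
    WHERE a.grp_id_num >= b.beg_group
    AND a.grp_id_num <= b.end_group)
-- keys with no matching range, padded with NULLs like a left outer join
SELECT res1.grp_id_num,
       NULL,
       NULL,
       NULL
FROM res1
LEFT OUTER JOIN res2
ON res1.grp_id_num = res2.grp_id_num
WHERE res2.grp_id_num IS NULL
UNION ALL -- Hive releases before 1.2.0 support only UNION ALL
SELECT *
FROM res2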
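The cross-join-then-filter idea can be tried out on the sample rows from the question. This is only a minimal sketch: it runs on Python's built-in sqlite3 module rather than Hive, so it illustrates the logic and says nothing about Hive's behaviour or performance. The agent column stands in for fundtype (an assumption, since the sample output shows only agent), and the 'QQ0000' key is a hypothetical row added to show the unmatched case.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (assumption: illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE codedetail (codesetkey TEXT);
CREATE TABLE detail_lookup (beg_group TEXT, end_group TEXT, agent TEXT);
-- 'QQ0000' is a hypothetical key that falls in no range.
INSERT INTO codedetail VALUES ('XA0001'), ('QQ0000');
INSERT INTO detail_lookup VALUES
    ('XA0000', 'XZ9999', 'Exchange'),
    ('WW9988', 'WW9988', 'DEVINE ETHIER'),
    ('P00001', 'P99999', 'SHOP'),
    ('002359', '002359', 'LG');
""")

query = """
WITH matched AS (
    -- Cross join, then keep only rows whose key falls inside the range.
    SELECT a.codesetkey AS grp_id_num, b.beg_group, b.end_group, b.agent
    FROM codedetail a CROSS JOIN detail_lookup b
    WHERE a.codesetkey >= b.beg_group
      AND a.codesetkey <= b.end_group
)
SELECT grp_id_num, beg_group, end_group, agent FROM matched
UNION ALL
-- Add back the keys that matched no range, padded with NULLs,
-- which is what the left outer join would have produced.
SELECT c.codesetkey, NULL, NULL, NULL
FROM codedetail c
LEFT OUTER JOIN matched m ON c.codesetkey = m.grp_id_num
WHERE m.grp_id_num IS NULL
ORDER BY 1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The matched key comes back with its range and agent, and the unmatched key comes back with NULLs, so the result has the same shape as the left outer join the question asked for.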

Related

Best Way to Join One Column on Columns From Two Other Tables

I have a schema like the following in Oracle
Section:
+--------+----------+
| sec_ID | group_ID |
+--------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
+--------+----------+
Section_to_Item:
+--------+---------+
| sec_ID | item_ID |
+--------+---------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
+--------+---------+
Item:
+---------+------+
| item_ID | data |
+---------+------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
+---------+------+
Item_Version:
+---------+----------+--------+
| item_ID | start_ID | end_ID |
+---------+----------+--------+
| 1 | 1 | |
| 2 | 1 | 3 |
| 3 | 2 | |
| 4 | 1 | 2 |
+---------+----------+--------+
Section_to_Item has FK into Section and Item on the *_ID columns.
Item_Version is indexed on item_ID but has no FK to Item.item_ID (ran out of space in the snapshot group).
I have code that receives a list of version IDs, and I want to get all items in sections in a given group that are valid for at least one of the versions passed in. If an item has no end_ID, it's valid for anything starting with start_ID. If it has an end_ID, it's valid for anything up until (not including) end_ID.
What I currently have is:
SELECT Item.data
FROM Section, Section_to_Item, Item, Item_Version
WHERE Section.group_ID = 1
AND Section_to_Item.sec_ID = Section.sec_ID
AND Item.item_ID = Section_to_Item.item_ID
AND Item.item_ID = Item_Version.item_ID
AND EXISTS (
    SELECT *
    FROM (
        SELECT 2 AS version FROM DUAL
        UNION ALL SELECT 3 AS version FROM DUAL
    ) passed_versions
    WHERE Item_Version.start_ID <= passed_versions.version
    AND (Item_Version.end_ID IS NULL OR Item_Version.end_ID > passed_versions.version)
)
Note that the UNION ALL statement is dynamically generated from the list of passed-in versions.
This query currently does a Cartesian join and is very slow.
For some reason, if I change the query to join
AND Item_Version.item_ID = Section_to_Item.item_ID
which is not a FK, the query does not do the cartesian join and is much faster.
A) Can anyone explain why this is?
B) Is this the right way to be joining this sequence of tables? (I feel weird about joining Item.item_ID to two different tables.)
C) Is this the right way to get versions between start_ID and end_ID?
Edit
Same query with inner join syntax:
SELECT Item.data
FROM Item
INNER JOIN Section_to_Item ON Section_to_Item.item_ID = Item.item_ID
INNER JOIN Section ON Section.sec_ID = Section_to_Item.sec_ID
INNER JOIN Item_Version ON Item_Version.item_ID = Item.item_ID
WHERE Section.group_ID = 1
AND EXISTS (
    SELECT *
    FROM (
        SELECT 2 AS version FROM DUAL
        UNION ALL SELECT 3 AS version FROM DUAL
    ) passed_versions
    WHERE Item_Version.start_ID <= passed_versions.version
    AND (Item_Version.end_ID IS NULL OR Item_Version.end_ID > passed_versions.version)
)
Note that in this case the performance difference comes from joining on Item_Version first and then joining Section_to_Item on Item_Version.item_ID.
In terms of table size, Section_to_Item, Item, and Item_Version should be similar (1000s) while Section should be small.
Edit
I just found out that the schema apparently has no FKs. The FKs specified in the schema configuration files are ignored and are there only for documentation, so there is no difference between joining on an FK column or not. That said, by changing the joins into a cascade of SELECT ... IN subqueries, I'm able to avoid joining the entire Item table twice. I don't love the resulting query, and I don't really understand the difference, but the stats indicate it's much less work: the A-Rows returned from the innermost scan on Section drop from 656,000 to 488 (it used to be 656k starts returning 1 row each; now it's 488 starts returning 1 row each).
Edit
It turned out to be stale statistics - the two queries were equivalent the whole time but with the incomplete statistics, the DB happened to notice the correct plan only in the second instance. After updating statistics, both queries generated the same plan.

I'm not sure if this is the best idea, but this seems to avoid the Cartesian join:
select data
from Item
where item_ID in (
    select item_ID
    from Item_Version
    where item_ID in (
        select item_ID
        from Section_to_Item
        where sec_ID in (
            select sec_ID
            from Section
            where group_ID = 1
        )
    )
    and exists (
        select 1
        from (
            select 2 as version
            from dual
            union all
            select 3 as version
            from dual
        ) versions
        where versions.version >= start_ID
        and (end_ID is null or versions.version < end_ID) -- end_ID is exclusive
    )
)
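To check the start_ID/end_ID logic against the sample rows from the question, here is a small sketch. It uses Python's built-in sqlite3 module instead of Oracle (an assumption made purely so the example is runnable), so there is no DUAL table and the version list is built from bound parameters. It says nothing about Oracle's execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Section (sec_ID INTEGER, group_ID INTEGER);
CREATE TABLE Section_to_Item (sec_ID INTEGER, item_ID INTEGER);
CREATE TABLE Item (item_ID INTEGER, data TEXT);
CREATE TABLE Item_Version (item_ID INTEGER, start_ID INTEGER, end_ID INTEGER);
INSERT INTO Section VALUES (1, 1), (2, 1), (3, 2), (4, 2);
INSERT INTO Section_to_Item VALUES (1, 1), (1, 2), (2, 3), (2, 4);
INSERT INTO Item VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
-- A blank end_ID in the question's table is stored as NULL.
INSERT INTO Item_Version VALUES (1, 1, NULL), (2, 1, 3), (3, 2, NULL), (4, 1, 2);
""")

passed_versions = [2, 3]
# One "SELECT ? AS version" per passed-in version, glued with UNION ALL.
versions_sql = " UNION ALL ".join("SELECT ? AS version" for _ in passed_versions)

query = f"""
SELECT data
FROM Item
WHERE item_ID IN (
    SELECT item_ID
    FROM Item_Version
    WHERE item_ID IN (
        SELECT item_ID
        FROM Section_to_Item
        WHERE sec_ID IN (SELECT sec_ID FROM Section WHERE group_ID = 1)
    )
    AND EXISTS (
        SELECT 1
        FROM ({versions_sql}) versions
        WHERE versions.version >= start_ID
          AND (end_ID IS NULL OR versions.version < end_ID)
    )
)
ORDER BY data
"""
valid_items = [row[0] for row in conn.execute(query, passed_versions)]
print(valid_items)
```

Item 4 (data 'd') is excluded because its end_ID of 2 is exclusive, so it is valid for neither version 2 nor version 3.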

Select and count in the same query on two tables

I've got these two tables:
___Subscriptions
|--------|--------------------|--------------|
| SUB_Id | SUB_HotelId | SUB_PlanName |
|--------|--------------------|--------------|
| 1 | cus_AjGG401e9a840D | Free |
|--------|--------------------|--------------|
___Rooms
|--------|-------------------|
| ROO_Id | ROO_HotelId |
|--------|-------------------|
| 1 |cus_AjGG401e9a840D |
| 2 |cus_AjGG401e9a840D |
| 3 |cus_AjGG401e9a840D |
| 4 |cus_AjGG401e9a840D |
|--------|-------------------|
I'd like to select the SUB_PlanName and count the rooms with the same HotelId.
So I tried:
SELECT COUNT(*) as 'ROO_Count', SUB_PlanName
FROM ___Rooms
JOIN ___Subscriptions
ON ___Subscriptions.SUB_HotelId = ___Rooms.ROO_HotelId
WHERE ROO_HotelId = 'cus_AjGG401e9a840D'
and
SELECT
SUB_PlanName,
(
SELECT Count(ROO_Id)
FROM ___Rooms
Where ___Rooms.ROO_HotelId = ___Subscriptions.SUB_HotelId
) as ROO_Count
FROM ___Subscriptions
WHERE SUB_HotelId = 'cus_AjGG401e9a840D'
But I get empty results.
Could you please help?
Thanks.

You need GROUP BY when you select a non-aggregated column (here SUB_PlanName) alongside an aggregate (here COUNT()). The query below returns the room count only for SUB_HotelId = 'cus_AjGG401e9a840D' because of the condition in the WHERE clause. If you want the counts for all hotels, remove the WHERE filter and add s.SUB_HotelId to the SELECT list and the GROUP BY so that each hotel gets its own row.
SELECT s.SUB_PlanName, COUNT(*) as 'ROO_Count'
FROM ___Rooms r
JOIN ___Subscriptions s
ON s.SUB_HotelId = r.ROO_HotelId
WHERE r.ROO_HotelId = 'cus_AjGG401e9a840D'
GROUP BY s.SUB_PlanName;
To be safe, you can also use COUNT(DISTINCT r.ROO_Id) if you don't want to double-count a repeated ROO_Id. Your tables appear to have unique ROO_Ids, though, so COUNT(*) should work as well.
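As a quick check of the GROUP BY approach, the sketch below loads the rows from the question into an in-memory SQLite database through Python's sqlite3 module (an assumption: the question does not name the engine, and SQLite is used only to make the example runnable). The second hotel, 'cus_hypothetical', is an invented row that shows what happens without the WHERE filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ___Subscriptions (SUB_Id INTEGER, SUB_HotelId TEXT, SUB_PlanName TEXT);
CREATE TABLE ___Rooms (ROO_Id INTEGER, ROO_HotelId TEXT);
INSERT INTO ___Subscriptions VALUES
    (1, 'cus_AjGG401e9a840D', 'Free'),
    (2, 'cus_hypothetical', 'Pro');      -- hypothetical second hotel
INSERT INTO ___Rooms VALUES
    (1, 'cus_AjGG401e9a840D'),
    (2, 'cus_AjGG401e9a840D'),
    (3, 'cus_AjGG401e9a840D'),
    (4, 'cus_AjGG401e9a840D'),
    (5, 'cus_hypothetical');             -- hypothetical room
""")

# Filtered to one hotel, grouped by plan name.
one_hotel = conn.execute("""
SELECT s.SUB_PlanName, COUNT(*) AS ROO_Count
FROM ___Rooms r
JOIN ___Subscriptions s ON s.SUB_HotelId = r.ROO_HotelId
WHERE r.ROO_HotelId = 'cus_AjGG401e9a840D'
GROUP BY s.SUB_PlanName
""").fetchall()

# No filter: one row per hotel, so the hotel ID is part of the grouping.
all_hotels = conn.execute("""
SELECT s.SUB_HotelId, s.SUB_PlanName, COUNT(*) AS ROO_Count
FROM ___Rooms r
JOIN ___Subscriptions s ON s.SUB_HotelId = r.ROO_HotelId
GROUP BY s.SUB_HotelId, s.SUB_PlanName
ORDER BY s.SUB_PlanName
""").fetchall()

print(one_hotel)
print(all_hotels)
```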

SQL union / join / intersect multiple select statements

I have two select statements. One gets a list (if any) of logged voltage data in the past 60 seconds and related chamber names, and one gets a list (if any) of logged arc event data in the past 5 minutes. I am trying to append the arc count data as new columns to the voltage data table. I cannot figure out how to do this.
Note that there may or may not be arc count rows for a given chamber name that is in the voltage data table. If there are no rows, I want to set the arc count column value to zero.
Any ideas on how to accomplish this?
Voltage Data:
SELECT DISTINCT dbo.CoatingChambers.Name,
AVG(dbo.CoatingGridVoltage_Data.ChanA_DCVolts) AS ChanADC,
AVG(dbo.CoatingGridVoltage_Data.ChanB_DCVolts) AS ChanBDC,
AVG(dbo.CoatingGridVoltage_Data.ChanA_RFVolts) AS ChanARF,
AVG(dbo.CoatingGridVoltage_Data.ChanB_RFVolts) AS ChanBRF
FROM dbo.CoatingGridVoltage_Data
LEFT OUTER JOIN dbo.CoatingChambers
ON dbo.CoatingGridVoltage_Data.CoatingChambersID = dbo.CoatingChambers.CoatingChambersID
WHERE dbo.CoatingGridVoltage_Data.DT > DATEADD(second, -60, SYSUTCDATETIME())
GROUP BY dbo.CoatingChambers.Name
Returns
Name | ChanADC | ChanBDC | ChanARF | ChanBRF
-----+-------------------+--------------------+---------------------+------------------
OX2 | 2.9099999666214 | -0.485000004371007 | 0.344801843166351 | 0.49748428662618
S2 | 0.100000001490116 | -0.800000016887983 | 0.00690172302226226 | 0.700591623783112
S3 | 4.25666658083598 | 0.5 | 0.96554297208786 | 0.134956782062848
Arc count table:
SELECT CoatingChambers.Name,
SUM(ArcCount) as ArcCount
FROM CoatingChambers
LEFT JOIN CoatingArc_Data
ON dbo.[CoatingArc_Data].CoatingChambersID = dbo.CoatingChambers.CoatingChambersID
where EventDT > DATEADD(mi,-5, GETDATE())
Group by Name
Returns
Name | ArcCount
-----+---------
L1 | 283
L4 | 0
L6 | 1
S2 | 55
To be clear, I want this table (with added arc count column), given the two tables above:
Name | ChanADC | ChanBDC | ChanARF | ChanBRF | ArcCount
-----+-------------------+--------------------+---------------------+-------------------+---------
OX2 | 2.9099999666214 | -0.485000004371007 | 0.344801843166351 | 0.49748428662618 | 0
S2 | 0.100000001490116 | -0.800000016887983 | 0.00690172302226226 | 0.700591623783112 | 55
S3 | 4.25666658083598 | 0.5 | 0.96554297208786 | 0.134956782062848 | 0

You can treat the select statements as virtual tables and just join them together:
select
x.Name,
x.ChanADC,
x.ChanBDC,
x.ChanARF,
x.ChanBRF,
isnull( y.ArcCount, 0 ) ArcCount
from
(
select distinct
cc.Name,
AVG(cgv.ChanA_DCVolts) AS ChanADC,
AVG(cgv.ChanB_DCVolts) AS ChanBDC,
AVG(cgv.ChanA_RFVolts) AS ChanARF,
AVG(cgv.ChanB_RFVolts) AS ChanBRF
from
dbo.CoatingGridVoltage_Data cgv
left outer join
dbo.CoatingChambers cc
on
cgv.CoatingChambersID = cc.CoatingChambersID
where
cgv.DT > dateadd(second, - 60, sysutcdatetime())
group by
cc.Name
) as x
left outer join
(
select
cc.Name,
sum(ac.ArcCount) as ArcCount
from
dbo.CoatingChambers cc
left outer join
dbo.CoatingArc_Data ac
on
ac.CoatingChambersID = cc.CoatingChambersID
where
EventDT > dateadd(mi,-5, getdate())
group by
Name
) as y
on
x.Name = y.Name
Also, it's worthwhile to simplify your names with aliases and format the queries for readability...which I shamelessly took a stab at.
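The pattern of left-joining one aggregated subquery to another and defaulting missing counts to zero can be seen on a cut-down data set. This is only a sketch under several assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, the table and column names (chambers, voltage, arcs, volts, arc_count) are simplified stand-ins, the time-window filters are left out, and COALESCE replaces ISNULL because SQLite has no two-argument ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical stand-ins for the tables in the question.
CREATE TABLE chambers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE voltage (chamber_id INTEGER, volts REAL);
CREATE TABLE arcs (chamber_id INTEGER, arc_count INTEGER);
INSERT INTO chambers VALUES (1, 'OX2'), (2, 'S2'), (3, 'S3'), (4, 'L1');
INSERT INTO voltage VALUES (1, 2.0), (1, 4.0), (2, 1.0), (3, 0.5);
INSERT INTO arcs VALUES (2, 50), (2, 5), (4, 283);
""")

query = """
SELECT x.name, x.avg_volts, COALESCE(y.arc_count, 0) AS arc_count
FROM (
    -- Average voltage per chamber that has voltage rows.
    SELECT c.name, AVG(v.volts) AS avg_volts
    FROM voltage v
    JOIN chambers c ON c.id = v.chamber_id
    GROUP BY c.name
) AS x
LEFT OUTER JOIN (
    -- Total arc count per chamber that has arc rows.
    SELECT c.name, SUM(a.arc_count) AS arc_count
    FROM arcs a
    JOIN chambers c ON c.id = a.chamber_id
    GROUP BY c.name
) AS y ON x.name = y.name
ORDER BY x.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Chambers with voltage data but no arc rows (OX2 and S3) get an arc count of zero, and L1, which has arcs but no voltage data, does not appear because the voltage subquery drives the left join.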

Oracle SQL: Optimizing LEFT OUTER JOIN of two similar select statements to be smaller and/or more efficient

So I have this Oracle SQL query:
SELECT man.Toilet_Type, NVL(man.manual_PORTA_POTTY, 0) MANUAL, NVL(reg.regular_PORTA_POTTY, 0) REGULAR FROM (
SELECT A.Visitor Toilet_Type, COUNT(A.Toilet_ID) MANUAL_PORTA_POTTY FROM
BORE.EnragedPotty A,
BORE.SemiEnragedPotty B,
BORE.ManualPotty C
WHERE B.SemiEnragedPotty_ID = C.SemiEnragedPotty_ID
AND B.Toilet_ID = A.Toilet_ID
GROUP BY Visitor
ORDER BY Visitor ASC) man
LEFT OUTER JOIN
(SELECT A.Visitor Toilet_Type, COUNT(B.Toilet_ID) REGULAR_PORTA_POTTY FROM
BORE.EnragedPotty A,
BORE.RegularPotty B
WHERE B.Toilet_ID = A.Toilet_ID
GROUP BY Visitor
ORDER BY Visitor ASC) reg ON man.Toilet_Type = reg.Toilet_Type
This gives two table results. The first query, man, gives me the following output:
+===============+========+
| Toilet_Type | Manual |
+===============+========+
| Portable | 234 |
+---------------+--------+
| Home | 10 |
+---------------+--------+
| Assassination | 2 |
+---------------+--------+
The second query, reg, gives me the same output as above, but with REGULAR instead of MANUAL.
What I want to do is query the databases in a more efficient manner. I want the output to be formatted like so:
+===============+========+=========+
| Toilet_Type | Manual | Regular |
+===============+========+=========+
| Portable | 234 | 444 |
+---------------+--------+---------+
| Home | 10 | 222 |
+---------------+--------+---------+
| Assassination | 2 | 111 |
+---------------+--------+---------+
Surely this can be done in a single query without using a LEFT OUTER JOIN?

This is untested, as I didn't have any sample data, but I think something similar to this might get it done in one query:
SELECT
    E.Visitor Toilet_Type,
    SUM(case when M.SemiEnragedPotty_ID is not null then 1 else 0 end) MANUAL_PORTA_POTTY,
    SUM(case when R.Toilet_ID is not null then 1 else 0 end) REGULAR_PORTA_POTTY
FROM
    BORE.EnragedPotty E,
    BORE.SemiEnragedPotty SE,
    BORE.ManualPotty M,
    BORE.RegularPotty R
WHERE
    -- join columns follow the question's original query
    E.Toilet_ID = SE.Toilet_ID (+) AND
    SE.SemiEnragedPotty_ID = M.SemiEnragedPotty_ID (+) AND
    E.Toilet_ID = R.Toilet_ID (+)
GROUP BY E.Visitor
ORDER BY E.Visitor ASC
I may have some of the details off -- I had to rename your aliases to follow which table was which, so it wouldn't shock me if I misplaced one of them.

If you need to pull from the same dataset twice, you should consider using subquery factoring.
WITH
some_result_you_dont_want_to_repeat AS (
-- Chunk of SQL goes here
)
SELECT
-- More SQL here
FROM some_result_you_dont_want_to_repeat once
JOIN some_result_you_dont_want_to_repeat twice
ON ...
In your case, it appears that your A-B table join can be factored out.
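A minimal sketch of subquery factoring, assuming simplified stand-in tables (enraged, semi, manual, regular) and made-up rows, and run on SQLite through Python's sqlite3 module rather than Oracle. The typed CTE is written once and read by both the manual and the regular count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified versions of the tables in the question.
CREATE TABLE enraged (toilet_id INTEGER, visitor TEXT);
CREATE TABLE semi (semi_id INTEGER, toilet_id INTEGER);
CREATE TABLE manual (semi_id INTEGER);
CREATE TABLE regular (toilet_id INTEGER);
INSERT INTO enraged VALUES (1, 'Portable'), (2, 'Portable'), (3, 'Home');
INSERT INTO semi VALUES (10, 1), (11, 2), (12, 3);
INSERT INTO manual VALUES (10), (11), (12);
INSERT INTO regular VALUES (1), (1), (2), (3), (3);
""")

query = """
WITH typed AS (
    -- Shared piece: written once, referenced by both counts below.
    SELECT toilet_id, visitor AS toilet_type FROM enraged
),
man AS (
    SELECT t.toilet_type, COUNT(*) AS manual_count
    FROM typed t
    JOIN semi s ON s.toilet_id = t.toilet_id
    JOIN manual m ON m.semi_id = s.semi_id
    GROUP BY t.toilet_type
),
reg AS (
    SELECT t.toilet_type, COUNT(*) AS regular_count
    FROM typed t
    JOIN regular r ON r.toilet_id = t.toilet_id
    GROUP BY t.toilet_type
)
SELECT man.toilet_type,
       man.manual_count,
       COALESCE(reg.regular_count, 0) AS regular_count
FROM man
LEFT OUTER JOIN reg ON reg.toilet_type = man.toilet_type
ORDER BY man.toilet_type
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Counting each side in its own CTE before joining keeps the two counts independent, so rows from one table cannot inflate the count of the other.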

Oracle sql Inner join first record in right table

My question is this:
I have a table such as this:
username | portname | symbol | shares
---------+----------+--------+-------
phil | test | APL | 214
---------+----------+--------+--------
It has more records, but that's just an example. Then I have another table such as this, that has multiple records per symbol
symbol | high | low | timestamp
-------+------+-----+-----------
APL | 200 | 20 | *timestamp object
APL | 400 | 34 | *timestamp object
I want a table to be returned where I join the two, but only the first row from the second table is joined so something like this is returned:
symbol | high | low | timestamp
-------+------+-----+----------
APL | 400 | 34 | *timestamp object
So only one record from the right table is matched. I've tried a lot of things with GROUP BY and DISTINCT but haven't gotten anything to work.
Thanks!

SELECT t1.symbol, t3.high, t3.low, t3.timestamp
FROM Table1 t1
JOIN (
SELECT inn.*
FROM (SELECT t2.*, (ROW_NUMBER() OVER(PARTITION BY symbol ORDER BY timestamp DESC)) As Rank
FROM Table2 t2) inn
WHERE inn.Rank=1
) t3
ON t1.symbol = t3.symbol;
See SQL Fiddle
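The ROW_NUMBER approach can be tried locally with the sketch below. It assumes SQLite 3.25 or later (the first release with window functions) through Python's sqlite3 module instead of Oracle, and the table names (holdings, quotes), the ts column, and the timestamp values are hypothetical stand-ins for the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical names and timestamps; the other values come from the question.
CREATE TABLE holdings (username TEXT, portname TEXT, symbol TEXT, shares INTEGER);
CREATE TABLE quotes (symbol TEXT, high INTEGER, low INTEGER, ts TEXT);
INSERT INTO holdings VALUES ('phil', 'test', 'APL', 214);
INSERT INTO quotes VALUES
    ('APL', 200, 20, '2020-01-01 09:00:00'),
    ('APL', 400, 34, '2020-01-02 09:00:00');
""")

query = """
SELECT h.symbol, q.high, q.low, q.ts
FROM holdings h
JOIN (
    -- Number each symbol's quotes from newest to oldest.
    SELECT symbol, high, low, ts,
           ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY ts DESC) AS rn
    FROM quotes
) q ON q.symbol = h.symbol AND q.rn = 1   -- keep only the newest quote
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Changing the ORDER BY inside the window to ascending would pick the oldest quote instead, so "first row" has to be defined by an explicit ordering.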
