I want to reproduce the effect of CROSS APPLY in Hive on AWS EMR. Here is a small sample that runs in SQL Server 2017:
with r as (
select 1 as d
union all
select 2 as d
)
select * from r
cross apply (select 'f' as u) e;
How can I run the equivalent of this in EMR Hive?
I've checked the LATERAL VIEW documentation, but every example uses explode(), and I don't have an array.
Instead of CROSS APPLY, you can use CROSS JOIN in this case, because the applied subquery does not reference any column of r. For example:
-- Allow the Cartesian product in case strict checks are enabled for this session.
SET hive.strict.checks.cartesian.product = false;
WITH r AS (
SELECT 1 AS d
UNION ALL
SELECT 2 AS d
)
SELECT *
FROM r
CROSS JOIN (SELECT 'f' AS u) e;
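To show what the CROSS JOIN to a one-row derived table produces, here is a minimal sketch that runs the same query through Python's standard sqlite3 module. SQLite is only an illustrative engine here (an assumption, not Hive); the point is that each row of r is paired with the single row of e, which is what the non-correlated CROSS APPLY did.

```python
import sqlite3

# In-memory SQLite database, used only to illustrate the SQL pattern.
conn = sqlite3.connect(":memory:")

# Two-row CTE cross-joined to a one-row derived table:
# every row of r is paired with the single row of e.
rows = conn.execute(
    """
    WITH r AS (
        SELECT 1 AS d
        UNION ALL
        SELECT 2 AS d
    )
    SELECT r.d, e.u
    FROM r
    CROSS JOIN (SELECT 'f' AS u) e
    ORDER BY r.d
    """
).fetchall()

print(rows)  # [(1, 'f'), (2, 'f')]
```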
I ended up working around it by adding an extra column with the same constant value to both tables and joining on that column, which produces the same effect.
It looked something like this:
with d as (
select column, 'AreYouKiddingMe' as k from table
), e as (
select column2, 'AreYouKiddingMe' as k from table2
)
select * from d inner join e on d.k = e.k
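Because every row on both sides carries the same key, the equi-join matches every pair of rows, i.e. it returns the Cartesian product. The sketch below checks that with Python's sqlite3 module; the table names table1/table2 and columns col1/col2 are hypothetical stand-ins for the placeholders above, and SQLite is used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for "table" and "table2" in the workaround above.
conn.execute("CREATE TABLE table1 (col1 TEXT)")
conn.execute("CREATE TABLE table2 (col2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("a",), ("b",)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("x",), ("y",), ("z",)])

# Workaround: add the same constant key to both sides and join on it.
constant_key = conn.execute(
    """
    WITH d AS (SELECT col1, 'k' AS k FROM table1),
         e AS (SELECT col2, 'k' AS k FROM table2)
    SELECT d.col1, e.col2
    FROM d INNER JOIN e ON d.k = e.k
    ORDER BY d.col1, e.col2
    """
).fetchall()

# Plain cross join of the same two tables, for comparison.
cross = conn.execute(
    "SELECT col1, col2 FROM table1 CROSS JOIN table2 ORDER BY col1, col2"
).fetchall()

print(len(constant_key), constant_key == cross)  # 6 True
```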
I have a column that looks like
a
b
c
and I think some sort of window function could select it as:
a 1
a 2
b 1
b 2
c 1
c 2
but I can't find a suitable one.
I know this can be done with a UNION, but I would prefer a window function if one exists.
Since you only want two copies of each row, I would simply CROSS JOIN to a VALUES table constructor containing 1 and 2:
SELECT YT.YourColumn,
V.I
FROM dbo.YourTable YT
CROSS JOIN (VALUES(1),(2))V(I);
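The same idea as a runnable sketch through Python's sqlite3 module (SQLite is an assumption here, not the engine from the question). SQLite cannot name the columns of a VALUES list with a V(I) alias, so the two-row table is built with UNION ALL instead; the CROSS JOIN then emits every row once per number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (YourColumn TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?)", [("a",), ("b",), ("c",)])

# Two-row number table built with UNION ALL (SQLite lacks V(I) aliasing);
# the CROSS JOIN pairs every row of YourTable with both numbers.
rows = conn.execute(
    """
    SELECT YT.YourColumn, V.I
    FROM YourTable YT
    CROSS JOIN (SELECT 1 AS I UNION ALL SELECT 2) V
    ORDER BY YT.YourColumn, V.I
    """
).fetchall()

print(rows)
# [('a', 1), ('a', 2), ('b', 1), ('b', 2), ('c', 1), ('c', 2)]
```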
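Alternatively, generate the numbers with ROW_NUMBER() over a large system catalog view (SQL Server specific); change TOP (2) to get a different number of copies:
select t.myColumn, x.N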
from myTable t
CROSS JOIN
(SELECT TOP (2)
ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2) x
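If the number of copies varies, a recursive CTE is another way to produce the numbers without depending on a system catalog such as Master.sys.All_Columns. This is a swapped-in technique, not the one in the answer above, sketched with Python's sqlite3 module; the copies parameter is a hypothetical name for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (myColumn TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?)", [("a",), ("b",), ("c",)])

copies = 3  # hypothetical parameter: how many copies of each row to emit

# Recursive CTE generating N = 1..copies, then cross-joined to the table.
rows = conn.execute(
    """
    WITH RECURSIVE x(N) AS (
        SELECT 1
        UNION ALL
        SELECT N + 1 FROM x WHERE N < ?
    )
    SELECT t.myColumn, x.N
    FROM myTable t
    CROSS JOIN x
    ORDER BY t.myColumn, x.N
    """,
    (copies,),
).fetchall()

print(len(rows))  # 9
```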
There are two tables, A and B. A has a one-to-many relationship with B.
For each record in A, I want exactly one corresponding record from B: if B has a single matching record, return it;
if B has several matching records, return the one with status = 'ACTIVE'.
Below is the query, which runs in Oracle. We want the same functionality in AWS Athena, but correlated subqueries are not supported in Athena SQL. Athena supports ANSI SQL.
SELECT b.*
FROM A a ,B b
WHERE a.instruction_id = b.txn_report_instruction_id AND b.txn_report_instruction_id IN
(SELECT b2.txn_report_instruction_id FROM B b2
WHERE b2.txn_report_instruction_id=b.txn_report_instruction_id
GROUP BY b2.txn_report_instruction_id
HAVING COUNT(b2.txn_report_instruction_id)=1
)
UNION
SELECT * FROM
(SELECT b.*
FROM A a , B b
WHERE a.instruction_id = b.txn_report_instruction_id AND b.txn_report_instruction_id IN
(SELECT b2.txn_report_instruction_id
FROM B b2
WHERE b2.txn_report_instruction_id=b.txn_report_instruction_id
AND b2.status ='ACTIVE'
GROUP BY b2.txn_report_instruction_id
HAVING COUNT(b2.txn_report_instruction_id)> 1
)
)
With GROUP BY, every selected field must either appear in the GROUP BY clause or be wrapped in an aggregate function, so GROUP BY is not a good fit here.
Any help would be much appreciated.
[Image: output result table]
You can join the best-matching row with a lateral join:
select *
from a
outer apply
(
select *
from b
where b.txn_report_instruction_id = a.instruction_id
order by case when b.status = 'ACTIVE' then 1 else 2 end
fetch first row only
) bb;
Another option is a window function:
select *
from a
left join
(
select
b.*,
row_number() over (partition by txn_report_instruction_id
order by case when status = 'ACTIVE' then 1 else 2 end) as rn
from b
) bb on bb.txn_report_instruction_id = a.instruction_id and bb.rn = 1;
I don't know how much SQL Amazon Athena covers. Everything above is standard SQL, however, except OUTER APPLY, I think. If I am not mistaken, the SQL standard uses LEFT OUTER JOIN LATERAL (...) ON ... instead, which needs a dummy ON clause such as ON 1 = 1. So if the queries above fail, that is another option for you :-)
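Here is a minimal sketch of the window-function variant run through Python's sqlite3 module, with hypothetical sample data and a hypothetical note column. It assumes the bundled SQLite is version 3.25 or later (needed for ROW_NUMBER()), and it is not Athena itself. Instruction 1 has one non-active row, instruction 2 has two rows of which one is ACTIVE, and instruction 3 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (instruction_id INTEGER)")
conn.execute(
    "CREATE TABLE B (txn_report_instruction_id INTEGER, status TEXT, note TEXT)"
)
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO B VALUES (?, ?, ?)",
    [
        (1, "CLOSED", "only row for 1"),
        (2, "CLOSED", "older row for 2"),
        (2, "ACTIVE", "active row for 2"),
    ],
)

# Rank B rows per instruction with ACTIVE first and keep rank 1;
# the LEFT JOIN keeps A rows that have no B row at all.
rows = conn.execute(
    """
    SELECT a.instruction_id, bb.status, bb.note
    FROM A a
    LEFT JOIN (
        SELECT b.*,
               ROW_NUMBER() OVER (
                   PARTITION BY txn_report_instruction_id
                   ORDER BY CASE WHEN status = 'ACTIVE' THEN 1 ELSE 2 END
               ) AS rn
        FROM B b
    ) bb ON bb.txn_report_instruction_id = a.instruction_id AND bb.rn = 1
    ORDER BY a.instruction_id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'CLOSED', 'only row for 1')
# (2, 'ACTIVE', 'active row for 2')
# (3, None, None)
```

If an instruction has several non-active rows and no ACTIVE one, ROW_NUMBER() picks among them arbitrarily; adding a second ORDER BY key makes the choice deterministic.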
I need to write a Hive query that has a subquery in the SELECT list. I'm aware that Hive does not support this, so I'm looking for alternatives.
select
a,
b,
(select max(tbl.c) from sample_table_a tbl where tbl.d like 'X012%') as d,
e,
f
from sample_table_b
How can I implement this query in Hive without a cross join? sample_table_a contains about 40,000 rows, and so does sample_table_b.
This could be an option.
The derived table t contains exactly one row (the aggregate), so joining to it is not a problem: each row of sample_table_b is simply paired with that single row.
select
a,
b,
max_value,
e,
f
from
sample_table_b
inner join
(
select
max(tbl.c) as max_value
from
sample_table_a tbl
where
tbl.d like 'X012%'
)
t;
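To check that the rewrite returns the same rows as the scalar subquery, here is a sketch using Python's sqlite3 module with hypothetical sample data. SQLite is only an illustrative engine (it does accept the original scalar-subquery form, unlike Hive), and the sketch writes the join as an explicit CROSS JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table_a (c INTEGER, d TEXT)")
conn.execute("CREATE TABLE sample_table_b (a INTEGER, b INTEGER, e INTEGER, f INTEGER)")
conn.executemany(
    "INSERT INTO sample_table_a VALUES (?, ?)",
    [(5, "X012-1"), (9, "X012-2"), (100, "Y999")],
)
conn.executemany(
    "INSERT INTO sample_table_b VALUES (?, ?, ?, ?)",
    [(1, 2, 3, 4), (5, 6, 7, 8)],
)

# Original form: scalar subquery in the SELECT list.
scalar = conn.execute(
    """
    SELECT a, b,
           (SELECT max(tbl.c) FROM sample_table_a tbl WHERE tbl.d LIKE 'X012%') AS d,
           e, f
    FROM sample_table_b
    ORDER BY a
    """
).fetchall()

# Rewritten form: join to a one-row derived table holding the aggregate.
joined = conn.execute(
    """
    SELECT a, b, t.max_value, e, f
    FROM sample_table_b
    CROSS JOIN (
        SELECT max(tbl.c) AS max_value
        FROM sample_table_a tbl
        WHERE tbl.d LIKE 'X012%'
    ) t
    ORDER BY a
    """
).fetchall()

print(scalar == joined, joined)  # True [(1, 2, 9, 3, 4), (5, 6, 9, 7, 8)]
```

The output has exactly as many rows as sample_table_b, because the right-hand side of the join is a single row.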
In Teradata 16.20 is there a way to update or merge from two CTEs?
For example, in MSSQL we can define a first CTE, a second CTE that uses the first one, and then an UPDATE:
with CTE1 as (
select alpha, beta
from someTable a
join otherTable b on a.aleph = b.aleph
), CTE2 as (
select beta, gamma
from CTE1 c
join anotherTable d on c.alpha = d.alpha
)
update u
set u.gamma = e.gamma
from updateTable u
join CTE2 e on u.beta = e.beta;
In Teradata 16.20 this certainly works with one CTE, like this:
merge into mydb.mytable
using (
select alpha, beta
from someTable a
join otherTable b on a.aleph = b.aleph
) as CTE (alpha, beta)
on mytable.alpha = CTE.alpha
when matched then update
set beta = CTE.beta;
Is there a way to do this with two or more CTEs?
You can't use WITH (CTE) inside a derived table (which is what you have in the USING clause of the MERGE statement above), but you can use nested derived tables:
merge into mydb.mytable u
using (
select beta, gamma
from (
select alpha, beta
from someTable a
join otherTable b on a.aleph = b.aleph
) CTE1
join anotherTable d on CTE1.alpha = d.alpha
) CTE2
on u.beta = CTE2.beta
when matched then update
set gamma = CTE2.gamma;
Or, if MERGE is not applicable (e.g. the join predicates don't include all of the Primary Index columns), use the same approach with a joined UPDATE:
UPDATE u FROM mydb.mytable u,
(
select beta, gamma
from (
select alpha, beta
from someTable a
join otherTable b on a.aleph = b.aleph
) CTE1
join anotherTable d on CTE1.alpha = d.alpha
) CTE2
set gamma = CTE2.gamma
WHERE u.beta = CTE2.beta;
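Inlining each CTE as a nested derived table does not change the rows that feed the MERGE or UPDATE. The sketch below compares the CTE chain with the nested form using Python's sqlite3 module. The column layouts and sample rows are hypothetical, chosen so the column names in the queries above resolve unambiguously, and SQLite stands in for Teradata only to illustrate the rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and data matching the column names used above.
conn.executescript(
    """
    CREATE TABLE someTable (aleph INTEGER, alpha INTEGER);
    CREATE TABLE otherTable (aleph INTEGER, beta INTEGER);
    CREATE TABLE anotherTable (alpha INTEGER, gamma TEXT);
    INSERT INTO someTable VALUES (1, 10), (2, 20);
    INSERT INTO otherTable VALUES (1, 100), (2, 200);
    INSERT INTO anotherTable VALUES (10, 'g10'), (20, 'g20');
    """
)

# Source rows computed with a chain of two CTEs.
with_ctes = conn.execute(
    """
    WITH CTE1 AS (
        SELECT alpha, beta FROM someTable a JOIN otherTable b ON a.aleph = b.aleph
    ), CTE2 AS (
        SELECT beta, gamma FROM CTE1 c JOIN anotherTable d ON c.alpha = d.alpha
    )
    SELECT beta, gamma FROM CTE2 ORDER BY beta
    """
).fetchall()

# Same logic with each CTE inlined as a nested derived table.
nested = conn.execute(
    """
    SELECT beta, gamma FROM (
        SELECT beta, gamma
        FROM (
            SELECT alpha, beta FROM someTable a JOIN otherTable b ON a.aleph = b.aleph
        ) CTE1
        JOIN anotherTable d ON CTE1.alpha = d.alpha
    ) CTE2
    ORDER BY beta
    """
).fetchall()

print(with_ctes == nested, nested)  # True [(100, 'g10'), (200, 'g20')]
```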
Suppose we have two tables:
create table A (
fkb int,
groupby int
);
create table B (
id int,
search int
);
insert into A values (1, 1);
insert into B values (1, 1);
insert into B values (2, 1);
Then the following query
select B.id, t.max_groupby - B.search diff
from B
cross apply (
select max(A.groupby) max_groupby
from A
where A.fkb = B.id
) t
returns the expected result:
id diff
---------
1 0
2 NULL
However, when I add GROUP BY A.fkb inside the CROSS APPLY, the B row that has no corresponding A.fkb disappears:
select B.id, t.max_groupby - B.search diff
from B
cross apply (
select max(A.groupby) max_groupby
from A
where A.fkb = B.id
group by A.fkb
) t
I tested this on SQL Server as well as on PostgreSQL (with CROSS JOIN LATERAL instead of CROSS APPLY). Why does the GROUP BY make the row disappear? The CROSS APPLY seems to behave like an outer join in the first case and like an inner join in the second, but it is not clear to me why.
You can see this when you look at the result of the inner query separately:
select max(A.groupby) max_groupby
from A
where A.fkb = 2;
returns a single row with max_groupby = null:
max_groupby
-----------
(null)
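However, an aggregate query without GROUP BY always returns exactly one row, while with GROUP BY it returns one row per group. As there is no row with A.fkb = 2, there is no group, so grouping by it yields an empty result, which you can see when you run: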
select max(A.groupby) max_groupby
from A
where A.fkb = 2
group by A.fkb
and thus the cross join returns no row for B.id = 2.
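The difference between the two inner queries can be reproduced directly. This sketch uses Python's sqlite3 module with table A from the question (SQLite is an illustrative engine, not SQL Server or PostgreSQL, but both follow the same rule for aggregates over an empty input).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (fkb INTEGER, groupby INTEGER)")
conn.execute("INSERT INTO A VALUES (1, 1)")

# No GROUP BY: the aggregate query yields exactly one row,
# even though the WHERE clause filters out every input row.
no_group = conn.execute(
    "SELECT max(A.groupby) AS max_groupby FROM A WHERE A.fkb = 2"
).fetchall()

# With GROUP BY: one output row per group, and there are no groups.
with_group = conn.execute(
    "SELECT max(A.groupby) AS max_groupby FROM A WHERE A.fkb = 2 GROUP BY A.fkb"
).fetchall()

print(no_group, with_group)  # [(None,)] []
```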
You need to use an outer join in order to include the row from B.
In Postgres you would have to write this as:
select B.id, t.max_groupby - B.search diff
from B
left join lateral (
select max(A.groupby) max_groupby
from A
where A.fkb = B.id
group by A.fkb
) t on true
I don't know what the equivalent of left join lateral would be in SQL Server. There, on true would need to be written as on 1=1.
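It happens because the subquery with GROUP BY returns nothing when A.fkb = 2, whereas without GROUP BY it returns a single row containing NULL.
So your two CROSS APPLY queries return different results. Use OUTER APPLY to keep the B rows that have no match: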
select B.id, t.max_groupby - B.search diff
from B
outer apply (
select max(A.groupby) max_groupby
from A
where A.fkb = B.id
group by A.fkb
) t
OUTPUT:
id diff
1 0
2 NULL
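SQLite has neither APPLY nor LATERAL, so this sketch swaps in a different technique: joining B to a pre-aggregated derived table. With an inner join it behaves like the CROSS APPLY with GROUP BY (B.id = 2 is lost); with a LEFT JOIN it behaves like the OUTER APPLY (B.id = 2 survives with a NULL diff). It runs through Python's sqlite3 module on the sample data from the question and is an illustration, not the SQL Server query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (fkb INTEGER, groupby INTEGER);
    CREATE TABLE B (id INTEGER, search INTEGER);
    INSERT INTO A VALUES (1, 1);
    INSERT INTO B VALUES (1, 1);
    INSERT INTO B VALUES (2, 1);
    """
)

# Inner join to the grouped aggregate: B rows without a group are dropped.
inner = conn.execute(
    """
    SELECT B.id, t.max_groupby - B.search AS diff
    FROM B
    JOIN (SELECT fkb, max(groupby) AS max_groupby FROM A GROUP BY fkb) t
      ON t.fkb = B.id
    ORDER BY B.id
    """
).fetchall()

# Left join: unmatched B rows are kept, and NULL - search gives NULL.
outer = conn.execute(
    """
    SELECT B.id, t.max_groupby - B.search AS diff
    FROM B
    LEFT JOIN (SELECT fkb, max(groupby) AS max_groupby FROM A GROUP BY fkb) t
      ON t.fkb = B.id
    ORDER BY B.id
    """
).fetchall()

print(inner)  # [(1, 0)]
print(outer)  # [(1, 0), (2, None)]
```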