Nested queries in Hive SQL

I have a database, and I use a query to produce an intermediate table like this:
id a b
xx 1 2
yy 7 11
and I would like to calculate the standard deviation of b for the users who have a < avg(a).
I calculate avg(a) this way, and it works fine:
select avg(select a from (query to produce intermediate table)) from table;
But the query:
select stddev_pop(b)
from (query to produce intermediate table)
where a < (select avg(select a
from (query to produce intermediate table))
from table);
returns an error; more precisely, I am told that the "a" in avg(select a from ...) is not recognised. This really confuses me, as it works in the previous query.
I would be grateful if somebody could help.
EDIT:
I stored the result of the query that generates the intermediate table in a temporary table, but I still run into the same problem.
The non-working query becomes:
select stddev_pop(b) from temp where a < (select avg(a) from temp);
while this works:
select avg(a) from temp;

OK, a colleague helped me to do it. I'll post the answer in case someone runs into the same problem:
select stddev_pop(b)
from temp x
join (select avg(a) as average from temp) y
where x.a < y.average;
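Basically, Hive does not cache a subquery result as a variable that the WHERE clause can compare against, so the average has to be computed in a derived table in the FROM clause and joined in.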
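The join-against-an-aggregate pattern above can be tried out locally. This is only a sketch under stated assumptions: it uses Python's sqlite3 module as a stand-in for Hive, the table name temp_tbl and the rows zz and ww are made up for illustration, and because SQLite has no stddev_pop the population variance is written out as avg(b*b) - avg(b)*avg(b).

```python
import math
import sqlite3


def stddev_below_avg(rows):
    """Population std. dev. of b over the rows whose a is below avg(a)."""
    conn = sqlite3.connect(":memory:")
    # temp_tbl is a hypothetical stand-in for the "temp" table in the thread.
    conn.execute("CREATE TABLE temp_tbl (id TEXT, a REAL, b REAL)")
    conn.executemany("INSERT INTO temp_tbl VALUES (?, ?, ?)", rows)
    # The one-row derived table y carries the average; joining it to every
    # row of x lets the WHERE clause compare x.a with it directly.
    (variance,) = conn.execute(
        """
        SELECT avg(x.b * x.b) - avg(x.b) * avg(x.b)
        FROM temp_tbl x
        CROSS JOIN (SELECT avg(a) AS average FROM temp_tbl) y
        WHERE x.a < y.average
        """
    ).fetchone()
    conn.close()
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# xx and yy come from the question; zz and ww are invented sample rows.
rows = [("xx", 1, 2), ("yy", 7, 11), ("zz", 2, 6), ("ww", 10, 3)]
print(stddev_below_avg(rows))
```

Here avg(a) is 5, so only xx and zz pass the filter. The sketch assumes at least one row passes; with none, the aggregate returns NULL and the function would fail.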

You likely need to move your parentheses in your WHERE clause. Try this:
select stddev_pop(b)
from (query to produce intermediate table)
where c < ( select avg(a)
from (query to produce intermediate table)
);
And, your question refers to a column c; did you mean a?
UPDATE: I saw a similar question with MySQL today; sorry I don't know Hive. See if this works:
select stddev_pop(b)
from temp
where a < ( select *
from (select avg(a) from temp) x
);

OK, first of all, Hive doesn't support subqueries anywhere other than the FROM clause.
So you can't use a subquery in the WHERE clause; you have to create a derived table in the FROM clause and use that instead.
Even if you create a temp table, referring to it from the WHERE clause still means running a query against it there, so that is not supported either.
Bob, I think Hive will not support this:
select stddev_pop(b)
from temp
where a < ( select *
from (select avg(a) from temp) x
);
but this should work:
select stddev_pop(b)
from temp x
join (select avg(a) as average from temp) y
where x.a < y.average;
If we physically create a temp table and load the result of select avg(a) as average from temp into it, then we can refer to that table as well.

SQL Logic: Finding Non-Duplicates with Similar Rows

I'll do my best to summarize what I am having trouble with. I never used much SQL until recently.
Currently I am using SQL Server 2012 at work and have been tasked with trying to find oddities in SQL tables. Specifically, the tables contain similar information regarding servers. Kind of meta, I know. So they each share a column called "DB_NAME". After that, there are no similar columns. So I need to compare Table A and Table B and produce a list of records (servers) where a server is NOT listed in BOTH Table A and B. Additionally, this query is being run against an exception list. I'm not 100% sure of the logic to best handle this. And while I would love to get something "extremely efficient", I am more interested in something that just plain works for the time being.
SELECT *
FROM (SELECT
UPPER(ta.DB_NAME) AS [DB_Name]
FROM
[CMS].[dbo].[TABLE_A] AS ta
UNION
SELECT
UPPER(tb.DB_NAME) AS [DB_Name]
FROM
[CMS].[dbo].[TABLE_B] as tb
) AS SQLresults
WHERE NOT EXISTS (
SELECT *
FROM
[CMS].[dbo].[TABLE_C_EXCEPTIONS] as tc
WHERE
SQLresults.[DB_Name] = tc.DB_NAME)
ORDER BY SQLresults.[DB_Name]
One method uses union all and aggregation:
select ab.db_name
from ((select upper(DB_NAME) as db_name, 'A' as which
from CMS.dbo.TABLE_A
) union all
(select upper(DB_NAME), 'B' as which
from CMS.dbo.TABLE_B
)
) ab
where not exists (select 1
from CMS.dbo.TABLE_C_EXCEPTIONS e
where upper(e.DB_NAME) = ab.db_name
)
group by ab.db_name
having count(distinct which) <> 2;
SQL Server is case-insensitive by default. I left the upper()s in the query in case your installation is case sensitive.
Here is another option using EXCEPT. I added a group by in each half of the union because it was not clear in your original post if DB_NAME is unique in your tables.
select DatabaseName
from
(
SELECT UPPER(ta.DB_NAME) AS DatabaseName
FROM [CMS].[dbo].[TABLE_A] AS ta
GROUP BY UPPER(ta.DB_NAME)
UNION ALL
SELECT UPPER(tb.DB_NAME) AS DatabaseName
FROM [CMS].[dbo].[TABLE_B] as tb
GROUP BY UPPER(tb.DB_NAME)
) x
group by DatabaseName
having count(*) < 2
EXCEPT
(
select UPPER(DB_NAME)
from CMS.dbo.TABLE_C_EXCEPTIONS
)
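The "in one table but not both, minus the exceptions" logic can be checked on a small data set. What follows is a rough sketch only: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, the lowercase table names mirror the ones in the question, and the server names are invented for illustration.

```python
import sqlite3


def servers_in_only_one_table(table_a, table_b, exceptions):
    """Names present in exactly one of A/B and not on the exception list."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE table_a (db_name TEXT);
        CREATE TABLE table_b (db_name TEXT);
        CREATE TABLE table_c_exceptions (db_name TEXT);
        """
    )
    conn.executemany("INSERT INTO table_a VALUES (?)", [(n,) for n in table_a])
    conn.executemany("INSERT INTO table_b VALUES (?)", [(n,) for n in table_b])
    conn.executemany(
        "INSERT INTO table_c_exceptions VALUES (?)", [(n,) for n in exceptions]
    )
    # Tag each name with its source table, drop the exceptions, then keep
    # the names that were seen in only one of the two sources.
    rows = conn.execute(
        """
        SELECT ab.db_name
        FROM (SELECT UPPER(db_name) AS db_name, 'A' AS which FROM table_a
              UNION ALL
              SELECT UPPER(db_name), 'B' FROM table_b) ab
        WHERE NOT EXISTS (SELECT 1
                          FROM table_c_exceptions e
                          WHERE UPPER(e.db_name) = ab.db_name)
        GROUP BY ab.db_name
        HAVING COUNT(DISTINCT ab.which) <> 2
        ORDER BY ab.db_name
        """
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


# Hypothetical sample data: beta and gamma are in both tables, delta is excepted.
result = servers_in_only_one_table(
    ["alpha", "beta", "gamma"],
    ["Beta", "gamma", "delta", "epsilon"],
    ["DELTA"],
)
print(result)
```

Counting distinct source tags per name, rather than counting rows, means a name that is duplicated within a single table is still reported as missing from the other one.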

Sum Distinct By Other Column

I have a problem with PL/SQL, since I am new to the PL/SQL world.
Let's say I have a table like this:
COlumnA COlumnB COlumnC
1 5000000000 X
1 5000000000 X
2 4350000000 X
2 4350000000 X
3 10000000000 X
3 10000000000 X
3 10000000000 X
4 1809469720 Y
5 10000000000 X
5 10000000000 X
6 3000000000 X
6 3000000000 X
And I want to write a select statement that produces the result below:
ColumnC |Sum
X |32350000000
Y |1809469720
I solved this problem in Oracle 12c with an inner query, but when the system had to move to Oracle 11g, my query stopped working. I need to get the expected result with only one select statement.
Could anyone please advise?
Thank you!
This is what I came up with... using an inline view rather than a correlated subquery in the SELECT list.
SELECT d.columnc AS "ColumnC"
, SUM(d.columnb) AS "Sum"
FROM ( SELECT t.columna
, t.columnb
, t.columnc
FROM tablea t
GROUP
BY t.columna
, t.columnb
, t.columnc
) d
GROUP
BY d.columnc
This uses an inline view (aliased as "d") to return a "distinct" set of rows from tablea. We can get a distinct set, using a GROUP BY clause, or including the DISTINCT keyword, or even by writing a query that uses a UNION set operator.
Just wrap that query in parens, assign an alias, and use it in the FROM clause, as if it were a table or view.
The statement operates similarly to referencing a VIEW in the FROM clause.
You don't need to do this, but to illustrate how the query above operates, we could create a view like this:
CREATE VIEW d AS
SELECT t.columna
, t.columnb
, t.columnc
FROM tablea t
GROUP
BY t.columna
, t.columnb
, t.columnc
And then we can reference the view in the FROM clause of another query, for example
SELECT d.columnc AS "ColumnC"
, SUM(d.columnb) AS "Sum"
FROM d
GROUP
BY d.columnc
But we don't actually need to create the VIEW object. We can include the view query as an "inline view".
I don't believe that Oracle 11g has a restriction on the nesting of inline views to three levels. I suspect that the restriction you are running into is related to correlated subqueries. The subquery can reference columns from the outer query, but only up one level... columns from the query it's used in. It can't reference columns in a query that is further out. (I've not confirmed with testing, but that's my recollection.)
This is where the actual ORA- and/or PLS- error message from Oracle would be of some help in identifying the restriction you are running into.
First find the distinct values of COlumnA,COlumnB and COlumnC then do the aggregation
Try this
select COlumnC,sum(COlumnB) from
(
select distinct COlumnA,COlumnB,COlumnC
from Table1
)
Group by COlumnC
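Or you can simply use this query. Note that it sums every row, including the duplicates, so it only gives the expected result if the rows are already distinct:
Select sum(columnB) as sum,columnC from table_name group by ColumnC;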
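To see the difference between summing over the de-duplicated inline view and summing the raw rows, the sample data from the question can be loaded into a throwaway database. This is a sketch only: it uses SQLite via Python's sqlite3 module instead of Oracle, and the table name tablea follows the first answer.

```python
import sqlite3

# Sample rows copied from the question: (ColumnA, ColumnB, ColumnC).
sample = [
    (1, 5000000000, "X"), (1, 5000000000, "X"),
    (2, 4350000000, "X"), (2, 4350000000, "X"),
    (3, 10000000000, "X"), (3, 10000000000, "X"), (3, 10000000000, "X"),
    (4, 1809469720, "Y"),
    (5, 10000000000, "X"), (5, 10000000000, "X"),
    (6, 3000000000, "X"), (6, 3000000000, "X"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (columna INTEGER, columnb INTEGER, columnc TEXT)")
conn.executemany("INSERT INTO tablea VALUES (?, ?, ?)", sample)

# Inline view d removes the duplicate rows before the outer SUM runs.
deduplicated = conn.execute(
    """
    SELECT d.columnc, SUM(d.columnb)
    FROM (SELECT DISTINCT columna, columnb, columnc FROM tablea) d
    GROUP BY d.columnc
    ORDER BY d.columnc
    """
).fetchall()

# Summing the raw rows counts every duplicate again.
naive = conn.execute(
    "SELECT columnc, SUM(columnb) FROM tablea GROUP BY columnc ORDER BY columnc"
).fetchall()
conn.close()

print(deduplicated)
print(naive)
```

The de-duplicated totals match the expected output in the question, while the plain GROUP BY gives a larger total for X because each repeated row is added again.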

Loop inside SELECT in SQL Server query

I'm trying to get a few columns using a function F(a,b,x) from SQL Server for say 20 values. Basically it's like
SELECT
col1, col2, F(a,b,1), F(a,b,2), ... F(a,b,20)
FROM
table
Is it possible to use a loop to SELECT F(a,b,#i) where 0 < #i < 21?
Thanks!
There is no "looping" in a SQL statement. But, you can come close by doing:
select col1, col2, n.n, f(a, b, n.n)
from table t cross join
(select 1 as n union all select 2 union all select 3 union all . . . union all
select 20
) n;
The exact syntax depends on the database you are using. There are also ways to generate numbers, once again, depending on the database.
This generates 20 rows for each row in the table, rather than 20 columns. The results can be pivoted if you really need them in columns.
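A small experiment shows the shape of the cross-join result and one way to pivot it afterwards. The sketch below makes several assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, the function f (registered here as a*n + b) and the table contents are hypothetical, and the numbers 1 to 20 come from a recursive CTE, which is just one of the database-specific ways to generate them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the user's F(a, b, i).
conn.create_function("f", 3, lambda a, b, n: a * n + b)
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("r1", "x", 2, 3), ("r2", "y", 5, 1)],
)

# nums yields 1..20; the cross join pairs every row of t with every number.
long_rows = conn.execute(
    """
    WITH RECURSIVE nums(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM nums WHERE n < 20
    )
    SELECT t.col1, t.col2, nums.n, f(t.a, t.b, nums.n)
    FROM t
    CROSS JOIN nums
    ORDER BY t.col1, nums.n
    """
).fetchall()
conn.close()

# Pivot in the client: one list of 20 values per original row.
wide = {}
for col1, col2, n, value in long_rows:
    wide.setdefault((col1, col2), []).append(value)

print(len(long_rows))
print(wide[("r1", "x")])
```

Pivoting in the client keeps the SQL short; doing it in the database is also possible, but the syntax for that varies by product.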

SQL "WITH" to include multiple derived tables

Can I write something like the query below? It is not giving the proper output in WinSQL/Teradata:
with
a (x) as ( select 1 ),
b (y) as ( select * from a )
select * from b
Do you really need to use CTEs for this particular solution when derived tables would work as well:
SELECT B.*
FROM (SELECT A.*
FROM (SELECT 1 AS Col1) A
) B;
That being said, I believe multiple CTEs are available in Teradata 14.10 or 15. I believe support for a single CTE and the WITH clause were introduced in Teradata 12 or 13.
Define the dependent CTE first and then the parent,
like this, and it will work. Why is it like that? Teradata likes people to play with it longer and spend more time with it, making it feel important.
with
"b" (y) as ( select * from "a" ),
"a" (x) as ( select '1' )
select * from b
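For comparison, here is how the chained-CTE form and the nested derived-table form behave on another engine. This is only a sketch using SQLite through Python's sqlite3 module; it says nothing about which definition order a given Teradata release accepts, which is the point discussed in the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two CTEs, where b is defined in terms of a.
chained = conn.execute(
    """
    WITH a(x) AS (SELECT 1),
         b(y) AS (SELECT * FROM a)
    SELECT * FROM b
    """
).fetchall()

# The same result expressed with nested derived tables and no WITH clause.
nested = conn.execute(
    """
    SELECT b.*
    FROM (SELECT a.*
          FROM (SELECT 1 AS col1) a
         ) b
    """
).fetchall()
conn.close()

print(chained, nested)
```

Both forms return the same single row here, so the derived-table version is a reasonable fallback wherever multiple CTEs are unavailable.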

Query where two columns are in the result of nested query

I'm writing a query like this:
select * from myTable where X in (select X from Y) and XX in (select X from Y)
Values from columns X and XX both have to be in the result of the same query: select X from Y.
I think this subquery is executed twice, which seems wasteful. Is there another way to write this query more efficiently? Maybe with a temp table?
Actually no, there isn't a smarter way to write this (without visiting Y twice), given that the Y rows matching myTable.X and myTable.XX may not be the same row.
As an alternative, the EXISTS form of the query is
select *
from myTable A
where exists (select * from Y where A.X = Y.X)
and exists (select * from Y where A.XX = Y.X)
If Y contains X values of 1,2,3,4,5, and A.X = 2 and A.XX = 4, they both exist (on different records in Y) and the record from myTable should be shown in the output.
EDIT: This answer previously stated that you could rewrite this using _EXISTS_ clauses, which would work faster than _IN_. As Martin has pointed out, this is not true (certainly not for SQL Server 2005 and above). See these links:
http://explainextended.com/2009/06/16/in-vs-join-vs-exists/
http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
It will probably not be particularly efficient to try to write this query by only referencing Y once. However, given that you are using SQL Server 2008, there are variations that can be used:
Select ...
From MyTable As T
Where Exists (
Select 1
From Y
Where Y.X = T.X
Intersect
Select 1
From Y
Where Y.X = T.XX
)
Addition
Actually, I can think of a way you could do it without using Y more than once (Nothing was said about using MyTable more than once). However, this is more for academic reasons as I think that using my first solution will likely perform better:
Select ...
From MyTable As T
Where Exists (
Select 1
From Y
Where Exists(
Select 1
From MyTable As T1
Where T1.X = Y.X
Intersect
Select 1
From MyTable As T2
Where T2.XX = Y.X
)
And Y.X In(T.X, T.XX)
)
WITH
w_tmp AS(
SELECT x
FROM y
)
SELECT *
FROM myTable
WHERE x IN (SELECT x FROM w_tmp)
AND xx IN (SELECT x FROM w_tmp)
(I've read this in the Oracle docs, but I think MS SQL Server is able to do this optimization too.)
This way the optimizer knows for sure that you are running the same query twice and can create a temporary table to cache the results. (It is still up to the optimizer to decide whether that is worth it; for tiny queries, the overhead of creating a temp table can be too high.)
Also (and this is actually far more important to me), when the subquery is 50 lines long, it is easier for a human to see that the same thing is used in both places. It is much like factoring long functions into subroutines.
Docs on MSDN
Not sure what the problem is, but isn't a simple JOIN the answer?
SELECT t.*
FROM myTable t
JOIN Y y1 ON y1.X = t.X
JOIN Y y2 ON y2.X = t.XX
or
SELECT t.*
FROM myTable t, Y y1, Y y2
WHERE y1.X = t.X AND y2.X = t.XX
ADDED: if there is a strong need to eliminate a second query for Y, let's reverse the logic:
;WITH A(X)
AS (
-- this will select all values that can be found in Y and myTable X and XX fields.
SELECT Y.X -- if there are a lot of dups, add DISTINCT
FROM Y, myTable
WHERE Y.X IN (myTable.X, myTable.XX)
)
-- now join back to the original table and filter.
SELECT t.*
FROM myTable t
-- similar to what has been mentioned before
WHERE EXISTS(SELECT TOP 1 * from A where A.X = t.X)
AND EXISTS(SELECT TOP 1 * from A where A.X = t.XX)
If you don't like WITH, you can use SELECT ... INTO to store the result in a temporary table instead.
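The IN, EXISTS and WITH variants discussed in this thread can be compared side by side on a tiny data set. The sketch below is illustrative only: it uses SQLite through Python's sqlite3 module instead of SQL Server, the id column and the sample rows are invented, and it checks only that the three forms return the same rows, not how any optimizer executes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE y (x INTEGER)")
conn.execute("CREATE TABLE mytable (id INTEGER, x INTEGER, xx INTEGER)")
conn.executemany("INSERT INTO y VALUES (?)", [(v,) for v in (1, 2, 3, 4, 5)])
# Hypothetical rows: only ids 1 and 4 have both x and xx present in y.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 2, 4), (2, 2, 9), (3, 7, 1), (4, 5, 5)],
)

queries = {
    "in": """
        SELECT id FROM mytable
        WHERE x IN (SELECT x FROM y) AND xx IN (SELECT x FROM y)
        ORDER BY id
    """,
    "exists": """
        SELECT a.id FROM mytable a
        WHERE EXISTS (SELECT 1 FROM y WHERE y.x = a.x)
          AND EXISTS (SELECT 1 FROM y WHERE y.x = a.xx)
        ORDER BY a.id
    """,
    "with": """
        WITH w_tmp AS (SELECT x FROM y)
        SELECT id FROM mytable
        WHERE x IN (SELECT x FROM w_tmp) AND xx IN (SELECT x FROM w_tmp)
        ORDER BY id
    """,
}

# Run each variant and collect the matching ids.
results = {
    name: [row[0] for row in conn.execute(sql).fetchall()]
    for name, sql in queries.items()
}
conn.close()

print(results)
```

Since all three forms select the same rows, the choice between them comes down to readability and to what the execution plan on the target database shows.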