Random sample using Hive

I'm trying to get a 5% random sample from a huge table.
create table database.five_percent_table as select * from (select distinct id from database.customer_list) where rand() <= 0.05 and month = 06;
Error while compiling statement: FAILED: ParseException line 3:0
cannot recognize input near 'where' 'rand' '(' in subquery source
I couldn't figure out the reason. Any help here is highly appreciated. Thanks in advance.

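The subquery in the FROM clause needs an alias. Also, month is not in the subquery's select list, so the filter on it has to go inside the subquery:
CREATE TABLE database.five_percent_table AS
SELECT * FROM (
SELECT DISTINCT id
FROM database.customer_list
WHERE month = 06
) alias
WHERE rand() <= 0.05;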
From the docs:
Hive supports subqueries only in the FROM clause (through Hive 0.12).
The subquery has to be given a name because every table in a FROM
clause must have a name. Columns in the subquery select list must have
unique names. The columns in the subquery select list are available in
the outer query just like columns of a table. The subquery can also be
a query expression with UNION. Hive supports arbitrary levels of
subqueries.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
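If you want to try the query shape without a Hive cluster, here is a rough sketch using Python's built-in sqlite3 module. This is only a stand-in: SQLite is not Hive, it does not require the subquery alias, and it has no rand(), so the sketch registers a user-defined rand() itself. The table contents, the seed, and the function name build_sample are all made up for illustration.

```python
import random
import sqlite3


def build_sample(seed=0):
    """Return (sampled_ids, all_month6_ids) from an in-memory SQLite stand-in."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # Assumption: emulate Hive's rand() with a Python function returning [0, 1).
    conn.create_function("rand", 0, rng.random)

    conn.execute("CREATE TABLE customer_list (id INTEGER, month INTEGER)")
    rows = []
    for i in range(1, 2001):
        # Hypothetical data: duplicate month-6 rows so DISTINCT has work to do.
        rows += [(i, 5), (i, 6), (i, 6)]
    conn.executemany("INSERT INTO customer_list VALUES (?, ?)", rows)

    # Same shape as the Hive statement: aliased subquery, month filter inside it.
    conn.execute(
        """
        CREATE TABLE five_percent_table AS
        SELECT * FROM (
            SELECT DISTINCT id
            FROM customer_list
            WHERE month = 6
        ) t
        WHERE rand() <= 0.05
        """
    )

    sampled = [r[0] for r in conn.execute("SELECT id FROM five_percent_table")]
    universe = [
        r[0]
        for r in conn.execute("SELECT DISTINCT id FROM customer_list WHERE month = 6")
    ]
    conn.close()
    return sampled, universe


sampled_ids, month6_ids = build_sample()
print(len(sampled_ids), "of", len(month6_ids), "ids sampled")
```

The sample size is only approximately 5% of the distinct ids, since each row is kept or dropped independently.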

Related

Query works in MySQL but throws an error in Oracle. Can someone please explain why, and how to rewrite the same query for Oracle to avoid the error? [duplicate]

I have a query
SELECT COUNT(*) AS "CNT",
imei
FROM devices
which executes just fine. I want to further restrict the query with a WHERE clause. The (humanly) logical next step is to modify the query as follows:
SELECT COUNT(*) AS "CNT",
imei
FROM devices
WHERE CNT > 1
However, this results in the error message ORA-00904: "CNT": invalid identifier. For some reason, wrapping the query in another query produces the desired result:
SELECT *
FROM (SELECT COUNT(*) AS "CNT",
imei
FROM devices
GROUP BY imei)
WHERE CNT > 1
Why does Oracle not recognize the alias "CNT" in the second query?
Because the documentation says it won't:
Specify an alias for the column expression. Oracle Database will use this alias in the column heading of the result set. The AS keyword is optional. The alias effectively renames the select list item for the duration of the query. The alias can be used in the order_by_clause but not other clauses in the query.
However, when you have an inner select, that is like creating an inline view where the column aliases take effect, so you are able to use that in the outer level.
The simple answer is that the AS clause defines what the column will be called in the result, which is a different scope than the query itself.
In your example, using the HAVING clause would work best:
SELECT COUNT(*) AS "CNT",
imei
FROM devices
GROUP BY imei
HAVING COUNT(*) > 1
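As a quick sanity check that the HAVING form and the inline-view form return the same rows, here is a small sketch using Python's sqlite3 module. SQLite is only a stand-in for Oracle here (its alias rules are looser than Oracle's, so it will not reproduce ORA-00904), and the device rows and the function name duplicate_imeis are invented for the example.

```python
import sqlite3


def duplicate_imeis():
    """Run both query forms on sample data and return their result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE devices (imei TEXT)")
    # Hypothetical data: A appears twice, B once, C three times.
    conn.executemany(
        "INSERT INTO devices VALUES (?)",
        [("A",), ("A",), ("B",), ("C",), ("C",), ("C",)],
    )

    # Filter on the aggregate directly with HAVING.
    having_rows = conn.execute(
        """
        SELECT COUNT(*) AS CNT, imei
        FROM devices
        GROUP BY imei
        HAVING COUNT(*) > 1
        ORDER BY imei
        """
    ).fetchall()

    # Filter on the alias from the outer query of an inline view.
    inline_view_rows = conn.execute(
        """
        SELECT *
        FROM (SELECT COUNT(*) AS CNT, imei
              FROM devices
              GROUP BY imei)
        WHERE CNT > 1
        ORDER BY imei
        """
    ).fetchall()

    conn.close()
    return having_rows, inline_view_rows


having_rows, inline_view_rows = duplicate_imeis()
print(having_rows)       # [(2, 'A'), (3, 'C')]
print(inline_view_rows)  # [(2, 'A'), (3, 'C')]
```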
To summarize, this little gem explains:
10 Easy Steps to a Complete Understanding of SQL
A common source of confusion is the simple fact that SQL syntax
elements are not ordered in the way they are executed. The lexical
ordering is:
SELECT [ DISTINCT ]
FROM
WHERE
GROUP BY
HAVING
UNION
ORDER BY
For simplicity, not all SQL clauses are listed. This lexical ordering
differs fundamentally from the logical order, i.e. from the order of
execution:
FROM
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
UNION
ORDER BY
As a consequence, anything that you label using "AS" will only be available once the WHERE, HAVING and GROUP BY have already been performed.
I would imagine because the alias is not assigned to the result column until after the WHERE clause has been processed and the data generated. Is Oracle different from other DBMSs in this behaviour?

DB2 SELECT EXCEPT with WHERE clause

I'm trying to compare two tables in a DB2 database in z/OS using SPUFI to submit SQL queries.
I'm doing this by using EXCEPT to see the difference between two SELECT queries.
I need to filter the SELECT statement from the first query with a WHERE clause.
SELECT KEY_FIELD_1,LOOKUP_FIELD_1
FROM TABLE_1
WHERE FILTER_FIELD = '1'
EXCEPT
SELECT KEY_FIELD_2,LOOKUP_FIELD_2
FROM TABLE_2
I got results back, but it also returned error -199. Is this because the WHERE clause is not present in the second SELECT statement?
ERROR: ILLEGAL USE OF KEYWORD EXCEPT.
TOKEN <ERR_STMT> <WNG_STMT> GET SQL
SAVEPOINT HOLD FREE ASSOCIATE WAS EXPECTED
Try introducing parentheses e.g.
( SELECT KEY_FIELD_1,LOOKUP_FIELD_1
FROM TABLE_1
WHERE FILTER_FIELD = '1' )
EXCEPT
( SELECT KEY_FIELD_2,LOOKUP_FIELD_2
FROM TABLE_2 )
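On the question of whether the missing WHERE in the second SELECT is the problem: in standard SQL each branch of an EXCEPT is an independent query, so filtering only one side is allowed. The sketch below illustrates that with Python's sqlite3 module. It says nothing about DB2 for z/OS specifically, it uses the unparenthesized form because SQLite's grammar does not take parentheses around the branches, and the sample rows and the function name except_with_filter are made up.

```python
import sqlite3


def except_with_filter():
    """Return rows of the filtered TABLE_1 query that are absent from TABLE_2."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TABLE_1 (KEY_FIELD_1 TEXT, LOOKUP_FIELD_1 TEXT, FILTER_FIELD TEXT)"
    )
    conn.execute("CREATE TABLE TABLE_2 (KEY_FIELD_2 TEXT, LOOKUP_FIELD_2 TEXT)")
    # Hypothetical data: only A and B pass the filter; A also exists in TABLE_2.
    conn.executemany(
        "INSERT INTO TABLE_1 VALUES (?, ?, ?)",
        [("A", "x", "1"), ("B", "y", "1"), ("C", "z", "2")],
    )
    conn.executemany(
        "INSERT INTO TABLE_2 VALUES (?, ?)",
        [("A", "x"), ("C", "z")],
    )

    # WHERE applies to the first branch only; the second branch is unfiltered.
    rows = conn.execute(
        """
        SELECT KEY_FIELD_1, LOOKUP_FIELD_1
        FROM TABLE_1
        WHERE FILTER_FIELD = '1'
        EXCEPT
        SELECT KEY_FIELD_2, LOOKUP_FIELD_2
        FROM TABLE_2
        """
    ).fetchall()
    conn.close()
    return rows


print(except_with_filter())  # [('B', 'y')]
```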

How do I filter a column with multiple values

How do I filter a column col1 on multiple values? I tried:
select * from table where col1=2 and col1=4 and userID='740b9738-63d2-67ff-ba21-801b65dd0ae1'
I also tried:
select * from table where col1=2 or col1=4 and userID='740b9738-63d2-67ff-ba21-801b65dd0ae1'
Both queries return incorrect results: the first returns zero rows and the second returns 3 rows, whereas the correct result is 2 rows.
The SQL will run against an SQLite database.
AND is evaluated before OR, so your query is equivalent to:
select *
from table
where col1=2 or (col1=4 and userID='740b9738-63d2-67ff-ba21-801b65dd0ae1')
You need to explicitly group the conditions when mixing AND and OR:
select *
from table
where (col1=2 or col1=4) and userID='740b9738-63d2-67ff-ba21-801b65dd0ae1'
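Since the target is SQLite, the difference is easy to reproduce with Python's sqlite3 module. The sketch below uses an invented table name (items, because table is a reserved word), an invented second user, and sample rows chosen so that the three queries return 0, 3 and 2 rows as in the question.

```python
import sqlite3

USER = "740b9738-63d2-67ff-ba21-801b65dd0ae1"
OTHER = "someone-else"  # hypothetical second user


def run_filters():
    """Return the row counts of the three WHERE variants."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (col1 INTEGER, userID TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(2, USER), (4, USER), (2, OTHER), (4, OTHER), (3, USER)],
    )

    def count(where):
        sql = "SELECT COUNT(*) FROM items WHERE " + where
        return conn.execute(sql, {"u": USER}).fetchone()[0]

    # col1 cannot be 2 and 4 at the same time, so nothing matches.
    all_and = count("col1=2 AND col1=4 AND userID=:u")
    # Parsed as col1=2 OR (col1=4 AND userID=:u): the other user's col1=2 row leaks in.
    ungrouped = count("col1=2 OR col1=4 AND userID=:u")
    # Explicit grouping applies the userID filter to both values.
    grouped = count("(col1=2 OR col1=4) AND userID=:u")

    conn.close()
    return all_and, ungrouped, grouped


print(run_filters())  # (0, 3, 2)
```

An equivalent way to write the grouped condition is col1 IN (2, 4) AND userID = ..., which avoids the precedence question altogether.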

SQL Query Syntax : Using table alias in a count is invalid? Why?

Could someone please explain to me why the following query is invalid? I'm running this query against an Oracle 10g database.
select count(test.*) from my_table test;
I get the following error: ORA-01747: invalid user.table.column, table.column, or column specification
However, the following two queries are valid.
select count(test.column) from my_table test;
select test.* from my_table test;
COUNT(expression) will count all rows where expression is not null. COUNT(*) is an exception: it returns the number of rows, and * is not an alias for my_table.*.
As far as I know, Count(Table.*) is not officially supported in the SQL specification. The specification only covers Count(*) (count all rows returned) and Count(Table.ColumnName) (count all non-null values in the given column). So, even if the DBMS supported it, I would recommend against using it.
This syntax only works in PostgreSQL and only because it has a record datatype (for which test.* is a meaningful expression).
Just use COUNT(*).
This query:
select count(test.column) from my_table test;
will return you the number of records for which test.column is not NULL.
This query:
select test.* from my_table test;
will just return you all records from my_table.
COUNT as such is probably the only aggregate that makes sense without parameters, and using an expression like COUNT(*) is just a way to call a function without providing any actual parameters to it.
You might reasonably want to find the number of records where test.column is not NULL if you are doing an outer join. As every table should have a PK (which is not null) you should be able to count the rows like that if you want:
select count(y.pk)
from x
left outer join y on y.pk = x.ck
COUNT(*) is no good here because the outer join is creating a null row for the table that is deficient in information.
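To make the difference concrete, here is a small sketch using Python's sqlite3 module as a stand-in database. The tables x and y follow the query above; the row values and the function name outer_join_counts are made up for the example.

```python
import sqlite3


def outer_join_counts():
    """Return (COUNT(*), COUNT(y.pk)) for x LEFT OUTER JOIN y."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (ck INTEGER)")
    conn.execute("CREATE TABLE y (pk INTEGER PRIMARY KEY)")
    # Hypothetical data: x has three rows, but only two have a match in y.
    conn.executemany("INSERT INTO x VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO y VALUES (?)", [(1,), (2,)])

    row = conn.execute(
        """
        SELECT COUNT(*), COUNT(y.pk)
        FROM x
        LEFT OUTER JOIN y ON y.pk = x.ck
        """
    ).fetchone()
    conn.close()
    return row


# COUNT(*) counts the NULL-padded row for ck = 3; COUNT(y.pk) skips it.
print(outer_join_counts())  # (3, 2)
```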