Sequence function in Amazon Athena - sql

I am not able to use the sequence function in Amazon Athena.
It throws a syntax error: Not a window function: sequence
I wrote the following code :
SELECT sequence(1, 1) OVER () as seq_num
FROM <table_name>

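sequence() is not a window function: it returns an array, so it cannot take an OVER () clause. If you want row numbers, you might want to use ROW_NUMBER() instead. You can either expand a generated array into rows with UNNEST:
SELECT * FROM UNNEST(sequence(1, 5))
or number the rows of an existing table: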
SELECT ROW_NUMBER() OVER() as seq_num FROM campaign_lookup
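To try both ideas locally, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in engine, and SQLite 3.25+ is assumed for window functions. SQLite has no sequence() or UNNEST, so the integer series is produced with a recursive CTE instead. The one-column schema for campaign_lookup is hypothetical.

```python
import sqlite3


def number_rows(conn, table):
    """ROW_NUMBER() OVER () numbers every row; with no ORDER BY the
    order in which numbers are assigned is engine-defined."""
    # `table` is interpolated directly, so only pass trusted identifiers.
    sql = f"SELECT ROW_NUMBER() OVER () AS seq_num FROM {table}"
    return [row[0] for row in conn.execute(sql)]


def integer_series(conn, start, stop):
    """Stand-in for Athena's UNNEST(sequence(start, stop)):
    a recursive CTE emits one row per integer from start to stop."""
    sql = """
        WITH RECURSIVE seq(n) AS (
            SELECT ?
            UNION ALL
            SELECT n + 1 FROM seq WHERE n < ?
        )
        SELECT n FROM seq ORDER BY n
    """
    return [row[0] for row in conn.execute(sql, (start, stop))]


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the answer only names the table campaign_lookup.
conn.execute("CREATE TABLE campaign_lookup (campaign TEXT)")
conn.executemany("INSERT INTO campaign_lookup VALUES (?)", [("a",), ("b",), ("c",)])
print(sorted(number_rows(conn, "campaign_lookup")))  # [1, 2, 3]
print(integer_series(conn, 1, 5))  # [1, 2, 3, 4, 5]
```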

Related

Limiting SQL result with query

I am trying to limit the rows returned from one table based on a value computed from another table.
So, let's say my table II produces an integer value; I then want to use that value to limit the results from table I.
I can't get this query to work, and I can't see the problem.
Is there another way of writing this?
I read somewhere on SO that a subquery is not allowed in LIMIT, but I am not sure.
select *
from stage_II_final_suzuki_1648155456
limit ( select count(*)*0.80
from tmp.stage_II_final_suzuki_1648155456
);
I want to correct the syntax so that the query compiles. The error is:
ERROR> Failed validation Syntax error: Unexpected "(" at [31:11]
It appears that Google Ads Data Hub uses BigQuery SQL. The syntax for LIMIT accepts a number (literal constant); you can't use any expression that needs evaluation.
It does support a ROW_NUMBER() function though, and you can try:
select *
from (
select s1.*, row_number() over() as rn
from stage_II_final_suzuki_1648155456 s1
) s2
where s2.rn <= ( select count(*)*0.80
from tmp.stage_II_final_suzuki_1648155456
);
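Here is a minimal sketch of the same pattern, again using Python's sqlite3 as a stand-in for BigQuery. The table name stage and its id column are hypothetical. ORDER BY id is added inside OVER () so that the kept rows are deterministic; the original over() does not guarantee which rows fall under the cut-off.

```python
import sqlite3


def first_fraction(conn, fraction):
    """Return the ids of the first `fraction` of rows in `stage`.

    The row number is computed in a derived table and compared with a
    scalar subquery, so no expression is needed inside LIMIT."""
    sql = """
        SELECT s2.id
        FROM (
            SELECT s1.id, ROW_NUMBER() OVER (ORDER BY s1.id) AS rn
            FROM stage s1
        ) s2
        WHERE s2.rn <= (SELECT COUNT(*) * ? FROM stage)
        ORDER BY s2.rn
    """
    return [row[0] for row in conn.execute(sql, (fraction,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage (id INTEGER PRIMARY KEY)")  # hypothetical table
conn.executemany("INSERT INTO stage (id) VALUES (?)", [(i,) for i in range(1, 11)])
print(first_fraction(conn, 0.8))  # [1, 2, 3, 4, 5, 6, 7, 8]
```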

weird error in hive query:SemanticException Failed to breakup Windowing invocations into Groups

I want to fetch a range of rows from Hive, for example rows with row_number between 772001 and 773000.
My SQL is as follows:
select * from (
select *, row_number() over (order by `name`) as row_dsa
from `jck_bonc_demo`.`frjc_jbxx`
)tmp_table where row_dsa between 772001 and 773000
and I get the following error:
[Cloudera][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
How can I fix this error? Can anyone help?
I think this is the syntax you want:
select *
from (select *, row_number() over (order by `name`) as row_dsa
from `jck_bonc_demo`.`frjc_jbxx`
) x
where row_dsa between 772001 and 773000;
You need a subquery to use row_dsa in a where clause.
If you want all columns of the table plus one calculated column, use select s.*, ... (with a table alias) rather than select *. Also, there is no need to back-quote non-reserved words:
select *
from (select s.*, row_number() over (order by name) as row_dsa
from jck_bonc_demo.frjc_jbxx s
) x
where row_dsa between 772001 and 773000;
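Here is a minimal sketch of the corrected query, run against SQLite from Python as a stand-in for Hive (window functions need SQLite 3.25+). The one-column schema for frjc_jbxx and the 20 generated names are made up for illustration. The range bounds are parameters so that small values can be tried.

```python
import sqlite3


def rows_between(conn, lo, hi):
    """Number rows by name in a derived table, then filter on that number
    in the outer WHERE, where the alias row_dsa is visible."""
    sql = """
        SELECT x.name, x.row_dsa
        FROM (
            SELECT s.*, ROW_NUMBER() OVER (ORDER BY s.name) AS row_dsa
            FROM frjc_jbxx s
        ) x
        WHERE x.row_dsa BETWEEN ? AND ?
        ORDER BY x.row_dsa
    """
    return conn.execute(sql, (lo, hi)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frjc_jbxx (name TEXT)")  # hypothetical schema
conn.executemany("INSERT INTO frjc_jbxx VALUES (?)", [(f"n{i:02d}",) for i in range(20)])
print(rows_between(conn, 5, 7))  # [('n04', 5), ('n05', 6), ('n06', 7)]
```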
There was a bug in my program: name is not a column of the specified table, which is why the error message looked weird. Thanks for your answers, @Gordon Linoff and @leftjoin.

Calcite query to select a given percentage of data from the table

I am looking for a way to select a given percentage of the data in a table using a Calcite query.
For example, say we have a table named sample with around 800 records, and I want to select only 30% of them, i.e. the result should contain only 240 of the 800 records.
It would also be a bonus if we could apply additional filter criteria in the same query.
Thanks in advance.
Calcite's SQL parser supports the SQL standard TABLESAMPLE keyword, for example
SELECT *
FROM t TABLESAMPLE BERNOULLI(30) REPEATABLE(42)
But it is not documented in the Calcite SQL reference, and I've not tried it recently. Give it a try, and if it doesn't work, please log a Calcite JIRA case.
You can use window functions:
select t.*
from (select t.*, count(*) over () as cnt,
row_number() over (order by rand()) as seqnum
from t
) t
where seqnum <= 0.3 * cnt;
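The window-function answer can be checked with a small sketch in Python's sqlite3, used here as a stand-in for Calcite. SQLite spells the random function random() rather than rand(). The sample table's id column and its 800 generated rows are hypothetical. The comparison is rewritten as seqnum * 100 <= pct * cnt so that the cut-off stays in integer arithmetic. The inner WHERE shows one assumed way to apply the extra criteria the question asks about: filter first, then sample from what remains.

```python
import sqlite3


def sample_percent(conn, pct, min_id=0):
    """Return a random pct% of the rows of `sample` with id >= min_id.

    COUNT(*) OVER () gives the size of the filtered set on every row, and
    ROW_NUMBER() over a random order decides which rows fall under the cut-off."""
    sql = """
        SELECT t.id
        FROM (
            SELECT t.*, COUNT(*) OVER () AS cnt,
                   ROW_NUMBER() OVER (ORDER BY random()) AS seqnum
            FROM sample t
            WHERE t.id >= ?          -- extra criteria go here
        ) t
        WHERE t.seqnum * 100 <= ? * t.cnt
    """
    return [row[0] for row in conn.execute(sql, (min_id, pct))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY)")  # hypothetical schema
conn.executemany("INSERT INTO sample (id) VALUES (?)", [(i,) for i in range(1, 801)])
print(len(sample_percent(conn, 30)))  # 240
```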

Handling duplicates in BigQuery (Nested Table)

I think this is a very simple question, but I would like some guidance: I don't want to drop the table and write a new one containing the deduplicated records. Is it possible in BigQuery to do this in place instead, for example with a DELETE FROM based on the query below? PS: This is a nested table!
SELECT
*
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY id, date_register) row_number
FROM
dataset.table)
WHERE
row_number = 1
order by id, date_register
To de-duplicate in place, without re-creating the table, use MERGE:
MERGE `temp.many_random` t
USING (
SELECT DISTINCT *
FROM `temp.many_random`
)
ON FALSE
WHEN NOT MATCHED BY SOURCE THEN DELETE
WHEN NOT MATCHED BY TARGET THEN INSERT ROW
It's simpler than the currently accepted answer because it doesn't require you to restate the table's partitioning or clustering; it simply preserves them.
Update: please also check Felipe Hoffa's answer, which is simpler, and learn more in this post: BigQuery Deduplication.
You need to exclude row_number from output and overwrite your table using CREATE OR REPLACE TABLE:
CREATE OR REPLACE TABLE your_table
PARTITION BY DATE(date_register)
AS
SELECT
* EXCEPT(row_number)
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY id, date_register) row_number
FROM your_table)
WHERE
row_number = 1
If you don't have a partition field defined at the source, I recommend that you create a new table with the partition field to make this query work so that you can automate the process.
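For comparison, engines that expose a physical row identifier can de-duplicate in place with a plain DELETE, which is the approach the question asks about. The sketch below uses Python's sqlite3 and SQLite's rowid. BigQuery has no rowid, so this illustrates the idea rather than offering a BigQuery statement. The events table and its flat columns are hypothetical, since SQLite has no nested types.

```python
import sqlite3


def dedupe_in_place(conn):
    """Keep one row per (id, date_register) and delete the others in place."""
    conn.execute("""
        DELETE FROM events
        WHERE rowid IN (
            SELECT rid
            FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY id, date_register
                                          ORDER BY rowid) AS rn
                FROM events
            )
            WHERE rn > 1          -- every copy after the first
        )
    """)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, date_register TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "2021-01-01", "a"), (1, "2021-01-01", "a"),
     (2, "2021-01-02", "b"), (2, "2021-01-02", "b"), (2, "2021-01-02", "b"),
     (3, "2021-01-03", "c")],
)
dedupe_in_place(conn)
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 3
```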

Return Row of Every Nth Record

I'm trying to develop the Oracle SQL version of the accepted answer here:
Return row of every n'th record
What I have so far is:
SELECT ROW_ID, CUST_ACCT_SITE_ID
FROM
(
SELECT CUST_ACCT_SITE_ID as CUST_ACCT_SITE_ID, ROW_NUMBER() OVER (ORDER BY CUST_ACCT_SITE_ID) AS ROW_ID
FROM XXDMX_VOICE_CUSTOMERS_TBL
) AS t
WHERE ROW_ID % 10000 = 0
ORDER BY CUST_ACCT_SITE_ID;
I get the error
ERROR
ORA-00933: SQL command not properly ended
I've tried lots of variations and can't think of what I am doing wrong. Any ideas, Oracle experts?
Try writing the query like this:
SELECT rn, CUST_ACCT_SITE_ID
FROM (SELECT CUST_ACCT_SITE_ID as CUST_ACCT_SITE_ID,
ROW_NUMBER() OVER (ORDER BY CUST_ACCT_SITE_ID) AS rn
FROM XXDMX_VOICE_CUSTOMERS_TBL
) t
WHERE mod(rn, 10000) = 0
ORDER BY CUST_ACCT_SITE_ID;
The primary difference is removing AS before the table alias; Oracle doesn't allow that syntax. I also renamed row_id to rn, because ROWID is a pseudocolumn in Oracle and the similar name could be confusing (see here).
In Oracle SQL (and in PL/SQL, Oracle's procedural extension), the modulus is computed with the MOD function rather than a % operator:
WHERE MOD(ROW_ID, 10000) = 0
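Here is a minimal sketch of the corrected every-Nth-row query, run through Python's sqlite3 as a stand-in for Oracle. SQLite's mod() belongs to its optional math functions and may be missing from some builds, so the sketch uses SQLite's % operator. In Oracle you would keep MOD(rn, n) as in the answer. The ten customer IDs are made up, and n is a parameter so that a small table can be used.

```python
import sqlite3


def every_nth(conn, n):
    """Return (rn, CUST_ACCT_SITE_ID) for every n-th row by CUST_ACCT_SITE_ID."""
    sql = """
        SELECT rn, CUST_ACCT_SITE_ID
        FROM (SELECT CUST_ACCT_SITE_ID,
                     ROW_NUMBER() OVER (ORDER BY CUST_ACCT_SITE_ID) AS rn
              FROM XXDMX_VOICE_CUSTOMERS_TBL
             ) t                      -- no AS before the alias, as Oracle requires
        WHERE rn % ? = 0              -- Oracle: MOD(rn, n) = 0
        ORDER BY CUST_ACCT_SITE_ID
    """
    return conn.execute(sql, (n,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XXDMX_VOICE_CUSTOMERS_TBL (CUST_ACCT_SITE_ID INTEGER)")
conn.executemany(
    "INSERT INTO XXDMX_VOICE_CUSTOMERS_TBL VALUES (?)", [(i,) for i in range(101, 111)]
)
print(every_nth(conn, 3))  # [(3, 103), (6, 106), (9, 109)]
```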