Unable to use EXACT_COUNT_DISTINCT BigQuery aggregate function

I am getting a "Resources exceeded during query execution" error when I execute the following query:
SELECT EXACT_COUNT_DISTINCT( id ) FROM [bigquery-public-data:github_repos.contents]
I used the EXACT_COUNT_DISTINCT aggregate function because the COUNT([DISTINCT]) function gives only a statistical approximation.

Try the queries below.
For BigQuery Legacy SQL:
SELECT COUNT(1) AS cnt FROM (
SELECT id
FROM [bigquery-public-data:github_repos.contents]
GROUP BY id
)
For BigQuery Standard SQL, where COUNT(DISTINCT ...) returns an exact count:
SELECT COUNT(DISTINCT id) as cnt
FROM `bigquery-public-data.github_repos.contents`
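
As a rough sanity check that the two query shapes above count the same thing, here is a minimal Python sketch using the standard-library sqlite3 module. The in-memory `contents` table and its five rows are hypothetical, and SQLite is only a stand-in: it shows that the results agree, not how BigQuery allocates resources.

```python
import sqlite3

# Hypothetical in-memory stand-in for the contents table (not BigQuery).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contents (id TEXT)")
conn.executemany(
    "INSERT INTO contents (id) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",)],
)

# Shape 1: collapse duplicates with GROUP BY, then count the groups.
grouped = conn.execute(
    "SELECT COUNT(1) AS cnt FROM (SELECT id FROM contents GROUP BY id)"
).fetchone()[0]

# Shape 2: exact distinct count in a single aggregate.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT id) AS cnt FROM contents"
).fetchone()[0]

print(grouped, distinct)
```

Both queries see the same three distinct ids, so the two counts match.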

Related

How to get sample data from a table or a view in Aster Teradata without using order by?

I am trying to get sample data from a table in Aster Teradata. The following code works, but it relies on ORDER BY:
SELECT "col"
FROM (SELECT "col",
Row_number()
OVER (
ORDER BY 1) AS RANK
FROM "nisha_test"."test_table") a
WHERE rank <= 10000
I want to get 10000 random rows without using ORDER BY.
If you want a sample you should use the built-in sample feature.
For Aster (or Vantage MLE, but with a slightly different syntax) there's a RandomSample operator, e.g.
SELECT * FROM RandomSample (
ON (SELECT 1) PARTITION BY 1 -- dummy data, but needed
InputTable ('nisha_test.test_table')
NumSample ('10000')
)
For Teradata there's the SAMPLE clause, e.g.
select *
from nisha_test.test_table
SAMPLE 10000
You can also use the QUALIFY clause in Teradata to remove the outer SELECT:
SELECT col
FROM nisha_test.test_table
QUALIFY ROW_NUMBER() OVER (ORDER BY NULL) <= 10000
In Teradata, I think you can use a constant value in the ORDER BY. You may even be able to exclude the ORDER BY altogether: ROW_NUMBER() OVER()
We can use the LIMIT keyword to get rows from a table or a view in Aster DB. Note that without an ORDER BY, LIMIT returns an arbitrary set of rows rather than a truly random sample.
select * from "nisha_test"."test_table" limit 10000;
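
The difference between taking the first N rows and drawing a random sample can be sketched in plain Python. This is only an illustration of the idea, not of how Aster or Teradata implement it; the `rows` list is a hypothetical stand-in for the table, and the seed is there only to make the sketch repeatable.

```python
import random

rows = list(range(100_000))  # hypothetical stand-in for the table's rows
n = 10_000

# Like LIMIT without ORDER BY: whatever rows happen to come first.
first_n = rows[:n]

# Like a sampling operator: n rows drawn uniformly, without replacement.
rng = random.Random(42)  # seeded only for repeatability of this sketch
sampled = rng.sample(rows, n)

print(len(first_n), len(sampled))
```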

How to select count distinct in DolphinDB?

I would like to execute the following SQL query in DolphinDB.
select count(distinct symbol) from loadTable("dfs://share","stock1")
It throws the exception "Can't call an aggregated function in the argument of another aggregated function call."
Is my query wrong?
You can try a subquery, as below:
select count(*) from (
select distinct symbol from loadTable("dfs://share","stock1")
) a

Using SQL generate pivot table

I'm trying to learn how to generate a pivot table with SQL. But no matter what I try, I keep getting an "ORA-00936: missing expression" error from Oracle.
Here is my query:
SELECT * FROM (SELECT HOS_PAY_ID, AMOUNT FROM HOSPITAL_PAYMENT)
PIVOT (SUM (AMOUNT) FOR AMOUNT IN ([10000],[8000],[7000],[9000],[11000],[13000]) AS TEST
ORDER BY HOS_PAY_ID;
and this is my data:
Thank You.
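Try this. Oracle's PIVOT takes plain literals in the IN list rather than SQL Server-style square brackets, and the PIVOT clause needs its closing parenthesis: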
SELECT * FROM
(
SELECT HOS_PAY_ID, AMOUNT
FROM HOSPITAL_PAYMENT
)
PIVOT (
SUM (AMOUNT) FOR AMOUNT IN (10000,8000,7000,9000,11000,13000)
)
ORDER BY HOS_PAY_ID;

SELECT statement in WHERE clause on BigQuery not working

I'm trying to run the following query on Google BigQuery:
SELECT SUM(var1) AS Revenue
FROM [table1]
WHERE timeStamp = (SELECT MAX(timeStamp) FROM [table1])
I'm getting the following error:
Error: Encountered "" at line 3, column 19. Was expecting one of:
Is this not supported in BigQuery? If so, would there be an elegant alternative?
A subselect in a comparison predicate is not supported in BigQuery legacy SQL, but you can use IN:
SELECT SUM(var1) AS Revenue
FROM [table1]
WHERE timeStamp IN (SELECT MAX(timeStamp) FROM [table1])
I would use RANK() to rank the rows by timestamp, newest first, and keep only the rows ranked 1 in the WHERE clause.
SELECT SUM(var1) AS Revenue
FROM (
SELECT var1,
RANK() OVER (ORDER BY timeStamp DESC) AS RNK
FROM [table1]
)
WHERE RNK = 1
I don't know how this performs in BigQuery, but in other database technologies it would be more efficient, as it involves only a single table scan rather than two.
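
To make the two strategies concrete, here is a minimal Python sketch. The IN form is run against a hypothetical in-memory SQLite table (a stand-in, not BigQuery), and the single-scan idea is written out as a plain loop. The sample rows are made up, with a tie on the latest timestamp to show that tied rows are all summed, which is also what filtering on rank 1 does.

```python
import sqlite3

# Hypothetical sample data: (timeStamp, var1), with a tie on the latest value.
rows = [(1, 5.0), (2, 7.0), (3, 1.5), (3, 2.5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (timeStamp INTEGER, var1 REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

# Two-scan shape: the subquery finds the latest timestamp, the outer query sums.
revenue_in = conn.execute(
    "SELECT SUM(var1) AS Revenue FROM table1 "
    "WHERE timeStamp IN (SELECT MAX(timeStamp) FROM table1)"
).fetchone()[0]


def revenue_at_latest(pairs):
    """Single pass: track the latest timestamp seen and the sum at that timestamp."""
    latest, total = None, 0.0
    for ts, value in pairs:
        if latest is None or ts > latest:
            latest, total = ts, value  # newer timestamp: restart the sum
        elif ts == latest:
            total += value  # tie on the latest timestamp: accumulate
    return total


revenue_single_pass = revenue_at_latest(rows)
print(revenue_in, revenue_single_pass)
```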

Evaluating the median absolute deviation of a set of numbers in Oracle

I'm trying to implement a procedure to evaluate the median absolute deviation of a set of numbers (usually obtained via a GROUP BY clause).
An example of a query where I'd like to use this is:
select id, mad(values) from mytable group by id;
I'm going by the aggregate function example but am a little confused since the function needs to know the median of all the numbers before all the iterations are done.
Any pointers to how such a function could be implemented would be much appreciated.
In Oracle 10g+:
SELECT MEDIAN(ABS(value - med))
FROM (
SELECT value, MEDIAN(value) OVER() AS med
FROM mytable
)
Or the same with GROUP BY:
SELECT id, MEDIAN(ABS(value - med))
FROM (
SELECT id, value, MEDIAN(value) OVER(PARTITION BY id) AS med
FROM mytable
)
GROUP BY
id
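
The two-step logic of these queries (first the median per group, then the median of the absolute deviations from it) can be sketched in Python for checking results by hand. The function names `mad` and `mad_by_group` and the sample rows are illustrative assumptions, not part of the original question.

```python
from collections import defaultdict
from statistics import median


def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    med = median(values)
    return median([abs(x - med) for x in values])


def mad_by_group(rows):
    """Compute the MAD per id from (id, value) pairs, like the GROUP BY query."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: mad(vals) for key, vals in groups.items()}


# Hypothetical sample data: (id, value)
rows = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 100), (2, 10), (2, 10), (2, 13)]
result = mad_by_group(rows)
print(result)  # {1: 1, 2: 0}
```

For id 1 the median is 3 and the absolute deviations are 2, 1, 0, 1, 97, whose median is 1; the outlier 100 barely affects the result, which is the usual reason for choosing this statistic.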