BigQuery Querying Events Table - sql

Objective: count the users that have event_name = 'Wallet'.
Problem: I limited the query's result to 100 rows as a check, so I expected the count to be 100, but count(params.value.string_value) returns 124.
Code: SELECT count(params.value.string_value) FROM `myproj.analytics_197163127.events_20190528`, UNNEST(event_params) AS params WHERE event_name = 'Wallet' AND params.key = 'UserId' LIMIT 100
Expected result: if the query returns 100 records, the count should be 100. Why does it show 124?
I hope the question is clear.

limit is applied to the result set produced by the query.
Your query is an aggregation query with no group by. Such an aggregation always returns one row. So, the limit does not affect the results.
If you want to see 100 for the result set, use a CTE or subquery:
SELECT count(params.value.string_value)
FROM (SELECT params
      FROM `myproj.analytics_197163127.events_20190528` e CROSS JOIN
           UNNEST(e.event_params) params
      WHERE e.event_name = 'Wallet' AND params.key = 'UserId'
      LIMIT 100
     ) ep

The query shows 100 records because of the limit 100 at the end:
SELECT event_date,event_timestamp,event_name, params.value.string_value
FROM myproj.analytics_197163127.events_20190528, UNNEST(event_params) as params
where event_name ='Wallet' and params.key = 'UserId'
limit 100
Remove that and check again.
LIMIT 100 caps the number of rows returned by the SQL statement. It is applied after COUNT() has been computed, so it does not influence the count in your query. Compare:
select count(*) from table limit 100
This returns a single row holding the number of rows in the table, and the limit has nothing left to cut. On the other hand:
select count(*) from (select * from table limit 100)
This returns 100 if the table has more than 100 rows; otherwise it returns the number of rows in the table.
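The difference between the two forms can be reproduced locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery; the `events` table and its 124 made-up rows are assumptions chosen to mirror the numbers in the question, not the real analytics data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the unnested event params
conn.execute("CREATE TABLE events (user_id TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [(f"user{i}",) for i in range(124)],
)

# The aggregate collapses everything to one row first; LIMIT 100 then
# has only that single row to work with.
count_then_limit = conn.execute(
    "SELECT count(user_id) FROM events LIMIT 100"
).fetchone()[0]

# Limiting inside a subquery restricts the rows before they are counted.
limit_then_count = conn.execute(
    "SELECT count(user_id) FROM (SELECT user_id FROM events LIMIT 100)"
).fetchone()[0]

print(count_then_limit, limit_then_count)
```

With 124 rows in the table, the first query counts all of them, while the second counts only the 100 rows that survive the inner limit.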

How to join a query result with a value, received from another query?

I want to calculate transaction costs in USD
for a number of most recent transactions
on the Rootstock blockchain.
I have a PostgreSQL database table with token
prices reports.token_prices
from which I select the value
of the latest available RBTC price in USD:
select tp.price_in_usd
from reports.token_prices tp
where tp.chain_id = 30
and tp.coingecko_token_id = 'rootstock'
order by tp.dt desc
limit 1
(note that tp.dt is a timestamp)
Result of the query:
16995.771
Then I have a table with all transactions,
chain_rsk_mainnet.block_transactions,
from which I select the gas fees
for the 5 most recent ones:
select
bt.fees_paid
from chain_rsk_mainnet.block_transactions bt
order by bt.block_id desc, bt.tx_offset
limit 5
(note that instead of using a timestamp, I'm using bt.block_id and bt.tx_offset for transaction order)
Result:
0
4469416300800
4469416300800
16450260000000
0
Now I want to multiply each of these numbers
by the result of the first query.
How can I do this in SQL?
Without further information, your simplest option is to convert the first query into a CTE and then cross join its single-row result to the second query.
with price_cte(price_in_usd) as
(select tp.price_in_usd
from reports.token_prices tp
where tp.chain_id = 30
and tp.coingecko_token_id = 'rootstock'
order by tp.dt desc
limit 1
)
select (bt.fees_paid * p.price_in_usd) "Fees Paid in USD"
from chain_rsk_mainnet.block_transactions bt
cross join price_cte p
order by bt.block_id desc, bt.tx_offset
limit 5;
NOTE: Not tested, no sample data nor results.
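Since the answer is untested, here is a minimal runnable sketch of the same CTE plus cross join pattern. It uses sqlite3 rather than PostgreSQL, drops the schema prefixes and filter columns, and fills the tables with invented values, so the table contents and the resulting numbers are purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the two tables from the question
conn.executescript("""
CREATE TABLE token_prices (dt TEXT, price_in_usd REAL);
INSERT INTO token_prices VALUES ('2023-01-01', 2.0), ('2023-01-02', 3.0);

CREATE TABLE block_transactions (block_id INTEGER, tx_offset INTEGER, fees_paid INTEGER);
INSERT INTO block_transactions VALUES (1, 0, 10), (2, 0, 20), (2, 1, 0), (3, 0, 40);
""")

rows = conn.execute("""
WITH price_cte AS (
    -- the latest price: exactly one row
    SELECT price_in_usd
    FROM token_prices
    ORDER BY dt DESC
    LIMIT 1
)
SELECT bt.fees_paid * p.price_in_usd AS fees_paid_in_usd
FROM block_transactions bt
CROSS JOIN price_cte p   -- one-row CTE, so the row count of bt is unchanged
ORDER BY bt.block_id DESC, bt.tx_offset
LIMIT 3
""").fetchall()

fees_in_usd = [row[0] for row in rows]
print(fees_in_usd)
```

Because the CTE yields a single row, the cross join simply attaches the price to every transaction, and the ORDER BY and LIMIT still pick the most recent transactions.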

Why is this BigQuery WHERE NOT IN statement giving no results?

We would expect that this Google BigQuery query would remove at most 10 rows of results - but this query gives us zero results - despite that table A has thousands of rows all with unique ENCNTR_IDs.
SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
SELECT ENCNTR_ID
FROM `project.dataset.table_B`
LIMIT 10
)
If we make the query self-referential, it behaves as expected: we get thousands of results with just 10 rows removed.
SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
SELECT ENCNTR_ID
FROM `project.dataset.table_A` # <--- same table name
LIMIT 10
)
What are we doing wrong? Why does the first query give us zero results rather than just remove 10 rows of results?
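Solution: use NOT EXISTS instead of NOT IN when the subquery may return NULLs. If the list contains a NULL, x NOT IN (...) can never evaluate to TRUE (it is either FALSE or NULL), so every row is filtered out. NOT EXISTS is not affected: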
SELECT *
FROM UNNEST([1,2,3]) i
WHERE NOT EXISTS (SELECT * FROM UNNEST([2,3,null]) i2 WHERE i=i2)
# 1
Previous guess - which turned out to be the cause:
SELECT *
FROM UNNEST([1,2,3]) i
WHERE i NOT IN UNNEST([2,3])
# 1
vs
SELECT *
FROM UNNEST([1,2,3]) i
WHERE i NOT IN UNNEST([2,3,null])
# This query returned no results.
Are there any nulls in that project.dataset.table_B?
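The same behaviour can be checked outside BigQuery. This is a small sketch using sqlite3, which follows the same three-valued logic for NOT IN; `table_a` and `table_b` are tiny hypothetical tables, not the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER);
CREATE TABLE table_b (id INTEGER);
INSERT INTO table_a VALUES (1), (2), (3);
INSERT INTO table_b VALUES (2), (3), (NULL);
""")

# A NULL in the subquery result makes NOT IN unable to return TRUE.
not_in = conn.execute(
    "SELECT id FROM table_a WHERE id NOT IN (SELECT id FROM table_b)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs are harmless.
not_exists = conn.execute("""
    SELECT a.id FROM table_a a
    WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.id = a.id)
""").fetchall()

# Alternative: keep NOT IN but filter the NULLs out of the subquery.
not_in_filtered = conn.execute("""
    SELECT id FROM table_a
    WHERE id NOT IN (SELECT id FROM table_b WHERE id IS NOT NULL)
""").fetchall()

print(not_in, not_exists, not_in_filtered)
```

Filtering NULLs inside the subquery is an alternative to NOT EXISTS if the NOT IN form has to be kept.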

Hive: how to get global ordering with sort by

order by uses only one reducer, so it is slow, and I'm trying to find a faster way. sort by sorts within each reducer, so how can we get a global ordering?
I found this through a search engine:
select * from
(select title,cast(price as FLOAT) p from tablename
distribute by time
sort by p desc
limit 10 ) t
order by t.p desc
limit 10;
Then I tried to validate it.
1. Get the right answer from my Hive table. There are 215666 records in the table named tablename.
SELECT title,cast(price as FLOAT) p
from tablename
WHERE dt='2020-03-08'
and price IS NOT NULL
ORDER BY p DESC
LIMIT 10
;
2. Use the query I found (inner part only).
set hive.execution.engine=mr;
set mapred.reduce.tasks=5;
SELECT title,cast(price as FLOAT) p
from tablename
WHERE dt='2020-03-08'
and price IS NOT NULL
DISTRIBUTE BY title
SORT BY p desc
LIMIT 10
;
The result is the same as the right answer!
Here are my questions:
1. Why are only 10 rows returned? There are 5 reducers and each returns 10 rows, so shouldn't there be 5*10=50?
2. If 10 rows is correct, why is the result globally ordered? These 10 rows do not all come from the same reducer, do they? The limit picks rows arbitrarily, so it should not be able to produce a global order across 5 reducers.
3. If 10 rows is correct, is the outer part of the query I found redundant?
select * from
(
) t
order by t.p desc
limit 10;
Consider using total order partitioner, see https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad#HBaseBulkLoad-PrepareRangePartitioning for details (just ignore part with HBase)
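For the top-N case in the question, the idea behind the two-level query (a local top 10 per reducer, then a final ordering over the survivors) can be illustrated in plain Python. This is only a simulation of the general technique under assumed data; it makes no claim about how Hive actually schedules or executes the query, and the round-robin partitioning is a stand-in for DISTRIBUTE BY.

```python
import heapq
import random

random.seed(0)
# Hypothetical unique prices standing in for the table rows
prices = random.sample(range(1_000_000), 5000)

NUM_REDUCERS = 5
N = 10

# Stand-in for DISTRIBUTE BY: spread rows over the reducers
partitions = [[] for _ in range(NUM_REDUCERS)]
for i, price in enumerate(prices):
    partitions[i % NUM_REDUCERS].append(price)

# Stand-in for SORT BY ... LIMIT N inside each reducer: a local top N
local_tops = [heapq.nlargest(N, part) for part in partitions]

# Stand-in for the outer ORDER BY ... LIMIT N: merge the at most
# NUM_REDUCERS * N survivors and take the global top N
merged_top = heapq.nlargest(N, (p for top in local_tops for p in top))

# Reference answer: a full global sort
expected = sorted(prices, reverse=True)[:N]
print(merged_top == expected)
```

Every row in the global top 10 is necessarily in the local top 10 of its own partition, which is why the final step only has to order 50 candidates instead of the whole table.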

How to pass the result of a select query to a function in PostgreSQL

I have a function named get_open_profit that calculates some data.
Passing input to this function does not work properly.
I have a table named results. Querying it gives:
select sum_buy_trades from results order by sum_buy_trades limit 1 : 274
select total_avg_buy from results order by sum_buy_trades limit 1 : 2019746
When I call the function like this:
select get_open_profit(274, 2019746) result is : 30192700
But if I write it like this, I get an error:
select get_open_profit(select sum_buy_trades from results order by sum_buy_trades limit 1, select total_avg_buy from results order by sum_buy_trades limit 1)
Why does it not work?
If you want to use scalar subqueries (that is, subqueries that return one value), then each needs their own parentheses:
select get_open_profit( (select sum_buy_trades
from results
order by sum_buy_trades
limit 1
),
(select total_avg_buy
from results
order by sum_buy_trades
limit 1
)
);
In this case, though, the query might be more naturally written as:
select get_open_profit( r.sum_buy_trades, r.total_avg_buy )
from (select sum_buy_trades, total_avg_buy
from results
order by sum_buy_trades
limit 1
) r;
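Both forms can be tried in a small sqlite3 sketch. The body of get_open_profit is not shown in the question, so the function registered here is a hypothetical stand-in that just adds its two arguments; only the calling syntax is the point, and the second row in `results` is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in: the real get_open_profit logic is unknown
conn.create_function("get_open_profit", 2, lambda a, b: a + b)
conn.executescript("""
CREATE TABLE results (sum_buy_trades INTEGER, total_avg_buy INTEGER);
INSERT INTO results VALUES (274, 2019746), (300, 1000);
""")

# Bare subqueries as arguments are a syntax error.
try:
    conn.execute(
        "SELECT get_open_profit("
        "SELECT sum_buy_trades FROM results ORDER BY sum_buy_trades LIMIT 1, "
        "SELECT total_avg_buy FROM results ORDER BY sum_buy_trades LIMIT 1)"
    )
    bare_subquery_failed = False
except sqlite3.OperationalError:
    bare_subquery_failed = True

# Each scalar subquery wrapped in its own parentheses is accepted.
with_parens = conn.execute("""
    SELECT get_open_profit(
        (SELECT sum_buy_trades FROM results ORDER BY sum_buy_trades LIMIT 1),
        (SELECT total_avg_buy  FROM results ORDER BY sum_buy_trades LIMIT 1)
    )
""").fetchone()[0]

# Derived-table form: one pass over results, both columns from the same row.
from_derived = conn.execute("""
    SELECT get_open_profit(r.sum_buy_trades, r.total_avg_buy)
    FROM (SELECT sum_buy_trades, total_avg_buy
          FROM results
          ORDER BY sum_buy_trades
          LIMIT 1) r
""").fetchone()[0]

print(bare_subquery_failed, with_parens, from_derived)
```

The derived-table form also guarantees that both arguments come from the same row, which two independent scalar subqueries do not if several rows tie on sum_buy_trades.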

Count query giving wrong column name error

select COUNT(analysed) from Results where analysed="True"
I want to display count of rows in which analysed value is true.
However, my query gives the error: "The multi-part identifier "Results.runId" could not be bound.".
This is the actual query:
select ((SELECT COUNT(*) AS 'Count'
FROM Results
WHERE Analysed = 'True')/failCount) as PercentAnalysed
from Runs
where Runs.runId=Analysed.runId
My table schema is:
The value I want for a particular runId is: (the number of entries where analysed=true)/failCount
EDIT : How to merge these two queries?
i) select runId,Runs.prodId,prodDate,prodName,buildNumber,totalCount as TotalTestCases,(passCount*100)/(passCount+failCount) as PassPercent,
passCount,failCount,runOwner from Runs,Product where Runs.prodId=Product.prodId
ii) select (cast(counts.Count as decimal(10,4)) / cast(failCount as decimal(10,4))) as PercentAnalysed
from Runs
inner join
(
SELECT COUNT(*) AS 'Count', runId
FROM Results
WHERE Analysed = 'True'
GROUP BY runId
) counts
on counts.runId = Runs.runId
I tried this:
select runId,Runs.prodId,prodDate,prodName,buildNumber,totalCount as TotalTestCases,(passCount*100)/(passCount+failCount) as PassPercent,
passCount,failCount,runOwner,counts.runId,(cast(counts.Count as decimal(10,4)) / cast(failCount as decimal(10,4))) as PercentAnalysed
from Runs,Product
inner join
(
SELECT COUNT(*) AS 'Count', runId
FROM Results
WHERE Analysed = 'True'
GROUP BY runId
) counts
on counts.runId = Runs.runId
where Runs.prodId=Product.prodId
but it gives an error.
Your problems are arising from improper joining of tables. You need information from both Runs and Results, but they aren't combined properly in your query. You have the right idea with a nested subquery, but it's in the wrong spot. You're also referencing the Analysed table in the outer where clause, but it hasn't been included in the from clause.
Try this instead:
select (cast(counts.Count as decimal(10,4)) / cast(failCount as decimal(10,4))) as PercentAnalysed
from Runs
inner join
(
SELECT COUNT(*) AS 'Count', runId
FROM Results
WHERE Analysed = 'True'
GROUP BY runId
) counts
on counts.runId = Runs.runId
I've set this up as an inner join to eliminate any runs which don't have analysed results; you can change it to a left join if you want those rows, but will need to add code to handle the null case. I've also added casts to the two numbers, because otherwise the query will perform integer division and truncate any fractional amounts.
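The effect of the casts and of the join type can be seen in a small sqlite3 sketch. The tables below are reduced, hypothetical versions of Runs and Results with invented rows; SQLite is only a stand-in for SQL Server, so the cast targets REAL instead of decimal(10,4), and the count alias is `cnt` to avoid quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Runs (runId INTEGER, failCount INTEGER);
CREATE TABLE Results (runId INTEGER, Analysed TEXT);
INSERT INTO Runs VALUES (1, 4), (2, 5);
-- run 1 has three analysed results, run 2 has none
INSERT INTO Results VALUES (1, 'True'), (1, 'True'), (1, 'True'), (1, 'False');
""")

query = """
SELECT Runs.runId, {expr} AS PercentAnalysed
FROM Runs
{join} JOIN (
    SELECT COUNT(*) AS cnt, runId
    FROM Results
    WHERE Analysed = 'True'
    GROUP BY runId
) counts ON counts.runId = Runs.runId
ORDER BY Runs.runId
"""

# Integer / integer truncates the fraction.
truncated = conn.execute(
    query.format(expr="counts.cnt / failCount", join="INNER")
).fetchall()

# Casting one operand keeps the fractional part.
exact = conn.execute(
    query.format(expr="CAST(counts.cnt AS REAL) / failCount", join="INNER")
).fetchall()

# LEFT JOIN keeps runs without analysed results; COALESCE handles the NULL count.
with_empty_runs = conn.execute(
    query.format(
        expr="CAST(COALESCE(counts.cnt, 0) AS REAL) / failCount", join="LEFT"
    )
).fetchall()

print(truncated, exact, with_empty_runs)
```

Run 2 disappears from the inner-join results because it has no analysed rows; the left-join variant brings it back with a ratio of zero.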
I'd try the following query:
SELECT COUNT(*) AS 'Count'
FROM Results
WHERE Analysed = 'True'
This will count all of your rows where Analysed is 'True'. This should work if the datatype of your Analysed column is either BIT (Boolean) or STRING(VARCHAR, NVARCHAR).
Use CASE in Count
SELECT COUNT(CASE WHEN analysed='True' THEN analysed END) [COUNT]
FROM Results
select COUNT(*) from Results where analysed='True'