BigQuery - Ethereum dataset. smart contracts from ethereum balances - google-bigquery

a question for BigQuery experts. I know here is this request in order to query ethereum balances:
#standardSQL
select *
from `bigquery-public-data.crypto_ethereum.balances`
order by eth_balance desc
I rummaged through documentation and tried, but can't find how to filter it so that it gets me in responce only smart contracts, without EOA, with a filter from the largest balance to the smallest. Maybe someone knows the correct query and can prompt. Thanks

Related

Sessions count in GA4 is quite bigger than events_intraday table in bigQuery

Issue
For example, GA4's sessions number of November is 559,555 take from (Report / Acquistion / Traffic Acquistion). But if I calculate session number from bigquery table, it is 468,991.
There are big different. I guess bigquery's number close to our actual traffic and google analytics 360 number.
Actually this is start from we implemented ecommerce event in our site. But we not sure this is related or not.
question
GA4's screen number and data in bigquery should be same (or close)??
How can we solve this issue? We would like to have close number.
FYI
We used this for calculating session number in bigquery.
SELECT
HLL_COUNT.EXTRACT(
HLL_COUNT.INIT(
CONCAT(
user_pseudo_id,
(SELECT `value` FROM UNNEST(event_params) WHERE key = 'ga_session_id' LIMIT 1).int_value),
12)) AS session_count,
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
https://developers.google.com/analytics/blog/2022/hll
I'd really appreciated it if you guys give us an advice.

How to get count of active users grouped by version? (from Firebase using BigQuery)

Problem description
I'm trying to get the information of how many active users I have in my app separated by the 2 or 3 latest versions of the app.
I've read some documentations and other stack questions but none of them was solving my problem (and some others had outdated solutions).
Examples of solutions I tried:
https://support.google.com/firebase/answer/9037342?hl=en#zippy=%2Cin-this-article (N-day active users - This solution is probably the best, but even changing the dataset name correctly and removing the _TABLE_SUFFIX conditions it kept returning me a single column n_day_active_users_count = 0 )
https://gist.github.com/sbrissenden/cab9bd3a043f1879ded605cba5005457
(this is not returning any values for me, didn't understand why)
How can I get count of active Users from google analytics (this is not a good fit because the other part of my job is already done and generating charts on Data Studio, so using REST API would be harder to join my two solutions - one from BigQuery and other from REST API)
Discrepancies on "active users metric" between Firebase Analytics dashboard and BigQuery export (this one uses outdated variables)
So, I started to write the solution out of my head, and this is what I get so far:
SELECT
user_pseudo_id,
app_info.version,
ROUND(COUNT(DISTINCT user_pseudo_id) OVER (PARTITION BY app_info.version) / SUM(COUNT(DISTINCT user_pseudo_id)) OVER (), 3) AS adoption
FROM `projet-table.events_*`
WHERE platform = 'ANDROID'
GROUP BY app_info.version, user_pseudo_id
ORDER BY app_info.version
Conclusions
I'm not sure if my logic is correct, but I think I can use user_pseudo_id to calculate it, right? The general idea is: user_of_X_version/users_of_all_versions.
(And the results are kinda close to the ones showing at Google Analytics web platform - I believe the difference is due to the date that I turned on the BigQuery integration. But.... I'd like some confirmation on that: if my logic is correct).
The biggest problem in my code now is that I cannot write it without grouping by user_pseudo_id (Because when I don't BigQuery says: "SELECT list expression references column
user_pseudo_id which is neither grouped nor aggregated at [2:3]") and that's why I have duplicated rows in the query result
Also, about the first link of examples... Is there any possibility of a record with engagement_time_msec param with value < 0? If not, why is that condition in the where clause?

How to get the total amount sent to an Etherum address

I'm a noob at Ethereum and need to make statistics from blockchain data.
I have a full node up and running with RPC ready, but can't find a clue in the doc on how to query :
Total number of transactions in/out for a contract/address
Total number of ether sent/received for a contract/address
The only solution I found was to parse all blocks and transactions in a double for loop and the performance is way too slow to do this.
Can someone help me ?
TL;DR : How do you do getTotalAmountSentTo(address) and getTotalNumberOfTransactionsFor(address) in Ethereum?
If you are using web3JS you can use web3.eth.filter function to find all the transactions you are interested in.

Confirmation of how to calculate bigquery query costs

I want to double check what I need to look at when assessing query costs for BigQuery. I've found the quoted price per TB here which says $5 per TB, but for precisely 1 TB of what? I have been assuming up until now (before it seemed to matter) that the relevant number would be that which the BigQuery UI outputs above the results, so for this example query:
...in this case 2.34GB. So as a fraction of a terabyte and multiplied by $5 this would cost around 1.2cents assuming I'd used up my allowance for the month.
Can anyone confirm that I'm correct? Checking this before I process something I think could rack up some non-negligible costs for once. I should say I've never been stung with a sizeable BigQuery bill before it seems difficult to do.
Can anyone confirm that I'm correct?
Confirmed
Please note - BigQuery UI in fact uses DryRun which only estimates Total Bytes Processed . The final cost is based on Bytes Billed which reflects some nuances - minimum 10MB per each table involved in query as an example. You can see more details here - https://cloud.google.com/bigquery/pricing#on_demand_pricing
I know I am late but this might help you.
If you are pushing your audit logs to another dataset, you can do below on that dataset.
WITH data as
(
SELECT
protopayload_auditlog.authenticationInfo.principalEmail as principalEmail,
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent
FROM
`administrative-audit-trail.gcp_audit_logs.cloudaudit_googleapis_com_data_access_20190227`
)
SELECT
principalEmail,
FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM
data
WHERE
jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY principalEmail
ORDER BY Estimated_USD_Cost DESC
Reference: https://cloud.google.com/bigquery/docs/reference/auditlogs/

Is Bigtable (or BigQuery) the right platform for correlation analysis of logs?

I'm faced with the challenge of analysing different system logfiles based on following requirements:
several hundred systems
millions of logs every day in different formats
Beside many other objectives my biggest challenge is a realtime correlation analysis of all incoming logs on all current system logs and also on partially historical log events.
Currently we're focusing on MongoDB, ElasticSearch, Hadoop, ... to meet this challenge.
On the other hand I've read some interesting things about Google Bigtable and Bigquery.
So my question is, is Bigtable and/or Bigquery a solution worth looking at, in order to do this realtime analysis ?
I've no experience with these two products, so I'm hoping for some tips whether these Google solutions could be an alternative for my requirements.
THX & BR
bdriven
EDIT:
too broad. you need to show actual analisis you need to make. bigquery will be much much cheaper that homemade with nosql
Our goal is, to develop a system, which is able to generate warnings based on current log events (or a combination of different log events) and their past interactions on other systems behavior.
Therefore we have to be able to do fast correlation analysis for current events against huge amounts of unstructured historical data.
I know that this requirement description is probably not the most specific one, but we're right at the beginning of this project.
So my goal with this question is to get some arguments for our next team meeting, whether we should consider to take a closer look at Bigtable / Bigquery or not.
One of my favorite features of BigQuery is its ability to run correlations.
Here's a correlations with BigQuery tutorial I wrote a couple years ago: http://nbviewer.ipython.org/gist/fhoffa/6459195
For example, to rank and find the most correlated airports in terms of flight delays:
SELECT a.departure_state, b.departure_state, corr(a.avg, b.avg) corr, COUNT(*) c
FROM
(SELECT date, departure_state, AVG(departure_delay) avg , COUNT(*) c
FROM [bigquery-samples:airline_ontime_data.flights]
GROUP BY 1,2 HAVING c > 5
) a
JOIN
(SELECT date, departure_state ,
AVG(departure_delay) avg, COUNT(*) c FROM [bigquery-samples:airline_ontime_data.flights]
GROUP BY 1,2 HAVING c > 5 ) b
ON a.date=b.date
WHERE a.departure_state < b.departure_state
GROUP EACH BY 1, 2
HAVING c > 5
ORDER BY corr DESC;
Try it yourself in the next 5 minutes! A quick getting started tutorial: https://www.reddit.com/r/bigquery/comments/3dg9le/analyzing_50_billion_wikipedia_pageviews_in_5/