Metrics available in BigQuery data from Ethereum


I want to find metrics available in BigQuery data from the Ethereum blockchain. Are any of the following available in BigQuery?
numbers of transactions on a blockchain
number of addresses
different types of oracles
possibility to count smart contracts
number of smart contract dedicated to Chainlink

Check out the BigQuery public datasets.
Here is a dataset for Ethereum; there is also one for Ethereum Classic.
https://console.cloud.google.com/marketplace/product/ethereum/crypto-ethereum-blockchain?project=ez-dbt-test
It includes tables for balances, blocks, logs, contracts, transactions and more. They appear to be raw data, so you would have to define the metrics you are referring to yourself in SQL. The link above does give some example queries.
One word of caution, though: understand BigQuery pricing before you query these tables. Some are large enough to rack up a sizeable bill quickly.
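As a rough sketch of what "defining the metrics via SQL" could look like, the Python helper below builds one query string per metric. The dataset path, table names (`transactions`, `contracts`) and column names (`block_timestamp`, `from_address`, `to_address`) are assumptions about the public dataset's schema and should be checked in the console; the function name `metric_queries` is made up for illustration. Every query is restricted to a single day to keep the scanned volume down. Nothing here identifies oracles or Chainlink contracts; that would need a list of known contract addresses from elsewhere.

```python
from datetime import date

# Assumed location of the public dataset; verify in the BigQuery console.
DATASET = "bigquery-public-data.crypto_ethereum"


def metric_queries(day: str) -> dict:
    """Return one SQL string per metric, each limited to a single day."""
    # Validate the day (raises ValueError on bad input) before embedding it.
    day = date.fromisoformat(day).isoformat()
    day_filter = f"DATE(block_timestamp) = DATE '{day}'"
    tx = f"`{DATASET}.transactions`"
    return {
        # Number of transactions mined on that day.
        "transaction_count": f"SELECT COUNT(*) AS n FROM {tx} WHERE {day_filter}",
        # Distinct addresses that sent or received a transaction that day.
        "active_address_count": (
            "SELECT COUNT(DISTINCT address) AS n FROM ("
            f" SELECT from_address AS address FROM {tx} WHERE {day_filter}"
            " UNION ALL"
            f" SELECT to_address AS address FROM {tx} WHERE {day_filter})"
        ),
        # Smart contracts created on that day.
        "contracts_created": (
            f"SELECT COUNT(*) AS n FROM `{DATASET}.contracts` WHERE {day_filter}"
        ),
    }


queries = metric_queries("2021-01-01")
```

Each string can be pasted into the BigQuery console or passed to a client library.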

Related

If I pull the data from BigQuery, will Google charge me or not for sending the data to Data Studio?


That depends. BigQuery uses a consumption-based pricing model unless you have purchased slots. That means any time you run a query you are using resources and are charged at the defined rate, $5 per TB of data scanned.
There are a few caveats. The first TB of data scanned each month is free, and not every query scans data, because results may be served from cache. If you are concerned about cost, one option is the BigQuery sandbox: it will not charge you, but its functionality is limited.
https://cloud.google.com/bigquery/docs/quickstarts/quickstart-cloud-console
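To make the arithmetic concrete, here is a minimal Python sketch of the on-demand model described above. The $5 rate and the 1 TB free allowance are the figures quoted in this answer and may not match current pricing; treating a TB as 2**40 bytes and the function name `estimate_query_cost` are assumptions made for illustration.

```python
TB = 2 ** 40                    # assumption: one billed TB is 2**40 bytes
PRICE_PER_TB_USD = 5.0          # rate quoted above; check current pricing
FREE_BYTES_PER_MONTH = 1 * TB   # free allowance quoted above


def estimate_query_cost(bytes_scanned, bytes_used_this_month=0):
    """Estimate the USD cost of one query, given usage so far this month."""
    # Billable bytes before and after this query, once the free tier is used up.
    before = max(0, bytes_used_this_month - FREE_BYTES_PER_MONTH)
    after = max(0, bytes_used_this_month + bytes_scanned - FREE_BYTES_PER_MONTH)
    return (after - before) / TB * PRICE_PER_TB_USD


# A 2 TB scan in a fresh month: the first TB is free, the second is billed.
cost = estimate_query_cost(2 * TB)
```

The number of bytes a query would scan can be obtained up front with a dry run, so an estimate like this can be made before anything is billed.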

BigQuery runs queries that you pay for.
Data Studio runs queries on BigQuery that you pay for.
There is no cost of transfer between the two systems.

Custom Dataflow Template - BigQuery to CloudStorage - documentation? general solution advice?

I am consuming a BigQuery table datasource. It is 'unbounded' as it is updated via a batch process. It contains session keyed reporting data from server logs where each row captures a request. I do not have access to the original log data and must consume the BigQuery table.
I would like to develop a custom Java based google Dataflow template using beam api with the goals of :
collating keyed session objects
deriving session level metrics
deriving filterable window level metrics based on session metrics, e.g., percentage of sessions with errors during previous window and percentage of errors per filtered property, e.g., error percentage per device type
writing the result as a formatted/compressed report to cloud storage.
This seems like a fairly standard use case? In my research thus far, I have not yet found a perfect example and still have not been able to determine the best practice approach for certain basic requirements. I would very much appreciate any pointers. Keywords to research? Documentation, tutorials. Is my current thinking right or do I need to consider other approaches?
Questions :
beam windowing and BigQuery I/O Connector - I see that I can specify a window type and size via beam api. My BQ table has a timestamp field per row. Am I supposed to somehow pass this via configuration or is it supposed to be automagic? Do I need to do this manually via a SQL query somehow? This is not clear to me.
fixed time windowing vs. session windowing functions - examples are basic and do not address any edge cases. My sessions can last hours. There are potentially 100ks plus session keys per window. Would session windowing support this?
BigQuery vs. BigQueryClientStorage - The difference is not clear to me. I understand that BQCS provides a performance benefit, but do I have to store BQ data in a preliminary step to use this? Or can I simply query my table directly via BQCS and it takes care of that for me?

For number 1, you can simply apply the WithTimestamps transform before windowing; it assigns a timestamp to each of your items. Here are some Python examples.
For number 2 the documentation states:
Session windowing applies on a per-key basis and is useful for data that is irregularly distributed with respect to time. [...] If data arrives after the minimum specified gap duration time, this initiates the start of a new window.
Also, in the Java documentation you can only specify a minimum gap duration, not a maximum. This means that session windowing can easily support sessions that last for hours. After all, the only thing it does is put a watermark on your data and keep it alive.
For number 3, the difference between the BigQuery I/O connector and the BigQuery Storage API is that the latter (an experimental feature as of 01/2020) accesses the stored data directly, without the logical passage through BigQuery (BigQuery data isn't stored in BigQuery). This means that with the Storage API, as the documentation states:
you can't use it to read data sources such as federated tables and logical views
Also, there are different limits and quotas between the two methods, that you can find in the documentation link above.
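To illustrate the session-window behaviour quoted for number 2, here is a plain-Python sketch. It does not use the Beam API; it only mimics the per-key gap rule, and then derives the kind of session-level metric the question asks about. The event tuple layout `(key, timestamp_seconds, is_error)` and both function names are hypothetical. The sketch assumes a new session starts when the gap since the previous event is at least the gap duration.

```python
from collections import defaultdict


def sessionize(events, gap_seconds):
    """Group (key, timestamp_seconds, is_error) events into per-key sessions."""
    by_key = defaultdict(list)
    for key, ts, is_error in events:
        by_key[key].append((ts, is_error))

    sessions = []
    for key, rows in by_key.items():
        rows.sort()  # order each key's events by timestamp
        current = None
        for ts, is_error in rows:
            # Start a new session on the first event or after a long enough gap.
            if current is None or ts - current["end"] >= gap_seconds:
                current = {"key": key, "start": ts, "end": ts,
                           "requests": 0, "errors": 0}
                sessions.append(current)
            current["end"] = ts
            current["requests"] += 1
            current["errors"] += int(is_error)
    return sessions


def error_session_percentage(sessions):
    """Percentage of sessions that contain at least one error."""
    if not sessions:
        return 0.0
    with_errors = sum(1 for s in sessions if s["errors"] > 0)
    return 100.0 * with_errors / len(sessions)


events = [("a", 0, False), ("a", 100, True), ("a", 5000, False), ("b", 10, False)]
sessions = sessionize(events, gap_seconds=1800)
pct = error_session_percentage(sessions)
```

Note that a session has no upper bound on its length here either: it keeps growing as long as events for the key keep arriving within the gap.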

How to download a lot of data from Bloomberg?

I am trying to download as much information from Bloomberg for as many securities as I can. This is for a machine learning project, and I would like to have the data reside locally, rather than querying for it each time I need it. I know how to download information for a few fields for a specified security.
Unfortunately, I am pretty new to Bloomberg. I've taken a look at the excel add-in, and it doesn't allow me to specify that I want ALL securities and ALL their data fields.
Is there a way to blanket download data from Bloomberg via excel? Or do I have to do this programmatically. Appreciate any help on how to do this.

Such a request is unreasonable. Bloomberg has tens of thousands of fields for each security, from fundamental fields like sales, through technical analysis like Bollinger bands, to whether the CEO is a woman and whether the company abides by Islamic law. I doubt all of these interest you.
Also, some fields come in "flavors". Bloomberg allows you to set arguments when requesting a field; these are called "overrides". For example, when asking for an analyst recommendation, you could specify whether you're interested in yearly or quarterly recommendations, and how you want the recommendation consensus calculated. Are you interested in GAAP or IFRS reporting? What type of insider buys do you want to consider? I hope I'm making it clear: the possibilities are endless.
My recommendation, when approaching a project like the one you're describing, is to think in advance about which aspects of the security you want to focus on. Are you looking for value? Growth? Technical analysis? News? Then "sit down" with a Bloomberg rep and ask which fields apply to that aspect. Then download those fields.
Also, try to reduce your universe of securities. Bloomberg has data for hundreds of thousands of equities. The total number of securities (including non-equities) is probably many millions. You should reduce that universe to the securities that interest you (only EU? only US? only above a certain market capitalization?). This could also make your research more applicable to real life. What I mean is that if you find out that certain behavior indicates a stock is going to go up, but you can't buy that stock, then that's not that interesting.
I hope this helps, even if it doesn't really answer the question.

They have specific "Data Licence" products available if you or your company can fork out the (likely high) sums of money for bulk data dumps. Otherwise, as has been mentioned, there are daily and monthly restrictions on how much data (and what type of data) can be downloaded via their API. These limits are not very high at all, so, by the sounds of your request, this will take a long and frustrating time. I think the daily limit is something like 500,000 hits, where one hit is one item of data, e.g. a price for one stock. So if you wanted to download only share price data for the 2,500 or so US stocks, you'd only manage 200 days for each stock before hitting the limit. They also monitor your usage, so if you were consistently hitting 500,000 each day, you'd get a phone call.
One tedious way around this is to manually retrieve data via the clipboard. You can load a chart of something (GP), right click and copy data to clipboard. This stores all data points that are on display, which you can dump in excel. This is obviously an extremely inefficient method but, crucially, has no impact on your data limits.
Unfortunately you will find no answer to your (somewhat unreasonable) request, without getting your wallet out. Data ain't cheap. Especially not "all securities and all data".

You say you want to download "ALL securities and ALL their data fields." You can't.
You should go to WAPI on your terminal and look at the terms of service.
From the "extended rules:"
There is a daily limit to the number of hits you can make to our data servers via the Bloomberg API. A "hit" is defined as one request for a single security/field pairing. Therefore, if you request static data for 5 fields and 10 securities, that will translate into a total of 50 hits.
There is a limit to the number of unique securities you can monitor at any one time, where the number of fields is unlimited via the Bloomberg API.
There is a monthly limit that is based on the volume of unique securities being requested per category (i.e. historical, derived, intraday, pricing, descriptive) from our data servers via the Bloomberg API.
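The hit arithmetic in these answers can be sketched in a few lines of Python. The 500,000 daily limit is only the figure guessed at in an earlier answer, not a confirmed number, and both function names are made up for illustration.

```python
def hits(securities, fields, observations=1):
    """One hit per security/field pairing, per data point requested."""
    return securities * fields * observations


def max_observations(daily_limit, securities, fields=1):
    """Data points per security/field that fit within the daily hit limit."""
    return daily_limit // (securities * fields)


# 5 fields for 10 securities, as in the quoted rule.
static_hits = hits(securities=10, fields=5)

# Price history for about 2,500 stocks under an assumed 500,000-hit limit.
days_of_history = max_observations(daily_limit=500_000, securities=2500)
```

Every extra field divides the available history further, which is why narrowing both the field list and the universe of securities matters.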

How can I retrieve data from SAP EWM from the query 0WM_MP17_Q0001?

I wanted to know how one can retrieve data from the various query tools available in SAP EWM.
I found the queries in the following link: Extended Warehouse Management - SAP Library
The description of the query 0WM_MP17_Q0001 says:
0WM_MP17_Q0001
You can use this query to see the number and duration of confirmed warehouse orders by day, week, or month. This allows you to see when typical warehouse trends are changing, and thus take actions such as:
Adjusting work schedules to meet demands
Hiring new workers, or letting existing workers go
Requesting budget for expenses such as extra equipment
And I need to retrieve the data for the reasons above.
However, is there a transaction code that I can run to get this report? How can I retrieve this data?

I think you already asked this question on SDN and got a response, see your message and response.
This is BI content.

Data warehouse, data update strategy with Bigquery

We have an MIS that stores all the information about customers, accounts, transactions, etc. We are building a data warehouse with BigQuery.
I am pretty new to this topic. Should we
1. extract ALL customers' latest information every day and append it to a BigQuery table with a timestamp,
2. or extract only the customers whose information was updated that day?
The first solution uses a lot of storage, takes time to upload, and produces lots of duplicates, but it makes the queries very clear to me. For the second solution, given a specific date, how can I get the latest record as of that day?
Similar for Account data, an example of simplified Account table, only 4 fields here.
AccountId, CustomerId, AccountBalance, Date
If I need to build a report or chart of a group of customers' AccountBalance for every day, I need to know the balance of each account on every specific date. So should I extract each account record every day, even if it is the same as the previous day, or can I extract an account only when its balance has changed?
What is the best solution, or what would you suggest? I prefer the second one because there are no duplicates, but how can I construct the query in BigQuery, and will performance be an issue?
What else should I consider? Any recommendation for me to read?

When designing a DWH, you need to start from the business questions and translate them into KPIs, measures, dimensions, etc.
When you have those in place...
you choose the technology based on questions such as the following (and many more):
Who are your users? At what frequency and at what resolution do they consume the data? What are your data sources? Are they structured? What are the data volumes? What is your data quality? How often does your data structure change?
When choosing technology you need to think about the following: ETL, DB, scheduling, backup, UI, permissions management, etc.
After you have all of those defined, the data schema design is fairly straightforward and is derived from the purpose of the DWH and the limits of your technology.
You have raised some of the points to consider, but the answer depends on your needs and is not tied to a specific DB technology.
I am afraid your question is too general to be answered without a deep understanding of your needs.
Referring to your comment below:
How reliable is your source data? Are you interested in analyzing trends or just snapshots? Does your source system allow "select all" operations? What are the data volumes? What resources does your source allow for extraction (locks, bandwidth, etc.)?
If you just need a daily snapshot of the current balance, and your source system imposes no limits, it would be much simpler to run a daily snapshot.
This way you don't need to manage increments or handle data integrity issues, system discrepancies, etc. However, this approach might have an undesired impact on your source system and your network costs.
If you do have resource limits and you choose the incremental ETL approach, you can either
create a "changes log" table and query it, using row_number() to find the latest record per account,
or you can construct a copy of the source accounts table, merging each day's changes into the existing table.
Each approach has its own trade-offs in simplicity, cost, and resource consumption.
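As an illustration of the row_number() approach, here is a small Python sketch that runs the query against an in-memory SQLite database (window functions need SQLite 3.25 or later). The table name `account_changes` and its columns are hypothetical stand-ins for the AccountId, CustomerId, AccountBalance and Date fields in the question. The same window-function pattern should carry over to BigQuery, though the parameter syntax and date types differ there.

```python
import sqlite3

# Latest change per account on or before a given date.
LATEST_PER_ACCOUNT_SQL = """
SELECT account_id, customer_id, balance, change_date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY account_id
               ORDER BY change_date DESC
           ) AS rn
    FROM account_changes
    WHERE change_date <= :as_of
)
WHERE rn = 1
ORDER BY account_id
"""


def latest_balances(conn, as_of):
    """Return each account's most recent row as of the given ISO date."""
    return conn.execute(LATEST_PER_ACCOUNT_SQL, {"as_of": as_of}).fetchall()


# Hypothetical changes log: a row is written only when a balance changes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_changes "
    "(account_id INTEGER, customer_id INTEGER, balance REAL, change_date TEXT)"
)
conn.executemany(
    "INSERT INTO account_changes VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100.0, "2020-01-01"),
        (1, 10, 150.0, "2020-01-03"),
        (2, 11, 50.0, "2020-01-02"),
    ],
)

balances = latest_balances(conn, "2020-01-02")
```

Because the filter is "on or before" the requested date, an account whose balance did not change that day still shows its last known balance, which is what a daily report over a changes log needs.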
Hope this helps
