Bitcoin in BigQuery: blockchain analytics on public data - wrong/omitted results - google-bigquery

I am trying to use the Bitcoin in BigQuery public dataset to extract bitcoin transactions related to some addresses.
I tried the query below to retrieve this information, but I always get empty results.
SELECT
timestamp,
inputs.input_pubkey_base58 AS input_key,
outputs.output_pubkey_base58 AS output_key,
outputs.output_satoshis as satoshis
FROM `bigquery-public-data.bitcoin_blockchain.transactions`
JOIN UNNEST (inputs) AS inputs
JOIN UNNEST (outputs) AS outputs
WHERE outputs.output_pubkey_base58 = '16XMrZ2GNsrUBv3qNZtvvPKna2PKFuq8gQ'
AND outputs.output_satoshis >= 0
AND inputs.input_pubkey_base58 IS NOT NULL
AND outputs.output_pubkey_base58 IS NOT NULL
GROUP BY timestamp, input_key, output_key, satoshis
Also, when I change the address to one with more transactions, I get results but with some transactions omitted.
I do not know if I am writing something wrong or what. Can anyone help, please?
Thanks
I saw a similar question in a previous post and tried what was suggested but it did not work:
BigQuery Blockchain Dataset is Missing Data?
I am expecting to get 3 addresses when trying
WHERE outputs.output_pubkey_base58 = '16XMrZ2GNsrUBv3qNZtvvPKna2PKFuq8gQ'
and 2 addresses when I change the condition to consider the address on the input side.

I am expecting to get 3 addresses when trying WHERE outputs.output_pubkey_base58 = '16XMrZ2GNsrUBv3qNZtvvPKna2PKFuq8gQ'
Your expectations are correct, but the issue here is that the dataset you are using is outdated and has been migrated to bigquery-public-data.crypto_bitcoin. Updates to the data are being sent to the new version of this dataset, whose schema is better aligned with our other cryptocurrency offerings.
To get started, run the query below to confirm that the expected data is there:
#standardSQL
SELECT COUNT(*)
FROM `bigquery-public-data.crypto_bitcoin.transactions`,
UNNEST(outputs) AS output,
UNNEST(output.addresses) AS address
WHERE address = '16XMrZ2GNsrUBv3qNZtvvPKna2PKFuq8gQ'
with output:
Row  f0_
1    3
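Once that confirms the data is present, here is an untested sketch of your original query rewritten against the new dataset. Note the schema differs: crypto_bitcoin stores addresses as a repeated addresses field inside each input/output, the timestamp column is block_timestamp, and the output amount is output.value, so the names below follow that schema rather than your original one:

```sql
SELECT
  block_timestamp,
  input_address,
  output_address,
  output.value AS satoshis
FROM `bigquery-public-data.crypto_bitcoin.transactions`,
  UNNEST(inputs) AS input,
  UNNEST(input.addresses) AS input_address,
  UNNEST(outputs) AS output,
  UNNEST(output.addresses) AS output_address
WHERE output_address = '16XMrZ2GNsrUBv3qNZtvvPKna2PKFuq8gQ'
```

Each UNNEST is a correlated cross join, so one row is produced per input-address/output-address combination, which you can then aggregate as needed.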

Related

BigQuery: UNNEST and JOIN the result of a remote search function using a user query from Data Studio

I am trying to implement a custom text search in a Looker Studio (formerly Data Studio) dashboard using a custom SQL query as the data source and a parameter which will be a sentence to search on.
The sentence will be passed to a BQ remote function and the cloud function will return matching results.
So far I have mocked the cloud function to return a string of matching IDs, as the BQ remote function expects the result length to match the call length:
'{"replies":["ID1,ID2,ID3"]}'
I have tried the following to get the results back initially:
#standardSQL
WITH query AS(SELECT "test sentence query" AS user_query)
SELECT
S.Description,
SPLIT(`data`.search_function(user_query)) as ID
FROM query
LEFT JOIN `data.record_info` AS S
ON ID = S.ID
The SPLIT IDs all come out in a single row (when I run the query without the left join). In addition, I can't seem to get the result unnested with the Description column pulled in; I get the error:
Expecting 14552 results but got back 1
Is this method of search in Data Studio going to be possible?
Posting this here in case anyone else needs a solution to this problem:
WITH Query AS(SELECT "test sentence query" AS user_query)
SELECT
S.Description,
ID
FROM
Query,
UNNEST(SPLIT(`data`.search_function(user_query))) as ID
LEFT JOIN `data.record_info` AS S
ON ID = S.ID
The main difference here is the inclusion of the UNNEST function, since SPLIT on its own won't separate the input into multiple rows, even if it appears to do so.
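To see the distinction in isolation, compare these two small snippets: SPLIT by itself returns a single ARRAY&lt;STRING&gt; value per input row, while wrapping it in UNNEST expands that array into one row per element:

```sql
-- One row containing one array value: ["ID1", "ID2", "ID3"]
SELECT SPLIT('ID1,ID2,ID3') AS ids;

-- Three rows, one per element: ID1 / ID2 / ID3
SELECT id
FROM UNNEST(SPLIT('ID1,ID2,ID3')) AS id;
```

This is why the LEFT JOIN in the first attempt failed: there was no row-level ID to join on until the array was unnested.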

Looking for guidance on my sql query that apparently includes an array

Quite new to SQL, and looking for help on what I'm doing wrong.
With the code below, I'm getting the error "cannot access field value on a value with type array<struct> at [1:30]".
The "audience size value" comes from the dataset public_campaigns, whereas the engagement rate comes from the dataset public_instagram_channels.
I think the dataset that's causing the issue here is public_campaigns.
Thanks in advance for your help!
SELECT creator_audience_size.value, AVG(engagement_rate/1000000) AS avgER
FROM `public_instagram_channels` AS pic
JOIN `public_campaigns`AS pc
ON pic.id=pc.id
GROUP BY creator_audience_size.value
This is to do with the type of one of the columns using REPEATED mode.
In Google BigQuery you have to use UNNEST on these repeated columns to get their individual values in the result set.
It's unclear from what you've posted which column is the repeated type - looking at the table definitions for public_instagram_channels and public_campaigns will reveal this: look for the word REPEATED in the Mode column of the table definition. That said, your error message points at creator_audience_size, so that is the likely candidate.
Once you've found it, include UNNEST in your query, as per this untested example:
SELECT cas.value, AVG(engagement_rate/1000000) AS avgER
FROM `public_instagram_channels` AS pic
JOIN `public_campaigns` AS pc ON pic.id = pc.id
CROSS JOIN UNNEST(pc.creator_audience_size) AS cas
GROUP BY cas.value

Quick one on Big Query SQL-Ecommerce Data

I am trying to replicate the Google Analytics data in BigQuery but couldn't do that.
Basically I am using Custom Dimension 40 (user subscription status),
but I am getting wrong numbers in BQ.
Can someone help me with this?
I am using this query but couldn't work out the exact one.
SELECT
(SELECT value FROM hits.customDimensions where index=40) AS UserStatus,
COUNT(hits.transaction.transactionId) AS Unique_Purchases
FROM
`xxxxxxxxxxxxx.ga_sessions_2020*` AS GA, --new rollup
UNNEST(GA.hits) AS hits
WHERE
(SELECT value FROM hits.customDimensions where index=40) IN ("xx001","xxx002")
GROUP BY 1
I am getting this from BigQuery, which is wrong.
I have checked the dates also but don't know why it's wrong.
Your question is rather unclear, but because you want something to be unique and the numbers are mysteriously not what you want, I would suggest using COUNT(DISTINCT):
COUNT(DISTINCT hits.transaction.transactionId) AS Unique_Purchases
As far as I understand, you imported Google Analytics data into BigQuery and you are trying to group by the custom dimension with index 40 and values ("xx001","xxx002") in order to know how many hit transactions were performed for these dimension values.
Replicating your scenario and trying to execute the query you posted, I got an error.
However, I created a query that could help with your use case. First it selects the transactionId and dimension value, keeping only rows where the transactionId is not null and the index equals 40; then it groups by the dimension value, filtered to the values "xx001" and "xx002".
WITH tx AS (
SELECT
HIT.transaction.transactionId,
CD.value
FROM
`xxxxxxxxxxxxx.ga_sessions_2020*` AS GA,
UNNEST(GA.hits) AS HIT,
UNNEST(HIT.customDimensions) AS CD
WHERE
HIT.transaction.transactionId IS NOT NULL
AND
CD.index = 40
)
SELECT tx.value AS UserStatus, count(tx.transactionId) AS Unique_Purchases
FROM tx
WHERE tx.value IN ("xx001","xx002")
GROUP BY tx.value
For further details about the format and schema of the data that is imported into BigQuery, I found this document.

Stream Analytics Left outer join Not Producing Rows

What I am trying to do:
I want to "throttle" an input stream to its output. Specifically, as I receive multiple similar inputs, I only want to produce an output if one hasn't already been produced in the last N hours.
For example, the input could be thought of as "send an email", but I will get dozens/hundreds of those events. I only want to send an email if I haven't already sent one in the last N hours (or have never sent one).
See the final example here: https://learn.microsoft.com/en-us/stream-analytics-query/join-azure-stream-analytics#examples for something similar to what I am trying to do
What my setup looks like:
There are two inputs to my query:
Ingress: this is the "raw" input stream
Throttled-Sent: this is just a consumer group off of my output stream
My query is as follows:
WITH
AllEvents as (
/* This CTE is here only because we can't seem to use GetMetadataPropertyValue in a join clause, so we "materialize" it here for use later */
SELECT
*,
GetMetadataPropertyValue([Ingress], '[User].[Type]') AS Type,
GetMetadataPropertyValue([Ingress], '[User].[NotifyType]') AS NotifyType,
GetMetadataPropertyValue([Ingress], '[User].[NotifyEntityId]') AS NotifyEntityId
FROM
[Ingress]
),
UseableEvents as (
SELECT *
FROM AllEvents
WHERE NotifyEntityId IS NOT NULL
),
AlreadySentEvents as (
/* These are the events that would have been previously output (referenced here via a consumer group). We want to capture these to make sure we are not sending new events when an older "already-sent" event can be found */
SELECT
*,
GetMetadataPropertyValue([Throttled-Sent], '[User].[Type]') AS Type,
GetMetadataPropertyValue([Throttled-Sent], '[User].[NotifyType]') AS NotifyType,
GetMetadataPropertyValue([Throttled-Sent], '[User].[NotifyEntityId]') AS NotifyEntityId
FROM
[Throttled-Sent]
)
SELECT i.*
INTO Throttled
FROM UseableEvents i
/* Left join our sent events, looking for those within a particular time frame */
LEFT OUTER JOIN AlreadySentEvents s
ON i.Type = s.Type
AND i.NotifyType = s.NotifyType
AND i.NotifyEntityId = s.NotifyEntityId
AND DATEDIFF(hour, i, s) BETWEEN 0 AND 4
WHERE s.Type IS NULL /* The is null here is for only returning those Ingress rows that have no corresponding AlreadySentEvents row */
The results I'm seeing:
This query is producing no rows to the output. However, I believe it should be producing something because the Throttled-Sent input has zero rows to begin with. I have validated that my Ingress events are showing up (by simply adjusting the query to remove the left join and checking the results).
I feel like my problem is probably linked to one of the following areas:
I can't have an input that is a consumer group off of the output (but I don't know why that wouldn't be allowed)
My datediff usage/understanding is incorrect
Appreciate any help/guidance/direction!
For throttling, I would recommend looking at the ISFIRST function; it might be an easier solution that does not require reading from the output.
For the current query, I think the order of the DATEDIFF parameters needs to be changed, as s comes before i: DATEDIFF(hour, s, i) BETWEEN 0 AND 4
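For reference, a minimal untested sketch of the ISFIRST approach, reusing the metadata CTE from the question and assuming NotifyEntityId alone identifies a throttle group (add Type and NotifyType to the PARTITION BY if they matter):

```sql
WITH AllEvents AS (
    SELECT
        *,
        GetMetadataPropertyValue([Ingress], '[User].[NotifyEntityId]') AS NotifyEntityId
    FROM
        [Ingress]
)
SELECT *
INTO Throttled
FROM AllEvents
WHERE NotifyEntityId IS NOT NULL
  AND ISFIRST(hour, 4) OVER (PARTITION BY NotifyEntityId) = 1
```

One caveat: ISFIRST works on aligned time intervals rather than a sliding "last N hours" window, so an event near an interval boundary can pass shortly after the previous one; whether that is acceptable depends on how strict the throttle needs to be.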

SQL add up rows in a column

I'm running SQL queries in Orion Report Writer for SolarWinds NetFlow Traffic Analyzer and am trying to add up data usage for specific conversations coming from the same general source - in this case, Netflix. I've made some progress with my query.
SELECT TOP 10000 FlowCorrelation_Source_FlowCorrelation.FullHostname AS Full_Hostname_A,
SUM(NetflowConversationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
SUM(NetflowConversationSummary.TotalBytes) AS Total_Bytes
FROM
((NetflowConversationSummary LEFT OUTER JOIN FlowCorrelation FlowCorrelation_Source_FlowCorrelation ON (NetflowConversationSummary.SourceIPSort = FlowCorrelation_Source_FlowCorrelation.IPAddressSort)) LEFT OUTER JOIN FlowCorrelation FlowCorrelation_Dest_FlowCorrelation ON (NetflowConversationSummary.DestIPSort = FlowCorrelation_Dest_FlowCorrelation.IPAddressSort)) INNER JOIN Nodes ON (NetflowConversationSummary.NodeID = Nodes.NodeID)
WHERE
( DateTime BETWEEN 41539 AND 41570 )
AND
(
(FlowCorrelation_Source_FlowCorrelation.FullHostname LIKE 'ipv4_1.lagg0%')
)
GROUP BY FlowCorrelation_Source_FlowCorrelation.FullHostname, FlowCorrelation_Dest_FlowCorrelation.FullHostname, Nodes.Caption, Nodes.NodeID, FlowCorrelation_Source_FlowCorrelation.IPAddress
So I've got an output that filters everything but Netflix sessions (Full_Hostname_A) and their total usage for each session (SUM_of_Bytes_Transferred).
I want to add up SUM_of_Bytes_Transferred to get a total usage for all the Netflix sessions
listed, which will output to Total_Bytes. I created the column Total_Bytes, but don't know how to output a total to it.
In response to a request for clarification, here is the output from the above query:
I want the Total_Bytes column to be added up into one number.
I have no familiarity with the reporting tool you are using.
From reading your post, I think you want the first 2 columns of data that you've got, plus, at a later point in the report, a single figure: the sum of the Total_Bytes column you're already producing.
Your reporting tool probably has some means of totalling a column, but you may need to ask the tool's support people how to do that.
Aside from this, if you can find a way of calling a separate query in a later section of the report, or if you can embed a new report inside your existing report after the detail section and use that to run a separate query, then you should be able to get the data you want with this:
SELECT Sum(Total_Bytes) AS [Total Total Bytes]
FROM ( yourExistingQuery ) x
yourExistingQuery means the query you've already got, in full (it doesn't have to be put on one line); the parentheses are required, and so is the "x" (the latter provides a syntax-required name for the virtual table which your query defines).
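Alternatively, if the report can only run the one query and the underlying database (SQL Server, for Orion) supports window functions, a windowed aggregate can stamp the grand total onto every row of the existing result - an untested sketch, showing only the changed SELECT list:

```sql
SELECT TOP 10000
    FlowCorrelation_Source_FlowCorrelation.FullHostname AS Full_Hostname_A,
    SUM(NetflowConversationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
    -- OVER () is evaluated after GROUP BY, so this is the sum across all groups
    SUM(SUM(NetflowConversationSummary.TotalBytes)) OVER () AS Total_Bytes
-- FROM / WHERE / GROUP BY exactly as in the original query
```

This fills the existing Total_Bytes column with one repeated grand-total figure rather than producing a separate summary row, which may be easier to place in the report layout.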
Hope this helps.