EXTERNAL_QUERY suddenly started returning BYTES values instead of STRING - google-bigquery

I'm using a query that joins external data through EXTERNAL_QUERY(), like this
(this is just an example, not the actual query):
SELECT
ext.program_id,
SUM(price) AS total_price
FROM a_dataset.purchases pcs
LEFT OUTER JOIN (
SELECT
program_id,
version
FROM EXTERNAL_QUERY(
'CONNECTION_INFO',
'SELECT program_id, version FROM products'
)
) ext ON pcs.program_id = ext.program_id
GROUP BY ext.program_id
This query used to work in my environment.
However, as of today, this part:
EXTERNAL_QUERY(
'CONNECTION_INFO',
'SELECT program_id, version FROM products'
)
has started returning BYTES values that look encrypted,
and the query now fails with this message:
No matching signature for operator = for argument types: STRING, BYTES. Supported signatures: ANY = ANY at [37:9]
'CONNECTION_INFO' refers to a Cloud SQL read replica instance of MySQL.
Do you have any idea how to fix this, or why the return values started to change?

Related

How to get From & To Ip Address from CIDR BigQuery

BigQuery provides updated geoip2 public dataset here [bigquery-publicdata -> geolite2 -> ipv4_city_blocks] which contains network column with IPv4 CIDR values.
How do I convert the CIDR values in the network column via BigQuery SQL (and not via a utility outside BigQuery) into start and end IP address values, so that I can find out whether an IP address is within a range or not? It would be helpful if you could provide the query to obtain the IP range for a CIDR value in the table.
Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION cidrToRange(CIDR STRING)
RETURNS STRUCT<start_IP STRING, end_IP STRING>
LANGUAGE js AS """
// The start address is the part of the CIDR string before the '/'.
var beg = CIDR.substr(0, CIDR.indexOf('/'));
var off = (1<<(32-parseInt(CIDR.substr(CIDR.indexOf('/')+1))))-1;
var sub = beg.split('.').map(function(a){return parseInt(a)});
var buf = new ArrayBuffer(4);
var i32 = new Uint32Array(buf);
i32[0] = (sub[0]<<24) + (sub[1]<<16) + (sub[2]<<8) + (sub[3]) + off;
var end = Array.apply([],new Uint8Array(buf)).reverse().join('.');
return {start_IP: beg, end_IP: end};
""";
SELECT network, IP_range.*
FROM `bigquery-public-data.geolite2.ipv4_city_blocks`,
UNNEST([cidrToRange(network)]) IP_range
It took about 60 seconds to process all 3,037,858 rows.
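If you want to spot-check a few of the ranges the UDF returns, the same arithmetic can be reproduced outside BigQuery. Below is a minimal sketch using Python's standard `ipaddress` module; the helper name `cidr_to_range` is made up for this illustration and is not part of the answer above. It assumes the network values are already normalised (no host bits set), which is what you would expect from the GeoLite2 blocks table.

```python
import ipaddress
from typing import Tuple


def cidr_to_range(cidr: str) -> Tuple[str, str]:
    """Return (first_ip, last_ip) for an IPv4 CIDR block such as '1.0.0.0/24'."""
    # strict=False tolerates host bits in the address part instead of raising;
    # for normalised input the network address equals the part before the '/'.
    net = ipaddress.ip_network(cidr, strict=False)
    # broadcast_address is the network address with all host bits set,
    # i.e. the last address of the block.
    return str(net.network_address), str(net.broadcast_address)


print(cidr_to_range("1.0.0.0/24"))  # ('1.0.0.0', '1.0.0.255')
```

This is only a cross-check for individual rows; the conversion of the whole table still happens in BigQuery.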
This query will do the job:
# replace with your source of IP addresses
# here I'm using the same Wikipedia set from the previous article
WITH source_of_ip_addresses AS (
SELECT REGEXP_REPLACE(contributor_ip, 'xxx', '0') ip, COUNT(*) c
FROM `publicdata.samples.wikipedia`
WHERE contributor_ip IS NOT null
GROUP BY 1
)
SELECT city_name, SUM(c) c, ST_GeogPoint(AVG(longitude), AVG(latitude)) point
FROM (
SELECT ip, city_name, c, latitude, longitude, geoname_id
FROM (
SELECT *, NET.SAFE_IP_FROM_STRING(ip) & NET.IP_NET_MASK(4, mask) network_bin
FROM source_of_ip_addresses, UNNEST(GENERATE_ARRAY(9,32)) mask
WHERE BYTE_LENGTH(NET.SAFE_IP_FROM_STRING(ip)) = 4
)
JOIN `fh-bigquery.geocode.201806_geolite2_city_ipv4_locs`
USING (network_bin, mask)
)
WHERE city_name IS NOT null
GROUP BY city_name, geoname_id
ORDER BY c DESC
LIMIT 5000
Find more details on:
https://towardsdatascience.com/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds-e9e652480bd2
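The core trick in that query is to avoid a range join: every IP is masked with each prefix length from 9 to 32, and the resulting (network_bin, mask) pair is used as an equality join key. Here is a rough Python sketch of the same idea, with a dictionary standing in for the locations table. The names `candidate_keys`, `lookup` and the sample block are hypothetical and only serve the illustration.

```python
import ipaddress
from typing import Dict, Iterator, Optional, Tuple

Key = Tuple[int, int]  # (network address as an integer, prefix length)


def candidate_keys(ip: str) -> Iterator[Key]:
    """Yield one (masked address, prefix length) pair per prefix length 9..32."""
    addr = int(ipaddress.IPv4Address(ip))
    for mask in range(9, 33):
        # Same role as NET.IP_NET_MASK(4, mask): the top `mask` bits set.
        net_mask = (0xFFFFFFFF << (32 - mask)) & 0xFFFFFFFF
        yield addr & net_mask, mask


def lookup(ip: str, blocks: Dict[Key, str]) -> Optional[str]:
    """Return the city for the block containing `ip`, or None if there is none."""
    for key in candidate_keys(ip):
        if key in blocks:  # equality lookup, like JOIN ... USING (network_bin, mask)
            return blocks[key]
    return None


# Hypothetical sample data: one /24 block from a documentation address range.
blocks = {(int(ipaddress.IPv4Address("203.0.113.0")), 24): "Example City"}
print(lookup("203.0.113.77", blocks))  # Example City
```

Each address produces 24 candidate keys, which is why the SQL version cross joins with GENERATE_ARRAY(9,32) before the join.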
The first thing to check is whether such a function already exists, so please refer to the BigQuery Functions and Operators documentation.
If it does not, you need to use Standard SQL User-Defined Functions (UDFs), which let you create a function using another SQL expression or another programming language, such as JavaScript.
Keep in mind that when you use a JavaScript UDF, BigQuery initializes a JavaScript environment with the function's contents on every shard of execution. There is no optimization to avoid loading the environment, so it can slow down the query.
According to the GeoIP2 City and Country CSV Databases site, there is a utility to convert the 'network' column to start/end IPs or start/end integers. Refer to the GitHub site for details.
January 2023 solution
Just wanted to respond to Felipe's comment here. I'm not sure why he is suggesting an alternate solution using Snowflake, as his existing solution works just fine. The only difference is that you need to create the dataset yourself.
I managed to solve this by going through the exact same steps listed in Felipe's very helpful original blog article:
Sign up to MaxMind and download the GeoLite2 databases.
Download the two CSV files GeoLite2-City-Blocks-IPv4.csv and GeoLite2-City-Locations-en.csv, upload them to a GCP bucket, and create tables from them. I lazily used the BQ automated schema feature and it worked just fine :)
Simply create a geolite2_locs table using a query similar to the one below (just keep or drop your columns as required for your use-case)
CREATE OR REPLACE TABLE `dataset.geolite2_locs` OPTIONS() AS (
SELECT
ip_ref.network,
NET.IP_FROM_STRING(REGEXP_EXTRACT(ip_ref.network, r'(.*)/' )) network_bin,
CAST(REGEXP_EXTRACT(ip_ref.network, r'/(.*)' ) AS INT64) mask,
ip_ref.geoname_id,
city_ref.continent_name as continent_name,
city_ref.country_name as country_name,
city_ref.city_name as city_name,
city_ref.subdivision_1_name as subdivision_1_name,
city_ref.subdivision_2_name as subdivision_2_name,
ip_ref.latitude as latitude,
ip_ref.longitude as longitude,
FROM `geolite2`.`geolite2-ipv4` ip_ref LEFT JOIN `geolite2`.`geolite2-city-en` city_ref USING (geoname_id)
);
Adapt the query in Felipe's guide or just replace the fh-bigquery.geocode.201806_geolite2_city_ipv4_locs with your new table in his answer above.
This should take you an hour at most to get going. Hope it helps.
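For anyone unsure what the network_bin and mask columns end up holding: the query splits each network value at the slash, stores the address part as 4 raw bytes and the prefix length as an integer. A small Python sketch of that split follows; `split_network` is a name invented for this example.

```python
import ipaddress
from typing import Tuple


def split_network(network: str) -> Tuple[bytes, int]:
    """Split '1.0.0.0/24' into (4-byte packed address, prefix length)."""
    address, _, prefix = network.partition("/")
    # .packed gives the address as 4 bytes in network byte order, which
    # plays the role of NET.IP_FROM_STRING in the query above.
    return ipaddress.IPv4Address(address).packed, int(prefix)


print(split_network("1.0.0.0/24"))  # (b'\x01\x00\x00\x00', 24)
```

Having both columns is what makes the equality join on (network_bin, mask) possible later.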

Azure Stream analytics default field values for missing fields

I have some JSON values coming in from an IoT data source to Stream Analytics. A later version of the JSON will have extra fields, but older versions will not have them. Is there a way I can detect that a field is missing and set a default value for it before it reaches the output? For example, they would like to add an e.OSversion field which, if it does not exist, would default to "unknown". The output happens to be a SQL database.
WITH MetricsData AS
(
SELECT * FROM [MetricsData]
PARTITION BY LID
WHERE RecordType='UseList'
)
SELECT
e.LID as LID
,e.EventEnqueuedUtcTime AS SubmitDate
,CAST (e.UsedDate as DateTime) AS UsedDate
,e.Version as Version
,caUsedList.ArrayValue.Module AS Module
,caUsedList.ArrayValue.UsageCount AS UsedCount
INTO
[ModuleUseOutput]
FROM
Usagedata as e
CROSS APPLY getElements (e.UsedList) as caUsedList
Please use a CASE ... WHEN expression.
Example:
select j.id, case when j.version is null then 'unknown' else j.version end as version
from jsoninput as j
Alternatively, you could just set the default value directly on the SQL database column.
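If you want to try the defaulting rule on sample payloads before deploying the job, here is a minimal Python sketch of the same logic as the CASE expression: a field that is absent or null gets the default, anything else is kept. The function name `with_defaults` and the sample payloads are assumptions made for this example; only the OSversion field and the "unknown" default come from the question.

```python
import json
from typing import Any, Dict


def with_defaults(raw: str, defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Parse a JSON event and fill in defaults for missing or null fields."""
    event = json.loads(raw)
    # Drop explicit nulls so they are treated the same as missing fields,
    # mirroring "CASE WHEN x IS NULL THEN default ELSE x END".
    present = {key: value for key, value in event.items() if value is not None}
    return {**defaults, **present}


defaults = {"OSversion": "unknown"}
print(with_defaults('{"LID": 1}', defaults))
print(with_defaults('{"LID": 2, "OSversion": "10.0"}', defaults))
```

The first event comes from an older device without the field and gets "unknown"; the second keeps its own value.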

How to Join (equal) two data columns that belongs to the String and Double types respectively in Alibaba MaxCompute?

I am not authorized to share the table details.
For instance, let me consider an Example:
I am trying to join two columns of String and Double data types respectively in Alibaba MaxCompute.
In the earlier version of MaxCompute, the String and Double data types were implicitly converted to the bigint data type at the cost of precision, so 1.1 = “1” held in a join condition.
However, the same code does not work in the new version of MaxCompute. The syntax is as follows:
SELECT * FROM t1 JOIN t2 ON t1.double_value = t2.string_value;
Error:
WARNING:[1,48] implicit conversion from STRING to DOUBLE, potential data loss, use CAST function to suppress
What is the correct syntax to do the join operation in Alibaba MaxCompute V2?
I did a bit of digging and it seems like this SQL command is the recommended way of getting around this issue.
select * from t1 join t2 on t1.double_value = cast(t2.string_value as double);
As the error message suggests:
SELECT *
FROM t1 JOIN
t2
ON CAST(t1.double_value as string) = t2.string_value;
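Note that the two casts above are not interchangeable: casting the string to a double compares numbers, while casting the double to a string compares text. The Python sketch below illustrates the difference. Python's own float-to-string formatting stands in for MaxCompute's here, which is an assumption (the exact text MaxCompute produces for a double may differ), and both function names are made up.

```python
def matches_as_double(double_value: float, string_value: str) -> bool:
    """Join predicate in the style of: d = CAST(s AS DOUBLE)."""
    try:
        return double_value == float(string_value)
    except ValueError:
        # Non-numeric text never matches in this sketch.
        return False


def matches_as_string(double_value: float, string_value: str) -> bool:
    """Join predicate in the style of: CAST(d AS STRING) = s."""
    return str(double_value) == string_value


# "1.10" is the same number as 1.1 but not the same text.
print(matches_as_double(1.1, "1.10"), matches_as_string(1.1, "1.10"))
```

So pick the direction of the cast based on whether values like "1.10", "1.1" and "01.1" should be treated as the same key.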

Firebird - How to use "(? is null)" for selecting blank parameters

I am working with an Excel Report linked to a Firebird 2.0 DB and I have various parameters linked to cell references that correspond to drop down lists.
If a parameter is left blank, I want to select all the possible options. I am trying to accomplish this by putting ... WHERE... (? is null), as described in http://www.firebirdsql.org/refdocs/langrefupd25-sqlnull.html , but I get an "Invalid Data Type" error.
I found some Firebird documentation (http://www.firebirdfaq.org/faq92/) where it talks about this error, but it states that "The solution is to cast the value to appropriate datatype, so that all queries return the same datatype for each column." and I'm not quite sure what that means in my situation.
SELECT C.COSTS_ID,
C.AREA_ID,
S.SUB_NUMBER,
S.SUB_NAME,
TP.PHASE_CODE,
TP.PHASE_DESC,
TI.ITEM_NUMBER,
TI.ITEM_DESC,
TI.ORDER_UNIT,
C.UNIT_COST,
TI.TLPE_ITEMS_ID
FROM TLPE_ITEMS TI
INNER JOIN TLPE_PHASES TP ON TI.TLPE_PHASES_ID = TP.TLPE_PHASES_ID
LEFT OUTER JOIN COSTS C ON C.TLPE_ITEMS_ID = TI.TLPE_ITEMS_ID
LEFT OUTER JOIN AREA A ON C.AREA_ID = A.AREA_ID
LEFT OUTER JOIN SUPPLIER S ON C.SUB_NUMBER = S.SUB_NUMBER
WHERE ((C.AREA_ID = 1 OR C.AREA_ID = ?) OR (? IS NULL))
AND ((S.SUB_NUMBER = ?) OR (? IS NULL))
AND ((TI.ITEM_NUMBER = ?) OR (? IS NULL))
AND ((TP.PHASE_CODE STARTING WITH ?) OR (? IS NULL))
ORDER BY TP.PHASE_CODE
Any help is greatly appreciated.
If you are not using Firebird 2.5 (but version 2.0 or higher), or if you are using a driver that doesn't support the SQL_NULL datatype introduced in Firebird 2.5, then you need to use an explicit CAST, e.g.:
SELECT *
FROM TLPE_ITEMS TI
WHERE TI.ITEM_NUMBER = ? OR CAST(? AS INTEGER) IS NULL
This will identify the second parameter as an INTEGER to the driver (and to Firebird), allowing you to set it to NULL.
When the FAQ you reference says to cast the value to an appropriate datatype, it means that you should not cast to a data type that can produce conversion errors when the value isn't null.
In my example I cast to INTEGER, but if the values are actually strings and you use, say, "IX0109302" as a value, you will get a conversion error because it isn't a valid INTEGER. To prevent that, you would need to cast to a (VAR)CHAR of sufficient length (otherwise you get a truncation error).
If you are using Firebird 1.5 or earlier, this trick will not work (see CORE-778). In that case you might get away with something like TI.ITEM_NUMBER = ? OR 'T' = ?, where you set the second parameter to either 'T' (true) or 'F' (false) to signal whether you want everything or not; this means that you need to move the NULL detection to your calling code.
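One detail that is easy to miss with the optional-parameter pattern above: each ? is a separate positional parameter, so the calling code has to bind the same value twice. The sketch below shows this with Python's built-in sqlite3 module, purely as a stand-in because it runs anywhere. It is not Firebird, SQLite does not actually need the CAST, and the table and function names are made up for the example.

```python
import sqlite3
from typing import List, Optional


def find_items(conn: sqlite3.Connection, item_number: Optional[str] = None) -> List[str]:
    """Return matching item numbers, or all of them when item_number is None."""
    sql = (
        "SELECT item_number FROM items "
        "WHERE item_number = ? OR CAST(? AS TEXT) IS NULL "
        "ORDER BY item_number"
    )
    # The same value is bound twice: once for the comparison and once for
    # the NULL check. Cast to text because the values are strings.
    return [row[0] for row in conn.execute(sql, (item_number, item_number))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_number TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("IX0109302",), ("IX0109303",)])

print(find_items(conn))               # no filter: every row
print(find_items(conn, "IX0109302"))  # filter applied: one row
```

In the Excel report this means mapping each cell reference to both parameter positions of its condition.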

Querying Netezza via SquirrelSQL returns WKT geometry in unknown encoding

I am using SquirrelSQL to write and execute SQL queries on a Netezza database. Using Netezza's spatial capabilities (which are essentially the same as those of PostGIS) I've executed a query and returned a single result that contains a geometry. Here's the query, for reference:
SELECT t.SHAPE
FROM (SELECT * FROM OS_AB_PLUS..E12_ADDRESSBASE WHERE POSTCODE = 'RH1 6NE'
AND PAO_START_NUMBER = '14') as a, OS_TOPO..TOPOGRAPHICAREA as t
WHERE inza..ST_Within(a.shape, t.shape) = TRUE
My issue is that the geometry field, which should contain the polygon coordinates represented as Well-Known Text (WKT), looks instead like this:
g¹ AË Affff¬0AÍÌÌÌî0AÒ 3333Ê AÍÌÌÌî0A» Aë0Afffæ» AffffÒ0A¹ AÒ0A333³¹ A3333¿0AŒ AffffÀ0AÍÌÌLŒ Affff¬0AË A¯0AëQ8Ê A3333í0A3333Ê AÍÌÌÌî0A
I can't seem to find anywhere in SquirrelSQL to specify the encoding of VARCHAR columns, and I've seen the column returned without encoding issues in Aginity (another SQL client). Any suggestions on how to proceed would be much appreciated.
Turns out my issue was not really related to encoding at all. The human-readable version of the geometry in a PostGIS-like database will only be returned when ST_AsText is used in the select statement. So my SQL query becomes:
SELECT inza..ST_AsText(t.SHAPE)
FROM (SELECT * FROM OS_AB_PLUS..E12_ADDRESSBASE WHERE POSTCODE = 'RH1 6NE'
AND PAO_START_NUMBER = '14') as a, OS_TOPO..TOPOGRAPHICAREA as t
WHERE inza..ST_Within(a.shape, t.shape) = TRUE
Which returns, as intended:
POLYGON ((526696.15 148931.9, 526703.94 148932.34, 526703.8 148935.2, 526705.5 148935.3, 526705.4 148937.8, 526695.9 148937.35, 526696.15 148931.9))
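As a side note on why the raw column looked like a wrong encoding: binary coordinates rendered as text produce exactly this kind of output. The Python sketch below packs two of the coordinates from the result above as little-endian 8-byte doubles and then views the bytes as Latin-1 text. That Netezza stores or returns the geometry in this particular layout is an assumption on my part and is not confirmed anywhere in this thread; the sketch only shows the general effect.

```python
import struct

# Two coordinate values taken from the POLYGON result above.
coords = (526696.15, 148931.9)

# Pack as two little-endian IEEE-754 doubles (8 bytes each).
raw = struct.pack("<2d", *coords)

# Viewing the same bytes as text gives unreadable characters.
as_text = raw.decode("latin-1")
print(repr(as_text))

# Interpreting the bytes as doubles again recovers the numbers, which is
# the kind of decoding a function like ST_AsText does for you.
decoded = struct.unpack("<2d", raw)
print(decoded)
```

The packed form of 526696.15 ends in a space followed by "A", a pair that also shows up repeatedly in the garbled output in the question.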