I'm trying to cast a JSON object from a table column to varchar with Trino. I tried following the docs here, but it throws an error every time: https://trino.io/docs/current/functions/json.html
If someone could post an example of how to do this, that would be great.
Use json_format/json_parse to handle json object conversions instead of casting:
select json_parse('{"property": 1}') objstring_to_json, json_format(json '{"property": 2}') jsonobj_to_string
Output:
objstring_to_json | jsonobj_to_string
{"property":1} | {"property":2}
I am trying to run a query in Snowflake to convert a GeoJSON object into Snowflake's baked-in geospatial data types:
SELECT id, -- some other columns
TRY_TO_GEOGRAPHY(raw_row) -- raw row is my GeoJSON object
FROM ...
The Snowflake documentation for TRY_TO_GEOGRAPHY says:
Parses an input and returns a value of type GEOGRAPHY.
This function is essentially identical to TO_GEOGRAPHY except that it returns NULL when TO_GEOGRAPHY would issue an error.
However when I run the query I get the following very uninformative error (no mention of column, record or even SQL line that produced the error):
Failed to cast variant value "null" to OBJECT
I pinpointed TRY_TO_GEOGRAPHY as the cause, since the query works if I comment out that line. Shouldn't Snowflake return NULL in this case? How can I fix the problem? And if there is faulty data in my table, is there any way to find the rows where the function fails? (The table has a nine-digit row count, so I can't check it manually.)
Is your raw_row of VARIANT type or VARCHAR?
The Snowflake documentation for TRY_TO_GEOGRAPHY also mentions (emphasis mine)
<variant_expression>
The argument must be an OBJECT in GeoJSON format.
So if you try to parse a VARIANT value that is not an "OBJECT in GeoJSON format", it will indeed raise an error:
SELECT
TRY_TO_GEOGRAPHY('' :: VARCHAR), -- NULL
TRY_TO_GEOGRAPHY('' :: VARIANT), -- error!
TRY_TO_GEOGRAPHY('{"geometry": null, "type": "Feature"}' :: VARCHAR), -- NULL
TRY_TO_GEOGRAPHY('{"geometry": null, "type": "Feature"}' :: VARIANT); -- error!
However, one could argue that the documentation is not all that clear on this point, and there are cases where the VARCHAR and VARIANT inputs behave the same. Surprisingly:
SELECT
-- just removing {"type": "Feature"} from the error above
TRY_TO_GEOGRAPHY('{"geometry": null}' :: VARIANT), -- NULL
-- just removing {"geometry": null} from the error above
-- a Feature without "geometry" and "properties" is ill-defined by the spec
TRY_TO_GEOGRAPHY('{"type": "Feature"}' :: VARIANT); -- NULL
So it's unclear whether this is a bug in Snowflake or intended behavior. Note that TRY_TO_GEOGRAPHY works as expected if the object is valid GeoJSON but the geography itself is invalid. For example, a polygon whose edges cross each other returns NULL with the TRY_ version but fails with an error with TO_GEOGRAPHY.
How to fix it: the simplest way is to convert the VARIANT column to VARCHAR, which is a bit silly but works. You can then query for the NULL values produced and inspect the GeoJSON values behind them, hoping there are few enough to check manually.
You may need to convert the object under the key geometry.
The GeoJSON may have other keys, such as feature and properties. Your table can be designed with these first-level keys as columns (in my case I just drop feature). If the geometry is null, TRY_TO_GEOGRAPHY will return NULL. For the other keys there is no need for a conversion (just use type OBJECT or VARIANT).
The problem seems to be trying to get a GEOGRAPHY value from a non-viable source object (e.g., one where the object under geometry is null).
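For the manual-check step, a small client-side triage can narrow down which exported values are likely to be the problem. The following is only a sketch in Python, not Snowflake behavior: the function name `classify_geojson` and its labels are hypothetical, and it assumes the raw values have been exported as text (for example after casting the VARIANT column to VARCHAR).

```python
import json


def classify_geojson(raw):
    """Return a coarse, hypothetical label for one exported GeoJSON value."""
    if raw is None:
        return "sql_null"
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):
        # Empty strings and other non-JSON text end up here.
        return "not_json"
    if not isinstance(value, dict):
        # Covers a JSON null, numbers, strings and arrays.
        return "not_an_object"
    if value.get("type") == "Feature" and not isinstance(value.get("geometry"), dict):
        # Matches the {"geometry": null, "type": "Feature"} case shown above.
        return "feature_without_geometry"
    return "looks_convertible"


samples = [
    "",
    "null",
    '{"geometry": null, "type": "Feature"}',
    '{"type": "Point", "coordinates": [30.0, 10.0]}',
]
labels = [classify_geojson(s) for s in samples]
```

Rows labeled anything other than "looks_convertible" are the ones worth inspecting first. A value that passes this check can still be an invalid geography.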
I was trying to query the duration of a WWE wrestler in the matches which he won and here I have successfully retrieved the data:
I would like to remove those zeros and I tried the following piece of code:
strftime(%H:%M:%S, R1.time_in_match)
which doesn't work and gives the following error:
': near "%": syntax error
I imported the data from a data frame that stores the time as a time object, and wrote it to the database as type DATETIME. The typeof function applied to the "time_in_match" field returns TEXT.
Is there any way to format the time?
Thanks!
SQLite provides 2 functions for this case:
strftime('%H:%M:%S', R1.time_in_match)
and simpler:
time(R1.time_in_match)
because:
The time() function returns the time as HH:MM:SS
(from Date And Time Functions)
See the demo.
Your expression works as intended if you surround the format specifier with single quotes:
strftime('%H:%M:%S', time_in_match)
You could also just use string functions here:
substr(time_in_match, 1, 8)
Demo on DB Fiddle
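The three suggestions above can be compared side by side with Python's built-in sqlite3 module. This is a minimal sketch: the table name `results` and the sample value '00:12:34.000' are assumptions, since the question does not show the stored text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (time_in_match TEXT)")
# Assumed sample value; the real column may hold a different text layout.
conn.execute("INSERT INTO results VALUES ('00:12:34.000')")


def fetch_one(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]


with_strftime = fetch_one("SELECT strftime('%H:%M:%S', time_in_match) FROM results")
with_time = fetch_one("SELECT time(time_in_match) FROM results")
with_substr = fetch_one("SELECT substr(time_in_match, 1, 8) FROM results")

# The unquoted format string from the question is rejected by the SQL parser.
try:
    conn.execute("SELECT strftime(%H:%M:%S, time_in_match) FROM results")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

print(with_strftime, with_time, with_substr, unquoted_error)
```

Note that substr only trims characters, so it relies on every value starting with HH:MM:SS, while strftime and time parse the value and return NULL for text they cannot interpret.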
I'm trying to convert a varchar value to a geometry using the Presto function ST_GeometryFromText, but I get this error:
Error running query: Invalid WKT: 0101000020E6100000000000407BF43E40000000203CFA3D40
The point is stored in the database as a varchar in this format: 0101000020E6100000000000407BF43E40000000203CFA3D40. I just want to convert it to a geometry point. I used to use ST_X and ST_Y in PostGIS, but after migrating to Presto these two functions don't work on this value.
If you run
SELECT ST_AsText('0101000020E6100000000000407BF43E40000000203CFA3D40')
...in PostGIS, you will get the point POINT(30.955005645752 29.9774799346924).
If you want to separate longitude and latitude, run:
SELECT ST_X(ST_AsText('0101000020E6100000000000407BF43E40000000203CFA3D40')), ST_Y(ST_AsText('0101000020E6100000000000407BF43E40000000203CFA3D40'))
I found that the answer is simply to remove the part of the string '20E61000'. Once it is removed, the function works fine. I used this expression:
ST_GEOMFROMBINARY(FROM_HEX(REPLACE('0101000020E6100000000000407BF43E40000000203CFA3D40', '20E61000')))
and it worked fine. I also verified the result using the Python Shapely wkb function.
I had the same issue...had to massage the heck out of this thing.
select ST_GeomFromBinary(from_hex(to_utf8(replace(geom,'20E61000')))) as geom from ...
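Why removing '20E61000' helps can be seen by decoding the bytes by hand. The stored string looks like PostGIS EWKB: a byte-order byte, a 4-byte geometry type carrying an SRID flag (0x20000000), a 4-byte SRID, then the coordinates. The following Python sketch decodes it with the standard struct module. The helper names are hypothetical, and it only handles little-endian 2D points.

```python
import struct

EWKB_HEX = "0101000020E6100000000000407BF43E40000000203CFA3D40"
SRID_FLAG = 0x20000000  # EWKB marks an embedded SRID with this type bit


def parse_ewkb_point(hex_str):
    """Decode a little-endian EWKB/WKB 2D point into (srid, x, y)."""
    data = bytes.fromhex(hex_str)
    if data[0] != 1:
        raise ValueError("only little-endian input is handled in this sketch")
    (geom_type,) = struct.unpack_from("<I", data, 1)
    offset = 5
    srid = None
    if geom_type & SRID_FLAG:
        (srid,) = struct.unpack_from("<I", data, offset)
        offset += 4
        geom_type &= ~SRID_FLAG
    if geom_type != 1:
        raise ValueError("only 2D points are handled in this sketch")
    x, y = struct.unpack_from("<dd", data, offset)
    return srid, x, y


def to_plain_wkb_hex(hex_str):
    """Drop the SRID flag and SRID bytes, leaving plain WKB as upper-case hex."""
    data = bytes.fromhex(hex_str)
    (geom_type,) = struct.unpack_from("<I", data, 1)
    if not geom_type & SRID_FLAG:
        return hex_str.upper()
    plain = data[:1] + struct.pack("<I", geom_type & ~SRID_FLAG) + data[9:]
    return plain.hex().upper()


srid, x, y = parse_ewkb_point(EWKB_HEX)
plain_hex = to_plain_wkb_hex(EWKB_HEX)
same_as_replace = plain_hex == EWKB_HEX.replace("20E61000", "")
```

For this value the string replacement and the byte-level rewrite give the same plain WKB, because the one SRID byte left behind is 00 and takes the place of the removed flag byte. The replacement is tied to SRID 4326, though, and could in principle also match coordinate bytes in other rows, so a byte-level rewrite is the more cautious option.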
The BigQuery Standard SQL documentation suggests that BYTES fields can be coerced into STRING.
We have a BYTES field that is the result of SHA256-hashing a field using BigQuery itself.
We now want to coerce it to a STRING, yet when we run "CAST(field_name AS STRING)" we get an error:
Query Failed Error: Invalid cast of bytes to UTF8 string
What is preventing us from getting a string from this byte field? Is it surmountable? If so, what is the solution?
The example below should give you the idea:
#standardSQL
WITH t AS (
SELECT SHA256('abc') x
)
SELECT x, TO_BASE64(x)
FROM t
In short, you can use TO_BASE64() for this.
If you want to see the "traditional" representation of the hash as a string, use the TO_HEX() function.
WITH table AS (
SELECT SHA256('abc') as bytes_field
)
SELECT bytes_field, TO_HEX(bytes_field) as string_field
FROM table
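By default, the BigQuery UI shows the base64 representation, but if you want to compare the value with the output of a SHA-256 function in another language, for example, you have to use TO_HEX().
You can also try the SAFE_CONVERT_BYTES_TO_STRING() function. Note that it replaces invalid UTF-8 sequences with the Unicode replacement character U+FFFD, so it does not preserve the hash.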
reference: SAFE_CONVERT_BYTES_TO_STRING
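The difference between these options is easy to reproduce outside BigQuery. This Python sketch uses hashlib and base64 from the standard library as stand-ins for SHA256, TO_HEX and TO_BASE64. The lossy decode at the end is only an analogy for replacing invalid bytes, not a claim about BigQuery's exact output.

```python
import base64
import hashlib

# Raw 32-byte digest, comparable to the BYTES value produced by SHA256('abc').
digest = hashlib.sha256(b"abc").digest()

as_hex = digest.hex()                                  # analogous to TO_HEX
as_base64 = base64.b64encode(digest).decode("ascii")   # analogous to TO_BASE64

# A hash is arbitrary binary data, so a strict UTF-8 decode is expected to fail,
# which mirrors the "Invalid cast of bytes to UTF8 string" error.
try:
    digest.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# Lossy decode: invalid bytes become U+FFFD, so the original digest is not recoverable.
lossy_text = digest.decode("utf-8", errors="replace")

print(as_hex)
print(as_base64)
```

Hex and base64 are both reversible encodings of the same bytes. Hex is what most SHA-256 tools print, which is why it is the one to use when comparing against other systems.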
I'm doing some ETL from a CSV file in GCS to BQ, everything works fine, except for dates. The field name in my table is TEST_TIME and the type is DATE, so in the TableRow I tried passing a java.util.Date, a com.google.api.client.util.DateTime, a String, a Long value with the number of seconds, but none worked.
I got error messages like these:
Could not convert non-string JSON value to DATE type. Field: TEST_TIME; Value: ...
When using DateTime I got this error:
JSON object specified for non-record field: TEST_TIME.
//tableRow.set("TEST_TIME", date);
//tableRow.set("TEST_TIME", new DateTime(date));
//tableRow.set("TEST_TIME", date.getTime()/1000);
//tableRow.set("TEST_TIME", dateFormatter.format(date)); //e.g. 05/06/2016
I think you're expected to pass a String in the format YYYY-MM-DD, the same as if you were using the REST API directly with JSON. Try this:
tableRow.set("TEST_TIME", "2017-04-06");
If that works, then you can convert the actual date that you have to that format and it should also work.
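The conversion step itself is language-independent: parse the date in its source layout and emit it as YYYY-MM-DD. Here is a minimal sketch in Python. The helper name is hypothetical, and the month-first reading of "05/06/2016" is an assumption, because the question does not say which layout its dateFormatter uses.

```python
from datetime import datetime


def to_bq_date_string(text, input_format="%m/%d/%Y"):
    """Parse a date string in the given layout and return it as YYYY-MM-DD."""
    return datetime.strptime(text, input_format).date().isoformat()


month_first = to_bq_date_string("05/06/2016")            # assumes MM/DD/YYYY
day_first = to_bq_date_string("05/06/2016", "%d/%m/%Y")  # assumes DD/MM/YYYY
print(month_first, day_first)
```

In the Java pipeline the equivalent would be a formatter with the pattern yyyy-MM-dd applied before calling tableRow.set.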
While working with Google Cloud Dataflow, I used Google's wrapper class for timestamps, com.google.api.client.util.DateTime.
This worked for me when inserting rows into BigQuery tables. So, instead of
tableRow.set("TEST_TIME" , "2017-04-07");
I would recommend
tableRow.set("TEST_TIME" , new DateTime(new Date()));
I find this to be a lot cleaner than passing timestamp as a string.
Using the Java class com.google.api.services.bigquery.model.TableRow, to write a value in milliseconds since the Unix epoch (UTC) into a BigQuery TIMESTAMP, do this:
tableRow.set("timestamp", millisecondsSinceUTC / 1000.0d);
tableRow.set() expects a floating-point number representing seconds since the Unix epoch (UTC), with up to microsecond precision.
This is very non-standard and undocumented: set() boxes the value in an Object, so it's unclear which data types it accepts. The other proposed solution, using com.google.api.client.util.DateTime, did not work for me.
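The unit conversion described here is just a division by 1000 into a floating-point number of seconds. This Python sketch shows the arithmetic and a sanity check of the result. The function names are hypothetical, and nothing here calls the BigQuery client.

```python
from datetime import datetime, timezone


def millis_to_epoch_seconds(millis):
    """Convert milliseconds since the Unix epoch to fractional seconds."""
    return millis / 1000.0


def describe_utc(millis):
    """Render the instant as an ISO 8601 string in UTC, for a sanity check."""
    seconds = millis_to_epoch_seconds(millis)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


seconds = millis_to_epoch_seconds(1491523200000)
readable = describe_utc(1491523200000)
print(seconds, readable)
```

Checking the converted value against a readable UTC date like this is a cheap way to catch a seconds-versus-milliseconds mix-up before loading rows.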