Parse JSON array and load into a table - SQL

I'm having issues parsing a JSON array and loading it into a Hive table.
The JSON array lives in the table dmetrics, in the column metrics.
The JSON looks like this:
{"impressions":8256,"requests":67.....
I keep getting the following error:
{"cannot resolve 'get_json_object(dmetrics.metrics, '$.imppressions')' due to data type mismatch: argument 1 requires string type, however, 'dmetrics.metrics' is of struct<confirmedimpressions:bigint,requests:bigint,....
I'm using the following code:
get_json_object(dmetrics.metrics, '$.impression.included')
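The error message itself points at the fix: dmetrics.metrics is already a Hive struct, not a JSON string, so get_json_object (whose first argument must be a string) does not apply. Struct fields can be addressed directly with dot notation instead; a minimal sketch, using the field names the error message reports:
-- metrics is a struct<confirmedimpressions:bigint,requests:bigint,...>,
-- so no JSON parsing is needed
select d.metrics.confirmedimpressions,
       d.metrics.requests
from dmetrics d;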

Related

How do you convert text column data with Ruby JSON format ("key" => "value") to standard JSON?

I have data in the comment column of the payments table. The data is stored as plain text in the following format:
{"foo"=>"bar"}
I need to query the value of the specific "foo" key and tried the following:
select comment::json -> 'foo' from payments
but because the stored data is not valid JSON, I get the following error:
invalid input syntax for type json DETAIL: Token "=" is invalid. CONTEXT: JSON data, line 1: {"foo"=>"bar"}
which refers to the => that Ruby uses for Hashes.
Is there a way to convert the text data to JSON data on-the-fly so I can then access the specific keys I need?
You can replace the => with a : to make that example a valid JSON value:
replace(comment, '=>', ':')::jsonb ->> 'foo'
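In a full query against the payments table from the question, that looks something like this (a sketch, assuming the comment column always uses the => form):
select replace(comment, '=>', ':')::jsonb ->> 'foo' as foo
from payments;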
It sounds like the data is technically valid Ruby, which means we can do something a bit clever.
require 'json'

# Evaluate the Ruby hash literal, then re-serialize it as JSON.
# eval executes arbitrary code, so only use this on trusted input.
def parse_data(data_string)
  eval(data_string).to_json
end
This should do the trick, so long as the data is trusted.
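If you would rather repair the stored data once instead of rewriting it on every read, the same replacement can be applied in place. A one-off sketch, under the same assumption that => only appears as the hash separator:
-- rewrite the Ruby-hash text into valid JSON, permanently
update payments
set comment = replace(comment, '=>', ':');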

Extract JSON content in Metabase SQL query

Using Django==2.2.24 and Python 3.6, with PostgreSQL as the underlying DB.
Working with Django ORM, I can easily make all sort of queries, but I started using Metabase, and my SQL might be a bit rusty.
The problem:
I am trying to get a count of the items in a list, under a key in a dictionary, stored as a JSONField:
from django.db import models
from jsonfield import JSONField

class MyTable(models.Model):
    data_field = JSONField(blank=True, default=dict)
Example of the dictionary stored in data_field:
{..., "my_list": [{}, {}, ...], ...}
Under "my_list" key, the value stored is a list, which contains a number of other dictionaries.
In Metabase, I am trying to get a count of the dictionaries in the list, but I also tried even more basic queries, none of which work.
Some stuff I tried:
Attempt:
SELECT COUNT(elem->'my_list') as my_list_count
FROM my_table, json_object_keys(data_field:json) AS elem
Error:
ERROR: syntax error at or near ":" Position: 226
Attempt:
SELECT ARRAY_LENGTH(elem->'my_list') as my_list_count
FROM my_table, JSON_OBJECT_KEYS(data_field:json) AS elem
Error:
ERROR: syntax error at or near ":" Position: 233
Attempt:
SELECT JSON_ARRAY_LENGTH(data_field->'my_list'::json)
FROM my_table
Error:
ERROR: invalid input syntax for type json Detail: Token "my_list" is invalid. Position: 162 Where: JSON data, line 1: my_list
Attempt:
SELECT ARRAY_LENGTH(JSON_QUERY_ARRAY(data_field, '$.my_list'))
FROM my_table
Error:
ERROR: function json_query_array(text, unknown) does not exist Hint: No function matches the given name and argument types. You might need to add explicit type casts. Position: 140
Basically, I think the issue is that I am calling these functions with the wrong signatures most of the time.
I used this query to make sure I can at least get the keys from the dictionary:
SELECT JSON_OBJECT_KEYS(data_field::json)
FROM my_table
I was not able to use JSON_OBJECT_KEYS() without adding the ::json cast; without it, I was getting this error:
ERROR: function json_object_keys(text) does not exist Hint: No function matches the given name and argument types. You might need to add explicit type casts. Position: 127
But with the ::json cast, I am getting all the keys as intended.
Thank you for taking a look!
EDIT:
I also found this interesting article with different solutions, but none of them worked.
I also saw this SO post, which did not help.
Ok, after some more digging around, I found this article, which had the correct format/syntax.
This code is what I used to fetch the list from the JSON object successfully:
select data_field::json->'my_list' as the_list
from my_table
Then, I used json_array_length() to get the number of elements:
select json_array_length(data_field::json->'my_list') as number_of_elements
from my_table
All done! :)
EDIT:
I just found the reason for this whole shenanigan.
In the code (which goes years back) we used this package:
jsonfield==1.0.3
And used it this way:
from jsonfield import JSONField
The issue is that, in the background, Postgres saves the data as a string, so it needs to be cast to json when querying.
Later Django introduced its own JSONField, which stores data as you would expect, without a need to cast:
from django.contrib.postgres.fields import JSONField
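With Django's own JSONField the column type is jsonb, so the cast disappears and the jsonb_ variant of the length function applies. A sketch, assuming the same table and column names as above:
-- no ::json cast needed on a native jsonb column
select jsonb_array_length(data_field -> 'my_list') as number_of_elements
from my_table;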

PostgreSQL - Converting String in JSONB to Actual JSONB, Error: Token is invalid

I am having trouble working with the JSONB structure in PostgreSQL. Currently my data is saved as follows:
"{\"Hello\":\"World\",\"idx\":0}"
Which obviously is not correct 😀, so I am trying to "repair" this and get the actual JSON representation for querying with:
SELECT regexp_replace(trim('"' FROM json_data::text), '\\"', '"', 'g')::jsonb FROM My_table
However when trying this, I get the following error:
ERROR: invalid input syntax for type json
DETAIL: Token "Рыба" is invalid.
CONTEXT: JSON data, line 1: ...х, как : Люди X, Пароль \\"Рыба...
SQL state: 22P02
So I am thinking that this is due to the character encoding that is not being accepted by the JSONB standard.
My main question, then, is: how can I repair this kind of table so that I am still able to query it? I tried utilizing convert_from and convert_to but was unable to figure out how to fix this error... has anyone encountered this already?
Update: Found it! (thanks to Convert JSON string to JSONB), utilizing
SELECT (json_data#>>'{}')::jsonb FROM my_table
fixed it.
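For reference, #>> '{}' extracts the JSON value at the empty path (the top level) as text, which unwraps the quoted string so it can be cast back to jsonb. To repair the table permanently instead of on every query, a one-off update along these lines should work (a sketch, assuming json_data is the affected jsonb column):
-- unwrap the JSON-encoded string and store the real object back
update my_table
set json_data = (json_data #>> '{}')::jsonb;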

Google Cloud Dataflow: getting the below error at runtime

I am writing data into a nested-array BigQuery table (the array inside the table is named merchant_array) using my Dataflow template.
Sometimes it runs fine and loads the data, but sometimes it gives me this error at runtime.
java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: com.fasterxml.jackson.databind.JsonMappingException: Null key for a Map not allowed in JSON (use a converting NullKeySerializer?) (through reference chain: com.google.api.services.bigquery.model.TableRow["null"])
"message" : "Error while reading data, error message: JSON parsing error in row starting at position 223615: Only optional fields can be set to NULL. Field: merchant_array; Value: NULL",
Does anyone have any idea why I am getting this error?
Thanks in advance.
I found the issue that was causing the error, so I am posting an answer to my own question; it might be helpful for someone else.
The error was:
Only optional fields can be set to NULL. Field: merchant_array; Value: NULL",
Here, merchant_array is defined as an array that contains record (repeated) data.
As per the Google docs:
ARRAYs cannot be NULL.
NULL ARRAY elements cannot persist to a table.
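The constraint is easy to reproduce directly in BigQuery: a query whose output array contains a NULL element fails at write time. A minimal sketch:
-- fails with: Array cannot have a null element
select [1, null, 3] as merchant_array;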
At the same time, I was using an ArrayList in my code, which allows null values. So before building the record-type data or adding it to the ArrayList, just remove any NULL TableRows if they exist.
Hope this is helpful.

invalid input syntax for type double precision: " chargebackvalue"

I'm trying to upload a .csv file to Postgres and I'm getting this error:
invalid input syntax for type double precision: " chargebackvalue"
Here is the structure of the table:
[screenshot of the table structure]
The contents of the .csv file:
stoneid; mundipaggid; cardnumber; emblem; chargebackvalue; cardmask; chargebackdate; emitter; description; purchasedate; clientName; tacomorderid; useremail
0155477; 'or_3E2W0X5s5jtPjWYO';0670000546857; 'Visa'; 60.6; '498453******3271'; '2019-10-17'; 'Banco do Brasil S.A.'; 'Teste'; '2019-10-10'; 'Silvana Teixeira Da Silva';99854; 'teste#teste.com'
This is too long for a comment.
I would recommend loading the data into a staging table, where all the columns are strings.
Then, select from that table to load the final table. This makes it easier to track down problems in the data that might occur during the load.
Clearly the row you have shown is not the cause of the error. Or, if this is the entire file, then you simply have not skipped the first line, which contains header names rather than values.
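A minimal sketch of that staging-table approach, with hypothetical table names and the column list taken from the header row (note the semicolon delimiter, and HEADER, which skips the first line):
-- 1. stage everything as text
create table payments_staging (
    stoneid text, mundipaggid text, cardnumber text, emblem text,
    chargebackvalue text, cardmask text, chargebackdate text, emitter text,
    description text, purchasedate text, clientname text, tacomorderid text,
    useremail text
);
copy payments_staging from '/path/to/file.csv'
with (format csv, delimiter ';', header true);
-- 2. cast into the final table; failures now point at specific staged rows
insert into final_table (stoneid, chargebackvalue)
select stoneid, trim(chargebackvalue)::double precision
from payments_staging;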