I have a project which uses jOOQ + Postgres, with multiple tables and relations between them.
While I was creating a select query with jOOQ, I had to use arrayAgg for my specific scenario.
dslContext.select(arrayAgg(tableName.INTEGER_LETS_SAY).as("static_name"))
The specific column INTEGER_LETS_SAY is nullable.
When the values passed to arrayAgg are all null, the Postgres response is '{null}' (tested with getQuery().getSql()), but the where statement does not return true with any of the methods I tried.
For example:
field("static_name", Long[].class).isNull()
field("static_name", Long[].class).equal(new Long[] {null})
field("static_name", Long[].class).equal(DSL.castNull(Long[].class)
field("static_name", Long[].class).cast(String.class).eq(DSL.value("{null}")))
field("static_name", Long[].class).cast(String.class).eq(DSL.value("'{null}'")))
Any clue what I am doing wrong?
Note: I did try the query with plain SQL, and static_name = '{null}' worked.
{NULL} is PostgreSQL's text representation of an array containing one SQL NULL value. You can try it like this:
select (array[null]::int[])::text ilike '{null}' as a
It yields:
a |
----|
true|
Note, I'm using ilike for case-insensitive comparison. On my installation, I'm getting {NULL}, not {null}. If you wanted to compare things as text, you could do it using Field.likeIgnoreCase(). E.g. this works for me:
System.out.println(ctx.select(
    field(val(new Long[] { null }).cast(String.class).likeIgnoreCase("{null}")).as("a")
).fetch());
Producing:
+----+
|a |
+----+
|true|
+----+
But much better: do not work with the text representation. Instead, follow this suggestion here. In SQL:
select true = all(select a is null from unnest(array[null]::int[]) t (a)) as a
In jOOQ:
System.out.println(ctx.select(
    field(inline(true).eq(all(
        select(field(field(name("a")).isNull()))
        .from(unnest(val(new Long[] { null })).as("t", "a"))
    ))).as("a")
).fetch());
It gets a bit verbose because of all the wrapping of Condition in Field<Boolean> using DSL.field(Condition).
Alternatively, use e.g. NUM_NONNULLS() (credits to Vik Fearing for this approach):
System.out.println(ctx.select(
    field("num_nonnulls(variadic {0})", INTEGER, val(new Long[] { null }))
).fetch());
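Applied to the original arrayAgg scenario, the same NUM_NONNULLS check can also be written directly in SQL, e.g. in a HAVING clause. This is only a sketch; my_table, group_id and integer_lets_say are made-up names standing in for the actual schema:

select t.group_id, array_agg(t.integer_lets_say) as static_name
from my_table t
group by t.group_id
-- keep only the groups in which every aggregated value is NULL
having num_nonnulls(variadic array_agg(t.integer_lets_say)) = 0;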
I have a simple Elasticsearch SQL query like this:
GET /_sql?format=txt
{
  "query": """
    DESCRIBE "index_name"
  """
}
and it works, and the output is like this:
column | type | mapping
-----------------------------------------------------------
column_name1 | STRUCT | object
column_name1.Id | VARCHAR | text
column_name1.Id.keyword | VARCHAR | keyword
Is there a possibility to prepare the above query using filter or where, for example something like this:
GET /_sql?format=txt
{
  "query": """
    DESCRIBE "index_name"
  """,
  "filter": {"terms": {"type.keyword": ["STRUCT"]}}
}
or
GET /_sql?format=txt
{
  "query": """
    DESCRIBE "index_name"
    WHERE "type" = 'STRUCT'
  """
}
That is not possible, no.
While the DESCRIBE SQL command seems to return tabular data, it is not a query: it does not support WHERE clauses, nor can it be used within a SELECT statement. That is actually not specific to Elasticsearch, but the same in RDBMSs.
The same apparently is true for the Elasticsearch filter clause. It will work with SELECT SQL statements, but with DESCRIBE or SHOW COLUMNS it simply has no effect on the results, while not producing an error.
In "real" SQL, you could work around this by querying information_schema.COLUMNS, but that is not an option in Elasticsearch.
I have a value in a JSON column that is sometimes all null in an Azure Databricks table. The full process to get to JSON_TABLE is: read parquet, infer the schema of the JSON column, convert the column from JSON string to a deeply nested structure, and explode any arrays within. I am working in SQL with Python-defined UDFs (json_exists() checks the schema to see if the key is possible to use, json_get() gets a key from the column or returns a default) and want to do the following:
SELECT
ID, EXPLODE(json_get(JSON_COL, 'ARRAY', NULL)) AS SINGLE_ARRAY_VALUE
FROM
JSON_TABLE
WHERE
JSON_COL IS NOT NULL AND
json_exists(JSON_COL, 'ARRAY')==1
When the data has at least one instance of JSON_COL containing ARRAY, the schema is such that this has no problems. If, however, the data has all null values in JSON_COL.ARRAY, an error occurs because the column has been inferred as a string type (error received: input to function explode should be array or map type, not string). Unfortunately, while the json_exists() function returns the expected values, the error still occurs even when the returned dataset would be empty.
Can I get around this error via casting or replacement of nulls? If not, what is an alternative that still allows inferring the schema of the JSON?
Note: This is a simplified example. I am writing code to generate SQL code for hundreds of similar data structures, so while I am open to workarounds, a direct solution would be ideal. Please ask if anything is unclear.
Example table that causes error:
| ID | JSON_COL |
|----|----------|
| 1 | {"_corrupt_record": null, "otherInfo": [{"test": 1, "from": 3}]} |
| 2 | {"_corrupt_record": null, "otherInfo": [{"test": 5, "from": 2}]} |
Example table that does not cause error:
| ID | JSON_COL |
|----|----------|
| 1 | {"_corrupt_record": null, "array": [{"test": 1, "from": 3}]} |
| 2 | {"_corrupt_record": null, "otherInfo": [{"test": 5, "from": 2}]} |
This question seems like it might hold the answer, but I was not able to get anything working from it.
You can filter the table before calling json_get and explode, so that you only explode when json_get returns a non-null value:
SELECT
ID, EXPLODE(json_get(JSON_COL, 'ARRAY', NULL)) AS SINGLE_ARRAY_VALUE
FROM (
SELECT *
FROM JSON_TABLE
WHERE
JSON_COL IS NOT NULL AND
json_exists(JSON_COL, 'ARRAY')==1
)
I had exported a bunch of tables (>30) as CSV files from a MySQL database using phpMyAdmin. These CSV files contain NULL values like:
"id","sourceType","name","website","location"
"1","non-commercial","John Doe",NULL,"California"
I imported many such CSVs into a PostgreSQL database with TablePlus. However, the NULL values in the columns actually appear as text rather than null.
When my application fetches the data from these columns, it actually retrieves the text 'NULL' rather than a null value.
Also, SQL commands with IS NULL do not retrieve these rows, probably because they are identified as text rather than null values.
Is there a SQL command I can run to convert all text 'NULL' values in all the tables to actual NULL values? This would be the easiest way to avoid re-importing all the tables.
PostgreSQL's COPY command has the NULL 'some_string' option that allows you to specify any string as the NULL value: https://www.postgresql.org/docs/current/sql-copy.html
This would of course require re-importing all your tables.
Example with your data:
The CSV:
"id","sourceType","name","website","location"
"1","non-commercial","John Doe",NULL,"California"
"2","non-commercial","John Doe",NULL,"California"
The table:
CREATE TABLE import_with_null (id integer, source_type varchar(50), name varchar(50), website varchar(50), location varchar(50));
The COPY statement:
COPY import_with_null (id, source_type, name, website, location) from '/tmp/import_with_NULL.csv' WITH (FORMAT CSV, NULL 'NULL', HEADER);
Test of the correct import of NULL strings as SQL NULL:
SELECT * FROM import_with_null WHERE website IS NULL;
id | source_type | name | website | location
----+----------------+----------+---------+------------
1 | non-commercial | John Doe | | California
2 | non-commercial | John Doe | | California
(2 rows)
The important part that transforms 'NULL' strings into SQL NULL values is NULL 'NULL'; it could be any other value, e.g. NULL 'whatever string'.
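As a side note, if the CSV files live on the client machine rather than on the database server, the same options should work with psql's \copy variant (a sketch using the same illustrative file path):

\copy import_with_null (id, source_type, name, website, location) from '/tmp/import_with_NULL.csv' with (format csv, null 'NULL', header)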
UPDATE: For whoever comes here looking for a solution
See the answers for two potential solutions.
One of the solutions uses SQL COPY, which must be applied at import time. That solution, provided by Michal T and marked as the accepted answer, is the better way to prevent this from happening in the first place.
My solution below uses a script in my application (built in Laravel/PHP), which can be run after the import is already done.
Note: see the comments in the code; you could potentially figure out a similar solution in other languages/frameworks.
Thanks to @BjarniRagnarsson's suggestion in the comments above, I came up with a short PHP Laravel script to perform update queries on all columns (of type 'string' or 'text') to replace the 'NULL' text with NULL values.
public function convertNULLStringToNULL()
{
    $tables = DB::connection()->getDoctrineSchemaManager()->listTableNames(); // Get list of all tables
    $results = []; // An array to store the output results
    foreach ($tables as $table) { // Loop through each table
        $columnNames = DB::getSchemaBuilder()->getColumnListing($table); // Get list of all columns
        $columnResults = []; // Array to store the results per column
        foreach ($columnNames as $column) { // Loop through each column
            $columnType = DB::getSchemaBuilder()->getColumnType($table, $column); // Get the column type
            if (
                $columnType == 'string' || // Check if column type is string or text
                $columnType == 'text'
            ) {
                $query = "update " . $table . " set \"" . $column . "\"=NULL where \"" . $column . "\"='NULL'"; // Build the update query as mentioned in the comments above
                $r = DB::update($query); // Perform the update query
                array_push($columnResults, [
                    $column => $r
                ]); // Push the column results
            }
        }
        array_push($results, [
            $table => $columnResults
        ]); // Push the table results
    }
    dd($results); // Output the results
}
Note: I was using Laravel 8 for this.
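For reference, each statement the script generates is a plain UPDATE of this shape (table and column names here are illustrative):

update my_table set "my_column" = NULL where "my_column" = 'NULL';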
Here's the issue. I have a column in my database (type nvarchar(max)) that I am storing JSON in. I am storing both plain strings and objects in that column, like the following:
JsonTable
|--|-------------------|
|Id|JsonValue          |
|--|-------------------|
|0 |{"sample":"object"}|
|--|-------------------|
|1 |"plain-string"     |
|--|-------------------|
I am trying to use JSON_MODIFY to merge these values with another table's values.
The following works fine for just objects, but not strings:
SELECT JSON_MODIFY('{}', '$.Values', JSON_QUERY(JsonValue))
FROM JsonTable
WHERE Id = 0 -- Fails when string is included in results
-- Result = |------------------------------|
|{"Values":{"sample":"object"} |
|------------------------------|
However, it fails to parse the ordinary string (understandably, since it is not JSON).
So my solution was to add a CASE statement to handle strings. However, this does not work as expected: wrapping JSON_QUERY in a CASE statement string-escapes its result and garbles it up in the final JSON_MODIFY output.
The following does not work as expected:
SELECT JSON_MODIFY('{}', '$.Values',
CASE
WHEN ISJSON(JsonValue) > 0 THEN JSON_QUERY(JsonValue)
ELSE REPLACE(JsonValue, '"','')
END)
FROM JsonTable
-- Result = |-------------------------------------|
|{"Values":"{\"sample\"::\"object\"}" |
|-------------------------------------|
|{"Values":"plain-string" |
|-------------------------------------|
So I was unable to really figure out why wrapping JSON_QUERY in a CASE statement doesn't return properly; instead I started using this workaround, which is a bit verbose and messy but works perfectly fine:
SELECT
CASE
WHEN ISJSON(JsonValue) > 0
THEN
(SELECT JSON_MODIFY('{}', '$.Values', JSON_QUERY(JsonValue)))
ELSE
(SELECT JSON_MODIFY('{}', '$.Values', REPLACE(JsonValue, '"','')))
END
FROM JsonTable
-- Result = |-------------------------------------|
|{"Values":{"sample":"object"} |
|-------------------------------------|
|{"Values":"plain-string" |
|-------------------------------------|
Have you tried using the STRING_ESCAPE function to format your string for JSON, i.e. assuming your issue is related to correctly escaping the quotes? http://sqlfiddle.com/#!18/9eecb/24391
SELECT JSON_MODIFY('{}', '$.Values',
    case
        when ISJSON(JsonValue) = 1 then JSON_QUERY(JsonValue)
        else STRING_ESCAPE(JsonValue, 'json')
    end
)
FROM (
    values
        (0, '{"sample":"object"}')
       ,(1, 'plain-string')
       ,(2, '"plain-string2"')
) JsonTable(Id, JsonValue)
STRING_ESCAPE documentation
I'm trying to query using JSONB; however, I have a problem: I don't know what the first key could be.
Ideally I would be able to use a wildcard inside my query.
E.g. the following works:
WHERE json_data #> '{first_key,second_key}' = '"value-of-second-key"'
but I may not know the name of first_key, or I may want to match any of the nested sub-keys.
Something like:
WHERE json_data #> '{*,second_key}' = '"value-of-second-key"'
would be ideal, using a wildcard like '*'.
Any advice or approaches to this would be greatly appreciated.
You can't use a wildcard with the #> operator, but you can use the jsonb_each function to unnest the first level of the JSON:
SELECT *
FROM jsonb_each('{"foo": {"second_key": "xxx"}, "bar": {"other_second_key": "xxx"}, "baz": {"second_key": "yyy"}}') AS e(key, value)
WHERE e.value #> '{"second_key": "xxx"}';
Result:
key | value
-----+-----------------------
foo | {"second_key": "xxx"}
(1 row)
If you just want to find the rows that match (and not the exact JSON element, as above), you can use EXISTS:
SELECT ...
FROM the_table t
WHERE EXISTS(
SELECT 1
FROM jsonb_each(t.the_jsonb_column) AS e(key, value)
WHERE e.value #> '{"second_key": "xxx"}'
)
Logically, this approach works fine, but be warned that it can't take advantage of an index the way e.value #> '{"foo": {"second_key": "xxx"}}' would, so if performance really matters, you may want to rethink your schema.
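As a possible alternative, on PostgreSQL 12 or later a jsonpath query can express the wildcard directly. This is only a sketch, assuming a table the_table with a jsonb column the_jsonb_column as above:

-- match rows where any top-level key contains second_key = "xxx"
SELECT *
FROM the_table t
WHERE t.the_jsonb_column @? '$.* ? (@.second_key == "xxx")';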