Get all entries for a specific json tag only in postgresql - sql

I have a database with a JSON field that has multiple parts, including one called tags. There are other entries as below, but I want to return only the fields with "{"tags":{"+good":true}}".
"{"tags":{"+good":true}}"
"{"has_temps":false,"tags":{"+good":true}}"
"{"tags":{"+good":true}}"
"{"has_temps":false,"too_long":true,"too_long_as_of":"2016-02-12T12:28:28.238+00:00","tags":{"+good":true}}"
I can get part of the way there with trips.metadata->'tags'->>'+good' = 'true' in my WHERE clause, but that returns all instances where tags contains "+good": true, including all of the entries above. I want to return only the entries that are exactly "{"tags":{"+good":true}}", i.e. excluding the two entries that begin with has_temps.
Any thoughts on how to do this?

With a jsonb column the solution is obvious:
with trips(metadata) as (
values
('{"tags":{"+good":true}}'::jsonb),
('{"has_temps":false,"tags":{"+good":true}}'),
('{"tags":{"+good":true}}'),
('{"has_temps":false,"too_long":true,"too_long_as_of":"2016-02-12T12:28:28.238+00:00","tags":{"+good":true}}')
)
select *
from trips
where metadata = '{"tags":{"+good":true}}';
metadata
-------------------------
{"tags":{"+good":true}}
{"tags":{"+good":true}}
(2 rows)
If the column's type is json then you should cast it to jsonb:
...
where metadata::jsonb = '{"tags":{"+good":true}}';
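Note that jsonb equality is structural: key order and insignificant whitespace are ignored, which is what makes the whole-value comparison above reliable. The plain json type keeps the original text. A minimal illustration:
select '{"a":1, "b":2}'::jsonb = '{"b":2,"a":1}'::jsonb;           -- true: jsonb compares structure
select '{"a":1, "b":2}'::json::text = '{"b":2,"a":1}'::json::text; -- false: json keeps the raw text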

If I get you right, you can check the text value of the "tags" key, like here:
select true
where '{"has_temps":false,"too_long":true,"too_long_as_of":"2016-02-12T12:28:28.238+00:00","tags":{"+good":true}}'::json->>'tags'
= '{"+good":true}'

Related

How to retrieve the list of dynamic nested keys of BigQuery nested records

My ELT tool imports my data into BigQuery and automatically generates/extends the schema for dynamic nested keys (in the schema below, under properties).
It looks like this:
[schema screenshot omitted]
How can I get the list of nested keys of a repeated record, so that, for example, I can group by properties when those items have said property non-null?
I have tried
select column_name
from my_schema.INFORMATION_SCHEMA.COLUMNS
where table_name = 'my_table'
But it will only list first-level keys.
From the picture above, I want, as a first step, a SQL query that returns
message
user_id
seeker
liker_id
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt
...
My real goal, though, is to group/count my data based on a non-null value of a 2nd-level nested property, properties.filters.[filter_type].
The schema may evolve when our application adds more filters, so this needs to be dynamically generated; I can't just hard-code the list of nested keys.
Note: this is very similar to this question: How to extract all the keys in a JSON object with BigQuery, but in my case my data is already in a schema and it's not a JSON object.
EDIT:
Suppose I have a list of such records with nested properties. How do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties for which said property is not null?
Example input (properties.x are dynamic and not known by the programmer)
search_id | properties.filters.school | properties.filters.type
----------|---------------------------|-------------------------
1         | MIT                       | master
2         | Princetown                | null
3         | null                      | master
Example output
search_id | enabled_filters
----------|--------------------
1         | ["school", "type"]
2         | ["school"]
3         | ["type"]
Have you looked at COLUMN_FIELD_PATHS? It should give you the paths for all columns.
select field_path from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS where table_name = '<table>'
https://cloud.google.com/bigquery/docs/information-schema-column-field-paths
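For the 2nd-level goal, the same view can likely be filtered on field_path (a sketch; my_schema and my_table are the placeholders from the question):
select field_path
from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
where table_name = 'my_table'
  and field_path like 'properties.filters.%'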
The field properties is not nested by arrays, only by structures, so a UDF in JavaScript to parse this field should work fast enough.
CREATE TEMP FUNCTION jsonObjectKeys(input STRING, shownull BOOL, fullname BOOL)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  // Recursively walk the parsed object and collect its keys.
  // shownull: include keys whose value is null (tagged with '==null')
  // fullname: emit dotted paths (a.b.c) instead of bare key names
  function test(input, old) {
    var out = [];
    for (let x in input) {
      let te = input[x];
      out = out.concat(
        te == null ? (shownull ? [x + '==null'] : [])
          : typeof te == 'object' ? test(te, old + x + '.')
          : [fullname ? old + x : x]);
    }
    return out;
  }
  return test(JSON.parse(input), "");
""";
with tbl as (
  select struct(1 as alpha, struct(2 as x, 3 as y, [1,2,3] as z) as B) A
  from unnest(generate_array(1, 10*1))
  union all
  select struct(null, struct(null, 1, [999]))
)
select *,
  TO_JSON_STRING(A) as string_output,
  jsonObjectKeys(TO_JSON_STRING(A), true, false) as output1,
  jsonObjectKeys(TO_JSON_STRING(A), false, true) as output2,
  concat('["', array_to_string(jsonObjectKeys(TO_JSON_STRING(A), false, true), '","'), '"]') as output_string,
  jsonObjectKeys(TO_JSON_STRING(A.B), false, true) as output3
from tbl
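A sketch of how this could feed the asker's enabled_filters goal, assuming a my_table shaped like the example input (search_id plus a properties.filters struct); untested against the real schema:
select search_id,
  -- shownull=false skips null filters; fullname=false keeps bare key names
  jsonObjectKeys(TO_JSON_STRING(properties.filters), false, false) as enabled_filters
from my_table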

How to use JSON_MODIFY to change all of the keys in a column that has an array of JSON objects?

I have a column in my database that looks like this (3 separate rows of data)
Columns
[{"header":"C", "value":"A"},{"header":"D","value":"A2"},{"header":"E","value":"A3"}]
[{"header":"C", "value":"B"},{"header":"D","value":"B2"},{"header":"E","value":"B3"}]
[{"header":"C", "value":"C"},{"header":"D","value":"C2"},{"header":"E","value":"C3"}]
I want to null out all of the values of the "header" key and change the key name to test.
I also want to rename all of the "value" keys to newHeader.
I tried running a script like this to change all of the headers inside the array to test, but JSON_MODIFY does not accept the '*' wildcard in the path.
UPDATE Files
SET Columns = JSON_MODIFY(
JSON_MODIFY(Columns,'$.test', JSON_VALUE(Columns,'$[*].header')),
'$[*].header',
NULL
)
The end result I want to be like this:
Columns
[{"test":"", "newHeader":"A"},{"test":"","newHeader":"A2"},{"test":"","newHeader":"A3"}]
[{"test":"", "newHeader":"B"},{"test":"","newHeader":"B2"},{"test":"","newHeader":"B3"}]
[{"test":"", "newHeader":"C"},{"test":"","newHeader":"C2"},{"test":"","newHeader":"C3"}]

TSQL - Need to query a database column which is populated by XML

The database has an iUserID column, with an Application ID and VCKey.
txtValue is the column name and the contained data is similar to this:
<BasePreferencesDataSet xmlns="http://tempuri.org/BasePreferencesDataSet.xsd">
  <ViewModesTable>
    <iViewID>1</iViewID>
  </ViewModesTable>
  <ViewMode_PreferenceData>
    <iViewID>1</iViewID>
    <iDataID>0</iDataID>
    <strValue>False</strValue>
  </ViewMode_PreferenceData>
  <ViewMode_PreferenceData>
    <iViewID>1</iViewID>
    <iDataID>5</iDataID>
    <strValue>True</strValue>
  </ViewMode_PreferenceData>
  <ViewMode_PreferenceData>
    <iViewID>1</iViewID>
    <iDataID>6</iDataID>
    <strValue>True</strValue>
  </ViewMode_PreferenceData>
  <ViewMode_PreferenceData>
    <iViewID>1</iViewID>
    <iDataID>4</iDataID>
    <strValue>False</strValue>
  </ViewMode_PreferenceData>
</BasePreferencesDataSet>
I want to be able to identify any iUserID for which the strValue for iDataIDs 5 and 6 is not set to True.
I have attempted to use a txtValue LIKE % statement, but even if I copy the contents and query for them verbatim it will not yield a result, leading me to believe that the XML data cannot be queried in this manner.
You can try the XML method .exist() together with an XPath with predicates:
WITH XMLNAMESPACES(DEFAULT 'http://tempuri.org/BasePreferencesDataSet.xsd')
SELECT *
FROM YourTable
WHERE CAST(txtValue AS XML).exist('/BasePreferencesDataSet
/ViewMode_PreferenceData[iDataID=5 or iDataID=6]
/strValue[text()!="True"]')=1;
The namespace-declaration is needed to address the elements without a namespace prefix.
The <ViewMode_PreferenceData> is filtered for the fitting IDs, while the <strValue> is filtered for content !="True". This will return any data row where there is at least one entry with an ID of 5 or 6 and a value not equal to "True".
So without sample data (including tags; sorry you're having trouble with that) it's tough to craft the complete query, but what you're looking for is XQuery, specifically the .exist() method in T-SQL.
Something like
SELECT iUserID
FROM tblLocalUserPreferences
WHERE iApplicationID = 30
AND vcKey='MonitorPreferences'
AND (txtValue.exist('//iDataID[text()="5"]/../strValue[text()="True"]') = 0
OR txtValue.exist('//iDataID[text()="6"]/../strValue[text()="True"]')=0)
This should return all userIDs where either iDataID 5 or 6 does NOT contain True. In other words, if both are true, you won't get that row back.
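One caveat: the sample XML declares a default namespace, so the un-prefixed paths above may not match anything. A hedged variant of the two predicates using XQuery's wildcard-namespace syntax (assuming txtValue is of type XML):
AND (txtValue.exist('//*:iDataID[text()="5"]/../*:strValue[text()="True"]') = 0
  OR txtValue.exist('//*:iDataID[text()="6"]/../*:strValue[text()="True"]') = 0)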

Select rows from table with jsonb column based on arbitrary jsonb filter expression

Test data
DROP TABLE t;
CREATE TABLE t(_id serial PRIMARY KEY, data jsonb);
INSERT INTO t(data) VALUES
('{"a":1,"b":2, "c":3}')
, ('{"a":11,"b":12, "c":13}')
, ('{"a":21,"b":22, "c":23}')
Problem statement: I want to receive an arbitrary JSONB parameter which acts as a filter on column t.data, such as
{ "b":{ "from":0, "to":20 }, "c":13 }
and use this to select matching rows from my test table t.
In this example, I want rows where b is between 0 and 20 and c = 13.
No error is required if the filter specifies a "column" (or "tag") which does not exist in t.data - it just fails to find a match.
I've used numeric values for simplicity but would like an approach which generalises to text as well.
What I have tried so far. I looked at the containment approach, which works for equality conditions, but am stumped on a generic way of handling range conditions:
select * from t
where t.data @> '{"c":13}'::jsonb;
Background: This problem arose when building a generic table-preview page on a website (for Admin users).
The page displays a filter based on various columns in whichever table is selected for preview.
The filter is then passed to a function in Postgres DB which applies this dynamic filter condition to the table.
It returns a jsonb array of the rows matching the filter specified by the user.
This jsonb array is then used to populate the Preview resultset.
The columns which make up the filter may change.
My Postgres version is 9.6 - thanks.
If you want to parse { "b":{ "from":0, "to":20 }, "c":13 } you need a parser. It is out of the scope of the JSON functions, but you can write a "generic" query using AND and OR to filter by such JSON, e.g.:
https://www.db-fiddle.com/f/jAPBQggG3p7CxqbKLMbPKw/0
with filt(f) as (values('{ "b":{ "from":0, "to":20 }, "c":13 }'::json))
select *
from t
join filt on
(f->'b'->>'from')::int < (data->>'b')::int
and
(f->'b'->>'to')::int > (data->>'b')::int
and
(data->>'c')::int = (f->>'c')::int
;
Thanks for the comments/suggestions.
I will definitely look at GraphQL when I have more time - I'm working under a tight deadline at the moment.
It seems the consensus is that a fully generic solution is not achievable without a parser.
However, I got a workable first draft - it's far from ideal but we can work with it. Any comments/improvements are welcome ...
Test data (expanded to include dates & text fields)
DROP TABLE t;
CREATE TABLE t(_id serial PRIMARY KEY, data jsonb);
INSERT INTO t(data) VALUES
('{"a":1,"b":2, "c":3, "d":"2018-03-10", "e":"2018-03-10", "f":"Blah blah" }')
, ('{"a":11,"b":12, "c":13, "d":"2018-03-14", "e":"2018-03-14", "f":"Howzat!"}')
, ('{"a":21,"b":22, "c":23, "d":"2018-03-14", "e":"2018-03-14", "f":"Blah blah"}')
First draft of code to apply a jsonb filter dynamically, but with restrictions on what syntax is supported.
Also, it just fails silently if the syntax supplied does not match what it expects.
Timestamp handling is a bit kludgy, too.
-- Handle timestamp & text types as well as int
-- See is_timestamp(text) function at bottom
with cte as (
select t.data, f.filt, fk.key
from t
, ( values ('{ "a":11, "b":{ "from":0, "to":20 }, "c":13, "d":"2018-03-14", "e":{ "from":"2018-03-11", "to": "2018-03-14" }, "f":"Howzat!" }'::jsonb ) ) as f(filt) -- equiv to cross join
, lateral (select * from jsonb_each(f.filt)) as fk
)
select data, filt --, key, jsonb_typeof(filt->key), jsonb_typeof(filt->key->'from'), is_timestamp((filt->key)::text), is_timestamp((filt->key->'from')::text)
from cte
where
case when (filt->key->>'from') is null then
case jsonb_typeof(filt->key)
when 'number' then (data->>key)::numeric = (filt->>key)::numeric
when 'string' then
case is_timestamp( (filt->key)::text )
when true then (data->>key)::timestamp = (filt->>key)::timestamp
else (data->>key)::text = (filt->>key)::text
end
when 'boolean' then (data->>key)::boolean = (filt->>key)::boolean
else false
end
else
case jsonb_typeof(filt->key->'from')
when 'number' then (data->>key)::numeric between (filt->key->>'from')::numeric and (filt->key->>'to')::numeric
when 'string' then
case is_timestamp( (filt->key->'from')::text )
when true then (data->>key)::timestamp between (filt->key->>'from')::timestamp and (filt->key->>'to')::timestamp
else (data->>key)::text between (filt->key->>'from')::text and (filt->key->>'to')::text
end
when 'boolean' then false
else false
end
end
group by data, filt
having count(*) = ( select count(distinct key) from cte ) -- must match on all filter elements
;
create or replace function is_timestamp(s text) returns boolean as $$
begin
perform s::timestamp;
return true;
exception when others then
return false;
end;
$$ strict language plpgsql immutable;
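A quick sanity check of the helper, using values from the test data above:
select is_timestamp('2018-03-14'); -- true
select is_timestamp('Howzat!');    -- false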

SQL Server - XQuery for XML

Similar to other posts, I need to retrieve rows from a table by applying criteria on an XML column. For instance, suppose you have an XML column like this:
<DynamicProfile xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/WinTest">
<AllData xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<d2p1:KeyValueOfstringstring>
<d2p1:Key>One</d2p1:Key>
<d2p1:Value>1</d2p1:Value>
</d2p1:KeyValueOfstringstring>
<d2p1:KeyValueOfstringstring>
<d2p1:Key>Two</d2p1:Key>
<d2p1:Value>2</d2p1:Value>
</d2p1:KeyValueOfstringstring>
</AllData>
</DynamicProfile>
My query should be able to return all rows where the node value <d2p1:Key> = 'some key value' AND the node value <d2p1:Value> = 'some value'.
Think of it just as a dynamic table where the Key node represents the column name and the Value node represents the column's value.
The following query does not work because the Key and Value checks run as two separate searches and can match different elements:
select * from MyTable where
MyXmlField.exist('//d2p1:Key[.="One"]') = 1
AND MyXmlField.exist('//d2p1:Value[.="1"]') = 1
Instead of looking for //d2p1:key[.="One"] and //d2p1:Value[.="1"] as two separate searches, do a single query that looks for both at once, like so:
//d2p1:KeyValueOfstringstring[./d2p1:Key="One"][./d2p1:Value=1]
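In T-SQL that predicate might be used like this (a sketch based on the table and column names from the question, with the namespace taken from the sample document):
WITH XMLNAMESPACES('http://schemas.microsoft.com/2003/10/Serialization/Arrays' AS d2p1)
SELECT *
FROM MyTable
WHERE MyXmlField.exist('//d2p1:KeyValueOfstringstring[./d2p1:Key="One"][./d2p1:Value=1]') = 1;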