Check if field exists in CosmosDB JSON with SQL - nodeJS - sql

I am using Azure CosmosDB to store documents (JSON).
I am trying to query all documents that contain the field "abc", and not return the documents that lack it. For example, return the first object below and not the second:
{
"abc": "123"
}
{
"jkl": "098"
}
I am trying to use the following code:
client.queryDocuments(
collectionUrl,
`SELECT r.id, r.authToken.instagram,r.userName FROM root r WHERE r.abc`
)
I assumed the above would check whether abc exists, similar to if (r.abc) {} in JavaScript.
I have tried using WHERE r.abc IS NOT NULL
Thanks in advance

If you want to know whether a field exists, use IS_DEFINED, e.g. IS_DEFINED(c.FieldName).
If you want to know whether the field holds a non-null value, use
FieldName != null or
FieldName <> null (apparently)
I use variations of this in production:
SELECT c.FieldName
FROM c
WHERE IS_DEFINED(c.FieldName)
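Applied to the query from the question, that becomes (a sketch keeping the original column list):
SELECT r.id, r.authToken.instagram, r.userName
FROM root r
WHERE IS_DEFINED(r.abc)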

All you need to do is change your query to
SELECT r.id, r.authToken.instagram,r.userName FROM root r WHERE r.abc != null
or
SELECT r.id, r.authToken.instagram,r.userName FROM root r WHERE r.abc <> null
Both operators work (tested in the Data Explorer).
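Note one subtle difference: for a document like {"abc": null}, IS_DEFINED(r.abc) returns true while r.abc != null evaluates to false, so the two filters treat explicit nulls differently. A quick way to see this (a sketch):
SELECT VALUE IS_DEFINED(r.abc) FROM root r
This returns true for a document with an explicit {"abc": null}, but false for the {"jkl": "098"} document.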

Add the NOT operator to the SQL query to negate IS_DEFINED:
SELECT r.id, r.authToken.instagram,r.userName
FROM root r
WHERE NOT IS_DEFINED(r.abc)
This includes all documents where the field abc does not exist.

Related

Does BigQuery have a safe navigation operator?

Does BigQuery have a safe navigation operator, i.e. a null-safe variant of its field navigation operator?
Ideally I'm looking for an operator akin to ?. in Swift/TypeScript, &. in Ruby, etc., but a function I could call would suffice as well.
Right now my query looks like:
SELECT a.b.c.d.e
FROM myTable AS a
WHERE
a.b IS NOT NULL
AND a.b.c IS NOT NULL
AND a.b.c.d IS NOT NULL
AND a.b.c.d.e = "my desired value"
Edit: This doesn't actually work.
Name b not found inside a at [12:34]
I'd wish it could be something like:
SELECT a.b.c.d.e
FROM myTable AS a
WHERE a?.b?.c?.d?.e = "my desired value"
AFAIK, there is no safe navigation operator for the STRUCT type in BigQuery.
What I can come up with is to convert the nested STRUCT to the JSON type and utilize JSON navigation, with which you can navigate safely.
WITH myTable AS (
SELECT STRUCT(STRUCT(STRUCT('my_desired_value' AS e) AS d) AS c) AS b
)
SELECT TO_JSON(b).c.d.e, -- existing path
TO_JSON(b).f.d.e, -- non-existing path
-- b.f.d.e --> error - Field name f does not exist ...
FROM myTable AS a;
To check the field paths of a STRUCT-typed column, you can use INFORMATION_SCHEMA.COLUMN_FIELD_PATHS.
SELECT *
FROM `your-project.your_dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
WHERE table_name = 'myTable';
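The JSON type is not directly comparable to a STRING, so to actually filter on the nested value you can extract it with JSON_VALUE, which returns NULL (instead of erroring) when the path does not exist. A sketch against the original table:
SELECT JSON_VALUE(TO_JSON(a.b), '$.c.d.e') AS e
FROM myTable AS a
WHERE JSON_VALUE(TO_JSON(a.b), '$.c.d.e') = 'my desired value';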

Dynamic where condition PostgreSQL

I am building a CRUD application that allows the user to input some search criteria and get the documents corresponding to those criteria. Unfortunately, I am having difficulty creating a query in Postgres that uses different conditions in the WHERE clause based on the input sent by the user.
For example if the user set as search criteria only the document number the query would be defined like this:
select * from document where document_num = "value1"
On the other hand if the user gave two criteria the query would be set up like this:
select * from document where document_num = "value1" and reg_date = "value2"
How can I set up a query that is valid for all the cases? Looking in other threads, I saw a possible solution using coalesce in the WHERE clause:
document_num = coalesce('value1', document_num)
The problem with this approach is that when no value is provided, Postgres effectively converts the condition to document_num IS NOT NULL, which is not what I need (my goal is for the condition to be always true).
Thanks in advance
The solution by @D-shih will work if you have a default value, and you can also use COALESCE as below.
SELECT *
FROM document
WHERE document_num = COALESCE('value1', default_value)
AND reg_date = COALESCE('value2', default_value);
If you don't have default values, you can build your query using CASE WHEN. (Here I am supposing you have some variables that determine which conditions to apply: document_num only, reg_date only, or both.) A little example below:
SELECT *
FROM document
WHERE
(
CASE
WHEN "value1" IS NOT NULL THEN document_num = "value1"
ELSE TRUE
END
)
AND (
CASE
WHEN "value2" IS NOT NULL THEN reg_date = "value2"
ELSE TRUE
END
)
You can read more about how to use CASE WHEN here.
If I understand correctly, you can try to pass the user input values as parameters.
Give each parameter a default value, so that if the user doesn't want to use it, the default is what gets passed.
We can then use OR to check whether the parameter was supplied, and ignore the condition otherwise.
SELECT *
FROM document
WHERE (document_num = :value1 OR :value1 = [default value])
AND (reg_date = :value2 OR :value2 = [default value])
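A common variant of this pattern uses NULL itself as the "not supplied" marker, so no sentinel default value is needed; a NULL parameter makes its condition always true, which is exactly what the question asks for (a sketch in the same named-parameter style):
SELECT *
FROM document
WHERE (:value1 IS NULL OR document_num = :value1)
  AND (:value2 IS NULL OR reg_date = :value2);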

Select rows from table with jsonb column based on arbitrary jsonb filter expression

Test data
DROP TABLE t;
CREATE TABLE t(_id serial PRIMARY KEY, data jsonb);
INSERT INTO t(data) VALUES
('{"a":1,"b":2, "c":3}')
, ('{"a":11,"b":12, "c":13}')
, ('{"a":21,"b":22, "c":23}')
Problem statement: I want to receive an arbitrary JSONB parameter which acts as a filter on column t.data, such as
{ "b":{ "from":0, "to":20 }, "c":13 }
and use this to select matching rows from my test table t.
In this example, I want rows where b is between 0 and 20 and c = 13.
No error is required if the filter specifies a "column" (or "tag") which does not exist in t.data - it just fails to find a match.
I've used numeric values for simplicity but would like an approach which generalises to text as well.
What I have tried so far: I looked at the containment approach, which works for equality conditions, but I am stumped on a generic way of handling range conditions:
select * from t
where t.data @> '{"c":13}'::jsonb;
Background: This problem arose when building a generic table-preview page on a website (for Admin users).
The page displays a filter based on various columns in whichever table is selected for preview.
The filter is then passed to a function in Postgres DB which applies this dynamic filter condition to the table.
It returns a jsonb array of the rows matching the filter specified by the user.
This jsonb array is then used to populate the Preview resultset.
The columns which make up the filter may change.
My Postgres version is 9.6 - thanks.
If you want to parse { "b":{ "from":0, "to":20 }, "c":13 }, you need a parser. That is out of scope for the JSON functions, but you can write a "generic" query using AND and OR to filter by such JSON, e.g.:
https://www.db-fiddle.com/f/jAPBQggG3p7CxqbKLMbPKw/0
with filt(f) as (values('{ "b":{ "from":0, "to":20 }, "c":13 }'::json))
select *
from t
join filt on
(f->'b'->>'from')::int < (data->>'b')::int
and
(f->'b'->>'to')::int > (data->>'b')::int
and
(data->>'c')::int = (f->>'c')::int
;
Thanks for the comments/suggestions.
I will definitely look at GraphQL when I have more time - I'm working under a tight deadline at the moment.
It seems the consensus is that a fully generic solution is not achievable without a parser.
However, I got a workable first draft - it's far from ideal but we can work with it. Any comments/improvements are welcome ...
Test data (expanded to include dates & text fields)
DROP TABLE t;
CREATE TABLE t(_id serial PRIMARY KEY, data jsonb);
INSERT INTO t(data) VALUES
('{"a":1,"b":2, "c":3, "d":"2018-03-10", "e":"2018-03-10", "f":"Blah blah" }')
, ('{"a":11,"b":12, "c":13, "d":"2018-03-14", "e":"2018-03-14", "f":"Howzat!"}')
, ('{"a":21,"b":22, "c":23, "d":"2018-03-14", "e":"2018-03-14", "f":"Blah blah"}')
First draft of code to apply a jsonb filter dynamically, but with restrictions on what syntax is supported.
Also, it just fails silently if the syntax supplied does not match what it expects.
The timestamp handling is a bit kludgy, too.
-- Handle timestamp & text types as well as int
-- See is_timestamp(text) function at bottom
with cte as (
select t.data, f.filt, fk.key
from t
, ( values ('{ "a":11, "b":{ "from":0, "to":20 }, "c":13, "d":"2018-03-14", "e":{ "from":"2018-03-11", "to": "2018-03-14" }, "f":"Howzat!" }'::jsonb ) ) as f(filt) -- equiv to cross join
, lateral (select * from jsonb_each(f.filt)) as fk
)
select data, filt --, key, jsonb_typeof(filt->key), jsonb_typeof(filt->key->'from'), is_timestamp((filt->key)::text), is_timestamp((filt->key->'from')::text)
from cte
where
case when (filt->key->>'from') is null then
case jsonb_typeof(filt->key)
when 'number' then (data->>key)::numeric = (filt->>key)::numeric
when 'string' then
case is_timestamp( (filt->key)::text )
when true then (data->>key)::timestamp = (filt->>key)::timestamp
else (data->>key)::text = (filt->>key)::text
end
when 'boolean' then (data->>key)::boolean = (filt->>key)::boolean
else false
end
else
case jsonb_typeof(filt->key->'from')
when 'number' then (data->>key)::numeric between (filt->key->>'from')::numeric and (filt->key->>'to')::numeric
when 'string' then
case is_timestamp( (filt->key->'from')::text )
when true then (data->>key)::timestamp between (filt->key->>'from')::timestamp and (filt->key->>'to')::timestamp
else (data->>key)::text between (filt->key->>'from')::text and (filt->key->>'to')::text
end
when 'boolean' then false
else false
end
end
group by data, filt
having count(*) = ( select count(distinct key) from cte ) -- must match on all filter elements
;
create or replace function is_timestamp(s text) returns boolean as $$
begin
perform s::timestamp;
return true;
exception when others then
return false;
end;
$$ strict language plpgsql immutable;
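For reference, the helper just attempts the cast and reports whether it succeeded. A quick smoke test (values taken from the test data above):
SELECT is_timestamp('2018-03-14') AS d, is_timestamp('Howzat!') AS f;
-- d = true, f = false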

How to parse big string U-SQL Regex

I have got big CSVs that contain big strings. I want to parse them in U-SQL.
#t1 =
SELECT
Regex.Match("ID=881cf2f5f474579a:T=1489536183:S=ALNI_MZsMMpA4voGE4kQMYxooceW2AOr0Q", "ID=(?<ID>\\w+):T=(?<T>\\w+):S=(?<S>[\\w\\d_]*)") AS p
FROM
(VALUES(1)) AS fe(n);
#t2 =
SELECT
p.Groups["ID"].Value AS gads_id,
p.Groups["T"].Value AS gads_t,
p.Groups["S"].Value AS gads_s
FROM
#t1;
OUTPUT #t2
TO "/inhabit/test.csv"
USING Outputters.Csv();
Error E_CSC_USER_INVALIDCOLUMNTYPE:
'System.Text.RegularExpressions.Match' cannot be used as column type.
I know how to do it the SQL way with EXPLODE/CROSS APPLY/GROUP BY. But maybe it is possible to do it without these dances?
One more update
#t1 =
SELECT
Regex.Match("ID=881cf2f5f474579a:T=1489536183:S=ALNI_MZsMMpA4voGE4kQMYxooceW2AOr0Q", "ID=(?<ID>\\w+):T=(?<T>\\w+):S=(?<S>[\\w\\d_]*)").Groups["ID"].Value AS id,
Regex.Match("ID=881cf2f5f474579a:T=1489536183:S=ALNI_MZsMMpA4voGE4kQMYxooceW2AOr0Q", "ID=(?<ID>\\w+):T=(?<T>\\w+):S=(?<S>[\\w\\d_]*)").Groups["T"].Value AS t,
Regex.Match("ID=881cf2f5f474579a:T=1489536183:S=ALNI_MZsMMpA4voGE4kQMYxooceW2AOr0Q", "ID=(?<ID>\\w+):T=(?<T>\\w+):S=(?<S>[\\w\\d_]*)").Groups["S"].Value AS s
FROM
(VALUES(1)) AS fe(n);
OUTPUT #t1
TO "/inhabit/test.csv"
USING Outputters.Csv();
This variant works fine. But there is a question: will the regex be evaluated 3 times per row? Is there any way to hint to the U-SQL engine that the function Regex.Match is deterministic?
You should probably be using something more efficient than Regex.Match. But to answer your original question:
System.Text.RegularExpressions.Match is not part of the built-in U-SQL types.
Thus you would need to convert it into a built-in type, such as string or SqlArray<string>, or wrap it in a UDT that provides an IFormatter to make it a user-defined type.
Looks like it is better to use something like this to parse simple strings. Regexes are slow for the task, and if I use simple string expressions (instead of CLR calls) they will probably be translated into C++ code at the codegen phase... and the .NET interop will be eliminated (I'm not sure).
#t1 =
SELECT
pv.cust_gads != null ? new SQL.ARRAY<string>(pv.cust_gads.Split(':')) : null AS p
FROM
dwh.raw_page_view_data AS pv
WHERE
pv.year == "2017" AND
pv.month == "04";
#t3 =
SELECT
p != null && p.Count == 3 ? p[0].Split('=')[1] : null AS id,
p != null && p.Count == 3 ? p[1].Split('=')[1] : null AS t,
p != null && p.Count == 3 ? p[2].Split('=')[1] : null AS s
FROM
#t1 AS t1;
OUTPUT #t3
TO "/tmp/test.csv"
USING Outputters.Csv();

How to make LINQ to SQL translate to a derived column?

I have a table with a 'Wav' column of type 'VARBINARY(max)' (storing a wav file) and would like to be able to check whether there is a wav from LINQ to SQL.
My first approach was to do the following in Linq:
var result = from row in dc.Table
select new { NoWav = row.Wav != null };
The problem with the code above is that it will retrieve all the binary content into RAM, which isn't good (slow and memory hungry).
Any idea how to get the LINQ query to translate into something like the SQL below?
SELECT (CASE WHEN Wav IS NULL THEN 1 ELSE 0 END) As NoWav FROM [Update]
Thanks for all the replies. They all make sense. Indeed, LINQ should translate the != null correctly, but it didn't seem to do so effectively: running my code was very slow, so my only explanation is that the binary data got transferred over into RAM... but maybe I'm wrong.
I think I found a workaround anyway somewhere else on Stack Overflow: Create a computed column on a datetime
I ran the following query against my table:
ALTER TABLE [Table]
ADD WavIsNull AS (CASE WHEN [Wav] IS NULL THEN (1) ELSE (0) END)
Now I'll update my DBML to reflect that computed column and see how it goes.
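If that flag ends up being filtered on a lot, the computed column could also be declared PERSISTED so SQL Server stores (and can index) the value instead of computing it per row; the CASE expression here is deterministic, so PERSISTED should be allowed (a sketch):
ALTER TABLE [Table]
ADD WavIsNull AS (CASE WHEN [Wav] IS NULL THEN (1) ELSE (0) END) PERSISTED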
Are you sure that this code will retrieve the data to RAM?
I did some testing using LINQPad and the generated SQL was optimized as you suggest:
from c in Categories
select new
{
Description = c.Description != null
}
SELECT
(CASE
WHEN [t0].[description] IS NOT NULL THEN 1
ELSE 0
END) AS [Description]
FROM [Category] AS [t0]
What about this query:
var result = from row in dc.Table where row.Wav == null
select row.PrimaryKey
for a list of keys where your value is null. For a listing of null/not null you could do this:
var result = from row in db.Table
select new
{ Key = row.Key, NoWav = (row.Wav == null ? true : false) };
That will generate SQL code similar to this:
SELECT [t0].[WavID] AS [Key],
(CASE
WHEN [t0].[Wav] IS NULL THEN 1
ELSE 0
END) AS [NoWav]
FROM [tblWave] AS [t0]
I'm not clear here: your SQL code is going to return a list of 1s and 0s from your database. Is that what you are looking for? If you have an ID for your record, then you could just retrieve that single record with a condition on the Wav field; an empty result would indicate no wav, i.e.
var result = from row in dc.Table
where (row.ID == id) && (row.Wav != null)
select new { row.Wav };
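For completeness, a query shaped like that should translate to SQL along these lines (a sketch, following the same pattern as the generated SQL shown above; table and parameter names are illustrative):
SELECT [t0].[Wav]
FROM [Table] AS [t0]
WHERE ([t0].[ID] = @p0) AND ([t0].[Wav] IS NOT NULL)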