Query JSON data from Oracle 12.1 where field values contain "."

I have a table with JSON data stored in a column, and I'm using the JSON_EXISTS function in the query. Below is the sample data from that column for one of the rows.
{"fields":["query.metrics.metric1.field1",
"query.metrics.metric1.field2",
"query.metrics.metric1.field3",
"query.metrics.metric2.field1",
"query.metrics.metric2.field2"]}
I want all the rows that contain a particular field value, so I'm trying the query below.
SELECT COUNT(*)
FROM my_table
WHERE JSON_EXISTS(fields, '$.fields[*]."query.metrics.metric1.field1"');
It does not return any results. I'm not sure what I'm missing here. Please help.
Thanks

You can use the @ operator, which refers to the current element of the fields array, for example:
SELECT *
FROM my_table
WHERE JSON_EXISTS(fields, '$.fields?(@=="query.metrics.metric1.field1")')
Edit: The query above works on 12.2 and later. Since it does not work on your version (12.1), try JSON_TABLE() instead:
SELECT fields
FROM my_table,
JSON_TABLE(fields, '$.fields[*]' COLUMNS ( js VARCHAR2(90) PATH '$' ))
WHERE js = 'query.metrics.metric1.field1'

I have no idea how to "pattern match" on the array element, but just parsing the whole thing and filtering does the job.
with t(x, json) as (
select 1, q'|{"fields":["a", "b"]}|' from dual union all
select 2, q'|{"fields":["query.metrics.metric1.field1","query.metrics.metric1.field2","query.metrics.metric1.field3","query.metrics.metric2.field1","query.metrics.metric2.field2"]}|' from dual
)
select t.*
from t
where exists (
select null
from json_table(
t.json,
'$.fields[*]'
columns (
array_element varchar2(100) path '$'
)
)
where array_element = 'query.metrics.metric1.field1'
);
In your code, you are accessing the field "query.metrics.metric1.field1" of an object in the fields array, and there is no such object (the elements are strings)...
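The same distinction can be sketched outside Oracle. The following Python snippet is only an illustration (the function names are made up for this example and are not part of any Oracle API): the original path looks up a key inside each array element, which can only succeed for object elements, whereas the data holds plain strings that have to be compared directly.

```python
import json

doc = json.loads(
    '{"fields": ["query.metrics.metric1.field1", "query.metrics.metric2.field1"]}'
)
target = "query.metrics.metric1.field1"

def has_key_in_element(document, key):
    # Mirrors the idea of '$.fields[*]."<key>"': only object elements can have a key.
    return any(isinstance(e, dict) and key in e for e in document["fields"])

def has_element_equal_to(document, value):
    # Mirrors comparing each array element itself against the value.
    return any(e == value for e in document["fields"])

print(has_key_in_element(doc, target))    # False: the elements are strings
print(has_element_equal_to(doc, target))  # True
```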

With XMLDIFF, how to compare only the fields that my XML elements have in common?

Introduction:
I have a query that uses a pipeline function. I won't change the names of the returned columns, but I will add other columns.
I want to compare the result of the old query with the result of the new query. The query text is always the same (select * from mypipelinefunction), but I have changed the pipeline function.
I have used "select *" instead of listing the column names because there are a lot of them.
Code:
The code example is simplified to focus on the problem addressed in the title: there is no pipeline function, and only two "identical" queries are tested. The first query has one more column than the second.
SELECT
XMLDIFF (
XMLTYPE.createXML (
DBMS_XMLGEN.getxml ('select 1 one, 2 two from dual')),
XMLTYPE.createXML (
DBMS_XMLGEN.getxml ('select 1 one from dual')))
from dual;
I want XMLDIFF to report no difference, because the only columns I care about are the columns that the two queries have in common.
In short I would like to have this result
<xd:xdiff xsi:schemaLocation="http://xmlns.oracle.com/xdb/xdiff.xsd http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
</xd:xdiff>
instead of this result
<xd:xdiff xsi:schemaLocation="http://xmlns.oracle.com/xdb/xdiff.xsd http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><xd:delete-node xd:node-type="element" xd:xpath="/ROWSET[1]/ROW[1]/TWO[1]"/></xd:xdiff>
Is it possible to force XMLDIFF to compare only the columns that are in common?
Another way to fix this problem would be a shortcut in TOAD that transforms select * from t into select first_column, ......last_column from t. It should work even if t is a pipeline function.
If you only care about certain columns, wrap your query in an outer query that outputs only the columns you care about:
SELECT XMLDIFF (
XMLTYPE.createXML (
DBMS_XMLGEN.getxml (
'SELECT one FROM (select 1 one, 2 two from dual)'
)
),
XMLTYPE.createXML (
DBMS_XMLGEN.getxml (
'SELECT one FROM (select 1 one from dual)'
)
)
) AS diff
FROM DUAL;
Which outputs:
DIFF
<xd:xdiff xsi:schemaLocation="http://xmlns.oracle.com/xdb/xdiff.xsd http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><?oracle-xmldiff operations-in-docorder="true" output-model="snapshot" diff-algorithm="global"?></xd:xdiff>

Escape square brackets in Oracle and PostgreSQL

I'm working on a project where the database could be an instance of Oracle or Postgres.
I need to write a query with a LIKE that works on both databases.
The query works on a text column containing a JSON string, for example:
{"ruleName":"r2_an","divisionNameList":["div1"],"names":["name1"],"thirdTypeLabels":[],"secondTypeLabels":[],"firstTypeLabels":[]}
I need to select the rows with an empty thirdTypeLabels array.
select *
from my_table
where JSON like '%thirdTypeLabels%[]%';
On Oracle, for example, this query does not return anything, even though "my_table" contains more than one matching row.
The query runs inside Java software, using JDBC, because we need performance.
Do you have any suggestions?
You should use a proper JSON parser; otherwise there is no guarantee that %thirdTypeLabels%[]% will restrict the match of the empty array to the thirdTypeLabels key-value pair.
So for Oracle 18c you can use:
SELECT id,
thirdTypeLabelsCount
FROM mytable t
CROSS JOIN
JSON_TABLE(
t.json,
'$'
COLUMNS(
thirdTypeLabelsCount NUMBER PATH '$.thirdTypeLabels.size()'
)
)
WHERE thirdTypeLabelsCount = 0;
or
SELECT *
FROM mytable
WHERE JSON_EXISTS( json, '$ ? (@.thirdTypeLabels.size() == 0) ' )
For Postgres you have several ways to make this query work properly:
select *
from the_table
where jsonb_array_length(json::jsonb -> 'thirdTypeLabels') = 0;
Or, starting with Postgres 12, use a JSON path expression:
select *
from the_table
where jsonb_path_exists(json::jsonb, '$.thirdTypeLabels.size() ? (@ == 0)' );
Or use the same JSON path expression as in Oracle:
select *
from the_table
where jsonb_path_exists(json::jsonb, '$ ? (@.thirdTypeLabels.size() == 0)');
In Postgres you should also use a column defined as jsonb rather than text (or varchar).
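To illustrate why the LIKE pattern cannot be trusted, here is a small Python sketch. It is an approximation, not database code: the regular expression stands in for '%thirdTypeLabels%[]%' under the assumption that square brackets are matched literally, and the sample row is invented for the example. The pattern matches a row whose thirdTypeLabels array is not empty, simply because a later array is.

```python
import json
import re

def like_match(text):
    # Rough stand-in for: text LIKE '%thirdTypeLabels%[]%'
    return re.search(r"thirdTypeLabels.*\[\]", text, re.DOTALL) is not None

def third_labels_empty(text):
    # Parse the JSON and inspect the one key we actually care about.
    labels = json.loads(text).get("thirdTypeLabels")
    return isinstance(labels, list) and len(labels) == 0

# Hypothetical row: thirdTypeLabels is NOT empty, but secondTypeLabels is.
row = '{"ruleName":"r2_an","thirdTypeLabels":["x"],"secondTypeLabels":[]}'

print(like_match(row))          # True  (false positive)
print(third_labels_empty(row))  # False (correct answer)
```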

How to efficiently select records matching substring in another table using BigQuery?

I have a table of several million strings that I want to match against a table of about twenty thousand strings like this:
#standardSQL
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Unfortunately this is taking an awfully long time.
Considering that the fragment table is only 20k records, can I load it into a JavaScript array using a UDF and match it that way? I'm trying to figure out how to do this right now, but perhaps there's already some magic I could do here to make this faster. I tried a CROSS JOIN and got "resources exceeded" fairly quickly. I've also tried using EXISTS, but I can't reference record.name inside that subquery's WHERE without getting an error.
Example using Public Data
This seems to reflect about the same amount of data ...
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT LOWER(name) AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Below is for BigQuery Standard SQL
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT DISTINCT LOWER(name) AS name
FROM `bigquery-public-data.usa_names.usa_1910_current`
), temp_record AS (
SELECT record, TO_JSON_STRING(record) id, name, item
FROM record, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
), temp_fragment AS (
SELECT name, item FROM fragment, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
)
SELECT AS VALUE ANY_VALUE(record) FROM (
SELECT ANY_VALUE(record) record, id, r.name name, f.name fragment_name
FROM temp_record r
JOIN temp_fragment f
USING(item)
GROUP BY id, name, fragment_name
)
WHERE name LIKE CONCAT('%', fragment_name, '%')
GROUP BY id
The query above completed in 375 seconds, while the original query was still running after 2740 seconds, so I did not wait for it to complete.
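The idea behind that query can be sketched in Python as follows. This is a simplified illustration with invented sample data, not a translation of the SQL: both sides are split into word tokens, candidate pairs are formed only where a token is shared, and the substring test is run on those candidates alone. One consequence worth noting is that a fragment appearing only inside a longer word (for example "ann" in "hannah") shares no token with the record, so it is not found, unlike with a plain LIKE join.

```python
import re
from collections import defaultdict

def tokens(s):
    # Same tokenisation idea as REGEXP_EXTRACT_ALL(name, r'\w+').
    return set(re.findall(r"\w+", s))

def match_by_shared_token(records, fragments):
    # Index fragments by each word they contain (the "join key").
    by_token = defaultdict(set)
    for f in fragments:
        for t in tokens(f):
            by_token[t].add(f)

    matched = []
    for r in records:
        # Candidate fragments are those sharing at least one word with the record.
        candidates = set()
        for t in tokens(r):
            candidates |= by_token.get(t, set())
        # Confirm with the substring test, as the final LIKE does.
        if any(c in r for c in candidates):
            matched.append(r)
    return matched

records = ["ask mary ann today", "hannah wrote this", "no names"]
fragments = ["mary ann", "ann"]
print(match_by_shared_token(records, fragments))  # ['ask mary ann today']
```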
Mikhail's answer appears to be faster, but let's have one that needs neither SPLIT nor separating the text into words.
First, compute a regular expression with all the words to be searched:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT FORMAT('(%s)',STRING_AGG(name,'|'))
FROM fragment
Now you can take that resulting string, and use it in a REGEX ignoring case:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), largestring AS (
SELECT '(?i)(mary|margaret|helen|more_names|more_names|more_names|josniel|khaiden|sergi)'
)
SELECT record.* FROM `record`
WHERE REGEXP_CONTAINS(record.name, (SELECT * FROM largestring))
(~510 seconds)
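The two steps above (build one alternation, then test every record against it) can be sketched in Python. This is an illustration under assumptions: the sample fragments and records are invented, and the escaping step is an addition that the queries above do not perform. It would matter if a fragment contained regex metacharacters.

```python
import re

def build_pattern(fragments):
    # Escape each fragment so characters such as "." or "+" are matched literally,
    # and drop empty strings, which would otherwise match every record.
    alternatives = sorted({re.escape(f) for f in fragments if f})
    if not alternatives:
        raise ValueError("no non-empty fragments")
    return re.compile("(" + "|".join(alternatives) + ")", re.IGNORECASE)

pattern = build_pattern(["Mary", "Helen", "C++"])
records = ["i asked mary about it", "nothing here", "written in c++"]
matches = [r for r in records if pattern.search(r)]
print(matches)  # ['i asked mary about it', 'written in c++']
```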
As alluded to in my question, I worked on a version using a JavaScript UDF, which solves this, albeit more slowly than the answer I accepted. For completeness, I'm posting it here because perhaps someone (like myself in the future) may find it useful.
CREATE TEMPORARY FUNCTION CONTAINS_ANY(str STRING, fragments ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
for (var i in fragments) {
if (str.indexOf(fragments[i]) >= 0) {
return fragments[i];
}
}
return null;
""";
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
WHERE text IS NOT NULL
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE name IS NOT NULL
GROUP BY name
), fragment_array AS (
SELECT ARRAY_AGG(name) AS names, COUNT(*) AS count
FROM fragment
GROUP BY LENGTH(name)
), records_with_fragments AS (
SELECT record.name,
CONTAINS_ANY(record.name, fragment_array.names)
AS fragment_name
FROM record INNER JOIN fragment_array
ON CONTAINS_ANY(name, fragment_array.names) IS NOT NULL
)
SELECT * EXCEPT(rownum) FROM (
SELECT record.name,
records_with_fragments.fragment_name,
ROW_NUMBER() OVER (PARTITION BY record.name) AS rownum
FROM record
INNER JOIN records_with_fragments
ON records_with_fragments.name = record.name
AND records_with_fragments.fragment_name IS NOT NULL
) WHERE rownum = 1
The idea is that the list of fragments is small enough to be processed in an array, similar to Felipe's answer using regular expressions. The first thing I do is create a fragment_array table which is grouped by the fragment lengths ... a cheap way of preventing an oversized array, which I found can cause UDF timeouts.
Next I create a table called records_with_fragments that joins those arrays to the original records, finding only those which contain a matching fragment using the JavaScript UDF CONTAINS_ANY(). This will result in a table containing some duplicates since one record may match multiple fragments.
The final SELECT then pulls in the original record table, joins to records_with_fragments to determine which fragment matched, and also uses the ROW_NUMBER() function to prevent duplicates, e.g. only showing the first row of each record as uniquely identified by its name.
Now, the reason I do the join in the final query is that my actual data has more fields I want besides the string being matched. Earlier on, with my actual data, I create a table of DISTINCT strings which later needs to be re-joined.
Voila! Not the most elegant but it gets the job done.

Search and extract data from SQL column having some delimiter

I have a column in a table with data in the format below:
(DeliveryMethod+NON;Installation_Method+NoInstallation;Services_Reference_ID+100118547,44444,33333;Service_ID+2222)
(key+value;key+value;key+value;key+value;key+value;key+value;key+value;)
I want to search for and extract a particular "value" from this column based on a specific "key". The "key+value" pair can be in any position. How can I do this using a SQL query?
Here's one way to approach it in Oracle as I answered in this post: Oracle 11gR2: split string with multiple delimiters(add). Hopefully you can apply the logic to your RDBMS. Note that this answer doesn't just get the value from the string, but attempts to parse the string and return values so they can be processed like rows in a query's result set. This may be overkill for your scenario. At any rate, it's just one way to look at it.
-- Original data with multiple delimiters and a NULL element for testing.
with orig_data(str) as (
select 'DeliveryMethod+NON;Installation_Method+NoInstallation;;Services_Reference_ID+100118547,44444,33333;Service_ID+2222' from dual
),
--Split on first delimiter (semi-colon)
Parsed_data(rec) as (
select regexp_substr(str, '(.*?)(;|$)', 1, LEVEL, NULL, 1)
from orig_data
CONNECT BY LEVEL <= REGEXP_COUNT(str, ';') + 1
)
-- For testing-shows records based on 1st level delimiter
--select rec from parsed_data;
-- Split the record into columns
select regexp_replace(rec, '^(.*)\+.*', '\1') col1,
regexp_replace(rec, '^.*\+(.*)', '\1') col2
from Parsed_data;
To specifically answer your question, in order to get a value based on a key, change the last query to this in order to get the value where the key is 'Service_ID':
select value
from (
select regexp_replace(rec, '^(.*)\+.*', '\1') key,
regexp_replace(rec, '^.*\+(.*)', '\1') value
from Parsed_data )
where key = 'Service_ID';
Or to just extract it out of the string using a regular expression:
with orig_data(str) as (
select 'Service_ID+2222;DeliveryMethod+NON;Installation_Method+NoInstallation;;Services_Reference_ID+100118547,44444,33333' from dual
)
select regexp_substr(str, '(.*?)Service_ID\+(.+?)(;|$)', 1, 1, NULL, 2) value
from orig_data;
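For comparison, the same parse-then-look-up logic can be sketched in Python. This is an illustration only, and parse_pairs is a hypothetical helper name. One difference from the SQL above: this sketch splits each record on the first "+", whereas the greedy regexp_replace patterns split on the last one. The two agree for the sample data.

```python
def parse_pairs(s):
    # Split on ';' into records, then on the first '+' into key and value.
    pairs = {}
    for chunk in s.strip("()").split(";"):
        if not chunk:
            continue  # skip NULL (empty) elements such as ';;'
        key, sep, value = chunk.partition("+")
        if sep:
            pairs[key] = value
    return pairs

data = ("DeliveryMethod+NON;Installation_Method+NoInstallation;;"
        "Services_Reference_ID+100118547,44444,33333;Service_ID+2222")
pairs = parse_pairs(data)
print(pairs["Service_ID"])                        # 2222
print(pairs["Services_Reference_ID"].split(","))  # ['100118547', '44444', '33333']
```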

PostgreSQL get last value in a comma separated list of values

In a PostgreSQL table I have a column which has values like
AX,B,C
A,BD
X,Y
J,K,L,M,N
In short, each record has a few comma-separated strings in the column. I want to get the last one in each record. I ended up with this:
select id, reverse(substr(reverse(mycolumn),1,position(',' in reverse(mycolumn)))) from mytable order by id ;
Is there an easier way?
I would do it this way:
select reverse(split_part(reverse(myColumn), ',', 1))
With regexp_replace:
select id, regexp_replace(mycolumn, '.*,', '')
from mytable
order by id;
Regarding "Is there an easier way?": with your current data, Gordon's answer works best imo. Other options would be a regex (messy), or converting the column to a text[] array, e.g. ('{' || col || '}')::text[] or variations thereof.
If you were using a text[] array instead of plain text for your column, you'd want to use array functions directly:
select col[array_length(col, 1)]
http://www.postgresql.org/docs/current/static/functions-array.html
Example with dummy data:
with bar as (
select '{a,b,c}'::text[] as foo
)
select foo[array_length(foo, 1)] from bar;
You could, of course, also create a parse_csv() function or get_last_csv_value() function to avoid writing the above.
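As a point of comparison, the logic such a helper would implement is short. Here it is as a Python sketch (the function borrows the get_last_csv_value name suggested above, but the implementation is only an illustration, not Postgres code): split once from the right and keep the final piece.

```python
def get_last_csv_value(value):
    # rsplit with maxsplit=1 cuts at the last comma only;
    # a value without commas is returned unchanged.
    return value.rsplit(",", 1)[-1]

for row in ["AX,B,C", "A,BD", "X,Y", "J,K,L,M,N"]:
    print(get_last_csv_value(row))  # C, BD, Y, N (one per line)
```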