Can TRIM function in SQL be more precise?

I have an import table that I want to split into other tables. There's a column in the import table whose fields have this format: ['email', 'phone', 'facebook'].
In order to separate each value in the field (email, phone and facebook in this case) I'm using the TRIM function when I insert into its corresponding table, like this:
insert into Media (media)
select distinct trim ('{}' from regexp_split_to_table(host_media, ','))
from ImportH;
But the data inserted into the new table looks dirty; for example, row 1 would contain ['email', row 2 'phone' and row 3 'facebook'].
How can I make it so the data inserts into the table in a clean way, without the '[' and the floating commas?
I'll provide an image of this column's data in the import table and of what I get when I split it:

You could just change the splitter:
select *
from regexp_split_to_table('[''email'', ''phone'', ''facebook'']', '[^_a-zA-Z]+') s(value)
where value <> '';
The splitters are just whatever characters are NOT valid characters for the strings you want.
Here is a db<>fiddle.
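Applied to the table from the question, a minimal sketch would look like the statement below (assuming the ImportH / host_media / Media names used above; not tested against the real data):
insert into Media (media)
select distinct s.value
from ImportH,
     regexp_split_to_table(host_media, '[^_a-zA-Z]+') s(value)
where s.value <> '';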

One of the options... Maybe it will be ok for you:
with trimed as
(select substring(host_media from 2 for length(host_media)-2) as clean_col
from ImportH
)
insert into Media (media)
select unnest(string_to_array(clean_col, ','))
from trimed;
Here is a demo
I understood the "floating commas" part a little bit late, so I added some changes to my query:
with trimed as
(select replace(substring(host_media from 2 for length(host_media)-2), '''', '') as clean_col
from ImportH
)
insert into Media (media)
select ltrim(unnest(string_to_array(clean_col, ',')))
from trimed;
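For what it's worth, here is a self-contained check of what that query does to one sample value; the dollar-quoted literal stands in for host_media and avoids doubling the single quotes:
select ltrim(unnest(string_to_array(
         replace(substring(v from 2 for length(v) - 2), '''', ''),
         ','))) as media
from (values ($$['email', 'phone', 'facebook']$$::text)) as t(v);
-- returns three rows: email, phone, facebook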

Here is a different approach which uses json functions.
First, create the media table:
CREATE TABLE media AS (
WITH my_medias AS (
SELECT
jsonb_array_elements_text(jsonb_agg(DISTINCT medias ORDER BY medias)) media
FROM impor_th, jsonb_array_elements_text(replace(host_media, '''', '"')::jsonb) medias
)
SELECT
row_number() OVER () media_id,
*
FROM my_medias
);
Then create the many-to-many relation table:
CREATE TABLE media_host AS (
WITH elements AS (
SELECT
id,
jsonb_array_elements_text(replace(host_media, '''', '"')::jsonb) media
FROM impor_th i
)
SELECT
e.id AS impor_th_id,
m.media_id
FROM elements e
JOIN media m ON m.media = e.media
);
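The step both statements rely on is replace(host_media, '''', '"')::jsonb, which turns the quoted list into a real JSON array. A standalone sketch with a hypothetical host_media value:
select jsonb_array_elements_text(
         replace($$['email', 'phone', 'facebook']$$, '''', '"')::jsonb
       ) as media;
-- returns three rows: email, phone, facebook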

Related

SQL to update rows to remove words with less than N characters

I have a TAGS column in my Products table in SQL Server. In my web project I split them with space, for example:
"web web_design website website_design"
=> 1.web 2. web_design 3. website ......
How can I remove words with fewer than N characters from the tags? Is it possible with a regex?
For example, if N=4 then "web" will be removed from my example and the rest remains.
I will give a solution to do this without changing your design at the bottom of this answer,
but I really think you should fix the design; here is an example of how to do that.
It starts from a table called "mytable" that has your "tag" column with all the data,
then it creates a detail table and populates it with the split values from your tag column,
and then it is very easy to do what you want to do.
create table tags (id int identity primary key, mytable_id int, tag varchar(100))
insert into tags (mytable_id, tag)
select t.id,
value
from mytable t
cross apply string_split(t.tag, ' ')
alter table mytable drop column tag
See a complete example in this dbfiddle
EDIT
if you need to show it again as if it were still in one column, you can use string_agg like this
select m.id,
m.name,
( select string_agg(t.tag, ' ')
from tags t
where t.mytable_id = m.id
) as tags
from mytable m
You can see this at work in this dbfiddle
EDIT 2
And if you really want to stick to your design, here is how you can remove the words from your tag column
But I recommend not doing this; as you can see in the examples above, it is not so hard to fix the design and create a new table to hold the tags.
update m
set m.tag =
( select string_agg(value, ' ')
from mytable t
cross apply string_split(m.tag, ' ')
where len(value) > 3
and t.id = m.id
)
from mytable m
Look at this dbfiddle to see it in action
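If you want N to stay a parameter instead of the hard-coded 3, here is a sketch of the same update (assuming the mytable layout above; @N is only an illustration):
declare @N int = 4;
update m
set m.tag =
( select string_agg(value, ' ')
  from string_split(m.tag, ' ')
  where len(value) >= @N
)
from mytable m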

query json data from oracle 12.1 having fields value with "."

I have a table that has JSON data stored in it, and I'm using the json_exists function in the query. Below is my sample data from the column for one of the rows.
{"fields":["query.metrics.metric1.field1",
"query.metrics.metric1.field2",
"query.metrics.metric1.field3",
"query.metrics.metric2.field1",
"query.metrics.metric2.field2"]}
I want all those rows which have a particular field, so I'm trying the query below.
SELECT COUNT(*)
FROM my_table
WHERE JSON_EXISTS(fields, '$.fields[*]."query.metrics.metric1.field1"');
It does not give me any results back. Not sure what I'm missing here. Please help.
Thanks
You can use the @ operator, which refers to an occurrence within the fields array, such as
SELECT *
FROM my_table
WHERE JSON_EXISTS(fields, '$.fields?(@=="query.metrics.metric1.field1")')
Demo
Edit: The above works for 12R2+. Considering that it doesn't work for your version (12R1), try using JSON_TABLE() such as
SELECT fields
FROM my_table,
JSON_TABLE(fields, '$.fields[*]' COLUMNS ( js VARCHAR2(90) PATH '$' ))
WHERE js = 'query.metrics.metric1.field1'
Demo
I have no idea how to "pattern match" on the array element, but just parsing the whole thing and filtering does the job.
with t(x, json) as (
select 1, q'|{"fields":["a", "b"]}|' from dual union all
select 2, q'|{"fields":["query.metrics.metric1.field1","query.metrics.metric1.field2","query.metrics.metric1.field3","query.metrics.metric2.field1","query.metrics.metric2.field2"]}|' from dual
)
select t.*
from t
where exists (
select null
from json_table(
t.json,
'$.fields[*]'
columns (
array_element varchar2(100) path '$'
)
)
where array_element = 'query.metrics.metric1.field1'
);
In your code, you are accessing the field "query.metrics.metric1.field1" of an object in the fields array, and there is no such object (the elements are strings)...
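To illustrate that last point: the path from the question does match once the array elements are objects keyed by that string. A hypothetical counter-example (made-up data, not your table):
SELECT COUNT(*)
FROM (SELECT '{"fields":[{"query.metrics.metric1.field1": 1}]}' AS fields FROM dual)
WHERE JSON_EXISTS(fields, '$.fields[*]."query.metrics.metric1.field1"');
-- returns 1 here; with the real data (plain strings in the array) it returns 0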

How to efficiently select records matching substring in another table using BigQuery?

I have a table of several million strings that I want to match against a table of about twenty thousand strings like this:
#standardSQL
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Unfortunately this is taking an awfully long time.
Considering that the fragment table is only 20k records, can I load it into a JavaScript array using a UDF and match it that way? I'm trying to figure out how to do this right now, but perhaps there's already some magic I could do here to make this faster. I tried a CROSS JOIN and got resource exceeded fairly quickly. I've also tried using EXISTS but I can't reference the record.name inside that subquery's WHERE without getting an error.
Example using Public Data
This seems to reflect about the same amount of data ...
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT LOWER(name) AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Below is for BigQuery Standard SQL
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT DISTINCT LOWER(name) AS name
FROM `bigquery-public-data.usa_names.usa_1910_current`
), temp_record AS (
SELECT record, TO_JSON_STRING(record) id, name, item
FROM record, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
), temp_fragment AS (
SELECT name, item FROM fragment, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
)
SELECT AS VALUE ANY_VALUE(record) FROM (
SELECT ANY_VALUE(record) record, id, r.name name, f.name fragment_name
FROM temp_record r
JOIN temp_fragment f
USING(item)
GROUP BY id, name, fragment_name
)
WHERE name LIKE CONCAT('%', fragment_name, '%')
GROUP BY id
The above completed in 375 seconds, while the original query was still running at 2740 seconds and kept going, so I did not even wait for it to complete.
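To make the shape of that query easier to follow, here is a toy version with made-up inline values: tokenize the records into words, join to the fragments on a shared word as a cheap pre-filter, then keep only the rows where the LIKE actually holds.
#standardSQL
WITH record AS (
  SELECT 'mary had a little lamb' AS name
), fragment AS (
  SELECT 'mary' AS name UNION ALL SELECT 'xavier'
), record_words AS (
  SELECT name, item
  FROM record, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
)
SELECT r.name, f.name AS fragment_name
FROM record_words r
JOIN fragment f
ON f.name = r.item
WHERE r.name LIKE CONCAT('%', f.name, '%')
-- returns one row: ('mary had a little lamb', 'mary')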
Mikhail's answer appears to be faster, but let's have one that doesn't need to SPLIT nor separate the text into words.
First, compute a regular expression with all the words to be searched:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT FORMAT('(%s)',STRING_AGG(name,'|'))
FROM fragment
Now you can take that resulting string, and use it in a REGEX ignoring case:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), largestring AS (
SELECT '(?i)(mary|margaret|helen|more_names|more_names|more_names|josniel|khaiden|sergi)'
)
SELECT record.* FROM `record`
WHERE REGEXP_CONTAINS(record.name, (SELECT * FROM largestring))
(~510 seconds)
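The two steps can also be folded into a single statement by aggregating the pattern in a subquery; a sketch (whether a pattern built from the full fragment table stays within the regex size limits is untested):
#standardSQL
WITH record AS (
  SELECT text AS name
  FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
  SELECT DISTINCT name
  FROM `bigquery-public-data.usa_names.usa_1910_current`
), largestring AS (
  SELECT FORMAT('(?i)(%s)', STRING_AGG(name, '|')) AS pattern
  FROM fragment
)
SELECT record.* FROM `record`
WHERE REGEXP_CONTAINS(record.name, (SELECT pattern FROM largestring))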
As alluded to in my question, I worked on a version using a JavaScript UDF which solves this, albeit more slowly than the answer I accepted. For completeness, I'm posting it here because perhaps someone (like myself in the future) may find it useful.
CREATE TEMPORARY FUNCTION CONTAINS_ANY(str STRING, fragments ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
for (var i in fragments) {
if (str.indexOf(fragments[i]) >= 0) {
return fragments[i];
}
}
return null;
""";
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
WHERE text IS NOT NULL
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE name IS NOT NULL
GROUP BY name
), fragment_array AS (
SELECT ARRAY_AGG(name) AS names, COUNT(*) AS count
FROM fragment
GROUP BY LENGTH(name)
), records_with_fragments AS (
SELECT record.name,
CONTAINS_ANY(record.name, fragment_array.names)
AS fragment_name
FROM record INNER JOIN fragment_array
ON CONTAINS_ANY(name, fragment_array.names) IS NOT NULL
)
SELECT * EXCEPT(rownum) FROM (
SELECT record.name,
records_with_fragments.fragment_name,
ROW_NUMBER() OVER (PARTITION BY record.name) AS rownum
FROM record
INNER JOIN records_with_fragments
ON records_with_fragments.name = record.name
AND records_with_fragments.fragment_name IS NOT NULL
) WHERE rownum = 1
The idea is that the list of fragments is small enough that it can be processed in an array, similar to Felipe's answer using regular expressions. The first thing I do is create a fragment_array table which is grouped by the fragment lengths ... a cheap way of preventing an over-sized array, which I found can cause UDF timeouts.
Next I create a table called records_with_fragments that joins those arrays to the original records, finding only those which contain a matching fragment using the JavaScript UDF CONTAINS_ANY(). This will result in a table containing some duplicates since one record may match multiple fragments.
The final SELECT then pulls in the original record table, joins to records_with_fragments to determine which fragment matched, and also uses the ROW_NUMBER() function to prevent duplicates, e.g. only showing the first row of each record as uniquely identified by its name.
Now, the reason I do the join in the final query is because in my actual data there are more fields I want besides just the string being matched. Earlier on in my actual data I create a table of DISTINCT strings which then later need to be re-joined.
Voila! Not the most elegant but it gets the job done.
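As a quick sanity check of the UDF in isolation (run in the same script as the CREATE TEMPORARY FUNCTION above; the values are made up):
SELECT
  CONTAINS_ANY('mary had a little lamb', ['mary', 'helen']) AS hit,   -- 'mary'
  CONTAINS_ANY('nothing to see here',    ['mary', 'helen']) AS miss   -- NULL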

How to search SQL column containing JSON array

I have a SQL column that has a single JSON array:
{"names":["Joe","Fred","Sue"]}
Given a search string, how can I use SQL to search for a match in the names array? I am using SQL 2016 and have looked at JSON_QUERY, but don't know how to search for a match on a JSON array. Something like below would be nice.
SELECT *
FROM table
WHERE JSON_QUERY(column, '$.names') = 'Joe'
To search in a JSON array, one needs to use OPENJSON:
DECLARE @table TABLE (Col NVARCHAR(MAX))
INSERT INTO @table VALUES ('{"names":["Joe","Fred","Sue"]}')
SELECT * FROM @table
WHERE 'Joe' IN ( SELECT value FROM OPENJSON(Col,'$.names'))
or as an alternative, one can use it with CROSS APPLY.
SELECT *
FROM @table
CROSS APPLY OPENJSON(Col,'$.names')
WHERE value = 'Joe'
Postgres syntax
When you know the key that holds the data:
SELECT column_name from table_name where column_name->>'key' LIKE '%QUERYSTRING%';
When you don't know the key that holds the data:
SELECT column_name from table_name where column_name::text LIKE '%QUERYSTRING%';
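A self-contained sketch of those two variants against the question's sample document (a hypothetical one-row table):
with t(col) as (values ('{"names":["Joe","Fred","Sue"]}'::jsonb))
select col
from t
where col->>'names' like '%Joe%'   -- key is known
   or col::text like '%Joe%';      -- key is not known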
It's very simple; it can easily be done with MySQL's JSON_CONTAINS() function (note that the candidate value must itself be valid JSON):
SELECT * FROM table
where JSON_CONTAINS(column, '"Joe"', '$.names');
You can search for a match on a JSON array via the query below:
SELECT JSON_EXTRACT(COLUMN, "$.names") AS NAME
FROM TABLE
WHERE JSON_EXTRACT(COLUMN, "$.names") != ""
Replace TABLE with your table name and COLUMN with the column name in your table.
The key names is used here because that is the key in your question.
Just to add a simple solution to the existing answers: here is how you can check whether an array inside JSON contains a value:
DECLARE @Json NVARCHAR(max) = '{"names":["Joe","Fred","Sue"]}'
IF EXISTS (SELECT value FROM OPENJSON(@Json,'$.names') WHERE value = 'Joe')
PRINT 'Array contains "Joe"'
ELSE
PRINT 'Array does not contain "Joe"'

compare some lists in where condition sql

I have a question about SQL Server 2012. I have a table containing a field that stores which systems are used, separated by ','. I want to put the system names into a parameter and query the related rows:
declare @System nvarchar(50)
set @System ='BPM,SEM'
SELECT *
FROM dbo.tblMeasureCatalog t1
where ( ( select Upper(value) from dbo.split(t1.System,','))
= any( select Upper(value) from dbo.split(@System,',')))
dbo.split is a function that returns the systems in separate rows
Forgetting for a second that storing delimited lists in a relational database is abhorrent, you can do it using a combination of INTERSECT and EXISTS, for example:
DECLARE @System NVARCHAR(50) = 'BPM,SEM';
DECLARE @tblMeasureCatalog TABLE (System VARCHAR(MAX));
INSERT @tblMeasureCatalog VALUES ('BPM,XXX'), ('BPM,SEM'), ('XXX,SEM'), ('XXX,YYY');
SELECT mc.System
FROM @tblMeasureCatalog AS mc
WHERE EXISTS
( SELECT Value
FROM dbo.Split(mc.System, ',')
INTERSECT
SELECT Value
FROM dbo.Split(@System, ',')
);
Returns
System
---------
BPM,XXX
BPM,SEM
XXX,SEM
EDIT
Based on your question stating "any" I assumed that you wanted rows where the terms matched any of those provided; based on your comment I now assume you want records where the terms match all of them. This is a fairly similar approach, but you need to use NOT EXISTS and EXCEPT instead.
Even "all" is still quite ambiguous: for example, if you search for "BPM,SEM", should it return a record that is "BPM,SEM,YYY"? It does contain all of the searched terms, but it contains additional terms too. So the approach you need depends on your requirements:
DECLARE @System NVARCHAR(50) = 'BPM,SEM,XXX';
DECLARE @tblMeasureCatalog TABLE (System VARCHAR(MAX));
INSERT @tblMeasureCatalog
VALUES
('BPM,XXX'), ('BPM,SEM'), ('XXX,SEM'), ('XXX,YYY'),
('SEM,BPM'), ('SEM,BPM,XXX'), ('SEM,BPM,XXX,YYY');
-- METHOD 1 - CONTAINS ALL SEARCHED TERMS BUT CAN CONTAIN ADDITIONAL TERMS
SELECT mc.System
FROM @tblMeasureCatalog AS mc
WHERE NOT EXISTS
(
SELECT Value
FROM dbo.Split(@System, ',')
EXCEPT
SELECT Value
FROM dbo.Split(mc.System, ',')
);
-- METHOD 2 - ONLY CONTAINS ITEMS WITHIN THE SEARCHED TERMS, BUT NOT
-- NECESSARILY ALL OF THEM
SELECT mc.System
FROM @tblMeasureCatalog AS mc
WHERE NOT EXISTS
( SELECT Value
FROM dbo.Split(mc.System, ',')
EXCEPT
SELECT Value
FROM dbo.Split(@System, ',')
);
-- METHOD 3 - CONTAINS ALL ITEMS IN THE SEARCHED TERMS, AND NO ADDITIONAL ITEMS
SELECT mc.System
FROM @tblMeasureCatalog AS mc
WHERE NOT EXISTS
( SELECT Value
FROM dbo.Split(@System, ',')
EXCEPT
SELECT Value
FROM dbo.Split(mc.System, ',')
)
AND LEN(mc.System) = LEN(@System);
You have a problem with your data structure because you are storing lists of things in a comma-delimited list. SQL has a great data structure for storing lists. It goes by the name "table". You should have a junction table with one row per "measure catalog" and "system".
Sometimes, you are stuck with other people's really bad design decisions. One solution is to use split(). Here is one method:
select mc.*
from dbo.tblMeasureCatalog mc
where exists (select 1
              from dbo.split(mc.System, ',') t1s join
                   dbo.split(@System, ',') ss
                   on upper(t1s.value) = upper(ss.value)
             );
You can try this:
declare @System nvarchar(50)
set @System ='BPM,SEM'
SELECT * from dbo.tblMeasureCatalog t1 inner join dbo.Split(@System, ',') B on t1.it = B.items