Why am I getting an empty result using regex in SQL? - sql

There is a column named keyword in the product table.
+---------+
| keyword |
+---------+
| dump    |
| dump2   |
| dump4   |
| dump5   |
| pro     |
+---------+
I am fetching the rows from the product table whose keyword contains the string du anywhere, using a regex.
I used select * from products where keyword LIKE '%[du]%';
but it returns an empty set.
What am I doing wrong here?

Your LIKE pattern matches nothing because MySQL's LIKE has no bracket character classes: '[du]' is searched for as the literal three characters [du]. If you must use regex, you can just use du as the regex; that will match the string du anywhere in the keyword:
SELECT *
FROM products
WHERE keyword REGEXP 'du'
Output:
+---------+
| keyword |
+---------+
| dump    |
| dump2   |
| dump4   |
| dump5   |
+---------+
Demo on dbfiddle
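If regex isn't actually needed, a plain substring pattern works too. A quick sketch using Python's sqlite3 as a stand-in (SQLite's LIKE treats brackets literally, just like MySQL's):

```python
import sqlite3

# Sketch only: sqlite3 stands in for MySQL here. In both engines, [ and ]
# are ordinary characters inside a LIKE pattern, not a character class.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (keyword TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("dump",), ("dump2",), ("dump4",), ("dump5",), ("pro",)],
)

# The original pattern looks for the literal substring "[du]" -> no rows.
empty = conn.execute(
    "SELECT keyword FROM products WHERE keyword LIKE '%[du]%'"
).fetchall()

# A plain substring pattern finds the rows without any regex.
rows = conn.execute(
    "SELECT keyword FROM products WHERE keyword LIKE '%du%'"
).fetchall()
print(empty)                  # []
print([r[0] for r in rows])   # ['dump', 'dump2', 'dump4', 'dump5']
```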

Related

Is there a way to alter all columns in a SQL table to have 'utf-8' format

I'm using Spark and I found that my data is not being interpreted correctly. I've tried the decode and encode built-in functions, but they can be applied only to one column at a time.
Update:
An example of the behaviour I am having:
+--------+
| Pa�s   |
+--------+
| Espa�a |
+--------+
And the one I'm expecting:
+--------+
| País   |
+--------+
| España |
+--------+
The query is just a simple
SELECT * FROM table
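One common cause, assuming this is what is happening here (the question doesn't show how the files are read): the source bytes are Latin-1 encoded but are decoded as UTF-8, which produces exactly the � replacement characters shown above. A quick Python sketch of the mechanism:

```python
# Hypothetical reproduction: Latin-1 bytes decoded as UTF-8 yield the
# replacement character seen in the question; decoding with the right
# charset recovers the text.
s = "España"
raw = s.encode("latin-1")                      # b'Espa\xf1a' -- the on-disk bytes
broken = raw.decode("utf-8", errors="replace")
fixed = raw.decode("latin-1")
print(broken)  # Espa�a
print(fixed)   # España
```

If that is the cause, the fix is to read the files with the correct charset in the first place (Spark's CSV reader has an encoding option) rather than re-encoding columns one by one afterwards.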

Conditional update column B with modified value based on column A

I am facing a large table with data that got imported from a CSV. However, the delimiters in the CSV were not sanitized, so the input data looked something like this:
alex#mail.com:Alex
dummy#mail.com;Bob
foo#bar.com:Foo
spam#yahoo.com;Spam
whatever#mail.com:Whatever
During the import, : was defined as the delimiter, so each row using ; instead was not imported properly. This resulted in a table structured like this:
| ID | MAIL                | USER     |
|----|---------------------|----------|
| 1  | alex#mail.com       | ALEX     |
| 2  | dummy#mail.com;Bob  | NULL     |
| 3  | foo#bar.com         | Foo      |
| 4  | spam#yahoo.com;Spam | NULL     |
| 5  | whatever#mail.com   | Whatever |
As reimporting is not an option, I was thinking about manually sanitizing the data in the affected rows with SQL queries. So I tried to combine SELECT and UPDATE statements, filtering rows WHERE USER IS NULL and updating both columns with the correct values where applicable.
What you need are string functions. Reading a bit, I find that Google BigQuery has STRPOS() and SUBSTR().
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#substr
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#strpos
An update query to fix the situation you are describing looks like this:
update table_name
set mail = SUBSTR(mail, 1, STRPOS(mail, ';') - 1),
    user = SUBSTR(mail, STRPOS(mail, ';') + 1)
where user is null
The idea here is to split mail into its two parts: the part before the ; and the part after. Both SET expressions are evaluated against the original row, so user still sees the full mail value even though mail is assigned first. Hope this helps.
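A sketch of the same fix using Python's sqlite3, with SUBSTR()/INSTR() standing in for BigQuery's SUBSTR()/STRPOS() (same 1-based semantics); table and column names follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, mail TEXT, user TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "alex#mail.com", "ALEX"),      # imported correctly
    (2, "dummy#mail.com;Bob", None),   # delimiter was ';' -> broken row
    (4, "spam#yahoo.com;Spam", None),
])

# Both SET expressions read the pre-update value of mail, so user still
# sees the ';'-suffixed string even though mail is assigned first.
conn.execute("""
    UPDATE t
    SET mail = SUBSTR(mail, 1, INSTR(mail, ';') - 1),
        user = SUBSTR(mail, INSTR(mail, ';') + 1)
    WHERE user IS NULL
""")
print(conn.execute("SELECT mail, user FROM t ORDER BY id").fetchall())
# -> [('alex#mail.com', 'ALEX'), ('dummy#mail.com', 'Bob'), ('spam#yahoo.com', 'Spam')]
```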

How to extract content from a json list?

There is a long string value which contains a json list:
{"name":"jack","age":"38","city":"JP"},{"name":"lee","age":"42","city":"tjs"},{"name":"smith","age":"46","city":"kh"}
The objective is to extract the name values, so the result should be 'jack,lee,smith'.
I tried get_json_object but it returns null; I also tried get_json_object with split, but that still didn't work...
Is there any suitable function in Hive that can implement this demand?
One way is to wrap the string into a valid JSON array and query that with get_json_object:
with t as (select '{"name":"jack","age":"38","city":"JP"},{"name":"lee","age":"42","city":"tjs"},{"name":"smith","age":"46","city":"kh"}' as myjson)
select get_json_object(concat('{"x":[',myjson,']}'),'$.x.name[*]') as names
from t
+------------------------+
| names                  |
+------------------------+
| ["jack","lee","smith"] |
+------------------------+
To strip the brackets and quotes, wrap the result in translate():
with t as (select '{"name":"jack","age":"38","city":"JP"},{"name":"lee","age":"42","city":"tjs"},{"name":"smith","age":"46","city":"kh"}' as myjson)
select translate(get_json_object(concat('{"x":[',myjson,']}'),'$.x.name[*]'),'[]"','') as names
from t
+----------------+
| names          |
+----------------+
| jack,lee,smith |
+----------------+
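What the concat trick does can be checked outside Hive: the raw string is not valid JSON on its own, but wrapping it in [ ] turns it into a JSON array. A Python sketch of the same idea:

```python
import json

# The string from the question: three JSON objects separated by commas,
# which only becomes valid JSON once wrapped in [ ].
myjson = ('{"name":"jack","age":"38","city":"JP"},'
          '{"name":"lee","age":"42","city":"tjs"},'
          '{"name":"smith","age":"46","city":"kh"}')
people = json.loads("[" + myjson + "]")
names = ",".join(p["name"] for p in people)
print(names)  # jack,lee,smith
```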

Oracle SQL regex extraction

I have data as follows in a column
+-----------------------+
| my_column             |
+-----------------------+
| test_PC_xyz_blah      |
| test_PC_pqrs_bloh     |
| test_Mobile_pqrs_bleh |
+-----------------------+
How can I extract the following as columns?
+----------+-------+
| Platform | Value |
+----------+-------+
| PC       | xyz   |
| PC       | pqrs  |
| Mobile   | pqrs  |
+----------+-------+
I tried using REGEXP_SUBSTR
Using the default first occurrence of the pattern for platform:
select regexp_substr(my_column, 'test_(.*)_(.*)_(.*)') as platform from table
Getting the second occurrence of the pattern for value:
select regexp_substr(my_column, 'test_(.*)_(.*)_(.*)', 1, 2) as value from table
This isn't working, however. Where am I going wrong?
For Non-empty tokens
select regexp_substr(my_column,'[^_]+',1,2) as platform
,regexp_substr(my_column,'[^_]+',1,3) as value
from my_table
;
For possibly empty tokens
select regexp_substr(my_column,'^.*?_(.*)?_.*?_.*$',1,1,'',1) as platform
,regexp_substr(my_column,'^.*?_.*?_(.*)?_.*$',1,1,'',1) as value
from my_table
;
+----------+-------+
| PLATFORM | VALUE |
+----------+-------+
| PC       | xyz   |
| PC       | pqrs  |
| Mobile   | pqrs  |
+----------+-------+
The fourth argument of REGEXP_SUBSTR is the occurrence of the whole pattern, not the number of a capture group. 'test_(.*)_(.*)_(.*)' matches the entire string exactly once, so the first query returns the whole string and the second one (asking for occurrence 2) returns NULL. The trick is instead to match runs of characters excluding _. The pattern [^_]+ is a negated character class: it matches any character except _. If you have a more specific pattern, you can use it instead, e.g. [A-Za-z]+ or [[:alnum:]]+. This slices the string into substrings separated by _, and you then simply select the 2nd and 3rd occurrences. (Note also that (.*) is greedy and will happily match _ as well, which makes such patterns fragile for field extraction.)
ex:
SELECT REGEXP_SUBSTR(my_column, '[^_]+', 1, 2) AS platform,
       REGEXP_SUBSTR(my_column, '[^_]+', 1, 3) AS value
FROM my_table;
Note: since Oracle 11g, REGEXP_SUBSTR also accepts a subexpression argument (the six-parameter form used in the answer above) that returns a specific capture group directly; on older versions you can emulate this with REGEXP_REPLACE and back-references. See this link for an example.
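The occurrence-based approach maps directly onto ordinary regex matching: REGEXP_SUBSTR(my_column, '[^_]+', 1, n) returns the n-th match of the pattern. A sketch with Python's re module:

```python
import re

s = "test_PC_xyz_blah"

# Each REGEXP_SUBSTR(my_column, '[^_]+', 1, n) call corresponds to the
# n-th element of findall(): runs of characters that are not '_'.
tokens = re.findall(r"[^_]+", s)
print(tokens)            # ['test', 'PC', 'xyz', 'blah']

platform, value = tokens[1], tokens[2]
print(platform, value)   # PC xyz
```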

How to get records from a table where some field's value is in camel-case

I have a table like this,
+----+-----------+
| Id | Value     |
+----+-----------+
| 1  | ABC_DEF   |
| 31 | AcdEmc    |
| 44 | AbcDef    |
| 2  | BAA_CC_CD |
| 55 | C_D_EE    |
+----+-----------+
I need a query that returns the records whose Value is in camel case only (e.g. AcdEmc, AbcDef; not ABC_DEF).
Please note that this table contains only these two styles of string values.
You can use UPPER() for this
select * from your_table
where upper(value) <> value COLLATE Latin1_General_CS_AS
If your default collation is case-insensitive you can force a case-sensitive collation in your where clause. Otherwise you can remove that part from your query.
Based on the sample data, the following will work. I think the issue we're dealing with is checking whether the string contains underscores.
SELECT * FROM [Foo]
WHERE Value NOT LIKE '%[_]%';
See Fiddle
UPDATE: Corrected an error. I forgot that an unescaped _ in a LIKE pattern means "any single character".
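A sketch of the underscore check using Python's sqlite3 as a stand-in. Note the dialect difference: SQLite's LIKE has no [_] character class (that is T-SQL), so the wildcard _ has to be escaped explicitly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (Id INTEGER, Value TEXT)")
conn.executemany("INSERT INTO Foo VALUES (?, ?)", [
    (1, "ABC_DEF"), (31, "AcdEmc"), (44, "AbcDef"),
    (2, "BAA_CC_CD"), (55, "C_D_EE"),
])

# '\_' with ESCAPE makes _ a literal underscore instead of the
# any-single-character wildcard.
rows = conn.execute(
    "SELECT Value FROM Foo WHERE Value NOT LIKE '%\\_%' ESCAPE '\\'"
).fetchall()
print([r[0] for r in rows])  # ['AcdEmc', 'AbcDef']
```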