Regexp in a matching column in SQL

I have two tables, and one of them has a column that holds regular expressions. I want to join the two tables on that regexp column, but I cannot get it to work.
To elaborate: table 1 is (errortype, action, error message) and table 2 is (regexp for the error message of each type, error code). The join columns are error message and regexp, both strings. Is this possible? Thanks.
Select * from table1
left outer join table2 on table1.error_message=table2.regExp
Issue: table2.regExp holds actual regular expressions, each of which should match one or more table1.error_message values. The = operator does not work here.

You need to use the REGEXP or RLIKE operator (available in MySQL and Hive, for example; other databases use different syntax). The = operator performs exact matches, not pattern matching.
SELECT *
FROM table1
LEFT OUTER JOIN table2
ON table1.error_message REGEXP table2.regexp
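To try the idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no built-in REGEXP implementation, so the sketch registers one; in MySQL the operator works out of the box. The sample rows and the column name pattern (used instead of regexp) are assumptions made up for illustration.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite parses "X REGEXP Y" but needs a user function to evaluate it.
# It is called as regexp(Y, X): the pattern first, then the value.
def regexp(pattern, value):
    if pattern is None or value is None:
        return None
    # re.search gives an unanchored match, like MySQL's REGEXP.
    return 1 if re.search(pattern, value) else 0

conn.create_function("REGEXP", 2, regexp)

# Hypothetical sample data; only the table names follow the question.
conn.executescript("""
    CREATE TABLE table1 (errortype TEXT, error_message TEXT);
    CREATE TABLE table2 (pattern TEXT, error_code INTEGER);
    INSERT INTO table1 VALUES
        ('io',   'disk full on /dev/sda1'),
        ('net',  'timeout after 30s'),
        ('misc', 'something unexpected');
    INSERT INTO table2 VALUES
        ('^disk full', 100),
        ('timeout after [0-9]+s', 200);
""")

rows = conn.execute("""
    SELECT t1.error_message, t2.error_code
    FROM table1 AS t1
    LEFT OUTER JOIN table2 AS t2
      ON t1.error_message REGEXP t2.pattern
    ORDER BY t1.error_message
""").fetchall()

for row in rows:
    print(row)
```

The message that matches no pattern still appears, with a NULL error code, because of the LEFT OUTER JOIN. A message that matches several patterns would produce one row per match.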

Related

BigQuery select, join from multiple datasets and avoid name conflicts

Imagine I have several datasets and tables.
Format: dataset.table.field
dataset01.table_xxx.field_z
dataset02.table_xxx.field_z
I tried to write something like
select
dataset01.table_xxx.field_z as dataset01_table_xxx_field_z,
dataset02.table_xxx.field_z as dataset02_table_xxx_field_z
from dataset01.table_xxx
join dataset02.table_xxx on dataset02.table_xxx.field_z = dataset01.table_xxx.field_z
to avoid conflicting names.
BigQuery says that dataset01.table_xxx.field_xxx is an unrecognised name in the SELECT clause.
It complains about an unrecognised name in the JOIN clause too.
The query works if I remove dataset01 and dataset02 from the SELECT clause and the ON condition.
What is the right way to refer to fields in such a case?
Give each table an alias and qualify the fields with the alias:
select
t1.field_z as dataset01_table_xxx_field_z,
t2.field_z as dataset02_table_xxx_field_z
from dataset01.table_xxx t1
join dataset02.table_xxx t2
on t2.field_z = t1.field_z
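The alias pattern can be tried locally with a rough sketch in Python's sqlite3 module, where attached in-memory databases stand in for the two datasets. This only illustrates the alias form of the query; SQLite resolves names differently from BigQuery, so it does not reproduce the original error. The sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two attached in-memory databases play the role of the two datasets.
conn.execute("ATTACH DATABASE ':memory:' AS dataset01")
conn.execute("ATTACH DATABASE ':memory:' AS dataset02")
conn.executescript("""
    CREATE TABLE dataset01.table_xxx (field_z INTEGER);
    CREATE TABLE dataset02.table_xxx (field_z INTEGER);
    INSERT INTO dataset01.table_xxx VALUES (1), (2), (3);
    INSERT INTO dataset02.table_xxx VALUES (2), (3), (4);
""")

cursor = conn.execute("""
    SELECT
        t1.field_z AS dataset01_table_xxx_field_z,
        t2.field_z AS dataset02_table_xxx_field_z
    FROM dataset01.table_xxx AS t1
    JOIN dataset02.table_xxx AS t2
      ON t2.field_z = t1.field_z
    ORDER BY t1.field_z
""")
column_names = [d[0] for d in cursor.description]
alias_rows = cursor.fetchall()

print(column_names)
print(alias_rows)
```

The output columns carry the distinct alias names, so the two same-named fields no longer clash in the result.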

Hive - Multiple sub-queries in where clause is failing

I am trying to create a table by checking two sub-query expressions within the where clause, but my query fails with the error below:
Unsupported sub query expression. Only 1 sub query expression is supported
The code snippet is as follows (not the exact code, just for better understanding):
Create table winners row format delimited fields terminated by '|' as
select
games,
players
from olympics
where
exists (select 1 from dom_sports where dom_sports.players = olympics.players)
and not exists (select 1 from dom_sports where dom_sports.games = olympics.games)
If I run the same command with only one sub-query in the where clause, it executes successfully. Is there an alternative way to achieve the same result?
Of course. You can use joins instead.
An inner join acts as the exists, and a left join plus a where clause mimics the not exists.
The joins can change the granularity of the result (duplicate rows), but that depends on your data; the distinct guards against it.
select distinct
olympics.games,
olympics.players
from olympics
inner join dom_sports dom_sports on dom_sports.players = olympics.players
left join dom_sports dom_sports2 on dom_sports2.games = olympics.games
where dom_sports2.games is null
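To check that the join rewrite returns the same rows as the EXISTS / NOT EXISTS version, here is a small sketch run through Python's sqlite3 module (SQLite accepts both forms, unlike the Hive version in the question). The sample rows and the aliases o, d1, d2 are assumptions for illustration; one dom_sports row is duplicated on purpose to show the granularity issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE olympics (games TEXT, players TEXT);
    CREATE TABLE dom_sports (games TEXT, players TEXT);
    INSERT INTO olympics VALUES
        ('archery', 'ann'),
        ('rowing',  'bob'),
        ('judo',    'cat'),
        ('fencing', 'ann');
    INSERT INTO dom_sports VALUES
        ('rowing', 'ann'),
        ('rowing', 'ann'),
        ('chess',  'bob');
""")

exists_sql = """
    SELECT games, players
    FROM olympics
    WHERE EXISTS (SELECT 1 FROM dom_sports
                  WHERE dom_sports.players = olympics.players)
      AND NOT EXISTS (SELECT 1 FROM dom_sports
                      WHERE dom_sports.games = olympics.games)
    ORDER BY games
"""

# Inner join plays the role of EXISTS; left join + IS NULL plays NOT EXISTS.
join_sql = """
    SELECT {distinct} o.games, o.players
    FROM olympics o
    INNER JOIN dom_sports d1 ON d1.players = o.players
    LEFT JOIN dom_sports d2 ON d2.games = o.games
    WHERE d2.games IS NULL
    ORDER BY o.games
"""

exists_rows = conn.execute(exists_sql).fetchall()
join_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()
dup_rows = conn.execute(join_sql.format(distinct="")).fetchall()

print(exists_rows)    # [('archery', 'ann'), ('fencing', 'ann')]
print(join_rows)      # same rows as the EXISTS version
print(len(dup_rows))  # 4: each 'ann' row is doubled by the duplicate match
```

Note that DISTINCT also collapses rows that are genuinely duplicated in olympics, which EXISTS would keep, so the two forms only agree when olympics has no duplicate (games, players) pairs.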

Hive Query: defining a variable which is a list of strings

How can I create a constant list and use it in the WHERE clause of my query?
For example, I have a hive query, where I say
Select t1.Id,
t1.symptom
from t1
WHERE lower(symptom) NOT IN ('coughing','sneezing','xyz', etc,...)
Instead of repeating this long list of symptoms (which makes the code very ugly), is there a way to define it ahead of time as
MyList = ('coughing','sneezing','x',...)
and then in the WHERE clause just say WHERE lower(symptom) not in MyList?
You can put the list in a table and use a subquery:
Select t1.Id, t1.symptom
from t1
where lower(symptom) NOT IN (select symptom from mysymptoms_list);
This persists the list, so it can be used in multiple queries.
You can also use a Hive variable to do this.
SET hivevar:InClause=('coughing','sneezing','x',...)
Make sure you don't leave spaces on either side of the equals sign.
SELECT t1.Id,
t1.symptom
FROM t1
WHERE LOWER(symptom) NOT IN ${InClause}
If you are comfortable with joins, you can use a left join with a where clause:
Select A.Id, A.symptom
from
t1 A left join MyList B
on
lower(A.symptom) = lower(B.symptom)
where lower(B.symptom) IS NULL;
The left join keeps every row of t1 (A.symptom). For each row, B.symptom from MyList holds the matching symptom if a match is found, or NULL if no match is found.
You want the rows where no match is found, hence the where clause.
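The list-table approach can be sketched in Python's sqlite3 module to compare the NOT IN subquery with the left-join form. The sample rows are made up, and SQLite stands in for Hive here, so treat it as an illustration of standard SQL behaviour rather than a Hive test. It also shows one difference worth knowing: if the list table ever contains a NULL, NOT IN returns no rows at all, while the left join is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (Id INTEGER, symptom TEXT);
    CREATE TABLE mysymptoms_list (symptom TEXT);
    INSERT INTO t1 VALUES
        (1, 'Coughing'), (2, 'fever'), (3, 'SNEEZING'), (4, 'rash');
    INSERT INTO mysymptoms_list VALUES ('coughing'), ('sneezing');
""")

not_in_sql = """
    SELECT Id, symptom
    FROM t1
    WHERE lower(symptom) NOT IN (SELECT symptom FROM mysymptoms_list)
    ORDER BY Id
"""
anti_join_sql = """
    SELECT A.Id, A.symptom
    FROM t1 A
    LEFT JOIN mysymptoms_list B
      ON lower(A.symptom) = lower(B.symptom)
    WHERE B.symptom IS NULL
    ORDER BY A.Id
"""

not_in_rows = conn.execute(not_in_sql).fetchall()
anti_join_rows = conn.execute(anti_join_sql).fetchall()
print(not_in_rows)     # [(2, 'fever'), (4, 'rash')]
print(anti_join_rows)  # [(2, 'fever'), (4, 'rash')]

# A NULL in the list makes every NOT IN comparison unknown, so no row passes.
conn.execute("INSERT INTO mysymptoms_list VALUES (NULL)")
not_in_with_null = conn.execute(not_in_sql).fetchall()
anti_join_with_null = conn.execute(anti_join_sql).fetchall()
print(not_in_with_null)     # []
print(anti_join_with_null)  # [(2, 'fever'), (4, 'rash')]
```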

SQL join result errors

I'm trying to run this join and I'm not getting the correct values.
My first query returns about 25,000 records:
SELECT count(*) from table1 as DSO,
table2 as EAR
WHERE
(UCASE(TRIM(EAR.value)) = UCASE(TRIM(DSO.value))
AND
UCASE(TRIM(EAR.value1)) = UCASE(TRIM(DSO.value1)))
My second query returns about 3,000,000:
SELECT count(*) from table1 as DSO
left join table2 as EAR
ON
(UCASE(TRIM(EAR.value)) = UCASE(TRIM(DSO.value))
AND
UCASE(TRIM(EAR.value1)) = UCASE(TRIM(DSO.value1)))
Table 1 has about 45,000 records in total; that's what I should receive.
The first query is an INNER JOIN and the second one is a LEFT JOIN, so you should expect quite different results. Also, look at the way db2400 treats NULLs with the UCASE and TRIM functions. My guess is that your left join is making some matches that you don't want.
The INNER JOIN in the first query is going to exclude any records from table1 that don't have a match in table2. That pretty quickly explains the lower count.
Either join will happily create more than one row for each record in table1 if it finds multiple matches in table2. The difference is that the LEFT JOIN will ALSO create one row for each record in table1 that doesn't have a match in table2. It sounds like you expect there to be a 1:1 match between the two tables, but that is not what you are getting.
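The counting behaviour described above can be reproduced on a tiny data set. This is a sketch using Python's sqlite3 module with made-up rows; SQLite has no UCASE, so UPPER stands in for it. The last query is one possible way to find the table2 keys that match more than once after trimming and upper-casing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (value TEXT, value1 TEXT);
    CREATE TABLE table2 (value TEXT, value1 TEXT);
    INSERT INTO table1 VALUES ('a ', 'x'), ('b', 'y'), ('c', 'z');
    INSERT INTO table2 VALUES ('A', 'X'), ('a', 'x '), ('B', 'Y');
""")

condition = """
    UPPER(TRIM(EAR.value)) = UPPER(TRIM(DSO.value))
    AND UPPER(TRIM(EAR.value1)) = UPPER(TRIM(DSO.value1))
"""

inner_count = conn.execute(
    "SELECT count(*) FROM table1 AS DSO INNER JOIN table2 AS EAR ON " + condition
).fetchone()[0]
left_count = conn.execute(
    "SELECT count(*) FROM table1 AS DSO LEFT JOIN table2 AS EAR ON " + condition
).fetchone()[0]

# Keys in table2 that collide once normalized; each one multiplies rows.
dup_keys = conn.execute("""
    SELECT UPPER(TRIM(value)), UPPER(TRIM(value1)), count(*)
    FROM table2
    GROUP BY UPPER(TRIM(value)), UPPER(TRIM(value1))
    HAVING count(*) > 1
""").fetchall()

print(inner_count)  # 3: ('a ', 'x') matches two table2 rows, ('b', 'y') one
print(left_count)   # 4: the inner matches plus the unmatched ('c', 'z')
print(dup_keys)     # [('A', 'X', 2)]
```

Neither count equals the 3 rows in table1. The left join only returns exactly one row per table1 record when every normalized key appears at most once in table2.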

SQL Syntax JOIN google bigquery

I am getting this error in Google BigQuery:
Error: Ambiguous field reference in SELECT clause. (Remember to fully qualify all field names in the SELECT clause as <table.field>.)
My query is
SELECT LoanPerf1.loankey, Loans.Key
FROM prosperloans1.LoanPerf1
JOIN prosperloans1.Loans
ON LoanPerf1.loankey = Loans.Key
prosperloans1 is the dataset id.
The two table names are correct.
The error is returned regardless of which field name appears first in the select clause.
The documentation on Google SQL syntax says this is correct:
// Simple JOIN of two tables
SELECT table1.id, table2.username
FROM table1
JOIN table2
ON table1.name = table2.name AND table1.id = table2.customer_id;
Thanks
Shawn
Try adding an AS clause to the table names:
SELECT LoanPerf1.loankey, Loans.Key
FROM prosperloans1.LoanPerf1 as LoanPerf1
JOIN prosperloans1.Loans as Loans
ON LoanPerf1.loankey = Loans.Key