Unable to join using wildcards in BigQuery - sql

I am trying to join two tables in BigQuery.
Table1 contains an ID column, and Table2 contains a column that holds either the same ID or multiple IDs in one long comma-separated string, like "id123,id456,id678".
I can join the tables together if Table1.ID = Table2.ID, but this ignores all the rows where Table1.ID is one of the multiple IDs in Table2.ID.
I have looked at similar posts that tell me to use wildcards like
on concat('%',Table1.ID,'%') = Table2.ID
but this does not work, because it seems to create a string that contains the '%' character and doesn't actually use it as a wildcard.
I'm using standard SQL in BigQuery; any help would be appreciated.

Below example is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table1` AS (
SELECT 123 id, 'a' test UNION ALL
SELECT 456, 'b' UNION ALL
SELECT 678, 'c'
), `project.dataset.table2` AS (
SELECT 'id123,id456' id UNION ALL
SELECT 'id678'
)
SELECT t2.id, test
FROM `project.dataset.table2` t2, UNNEST(SPLIT(id)) id2
JOIN `project.dataset.table1` t1
ON CONCAT('id', CAST(t1.id AS STRING)) = id2
The result is as below:
Row id test
1 id123,id456 a
2 id123,id456 b
3 id678 c

It is doubtful that you have values in the table that start and end with percentage signs. = does not recognize wildcards; like does:
on Table2.ID like concat('%', Table1.ID, '%')
A warning, though: such a construct is usually a performance killer. You would be better off having columns in Table1 and Table2 that match exactly.
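To make the difference between = and LIKE concrete, here is a minimal sketch that runs the join conditions against an in-memory SQLite database from Python. SQLite stands in for BigQuery here, so || replaces CONCAT, and the sample rows (including the extra 'id12') are made up for illustration. It also shows that a bare '%...%' pattern can match a shorter ID inside a longer one, which wrapping both sides in commas avoids.

```python
import sqlite3

# Assumption: SQLite is used only as a stand-in for BigQuery; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id TEXT);
    CREATE TABLE table2 (id TEXT);
    INSERT INTO table1 VALUES ('id123'), ('id456'), ('id12');
    INSERT INTO table2 VALUES ('id123,id456'), ('id678');
""")

def joined(condition):
    """Join table1 to table2 on the given condition and return sorted rows."""
    sql = (
        "SELECT t1.id, t2.id FROM table1 t1 JOIN table2 t2 "
        f"ON {condition} ORDER BY t1.id, t2.id"
    )
    return conn.execute(sql).fetchall()

# '=' compares against the literal string '%id123%', so nothing matches.
equals_rows = joined("('%' || t1.id || '%') = t2.id")

# LIKE treats '%' as a wildcard, but 'id12' also matches inside 'id123'.
like_rows = joined("t2.id LIKE ('%' || t1.id || '%')")

# Wrapping both sides in commas only matches whole list elements.
delimited_rows = joined("(',' || t2.id || ',') LIKE ('%,' || t1.id || ',%')")

print(equals_rows)
print(like_rows)
print(delimited_rows)
```

The comma-wrapped variant is still a non-indexable LIKE, so the performance warning applies to it as well.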

Related

Use wildcards that are stored in table columns in SQL-queries with MS Access

I want to join two tables in Access based on different wildcards for different rows.
The first, table1, contains rows with different wildcards and table2 contains the column that should be matched with the wildcards in table1.
I imagine the SQL code to look like:
SELECT *
FROM table2
LEFT JOIN table1
ON table2.subject LIKE table1.wildcard
The tables look like this: https://imgur.com/a/O9OPAL6
The third picture shows the result that I want.
How do I execute the join or is there an alternative?
I don't think MS Access supports non-equality conditions for JOINs. So, you can do this as:
SELECT * -- first get the matches
FROM table2 as t2, -- ugg, why doesn't it support CROSS JOIN
table1 as t1
WHERE t2.subject LIKE t1.wildcard
UNION ALL
SELECT * -- then get the non-matches
FROM table2 as t2 LEFT JOIN
table1 as t1
ON 1 = 0 -- always false but gets the same columns
WHERE NOT EXISTS (SELECT 1
FROM table1 as t1
WHERE t2.subject LIKE t1.wildcard
);
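As a sanity check of the idea, the sketch below runs the "matches UNION ALL non-matches" query next to a plain LEFT JOIN ... ON ... LIKE and compares the results. It uses SQLite from Python, which does accept LIKE in a join condition, so it only illustrates that the two forms are equivalent; it says nothing about Access itself. The sample rows are invented, and SQLite's % wildcard is used where Access would typically use *.

```python
import sqlite3

# Assumption: SQLite stands in for Access; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (wildcard TEXT);
    CREATE TABLE table2 (subject TEXT);
    INSERT INTO table1 VALUES ('Invoice%'), ('%report%');
    INSERT INTO table2 VALUES ('Invoice 42'), ('Weekly report'), ('Hello');
""")

def sort_rows(rows):
    """Sort rows while treating NULL (None) wildcards as empty strings."""
    return sorted(rows, key=lambda r: (r[0], r[1] or ""))

# Emulation: cross join for the matches, then add the unmatched subjects.
emulated = sort_rows(conn.execute("""
    SELECT t2.subject, t1.wildcard
    FROM table2 AS t2, table1 AS t1
    WHERE t2.subject LIKE t1.wildcard
    UNION ALL
    SELECT t2.subject, NULL
    FROM table2 AS t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 AS t1
                      WHERE t2.subject LIKE t1.wildcard)
""").fetchall())

# Reference: the LEFT JOIN the question wanted to write.
left_join = sort_rows(conn.execute("""
    SELECT t2.subject, t1.wildcard
    FROM table2 AS t2
    LEFT JOIN table1 AS t1 ON t2.subject LIKE t1.wildcard
""").fetchall())

print(emulated)
```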

how to find the difference between two tables in oracle?

I have two tables named A and B where most of the columns are different and the only common column is name. I want to find the records in table A that have no counterpart in table B, based on the common field name. How do I get these?
One more thing to account for: a few names in table B have a prefix such as 'dummy_' or 'Test_' that has to be trimmed before comparing. For example,
table A has name = 'Div_text_col_tar' and B has name = 'dummy_Div_text_col_tar', which are actually the same. So we have to strip 'dummy_' and 'Test_' from the beginning of the names. How can I do that?
I tried the query shown below without any luck:
SELECT *
FROM A t1
WHERE NOT EXISTS
(SELECT 1
FROM B t2
WHERE t1.name = REGEXP_SUBSTR(t2.name,'[^-dummy_|-Test_]+',1,1)
)
AND t1.status =100
AND t1.floor IN ('1','2','3')
I think I would go for:
SELECT t1.*
FROM A t1
WHERE NOT EXISTS (SELECT 1
FROM B t2
WHERE t2.name IN (t1.name, 'dummy_' || t1.name, 'Test_' || t1.name)
) AND
t1.status = 100 AND
t1.floor IN (1, 2, 3); -- presumably, these are numbers, not strings
This seems simpler and easier to follow than using regular expressions.
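Here is a small runnable sketch of that NOT EXISTS / IN approach, using SQLite from Python as a stand-in for Oracle (both support || for concatenation). The rows are invented for illustration, and floor is assumed to be numeric as the answer presumes.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (name TEXT, status INTEGER, floor INTEGER);
    CREATE TABLE B (name TEXT);
    INSERT INTO A VALUES
        ('Div_text_col_tar', 100, 1),  -- present in B with a 'dummy_' prefix
        ('Other',            100, 3),  -- present in B with a 'Test_' prefix
        ('Only_in_A',        100, 2),  -- no counterpart in B
        ('Wrong_status',      50, 1);  -- filtered out by status
    INSERT INTO B VALUES ('dummy_Div_text_col_tar'), ('Test_Other');
""")

extra_in_a = conn.execute("""
    SELECT t1.name
    FROM A t1
    WHERE NOT EXISTS (SELECT 1
                      FROM B t2
                      WHERE t2.name IN (t1.name,
                                        'dummy_' || t1.name,
                                        'Test_' || t1.name))
      AND t1.status = 100
      AND t1.floor IN (1, 2, 3)
    ORDER BY t1.name
""").fetchall()

print(extra_in_a)
```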

Combine LIKE and IN using only WHERE clause

I know this question has been asked, but I have a slightly different flavour of it. I have a use case where the only thing I have control over is the WHERE clause of the query, and I have 2 tables.
Using simple example:
Table1 contains 1 column named "FULLNAME" with hundreds of values
Table2 contains 1 column named "PATTERN" with some matching text
So what I need to do is select all values from Table1 that match the values in Table2.
Here's a simple example:
Table1 (FULLNAME)
ANTARCTICA
ANGOLA
AUSTRALIA
AFRICA
INDIA
INDONESIA
Table2 (PATTERN)
AN
IN
Effectively what I need is the entries in Table1 which contain the values from Table2 (result would be ANTARCTICA, ANGOLA, INDIA, INDONESIA)
In other words, what I need is something like:
Select * from Table1 where FULLNAME IN LIKE (Select '%' || Pattern || '%' from
Table2)
The tricky thing here is I only have control over the where clause, I can't control the Select clause at all or add joins since I'm using a product which only allows control over the where clause. I can't use stored procedures either.
Is this possible?
I'm using Oracle as the backend DB
Thanks
One possible approach is to use EXISTS in combination with LIKE in the subquery:
select * from table1 t1
where exists (select null
from table2 t2
where t1.fullname like '%' || t2.pattern || '%');
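For what it's worth, here is the EXISTS approach run against the sample data from the question. This is a sketch using SQLite from Python rather than Oracle (the || concatenation and EXISTS syntax are the same in both), and only the WHERE clause differs from a plain select on Table1.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; data is taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (fullname TEXT);
    CREATE TABLE table2 (pattern TEXT);
    INSERT INTO table1 VALUES ('ANTARCTICA'), ('ANGOLA'), ('AUSTRALIA'),
                              ('AFRICA'), ('INDIA'), ('INDONESIA');
    INSERT INTO table2 VALUES ('AN'), ('IN');
""")

# Each Table1 row is returned at most once, however many patterns it matches.
matches = [row[0] for row in conn.execute("""
    SELECT fullname
    FROM table1 t1
    WHERE EXISTS (SELECT NULL
                  FROM table2 t2
                  WHERE t1.fullname LIKE '%' || t2.pattern || '%')
    ORDER BY fullname
""")]

print(matches)
```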
I believe that you can do this with a simple JOIN:
SELECT DISTINCT
fullname
FROM
Table1 T1
INNER JOIN Table2 T2 ON T1.fullname LIKE '%' || T2.pattern || '%'
The DISTINCT is there for those cases where you might have a match to multiple rows in Table2.
If the patterns are always two characters and only have to match the start of the full name, like the examples you showed, you could do:
Select * from Table1 where substr(FULLNAME, 1, 2) IN (Select Pattern from Table2)
Which prevents any index on Table1 being used, and your real case may need to be more flexible...
Or probably even less efficiently, similar to TomH's approach, but with the join inside a subquery:
Select * from Table1 where FULLNAME IN (
Select t1.FULLNAME from Table1 t1
Join Table2 t2 on t1.FULLNAME like '%'||t2.Pattern||'%')
Right, this involved a bit of trickery. Conceptually, what I've done is turn the PATTERN column into a single cell and use that with REGEXP_LIKE.
So the values "AN" and "IN" become the single value '(AN|IN)', which I just feed to regexp_like:
SELECT FULLNAME from table1 where
regexp_like(FULLNAME,(SELECT '(' || SUBSTR (SYS_CONNECT_BY_PATH (PATTERN , '|'), 2) || ')' pattern_regex
FROM (SELECT PATTERN , ROW_NUMBER () OVER (ORDER BY PATTERN) rn,
COUNT (*) OVER () cnt
FROM Table2)
WHERE rn = cnt START WITH rn = 1 CONNECT BY rn = PRIOR rn + 1))
The subquery in the regexp_like turns the column into a single cell containing the regular expression string.
I do realise this is probably a performance killer though, but thankfully I'm not that fussed about performance at this point

select first N distinct rows without inner select in oracle

I have something like the following structure: Table1 -> Table2 relationship is 1:m
I need to perform queries similar to the next one:
select Table1.id from Table1 left outer join Table2 on (Table1.id1 = Table2.id2) where Table2.name like '%a%' and rownum < 11
i.e. I want the first 10 ids from Table1 that fulfil the conditions on Table2. The problem is that I have to use DISTINCT, but the DISTINCT clause is applied after 'rownum < 11', so the result could be e.g. 5 records even if more than 10 rows match.
The apparent solution is to use the following:
select id from ( select Table1.id from Table1 left outer join Table2 on (Table1.id1 = Table2.id2) where Table2.name like '%a%' ) where rownum < 11
But I'm afraid of performance of such a query. If Table1 contains about 300k records, and Table2 contains about 700k records, wouldn't such a query be really slow?
Is there another way to write this query without an inner select? Unluckily, I want to avoid using inner selects.
"Unluckily, I want to avoid using inner selects"
By putting the WHERE clause on Table2, you are effectively turning the join into an INNER JOIN (a NULL Table2.name never satisfies Table2.name LIKE '%a%', so you only get rows that have a match in Table2). Also, without a function-based index, the '%a%' pattern will result in a full table scan on each iteration.
But @lweller is completely correct: to do the query correctly you will need to use a subquery. Keep in mind that without an ORDER BY you have no guarantee of the order of your top X records (it may always 'appear' that the values conform to the primary key or whatnot, but there is no guarantee).
WITH TABLE1 AS(SELECT 1 ID FROM DUAL
UNION ALL
SELECT 2 ID FROM DUAL
UNION ALL
SELECT 3 ID FROM DUAL
UNION ALL
SELECT 4 ID FROM DUAL
UNION ALL
SELECT 5 ID FROM DUAL) ,
TABLE2 AS(SELECT 1 ID, 'AAA' NAME FROM DUAL
UNION ALL
SELECT 2 ID, 'ABB' NAME FROM DUAL
UNION ALL
SELECT 3 ID, 'ACC' NAME FROM DUAL
UNION ALL
SELECT 4 ID, 'ADD' NAME FROM DUAL
UNION ALL
SELECT 1 ID, 'BBB' NAME FROM DUAL
) ,
sortable as( --here is the subquery
SELECT
Table1.ID ,
ROW_NUMBER( ) OVER (ORDER BY Table2.NAME NULLS LAST) ROWOverName , --this will handle the sort
table2.name
from
Table1
LEFT OUTER JOIN --this left join is moot; pull the WHERE table2.name into the join to have it LEFT join as expected
Table2
on
(
Table1.id = Table2.id
)
WHERE
Table2.NAME LIKE '%A%')
SELECT *
FROM sortable
WHERE ROWOverName <= 2;
-- you can drop the ROW_NUMBER( ) analytic function and replace the final query as such (as you initially indicated)
SELECT *
FROM sortable
WHERE
ROWNUM <= 2
ORDER BY sortable.NAME --make sure to put in an order by!
;
You don't need DISTINCT here at all, and there is nothing bad in subqueries as such.
SELECT id
FROM Table1
WHERE id IN
(
SELECT id
FROM Table2
WHERE name LIKE '%a%'
)
AND rownum < 11
Note that the order is not guaranteed. To guarantee order, you have to use a nested query:
SELECT id
FROM (
SELECT id
FROM Table1
WHERE id IN
(
SELECT id
FROM Table2
WHERE name LIKE '%a%'
)
ORDER BY
id -- or whatever else
)
WHERE rownum < 11
There is no way to do it without nested queries (or the CTE).
For me there is no reason to be afraid of the performance. I think the subselect is the best way to solve your problem. And if you don't trust me, take a look at the explain plan of your query and you will see that it does not behave as badly as you might think.
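To illustrate why the IN subquery removes the need for DISTINCT, here is a small sketch in SQLite from Python. Note the assumptions: SQLite has no ROWNUM, so LIMIT stands in for the row limit, and the tables and rows are made up. The join form can return the same id several times before the limit is applied, while the IN form returns each Table1 row at most once.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle and LIMIT stands in for ROWNUM.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER);
    CREATE TABLE Table2 (id INTEGER, name TEXT);
    INSERT INTO Table1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO Table2 VALUES (1, 'aaa'), (1, 'abc'), (2, 'bbb'), (3, 'xa');
""")

# Join form: id 1 matches two Table2 rows, so it fills both slots of the limit.
join_ids = [r[0] for r in conn.execute("""
    SELECT Table1.id
    FROM Table1 JOIN Table2 ON Table1.id = Table2.id
    WHERE Table2.name LIKE '%a%'
    ORDER BY Table1.id
    LIMIT 2
""")]

# IN form: each Table1 row appears at most once, so no DISTINCT is needed.
in_ids = [r[0] for r in conn.execute("""
    SELECT id
    FROM Table1
    WHERE id IN (SELECT id FROM Table2 WHERE name LIKE '%a%')
    ORDER BY id
    LIMIT 2
""")]

print(join_ids, in_ids)
```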

How do I merge data from two tables in a single database call into the same columns?

If I run the two statements in a batch, will they return one table or two to my SqlCommand object, with the data merged? What I am trying to do is optimize a search by searching twice, the first time on one set of data and then a second time on another. They have the same fields, and I'd like all the records from both tables to show up and be added to each other. I need this so that I can sort the data across both sets, but short of writing a stored procedure I can't think of a way of doing this.
E.g. Table 1 has columns A and B, and Table 2 has these same columns but a different data source. I then want to merge them so that if a value of A exists in only one table it is added to the result set, and if it exists in both tables, column B is summed between the two.
Please note that this is not the same as a full outer join operation as that does not merge the data.
[EDIT]
Here's what the code looks like:
Select * From
(Select ID,COUNT(*) AS Count From [Table1]) as T1
full outer join
(Select ID,COUNT(*) AS Count From [Table2]) as T2
on t1.ID = T2.ID
Perhaps you're looking for UNION?
IE:
SELECT A, B FROM Table1
UNION
SELECT A, B FROM Table2
Possibly:
select table1.a, table1.b
from table1
where table1.a not in (select a from table2)
union all
select table1.a, table1.b+table2.b as b
from table1
inner join table2 on table1.a = table2.a
edit: perhaps you would benefit from unioning the tables before counting, e.g.
select id, count(*) as count from
(select id from table1
union all
select id from table2) t
group by id
I'm not sure if I understand completely but you seem to be asking about a UNION
SELECT A,B
FROM tableX
UNION ALL
SELECT A,B
FROM tableY
To do it, you would go:
SELECT * INTO TABLE3 FROM TABLE1
UNION
SELECT * FROM TABLE2
Provided both tables have the same columns
I think what you are looking for is this, but I am not sure I am understanding your language correctly.
select id, sum(count) as count
from (
select id, count(*) as count
from table1
group by id
union all
select id, count(*) as count
from table2
group by id
) a
group by id
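A quick way to convince yourself that this merges the counts is to run it on a toy data set. The sketch below uses SQLite from Python as a stand-in for the asker's database, with invented ids, and compares "count per table, then sum" with "union the ids first, then count". An id that exists in only one table keeps its count, and an id in both tables gets the summed count.

```python
import sqlite3

# Assumption: SQLite stands in for the original database; ids are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (1), (2);
    INSERT INTO table2 VALUES (2), (3), (3), (3);
""")

# Count per table, then sum the per-table counts for each id.
summed = conn.execute("""
    SELECT id, SUM(cnt) AS cnt
    FROM (
        SELECT id, COUNT(*) AS cnt FROM table1 GROUP BY id
        UNION ALL
        SELECT id, COUNT(*) AS cnt FROM table2 GROUP BY id
    ) a
    GROUP BY id
    ORDER BY id
""").fetchall()

# Alternative: union the raw ids first, then count once.
counted = conn.execute("""
    SELECT id, COUNT(*) AS cnt
    FROM (
        SELECT id FROM table1
        UNION ALL
        SELECT id FROM table2
    ) t
    GROUP BY id
    ORDER BY id
""").fetchall()

print(summed)
```

UNION ALL matters here: a plain UNION would drop duplicate rows and lose counts.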