Redshift join wildcard - sql

I am trying to do a wildcard search based on the result set of a subquery in Redshift. For example, Table A has first names and Table B has names which could be Last Name, First Name or First Name, Last Name. I want to return rows from Table B based on matches to a subset of Table A. I found the Similar To operator, but that only seems to work when I can hard-code the terms I am searching for. Is there a way I can achieve something like
SELECT col1 FROM Table_A WHERE col1 SIMILAR TO '%(SELECT distinct col2 FROM Table_B)%'
in order to achieve
SELECT col1 FROM Table_A WHERE col1 LIKE '%something%' OR col1 LIKE '%something else%'

This is what I ended up implementing after a recommendation from @GMB:
CREATE TABLE test2 AS (
SELECT 'a' as val
UNION ALL
SELECT 'b' as val
UNION ALL
SELECT 'c' as val
);
CREATE TABLE test3 as (
SELECT 'apple' as name
UNION ALL
SELECT 'pear' as name
UNION ALL
SELECT 'plum' as name
);
SELECT * FROM test3
WHERE EXISTS (
SELECT 1
FROM test2 WHERE test3.name LIKE ('%'||val||'%')
)

You could use exists:
select col1
from table_a a
where exists (
select 1
from table_b b
where a.col1 similar to ('%' || b.col2 || '%')
)
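The EXISTS pattern can be tried without a Redshift cluster. The following is a minimal sketch that runs the asker's test2/test3 data through Python's built-in sqlite3 module; SQLite stands in for Redshift here, and the helper name names_matching_any_fragment is made up for illustration.

```python
import sqlite3


def names_matching_any_fragment(fragments, names):
    """Return the names that contain at least one of the fragments."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test2 (val TEXT)")
    conn.execute("CREATE TABLE test3 (name TEXT)")
    conn.executemany("INSERT INTO test2 VALUES (?)", [(f,) for f in fragments])
    conn.executemany("INSERT INTO test3 VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        """
        SELECT name
        FROM test3
        WHERE EXISTS (
            SELECT 1
            FROM test2
            -- build the wildcard pattern per row of the lookup table
            WHERE test3.name LIKE '%' || test2.val || '%'
        )
        ORDER BY name
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


matched = names_matching_any_fragment(["a", "b", "c"], ["apple", "pear", "plum"])
print(matched)
```

Note that any % or _ characters stored in the lookup column are themselves treated as wildcards by LIKE, so they would need escaping if they can occur in the data.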

Related

Create a view of a table with a column that has multiple values

I have a table (Table1) like the following:
Col1    Col2
First   Code1,Code2,Code3
Second  Code2
Col2 can contain multiple comma-separated values. I have another table (Table2) that contains this:
ColA    ColB
Code1   Value1
Code2   Value2
Code3   Value3
I need to create a view that joins the two tables (Table1 and Table2) and returns something like this:
Col1    Col2
First   Value1,Value2,Value3
Second  Value2
Is that possible? (I'm on Oracle DB if that helps.)
It's a violation of first normal form to have a list in a column value like that. It causes a lot of difficulties in a relational database, like the one you are encountering now.
However, you can get what you want by using the LIKE operator to find colA values that are substrings of the Col2 column. Add delimiters before and after to catch the first and last ones. Then aggregate back up to a single list using LISTAGG.
SELECT table1.col1,
LISTAGG(table2.colB,',') WITHIN GROUP (ORDER BY table2.colB) value_list
FROM table1,
table2
WHERE ','||table1.col2||',' LIKE '%,'||table2.colA||',%'
GROUP BY table1.col1
This will not perform well on large volumes, because without an equijoin it's going to use nested loops, and you can't use an index on a LIKE predicate with % at the beginning. The combination of nested loops and full table scans is not pleasant with large volumes of data. Therefore, if this is your situation, you will need to fix the 1NF problem by transforming table1 into normal relational format, and then join it to table2 with an equijoin, which will enable it to use a hash join instead. So:
SELECT table1.col1,
LISTAGG(table2.colB,',') WITHIN GROUP (ORDER BY table2.colB) value_list
FROM (SELECT t.col1,
SUBSTR(t.col2,INSTR(t.col2,',',1,seq)+1,INSTR(t.col2,',',1,seq+1)-(INSTR(t.col2,',',1,seq)+1)) col2_piece
FROM (SELECT col1,
','||col2||',' col2
FROM table1) t,
(SELECT ROWNUM seq FROM dual CONNECT BY LEVEL < 10) x) table1,
table2
WHERE table1.col2_piece IS NOT NULL
AND table1.col2_piece = table2.colA
GROUP BY table1.col1
If you want the values in the same order in the list as the terms then you can use:
SELECT t1.col1,
LISTAGG(t2.colb, ',') WITHIN GROUP (
ORDER BY INSTR(','||t1.col2||',', ','||t2.colA||',')
) AS value2
FROM table1 t1
INNER JOIN table2 t2
ON INSTR(','||t1.col2||',', ','||t2.colA||',') > 0
GROUP BY
t1.col1
Which, for the sample data:
CREATE TABLE Table1 (Col1, Col2) AS
SELECT 'First', 'Code1,Code2,Code3' FROM DUAL UNION ALL
SELECT 'Second', 'Code2' FROM DUAL;
CREATE TABLE Table2 (ColA, ColB) AS
SELECT 'Code1', 'XXXX' FROM DUAL UNION ALL
SELECT 'Code2', 'ZZZZ' FROM DUAL UNION ALL
SELECT 'Code3', 'YYYY' FROM DUAL;
Outputs:
COL1    VALUE2
First   XXXX,ZZZZ,YYYY
Second  ZZZZ
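Both answers above wrap the list and the code in commas before matching. A small sketch of why that matters, using Python's sqlite3 module as a stand-in for Oracle: the 'Third' row with 'Code10' is a hypothetical addition, not part of the question's data, chosen so that an undelimited match goes wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (colA TEXT, colB TEXT);
    INSERT INTO table1 VALUES
        ('First', 'Code1,Code2,Code3'),
        ('Second', 'Code2'),
        ('Third', 'Code10');          -- hypothetical row, not in the question
    INSERT INTO table2 VALUES
        ('Code1', 'Value1'), ('Code2', 'Value2'), ('Code3', 'Value3');
    """
)


def matches(condition):
    """Join table1 to table2 on the given condition and list the pairs."""
    sql = (
        "SELECT t1.col1, t2.colB FROM table1 t1 "
        f"JOIN table2 t2 ON {condition} "
        "ORDER BY t1.col1, t2.colB"
    )
    return conn.execute(sql).fetchall()


# Plain substring match: 'Code1' is also found inside 'Code10'.
naive = matches("t1.col2 LIKE '%' || t2.colA || '%'")

# Delimited match: ',Code1,' is not a substring of ',Code10,'.
delimited = matches("',' || t1.col2 || ',' LIKE '%,' || t2.colA || ',%'")

print(naive)
print(delimited)
```

The delimited form only matches whole list elements, which is why the leading and trailing commas are added on both sides of the comparison.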

How to get also the not existing values

I've got a query like this
select column, count(*)
from mytable
where column in ('XXX','YYY','ZZZ',....)
group by column;
But I also want to get a row for values that aren't in the table.
Let's suppose that 'ZZZ' doesn't exist in mytable, I'd like to get:
COLUMN COUNT(*)
XXX 3
YYY 2
ZZZ 0 (or NULL)
Oracle version 10g
Thanks in advance
Mark
In general, you would need to have a second table which contains all the possible column values whose counts you want to appear in the output. For demo purposes only, we can use a CTE for that:
WITH vals AS (
SELECT 'XXX' AS val FROM dual UNION ALL
SELECT 'YYY' FROM dual UNION ALL
SELECT 'ZZZ' FROM dual
)
SELECT t1.val, COUNT(t2.col) AS cnt
FROM vals t1
LEFT JOIN mytable t2
ON t2.col = t1.val
GROUP BY
t1.val;
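The LEFT JOIN approach can be checked with a small sketch in Python's sqlite3 module (standing in for Oracle, so no FROM dual is needed). The row counts are taken from the question's expected output; the table contents themselves are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col TEXT)")
# Assumed contents: three 'XXX' rows, two 'YYY' rows, no 'ZZZ' rows.
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("XXX",)] * 3 + [("YYY",)] * 2,
)

query = """
    WITH vals AS (
        SELECT 'XXX' AS val UNION ALL
        SELECT 'YYY' UNION ALL
        SELECT 'ZZZ'
    )
    SELECT t1.val, {count_expr} AS cnt
    FROM vals t1
    LEFT JOIN mytable t2 ON t2.col = t1.val
    GROUP BY t1.val
    ORDER BY t1.val
"""

# COUNT(t2.col) skips the NULL produced by the outer join, so 'ZZZ' gets 0.
counts = conn.execute(query.format(count_expr="COUNT(t2.col)")).fetchall()

# COUNT(*) counts the outer-joined row itself, so 'ZZZ' would wrongly get 1.
star_counts = conn.execute(query.format(count_expr="COUNT(*)")).fetchall()

print(counts)
print(star_counts)
```

Counting the column from the outer-joined table rather than using COUNT(*) is what makes the missing value come out as 0.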

Selecting distinct values within a group

I want to select distinct values of one variable within a group defined by another variable. What is the easiest way?
My first thought was to combine group by and distinct but it does not work. I tried something like:
select distinct col2, col1 from myTable
group by col1
I have looked at the question "Using DISTINCT along with GROUP BY in SQL Server" but can't seem to solve my problem.
If your requirement is to pick distinct combinations of col1 and col2, there is no need to group by; just use:
SELECT DISTINCT COL1, COL2 FROM TABLE1;
But if you group by col1, only one row per group is returned, so you have to apply an aggregate function to the other column, e.g.:
SELECT COL1, COUNT(COL2)
FROM TABLE1 GROUP BY COL1;
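The difference between the two queries can be seen on a few rows. This is a minimal sketch using Python's sqlite3 module; the six rows are hypothetical sample data, and COUNT(DISTINCT col2) is shown as one possible aggregate, not necessarily the one the asker needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 INTEGER)")
# Hypothetical rows, including one exact duplicate ('A', 1).
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("A", 1), ("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 3)],
)

# DISTINCT over both columns: one row per unique (col1, col2) combination.
pairs = conn.execute(
    "SELECT DISTINCT col1, col2 FROM table1 ORDER BY col1, col2"
).fetchall()

# GROUP BY col1: one row per group, so col2 must go through an aggregate.
per_group = conn.execute(
    "SELECT col1, COUNT(DISTINCT col2) FROM table1 GROUP BY col1 ORDER BY col1"
).fetchall()

print(pairs)
print(per_group)
```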
There is no need for GROUP BY; just use DISTINCT:
select distinct col2, col1 from myTable
create table t as
with inputs(val, id) as
(
select 'A', 1 from dual union all
select 'A', 1 from dual union all
select 'A', 2 from dual union all
select 'B', 1 from dual union all
select 'B', 2 from dual union all
select 'C', 3 from dual
)
select * from inputs;
The above creates your table and the below is the solution (12c and later):
select * from t
match_recognize
(
partition by val
order by id
all rows per match
pattern ( a {- b* -} )
define b as val = a.val and id = a.id
);
Regards,
Ranagal

How to exclude data in second part of a UNION with data from the first part?

I want to make a UNION query. The first SELECT of it is pretty straight, but on the second one I'd like to select all entries in a table, where the IDs are not present in a row of the first part.
Something like this:
SELECT * FROM a
UNION ALL
SELECT * FROM b
WHERE b.id NOT IN (LISTAGG(a.selected_id))
Of course, I can't use an aggregate function here, but I have no idea how to solve this. Is it even possible?
I'm sure I could do another subselect for the NOT IN clause, but I want to avoid this, as I think it would hurt performance too much.
I think you want something like this:
SELECT a.*
FROM a
UNION ALL
SELECT b.*
FROM b
WHERE NOT EXISTS (SELECT 1 FROM a WHERE b.id = a.selected_id);
If performance is an issue, you want an index on a(selected_id).
This assumes that the columns are the same in the two tables.
In general, you want to use NOT EXISTS with a subquery because it does what you expect when the subquery returns NULL values.
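The NULL behaviour mentioned above is easy to reproduce. This is a small sketch with Python's sqlite3 module, assuming tiny hypothetical tables a and b where a.selected_id contains a NULL; the three-valued logic involved is standard SQL, so Oracle should behave the same way for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (selected_id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (NULL);   -- note the NULL
    INSERT INTO b VALUES (1), (2);
    """
)

# NOT IN: "2 NOT IN (1, NULL)" evaluates to unknown, so no row qualifies.
not_in = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM b WHERE id NOT IN (SELECT selected_id FROM a) ORDER BY id"
    )
]

# NOT EXISTS: the NULL never equals b.id, so id 2 is returned as expected.
not_exists = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM b "
        "WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.selected_id = b.id) "
        "ORDER BY id"
    )
]

print(not_in)
print(not_exists)
```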
Why not
SELECT * FROM a
UNION ALL
SELECT *
FROM b
WHERE b.id NOT IN (SELECT a.id FROM a)
As Matthew suggested, the NOT IN option is safe to use if a.id is not nullable. Otherwise, a NOT EXISTS would be a better option:
WHERE NOT EXISTS (SELECT 1 FROM a WHERE b.id = a.id);
On the other hand, if it were just about IDs (without mentioning other columns from both tables), is it not just union instead of union all?
select id from a
union
select id from b
because your query says:
give me IDs from b, but not the ones that exist in a
union that with IDs from a
which is (b minus a) union all a
which is a union b
I might be wrong, though; try both options and compare results. Yet again, as Matthew has noted, that approach doesn't make much sense if other columns from both tables are involved.
but i still like to avoid going over the table twice. well, at least
if it is avoidable.
In your posted query,
SELECT * FROM a
UNION ALL
SELECT * FROM b
WHERE b.id NOT IN (LISTAGG(a.selected_id))
I am going to assume that selected_Id is not actually a column in your tables, but rather was your way of saying "the list of ids selected from table a, above".
I am also going to assume that a.id and b.id are both non-nullable, unique keys.
If all those assumptions hold true, you might try this approach:
SELECT nvl(a.id, b.id) id,
nvl(a.col1, b.col1) col1,
nvl(a.col2, b.col2) col2
-- ...and so on for any remaining columns (comma-separated)
FROM a
FULL OUTER JOIN b ON b.id = a.id;
This approach is more typing, but should access each table only once.
Here is a full example:
create table matta ( id number, col1 varchar2(5), col2 varchar2(5) );
create table mattb ( id number, col1 varchar2(5), col2 varchar2(5) );
insert into matta ( id, col1, col2 ) VALUES ( 1, 'A1.1', 'A1.2');
insert into matta ( id, col1, col2 ) VALUES ( 2, 'A2.1', 'A2.2');
insert into matta ( id, col1, col2 ) VALUES ( 3, 'A3.1', 'A3.2');
insert into matta ( id, col1, col2 ) VALUES ( 4, 'A4.1', 'A4.2');
insert into mattb ( id, col1, col2 ) VALUES ( 3, 'B3.1', 'B3.2');
insert into mattb ( id, col1, col2 ) VALUES ( 4, 'B4.1', 'B4.2');
insert into mattb ( id, col1, col2 ) VALUES ( 5, 'B5.1', 'B5.2');
COMMIT;
SELECT nvl(a.id, b.id) id,
nvl(a.col1, b.col1) col1,
nvl(a.col2, b.col2) col2
FROM matta a
FULL OUTER JOIN mattb b ON b.id = a.id
ORDER BY 1;
+----+------+------+
| ID | COL1 | COL2 |
+----+------+------+
| 1 | A1.1 | A1.2 |
| 2 | A2.1 | A2.2 |
| 3 | A3.1 | A3.2 |
| 4 | A4.1 | A4.2 |
| 5 | B5.1 | B5.2 |
+----+------+------+
One more option to try is a LEFT JOIN of b to a that excludes the matching rows, where b.id = a.selected_id:
SELECT a.* FROM a
UNION ALL
SELECT b.* FROM b
LEFT JOIN a ON b.id = a.selected_id
WHERE a.selected_id IS NULL;

Select statement for Oracle SQL

I have a table say,
column1 column2
a apple
a ball
a boy
b apple
b eagle
b orange
c bat
c ball
c cork
Now I would like to fetch the column1 values whose rows don't contain 'apple', ignoring a column1 value entirely if any of its rows has 'apple' in it. So in the table above only 'c' must be returned.
I am kind of new to Oracle SQL and I know Select column1 from table where column2 != 'apple' will not work. I need some help with this please.
You could use DISTINCT with NOT IN as follows:
QUERY 1 using NOT IN
select distinct col1
from t
where col1 not in (select col1 from t where col2 = 'apple')
QUERY 2 using NOT EXISTS
As per @jarlh's comment, you could use NOT EXISTS as follows:
select distinct col1
from t t1
where not exists (select 1 from t t2 where col2 = 'apple' and t1.col1 = t2.col1)
SAMPLE DATA
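create table t (col1, col2) as
select 'a', 'apple' from dual union all
select 'a', 'ball' from dual union all
select 'a', 'boy' from dual union all
select 'b', 'apple' from dual union all
select 'b', 'eagle' from dual union all
select 'b', 'orange' from dual union all
select 'c', 'bat' from dual union all
select 'c', 'ball' from dual union all
select 'c', 'cork' from dual;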
Assuming that column1 is NOT NULL you could use:
SELECT DISTINCT t.column1
FROM table_name t
WHERE t.column1 NOT IN (SELECT column1
FROM table_name
WHERE column2 = 'apple');
To get all columns and rows change DISTINCT t.column1 to *.
Select * from tbl
Left join (
Select column1 from tbl
Where column2 like '%apple%'
Group by column1
) g on tbl.column1 = g.column1
Where g.column1 is null
Seems to me that you need to find a summary of all column1 values that have any reference to apple, then list the rows that have no match in the summary list (g).
If I understand correctly, you need the values of column1 for which no row exists in your table with the same column1 value and 'apple' in column2. You can translate this into SQL with:
Select column1
from your_table t1
where not exists (
select 1
from your_table t2
where t2.column1 = t1.column1
and t2.column2= 'apple'
)
This is only one of the possible ways to get your result, so you can rewrite it in many ways; I believe this form stays close enough to the logic to show clearly how it can be expressed in plain SQL.
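As a quick check of the NOT EXISTS form against the question's sample rows, here is a minimal sketch using Python's sqlite3 module in place of Oracle. The DISTINCT and the GROUP BY ... HAVING variant are additions for illustration and are not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("a", "apple"), ("a", "ball"), ("a", "boy"),
        ("b", "apple"), ("b", "eagle"), ("b", "orange"),
        ("c", "bat"), ("c", "ball"), ("c", "cork"),
    ],
)

# NOT EXISTS: keep a column1 value only if none of its rows is 'apple'.
via_not_exists = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT t1.column1
        FROM your_table t1
        WHERE NOT EXISTS (
            SELECT 1
            FROM your_table t2
            WHERE t2.column1 = t1.column1
              AND t2.column2 = 'apple'
        )
        ORDER BY t1.column1
        """
    )
]

# Alternative (illustrative): count the 'apple' rows per group, keep zero.
via_having = [
    row[0]
    for row in conn.execute(
        """
        SELECT column1
        FROM your_table
        GROUP BY column1
        HAVING SUM(CASE WHEN column2 = 'apple' THEN 1 ELSE 0 END) = 0
        ORDER BY column1
        """
    )
]

print(via_not_exists)
print(via_having)
```

Without DISTINCT the NOT EXISTS query returns one row per qualifying source row, so 'c' would appear three times for this data.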