Redshift: create a list and search a different table with it - SQL

I think there are a few ways to tackle this, but I'm not sure how to do any of them.
I have two tables. The first has IDs and Numbers; the IDs and Numbers can potentially be listed more than once, so I create a result table that lists the unique Numbers grouped by ID.
My second table has 100 million rows with the ID and Numbers again. I need to search that table for any ID that has a Number not in that ID's list of Numbers from the result table.
Can Redshift do a query that checks whether the ID matches and the Number exists in the list from the result table? Can this all be done in memory / in one statement?
DROP TABLE IF EXISTS `myTable`;
CREATE TABLE `myTable` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`ID` varchar(255),
`Numbers` mediumint default NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
INSERT INTO `myTable` (`ID`,`Numbers`)
VALUES
("CRQ44MPX1SZ",1890),
("UHO21QQY3TW",4370),
("JTQ62CBP6ER",1825),
("RFD95MLC2MI",5014),
("URZ04HGG2YQ",2859),
("CRQ44MPX1SZ",1891),
("UHO21QQY3TW",4371),
("JTQ62CBP6ER",1826),
("RFD95MLC2MI",5015),
("URZ04HGG2YQ",2860),
("CRQ44MPX1SZ",1892),
("UHO21QQY3TW",4372),
("JTQ62CBP6ER",1827),
("RFD95MLC2MI",5016),
("URZ04HGG2YQ",2861);
CREATE TEMP TABLE result AS
SELECT ID, listagg(DISTINCT Numbers, ',') AS Number_List, count(Numbers) AS Numbers_Count
FROM myTable
GROUP BY ID;
DROP TABLE IF EXISTS `myTable2`;
CREATE TABLE `myTable2` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`ID` varchar(255),
`Numbers` mediumint default NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
INSERT INTO `myTable2` (`ID`,`Numbers`)
VALUES
("CRQ44MPX1SZ",1870),
("UHO21QQY3TW",4350),
("JTQ62CBP6ER",1825),
("RFD95MLC2MI",5014),
("URZ04HGG2YQ",2859),
("CRQ44MPX1SZ",1891),
("UHO21QQY3TW",4371),
("JTQ62CBP6ER",1826),
("RFD95MLC2MI",5015),
("URZ04HGG2YQ",2860),
("CRQ44MPX1SZ",1882),
("UHO21QQY3TW",4372),
("JTQ62CBP6ER",1827),
("RFD95MLC2MI",5016),
("URZ04HGG2YQ",2861);
Pseudo Code
Select ID, listagg(distinct Numbers) as Violation
WHERE Numbers NOT IN result.Number_List
or possibly: WHERE Numbers NOT LIKE '%' || result.Number_List || '%'
Desired Output
("CRQ44MPX1SZ", "1870,1882")
("UHO21QQY3TW", "4350")
EDIT
Going the JOIN route, I am not getting the right results, but I'm pretty sure my WHERE implementation is wrong.
SELECT mytable1.ID, listagg(distinct mytable2.Numbers, ',') as unauth_list, count(mytable2.Numbers) as unauth_count
FROM mytable1
LEFT JOIN mytable2 on mytable1.id = mytable2.id
WHERE (mytable1.id = mytable2.id)
AND (mytable1.Numbers <> mytable2.Numbers)
GROUP BY mytable1.id
Expected output:
("CRQ44MPX1SZ", "1870,1882", 2)
("UHO21QQY3TW", "4350", 1)

Just LEFT JOIN the two tables on ID and Numbers, and check in the WHERE clause whether a match wasn't found. There shouldn't be a need for LISTAGG() and complex comparing. Or did I miss part of the question?
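For example, a minimal sketch of that anti-join idea against the sample tables above, assuming the business-key columns ID and Numbers as in the question; the LISTAGG/COUNT wrapper is only there to reproduce the comma-separated list from the desired output, the bare join already finds the violating rows:
-- Rows in myTable2 whose (ID, Numbers) pair never appears in myTable
SELECT t2.ID,
       LISTAGG(DISTINCT t2.Numbers, ',') AS unauth_list,   -- e.g. "1870,1882"
       COUNT(t2.Numbers)                 AS unauth_count
FROM myTable2 t2
LEFT JOIN myTable t1
       ON t1.ID = t2.ID
      AND t1.Numbers = t2.Numbers
WHERE t1.ID IS NULL        -- no match: this Number is not in that ID's allowed list
GROUP BY t2.ID;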

Related

Why does WHERE NOT EXISTS return existing ids?

my query:
select a.id, a.affiliation
FROM public.affiliation AS a
WHERE NOT EXISTS (
SELECT *
FROM ncbi.affi_known1 AS b
WHERE a.id = b.id
)
limit 5000
it returns:
id | affiliation
4683763 | Psychopharmacology Unit, Dorothy Hodgkin Building, University of Bristol, Whitson Street, Bristol, BS1 3NY, UK.
as the first row.
but
select * from ncbi.affi_known1 where id = 4683763
does return the data with id = 4683763.
Both id columns are of type int8.
table a
CREATE TABLE "public"."affiliation" (
"id" int8 NOT NULL,
"affiliation" text COLLATE "pg_catalog"."default",
"tsv_affiliation" tsvector,
CONSTRAINT "affiliation_pkey" PRIMARY KEY ("id")
)
;
CREATE INDEX "affi_idx_tsv" ON "public"."affiliation" USING gin (
to_tsvector('english'::regconfig, affiliation) "pg_catalog"."tsvector_ops"
);
CREATE INDEX "tsv_affiliation_idx" ON "public"."affiliation" USING gin (
"tsv_affiliation" "pg_catalog"."tsvector_ops"
);
table b
CREATE TABLE "ncbi"."affi_known1" (
"id" int8 NOT NULL,
"affi_raw" text COLLATE "pg_catalog"."default",
"affi_main" text COLLATE "pg_catalog"."default",
"affi_known" bool,
"divide" text COLLATE "pg_catalog"."default",
"divide_known" bool,
"sub_divides" text[] COLLATE "pg_catalog"."default",
"country" text COLLATE "pg_catalog"."default",
CONSTRAINT "affi_known_pkey" PRIMARY KEY ("id")
)
;
Update:
After creating an index on id, everything works well. Deleting the index, it seems to go wrong again. So why does the primary key on id fail here?
Update 2:
Table b is generated from table a, using:
query = '''
select a.id, a.affiliation
FROM public.affiliation AS a
WHERE NOT EXISTS (
    SELECT 1
    FROM ncbi.affi_known AS b
    WHERE a.id = b.id
)
limit 2000000
'''
data = pd.read_sql(query, conn)
while len(data):
    for i, row in tqdm(data.iterrows()):
        ...
        curser_insert.execute(
            'insert into ncbi.affi_known(id, affi_raw, affi_main, affi_known, divide, country) values (%s, %s, %s, %s, %s, %s)',
            [affi_id, affi_raw, affi_main, affi_known, devide, country]
        )
        conn2.commit()
    conn2.commit()
    conn.commit()
    data = pd.read_sql(query, conn)
and the code exits improperly.
Your understanding of how EXISTS works might be off. Your current exists query is saying that id 4683763 exists in the affiliation table, not the affi_known1 table. So, the following query should return the single record:
SELECT a.id, a.affiliation
FROM public.affiliation a
WHERE a.id = 4683763;
I am assuming the requirement is to fetch rows only when the id is not present in the second table, so you can try this:
select a.id, a.affiliation
FROM public.affiliation AS a
WHERE a.id NOT IN (
SELECT id
FROM ncbi.affi_known1
)
If id were an integer, your query would do what you want.
If id is a string, you could have issues with "look-alikes". It is very hard to say what the problem is -- there could be spaces in the id, hidden characters, or something else. And this could be in either table.
Assuming the ids look like numbers, you could filter "bad" ids out using regular expressions:
select id
from ncbi.affi_known1
where not id ~ '^[0-9]*$';

How to make a mutually exclusive SELECT query in SQL?

I'm new to SQL, and I need to write a query for a table that looks like this:
CREATE TABLE TESTS (
PATH_ID int PRIMARY KEY,
Day DATE NOT NULL,
Direction varchar(255) NOT NULL,
D_ID int NOT NULL,
FOREIGN KEY (D_ID) REFERENCES Drivers(D_ID)
);
INSERT INTO TESTS (PATH_ID, Day, Direction, D_ID)
VALUES (1, '2021-02-01', 'Right', 001),
(2, '2021-02-01', 'Left', 002),
(3, '2021-02-02', 'Right', 002);
What I need to do is write a query that shows drivers (D_ID) who have ONLY ever gone Right (Direction), and that shows the D_ID, the Day, and all the times the driver went Right.
One method is not exists:
select t.*
from tests t
where not exists (select 1
from tests t2
where t2.d_id = t.d_id and t2.direction <> 'Right'
);
You can use NOT IN:
select a.* from Tests a where D_ID not in (
select D_ID from Tests where direction <>'Right'
)

SQL Query to search records in multiple tables

I'm trying to implement a search feature. I need to look into multiple tables in a SQL database using a text string. Currently, I'm only looking into 3 tables:
Table Items:
[dbo].[Items]
(
[ItemID] INT IDENTITY (1, 1) NOT NULL,
[CategoryID] INT NOT NULL,
[BrandID] INT NOT NULL,
[ItemName] NVARCHAR(MAX) NOT NULL,
[ItemPrice] DECIMAL(18, 2) NOT NULL,
[imageUrl] NVARCHAR(MAX) NULL,
CONSTRAINT [PK_dbo.Items]
PRIMARY KEY CLUSTERED ([ItemID] ASC),
CONSTRAINT [FK_dbo.Items_dbo.Brands_BrandID]
FOREIGN KEY ([BrandID]) REFERENCES [dbo].[Brands] ([BrandID]),
CONSTRAINT [FK_dbo.Items_dbo.Categories_CategoryID]
FOREIGN KEY ([CategoryID]) REFERENCES [dbo].[Categories] ([CategoryID])
)
Table Categories:
[dbo].[Categories]
(
[CategoryID] INT IDENTITY (1, 1) NOT NULL,
[Name] NVARCHAR (MAX) NULL,
CONSTRAINT [PK_dbo.Categories]
PRIMARY KEY CLUSTERED ([CategoryID] ASC)
)
Table Brands:
[dbo].[Brands]
(
[BrandID] INT IDENTITY (1, 1) NOT NULL,
[Name] NVARCHAR (MAX) NULL,
CONSTRAINT [PK_dbo.Brands]
PRIMARY KEY CLUSTERED ([BrandID] ASC)
)
Any record that contains the supplied text string must be fetched. I'm a newbie in SQL. This is my implementation:
SELECT *
FROM Items
WHERE ItemName LIKE 'cocacola'
SELECT *
FROM Categories
WHERE Name LIKE 'cocacola'
SELECT *
FROM Brands
WHERE Name LIKE 'cocacola'
which is obviously incorrect. Can someone please guide me?
Thanks.
If you want to return a substring search, it might be slow depending on how much data you have.
If you are able to pre-specify the tables, and want a single search that searches all and returns matches across all tables, you will want something like this:
SELECT
'Items' as table_name,
ItemID as record_id,
ItemName AS found
FROM
Items
WHERE
ItemName LIKE '%cocacola%'
UNION
SELECT
'Categories' as table_name,
CategoryID AS record_id,
Name AS found
FROM
Categories
WHERE
Name LIKE '%cocacola%'
UNION
SELECT
'Brands' as table_name,
BrandID AS record_id,
Name AS found
FROM
Brands
WHERE
Name LIKE '%cocacola%'
The UNION will append the results from one query to another.
It will be slow if you have a lot of data.
Your solution is not incorrect. You run three queries, each against a different table. Depending on your use case this is probably fine.
You can join the tables if you want to search all tables with only one query. This is probably slower than running three queries because the database has to match the values together.
SELECT *
FROM Items
FULL OUTER JOIN Categories ON Categories.CategoryID = Items.CategoryID
FULL OUTER JOIN Brands ON Brands.BrandID = Items.BrandID
WHERE Items.ItemName LIKE 'cocacola'
OR Categories.Name LIKE 'cocacola'
OR Brands.Name LIKE 'cocacola'
If you get a hit in the category name with this query, the category will be listed for every item that's associated with this category.
It sounds like you might want to try using a union to join together the results of all three queries.
For example:
SELECT ItemID, ItemName
FROM Items
WHERE ItemName = 'cocacola'
UNION
SELECT CategoryID, Name
FROM Categories
WHERE Name = 'cocacola'
UNION
SELECT BrandID, Name
FROM Brands
WHERE Name = 'cocacola'
One note about union is that you have to make sure that each part of the query is returning the same number of columns with the same datatype in the same order.
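For instance, if the id columns in the branches had mismatched types, you could align them with an explicit CAST; a hypothetical two-branch sketch:
-- Cast both key columns to a common type so the UNION branches line up
SELECT CAST(ItemID AS nvarchar(20)) AS record_id, ItemName AS found
FROM Items
WHERE ItemName = 'cocacola'
UNION
SELECT CAST(CategoryID AS nvarchar(20)), Name
FROM Categories
WHERE Name = 'cocacola'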

SQLITE3: converting IDs to codes when there are hundreds of columns to convert

I have a table A that has several hundred columns (let's say 301 for example) with the first column being the primary key and the rest being IDs from table B i.e.
CREATE TABLE "A" (
ko_index_id INTEGER NOT NULL,
ko1 INTEGER NOT NULL,
ko2 INTEGER NOT NULL,
...
ko300 INTEGER NOT NULL,
PRIMARY KEY (ko_index_id)
);
CREATE TABLE "B" (
id INTEGER NOT NULL,
name INTEGER NOT NULL,
PRIMARY KEY (id)
);
I would like to be able to convert the IDs into names. For example:
SELECT name FROM B WHERE id in (SELECT * FROM A);
Except the SELECT * part means that ko_index_id will be fed into B which is wrong. If there were only two columns in A I could just write
SELECT name FROM B WHERE id in (SELECT ko1, ko2 FROM A);
but table A has 300 columns!
Can anyone help me work around this?
300+ columns? How about redoing table A by pivoting those columns into rows? You could have a key-name column and a value column. For example:
select * from A:
id, ko_name, ko_value
1, ko1, 5
1, ko2, 6
Then selecting the keys becomes much easier, e.g.:
SELECT name FROM B WHERE id in (SELECT ko_value FROM A where ko_name in ('ko1', 'ko2')) ;
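A minimal sketch of that unpivot against table A as defined in the question, writing into a hypothetical long table A_long (the UNION ALL branch is repeated, or generated, once per ko column):
-- Unpivot the wide ko columns into (id, ko_name, ko_value) rows
CREATE TABLE A_long AS
SELECT ko_index_id AS id, 'ko1' AS ko_name, ko1 AS ko_value FROM A
UNION ALL
SELECT ko_index_id, 'ko2', ko2 FROM A
UNION ALL
SELECT ko_index_id, 'ko300', ko300 FROM A;  -- ...one branch for each of the 300 columns
The lookup then becomes: SELECT name FROM B WHERE id IN (SELECT ko_value FROM A_long WHERE ko_name IN ('ko1', 'ko2'));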
I agree with @Gordon's comment. If you can afford to change your data model, I would suggest you use an intersection table. It's the typical way to model many-to-many relationships in a database.
Example:
CREATE TABLE "A" (
id INTEGER NOT NULL,
...
PRIMARY KEY (id)
);
CREATE TABLE "B" (
id INTEGER NOT NULL,
name INTEGER NOT NULL,
...
PRIMARY KEY (id)
);
CREATE TABLE "AB" (
a_id INTEGER NOT NULL,
b_id INTEGER NOT NULL
);
SELECT A.id, B.name
FROM A
INNER JOIN AB ON A.id=AB.a_id
INNER JOIN B ON AB.b_id=B.id;

How to join two tables together and return all rows from both tables, and to merge some of their columns into a single column

I'm working with SQL Server 2012 and wish to query the following:
I've got 2 tables with mostly different columns (one table has 10 columns, the other has 6).
However, they both contain an ID column and a category_name column.
The ID numbers may overlap between the tables (e.g. one table may have 200 distinct IDs and the other 900, but only 120 of the IDs appear in both).
The category names are different and unique to each table.
Now I wish to have a single table that includes all the rows of both tables, with a single ID column and a single Category_name column (14 columns in total).
So if the same ID has 3 records in table 1 and another 5 records in table 2, I wish to have all 8 records (8 rows).
The complex thing here, I believe, is to have a single Category_name column.
I tried the following, but when an ID exists in both tables (so neither side is NULL) I'm getting only one combined record instead of both rows:
SELECT isnull(t1.id, t2.id) AS [id]
,isnull(t1.[category], t2.[category_name]) AS [category name]
FROM t1
FULL JOIN t2
ON t1.id = t2.id;
Any suggestions on the correct way to have it done?
Make your FULL JOIN ON 1=0
This will prevent rows from combining and ensure that you always get 1 copy of each row from each table.
Further explanation:
A FULL JOIN gets rows from both tables, whether they have a match or not, but when they do match, it combines them into one row.
You wanted a full join where you never combine the rows, because you wanted every row in both tables to appear one time, no matter what. 1 can never equal 0, so doing a FULL JOIN on 1=0 will give you a full join where none of the rows match each other.
And of course you're already doing the ISNULL to make sure the ID and Name columns always have a value.
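A minimal sketch of that idea applied to the query from the question; other_a and other_b are placeholders for the remaining 8 + 4 columns, which are not named in the question:
SELECT ISNULL(t1.id, t2.id) AS id,
       ISNULL(t1.category, t2.category_name) AS category_name,
       t1.other_a,   -- the columns that exist only in t1
       t2.other_b    -- the columns that exist only in t2
FROM t1
FULL JOIN t2
    ON 1 = 0;  -- never true, so rows are never combined and each source row appears exactly once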
SELECT ID, Category_name, (then the other 8 columns), NULL, NULL, NULL, NULL
FROM t1
UNION ALL
SELECT ID, Category_name, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, (then the other 4 columns)
FROM t2
This demonstrates how you can use a UNION ALL to combine the row sets from two tables, TableA and TableB, and insert the set into TableC.
Create two source tables with some data:
CREATE TABLE dbo.TableA
(
id int NOT NULL,
category_name nvarchar(50) NOT NULL,
other_a nvarchar(20) NOT NULL
);
CREATE TABLE dbo.TableB
(
id int NOT NULL,
category_name nvarchar(50) NOT NULL,
other_b nvarchar(20) NOT NULL
);
INSERT INTO dbo.TableA (id, category_name, other_a)
VALUES (1, N'Alpha', N'ppp'),
(2, N'Bravo', N'qqq'),
(3, N'Charlie', N'rrr');
INSERT INTO dbo.TableB (id, category_name, other_b)
VALUES (4, N'Delta', N'sss'),
(5, N'Echo', N'ttt'),
(6, N'Foxtrot', N'uuu');
Create TableC to receive the result set. Note that columns other_a and other_b allow null values.
CREATE TABLE dbo.TableC
(
id int NOT NULL,
category_name nvarchar(50) NOT NULL,
other_a nvarchar(20) NULL,
other_b nvarchar(20) NULL
);
Insert the combined set of rows into TableC:
INSERT INTO dbo.TableC (id, category_name, other_a, other_b)
SELECT id, category_name, other_a, NULL AS 'other_b'
FROM dbo.TableA
UNION ALL
SELECT id, category_name, NULL, other_b
FROM dbo.TableB;
Display the results:
SELECT *
FROM dbo.TableC;