SQL DB2 Find and Display Common - sql

I am working on a query in which I need to find all pairs of distinct customers who bought at least one title in common and display them, with the customer with the higher id as the first customer (A) and the customer with the lower id as customer B. The schema looks like
create table customer (
id smallint not null,
name varchar(20),
primary key (id))
create table purchase (
id smallint not null,
title varchar(25) not null,
primary key (id,title))
Here is the query I wrote, but it's not producing the desired result
Select
distinct A.name as customera,B.name as customerb
from customer A,customer B, purchase C
where A.id=C.id and B.id=C.id
But this yields the wrong result. I am a beginner in SQL, and this is the database I have to work with.
My output should look like this. It does, but it displays the same customer in both columns, which is wrong.
CUSTOMERA CUSTOMERB
-------------------- --------------------
Some customer with a higher id other customer
Any help on this, or on how I can fix it, would be appreciated.

First, never use commas in the from clause. Always use proper, explicit, standard join syntax.
Assuming that the id in purchase matches the id in customer, then you can just do:
select distinct p1.id, p2.id
from purchase p1 join
purchase p2
on p2.title = p1.title and p1.id > p2.id;
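The self-join above returns ids, while the question asks for names. A minimal sketch of one way to get names is below. It assumes SQLite (through Python's sqlite3 module) as a stand-in for DB2, and the sample rows are invented for illustration. The two extra joins to customer are an addition to the answer's query, not part of it.

```python
import sqlite3

# In-memory SQLite stands in for DB2 here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id SMALLINT NOT NULL,
        name VARCHAR(20),
        PRIMARY KEY (id));
    CREATE TABLE purchase (
        id SMALLINT NOT NULL,
        title VARCHAR(25) NOT NULL,
        PRIMARY KEY (id, title));
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES
        (1, 'Dune'), (2, 'Dune'), (2, 'Emma'), (3, 'Emma'), (3, 'Ulysses');
""")

# p1.id > p2.id keeps each pair once, with the higher id as customer A.
pairs = conn.execute("""
    SELECT DISTINCT a.name AS customera, b.name AS customerb
    FROM purchase p1
    JOIN purchase p2 ON p2.title = p1.title AND p1.id > p2.id
    JOIN customer a ON a.id = p1.id
    JOIN customer b ON b.id = p2.id
    ORDER BY customera, customerb
""").fetchall()

for customera, customerb in pairs:
    print(customera, customerb)
```

With this sample data, Bob (id 2) is paired with Ann (id 1) through 'Dune', and Cy (id 3) with Bob (id 2) through 'Emma'.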

Related

SQL insert multiple rows depending on number of rows returned from subquery

I have 3 SQL tables Companies, Materials and Suppliers as follows.
Tables
I need to insert values into Suppliers from a list which has Company Name and Material Name as headers. However, I have multiple companies with the same name in the database, and I need to add a new row to Suppliers for each of those companies.
For example, my list contains the values ['Wickes','Bricks']. I have the SQL below to add a new entry to the Suppliers table, but since I have multiple companies called 'Wickes' I get an error, because the subquery returns more than one value.
INSERT INTO Suppliers(Id,CompanyId,MaterialId) VALUES (NEWID(), (SELECT Id FROM Companies WHERE Name = 'Wickes'),(SELECT Id FROM Materials WHERE Name = 'Bricks'))
What's the best way to get the Id of every company called 'Wickes' and then add rows to the Suppliers table with that Id and the relevant material Id for 'Bricks'?
You can use INSERT ... SELECT rather than INSERT ... VALUES, e.g.
INSERT INTO Suppliers (Id, CompanyId, MaterialId)
SELECT NEWID(), c.Id, m.Id
FROM Companies AS c
CROSS JOIN Materials AS m
WHERE c.Name = 'Wickes'
AND m.Name = 'Bricks';
This will ensure that if you have multiple companies/materials with the same name, all combinations are inserted.
Based on your image, though, Suppliers.Id is an integer, so I think NEWID() is not doing what you think it is here; you probably just want:
INSERT INTO Suppliers (CompanyId, MaterialId)
SELECT c.Id, m.Id
FROM Companies AS c
CROSS JOIN Materials AS m
WHERE c.Name = 'Wickes'
AND m.Name = 'Bricks';
And let IDENTITY take care of the Id column in Suppliers.
As a further aside, I've also just noted that MaterialId is VARCHAR in your Suppliers table, that looks like an error if it is supposed to reference the integer Id column in Materials.
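A minimal sketch of the INSERT ... SELECT approach with duplicate company names follows. It assumes SQLite via Python's sqlite3 instead of SQL Server, so INTEGER PRIMARY KEY plays the role of IDENTITY, and the rows (including the 'Travis' and 'Sand' entries) are invented for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server; INTEGER PRIMARY KEY auto-assigns ids,
# playing the role that IDENTITY plays in the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Companies (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Materials (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Suppliers (
        Id INTEGER PRIMARY KEY,
        CompanyId INTEGER REFERENCES Companies(Id),
        MaterialId INTEGER REFERENCES Materials(Id)
    );
    -- Two companies share the name 'Wickes' (invented sample data).
    INSERT INTO Companies (Name) VALUES ('Wickes'), ('Wickes'), ('Travis');
    INSERT INTO Materials (Name) VALUES ('Bricks'), ('Sand');
""")

# One Suppliers row is inserted per matching (company, material) combination.
conn.execute("""
    INSERT INTO Suppliers (CompanyId, MaterialId)
    SELECT c.Id, m.Id
    FROM Companies AS c
    CROSS JOIN Materials AS m
    WHERE c.Name = 'Wickes'
      AND m.Name = 'Bricks'
""")

supplier_rows = conn.execute(
    "SELECT CompanyId, MaterialId FROM Suppliers ORDER BY CompanyId"
).fetchall()
print(supplier_rows)
```

Both 'Wickes' companies get a Suppliers row for 'Bricks', with no error from a multi-row subquery.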
INSERT INTO Suppliers(Id,CompanyId,MaterialId) VALUES (NEWID(), (SELECT DISTINCT Id FROM Companies WHERE Name = 'Wickes'),(SELECT DISTINCT Id FROM Materials WHERE Name = 'Bricks'));
If I understand rightly Companies are the suppliers and the Suppliers table is the one that says where you can buy each material from.
Why do you have duplicates? Do you have an account for different branches of Wickes, for example? If they really are duplicates and you don't care which one you use, a function like MIN() will ensure that only one value is returned. If you have duplicates, it would be a good idea to find a way of deactivating all except one. This will make it simpler for you every time you want to deal with the supplier: minimum orders, chasing overdue orders, payments, etc.
Also, Suppliers.CompanyId and Suppliers.MaterialId should be foreign keys referencing Companies.Id and Materials.Id. It is also a good idea for the Id column to be auto-incrementing, which makes it easier to add new rows as you do not need to specify the Id column.
If you cannot or do not want to modify the id column to auto-incrementing IDENTITY you can continue to use NEWID().
create table Companies(
id INT PRIMARY KEY NOT NULL IDENTITY,
name VARCHAR(25));
create table Materials(
id INT PRIMARY KEY NOT NULL IDENTITY,
name VARCHAR(25));
create table Suppliers(
id INT PRIMARY KEY NOT NULL IDENTITY,
CompanyId INT FOREIGN KEY REFERENCES Companies(id),
MaterialId INT FOREIGN KEY REFERENCES Materials(id)
);
INSERT INTO Companies (name) VALUES ('Wickes');
INSERT INTO Materials (name) VALUES ('Bricks');
INSERT INTO Suppliers ( CompanyId, MaterialId)
SELECT c.Id, M.Id
FROM Companies AS c
CROSS JOIN Materials AS m
WHERE c.Name = 'Wickes'
AND m.Name = 'Bricks';
SELECT * FROM Companies;
SELECT * FROM Materials;
SELECT * FROM Suppliers;
GO
id | name
-: | :-----
1 | Wickes
id | name
-: | :-----
1 | Bricks
id | CompanyId | MaterialId
-: | --------: | ---------:
1 | 1 | 1
INSERT INTO SUPPLIERS
(ID, COMPANYID, MATERIALID)
VALUES (NEWID(),
(SELECT DISTINCT ID FROM COMPANIES WHERE NAME = 'Wickes'), (SELECT DISTINCT ID FROM MATERIALS WHERE NAME = 'Bricks'))

SQLite - Excluding rows from a query based on certain column values, without using correlated subqueries

I'm working with the classicmodels database, originally for MySQL, but which we're using with SQLite. Within this database, there are 2 tables of interest, the orderdetails table...
CREATE TABLE `orderdetails` (
`orderNumber` int(11) NOT NULL,
`productCode` varchar(15) NOT NULL,
`quantityOrdered` int(11) NOT NULL,
`priceEach` decimal(10,2) NOT NULL,
`orderLineNumber` smallint(6) NOT NULL,
PRIMARY KEY (`orderNumber`,`productCode`)
);
... and the products table.
CREATE TABLE `products` (
`productCode` varchar(15) NOT NULL,
`productName` varchar(70) NOT NULL,
`productLine` varchar(50) NOT NULL,
`productScale` varchar(10) NOT NULL,
`productVendor` varchar(50) NOT NULL,
`productDescription` text NOT NULL,
`quantityInStock` smallint(6) NOT NULL,
`buyPrice` decimal(10,2) NOT NULL,
`MSRP` decimal(10,2) NOT NULL,
PRIMARY KEY (`productCode`)
);
I've written a query to list all the distinct order numbers, along with the product lines of the products within their orders. Here is the query, with the first few rows of the corresponding output.
sqlite> SELECT DISTINCT orderdetails.orderNumber, products.productLine
> FROM orderdetails JOIN products
> ON orderdetails.productCode == products.productCode;
orderNumber productLine
----------- ------------
10100 Vintage Cars
10101 Vintage Cars
10102 Vintage Cars
10103 Classic Cars
10103 Trucks and B
10103 Vintage Cars
10104 Classic Cars
10104 Trucks and B
10104 Trains
10105 Classic Cars
10105 Vintage Cars
10105 Trains
10105 Ships
10106 Planes
10106 Ships
10106 Vintage Cars
10107 Motorcycles
Now, I want to exclude all rows which contain order numbers corresponding to orders which contain Planes. For example, in the output above, order number 10106 contains products that are either Planes or Ships - since 10106 represents an order which contains planes (among other things), BOTH of these rows should be removed in such a query.
A simple subquery approach yields the right answer...
SELECT orderNumber FROM orders EXCEPT
SELECT DISTINCT orderdetails.orderNumber FROM orderdetails JOIN products
ON orderdetails.productCode == products.productCode
WHERE products.productLine == "Planes";
... however, the catch here is that I can't use correlated subqueries - the question that I am trying to tackle explicitly states that these subqueries are not allowed.
What have I tried?
A simple exclusion clause (such as WHERE products.productLine != "Planes") won't do the trick, as this only removes the order numbers of orders which ONLY contain planes. If an order contains planes and ships (for example), that number will remain in the query - not good!
My initial research into similar questions on StackOverflow seems only to bring up answers suggesting the use of subqueries (which would be awesome in other situations, but unfortunately not in this problem - we're avoiding subqueries).
You say "the catch here is that I can't use correlated subqueries". But the query that you say yields the right answer, namely...
SELECT orderNumber FROM orders EXCEPT
SELECT DISTINCT orderdetails.orderNumber FROM orderdetails JOIN products
ON orderdetails.productCode == products.productCode
WHERE products.productLine == "Planes";
... does not, in fact, contain a correlated subquery. In fact it doesn't contain a subquery at all, if you're going by the true definition of what a subquery is. So if the restriction really is "don't use subqueries", then the above query actually abides by the restriction, and is fine.
But you could do something like this, right?
SELECT DISTINCT orderNumber FROM orders
WHERE ordernumber NOT IN
(SELECT DISTINCT orderdetails.orderNumber FROM orderdetails
JOIN products ON orderdetails.productCode == products.productCode
WHERE products.productLine == "Planes"
);
You can use aggregation, if you just want the orderNumber (which your sample code suggests is what you want):
SELECT od.orderNumber
FROM orderdetails od JOIN
products p
ON od.productCode = p.productCode
GROUP BY od.orderNumber
HAVING SUM(CASE WHEN p.productLine = 'Planes' THEN 1 ELSE 0 END) = 0; -- none of those
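To see the conditional-aggregation filter in action, here is a minimal sketch using Python's sqlite3 module. The tables are cut down to the columns the query needs, and the product codes and rows are invented rather than taken from the classicmodels data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced versions of the two tables; codes and rows are invented.
    CREATE TABLE products (
        productCode TEXT PRIMARY KEY,
        productLine TEXT NOT NULL
    );
    CREATE TABLE orderdetails (
        orderNumber INTEGER NOT NULL,
        productCode TEXT NOT NULL,
        PRIMARY KEY (orderNumber, productCode)
    );
    INSERT INTO products VALUES
        ('P1', 'Planes'), ('S1', 'Ships'),
        ('V1', 'Vintage Cars'), ('M1', 'Motorcycles');
    INSERT INTO orderdetails VALUES
        (10105, 'V1'), (10105, 'S1'),
        (10106, 'P1'), (10106, 'S1'), (10106, 'V1'),
        (10107, 'M1');
""")

# The SUM counts 'Planes' rows per order; only orders with a count of 0 survive.
kept_orders = [row[0] for row in conn.execute("""
    SELECT od.orderNumber
    FROM orderdetails od
    JOIN products p ON od.productCode = p.productCode
    GROUP BY od.orderNumber
    HAVING SUM(CASE WHEN p.productLine = 'Planes' THEN 1 ELSE 0 END) = 0
    ORDER BY od.orderNumber
""")]
print(kept_orders)
```

Order 10106 is dropped as a whole because one of its lines is a plane, even though its other lines are not.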

SQL - Selecting data from two tables and removing duplicates

So I have two tables and I'm trying to display some data from both and remove the duplicates. Sorry, I'm new to SQL and databases. Here's my code
Table 1
CREATE TABLE customer
(
customer_id VARCHAR2(5),
customer_name VARCHAR2(50) NOT NULL,
customer_address VARCHAR2(150) NOT NULL,
customer_phone VARCHAR2(11) NOT NULL,
PRIMARY KEY (customer_id)
);
Table 2
CREATE TABLE shop
(
shop_id VARCHAR2(7),
shop_address VARCHAR2(150) NOT NULL,
customer_id VARCHAR2(7),
PRIMARY KEY (shop_id),
FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
);
I want to display everything from the SHOP table, and customer_id, customer_name from the CUSTOMER TABLE.
I've tried this so far, but it's displaying everything from both tables and I get two duplicate customer_id columns:
SELECT *
FROM shop
JOIN customer ON shop.customer_id = customer.customer_id
ORDER BY customer_name;
Anyone able to help?
Thanks
Because both tables have a customer_id column, you can select everything from the shop table and only the customer_name column from the customer table:
SELECT s.*, c.customer_name
FROM shop s
JOIN customer c ON s.customer_id = c.customer_id
ORDER BY c.customer_name;
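A minimal sketch showing which columns this query returns, assuming SQLite via Python's sqlite3 in place of Oracle (so TEXT replaces VARCHAR2) and invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- TEXT replaces VARCHAR2 because SQLite is used here; rows are invented.
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        customer_name TEXT NOT NULL,
        customer_address TEXT NOT NULL,
        customer_phone TEXT NOT NULL
    );
    CREATE TABLE shop (
        shop_id TEXT PRIMARY KEY,
        shop_address TEXT NOT NULL,
        customer_id TEXT REFERENCES customer (customer_id)
    );
    INSERT INTO customer VALUES
        ('C1', 'Zoe', '1 High St', '0100'),
        ('C2', 'Adam', '2 Low Rd', '0200');
    INSERT INTO shop VALUES
        ('S1', '10 Market Sq', 'C1'),
        ('S2', '20 Dock Ln', 'C2');
""")

cur = conn.execute("""
    SELECT s.*, c.customer_name
    FROM shop s
    JOIN customer c ON s.customer_id = c.customer_id
    ORDER BY c.customer_name
""")
# cursor.description lists the result columns in order.
columns = [col[0] for col in cur.description]
rows = cur.fetchall()
print(columns)
print(rows)
```

The result has a single customer_id column (the one from shop) followed by customer_name.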
select distinct c.customer_id, c.customer_name, s.*
from customer c
inner join shop s on c.customer_id = s.customer_id
To remove duplicate rows, you need to use the DISTINCT keyword
https://www.w3schools.com/sql/sql_distinct.asp
You need to manually list the columns you want. Using * will pull in every column from every table. Standard SQL does not have any way of saying "select all columns except these...".
I hope you're only using * casually - it's a very bad idea to use SELECT * inside program code that then expects certain columns to exist in a particular order or with a certain name.
To save typing, you could use * for one of the tables and manually name the rest:
SELECT
customer.*,
shop.shop_id,
shop.shop_address
FROM
...

SQL: How to speed up a query using indexing

I am trying to speed up a query to find all CUSTOMERs who have bought a MOTORCYCLE manufactured before 1970 AND bought another MOTORCYCLE manufactured after 2010. Since my query is running very slowly, I think that I need help with finding the better indexes. My attempts are documented below:
Tables
CREATE TABLE CUSTOMER (
id int PRIMARY KEY,
fname varchar(30),
lname varchar(30)
);
CREATE TABLE MOTORCYCLE (
id int PRIMARY KEY,
name varchar(30),
year int -- Manufactured year
);
CREATE TABLE SALES (
cid int,
mid int,
FOREIGN KEY(cid) REFERENCES CUSTOMER(id),
FOREIGN KEY(mid) REFERENCES MOTORCYCLE(id),
PRIMARY KEY(cid, mid)
);
Indexes
Here are my indexes (I am somewhat guessing with these, but this was my attempt):
CREATE UNIQUE INDEX customerID on CUSTOMER(id);
CREATE INDEX customerName on CUSTOMER(fname, lname);
CREATE UNIQUE INDEX motorcycleID on MOTORCYCLE(id);
CREATE INDEX motorcycleName on MOTORCYCLE(name);
CREATE INDEX motorcycleYear on MOTORCYCLE(year);
CREATE INDEX salesCustomerMotorcycleID on SALES(cid, mid);
CREATE INDEX salesCustomerID on SALES(cid);
CREATE INDEX castsMotorcycleID on SALES(mid);
Queries
My query to find the customers purchasing bikes manufactured before 1970 and after 2010 is here:
SELECT fname, lname
FROM (SALES INNER JOIN CUSTOMER ON SALES.cid=CUSTOMER.id) INNER JOIN MOTORCYCLE ON MOTORCYCLE.id=SALES.mid
GROUP BY CUSTOMER.id
HAVING MIN(MOTORCYCLE.year) < 1970 AND MAX(MOTORCYCLE.year) > 2010;
And here is another working query which avoids the GROUP BY and HAVING clauses:
SELECT DISTINCT C.id, fname, lname
FROM (CUSTOMER as C inner join (SALES as S1 INNER JOIN MOTORCYCLE as M1 ON M1.id=S1.mid) on C.id=S1.cid) inner join (SALES as S2 inner join MOTORCYCLE as M2 on S2.mid=M2.id) on C.id=S2.cid
WHERE (M1.year < 1970 AND M2.year > 2010);
Any suggestions on the kinds of indexes I can use to speed up my query? Or should I change my query?
UPDATE
I found another query that also works, but it is also too slow. It has been added above. Still, it might be helpful when finding an index to speed it up.
When you check out your queries with EXPLAIN QUERY PLAN, you see that in both cases, the database looks up many related records before it filters out unneeded records (with unwanted years).
The following queries look up the motorcycle IDs before matching; which one is faster depends on the details of your data and must be measured by you:
SELECT *
FROM Customer
WHERE EXISTS (SELECT 1
FROM Sales
WHERE cid = Customer.id
AND mid IN (SELECT id
FROM Motorcycle
WHERE year < 1970))
AND EXISTS (SELECT 1
FROM Sales
WHERE cid = Customer.id
AND mid IN (SELECT id
FROM Motorcycle
WHERE year > 2010));
SELECT *
FROM Customer
WHERE EXISTS (SELECT 1
FROM Sales AS s1
JOIN Sales AS s2 ON s1.cid = s2.cid
WHERE s1.cid = Customer.id
AND s1.mid IN (SELECT id
FROM Motorcycle
WHERE year < 1970)
AND s2.mid IN (SELECT id
FROM Motorcycle
WHERE year > 2010));
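A minimal sketch for trying the first EXISTS query and inspecting its plan, using Python's sqlite3 module. The sample rows are invented, the select list is narrowed to three columns, and only the year index from the question is created. The plan text depends on the SQLite version and the data, so it is printed rather than assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    CREATE TABLE Motorcycle (id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
    CREATE TABLE Sales (
        cid INTEGER REFERENCES Customer(id),
        mid INTEGER REFERENCES Motorcycle(id),
        PRIMARY KEY (cid, mid)
    );
    CREATE INDEX motorcycleYear ON Motorcycle(year);
    -- Invented sample rows: only customer 3 bought both an old and a new bike.
    INSERT INTO Customer VALUES (1, 'Ada', 'Old'), (2, 'Ben', 'New'), (3, 'Cal', 'Both');
    INSERT INTO Motorcycle VALUES (10, 'Vintage', 1965), (11, 'Middle', 1990), (12, 'Modern', 2015);
    INSERT INTO Sales VALUES (1, 10), (1, 11), (2, 12), (3, 10), (3, 12);
""")

query = """
    SELECT id, fname, lname
    FROM Customer
    WHERE EXISTS (SELECT 1
                  FROM Sales
                  WHERE cid = Customer.id
                    AND mid IN (SELECT id FROM Motorcycle WHERE year < 1970))
      AND EXISTS (SELECT 1
                  FROM Sales
                  WHERE cid = Customer.id
                    AND mid IN (SELECT id FROM Motorcycle WHERE year > 2010))
"""

matches = conn.execute(query).fetchall()
print(matches)

# Each plan row ends with a human-readable description of one step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step[-1])
```

Running the same plan check on the real data, with and without each candidate index, is the way to find out which indexes the query actually uses.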
Why use GROUP BY when the select list contains no aggregate function?
If you only want to avoid duplicate rows, use DISTINCT instead.

Get results that have the same data in the table

I need to get the names of all customers whose preference MINPRICE and MAXPRICE are the same.
Here's my schema:
CREATE TABLE CUSTOMER (
PHONE VARCHAR(25) NOT NULL,
NAME VARCHAR(25),
CONSTRAINT CUSTOMER_PKEY PRIMARY KEY (PHONE)
);
CREATE TABLE PREFERENCE (
PHONE VARCHAR(25) NOT NULL,
ITEM VARCHAR(25) NOT NULL,
MAXPRICE NUMBER(8,2),
MINPRICE NUMBER(8,2),
CONSTRAINT PREFERENCE_PKEY PRIMARY KEY (PHONE, ITEM),
CONSTRAINT PREFERENCE_FKEY FOREIGN KEY (PHONE) REFERENCES CUSTOMER (PHONE)
);
I think I need to compare rows against other rows, or create another view to compare against. Is there an easy way to do this?
It's one-to-many: a customer can have multiple preferences, so I need to query a list of customers that have the same minprice and maxprice, comparing between rows on minprice = minprice and maxprice = maxprice.
A self-join on preference would find rows with the same price preference, but a different phone number:
select distinct c1.name
, p1.minprice
, p1.maxprice
from preference p1
join preference p2
on p1.phone <> p2.phone
and p1.minprice = p2.minprice
and p1.maxprice = p2.maxprice
join customer c1
on c1.phone = p1.phone
join customer c2
on c2.phone = p2.phone
order by
p1.minprice
, p1.maxprice
, c1.name
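A minimal sketch of this self-join on invented data, assuming SQLite via Python's sqlite3 (NUMERIC stands in for NUMBER(8,2)):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- NUMERIC stands in for NUMBER(8,2); the rows are invented.
    CREATE TABLE customer (
        phone TEXT PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE preference (
        phone TEXT NOT NULL REFERENCES customer (phone),
        item TEXT NOT NULL,
        maxprice NUMERIC,
        minprice NUMERIC,
        PRIMARY KEY (phone, item)
    );
    INSERT INTO customer VALUES ('111', 'Ann'), ('222', 'Bob'), ('333', 'Cy');
    -- Ann and Bob share the range 10 to 50; Cy's range is unique.
    INSERT INTO preference VALUES
        ('111', 'lamp', 50, 10),
        ('222', 'desk', 50, 10),
        ('333', 'sofa', 900, 300);
""")

shared = conn.execute("""
    SELECT DISTINCT c1.name, p1.minprice, p1.maxprice
    FROM preference p1
    JOIN preference p2
      ON p1.phone <> p2.phone
     AND p1.minprice = p2.minprice
     AND p1.maxprice = p2.maxprice
    JOIN customer c1 ON c1.phone = p1.phone
    ORDER BY p1.minprice, p1.maxprice, c1.name
""").fetchall()
print(shared)
```

Cy does not appear because no other phone number has the same minprice and maxprice. The join to customer c2 from the query above is left out here since none of its columns are selected.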
It seems strange that you have minprice and maxprice in your preference table. Is that a table that you update after each transaction, such that each customer only has 1 active preference record? I mean, it reads like a customer could pay two different prices for the same item, which seems odd.
Assuming customer and preference are 1:1
SELECT c.*
FROM customer c INNER JOIN preference p ON c.phone = p.phone
WHERE p.minprice = p.maxprice
However, if a customer can have multiple preferences and you are looking for minprice = maxprice across ALL items, then you could do this:
SELECT c.*
FROM (SELECT phone, MIN(minprice) as allMin, MAX(maxprice) as allMax
FROM preference
GROUP BY phone) p INNER JOIN customer c on p.phone = c.phone
WHERE allMin = allMax
This will show all the customer names that have the same price preferences.
SELECT minprice, maxprice, GROUP_CONCAT(name) names
FROM preference
JOIN customer USING (phone)
GROUP BY minprice, maxprice
HAVING COUNT(*) > 1
The HAVING clause prevents it showing preferences that have no duplicates. If you want to see those single-customer preferences, remove that line.