SQL query to match similar customers - sql

I'm trying to find the query in order to match similar customers.
To simplify the situation consider this scenario:
I have a table which contains a customer name and product purchased.
A customer can have multiple purchases of the same or different products.
So first I can take the distinct customer name and product name pairs, so that I see all customers and all the products they purchased at least once.
Now I want a query that matches customers according to the products they both purchased, so I want to count the products they have in common.
So for each pair of customers (pairing across the whole table) I want to see the number of products that both of them purchased.
Let's say the raw data is:
CustomerName | ProductName
A | 1
A | 2
A | 1
A | 3
B | 1
B | 2
B | 4
C | 2
Then I want to see the result of:
CustomerName1 | CustomerName2 | CountSimilarity
A | B | 2
A | C | 1
B | C | 1
And so on for all pairs of customers that have at least 1 purchased product in common.
Any suggestions how to approach this query?
The environment is SQL Server.
Thanks

Here is a self join approach:
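SELECT t1.CustomerName AS CustomerName1,
t2.CustomerName AS CustomerName2,
COUNT(DISTINCT t1.ProductName) AS CountSimilarity
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.ProductName = t2.ProductName
WHERE
t1.CustomerName < t2.CustomerName
GROUP BY
t1.CustomerName, t2.CustomerName;
Two records are joined together above if their products match. COUNT(DISTINCT ...) is used rather than COUNT(*) because the raw data can contain the same purchase more than once (A | 1 appears twice), and each repeat would otherwise be counted again. Note that the inequality in the WHERE clause ensures that each customer pair appears only once and that no customer is paired with itself.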
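The sample data contains the row A | 1 twice, so it is worth checking how a self join behaves with repeated purchases. Below is a minimal sketch that loads the question's sample rows into an in-memory SQLite database (an assumption made only so the example runs anywhere; the question targets SQL Server, and the table name purchases is hypothetical) and compares COUNT(*) with COUNT(DISTINCT ...).

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (CustomerName TEXT, ProductName TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("A", "1"), ("A", "2"), ("A", "1"), ("A", "3"),
     ("B", "1"), ("B", "2"), ("B", "4"), ("C", "2")],
)

# Self join on product; "<" keeps one row per unordered customer pair.
query_template = """
SELECT t1.CustomerName AS CustomerName1,
       t2.CustomerName AS CustomerName2,
       {counter} AS CountSimilarity
FROM purchases t1
INNER JOIN purchases t2
        ON t1.ProductName = t2.ProductName
WHERE t1.CustomerName < t2.CustomerName
GROUP BY t1.CustomerName, t2.CustomerName
ORDER BY CustomerName1, CustomerName2
"""

# COUNT(*) counts the repeated (A, 1) purchase twice for the pair A/B.
naive_pairs = conn.execute(query_template.format(counter="COUNT(*)")).fetchall()

# COUNT(DISTINCT ...) counts each shared product once.
pairs = conn.execute(
    query_template.format(counter="COUNT(DISTINCT t1.ProductName)")
).fetchall()

for row in pairs:
    print(row)
```

With this data the distinct count gives 2 for the pair A/B, matching the expected result in the question, while COUNT(*) gives 3 for the same pair.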

Related

How to count number of rows that corresponds to the ID given two tables in SQL?

I have two tables: 1) Places 2) Reviews
Table examples are below:
PLACES
ID | NAME
============
1 | Joe
2 | Cat
3 | Dog
REVIEWS
PLACE_ID | REVIEW_ID| REVIEW_CONTENT
====================================
1 | 1000 | "it's good"
1 | 1001 | "aweful place"
3 | 1002 | "good place"
PLACE_ID is my foreign key, and I want to count the number of reviews for each ID in the PLACES table.
As you can see,
there are 2 reviews in the REVIEWS table for place id 1 ("Joe")
there are 0 reviews in the REVIEWS table for place id 2 ("Cat")
there is 1 review in the REVIEWS table for place id 3 ("Dog")
The result should look like
RESULT
PLACE_ID | NAME | COUNT
=======================
1 | Joe | 2
2 | Cat | 0
3 | Dog | 1
Can someone please help how to count number of rows (e.g number of review contents) that has same foreign key (e.g. PLACE_ID), given two tables?
This is basic SQL. Please do some reading on simple aggregations.
SELECT P.ID as PLACE_ID,
P.NAME as NAME,
COUNT(R.REVIEW_ID) as COUNT
FROM PLACES P
LEFT JOIN REVIEWS R
ON P.ID = R.PLACE_ID
GROUP BY P.ID, P.NAME
You can try the below - using left join and aggregation
SELECT p.id, p.name, count(r.review_id) as cnt
from places p left join reviews r ON p.id = r.place_id
group by p.id, p.name
Simply join both tables and aggregate to count the number of reviews for each ID in the Places table. You can find the code below.
Select A.ID as PLACE_ID,
A.Name,
count(B.REVIEW_ID) COUNT
From Places A
Left Join Reviews B
on A.ID = B.PLACE_ID
group by A.ID,
A.Name
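To see why the answers count a column from the REVIEWS side of the LEFT JOIN rather than using COUNT(*), here is a small sketch using the question's sample data in an in-memory SQLite database (SQLite is an assumption for the sake of a runnable example; the question does not name its database).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reviews (
    place_id INTEGER REFERENCES places(id),
    review_id INTEGER PRIMARY KEY,
    review_content TEXT
);
INSERT INTO places VALUES (1, 'Joe'), (2, 'Cat'), (3, 'Dog');
INSERT INTO reviews VALUES
    (1, 1000, 'it''s good'),
    (1, 1001, 'aweful place'),
    (3, 1002, 'good place');
""")

# COUNT(r.review_id) ignores the NULLs that the LEFT JOIN produces for
# places without reviews, so 'Cat' gets 0.
review_counts = conn.execute("""
    SELECT p.id, p.name, COUNT(r.review_id) AS cnt
    FROM places p
    LEFT JOIN reviews r ON p.id = r.place_id
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()

# COUNT(*) counts the NULL-extended row as well, so 'Cat' would get 1.
star_counts = conn.execute("""
    SELECT p.id, p.name, COUNT(*) AS cnt
    FROM places p
    LEFT JOIN reviews r ON p.id = r.place_id
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()

print(review_counts)
print(star_counts)
```

The LEFT JOIN keeps place 2 in the output, and counting the review column is what turns its missing reviews into a 0 instead of a 1.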

Why is INNER JOIN producing more records than original file?

I have two tables. Table A & Table B. Table A has 40516 rows, and records sales by seller_id. The first column in Table A is the seller_id that repeats every time a sale is made.
Example: Table A (40516 rows)
seller_id | item | cost
------------------------
1 | dog | 5000
1 | cat | 50
4 | lizard | 80
5 | bird | 20
5 | fish | 90
The seller_id is also present in Table B, and also contains the corresponding name of the seller.
Example: Table B (5851 rows)
seller_id | seller_name
-------------------------
1 | Dog and Cat World INC
4 | Reptile Love.com
5 | Ocean Dogs Inc
I want to join these two tables, but only display the seller name from Table B and all other columns from Table A. When I do this with an INNER JOIN I get 40864 rows (348 extra rows). Shouldn't the query produce only the original 40516 rows?
Also not sure if this matters, but the seller_id can contain several zeros before the number (e.g., 0000845, 0000549).
I've looked around on here and haven't really found an answer. I've tried LEFT and RIGHT joins and get the same results for one and way more results for the other.
SQL Code Example:
SELECT public.table_B.seller_name, *
FROM public.table_A
INNER JOIN public.table_B ON public.table_A.seller_id =
public.table_B.seller_id;
Expected Results:
seller_name | seller_id | item | cost
------------------------------------------------
Dog and Cat World INC | 1 | dog | 5000
Dog and Cat World INC | 1 | cat | 50
Reptile Love.com | 4 | lizard | 80
Ocean Dogs Inc | 5 | bird | 20
Ocean Dogs Inc | 5 | fish | 90
I expected the results to contain the same number of rows as Table A. Instead I get the names matching up and an additional 348 rows...
Update:
I changed "unique_id" to "seller_id" in the question.
I guess I should have chosen a better name for unique_id in the original example. I didn't mean it to be unique in the sense of a key. It is just the seller's id that repeats every time there is a sale (in Table A). The seller's ID does repeat in Table A because it is supposed to. I simply want to pair up the seller IDs with the seller names.
Thanks again everyone for their help!
unique_id is already not correctly named in the first table, so there is no reason to assume it is unique in the second table either.
Run this query to find the duplicates:
select unique_id
from table_b
group by unique_id
having count(*) > 1;
You can fix the query using distinct on:
SELECT b.seller_name, a.*
FROM public.table_A a JOIN
(SELECT DISTINCT ON (b.unique_id) b.*
FROM public.table_B b
ORDER BY b.unique_id
) b
ON a.unique_id = b.unique_id;
In this case, you may get fewer records, if there are no matches. To fix that, use a LEFT JOIN.
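The effect described above can be reproduced on a tiny data set. The sketch below uses an in-memory SQLite database and a hypothetical duplicate entry for seller 5 in table_b. SQLite has no DISTINCT ON (that is PostgreSQL syntax), so a GROUP BY with MIN(seller_name) is swapped in as a stand-in for keeping one row per seller; it is not the same query as the answer's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (seller_id INTEGER, item TEXT, cost INTEGER);
CREATE TABLE table_b (seller_id INTEGER, seller_name TEXT);
INSERT INTO table_a VALUES
    (1, 'dog', 5000), (1, 'cat', 50), (4, 'lizard', 80),
    (5, 'bird', 20), (5, 'fish', 90);
-- Hypothetical duplicate: seller 5 is listed twice in table_b.
INSERT INTO table_b VALUES
    (1, 'Dog and Cat World INC'), (4, 'Reptile Love.com'),
    (5, 'Ocean Dogs Inc'), (5, 'Ocean Dogs Incorporated');
""")

# Seller ids that occur more than once in the lookup table.
duplicates = conn.execute("""
    SELECT seller_id FROM table_b
    GROUP BY seller_id
    HAVING COUNT(*) > 1
""").fetchall()

# Each table_a row for seller 5 matches two table_b rows, so the join grows.
inflated_rows = conn.execute("""
    SELECT COUNT(*)
    FROM table_a a
    JOIN table_b b ON a.seller_id = b.seller_id
""").fetchone()[0]

# Collapsing table_b to one row per seller restores the original row count.
deduped_rows = conn.execute("""
    SELECT COUNT(*)
    FROM table_a a
    JOIN (SELECT seller_id, MIN(seller_name) AS seller_name
          FROM table_b
          GROUP BY seller_id) b ON a.seller_id = b.seller_id
""").fetchone()[0]

print(duplicates, inflated_rows, deduped_rows)
```

table_a has 5 rows, the plain join returns 7, and the join against the de-duplicated lookup returns 5 again.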
Because unique id column is not unique.
Gordon Linoff was correct. The seller_id (formerly listed as unique_id) was indeed duplicated throughout the data set. I foolishly assumed otherwise. The seller_name had many duplicates too! In the end I had to use the CONCAT() function to join the seller_id with a second identifier to create a type of foreign key. After I did this the join worked as expected. Thanks everyone!

UNION or JOIN for SELECT from multiple tables

My Issue
I am trying to select one row from multiple tables based on parameters, but my limited knowledge of SQL joining is holding me back. Could somebody possibly point me in the right direction?
Consider these table structures:
+-----------------------+ +---------------------+
| Customers | | Sellers |
+-------------+---------+ +-----------+---------+
| Customer_ID | Warning | | Seller_ID | Warning |
+-------------+---------+ +-----------+---------+
| 00001 | Test 1 | | 00008 | Testing |
| 00002 | Test 2 | | 00010 | Testing |
+-------------+---------+ +-----------+---------+
What I would like to do is one SELECT to retrieve only one row, and in this row will be the 'Warning' field for each of the tables based on the X_ID field.
Desired Results
So, if I submitted the following information, I would receive the following results:
Example 1:
Customer_ID = 00001
Seller_ID = 00008
Results:
+-----------------------------------+
| Customer_Warning | Seller_Warning |
+------------------+----------------+
| Test 1 | Testing |
+------------------+----------------+
Example 2:
Customer_ID = 00001
Seller_ID = 00200
Results:
+-----------------------------------+
| Customer_Warning | Seller_Warning |
+------------------+----------------+
| Test 1 | NULL |
+------------------+----------------+
What I Have Tried
This is my current code (I am receiving loads of rows):
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM Customers c,Sellers s
WHERE c.Customer_ID = #Customer_ID
OR s.Seller_ID = #Seller_ID
But I have also played around with UNION, UNION ALL and JOIN. Which method should I go for?
Since you're not really joining tables together, just selecting a single row from each, you could do this:
SELECT
(SELECT Warning
FROM Customers
WHERE Customer_ID = #Customer_ID) AS Customer_Warning,
(SELECT Warning
FROM Sellers
WHERE Seller_ID = #Seller_ID) AS Seller_Warning
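A quick way to check that two scalar subqueries always yield exactly one row, with NULL for an ID that does not exist, is the sketch below. It assumes an in-memory SQLite database and named parameters (:customer_id, :seller_id) in place of the question's placeholders; the helper name lookup_warnings is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Customer_ID TEXT PRIMARY KEY, Warning TEXT);
CREATE TABLE Sellers (Seller_ID TEXT PRIMARY KEY, Warning TEXT);
INSERT INTO Customers VALUES ('00001', 'Test 1'), ('00002', 'Test 2');
INSERT INTO Sellers VALUES ('00008', 'Testing'), ('00010', 'Testing');
""")


def lookup_warnings(customer_id, seller_id):
    """Return one (Customer_Warning, Seller_Warning) row for the given IDs."""
    return conn.execute(
        """
        SELECT
            (SELECT Warning FROM Customers
             WHERE Customer_ID = :customer_id) AS Customer_Warning,
            (SELECT Warning FROM Sellers
             WHERE Seller_ID = :seller_id) AS Seller_Warning
        """,
        {"customer_id": customer_id, "seller_id": seller_id},
    ).fetchone()


both_found = lookup_warnings("00001", "00008")
seller_missing = lookup_warnings("00001", "00200")
print(both_found)
print(seller_missing)
```

Because the outer SELECT has no FROM clause, the result is a single row regardless of whether either lookup finds anything, which matches both examples in the question.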
The problem is you're getting a cartesian product of rows in each table where either column has the value you're looking for.
I think you just want AND instead of OR:
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM Customers c
JOIN Sellers s
ON c.Customer_ID = #Customer_ID
AND s.Seller_ID = #Seller_ID
If performance isn't good enough you could join two filtered subqueries:
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM (SELECT Warning FROM Customers c WHERE c.Customer_ID = #Customer_ID) c,
(SELECT Warning FROM Sellers s WHERE s.Seller_ID = #Seller_ID) s
But I suspect SQL will be able to optimize the filtered join just fine.
It won't return a row if one of the IDs doesn't exist.
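Then you want a FULL OUTER JOIN of the two filtered subqueries (putting the filters in the ON clause of a FULL OUTER JOIN between the base tables would also return every unmatched row from both tables):
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM (SELECT Warning FROM Customers WHERE Customer_ID = #Customer_ID) c
FULL OUTER JOIN (SELECT Warning FROM Sellers WHERE Seller_ID = #Seller_ID) s
ON 1 = 1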
The problem that you are facing is that when one of the tables has no rows, you are going to get no rows out.
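I would suggest solving this with a full outer join, filtering each table first so that unmatched rows from the other table are not returned:
SELECT c.Warning as Customer_Warning, s.Warning AS Seller_Warning
FROM (SELECT Warning FROM Customers WHERE Customer_ID = #Customer_ID) c FULL OUTER JOIN
(SELECT Warning FROM Sellers WHERE Seller_ID = #Seller_ID) s
ON 1 = 1;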
Also, I strongly discourage you from using single quotes for column aliases. Use single quotes only for string and date constants. Using them for column names can lead to confusion. In this case, you don't need delimiters on the names at all.
What I have seen so far here are working examples for your scenario. However, there is no real sense behind putting unrelated data together in one row. I would propose using a UNION and separate the values in your code:
SELECT 'C' AS Type, c.Warning
FROM Customers c
WHERE c.Customer_ID = #Customer_ID
UNION
SELECT 'S' AS Type, s.Warning
FROM Sellers s
WHERE s.Seller_ID = #Seller_ID
You can use the flag to distinguish the warnings in your code. This will be more efficient than joining or subqueries and will be easy to understand later on (when refactoring). I know this is not 100% what you asked for in your question, but that's why I challenge the question :)
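As a rough illustration of separating the values in application code, the sketch below runs a tagged query against an in-memory SQLite database and turns the rows into a dictionary keyed by the flag. UNION ALL is swapped in for UNION because the 'C'/'S' flag already keeps the rows distinct; the named parameters and the helper name tagged_warnings are assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Customer_ID TEXT PRIMARY KEY, Warning TEXT);
CREATE TABLE Sellers (Seller_ID TEXT PRIMARY KEY, Warning TEXT);
INSERT INTO Customers VALUES ('00001', 'Test 1'), ('00002', 'Test 2');
INSERT INTO Sellers VALUES ('00008', 'Testing'), ('00010', 'Testing');
""")


def tagged_warnings(customer_id, seller_id):
    """Return {'C': customer warning, 'S': seller warning} for IDs that exist."""
    rows = conn.execute(
        """
        SELECT 'C' AS Type, Warning FROM Customers
        WHERE Customer_ID = :customer_id
        UNION ALL
        SELECT 'S' AS Type, Warning FROM Sellers
        WHERE Seller_ID = :seller_id
        ORDER BY 1
        """,
        {"customer_id": customer_id, "seller_id": seller_id},
    ).fetchall()
    # Each row is a (flag, warning) pair, so dict() splits them by flag.
    return dict(rows)


both = tagged_warnings("00001", "00008")
only_customer = tagged_warnings("00001", "00200")
print(both.get("C"), both.get("S"))
print(only_customer.get("C"), only_customer.get("S"))
```

A missing ID simply produces no row for that flag, so the caller reads it with .get() and receives None.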

Oracle WITH clause and grouping by results

I have two tables: a product table and a territory table. The product table holds IDs of products and the territory code denoting which countries they can be sold in:
PRODUCT:
PRODUCT_ID | TERRITORY_CODE
----------------------------
PROD1 | 2
PROD2 | 0
PROD3 | 1
PROD4 | 0
PROD5 | 2
PROD6 | 0
PROD7 | 2
The second table holds a territory code and the corresponding ISO codes of the countries the product is allowed to be sold in. For example:
TERRITORY:
TERRITORY_CODE | COUNTRY_CODE
---------------------------
0 | US
1 | CA
2 | US
2 | CA
I would like to write a query that counts the number of PRODUCT_IDs using COUNTRY_CODE as a key.
For example, I want to know how many distinct products there are for sale in the US. I don't want to have to know that 0 and 2 are territory codes that contain the US, I just want to look up by COUNTRY_CODE. How can I do this?
In some preliminary research, I've found that a WITH clause may be useful, and came up with the following query:
WITH country AS (
SELECT (DISTINCT COUNTRY_CODE)
FROM TERRITORY
)
SELECT COUNT(DISTINCT PRODUCT_ID)
FROM country c,
PRODUCT p
WHERE p.TERRITORY_CODE=c.TERRITORY_ID;
However, this doesn't produce the expected result. I also can't get it to group by COUNTRY_CODE. What am I doing wrong?
Looks like you need to use GROUP BY. Try something like this:
SELECT T.Country_Code, COUNT(DISTINCT PRODUCT_ID)
FROM Product P
JOIN Territory T ON P.Territory_Code = T.Territory_Code
GROUP BY T.Country_Code
Good luck.
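The join-and-group approach can be checked against the sample tables from the question. This is a minimal sketch on an in-memory SQLite database (an assumption for runnability; the question targets Oracle, where the same SELECT should behave the same way for this data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (PRODUCT_ID TEXT, TERRITORY_CODE INTEGER);
CREATE TABLE TERRITORY (TERRITORY_CODE INTEGER, COUNTRY_CODE TEXT);
INSERT INTO PRODUCT VALUES
    ('PROD1', 2), ('PROD2', 0), ('PROD3', 1), ('PROD4', 0),
    ('PROD5', 2), ('PROD6', 0), ('PROD7', 2);
INSERT INTO TERRITORY VALUES (0, 'US'), (1, 'CA'), (2, 'US'), (2, 'CA');
""")

# Joining on TERRITORY_CODE attaches every allowed country to each product;
# grouping by COUNTRY_CODE then counts distinct products per country.
products_per_country = conn.execute("""
    SELECT T.COUNTRY_CODE, COUNT(DISTINCT P.PRODUCT_ID) AS PRODUCT_COUNT
    FROM PRODUCT P
    JOIN TERRITORY T ON P.TERRITORY_CODE = T.TERRITORY_CODE
    GROUP BY T.COUNTRY_CODE
    ORDER BY T.COUNTRY_CODE
""").fetchall()

print(products_per_country)
```

Territory codes 0 and 2 both map to US, so six products are counted there, while codes 1 and 2 map to CA and give four. To look up a single country, a WHERE T.COUNTRY_CODE = 'US' filter can be added before the GROUP BY.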

Using foreign key

I am new to foreign keys, but I understand the concept very well.
I have found a lot of documentation on how to create / delete them, but not how to use them. My schema is as follows.
Stock table:
PartID | Model | Type | Vendor
------------------------------
1 | DDr2 | RAM | shop1
2 | DDr3 | RAM | shop1
3 | WD1 | HDD | shop2
4 | WD2 | HDD | shop2
Then product Table
ProdID | Name | PartID1 | PartID2 ...
1 | PC1 | 1 | 2
2 | PC1 | 3 | 4
How do I use select to get
| PC1 | DDr2 | DDR3 |
| PC1 | WD1 | WD2 |
with PartID1 and PartID2 as foreign keys linked to the PartID primary key?
The concept of Foreign Keys is to link the IDs in one table to the list of unique IDs in another. In your example, you have unique parts with unique IDs and Products that can use those parts, so in your product table, you could have multiple part IDs being used in multiple rows.
Foreign Keys are used to keep referential integrity in your database. You can use joins to get the data you want:
SELECT A.NAME,
B.Model,
C.Model
FROM PRODUCT A
INNER JOIN STOCK B ON B.PARTID = A.PARTID1
INNER JOIN STOCK C ON C.PARTID = A.PARTID2
WHERE A.PRODID = 1
The short answer is you could do
select p.name, a.model as part1, b.model as part2, c.model as part3
from product p, stock a, stock b, stock c
where p.partid1 = a.partid and p.partid2 = b.partid and p.partid3 = c.partid
The longer answer is that this isn't really a good table design for what you're trying to do. It assumes that you always have a fixed number of parts for any item (or at least no more than some fixed number). A better design would be:
Part Table:
partID | model | type | vendor
Product Table:
productID | name
Product_Parts Table:
productID | partID
where productID in Product_parts is a foreign key into Product and partID is a foreign key into the Part table.
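The three-table design described above could look roughly like the sketch below, written against an in-memory SQLite database. The column types, the composite primary key on Product_Parts, and the sample rows (taken from the question's data) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Part (
    partID INTEGER PRIMARY KEY, model TEXT, type TEXT, vendor TEXT
);
CREATE TABLE Product (productID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Product_Parts (
    productID INTEGER NOT NULL REFERENCES Product(productID),
    partID INTEGER NOT NULL REFERENCES Part(partID),
    PRIMARY KEY (productID, partID)
);
INSERT INTO Part VALUES
    (1, 'DDr2', 'RAM', 'shop1'), (2, 'DDr3', 'RAM', 'shop1'),
    (3, 'WD1', 'HDD', 'shop2'), (4, 'WD2', 'HDD', 'shop2');
INSERT INTO Product VALUES (1, 'PC1'), (2, 'PC1');
INSERT INTO Product_Parts VALUES (1, 1), (1, 2), (2, 3), (2, 4);
""")

# One row per (product, part); a product may have any number of parts.
product_parts = conn.execute("""
    SELECT p.productID, p.name, pt.model
    FROM Product p
    JOIN Product_Parts pp ON pp.productID = p.productID
    JOIN Part pt ON pt.partID = pp.partID
    ORDER BY p.productID, pt.partID
""").fetchall()

# The foreign key rejects a link to a part that does not exist.
try:
    conn.execute("INSERT INTO Product_Parts VALUES (1, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(product_parts)
print(fk_rejected)
```

With this layout, adding a third or fourth part to a product is just another row in Product_Parts rather than a new column and another join.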
SELECT pr.Name, s1.Model, s2.Model FROM product pr
INNER JOIN stock s1
ON pr.PartID1 = s1.PartID
INNER JOIN stock s2
ON pr.PartID2 = s2.PartID
Take one JOIN at a time: first join the product table to the stock table,
then join the result of that join to the stock table again.
The SQL parser treats the two aliases of the stock table as two separate tables, so you can have two results from the same table in a single row.
You can join a table more than once in the same SQL statement. In this case, you need to join your stock table twice, once for each part in your product, to get its model.
SELECT pr.ProdID, s1.Model, s2.Model
FROM Product pr, Stock s1, Stock s2
WHERE pr.PartID1 = s1.PartID
AND pr.PartID2 = s2.PartID
Using a LEFT OUTER JOIN means that the product will still be returned even if the PartID1 or PartID2 values are set to NULL.
SELECT P.Name,
S1.Model,
S2.Model
FROM Product P
LEFT OUTER JOIN Stock S1 ON P.PartID1 = S1.PartID
LEFT OUTER JOIN Stock S2 ON P.PartID2 = S2.PartID
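A small sketch of joining Stock twice, once per part column, with both joins written as LEFT OUTER JOINs. It uses an in-memory SQLite database and the question's sample rows, plus a hypothetical product PC2 whose second part is NULL to show the difference from inner joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stock (
    PartID INTEGER PRIMARY KEY, Model TEXT, Type TEXT, Vendor TEXT
);
CREATE TABLE Product (
    ProdID INTEGER PRIMARY KEY, Name TEXT,
    PartID1 INTEGER REFERENCES Stock(PartID),
    PartID2 INTEGER REFERENCES Stock(PartID)
);
INSERT INTO Stock VALUES
    (1, 'DDr2', 'RAM', 'shop1'), (2, 'DDr3', 'RAM', 'shop1'),
    (3, 'WD1', 'HDD', 'shop2'), (4, 'WD2', 'HDD', 'shop2');
-- PC2 is a hypothetical product with no second part.
INSERT INTO Product VALUES (1, 'PC1', 1, 2), (2, 'PC1', 3, 4), (3, 'PC2', 1, NULL);
""")

# Each alias (S1, S2) acts as its own copy of Stock.
left_joined = conn.execute("""
    SELECT P.Name, S1.Model, S2.Model
    FROM Product P
    LEFT OUTER JOIN Stock S1 ON P.PartID1 = S1.PartID
    LEFT OUTER JOIN Stock S2 ON P.PartID2 = S2.PartID
    ORDER BY P.ProdID
""").fetchall()

# With inner joins the product that lacks a second part disappears.
inner_count = conn.execute("""
    SELECT COUNT(*)
    FROM Product P
    INNER JOIN Stock S1 ON P.PartID1 = S1.PartID
    INNER JOIN Stock S2 ON P.PartID2 = S2.PartID
""").fetchone()[0]

print(left_joined)
print(inner_count)
```

The left-joined query returns all three products, with None in place of the missing second model, while the inner-joined version returns only the two complete ones.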