Trying to avoid duplicated records from a query

I have 2 tables with the following structure:
-------------------------------------
| dbo.Katigories  | dbo.Products    |
|-----------------|-----------------|
| product_id      | product_id      |
| Cat_Main_ID     | other data..... |
| Cat_Sub_ID      | other data..... |
| Cat_Sub_Sub_ID  | other data..... |
| other data..... | other data..... |
-------------------------------------
I want to retrieve all the products from the dbo.Products table, having the same Cat_Main_ID and the same Cat_Sub_ID. To do that, I have the following SELECT statement:
SELECT * FROM dbo.katigories, dbo.Products
WHERE
dbo.katigories.Cat_Main_ID = (the Cat_Main_ID – exists_in-my url - query string)
AND
dbo.katigories.Cat_Sub_ID = (the Cat_Sub_ID – exists_in-my url - query string)
AND
dbo.katigories.product_id = dbo.Products.product_id
Unfortunately, this SELECT statement is giving me duplicated product records.
I know why this is happening: some of the products belong to several categories or subcategories at the same time. What I do not know is how to get only unique records from the Products table, that is, each product_id once, without duplicates.
Can someone please help with the correct syntax of my query?

In SQL Server, you can use this trick:
SELECT TOP (1) WITH TIES *
FROM dbo.katigories k JOIN
dbo.Products p
ON k.product_id = p.product_id
WHERE k.Cat_Main_ID = (the Cat_Main_ID – exists_in-my url - query string) AND
k.Cat_Sub_ID = (the Cat_Sub_ID – exists_in-my url - query string)
ORDER BY ROW_NUMBER() OVER (PARTITION BY p.product_id ORDER BY NEWID());
In other databases, you would do something very similar with ROW_NUMBER() in a subquery or CTE.
Notes:
SELECT * is dangerous, because you have columns with the same names.
Always use correct, proper, standard, explicit JOIN syntax. Never use commas in the FROM clause.
Table aliases make a query easier to write and to read.
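
For databases without TOP ... WITH TIES, the ROW_NUMBER() variant mentioned above could look roughly like the sketch below. It is only an illustration: it runs against an in-memory SQLite database from Python (window functions need SQLite 3.25 or newer), the name column and the sample rows are made up, and the two literal IDs stand in for the values taken from the query string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Katigories (
    product_id INTEGER, Cat_Main_ID INTEGER,
    Cat_Sub_ID INTEGER, Cat_Sub_Sub_ID INTEGER
);
-- Hypothetical sample data: product 1 sits in two sub-sub-categories.
INSERT INTO Products VALUES (1, 'kettle'), (2, 'toaster'), (3, 'lamp');
INSERT INTO Katigories VALUES
    (1, 10, 100, 1000), (1, 10, 100, 1001),
    (2, 10, 100, 1000), (3, 20, 200, 2000);
""")

query = """
WITH ranked AS (
    SELECT p.product_id, p.name,
           -- Number the rows of each product; any ordering will do here.
           ROW_NUMBER() OVER (PARTITION BY p.product_id
                              ORDER BY k.Cat_Sub_Sub_ID) AS rn
    FROM Katigories k
    JOIN Products p ON k.product_id = p.product_id
    WHERE k.Cat_Main_ID = ? AND k.Cat_Sub_ID = ?
)
SELECT product_id, name FROM ranked WHERE rn = 1 ORDER BY product_id
"""
# 10 and 100 stand in for the IDs read from the URL query string.
rows = conn.execute(query, (10, 100)).fetchall()
print(rows)
```

Keeping only rn = 1 leaves exactly one row per product_id, no matter how many category rows matched.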

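I think you can add the keyword DISTINCT after SELECT. Note that this only removes the duplicates if you select just the Products columns (for example SELECT DISTINCT p.* with an alias p for dbo.Products), because the category columns differ between the duplicated rows.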
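
A small sketch of how DISTINCT behaves here, using an in-memory SQLite database from Python. The name column and the sample rows are assumptions for illustration only. With SELECT DISTINCT * the category columns keep the rows apart, whereas selecting only the product columns collapses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Katigories (
    product_id INTEGER, Cat_Main_ID INTEGER,
    Cat_Sub_ID INTEGER, Cat_Sub_Sub_ID INTEGER
);
-- Hypothetical sample data: product 1 appears under two sub-sub-categories.
INSERT INTO Products VALUES (1, 'kettle'), (2, 'toaster');
INSERT INTO Katigories VALUES
    (1, 10, 100, 1000), (1, 10, 100, 1001), (2, 10, 100, 1000);
""")

join_and_filter = """
FROM Katigories k
JOIN Products p ON k.product_id = p.product_id
WHERE k.Cat_Main_ID = 10 AND k.Cat_Sub_ID = 100
"""

# DISTINCT over all columns: the two rows for product 1 still differ
# in Cat_Sub_Sub_ID, so both survive.
all_columns = conn.execute("SELECT DISTINCT * " + join_and_filter).fetchall()

# DISTINCT over the product columns only: one row per product.
product_columns = conn.execute(
    "SELECT DISTINCT p.product_id, p.name " + join_and_filter
    + " ORDER BY p.product_id"
).fetchall()

print(len(all_columns), product_columns)
```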

Related

SQL: SELECT same attribute in the table from different references

I'm new in SQL language and RDBMS. I'm trying to show the same column from the same table but using different references.
Problem
I have the following database and tables:
DATABASE: PRODUCTS
PROD (CodProd <PK>, Descr, PrecoUnit, QtdeEst)
PROD_SIM (CodProd <PK>, CodProdSim <PK>)
PROD_SIM (CodProd) REFERENCES PROD (CodProd)
PROD_SIM (CodProdSim) REFERENCES PROD (CodProd)
I was asked to show all the products desc that have other products considered similar to them, and show the desc column from the similar product too.
Example
I have the following data:
PROD:
(0, `spaghetti`, 12.5, 2)
(1, `noodle`, 8.0, 4)
PROD_SIM:
(0, 1)
I want to show this:
+-----------+-------------+
| Product | SimilarProd |
+-----------+-------------+
| spaghetti | noodle |
+-----------+-------------+
Tried
SELECT PROD.Descr, PROD.Descr FROM PROD
INNER JOIN PROD_SIM ON PROD_SIM.CodProd = PROD.CodProd
But of course, this will not work, because I'm selecting the same column from the same table twice and I don't know how to specify which reference (CodProd or CodProdSim) each Descr column should be taken from.

If I've understood your request and your schema correctly (I don't have enough reputation to comment), I think you're only one join away from achieving the desired result:
SELECT A.Descr original_product, B.Descr similar_product
FROM PROD A
INNER JOIN PROD_SIM S ON A.CodProd = S.CodProd
INNER JOIN PROD B ON S.CodProdSim = B.CodProd
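
To see the double join at work, here is a minimal sketch that loads the sample data from the question into an in-memory SQLite database from Python. The column types are guesses; the table and column names follow the schema in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROD (
    CodProd INTEGER PRIMARY KEY, Descr TEXT, PrecoUnit REAL, QtdeEst INTEGER
);
CREATE TABLE PROD_SIM (
    CodProd INTEGER REFERENCES PROD (CodProd),
    CodProdSim INTEGER REFERENCES PROD (CodProd),
    PRIMARY KEY (CodProd, CodProdSim)
);
INSERT INTO PROD VALUES (0, 'spaghetti', 12.5, 2), (1, 'noodle', 8.0, 4);
INSERT INTO PROD_SIM VALUES (0, 1);
""")

# PROD is joined twice under two aliases: A for the original product,
# B for the product that PROD_SIM marks as similar.
rows = conn.execute("""
SELECT A.Descr AS Product, B.Descr AS SimilarProd
FROM PROD A
INNER JOIN PROD_SIM S ON A.CodProd = S.CodProd
INNER JOIN PROD B ON S.CodProdSim = B.CodProd
""").fetchall()
print(rows)
```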

update a single column with join lookups

I have a table adjustments with columns adjustable_id | adjustable_type | order_id
order_id is the target column to fill with values, this value should come from another table line_items which has a order_id column.
adjustable_id (int) and _type (varchar) references that table.
table: adjustments
id | adjustable_id | adjustable_type | order_id
------------------------------------------------
100 | 1 | line_item | NULL
101 | 2 | line_item | NULL
table: line_items
id | order_id | other | columns
--------------------------------
1 | 10 | bla | bla
2 | 20 | bla | bla
In the case above I guess I need a join query to update adjustments.order_id first row with value 10, second row with 20 and so on for the other rows using Postgres 9.3+.
In case the lookup fails, I need to delete invalid adjustments rows, for which they have no corresponding line_items.

There are two ways to do this. The first one uses a correlated subquery:
update adjustments a
set order_id = (select l.order_id
from line_items l
where l.id = a.adjustable_id)
where a.adjustable_type = 'line_item';
This is standard ANSI SQL; the standard does not define a join for the UPDATE statement.
The second way is using a join, which is a Postgres extension to the SQL standard (other DBMS also support that but with different semantics and syntax).
update adjustments a
set order_id = l.order_id
from line_items l
where l.id = a.adjustable_id
and a.adjustable_type = 'line_item';
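The join is probably the faster one. Note that both versions assume that the join between line_items and adjustments returns exactly one row from line_items for each adjustments row. If the correlated subquery returns more than one row, the first version fails with an error; the join version does not fail, but Postgres then uses only one of the matching rows, and which one is not predictable. If no row matches, the first version sets order_id to NULL, while the join version leaves the row unchanged.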
The reason why Arockia's query was "eating your RAM" is that his/her query creates a cross-join between table1 and table1 which is then joined against table2.
The Postgres manual contains a warning about that:
Note that the target table must not appear in the from_list, unless you intend a self-join
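
A rough end-to-end sketch of the correlated-subquery variant, including the cleanup of rows without a matching line item that the question asks for. It runs against an in-memory SQLite database from Python rather than Postgres, so treat it as an illustration of the SQL only; row 102 is an invented orphan added to show the delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE line_items (id INTEGER PRIMARY KEY, order_id INTEGER);
CREATE TABLE adjustments (
    id INTEGER PRIMARY KEY, adjustable_id INTEGER,
    adjustable_type TEXT, order_id INTEGER
);
INSERT INTO line_items VALUES (1, 10), (2, 20);
-- Row 102 is a made-up orphan: there is no line item with id 99.
INSERT INTO adjustments VALUES
    (100, 1, 'line_item', NULL),
    (101, 2, 'line_item', NULL),
    (102, 99, 'line_item', NULL);
""")

# Step 1: delete adjustments whose line item does not exist.
conn.execute("""
DELETE FROM adjustments
WHERE adjustable_type = 'line_item'
  AND NOT EXISTS (SELECT 1 FROM line_items l
                  WHERE l.id = adjustments.adjustable_id)
""")

# Step 2: fill order_id from the matching line item.
conn.execute("""
UPDATE adjustments
SET order_id = (SELECT l.order_id FROM line_items l
                WHERE l.id = adjustments.adjustable_id)
WHERE adjustable_type = 'line_item'
""")

rows = conn.execute(
    "SELECT id, order_id FROM adjustments ORDER BY id").fetchall()
print(rows)
```

Deleting the orphans first means the update never has to write a NULL for a failed lookup.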

update a set A.name=B.name from table1 A join table2 B on
A.id=B.id

Why do WHERE and HAVING exist as separate clauses in SQL?

I understand the distinction between WHERE and HAVING in a SQL query, but I don't see why they are separate clauses. Couldn't they be combined into a single clause that could handle both aggregated and non-aggregated data?

Here's the rule. If a condition refers to an aggregate function, put that condition in the HAVING clause. Otherwise, use the WHERE clause.
Here's another rule: HAVING is normally used together with GROUP BY; without GROUP BY, the whole result is treated as a single group.
The main difference is that WHERE cannot filter on an aggregate (such as SUM(number)), whereas HAVING can. The reason is that WHERE is applied before the grouping and HAVING after it.
Another difference (in MySQL) is that the WHERE clause requires a condition to refer to a column in a table, but the HAVING clause can use both columns and aliases.
Here's the difference:
SELECT `value` v FROM `table` WHERE `v`>5;
Error #1054 - Unknown column 'v' in 'where clause'
SELECT `value` v FROM `table` HAVING `v`>5; -- Get 5 rows
This is because the WHERE clause filters data before the select list is evaluated, but the HAVING clause filters data afterwards.
So putting conditions in the WHERE clause is more efficient when the table has many rows.
Try EXPLAIN to see the key difference:
EXPLAIN SELECT `value` v FROM `table` WHERE `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| 1 | SIMPLE | table | range | value | value | 4 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
EXPLAIN SELECT `value` v FROM `table` having `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| 1 | SIMPLE | table | index | NULL | value | 4 | NULL | 10 | Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
You can see that both WHERE and HAVING use the index, but the number of rows examined differs.
So there is a need of both of them especially when we need grouping and additional filters.
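
A short sketch of the two clauses working together, run against an in-memory SQLite database from Python. The orders table and its rows are invented for the example: WHERE drops individual rows before grouping, HAVING drops whole groups after SUM() has been computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table, used only to illustrate the two filters.
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES
    ('ann', 80), ('ann', 50),
    ('bob', 30), ('bob', -500),
    ('cat', 200);
""")

rows = conn.execute("""
SELECT customer, SUM(amount) AS total
FROM orders
WHERE amount > 0          -- row filter: runs before GROUP BY
GROUP BY customer
HAVING SUM(amount) > 100  -- group filter: runs after aggregation
ORDER BY customer
""").fetchall()
print(rows)
```

Here bob's negative row is removed by WHERE, and his remaining total of 30 is then removed by HAVING.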

This question seems to illustrate a misunderstanding that WHERE and HAVING are both missing up to 1/2 of the information necessary to fully process a query.
Consider the following SQL:
drop table if exists foo;
create table foo (
ID int,
bar int
);
insert into foo values (1, 1);
select now() as d, bar as b
from foo
where bar = 1 and d <= now()
having bar = 1 and ID = 1
;
In the where clause, d is not available because the selected items have not been processed to create it yet.
In the having clause ID has been discarded because it was not selected. In aggregate queries ID may not even have meaning in context of multiple rows combined into one. ID may also be meaningless when joining different tables into a single result.

Could it be done? Sure, but on the back-end it'd do the same as it does now, because you have to aggregate something before you can filter based on that aggregation. Ultimately that's the reason, it's a logical separation of different processes. Why waste resources aggregating records you could have filtered with a WHERE?

The question could only be fully answered by the designer since it asks intent. But the implication is that both clauses do the same thing only against aggregated vs. non-aggregated data. That's not true. "The HAVING clause is typically used together with the GROUP BY clause to filter the results of aggregate values. However, HAVING can be specified without GROUP BY."
As I understand it, the important thing is that "The HAVING clause specifies additional filters that are applied after the WHERE clause filters."
http://technet.microsoft.com/en-us/library/ms179270(v=sql.105).aspx

'Implicit' JOIN based on schema's foreign keys?

Hello all :) I'm wondering if there is a way to tell the database to look at the schema and infer the JOIN predicate:
+--------------+ +---------------+
| prices | | products |
+--------------+ +---------------+
| price_id (PK)| |-1| product_id(PK)|
| prod_id |*-| | weight |
| shop | +---------------+
| unit_price |
| qty |
+--------------+
Is there a way (preferably in Oracle 10g) to go from:
SELECT * FROM prices JOIN products ON prices.prod_id = products.product_id
to:
SELECT * FROM prices IMPLICIT JOIN products

The closest you can get to not writing the actual join condition is a natural join.
select * from t1 natural join t2
Oracle will look for columns with identical names and join on them (this does not apply in your case, because the key columns are named prod_id and product_id). See the documentation on the SELECT statement:
A natural join is based on all columns in the two tables that have the same name. It selects rows from the two tables that have equal values in the relevant columns. If two columns with the same name do not have compatible data types, then an error is raised
This is very poor practice and I strongly recommend not using it in any environment.
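
To illustrate why a natural join is risky with this schema, here is a small sketch using an in-memory SQLite database from Python (not Oracle, so take it as an illustration of the join semantics only; the sample rows are invented). Because prices and products share no column name, the natural join has nothing to match on and degenerates into a cross join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, weight REAL);
CREATE TABLE prices (
    price_id INTEGER PRIMARY KEY,
    prod_id INTEGER REFERENCES products (product_id),
    shop TEXT, unit_price REAL, qty INTEGER
);
-- Hypothetical sample rows.
INSERT INTO products VALUES (1, 0.5), (2, 1.2);
INSERT INTO prices VALUES (1, 1, 'shop_a', 9.99, 3), (2, 2, 'shop_b', 4.50, 1);
""")

# No shared column names: every price row is paired with every product.
natural_rows = conn.execute(
    "SELECT * FROM prices NATURAL JOIN products").fetchall()

# The explicit predicate pairs each price with its own product only.
explicit_rows = conn.execute("""
SELECT * FROM prices
JOIN products ON prices.prod_id = products.product_id
""").fetchall()

print(len(natural_rows), len(explicit_rows))
```

The foreign key declared on prod_id plays no part in the natural join; only column names do.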

You shouldn't do that. Some DB systems allow it, but what happens if you modify the FKs (e.g. add foreign keys)? You should always state what to join on to avoid problems. Most DB systems won't even allow you to do an implicit join, though (good!).

How can I find out our customer's favourite brand using a query?

I would like to generate a table (or query result) in this form
+---------------------+---------------------+
| Email               | Favourite Brand ID  |
+---------------------+---------------------+
| customer#gmail.com  | 89                  |
| another#gmail.com   | 193                 |
+---------------------+---------------------+
I have managed to write a query that generates a list of unique brand ID's with customer email addresses and the number of times that customer has purchased that brand. The results look something like this:
+---------------------+-----------+---------------+
| Email               | Brand ID  | CountOfOrders |
+---------------------+-----------+---------------+
| customer#gmail.com  | 89        | 10            |
| another#gmail.com   | 193       | 32            |
| duplicate#gmail.com | 20        | 2             |
| duplicate#gmail.com | 47        | 5             |
+---------------------+-----------+---------------+
Obviously duplicate#gmail.com has purchased from BrandID 20 twice and BrandID 47 5 times which is why they appear twice. Most customers have purchased from more than one brand.
From this information how can I construct a query to get the brand ID they have purchased from the most? I have tried the following but it just times out:
SELECT [table1].Email, [table1].Brand, [table1].CountOfBrand
FROM [Customer Brand Purchases] AS [table1]
GROUP BY [table1].Email, [table1].Brand, [table1].CountOfBrand
WHERE [table1].CountOfBrand=(
SELECT TOP 1 [table2].CountOfBrand
FROM [Customer Brand Purchases] AS [table2]
WHERE [table2].Email = [table1].Email
ORDER BY [table2].CountOfBrand DESC
);
Oh and I have to use Microsoft Access, unfortunately. Thanks.

I am reluctant to answer this as the thought of even assisting in the development of a database with a table name [Customer Brand Purchases] makes me feel a little bit sick.
My Access SQL is a little rusty but I am 98% certain this will work:
SELECT CBP.Email, CBP.Brand AS [Favourite Brand ID]
FROM [Customer Brand Purchases] AS CBP
INNER JOIN
( SELECT [Email], MAX(CountofBrand) AS [MaxCountofBrand]
FROM [Customer Brand Purchases]
GROUP BY [Email]
) AS [MaxCBP]
ON CBP.Email = MaxCBP.Email
AND CBP.CountOfBrand = MaxCBP.MaxCountOfBrand
The only drawback is that if a particular customer has ordered 2 brands the same number of times, then it will return 2 rows for that customer. You would need additional subqueries with MAX to resolve this.
EDIT/ADDENDUM:
If it is ABSOLUTELY imperative that the query returns 1 result per email address, then you need to allow for the scenario where a particular email address has purchased 2 brands an equal number of times. In that case there is no way to establish which of these is the favourite, as they are joint favourites. If it were me I would deal with this at application level and concatenate the favourite brands into one string. However, it can be done in SQL; just be aware that one or more brands could be hidden using this:
SELECT CBP.Email, CBP.Brand AS [Favourite Brand ID]
FROM [Customer Brand Purchases] AS CBP
INNER JOIN
( SELECT CBP.Email, MAX(CBP.Brand) AS MaxBrand
FROM [Customer Brand Purchases] AS CBP
INNER JOIN
( SELECT [Email], MAX(CountofBrand) AS MaxCountofBrand
FROM [Customer Brand Purchases]
GROUP BY [Email]
) AS MaxCBP
ON CBP.Email = MaxCBP.Email
AND CBP.CountOfBrand = MaxCBP.MaxCountOfBrand
GROUP BY CBP.Email
) AS MaxCBP
ON CBP.Email = MaxCBP.Email
AND CBP.Brand = MaxCBP.MaxBrand
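
The two approaches above can be tried out with a sketch like the following, which uses an in-memory SQLite database from Python instead of Access. The table name customer_brand_purchases and the tie#gmail.com rows are made up for the test, and the second query is a flattened simplification of the nested tie-break query, not a literal copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_brand_purchases (
    Email TEXT, Brand INTEGER, CountOfBrand INTEGER
);
INSERT INTO customer_brand_purchases VALUES
    ('customer#gmail.com', 89, 10),
    ('another#gmail.com', 193, 32),
    ('duplicate#gmail.com', 20, 2),
    ('duplicate#gmail.com', 47, 5),
    -- Invented tie: two brands bought equally often.
    ('tie#gmail.com', 5, 3),
    ('tie#gmail.com', 7, 3);
""")

# Join each row to the per-email maximum count; ties yield several rows.
with_ties = conn.execute("""
SELECT cbp.Email, cbp.Brand
FROM customer_brand_purchases AS cbp
INNER JOIN (
    SELECT Email, MAX(CountOfBrand) AS MaxCountOfBrand
    FROM customer_brand_purchases
    GROUP BY Email
) AS m
    ON cbp.Email = m.Email AND cbp.CountOfBrand = m.MaxCountOfBrand
ORDER BY cbp.Email, cbp.Brand
""").fetchall()

# Same join, but collapse ties by keeping the highest Brand per email.
one_per_email = conn.execute("""
SELECT cbp.Email, MAX(cbp.Brand) AS FavouriteBrand
FROM customer_brand_purchases AS cbp
INNER JOIN (
    SELECT Email, MAX(CountOfBrand) AS MaxCountOfBrand
    FROM customer_brand_purchases
    GROUP BY Email
) AS m
    ON cbp.Email = m.Email AND cbp.CountOfBrand = m.MaxCountOfBrand
GROUP BY cbp.Email
ORDER BY cbp.Email
""").fetchall()

print(with_ties)
print(one_per_email)
```

Picking the highest Brand is an arbitrary tie-break; it hides brand 5 for tie#gmail.com even though it was bought just as often.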

So your GROUP BY clause should list the values that the aggregate data function MAX() should collapse into. I just did this in SQLite (because there's no way I'm going to open Microsoft Access):
sqlite> create table purchases ( email varchar(255), brand_id int, order_count int );
sqlite> select * from purchases;
sqlite> insert into purchases values( 'customer#gmail.com', 89, 10 );
sqlite> insert into purchases values( 'another#gmail.com', 193, 32 );
sqlite> insert into purchases values( 'duplicate#gmail.com', 20, 2 );
sqlite> insert into purchases values( 'duplicate#gmail.com', 47, 5 );
sqlite> select * from purchases;
customer#gmail.com|89|10
another#gmail.com|193|32
duplicate#gmail.com|20|2
duplicate#gmail.com|47|5
sqlite> .mode column
sqlite> .headers on
sqlite> select email, brand_id, max( order_count ) as order_count from purchases group by email;
email brand_id order_count
----------------- ---------- -----------
another#gmail.com 193 32
customer#gmail.co 89 10
duplicate#gmail.c 47 5
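I believe that's what you're looking for, right? Note that this relies on SQLite-specific behaviour: when max() is the only aggregate in the query, SQLite takes the bare brand_id column from the row that holds the maximum. Many other databases reject a query that selects a column which is neither grouped nor aggregated.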

Gareth's answer looked correct to me. I tested my own attempt using the data from your base query which I stored in a table I named Customer_Brand_Purchases. I also renamed the Brand_ID column.
SELECT
c1.Email,
c1.Brand_ID AS [Favourite Brand ID]
FROM
Customer_Brand_Purchases AS c1
INNER JOIN (
SELECT
Email,
Max(CountOfOrders) AS MaxOfCountOfOrders
FROM Customer_Brand_Purchases
GROUP BY Email
) AS c2
ON
(c1.Email = c2.Email)
AND (c1.CountOfOrders = c2.MaxOfCountOfOrders)
ORDER BY c1.Email;
I can confirm this works in Access 2007, meaning it produces the output I think you want without error messages.
Email Favourite Brand ID
another#gmail.com 193
customer#gmail.com 89
duplicate#gmail.com 47
However my query is nearly the same as Gareth's version. The only reason I can offer why mine might work for you when his doesn't is that I avoided using square brackets within the subquery.
In certain situations (the details of which are unclear to me), Access' query designer will rewrite a subquery from this form:
SELECT q.* FROM (SELECT something FROM YourTable) AS q
to this ...
SELECT q.* FROM [SELECT something FROM YourTable]. AS q
And in that second form, the db engine will choke if the subquery includes square brackets. Incidentally, this is one reason to avoid using object names which require bracketing ... such as names which include spaces.
OTOH, if my version also fails for you, I suspect your base query source is too complex for the db engine to cope with when you use it here. If so, follow Philippe's advice to build on the original source tables rather than the [Customer Brand Purchases] query.

Following one of the comments you made, it seems that you are building queries on queries here.
If you really want to know your customer's favorite brands, I am sure it would be a lot easier to go back to the original tables, building a query on your clients, order lines, product references, and 'brand' tables.
