SQL query multiple columns in SELECT - one needs to be DISTINCT

This is my SQL code:
SELECT
T.Template_Id, T.TemplateName,
CONVERT(NVARCHAR(MAX), T.CatalogDescription),
T.MasterNo, T.Customer_Id,
O.Quantity, O.Cost
FROM
Template as T
INNER JOIN
[Order] AS O ON T.Template_Id = O.Template_Id
ORDER BY
O.Cost
My problem is that none of the fields I'm selecting are unique, and I want to have T.Template_Id be DISTINCT, which I couldn't find a way to do. Other columns don't matter, as long as they're there and that the T.Template_Id column is DISTINCT (no duplicates).

Other fields don't matter.
If this is really true*, you can do it like this:
SELECT T.Template_Id, MAX(T.TemplateName) As TemplateName,
CONVERT(NVARCHAR(MAX),MAX(T.CatalogDescription)) As CatalogDescription,
MAX(T.MasterNo) As MasterNo, MAX(T.Customer_Id) As CustomerId,
MAX(O.Quantity) As Quantity, MAX(O.Cost) As Cost
FROM Template as T
INNER JOIN [Order] as O ON T.Template_Id=O.Template_Id
GROUP BY T.Template_Id
ORDER BY MAX(O.Cost)
It's a bit less unusual to see queries where it doesn't matter which corresponding Order fields are used, as long as you're using fields from the same Order record. In that case, you can do it like this:
SELECT T.Template_Id, T.TemplateName,
CONVERT(NVARCHAR(MAX),T.CatalogDescription),
T.MasterNo, T.Customer_Id, O.Quantity, O.Cost
FROM Template as T
CROSS APPLY (SELECT TOP 1 * FROM [Order] WHERE T.Template_Id=[Order].Template_Id) As O
ORDER BY O.Cost
Assuming, of course, that the records at least within the Template table are already unique based on the ID. This has the nice benefit of also making it easier to select which order is chosen, simply by adding an ORDER BY clause inside the nested query.
* Tip: It turns out this is rarely the case. You'll pretty much always find out that it does matter at some point, for at least one of the fields.
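To illustrate the tip above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption, as are the invented sample rows and the trimmed-down column list). It suggests that taking MAX of each column independently can return a Quantity and a Cost that never appeared together on any single order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Template (Template_Id INTEGER PRIMARY KEY, TemplateName TEXT);
    CREATE TABLE "Order" (Order_Id INTEGER PRIMARY KEY, Template_Id INTEGER,
                          Quantity INTEGER, Cost REAL);
    INSERT INTO Template VALUES (1, 'Flyer');
    -- Two hypothetical orders for the same template.
    INSERT INTO "Order" VALUES (10, 1, 500, 20.0), (11, 1, 100, 35.0);
""")

# One row per template, each column aggregated on its own.
mixed = conn.execute("""
    SELECT T.Template_Id, MAX(O.Quantity) AS Quantity, MAX(O.Cost) AS Cost
    FROM Template AS T
    INNER JOIN "Order" AS O ON T.Template_Id = O.Template_Id
    GROUP BY T.Template_Id
""").fetchall()

# The (Quantity, Cost) pairs that actually exist on individual orders.
real_pairs = conn.execute('SELECT Quantity, Cost FROM "Order"').fetchall()

print(mixed)
print(real_pairs)
```

If the values need to come from one and the same order, the CROSS APPLY form is the safer choice.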

Typically you'd GROUP BY that column, but that requires specifying an aggregate function for all the other columns. In your case that may work since you say the other columns don't matter (which makes me wonder why they're being returned).
SELECT T.Template_Id, Max(T.TemplateName),
Max(CONVERT(NVARCHAR(MAX),T.CatalogDescription)), Max(T.MasterNo), Max(T.Customer_Id),
Max(O.Quantity), Max(O.Cost)
FROM Template as T INNER JOIN [Order] as O ON T.Template_Id=O.Template_Id
GROUP BY T.Template_Id
ORDER BY Max(O.Cost)

SQL will not let you aggregate only some of the fields in a grouped query: every selected column must either appear in the GROUP BY or be wrapped in an aggregate. Engines that do allow it, such as VFP, simply pick a row to fill in the other values.
If you are trying to achieve what I believe you are, then you want a list of all distinct values for the one field and just a sample of the other fields.
I have done this before using window functions such as RANK and ROW_NUMBER, depending on exactly what I was trying to accomplish. They also let you choose which sample you get, for example by ordering by OrderDate DESC to take the sample fields from the most recent order for a customer.
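A minimal sketch of the ROW_NUMBER approach, run through Python's sqlite3 module as a stand-in for SQL Server. The OrderDate column and the sample rows are assumptions made for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Template (Template_Id INTEGER PRIMARY KEY, TemplateName TEXT);
    -- OrderDate is a hypothetical column used to pick the most recent order.
    CREATE TABLE "Order" (Order_Id INTEGER PRIMARY KEY, Template_Id INTEGER,
                          Quantity INTEGER, Cost REAL, OrderDate TEXT);
    INSERT INTO Template VALUES (1, 'Flyer'), (2, 'Poster');
    INSERT INTO "Order" VALUES
        (10, 1, 500, 20.0, '2020-01-05'),
        (11, 1, 100, 35.0, '2020-03-01'),
        (12, 2, 250, 15.0, '2020-02-10');
""")

# Number the orders of each template from newest to oldest, then keep rn = 1.
latest = conn.execute("""
    SELECT Template_Id, TemplateName, Quantity, Cost
    FROM (
        SELECT T.Template_Id, T.TemplateName, O.Quantity, O.Cost,
               ROW_NUMBER() OVER (PARTITION BY T.Template_Id
                                  ORDER BY O.OrderDate DESC) AS rn
        FROM Template AS T
        INNER JOIN "Order" AS O ON T.Template_Id = O.Template_Id
    ) AS ranked
    WHERE rn = 1
    ORDER BY Cost
""").fetchall()

print(latest)
```

Each Template_Id appears once, and Quantity and Cost always come from the same order row.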

Related

SQL Inner Join respect order of primary table (DBF files DBASE IV)

I'm integrating a solution with a point of sale software. I'm almost done just need to get the ordering/sequence correct.
I'm working in VB.net.
This is my query:
SELECT B.DESCRIPT AS description,
B.REF_NO AS upc,
A.QUANTY AS quantity,
ROUND((A.PRICE_PAID * (1+(C.TAX_PCT/100))),2) AS unit_price,
A.DEL_CODE AS discount_percent
FROM (TABLE.DBF A INNER JOIN MENU.DBF B ON B.REF_NO = A.REF_NO),
TAXTBL.DBF C
WHERE C.TAX_DESC='TAX'
And it yields a result something like this:
(sequence is an auto-incrementing column set to the datatable)
The results from the query are being ordered by the upc/REF_NO, due to the inner join between TABLE.DBF and MENU.DBF. When I take out the MENU.DBF components, I get the correct order from TABLE.DBF.
I need to make this query respect the order from TABLE.DBF. The items should be ordered as such:
(time_sent doesn't help because (1) it's a batch of items and (2) multiple items can be added even within the same second)
Thanks for the help.

I am not sure about DBF, but in almost every SQL engine I have run into, row order is an implementation detail, so you should not rely on it and should specify your own order.
That means adding an ORDER BY xyz at the end of your query.
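As a rough sketch of that advice, assume the primary table has (or can be given) a column that records entry order. The table names, the line_no column, and the rows below are all hypothetical, and Python's sqlite3 module stands in for the DBF driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- line_no is a hypothetical column recording the order items were entered.
    CREATE TABLE sale_items (line_no INTEGER, ref_no TEXT, quanty INTEGER);
    CREATE TABLE menu (ref_no TEXT PRIMARY KEY, descript TEXT);
    INSERT INTO sale_items VALUES (1, 'C3', 2), (2, 'A1', 1), (3, 'B2', 4);
    INSERT INTO menu VALUES ('A1', 'Coffee'), ('B2', 'Bagel'), ('C3', 'Juice');
""")

# The explicit ORDER BY makes the result follow the primary table's sequence,
# whatever order the join itself happens to produce.
ordered = conn.execute("""
    SELECT m.descript, s.ref_no, s.quanty
    FROM sale_items AS s
    INNER JOIN menu AS m ON m.ref_no = s.ref_no
    ORDER BY s.line_no
""").fetchall()

print(ordered)
```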

How to reduce scope of subquery?

I've got SQL running on MS SQL Server similar to the following:
SELECT
Cust.CustNum,
Name,
LastOrderDate
FROM
Cust
LEFT JOIN (
SELECT
CustNum, MAX(OrderDate) as LastOrderDate
FROM
Orders
GROUP BY
CustNum) as Orders
ON Orders.CustNum = Cust.CustNum
WHERE
Region = 1
It contains a subquery to find the MAX record from a child table. The concern is that these tables have a very large number of rows. It seems like the subquery would operate on all the rows of the child table, even though only a very few of them are actually needed because of the WHERE clause on the outer query.
Is there a way to reduce the scope of the inner query? Something like adding a WHERE clause to only include the records that are included in the outer query? Something like
WHERE CustomerOrders.CustomerNumber = Customers.CustomerNumber -- Customers from the outer query.
I suspect that this is not necessary, but I am getting some push back from another developer and I wanted to be sure (my SQL is a little rusty).

You are correct about the subquery. It will have to summarize all the data. You could re-write the query like this:
SELECT Cust.CustNum, Name, MAX(OrderDate) AS LastOrderDate
FROM Cust LEFT JOIN
Orders
ON Orders.CustNum = Cust.CustNum
WHERE Region = 1
GROUP BY Cust.CustNum, Name
This would let the SQL optimizer choose the optimal path.
If you know that there are very, very few customers matching Region = 1 and you have an index on CustNum, OrderDate in Orders, you could write the query like this:
select CustNum, Name,
(select top 1 OrderDate
from Orders o
where Cust.CustNum = o.CustNum
order by OrderDate desc
) as LastOrderDate
from Cust
Where Region = 1
I think you would get a very similar effect by using cross apply.
By the way, I'm not a fan of re-writing queries for such purposes. But, I haven't found a SQL optimizer that would do anything other than summarize all the orders rows in this case.
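The two rewrites can be compared on a toy data set. This is only a sketch: Python's sqlite3 module stands in for SQL Server, so LIMIT 1 replaces TOP 1, the sample rows are invented, and it says nothing about which plan a real optimizer would choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cust (CustNum INTEGER PRIMARY KEY, Name TEXT, Region INTEGER);
    CREATE TABLE Orders (CustNum INTEGER, OrderDate TEXT);
    -- The index the correlated form relies on.
    CREATE INDEX ix_orders_cust_date ON Orders (CustNum, OrderDate);
    INSERT INTO Cust VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cy', 1);
    INSERT INTO Orders VALUES (1, '2020-01-01'), (1, '2020-06-15'), (2, '2020-03-03');
""")

# Join first, then aggregate per customer.
grouped = conn.execute("""
    SELECT Cust.CustNum, Name, MAX(OrderDate) AS LastOrderDate
    FROM Cust LEFT JOIN Orders ON Orders.CustNum = Cust.CustNum
    WHERE Region = 1
    GROUP BY Cust.CustNum, Name
    ORDER BY Cust.CustNum
""").fetchall()

# Correlated subquery: one lookup per customer that survives the WHERE clause.
correlated = conn.execute("""
    SELECT CustNum, Name,
           (SELECT o.OrderDate FROM Orders AS o
            WHERE o.CustNum = Cust.CustNum
            ORDER BY o.OrderDate DESC LIMIT 1) AS LastOrderDate
    FROM Cust
    WHERE Region = 1
    ORDER BY CustNum
""").fetchall()

print(grouped)
print(correlated)
```

Both forms return the same rows here, including a NULL date for the customer with no orders, so the choice between them is about the execution plan rather than the result.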

No, it's generally not necessary if your statistics etc. are up to date. That's the job of the optimiser. You can try the CROSS APPLY operator if you think you're missing out on some shortcuts, but generally, if you have all constraints and stats in place, it will be fine.
Your proposed additional WHERE might make sense to you, but as it doesn't correlate to anything in the actual query you posted it will change the results (if it works at all). If you want comments on that you need to post tables & relations etc.
Best way is to check the execution plan and see if it's doing anything dumb.

How to select all attributes in sql Join query

The SQL query below produces the specified result.
select product.product_no,product_type,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
However, my intention is to produce a more detailed result that includes all the attributes of the PRODUCT table. My approach is to list all the attributes in the first line of the query.
select product.product_no,product.product_date,product.product_colour,product.product_style,
product.product_age,product_type,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
Is there another way to do this instead of listing all the attributes of the product table one by one?

You can use * to select all columns from all tables, or you can use [table/alias].* to select all columns from the specified table. In your case, you can use product.*:
select product.*,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
It is important to note that you should only do this if you are 100% sure you need every single column, and always will. There are performance implications associated with this; if you're selecting 100 columns from a table when you really only need 4 or 5 of them, you're adding a lot of overhead to the query. The DBMS has to work harder, and you're also sending more data across the wire (if your database is not on the same machine as your executing code).
If any columns are later added to the product table, those columns will also be returned by this query in the future.
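A small sketch of that last point, using Python's sqlite3 module with a cut-down, hypothetical version of the two tables: the same product.* query returns an extra column once the table gains one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_no INTEGER PRIMARY KEY, product_type TEXT,
                          product_rep INTEGER);
    CREATE TABLE salesteam (rep_id INTEGER PRIMARY KEY, rep_name TEXT);
    INSERT INTO product VALUES (1, 'shoe', 7);
    INSERT INTO salesteam VALUES (7, 'Dana');
""")

query = """
    SELECT product.*, salesteam.rep_name
    FROM product
    INNER JOIN salesteam ON product.product_rep = salesteam.rep_id
    ORDER BY product.product_no
"""

def column_names(sql):
    """Return the names of the columns a query produces."""
    cursor = conn.execute(sql)
    return [description[0] for description in cursor.description]

before = column_names(query)
# Someone later adds a column to the table; the query text is unchanged.
conn.execute("ALTER TABLE product ADD COLUMN product_colour TEXT")
after = column_names(query)

print(before)
print(after)
```

Code that reads the result by column position would silently shift after such a change, which is one more reason to list columns explicitly when only a few are needed.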

select
product.*,
salesteam.rep_name,
salesteam.SUPERVISOR_NAME
from product inner join salesteam on
product.product_rep=salesteam.rep_id
ORDER BY
product.Product_No;
This should do.

You can write it like this:
select P.* --- all Product columns
,S.* --- all salesteam columns
from product P
inner join salesteam S
on P.product_rep=S.rep_id
ORDER BY P.Product_No;

Does the order of columns in a SQL SELECT matter?

My question is about a left join. I'm trying to count how many people are tracking a certain project
(there can be zero followers).
Right now the only way I can get it to work is by adding
group by idproject
My question is whether there is a way to avoid writing this explicitly and have the grouping
applied implicitly.
SQL:
select `project_view`.`idproject` AS `idproject`,
count(`track`.`iduser`) AS `c`,`name`
from `project_view` left join `track` using(idproject)
I expected it to count NULL as zero, but the project doesn't appear at all; if I leave out the count, it shows NULL where there are no followers.

If you have a WHERE clause to specify a certain project then you don't need a GROUP BY.
SELECT project_view.idproject, COUNT(track.iduser) AS c, name
FROM project_view
LEFT JOIN track USING (idproject)
WHERE idproject = 4
If you want a count for each project then you do need a GROUP BY.
SELECT project_view.idproject, COUNT(track.iduser) AS c, name
FROM project_view
LEFT JOIN track USING (idproject)
GROUP BY idproject
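The difference between the two forms can be seen on a toy data set. This sketch uses Python's sqlite3 module rather than MySQL, writes the join with ON instead of USING, and the rows are invented; MySQL may reject the ungrouped form outright depending on its SQL mode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_view (idproject INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE track (idproject INTEGER, iduser INTEGER);
    INSERT INTO project_view VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    -- Project 2 has no followers.
    INSERT INTO track VALUES (1, 100), (1, 101), (3, 100);
""")

# Without GROUP BY the aggregate collapses the whole join into a single row.
collapsed = conn.execute("""
    SELECT project_view.idproject, COUNT(track.iduser) AS c, name
    FROM project_view
    LEFT JOIN track ON track.idproject = project_view.idproject
""").fetchall()

# With GROUP BY there is one row per project. COUNT(track.iduser) skips the
# NULL produced by the LEFT JOIN, so a project without followers gets 0.
counts = conn.execute("""
    SELECT project_view.idproject, COUNT(track.iduser) AS c, name
    FROM project_view
    LEFT JOIN track ON track.idproject = project_view.idproject
    GROUP BY project_view.idproject, name
    ORDER BY project_view.idproject
""").fetchall()

print(collapsed)
print(counts)
```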

Yes the order of selecting matters. For performance reasons you (typically) want your most limiting select first to narrow your data set. This makes every subsequent query operate on a smaller dataset.

COUNT in a query with multiple JOINS and a GROUP BY CLAUSE

I am working on a database that contains 3 tables:
A list of companies
A table of the products they sell
A table of prices they offered on each date
I'm doing a query like this in my php to generate a list of the companies offering the lowest prices on a certain product type on a certain date.
SELECT
a.name AS company,
c.id,
MIN(c.price) AS apy
FROM `companies` a
JOIN `company_products` b ON b.company_id = a.id
JOIN `product_prices` c ON c.product_id = b.id
WHERE
b.type = "%s"
AND c.date = "%s"
GROUP BY a.id
ORDER BY c.price ASC
LIMIT %d, %d
This gets me the data I need, but in order to implement a pager in PHP I need to know how many companies offering that product on that day there are in total. The LIMIT means that I only see the first few...
I tried changing the SELECT clause to SELECT COUNT(a.id) or SELECT COUNT(DISTINCT(a.id)) but neither of those seem to give me what I want. I tried removing the GROUP BY and ORDER BY in my count query, but that didn't work either. Any ideas?

Looks to me like you should GROUP BY a.id, c.id -- grouping by a.id only means you'll typically have several c.ids per a.id, and you're just getting a "random-ish" one of them. This seems like a question of basic correctness. Once you have fixed that, an initial SELECT COUNT(*) FROM etc etc should then definitely give you the number of rows the following query will return, so you can prepare your pager accordingly.
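One way to get the total for the pager is to wrap the grouped query, without its LIMIT, in a subquery and count the rows it produces. The sketch below keeps the question's grouping by company, uses Python's sqlite3 module instead of MySQL and PHP, binds the filter values with ? placeholders rather than %s string formatting, and relies on invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE company_products (id INTEGER PRIMARY KEY, company_id INTEGER,
                                   type TEXT);
    CREATE TABLE product_prices (id INTEGER PRIMARY KEY, product_id INTEGER,
                                 date TEXT, price REAL);
    INSERT INTO companies VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cork');
    INSERT INTO company_products VALUES
        (1, 1, 'cd'), (2, 1, 'cd'), (3, 2, 'cd'), (4, 3, 'loan');
    INSERT INTO product_prices VALUES
        (1, 1, '2020-01-01', 1.5), (2, 2, '2020-01-01', 1.2),
        (3, 3, '2020-01-01', 1.4), (4, 4, '2020-01-01', 9.9);
""")

params = ("cd", "2020-01-01")

# Shared grouped query: one row per company with its lowest price.
grouped_sql = """
    SELECT a.id, a.name AS company, MIN(c.price) AS apy
    FROM companies AS a
    JOIN company_products AS b ON b.company_id = a.id
    JOIN product_prices AS c ON c.product_id = b.id
    WHERE b.type = ? AND c.date = ?
    GROUP BY a.id, a.name
"""

# First page of results (page size 1 to keep the example small).
page = conn.execute(
    grouped_sql + " ORDER BY apy ASC LIMIT 1 OFFSET 0", params
).fetchall()

# Total number of rows the grouped query yields when no LIMIT is applied.
total = conn.execute(
    "SELECT COUNT(*) FROM (" + grouped_sql + ") AS t", params
).fetchone()[0]

print(page, total)
```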

This website suggests MySQL has a special trick for this, at least as of version 4:
Luckily since MySQL 4.0.0 you can use SQL_CALC_FOUND_ROWS option in your query which will tell MySQL to count total number of rows disregarding LIMIT clause. You still need to execute a second query in order to retrieve row count, but it’s a simple query and not as complex as your query which retrieved the data.
Usage is pretty simple. In your main query you need to add the SQL_CALC_FOUND_ROWS option just after SELECT, and in the second query you need to use the FOUND_ROWS() function to get the total number of rows. Queries would look like this:
SELECT SQL_CALC_FOUND_ROWS name, email
FROM users
WHERE name LIKE 'a%'
LIMIT 10;
SELECT FOUND_ROWS();
The only limitation is that you must call second query immediately after the first one because SQL_CALC_FOUND_ROWS does not save number of rows anywhere.
