SQL Join and contains - sql

I'm struggling with this query:
SELECT *
FROM Transactions
WHERE CustomerID IN (SELECT ID FROM Customers
WHERE Name LIKE '%Test%')
It takes 10 seconds to run. However, if I write the query by hand using the 4 values returned by the subquery, it runs in milliseconds, for example:
SELECT *
FROM Transactions
WHERE (CustomerID = 1 OR CustomerID = 2 OR
CustomerID = 3 OR CustomerID = 4)
To clarify, running
SELECT ID FROM Customers WHERE Name LIKE '%Test%'
returns the values 1, 2, 3, 4 immediately.
Any ideas? What am I missing?

As you already said, when you have the customer IDs the query runs in milliseconds, so the filtering on the customer name is the problem.
The leading wildcard (WHERE name LIKE '%Test%') is the suspect here. Because the match can start anywhere in the string, SQL Server cannot seek into an index on the name column. It has to scan every row in the table and check each string for "Test".
If the names you are filtering for always start with "Test", and you could use WHERE name LIKE 'Test%' instead, it would work much better: SQL Server can then use an index on the name column (if one exists) to go straight to the names beginning with "Test".
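As a rough analogy only (Python, not SQL Server internals), the sketch below treats a sorted list as the index on the name column. A prefix pattern can binary-search to the first candidate and stop at the first non-match. A leading-wildcard pattern has to examine every entry. The names are made up. Unlike SQL Server's usual case-insensitive collations, the comparison here is case-sensitive.

```python
import bisect

# Hypothetical sorted "index" on Name (a B-tree also keeps its keys in order).
names = sorted(["Alpha", "Beta Test", "Test A", "Test B", "Testing", "Zulu"])

def prefix_search(sorted_names, prefix):
    """Analogy for Name LIKE 'prefix%': seek to the first candidate, read a contiguous range."""
    start = bisect.bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # sorted order guarantees nothing later can match
        matches.append(name)
    return matches

def contains_search(sorted_names, fragment):
    """Analogy for Name LIKE '%fragment%': ordering doesn't help, so check every entry."""
    return [name for name in sorted_names if fragment in name]

print(prefix_search(names, "Test"))    # ['Test A', 'Test B', 'Testing']
print(contains_search(names, "Test"))  # ['Beta Test', 'Test A', 'Test B', 'Testing']
```

The prefix search never touches "Alpha" or "Beta Test". The contains search has to look at all six names, which is the row-by-row scan described above.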
Edit:
Here is a slightly different version of the original query, if you want to try it:
SELECT * FROM Transactions t
WHERE EXISTS (
SELECT 1 FROM
Customers c
WHERE c.ID = t.CustomerID
AND c.Name LIKE '%Test%'
)
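A quick way to convince yourself that the IN and EXISTS forms are interchangeable is to run both on a toy dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names follow the question, but the rows are invented. It checks results only and says nothing about which plan SQL Server would pick.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Transactions (TxID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Test One'), (2, 'A Test'), (3, 'Other');
    INSERT INTO Transactions VALUES (10, 1), (11, 2), (12, 3), (13, 1);
""")

# Original form: IN with an uncorrelated subquery.
in_rows = conn.execute("""
    SELECT * FROM Transactions
    WHERE CustomerID IN (SELECT ID FROM Customers WHERE Name LIKE '%Test%')
    ORDER BY TxID
""").fetchall()

# Rewritten form: correlated EXISTS.
exists_rows = conn.execute("""
    SELECT * FROM Transactions t
    WHERE EXISTS (SELECT 1 FROM Customers c
                  WHERE c.ID = t.CustomerID AND c.Name LIKE '%Test%')
    ORDER BY TxID
""").fetchall()

print(in_rows)      # [(10, 1), (11, 2), (13, 1)]
print(exists_rows)  # [(10, 1), (11, 2), (13, 1)]
```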

What happens with a join?
SELECT t.*
FROM Transactions t JOIN
Customers c
ON t.CustomerID = c.ID
WHERE c.Name Like '%Test%';
Sometimes, JOINs optimize better than IN.
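One caveat when swapping IN for a JOIN: they only return the same rows when the joined column is unique. The sketch below uses SQLite as a stand-in with invented data. It adds a hypothetical duplicate customer row, which a primary key on Customers.ID would normally prevent, to show the JOIN repeating a transaction while IN returns it once.

```python
import sqlite3

# SQLite stand-in. Customers.ID is deliberately NOT unique here (hypothetical duplicate row).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (ID INTEGER, Name TEXT);
    CREATE TABLE Transactions (TxID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Test One'), (1, 'Test One (dup)'), (2, 'Other');
    INSERT INTO Transactions VALUES (10, 1), (11, 2);
""")

join_rows = conn.execute("""
    SELECT t.* FROM Transactions t
    JOIN Customers c ON t.CustomerID = c.ID
    WHERE c.Name LIKE '%Test%'
    ORDER BY t.TxID
""").fetchall()

in_rows = conn.execute("""
    SELECT * FROM Transactions
    WHERE CustomerID IN (SELECT ID FROM Customers WHERE Name LIKE '%Test%')
    ORDER BY TxID
""").fetchall()

print(join_rows)  # [(10, 1), (10, 1)]: one output row per matching customer row
print(in_rows)    # [(10, 1)]: IN acts as a semi-join, each transaction at most once
```

With a real primary key on Customers.ID both forms return the same rows, so the choice comes down to which plan the optimizer produces.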

Related

SQL Server 2016 Sub Query Guidance

I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code, would you mind adding a short explanation of why you did it that way (so I can actually learn something)?
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused about how to pull in the EmailAddress column from the Clients table, because in order to join it I have to bring in other tables that aren't otherwise used. I am assuming there is an easier way to do this using subqueries, as that is what we are working on at the moment.
Think of SQL as working with sets of data rather than just tables; a table is merely one kind of set. Viewed this way, you immediately see that the query below returns a set consisting of the entirety of another set, namely a table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
Now you can treat this second set, a subset of the data, as a "table" as well and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
) AS t -- SQL Server requires an alias for a derived table
This returns both columns (col1 and col2) provided by the inner set.
So your inner query would be a query with a GROUP BY clause and a SUM of the order value field. I won't write it for you, since you appear to be a student and it wouldn't be right for me to give you the entire answer. The key thing to understand is this set thinking: you can wrap the ENTIRE query in parentheses and treat it as a table, the way I have done above (SQL Server requires you to give such a derived table an alias). Hopefully this helps.
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Clients Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItems SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by Cl.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) AS OrderTotals -- Close subquery (a derived table needs an alias)
group by emailaddress -- group for the max()
For the first query you can join Clients to Shipments (on ClientId) and Shipments to the ShipItems table (on ShipmentId), then group the results and count or sum the total you need.
Using aliases for the tables is useful, especially when you select fields from joined tables that have the same column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query as a subquery, you can get the maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
Note that SQL Server does not allow ORDER BY inside a subquery (derived table) unless TOP, OFFSET or FOR XML is also specified, which is why the ORDER BY is on the outer query.
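Here is a runnable sketch of the two-level aggregation, using SQLite as a stand-in for SQL Server. The column names (ShipItemPrice, ShipItemDiscountAmount, Quantity) are taken from the answer above and are assumptions about the assignment's schema. The rows are invented. The inner query produces one total per shipment, and the outer query keeps the largest total per email address.

```python
import sqlite3

# SQLite stand-in; schema assumed from the answer above, data made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients (ClientId INTEGER PRIMARY KEY, EmailAddress TEXT);
    CREATE TABLE Shipments (ShipmentId INTEGER PRIMARY KEY, ClientId INTEGER);
    CREATE TABLE ShipItems (ShipmentId INTEGER, ShipItemPrice REAL,
                            ShipItemDiscountAmount REAL, Quantity INTEGER);
    INSERT INTO Clients VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO Shipments VALUES (100, 1), (101, 1), (102, 2);
    INSERT INTO ShipItems VALUES
        (100, 10.0, 1.0, 2),   -- (10 - 1) * 2 = 18
        (100, 5.0, 0.0, 1),    -- 5, so shipment 100 totals 23
        (101, 50.0, 10.0, 1),  -- shipment 101 totals 40
        (102, 20.0, 0.0, 3);   -- shipment 102 totals 60
""")

# Inner query: one row per (EmailAddress, ShipmentId) with its order total.
# Outer query: the largest of those totals per EmailAddress.
rows = conn.execute("""
    SELECT EmailAddress, MAX(OrderTotal) AS MaxOrder
    FROM (
        SELECT c.EmailAddress, s.ShipmentId,
               SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) AS OrderTotal
        FROM ShipItems i
        JOIN Shipments s ON s.ShipmentId = i.ShipmentId
        JOIN Clients c ON c.ClientId = s.ClientId
        GROUP BY c.EmailAddress, s.ShipmentId
    ) AS q
    GROUP BY EmailAddress
    ORDER BY EmailAddress
""").fetchall()

print(rows)  # [('a@example.com', 40.0), ('b@example.com', 60.0)]
```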

can I use a variable for the integer expression in a left sql function

I have the following query:
SELECT top 2500 *
FROM table a
LEFT JOIN table b
ON a.employee_id = b.employee_id
WHERE left(a.employee_rc,6) IN
(
SELECT employeeID, access
FROM accesslist
WHERE employeeID = '#client.id#'
)
The sub select in the where clause can return one or several access values, ex:
js1234 BLKHSA
js1234 HDF48R7
js1234 BLN6
In the primary WHERE clause I need to be able to change the integer expression from 6 to 5, 4 or 7, depending on the length of the values returned by the subselect. I am not sure whether this is the right way to go about it. I have tried using OR conditions, but that really slows down the query.
Try using exists instead:
SELECT top 2500 *
FROM table a LEFT JOIN
table b
ON a.employee_id = b.employee_id
WHERE EXISTS (Select 1
FROM accesslist
WHERE employeeID = '#client.id#' and
a.employee_rc like concat(employeeID, '%')
) ;
I don't see how your original query worked. The subquery returns two columns, and that normally isn't allowed in SQL for an IN.
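Following the EXISTS idea, the length problem disappears if each access value supplies its own length. The sketch below assumes, as the OP's sample values suggest, that accesslist.access (not employeeID) is what should match the start of employee_rc. It uses SQLite, where `||` concatenates and substr/length play the role of SQL Server's LEFT/LEN. The table name emp_a and the employee_rc values are hypothetical.

```python
import sqlite3

# SQLite stand-in. Assumption: accesslist.access (4 to 7 chars) is the prefix of employee_rc.
# "emp_a" is a hypothetical stand-in for "table a"; its rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_a (employee_id INTEGER, employee_rc TEXT);
    CREATE TABLE accesslist (employeeID TEXT, access TEXT);
    INSERT INTO emp_a VALUES (1, 'BLKHSA01'), (2, 'HDF48R7X'), (3, 'BLN6ZZ'), (4, 'QQQ999');
    INSERT INTO accesslist VALUES ('js1234', 'BLKHSA'), ('js1234', 'HDF48R7'), ('js1234', 'BLN6');
""")

# Option 1: a prefix LIKE pattern built from each access value.
like_ids = [r[0] for r in conn.execute("""
    SELECT employee_id FROM emp_a a
    WHERE EXISTS (SELECT 1 FROM accesslist al
                  WHERE al.employeeID = ? AND a.employee_rc LIKE al.access || '%')
    ORDER BY employee_id
""", ("js1234",))]

# Option 2: the LEFT(x, n) idea with n taken from each access value's own length
# (in SQL Server: LEFT(a.employee_rc, LEN(al.access)) = al.access).
substr_ids = [r[0] for r in conn.execute("""
    SELECT employee_id FROM emp_a a
    WHERE EXISTS (SELECT 1 FROM accesslist al
                  WHERE al.employeeID = ?
                    AND substr(a.employee_rc, 1, length(al.access)) = al.access)
    ORDER BY employee_id
""", ("js1234",))]

print(like_ids, substr_ids)  # [1, 2, 3] [1, 2, 3]
```

If access values can contain `%` or `_`, the LIKE form would treat them as wildcards. The substring comparison does not have that problem.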
Move the subquery to a JOIN:
SELECT TOP 2500 *
FROM table a
LEFT JOIN table b ON a.employee_id = b.employee_id
LEFT JOIN accesslist al ON al.access LIKE concat('%', a.employee_id)
WHERE al.employeeID = '#client.id#'
Like Gordon, I don't quite see how your query worked, so I'm not sure whether it should be access or employeeID that is matched.
This construct will enable you to do what you said you want to do: have an integer value depend on something from a subquery. It's the general idea only; the details are up to you.
select field1, field2
, case when subqueryField1 = 'fred' then 1
when subqueryField1 = 'barney' then 2
else 3 end integerValue
from table1 t1 join (
select idField, someField subqueryField1 -- etc.
from wherever ) t2 on t1.idField = t2.idField
where whatever
Also, a few things in your query are questionable. First, a TOP n query without an ORDER BY clause doesn't tell the database which records to return. Second, 2500 rows is a lot of data to return to ColdFusion; are you sure you need it all? Third, selecting * instead of just the fields you need slows down performance. If you think you need every field, think again: since the employee ids will always match, you don't need both of them.

Compare 2 tables and find the missing record

I have 2 database tables:
customers and customers_1
I have 100 customers in the customers table but only 99 customers in the customers_1 table. I would like to write a query that will compare the 2 tables and will result in the missing row.
I have tried this following SQL:
select * from customers c where in (select * from customers_1)
But this will only check for the one table.
Your query can't work as written. You have to compare a column with another column, and use NOT IN instead of IN:
select *
from customers c
where customerid not in (select customerid from customers_1)
However, since you are on SQL Server 2008, you can use EXCEPT:
SELECT * FROM customers
EXCEPT
SELECT * FROM customers_1;
This will give you the rows that are in the customers table but not in the customers_1 table:
EXCEPT returns any distinct values from the left query that are not
also found on the right query.
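Note that EXCEPT compares whole rows, whereas the NOT IN version compares only the key. The SQLite sketch below uses invented rows to show the difference. A customer whose name changed between the two tables shows up in the EXCEPT result but not in the NOT IN result.

```python
import sqlite3

# SQLite stand-in; rows are made up. Customer 3 is missing, customer 2 was renamed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerid INTEGER, name TEXT);
    CREATE TABLE customers_1 (customerid INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO customers_1 VALUES (1, 'Ann'), (2, 'Robert');
""")

# EXCEPT: distinct rows of the left query that do not appear, column for column, on the right.
except_rows = conn.execute("""
    SELECT * FROM customers
    EXCEPT
    SELECT * FROM customers_1
    ORDER BY 1
""").fetchall()

# NOT IN: only the key is compared, so the renamed customer 2 is not reported.
not_in_rows = conn.execute("""
    SELECT * FROM customers
    WHERE customerid NOT IN (SELECT customerid FROM customers_1)
    ORDER BY customerid
""").fetchall()

print(except_rows)  # [(2, 'Bob'), (3, 'Cy')]
print(not_in_rows)  # [(3, 'Cy')]
```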
This is easy. Just join them with a left outer join and check for NULL in the table which has the 99 rows. It will look something like this.
SELECT * FROM customers c
LEFT JOIN customers_1 c1 ON c.some_key = c1.some_key
WHERE c1.some_key IS NULL
Instead of NOT IN, consider using NOT EXISTS. It often performs at least as well in this scenario, and unlike NOT IN it still returns the right rows when the subquery column contains NULLs. Your query would look like:
SELECT * FROM customers c WHERE NOT EXISTS (SELECT 1 FROM customers_1 c1 WHERE c.customerid = c1.customerid)
SELECT 1 is just for readability, so everyone knows that I don't care about the actual data.
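There is also a correctness reason to prefer NOT EXISTS or the LEFT JOIN form. If the subquery column contains a NULL, `x NOT IN (...)` can never evaluate to true, so NOT IN returns no rows at all. The SQLite sketch below adds an invented stray NULL to customers_1 to show this. SQL Server behaves the same way under its default ANSI_NULLS ON setting.

```python
import sqlite3

# SQLite stand-in; rows are made up, including a stray NULL key in customers_1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerid INTEGER);
    CREATE TABLE customers_1 (customerid INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO customers_1 VALUES (1), (2), (NULL);
""")

queries = {
    # 3 NOT IN (1, 2, NULL) evaluates to NULL, not TRUE, so no row qualifies.
    "not_in": """SELECT customerid FROM customers
                 WHERE customerid NOT IN (SELECT customerid FROM customers_1)""",
    "not_exists": """SELECT customerid FROM customers c
                     WHERE NOT EXISTS (SELECT 1 FROM customers_1 c1
                                       WHERE c1.customerid = c.customerid)""",
    "left_join": """SELECT c.customerid FROM customers c
                    LEFT JOIN customers_1 c1 ON c.customerid = c1.customerid
                    WHERE c1.customerid IS NULL""",
}
results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}

print(results)  # {'not_in': [], 'not_exists': [3], 'left_join': [3]}
```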

SQL query SELECT FROM 2 tables - equals returns correct results, but need not equals

Basically, I want to get all rows from the customers table that do NOT appear in the brochure_requests table.
SELECT *
FROM customers JOIN brochure_requests
WHERE brochure_requests.first_name != customers.customer_first_name
AND brochure_requests.last_name != customers.customer_last_name
The query works when the parameters are =, but as soon as I run a != query, the program (HeidiSQL) hangs indefinitely or until I cancel it.
Don't you have a customerID in the brochure_requests table?
If you do, you can do something like this:
select * from customers
where customerId not in (select customerId from brochure_requests)
Use NOT EXISTS, e.g.
SELECT *
FROM customers
WHERE NOT EXISTS (
SELECT 1
FROM brochure_requests
WHERE brochure_requests.first_name = customers.customer_first_name
AND brochure_requests.last_name = customers.customer_last_name)
I would also suggest adding a composite index on brochure_requests (first_name, last_name) for improved performance.
SELECT
*
FROM
customers
LEFT JOIN brochure_requests
ON brochure_requests.first_name = customers.customer_first_name
AND brochure_requests.last_name = customers.customer_last_name
WHERE
brochure_requests.first_name IS NULL
Also, consider normalising your database by adding CustomerID to brochure_requests as a foreign key instead of duplicating the first and last names.
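Here is why the != version appears to hang. No join condition identifies a matching row, so every customer is paired with every brochure request whose names differ, and the result is close to the full cross product. The SQLite sketch below counts both results for 100 invented customers, 90 of whom requested a brochure.

```python
import sqlite3

# SQLite stand-in; the names are generated, not real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_first_name TEXT, customer_last_name TEXT)")
conn.execute("CREATE TABLE brochure_requests (first_name TEXT, last_name TEXT)")
people = [(f"First{i}", f"Last{i}") for i in range(100)]
conn.executemany("INSERT INTO customers VALUES (?, ?)", people)
conn.executemany("INSERT INTO brochure_requests VALUES (?, ?)", people[:90])

# The question's != join: every customer paired with every request that doesn't match it.
not_equal_count = conn.execute("""
    SELECT COUNT(*) FROM customers JOIN brochure_requests
    WHERE brochure_requests.first_name != customers.customer_first_name
      AND brochure_requests.last_name != customers.customer_last_name
""").fetchone()[0]

# Anti-join: customers with no matching request at all.
missing_count = conn.execute("""
    SELECT COUNT(*) FROM customers
    WHERE NOT EXISTS (SELECT 1 FROM brochure_requests
                      WHERE brochure_requests.first_name = customers.customer_first_name
                        AND brochure_requests.last_name = customers.customer_last_name)
""").fetchone()[0]

print(not_equal_count, missing_count)  # 8910 10
```

The first count grows roughly as customers × requests, which is what makes the real query run for so long. The anti-join returns only the 10 customers who never asked for a brochure.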

How to avoid large in clause?

I have 3 tables:
table_product (30,000 rows): ID, label
table_period (225,000 rows): ID, date_start, date_end, default_price, FK_ID_product
table_special_offer (10,000 rows): ID, label, date_start, date_end, special_offer_price, FK_ID_period
I need to load data from all of these tables, so here is what I do:
1/ load data from "table_product" like this
select *
from table_product
where label like 'gun%'
2/ load data from "table_period" like this
select *
from table_period
where FK_ID_product IN(list of all the ids selected in the 1)
3/ load data from "table_special_offer" like this
select *
from table_special_offer
where FK_ID_period IN(list of all the ids selected in the 2)
As you can imagine, the IN clause in step 3 can be very, very big (around 75,000 values), so there is a good chance of getting either a timeout or an error like "An expression services limit has been reached".
Have you ever had something like this, and how did you manage to avoid it?
PS:
the context: SQL Server 2005, .NET 2.0
(please don't tell me my design is bad or that I shouldn't use "select *"; I just simplified my problem so it is a little simpler than 500 pages describing my business).
Thanks.
Switch to using joins:
SELECT <FieldList>
FROM Table_Product prod
JOIN Table_Period per ON prod.Id = per.FK_ID_Product
JOIN Table_Special_Offer spec ON per.ID = spec.FK_ID_Period
WHERE prod.label LIKE 'gun%'
Something you should be aware of is the difference between IN, JOIN and EXISTS - great article here.
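For comparison, the sketch below runs that three-table join on a tiny SQLite stand-in (columns trimmed, rows invented). It shows the product and period columns being repeated on every offer row. It also shows a period without offers (ID 11) disappearing because of the INNER JOIN; a LEFT JOIN would keep it.

```python
import sqlite3

# SQLite stand-in for the question's tables; columns trimmed, rows made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_product (ID INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table_period (ID INTEGER PRIMARY KEY, default_price REAL, FK_ID_product INTEGER);
    CREATE TABLE table_special_offer (ID INTEGER PRIMARY KEY, special_offer_price REAL,
                                      FK_ID_period INTEGER);
    INSERT INTO table_product VALUES (1, 'gun A'), (2, 'gun B'), (3, 'knife');
    INSERT INTO table_period VALUES (10, 100.0, 1), (11, 120.0, 1), (12, 90.0, 2), (13, 40.0, 3);
    INSERT INTO table_special_offer VALUES (100, 80.0, 10), (101, 70.0, 10),
                                           (102, 85.0, 12), (103, 30.0, 13);
""")

rows = conn.execute("""
    SELECT prod.ID, prod.label, per.ID, spec.ID
    FROM table_product prod
    JOIN table_period per ON prod.ID = per.FK_ID_product
    JOIN table_special_offer spec ON per.ID = spec.FK_ID_period
    WHERE prod.label LIKE 'gun%'
    ORDER BY spec.ID
""").fetchall()

for row in rows:
    print(row)
# (1, 'gun A', 10, 100)
# (1, 'gun A', 10, 101)   product 1 and period 10 repeated for each of their offers
# (2, 'gun B', 12, 102)
```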
I finally have my answer: table variables (a bit like @smirkingman's solution, but without a CTE), so:
-- table variable columns follow each source table's column order, because of select *
declare @product table (id int primary key, label nvarchar(max))
declare @period table (id int primary key, date_start datetime, date_end datetime, defaultprice real, fk_id_product int)
declare @special_offer table (id int, label nvarchar(max), date_start datetime, date_end datetime, special_offer_price real, fk_id_period int)
insert into @product
select *
from table_product
where label like 'gun%'
insert into @period
select *
from table_period
where exists(
select * from @product p where p.id = table_period.FK_id_product
)
insert into @special_offer
select *
from table_special_offer
where exists(
select * from @period p where p.id = table_special_offer.fk_id_period
)
-- three result sets, read one after another on the client
select * from @product
select * from @period
select * from @special_offer
That's the SQL side; in C# I use ExecuteReader, then Read and NextResult on the SqlDataReader class:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqldatareader.aspx
I got everything I wanted:
- my data
- not too much data (unlike the solutions with JOIN)
- no executing the same query twice (which the subquery solution does)
- no changes to my mapping code (1 row = 1 business object)
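The same staging idea can be sketched in SQLite from Python. TEMP tables stand in for the T-SQL table variables, and three separate SELECTs stand in for the three result sets read with NextResult. Columns are trimmed and the rows are invented. Unlike the INNER JOIN, the period without offers (ID 11) is still returned.

```python
import sqlite3

# SQLite stand-in for the question's tables; columns trimmed, rows made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_product (ID INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table_period (ID INTEGER PRIMARY KEY, default_price REAL, FK_ID_product INTEGER);
    CREATE TABLE table_special_offer (ID INTEGER PRIMARY KEY, special_offer_price REAL,
                                      FK_ID_period INTEGER);
    INSERT INTO table_product VALUES (1, 'gun A'), (2, 'gun B'), (3, 'knife');
    INSERT INTO table_period VALUES (10, 100.0, 1), (11, 120.0, 1), (12, 90.0, 2), (13, 40.0, 3);
    INSERT INTO table_special_offer VALUES (100, 80.0, 10), (101, 70.0, 10),
                                           (102, 85.0, 12), (103, 30.0, 13);
""")

# Stage each level once; the next level is filtered with EXISTS against the staged rows,
# so no literal ID list is ever pasted into the SQL text.
conn.executescript("""
    CREATE TEMP TABLE product AS
        SELECT ID, label FROM table_product WHERE label LIKE 'gun%';
    CREATE TEMP TABLE period AS
        SELECT tp.* FROM table_period tp
        WHERE EXISTS (SELECT 1 FROM product p WHERE p.ID = tp.FK_ID_product);
    CREATE TEMP TABLE special_offer AS
        SELECT so.* FROM table_special_offer so
        WHERE EXISTS (SELECT 1 FROM period p WHERE p.ID = so.FK_ID_period);
""")

# One result per level, each row mapping to one business object.
products = conn.execute("SELECT ID FROM temp.product ORDER BY ID").fetchall()
periods = conn.execute("SELECT ID FROM temp.period ORDER BY ID").fetchall()
offers = conn.execute("SELECT ID FROM temp.special_offer ORDER BY ID").fetchall()

print(products, periods, offers)  # [(1,), (2,)] [(10,), (11,), (12,)] [(100,), (101,), (102,)]
```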
Don't use an explicit list of values in the IN clause. Instead, write your query like this:
... FK_ID_product IN (select ID
from table_product
where label like 'gun%')
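Here is a sketch of that suggestion on the same kind of SQLite stand-in (columns trimmed, rows invented). The ID lists are produced by nested subqueries inside the database instead of being pasted into the SQL text, so the statement no longer grows with the number of IDs. The product filter is bound as a parameter here.

```python
import sqlite3

# SQLite stand-in for the question's tables; columns trimmed, rows made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_product (ID INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table_period (ID INTEGER PRIMARY KEY, default_price REAL, FK_ID_product INTEGER);
    CREATE TABLE table_special_offer (ID INTEGER PRIMARY KEY, special_offer_price REAL,
                                      FK_ID_period INTEGER);
    INSERT INTO table_product VALUES (1, 'gun A'), (2, 'gun B'), (3, 'knife');
    INSERT INTO table_period VALUES (10, 100.0, 1), (11, 120.0, 1), (12, 90.0, 2), (13, 40.0, 3);
    INSERT INTO table_special_offer VALUES (100, 80.0, 10), (101, 70.0, 10),
                                           (102, 85.0, 12), (103, 30.0, 13);
""")

pattern = ("gun%",)

# Step 2: periods of the matching products, without a client-side ID list.
period_rows = conn.execute("""
    SELECT ID FROM table_period
    WHERE FK_ID_product IN (SELECT ID FROM table_product WHERE label LIKE ?)
    ORDER BY ID
""", pattern).fetchall()

# Step 3: offers of those periods, nesting the step-2 query instead of a 75,000-item list.
offer_rows = conn.execute("""
    SELECT ID FROM table_special_offer
    WHERE FK_ID_period IN (
        SELECT ID FROM table_period
        WHERE FK_ID_product IN (SELECT ID FROM table_product WHERE label LIKE ?))
    ORDER BY ID
""", pattern).fetchall()

print(period_rows, offer_rows)  # [(10,), (11,), (12,)] [(100,), (101,), (102,)]
```

The trade-off the asker mentions is visible here: the product filter is evaluated again in each query.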
SELECT *
FROM
table_product tp
INNER JOIN table_period tper
ON tp.ID = tper.FK_ID_product
INNER JOIN table_special_offer so
ON tper.ID = so.FK_ID_period
WHERE
tp.label like 'gun%'
First some code...
Using JOIN:
SELECT
table_product.* --'Explicit table calls just for organisation sake'
, table_period.*
, table_special_offer.*
FROM
table_product
INNER JOIN table_period
ON table_product.ID = table_period.FK_ID_product
INNER JOIN table_special_offer
ON table_period.ID = table_special_offer.FK_ID_period
WHERE
table_product.label like 'gun%'
Using IN :
SELECT
*
FROM
table_special_offer
WHERE FK_ID_period IN
(
SELECT
ID
FROM
table_period
WHERE FK_ID_product IN
(
SELECT
ID
FROM
table_product
WHERE label like 'gun%'
)
)
Both can be used, depending on how well your tables are indexed. Inner joins, as the others have suggested, are definitely efficient for your query and return all the data from the 3 tables. If you only need the IDs from table_product and table_period, nested IN statements can be good for applying search criteria to indexed tables (IN can be fine when the criteria are integers, as I assume your FK_ID_product is).
An important thing to remember is that every database and relational table setup behaves differently; you won't get the same optimised results in one database as in another. Try ALL the possibilities at hand and use the one that works best for you. The query analyser can be incredibly useful in times like these, when you need to check performance.
I had this situation when we were trying to join customer accounts to their addresses via an ID join plus a condition on a linked table (another table showed which customers had certain equipment, and we had to do a string search on it). Strangely enough, it was quicker for us to use both methods in one query:
--The query with the WHERE Desc LIKE '%Equipment%' was "joined" to the client table using the IN clause and then this was joined onto the addresses table:
SELECT
Address.*
, Customers_Filtered.*
FROM
Address AS Address
INNER JOIN
(SELECT Customers.* FROM Customers WHERE ID IN (SELECT CustomerID FROM Equipment WHERE [Desc] LIKE '%Equipment search here%')) AS Customers_Filtered
ON Address.CustomerID = Customers_Filtered.ID
This style of query (I apologise if my syntax isn't exactly correct) ended up being more efficient and easier to organise after the overall query got more complicated.
Hope this has helped. Follow @AdaTheDev's article link; it's definitely a good resource.
A JOIN gives you the same results.
SELECT so.Col1
, so.Col2
FROM table_product pt
INNER JOIN table_period pd ON pd.FK_ID_product = pt.ID
INNER JOIN table_special_offer so ON so.FK_ID_Period = pd.ID
WHERE pt.label LIKE 'gun%'
I'd be interested to know if this might make an improvement:
WITH products(prdid) AS (
SELECT
ID
FROM
table_product
WHERE
label like 'gun%'
),
periods(perid) AS (
SELECT
ID
FROM
table_period
INNER JOIN products
ON FK_ID_product = prdid
),
offers(offid) AS (
SELECT
ID
FROM
table_special_offer
INNER JOIN periods
ON FK_ID_period = perid
)
SELECT offid FROM offers
... just a suggestion...