What's the difference between SELECT and SELECT ... IN in SQL?

What is the difference between the two statements?
Is the second statement faster than the first?
First statement:
SELECT * FROM students WHERE id = 2197176;
SELECT * FROM students WHERE id = 74877;
Second statement:
SELECT * FROM students WHERE id IN(2197176, 74877, ...)
UPDATE:
If the time complexity of the first statement is m*n, will the second statement also be m*n?
m: the time complexity of SELECT * FROM students WHERE id = 2197176;
n: the number of ids.
UPDATE:
In the following two cases, which one is faster, and why?
Assume the table is as follows:
| ID | FLAG |
| ----------|:------:|
| 2197176 | true |
| 74877 | false |
First case:
List ids = getIds();
for(id in ids){
result = getResultFromFirstStatement(id); //one sql statement per id, i.e. n statements in total
if(result.flag) { do sth ...}
}
Second case:
List ids = getIds();
results = getResultFromSecondStatement(ids); //a single sql statement with all `n` ids in its IN list
for(r in results){
if(r.flag) { do sth ...}
}
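Here is a minimal sketch of the two cases. It assumes Python's built-in sqlite3 module and a hypothetical in-memory students table whose flag column holds 1/0, since SQLite has no boolean type. The helper names are illustrative stand-ins for the question's getResultFrom... functions. Both cases return the same ids. The first sends one statement per id. The second builds a single IN list of ? placeholders, so the values stay bound parameters.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory students table mirroring the question's ID/FLAG example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, flag INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO students (id, flag) VALUES (?, ?)",
        [(2197176, 1), (74877, 0)],  # 1/0 stand in for true/false
    )
    return conn


def flagged_ids_one_by_one(conn: sqlite3.Connection, ids: list[int]) -> list[int]:
    """First case: one `WHERE id = ?` statement per id (n statements, n round trips)."""
    flagged = []
    for student_id in ids:
        row = conn.execute(
            "SELECT id, flag FROM students WHERE id = ?", (student_id,)
        ).fetchone()
        if row is not None and row[1]:
            flagged.append(row[0])
    return flagged


def flagged_ids_with_in(conn: sqlite3.Connection, ids: list[int]) -> list[int]:
    """Second case: a single `WHERE id IN (?, ?, ...)` statement for all ids."""
    if not ids:
        return []  # most databases reject an empty IN () list
    placeholders = ", ".join("?" * len(ids))  # e.g. "?, ?" for two ids
    rows = conn.execute(
        f"SELECT id, flag FROM students WHERE id IN ({placeholders})", ids
    ).fetchall()
    return [row[0] for row in rows if row[1]]


conn = make_demo_db()
ids = [2197176, 74877]
print(flagged_ids_one_by_one(conn, ids))  # [2197176]
print(flagged_ids_with_in(conn, ids))     # [2197176]
```

Building the placeholder string, rather than formatting the ids into the SQL text, keeps the query safe from injection. Very long IN lists may hit a database's limit on bound parameters. In that case, the ids can be sent in chunks.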

I ran execution plans for three different queries:
First query: using UNION
Second query: using UNION ALL
Third query: using IN
USE AdventureWorksLT2012
-- First query using UNION
SELECT ProductID, Name FROM SalesLT.Product WHERE ProductID = 716
UNION
SELECT ProductID, Name FROM SalesLT.Product WHERE ProductID = 727
UNION
SELECT ProductID, Name FROM SalesLT.Product WHERE ProductID = 770
-- Second query using UNION ALL
SELECT ProductID, Name FROM SalesLT.Product WHERE ProductID = 716
UNION ALL
SELECT ProductID, Name FROM SalesLT.Product WHERE ProductID = 727
UNION ALL
SELECT ProductID, Name FROM SalesLT.Product WHERE ProductID = 770
-- Third query using IN
SELECT ProductID, Name FROM SalesLT.Product WHERE ProductID IN(716, 727, 770)
As you can see, UNION costs 53% of the whole batch (because UNION has to remove duplicates), UNION ALL costs 34%, and IN costs 14%.
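The following sketch shows that the three queries return the same rows and where their semantics diverge. It assumes Python's sqlite3 module and a made-up Product table in place of AdventureWorks. It does not reproduce the execution-plan percentages, which are specific to SQL Server and that database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up stand-in for SalesLT.Product; the names are placeholders.
conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Product (ProductID, Name) VALUES (?, ?)",
    [(716, "Product 716"), (727, "Product 727"), (770, "Product 770"), (800, "Product 800")],
)


def one_select(product_id: int) -> str:
    # The ids are trusted integer literals here, so formatting them into the SQL is safe.
    return f"SELECT ProductID, Name FROM Product WHERE ProductID = {int(product_id)}"


def run(sql: str) -> list:
    # None of these queries has an ORDER BY, so sort in Python before comparing.
    return sorted(conn.execute(sql).fetchall())


ids = [716, 727, 770]
union_q = " UNION ".join(one_select(p) for p in ids)
union_all_q = " UNION ALL ".join(one_select(p) for p in ids)
in_q = "SELECT ProductID, Name FROM Product WHERE ProductID IN (716, 727, 770)"
print(run(union_q) == run(union_all_q) == run(in_q))  # True

# With a repeated id the semantics differ: UNION and IN give one row, UNION ALL keeps both.
union_dup = run(" UNION ".join(one_select(p) for p in [716, 716]))
union_all_dup = run(" UNION ALL ".join(one_select(p) for p in [716, 716]))
in_dup = run("SELECT ProductID, Name FROM Product WHERE ProductID IN (716, 716)")
print(len(union_dup), len(union_all_dup), len(in_dup))  # 1 2 1
```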

The first query
SELECT * FROM students WHERE id = 2197176 ..
returns rows whose id column equals a specific value, in this case 2197176; running several such SELECTs returns the union of their results.
The second query
SELECT * FROM students WHERE id IN (2197176, 74877, ...);
returns rows where the id column equals 2197176 or 74877 or ... .
Given the same parameters, both approaches return the same records, but the second query is better for readability and performance.

IN (val1, val2, val3, ...) is shorthand for a series of equality predicates combined with OR inside a WHERE clause; it is not directly related to the SELECT keyword.
SELECT
column,list, ...
FROM
table
JOIN othertables ON ...
WHERE
table.id IN (1,2,3)
This could be re-written as:
SELECT
column,list, ...
FROM
table
JOIN othertables ON ...
WHERE
( table.id = 1
OR table.id = 2
OR table.id = 3
)
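To illustrate the rewrite, here is a sketch using Python's sqlite3 module with a hypothetical table t. The IN list and the OR-ed equalities return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for "table" in the answer.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row {i}") for i in range(1, 6)])

in_rows = conn.execute(
    "SELECT id, label FROM t WHERE t.id IN (1, 2, 3) ORDER BY id"
).fetchall()
or_rows = conn.execute(
    "SELECT id, label FROM t WHERE (t.id = 1 OR t.id = 2 OR t.id = 3) ORDER BY id"
).fetchall()

print(in_rows)  # [(1, 'row 1'), (2, 'row 2'), (3, 'row 3')]
print(in_rows == or_rows)  # True
```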

If the time complexity of the first statement is m*n, will the second statement be m*n?
m: the time complexity of SELECT * FROM students WHERE id = 2197176;
n: the number of ids.
No. The second statement has the same complexity as the first, but with a larger constant for n.
HOWEVER, if you have an index on ID, then the complexity of the first is 1 and the second is n (strictly speaking, each lookup through a B-tree index costs O(log N) in the table size N). That is still the same, since n = 1 in the first case.

Related

How to add more rows when a string is found in a column (Oracle)

Is it possible to add more rows based on a keyword string in SQL?
table A
PID PromotionName
1 OUT_EC_D10_V500K_FamilyCare_PROCO
2 OUT_EC_D5_V50K_Lunchbox_PROCO
3 OUT_EC_D5_V50K_PROCO
table B
promotion_code itm_name quantity
Lunchbox Item name 1 1
FamilyCare Item name 2 1
FamilyCare Item name 3 1
BUY1FREE6 Item name 4 1
HiSummer Item name 5 1
FamilyCare Item name 6 1
Example:
SELECT * FROM A where pid = '1';
Output of the SQL should be -
PID PromotionName Itm_name quantity
1 OUT_EC_D10_V500K_FamilyCare_PROCO
2 FamilyCare Item name 2 1
3 FamilyCare Item name 3 1
4 FamilyCare Item name 6 1
How can I find the keyword 'FamilyCare' in PromotionName of table A, based on promotion_code of table B? If it exists, the output should contain the additional rows.
Any help with the SQL?
Here is how you can achieve this:
SELECT PID,PromotionName, '' as Itm_name, NULL as quantity
FROM A
WHERE pid = '1'
UNION
SELECT PID, PromotionName, Itm_name, quantity
FROM
(SELECT * FROM A INNER JOIN B ON a.PromotionName LIKE '%'||b.promotion_code||'%')
WHERE pid='1'
You have to update the pid in both places (before and after the UNION).
Notice that the tables are joined using the LIKE operator with % before and after the word, so a row matches whenever the promotion_code string appears anywhere in the other column.
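Below is a runnable sketch of this UNION approach. It assumes SQLite (through Python's sqlite3) instead of Oracle and uses the question's sample rows. Binding :pid as a named parameter means the value only has to be supplied once. Keep two dialect differences in mind:

- Oracle treats '' as NULL, while SQLite keeps an empty string.
- SQLite's LIKE is case-insensitive for ASCII characters by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (PID INTEGER, PromotionName TEXT);
    CREATE TABLE B (promotion_code TEXT, itm_name TEXT, quantity INTEGER);
    INSERT INTO A VALUES
        (1, 'OUT_EC_D10_V500K_FamilyCare_PROCO'),
        (2, 'OUT_EC_D5_V50K_Lunchbox_PROCO'),
        (3, 'OUT_EC_D5_V50K_PROCO');
    INSERT INTO B VALUES
        ('Lunchbox', 'Item name 1', 1),
        ('FamilyCare', 'Item name 2', 1),
        ('FamilyCare', 'Item name 3', 1),
        ('BUY1FREE6', 'Item name 4', 1),
        ('HiSummer', 'Item name 5', 1),
        ('FamilyCare', 'Item name 6', 1);
""")

QUERY = """
    SELECT PID, PromotionName, '' AS Itm_name, NULL AS quantity
    FROM A
    WHERE PID = :pid
    UNION
    SELECT PID, PromotionName, itm_name, quantity
    FROM (SELECT * FROM A INNER JOIN B
          ON A.PromotionName LIKE '%' || B.promotion_code || '%')
    WHERE PID = :pid
    ORDER BY 3  -- third output column (Itm_name): '' sorts before 'Item name ...'
"""

rows = conn.execute(QUERY, {"pid": 1}).fetchall()
for row in rows:
    print(row)
```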
One option is to start with a subquery factoring clause (WITH) that joins the tables on the condition a.promotionName LIKE '%'||b.promotion_code||'%' while filtering by b.promotion_code = 'FamilyCare', then add another query that combines the result sets with UNION ALL, and finally enumerate the rows in an id column with the ROW_NUMBER() analytic function, such as:
WITH ab AS
(
SELECT a.*, b.*
FROM a
JOIN b
ON a.promotionName LIKE '%'||b.promotion_code||'%'
WHERE b.promotion_code = 'FamilyCare'
), ab2 AS
(
SELECT promotion_code, itm_name, quantity
FROM ab
UNION ALL
SELECT DISTINCT promotionName, NULL, NULL
FROM ab
)
SELECT ROW_NUMBER() OVER (ORDER BY itm_name NULLS FIRST) AS pid,
a.*
FROM ab2 a
If the topmost query finds no match, no rows are returned; that is, the query checks for the existence of the literal you provide.
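Here is a sketch of the WITH / UNION ALL / ROW_NUMBER() approach, run on SQLite through Python's sqlite3 with the question's sample data. Window functions need SQLite 3.25 or later. The sketch drops NULLS FIRST. SQLite already sorts NULL before other values in ascending order, and older SQLite versions do not accept that clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (PID INTEGER, PromotionName TEXT);
    CREATE TABLE B (promotion_code TEXT, itm_name TEXT, quantity INTEGER);
    INSERT INTO A VALUES
        (1, 'OUT_EC_D10_V500K_FamilyCare_PROCO'),
        (2, 'OUT_EC_D5_V50K_Lunchbox_PROCO'),
        (3, 'OUT_EC_D5_V50K_PROCO');
    INSERT INTO B VALUES
        ('Lunchbox', 'Item name 1', 1),
        ('FamilyCare', 'Item name 2', 1),
        ('FamilyCare', 'Item name 3', 1),
        ('BUY1FREE6', 'Item name 4', 1),
        ('HiSummer', 'Item name 5', 1),
        ('FamilyCare', 'Item name 6', 1);
""")

rows = conn.execute("""
    WITH ab AS (
        SELECT a.*, b.*
        FROM a
        JOIN b ON a.PromotionName LIKE '%' || b.promotion_code || '%'
        WHERE b.promotion_code = 'FamilyCare'
    ), ab2 AS (
        SELECT promotion_code, itm_name, quantity FROM ab
        UNION ALL
        SELECT DISTINCT PromotionName, NULL, NULL FROM ab   -- the header row
    )
    SELECT ROW_NUMBER() OVER (ORDER BY itm_name) AS pid,   -- NULL header sorts first
           a.*
    FROM ab2 a
    ORDER BY pid
""").fetchall()

for row in rows:
    print(row)
```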

Select multiple rows from a table where field is the max date

I have a table called Product. I need to select all product records that have the MAX ManufactureDate.
Here is a sample of the table data:
Id ProductName ManufactureDate
1 Car 01-01-2015
2 Truck 05-01-2015
3 Computer 05-01-2015
4 Phone 02-01-2015
5 Chair 03-01-2015
This is what the result should be since the max date of all the records is 05-01-2015 and these 2 records have this max date:
Id ProductName ManufactureDate
2 Truck 05-01-2015
3 Computer 05-01-2015
The only way I can think of doing this is to first query the entire table to find the max date and store it in a variable @MaxManufactureDate, then run a second query where ManufactureDate = @MaxManufactureDate. Something tells me there is a better way.
There are 1 million+ records in this table.
Here is the way I am currently doing it:
DECLARE @MaxManufactureDate date; -- assumes ManufactureDate is a date column
SELECT @MaxManufactureDate = MAX(ManufactureDate) FROM Product;
SELECT * FROM Product WHERE ManufactureDate = @MaxManufactureDate;
I figure this is a lot better than doing a subselect in a WHERE clause. Or is it exactly the same as doing a subselect in a WHERE clause? I am not sure whether the query gets run for each row regardless, or whether SQL Server stores the variable value in memory.
select * from product
where manufactureDate = (select max(manufactureDate) from product)
The inner SELECT statement selects the maximum date; the outer one selects all products that have that date.
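Here is a sketch of the subquery version with Python's sqlite3. It assumes that 05-01-2015 in the question means May 1, 2015 (MM-DD-YYYY). The dates are stored as ISO-8601 text so that MAX() orders them chronologically. The inner query does not reference the outer row, so database engines typically evaluate it once rather than once per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ISO-8601 text (YYYY-MM-DD) compares correctly as a string, unlike MM-DD-YYYY across years.
conn.execute(
    "CREATE TABLE Product (Id INTEGER PRIMARY KEY, ProductName TEXT, ManufactureDate TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [
        (1, "Car", "2015-01-01"),
        (2, "Truck", "2015-05-01"),
        (3, "Computer", "2015-05-01"),
        (4, "Phone", "2015-02-01"),
        (5, "Chair", "2015-03-01"),
    ],
)

rows = conn.execute("""
    SELECT * FROM Product
    WHERE ManufactureDate = (SELECT MAX(ManufactureDate) FROM Product)
    ORDER BY Id
""").fetchall()
print(rows)  # [(2, 'Truck', '2015-05-01'), (3, 'Computer', '2015-05-01')]
```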
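You can use a subquery:
SELECT *
FROM Product
WHERE ManufactureDate = (
SELECT ManufactureDate
FROM Product
ORDER BY ManufactureDate DESC
LIMIT 1
);
ORDER BY ... DESC puts the latest date first, so LIMIT 1 picks the maximum. (LIMIT is MySQL/PostgreSQL/SQLite syntax; in SQL Server use SELECT TOP 1 instead.)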
Try this pattern:
SELECT Id, ProductName, ManufactureDate
FROM (
SELECT Id, ProductName, ManufactureDate, MAX(ManufactureDate) OVER () AS MaxManufactureDate
FROM Product P
) P
WHERE P.MaxManufactureDate = P.ManufactureDate
Essentially, use a window function to get the data you're looking for in the inline view, then use the where clause in the outer query to match them.
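The window-function pattern can be tried the same way. This sketch assumes SQLite 3.25+ (for OVER ()) and the sample rows stored as ISO-formatted dates, interpreting the question's dates as MM-DD-YYYY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (Id INTEGER PRIMARY KEY, ProductName TEXT, ManufactureDate TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [
        (1, "Car", "2015-01-01"),
        (2, "Truck", "2015-05-01"),
        (3, "Computer", "2015-05-01"),
        (4, "Phone", "2015-02-01"),
        (5, "Chair", "2015-03-01"),
    ],
)

rows = conn.execute("""
    SELECT Id, ProductName, ManufactureDate
    FROM (
        -- MAX(...) OVER () attaches the table-wide maximum to every row
        SELECT Id, ProductName, ManufactureDate,
               MAX(ManufactureDate) OVER () AS MaxManufactureDate
        FROM Product P
    ) P
    WHERE P.MaxManufactureDate = P.ManufactureDate
    ORDER BY Id
""").fetchall()
print(rows)  # [(2, 'Truck', '2015-05-01'), (3, 'Computer', '2015-05-01')]
```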

Compare data from query result to different table data in PostgreSQL

STEP 1
Select data1,name,phone,address from dummyTable limit 4;
From the above query I get, for example, the following result:
data1 | name | phone | address
fgh | hjk | 567...| CA
ghjkk | jkjii| 555...| NY
Now, after getting the above result, I am supposed to match the data1 values from that query against another existing table in the database, called existingTable, which has the same column, data1. If the result above gives the data1 value 'fgh', I take that 'fgh' and compare it with existingTable's data1 column.
STEP 2
Next, after I am finished comparing, I need to apply some condition as follows:
if((results.data1.value).equals(existingTable.data1.value))
then count --
else
count++
With the above condition I am trying to say that if the value from the result is matched, count is decremented by 1, and if not, count is incremented by 1.
Summary
I basically want to achieve this in one single query. Is that possible in PostgreSQL?
I think you can translate that to a simple query:
SELECT d.data1, d.name, d.phone, d.address
, count(*) - 2 * count(e.data1)
FROM (
SELECT data1, name, phone, address
FROM dummytable
-- ORDER BY ???
LIMIT 4
) d
LEFT JOIN existingtable e USING (data1)
GROUP BY d.data1, d.name, d.phone, d.address;
The key ingredient is the LEFT [OUTER] JOIN; see the PostgreSQL manual on joined tables.
count(*) counts all rows from dummytable.
count(e.data1) only counts rows from existingtable where a matching data1 exists (count() does not count NULL values). I subtract that twice to match your formula.
About ORDER BY: There is no natural order in a database table. You need to order by something to get predictable results.
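Here is a sketch of the LEFT JOIN counting query on SQLite via Python's sqlite3. The rows are the question's sample values with placeholder phone numbers. ORDER BY data1 is an arbitrary choice that fills the ORDER BY ??? slot. A matched row scores 1 - 2 = -1 and an unmatched one 1 - 0 = 1, mirroring the count--/count++ rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dummytable (data1 TEXT, name TEXT, phone TEXT, address TEXT);
    CREATE TABLE existingtable (data1 TEXT);
    -- phone values are placeholders; the question's numbers are elided
    INSERT INTO dummytable VALUES
        ('fgh', 'hjk', 'phone 1', 'CA'),
        ('ghjkk', 'jkjii', 'phone 2', 'NY');
    INSERT INTO existingtable VALUES ('fgh');
""")

rows = conn.execute("""
    SELECT d.data1, d.name, d.phone, d.address,
           count(*) - 2 * count(e.data1) AS score   -- count(e.data1) skips NULLs
    FROM (
        SELECT data1, name, phone, address
        FROM dummytable
        ORDER BY data1      -- arbitrary deterministic order for LIMIT
        LIMIT 4
    ) d
    LEFT JOIN existingtable e USING (data1)
    GROUP BY d.data1, d.name, d.phone, d.address
    ORDER BY d.data1
""").fetchall()

for row in rows:
    print(row)
```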
If there can be duplicates in existingtable but you want to count every distinct data1 only once, eliminate dupes before you join or use an EXISTS semi-join:
SELECT data1, name, phone, address
, count(*) - 2 * count(EXISTS (
SELECT 1 FROM existingtable e
WHERE e.data1 = d.data1) OR NULL)
FROM (
SELECT data1, name, phone, address
FROM dummytable
-- ORDER BY ???
LIMIT 4
) d
GROUP BY data1, name, phone, address;
The last count works because (TRUE OR NULL) IS TRUE, but (FALSE OR NULL) IS NULL.
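To see why duplicates matter, this sketch inserts 'fgh' twice into existingtable. It again uses SQLite via Python's sqlite3, with made-up rows. The LEFT JOIN version multiplies the dummytable row, while the EXISTS version counts each data1 at most once. In SQLite, EXISTS yields 1 or 0 rather than a boolean. Because 0 OR NULL is still NULL, the same count trick works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dummytable (data1 TEXT, name TEXT, phone TEXT, address TEXT);
    CREATE TABLE existingtable (data1 TEXT);
    INSERT INTO dummytable VALUES
        ('fgh', 'hjk', 'phone 1', 'CA'),
        ('ghjkk', 'jkjii', 'phone 2', 'NY');
    INSERT INTO existingtable VALUES ('fgh'), ('fgh');   -- deliberate duplicate
""")

SUBSET = "(SELECT data1, name, phone, address FROM dummytable ORDER BY data1 LIMIT 4) d"

left_join_rows = conn.execute(f"""
    SELECT d.data1, count(*) - 2 * count(e.data1) AS score
    FROM {SUBSET}
    LEFT JOIN existingtable e USING (data1)
    GROUP BY d.data1, d.name, d.phone, d.address
    ORDER BY d.data1
""").fetchall()

exists_rows = conn.execute(f"""
    SELECT data1,
           count(*) - 2 * count(EXISTS (
               SELECT 1 FROM existingtable e
               WHERE e.data1 = d.data1) OR NULL) AS score
    FROM {SUBSET}
    GROUP BY data1, name, phone, address
    ORDER BY data1
""").fetchall()

print(left_join_rows)  # [('fgh', -2), ('ghjkk', 1)]
print(exists_rows)     # [('fgh', -1), ('ghjkk', 1)]
```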

return count 0 with mysql group by

A database table like this:
suburb_id | value
1 | 2
1 | 3
2 | 4
3 | 5
The query is:
SELECT COUNT(suburb_id) AS total, suburb_id
FROM suburbs
where suburb_id IN (1,2,3,4)
GROUP BY suburb_id
However, when I run this query it doesn't return COUNT(suburb_id) = 0 for suburb_id = 4,
because there is no suburb_id 4 in the suburbs table. I want this query to return 0 for suburb_id = 4, like:
total | suburb_id
2 | 1
1 | 2
1 | 3
0 | 4
A GROUP BY needs rows to work with, so if you have no rows for a certain category, you are not going to get the count. Think of the where clause as limiting down the source rows before they are grouped together. The where clause is not providing a list of categories to group by.
What you could do is write a query to select the categories (suburbs) then do the count in a subquery. (I'm not sure what MySQL's support for this is like)
Something like:
SELECT
s.suburb_id,
(select count(*) from suburb_data d where d.suburb_id = s.suburb_id) as total
FROM
suburb_table s
WHERE
s.suburb_id in (1,2,3,4)
(MSSQL, apologies)
This:
SELECT id, COUNT(suburb_id)
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
) ids
LEFT JOIN
suburbs s
ON s.suburb_id = ids.id
GROUP BY
id
or this:
SELECT id,
(
SELECT COUNT(*)
FROM suburbs
WHERE suburb_id = id
)
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
) ids
This article compares the performance of the two approaches, "Aggregates: subqueries vs. GROUP BY", though it does not matter much in your case, since you are querying only 4 records.
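Here is a sketch of both approaches on SQLite via Python's sqlite3 with the question's sample rows. The LEFT JOIN version must count a column from suburbs (COUNT(s.suburb_id)). COUNT(*) would also count the NULL-extended row and report 1 for suburb 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suburbs (suburb_id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO suburbs VALUES (?, ?)", [(1, 2), (1, 3), (2, 4), (3, 5)])

# Derived table listing every id we want a count for, even ids absent from suburbs.
IDS = "(SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) ids"

left_join_q = f"""
    SELECT ids.id, COUNT(s.suburb_id) AS total   -- NULLs from unmatched ids count as 0
    FROM {IDS}
    LEFT JOIN suburbs s ON s.suburb_id = ids.id
    GROUP BY ids.id
    ORDER BY ids.id
"""
subquery_q = f"""
    SELECT ids.id,
           (SELECT COUNT(*) FROM suburbs s WHERE s.suburb_id = ids.id) AS total
    FROM {IDS}
    ORDER BY ids.id
"""

print(conn.execute(left_join_q).fetchall())  # [(1, 2), (2, 1), (3, 1), (4, 0)]
print(conn.execute(subquery_q).fetchall())   # [(1, 2), (2, 1), (3, 1), (4, 0)]
```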
Query:
select case
when total is null then 0
else total
end as total_with_zeroes,
suburb_id
from (SELECT COUNT(suburb_id) AS total, suburb_id
FROM suburbs
where suburb_id IN (1,2,3,4)
GROUP BY suburb_id) as dt
@geofftnz's solution works great if all the conditions are simple, as in this case. But I just had to solve a similar problem to generate a report where each column in the report is a different query. When you need to combine results from several SELECT statements, something like this might work.
You may have to create this query programmatically. Using left joins allows the query to return a row even if there are no matches for a given suburb_id. If your DB supports it (most do), you can use IFNULL (or the standard COALESCE) to replace NULL with 0:
select IFNULL(a.count,0), IFNULL(b.count,0), IFNULL(c.count,0), IFNULL(d.count,0)
from (select 1 as k) base -- one-row anchor so every LEFT JOIN has a row to attach to
left join (select count(suburb_id) as count from suburbs where suburb_id=1 group by suburb_id) a on 1=1
left join (select count(suburb_id) as count from suburbs where suburb_id=2 group by suburb_id) b on 1=1
left join (select count(suburb_id) as count from suburbs where suburb_id=3 group by suburb_id) c on 1=1
left join (select count(suburb_id) as count from suburbs where suburb_id=4 group by suburb_id) d on 1=1;
The nice thing about this is that (if needed) each "left join" can use slightly different (possibly fairly complex) query.
Disclaimer: for large data sets, this type of query might not perform very well (I don't write enough SQL to know without investigating further), but at least it should give useful results ;-)

Select values in SQL that do not have other corresponding values except those that I search for

I have a table in my database:
Name | Element
1 2
1 3
4 2
4 3
4 5
I need a query that, given a number of arguments, selects the values of Name that have exactly these values (and only these) on the right side.
E.g.:
if the arguments are 2 and 3, the query should return only 1 and not 4 (because 4 also has 5). For arguments 2, 3, 5 it should return 4.
My query looks like this:
SELECT name FROM aggregations WHERE (element=2 and name in (select name from aggregations where element=3))
What do I have to add to this query to make it not return 4?
A simple way to do it:
SELECT name
FROM aggregations
WHERE element IN (2,3)
GROUP BY name
HAVING COUNT(element) = 2
If you want to add more, you'll need to change both the IN (2,3) part and the HAVING part:
SELECT name
FROM aggregations
WHERE element IN (2,3,5)
GROUP BY name
HAVING COUNT(element) = 3
A more robust way is to also check that the name has no elements outside your set:
SELECT name
FROM aggregations
WHERE NOT EXISTS (
SELECT DISTINCT a.element
FROM aggregations a
WHERE a.element NOT IN (2,3,5)
AND a.name = aggregations.name
)
GROUP BY name
HAVING COUNT(element) = 3
It's not very efficient, though.
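The following sketch uses Python's sqlite3 with the question's rows. It wraps the NOT EXISTS guard and the HAVING check in a hypothetical helper, names_with_exactly, that builds the IN list from bound parameters. COUNT(DISTINCT element) is a small deviation from the answer. It prevents duplicate (name, element) rows from inflating the count. The last lines show why the guard matters. Without it, name 4 also has exactly two rows with element IN (2, 3), so the simple query returns both 1 and 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aggregations (name INTEGER, element INTEGER)")
conn.executemany(
    "INSERT INTO aggregations VALUES (?, ?)",
    [(1, 2), (1, 3), (4, 2), (4, 3), (4, 5)],
)


def names_with_exactly(conn: sqlite3.Connection, elements) -> list[int]:
    """Hypothetical helper: names whose element set equals `elements` exactly."""
    elems = sorted(set(elements))
    if not elems:
        return []  # avoid an empty NOT IN () list
    placeholders = ", ".join("?" * len(elems))
    sql = f"""
        SELECT name
        FROM aggregations
        WHERE NOT EXISTS (                -- no element outside the requested set
            SELECT 1 FROM aggregations a
            WHERE a.name = aggregations.name
              AND a.element NOT IN ({placeholders})
        )
        GROUP BY name
        HAVING COUNT(DISTINCT element) = ?   -- and every requested element present
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql, [*elems, len(elems)])]


print(names_with_exactly(conn, [2, 3]))     # [1]
print(names_with_exactly(conn, [2, 3, 5]))  # [4]

# Without the NOT EXISTS guard, name 4 slips through for (2, 3):
simple = conn.execute(
    "SELECT name FROM aggregations WHERE element IN (2, 3) "
    "GROUP BY name HAVING COUNT(element) = 2 ORDER BY name"
).fetchall()
print(simple)  # [(1,), (4,)]
```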
Create a temporary table, fill it with your values and query like this:
SELECT name
FROM (
SELECT DISTINCT name
FROM aggregations
) n
WHERE NOT EXISTS
(
SELECT 1
FROM (
SELECT element
FROM aggregations aii
WHERE aii.name = n.name
) ai
FULL OUTER JOIN
temptable tt
ON tt.element = ai.element
WHERE ai.element IS NULL OR tt.element IS NULL
)
This is more efficient than using COUNT(*), since it will stop checking a name as soon as it finds the first row that doesn't have a match (either in aggregations or in temptable)
This isn't tested, but usually I would do this with a query in my where clause for a small amount of data. Note that this is not efficient for large record counts.
SELECT ag1.Name FROM aggregations ag1
WHERE ag1.Element IN (2,3)
AND 0 = (select COUNT(ag2.Name)
FROM aggregations ag2
WHERE ag1.Name = ag2.Name
AND ag2.Element NOT IN (2,3)
)
GROUP BY ag1.name;
This says "Give me all of the names that have the elements I want, but have no records with elements I don't want"