I'm learning SQLite3 with a book ("Using SQLite") and the Northwind database. I have written the following query to order the customers by the number of customers in their city, and then alphabetically by name.
SELECT ContactName, Phone, City as originalCity
FROM Customers
ORDER BY (
SELECT count(*)
FROM Customers
WHERE city=originalCity)
DESC, ContactName ASC
It takes about 50-100ms to run. Is there a standard procedure to follow to optimize this query, or more generally, queries of its type?
In the most general case, query optimization starts with reading the query optimizer's execution plan. In SQLite, you just use
EXPLAIN QUERY PLAN statement
In your case,
EXPLAIN QUERY PLAN
SELECT ContactName, Phone, City as originalCity
FROM Customers
ORDER BY (
SELECT count(*)
FROM Customers
WHERE city=originalCity)
DESC, ContactName ASC
You might also need to read the output of
EXPLAIN statement
which goes into more low-level detail.
In general (not only in SQLite), it's better to compute the count for all values (cities) at once and then join that result back to construct the query:
SELECT ContactName, Phone, Customers.City as originalCity
FROM Customers
JOIN (SELECT city, count(*) cnt
FROM Customers
GROUP BY city) Customers_City_Count
ON Customers.city = Customers_City_Count.city
ORDER BY Customers_City_Count.cnt DESC, ContactName ASC
This prevents the count from being computed over and over for the same value (city), as happens with the correlated subquery in your query.
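If the join version still isn't fast enough, an index on the City column may help both the GROUP BY and the join lookup. A minimal sketch, assuming such an index doesn't already exist (the index name is my own):
-- Hypothetical index name; speeds up GROUP BY city and the join on city.
CREATE INDEX IF NOT EXISTS idx_customers_city ON Customers(City);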
Related
Any idea how we can improve this query's execution (maybe with some pre-aggregation)?
SELECT p.segment, country, count(distinct userid)
from pixel_data_opt p
WHERE country in ('US')
and segment is not null
GROUP BY p.segment, country;
I tried the query below, but it didn't help:
select segment, country,sum(cnt)
from
(SELECT p.segment, country, userid,count(*) as cnt
from pixel_data_opt p
WHERE country in ('US')
and segment is not null
GROUP BY p.segment, country,userid
)
group by 1,2;
There's nothing wrong with your first query - though it could have been where country = 'US' - but the optimizer (as far as Oracle is concerned) is smart enough to figure that out.
Is the country column indexed? If not, create an index on it.
Also, gather statistics on the table.
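In Oracle, that could look something like the following (the index name is an assumption; adjust to your schema):
-- Hypothetical index name on the filtered column.
CREATE INDEX pixel_data_opt_country_ix ON pixel_data_opt (country);
-- Refresh optimizer statistics for the table.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'PIXEL_DATA_OPT');
END;
/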
It would probably help if you posted some more info, e.g. the number of rows involved and the explain plan, as those show figures that actually mean something.
For this query:
SELECT p.segment, country, count(distinct userid)
FROM pixel_data_opt p
WHERE country in ('US') AND
segment is not null
GROUP BY p.segment, country;
You want an index on the table. There are several approaches. One reasonable choice is: pixel_data_opt(country, segment, userid).
I would suggest rewriting the query as:
SELECT p.segment, 'US' as country, count(distinct userid)
FROM pixel_data_opt p
WHERE country in ('US') AND
segment is not null
GROUP BY p.segment;
and using the above index.
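The suggested index might be created like this (the index name is just an illustration):
-- Hypothetical index name; covers the WHERE, the GROUP BY and COUNT(DISTINCT userid).
CREATE INDEX pixel_data_opt_cty_seg_usr ON pixel_data_opt (country, segment, userid);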
Do I need to put all the column names that I have in the SELECT into the GROUP BY as well?
For example, in this simple query:
Select
CustomerID,
CompanyName,
ContactName,
ContactTitle,
City,
Country
From
Customers
Group By
Country,
CompanyName,
ContactName,
ContactTitle,
City,
Country,
CustomerID
Do I always have to list in the GROUP BY the same columns that I used in the SELECT?
If you're just selecting columns and you want the returned records to discard exact duplicate rows, then there are two methods:
1) group by
2) distinct
Your query doesn't use any of the aggregate functions, e.g. COUNT, MIN, MAX, SUM, ...
So your query could use DISTINCT instead of a GROUP BY.
select DISTINCT
CustomerID, CompanyName, ContactName, ContactTitle, City, Country
from Customers
But if CustomerID is a primary key, then CustomerID would already make the result unique.
So then this query doesn't need a GROUP BY or a DISTINCT to only get unique records.
select CustomerID, CompanyName, ContactName, ContactTitle, City, Country
from Customers
Note that one could have both DISTINCT and GROUP BY in the same query. But that's just pointless. A GROUP BY already enforces the uniqueness, so adding a DISTINCT to make them unique would just make the query slower for no reason.
As for why all the columns in that SELECT also have to be listed in the GROUP BY: some databases, e.g. MySQL, can be more tolerant about not having to group on all columns. But it's a rule from the SQL standard, so most databases enforce it. It's there to avoid potentially misleading results.
GROUP BY x, y means you want one result row per x and y. So if you have a table with bills, you could group by year and month for instance and thus get the number of bills (count(*)) and the total (sum(amount)) per month.
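For instance, with a hypothetical bills(bill_year, bill_month, amount) table:
-- One result row per year and month, with the bill count and total amount.
SELECT bill_year, bill_month, COUNT(*) AS no_of_bills, SUM(amount) AS total
FROM bills
GROUP BY bill_year, bill_month;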
So the question is what rows do you want to see. A row per company (with the number of their customers) maybe? A row per city? The GROUP BY clause contains exactly those columns mentioned.
Your GROUP BY clause does exactly nothing here, as you select customers and group by CustomerID (which should be the customer table's primary key).
Here is the relation: WORKS(emp_name, company_name, salary)
Q. Write an expression in relational algebra to find the company name that has the highest number of employees.
I have tried to solve it in many ways, but I can't find the correct one.
Here is a query which should work across most RDBMS:
SELECT company_name
FROM WORKS
GROUP BY company_name
HAVING COUNT(*) = (SELECT MAX(empCount) FROM
(
SELECT COUNT(*) AS empCount
FROM WORKS
GROUP BY company_name
) t)
If you are using MySQL, SQL Server, or any database which has a LIMIT keyword (or something like it), then the query gets easier:
SELECT company_name, COUNT(*) AS empCount
FROM WORKS
GROUP BY company_name
ORDER BY empCount DESC
LIMIT 1
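As a side note, SQL Server's counterpart to LIMIT is TOP, and adding WITH TIES keeps every company tied for the highest count (which the LIMIT 1 version would not), something like:
-- SQL Server: TOP 1 WITH TIES returns all companies tied for the maximum count.
SELECT TOP 1 WITH TIES company_name, COUNT(*) AS empCount
FROM WORKS
GROUP BY company_name
ORDER BY empCount DESC;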
Let's say, for example, I have the following query:
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;
As you can see, the WHERE Country='Germany' is repeated in both targets of the union. Is there any way to rewrite this as a query without the repetition? I don't like my queries being too long.
I'm currently working on Oracle.
Why not include the WHERE only once, like
SELECT * FROM
(
SELECT City, Country FROM Customers
UNION ALL
SELECT City, Country FROM Suppliers
) tab
WHERE Country='Germany'
ORDER BY City
Or do a JOIN like
SELECT c.City as CustomerCity, c.Country as customerCountry,
s.City as suppliercity, s.Country as suppliercountry
FROM Customers c
LEFT JOIN Suppliers s ON c.Country = s.Country
AND c.Country='Germany'
ORDER BY c.City;
select distinct city, country
from
(
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION ALL
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
) x
order by city
You can't really get around the need for a UNION if you really want both sets of rows: I've used a UNION ALL inside the main SQL and a DISTINCT outside to remove the duplicates (assuming you want them removed), without adding an extra sort operation.
The code that you already have is pretty compact and clear while not sacrificing performance.
The suggestions to use a subquery can eliminate the duplicate WHERE statements, but will require more i/o activity than the simple union you originally provided.
When there is a union inside a subquery and a WHERE outside of it, you are asking the SQL engine to build a temporary result containing all rows in the Customers table plus all rows in the Suppliers table, and then to query that result, throwing out the rows where country is not 'Germany'. If your tables only have a couple hundred rows and you are running the query locally, you will likely not see much of a performance difference, but if you have thousands of rows, or the tables sit on different servers across the network, performance could be orders of magnitude slower.
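Rather than guessing, you can compare the execution plans of the two variants in Oracle; a quick sketch using EXPLAIN PLAN (the plan will also show whether the optimizer pushes the Country filter down into the inline view):
-- Show the plan for the subquery variant (Oracle).
EXPLAIN PLAN FOR
SELECT * FROM (
  SELECT City, Country FROM Customers
  UNION ALL
  SELECT City, Country FROM Suppliers
) tab
WHERE Country = 'Germany';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);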
If performance is a consideration, you could make the query a bit simpler and more maintainable by using a variable for the country like so:
VAR country varchar2(64);
EXEC :country := 'Germany';
SELECT City, Country FROM Customers
WHERE Country = :country
UNION
SELECT City, Country FROM Suppliers
WHERE Country= :country
ORDER BY City;
This clearly does not make for shorter code, but it is somewhat cleaner, easier to modify, and it only retrieves the rows you are interested in, which gives better performance.
SELECT * FROM
(
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers
) t
WHERE t.Country='Germany'
ORDER BY t.City;
You could use a common table expression (CTE):
WITH CTE AS (
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers)
SELECT City, Country
FROM CTE
WHERE CTE.Country='Germany'
ORDER BY CTE.City;
I find it easier to read than nested sub-queries.
I agree with Patrick. Query performance is much more important than query length. All the alternative solutions mentioned perform worse than your original one.
I have the following query I am trying to optimize.
EXPLAIN
select clb.f_name, clb.l_name, noofbooks
from (
select f_name, l_name, count(*) as noofbooks from
customer natural join loaned_book
group by f_name, l_name
) as clb
where 3 > (
select count(*) from (
select f_name, l_name, count(*) as noofbooks
from customer natural join loaned_book
group by f_name, l_name
) as clb1
where clb.noofbooks<clb1.noofbooks
)
order by noofbooks desc;
Essentially, this query is trying to find the "top three" counts (including ties, i.e. not limited to three rows) of the number of books loaned per customer.
The problem is the number of counts that must be computed in the query.
Is it possible to use the count values from the first query to reduce the rows selected in the second query, without recounting all of the rows?
This is a homework task so I am not expecting a direct answer. Any pointers would be appreciated.
Have a look at the dense_rank window function. You want all the rows that have a dense rank of 3 or less.
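Since this is homework, here is only a generic illustration of DENSE_RANK on a made-up scores(name, score) table, assuming a database that supports window functions, rather than a solution for your schema:
-- DENSE_RANK() gives tied values the same rank with no gaps,
-- so rnk <= 3 returns the rows for the top three distinct scores, ties included.
SELECT name, score
FROM (
  SELECT name, score, DENSE_RANK() OVER (ORDER BY score DESC) AS rnk
  FROM scores
) ranked
WHERE rnk <= 3
ORDER BY score DESC;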