Improve distinct query performance - sql

Any ideas on how we can improve this query's execution time (maybe with some pre-aggregation)?
SELECT p.segment, country, count(distinct userid)
from pixel_data_opt p
WHERE country in ('US')
and segment is not null
GROUP BY p.segment, country;
I tried the following, but it didn't help:
select segment, country, count(*)
from
(SELECT p.segment, country, userid
from pixel_data_opt p
WHERE country in ('US')
and segment is not null
GROUP BY p.segment, country, userid
) t
group by 1,2;

There's nothing wrong with your first query. It could have been written as WHERE country = 'US', but the optimizer (at least in Oracle) is smart enough to figure that out on its own.
Is the country column indexed? If not, index it.
Also, gather statistics on the table.
It would probably help if you posted more information, e.g. the number of rows involved and the execution plan, since the plan shows figures that actually mean something.

For this query:
SELECT p.segment, country, count(distinct userid)
FROM pixel_data_opt p
WHERE country in ('US') AND
segment is not null
GROUP BY p.segment, country;
You want an index on the table. There are several approaches. One reasonable choice is: pixel_data_opt(country, segment, userid).
I would suggest rewriting the query as:
SELECT p.segment, 'US' as country, count(distinct userid)
FROM pixel_data_opt p
WHERE country in ('US') AND
segment is not null
GROUP BY p.segment;
and using the above index.
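A minimal sketch of the answer above, using Python's built-in sqlite3 module as a stand-in engine (an assumption: the question doesn't name its database, and the rows are invented). It builds the suggested (country, segment, userid) index, runs the rewritten query, and checks with EXPLAIN QUERY PLAN that SQLite reads only the index. For comparison it also runs a pre-aggregated version that counts the rows of a (segment, userid) GROUP BY; the outer aggregate has to be count(*), not sum(cnt), to match count(DISTINCT userid).

```python
import sqlite3

# Hypothetical sample rows: (segment, country, userid)
ROWS = [
    ("a", "US", 1), ("a", "US", 1), ("a", "US", 2),
    ("b", "US", 3), ("b", "US", 3),
    (None, "US", 4),  # dropped by "segment IS NOT NULL"
    ("a", "DE", 5),   # dropped by the country filter
]

DISTINCT_QUERY = """
SELECT p.segment, 'US' AS country, count(DISTINCT p.userid)
FROM pixel_data_opt p
WHERE p.country = 'US' AND p.segment IS NOT NULL
GROUP BY p.segment
ORDER BY p.segment
"""

# Pre-aggregation: one row per (segment, userid), then count those rows.
PREAGG_QUERY = """
SELECT t.segment, 'US' AS country, count(*)
FROM (SELECT segment, userid
      FROM pixel_data_opt
      WHERE country = 'US' AND segment IS NOT NULL
      GROUP BY segment, userid) t
GROUP BY t.segment
ORDER BY t.segment
"""

def build_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE pixel_data_opt (segment TEXT, country TEXT, userid INTEGER)")
    con.executemany("INSERT INTO pixel_data_opt VALUES (?, ?, ?)", ROWS)
    # Composite index containing every column the query touches.
    con.execute("CREATE INDEX ix_cty_seg_user ON pixel_data_opt (country, segment, userid)")
    return con

def plan(con, sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

con = build_db()
print(con.execute(DISTINCT_QUERY).fetchall())
print(con.execute(PREAGG_QUERY).fetchall())
print(plan(con, DISTINCT_QUERY))
```

The plan should mention a COVERING INDEX, meaning the table rows themselves are never read.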

Related

SQL: Select inside select?

I have a table of car accidents in a major city. accident_table has the following columns:
id, caseno, date_of_occurrence, street, iucr, primary_type,
description, district, community_area, year, updated_on
I want to write a query that finds the street with the most accidents in each district (I think the count for each street is the number of accidents that happened on that street).
Here is what I have:
SELECT DISTINCT on (street)
street,
district
FROM
(
SELECT
count(street) as street_cnt,
street,
district
FROM accident_table
)
WHERE street_count = (SELECT max(street_cnt))
It didn't give me a syntax error, but it timed out, so I guess it took too long to run.
What's wrong, and how can I fix it?
Thanks,
Philip
First aggregate to get the count of accidents for each street. Then use the rank() window function to rank the streets within a district by the count of accidents in them. Then only select the ones that were ranked at the top.
SELECT x.district,
x.street,
x.accidents
FROM (SELECT a.district,
a.street,
count(*) accidents,
rank() OVER (PARTITION BY a.district
ORDER BY count(*) DESC) r
FROM accident_table a
GROUP BY a.district,
a.street) x
WHERE x.r = 1;
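A small sketch of the rank() approach, run through Python's sqlite3 module (assumptions: SQLite 3.25 or later for window functions, and a made-up accident_table reduced to the two columns the query needs). District 1 contains a tie, which shows that rank() keeps every street sharing the top count.

```python
import sqlite3

# Hypothetical accidents: (district, street). District 1 has a tie at 2.
ACCIDENTS = [
    (1, "Main St"), (1, "Main St"), (1, "Oak Ave"), (1, "Oak Ave"), (1, "Elm St"),
    (2, "Pine Rd"), (2, "Pine Rd"), (2, "Pine Rd"), (2, "Lake Dr"),
]

TOP_STREETS = """
SELECT x.district, x.street, x.accidents
FROM (SELECT a.district,
             a.street,
             count(*) AS accidents,
             rank() OVER (PARTITION BY a.district ORDER BY count(*) DESC) AS r
      FROM accident_table a
      GROUP BY a.district, a.street) x
WHERE x.r = 1
ORDER BY x.district, x.street
"""

def top_streets():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE accident_table (district INTEGER, street TEXT)")
    con.executemany("INSERT INTO accident_table VALUES (?, ?)", ACCIDENTS)
    # Index from the answer; on real data it may let the GROUP BY read rows
    # in (district, street) order instead of sorting them.
    con.execute("CREATE INDEX ix_district_street ON accident_table (district, street)")
    return con.execute(TOP_STREETS).fetchall()

print(top_streets())
```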
Your code looks like Postgres. In that database, you can express this without a subquery:
SELECT DISTINCT ON (a.district)
a.district, a.street, COUNT(*) as accidents
FROM accident_table a
GROUP BY a.district, a.street
ORDER BY a.district, COUNT(*) DESC;
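That said, your problem is performance, which is probably not affected by the subquery. An index on accident_table(district, street) might help. Note also that DISTINCT ON keeps only one street per district when several streets tie for the most accidents, whereas the rank() version returns all of them.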

SQL reduce duplicates in union clause

Let's say, for example, I have the following query:
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;
As you can see, WHERE Country='Germany' is repeated in both branches of the union. Is there any way to reduce this to a query without the repetition? I don't like my queries being too long.
I'm currently working with Oracle.
Why not include the WHERE only once, like this:
SELECT * FROM
(
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers
) tab
WHERE Country='Germany'
ORDER BY City
Or do a JOIN, like this (note that it returns customer/supplier pairs side by side rather than one combined list of cities):
SELECT c.City as CustomerCity, c.Country as customerCountry,
s.City as suppliercity, s.Country as suppliercountry
FROM Customers c
LEFT JOIN Suppliers s ON c.Country = s.Country
WHERE c.Country='Germany'
ORDER BY c.City;
select distinct city, country
from
(
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION ALL
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
) x
order by city
You can't really get around the need for a UNION if you really want both sets of rows. I've used a UNION ALL inside the subquery and a DISTINCT outside to remove duplicates (assuming you want that), which should not add any extra sort operations.
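To check that the UNION ALL plus DISTINCT form returns the same rows as the original UNION, here is a sketch using Python's sqlite3 module with two invented Customers and Suppliers tables (the real Oracle tables and data are not shown in the question).

```python
import sqlite3

def setup():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Customers (City TEXT, Country TEXT)")
    con.execute("CREATE TABLE Suppliers (City TEXT, Country TEXT)")
    # Hypothetical rows; Berlin appears in both tables.
    con.executemany("INSERT INTO Customers VALUES (?, ?)",
                    [("Berlin", "Germany"), ("Munich", "Germany"), ("Paris", "France")])
    con.executemany("INSERT INTO Suppliers VALUES (?, ?)",
                    [("Berlin", "Germany"), ("Hamburg", "Germany"), ("Lyon", "France")])
    return con

# The query from the question: UNION removes the duplicate Berlin row.
UNION_SQL = """
SELECT City, Country FROM Customers WHERE Country = 'Germany'
UNION
SELECT City, Country FROM Suppliers WHERE Country = 'Germany'
ORDER BY City
"""

# The answer's variant: keep every row with UNION ALL, de-duplicate once outside.
DISTINCT_SQL = """
SELECT DISTINCT City, Country
FROM (SELECT City, Country FROM Customers WHERE Country = 'Germany'
      UNION ALL
      SELECT City, Country FROM Suppliers WHERE Country = 'Germany') x
ORDER BY City
"""

con = setup()
print(con.execute(UNION_SQL).fetchall())
print(con.execute(DISTINCT_SQL).fetchall())
```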
The code that you already have is compact and clear, and it doesn't sacrifice performance.
The suggestions to use a subquery can eliminate the duplicate WHERE clauses, but they may require more I/O than the simple union you originally wrote.
With a union inside the subquery and a WHERE outside it, you are logically asking the SQL engine to build a temporary result containing every row of the Customers table plus every row of the Suppliers table, and then to discard the rows where Country is not 'Germany'. Whether that actually happens depends on whether the optimizer pushes the filter down into each branch of the union. If it doesn't, and your tables only have a couple of hundred rows on a local database, you will hardly notice a difference; but with thousands of rows, or with tables on different servers across a network, the query could be orders of magnitude slower.
If performance is a consideration, you could make the query a bit simpler and more maintainable by using a variable for the country like so:
VAR country varchar2(64);
EXEC :country := 'Germany';
SELECT City, Country FROM Customers
WHERE Country= :country
UNION
SELECT City, Country FROM Suppliers
WHERE Country= :country
ORDER BY City;
This clearly does not make the code shorter, but it is somewhat cleaner and easier to modify, and because each branch retrieves only the rows you are interested in, it will perform better.
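VAR and EXEC are SQL*Plus commands. As a rough analogue outside SQL*Plus (an assumption, not Oracle-specific), Python's sqlite3 driver accepts named :name parameters, so a single value fills both occurrences of :country. The tables and rows are invented.

```python
import sqlite3

# The named parameter :country appears twice but is supplied once.
SQL = """
SELECT City, Country FROM Customers WHERE Country = :country
UNION
SELECT City, Country FROM Suppliers WHERE Country = :country
ORDER BY City
"""

def setup():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Customers (City TEXT, Country TEXT)")
    con.execute("CREATE TABLE Suppliers (City TEXT, Country TEXT)")
    # Hypothetical rows; Berlin appears in both tables.
    con.executemany("INSERT INTO Customers VALUES (?, ?)",
                    [("Berlin", "Germany"), ("Munich", "Germany"), ("Paris", "France")])
    con.executemany("INSERT INTO Suppliers VALUES (?, ?)",
                    [("Berlin", "Germany"), ("Hamburg", "Germany"), ("Lyon", "France")])
    return con

def cities_in(con, country):
    # Each branch filters on the bound value before the UNION combines them.
    return con.execute(SQL, {"country": country}).fetchall()

con = setup()
print(cities_in(con, "Germany"))
print(cities_in(con, "France"))
```

Changing the country now means changing one bound value rather than editing two literals.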
SELECT * FROM
(
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers
) t
WHERE t.Country='Germany'
ORDER BY t.City;
You could use a common table expression (CTE):
WITH CTE AS (
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers)
SELECT City, Country
FROM CTE
WHERE CTE.Country='Germany'
ORDER BY CTE.City;
I find it easier to read than nested sub-queries.
Agree with Patrick. Query performance is much more important than query length, and the alternative solutions mentioned may well perform worse than your original one.

Query optimisation for count comparisons

I have the following query I am trying to optimize.
EXPLAIN
select clb.f_name, clb.l_name, noofbooks
from (
select f_name, l_name, count(*) as noofbooks from
customer natural join loaned_book
group by f_name, l_name
) as clb
where 3 > (
select count(*) from (
select f_name, l_name, count(*) as noofbooks
from customer natural join loaned_book
group by f_name, l_name
) as clb1
where clb.noofbooks<clb1.noofbooks
)
order by noofbooks desc;
Essentially, this query is trying to find the "top three" counts of the number of books loaned by each customer (including ties, i.e. the result is not limited to 3 rows).
The problem is the number of counts that must be computed in the query.
Is it possible to use the count values from the first query to reduce selected rows in the second query without recounting all of the rows?
This is a homework task so I am not expecting a direct answer. Any pointers would be appreciated.
Have a look at the dense_rank window function. You want all the rows that have a dense rank of 3 or less.
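As a pointer rather than a full solution, here is a sketch of the dense_rank() idea using Python's sqlite3 module (SQLite 3.25 or later assumed). The schema is deliberately simplified to a hypothetical loaned_book table with only a customer column: each customer's count is computed once and the counts are then ranked. Ties share a rank, so more than three rows can come back.

```python
import sqlite3

# Hypothetical loans: one row per loaned book, keyed by customer name.
LOANS = [("Ann",)] * 5 + [("Bob",)] * 4 + [("Cid",)] * 4 + [("Dee",)] * 2 + [("Eve",)]

# Count once per customer, then rank the counts; equal counts share a rank.
TOP_THREE = """
SELECT customer, noofbooks
FROM (SELECT customer,
             count(*) AS noofbooks,
             dense_rank() OVER (ORDER BY count(*) DESC) AS dr
      FROM loaned_book
      GROUP BY customer) t
WHERE dr <= 3
ORDER BY noofbooks DESC, customer
"""

def top_three():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE loaned_book (customer TEXT)")
    con.executemany("INSERT INTO loaned_book VALUES (?)", LOANS)
    return con.execute(TOP_THREE).fetchall()

print(top_three())
```

Bob and Cid tie at 4, so they share rank 2 and Dee's count of 2 is still the third distinct value.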

How to optimize an SQLite3 query

I'm learning SQLite3 by means of a book ("Using SQLite") and the Northwind database. I have written the following code to order the customers by the number of customers in their city, then alphabetically by their name.
SELECT ContactName, Phone, City as originalCity
FROM Customers
ORDER BY (
SELECT count(*)
FROM Customers
WHERE city=originalCity)
DESC, ContactName ASC
It takes about 50-100ms to run. Is there a standard procedure to follow to optimize this query, or more generally, queries of its type?
In the most general case, query optimization starts with reading the query optimizer's execution plan. In SQLite, you just use
EXPLAIN QUERY PLAN statement
In your case,
EXPLAIN QUERY PLAN
SELECT ContactName, Phone, City as originalCity
FROM Customers
ORDER BY (
SELECT count(*)
FROM Customers
WHERE city=originalCity)
DESC, ContactName ASC
You might also need to read the output of
EXPLAIN statement
which goes into more low-level detail.
In general (not only in SQLite), it's better to compute the count for all values (cities) at once and join it back to build the result:
SELECT ContactName, Phone, Customers.City as originalCity
FROM Customers
JOIN (SELECT city, count(*) cnt
FROM Customers
GROUP BY city) Customers_City_Count
ON Customers.city = Customers_City_Count.city
ORDER BY Customers_City_Count.cnt DESC, ContactName ASC
This prevents the count from being computed many times for the same value (city), as happens in your query.
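A sketch comparing the two forms on a tiny invented Customers table via Python's sqlite3 module. It checks only that both produce the same ordering; any timing difference would only show up with realistic data, which is not tested here.

```python
import sqlite3

# Hypothetical customers: (ContactName, Phone, City)
CUSTOMERS = [
    ("Zoe", "111", "Berlin"), ("Adam", "222", "Berlin"), ("Mia", "333", "London"),
    ("Bea", "444", "Berlin"), ("Carl", "555", "London"), ("Dan", "666", "Paris"),
]

# Original approach: the correlated subquery is evaluated for each outer row.
CORRELATED = """
SELECT c.ContactName, c.Phone, c.City
FROM Customers c
ORDER BY (SELECT count(*) FROM Customers c2 WHERE c2.City = c.City) DESC,
         c.ContactName ASC
"""

# Join approach: each city's count is computed once by the GROUP BY.
JOINED = """
SELECT Customers.ContactName, Customers.Phone, Customers.City
FROM Customers
JOIN (SELECT City, count(*) AS cnt FROM Customers GROUP BY City) cc
  ON Customers.City = cc.City
ORDER BY cc.cnt DESC, Customers.ContactName ASC
"""

def run(sql):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Customers (ContactName TEXT, Phone TEXT, City TEXT)")
    con.executemany("INSERT INTO Customers VALUES (?, ?, ?)", CUSTOMERS)
    return con.execute(sql).fetchall()

print([row[0] for row in run(CORRELATED)])
print([row[0] for row in run(JOINED)])
```

One behavioral difference: a customer whose City is NULL drops out of the join version (NULL never matches in the ON clause), while the correlated version keeps it with a count of 0.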

Count number of users from a certain country

I have a table of users, and in this table I have a country field telling where these people are from (e.g. "Sweden", "Italy", ...). How can I write a SQL query to get something like:
Country Number
Sweden 10
Italy 50
... ...
Users select their countries from a list I give them, but the list is really huge, so it would be great to have a SQL query that avoids using that list: one that looks in the DB and returns only the countries that actually appear there. For example, I have nobody from Barbados, even though that option exists in the country select field of the signup form :)
Thanks in advance!
If the name of the country is in the Users table, try something like this:
SELECT Country, COUNT(*) AS Number
FROM Users
GROUP BY Country
ORDER BY Country
If the name of the country is in a separate Countries table, you will have to join:
SELECT Countries.CountryName, COUNT(*) AS Number
FROM Users
INNER JOIN Countries
ON Users.CountryId = Countries.CountryId
GROUP BY Countries.CountryName
ORDER BY Countries.CountryName
This will give you what you want, but you might want to cache the result of the query: with a lot of users, it's quite a heavy query.
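A minimal sketch of the join version, using Python's sqlite3 module and invented Users and Countries tables. Because it is an INNER JOIN grouped over the users, a country with no users (Barbados here) simply doesn't appear, which is what the question asks for.

```python
import sqlite3

def country_counts():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Countries (CountryId INTEGER PRIMARY KEY, CountryName TEXT)")
    con.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, CountryId INTEGER)")
    con.executemany("INSERT INTO Countries VALUES (?, ?)",
                    [(1, "Sweden"), (2, "Italy"), (3, "Barbados")])
    # Hypothetical users: two from Sweden, three from Italy, nobody from Barbados.
    con.executemany("INSERT INTO Users (CountryId) VALUES (?)",
                    [(1,), (1,), (2,), (2,), (2,)])
    return con.execute("""
        SELECT Countries.CountryName, COUNT(*) AS Number
        FROM Users
        INNER JOIN Countries ON Users.CountryId = Countries.CountryId
        GROUP BY Countries.CountryName
        ORDER BY Countries.CountryName
    """).fetchall()

print(country_counts())
```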
SELECT
country,
COUNT(*)
FROM
users
GROUP BY
country
If you don't need the counts, a simpler option may be:
SELECT
DISTINCT country
FROM
users
Sounds like you want something like this...?
SELECT Country, COUNT(*) AS Number
FROM Users
GROUP BY Country
This is pretty straightforward:
SELECT
Country, COUNT(*) AS Number
FROM
YourTable
GROUP BY
Country
ORDER BY
Country
You just group your data by country and count the entries for each country.
Or, if you want them sorted by the number of users, use a different ORDER BY clause:
SELECT
Country, COUNT(*) AS Number
FROM
YourTable
GROUP BY
Country
ORDER BY
COUNT(*) DESC
If you want the count per country:
select country, count(*) from users group by country;
If you just want the possible values:
select distinct country from users;
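A quick sketch of the two variants above (counts sorted by size, and the plain DISTINCT list), using Python's sqlite3 module and an invented users table.

```python
import sqlite3

def build():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT)")
    # Hypothetical users: three from Italy, two from Sweden, one from Norway.
    con.executemany("INSERT INTO users (country) VALUES (?)",
                    [("Italy",)] * 3 + [("Sweden",)] * 2 + [("Norway",)])
    return con

# Count per country, largest first; ties broken alphabetically.
BY_COUNT_SQL = """
SELECT country, COUNT(*) AS Number
FROM users
GROUP BY country
ORDER BY Number DESC, country
"""

# Only the countries that actually occur in the table.
DISTINCT_SQL = "SELECT DISTINCT country FROM users ORDER BY country"

con = build()
print(con.execute(BY_COUNT_SQL).fetchall())
print(con.execute(DISTINCT_SQL).fetchall())
```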
SELECT BillingCountry, COUNT(*) AS Invoices
FROM Invoice
GROUP BY BillingCountry
ORDER BY Invoices DESC