NOTE: I WANT TO AVOID DISTINCT ON FOR PERFORMANCE REASONS.
NOTE 2: I WAS WRONG. WITH PROPER INDEXES THE QUERY WORKED AWESOME, THANKS TO @Gordon Linoff!
Having the following structure:
| id | image_url | sort | t1_id |
|----|---------------|------|-------|
| 1 | https://.../1 | 10 | 1 |
| 2 | https://.../2 | 20 | 1 |
| 3 | https://.../3 | 30 | 1 |
| 4 | https://.../4 | 30 | 2 |
| 5 | https://.../5 | 20 | 2 |
| 6 | https://.../6 | 10 | 2 |
For each t1_id, I want to fetch the row with the lowest sort value (I need its image_url column), similar to the following:
SELECT * FROM t2 WHERE MIN(sort) GROUP BY (t1_id);
Getting the following result:
| id | image_url | sort | t1_id |
|----|---------------|------|-------|
| 1 | https://.../1 | 10 | 1 |
| 6 | https://.../6 | 10 | 2 |
Thanks in advance!
Postgres has a handy extension called distinct on:
select distinct on (t1_id) t2.*
from t2
order by t1_id, sort asc;
This is usually the fastest way to approach such a problem. In particular, this can take advantage of an index on (t1_id, sort [desc]).
However, you can try another approach such as:
select t2.*
from t2
where t2.sort = (select min(tt2.sort)
from t2 tt2
where tt2.t1_id = t2.t1_id
);
This would use the same index. If this is faster, please post a comment with the relevant performance.
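For anyone who wants to try the correlated-subquery version on the sample rows, here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module instead of Postgres (an assumption made purely so the example is self-contained; SQLite has no DISTINCT ON), and the index name t2_t1_id_sort is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, image_url TEXT, sort INTEGER, t1_id INTEGER);
    -- Composite index of the kind discussed above (hypothetical name).
    CREATE INDEX t2_t1_id_sort ON t2 (t1_id, sort);
    INSERT INTO t2 (id, image_url, sort, t1_id) VALUES
        (1, 'https://.../1', 10, 1),
        (2, 'https://.../2', 20, 1),
        (3, 'https://.../3', 30, 1),
        (4, 'https://.../4', 30, 2),
        (5, 'https://.../5', 20, 2),
        (6, 'https://.../6', 10, 2);
""")

# Keep only the rows whose sort equals the minimum sort of their t1_id group.
rows = conn.execute("""
    SELECT t2.*
    FROM t2
    WHERE t2.sort = (SELECT MIN(tt2.sort)
                     FROM t2 tt2
                     WHERE tt2.t1_id = t2.t1_id)
    ORDER BY t2.t1_id
""").fetchall()

for row in rows:
    print(row)
```

Whether the planner actually uses the index depends on the engine and the data volume, so measure on the real table before drawing conclusions.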
Related
I have an Oracle table like this
| id | code | info | More cols |
|----|------|------------------|-----------|
| 1 | 13 | The Thirteen | dggf |
| 1 | 18 | The Eighteen | ghdgffg |
| 1 | 18 | The Eighteen | |
| 1 | 9 | The Nine | ghdfgjgf |
| 1 | 9 | Die Neun | ghdfgjgf |
| 1 | 75 | The Seventy-five | ghfgh |
| 1 | 75 | The Seventy-five | ghfgh |
| 1 | 2 | The Two | ghfgh |
| 1 | 27 | The Twenty-Seven | |
| 1 | 27 | The Twenty-Seven | |
| 1 | 27 | el veintisiete | fghfg |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
In this table I want to find all rows with values in column code which have more than one distinct value in the info column. So from the listed rows this would be the values 9 and 27 and the associated rows.
I tried to construct a first query like
SELECT code FROM mytable
WHERE COUNT(DISTINCT info) >1
but I get an "ORA-00934: group function is not allowed here" error. Also, I don't know how to express the condition COUNT(DISTINCT info) "for a fixed code".
You need HAVING together with GROUP BY; aggregate functions are not allowed in a WHERE clause:
SELECT code
FROM mytable
group by code
having COUNT(DISTINCT info) >1
I would write your query as:
SELECT code
FROM yourTable
GROUP BY code
HAVING MIN(info) <> MAX(info);
Writing the HAVING logic this way leaves the query sargable, meaning that an index on (code, info) should be usable.
You could also do this using exists logic:
SELECT DISTINCT code
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2 WHERE t2.code = t1.code AND t2.info <> t1.info);
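To check that the three formulations above agree, here is a small sketch that runs them on the sample rows from the question. It uses Python's built-in sqlite3 module rather than Oracle (an assumption, only to keep the example self-contained), and the "More cols" column is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, code INTEGER, info TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, code, info) VALUES (?, ?, ?)",
    [
        (1, 13, "The Thirteen"),
        (1, 18, "The Eighteen"), (1, 18, "The Eighteen"),
        (1, 9, "The Nine"), (1, 9, "Die Neun"),
        (1, 75, "The Seventy-five"), (1, 75, "The Seventy-five"),
        (1, 2, "The Two"),
        (1, 27, "The Twenty-Seven"), (1, 27, "The Twenty-Seven"),
        (1, 27, "el veintisiete"),
    ],
)

queries = {
    "count_distinct": """
        SELECT code FROM mytable
        GROUP BY code
        HAVING COUNT(DISTINCT info) > 1
        ORDER BY code""",
    "min_max": """
        SELECT code FROM mytable
        GROUP BY code
        HAVING MIN(info) <> MAX(info)
        ORDER BY code""",
    "exists": """
        SELECT DISTINCT code FROM mytable t1
        WHERE EXISTS (SELECT 1 FROM mytable t2
                      WHERE t2.code = t1.code AND t2.info <> t1.info)
        ORDER BY code""",
}

# Run each variant and collect the codes it returns.
results = {name: [r[0] for r in conn.execute(sql)] for name, sql in queries.items()}
for name, codes in results.items():
    print(name, codes)
```

All three variants ignore rows where info is NULL, so they should keep agreeing on data with missing values as well.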
Suppose I have the following table:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 1 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 2 | 3 |
| 2 | 3 | 4 |
| 2 | 4 | 5 |
+----+-------------+-------------+
My desired results are:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 2 | 2 | 3 |
+----+-------------+-------------+
My current solution is:
SELECT
*
FROM
(SELECT
id,
step_number,
MIN(step_number) OVER (PARTITION BY id) AS min_step_number,
employee_id
FROM
table_name) AS t
WHERE
t.step_number = t.min_step_number
Is there a more efficient way I could be doing this?
I'm currently using postgresql, version 12.
In Postgres, I would recommend using distinct on to address this greatest-n-per-group problem:
select distinct on (id) t.*
from mytable t
order by id, step_number
This Postgres extension to the SQL standard usually performs better than the standard approach using window functions (and, as a bonus, the syntax is neater).
Note that this assumes (id, step_number) tuples are unique: otherwise, the results might differ from those of your query (which returns all tied rows, while distinct on returns exactly one row per id).
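The tie behaviour is easy to see on a small data set. The sketch below runs through Python's built-in sqlite3 module (an assumption for portability; it needs SQLite 3.25 or later for window functions). SQLite has no DISTINCT ON, so ROW_NUMBER() stands in for its one-row-per-group behaviour. The table name steps and the tied row (1, 1, 4) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE steps (id INTEGER, step_number INTEGER, employee_id INTEGER)")
conn.executemany(
    "INSERT INTO steps (id, step_number, employee_id) VALUES (?, ?, ?)",
    # id 1 has two rows sharing the minimum step_number (a hypothetical tie).
    [(1, 1, 3), (1, 1, 4), (1, 2, 3), (2, 2, 3), (2, 3, 4)],
)

# Filter on MIN() OVER (...): every row tied for the minimum is returned.
with_ties = conn.execute("""
    SELECT id, step_number, employee_id
    FROM (SELECT id, step_number, employee_id,
                 MIN(step_number) OVER (PARTITION BY id) AS min_step_number
          FROM steps) AS t
    WHERE step_number = min_step_number
    ORDER BY id, employee_id
""").fetchall()

# ROW_NUMBER() = 1: exactly one row per id, ties broken by employee_id.
one_per_group = conn.execute("""
    SELECT id, step_number, employee_id
    FROM (SELECT id, step_number, employee_id,
                 ROW_NUMBER() OVER (PARTITION BY id
                                    ORDER BY step_number, employee_id) AS rn
          FROM steps) AS t
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(with_ties)
print(one_per_group)
```

If ties cannot occur in your data, both forms return the same rows and the choice comes down to performance and readability.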
Hope you can help. We have a table with two columns Customer_ID and Trip_Date. The customer receives 15% off on their first visit and on every visit where they haven't received the 15% off offer in the past thirty days. How do I write a single SQL query that finds all days where a customer received 15% off?
The table looks like this
+-------------+----------+
| Customer_ID | date     |
+-------------+----------+
| 1           | 01-01-17 |
| 1           | 01-17-17 |
| 1           | 02-04-17 |
| 1           | 03-01-17 |
| 1           | 03-15-17 |
| 1           | 04-29-17 |
| 1           | 05-18-17 |
+-------------+----------+
The desired output would look like this:
+-------------+----------+-------------------+
| Customer_ID | date     | received_discount |
+-------------+----------+-------------------+
| 1           | 01-01-17 | 1                 |
| 1           | 01-17-17 | 0                 |
| 1           | 02-04-17 | 1                 |
| 1           | 03-01-17 | 0                 |
| 1           | 03-15-17 | 1                 |
| 1           | 04-29-17 | 1                 |
| 1           | 05-18-17 | 0                 |
+-------------+----------+-------------------+
We are doing this work in Netezza. I can't think of a way using just window functions, only using recursion and looping. Is there some clever trick that I'm missing?
Thanks in advance,
GF
You didn't tell us what your backend is, nor did you give sample data, expected output, or a sensible data schema :( This is an example based on a guess at the schema, using PostgreSQL as the backend (it would be too messy as a comment):
(I think you have Customer_Id, Trip_Date and LocationId in a trips table?)
select * from trips t1
where not exists (
select * from trips t2
where t1.Customer_id = t2.Customer_id and
t1.Trip_Date > t2.Trip_Date
and t1.Trip_date - t2.Trip_Date < 30
);
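One thing to watch: the NOT EXISTS query compares each trip with any earlier trip, not only with earlier discounted trips. For the sample data, the 02-04-17 trip is excluded because of the 01-17-17 trip 18 days before, although the desired output flags it. Since each decision depends on the previous discounted trip, the logic is sequential. Here is a minimal sketch of that loop in plain Python for one customer, not a Netezza solution. The function name flag_discounts is made up, and treating a gap of exactly 30 days as eligible is an assumption.

```python
from datetime import date


def flag_discounts(trip_dates, window_days=30):
    """Return (trip_date, received_discount) pairs for one customer."""
    flagged = []
    last_discount = None  # date of the most recent discounted trip
    for trip in sorted(trip_dates):
        # Assumption: a gap of exactly `window_days` days qualifies again.
        eligible = last_discount is None or (trip - last_discount).days >= window_days
        if eligible:
            last_discount = trip
        flagged.append((trip, 1 if eligible else 0))
    return flagged


trips = [
    date(2017, 1, 1), date(2017, 1, 17), date(2017, 2, 4), date(2017, 3, 1),
    date(2017, 3, 15), date(2017, 4, 29), date(2017, 5, 18),
]
flags = [flag for _, flag in flag_discounts(trips)]
print(flags)
```

In SQL the same carry-forward of the last discount date generally needs a recursive query or a procedural loop, which matches the suspicion in the question.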
I would like to filter my table with the MIN() function but still keep columns which can't be grouped.
I have table:
+----+----------+----------------------+
| ID | distance | geom |
+----+----------+----------------------+
| 1 | 2 | DSDGSAsd23423DSFF |
| 2 | 11.2 | SXSADVERG678BNDVS4 |
| 2 | 2 | XCZFETEFD567687SDF |
| 3 | 24 | SADASDSVG3423FD |
| 3 | 10 | SDFSDFSDF343DFDGF |
| 4 | 34 | SFDHGHJ546GHJHJHJ |
| 5 | 22 | SDFSGTHHGHGFHUKJYU45 |
| 6 | 78 | SDFDGDHKIKUI45 |
| 6 | 15 | DSGDHHJGHJKHGKHJKJ65 |
+----+----------+----------------------+
This is what I would like to achieve:
+----+----------+----------------------+
| ID | distance | geom |
+----+----------+----------------------+
| 1 | 2 | DSDGSAsd23423DSFF |
| 2 | 2 | XCZFETEFD567687SDF |
| 3 | 10 | SDFSDFSDF343DFDGF |
| 4 | 34 | SFDHGHJ546GHJHJHJ |
| 5 | 22 | SDFSGTHHGHGFHUKJYU45 |
| 6 | 15 | DSGDHHJGHJKHGKHJKJ65 |
+----+----------+----------------------+
This is possible when I use MIN() on the distance column and group by ID, but then I lose my geom column, which is essential.
The query looks like this:
SELECT "ID", MIN(distance) AS distance FROM somefile GROUP BY "ID"
the result is:
+----+----------+
| ID | distance |
+----+----------+
| 1 | 2 |
| 2 | 2 |
| 3 | 10 |
| 4 | 34 |
| 5 | 22 |
| 6 | 15 |
+----+----------+
but this is not what I want.
Any suggestions?
One common approach to this is to find the minimum values in a derived table that you join with:
SELECT somefile."ID", somefile.distance, somefile.geom
FROM somefile
JOIN (
SELECT "ID", MIN(distance) AS distance FROM somefile GROUP BY "ID"
) t ON t.distance = somefile.distance AND t."ID" = somefile."ID";
Sample SQL Fiddle
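If the fiddle is not reachable, the derived-table join can be tried locally. The sketch below loads the sample rows from the question and runs the join through Python's built-in sqlite3 module (an assumption for portability; the SQL itself is the same idea as in Postgres).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE somefile ("ID" INTEGER, distance REAL, geom TEXT)')
conn.executemany(
    'INSERT INTO somefile ("ID", distance, geom) VALUES (?, ?, ?)',
    [
        (1, 2, "DSDGSAsd23423DSFF"),
        (2, 11.2, "SXSADVERG678BNDVS4"),
        (2, 2, "XCZFETEFD567687SDF"),
        (3, 24, "SADASDSVG3423FD"),
        (3, 10, "SDFSDFSDF343DFDGF"),
        (4, 34, "SFDHGHJ546GHJHJHJ"),
        (5, 22, "SDFSGTHHGHGFHUKJYU45"),
        (6, 78, "SDFDGDHKIKUI45"),
        (6, 15, "DSGDHHJGHJKHGKHJKJ65"),
    ],
)

# Join each row to the per-ID minimum so the non-grouped geom column survives.
rows = conn.execute("""
    SELECT s."ID", s.distance, s.geom
    FROM somefile s
    JOIN (SELECT "ID", MIN(distance) AS distance
          FROM somefile
          GROUP BY "ID") t
      ON t."ID" = s."ID" AND t.distance = s.distance
    ORDER BY s."ID"
""").fetchall()

for row in rows:
    print(row)
```

If two rows of the same ID share the minimum distance, this join returns both of them.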
You can use a window function to do this:
SELECT "ID", distance, geom
FROM (
SELECT "ID", distance, geom, rank() OVER (PARTITION BY "ID" ORDER BY distance) AS rnk
FROM somefile) sub
WHERE rnk = 1;
This effectively orders the entire set of rows first by the "ID" value, then by the distance and returns the record for each "ID" where the distance is minimal - no need to do a GROUP BY.
select a.*, b.geom from
(SELECT "ID", MIN(distance) AS distance FROM somefile GROUP BY "ID") as a
inner join somefile as b on a."ID" = b."ID" and a.distance = b.distance
You can use PostgreSQL's DISTINCT ON clause.
select distinct on ("ID") "ID", distance, geom
from somefile
order by "ID", distance;
The ORDER BY has to start with the DISTINCT ON expression; the distance that follows decides which row is kept for each "ID".
I think this is exactly what you are looking for.
For more details on how DISTINCT ON works, refer to the documentation and the example.
But remember that DISTINCT ON is not part of the SQL standard.
Could anybody please help me with an SQL command?
I have a table (tbl_sActivity) that has the data below:
user_id | client_id | act_status |
1 | 7 | cold |
1 | 7 | dealed |
22 | 5 | cold |
1 | 6 | cold |
1 | 6 | warm |
1 | 6 | hot |
1 | 6 | dealed |
1 | 8 | warm |
1 | 8 | dealed |
21 | 4 | warm |
21 | 4 | dealed |
The output should be:
user_id | Count_C_id |
1 | 3 |
21 | 1 |
22 | 1 |
I've searched the net and learnt that MS Access does not support COUNT(DISTINCT ...). So I've been stuck at this stage for days.
Try this one. The "trick" is to have a subquery first to get all the distinct combinations of user and client IDs and then do the grouping per user:
SELECT
user_id
, COUNT(*) AS count_distinct_clients
FROM
( SELECT DISTINCT
user_id,
client_id
FROM tbl_sActivity
) AS tmp
GROUP BY
user_id ;
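To see what the DISTINCT subquery buys you, here is a small sketch that runs it on the sample data next to a plain COUNT(client_id). It uses Python's built-in sqlite3 module as a stand-in for MS Access (an assumption made only so the example can run anywhere).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_sActivity (user_id INTEGER, client_id INTEGER, act_status TEXT)")
conn.executemany(
    "INSERT INTO tbl_sActivity (user_id, client_id, act_status) VALUES (?, ?, ?)",
    [
        (1, 7, "cold"), (1, 7, "dealed"),
        (22, 5, "cold"),
        (1, 6, "cold"), (1, 6, "warm"), (1, 6, "hot"), (1, 6, "dealed"),
        (1, 8, "warm"), (1, 8, "dealed"),
        (21, 4, "warm"), (21, 4, "dealed"),
    ],
)

# De-duplicate (user_id, client_id) pairs first, then count per user.
distinct_counts = conn.execute("""
    SELECT user_id, COUNT(*) AS count_distinct_clients
    FROM (SELECT DISTINCT user_id, client_id FROM tbl_sActivity) AS tmp
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

# Without the DISTINCT step, every activity row is counted.
row_counts = conn.execute("""
    SELECT user_id, COUNT(client_id) AS activity_rows
    FROM tbl_sActivity
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(distinct_counts)
print(row_counts)
```

The first result matches the expected output from the question; the second counts every status change as a separate client.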
The recommendation is to write the query without a subquery.
The code below (T-SQL style, using a temp table) avoids the subquery. Note that count(c_id) counts every row, so it equals the number of distinct clients only when each (userId, c_id) pair appears once.
-- Temp table
CREATE TABLE #TempStudent(userId int, c_id int, Name varchar(MAX))
SELECT max(userId) AS UserId, count(c_id) AS C_ID FROM #TempStudent
GROUP BY userId