Select first three rows for each ID - sql

I have executed the following query:
SELECT ProductID, Quantity, Location
FROM DBLocations
ORDER BY ProductID, LocationDistanceIndex DESC;
From that result, I am trying to keep only the closest warehouses (up to 3) that stock each product, as determined by the LocationDistanceIndex column. A product may have no such warehouse, or only 1 or 2.
How would I write the query so that at most 3 records remain for each ProductID, namely the 3 with the highest LocationDistanceIndex (hence the descending ORDER BY)?
Also if there is a way to perform such filtering without manually written queries in MS Access, it would be great if somebody points that out.
Note: I tried using Row_Number() Over Partition but MS Access does not seem to support that.

Here is one method for MS Access:
SELECT l.*
FROM DBLocations l
WHERE l.LocationDistanceIndex IN (SELECT TOP 3 l2.LocationDistanceIndex
FROM DBLocations l2
WHERE l.ProductID = l2.ProductID
ORDER BY l2.LocationDistanceIndex DESC
);
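To see how the correlated subquery behaves, here is a minimal sketch that runs the same query shape against SQLite from Python. It is not Access: LIMIT 3 stands in for TOP 3, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DBLocations ("
    "ProductID INTEGER, Quantity INTEGER, Location TEXT, "
    "LocationDistanceIndex INTEGER)"
)
# Hypothetical sample rows: product 1 has four locations, product 2 has one.
conn.executemany(
    "INSERT INTO DBLocations VALUES (?, ?, ?, ?)",
    [
        (1, 10, "A", 90),
        (1, 5, "B", 80),
        (1, 7, "C", 70),
        (1, 2, "D", 60),
        (2, 4, "A", 50),
    ],
)

# Same shape as the Access query; LIMIT 3 replaces TOP 3 (SQLite syntax).
top3 = conn.execute(
    """
    SELECT l.ProductID, l.Location, l.LocationDistanceIndex
    FROM DBLocations l
    WHERE l.LocationDistanceIndex IN (
        SELECT l2.LocationDistanceIndex
        FROM DBLocations l2
        WHERE l2.ProductID = l.ProductID
        ORDER BY l2.LocationDistanceIndex DESC
        LIMIT 3
    )
    ORDER BY l.ProductID, l.LocationDistanceIndex DESC
    """
).fetchall()

for row in top3:
    print(row)
```

Because the outer filter matches on the index value, a product whose third and fourth locations share the same LocationDistanceIndex would get both rows back, so "up to 3" only holds when the values are distinct within a product.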

Related

Rank order ST_DWithin results by the number of radii a result appears in

I have a table of existing customers and another table of potential customers. I want to return a list of potential customers rank ordered by the number of radii of existing purchasers that they appear in.
There are many rows in the potential customers table per each existing customer, and the radius around a given existing customer could encompass multiple potential customers. I want to return a list of potential customers ordered by the count of the existing customer radii that they fall within.
SELECT pur.contact_id AS purchaser, count(pot.*) AS nearby_potential_customers
FROM purchasers_geocoded pur, potential_customers_geocoded pot
WHERE ST_DWithin(pur.geom,pot.geom,1000)
GROUP BY purchaser;
Does anyone have advice on how to proceed?
EDIT:
With some help, I wrote this query, which seems to do the job, but I'm verifying now.
WITH prequalified_leads_table AS (
SELECT *
FROM nearby_potential_customers
WHERE market_val > 80000
AND market_val < 120000
)
, proximate_to_existing AS (
SELECT pot.prop_id AS prequalified_leads
FROM purchasers_geocoded pur, prequalified_leads_table pot
WHERE ST_DWithin(pot.geom,pur.geom,100)
)
SELECT prequalified_leads, count(prequalified_leads)
FROM proximate_to_existing
GROUP BY prequalified_leads
ORDER BY count(*) DESC;
You wrote: "I want to return a list of potential customers ordered by the count of the existing customer radii that they fall within."
Your query does the opposite of that statement: it counts potential customers around existing ones.
Inverting that, and after adding some tweaks:
SELECT pot.contact_id AS potential_customer
, rank() OVER (ORDER BY pur.nearby_customers DESC
, pot.contact_id) AS rnk
, pur.nearby_customers
FROM potential_customers_geocoded pot
LEFT JOIN LATERAL (
SELECT count(*) AS nearby_customers
FROM purchasers_geocoded pur
WHERE ST_DWithin(pur.geom, pot.geom, 1000)
) pur ON true
ORDER BY 2;
I suggest a subquery with LEFT JOIN LATERAL ... ON true to get counts. Should make use of the spatial index that you undoubtedly have:
CREATE INDEX ON purchasers_geocoded USING gist (geom);
LEFT JOIN LATERAL ... ON true also retains rows with 0 nearby customers in the result; your original join style would exclude those. Related:
What is the difference between LATERAL and a subquery in PostgreSQL?
Then ORDER BY the resulting nearby_customers in the outer query (not: nearby_potential_customers).
It's not clear whether you want to add an actual rank. Use the window function rank() if so. I made the rank deterministic while being at it, breaking ties with an additional ORDER BY expression: pot.contact_id. Else, peers are returned in arbitrary order which can change for every execution.
ORDER BY 2 is short syntax for "order by the 2nd output column". See:
Select first row in each GROUP BY group?
Related:
How do I query all rows within a 5-mile radius of my coordinates?
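The counting and ranking logic can be tried out without PostGIS. The sketch below is only an approximation under stated assumptions: it uses SQLite from Python, plain x/y coordinates with a squared-distance comparison in place of ST_DWithin, a correlated scalar subquery in place of LEFT JOIN LATERAL (SQLite has no LATERAL), and invented table names and sample points.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-ins for the geocoded tables, with planar coordinates.
    CREATE TABLE purchasers (contact_id INTEGER, x REAL, y REAL);
    CREATE TABLE potential (contact_id INTEGER, x REAL, y REAL);
    INSERT INTO purchasers VALUES (1, 0, 0), (2, 3, 0), (3, 100, 100);
    INSERT INTO potential VALUES (10, 1, 0), (11, 50, 50), (12, 0, 1), (13, 3, 1);
    """
)

radius = 2.5  # assumed radius, in the same units as x and y

ranked = conn.execute(
    """
    SELECT contact_id,
           nearby_customers,
           rank() OVER (ORDER BY nearby_customers DESC, contact_id) AS rnk
    FROM (
        -- One row per potential customer, including those with a count of 0.
        SELECT pot.contact_id,
               (SELECT count(*)
                FROM purchasers pur
                WHERE (pur.x - pot.x) * (pur.x - pot.x)
                    + (pur.y - pot.y) * (pur.y - pot.y) <= :r * :r
               ) AS nearby_customers
        FROM potential pot
    )
    ORDER BY rnk
    """,
    {"r": radius},
).fetchall()

for row in ranked:
    print(row)
```

Potential customer 11 stays in the result with a count of 0, and the two customers with a count of 1 get distinct ranks because contact_id breaks the tie.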

How to modify query to walk entire table rather than a single

I wrote several SQL queries and executed them against my table. Each individual query worked. I kept adding functionality until I got a really ugly working query. The problem is that I have to manually change a value every time I want to use it. Can you assist in making this query automatic rather than “manual”?
I am working with DB2.
Table below shows customers (cid) from 1 to 3. 'club' is a book seller, and 'qnty' is the number of books the customer bought from each 'club'. The full table has 45 customers.
Image below shows all the table elements for the first 3 users (cid=1 OR cid=2 OR cid=3). The final purpose of all my queries (once combined) is to find the single 'club' with the largest 'qnty' for each 'cid'. So for 'cid=1' the 'club' is Readers Digest with 'qnty' of 3. For 'cid=2' the 'club' is YRB Gold with 'qnty' of 5. On and on until cid 45 is reached.
To give you a background on what I did here are my queries:
(Query 1-starting point for cid=1)
SELECT * FROM yrb_purchase WHERE cid=1
(Query 2 - find the 'club' with the highest 'qnty' for cid=1)
SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC
(Query 3 – combine the record from the above query with its cid)
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club
(Query 4) make sure there is only one record for cid=1
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club FETCH FIRST ROWS ONLY
To get the 'club' with the highest 'qnty' for customer 2, I would simply change the text cid=1 to cid=2 in the last query above. My query seems to always produce the correct results. My question is, how do I modify my query to get the results for all 'cid's from 1 to 45 in a single table? How do I get one table with all the cid values, along with the club which sold that cid the most books and how many books were sold? Please keep in mind that I am hoping you can modify my query, as opposed to providing a better one.
If you decide that my query is way too ugly (I agree with you) and choose to provide another query, please be aware that I just started learning SQL and may not be able to understand your query. You should be aware that I already asked this question: For common elements, how to find the value based on two columns? SQL but I was not able to make the answer work (due to my SQL limitations - not because the answer wasn't good); and in the absence of a working answer I could not reverse engineer it to understand how it works.
Thanks in advance
****************************EDIT #1*******************************************
The result of the answer is:
You could use OLAP/Window Functions to achieve this:
SELECT
cid,
club,
qnty
FROM
(
SELECT
cid,
club,
qnty,
ROW_NUMBER() OVER (PARTITION BY cid order by qnty desc) as cid_club_rank
FROM
(
SELECT
cid,
club,
sum(qnty) as qnty
FROM yrb_purchase
GROUP BY cid, club
) as sub1
) as sub2
WHERE cid_club_rank = 1
The innermost statement (sub1) just computes the total quantity for each cid/club combination. The next level out (sub2) assigns a row_number to each cid/club combination within its cid, ordered by quantity (highest first). The outermost query then keeps only the records where that row_number() is 1.
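For anyone who wants to step through it, here is a small sketch of the same query run against SQLite from Python rather than DB2 (an assumption made only so the example is self-contained). The sample rows are invented to match the totals described in the question; an ORDER BY cid is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yrb_purchase (cid INTEGER, club TEXT, qnty INTEGER)")
# Hypothetical purchases: cid 1 totals 3 at Readers Digest, cid 2 totals 5 at YRB Gold.
conn.executemany(
    "INSERT INTO yrb_purchase VALUES (?, ?, ?)",
    [
        (1, "Readers Digest", 1),
        (1, "Readers Digest", 2),
        (1, "Oprah", 1),
        (2, "YRB Gold", 5),
        (2, "Basic", 2),
    ],
)

best_club = conn.execute(
    """
    SELECT cid, club, qnty
    FROM (
        SELECT cid, club, qnty,
               ROW_NUMBER() OVER (PARTITION BY cid ORDER BY qnty DESC) AS cid_club_rank
        FROM (
            -- sub1: total quantity per customer and club
            SELECT cid, club, SUM(qnty) AS qnty
            FROM yrb_purchase
            GROUP BY cid, club
        ) AS sub1
    ) AS sub2
    WHERE cid_club_rank = 1
    ORDER BY cid
    """
).fetchall()

for row in best_club:
    print(row)
```

If two clubs tie for the highest total for a customer, ROW_NUMBER() picks one of them arbitrarily; adding club to the ORDER BY inside OVER would make that choice repeatable.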

MS Access Query to retrieve records with latest date in orderdate column

I have a table in an Access database with the following columns:
ProductID|ProductName|StoreID|StoreName|AuditRating|AuditVisit|NextAuditDue
100100 |Calculator |SC12345|CrawlyRoad| B |11/12/2013|21/02/2014
100100 |Calculator |SC12345|CrawlyRoad| A |11/12/2014|30/04/2015
100100 |Calculator |SC12345|CrawlyRoad| C |16/12/2015|24/01/2017
I need a query that returns only the record with the latest AuditVisit date. In this case, I want only the third row:
100100 |Calculator |SC12345|CrawlyRoad| C |16/12/2015|24/01/2017
I have used group by but as I need to bring all the columns I am getting all the records as the AuditRating column is different in all three rows.
You can use TOP 1:
Select Top 1 * From YourTable Order By AuditVisit Desc
You don't want to have groups, so why use group by?
SELECT
*
FROM your_table
WHERE AuditVisit = (SELECT MAX(AuditVisit) FROM your_table)
Pretty self-explanatory, I think.
If you want one record in MS Access, then you need to be very careful with SELECT TOP. It is really SELECT TOP WITH TIES.
Hence, the obvious answer of:
Select Top 1 *
From t
Order By AuditVisit Desc;
would return multiple rows if multiple rows have the same date. If you really want one, then you want to add a unique column as the last key in the order by:
Select Top 1 *
From t
Order By AuditVisit Desc, id;
I don't see such a key in your data, although you might have a combination of columns that are unique in each row (multiple columns can be added to the ORDER BY).
In MS Access -- even more so than in other databases -- primary keys are important on tables for this reason.
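The effect of ties can be illustrated with a small sketch. It runs on SQLite from Python, not Access, so two caveats apply: SQLite's LIMIT never returns ties (unlike TOP in Access), and the id column, the ISO-formatted dates, and the duplicate-date row are assumptions added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audits ("
    "id INTEGER PRIMARY KEY, ProductID INTEGER, AuditRating TEXT, AuditVisit TEXT)"
)
# Dates stored as ISO text so they sort correctly; row 4 is a made-up tie.
conn.executemany(
    "INSERT INTO audits VALUES (?, ?, ?, ?)",
    [
        (1, 100100, "B", "2013-12-11"),
        (2, 100100, "A", "2014-12-11"),
        (3, 100100, "C", "2015-12-16"),
        (4, 100100, "D", "2015-12-16"),
    ],
)

# Filtering on MAX() keeps every row that shares the latest date.
with_ties = conn.execute(
    "SELECT id FROM audits "
    "WHERE AuditVisit = (SELECT MAX(AuditVisit) FROM audits) "
    "ORDER BY id"
).fetchall()

# A unique last sort key makes the choice of a single row deterministic.
single = conn.execute(
    "SELECT id FROM audits ORDER BY AuditVisit DESC, id LIMIT 1"
).fetchall()

print(with_ties)
print(single)
```

The first query shows the multi-row result you should expect from a tie; the second shows how a unique key in the ORDER BY pins down exactly one row.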

Update Table to order child records by date

I have two tables:
PARENT (EMAIL,NAME,ETC)
CHILD (EMAIL,DOC_DOC_ID,DOWNLOAD_DATE,RANK)
I need to generate a query that will update the CHILD.RANK Field, with a numerical sorting that will rank each distinct DOC_ID by the date that it was downloaded (1 = latest doc download)
SELECT
P.EMAIL,
C.DOC_ID,
MAX(C.DOWNLOAD_DATE)
FROM
PARENT P,
CHILD C
WHERE
P.EMAIL = C.EMAIL
Please don't laugh at what I have come up with so far! I think my brain is fried!
If you are using Rank_ID for more than display (which should be left to the queries) your design may have issues.
Have you considered what would happen if you checked out DOC_ID = 1 today and then ran an update to give it rank one and then the same thing happened tomorrow and you now have two records for DOC_ID = 1 with a RANK of 1?
You could use something like this to just display the records in the correct order. Query 1 will just display the records in order. Query 2 will add a Rank value (requires the first query).
QUERY 1:
SELECT
LAST(EMAIL) AS EMAIL,
DOC_DOC_ID,
Max(DOWNLOAD_DATE) AS DOWNLOAD_DATE
FROM
CHILD
GROUP BY
DOC_DOC_ID
ORDER BY
Max(DOWNLOAD_DATE) DESC;
QUERY 2:
SELECT
testing.EMAIL,
testing.DOC_DOC_ID,
testing.DOWNLOAD_DATE,
(select
count(*)
from
Query1
where
DOWNLOAD_DATE>testing.DOWNLOAD_DATE)+1 AS RANK
FROM
Query1 as testing
ORDER BY
testing.DOWNLOAD_DATE DESC;
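Here is a rough sketch of the two-step idea, run on SQLite from Python instead of Access. The assumptions: a common table expression named latest plays the role of the saved Query1, the EMAIL column is left out because LAST() is Access-specific, the rank column is named doc_rank, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CHILD (EMAIL TEXT, DOC_DOC_ID INTEGER, DOWNLOAD_DATE TEXT)"
)
# Hypothetical downloads; document 1 was downloaded twice.
conn.executemany(
    "INSERT INTO CHILD VALUES (?, ?, ?)",
    [
        ("a@example.com", 1, "2020-01-01"),
        ("a@example.com", 1, "2020-03-01"),
        ("b@example.com", 2, "2020-02-01"),
        ("c@example.com", 3, "2020-01-15"),
    ],
)

ranked_docs = conn.execute(
    """
    WITH latest AS (
        -- Step 1: latest download date per document
        SELECT DOC_DOC_ID, MAX(DOWNLOAD_DATE) AS DOWNLOAD_DATE
        FROM CHILD
        GROUP BY DOC_DOC_ID
    )
    -- Step 2: rank = number of documents with a later date, plus 1
    SELECT t.DOC_DOC_ID,
           t.DOWNLOAD_DATE,
           (SELECT COUNT(*) FROM latest q
            WHERE q.DOWNLOAD_DATE > t.DOWNLOAD_DATE) + 1 AS doc_rank
    FROM latest t
    ORDER BY t.DOWNLOAD_DATE DESC
    """
).fetchall()

for row in ranked_docs:
    print(row)
```

Documents that share the same latest date would receive the same rank with this counting approach, which is usually acceptable for display purposes.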

Find row number in a sort based on row id, then find its neighbours

Say that I have some SELECT statement:
SELECT id, name FROM people
ORDER BY name ASC;
I have a few million rows in the people table and the ORDER BY clause can be much more complex than what I have shown here (possibly operating on a dozen columns).
I retrieve only a small subset of the rows (say rows 1..11) in order to display them in the UI. Now, I would like to solve following problems:
Find the number of a row with a given id.
Display the 5 items before and the 5 items after a row with a given id.
Problem 2 is easy to solve once I have solved problem 1, as I can then use something like this if I know that the item I was looking for has row number 1000 in the sorted result set (this is the Firebird SQL dialect):
SELECT id, name FROM people
ORDER BY name ASC
ROWS 995 TO 1005;
I also know that I can find the rank of a row by counting all of the rows which come before the one I am looking for, but this can lead to very long WHERE clauses with tons of OR and AND in the condition. And I have to do this repeatedly. With my test data, this takes hundreds of milliseconds, even when using properly indexed columns, which is way too slow.
Is there some means of achieving this by using SQL:2003 features (such as row_number, supported in Firebird 3.0)? I am by no means an SQL guru and I need some pointers here. Could I create a cached view where the result would include a rank/dense rank/row index?
Firebird appears to support window functions (called analytic functions in Oracle). So you can do the following:
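To find the "row" number of a row with a given id, number all rows in a subquery and filter on the id in the outer query (a WHERE clause at the same query level is applied before row_number() is computed, so it would always yield 1):
select rownum
from (select id, row_number() over (partition by NULL order by name, id) as rownum
from t
) t
where id = <id>
This assumes the id's are unique.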
To solve the second problem:
select t.*
from (select id, row_number() over (partition by NULL order by name, id) as rownum
from t
) t join
-- filter on id outside the numbering subquery; otherwise rownum is always 1
(select rownum
from (select id, row_number() over (partition by NULL order by name, id) as rownum
from t
) x
where id = <id>
) tid
on t.rownum between tid.rownum - 5 and tid.rownum + 5
I might suggest something else, though, if you can modify the table structure. Most databases offer the ability to add an auto-increment column when a row is inserted. If your records are never deleted, this can serve as your counter, simplifying your queries.
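As a minimal sketch of the window-function approach, the following runs on SQLite from Python rather than Firebird. The people rows, the target id, and the parameter names :id and :k are all made up for illustration. The id filter sits outside the numbering step, because a WHERE clause is evaluated before row_number().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical data: ids 101..120 with names p01..p20, so sort order is obvious.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(100 + i, f"p{i:02d}") for i in range(1, 21)],
)

target_id = 110  # row whose position and neighbours we want
k = 5            # number of neighbours on each side

neighbours = conn.execute(
    """
    WITH ranked AS (
        SELECT id, name, row_number() OVER (ORDER BY name, id) AS rn
        FROM people
    )
    SELECT r.id, r.name, r.rn
    FROM ranked r
    JOIN (SELECT rn FROM ranked WHERE id = :id) target
      ON r.rn BETWEEN target.rn - :k AND target.rn + :k
    ORDER BY r.rn
    """,
    {"id": target_id, "k": k},
).fetchall()

# Position of the target row within the sorted result.
target_rn = next(rn for (pid, _name, rn) in neighbours if pid == target_id)

print(target_rn)
for row in neighbours:
    print(row)
```

Note that this still numbers every row on each call, so on a table with millions of rows it is worth measuring before relying on it.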