Filtering Database Results to Top n Records for Each Value in a Lookup Column - sql

Let's say I have two tables in my database.
TABLE:Categories
ID|CategoryName
01|CategoryA
02|CategoryB
03|CategoryC
and a table that references the Categories and also has a column storing some random number.
TABLE:CategoriesAndNumbers
CategoryType|Number
CategoryA|24
CategoryA|22
CategoryC|105
.....(20,000 records)
CategoryB|3
Now, how do I filter out this data? So, I want to know what the 3 smallest numbers are out of each category and delete the rest. The end result would be like this:
TABLE:CategoriesAndNumbers
CategoryType|Number
CategoryA|2
CategoryA|5
CategoryA|18
CategoryB|3
CategoryB|500
CategoryB|1601
CategoryC|1
CategoryC|4
CategoryC|62
Right now, I can get the smallest numbers between all the categories, but I would like each category to be compared individually.
EDIT: I'm using Access and here's my code so far
SELECT TOP 10 cdt1.sourceCounty, cdt1.destCounty, cdt1.distMiles
FROM countyDistanceTable as cdt1, countyTable
WHERE cdt1.sourceCounty = countyTable.countyID
ORDER BY cdt1.sourceCounty, cdt1.distMiles, cdt1.destCounty
EDIT2: Thanks to Remou, here would be the working query that solved my problem. Thank you!
DELETE
FROM CategoriesAndNumbers a
WHERE a.Number NOT IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number])

You could use something like:
SELECT a.CategoryType, a.Number
FROM CategoriesAndNumbers a
WHERE a.Number IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number])
ORDER BY a.CategoryType
The difficulty with this is that Jet/ACE Top selects duplicate values where they exist, so you will not necessarily get three values, but more, if there are ties. The problem can often be solved with a key field, if one exists :
WHERE a.Number IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number], [KeyField])
However, I do not think it will help in this instance, because the outer table will include ties.

Order it by number and take 3, find out what the biggest number is and then remove rows where Number is greater than the Number.
I imagine it would need to be two seperate queries as your business tier would hold the value for the biggest number out of the 3 results and dynamically build the query to delete the rest.

Related

MS Access Query to retrieve records with lasted date in orderdate column

I have table in Access Database with the following columns
ProductID|ProductName|StoreID|StoreName|AuditRating|AuditVisit|NextAuditDue
100100 |Calculator |SC12345|CrawlyRoad| B |11/12/2013|21/02/2014
100100 |Calculator |SC12345|CrawlyRoad| A |11/12/2014|30/04/2015
100100 |Calculator |SC12345|CrawlyRoad| C |16/12/2015|24/01/2017
I need to make a query which will only give me the distinct record where the AuditVisit date is maximum like in this case I only want the third row
100100 |Calculator |SC12345|CrawlyRoad| C |16/12/2015|24/01/2017
I have used group by but as I need to bring all the columns I am getting all the records as the AuditRating column is different in all three rows.
You can use TOP 1:
Select Top 1 * From YourTable Order By AuditVisit Desc
You don't want to have groups, so why use group by?
SELECT
*
FROM your_table
WHERE AuditVisit = (SELECT MAX(AuditVisit) FROM your_table)
Pretty self-explaining, I think.
If you want one record in MS Access, then you need to be very careful with SELECT TOP. It is really SELECT TOP WITH TIES.
Hence, the obvious answer of:
Select Top 1 *
From t
Order By AuditVisit Desc;
would return multiple rows if multiple rows have the same date. If you really want one, then you want to add a unique column as the last key in the order by:
Select Top 1 *
From t
Order By AuditVisit Desc, id;
I don't see such a key in your data, although you might have a combination of columns that are unique in each row (multiple columns can be added to the ORDER BY).
In MS Access -- even more so than in other databases -- primary keys are important on tables for this reason.

SELECT TOP 1 is returning multiple records

I shall link my database down below.
I have a query called 'TestMonday1' and what this does is return the student with the fewest 'NoOfFrees' and insert the result of the query into the lesson table. Running the query should help explain what i mean. The problem im having is my SQL code has 'SELECT TOP 1' yet if the query returns two students who have the same number of frees it returns both these records. Wit this being a timetable planner, it should only ever return one result, i shall also put the code below,
Many thanks
Code:
INSERT INTO Lesson ( StudentID, LessonStart, LessonEnd, DayOfWeek )
SELECT TOP 1 Availability.StudentID, Availability.StartTime,
Availability.EndTime, Availability.DayOfWeek
FROM Availability
WHERE
Availability.StartTime='16:00:00' AND
Availability.EndTime='18:00:00' AND
Availability.DayOfWeek='Monday' AND
LessonTaken IS NULL
ORDER BY
Availability.NoOfFrees;
This happens because Access returns all records in case of ties in ORDER BY (all records returned have the same values of fields used in ORDER BY).
You can add another field to ORDER BY to make sure there's no ties. StudentID looks like a good candidate (though I don't know your schema, replace with something else if it suits better):
ORDER BY
Availability.NoOfFrees, Availability.StudentID;

SQL Query to fetch information based on one or more condition. Getting combinations instead of exact number

I have two tables. Table 1 has about 750,000 rows and table 2 has 4 million rows. Table two has an extra ID field in which I am interested, so I want to write a query that will check if the 750,000 table 1 records exist in table 2. For all those rows in table 1 that exist in table 2, I want the respective ID based on same SSN. I tried the following query:
SELECT distinct b.UID, a.*
FROM [Analysis].[dbo].[Table1] A, [Proteus_8_2].dbo.Table2 B
where a.ssn = b.ssn
Instead of getting 750,000 rows in the output, I am getting 5.4 million records. Where am i going wrong?
Please help?
You're requesting all the rows in your select if b.UID is a unique field in column two.
Also if SSN is not unique in table one you can get the higher row count than the total row count for table 2.
You need to consider what you want from table 2 again.
EDIT
You can try this to return distinct combinations of ssn and uid when ssn is found in table 2 provided that ssn and uid have a cardinality of 1:1, i.e., every unique ssn has a single unique uid.
select distinct
a.ssn,b.[UID]
from [Analysis].[dbo].[Table1] a
cross apply
( select top 1 [uid] from [Proteus_8_2].[dbo].[Table2] where ssn = a.ssn ) b
where b.[UID] is not null
Try with LEFT JOIN
SELECT distinct b.UID, a.*
FROM [Analysis].[dbo].[Table1] A LEFT JOIN [Proteus_8_2].dbo.Table2 B
on a.ssn = b.ssn
Since the order detail table is in a one-many relationship to the order table, that is the expected result of any join. If you want something different, you need to define for us the business rule that will tell us how to select only one record from the Order detail table. You cannot effectively write SQL code without understanding the business rules that of what you are trying to achieve. You should never just willy nilly select one record out of the many, you need to understand which one you want.

Full text search across columns

Sorry for the bad post title but I couldn't summarize this better.
It's better to use an example. Say I have this simple table with two text columns (I'm leaving the other columns out).
Id Text_1 Text_2
1 a a b
2 c a b
Now if I want to search for '"a" and not "b"', in my current implementation I'm getting record 1 back. I understand why this is, it's because the search condition is a match on column "Text_1", while for record 2 it's not a match on any column.
However, for the end user this may not be intuitive, as they probably mean to exclude record 1 as well most of the time.
So my question is, if I want to tell SQL Server to do the matching "across all columns" (meaning that if the "NOT" portion is found on ANY column, the record shouldn't match), is it possible?
EDIT: This is what my query would look like for this example:
SELECT Id, TextHits.RANK Rank, Text_1, Text_2 FROM simple_table
JOIN CONTAINSTABLE(simple_table, (Text_1, Text_2), '"a" and not "b"') TextHits
ON TextHits.[KEY] = simple_table.Id
ORDER BY Rank DESC
The actual query is a bit more complicated (more columns, more joins, etc) but this is the general idea :)
Thanks!
The logic is going to be evaluated against each record so if you want an exclusion hit from one record in a row to cause an exclusion on the row you should use a NOT EXISTS and break out the fullText query into separate inclusionary and exclusionary parts...
SELECT Id,
TextHits.RANK Rank,
Text_1,
Text_2
FROM simple_table
JOIN CONTAINSTABLE(simple_table, (Text_1, Text_2), '"a"') TextHits
ON TextHits.[KEY] = simple_table.Id
WHERE NOT EXISTS (SELECT 1
FROM CONTAINSTABLE(simple_table, (Text_1, Text_2), '"b"') exclHits
WHERE TextHits.[KEY] = exclHits.[KEY])
ORDER BY Rank DESC

Find row number in a sort based on row id, then find its neighbours

Say that I have some SELECT statement:
SELECT id, name FROM people
ORDER BY name ASC;
I have a few million rows in the people table and the ORDER BY clause can be much more complex than what I have shown here (possibly operating on a dozen columns).
I retrieve only a small subset of the rows (say rows 1..11) in order to display them in the UI. Now, I would like to solve following problems:
Find the number of a row with a given id.
Display the 5 items before and the 5 items after a row with a given id.
Problem 2 is easy to solve once I have solved problem 1, as I can then use something like this if I know that the item I was looking for has row number 1000 in the sorted result set (this is the Firebird SQL dialect):
SELECT id, name FROM people
ORDER BY name ASC
ROWS 995 TO 1005;
I also know that I can find the rank of a row by counting all of the rows which come before the one I am looking for, but this can lead to very long WHERE clauses with tons of OR and AND in the condition. And I have to do this repeatedly. With my test data, this takes hundreds of milliseconds, even when using properly indexed columns, which is way too slow.
Is there some means of achieving this by using some SQL:2003 features (such as row_number supported in Firebird 3.0)? I am by no way an SQL guru and I need some pointers here. Could I create a cached view where the result would include a rank/dense rank/row index?
Firebird appears to support window functions (called analytic functions in Oracle). So you can do the following:
To find the "row" number of a a row with a given id:
select id, row_number() over (partition by NULL order by name, id)
from t
where id = <id>
This assumes the id's are unique.
To solve the second problem:
select t.*
from (select id, row_number() over (partition by NULL order by name, id) as rownum
from t
) t join
(select id, row_number() over (partition by NULL order by name, id) as rownum
from t
where id = <id>
) tid
on t.rownum between tid.rownum - 5 and tid.rownum + 5
I might suggest something else, though, if you can modify the table structure. Most databases offer the ability to add an auto-increment column when a row is inserted. If your records are never deleted, this can server as your counter, simplifying your queries.