Ignore duplicates in results of a select statement based upon secondary column

Sorry if the title is a bit confusing, this is my first time posting.
Essentially, I have a table called roombooking, where a room has a room number(r_no), a bookingref (b_ref) and a checkin and checkout date (checkin and checkout respectively). Due to multiple different b_refs, an r_no appears in the table multiple times, with varying checkin and checkout dates.
What I want is to select all r_nos where checkin != "dateX", showing only rooms for which neither the row itself nor any of its duplicates has "dateX" in the checkin column.
Some example data:
R_NO B_REF CHECKIN
101 999 2019-09
101 998 2019-08
102 997 2019-07
What I essentially want to see when I run my SQL statement (where dateX = 2019-09) is for it to select only 102: although 101 (b_ref 998) has a different checkin date, its duplicate has 2019-09 in the checkin column, so neither row should appear as a result.
For those wondering, my current SQL is:
SELECT DISTINCT r_no
from roombooking
where checkin != '2019-09';
However (using the example data) this would return both 101 and 102 as results, which I don't want.
Hopefully, this is clear, and again I apologize if not, it's my first time posting.

Break it down into 2 conditions to apply as your filters -
checkin should not equal the specified date
r_no should not be the same r_no in rows where checkin is equal to the specified date
For example,
SELECT DISTINCT r_no FROM roombooking
WHERE
checkin != '2019-09' AND
r_no NOT IN (SELECT DISTINCT r_no FROM roombooking WHERE checkin = '2019-09')
There are multiple ways to achieve this, depending on your use-case and data size. A few options are
Use a sub-query to select duplicate rooms and eliminate them in your main query, as shown above
Use a CTE to select duplicate rooms and eliminate them in your main query by joining with the CTE
Self join on the same table to eliminate duplicate rooms
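The self-join option listed above can be sketched as an anti-join. This is a minimal illustration, assuming SQLite driven from Python's sqlite3 module and an in-memory copy of the example rows from the question; the original poster's database engine is not stated, so treat the setup as an assumption.

```python
import sqlite3

# In-memory table holding the example rows from the question (assumed setup).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roombooking (r_no INTEGER, b_ref INTEGER, checkin TEXT)")
conn.executemany(
    "INSERT INTO roombooking VALUES (?, ?, ?)",
    [(101, 999, "2019-09"), (101, 998, "2019-08"), (102, 997, "2019-07")],
)

# Anti-join: pair each row with any booking of the same room on the excluded
# date, then keep only rooms for which no such booking was found.
query = """
SELECT DISTINCT r1.r_no
FROM roombooking AS r1
LEFT JOIN roombooking AS r2
       ON r2.r_no = r1.r_no
      AND r2.checkin = ?
WHERE r2.r_no IS NULL
ORDER BY r1.r_no
"""
rooms = [row[0] for row in conn.execute(query, ("2019-09",))]
print(rooms)
```

Room 101 drops out because one of its bookings matches the excluded date, which leaves only 102.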

As far as I understand your requirements, an easy way to go is
select distinct r_no from roombooking r1
where not exists (
select * from roombooking r2
where r1.r_no = r2.r_no
and r2.checkin = '2019-09'
)

Related

SQLite - Output count of all records per day including days with 0 records

I have a sqlite3 database maintained on an AWS exchange that is regularly updated by a Python script. One of the things it tracks is when any team generates a new post for a given topic. The entries look something like this:
id   client           team      date        industry      city
895  acme industries  blueteam  2022-06-30  construction  springfield
I'm trying to create a table that shows me how many entries for construction occur each day. Right now, the entries with data populate, but they exclude dates with no entries. For example, if I search for just
SELECT date, count(id) as num_records
from mytable
WHERE industry = 'construction'
group by date
order by date asc
I'll get results that look like this:
date        num_records
2022-04-01  3
2022-04-04  1
How can I make sqlite output like this:
date        num_records
2022-04-01  3
2022-04-02  0
2022-04-03  0
2022-04-04  1
I'm trying to generate some graphs from this data and need to be able to include all dates for the target timeframe.
EDIT/UPDATE:
The table does not already include every date; it only includes dates relevant to an entry. If no team posts work on a day, the date column will jump from day 1 (e.g. 2022-04-01) to day 3 (2022-04-03).

Assuming your "mytable" table contains every date you need (for any industry), you can first select all distinct dates, then LEFT JOIN them to your own query, and map the resulting NULL values in the "num_records" field to 0 using the COALESCE function.
WITH cte AS (
SELECT date,
COUNT(id) AS num_records
FROM mytable
WHERE industry = 'construction'
GROUP BY date
)
SELECT dates.date,
COALESCE(cte.num_records, 0) AS num_records
FROM (SELECT DISTINCT date FROM mytable) dates
LEFT JOIN cte
ON dates.date = cte.date
ORDER BY dates.date
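Since the question's update says the table skips days on which nobody posted, one possible alternative is to generate the calendar instead of reading it from the table. The sketch below assumes SQLite's recursive CTE support and its date() function, run through Python's sqlite3 module; the sample rows, the calendar CTE name, and the first_day/last_day parameters are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, client TEXT, team TEXT,"
    " date TEXT, industry TEXT, city TEXT)"
)
# Hypothetical sample rows: three construction posts on 04-01, one on 04-04,
# and one unrelated retail post on 04-02.
conn.executemany(
    "INSERT INTO mytable (client, team, date, industry, city) VALUES (?, ?, ?, ?, ?)",
    [
        ("acme industries", "blueteam", "2022-04-01", "construction", "springfield"),
        ("acme industries", "redteam", "2022-04-01", "construction", "springfield"),
        ("acme industries", "blueteam", "2022-04-01", "construction", "springfield"),
        ("acme industries", "blueteam", "2022-04-02", "retail", "springfield"),
        ("acme industries", "blueteam", "2022-04-04", "construction", "springfield"),
    ],
)

# Build one row per day of the target timeframe, then LEFT JOIN the posts so
# that days without a matching post still appear with a count of 0.
query = """
WITH RECURSIVE calendar(day) AS (
    SELECT :first_day
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < :last_day
)
SELECT calendar.day AS date,
       COUNT(mytable.id) AS num_records
FROM calendar
LEFT JOIN mytable
       ON mytable.date = calendar.day
      AND mytable.industry = 'construction'
GROUP BY calendar.day
ORDER BY calendar.day
"""
params = {"first_day": "2022-04-01", "last_day": "2022-04-04"}
rows = conn.execute(query, params).fetchall()
for row in rows:
    print(row)
```

The industry filter sits in the ON clause rather than in WHERE; putting it in WHERE would discard the empty days again.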

Find the latest record

In MS Access I have the DateList table, which holds the due date of different orders. Thus, the table has two columns: OrderNo and DueDate. For some order numbers, there could be multiple DueDates. The table could look like below:
OrderNo DueDate
100 12/9/2021
101 20/9/2021
102 30/9/2021
100 7/10/2021
102 11/10/2021
103 15/10/2021
…
My goal is to write a query that fetches the latest DueDate of each OrderNo.
I created two queries:
the first one, qry1, generates a list of OrderNo values without duplicates:
SELECT
DateList.OrderNo AS UniqOrderNo
FROM DateList
GROUP BY DateList.OrderNo;
in the second query, qry2, I used the DMax function in order to search through DueDates of each order for the maximum value.
SELECT
qry1.UniqOrderNo
,DMax("[DueDate]","[DateList]","[OrderNo]='[qry1]![UniqOrderNo]'") AS LatDuDate
FROM qry1
INNER JOIN DateList
ON qry1.UniqOrderNo = DateList.OrderNo;
LatDuDate represents the latest DueDate of the Order.
Unfortunately, the query does not work and returns nothing.
Now my questions:
Is there something wrong with my approach / queries?
Is there better way to accomplish this task in MS Access?

You almost figured it out yourself. Max returns the largest value in each group.
SELECT Max(DueDate) DueDate, OrderNo
FROM DateList
GROUP BY OrderNo

Similar to Christian's answer. When you group by a unique id, you can select First() of the other columns instead of adding them to the GROUP BY, which performs better. ** In DateList, though, OrderNo repeats, so it still has to be the grouping column.
Of course it depends on the number of records the table holds.
SELECT OrderNo, Max(DueDate) AS DueDate
FROM DateList
GROUP BY OrderNo;
** Source: Allen Browne - Optimizing queries

Query: Find the earliest date with a null entry which still works with no null entries

I am building a query that displays the next due date for users. The number of dates stays the same, but the due date will change, depending on how many records have a comment from the user.
SELECT tbl_Description.Sample,
tbl_Description.User,
Min(tbl_Data.TestDate) As DueDate
FROM tbl_Description
INNER
JOIN tbl_Data
ON tbl_Description.DescriptionID = tbl_Data.DateID
WHERE tbl_Data.Comment IS NULL
GROUP
BY tbl_Description.Sample,
tbl_Description.User;
However, when every record from tbl_Data has a comment, the query returns no records. This presumably happens because the WHERE ... IS NULL condition matches nothing once every record has a comment. Preferably, I would still like the record to appear with something in the [DueDate] field, such as a blank or a “Complete” label.
tbl_Description
Sample User
1 Betty
tbl_Data (v1)
Date Comments
05/01/2018 Orange
05/08/2018 Orange-Brown
05/15/2018
05/22/2018
Query Output
Sample User DueDate
1 Betty 05/15/2018
tbl_Data (v2)
Date Comments
05/01/2018 Orange
05/08/2018 Orange-Brown
05/15/2018 Brown
05/22/2018 Brown-Black
Query Output (Query returns nothing at the moment)
Sample User DueDate
1 Betty Complete
Any help would be appreciated!

Now that we know we can also group by DescriptionID, I suggest querying the minimum TestDate for records from tbl_Data without a Comment separately, and then (outer) joining the result to the table tbl_Description. This way, each Description is contained in the result, and when all Dates for a Description have a Comment, the DueDate will appear blank:
SELECT tbl_Description.Sample, tbl_Description.User, Uncommented.DueDate
FROM tbl_Description
LEFT JOIN (
SELECT tbl_Data.DateID, Min(tbl_Data.TestDate) AS DueDate
FROM tbl_Data
WHERE (((tbl_Data.Comment) Is Null))
GROUP BY tbl_Data.DateID
) AS Uncommented ON tbl_Description.DescriptionID = Uncommented.DateID;
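To get the “Complete” label the question asks for instead of a blank, the joined value can be wrapped in a NULL-replacing function. The sketch below is an illustration only: it runs against SQLite through Python's sqlite3 module rather than Access, uses COALESCE (in Access, the Nz function would likely play this role), and the second sample with user 'Alan' is invented so that both cases show up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Description (DescriptionID INTEGER, Sample INTEGER, User TEXT);
CREATE TABLE tbl_Data (DateID INTEGER, TestDate TEXT, Comment TEXT);
INSERT INTO tbl_Description VALUES (1, 1, 'Betty'), (2, 2, 'Alan');
-- Sample 1: every test date has a comment. Sample 2: two dates are still open.
INSERT INTO tbl_Data VALUES
    (1, '2018-05-01', 'Orange'), (1, '2018-05-08', 'Orange-Brown'),
    (1, '2018-05-15', 'Brown'),  (1, '2018-05-22', 'Brown-Black'),
    (2, '2018-05-01', 'Orange'), (2, '2018-05-08', 'Orange-Brown'),
    (2, '2018-05-15', NULL),     (2, '2018-05-22', NULL);
""")

# The derived table finds the earliest uncommented date per DateID; the outer
# join keeps descriptions with no open dates and labels them 'Complete'.
query = """
SELECT d.Sample, d.User, COALESCE(u.DueDate, 'Complete') AS DueDate
FROM tbl_Description AS d
LEFT JOIN (
    SELECT DateID, MIN(TestDate) AS DueDate
    FROM tbl_Data
    WHERE Comment IS NULL
    GROUP BY DateID
) AS u ON d.DescriptionID = u.DateID
ORDER BY d.Sample
"""
due = conn.execute(query).fetchall()
print(due)
```

Dates are stored as ISO text here so that MIN picks the earliest one; a true date column in Access would not need that.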

Given a single column of effective dates, is there a SQL statement that can transform that into date ranges?

Similar to another question I've posted, given the following table...
Promo EffectiveDate
------ -------------
PromoA 1/1/2016
PromoB 4/1/2016
PromoC 7/1/2016
PromoD 10/1/2016
PromoE 1/1/2017
What is the easiest way to transform it into start and end dates, like so...
Promo StartDate EndDate
------ --------- ---------
PromoA 1/1/2016 4/1/2016
PromoB 4/1/2016 7/1/2016
PromoC 7/1/2016 10/1/2016
PromoD 10/1/2016 1/1/2017
PromoE 1/1/2017 null (ongoing until a new Effective Date is added)
Update
Correlated queries seem to be the simplest solution, but as I understand it, they are extremely inefficient since the subquery has to run once per row of the outer select.
What I was thinking as a potential solution was something along the lines of selecting the values from the table a second time, but eliminating the first result, then pairing them up with the first select by ordinal index with a simple outer left join.
As an example, substituting letters for dates above, the first select would be like A,B,C,D,E and second would be B,C,D,E (which is the first select minus the first record 'A') then pairing them up by ordinal index with a simple outer left join, resulting in A-B, B-C, C-D, D-E, E-null. However I couldn't figure out the syntax to make that work.

A correlated sub-query can lookup the additional field you need.
SELECT
yourTable.*,
(
SELECT MIN(lookup.EffectiveDate)
FROM yourTable AS lookup
WHERE lookup.EffectiveDate > yourTable.EffectiveDate
)
FROM
yourTable
EDIT
The notion of "has to run once per row" is a misunderstanding of how SQL generates the execution plan that actually runs. The same could be said of joining one table to another: the join has to be run at least once per row... There is indeed a larger cost to a correlated sub-query, but with appropriate indexes it won't be "extremely high", and the functionality described does warrant it.
If you had another field that was guaranteed to be sequential, then it would be trivial, but do not try to re-use the existing Promo field for that additional purpose.
SELECT
this.*,
next.EffectiveDate AS EndDate
FROM
yourTable this
LEFT JOIN
yourTable next
ON next.sequential_id = this.sequential_id + 1

Yes, you can use a correlated query with LIMIT :
SELECT t.promo,t.effectiveDate as start_date,
(SELECT s.effectiveDate FROM YourTable s
WHERE s.effectiveDate > t.effectiveDate
ORDER BY s.effectiveDate
LIMIT 1) as end_date
FROM YourTable t
EDIT: Here is a solution with a join :
SELECT t.promo,t.effectiveDate as start_date,
MIN(s.effectiveDate) as end_date
FROM YourTable t
LEFT JOIN YourTable s
ON(t.effectiveDate < s.effectiveDate)
GROUP BY t.promo,t.effectiveDate

Try this; it uses a subquery:
select
p.promo,
p.EffectiveDate as "Start",
(select n.EffectiveDate from table_promo n where n.EffectiveDate >
p.EffectiveDate order by n.EffectiveDate limit 1) as "End"
from table_promo p
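The "pair each row with the next one by position" idea from the question's update corresponds closely to the LEAD window function, where the engine supports it. The sketch below is an assumption-laden illustration: it uses SQLite (window functions need version 3.25 or later) via Python's sqlite3 module, a hypothetical table name promos, and ISO-formatted dates so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE promos (Promo TEXT, EffectiveDate TEXT)")
conn.executemany(
    "INSERT INTO promos VALUES (?, ?)",
    [
        ("PromoA", "2016-01-01"),
        ("PromoB", "2016-04-01"),
        ("PromoC", "2016-07-01"),
        ("PromoD", "2016-10-01"),
        ("PromoE", "2017-01-01"),
    ],
)

# LEAD looks one row ahead in EffectiveDate order, so each promo's end date
# is the next promo's start date; the last row has no successor and gets NULL.
query = """
SELECT Promo,
       EffectiveDate AS StartDate,
       LEAD(EffectiveDate) OVER (ORDER BY EffectiveDate) AS EndDate
FROM promos
ORDER BY EffectiveDate
"""
ranges = conn.execute(query).fetchall()
for row in ranges:
    print(row)
```

No self-join or correlated subquery is needed, and the ongoing promo keeps a NULL end date until a new effective date is added.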

select and delete query based on older entries

I have an Excel sheet that is pushing data to an Access database using ADO. It is essentially putting invoices into a database. Sometimes I will revise my invoice and therefore the database will end up with the same invoice twice. I need to make a select and delete query that will find duplicates based on the invoice number, and delete the older version of the invoice (older record), for a simple example:
id invoice# total item datestamp
1 1234 456.29$ shoes 06/06/2016 03:51
2 1234 78.58$ boots 06/06/2016 03:51
3 1234 22.74$ scarf 06/06/2016 03:51
4 1234 539.34$ shoes 06/07/2016 12:44
5 1234 66.24$ pants 06/07/2016 12:44
As you can see, rows 4 and 5 are my new invoice for this customer. I want every previous version of the same invoice # to be deleted. Please note: the rows are not actually duplicates, only the invoice number is duplicated. The query needs to find duplicates based on the invoice number and delete the rows whose datestamp is older than the most recent one.
At that point it is way beyond me. I would appreciate the help.

Consider using a correlated aggregate subquery in WHERE clause:
DELETE *
FROM InvoiceTable
WHERE NOT datestamp IN
(SELECT Max(datestamp)
FROM InvoiceTable sub
WHERE sub.InvoiceNumber = InvoiceTable.InvoiceNumber)
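A quick way to try this kind of delete before touching real data is to run it against a scratch copy. The sketch below is such a trial under stated assumptions: it uses SQLite through Python's sqlite3 module instead of Access, stores datestamps as ISO text so they compare correctly, and swaps NOT IN for a plain "older than the maximum" comparison, which selects the same rows here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE InvoiceTable (id INTEGER PRIMARY KEY, InvoiceNumber INTEGER,"
    " total REAL, item TEXT, datestamp TEXT)"
)
# The example rows from the question, with ISO-formatted datestamps.
conn.executemany(
    "INSERT INTO InvoiceTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1234, 456.29, "shoes", "2016-06-06 03:51"),
        (2, 1234, 78.58, "boots", "2016-06-06 03:51"),
        (3, 1234, 22.74, "scarf", "2016-06-06 03:51"),
        (4, 1234, 539.34, "shoes", "2016-06-07 12:44"),
        (5, 1234, 66.24, "pants", "2016-06-07 12:44"),
    ],
)

# Delete every row whose datestamp is older than the newest datestamp
# recorded for the same invoice number.
delete_older = """
DELETE FROM InvoiceTable
WHERE datestamp < (
    SELECT MAX(sub.datestamp)
    FROM InvoiceTable AS sub
    WHERE sub.InvoiceNumber = InvoiceTable.InvoiceNumber
)
"""
with conn:  # commits on success, rolls back if the statement fails
    deleted = conn.execute(delete_older).rowcount

remaining = [row[0] for row in conn.execute("SELECT id FROM InvoiceTable ORDER BY id")]
print(deleted, remaining)
```

Only the rows of the latest revision survive; invoices that were pushed once are untouched because their only datestamp is already the maximum.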

As I said, try being conservative and not deleting. Instead, select rows that are based on the maximum date stamp for a given invoice number:
SELECT
invoices.id, invoices.invoice, invoices.total, invoices.item, invoices.datestamp
FROM
invoices
INNER JOIN
(SELECT
invoice, MAX(datestamp) AS maxdate
FROM
invoices
GROUP BY
invoice) lastinv
ON invoices.invoice = lastinv.invoice AND
invoices.datestamp = lastinv.maxdate
This is untested code, but it should pretty much do what you want. All you have to do is adapt it for Microsoft Access, as this is T-SQL.
