Extra records when used as a sub query: Access

I'm a rookie developer with basic SQL experience and this problem has been 'doing my head in' for the last couple of days. I've gone to ask a question here a couple times and thought... not yet... keep trying.
I have a table:
ID
Store
Product_Type
Delivery_Window
Despatch_Time
Despatch_Type
Pallets
Cartons
Many other columns (start_week and day_num are two of them)
My goal is to get a list of stores by product_type, each with its minimum despatch_time and all the other column information for that row.
I've tested the base query.
SELECT Product_Type, Store, Min(Despatch_Time) as MinDes
FROM table
GROUP BY Store, Product_Type
Works well, I get 200 rows as expected.
Now I want those 200 rows to have the other related record information : Delivery_Window, start_week, etc
I've tried the following.
SELECT * FROM Table WHERE EXISTS
(SELECT Product_Type, Store, Min(Despatch_Time) as MinDes
FROM table
GROUP BY Store, Product_Type)
I've tried inner and right joins, but they all returned more than the 200 records I started with.
I inspected the additional records: they occur where a store and product type have the same despatch time for more than one despatch type.
So I need a hand creating a query that is limited by the initial subquery but, even when several rows share the minimum despatch time, still returns only one record per store and product type.
Current Query is:
SELECT *
FROM table AS A INNER JOIN
(Select Min(Despatch_Time) as MinDue, store, product_type
FROM table
WHERE day_num = [Forms]![FRM_SomeForm]![combo_del_day] AND start_week =[Forms]![FRM_SomeForm]![txt_date1]
GROUP BY store, product_type) AS B
ON (A.product_type = B.product_type) AND (A.store = B.store) AND (A.Despatch_Time = B.MinDue);

I think you want:
SELECT t.*
FROM table as t
WHERE t.Despatch_Time = (SELECT MIN(t2.Despatch_Time)
FROM table as t2
WHERE t2.Store = t.Store AND t2.Product_Type = t.Product_Type);
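The above will return duplicates when several rows share the minimum despatch time for a store and product type. To avoid duplicates, you need a key that provides uniqueness. Let me assume you have a primary key pk (the ID column, for example):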
SELECT t.*
FROM table as t
WHERE t.pk = (SELECT TOP (1) t2.pk
FROM table as t2
WHERE t2.Store = t.Store AND t2.Product_Type = t.Product_Type
ORDER BY t2.Despatch_Time, t2.pk
);
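To see the difference between the two queries on a tie, here is a minimal sketch using Python's built-in sqlite3 module. The table name deliveries and the sample rows are made up for illustration, and SQLite's LIMIT 1 stands in for Access's TOP, so treat it as a demonstration of the idea rather than Access-ready SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: pk plays the role of the assumed primary key.
conn.execute(
    "CREATE TABLE deliveries ("
    " pk INTEGER PRIMARY KEY, Store TEXT, Product_Type TEXT,"
    " Despatch_Time TEXT, Despatch_Type TEXT)"
)
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?, ?, ?)",
    [
        (1, "S1", "Chilled", "06:00", "Road"),
        (2, "S1", "Chilled", "06:00", "Rail"),  # ties with pk 1 on the minimum
        (3, "S1", "Chilled", "09:00", "Road"),
        (4, "S1", "Frozen", "07:30", "Road"),
        (5, "S2", "Chilled", "05:15", "Road"),
    ],
)

# Matching on the minimum time keeps both tied rows (pk 1 and pk 2).
tied = conn.execute("""
    SELECT t.pk
    FROM deliveries AS t
    WHERE t.Despatch_Time = (SELECT MIN(t2.Despatch_Time)
                             FROM deliveries AS t2
                             WHERE t2.Store = t.Store
                               AND t2.Product_Type = t.Product_Type)
    ORDER BY t.pk
""").fetchall()

# Matching on one key per group breaks the tie: earliest time, then lowest pk.
one_per_group = conn.execute("""
    SELECT t.pk, t.Store, t.Product_Type, t.Despatch_Time
    FROM deliveries AS t
    WHERE t.pk = (SELECT t2.pk
                  FROM deliveries AS t2
                  WHERE t2.Store = t.Store
                    AND t2.Product_Type = t.Product_Type
                  ORDER BY t2.Despatch_Time, t2.pk
                  LIMIT 1)
    ORDER BY t.pk
""").fetchall()

print(tied)           # four rows: the S1/Chilled group appears twice
print(one_per_group)  # three rows: one per store and product type
```

The second query returns exactly one row per store and product type because the subquery can only ever yield a single pk.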

Related

SQL SELECT filtering out combinations where another column contains empty cells, then returning records based on max date

I have run into an issue I don't know how to solve. I'm working with a MS Access DB.
I have this data:
I want to write a SELECT statement, that gives the following result:
For each combination of Project and Invoice, I want to return the record containing the maximum date, conditional on all records for that combination of Project and Invoice being Signed (i.e. Signed or Date column not empty).
In my head, first I would sort the irrelevant records out, and then return the max date for the remaining records. I'm stuck on the first part.
Could anyone point me in the right direction?
Thanks,
Hulu

Start with an initial query which fetches the combinations of Project, Invoice, Date from the rows you want returned by your final query.
SELECT
y0.Project,
y0.Invoice,
Max(y0.Date) AS MaxOfDate
FROM YourTable AS y0
GROUP BY y0.Project, y0.Invoice
HAVING Sum(IIf(y0.Signed Is Null,1,0))=0;
The HAVING clause discards any Project/Invoice groups which include a row with a Null in the Signed column.
If you save that query as qryTargetRows, you can then join it back to your original table to select the matching rows.
SELECT
y1.Project,
y1.Invoice,
y1.Desc,
y1.Value,
y1.Signed,
y1.Date
FROM
YourTable AS y1
INNER JOIN qryTargetRows AS sub
ON (y1.Project = sub.Project)
AND (y1.Invoice = sub.Invoice)
AND (y1.Date = sub.MaxOfDate);
Or you can do it without the saved query by directly including its SQL as a subquery.
SELECT
y1.Project,
y1.Invoice,
y1.Desc,
y1.Value,
y1.Signed,
y1.Date
FROM
YourTable AS y1
INNER JOIN
(
SELECT y0.Project, y0.Invoice, Max(y0.Date) AS MaxOfDate
FROM YourTable AS y0
GROUP BY y0.Project, y0.Invoice
HAVING Sum(IIf(y0.Signed Is Null,1,0))=0
) AS sub
ON (y1.Project = sub.Project)
AND (y1.Invoice = sub.Invoice)
AND (y1.Date = sub.MaxOfDate);
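If you want to try the HAVING filter outside Access, the following is a rough sketch with Python's sqlite3 module. The sample rows are invented, the columns Descr and SignedDate are stand-ins for Desc and Date (to sidestep reserved words), and a CASE expression replaces IIf, so this is an illustration of the approach, not the exact Access query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows; Descr and SignedDate stand in for Desc and Date.
conn.execute(
    "CREATE TABLE YourTable (Project TEXT, Invoice INTEGER, Descr TEXT,"
    " Signed TEXT, SignedDate TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)",
    [
        ("A", 1, "Ball", "J.D.", "2022-09-19"),
        ("A", 1, "Bat", "J.D.", "2022-09-20"),  # latest row of a fully signed group
        ("A", 2, "Lamp", None, None),           # unsigned, so A/2 is dropped
        ("B", 1, "Sofa", "J.D.", "2022-09-22"),
        ("B", 1, "Rug", None, None),            # one unsigned row drops all of B/1
    ],
)

rows = conn.execute("""
    SELECT y1.Project, y1.Invoice, y1.Descr, y1.SignedDate
    FROM YourTable AS y1
    INNER JOIN (
        SELECT y0.Project, y0.Invoice, MAX(y0.SignedDate) AS MaxOfDate
        FROM YourTable AS y0
        GROUP BY y0.Project, y0.Invoice
        -- keep only groups with no unsigned rows
        HAVING SUM(CASE WHEN y0.Signed IS NULL THEN 1 ELSE 0 END) = 0
    ) AS sub
        ON y1.Project = sub.Project
       AND y1.Invoice = sub.Invoice
       AND y1.SignedDate = sub.MaxOfDate
""").fetchall()

print(rows)
```

Only the A/1 group survives the HAVING clause here, and the join picks its row with the latest date.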

Write a SQL query like this, which should also be possible in MS Access:
SELECT
Project,
Invoice,
MIN([Desc]) Descriptions,
SUM(Value) Value,
MIN(Signed) Signed,
MAX([Date]) "Date"
FROM data
WHERE Signed<>'' AND [Date]<>''
GROUP BY
Project,
Invoice
Output:
Project  Invoice  Descriptions  Value  Signed  Date
---------------------------------------------------------
A        1        Ball          100    J.D.    2022-09-20
B        1        Sofa          300    J.D.    2022-09-22
B        2        Desk          100    J.D.    2022-09-23
Note: for invoice 1 on project A, you will see a value of 300, which is the total for that invoice (when grouping on Project='A' and Invoice=1).
Maybe I should have used DCONCAT (see: Concatenation in between records in Access Query ) for the Description, to include 'TV' in it. But I am unable to test that so I am only referring to this answer.

Try joining a second query:
Select *
From YourTable As T
Inner Join
(Select Project, Invoice, Max([Date]) As MaxDate
From YourTable
Group By Project, Invoice) As S
On T.Project = S.Project And T.Invoice = S.Invoice And T.Date = S.MaxDate

Amazon SQL job interview question: customers who made 2+ purchases

You have a simple table that has only two fields: CustomerID, DateOfPurchase. List all customers that made at least 2 purchases in any period of six months. You may assume the table has the data for the last 10 years. Also, there is no PK or unique value.
My friend already got the job, despite the fact that he couldn't answer this question. I was curious how this kind of question can be solved.
Thanks

Viewed abstractly, this problem is about efficiently self-joining a table that has no PK or unique identifier.
This is tricky because there are two scenarios to handle:
1. A customer makes exactly 2 purchases within 6 months, both on the same date (which can look like a duplicate record).
2. A customer makes 2 or more purchases within 6 months on different dates (the usual case).
The first thing to do is generate a column that can act as a unique identifier, which can be achieved here using row_number.
With a unique identifier in place, it is easy to join on the required conditions plus "unique identifier from the 1st alias != unique identifier from the 2nd alias". That pairs each row with every other row but never with itself; a different row holding the same data, as in the 1st scenario, still counts as another row.
Putting it all together, the solution uses:
common table expressions, to start from a single source of data that includes the added unique identifier and then apply the required business conditions;
row_number, to assign that unique identifier to the single source of data generated in the common table expression.
Refer to the query below for the technical details.
with tempPurchase as (
select *,
row_number() over (order by CustomerID) as rowNumber -- this is crucial part
from purchase
)
select distinct(tp1.CustomerID) from tempPurchase as tp1
join tempPurchase as tp2 on tp1.CustomerID = tp2.CustomerID
and tp1.DateOfPurchase >= tp2.DateOfPurchase
and tp1.DateOfPurchase <= DATEADD(month, 6, tp2.DateOfPurchase)
and tp1.rowNumber != tp2.rowNumber; -- this is crucial part
Refer db fiddle here for complete working solution.
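For anyone without a SQL Server instance to hand, here is a small sketch of the same idea using Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later), uses date(x, '+6 months') in place of DATEADD, and the sample purchases are made up. I have also added DateOfPurchase to the row_number ordering so that only genuinely identical rows can tie; that is my own tweak, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (CustomerID INTEGER, DateOfPurchase TEXT)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?)",
    [
        (1, "2020-01-10"), (1, "2020-01-10"),  # same-day pair, identical rows
        (2, "2020-01-10"), (2, "2020-05-01"),  # two purchases within six months
        (3, "2020-01-10"), (3, "2021-01-10"),  # a year apart
        (4, "2020-03-03"),                     # single purchase
    ],
)

rows = conn.execute("""
    WITH tempPurchase AS (
        SELECT CustomerID, DateOfPurchase,
               -- synthetic unique identifier for each row
               ROW_NUMBER() OVER (ORDER BY CustomerID, DateOfPurchase) AS rowNumber
        FROM purchase
    )
    SELECT DISTINCT tp1.CustomerID
    FROM tempPurchase AS tp1
    JOIN tempPurchase AS tp2
      ON tp1.CustomerID = tp2.CustomerID
     AND tp1.DateOfPurchase >= tp2.DateOfPurchase
     AND tp1.DateOfPurchase <= date(tp2.DateOfPurchase, '+6 months')
     AND tp1.rowNumber != tp2.rowNumber   -- never pair a row with itself
    ORDER BY tp1.CustomerID
""").fetchall()

qualifying = [r[0] for r in rows]
print(qualifying)
```

Customer 1 qualifies only because the two identical rows get different row numbers; without that identifier the same-day pair could not be told apart from a row joined to itself.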

We can try using exists logic here to detect records for the same customer occurring within 6 months. Then, find distinct customers, which implies that any such matching customer has at least two purchases.
SELECT DISTINCT CustomerID
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.CustomerID = t1.CustomerID AND
t2.DateOfPurchase > t1.DateOfPurchase AND
t2.DateOfPurchase <= DATEADD(month, 6, t1.DateOfPurchase));
Note that this answer assumes that there would only be at most one distinct purchase per day by a given customer. A better approach would be:
SELECT DISTINCT CustomerID
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.CustomerID = t1.CustomerID AND
t2.PK <> t1.PK AND
t2.DateOfPurchase >= t1.DateOfPurchase AND
t2.DateOfPurchase <= DATEADD(month, 6, t1.DateOfPurchase));
The above query reads as saying find, for each customer, any relationship between 2 records within 6 months of each other which are distinct purchases. This assumes that the table has a PK primary key column. Ideally, every table should have some kind of logical primary key.

Try this:
SELECT distinct CustomerID
FROM purchase t1
WHERE 1 < (SELECT count(1) FROM purchase t2
WHERE t2.CustomerID = t1.CustomerID AND
t2.DateOfPurchase >= t1.DateOfPurchase AND
t2.DateOfPurchase <= DATEADD(month, 6, t1.DateOfPurchase))
The idea is to pick one record from the outer table t1 and count the purchases in the inner table t2 made within 6 months of it, including the record picked from the outer table. If the count from the subquery is greater than 1, the customer is eligible.
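A quick way to compare the strict EXISTS version with the count-based version is to run both against a customer who bought twice on the same day. This is only a sketch with invented data, run through Python's sqlite3 module, with date(x, '+6 months') standing in for DATEADD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (CustomerID INTEGER, DateOfPurchase TEXT)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?)",
    [
        (1, "2020-01-10"), (1, "2020-01-10"),  # two purchases on the same day
        (2, "2020-01-10"), (2, "2020-05-01"),  # two purchases on different days
        (3, "2020-03-03"),                     # single purchase
    ],
)

# EXISTS with a strict ">" only sees a later purchase, so it misses customer 1.
exists_strict = [r[0] for r in conn.execute("""
    SELECT DISTINCT CustomerID
    FROM purchase t1
    WHERE EXISTS (SELECT 1 FROM purchase t2
                  WHERE t2.CustomerID = t1.CustomerID
                    AND t2.DateOfPurchase > t1.DateOfPurchase
                    AND t2.DateOfPurchase <= date(t1.DateOfPurchase, '+6 months'))
    ORDER BY CustomerID
""")]

# Counting rows in the window (the picked row included) needs no unique key.
count_based = [r[0] for r in conn.execute("""
    SELECT DISTINCT CustomerID
    FROM purchase t1
    WHERE 1 < (SELECT COUNT(1) FROM purchase t2
               WHERE t2.CustomerID = t1.CustomerID
                 AND t2.DateOfPurchase >= t1.DateOfPurchase
                 AND t2.DateOfPurchase <= date(t1.DateOfPurchase, '+6 months'))
    ORDER BY CustomerID
""")]

print(exists_strict)
print(count_based)
```

The count-based form picks up the same-day pair because each of the two rows counts itself and its twin.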

SQL - Removing result of one table based on the tree structure of another

To better define the question I'm asking, it'd probably be easier to introduce you to the data I'm working with.
Essentially I have two tables joined that kind of look like this:
Table 1
Product ID AccountLinkID (FK)
PRODUCT00001 AC000001
PRODUCT00001 AC000002
PRODUCT00001 AC000003
PRODUCT00001 AC000004
Table 2
Link (FK) AccountType
AC000001 1
AC000002 2
AC000003 3
AC000004 4
As part of some data I'm looking at, I want to make sure that if any ProductID is linked to an account of type '4', that ProductID is removed from the search.
The problem is that the foreign key isn't a single value: one product can be linked to multiple account types (for example, one product could be linked to a seller's account, a buyer's account, a customer account, etc.).
In this instance, account type 4 is something like a 'dummy' account, so I don't want any ProductIDs linked to it included in the search.
I can't think of how to use the account type as a means of removing the ProductID.
Thank you in advance for any advice.

If you want just one row per productid, you can join, aggregate, and filter with a having clause:
select t1.productid
from table1 t1
inner join table2 t2 on t2.link = t1.accountlinkid
group by t1.productid
having max(case when t2.accounttype = 4 then 1 else 0 end) = 0
If, on the other hand, you want entire rows from t1, then window functions are a better option:
select t.*
from (
select t1.*,
max(case when t2.accounttype = 4 then 1 else 0 end) over(partition by t1.productid) has_type4
from table1 t1
inner join table2 t2 on t2.link = t1.accountlinkid
) t
where has_type4 = 0
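As a sanity check of the having-clause version, here is a minimal sketch run through Python's sqlite3 module. It uses the sample rows from the question plus a second product, PRODUCT00002, which I have invented so that something survives the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (productid TEXT, accountlinkid TEXT)")
conn.execute("CREATE TABLE table2 (link TEXT, accounttype INTEGER)")
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [("AC000001", 1), ("AC000002", 2), ("AC000003", 3), ("AC000004", 4)],
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        ("PRODUCT00001", "AC000001"),
        ("PRODUCT00001", "AC000002"),
        ("PRODUCT00001", "AC000003"),
        ("PRODUCT00001", "AC000004"),  # linked to the type 4 'dummy' account
        ("PRODUCT00002", "AC000001"),  # hypothetical product with no type 4 link
        ("PRODUCT00002", "AC000002"),
    ],
)

rows = conn.execute("""
    select t1.productid
    from table1 t1
    inner join table2 t2 on t2.link = t1.accountlinkid
    group by t1.productid
    -- the flag is 1 for a group as soon as any linked account has type 4
    having max(case when t2.accounttype = 4 then 1 else 0 end) = 0
    order by t1.productid
""").fetchall()

kept = [r[0] for r in rows]
print(kept)
```

PRODUCT00001 drops out entirely, including its links to account types 1 to 3, which is the behaviour asked for.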

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem. They're both remits for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remit_To_Activate table, so only ONE remittance will be "activated" per claim:
[screenshot of the two rows, not reproduced]

You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
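This is meant to give you one row per claim, but note that the join is on claim_id only, so a claim with several rows in Remittance will still return each of them. Untested, so please run it and make changes as needed.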

Without more information on the structure of your database, especially the structure of Claims_Group2 and REMITTANCE and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into Remit_To_Activate.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your Remit_To_Activate table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column (rn) into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
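To illustrate how row_number picks one row among remittances with the same timestamp, here is a simplified sketch using Python's sqlite3 module (SQLite 3.25 or later for window functions). It works on a single made-up claims_group2 table rather than the join to remittance, and it adds remittance_uuid as a second ORDER BY key so the winner of a tie is predictable; that tie-breaker is my assumption, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of Claims_Group2 with invented UUID values.
conn.execute(
    "CREATE TABLE claims_group2 ("
    " remittance_uuid TEXT, claim_id INTEGER, create_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO claims_group2 VALUES (?, ?, ?)",
    [
        ("u1", 1, "2018-08-15 13:07:50.933"),
        ("u2", 1, "2018-08-15 13:07:50.933"),  # same claim, same timestamp
        ("u3", 1, "2018-01-01 09:00:00.000"),
        ("u4", 2, "2017-12-31 10:00:00.000"),
    ],
)

rows = conn.execute("""
    select t.remittance_uuid, t.claim_id
    from (
        select cg2.*,
               row_number() over (
                   partition by cg2.claim_id
                   -- latest first; the uuid breaks ties deterministically
                   order by cg2.create_datetime desc, cg2.remittance_uuid
               ) as rn
        from claims_group2 cg2
    ) t
    where t.rn = 1
    order by t.claim_id
""").fetchall()

print(rows)
```

Without the second ORDER BY key the query still returns one row per claim, but which of the tied rows you get is up to the engine.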

A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Nested SQL Queries with Self JOIN - How to filter rows OUT

I have an SQLite3 database with a table that I need to filter by several factors. One such factor is to filter out rows based on the content of other rows within the same table.
From what I've researched, a self JOIN is going to be required, but I am not sure how I would do that to filter the table by several factors.
Here is a sample table of the data:
Name     Part #   Status   Amount
---------------------------------
Item 1   12345    New      $100.00
Item 2   12345    New      $15.00
Item 3   35864    Old      $132.56
Item 4   12345    Old      $15.00
What I need to do is find any items that have the same Part # and the same Amount, where one of them has an "Old" Status.
So, first we would get all rows with Part # "12345", and then check whether any of those rows has an "Old" status with a matching Amount. In this example, that gives us Item 2 and Item 4.
What then needs to be done is to return the REST of the rows in the table that have a "New" Status, essentially discarding those two items.
Desired Output:
Name     Part #   Status   Amount
---------------------------------
Item 1   12345    New      $100.00
Removed all "Old" status rows and any "New" that had a matching "Part #" and "Amount" with an "Old" status. (I'm sorry, I know that's very confusing, hence my need for help).
I have looked into the following resources to try and figure this out on my own, but there are so many levels that I am getting confused.
Self-join of a subquery
ZenTut
Compare rows and columns of same table
The first two links dealt with comparing columns within the same table. The third one does seem to be a pretty similar question, but does not have a readable answer (for me, anyway).
I do Java development as well and it would be fairly simple to do this there, but I am hoping for a single SQL query (nested), if possible.

The "not exists" statement should do the trick:
select * from table t1
where t1.Status = 'New'
and not exists (select * from table t2
where t2.Status = 'Old'
and t2.Part = t1.Part
and t2.Amount = t1.Amount);
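Here is that query run against the sample data from the question, as a small sketch with Python's sqlite3 module. The table name parts and the column name PartNum are placeholders I picked, since "table" and "Part #" are awkward as identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical names: parts for the table, PartNum for the "Part #" column.
conn.execute(
    "CREATE TABLE parts (Name TEXT, PartNum TEXT, Status TEXT, Amount REAL)"
)
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?, ?)",
    [
        ("Item 1", "12345", "New", 100.00),
        ("Item 2", "12345", "New", 15.00),
        ("Item 3", "35864", "Old", 132.56),
        ("Item 4", "12345", "Old", 15.00),
    ],
)

remaining = conn.execute("""
    select t1.Name, t1.PartNum, t1.Status, t1.Amount
    from parts t1
    where t1.Status = 'New'
      -- drop a New row when an Old row has the same part number and amount
      and not exists (select 1 from parts t2
                      where t2.Status = 'Old'
                        and t2.PartNum = t1.PartNum
                        and t2.Amount = t1.Amount)
""").fetchall()

print(remaining)
```

Item 2 is removed because Item 4 matches it on part number and amount, and Items 3 and 4 are removed by the Status = 'New' filter, which leaves Item 1 as in the desired output.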

This is a T-SQL answer; I hope it is translatable. If you have a big data set to match against, you might change the NOT IN to NOT EXISTS.
select *
from table
where Name not in(
select t1.Name
from table t1
join table t2
on t1.PartNumber = t2.PartNumber
AND t1.Status='New'
AND t2.Status='Old'
and t1.Amount=t2.Amount)
and Status = 'New'

You could use an inner join on a grouped subquery that finds the part/amount combinations having more than one status, at least one of them 'Old':
select * from
my_table
INNER JOIN (
select
Part_#
, Amount
, count(distinct Status) as status_count
, sum(case when Status = 'Old' then 1 else 0 end) as old_count
from my_table
group by Part_#, Amount
having count(distinct Status) > 1
and sum(case when Status = 'Old' then 1 else 0 end) > 0
) t on t.Part_# = my_table.Part_#
and my_table.Status = 'New'
and my_table.Amount <> t.Amount

Tried to understand what you want best I could...
SELECT DISTINCT yt.PartNum, yt.Status, yt.Amount
FROM YourTable yt
JOIN YourTable yt2
ON yt2.PartNum = yt.PartNum
AND yt2.Status = 'Old'
AND yt2.Amount != yt.Amount
WHERE yt.Status = 'New'
This gives every row with a New status whose part number also has an Old row with a different amount.
