Access query to grab +5 or more duplicates


I have a small problem with an Access query (don't ask me why, but I cannot use a real DBMS, only Access).
I have a huge table with about 920k records.
I have to go through all of that data and grab the refs that occur more than 5 times on the same date.
table = myTable
| id  | ref       | date       | C_ERR_ANO |
|-----|-----------|------------|-----------|
| 1   | A12345678 | 2012/02/24 | A 4565    |
| 2   | D52245708 | 2011/05/02 | E 5246    |
| ... | ......... | ..../../.. | . ....    |
To sum up: I have 900,000+ records, and there are duplicates on the SAME DATE. The table also has a column named C_ERR_ANO, which holds the error number.
So I have to go through all those rows and group each ref by date AND error number.
If a ref occurs MORE than 5 times with the same error number, I have to grab it and display it in the result.
I ended up using this query:
SELECT DISTINCT Centre.REFERENCE, Centre.DATESE, Centre.C_ERR_ANO
FROM Centre INNER JOIN (SELECT
Centre.[REFERENCE],
COUNT(*) AS `toto`,
Centre.DATESE
FROM Centre
GROUP BY REFERENCE
HAVING COUNT(*) > 5) AS Centre_1
ON Centre.REFERENCE = Centre_1.REFERENCE
AND Centre.DATESE <> Centre_1.DATESE;
But this query isn't right.
Then I tried:
SELECT DATESE, REFERENCE, C_ERR_ANO, COUNT(REFERENCE) AS TOTAL
FROM (
SELECT *
FROM Centre
WHERE (((Centre.[REFERENCE]) NOT IN (SELECT [REFERENCE]
FROM [Centre] AS Tmp
GROUP BY [REFERENCE],[DATESE],[C_ERR_ANO]
HAVING Count(*)>1 AND [DATESE] = [Centre].[DATESE]
AND [C_ERR_ANO] = [Centre].[C_ERR_ANO]
AND [LIBELLE] = [Centre].[LIBELLE])))
ORDER BY Centre.[REFERENCE], Centre.[DATESE], Centre.[C_ERR_ANO])
GROUP BY REFERENCE, DATESE, C_ERR_ANO
Still not working.
I'm struggling.

Every non-aggregated column in your SELECT needs to appear in your GROUP BY clause. Why not use:
select Centre.DATESE, Centre.C_ERR_ANO, Count (*)
from Centre
Group by Centre.DATESE, Centre.C_ERR_ANO
HAVING COUNT (*) > 5
If you need other fields (such as REFERENCE), you can add them, as long as the same fields appear in both the select and the group by.
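As a rough, runnable check of that idea, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for Access. The rows are hypothetical sample data, and the grouping includes REFERENCE on the assumption that the question wants duplicates per ref, date and error number.

```python
import sqlite3

# Hypothetical miniature of the Centre table; SQLite stands in for Access here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Centre (REFERENCE TEXT, DATESE TEXT, C_ERR_ANO TEXT)")
rows = (
    [("A12345678", "2012/02/24", "A 4565")] * 6    # 6 repeats: more than 5, kept
    + [("D52245708", "2011/05/02", "E 5246")] * 5  # exactly 5: not "more than 5"
    + [("A12345678", "2012/02/25", "A 4565")] * 2  # same ref, other date: dropped
)
conn.executemany("INSERT INTO Centre VALUES (?, ?, ?)", rows)

# Every non-aggregated column in SELECT also appears in GROUP BY.
query = """
    SELECT REFERENCE, DATESE, C_ERR_ANO, COUNT(*) AS TOTAL
    FROM Centre
    GROUP BY REFERENCE, DATESE, C_ERR_ANO
    HAVING COUNT(*) > 5
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)  # [('A12345678', '2012/02/24', 'A 4565', 6)]
```

The SELECT / GROUP BY / HAVING text itself uses nothing SQLite-specific, so it should carry over to an Access query largely unchanged.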

Related

Greatest N Per Group with JOIN and multiple order columns

I have two tables:
Table0:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-18 | 100 |
| aa | 1 | 12-10 | 101 |
| bb | 2 | 12-10 | 102 |
| cc | 1 | 12-09 | 100 |
| cc | 2 | 12-12 | 103 |
| cc | 2 | 12-01 | 109 |
| cc | 1 | 12-07 | 101 |
| dd | 1 | 12-08 | 100 |
and
Table1:
| ID |
|----|
| aa |
| cc |
| cc |
| dd |
| dd |
I'm trying to output results where:
ID must exist in both tables.
TYPE must be the maximum for each ID.
TIME must be the minimum value for the maximum TYPE for each ID.
SITE should be the value from the same row as the minimum TIME value.
Given my sample data, my results should look like this:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-10 | 101 |
| cc | 2 | 12-01 | 109 |
| dd | 1 | 12-08 | 100 |
I've tried these statements:
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MASTY, MIN("TIME") AS MASTM
FROM TABLE0
GROUP BY "ID") AS MAS,
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MSD.MASTY =MA."TYPE"
...which generates a syntax error
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MAB
FROM TABLE0
GROUP BY "ID") AS MAS,
((SELECT "ID", MIN("TIME") AS MACTM, MIN("TYPE") AS MACTY
FROM TABLE0
WHERE "TYPE" = 1
GROUP BY "ID")
UNION
(SELECT "ID", MIN("TIME"), MAX("TYPE")
FROM TABLE0
WHERE "TYPE" = 2
GROUP BY "ID")) AS MACU
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MACU."ID" = QTS."ID"
AND MA."TIME" = MACU.MACTM
AND MA."TYPE" = MACU.MACTB
... which is getting the wrong results.

Answering your direct question "how to avoid...":
You get this error when you specify a column in the SELECT list of a statement that isn't present in the GROUP BY clause and isn't wrapped in an aggregate function like MAX, MIN or AVG.
With your data, I cannot say:
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id
I didn't say what to do with SITE; it's either a key of the group (in which case I'll get every unique combination of ID,site and the min time in each) or it should be aggregated (eg max site per ID)
These are ok:
SELECT
ID, max(site), min(time)
FROM
table
GROUP BY
id
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id,site
I cannot simply leave out what to do with it: what should the database return in such a case? (If you're still struggling, tell me in the comments what you think the db should do, and I'll better understand your thinking so I can tell you why it can't do that.) The programmer of the database cannot make this decision for you; you must make it.
Usually people ask this when they want to identify:
The min time per ID, and get all the other row data as well. eg "What is the full earliest record data for each id?"
In this case you have to write a query that identifies the min time per id and then join that subquery back to the main data table on id=id and time=mintime. The db runs the subquery, builds a list of min time per id, then that effectively becomes a filter of the main data table
SELECT * FROM
(
SELECT
ID, min(time) as mintime
FROM
table
GROUP BY
id
) findmin
INNER JOIN table t ON t.id = findmin.id and t.time = findmin.mintime
What you cannot do is start putting the other data you want into the query that does the grouping, because you either have to group by the columns you add in (makes the group more fine grained, not what you want) or you have to aggregate them (and then it doesn't necessarily come from the same row as other aggregated columns - min time is from row 1, min site is from row 3 - not what you want)
Looking at your actual problem:
The ID value must exist in two tables.
The Type value must be largest group by id.
The Time value must be smallest in the largest type group.
Leaving out a solution that involves having or analytics for now, so you can get to grips with the theory here:
You need to find the max type group by id, and then join it back to the table to get the other relevant data also (time is needed) for that id/maxtype and then on this new filtered data set you need the id and min time
SELECT t.id,min(t.time) FROM
(
SELECT
ID, max(type) as maxtype
FROM
table
GROUP BY
id
) findmax
INNER JOIN table t ON t.id = findmax.id and t.type = findmax.maxtype
GROUP BY t.id
If you can't see why, let me know
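To make the whole chain concrete, here is a minimal sketch that applies the same join-back idea twice and also adds the "ID exists in both tables" filter. It runs on SQLite through Python's sqlite3 module, which is an assumption made only so the example is runnable; the lowercase table names table0/table1 and the aliases findmax, findmin and m are illustrative choices, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table0 (id TEXT, type INTEGER, time TEXT, site INTEGER);
    CREATE TABLE table1 (id TEXT);
    INSERT INTO table0 VALUES
        ('aa', 1, '12-18', 100), ('aa', 1, '12-10', 101),
        ('bb', 2, '12-10', 102), ('cc', 1, '12-09', 100),
        ('cc', 2, '12-12', 103), ('cc', 2, '12-01', 109),
        ('cc', 1, '12-07', 101), ('dd', 1, '12-08', 100);
    INSERT INTO table1 VALUES ('aa'), ('cc'), ('cc'), ('dd'), ('dd');
""")

query = """
    SELECT t.id, t.type, t.time, t.site
    FROM (
        -- step 2: min time within the max-type rows of each id
        SELECT m.id, m.type, MIN(m.time) AS mintime
        FROM (
            -- step 1: max type per id
            SELECT id, MAX(type) AS maxtype
            FROM table0
            GROUP BY id
        ) findmax
        INNER JOIN table0 m ON m.id = findmax.id AND m.type = findmax.maxtype
        GROUP BY m.id, m.type
    ) findmin
    -- step 3: join back once more to pick up site from that same row
    INNER JOIN table0 t
        ON t.id = findmin.id AND t.type = findmin.type AND t.time = findmin.mintime
    -- the id must also exist in table1; IN avoids duplicating rows
    WHERE t.id IN (SELECT id FROM table1)
    ORDER BY t.id
"""
result = conn.execute(query).fetchall()
print(result)
# [('aa', 1, '12-10', 101), ('cc', 2, '12-01', 109), ('dd', 1, '12-08', 100)]
```

If two rows tie on id, type and time, this returns both of them; whether that is acceptable depends on the data.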

SELECT DISTINCT ON (t0.id)
t0.id,
type,
time,
first_value(site) OVER (PARTITION BY t0.id ORDER BY time) as site
FROM table0 t0
JOIN table1 t1 ON t0.id = t1.id
ORDER BY t0.id, type DESC, time
ID must exist in both tables
This can be achieved by joining both tables on their ids. The result of an inner join is the set of rows whose id exists in both tables.
SITE should be the value from the same row as the minimum TIME value.
This is the same as "Give me the first value of each group of ids ordered by time". This can be done by using the first_value() window function. Window functions can group your data set (PARTITION BY). So you are getting groups of ids which can be ordered separately. first_value() gives the first value of these ordered groups.
TYPE must be the maximum for each ID.
To get the maximum type per id you'll first have to ORDER BY id, type DESC. You are getting the maximum type as first row per id...
TIME must be the minimum value for the maximum TYPE for each ID.
... Then you can order this result by time additionally to assure this condition.
Now you have an ordered data set: For each id, the row with the maximum type and its minimum time is the first one.
DISTINCT ON gives you exactly the first row of each group. In this case the group you defined is (id). The result is your expected one.

I would write this using distinct on and in/exists:
select distinct on (t0.id) t0.*
from table0 t0
where exists (select 1 from table1 t1 where t1.id = t0.id)
order by t0.id, type desc, time asc;
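DISTINCT ON is PostgreSQL-specific. For engines without it, a ROW_NUMBER() window function with the same ordering is a common substitute. The sketch below is a swapped-in technique, not the answerer's query: it assumes SQLite 3.25 or newer (driven from Python's sqlite3 module), and the names ranked and rn are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table0 (id TEXT, type INTEGER, time TEXT, site INTEGER);
    CREATE TABLE table1 (id TEXT);
    INSERT INTO table0 VALUES
        ('aa', 1, '12-18', 100), ('aa', 1, '12-10', 101),
        ('bb', 2, '12-10', 102), ('cc', 1, '12-09', 100),
        ('cc', 2, '12-12', 103), ('cc', 2, '12-01', 109),
        ('cc', 1, '12-07', 101), ('dd', 1, '12-08', 100);
    INSERT INTO table1 VALUES ('aa'), ('cc'), ('cc'), ('dd'), ('dd');
""")

query = """
    SELECT id, type, time, site
    FROM (
        SELECT t0.id, t0.type, t0.time, t0.site,
               -- number the rows of each id: highest type first, then earliest time
               ROW_NUMBER() OVER (
                   PARTITION BY t0.id
                   ORDER BY t0.type DESC, t0.time ASC
               ) AS rn
        FROM table0 t0
        WHERE EXISTS (SELECT 1 FROM table1 t1 WHERE t1.id = t0.id)
    ) ranked
    WHERE rn = 1   -- keep only the first row per id, like DISTINCT ON (id)
    ORDER BY id
"""
first_rows = conn.execute(query).fetchall()
print(first_rows)
# [('aa', 1, '12-10', 101), ('cc', 2, '12-01', 109), ('dd', 1, '12-08', 100)]
```

Because the whole row is selected after ranking, site automatically comes from the same row as the chosen type and time.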

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I am performing some queries using the PostgreSQL SELECT DISTINCT ON syntax. I would like the query to return the total number of rows alongside every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attempt is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery (here a CTE) and applying count(*) over () to its result:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?

You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially in the long, thin way I write SQL), but it makes it clear what is happening. If you come back to it in a few months' time (somebody usually does), it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.
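For anyone who wants to try the three-way join without a PostgreSQL server, here is a minimal sketch of the same query run on SQLite through Python's sqlite3 module. That engine choice is an assumption for the sake of a runnable example, and CROSS JOIN is used in place of "join ... on true", which expresses the same deliberate cartesian product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, my_field TEXT, id_reference INTEGER);
    INSERT INTO my_table VALUES
        (1, 'a', 1), (1, 'b', 2), (2, 'a', 3), (2, 'c', 4), (3, 'x', 5);
""")

query = """
    SELECT c.id_count AS total, a.id, a.my_field, b.max_id_reference
    FROM my_table a
    JOIN (
        -- latest id_reference for each id
        SELECT id, MAX(id_reference) AS max_id_reference
        FROM my_table
        GROUP BY id
    ) b ON a.id = b.id AND a.id_reference = b.max_id_reference
    CROSS JOIN (
        -- single-row subquery: number of distinct ids
        SELECT COUNT(DISTINCT id) AS id_count
        FROM my_table
    ) c
    ORDER BY a.id
"""
latest = conn.execute(query).fetchall()
print(latest)  # [(3, 1, 'b', 2), (3, 2, 'c', 4), (3, 3, 'x', 5)]
```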

Select and count in the same query on two tables

I've got these two tables:
___Subscriptions
|--------|--------------------|--------------|
| SUB_Id | SUB_HotelId | SUB_PlanName |
|--------|--------------------|--------------|
| 1 | cus_AjGG401e9a840D | Free |
|--------|--------------------|--------------|
___Rooms
|--------|-------------------|
| ROO_Id | ROO_HotelId |
|--------|-------------------|
| 1 |cus_AjGG401e9a840D |
| 2 |cus_AjGG401e9a840D |
| 3 |cus_AjGG401e9a840D |
| 4 |cus_AjGG401e9a840D |
|--------|-------------------|
I'd like to select the SUB_PlanName and count the rooms with the same HotelId.
So I tried:
SELECT COUNT(*) as 'ROO_Count', SUB_PlanName
FROM ___Rooms
JOIN ___Subscriptions
ON ___Subscriptions.SUB_HotelId = ___Rooms.ROO_HotelId
WHERE ROO_HotelId = 'cus_AjGG401e9a840D'
and
SELECT
SUB_PlanName,
(
SELECT Count(ROO_Id)
FROM ___Rooms
Where ___Rooms.ROO_HotelId = ___Subscriptions.SUB_HotelId
) as ROO_Count
FROM ___Subscriptions
WHERE SUB_HotelId = 'cus_AjGG401e9a840D'
But I get empty results.
Could you please help ?
Thanks.

You need a GROUP BY whenever you select a non-aggregated column (here SUB_PlanName) alongside an aggregate (here COUNT()). The query below gives you the number of rooms only for SUB_HotelId = 'cus_AjGG401e9a840D', because of the condition in the WHERE clause. If you want the counts for all hotel IDs, remove the WHERE filter (and add s.SUB_HotelId to the SELECT and GROUP BY so that each hotel gets its own row).
SELECT s.SUB_PlanName, COUNT(*) as 'ROO_Count'
FROM ___Rooms r
JOIN ___Subscriptions s
ON s.SUB_HotelId = r.ROO_HotelId
WHERE r.ROO_HotelId = 'cus_AjGG401e9a840D'
GROUP BY s.SUB_PlanName;
To be safe, you can also use COUNT(DISTINCT r.ROO_Id) if you don't want to double count a repeating ROO_Id. But your table structures seem to have unique(non-repeating) ROO_Ids so using a COUNT(*) should work as well.
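Here is a small runnable variation, sketched on SQLite through Python's sqlite3 module rather than on the asker's actual database. It starts from ___Subscriptions and uses a LEFT JOIN with COUNT(r.ROO_Id), a swapped-in choice so that a hotel without any rooms still returns a row with a count of 0. The second subscription ('cus_hypothetical', 'Pro') is invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ___Subscriptions (SUB_Id INTEGER, SUB_HotelId TEXT, SUB_PlanName TEXT);
    CREATE TABLE ___Rooms (ROO_Id INTEGER, ROO_HotelId TEXT);
    INSERT INTO ___Subscriptions VALUES
        (1, 'cus_AjGG401e9a840D', 'Free'),
        (2, 'cus_hypothetical', 'Pro');      -- invented hotel with no rooms
    INSERT INTO ___Rooms VALUES
        (1, 'cus_AjGG401e9a840D'), (2, 'cus_AjGG401e9a840D'),
        (3, 'cus_AjGG401e9a840D'), (4, 'cus_AjGG401e9a840D');
""")

query = """
    SELECT s.SUB_PlanName, COUNT(r.ROO_Id) AS ROO_Count
    FROM ___Subscriptions s
    LEFT JOIN ___Rooms r ON r.ROO_HotelId = s.SUB_HotelId
    WHERE s.SUB_HotelId = ?
    GROUP BY s.SUB_PlanName
"""
with_rooms = conn.execute(query, ("cus_AjGG401e9a840D",)).fetchall()
without_rooms = conn.execute(query, ("cus_hypothetical",)).fetchall()
print(with_rooms)     # [('Free', 4)]
print(without_rooms)  # [('Pro', 0)]
```

COUNT(r.ROO_Id) skips the NULLs produced by the LEFT JOIN, which is why the second hotel reports 0 instead of 1.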

Grabbing the cost with the max date grouping on number openquery

I am trying to select information from another server, and insert it into a table through open query... Here is where I am at so far:
INSERT INTO smallprojects..PhyInv_310QADLockedDet (MasterRecid, location, partnum, qty)
SELECT @@IDENTITY, ld_loc, ld_part, ld_qty_oh
FROM OPENQUERY(LANSRHQAD, 'SELECT ld_loc,ld_part,ld_qty_oh FROM PUB.ld_det as a left outer join PUB.pt_mstr as b on a.ld_part = b.pt_part where pt_status <> ''OB'' and ld_part not like ''S%'' and ld_part not like ''N%'' and ld_loc = ''310'' ')
But this will insert multiple part numbers if the PUB.ld_det has multiple entries for that part, sort of like the example below:
Here is the data (PUB.ld_det):
Part | Date | Qty
-------------------
1000 | 10-02 | 0
1000 | 10-03 | 2
1001 | 10-2 | 0
1001 | 10-2 | 2
I would like my result to be a insert into a table as:
Part | Qty
-------------------
1000 | 2
1001 | 2
Currently it is returning as:
Part | Qty
-----------
1000 | 0
1000 | 2
1001 | 0
1001 | 2
So when I go back to update this table I just have to hope it finds the right row.
How can I avoid bringing in the multiples and only bring it in with the highest date? The open query thing messes with me so much

Here is one simple method if you want one row per part:
INSERT INTO smallprojects..PhyInv_310QADLockedDet (MasterRecid, location, partnum, qty)
SELECT TOP (1) WITH TIES @@IDENTITY, ld_loc, ld_part, ld_qty_oh
FROM OPENQUERY(LANSRHQAD, 'SELECT ld_loc,ld_part,ld_qty_oh FROM PUB.ld_det as a left outer join PUB.pt_mstr as b on a.ld_part = b.pt_part where pt_status <> ''OB'' and ld_part not like ''S%'' and ld_part not like ''N%'' and ld_loc = ''310'' ')
ORDER BY ROW_NUMBER() OVER (PARTITION BY ld_part ORDER BY ld_qty_oh DESC);
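Use RANK() instead of ROW_NUMBER() if you want duplicates when there are ties. Note that this keeps the row with the highest ld_qty_oh per part; to keep the row with the latest date instead, the remote query would also have to return the date column so that the window can be ordered by it.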
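The "one row per part, latest date wins" logic can be sketched in isolation as below. This is not the OPENQUERY statement itself: it runs on SQLite (3.25 or newer) via Python's sqlite3 module, the column name ld_date is hypothetical because the question does not name the date column, and the dates are invented sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ld_date is a hypothetical column name; the real one is not given
    CREATE TABLE ld_det (ld_part TEXT, ld_date TEXT, ld_qty_oh INTEGER);
    INSERT INTO ld_det VALUES
        ('1000', '2023-10-02', 0), ('1000', '2023-10-03', 2),
        ('1001', '2023-10-01', 0), ('1001', '2023-10-02', 2);
""")

query = """
    SELECT ld_part, ld_qty_oh
    FROM (
        SELECT ld_part, ld_qty_oh,
               -- rank the rows of each part, newest date first
               ROW_NUMBER() OVER (
                   PARTITION BY ld_part
                   ORDER BY ld_date DESC
               ) AS rn
        FROM ld_det
    ) ranked
    WHERE rn = 1
    ORDER BY ld_part
"""
latest_qty = conn.execute(query).fetchall()
print(latest_qty)  # [('1000', 2), ('1001', 2)]
```

In SQL Server the same ranking could feed either a "WHERE rn = 1" filter like this one or the TOP (1) WITH TIES ordering shown above.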

SQL Query Count and Label if first

I'm using SQL Server management Studio 2010.
I need my query to populate a column if the record contains the first occurrence of a value.
The table my query returns is huge so I'll just use pretend columns to get my point across. My query currently returns a table that looks like this
| ROW | ItemNumber | DateOpen | Status |
| 1 | 10045 | 5/5/2005 | Open |
| 2 | 10045 | 5/5/2005 | Open |
| 3 | 10046 | 5/5/2005 | Open |
| 4 | 10046 | 5/5/2005 | Open |
| 5 | 10046 | 5/5/2005 | Open |
I've already added the row indicator in to the query thinking it would help identify the first occurrence of an ItemNumber. I need to have a new column that marks an X if the record is the first occurrence.
I have this so far
DECLARE @ItemData Table(itemRow BIGINT, itemNumber BIGINT, DateOpen VARCHAR(15), status VARCHAR(15))
INSERT INTO @ItemData (itemRow, ItemNumber, DateOpen, Status)
SELECT Row_Number() OVER(ORDER BY Cm_ItemNumber) AS 'ROW'
,Cm_ItemNumber AS ItemNumber
,Dates_DateOpen AS DateOpen
,St_Status AS Status
FROM db_Items
JOIN db_Dates ON Dates_Item = Cm_ItemID
JOIN db_Status ON St_ID = Cm_StatusID
Select * from @ItemData
The reason it's thrown into a table and then selected separately is that there's actually a union and a lot more stuff in the query. When I realized I needed to add a uniqueness check as a column, I figured the easiest way would be an "after the fact" step in the Select * from @ItemData portion.

I haven't tested this, but it's along the lines I'd be playing with...
using your existing query as the first of 2 CTEs :
With AllData as
(
SELECT Row_Number() OVER(ORDER BY Cm_ItemNumber) AS 'ROW'
,Cm_ItemNumber AS ItemNumber
,Dates_DateOpen AS DateOpen
,St_Status AS Status
FROM db_Items
JOIN db_Dates ON Dates_Item = Cm_ItemID
JOIN db_Status ON St_ID = Cm_StatusID
),
FirstRows as
(
SELECT Min(ROW) as Row, ItemNumber
FROM AllData
GROUP BY ItemNumber
)
SELECT
ad.*,
Case When fr.Row IS NULL then '' else 'X' end as X_Col
FROM AllData ad LEFT JOIN FirstRows fr
ON ad.ROW=fr.Row
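An alternative that avoids the second CTE and the self-join is to partition ROW_NUMBER() by the item number and flag the rows numbered 1. The sketch below is a swapped-in technique run on SQLite (3.25 or newer) through Python's sqlite3 module, not tested on SQL Server; the table item_data and the column first_flag are illustrative names standing in for the question's query result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_data (item_row INTEGER, item_number INTEGER,
                            date_open TEXT, status TEXT);
    INSERT INTO item_data VALUES
        (1, 10045, '5/5/2005', 'Open'), (2, 10045, '5/5/2005', 'Open'),
        (3, 10046, '5/5/2005', 'Open'), (4, 10046, '5/5/2005', 'Open'),
        (5, 10046, '5/5/2005', 'Open');
""")

query = """
    SELECT item_row, item_number,
           -- the first row of each item_number gets an X, the rest stay empty
           CASE WHEN ROW_NUMBER() OVER (
                         PARTITION BY item_number
                         ORDER BY item_row
                     ) = 1
                THEN 'X' ELSE '' END AS first_flag
    FROM item_data
    ORDER BY item_row
"""
flagged = conn.execute(query).fetchall()
print(flagged)
# [(1, 10045, 'X'), (2, 10045, ''), (3, 10046, 'X'), (4, 10046, ''), (5, 10046, '')]
```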
