max DISTINCT returns multiple rows - sql

I am working on an sql script which is executed by a .bat daily and outputs a list of IDs, the date of access, and their level.
While it returns what I want, mostly, I noticed that some of the outputted rows are duplicates.
Could someone please help me modify my script so that it outputs only one date (the latest) for each ID?
Thank you very much.
SELECT T.ID
+ ';' + substring(convert(char, convert(date , T.QDATE ) ), 1, 10)
+ ';' + A.[LEVEL]
FROM
(SELECT CID AS 'ID',
MAX (DISTINCT EDATE) QDATE
FROM [XXXXXXXXXXXXXXXXXXXXXXXX].[XXX].[XXXXXXXXXXXXXXX]
GROUP BY CID
) T ,
[XXXXXXXXXXXXXXXXXXXXXXXX].[XXX].[XXXXXXXXXXXXXXX] A
WHERE
T.ID = A.CID
AND T.QDATE = A.EDATE
ORDER BY A.[CID]
EDIT: I've added a bit of sample data from table A
| QID | CID | LEVEL | EDATE | OP | STATUS |
|-----|-----|-------|------------|----|--------|
| 1 |00001| LOW | 2021-07-16 | 01 | CLOSED |
| 2 |00001| LOW | 2021-07-16 | 01 | CLOSED |
| 3 |00002| MEDIUM| 2021-07-16 | 01 | CLOSED |
| 4 |00003| LOW | 2021-07-16 | 01 | CLOSED |
In this bit of data, my output contains both rows for CID 00001. Looking for a way to delete the duplicate rows from the output and not make any modifications to the db itself.

Your sample data shows only the date portion of the EDATE field. Is it really a date, or a date/time? Your call to CONVERT(date, T.QDATE) in the query suggests date/time, in which case your sample data should show the time as well, down to the second. I would not expect multiple records with the same timestamp to the second, but it's your data.
The DISTINCT belongs in the outer query, not the inner one. Note that if you have multiple entries for the same CID at the exact same time and they carry different LEVEL values (the only column from A in the output), you will still get multiple rows.
However, if the values are the same across-the-board as in your sample data, you SHOULD be good with
SELECT DISTINCT
T.ID + ';'
+ substring(convert(char, convert(date , T.QDATE ) ), 1, 10)
+ ';' + A.[LEVEL]
FROM
( SELECT
CID AS 'ID',
MAX (EDATE) QDATE
FROM
[XXXXXXXXXXXXXXXXXXXXXXXX].[XXX].[XXXXXXXXXXXXXXX]
GROUP BY
CID ) T
JOIN [XXXXXXXXXXXXXXXXXXXXXXXX].[XXX].[XXXXXXXXXXXXXXX] A
ON T.ID = A.CID
AND T.QDATE = A.EDATE
ORDER BY
A.CID
The distinct keyword in this context means only give me 1 unique record per each combination of all columns. So in your sample data, you would only have 1 record result for the CID = '00001'.
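To see the effect on the sample data from the question, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database. SQLite is only a stand-in here (the question is about SQL Server), the table name A is taken from the question's alias, and `||` replaces `+` for string concatenation.

```python
import sqlite3

def latest_level_per_id():
    """Return one 'ID;date;LEVEL' string per CID for its latest EDATE."""
    con = sqlite3.connect(":memory:")
    # Sample rows copied from the question; the table name A is an assumption.
    con.executescript("""
        CREATE TABLE A (QID INTEGER, CID TEXT, "LEVEL" TEXT,
                        EDATE TEXT, OP TEXT, STATUS TEXT);
        INSERT INTO A VALUES
            (1, '00001', 'LOW',    '2021-07-16', '01', 'CLOSED'),
            (2, '00001', 'LOW',    '2021-07-16', '01', 'CLOSED'),
            (3, '00002', 'MEDIUM', '2021-07-16', '01', 'CLOSED'),
            (4, '00003', 'LOW',    '2021-07-16', '01', 'CLOSED');
    """)
    rows = con.execute("""
        SELECT DISTINCT T.ID || ';' || T.QDATE || ';' || A."LEVEL" AS line
        FROM (SELECT CID AS ID, MAX(EDATE) AS QDATE
              FROM A
              GROUP BY CID) AS T
        JOIN A ON A.CID = T.ID AND A.EDATE = T.QDATE
        ORDER BY line
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

print(latest_level_per_id())
```

Without the DISTINCT, the row for CID 00001 would come back twice, because both QID 1 and QID 2 match the latest date.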

Related

Greatest N Per Group with JOIN and multiple order columns

I have two tables:
Table0:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-18 | 100 |
| aa | 1 | 12-10 | 101 |
| bb | 2 | 12-10 | 102 |
| cc | 1 | 12-09 | 100 |
| cc | 2 | 12-12 | 103 |
| cc | 2 | 12-01 | 109 |
| cc | 1 | 12-07 | 101 |
| dd | 1 | 12-08 | 100 |
and
Table1:
| ID |
|----|
| aa |
| cc |
| cc |
| dd |
| dd |
I'm trying to output results where:
ID must exist in both tables.
TYPE must be the maximum for each ID.
TIME must be the minimum value for the maximum TYPE for each ID.
SITE should be the value from the same row as the minimum TIME value.
Given my sample data, my results should look like this:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-10 | 101 |
| cc | 2 | 12-01 | 109 |
| dd | 1 | 12-08 | 100 |
I've tried these statements:
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MASTY, MIN("TIME") AS MASTM
FROM TABLE0
GROUP BY "ID") AS MAS,
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MSD.MASTY =MA."TYPE"
...which generates a syntax error
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MAB
FROM TABLE0
GROUP BY "ID") AS MAS,
((SELECT "ID", MIN("TIME") AS MACTM, MIN("TYPE") AS MACTY
FROM TABLE0
WHERE "TYPE" = 1
GROUP BY "ID")
UNION
(SELECT "ID", MIN("TIME"), MAX("TYPE")
FROM TABLE0
WHERE "TYPE" = 2
GROUP BY "ID")) AS MACU
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MACU."ID" = QTS."ID"
AND MA."TIME" = MACU.MACTM
AND MA."TYPE" = MACU.MACTB
... which is getting the wrong results.
Answering your direct question "how to avoid...":
You get this error when you specify a column in the SELECT list of a statement that isn't present in the GROUP BY clause and isn't wrapped in an aggregate function like MAX, MIN, or AVG.
With your data, I cannot say:
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id
I didn't say what to do with SITE; it's either a key of the group (in which case I'll get every unique combination of ID,site and the min time in each) or it should be aggregated (eg max site per ID)
These are ok:
SELECT
ID, max(site), min(time)
FROM
table
GROUP BY
id
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id,site
I cannot simply not specify what to do with it- what should the database return in such a case? (If you're still struggling, tell me in the comments what you think the db should do, and I'll better understand your thinking so I can tell you why it can't do that ). The programmer of the database cannot make this decision for you; you must make it
Usually people ask this when they want to identify:
The min time per ID, and get all the other row data as well. eg "What is the full earliest record data for each id?"
In this case you have to write a query that identifies the min time per id and then join that subquery back to the main data table on id=id and time=mintime. The db runs the subquery, builds a list of min time per id, then that effectively becomes a filter of the main data table
SELECT * FROM
(
SELECT
ID, min(time) as mintime
FROM
table
GROUP BY
id
) findmin
INNER JOIN table t ON t.id = findmin.id and t.time = findmin.mintime
What you cannot do is start putting the other data you want into the query that does the grouping, because you either have to group by the columns you add in (makes the group more fine grained, not what you want) or you have to aggregate them (and then it doesn't necessarily come from the same row as other aggregated columns - min time is from row 1, min site is from row 3 - not what you want)
Looking at your actual problem:
The ID value must exist in two tables.
The Type value must be largest group by id.
The Time value must be smallest in the largest type group.
Leaving out a solution that involves having or analytics for now, so you can get to grips with the theory here:
You need to find the max type group by id, and then join it back to the table to get the other relevant data also (time is needed) for that id/maxtype and then on this new filtered data set you need the id and min time
SELECT t.id,min(t.time) FROM
(
SELECT
ID, max(type) as maxtype
FROM
table
GROUP BY
id
) findmax
INNER JOIN table t ON t.id = findmax.id and t.type = findmax.maxtype
GROUP BY t.id
If you can't see why, let me know
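The query above returns only id and min(time). One possible way to carry it through to the full row (including SITE) is to join the result back to the table once more. The sketch below does that against an in-memory SQLite database loaded with the sample data from the question; SQLite and the lowercase table names table0/table1 are assumptions made for the demo.

```python
import sqlite3

def min_time_in_max_type():
    """Full row per id: max TYPE first, then min TIME within that type."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table0 (id TEXT, type INTEGER, time TEXT, site INTEGER);
        INSERT INTO table0 VALUES
            ('aa', 1, '12-18', 100), ('aa', 1, '12-10', 101),
            ('bb', 2, '12-10', 102), ('cc', 1, '12-09', 100),
            ('cc', 2, '12-12', 103), ('cc', 2, '12-01', 109),
            ('cc', 1, '12-07', 101), ('dd', 1, '12-08', 100);
        CREATE TABLE table1 (id TEXT);
        INSERT INTO table1 VALUES ('aa'), ('cc'), ('cc'), ('dd'), ('dd');
    """)
    rows = con.execute("""
        SELECT t.id, t.type, t.time, t.site
        FROM (
            -- step 2: min time inside the max-type group of each id
            SELECT m.id, m.type AS maxtype, MIN(m.time) AS mintime
            FROM table0 AS m
            JOIN (
                -- step 1: max type per id
                SELECT id, MAX(type) AS maxtype
                FROM table0
                GROUP BY id
            ) AS findmax
              ON m.id = findmax.id AND m.type = findmax.maxtype
            GROUP BY m.id, m.type
        ) AS findmin
        -- step 3: join back to pick up the rest of the row (site)
        JOIN table0 AS t
          ON t.id = findmin.id
         AND t.type = findmin.maxtype
         AND t.time = findmin.mintime
        -- the id must also exist in the second table
        WHERE t.id IN (SELECT id FROM table1)
        ORDER BY t.id
    """).fetchall()
    con.close()
    return rows

for row in min_time_in_max_type():
    print(row)
```

Each grouping step only narrows the key (id, then id plus type plus time); the non-key column is fetched last, so it is guaranteed to come from the same row as the minimum time.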
demo:db<>fiddle
SELECT DISTINCT ON (t0.id)
t0.id,
type,
time,
first_value(site) OVER (PARTITION BY t0.id ORDER BY time) as site
FROM table0 t0
JOIN table1 t1 ON t0.id = t1.id
ORDER BY t0.id, type DESC, time
ID must exist in both tables
This can be achieved by joining both tables against their ids. The result of inner joins are rows that exist in both tables.
SITE should be the value from the same row as the minimum TIME value.
This is the same as "Give me the first value of each group of ids ordered by time". This can be done by using the first_value() window function. Window functions can group your data set (PARTITION BY), so you get groups of ids which can be ordered separately. first_value() gives the first value of these ordered groups.
TYPE must be the maximum for each ID.
To get the maximum type per id you'll first have to ORDER BY id, type DESC. You are getting the maximum type as first row per id...
TIME must be the minimum value for the maximum TYPE for each ID.
... Then you can order this result by time additionally to assure this condition.
Now you have an ordered data set: For each id, the row with the maximum type and its minimum time is the first one.
DISTINCT ON gives you exactly the first row of each group. In this case the group you defined is (id). The result is your expected one.
I would write this using distinct on and in/exists:
select distinct on (t0.id) t0.*
from table0 t0
where exists (select 1 from table1 t1 where t1.id = t0.id)
order by t0.id, type desc, time asc;
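DISTINCT ON is specific to PostgreSQL. As a rough equivalent for engines without it, the same "first row per group" idea can be sketched with ROW_NUMBER(). The example below uses an in-memory SQLite database (window functions need SQLite 3.25 or later) and the sample data from the question; the alias names ranked and rn are made up for the illustration.

```python
import sqlite3

def first_row_per_id():
    """Emulate DISTINCT ON (id) ... ORDER BY id, type DESC, time ASC."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table0 (id TEXT, type INTEGER, time TEXT, site INTEGER);
        INSERT INTO table0 VALUES
            ('aa', 1, '12-18', 100), ('aa', 1, '12-10', 101),
            ('bb', 2, '12-10', 102), ('cc', 1, '12-09', 100),
            ('cc', 2, '12-12', 103), ('cc', 2, '12-01', 109),
            ('cc', 1, '12-07', 101), ('dd', 1, '12-08', 100);
        CREATE TABLE table1 (id TEXT);
        INSERT INTO table1 VALUES ('aa'), ('cc'), ('cc'), ('dd'), ('dd');
    """)
    rows = con.execute("""
        SELECT id, type, time, site
        FROM (
            SELECT t0.id, t0.type, t0.time, t0.site,
                   -- number the rows of each id: highest type first,
                   -- then earliest time
                   ROW_NUMBER() OVER (
                       PARTITION BY t0.id
                       ORDER BY t0.type DESC, t0.time ASC
                   ) AS rn
            FROM table0 AS t0
            WHERE EXISTS (SELECT 1 FROM table1 AS t1 WHERE t1.id = t0.id)
        ) AS ranked
        WHERE rn = 1
        ORDER BY id
    """).fetchall()
    con.close()
    return rows

for row in first_row_per_id():
    print(row)
```

Because EXISTS is used instead of a join, the duplicate ids in table1 do not multiply the rows before they are ranked.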

SQL query that finds dates between a range and takes values from another query & iterates range over them?

Sorry if the wording for this question is strange. Wasn't sure how to word it, but here's the context:
I'm working on an application that shows data about how often individual applications are used when users make requests to my web server. We collect the data like this: every time the start page loads, it increments a table called WEB_TRACKING at the date when it loaded. So there are a lot of holes in the data; for example, an application might have been used heavily on September 1st but not at all on September 2nd. What I want to do is fill those holes with a hit count of 0. This is what I came up with.
Select HIT_DATA.DATE_ACCESSED, HIT_DATA.APP_ID, HIT_DATA.NAME, WORKDAYS.BENCH_DAYS, NVL(HIT_DATA.HITS, 0) from (
select DISTINCT( TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY')) as BENCH_DAYS
FROM WEB_TRACKING WEB
) workDays
LEFT join (
SELECT TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY') as DATE_ACCESSED, APP.APP_ID, APP.NAME,
COUNT(WEB.IP_ADDRESS) AS HITS
FROM WEB_TRACKING WEB
INNER JOIN WEB_APP APP ON WEB.APP_ID = APP.APP_ID
WHERE APP.IS_ENABLED = 1 AND (APP.APP_ID = 1 OR APP.APP_ID = 2)
AND (WEB.ACCESS_TIME > TO_DATE('08/04/2018', 'MM/DD/YYYY')
AND WEB.ACCESS_TIME < TO_DATE('09/04/2018', 'MM/DD/YYYY'))
GROUP BY TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY'), APP.APP_ID, APP.NAME
ORDER BY TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY'), app_id DESC
) HIT_DATA ON HIT_DATA.DATE_ACCESSED = WORKDAYS.BENCH_DAYS
ORDER BY WORKDAYS.BENCH_DAYS
It returns all the dates in the date range and even converts null hits to 0. However, it returns null for the app id and app name. That makes sense, and I understand how to give a default value for one application. I was hoping someone could help me figure out how to do it for multiple applications.
Basically, I am getting this (in the case of using just one application):
| APP_ID | NAME | BENCH_DAYS | HITS |
| ------ | ---------- | ---------- | ---- |
| NULL | NULL | 08/04/2018 | 0 |
| 1 | test_app | 08/05/2018 | 1 |
| NULL | NULL | 08/06/2018 | 0 |
But I want this(with multiple applications):
| APP_ID | NAME | BENCH_DAYS | HITS |
| ------ | ---------- | ---------- | ---- |
| 1 | test_app | 08/04/2018 | 0 |<- these 0's are converted from null
| 1 | test_app | 08/05/2018 | 1 |
| 1 | test_app | 08/06/2018 | 0 | <- these 0's are converted from null
| 2 | prod_app | 08/04/2018 | 2 |
| 2 | prod_app | 08/05/2018 | 0 | <- these 0's are converted from null
So again to reiterate the question in this long post. How should I go about populating this query so that it fills up the holes in the dates but also reuses the application names and ids and populates that information as well?
You need a list of dates, that probably comes from a number generator rather than a table (if that table has holes, your report will too)
Example, every date for the past 30 days:
select trunc(sysdate-30) + level as bench_days from dual connect by level <= 30
Use TRUNC instead of turning a date into a string in order to cut the time off
Now you have a list of dates, you want to add in repeating app id and name:
select * from
(select trunc(sysdate-30) + level as bench_days from dual connect by level <= 30) dat
CROSS JOIN
(select app_id, name from WEB_APP where IS_ENABLED = 1 and APP_ID in (1, 2)) app
Now you have all your dates, crossed with all your apps. 2 apps and 30 days will make a 60 row resultset via a cross join. Left join your stat data onto it, and group/count/sum/aggregate ...
select app.app_id, app.name, dat.artificialday, COALESCE(stat.ct, 0) as hits from
(select trunc(sysdate-30) + level as artificialday from dual connect by level <= 30) dat
CROSS JOIN
(select app_id, name from WEB_APP where IS_ENABLED = 1 and APP_ID in (1, 2)) app
LEFT JOIN
(SELECT app_id, trunc(access_time) accdate, count(ip_address) ct from web_tracking group by app_id, trunc(access_time)) stat
ON
stat.app_id = app.app_id AND
stat.accdate = dat.artificialday
You don't have to write the query this way/do your grouping as a subquery, I'm just representing it this way to lead you to thinking about your data in blocks, that you build in isolation and join together later, to build more comprehensive blocks
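The "calendar crossed with apps, then left join the stats" idea can be tried out without an Oracle instance. Below is a minimal sketch on an in-memory SQLite database: a recursive CTE plays the role of the `connect by level` number generator, and date() plays the role of TRUNC. The sample rows, the disabled app with id 3, and the function parameters first_day and n_days are all invented for the demo.

```python
import sqlite3

def hits_per_app_per_day(first_day, n_days):
    """One row per (app, day) in the range, with 0 hits where nothing was logged."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE web_app (app_id INTEGER, name TEXT, is_enabled INTEGER);
        INSERT INTO web_app VALUES
            (1, 'test_app', 1), (2, 'prod_app', 1), (3, 'old_app', 0);
        CREATE TABLE web_tracking (app_id INTEGER, access_time TEXT, ip_address TEXT);
        INSERT INTO web_tracking VALUES
            (1, '2018-08-05 09:15:00', '10.0.0.1'),
            (2, '2018-08-04 08:00:00', '10.0.0.2'),
            (2, '2018-08-04 17:30:00', '10.0.0.3');
    """)
    rows = con.execute("""
        WITH RECURSIVE dat(bench_day, n) AS (
            -- generated calendar: no holes, independent of the tracking table
            SELECT date(:first_day), 1
            UNION ALL
            SELECT date(bench_day, '+1 day'), n + 1 FROM dat WHERE n < :n_days
        )
        SELECT app.app_id, app.name, dat.bench_day, COALESCE(stat.ct, 0) AS hits
        FROM dat
        CROSS JOIN (SELECT app_id, name
                    FROM web_app
                    WHERE is_enabled = 1 AND app_id IN (1, 2)) AS app
        LEFT JOIN (SELECT app_id, date(access_time) AS accdate,
                          COUNT(ip_address) AS ct
                   FROM web_tracking
                   GROUP BY app_id, date(access_time)) AS stat
          ON stat.app_id = app.app_id AND stat.accdate = dat.bench_day
        ORDER BY app.app_id, dat.bench_day
    """, {"first_day": first_day, "n_days": n_days}).fetchall()
    con.close()
    return rows

for row in hits_per_app_per_day("2018-08-04", 3):
    print(row)
```

With 2 apps and 3 days the cross join yields 6 rows, and the app id and name are filled in on every row because they come from the app block rather than from the stats.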

How to create end date that is one day less than the next start date created by another query with sql?

I queried off of a table that pulls in anyone who has working time percentage of less than 100 and all their working time records if they met the less than 100 criteria.
This table contains the columns: id, eff_date (of working time percentage), and percentage. This table does not contain end_date.
Problem: how to build on top of the query below and add a new column called end_date that is one date less than the next eff_date?
Current query
select
j1.id, j1.eff_date, j1.percentage
from
working_time_table j1
where
exists (select 1
from working_time_table j2
where j2.id = j1.id and j2.percentage < 100)
Data returned from the query above:
ID | EFF_DATE| PERCENTAGE
------------------------
12 | 01-JUN-2012 | 70
12 | 03-MAR-2013 | 100
12 | 13-DEC-2014 | 85
The desired result set is:
ID | EFF_DATE | PERCENTAGE | END_DATE
-------------------------------------------
12 | 01-JUN-2012 | 70 | 02-MAR-2013
12 | 03-MAR-2013 | 100 | 12-DEC-2014
12 | 13-DEC-2014 | 85 | null
You didn't state your DBMS so this is ANSI SQL using window functions:
select j1.id,
j1.eff_date,
j1.percentage,
lead(j1.eff_date) over (partition by j1.id order by j1.eff_date) - interval '1' day as end_date
from working_time_table j1
where exists (select 1
from working_time_table j2
where j2.id = j1.id and j2.percentage < 100);
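As a quick check of the lead() approach, here is a small sketch on an in-memory SQLite database (3.25 or later for window functions). It assumes the dates are stored as ISO strings, uses SQLite's date(x, '-1 day') in place of `- interval '1' day`, and adds a hypothetical id 13 that never drops below 100 to show the EXISTS filter at work.

```python
import sqlite3

def with_end_dates():
    """Add end_date = next eff_date minus one day, per id."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE working_time_table (id INTEGER, eff_date TEXT, percentage INTEGER);
        INSERT INTO working_time_table VALUES
            (12, '2012-06-01', 70),
            (12, '2013-03-03', 100),
            (12, '2014-12-13', 85),
            (13, '2015-01-01', 100);  -- hypothetical id, always at 100
    """)
    rows = con.execute("""
        SELECT j1.id,
               j1.eff_date,
               j1.percentage,
               -- next eff_date within the same id, shifted back one day;
               -- NULL for the last row of each id
               date(LEAD(j1.eff_date) OVER (PARTITION BY j1.id
                                            ORDER BY j1.eff_date),
                    '-1 day') AS end_date
        FROM working_time_table AS j1
        WHERE EXISTS (SELECT 1
                      FROM working_time_table AS j2
                      WHERE j2.id = j1.id AND j2.percentage < 100)
        ORDER BY j1.id, j1.eff_date
    """).fetchall()
    con.close()
    return rows

for row in with_end_dates():
    print(row)
```

The last record of each id has no following row, so lead() returns NULL and the end date stays empty, which matches the desired result in the question.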
First off, is the "id" column unique, or does it hold duplicate values like the 12s in your sample? It would be way easier to do this if there were a unique id column that held the order. If you don't have a unique ID column, are you able to add one to the table? Again, that would simplify this tremendously.
This took forever to get right, I hope this helps, burned many hours on it.
Props to Akhil for helping me finally get the query right. He is a true SQL genius.
-- @cnt and @cnt1 are MySQL user variables used as running row counters
SELECT
id,
firstTbl.eff_Date,
UPPER(DATE_FORMAT(DATE_SUB(
STR_TO_DATE(secondTbl.eff_Date, '%d-%M-%Y'),
INTERVAL 1 DAY), '%d-%b-%Y')) todate,
percentage FROM
(SELECT
(@cnt := @cnt + 1) rownum,
id, eff_date, percentage
FROM working_time_table,
(SELECT
@cnt := 0) s) firstTbl
LEFT JOIN
(SELECT
(@cnt1 := @cnt1 + 1) rownum,
eff_date
FROM working_time_table,
(SELECT
@cnt1 := 0) s) secondTbl
ON (firstTbl.rownum + 1) = secondTbl.rownum

Query to Calculate totalcost based on description

I have a question regarding an SQL script. I have a custom view; the data is below:
================================================================================
ql_siteid | ql_rfqnum | ql_vendor | ql_itemnum | totalcost_option | description
================================================================================
SGCT | 1002 | VND001 | ITEM002 | 12500 |
SGCT | 1002 | VND001 | ITEM001 | 1350 |
SGCT | 1002 | VND002 | ITEM002 | 11700 |
SGCT | 1002 | VND002 | ITEM001 | 1470 | Nikon
SGCT | 1002 | VND002 | ITEM001 | 1370 | Asus
================================================================================
And I want a result like this:
VND001 = 13850
VND002 = Asus 13070, Nikon 13170
where 13850 comes from 12500+1350, 13070 comes from 11700+1370, and 13170 comes from 11700+1470. All costs are calculated from totalcost_option and grouped by vendor.
Please give me some advice.
To get the exact output you required use the following statement: (where test_table is your table name):
SELECT ql_vendor || ' = ' ||
LISTAGG( LTRIM(description||' ')||totalcost, ', ')
WITHIN GROUP (ORDER BY description)
FROM (
WITH base_cost AS (
SELECT ql_vendor, SUM(totalcost_option) sumcost
FROM test_table WHERE description IS NULL
GROUP BY ql_vendor
),
individual_cost AS (
SELECT ql_vendor, totalcost_option icost, description
FROM test_table WHERE description IS NOT NULL
)
SELECT ql_vendor, sumcost + NVL(icost,0) totalcost, description
FROM base_cost LEFT OUTER JOIN individual_cost USING (ql_vendor)
)
GROUP BY ql_vendor;
Details:
The Outer select just takes the individual rows and combines them to the String-representation. Just remove it and you will get a single row for each vendor/description combination.
The inner select joins two sub-select. The first one gets the base_cost for each vendor by summing up all rows without a description. The second gets the individual cost for each row with a description.
The join combines them - and left outer joins displays the base_cost for vendors which don't have a matching row with description.
Assuming you have a version of Oracle 11g or later, using ListAgg will do the combination of the comma separated tuples for you. The rest of the string is generated by simply concatenating the components together from an intermediate table - I've used a derived table (X) here, but you could also use a CTE.
Edit
As pointed out in the comments, there's a whole bunch more logic missing around the Null description items I missed in my original answer.
The following rather messy query does project the required result, but I believe this may be indicative that a table design rethink is necessary. The FULL OUTER JOIN should ensure that rows are returned even if there are no base / descriptionless cost items for the vendor.
WITH NullDescriptions AS
(
SELECT "ql_vendor", SUM("totalcost_option") AS "totalcost_option"
FROM MyTable
WHERE "description" IS NULL
GROUP BY "ql_vendor"
),
NonNulls AS
(
SELECT COALESCE(nd."ql_vendor", mt."ql_vendor") AS "ql_vendor",
NVL(mt."description", '') || ' '
|| CAST(NVL(mt."totalcost_option", 0)
+ nd."totalcost_option" AS VARCHAR2(30)) AS Combined
FROM NullDescriptions nd
FULL OUTER JOIN MyTable mt
ON mt."ql_vendor" = nd."ql_vendor"
AND mt."description" IS NOT NULL
)
SELECT x."ql_vendor" || ' = ' || ListAgg(x.Combined, ', ')
WITHIN GROUP (ORDER BY x.Combined)
FROM NonNulls x
WHERE x.Combined <> ' '
GROUP BY x."ql_vendor";
Your logic seems to be: If description is always NULL for a vendor then you want that as the total cost. Otherwise, you want the NULL value of description added to all the other values. The following query implements this logic. The output is in a different format from your answer -- this format is more consistent with a SQL result set:
select ql_vendor,
(sum(totalcost_option) +
(case when description is not null then max(totalcost_null) else 0 end)
)
from (select v.*, max(description) over (partition by ql_vendor) as maxdescription,
sum(case when description is null then totalcost_option else 0 end) over (partition by ql_vendor) as totalcost_null
from view v
) t
where maxdescription is null or description is not null
group by ql_vendor, description;
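To check the arithmetic on the sample rows, here is a minimal sketch that follows the "base cost plus per-description cost" idea from the first answer on an in-memory SQLite database. It assumes the blank descriptions are stored as NULL, uses a made-up table name quotes, and builds the final strings in Python rather than with LISTAGG.

```python
import sqlite3

def vendor_totals():
    """Return strings like 'VND002 = Asus 13070, Nikon 13170'."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE quotes (ql_siteid TEXT, ql_rfqnum INTEGER, ql_vendor TEXT,
                             ql_itemnum TEXT, totalcost_option INTEGER,
                             description TEXT);
        INSERT INTO quotes VALUES
            ('SGCT', 1002, 'VND001', 'ITEM002', 12500, NULL),
            ('SGCT', 1002, 'VND001', 'ITEM001', 1350,  NULL),
            ('SGCT', 1002, 'VND002', 'ITEM002', 11700, NULL),
            ('SGCT', 1002, 'VND002', 'ITEM001', 1470,  'Nikon'),
            ('SGCT', 1002, 'VND002', 'ITEM001', 1370,  'Asus');
    """)
    rows = con.execute("""
        WITH base_cost AS (
            -- cost shared by every option of a vendor
            SELECT ql_vendor, SUM(totalcost_option) AS sumcost
            FROM quotes
            WHERE description IS NULL
            GROUP BY ql_vendor
        ),
        individual_cost AS (
            -- one row per described option
            SELECT ql_vendor, totalcost_option AS icost, description
            FROM quotes
            WHERE description IS NOT NULL
        )
        SELECT b.ql_vendor, i.description,
               b.sumcost + COALESCE(i.icost, 0) AS totalcost
        FROM base_cost AS b
        LEFT JOIN individual_cost AS i ON i.ql_vendor = b.ql_vendor
        ORDER BY b.ql_vendor, i.description
    """).fetchall()
    con.close()

    grouped = {}
    for vendor, description, totalcost in rows:
        label = f"{description} {totalcost}" if description else str(totalcost)
        grouped.setdefault(vendor, []).append(label)
    return [f"{vendor} = {', '.join(labels)}" for vendor, labels in grouped.items()]

for line in vendor_totals():
    print(line)
```

Like the first answer, this sketch drops a vendor that has only described rows and no base cost; a FULL OUTER JOIN, as in the second answer, would be one way to cover that case.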

Access query to grab +5 or more duplicates

I have a little problem with an Access query (don't ask me why, but I cannot use a real DBMS, only Access).
I have a huge table with about 920k records.
I have to go through all that data and grab the refs that occur more than 5 times on the same date.
table = myTable
--------------------------------------------------------------
| id | ref | date | C_ERR_ANO |
--------------------------------------------------------------
| 1 | A12345678 | 2012/02/24 | A 4565 |
| 2 | D52245708 | 2011/05/02 | E 5246 |
| ... | ......... | ..../../.. | . .... |
--------------------------------------------------------------
To summarize: I have 900,000+ records,
and there are duplicates on the SAME DATE (by the way, there is another column I forgot to add, named C_ERR_ANO).
So I have to go through all those rows and grab each ref based on date AND error number,
and if the same error number occurs MORE than 5 times, I have to grab those rows and display them in the result.
I ended up using this query:
SELECT DISTINCT Centre.REFERENCE, Centre.DATESE, Centre.C_ERR_ANO
FROM Centre INNER JOIN (SELECT
Centre.[REFERENCE],
COUNT(*) AS `toto`,
Centre.DATESE
FROM Centre
GROUP BY REFERENCE
HAVING COUNT(*) > 5) AS Centre_1
ON Centre.REFERENCE = Centre_1.REFERENCE
AND Centre.DATESE <> Centre_1.DATESE;
But this query isn't right.
Then I tried:
SELECT DATESE, REFERENCE, C_ERR_ANO, COUNT(REFERENCE) AS TOTAL
FROM (
SELECT *
FROM Centre
WHERE (((Centre.[REFERENCE]) NOT IN (SELECT [REFERENCE]
FROM [Centre] AS Tmp
GROUP BY [REFERENCE],[DATESE],[C_ERR_ANO]
HAVING Count(*)>1 AND [DATESE] = [Centre].[DATESE]
AND [C_ERR_ANO] = [Centre].[C_ERR_ANO]
AND [LIBELLE] = [Centre].[LIBELLE])))
ORDER BY Centre.[REFERENCE], Centre.[DATESE], Centre.[C_ERR_ANO])
GROUP BY REFERENCE, DATESE, C_ERR_ANO
Still not working.
I'm struggling.
Your group by clause needs to include all of the items in your select. Why not use:
select Centre.DATESE, Centre.C_ERR_ANO, Count(*)
from Centre
group by Centre.DATESE, Centre.C_ERR_ANO
having Count(*) > 5
If you need other fields then you can add them, as long as you ensure the same fields appear in the select as the group by.
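For instance, adding REFERENCE to both the select and the group by gives one row per ref, date, and error number that occurs more than 5 times. The sketch below tries that on an in-memory SQLite database rather than Access; the sample rows and the threshold parameter are invented for the demo.

```python
import sqlite3

def refs_over_threshold(threshold=5):
    """Refs that occur more than `threshold` times on one date with one error number."""
    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE Centre ("REFERENCE" TEXT, DATESE TEXT, C_ERR_ANO TEXT)
    """)
    # Invented sample: 6 identical rows, 5 identical rows, 1 stray row.
    sample = (
        [("A12345678", "2012-02-24", "A 4565")] * 6
        + [("D52245708", "2011-05-02", "E 5246")] * 5
        + [("A12345678", "2012-02-25", "A 4565")]
    )
    con.executemany("INSERT INTO Centre VALUES (?, ?, ?)", sample)
    rows = con.execute("""
        SELECT "REFERENCE", DATESE, C_ERR_ANO, COUNT(*) AS total
        FROM Centre
        GROUP BY "REFERENCE", DATESE, C_ERR_ANO
        HAVING COUNT(*) > ?
        ORDER BY "REFERENCE", DATESE
    """, (threshold,)).fetchall()
    con.close()
    return rows

print(refs_over_threshold())
```

The group with exactly 5 rows is left out because the condition is strictly greater than 5.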