Clean up 'duplicate' data while preserving most recent entry

I want to display each crew member, basic info, and the most recent start date from their contracts. With my basic query, it returns a row for each contract, duplicating the basic info with a distinct start and end date.
I only need one row per person, with the latest start date (or null if they have never yet had a start date).
I have limited understanding of GROUP BY and partition functions. Queries I have reverse engineered for similar cases use PARTITION and select from temp tables that they create. Ultimately I could reuse that approach, but it seems more convoluted than what we need.
select
Case when P01.EMPLOYMENTENDDATE < getdate() then 'Y'
else ''
end as "Deactivate",
concat(p01.FIRSTNAME,' ',p01.MIDDLENAME) as "First and Middle",
p01.LASTNAME,
p01.PIN,
(select top 1 TELENO FROM PW001P0T WHERE PIN = P01.PIN and TELETYPE = 6 ORDER BY TELEPRIORITY) as "EmailAddress",
org.NAME AS Vessel,
case
WHEN c02.CODECATEGORY= '20' then 'MARINE'
WHEN c02.CODECATEGORY= '10' then 'MARINE'
ELSE 'HOTEL' end as "Department",
c02.name as RankName,
c02.Alternative RankCode,
convert(varchar, ACT.DATEFROM,101) EmbarkDate,
convert(varchar,(case when ACT.DATEFROM is null then p03.TODATEESTIMATED else ACT.DATEFROM end),101) DebarkDate
FROM PW001P01 p01
JOIN PW001P03 p03
ON p03.PIN = p01.PIN
LEFT JOIN PW001C02 c02
ON c02.CODE = p03.RANK
/*LEFT JOIN PW001C02 CCIRankTbl
ON CCIRankTbl.CODE = p01.RANK*/
LEFT JOIN PWORG org
ON org.NUMORGID = dbo.ad_scanorgtree(p03.NUMORGID, 3)
LEFT JOIN PWORGVESACT ACT
ON ACT.numorgid=dbo.ad_scanorgtree(p03.numorgid,3)
where P01.EMPLOYMENTENDDATE > getdate()-10 or P01.EMPLOYMENTENDDATE is null
I only need to show one row per person. The first 5 columns will always be the same. The last columns depend on the contract, and we just need data from the most recent one.
<table><tbody><tr><th>Deactivate</th><th>First and Middle</th><th>Lastname</th><th>PIN</th><th>Email</th><th>Vessel</th><th>Department</th><th>Rank</th><th>RankCode</th><th>Embark</th><th>Debark</th></tr><tr><td> </td><td>Martin</td><td>Smith</td><td>123</td><td>msmith#fake.com</td><td>Ship1</td><td>Marine</td><td>ViceCaptain</td><td>VICE</td><td>9/1/2008</td><td>9/20/2008</td></tr><tr><td> </td><td>Matin</td><td>Smith</td><td>123</td><td>msmith#fake.com</td><td>Ship2</td><td>Marine</td><td>Captain</td><td>CAP</td><td>12/1/2008</td><td>12/20/2008</td></tr><tr><td> </td><td>Steve Mark</td><td>Dude</td><td>98765</td><td>sdude#fake.com</td><td>Ship1</td><td>Hotel</td><td>Chef</td><td>CHEF</td><td>5/1/2009</td><td>8/1/2009</td></tr><tr><td> </td><td>Steve Mark</td><td>Dude</td><td>98765</td><td>sdude#fake.com</td><td>Ship3</td><td>Hotel</td><td>Chef</td><td>CHEF</td><td>10/1/2010</td><td>12/20/2010</td></tr></tbody></table>

Change your query to a SELECT DISTINCT on the main query and use a sub-select for DebarkDate column:
(SELECT TOP 1 A.DATEFROM FROM PWORGVESACT A WHERE A.numorgid = ACT.numorgid ORDER BY A.DATEFROM DESC) AS DebarkDate
You can do whatever conversions on the date you need to from the result of that sub-query.
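As a runnable illustration of the "one row per person, most recent contract" idea, here is a minimal sketch using Python's built-in sqlite3 module in place of SQL Server. The crew and contract tables and their columns are hypothetical stand-ins for PW001P01 and PW001P03, and it uses ROW_NUMBER() rather than the TOP 1 sub-select above, so treat it as one possible variation, not the answer's exact method. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# In-memory database with hypothetical tables (not the asker's real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crew (pin INTEGER PRIMARY KEY, lastname TEXT);
CREATE TABLE contract (pin INTEGER, vessel TEXT, start_date TEXT);
INSERT INTO crew VALUES (123, 'Smith'), (98765, 'Dude'), (555, 'Newhire');
INSERT INTO contract VALUES
  (123,   'Ship1', '2008-09-01'),
  (123,   'Ship2', '2008-12-01'),
  (98765, 'Ship1', '2009-05-01'),
  (98765, 'Ship3', '2010-10-01');
""")

# Number each person's contracts from newest to oldest, then keep only rn = 1.
# The LEFT JOIN keeps crew members who have no contract at all (NULL columns).
latest_rows = conn.execute("""
SELECT c.pin, c.lastname, k.vessel, k.start_date
FROM crew c
LEFT JOIN (
    SELECT pin, vessel, start_date,
           ROW_NUMBER() OVER (PARTITION BY pin ORDER BY start_date DESC) AS rn
    FROM contract
) k ON k.pin = c.pin AND k.rn = 1
ORDER BY c.pin
""").fetchall()

for row in latest_rows:
    print(row)
```

Putting the rn = 1 condition in the ON clause rather than in a WHERE clause is what keeps the person without any contract in the result.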

Related

TSQL "where ... group by ..." issue that needs solution like "having ..."

I have 3 sub-tables of different formats combined with unions (in case this affects anything) into full-table. It has the columns "location", "amount" and "time". To keep things general for my later needs, I then union full-table with location-table, which has all possible "location" values and nulls in the other fields, into master-table.
I query master-table,
select location, sum(amount)
from master-table
where (time...)
group by location
However some "location" values are dropped because sum(amount) is 0 for those "location"s but I really want to have full list of those "location"s for my further steps.
An alternative would be to use a HAVING clause, but from what I understand HAVING is impossible here because I filter on "time" while grouping on "location", so I would need to add "time" to the grouping, which defeats the purpose. Keep in mind that the goal here is to get sum(amount) for each "location".
select location, sum(amount)
from master-table
group by location, time
having (time...)
To view the output:
with the first code I get
loc1, 5
loc3, 10
loc6, 1
but I want to get
loc1, 5
loc2, 0
loc3, 10
loc4, 0
loc5, 0
loc6, 1
Any suggestions on what can be done with this structure of master-table? Alternative solution to which I have no idea how to code would be to add numbers from the first query result to location-table (as a query, not actual table) with the final result query that I've posted above.

What you want will require a complete list of locations, then a left-outer join using that table and your calculated values, and IsNull (for tsql) to ensure you see the 0s you expect. You can do this with some CTEs, which I find valuable for clarity during development, or you can work on "putting it all together" in a more traditional SELECT...FROM... statement. The CTE approach might look like this:
WITH loc AS (
SELECT DISTINCT LocationID
FROM location_table
), summary_data as (
SELECT LocationID, SUM(amount) AS location_sum
FROM master-table
WHERE (time...)
GROUP BY LocationID
)
SELECT loc.LocationID, IsNull(location_sum,0) AS location_sum
FROM loc
LEFT OUTER JOIN summary_data ON loc.LocationID = summary_data.LocationID
See if that gets you a step or two closer to the results you're looking for.

I can think of 2 options:
You could move the WHERE to a CASE WHEN construction:
-- Option 1
select
location,
sum(CASE WHEN time <'16:00' THEN amount ELSE 0 END)
from master_table
group by location
Or you could JOIN with the possible values of location (which is my first ever RIGHT JOIN in a very long time 😉):
-- Option 2
select
x.location,
sum(CASE WHEN m.time <'16:00' THEN m.amount ELSE 0 END)
from master_table m
right join (select distinct location from master_table) x ON x.location = m.location
group by x.location
see: DBFIDDLE

The version using T-SQL without CTEs would be:
SELECT DISTINCT l.location ,
ISNULL(m.location_sum, 0) as location_sum
FROM master-table l
LEFT JOIN (
SELECT location,
SUM(amount) as location_sum
FROM master-table
WHERE (time ... )
GROUP BY location
) m ON l.location = m.location
This assumes that you still have your initial UNION in place that ensures that master-table has all possible locations included.

It is the where clause that excludes some locations. To ensure you retain every location you could introduce "conditional aggregation" instead of using the where clause: e.g.
select location, sum(case when (time...) then amount else 0 end) as location_sum
from master-table
group by location
i.e. instead of excluding some rows from the result, place the conditions inside the sum function that equate to the conditions you would have used in the where clause. If those conditions are true, then it will aggregate the amount, but if the conditions evaluate to false then 0 is summed, but the location is retained in the result.
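To make the difference concrete, here is a small sketch using Python's built-in sqlite3 module. The sample rows are made up, the table is named master_table (with an underscore) so that it is a valid identifier, and the '16:00' cutoff is borrowed from the CASE WHEN answer above as a placeholder for the real time condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master_table (location TEXT, amount INTEGER, time TEXT);
INSERT INTO master_table VALUES
  ('loc1', 5,    '10:00'),
  ('loc2', 7,    '18:00'),   -- only has a row outside the time window
  ('loc3', 10,   '09:30'),
  ('loc3', 4,    '17:00'),
  ('loc4', NULL, NULL);      -- placeholder row from the location list
""")

# Filtering in WHERE removes rows before grouping, so loc2 and loc4 vanish.
where_rows = conn.execute("""
SELECT location, SUM(amount)
FROM master_table
WHERE time < '16:00'
GROUP BY location ORDER BY location
""").fetchall()

# Conditional aggregation keeps every row; non-matching rows contribute 0.
case_rows = conn.execute("""
SELECT location, SUM(CASE WHEN time < '16:00' THEN amount ELSE 0 END)
FROM master_table
GROUP BY location ORDER BY location
""").fetchall()

print(where_rows)
print(case_rows)
```

With this data the WHERE version returns only loc1 and loc3, while the CASE version returns all four locations, with 0 for loc2 and loc4.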

Oracle: select just last update of date

I have the following query that returns 100 rows:
SELECT uni_id, uni_mast_id, uni_type
FROM UNIVERSITIES
WHERE uni_master = 'SO88' AND uni_stat = 'OK'
Now I need to join with another table and obtain the last entry of that day:
SELECT uni_id, uni_teach_name, MAX(cal_update), cal_status
FROM UNIVERSITIES
LEFT JOIN CALENDAR
ON uni_id = cal_id
WHERE uni_master = 'SO88'
AND uni_stat = 'OK'
AND cal_name = 'REGISTRED'
GROUP BY uni_id, uni_teach_name, uni_stat
ORDER BY cal_update
But this query gives me 102 records, because cal_update appears twice for some rows.
For example, one row has the date 22-OCT-2020 11:34:55 and another, for the same uni_id, has 22-OCT-2020 11:30:22.
I want to get just the max date for that day, not both.
In this case the query with the join needs to return the same number of records as the first select query.

I think you can do what you want using row_number():
SELECT UNI_ID, UNI_TEACH_NAME, CAL_UPDATE, CAL_STATUS
FROM (SELECT U.UNI_ID, U.UNI_TEACH_NAME, C.CAL_UPDATE, C.CAL_STATUS,
ROW_NUMBER() OVER (PARTITION BY U.UNI_ID, TRUNC(C.CAL_UPDATE) ORDER BY C.CAL_UPDATE DESC) as seqnum
FROM UNIVERSITIES U LEFT JOIN
CALENDAR C
ON U.UNI_ID = C.CAL_ID AND C.CAL_NAME = 'REGISTRED'
WHERE U.UNI_MASTER = 'SO88' AND
U.UNI_STAT= 'OK'
) UC
WHERE seqnum = 1;
I have to guess where the columns come from, because the question is not clear. Any filtering columns from CALENDAR should be in the ON clause if you are using a LEFT JOIN.
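The point about the ON clause can be checked with a small sketch using Python's built-in sqlite3 module. The two tables are cut down to hypothetical minimal versions of UNIVERSITIES and CALENDAR, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE universities (uni_id INTEGER PRIMARY KEY);
CREATE TABLE calendar (cal_id INTEGER, cal_name TEXT, cal_update TEXT);
INSERT INTO universities VALUES (1), (2);
INSERT INTO calendar VALUES
  (1, 'REGISTRED', '2020-10-22 11:30:22'),
  (1, 'REGISTRED', '2020-10-22 11:34:55'),
  (2, 'OTHER',     '2020-10-22 09:00:00');
""")

# Filter in WHERE: rows where c.cal_name is NULL or different are discarded
# after the join, so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute("""
SELECT u.uni_id, MAX(c.cal_update)
FROM universities u
LEFT JOIN calendar c ON u.uni_id = c.cal_id
WHERE c.cal_name = 'REGISTRED'
GROUP BY u.uni_id ORDER BY u.uni_id
""").fetchall()

# Filter in ON: it only restricts which calendar rows can match,
# so every university is still returned.
filter_in_on = conn.execute("""
SELECT u.uni_id, MAX(c.cal_update)
FROM universities u
LEFT JOIN calendar c ON u.uni_id = c.cal_id AND c.cal_name = 'REGISTRED'
GROUP BY u.uni_id ORDER BY u.uni_id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

University 2 disappears from the first result and shows up with a NULL date in the second.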

On Oracle 12c and later, you can alias MAX(cal_update) as cal_update and replace the last part of the query with
ORDER BY cal_update DESC
FETCH FIRST 1 ROW WITH TIES
This sorts by that column in descending order and picks the record with the latest value.
The WITH TIES option returns all records that share the same latest datetime value; replace it with ONLY to return a single row even when such ties occur.
The column cal_status in the select list should be removed, since it is a non-aggregated column that is not part of the GROUP BY.

As an alternative to a subquery and rank, you could use KEEP...LAST :
SELECT U.UNI_ID,
U.UNI_TEACH_NAME,
MAX(C.CAL_UPDATE) AS CAL_UPDATE,
MAX(C.CAL_STATUS) KEEP (DENSE_RANK LAST ORDER BY C.CAL_UPDATE) AS CAL_STATUS
FROM UNIVERSITIES U
LEFT JOIN CALENDAR C
ON U.UNI_ID = C.CAL_ID
AND C.CAL_NAME = 'REGISTRED'
WHERE U.UNI_MASTER = 'SO88'
AND U.UNI_STAT= 'OK'
GROUP BY U.UNI_ID,
U.UNI_TEACH_NAME,
TRUNC(C.CAL_UPDATE)
I've moved the CAL_NAME check into the outer join's ON clause; if it's in the WHERE clause then it will effectively turn it back into an inner join. So this will get one row per university per day that the calendar was updated: "I want just to get the max date for that date". And it will show nulls for the calendar fields if there is no matching calendar, since it's an outer join.
If you actually only want the latest update on any day then just remove the TRUNC(C.CAL_UPDATE) from the grouping:
SELECT U.UNI_ID,
U.UNI_TEACH_NAME,
MAX(C.CAL_UPDATE) AS CAL_UPDATE,
MAX(C.CAL_STATUS) KEEP (DENSE_RANK LAST ORDER BY C.CAL_UPDATE) AS CAL_STATUS
FROM UNIVERSITIES U
LEFT JOIN CALENDAR C
ON U.UNI_ID = C.CAL_ID
AND C.CAL_NAME = 'REGISTRED'
WHERE U.UNI_MASTER = 'SO88'
AND U.UNI_STAT= 'OK'
GROUP BY U.UNI_ID,
U.UNI_TEACH_NAME
db<>fiddle with some made-up data; and also (just for fun) showing Gordon's query with the calendar name clause in both places to show the difference, and to show this gets the same result for that dummy data. (And an 18c version which shows Barbaros' too; getting back a single row.)

SQL - Count new entries based on last date

I have a table with the follow structure
ID ReportDate Object_id
What I need to know, is the count of new and count of old (Object id's)
For example: If I have the data below:
I want the following output grouped by ReportDate:
I thought a way doing it using a Where clause based on date, however i need the data for all the dates I have in the table. To see the count of what already existed in the previous report and what is new at that report. Any Ideas?
Edit: New/Old definition- New would be the records that never appeared before that report run date and appeared on this one, whereas old is the number of records that had at least one match in previous dates. I'll edit the post to include this info.

managed to do it using a left join. Below is my solution in case it helps anyone in the future :)
SELECT table.ReportRunDate,
-1*sum(table.ReportRunDate = new_table.init_date) as count_new,
-1*sum(table.ReportRunDate <> new_table.init_date) as count_old,
count(*) as count_total
FROM table LEFT JOIN
((SELECT Object_ID, min(ReportRunDate) as init_date
FROM table
GROUP By OBJECT_ID) as new_table)
ON table.Object_ID = new_table.Object_ID
GROUP BY ReportRunDate

This would work in Oracle, not sure about ms-access:
SELECT ReportDate
,COUNT(CASE WHEN rnk = 1 THEN 1 ELSE NULL END) count_of_new
,COUNT(CASE WHEN rnk <> 1 THEN 1 ELSE NULL END)count_of_old
FROM (SELECT ID
,ReportDate
,Object_id
,RANK() OVER (PARTITION BY Object_id ORDER BY ReportDate) rnk
FROM table_name)
GROUP BY ReportDate
The inner query ranks each occurrence of object_id by ReportDate, so the first occurrence of a given object_id has rank = 1, the next one rank = 2, and so on.
The outer query then counts how many records with rank equal to 1, and not equal to 1, fall within each group.
I assumed that 1 object_id can appear only once within each reportDate.
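The ranking approach can be tried out with Python's built-in sqlite3 module. This is only a sketch: the reports table and its rows are invented, since the question's sample data is not available, and it relies on the same assumption that an object_id appears at most once per report date. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reports (id INTEGER PRIMARY KEY, report_date TEXT, object_id TEXT);
INSERT INTO reports (report_date, object_id) VALUES
  ('2021-01-01', 'A'), ('2021-01-01', 'B'),
  ('2021-01-02', 'A'), ('2021-01-02', 'C'),
  ('2021-01-03', 'A'), ('2021-01-03', 'C'), ('2021-01-03', 'D');
""")

# rnk = 1 marks the first report in which an object appears ("new");
# any higher rank means the object was already seen in an earlier report ("old").
counts = conn.execute("""
SELECT report_date,
       COUNT(CASE WHEN rnk = 1 THEN 1 END)  AS count_new,
       COUNT(CASE WHEN rnk <> 1 THEN 1 END) AS count_old
FROM (
    SELECT report_date, object_id,
           RANK() OVER (PARTITION BY object_id ORDER BY report_date) AS rnk
    FROM reports
)
GROUP BY report_date
ORDER BY report_date
""").fetchall()

for row in counts:
    print(row)
```

For this data the first report has 2 new objects, the second has 1 new and 1 old, and the third has 1 new and 2 old.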

Unpivot date columns to a single column of a complex query in Oracle

Hi guys, I am stuck with a stubborn problem which I am unable to solve. I am trying to compile a report in which all the dates coming from different tables need to go into a single date field. Of course, it is the max, i.e. the most recent, date from all these date columns that needs to go into that single date column. The report would be generated for multiple users of multiple branches/courses.
There are multiple blogs, and the latest date needs to be grouped by blog title, i.e. max(date_value) from the six date columns should give the greatest or latest date for that blogtitle.
Expected Result:
select u.batch_uid as ext_person_key, u.user_id, cm.batch_uid as ext_crs_key, cm.crs_id, ir.role_id as
insti_role, (CASE when b.JOURNAL_IND = 'N' then
'BLOG' else 'JOURNAL' end) as item_type, gm.title as item_name, gm.disp_title as ITEM_DISP_NAME, be.blog_pk1 as be_blogPk1, bc.blog_entry_pk1 as bc_blog_entry_pk1,bc.pk1,
b.ENTRY_mod_DATE as b_ENTRY_mod_DATE ,b.CMT_mod_DATE as BlogCmtModDate, be.CMT_mod_DATE as be_cmnt_mod_Date,
b.UPDATE_DATE as BlogUpDate, be.UPDATE_DATE as be_UPDATE_DATE,
bc.creation_date as bc_creation_date,
be.CREATOR_USER_ID as be_CREATOR_USER_ID , bc.creator_user_id as bc_creator_user_id,
b.TITLE as BlogTitle, be.TITLE as be_TITLE,
be.DESCRIPTION as be_DESCRIPTION, bc.DESCRIPTION as bc_DESCRIPTION
FROM users u
INNER JOIN insti_roles ir on u.insti_roles_pk1 = ir.pk1
INNER JOIN crs_users cu ON u.pk1 = cu.users_pk1
INNER JOIN crs_mast cm on cu.crsmast_pk1 = cm.pk1
INNER JOIN blogs b on b.crsmast_pk1 = cm.pk1
INNER JOIN blog_entry be on b.pk1=be.blog_pk1 AND be.creator_user_id = cu.pk1
LEFT JOIN blog_CMT bc on be.pk1=bc.blog_entry_pk1 and bc.CREATOR_USER_ID=cu.pk1
JOIN gradeledger_mast gm ON gm.crsmast_pk1 = cm.pk1 and b.grade_handler = gm.linkId
WHERE cu.ROLE='S' AND BE.STATUS='2' AND B.ALLOW_GRADING='Y' AND u.row_status='0'
AND u.available_ind ='Y' and cm.row_status='0' and u.batch_uid='userA_157'
I am getting a result set for the above query with multiple date columns which I want to put into a single column. The dates have to be the most recent, i.e. the max of the dates in the date columns.
I have successfully done the unpivot by using a view to store the above result set and put all the dates in one column. However, I do not want to use a view or a table to store the result set and then unpivot, simply because I cannot keep creating views for every user one would query for.
The max(date_value) from the date columns needs to be put in one single column. The date columns are as follows:
1) b.entry_mod_date, 2) b.cmt_mod_date, 3) be.cmt_mod_date, 4) b.update_date, 5) be.update_date, 6) bc.creation_date
Apologies that I could not provide the description of all the tables and the fields being used.
Any help getting the max of the dates from these multiple date columns into a single column without using a view or a table would be greatly appreciated.

It is not clear what results you want, but the easiest solution is to use greatest().
with t as (
YOURQUERYHERE
)
select t.*,
greatest(b_ENTRY_mod_DATE, BlogCmtModDate, be_cmnt_mod_Date, BlogUpDate,
be_UPDATE_DATE, bc_creation_date
) as greatestdate
from t;
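One thing to watch with this approach: Oracle's GREATEST returns NULL if any argument is NULL, and bc.creation_date comes from a LEFT JOIN, so it can be NULL. The sketch below uses Python's built-in sqlite3 module, where the multi-argument scalar max() behaves the same way with NULLs, as a stand-in for GREATEST. The blog_dates table, its three date columns, and the '0001-01-01' sentinel are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_dates (
  blog_title TEXT, entry_mod_date TEXT, update_date TEXT, comment_date TEXT);
INSERT INTO blog_dates VALUES
  ('Blog A', '2013-01-05', '2013-02-01', '2013-01-20'),
  ('Blog B', '2013-03-10', '2013-03-01', NULL);   -- no comment yet
""")

# SQLite's scalar max(a, b, ...) stands in for Oracle's GREATEST here:
# a single NULL argument makes the whole result NULL.
naive = conn.execute("""
SELECT blog_title, max(entry_mod_date, update_date, comment_date)
FROM blog_dates ORDER BY blog_title
""").fetchall()

# Wrapping the nullable column in COALESCE with a very old sentinel date
# lets the other columns win when it is missing.
null_safe = conn.execute("""
SELECT blog_title,
       max(entry_mod_date, update_date, COALESCE(comment_date, '0001-01-01'))
FROM blog_dates ORDER BY blog_title
""").fetchall()

print(naive)
print(null_safe)
```

In Oracle the same idea would be written with GREATEST and COALESCE (or NVL) on real DATE values; if every column can be NULL, the sentinel itself may come back and needs handling.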

select <columns>,
case
when greatest (b_ENTRY_mod_DATE) >= greatest (BlogCmtModDate) and greatest(b_ENTRY_mod_DATE) >= greatest(BlogUpDate)
then greatest( b_ENTRY_mod_DATE )
--<same implementation to compare each time BlogCmtModDate and BlogUpDate separately to get the greatest then 'date'>
end as date_value
,<columns>
FROM table
<rest of the query>
UNION ALL
Select <columns>,
case
when greatest (be_cmnt_mod_Date) >= greatest (be_UPDATE_DATE)
then greatest( be_cmnt_mod_Date )
when greatest (be_UPDATE_DATE) >= greatest (be_cmnt_mod_Date)
then greatest( be_UPDATE_DATE )
end as date_value
,<columns>
FROM table
<rest of the query>
UNION ALL
Select <columns>,
GREATEST(bc_creation_date) as date_value
,<columns>
FROM table
<rest of the query>

How to perform running sum (balance) in SQL

I have 2 SQL Tables
unit_transaction
unit_detail_transactions
(tables schema here: http://sqlfiddle.com/#!3/e3204/2 )
What I need is to perform an SQL Query in order to generate a table with balances. Right now I have this SQL Query but it's not working fine because when I have 2 transactions with the same date then the balance is not calculated correctly.
SELECT
ft.transactionid,
ft.date,
ft.reference,
ft.transactiontype,
CASE ftd.isdebit WHEN 1 THEN MAX(ftd.debitaccountid) ELSE MAX(ftd.creditaccountid) END as financialaccountname,
CAST(COUNT(0) as tinyint) as totaldetailrecords,
ftd.isdebit,
SUM(ftd.amount) as amount,
balance.amount as balance
FROM unit_transaction_details ftd
JOIN unit_transactions ft ON ft.transactionid = ftd.transactionid
JOIN
(
SELECT DISTINCT
a.transactionid,
SUM(CASE b.isdebit WHEN 1 THEN b.amount ELSE -ABS(b.amount) END) as amount
--SUM(b.debit-b.credit) as amount
FROM unit_transaction_details a
JOIN unit_transactions ft ON ft.transactionid = a.transactionid
CROSS JOIN unit_transaction_details b
JOIN unit_transactions ft2 ON ft2.transactionid = b.transactionid
WHERE (ft2.date <= ft.date)
AND ft.unitid = 1
AND ft2.unitid = 1
AND a.masterentity = 'CONDO-A'
GROUP BY a.transactionid,a.amount
) balance ON balance.transactionid = ft.transactionid
WHERE
ft.unitid = 1
AND ftd.isactive = 1
GROUP BY
ft.transactionid,
ft.date,
ft.reference,
ft.transactiontype,
ftd.isdebit,
balance.amount
ORDER BY ft.date DESC
The result of the query is this:
Any clue on how to write a correct SQL query that will show me the right balances ordered by transaction date in descending order?
Thanks a lot.
EDIT: THINK OF 2 POSSIBLE SOLUTIONS
The problem occurs when 2 transactions have the same date, so here is what I'm going to do:
Save Date and Time into "date" column. That way there won't be 2 exact dates.
OR
Create a "priority" column and set the priority for each record. So if I found that the date already exists and it has priority = 1 then the current priority will be 2.
What do you think?

There are two ways to do a running sum. I am going to show the syntax on a simpler table, to give you an idea.
Some databases (Oracle, PostgreSQL, SQL Server 2012, Teradata, DB2 for instance) support cumulative sums directly. For this you use the following function:
select sum(<val>) over (partition by <column> order by <ordering column>)
from t
This is a window function that will calculate the running sum of <val> for each group of records identified by <column>. The order of the sum is <ordering column>.
Alas, many databases don't support this functionality, so you would need to do a self join to do this in a single SELECT query in the database:
select t.<column>, t.<ordering column>, sum(tprev.<val>) as cumsum
from t left join
t tprev
on t.<column> = tprev.<column> and
t.<ordering column> >= tprev.<ordering column>
group by t.<column>, t.<ordering column>
There is also the possibility of creating another table and using a cursor to assign the cumulative sum, or of doing the sum at the application level.
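The window-function form, and the same-date problem from the question, can be sketched with Python's built-in sqlite3 module. The txn table is a hypothetical, simplified stand-in for the question's transaction tables, and using transactionid as the tie-breaker is an assumption in the spirit of the "priority" column idea. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn (transactionid INTEGER PRIMARY KEY, date TEXT, amount INTEGER);
INSERT INTO txn VALUES
  (1, '2013-01-01', 100),
  (2, '2013-01-02', -30),
  (3, '2013-01-02',  50),   -- same date as transaction 2
  (4, '2013-01-03', -20);
""")

# Ordering by date alone: rows with the same date are "peers" in the default
# window frame, so both get the sum that already includes each other.
by_date_only = conn.execute("""
SELECT transactionid, SUM(amount) OVER (ORDER BY date) AS balance
FROM txn ORDER BY date, transactionid
""").fetchall()

# Adding a unique tie-breaker to the ORDER BY gives every row its own
# position, so the balance advances one transaction at a time.
with_tiebreak = conn.execute("""
SELECT transactionid,
       SUM(amount) OVER (ORDER BY date, transactionid) AS balance
FROM txn ORDER BY date, transactionid
""").fetchall()

print(by_date_only)
print(with_tiebreak)
```

With date-only ordering, transactions 2 and 3 both show a balance of 120; with the tie-breaker, transaction 2 shows 70 and transaction 3 shows 120.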
