Optimized query to get the most recent records for each person in a log table - sql

Is there a better way to set up this query?
select *
from peopleLog
where index in (select max(index) index from peopleLog group by personID)
I have a log table that is inserted into every time a person's record is updated. The records are added, not replaced, so the table contains multiple records for each person. I want to pull the most recent record for each person from this table. The table has about 30 fields, so I don't think that grouping would be the best option, but I might be wrong.
The index field is the identity field in this SQL Server table, and is set to auto increment.
personID is the person's identification number and is unique to the person.

You can use a window function instead:
-- index is a reserved keyword in SQL Server, so the column name is bracketed
select * from (
select * , row_number() over (partition by personID order by [index] desc) rn
from peopleLog
) t where t.rn = 1
An index on the personID and "index" columns would also help.

Often a correlated subquery has the best performance:
-- index is a reserved keyword in SQL Server, so the column name is bracketed
select pl.*
from peopleLog pl
where pl.[index] = (select max(pl2.[index])
from peopleLog pl2
where pl2.personID = pl.personID
);
In particular, this can take advantage of an index on peopleLog(personID, [index]).
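
To make the correlated-subquery approach concrete, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. It is not SQL Server: the column `idx` stands in for the identity column "index", the `name` column and the sample rows are made up, and the index name is hypothetical.

```python
import sqlite3

# Hypothetical miniature of peopleLog; "idx" stands in for the identity column "index".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE peopleLog ("
    "idx INTEGER PRIMARY KEY AUTOINCREMENT, personID INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO peopleLog (personID, name) VALUES (?, ?)",
    [(1, "Ann v1"), (2, "Bob v1"), (1, "Ann v2"), (1, "Ann v3"), (2, "Bob v2")],
)
# Composite index so the per-person MAX(idx) lookup can be answered from the index.
conn.execute("CREATE INDEX ix_peopleLog_person_idx ON peopleLog (personID, idx)")

latest = conn.execute(
    """
    SELECT pl.personID, pl.idx, pl.name
    FROM peopleLog pl
    WHERE pl.idx = (SELECT MAX(pl2.idx)
                    FROM peopleLog pl2
                    WHERE pl2.personID = pl.personID)
    ORDER BY pl.personID
    """
).fetchall()
print(latest)
```

Each person appears once, with the row that has the highest `idx`. Whether this beats the window-function version on a real table depends on the data and the optimizer, so it is worth comparing both execution plans.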

Related

How can I make selection based on conditions on SQL?

There is a table of IDs and those IDs' status keys:
The table
I need to write a query that returns the row with the highest status key for each ID. For example, the query should return only the row with status key 9 for ID 123, and the row with status key 2 for ID 156.
I hope I managed to explain myself clearly. Please help me with this query.
Use max() aggregation:
select id, max(status_key)
from tablename
group by id
You didn't tag your backend. This would work with many backends, including older versions of them (assuming your table has other columns too; otherwise a plain group by is enough):
select myTable.*
from myTable
inner join
(select id, max(statusKey) as statusKey
from myTable
group by id) tmp on myTable.id = tmp.id and myTable.statusKey = tmp.statusKey;
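
As a quick check of the join-back approach, here is a small sketch using SQLite from Python. The IDs 123 and 156 and their top status keys come from the question; the other rows and the `note` column are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "note" is a made-up extra column, to show that whole rows come back.
conn.execute("CREATE TABLE myTable (id INTEGER, statusKey INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(123, 1, "a"), (123, 9, "b"), (123, 4, "c"), (156, 2, "d"), (156, 1, "e")],
)

rows = conn.execute(
    """
    SELECT m.id, m.statusKey, m.note
    FROM myTable m
    INNER JOIN (SELECT id, MAX(statusKey) AS statusKey
                FROM myTable
                GROUP BY id) tmp
      ON m.id = tmp.id AND m.statusKey = tmp.statusKey
    ORDER BY m.id
    """
).fetchall()
print(rows)
```

If two rows for the same id share the maximum statusKey, the join returns both of them, so add a tie-breaker if exactly one row per id is required.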

Delete specific record from multiple duplicates in the table

How do I delete specific records from a set of duplicates?
Below is an example table.
This is just one example; we have many cases like this. From this table I need to delete the rows with rank 2 and 3.
Please suggest the best way to identify duplicate records and delete the specific rows.
This should work:
delete t
from <your table> t
where t.[rank] <> (select top (1) tt.[rank]
from <your table> tt
where tt.emp_id = t.emp_id
order by tt.[rank] desc --use asc if you want to keep the lowest rank
)
I do not encourage deleting records, but this approach works for either expiring or deleting them:
The table should have a unique ID and a field that lets you mark a record as expired. If it does not, I recommend adding them. You can create a composite ID in your query, but down the road you will wish you had these attributes.
Create a query that identifies every record where RANK <> 1. This will be your subquery.
Write your UPDATE query:
UPDATE A
SET [EXPIRE_DTTM] = GETDATE()
FROM *TableNameWithTheRecords* A
INNER JOIN (*SubQuery*) B ON A.UniqueID = B.UniqueID
If you truly want to delete the records, use this instead:
DELETE FROM *TableNameWithTheRecords*
WHERE *UniqueID* IN (SELECT *UniqueID* FROM *TableNameWithTheRecords* WHERE RANK <> 1)
WITH tbl_alias AS
(
SELECT emp_ID,
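-- ordering by emp_ID inside its own partition is arbitrary; order by the rank column to control which row is kept
RN = ROW_NUMBER() OVER(PARTITION BY emp_ID ORDER BY emp_ID)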
FROM tblName
)
DELETE FROM tbl_alias WHERE RN > 1
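
The "keep one row per emp_id, delete the rest" idea can be tried out safely on a throwaway table first. The sketch below uses SQLite from Python rather than SQL Server. The table name `emp`, the column `rnk` (standing in for the question's rank column), and the sample rows are all assumptions, and it keeps the lowest rank per employee, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "rnk" stands in for the rank column; the data is invented.
conn.execute("CREATE TABLE emp (emp_id INTEGER, rnk INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(10, 1, "keep"), (10, 2, "dup"), (10, 3, "dup"), (20, 1, "keep"), (20, 2, "dup")],
)

# Delete every row whose rank is not the lowest rank for its emp_id.
deleted = conn.execute(
    """
    DELETE FROM emp
    WHERE rnk <> (SELECT MIN(e2.rnk) FROM emp e2 WHERE e2.emp_id = emp.emp_id)
    """
).rowcount

remaining = conn.execute("SELECT emp_id, rnk FROM emp ORDER BY emp_id").fetchall()
print(deleted, remaining)
```

Running the subquery as a SELECT first, or wrapping the delete in a transaction you can roll back, is a cheap way to confirm which rows would go.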

select multiple records based on order by

I have a table with a bunch of customer IDs. The customers table also contains these IDs, but each ID can appear on multiple records for the same customer. I want to select the most recently used record, which I can get by ordering by <my_field> desc.
Say I have 100 customer IDs in this table, and the customers table has 120 records with these IDs (some are duplicates). How can I apply my ORDER BY condition to get only the most recent matching record for each ID?
dbms is sql server 2000.
table is basically like this:
loc_nbr and cust_nbr are primary keys
a customer shops at location 1. they get assigned loc_nbr = 1 and cust_nbr = 1
then a customer_id of 1.
they shop again but this time at location 2. so they get assigned loc_nbr = 2 and cust_Nbr = 1. then the same customer_id of 1 based on their other attributes like name and address.
because they shopped at location 2 AFTER location 1, it will have a more recent rec_alt_ts value, which is the record i would want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY group-by-fields ORDER BY sorting-fields DESC) AS ItemCount -- DESC so the most recent row gets ItemCount = 1
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just pseudocode for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 (LocationKey) as LatestLocationKey from location_table where cust_id = t1.cust_id order by some_field)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
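This assumes that rec_alt_ts increases every time, so the max entry for a cust_Nbr is its most recent entry. Note that every column listed in other_fields is also part of the grouping, so you get one row per distinct combination of cust_Nbr and those columns, not strictly one row per customer.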
You can use the date and time to find the most recent details for a customer.
Order by the column that holds the customer's date and time.
For example:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');

How to optimize SELECT some_field, max(primary_key) FROM table GROUP BY some_field

I have SQL query in SQL Azure:
SELECT some_field, max(primary_key) FROM table GROUP BY some_field
The table currently has over 6 million rows. An index on (some_field asc, primary_key desc) has been created. The primary_key field is incremental. There are about 700 distinct values of some_field. This select takes at least 30 seconds.
There are only inserts into this table, no updates or deletes.
I can create a separate table to store some_field and the maximum value of the primary key, and write a trigger to maintain it, but I am looking for a more elegant solution. Is there one?
I don't know if this will be performant, but you can give it a shot:
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY some_field ORDER BY primary_key DESC) AS rn
FROM table
)
SELECT *
FROM cte
WHERE rn = 1
Definitely create the secondary table with "somefield" and "highestPK" columns, indexed on the "somefield" column. Build it once up front as a baseline and query that instead.
Then, whenever new records are inserted into your 6 million row table, have a simple trigger update the secondary table with something as simple as:
update SecondaryTable
set highestPK = newlyInsertedPKID
where somefield = newlyInsertedSomeFieldValue
This way it stays current with every insert, because the newly inserted PK is always the highest for its "somefield" value. If the update affects no rows, insert a new row into the secondary table for the new "somefield" value.
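
The trigger-maintained summary table can be sketched end to end in SQLite from Python. This is only an illustration of the idea, not SQL Azure trigger syntax: `big_table` and the trigger name are hypothetical, and SQLite's `INSERT OR REPLACE` is used here as a stand-in for the "update, or insert if missing" step. It relies on the question's statement that the primary key only ever grows and that rows are never updated or deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE big_table (
        primary_key INTEGER PRIMARY KEY AUTOINCREMENT,
        some_field  TEXT
    );
    CREATE TABLE SecondaryTable (
        somefield TEXT PRIMARY KEY,
        highestPK INTEGER
    );
    -- Insert-only workload with an increasing key: the new row is always the max.
    CREATE TRIGGER trg_big_table_insert AFTER INSERT ON big_table
    BEGIN
        INSERT OR REPLACE INTO SecondaryTable (somefield, highestPK)
        VALUES (NEW.some_field, NEW.primary_key);
    END;
    """
)

for value in ["a", "b", "a", "c", "a"]:
    conn.execute("INSERT INTO big_table (some_field) VALUES (?)", (value,))

summary = conn.execute(
    "SELECT somefield, highestPK FROM SecondaryTable ORDER BY somefield"
).fetchall()
direct = conn.execute(
    "SELECT some_field, MAX(primary_key) FROM big_table "
    "GROUP BY some_field ORDER BY some_field"
).fetchall()
print(summary, direct)
```

The summary table matches the direct GROUP BY result while holding only one row per distinct value, which is roughly 700 rows in the question's case.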

SQL - Group By unique column combination

I am trying to write a script that will return the latest values for each unique documentid-physician-patient triplet. I need the script to act similar to a group by statement, except group by only works with one column at a time. I need the date and status information for only the most recent row of each unique triplet. Please let me know what you will need to see from me to help. Here is the current, very bare, statement:
SELECT
TransmissionSend.CreateTimestamp,
TransmissionSendItem.Status,
TransmissionSendItem.PhysicianId,
TransmissionSendItem.DocumentIdDisplay,
Utility.SqlFunctions_NdnListToAccountList(TransmissionSendItem.NdocNum) AS AccountNum
FROM
Interface_SFAX.TransmissionSend,
Interface_SFAX.TransmissionSendItem
WHERE
TransmissionSend.ID = TransmissionSendItem.childsub --I don't know exactly what this does, I did not write this script. It must stay here though for the exact results.
ORDER BY TransmissionSend.CreateTimestamp DESC -- In the end, each latest result of the unique triplet will be ordered from most recent to oldest in return
My question is, again, how can I limit results to only the latest status for each physician id, document id, and account number combination?
First select MAX(date) grouped by the document ID, then join that result back to the table to fetch the remaining columns, for example with an inner join:
SELECT table.additionalData, J.id, J.date
FROM table
INNER JOIN (SELECT id, MAX(date) AS date
FROM table GROUP BY id) AS J
ON J.id = table.id
AND J.date /* this is the max date */ = table.date
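
GROUP BY accepts several columns, so the same join-back pattern extends to the full triplet from the question. The sketch below shows this with SQLite from Python. The table `sends`, its column names, and the sample rows are all hypothetical stand-ins for the joined TransmissionSend and TransmissionSendItem data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sends (document_id TEXT, physician_id INTEGER, "
    "account_num TEXT, status TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO sends VALUES (?, ?, ?, ?, ?)",
    [
        ("D1", 7, "A1", "Queued", "2020-01-01 10:00:00"),
        ("D1", 7, "A1", "Sent", "2020-01-01 11:00:00"),
        ("D1", 8, "A1", "Failed", "2020-01-02 09:00:00"),
        ("D2", 7, "A2", "Queued", "2020-01-03 08:00:00"),
        ("D2", 7, "A2", "Sent", "2020-01-03 08:30:00"),
    ],
)

# Group by all three columns, then join back to pick up the status of the latest row.
latest = conn.execute(
    """
    SELECT s.document_id, s.physician_id, s.account_num, s.status, s.created
    FROM sends s
    INNER JOIN (SELECT document_id, physician_id, account_num,
                       MAX(created) AS created
                FROM sends
                GROUP BY document_id, physician_id, account_num) j
      ON j.document_id = s.document_id
     AND j.physician_id = s.physician_id
     AND j.account_num = s.account_num
     AND j.created = s.created
    ORDER BY s.created DESC
    """
).fetchall()
for row in latest:
    print(row)
```

Each triplet appears once, with the status from its most recent timestamp, ordered from newest to oldest as the question's ORDER BY intends.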