How to pull the most recent values from a T-SQL table - sql

I have a database table that I need to process with either a view or a stored procedure or something else that gives me a result based on the live data.
The table holds records of people with data associated with each one. The thing is that people can be in the table more than once. Each record shows a time when one or more pieces of information was recorded for an individual.
The identifier field for the people is cardholder_index. I need to take a DISTINCT list of that field. There is also a date field called bio_complete_date. What I need to do is, for all the other fields in the table, take the most recent non-null (or possibly non-zero) value.
For instance, there is a bmi field. For each distinct cardholder index, I need to take the most recent (by the bio_complete_date field) non-null bmi for that cardholder_index. But there's also a body_fat field, and I need to take the most recent non-null value in that field, which might not necessarily be the same row as the most recent non-null bmi value.
For the record, the table itself does have its own unique identifier column, bio_id, if that helps.
I don't need to show when the most recent piece of information was taken. I just need to show the data itself.
I figure I need to do a distinct on the card_holder index, and then join to it the result sets of querys for each other field. It's writing the subqueries that is giving me problems.

From your description I guess your table looks something like this:
create table people (
bio_id int identity(1,1),
cardholder_index int,
bio_complete_date date,
bmi int,
body_fat int
)
If so, one way (of many) to do the query would be to use correlated queries to pull the latest non-null value for the cardholder_index, either using subqueries like this:
select
cardholder_index,
(
select top 1 bmi
from people
where cardholder_index = p.cardholder_index and bmi is not null
order by bio_complete_date desc
) as latest_bmi,
(
select top 1 body_fat
from people
where cardholder_index = p.cardholder_index and body_fat is not null
order by bio_complete_date desc
) as latest_body_fat
from people p
group by cardholder_index
or to use the apply operator like this:
select cardholder_index, latest_bmi.bmi, latest_body_fat.body_fat
from people p
outer apply (
select top 1 bmi
from people
where cardholder_index = p.cardholder_index and bmi is not null
order by bio_complete_date desc
) as latest_bmi
outer apply (
select top 1 body_fat
from people
where cardholder_index = p.cardholder_index and body_fat is not null
order by bio_complete_date desc
) as latest_body_fat
group by cardholder_index, latest_bmi.bmi, latest_body_fat.body_fat
Sample SQL Fiddle demo

Related

How to group by one column and limit to rows where another column has the same value for all rows in group?

I have a table like this
CREATE TABLE userinteractions
(
userid bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);
My problem is that some of the data is polluted with multiple dobyr values for the same user.
The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.
I want to be able to create a clean table that contains unique userid and dobyr limited to the cases where there is only one value of dobyr for the userid in userinteractions.
For example I start with data like this:
userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989
And I want to select from this to get a table like this:
userid,dobyr
1,1995
2,1999
4,1989
Is there an elegant, efficient way to get this in a single sql query?
I am using postgres.
EDIT: I do not have permissions to modify the userinteractions table, so I need a SELECT solution, not a DELETE solution.
Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:
If there are many rows with the same userid value but also the same dobyr value, one of them is kept (doesn't matter which one), rest gets discarded.
All rows for a given userid are discarded if it occurs with different dobyr values.
create table userinteractions_clean as
select distinct on (userid,dobyr) *
from userinteractions
where userid in (
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1 )
order by userid,dobyr;
This could also be done with an not in, not exists or exists conditions. Also, select which combination to keep by adding columns at the end of order by.
Updated demo with tests and more rows.
If you don't need the other columns in the table, only something you'll later use as a filter/whitelist, plain userid's from records with (userid,dobyr) pairs matching your criteria are enough, as they already uniquely identify those records:
create table userinteractions_whitelist as
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1
Just use a HAVING clause to assert that all rows in a group must have the same dobyr.
SELECT
userid,
MAX(dobyr) AS dobyr
FROM
userinteractions
GROUP BY
userid
HAVING
COUNT(DISTINCT dobyr) = 1

Get distinct information across many fields some of which are NULL

I have a table with just over 65 million rows and 140 columns. The data comes from several sources and is submitted at least every month.
I look for a quick way to grab specific fields from this data only where they are unique. Thing is, I want to process all the information to link which invoice was sent with which identifying numbers and it was sent by whom. Issue is, I don't want to iterate over 65 million records. If I can get distinct values, then I will only have to process say 5 million records as opposed to 65 million. See below for a description of the data and SQL Fiddle for a sample
If say a client submits an invoice_number linked to passport_number_1, national_identity_number_1 and driving_license_1 every month, I only want one row where this appears. i.e. the 4 fields have got to be unique
If they submit the above for 30 months then on the 31st month they send the invoice_number linked to passport_number_1, national_identity_number_2 and driving_license_1, I want to pick this row also since the national_identity field is new hence the whole row is unique
By linked to I mean they appear on the same row
For all fields its possible to have Null occurring at one point.
The 'pivot/composite' columns are the invoice_number and
submitted_by. If any of those aren't there, drop that row
I also need to include the database_id with the above data. i.e.
the primary_id which is auto generated by the postgresql database
The only fields that don't need to be returned are the other_column
and yet_another_column. Remember the table has 140 columns so don't
need them
With the results, create a new table that will hold this unique
records
See this SQL fiddle for an attempt to recreate the scenario.
From that fiddle, I'd expect a result like:
Row 1, 2 & Row 11: Only one of them shall be kept as they are exactly the
same. Preferably the row with the smallest id.
Row 4 and Row 9: One of them would be dropped as they are exactly the
same.
Row 5, 7, & 8: Would be dropped since they are missing either the
invoice_number or submitted_by.
The result would then have Row (1, 2 or 11), 3, (4 or 9), 6 and 10.
To get one representative row (with additional fields) from a group with the four distinct fields:
SELECT
distinct on (
invoice_number
, passport_number
, national_id_number
, driving_license_number
)
* -- specify the columns you want here
FROM my_table
where invoice_number is not null
and submitted_by is not null
;
Note that it is unpredictable which row exactly is returned unless you specify an ordering (documentation on distinct)
Edit:
To order this result by id simply adding order by id to the end doesn't work, but it can be done by eiter using a CTE
with distinct_rows as (
SELECT
distinct on (
invoice_number
, passport_number
, national_id_number
, driving_license_number
-- ...
)
* -- specify the columns you want here
FROM my_table
where invoice_number is not null
and submitted_by is not null
)
select *
from distinct_rows
order by id;
or making the original query a subquery
select *
from (
SELECT
distinct on (
invoice_number
, passport_number
, national_id_number
, driving_license_number
-- ...
)
* -- specify the columns you want here
FROM my_table
where invoice_number is not null
and submitted_by is not null
) t
order by id;
quick way to grab specific fields from this data only where they are unique
I don't think so. I think you mean you want to select a distinct set of rows from a table in which they are not unique.
As far as I can tell from your description, you simply want
SELECT distinct invoice_number, passport_number,
driving_license_number, national_id_number
FROM my_table
where invoice_number is not null
and submitted_by is not null;
In your SQLFiddle example, that produces 5 rows.

Selecting the last entry in sql database for each id field

Hi all I am using SQL server.
I have one table that has a whole list of details on cars and events that have happened with those cars.
What I need is to be able to pick out the last entry for each vehicle based on their (Reg_No) registration number.
I have the following to work with
Table name = UnitHistory
Columns = indx (This is just the primary key, with increment)
Transdate(This is my date time column) and have Reg_No (Unique to each vehicle) .
There are about 45 vehicles with registration numbers if that helps?
I have looked at different examples but they all seem to have another table to work with.
Please help me. Thanks in advance for the help
WITH cte
AS
(
SELECT *,
ROW_NUMBER() OVER
(
PARTITION BY Reg_No
ORDER BY Transdate DESC
) AS RowNumber
FROM unithistory
)
SELECT *
FROM cte
WHERE RowNumber = 1
If you only need the index and the transdatem and they are both incremental (I am assuming that a later date corresponds to a higher index number) then the simplest query would be:
SELECT Reg_No, MAX(indx), MAX(Transdate)
FROM UnitHistory
GROUP BY Reg_No
If you want all data for a known Reg_No, you can use Dd2's answer
If you want a list of all Reg_No's with thier data, you will need a subquery

How do I check if all posts from a joined table has the same value in a column?

I'm building a BI report for a client where there is a 1-n related join involved.
The joined table has a field for employee ID (EmplId).
The query that I've built for this report is supposed to give a 1 in its field "OneEmployee" if all the related posts have the same employee in the EmplId field, null if it's different employees, i.e:
TaskTrans
TaskTransHours > EmplId: 'John'
TaskTransHours > EmplId: 'John'
This should give a 1 in the said field in the query
TaskTrans
TaskTransHours > EmplId: 'John'
TaskTransHours > EmplId: 'George'
This should leave the said field blank
The idea is to create a field where a case function checks this and returns the correct value. But my problem is whereas there is a way to check for this through SQL.
select not count(*) from your_table
where employee_id = GIVEN_ID
and your_field not in ( select min(your_field)
from your_table
where employee_id = GIVEN_ID);
Note: my first idea was to use LIMIT 1 in the inner query, but MYSQL didn't like it, so min it was - the points to use any, but only one. Min should work, but the field should be indexed, then this query will actually execute rather fast, as only indexes would be used (obviously employee_id should also be indexed).
Note2: Do not get too confused with not in front of count(*), you want 1 when there is none that is different, I count different ones, and then give you the not count(*), which will be one if count is 0, otherwise 0.
Seems a job for a window COUNT():
SELECT
…,
CASE COUNT(DISTINCT TaskTransHours.EmplId) OVER () WHEN 1 THEN 1 END
AS OneEmployee
FROM …

Getting the last record in SQL in WHERE condition

i have loanTable that contain two field loan_id and status
loan_id status
==============
1 0
2 9
1 6
5 3
4 5
1 4 <-- How do I select this??
4 6
In this Situation i need to show the last Status of loan_id 1 i.e is status 4. Can please help me in this query.
Since the 'last' row for ID 1 is neither the minimum nor the maximum, you are living in a state of mild confusion. Rows in a table have no order. So, you should be providing another column, possibly the date/time when each row is inserted, to provide the sequencing of the data. Another option could be a separate, automatically incremented column which records the sequence in which the rows are inserted. Then the query can be written.
If the extra column is called status_id, then you could write:
SELECT L1.*
FROM LoanTable AS L1
WHERE L1.Status_ID = (SELECT MAX(Status_ID)
FROM LoanTable AS L2
WHERE L2.Loan_ID = 1);
(The table aliases L1 and L2 could be omitted without confusing the DBMS or experienced SQL programmers.)
As it stands, there is no reliable way of knowing which is the last row, so your query is unanswerable.
Does your table happen to have a primary id or a timestamp? If not then what you want is not really possible.
If yes then:
SELECT TOP 1 status
FROM loanTable
WHERE loan_id = 1
ORDER BY primaryId DESC
-- or
-- ORDER BY yourTimestamp DESC
I assume that with "last status" you mean the record that was inserted most recently? AFAIK there is no way to make such a query unless you add timestamp into your table where you store the date and time when the record was added. RDBMS don't keep any internal order of the records.
But if last = last inserted, that's not possible for current schema, until a PK addition:
select top 1 status, loan_id
from loanTable
where loan_id = 1
order by id desc -- PK
Use a data reader. When it exits the while loop it will be on the last row. As the other posters stated unless you put a sort on the query, the row order could change. Even if there is a clustered index on the table it might not return the rows in that order (without a sort on the clustered index).
SqlDataReader rdr = SQLcmd.ExecuteReader();
while (rdr.Read())
{
}
string lastVal = rdr[0].ToString()
rdr.Close();
You could also use a ROW_NUMBER() but that requires a sort and you cannot use ROW_NUMBER() directly in the Where. But you can fool it by creating a derived table. The rdr solution above is faster.
In oracle database this is very simple.
select * from (select * from loanTable order by rownum desc) where rownum=1
Hi if this has not been solved yet.
To get the last record for any field from a table the easiest way would be to add an ID to each record say pID. Also say that in your table you would like to hhet the last record for each 'Name', run the simple query
SELECT Name, MAX(pID) as LastID
INTO [TableName]
FROM [YourTableName]
GROUP BY [Name]/[Any other field you would like your last records to appear by]
You should now have a table containing the Names in one column and the last available ID for that Name.
Now you can use a join to get the other details from your primary table, say this is some price or date then run the following:
SELECT a.*,b.Price/b.date/b.[Whatever other field you want]
FROM [TableName] a LEFT JOIN [YourTableName]
ON a.Name = b.Name and a.LastID = b.pID
This should then give you the last records for each Name, for the first record run the same queries as above just replace the Max by Min above.
This should be easy to follow and should run quicker as well
If you don't have any identifying columns you could use to get the insert order. You can always do it like this. But it's hacky, and not very pretty.
select
t.row1,
t.row2,
ROW_NUMBER() OVER (ORDER BY t.[count]) AS rownum from (
select
tab.row1,
tab.row2,
1 as [count]
from table tab) t
So basically you get the 'natural order' if you can call it that, and add some column with all the same data. This can be used to sort by the 'natural order', giving you an opportunity to place a row number column on the next query.
Personally, if the system you are using hasn't got a time stamp/identity column, and the current users are using the 'natural order', I would quickly add a column and use this query to create some sort of time stamp/incremental key. Rather than risking having some automation mechanism change the 'natural order', breaking the data needed.
I think this code may help you:
WITH cte_Loans
AS
(
SELECT LoanID
,[Status]
,ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS RN
FROM LoanTable
)
SELECT LoanID
,[Status]
FROM LoanTable L1
WHERE RN = ( SELECT max(RN)
FROM LoanTable L2
WHERE L2.LoanID = L1.LoanID)