I have a table in Postgres (called ledger) that stores data about keywords, with a structure like this:
- id
- keyword
- updatedAt
- createdAt
- ...other details
A single keyword may appear in multiple rows. I want a query that returns only the most recent record for each keyword. In other words, I want to select everything from ledger, but with each keyword appearing only once (its most recent update).
Use distinct on.
You did not mention the initial value of the updatedAt column. Assuming it is null:
select distinct on (keyword)
keyword,
coalesce(updatedat, createdat) as last_change_time,
other_column
from ledger
order by keyword, last_change_time desc
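DISTINCT ON (keyword) keeps only the first row of each keyword group in ORDER BY order, so sorting by last_change_time descending makes that first row the most recent one. SQLite and plain Python have no DISTINCT ON, so this is a minimal Python sketch of the same semantics. It assumes rows are (keyword, updatedat, createdat) tuples with None for a row that was never updated, and ISO date strings, which sort chronologically. The sample data is made up.

```python
from itertools import groupby
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[str], str]  # hypothetical shape: (keyword, updatedat, createdat)


def latest_per_keyword(rows: List[Row]) -> List[Row]:
    """Emulate SELECT DISTINCT ON (keyword) ... ORDER BY keyword, last_change_time DESC."""

    def last_change(r: Row) -> str:
        # coalesce(updatedat, createdat)
        return r[1] if r[1] is not None else r[2]

    # Python's sort is stable, so sort by the secondary key (time, descending) first,
    # then by the primary key (keyword, ascending).
    ordered = sorted(rows, key=last_change, reverse=True)
    ordered = sorted(ordered, key=lambda r: r[0])
    # DISTINCT ON keeps the first row of each keyword group.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


rows = [
    ("apple", None, "2024-01-01"),
    ("apple", "2024-03-05", "2024-01-02"),
    ("pear", None, "2024-02-10"),
    ("pear", "2024-02-01", "2024-01-15"),
]
print(latest_per_keyword(rows))
# [('apple', '2024-03-05', '2024-01-02'), ('pear', None, '2024-02-10')]
```

The second pear row shows why the coalesce matters: its createdat-only sibling is newer, so the never-updated row wins.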
Related
I have a table like this:
CREATE TABLE userinteractions
(
userid bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);
My problem is that some of the data is polluted with multiple dobyr values for the same user.
The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.
I want to be able to create a clean table that contains unique userid and dobyr limited to the cases where there is only one value of dobyr for the userid in userinteractions.
For example, I start with data like this:
userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989
And I want to select from this to get a table like this:
userid,dobyr
1,1995
2,1999
4,1989
Is there an elegant, efficient way to get this in a single SQL query?
I am using postgres.
EDIT: I do not have permissions to modify the userinteractions table, so I need a SELECT solution, not a DELETE solution.
Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:
If there are several rows with the same userid and also the same dobyr, one of them is kept (it doesn't matter which) and the rest are discarded.
All rows for a given userid are discarded if it occurs with different dobyr values.
create table userinteractions_clean as
select distinct on (userid,dobyr) *
from userinteractions
where userid in (
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1 )
order by userid,dobyr;
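Here is a quick check of the clean-up logic against the sample data from the question. It runs in SQLite through Python's built-in sqlite3 module, which is an assumption made only so the example runs without a Postgres server. SQLite has no DISTINCT ON, so plain DISTINCT over the two columns stands in for it. That is enough here because only userid and dobyr are present.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE userinteractions (userid INTEGER, dobyr INTEGER)")
con.executemany(
    "INSERT INTO userinteractions VALUES (?, ?)",
    [(1, 1995), (1, 1995), (2, 1999), (3, 1990), (3, 1999), (4, 1989), (4, 1989)],
)
# DISTINCT (userid, dobyr) replaces Postgres' DISTINCT ON in this SQLite sketch.
con.execute("""
    CREATE TABLE userinteractions_clean AS
    SELECT DISTINCT userid, dobyr
    FROM userinteractions
    WHERE userid IN (
        SELECT userid
        FROM userinteractions
        GROUP BY userid
        HAVING COUNT(DISTINCT dobyr) = 1)
""")
clean = con.execute(
    "SELECT userid, dobyr FROM userinteractions_clean ORDER BY userid"
).fetchall()
print(clean)  # [(1, 1995), (2, 1999), (4, 1989)]
```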
This could also be done with NOT IN, NOT EXISTS or EXISTS conditions. You can also choose which of the duplicate rows is kept by adding columns to the end of the ORDER BY.
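The NOT EXISTS variant can be sketched like this, again in SQLite via Python's sqlite3 and using the sample data from the question. A (userid, dobyr) pair survives only if no other row for the same userid carries a different dobyr.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE userinteractions (userid INTEGER, dobyr INTEGER)")
con.executemany(
    "INSERT INTO userinteractions VALUES (?, ?)",
    [(1, 1995), (1, 1995), (2, 1999), (3, 1990), (3, 1999), (4, 1989), (4, 1989)],
)
consistent = con.execute("""
    SELECT DISTINCT u.userid, u.dobyr
    FROM userinteractions AS u
    WHERE NOT EXISTS (
        -- any other row for this user with a conflicting birth year?
        SELECT 1
        FROM userinteractions AS v
        WHERE v.userid = u.userid
          AND v.dobyr <> u.dobyr)
    ORDER BY u.userid
""").fetchall()
print(consistent)  # [(1, 1995), (2, 1999), (4, 1989)]
```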
If you don't need the other columns in the table, only something you'll later use as a filter/whitelist, plain userids from the records whose (userid, dobyr) pairs match your criteria are enough, since they already uniquely identify those records:
create table userinteractions_whitelist as
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1
Just use a HAVING clause to assert that all rows in a group must have the same dobyr.
SELECT
userid,
MAX(dobyr) AS dobyr
FROM
userinteractions
GROUP BY
userid
HAVING
COUNT(DISTINCT dobyr) = 1
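Here is a check of this query against the sample data. It runs in SQLite via Python's built-in sqlite3 module, which is assumed only so the example runs without a Postgres server. The query is unchanged apart from an added ORDER BY. Each surviving group has exactly one distinct dobyr, so MAX(dobyr) simply returns that value, and MIN would work equally well.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE userinteractions (userid INTEGER, dobyr INTEGER)")
con.executemany(
    "INSERT INTO userinteractions VALUES (?, ?)",
    [(1, 1995), (1, 1995), (2, 1999), (3, 1990), (3, 1999), (4, 1989), (4, 1989)],
)
result = con.execute("""
    SELECT userid, MAX(dobyr) AS dobyr
    FROM userinteractions
    GROUP BY userid
    HAVING COUNT(DISTINCT dobyr) = 1
    ORDER BY userid
""").fetchall()
print(result)  # [(1, 1995), (2, 1999), (4, 1989)]
```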
I have a table with 500,000+ rows and the following columns:
Symbol, ExternalCode, ExternalCodeType, StartDate
Symbol should be unique but it's not.
There are a handful of rows (~60) that have the same value for Symbol but have a different ExternalCode+StartDate pair.
I want to create a table of uniques so that, when there are multiple entries for the same Symbol, I only take the one with the most recent StartDate.
Is there a simple/elegant way to do this?
In SQL Server this can be solved without a JOIN.
Try this:
SELECT *
FROM (SELECT SYMBOL,
STARTDATE,
EXTERNALCODE,
EXTERNALCODETYPE,
Row_number()
OVER (
PARTITION BY SYMBOL
ORDER BY STARTDATE DESC) RN
FROM TABLENAME) T
WHERE T.RN = 1
The ROW_NUMBER function starts a new series of numbers for each Symbol (PARTITION BY SYMBOL). The numbers are ordered by StartDate descending, so the latest row always gets 1, and each symbol has its own set of row numbers.
Hope the answer is clear.
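The same ROW_NUMBER pattern also runs in SQLite, which supports window functions from version 3.25. Here it is through Python's sqlite3 with a few made-up symbols. The table name tablename is the answer's placeholder, and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE tablename (symbol TEXT, externalcode TEXT, externalcodetype TEXT, startdate TEXT)"
)
con.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        ("ABC", "X1", "ISIN", "2020-01-01"),
        ("ABC", "X2", "ISIN", "2021-06-30"),  # newer entry for the duplicated symbol
        ("DEF", "Y1", "CUSIP", "2019-03-15"),
    ],
)
latest = con.execute("""
    SELECT symbol, startdate, externalcode, externalcodetype
    FROM (SELECT symbol, startdate, externalcode, externalcodetype,
                 ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY startdate DESC) AS rn
          FROM tablename) AS t
    WHERE t.rn = 1
    ORDER BY symbol
""").fetchall()
print(latest)  # [('ABC', '2021-06-30', 'X2', 'ISIN'), ('DEF', '2019-03-15', 'Y1', 'CUSIP')]
```

If two rows for a symbol share the same latest StartDate, ROW_NUMBER picks one of them arbitrarily. Adding a tie-breaker column to the window's ORDER BY makes the choice deterministic.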
I have loanTable, which contains two fields: loan_id and status.
loan_id status
==============
1 0
2 9
1 6
5 3
4 5
1 4 <-- How do I select this??
4 6
In this situation I need to show the last status of loan_id 1, i.e. status 4. Can you please help me with this query?
Since the 'last' row for ID 1 is neither the minimum nor the maximum, you are living in a state of mild confusion. Rows in a table have no order. So, you should be providing another column, possibly the date/time when each row is inserted, to provide the sequencing of the data. Another option could be a separate, automatically incremented column which records the sequence in which the rows are inserted. Then the query can be written.
If the extra column is called status_id, then you could write:
SELECT L1.*
FROM LoanTable AS L1
WHERE L1.Status_ID = (SELECT MAX(Status_ID)
FROM LoanTable AS L2
WHERE L2.Loan_ID = 1);
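Here is a minimal sketch of this subquery in SQLite via Python's sqlite3. It assumes Status_ID is an auto-incremented INTEGER PRIMARY KEY AUTOINCREMENT column added to the table. The rows are inserted in the order shown in the question, so the last row for loan_id 1 gets the highest Status_ID among that loan's rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE LoanTable (Status_ID INTEGER PRIMARY KEY AUTOINCREMENT, Loan_ID INTEGER, Status INTEGER)"
)
# Insertion order matches the question; Status_ID is assigned 1, 2, 3, ... automatically.
con.executemany(
    "INSERT INTO LoanTable (Loan_ID, Status) VALUES (?, ?)",
    [(1, 0), (2, 9), (1, 6), (5, 3), (4, 5), (1, 4), (4, 6)],
)
last_row = con.execute("""
    SELECT L1.*
    FROM LoanTable AS L1
    WHERE L1.Status_ID = (SELECT MAX(Status_ID)
                          FROM LoanTable AS L2
                          WHERE L2.Loan_ID = 1)
""").fetchall()
print(last_row)  # [(6, 1, 4)]  -> Status_ID 6, loan_id 1, status 4
```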
(The table aliases L1 and L2 could be omitted without confusing the DBMS or experienced SQL programmers.)
As it stands, there is no reliable way of knowing which is the last row, so your query is unanswerable.
Does your table happen to have a primary id or a timestamp? If not then what you want is not really possible.
If yes then:
SELECT TOP 1 status
FROM loanTable
WHERE loan_id = 1
ORDER BY primaryId DESC
-- or
-- ORDER BY yourTimestamp DESC
I assume that by "last status" you mean the record that was inserted most recently? AFAIK there is no way to write such a query unless you add a timestamp column to your table that stores the date and time each record was added. RDBMSs don't keep any internal order of the records.
But if "last" means "last inserted", that's not possible with the current schema until you add a primary key:
select top 1 status, loan_id
from loanTable
where loan_id = 1
order by id desc -- PK
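Use a data reader and keep the value from each row as you read it. When the while loop exits, the variable holds the value from the last row returned (reading from the reader after Read() has returned false throws). As the other posters stated, unless you put a sort on the query, the row order could change. Even if there is a clustered index on the table, the rows might not be returned in that order without a sort on the clustered index.
string lastVal = null;
using (SqlDataReader rdr = SQLcmd.ExecuteReader())
{
    while (rdr.Read())
    {
        // Overwritten on every row, so after the loop lastVal holds the last row's value.
        lastVal = rdr[0].ToString();
    }
} // disposing the reader closes it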
You could also use ROW_NUMBER(), but that requires a sort, and you cannot use ROW_NUMBER() directly in the WHERE clause. You can work around that by wrapping it in a derived table. The reader solution above is faster.
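In Oracle you can do the following. Note that it returns the last row in retrieval order, which, as the other answers point out, is not guaranteed to be insertion order:
select * from (select * from loanTable where loan_id = 1 order by rownum desc) where rownum = 1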
Hi, in case this has not been solved yet:
The easiest way to get the last record for any field from a table is to add an ID to each record, say pID. Then, if you want the last record for each 'Name' in your table, run this simple query:
SELECT Name, MAX(pID) AS LastID
INTO [TableName]
FROM [YourTableName]
GROUP BY [Name] -- or any other field you want your last records grouped by
You should now have a table containing the Names in one column and the last available ID for that Name.
Now you can use a join to get the other details from your primary table, such as a price or a date:
SELECT a.*, b.Price, b.[date] -- plus whatever other fields you want
FROM [TableName] a
LEFT JOIN [YourTableName] b
ON a.Name = b.Name AND a.LastID = b.pID
This should then give you the last record for each Name. For the first record, run the same queries but replace MAX with MIN.
This should be easy to follow and should run quicker as well
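The two-step approach can be sketched in SQLite via Python's sqlite3. SQLite has no SELECT ... INTO, so CREATE TABLE ... AS SELECT stands in for it here. The table names prices and last_ids and the Price column are hypothetical stand-ins for [YourTableName], [TableName] and your detail fields.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prices (pID INTEGER PRIMARY KEY, Name TEXT, Price REAL)")
con.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [(1, "A", 10.0), (2, "B", 20.0), (3, "A", 11.0)],
)
# Step 1: last pID per Name (SQLite's equivalent of SELECT ... INTO).
con.execute("""
    CREATE TABLE last_ids AS
    SELECT Name, MAX(pID) AS LastID
    FROM prices
    GROUP BY Name
""")
# Step 2: join back to fetch the details of those rows.
last_records = con.execute("""
    SELECT a.Name, a.LastID, b.Price
    FROM last_ids a
    LEFT JOIN prices b
      ON a.Name = b.Name AND a.LastID = b.pID
    ORDER BY a.Name
""").fetchall()
print(last_records)  # [('A', 3, 11.0), ('B', 2, 20.0)]
```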
If you don't have any identifying column you could use to get the insert order, you can always do it like this, but it's hacky and not very pretty:
select
t.row1,
t.row2,
ROW_NUMBER() OVER (ORDER BY t.[count]) AS rownum from (
select
tab.row1,
tab.row2,
1 as [count]
from [YourTable] tab) t
So basically you get the 'natural order', if you can call it that, and add a column that holds the same value on every row. Ordering by that constant column leaves the rows in whatever 'natural order' the engine returns them, which lets you attach a row-number column in the outer query.
Personally, if the system you are using doesn't have a timestamp/identity column, and the current users rely on the 'natural order', I would quickly add a column and use this query to create some sort of timestamp/incremental key. That is better than risking some automated process changing the 'natural order' and breaking the data you need.
I think this code may help you:
WITH cte_Loans
AS
(
SELECT LoanID
,[Status]
-- ORDER BY (SELECT 1) numbers rows in whatever order the engine reads them; that order is not guaranteed
,ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS RN
FROM LoanTable
)
SELECT LoanID
,[Status]
FROM cte_Loans L1
WHERE RN = ( SELECT max(RN)
FROM cte_Loans L2
WHERE L2.LoanID = L1.LoanID)
Given the following records (the first row being the column names):
name platform other_columns date
Eric Ruby something somedate
Eric Objective-C something somedate
Joe Ruby something somedate
How do I retrieve a single record with all columns for each name, so that the name column is always unique in the result set? In this example I would like the query to return the first Eric record (the one with Ruby).
I think the closest I've gotten is to use "select distinct on (name) *...", but that requires me to order by name first, when I actually want to order the records by the date column.
Order records by date
If there are multiple records with the same name, select one (which does not matter)
Select all columns
How do I achieve this in Rails on PostgreSQL?
You can't do a simple .group(:name). That produces a GROUP BY name in your SQL while you are selecting ungrouped, unaggregated columns, which leaves it ambiguous which row to pick, and PostgreSQL (rightly, IMHO) complains:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
If you start adding more columns to your grouping with something like this:
T.group(T.columns.collect(&:name))
then you'll be grouping by things you don't want to and you'll end up pulling out the whole table and that's not what you want. If you try aggregating to avoid the grouping problem, you'll end up mixing different rows (i.e. one column will come from one row while another column will come from some other row) and that's not what you want either.
ActiveRecord really isn't built for this sort of thing but you can bend it to your will with some effort.
You're using AR so you presumably have an id column. If you have PostgreSQL 8.4 or higher, then you could use window functions as a sort of localized GROUP BY; you'll need to window twice: once to figure out the name/thedate pairs and again to pick just one id (just in case you have multiple rows with the same name and thedate which match the earliest thedate) and hence get a unique row:
select your_table.*
from your_table
where id in (
-- You don't need DISTINCT here as the IN will take care of collapsing duplicates.
select min(yt.id) over (partition by yt.name)
from (
select distinct name, min(thedate) over (partition by name) as thedate
from your_table
) as dt
join your_table as yt
on yt.name = dt.name and yt.thedate = dt.thedate
)
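The double-window query is plain SQL, so it can be checked in SQLite (3.25 or later, for window functions) via Python's sqlite3. The your_table name, its columns and the rows are hypothetical. Joe has two rows sharing his earliest date, which shows the min(id) window breaking the tie.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT, platform TEXT, thedate TEXT)"
)
con.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (1, "Eric", "Ruby", "2010-01-01"),
        (2, "Eric", "Objective-C", "2010-02-01"),
        (3, "Joe", "Ruby", "2010-01-15"),
        (4, "Joe", "Python", "2010-01-15"),  # ties with id 3 on the earliest date
    ],
)
picked = con.execute("""
    SELECT your_table.*
    FROM your_table
    WHERE id IN (
        SELECT min(yt.id) OVER (PARTITION BY yt.name)
        FROM (
            SELECT DISTINCT name, min(thedate) OVER (PARTITION BY name) AS thedate
            FROM your_table
        ) AS dt
        JOIN your_table AS yt
          ON yt.name = dt.name AND yt.thedate = dt.thedate
    )
    ORDER BY id
""").fetchall()
print(picked)  # [(1, 'Eric', 'Ruby', '2010-01-01'), (3, 'Joe', 'Ruby', '2010-01-15')]
```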
Then wrap that in a find_by_sql and you have your objects.
If you're using Heroku with a shared database (or some other environment without 8.4 or higher), then you're stuck with PostgreSQL 8.3 and you won't have window functions. In that case, you'd probably want to filter out the duplicates in Ruby-land:
with_dups = YourTable.find_by_sql(%Q{
select yt.*
from your_table yt
join (select name, min(thedate) as thedate from your_table group by name) as dt
on yt.name = dt.name and yt.thedate = dt.thedate
});
# Clear out the duplicates, sorting by id ensures consistent results
unique_matches = with_dups.sort_by(&:id).group_by(&:name).map { |x| x.last.first }
If you're pretty sure that there won't be duplicate name/min(thedate) pairs then the 8.3-compatible solution might be your best bet; but, if there will be a lot of duplicates, then you want the database to do as much work as possible to avoid creating thousands of AR objects that you're just going to throw away.
Maybe someone else with stronger PostgreSQL-Fu than me will come along and offer something nicer.
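If you don't care which row is retrieved when a name appears more than once (this applies to every column), and the table has that structure, you can simply do a query like the one below. Note that this only works in MySQL with ONLY_FULL_GROUP_BY disabled (the backticks are MySQL quoting too). PostgreSQL rejects it, and therefore the Rails version as well, for the reason quoted above.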
SELECT * FROM table_name GROUP BY `name` ORDER BY `date`
or in Rails
TableClass.group(:name).order(:date)
Get a list of names and minimum dates, and join that back to the original table to get the rowset you're looking for.
select
b.*
from
(select name, min(date) as mindate from your_table group by name) a
inner join your_table b
on a.name = b.name and a.mindate = b.date
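This join-back query can also be tried in SQLite via Python's sqlite3, with your_table as a hypothetical table name. The sketch shows one behavior worth knowing. When two rows for a name share the minimum date, both are returned, which is why the window-function answer adds a second step that picks a single id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE your_table (name TEXT, platform TEXT, date TEXT)")
con.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        ("Eric", "Ruby", "2010-01-01"),
        ("Eric", "Objective-C", "2010-01-01"),  # same earliest date as the Ruby row
        ("Joe", "Ruby", "2010-01-15"),
        ("Joe", "Python", "2010-03-01"),
    ],
)
tied = con.execute("""
    SELECT b.*
    FROM (SELECT name, min(date) AS mindate FROM your_table GROUP BY name) a
    INNER JOIN your_table b
      ON a.name = b.name AND a.mindate = b.date
    ORDER BY b.name, b.platform
""").fetchall()
print(tied)
# [('Eric', 'Objective-C', '2010-01-01'), ('Eric', 'Ruby', '2010-01-01'), ('Joe', 'Ruby', '2010-01-15')]
```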
I know this question is 8 years old. The current Ruby version is 2.5.3 (2.6.1 has been released), and the stable Rails version is 5.2.2 (6.0.0 beta2 has been released).
Let's call your model Person.
Person.all.order(:date).group_by(&:name).map{|p| p.last.last}
Person.all.order(:date).group_by(&:name).collect {|key, value| value.last}
Explanation: first get all Person records, then sort them by date (ascending by default; use order(date: :desc) for descending), and then group them by name, so records with the same name end up in the same group.
Person.all.order(:date).group_by(&:name)
This returns a hash:
{"Eric" => [#<Person id: 1, name: "Eric", other_fields: "">, #<Person id: 2, name: "Eric", other_fields: "">], "Joe" => [#<Person id: 3, name: "Joe", other_fields: "">]}
Solution 1: .map method.
Person.all.order(:date).group_by(&:name).map{|p| p.last.last}
map iterates over the hash as [key, value] pairs, so p.last is the array of records for each name:
[[#<Person id: 1, name: "Eric", other_fields: "">, #<Person id: 2, name: "Eric", other_fields: "">],[#<Person id: 3, name: "Joe", other_fields: "">]]
Then p.last.first or p.last.last picks the first or last record of each group.
Solution 2: .collect method (an alias of .map).
Person.all.order(:date).group_by(&:name).collect {|key, value| value.last}
I have a complex query which may return more than one record per group. There is a field that holds a sequential number. If a group returns more than one record, I just want the one with the highest sequential number.
I’ve tried using the SQL MAX function, but if I try to add more than one field it returns all records, instead of the one with the highest sequential field in that group.
I am trying to accomplish this in MS Access.
Edit: 4/5/11
Trying to create a table as an example of what I am trying to do
I have the following table:
tblItemTrans
ItemID(PK)
Eventseq(PK)
ItemTypeID
UserID
Eventseq is a number field that increments for each ItemID. (Don’t ask me why, that’s how the table was created.) Each ItemID can have one or many Eventseq values. I only need the last record (max(Eventseq)) for each ItemTypeID.
Hope this helps any.
SELECT A.*
FROM YourTable A
INNER JOIN (SELECT GroupColumn, MAX(SequentialColumn) AS MaxSeq
FROM YourTable
GROUP BY GroupColumn) B
ON A.GroupColumn = B.GroupColumn AND A.SequentialColumn = B.MaxSeq
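The join-to-grouped-max pattern is portable SQL. Here it runs in SQLite via Python's sqlite3, not in Access itself, using the answer's placeholder names YourTable, GroupColumn and SequentialColumn plus a hypothetical Payload column standing in for the other fields.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE YourTable (GroupColumn TEXT, SequentialColumn INTEGER, Payload TEXT)")
con.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [("a", 1, "first a"), ("a", 2, "last a"), ("b", 5, "only b")],
)
latest = con.execute("""
    SELECT A.*
    FROM YourTable A
    INNER JOIN (SELECT GroupColumn, MAX(SequentialColumn) AS MaxSeq
                FROM YourTable
                GROUP BY GroupColumn) B
      ON A.GroupColumn = B.GroupColumn AND A.SequentialColumn = B.MaxSeq
    ORDER BY A.GroupColumn
""").fetchall()
print(latest)  # [('a', 2, 'last a'), ('b', 5, 'only b')]
```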
If your SequentialNumber is an ID (unique across the table), then you could use
select *
from tbl
where seqnum in (
select max(seqnum) from tbl
group by groupcolumn)
If it is not, an alternative to Lamak's query is the Access domain function DMAX:
select *
from tbl
where seqnum = DMAX("seqnum", "tbl", "groupcolumn='" & groupcolumn & "'")
Note: if groupcolumn is a date, use # instead of the single quotes ' above; if it is numeric, remove the single quotes.
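Outside Access, what DMAX computes for each row corresponds to a correlated subquery. The sketch below is that swapped-in, standard-SQL technique, not DMAX itself. It runs in SQLite via Python's sqlite3 with the answer's tbl/groupcolumn/seqnum names and hypothetical rows in which seqnum is not unique across the table. That case is why the IN version above requires a table-wide unique ID.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (groupcolumn TEXT, seqnum INTEGER, payload TEXT)")
# seqnum restarts per group, so it is NOT unique across the table.
con.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [("a", 1, "x"), ("a", 2, "y"), ("b", 1, "z")],
)
# The IN version matches max values from other groups too (a's seqnum 1 matches b's max).
naive = con.execute("""
    SELECT * FROM tbl
    WHERE seqnum IN (SELECT max(seqnum) FROM tbl GROUP BY groupcolumn)
    ORDER BY groupcolumn, seqnum
""").fetchall()
# The correlated subquery compares each row only with the max of its own group.
correlated = con.execute("""
    SELECT * FROM tbl
    WHERE seqnum = (SELECT max(t2.seqnum) FROM tbl AS t2
                    WHERE t2.groupcolumn = tbl.groupcolumn)
    ORDER BY groupcolumn
""").fetchall()
print(naive)       # [('a', 1, 'x'), ('a', 2, 'y'), ('b', 1, 'z')]
print(correlated)  # [('a', 2, 'y'), ('b', 1, 'z')]
```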