Select last unique polymorphic objects ordered by created at in Rails - sql

I'm trying to get unique polymorphic objects by the value of one of the columns. I'm using Postgres.
The object has the following properties: id, available_type, available_id, value, created_at, updated_at.
I'm looking to get the most recent object per available_id (recency determined by created_at) for records with the available_type of "User".
I've been trying ActiveRecord queries like this:
Service.where(available_type: "User").order(created_at: :desc).distinct(:available_id)
But it isn't limiting to one per available_id.

Try
Service.where(id: Service
.where(available_type: "User")
.group(:available_id)
.maximum(:id).values)
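Note that this picks the highest id per available_id, so it matches "most recent by created_at" only if ids always grow together with created_at. A minimal plain-Ruby sketch of what the query computes, assuming hypothetical in-memory rows (the ServiceIdRow struct and the sample data are illustrative and not part of the question):

```ruby
# Hypothetical in-memory stand-in for the services table.
ServiceIdRow = Struct.new(:id, :available_type, :available_id, :created_at)

# Mirrors where(available_type: "User").group(:available_id).maximum(:id).values
def max_ids_per_available_id(rows)
  rows.select { |r| r.available_type == "User" }
      .group_by(&:available_id)
      .map { |_available_id, group| group.map(&:id).max }
end

SAMPLE_SERVICE_ROWS = [
  ServiceIdRow.new(1, "User", 10, Time.utc(2020, 1, 1)),
  ServiceIdRow.new(2, "User", 10, Time.utc(2020, 1, 2)),
  ServiceIdRow.new(3, "User", 11, Time.utc(2020, 1, 1)),
  ServiceIdRow.new(4, "Team", 10, Time.utc(2020, 1, 3))
].freeze

p max_ids_per_available_id(SAMPLE_SERVICE_ROWS) # => [2, 3]
```

If a row could be inserted with an older created_at than an existing row for the same available_id, the highest id would no longer be the most recent record.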

Using a left join is probably going to be your most efficient option.
The following SQL selects only rows for which no row with a larger created_at exists.
See this post for more info: https://stackoverflow.com/a/27802817/5301717
query = <<-SQL
SELECT m.* -- get the row that contains the max value
FROM services m -- "m" from "max"
LEFT JOIN services b -- "b" from "bigger"
ON m.available_id = b.available_id -- match "max" row with "bigger" row by available_id
AND m.available_type = b.available_type
AND m.created_at < b.created_at -- want "bigger" than "max"
WHERE b.created_at IS NULL -- keep only if there is no bigger than max
AND m.available_type = 'User'
SQL
Service.find_by_sql(query)
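The logic of this self-join can be sketched in plain Ruby: a row survives only if no other row in the same group is newer. This is a minimal illustration assuming hypothetical in-memory rows (the ServiceRecord struct and the method name are made up for the example), not a replacement for the SQL:

```ruby
# Hypothetical in-memory stand-in for the services table.
ServiceRecord = Struct.new(:id, :available_type, :available_id, :created_at)

# Keeps a row only when no other row with the same available_type and
# available_id has a later created_at (the "b.created_at IS NULL" case).
def latest_per_available_id(rows, type: "User")
  candidates = rows.select { |m| m.available_type == type }
  candidates.reject do |m|
    candidates.any? do |b|
      b.available_id == m.available_id && m.created_at < b.created_at
    end
  end
end
```

As in the SQL, two rows that share the same latest created_at for one available_id are both kept, so add a tie-breaker (for example on id) if exactly one row per group is required.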
The distinct method doesn't take a column name as an argument, only true/false.
It returns only distinct records and has nothing to do with filtering for a specific value.
If you need a specific available_id, you need to use where, e.g.:
Service.distinct.where(available_type: "User").where(available_id: YOUR_ID_HERE).order(created_at: :desc)
To get only the most recent record, add limit:
Service.distinct.where(available_type: "User").where(available_id: YOUR_ID_HERE).order(created_at: :desc).limit(1)
If you need the most recent record for each distinct available_id, one option is a loop (note that it issues one query per id).
First get the distinct polymorphic ids with pluck:
available_ids = Service.where(available_type: 'User').distinct.pluck(:available_id)
Then get the most recent record for each id:
recents = available_ids.map do |id|
  Service.where(available_id: id, available_type: 'User').order(created_at: :desc).first
end

Related

How can I find an associated table's oldest record, filtering by one of its attributes?

I have the model Subscription which has_many Versions.
A Version has a status, plan_id and authorized_at date.
Any change made to a Subscription comes from a Version modification updating its parent Subscription.
The goal is to find each Subscription's Version with the oldest authorized_at date WHERE the versions.plan_id is the same as the subscriptions.plan_id (in other words I need the authorization date of the Version where the plan_id changed to the current Subscription's plan_id).
This is the query I've come up with. I'm getting an error in the aggregate function syntax:
syntax error at or near "MIN" LINE 3: MIN (authorized_at) from versions ^
query:
select subscriptions.id,
MIN (authorized_at) from versions
where versions.plan_id = subscriptions.plan_id
) as current_version
from subscriptions
join versions on subscriptions.id = versions.subscription_id
where versions.status = 'processed'
I also am not sure if I should be grouping the versions by plan_id and then picking from each group. I'm kind of lost.
You can use a lateral subquery, which can best be described as a foreach loop in SQL. Lateral subqueries are an extremely performant way to select columns from a single correlated record, or even aggregates from a group of related records.
For each row in subscriptions the DB will select a single row from versions ordered by authorized_at:
SELECT "subscriptions".*,
"latest_version"."authorized_at" AS current_version,
"latest_version"."id" AS current_version_id -- could be very useful
FROM "subscriptions"
LEFT JOIN LATERAL
(
SELECT "versions"."authorized_at", "versions"."id"
FROM "versions"
WHERE "versions"."subscription_id" = "subscriptions"."id" -- lateral reference
AND "versions"."plan_id" = "subscriptions"."plan_id"
AND "versions"."status" = 'processed'
ORDER BY "versions"."authorized_at" ASC
LIMIT 1
) latest_version ON TRUE
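The "foreach" behaviour of the lateral subquery can be mimicked in plain Ruby. This is only a sketch on hypothetical in-memory data (the SubscriptionRow and VersionRow structs and the method name are assumptions made for illustration); the database does the same work per subscription row:

```ruby
# Hypothetical in-memory stand-ins for the two tables.
SubscriptionRow = Struct.new(:id, :plan_id)
VersionRow = Struct.new(:id, :subscription_id, :plan_id, :status, :authorized_at)

# For each subscription, pick the earliest processed version whose plan_id
# matches the subscription's current plan_id (nil when there is none).
def earliest_matching_versions(subscriptions, versions)
  subscriptions.to_h do |s|
    match = versions
            .select do |v|
              v.subscription_id == s.id &&
                v.plan_id == s.plan_id &&
                v.status == "processed"
            end
            .min_by(&:authorized_at) # ORDER BY authorized_at ASC LIMIT 1
    [s.id, match]
  end
end
```

Subscriptions without a matching version map to nil here, which corresponds to NULL columns when the lateral subquery is attached with a left join.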
Creating lateral joins in ActiveRecord can be done either with SQL strings or Arel:
class Subscription < ApplicationRecord
# Performs a lateral join and selects the
# authorized_at of the earliest matching version
def self.with_current_version
lateral = Version.arel_table.then do |v|
v.project(
v[:authorized_at],
v[:id] # optional
).where(
v[:subscription_id].eq(arel_table[:id])
.and(v[:plan_id].eq(arel_table[:plan_id]) )
.and(v[:status].eq('processed'))
)
.order(v[:authorized_at].asc)
.take(1) # limit 1
.lateral('latest_version ON TRUE')
end
lv = Arel::Table.new(:latest_version) # just a table alias
select(
*where(nil).arel.projections, # selects everything previously selected
lv[:authorized_at].as("current_version"),
lv[:id].as("current_version_id") # optional
).joins("LEFT JOIN #{lateral.to_sql}")
end
end
If you just want to select the id and current_version column you should consider using pluck instead of selecting database models that aren't properly hydrated.
You can use DISTINCT ON to filter out rows, and keep a single one per subscription -- the first one per group according to the ORDER BY clause.
For example:
select distinct on (s.id) s.id, v.authorized_at
from subscriptions s
join versions v on v.subscription_id = s.id and v.plan_id = s.plan_id
where v.status = 'processed'
order by s.id, v.authorized_at
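What DISTINCT ON does with the sorted rows can be sketched in plain Ruby: sort by the grouping key and the tie-breaking column, then keep the first row per key. The method name and the [subscription_id, authorized_at] row shape are assumptions for illustration only:

```ruby
# Rows are hypothetical [subscription_id, authorized_at] pairs, as they would
# come out of the join; ISO date strings sort chronologically.
def distinct_on_first(rows)
  rows.sort_by { |id, authorized_at| [id, authorized_at] } # ORDER BY s.id, v.authorized_at
      .uniq { |id, _authorized_at| id }                   # DISTINCT ON (s.id)
end

joined = [
  [2, "2021-03-01"],
  [1, "2021-02-01"],
  [1, "2021-01-15"],
  [2, "2021-01-10"]
]
p distinct_on_first(joined) # => [[1, "2021-01-15"], [2, "2021-01-10"]]
```

Array#uniq keeps the first element it sees for each key, which is why the sort order decides which row survives, just as the ORDER BY clause does in the SQL.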
The code below gives you the versions where the Version's plan_id is equal to the Subscription's plan_id.
@versions = Version.joins("INNER JOIN subscriptions ON subscriptions.plan_id = versions.plan_id")
To filter records by the Version's status:
@versions = Version.joins("INNER JOIN subscriptions ON subscriptions.plan_id = versions.plan_id").where(status: "processed")
To filter records by the Version's status and order by authorized_at in ascending order:
@versions = Version.joins("INNER JOIN subscriptions ON subscriptions.plan_id = versions.plan_id").where(status: "processed").order(:authorized_at)
To filter records by the Version's status and order by authorized_at in descending order:
@versions = Version.joins("INNER JOIN subscriptions ON subscriptions.plan_id = versions.plan_id").where(status: "processed").order(authorized_at: :desc)
Hope this works for you!

Access 2010 Find Most Recent Status via SQL

I need a list of ALL document numbers (k002) with their most recent maintenance date (lm01_s) and status_code. The code below finds the last date in the entire table and returns any record with that date, which is not what I need. There is only one table. If I drop the status_code from the equation, this is easy.
SELECT k002, lm01_s, status_code
FROM stat_trans
WHERE (lm01_s = ANY (SELECT MAX(lm01_s) FROM stat_trans)) ORDER BY lm01_s;
I have also tried this ...
SELECT k002, lm01_s, advice_code
FROM romis_stat_trans
WHERE lm01_s IN (((SELECT Max(lm01_s) FROM romis_stat_trans GROUP BY k002)));
I have tried so many things that I forget what I have tried. Everything has been a dead end.
Use a subquery in the where clause to return only the records where lm01_s is equal to the max lm01_s. I found it's important to use a table alias or else Access will confuse the fields.
select k002,
lm01_s,
status_code
from stat_trans
where lm01_s=(select max(sc.lm01_s)
from stat_trans as sc
where sc.k002 = stat_trans.k002)
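The effect of the correlated subquery can be sketched in plain Ruby: compute the latest date per document number, then keep the rows that carry it. The StatRow struct, the method name and the date format are assumptions for illustration, since the table definition is not shown:

```ruby
# Hypothetical in-memory stand-in for stat_trans; dates are ISO strings so
# that string comparison matches chronological order.
StatRow = Struct.new(:k002, :lm01_s, :status_code)

# Keeps every row whose lm01_s equals the maximum lm01_s for its k002.
def latest_status_rows(rows)
  max_date = rows.group_by(&:k002)
                 .transform_values { |group| group.map(&:lm01_s).max }
  rows.select { |r| r.lm01_s == max_date[r.k002] }
end
```

If a document has two rows with the same latest date, both are returned, which is also how the SQL behaves.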

Select first or random row in group by

I have this query using PostgreSQL 9.1 (9.2 as soon as our hosting platform upgrades):
SELECT
media_files.album,
media_files.artist,
ARRAY_AGG (media_files.id) AS media_file_ids
FROM
media_files
INNER JOIN playlist_media_files ON media_files.id = playlist_media_files.media_file_id
WHERE
playlist_media_files.playlist_id = 1
GROUP BY
media_files.album,
media_files.artist
ORDER BY
media_files.album ASC
and it's working fine, the goal was to extract album/artist combinations and in the result set have an array of media files ids for that particular combo.
The problem is that I have another column in media files, which is artwork.
artwork is unique for each media file (even in the same album) but in the result set I need to return just the first of the set.
So, for an album that has 10 media files, I also have 10 corresponding artworks, but I would like just to return the first (or a random picked one for that collection).
Is that possible to do with only SQL/Window Functions (first_value over..)?
Yes, it's possible. First, let's tweak your query by adding alias and explicit column qualifiers so it's clear what comes from where - assuming I've guessed correctly, since I can't be sure without table definitions:
SELECT
mf.album,
mf.artist,
ARRAY_AGG (mf.id) AS media_file_ids
FROM
"media_files" mf
INNER JOIN "playlist_media_files" pmf ON mf.id = pmf.media_file_id
WHERE
pmf.playlist_id = 1
GROUP BY
mf.album,
mf.artist
ORDER BY
mf.album ASC
Now you can either use a subquery in the SELECT list or maybe use DISTINCT ON, though it looks like any solution based on DISTINCT ON will be so convoluted as not to be worth it.
What you really want is something like a pick_arbitrary_value_agg aggregate that just picks the first value it sees and throws the rest away. There is no such aggregate, and it isn't really worth implementing one for this job. You could use min(artwork) or max(artwork), and you may find that this actually performs better than the later solutions.
To use a subquery, leave the ORDER BY as it is and add the following as an extra column in your SELECT list:
(SELECT mf2.artwork
FROM media_files mf2
WHERE mf2.artist = mf.artist
AND mf2.album = mf.album
LIMIT 1) AS picked_artwork
You can at a performance cost randomize the selected artwork by adding ORDER BY random() before the LIMIT 1 above.
Alternately, here's a quick and dirty way to implement selection of a random row in-line:
(array_agg(artwork))[width_bucket(random(),0,1,count(artwork)::integer)]
Since there's no sample data I can't test these modifications. Let me know if there's an issue.
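The width_bucket trick above turns a random number in [0, 1) into a 1-based array index. A minimal Ruby sketch of the same arithmetic, with hypothetical method names chosen for this example:

```ruby
# Equivalent of width_bucket(r, 0, 1, n) for 0 <= r < 1:
# a 1-based bucket number between 1 and n.
def bucket_for(r, n)
  (r * n).floor + 1
end

# Picks one element the way the SQL expression does. PostgreSQL arrays are
# 1-based and Ruby arrays 0-based, hence the "- 1".
def pick_random(values, r = rand)
  values[bucket_for(r, values.size) - 1]
end

p bucket_for(0.0, 4)              # => 1
p bucket_for(0.25, 4)             # => 2
p pick_random(%w[a b c], 0.5)     # => "b"
```

Each of the n buckets covers an equally wide slice of [0, 1), so every element is equally likely to be chosen.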
"First" pick
Wouldn't it be simpler / cheaper to just use min():
SELECT m.album
,m.artist
,array_agg(m.id) AS media_file_ids
,min(m.artwork) AS artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
GROUP BY m.album, m.artist
ORDER BY m.album, m.artist;
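What this grouped query returns can be sketched in plain Ruby. The MediaFile struct and the method name are hypothetical, and the input is assumed to be already restricted to the playlist in question:

```ruby
# Hypothetical in-memory stand-in for the joined media_files rows.
MediaFile = Struct.new(:id, :album, :artist, :artwork)

# One result row per (album, artist): all ids plus the smallest artwork value.
def albums_with_artwork(files)
  files.group_by { |f| [f.album, f.artist] }
       .map do |(album, artist), group|
         {
           album: album,
           artist: artist,
           media_file_ids: group.map(&:id),
           artwork: group.map(&:artwork).compact.min # min() ignores NULLs
         }
       end
       .sort_by { |row| [row[:album], row[:artist]] }
end
```

Note that min picks the artwork value that sorts first, not the one belonging to the first media file, which is fine when any artwork of the album will do.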
Arbitrary / random pick
If you are looking for a random selection, @Craig already provided a solution with truly random picks.
You could also use a CTE to avoid additional scans on the (possibly big) base table and then run two separate (cheap) subqueries on the small result set.
For arbitrary selection - not truly random, the result will depend on the physical order of rows in the table and implementation-specifics:
WITH x AS (
SELECT m.album, m.artist, m.id, m.artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
)
SELECT a.album, a.artist, a.media_file_ids, b.artwork
FROM (
SELECT album, artist, array_agg(id) AS media_file_ids
FROM x
GROUP BY album, artist
) a
JOIN (
SELECT DISTINCT ON (1,2) album, artist, artwork
FROM x
) b USING (album, artist);
For truly random results, you can add an ORDER BY .. random() like this to subquery b:
JOIN (
SELECT DISTINCT ON (1, 2) album, artist, artwork
FROM x
ORDER BY 1, 2, random()
) b USING (album, artist);

NHibernate Return Values

I am currently working on a project using NHibernate as the DAL with .NET 2.0 and NHibernate 2.2.
Today I came to a point where I had to join a bunch of entities/collections to get what I want. That is fine.
What got me was that I do not want the query to return a list of objects of a certain entity type but rather the result would include various properties from different entities.
The following query is not what I am doing but it is kind of query that I am talking about here.
select order.id, sum(price.amount), count(item)
from Order as order
join order.lineItems as item
join item.product as product,
Catalog as catalog
join catalog.prices as price
where order.paid = false
and order.customer = :customer
and price.product = product
and catalog.effectiveDate < sysdate
and catalog.effectiveDate >= all (
select cat.effectiveDate
from Catalog as cat
where cat.effectiveDate < sysdate
)
group by order
having sum(price.amount) > :minAmount
order by sum(price.amount) desc
My question is: in this case, what type of result is supposed to be returned? It is certainly not of type Order, nor is it of type LineItems.
Thanks for your help!
John
You can always use a List of object[] for returning the data, and it will work fine.
This is called a projection, and it happens any time you specify an explicit select clause that contains columns from various tables (or even aggregate / summary data from a single table).
Using LINQ you can create anonymous objects to store these rows of data, like this:
var crunchies = (from foo in bar
where foo.baz == quux
select new { foo.corge, foo.grault }).ToList();
Then you can do crunchies[0].corge for example to pull out the rows & columns.
If you are using NHibernate.Linq this will "just work".
If you're using HQL or Criteria API, then what Fahad mentioned will work. You'll get a List<object[]> as a result, and the index of the array references the order of the columns that you returned in your select clause.

MySQL to return only last date / time record

We have a database that stores vehicle's gps position, date, time, vehicle identification, lat, long, speed, etc., every minute.
The following select pulls each vehicle's position and info, but the problem is that it returns the first record, and I need the last record (current position), based on date (datagps.Fecha) and time (datagps.Hora). This is the select:
SELECT configgps.Fichagps,
datacar.Ficha,
groups.Nombre,
datagps.Hora,
datagps.Fecha,
datagps.Velocidad,
datagps.Status,
datagps.Calleune,
datagps.Calletowo,
datagps.Temp,
datagps.Longitud,
datagps.Latitud,
datagps.Evento,
datagps.Direccion,
datagps.Provincia
FROM asigvehiculos
INNER JOIN datacar ON (asigvehiculos.Iddatacar = datacar.Id)
INNER JOIN configgps ON (datacar.Configgps = configgps.Id)
INNER JOIN clientdata ON (asigvehiculos.Idgroup = clientdata.group)
INNER JOIN groups ON (clientdata.group = groups.Id)
INNER JOIN datagps ON (configgps.Fichagps = datagps.Fichagps)
Group by Fichagps;
I need same result I'm getting, but instead of the older record I need the most recent
(LAST datagps.Fecha / datagps.Hora).
How can I accomplish this?
Add ORDER BY datagps.Fecha DESC, datagps.Hora DESC LIMIT 1 to your query.
I'm not sure why you are having any problems with this as Lex's answers seem good.
I would start putting ORDER BY's in your query so it puts them in an order, when it's showing the record you want as the first one in the list, then add the LIMIT.
If you want the most recent, then the following should be good enough:
ORDER BY datagps.Fecha DESC, datagps.Hora DESC
If you simply want the record that was added to the database most recently (regardless of the date/time fields), and the datagps table has an auto-incrementing primary key (I assume it's called dataID for this example), then you could use:
ORDER BY datagps.dataID DESC
If these aren't showing the data you want - then there is something missing from your example (maybe data-types aren't DATETIME fields? - if not - then maybe a CONVERT to change them from their current type before ORDERing BY would be a good idea)
EDIT:
I've seen the screenshot and I'm confused as to what the issue is still. That appears to be showing everything in order. Are you implying that there are many more than 5 records? How many are you expecting?
Do you mean: for each record returned, you want the one row from the table datagps with the latest date and time attached to the result? If so, how about this:
# To show how the query will be executed
# comment to return actual results
EXPLAIN
SELECT
configgps.Fichagps, datacar.Ficha, groups.Nombre, datagps.Hora, datagps.Fecha,
datagps.Velocidad, datagps.Status, datagps.Calleune, datagps.Calletowo,
datagps.Temp, datagps.Longitud, datagps.Latitud, datagps.Evento,
datagps.Direccion, datagps.Provincia
FROM asigvehiculos
INNER JOIN datacar ON (asigvehiculos.Iddatacar = datacar.Id)
INNER JOIN configgps ON (datacar.Configgps = configgps.Id)
INNER JOIN clientdata ON (asigvehiculos.Idgroup = clientdata.group)
INNER JOIN groups ON (clientdata.group = groups.Id)
INNER JOIN datagps ON (configgps.Fichagps = datagps.Fichagps)
########### Add this section
LEFT JOIN datagps b ON (
configgps.Fichagps = b.Fichagps
# wrong condition
#AND datagps.Hora < b.Hora
#AND datagps.Fecha < b.Fecha)
# might prevent indexes to be used
AND (datagps.Fecha < b.Fecha OR (datagps.Fecha = b.Fecha AND datagps.Hora < b.Hora)))
WHERE b.Fichagps IS NULL
###########
Group by configgps.Fichagps;
A similar question was asked here, except that one uses outer joins.
Edit (again):
The conditions were wrong, so I corrected them. Can you show us the output of the above EXPLAIN query so we can pinpoint where the bottleneck is?
As hurikhan77 said, it would be better if you could convert both of the columns into a single datetime field, though I'm guessing this would not be possible in your case (since your database is already being used?).
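Why the first version of the join condition was wrong can be shown with a small Ruby sketch. The hash keys and method names are hypothetical, and Fecha and Hora are assumed to be strings that sort chronologically:

```ruby
# Original (wrong) condition: both the date and the time must be smaller.
def earlier_wrong?(a, b)
  a[:fecha] < b[:fecha] && a[:hora] < b[:hora]
end

# Corrected condition: compare the date first, and the time only when the
# dates are equal.
def earlier?(a, b)
  a[:fecha] < b[:fecha] || (a[:fecha] == b[:fecha] && a[:hora] < b[:hora])
end

late_evening  = { fecha: "2010-05-01", hora: "23:59:00" }
early_morning = { fecha: "2010-05-02", hora: "00:01:00" }

p earlier_wrong?(late_evening, early_morning) # => false
p earlier?(late_evening, early_morning)       # => true
```

The wrong condition misses the newer row whenever its time of day is smaller than the older row's, so an outdated position would be treated as the latest one.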
Though if you can convert it, the condition (on the join) would become:
AND datagps.FechaHora < b.FechaHora
After that, add an index for datagps.FechaHora and the query would be fast(er).
What you probably want is getting the maximum of (Fecha,Hora) per grouped dataset? This is a little complicated to accomplish with your column types. You should combine Fecha and Hora into one column of type DATETIME. Then it's easy to just SELECT MAX(FechaHora) ... GROUP BY Fichagps.
It could have helped if you posted your table structure to understand the problem.
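The idea of merging Fecha and Hora into one value and taking the maximum per vehicle can be sketched in plain Ruby. The column formats ("YYYY-MM-DD" and "HH:MM:SS"), the hash keys and the method names are assumptions, because the table structure is not shown in the question:

```ruby
# Builds a single timestamp from separate date and time strings.
# Assumes "YYYY-MM-DD" and "HH:MM:SS"; adjust to the real column formats.
def fecha_hora(fecha, hora)
  year, month, day = fecha.split("-").map(&:to_i)
  hour, minute, second = hora.split(":").map(&:to_i)
  Time.utc(year, month, day, hour, minute, second)
end

# Rough equivalent of SELECT Fichagps, MAX(FechaHora) ... GROUP BY Fichagps
def latest_fix_per_vehicle(rows)
  rows.group_by { |r| r[:fichagps] }
      .transform_values do |group|
        group.map { |r| fecha_hora(r[:fecha], r[:hora]) }.max
      end
end
```

With a single comparable value per row, finding the newest position becomes a plain maximum instead of a two-column comparison.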