Rails Active Record group query in SQLite vs Postgres

I am getting different results when running the same Active Record query against SQLite and Postgres. Postgres raises an aggregate function error, but SQLite raises nothing. My question is: what is SQLite doing here? Is it ignoring the group? Notice that GROUP BY "song" doesn't return an error.
SQLite output:
Playlist.where(user_id:user.id).group(:song)
Playlist Load (0.3ms) SELECT "playlists".* FROM "playlists" WHERE
"playlists"."user_id" = ? GROUP BY "song" [["user_id", 1]]
=> #<ActiveRecord::Relation [#<Playlist id: 2, user_id: 1, song_id: 1,
created_at: "2017-04-27 01:18:01", updated_at: "2017-04-27 01:18:01">]>
Playlist.where(user_id:user.id).group(:song_id)
Playlist Load (0.4ms) SELECT "playlists"."id", "playlists"."user_id",
"playlists"."song_id" FROM "playlists" WHERE "playlists"."user_id" = ?
GROUP BY "song" [["user_id", 1]]
=> #<ActiveRecord::Relation [#<Playlist id: 2, user_id: 1, song_id:
1, created_at: "2017-04-27 01:18:01">]>
Playlist.where(user_id:user.id).group(:id,:song_id)
Playlist Load (0.2ms) SELECT "playlists".* FROM "playlists" WHERE
"playlists"."user_id" = ? GROUP BY "playlists"."id",
"playlists"."song_id" [["user_id", 1]]
=> #<ActiveRecord::Relation [#<Playlist id: 1, user_id: 1, song_id: 1,
created_at: "2017-04-27 01:18:00", updated_at: "2017-04-27 01:18:00">,
#<Playlist id: 2, user_id: 1, song_id: 1, created_at: "2017-04-27
01:18:01", updated_at: "2017-04-27 01:18:01">]>
Playlist.where(user_id:user.id).group(:song).count
(0.2ms) SELECT COUNT(*) AS count_all, "playlists"."song_id" AS
playlists_song_id FROM "playlists" WHERE "playlists"."user_id" = ?
GROUP BY "playlists"."song_id" [["user_id", 1]]
Song Load (2.1ms) SELECT "songs".* FROM "songs"
WHERE "songs"."id" = ? LIMIT 1 [["id", 1]]
=> {#<Song id: 1, title: "A song", artist: "Artist", user_id: 1,
created_at: "2017-04-27 01:17:39">=>2}
Postgres:
Playlist.where(user_id:user.id).group(:song_id)
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column
"playlists.id" must appear in the GROUP BY clause or be used in an
aggregate function
SELECT "playlists".* FROM "playlists" WHERE "playlists"."user_id" = $1
GROUP BY "playlists"."song_id"
Playlist.where(user_id:user.id).group(:id,:song_id)
Playlist Load (0.6ms) SELECT "playlists".* FROM "playlists" WHERE
"playlists"."user_id" = $1 GROUP BY "playlists"."id",
"playlists"."song_id" [["user_id", 1]]
=> #<ActiveRecord::Relation [#<Playlist id: 1, user_id: 1, song_id: 1,
created_at: "2017-04-27 01:25:34", updated_at: "2017-04-27 01:25:34">,
#<Playlist id: 2, user_id: 1, song_id: 1, created_at: "2017-04-27
01:25:36", updated_at: "2017-04-27 01:25:36">]>
Playlist.where(user_id:user.id).group(:song).count
(0.5ms) SELECT COUNT(*) AS count_all, "playlists"."song_id" AS
playlists_song_id FROM "playlists" WHERE "playlists"."user_id" = $1
GROUP BY "playlists"."song_id" [["user_id", 1]]
Song Load (0.3ms) SELECT "songs".* FROM "songs" WHERE "songs"."id" =
$1 LIMIT 1 [["id", 1]]
=> {#<Song id: 1, title: "A song", artist: "Artist", user_id: 1,
created_at: "2017-04-27 01:25:26", updated_at: "2017-04-27
01:25:26">=>2}

A common rule for GROUP BY: you may select only columns listed in the GROUP BY clause, plus aggregate functions. This rule comes from the SQL standard and will work in any SQL database.
If you select columns not listed in the GROUP BY clause, for example select * … group by song_id, it may work in some databases and will not work in others. SQLite is one of the permissive ones: rather than raising an error, it accepts bare (ungrouped, unaggregated) columns and fills them with values taken from an arbitrary row of each group, which is why your first queries silently "work".
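For illustration, a grouped query that conforms to the rule might look like this (a minimal sketch against the playlists table from the question; every selected column is either grouped or aggregated, so it behaves the same in SQLite and Postgres):
-- Standard-conforming: song_id is grouped, COUNT(*) is an aggregate.
SELECT song_id, COUNT(*) AS plays
FROM playlists
WHERE user_id = 1
GROUP BY song_id;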
If you specify the selected columns in your code, it will work:
Playlist.
  where(user_id: user.id).
  select("song_id").
  group(:song_id)
You can read more about GROUP BY in the PostgreSQL documentation: https://www.postgresql.org/docs/current/static/sql-select.html#SQL-GROUPBY

Related

How do I insert and update array columns in Node-Postgres?

I have the following table in Postgres:
_id: integer, user_id: integer, items: Array
I wish to insert the following into the table:
1, 1, [{productId: 1, size: 'large', quantity: 5}]
Next I wish to update the row with the following:
1, 1, [{productId: 1, size: 'small', quantity: 3}]
How do I do this in node-postgres?
Pseudocode:
update cart
set items.quantity = 3
where cart._id = 1
and cart.items.product_id = 1
and cart.items.size='large'
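One possible approach (a sketch only, assuming items is stored as a jsonb column — the question itself only says "Array" — with the cart table name and the productId/size values taken from the pseudocode) is to rebuild the array in SQL, merging the new values into the matching element:
-- Rebuild the array element by element; || merges keys into the matching
-- element, overwriting size and quantity, and leaves other elements intact.
UPDATE cart
SET items = (
  SELECT jsonb_agg(
    CASE
      WHEN elem->>'productId' = '1' AND elem->>'size' = 'large'
      THEN elem || '{"size": "small", "quantity": 3}'::jsonb
      ELSE elem
    END
  )
  FROM jsonb_array_elements(cart.items) AS elem
)
WHERE _id = 1;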

Elixir, Ecto compare datetime in SQL query

I have a problem with an Ecto query. I have this function:
def get_critials() do
  critical_time = DateTime.to_naive(Timex.shift(Timex.now, seconds: -600))
  query = "SELECT d.*"
    <> " FROM sc_devices AS d"
    <> " INNER JOIN log_device_commands AS ldc ON ldc.device_id = d.id"
    <> " WHERE ldc.inserted_at < timestamp '#{critical_time}'"
  {:ok, result} = Ecto.Adapters.SQL.query(Repo, query, [], [:rows])
  result.rows
end
What I want is to get all records from the table sc_devices where the column updated_at in log_device_commands is older than 600 seconds, but I receive this output instead:
[
[1, "LAMP 1XX_1", "1.st Lamp on the corner", 1,
"6c7572e1-460f-43dd-b137-90c21d33525b", "XCA190SS2020DE", 3, 1, 1, 46.55472,
15.64667, true, nil, ~N[2020-11-12 20:32:22.000000],
~N[2020-11-12 20:32:22.000000], 2],
[1, "LAMP 1XX_1", "1.st Lamp on the corner", 1,
"6c7572e1-460f-43dd-b137-90c21d33525b", "XCA190SS2020DE", 3, 1, 1, 46.55472,
15.64667, true, nil, ~N[2020-11-12 20:32:22.000000],
~N[2020-11-12 20:32:22.000000], 2],
[1, "LAMP 1XX_1", "1.st Lamp on the corner", 1,
"6c7572e1-460f-43dd-b137-90c21d33525b", "XCA190SS2020DE", 3, 1, 1, 46.55472,
15.64667, true, nil, ~N[2020-11-12 20:32:22.000000],
~N[2020-11-12 20:32:22.000000], 2]
]
Any ideas how I can solve this?
You could use Postgres's CURRENT_TIMESTAMP - INTERVAL '600 seconds' instead of interpolating an Elixir variable into the query.
Also, I see you commented that you want to filter by updated_at, but your query is actually filtering by inserted_at.
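With that change, the query from the question would look something like this (a sketch using the table and column names above):
-- Let the database compute the cutoff instead of interpolating it from Elixir.
SELECT d.*
FROM sc_devices AS d
INNER JOIN log_device_commands AS ldc ON ldc.device_id = d.id
WHERE ldc.inserted_at < CURRENT_TIMESTAMP - INTERVAL '600 seconds';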

Active record where.not returns empty array

The following command
Fight.last.fight_logs.where(item_id: nil)
generates sql:
Fight Load (0.3ms) SELECT "fights".* FROM "fights" ORDER BY "fights"."id" DESC LIMIT $1 [["LIMIT", 1]]
FightLog Load (0.2ms) SELECT "fight_logs".* FROM "fight_logs" WHERE "fight_logs"."fight_id" = $1 AND "fight_logs"."item_id" IS NULL LIMIT $2 [["fight_id", 27], ["LIMIT", 11]]
and returns:
#<ActiveRecord::AssociationRelation [#<FightLog id: 30, fight_id: 27, attack: 0, block: 0, item_id: nil, user_id: 1, damage: 11.0, created_at: "2017-11-02 16:20:55", updated_at: "2017-11-02 16:20:57">, #<FightLog id: 31, fight_id: 27, attack: 0, block: 0, item_id: nil, user_id: 20, damage: 3.0, created_at: "2017-11-02 16:20:57", updated_at: "2017-11-02 16:20:57">, #<FightLog id: 33, fight_id: 27, attack: 0, block: 0, item_id: nil, user_id: 1, damage: 1.0, created_at: "2017-11-02 16:21:40", updated_at: "2017-11-02 16:21:40">, #<FightLog id: 32, fight_id: 27, attack: 0, block: 0, item_id: nil, user_id: 20, damage: 7.0, created_at: "2017-11-02 16:21:33", updated_at: "2017-11-02 16:21:40">, #<FightLog id: 34, fight_id: 27, attack: 0, block: 0, item_id: nil, user_id: 1, damage: 12.0, created_at: "2017-11-02 16:21:47", updated_at: "2017-11-02 16:21:48">, #<FightLog id: 35, fight_id: 27, attack: 0, block: 0, item_id: nil, user_id: 20, damage: 14.0, created_at: "2017-11-02 16:21:48", updated_at: "2017-11-02 16:21:48">]>
but
Fight.last.fight_logs.where.not(item_id: 1)
generates sql:
Fight Load (1.0ms) SELECT "fights".* FROM "fights" ORDER BY "fights"."id" DESC LIMIT $1 [["LIMIT", 1]]
FightLog Load (0.8ms) SELECT "fight_logs".* FROM "fight_logs" WHERE "fight_logs"."fight_id" = $1 AND ("fight_logs"."item_id" != $2) LIMIT $3 [["fight_id", 27], ["item_id", 1], ["LIMIT", 11]]
and returns:
#<ActiveRecord::AssociationRelation []>
How is this possible? What am I doing wrong?
You need to handle NULL explicitly in your query, since you have NULL values in your database:
Fight.last.fight_logs.where('item_id != ? OR item_id IS NULL', 1)
This is just how SQL works: any comparison with NULL yields NULL (not true, not false):
select 1 != NULL;
+-----------+
| 1 != NULL |
+-----------+
| NULL      |
+-----------+
You can look at this answer for clarification of the issue.
Also, I would recommend avoiding default NULL values in your database; there is a nice answer about that too. In your case you can simply use default: 0, null: false.
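As a further note, PostgreSQL also has a NULL-safe comparison operator that makes the explicit OR unnecessary; a sketch against the fight_logs query from the logs above:
-- IS DISTINCT FROM treats NULL as an ordinary, comparable value,
-- so rows with item_id NULL are kept.
SELECT *
FROM fight_logs
WHERE fight_id = 27
  AND item_id IS DISTINCT FROM 1;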

rethinkdb: secondary compound indexes / aggregation queries and intermediate documents generation

Let's assume table content like the following, where for the same product_id we have as many rows as there were updates while status==1 (published), until it finally becomes status==0 (unpublished) and then status==2 (deleted):
{id: <auto>, product_id: 1, last_updated: 2015-12-1, status: 1, price: 1}
{id: <auto>, product_id: 2, last_updated: 2015-12-1, status: 1, price: 10}
{id: <auto>, product_id: 1, last_updated: 2015-12-2, status: 1, price: 2}
{id: <auto>, product_id: 1, last_updated: 2015-12-3, status: 0, price: 2}
{id: <auto>, product_id: 2, last_updated: 2015-12-2, status: 0, price: 10}
{id: <auto>, product_id: 3, last_updated: 2015-12-2, status: 1, price: 123}
{id: <auto>, product_id: 1, last_updated: 2015-12-4, status: 2, price: 2}
{id: <auto>, product_id: 2, last_updated: 2015-12-4, status: 2, price: 10}
Now I am trying to find a way, maybe using a secondary compound index, to get something like the following, given a date as in the first column (using r.time):
DATE STATUS==1 STATUS==0 STATUS==2
2015-12-1 [101, 102] [] []
2015-12-2 [103, 106] [105] []
2015-12-3 [106] [104, 105] []
2015-12-4 [] [] [107, 108]
The difficulty here is that a product_id document is still to be considered the most recent status as long as its last_updated date is less than or equal to the provided date.
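(For comparison only, since this question is about ReQL: restated against a hypothetical SQL table products with the same fields, this is the classic "latest row per key as of a date" query.)
-- Hypothetical SQL restatement: the status of each product as of a date
-- is the status of its row with the greatest last_updated <= that date.
SELECT p.product_id, p.status
FROM products p
JOIN (
  SELECT product_id, MAX(last_updated) AS last_updated
  FROM products
  WHERE last_updated <= DATE '2015-12-02'
  GROUP BY product_id
) latest USING (product_id, last_updated);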
I tried grouping by product_id, then taking the max('last_updated'), then keeping each group's single reduced document only if status==1.
I have in mind an index for each status / given date.
Another solution would be to insert into a second table the result of an aggregation that stores a single document per date, containing the ids of all the initial documents matching the same criteria, and so on...
And then later perform joins against these intermediate records to fetch the values of each product_id at the given date/status.
something like:
{
  date: <date_object>,
  documents: [
    {id: document_id, status: 1},
    {id: document_id, status: 1},
    {id: document_id, status: 2},
    {id: document_id, status: 0},
    ...
  ]
}
Please advise.
Edit 1:
This is an example of a query I run to analyse my data; here it is used to get an overview of the statuses for each group with more than one document:
r.db('test').table('products_10k_sample')
  .group({index: 'product_id'})        // one group per product_id
  .orderBy(r.desc('last_updated'))     // newest first within each group
  .ungroup()
  .map(function(x) {
    return r.branch(
      x('reduction').count().gt(1),    // only groups with more than one document
      x('reduction').map(function(m) {
        return [m('last_updated').toISO8601(), m('status'), m('product_id')]
      }),
      null
    )
  })

Getting rows in the same table

I'm trying to get rows in a table.
Imagine I have two records (t1 and t2). I want to get the rows where t1.start_hour is not BETWEEN t2.start_hour and t2.finish_hour. Basically, I only want the occurrences that do not have an hour conflict with another one.
This is the table:
create_table "occurrences", :force => true do |t|
  t.string   "start_hour"
  t.string   "finish_hour"
  t.date     "start_date"
  t.date     "finish_date"
  t.datetime "created_at", :null => false
  t.datetime "updated_at", :null => false
  t.integer  "activity_id"
end
And this is the SQL query I came up so far:
Occurrence.find_by_sql("SELECT * FROM occurrences t1 INNER JOIN occurrences t2 ON (t1.start_hour NOT BETWEEN t2.start_hour and t2.finish_hour)")
It gives me duplicate results, and I'm not able to remove them and get the correct answer.
Thanks for the help in advance.
Example
INPUT
#<Occurrence id: 1, start_hour: "19:00", finish_hour: "20:20", start_date: "2012-05-30", finish_date: "2012-05-30", created_at: "2012-05-30 09:58:19", updated_at: "2012-05-30 09:58:19", activity_id: 1>,
#<Occurrence id: 2, start_hour: "19:30", finish_hour: "20:10", start_date: "2012-05-30", finish_date: "2012-05-30", created_at: "2012-05-30 09:58:19", updated_at: "2012-05-30 09:58:19", activity_id: 2>,
#<Occurrence id: 3, start_hour: "22:00", finish_hour: "23:20", start_date: "2012-05-30", finish_date: "2012-05-30", created_at: "2012-05-30 09:58:20", updated_at: "2012-05-30 09:58:20", activity_id: 3>
OUTPUT
#<Occurrence id: 1, start_hour: "19:00", finish_hour: "20:20", start_date: "2012-05-30", finish_date: "2012-05-30", created_at: "2012-05-30 09:58:19", updated_at: "2012-05-30 09:58:19", activity_id: 1>,
#<Occurrence id: 3, start_hour: "22:00", finish_hour: "23:20", start_date: "2012-05-30", finish_date: "2012-05-30", created_at: "2012-05-30 09:58:20", updated_at: "2012-05-30 09:58:20", activity_id: 3>
The record with start_hour = 19:30 is not output because it is between the 19:00 and 20:20 of another record.
EDIT:
I got the solution:
Occurrence.find_by_sql("SELECT start_hour FROM occurrences WHERE start_hour NOT IN (SELECT t2.start_hour FROM occurrences t1 INNER JOIN occurrences t2 ON ((t1.activity_id <> t2.activity_id AND t2.start_hour BETWEEN t1.start_hour and t1.finish_hour)))")
Thanks for the help
Let's assume there are 3 records in the table (I am using integers in place of datetimes, as this is easier):
id  start  end
1   1      3
2   4      5
3   7      9
When you try to find the rows whose start is not between another row's start and end, then for id = 1 both of the other rows satisfy the condition, so it appears in the result twice. Similarly, for the row with id = 2 (start = 4), both other rows qualify (making that row appear twice in the result), and the same happens for the third row, so you end up with six rows.
It's not very clear what you are trying to achieve here, but adding DISTINCT will remove the duplicates.
EDIT: You might also consider adding join conditions on start and finish date.
Not tested (from memory):
you need to exclude the record itself --> t1.activity_id <> t2.activity_id
use a left join, or you won't get the good ones
keep only the rows where there is no right side
would need to test this :p
SELECT * FROM occurrences t1
LEFT JOIN occurrences t2
  ON (t1.activity_id <> t2.activity_id AND t1.start_hour BETWEEN t2.start_hour AND t2.finish_hour)
WHERE t2.activity_id IS NULL
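An equivalent way to write the same anti-join, which some find easier to read (likewise untested, under the same schema assumptions):
-- Keep each occurrence only if no other activity's window contains its start.
SELECT *
FROM occurrences t1
WHERE NOT EXISTS (
  SELECT 1
  FROM occurrences t2
  WHERE t2.activity_id <> t1.activity_id
    AND t1.start_hour BETWEEN t2.start_hour AND t2.finish_hour
);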