Rails combine group by and min - sql

Assume we have a table called Activities
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+
I want to find the player_id and its first event_date as first_date.
SQL:
SELECT activities.player_id, MIN(activities.event_date) AS first_date
FROM `activities`
GROUP BY `activities`.`player_id`
Result table:
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1 | 2016-03-01 |
| 2 | 2017-06-25 |
| 3 | 2016-03-02 |
+-----------+-------------+
How to do it in Rails?
I've tried this, but it returns an Activity collection that only contains player_id.
Activity.select('activities.player_id, min(activities.event_date) as first_date')
  .group(:player_id)
Like this
[#<Activity:0x00007f94923bb888 player_id: 1>, #<Activity:0x00007f94923bb608 player_id: 2>, #<Activity:0x00007f94923b9ba0 player_id: 3>]

Actually, your query above does load the first_date column:
result = [#<Activity:0x00007f94923bb888 player_id: 1>, #<Activity:0x00007f94923bb608 player_id: 2>, #<Activity:0x00007f94923b9ba0 player_id: 3>]
It's just that first_date is not an attribute of the Activity model, so Rails does not display such virtual columns in the inspect output.
You can access it using this syntax
result.last.first_date
If you need to see the date value in the result objects, alias the aggregate to an existing attribute instead:
Activity.select('activities.player_id, min(activities.event_date) as event_date').group(:player_id)
Then you will get your desired result:
[#<Activity:0x00007f94923bb888 player_id: 1, event_date: 'date value'>, #<Activity:0x00007f94923bb608 player_id: 2, event_date: 'date value'>, #<Activity:0x00007f94923b9ba0 player_id: 3, event_date: 'date value'>]

In the result you got only player_id because when you do Activity.select('..'), Rails returns ActiveRecord model objects, which only display their declared attributes. You probably want to run a custom query and convert the output to an array like this:
result = ActiveRecord::Base.connection.execute("SELECT Activities.player_id, min(Activities.event_date) as first_date FROM activities GROUP BY activities.player_id")
result.to_a # Converts the PG::Result to an array; 'result.as_json' converts it to JSON
Hope this helps.
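Outside of Rails, the GROUP BY / MIN query itself can be sanity-checked directly. A minimal sketch using Python's built-in sqlite3 (SQLite stands in for the real database; table name and data are taken from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities "
             "(player_id INT, device_id INT, event_date TEXT, games_played INT)")
conn.executemany("INSERT INTO activities VALUES (?, ?, ?, ?)", [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-05-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0),
    (3, 4, "2018-07-03", 5),
])

# One row per player with the earliest event_date, aliased as first_date.
# ISO-8601 date strings sort lexicographically, so MIN() works on them.
rows = conn.execute(
    "SELECT player_id, MIN(event_date) AS first_date "
    "FROM activities GROUP BY player_id"
).fetchall()
print(rows)
```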

Related

How to use ID JOIN instead of DATEDIFF()

Write a SQL query to find the ids of all dates with a higher temperature than the previous date (yesterday).
Try out if you want: https://leetcode.com/problems/rising-temperature/
Input:
Weather table:
+----+------------+-------------+
| id | recordDate | temperature |
+----+------------+-------------+
| 1  | 2015-01-01 | 10          |
| 2  | 2015-01-02 | 25          |
| 3  | 2015-01-03 | 20          |
| 4  | 2015-01-04 | 30          |
+----+------------+-------------+
Output:
+----+
| id |
+----+
| 2  |
| 4  |
+----+
Here's my code:
SELECT w_2.id AS "Id"
FROM Weather w_1
JOIN Weather w_2
ON w_1.id + 1 = w_2.id
WHERE w_1.temperature < w_2.temperature
But my code won't be accepted, even though my output looks exactly like the expected output.
I know the answer is:
SELECT w2.id
FROM Weather w1, Weather w2
WHERE w2.temperature > w1.temperature
AND DATEDIFF(w2.recordDate, w1.recordDate) = 1
But I tried to not use DATEDIFF because this function is not available in PostgreSQL.
The queries are not compatible. You should join the table on recordDate, not on Id.
SELECT w_2.id AS "Id"
FROM Weather w_1
JOIN Weather w_2
ON w_1.recordDate + 1 = w_2.recordDate
WHERE w_1.temperature < w_2.temperature
Do not assume that Id is sequential and ordered in the same way as recordDate, although the sample data may suggest this.
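The date-based join can be checked without DATEDIFF(). A sketch using Python's built-in sqlite3, where date arithmetic uses the date() function with a '+1 day' modifier; in PostgreSQL, where recordDate is a DATE, the answer's w_1.recordDate + 1 works directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weather (id INT, recordDate TEXT, temperature INT)")
conn.executemany("INSERT INTO Weather VALUES (?, ?, ?)", [
    (1, "2015-01-01", 10),
    (2, "2015-01-02", 25),
    (3, "2015-01-03", 20),
    (4, "2015-01-04", 30),
])

# Pair each day with the next calendar day by date, not by id.
ids = [row[0] for row in conn.execute(
    "SELECT w_2.id FROM Weather w_1 "
    "JOIN Weather w_2 ON date(w_1.recordDate, '+1 day') = w_2.recordDate "
    "WHERE w_1.temperature < w_2.temperature"
)]
print(ids)
```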

how to join tables on cases where none of function(a) in b

Say in MonetDB (specifically, the embedded version from the "MonetDBLite" R package) I have a table "events" containing entity ID codes and event start and end dates, of the format:
| id  | start_date | end_date   |
| 1   | 2010-01-01 | 2010-03-30 |
| 1   | 2010-04-01 | 2010-06-30 |
| 2   | 2018-04-01 | 2018-06-30 |
| ... | ...        | ...        |
The table is approximately 80 million rows of events, attributable to approximately 2.5 million unique entities (ID values). The dates appear to align nicely with calendar quarters, but I haven't thoroughly checked them so assume they can be arbitrary. However, I have at least sense-checked them for end_date > start_date.
I want to produce a table "nonevent_qtrs" listing calendar quarters where an ID has no event recorded, e.g.:
| id  | last_doq   |
| 1   | 2010-09-30 |
| 1   | 2010-12-31 |
| ... | ...        |
| 1   | 2018-06-30 |
| 2   | 2010-03-30 |
| ... | ...        |
(doq = day of quarter)
If the extent of an event spans any days of the quarter (including the first and last dates), then I wish for it to count as having occurred in that quarter.
To help with this, I have produced a "calendar table"; a table of quarters "qtrs", covering the entire span of dates present in "events", and of the format:
| first_doq  | last_doq   |
| 2010-01-01 | 2010-03-30 |
| 2010-04-01 | 2010-06-30 |
| ...        | ...        |
And tried using a non-equi merge like so:
create table nonevents as
select id, last_doq
from events
full outer join qtrs
  on start_date > last_doq
  or end_date < first_doq
group by id, last_doq
But this is a) terribly inefficient and b) certainly wrong, since most IDs are listed as being non-eventful for all quarters.
How can I produce the table "nonevent_qtrs" I described, which contains a list of quarters for which each ID had no events?
If it's relevant, the ultimate use-case is to calculate runs of non-events to look at time-till-event analysis and prediction. Feels like run length encoding will be required. If there's a more direct approach than what I've described above then I'm all ears. The only reason I'm focusing on non-event runs to begin with is to try to limit the size of the cross-product. I've also considered producing something like:
| id  | last_doq   | event |
| 1   | 2010-01-31 | 1     |
| ... | ...        | ...   |
| 1   | 2018-06-30 | 0     |
| ... | ...        | ...   |
But although more useful this may not be feasible due to the size of the data involved. A wide format:
| id  | 2010-01-31 | ... | 2018-06-30 |
| 1   | 1          | ... | 0          |
| 2   | 0          | ... | 1          |
| ... | ...        | ... | ...        |
would also be handy, but since MonetDB is column-store I'm not sure whether this is more or less efficient.
Let me assume that you have a table of quarters, with the start date of a quarter and the end date. You really need this if you want the quarters that don't exist. After all, how far back in time or forward in time do you want to go?
Then, you can generate all id/quarter combinations and filter out the ones that exist:
select i.id, q.*
from (select distinct id from events) i cross join
quarters q left join
events e
on e.id = i.id and
e.start_date <= q.quarter_end and
e.end_date >= q.quarter_start
where e.id is null;
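That cross-join-plus-anti-join shape can be exercised on a toy dataset. A minimal sketch using Python's built-in sqlite3 (the quarters table and the quarter_start/quarter_end column names follow the answer above; the sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INT, start_date TEXT, end_date TEXT);
    CREATE TABLE quarters (quarter_start TEXT, quarter_end TEXT);
    INSERT INTO events VALUES
        (1, '2010-01-01', '2010-03-30'),
        (1, '2010-04-01', '2010-06-30'),
        (2, '2018-04-01', '2018-06-30');
    INSERT INTO quarters VALUES
        ('2010-01-01', '2010-03-31'),
        ('2010-04-01', '2010-06-30'),
        ('2018-04-01', '2018-06-30');
""")

# Every id/quarter combination, minus the pairs where some event
# overlaps the quarter (the LEFT JOIN ... WHERE ... IS NULL anti-join).
rows = conn.execute("""
    SELECT i.id, q.quarter_end
    FROM (SELECT DISTINCT id FROM events) i
    CROSS JOIN quarters q
    LEFT JOIN events e
        ON e.id = i.id
        AND e.start_date <= q.quarter_end
        AND e.end_date >= q.quarter_start
    WHERE e.id IS NULL
""").fetchall()
print(sorted(rows))
```

Each id appears only for the quarters no event of its touches, which is exactly the "nonevent_qtrs" shape.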

SELECT 1 ID and all belonging elements

I'm trying to create a JSON select query that gives back the result in the following way:
one row contains one main_message_id and its belonging messages (like the table below). The JSON format is not a requirement; if it works with other methods, that's fine too.
I store the data as like this:
+-----------------+---------+----------------+
| main_message_id | message | sub_message_id |
+-----------------+---------+----------------+
| 1               | test 1  | 1              |
| 1               | test 2  | 2              |
| 1               | test 3  | 3              |
| 2               | test 4  | 4              |
| 2               | test 5  | 5              |
| 3               | test 6  | 6              |
+-----------------+---------+----------------+
I would like to create a query, which give me back the data as like this:
+-----------------+-----------------------+
| main_message_id | message               |
+-----------------+-----------------------+
| 1               | {test1}{test2}{test3} |
| 2               | {test4}{test5}        |
| 3               | {test6}               |
+-----------------+-----------------------+
You can use json_agg() for that:
select main_message_id, json_agg(message) as messages
from the_table
group by main_message_id;
Note that {test1}{test2}{test3} is invalid JSON; the above returns a valid JSON array instead, e.g. ["test1", "test2", "test3"].
If you just want a comma separated list, use string_agg():
select main_message_id, string_agg(message, ', ') as messages
from the_table
group by main_message_id;
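The aggregation is easy to try on a toy table. A sketch using Python's built-in sqlite3, whose group_concat() plays the role of PostgreSQL's string_agg() (json_agg() has no exact counterpart in older SQLite builds, so only the string variant is shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table "
             "(main_message_id INT, message TEXT, sub_message_id INT)")
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", [
    (1, "test 1", 1), (1, "test 2", 2), (1, "test 3", 3),
    (2, "test 4", 4), (2, "test 5", 5),
    (3, "test 6", 6),
])

# group_concat() is SQLite's equivalent of PostgreSQL's string_agg().
rows = dict(conn.execute(
    "SELECT main_message_id, group_concat(message, ', ') "
    "FROM the_table GROUP BY main_message_id"
).fetchall())
print(rows)
```

Note that, as in PostgreSQL, the order of messages within each group is unspecified unless you add an ORDER BY inside the aggregate.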

Postgresql select on Rails UTC date time, then group by string and perform an average

In my Rails app, I'm trying to perform a query without ActiveRecord. Essentially what I want to do is select records whose created_at matches a given DateTime, then group the records by string type, then average their values. Note: I'm using PostgreSQL.
So for example, running the desired query on the Movie records below would yield something like:
{ Blah: 6, Hi: 2, Hello: 4}
id | value | event_id | user_id | created_at | updated_at | type
----+-------+----------+---------+----------------------------+----------------------------+-------------
1 | 1 | 1 | 1 | 2014-01-22 03:42:44.86269 | 2014-02-15 01:54:15.562552 | Blah
2 | 10 | 1 | 1 | 2014-01-22 03:42:44.86269 | 2014-02-15 01:54:15.574191 | Blah
3 | 1 | 1 | 1 | 2014-01-22 03:42:44.86269 | 2014-02-15 01:54:15.577179 | Hi
4 | 2 | 1 | 1 | 2014-01-22 03:42:44.86269 | 2014-02-15 01:54:15.578864 | Hi
5 | 7 | 1 | 1 | 2014-01-22 03:42:44.86269 | 2014-02-15 01:54:15.580517 | Hello
6 | 1 | 1 | 1 | 2014-01-22 03:42:44.86269 | 2014-02-15 01:54:15.58203 | Hello
(6 rows)
I think I can piece together the group by and average points, but I'm running into a wall trying to match records based on the created_at. I've tried:
select * from movies where 'movies.created_at' = '2014-01-22 03:42:44.86269'
And a few other variations where I try to call to_char, including:
select * FROM movies WHERE 'movies.created_at' = to_char('2014-01-22 03:42:44.86269'::TIMESTAMP, 'YYYY-MM-DD HH24:MI:SS');
The ActiveModel record for the first record in the above looks like this:
=> #<Movie id: 1, value: "1", event_id: 1, user_id: 1, created_at: "2014-01-22 03:42:44", updated_at: "2014-02-15 01:54:15", type: "Blah">
Its created_at, which is an ActiveSupport::TimeWithZone instance, looks like:
=> Wed, 22 Jan 2014 03:42:44 UTC +00:00
I imagine it has something to do with UTC but I can't figure it out. If anyone has any ideas I'd greatly appreciate it.
Single-quoted values are interpreted by Postgres as literal strings. So your first query is looking for records where the literal string movies.created_at is equal to the literal string 2014-01-22 03:42:44.86269 - none of which exist.
Identifiers in Postgres are quoted with double-quotes; also note that a column reference with an explicit table qualifier (movies.created_at) is correctly quoted with the dot outside the quotes ("movies"."created_at") - if the dot is inside the quotes, it is interpreted as part of the column name.
You may want to keep the Postgres SQL reference handy in the future. :)
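The difference between the string literal and the identifier is easy to demonstrate. A sketch with Python's built-in sqlite3, which follows the same SQL quoting rules (single quotes for strings, double quotes for identifiers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INT, created_at TEXT)")
conn.execute("INSERT INTO movies VALUES (1, '2014-01-22 03:42:44.86269')")

# 'movies.created_at' is a string literal: two unequal constants are
# compared, so the predicate is false for every row.
wrong = conn.execute(
    "SELECT * FROM movies "
    "WHERE 'movies.created_at' = '2014-01-22 03:42:44.86269'"
).fetchall()

# Unquoted (or double-quoted) identifier: the column's value is compared.
right = conn.execute(
    "SELECT * FROM movies "
    "WHERE movies.created_at = '2014-01-22 03:42:44.86269'"
).fetchall()
print(wrong, right)
```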

How to check date in postgresql

My table name is tbl1. The fields are id, name, txdate.
| ID | NAME   | TXDATE    |
| 1  | RAJ    | 1-1-2013  |
| 2  | RAVI   |           |
| 3  | PRABHU | 25-3-2013 |
| 4  | SAT    |           |
Now I want a select query that checks txdate < 2-2-2013 for the rows where txdate is not empty, and that also returns the rows where txdate is empty.
The result is like this:
| ID | NAME | TXDATE   |
| 1  | RAJ  | 1-1-2013 |
| 2  | RAVI |          |
| 4  | SAT  |          |
Is there any feasible solution? Is it possible without using UNION?
Assuming that TXDATE is of data type DATE, you can use WHERE "TXDATE" < '2013-2-2' OR "TXDATE" IS NULL. Something like:
SELECT *
FROM table1
WHERE "TXDATE" < '2013-2-2'
OR "TXDATE" IS NULL;
See it in action:
SQL Fiddle Demo
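The OR "TXDATE" IS NULL branch is what brings the empty rows back without a UNION. A sketch using Python's built-in sqlite3 (ISO-format date strings are assumed here so that comparison orders correctly; NULL stands in for the empty TXDATE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (id INT, name TEXT, txdate TEXT)")
conn.executemany("INSERT INTO tbl1 VALUES (?, ?, ?)", [
    (1, "RAJ", "2013-01-01"),
    (2, "RAVI", None),
    (3, "PRABHU", "2013-03-25"),
    (4, "SAT", None),
])

# txdate < '2013-02-02' alone drops the NULL rows (NULL compares as
# unknown); the OR IS NULL branch brings them back in one query.
names = [r[0] for r in conn.execute(
    "SELECT name FROM tbl1 WHERE txdate < '2013-02-02' OR txdate IS NULL"
)]
print(names)
```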
I don't know what database you are using, or what data type TXDATE is.
I just tried on my postgreSQL 9.2, with a field "timestamp without time zone".
I have three rows in the table , like:
ac_device_name | ac_last_heartbeat_time
----------------+-------------------------
Nest-Test1 |
Nest-Test3 |
Nest-Test2 | 2013-04-10 15:06:18.287
Then use below statement
select ac_device_name,ac_last_heartbeat_time
from at_device
where ac_last_heartbeat_time<'2013-04-11';
It is ok to return only one record:
ac_device_name | ac_last_heartbeat_time
----------------+-------------------------
Nest-Test2 | 2013-04-10 15:06:18.287
I think you can try a statement like:
select * from tbl1 where TXDATE < '2-2-2013' and TXDATE is not NULL
this statement also works in my environment.