Select unique records and display as category headers in rails - sql

I have a rails 3.2 app running on PostgreSQL, and have some data I want to display in my view, which is stored in the database in this structure:
+----+--------+------------------+--------------------+
| id | name | sched_start_date | task |
+----+--------+------------------+--------------------+
| 1 | "Ben" | 2013-03-01 | "Check for debris" |
+----+--------+------------------+--------------------+
| 2 | "Toby" | 2013-03-02 | "Carry out Y1.1" |
+----+--------+------------------+--------------------+
| 3 | "Toby" | 2013-03-03 | "Check oil seals" |
+----+--------+------------------+--------------------+
I would like to display a list of tasks for each name, and for the names to be ordered ASC by the first sched_start_date they have, which should look like ...
Ben
2013-03-01 – Check for debris
Toby
2013-03-02 – Carry out Y1.1
2013-03-03 – Check oil seals
The approach I started taking was to run a query for unique names and order them by sched_start_date ASC, then run a query for each name to get their tasks.
To get a list of unique names, the SQL would look like this.
select *
from (
select distinct on (name) name, sched_start_date
from tasks
) p
order by sched_start_date;
I would like to know if this is the correct approach (querying for unique names then running another query for all their tasks), or if there is a better rails way.

To get the data sorted the way you describe, you can use min() as a window function in the ORDER BY clause:
SELECT name, sched_start_date, task
FROM tasks
ORDER BY min(sched_start_date) OVER (PARTITION BY name), 1, 2, 3
Your original query would need an additional ORDER BY item to get the earliest date per name:
SELECT DISTINCT ON (name) name, sched_start_date, task
FROM tasks
ORDER BY 1, 2, 3;
I also added task (3) as last ORDER BY item to break ties, in case there can be more than one per date.
But the output is still ordered by name, not by date.
Getting your peculiar format with all data stuffed into one column is a bit more complex:
SELECT one_col
FROM (
WITH x AS (
SELECT name, min(sched_start_date) AS min_start
FROM tasks
GROUP BY 1
)
SELECT 2 AS rnk, name
,sched_start_date::text || ' – ' || task AS one_col
,sched_start_date, min_start
FROM tasks
JOIN x USING (name)
UNION ALL
SELECT 1 AS rnk, name, name, NULL::date, min_start
FROM x
ORDER BY min_start, name, rnk, sched_start_date, one_col
) y

Assuming that you have associations in your model you would be able to run
@employees = Employee.order(:name, :sched_start_date, :task).includes(:tasks)
You could then iterate over them:
@employees.each do |employee|
  employee.name
  employee.tasks.each do |task|
    task.name
  end
end
This isn't gonna exactly match your needs, but should show you where to start.
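The grouping described in the question can also be done in plain Ruby after a single query. This is only a minimal sketch: the Task struct and the TASKS sample data are hypothetical stand-ins for rows loaded from the tasks table, and grouped_by_name is an illustrative helper, not part of Rails.

```ruby
require "date"

# Hypothetical stand-in for rows loaded from the tasks table.
Task = Struct.new(:name, :sched_start_date, :task)

TASKS = [
  Task.new("Toby", Date.new(2013, 3, 3), "Check oil seals"),
  Task.new("Ben",  Date.new(2013, 3, 1), "Check for debris"),
  Task.new("Toby", Date.new(2013, 3, 2), "Carry out Y1.1")
].freeze

# Group rows by name, sort each group by date (task breaks ties),
# then order the groups by the earliest date in each group.
def grouped_by_name(tasks)
  tasks
    .group_by(&:name)
    .map { |name, rows| [name, rows.sort_by { |r| [r.sched_start_date, r.task] }] }
    .sort_by { |name, rows| [rows.first.sched_start_date, name] }
end

# Print each name as a header followed by its tasks.
grouped_by_name(TASKS).each do |name, rows|
  puts name
  rows.each { |r| puts "#{r.sched_start_date} - #{r.task}" }
end
```

In a Rails view, the same idea would presumably be applied to the loaded records, with the name rendered as a header for each group.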

Related

Get total count and first 3 columns

I have the following SQL query:
SELECT TOP 3 accounts.username
,COUNT(accounts.username) AS count
FROM relationships
JOIN accounts ON relationships.account = accounts.id
WHERE relationships.following = 4
AND relationships.account IN (
SELECT relationships.following
FROM relationships
WHERE relationships.account = 8
);
I want to return the total count of accounts.username and the first 3 accounts.username (in no particular order). Unfortunately accounts.username and COUNT(accounts.username) cannot coexist. The query works fine when either one of them is removed. I don't want to send the request twice with different select bodies. The count could reach 1000+, so I would prefer to calculate it in SQL rather than in code.
The current query returns the error "Column 'accounts.username' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.", which has not led me anywhere. This is different from other questions, as I do not want to use a GROUP BY clause. Is there a way to do this with FOR JSON AUTO?
The desired output could be:
+-------+----------+
| count | username |
+-------+----------+
| 1551 | simon1 |
| 1551 | simon2 |
| 1551 | simon3 |
+-------+----------+
or
+----------------------------------------------------------------+
| JSON_F52E2B61-18A1-11d1-B105-00805F49916B |
+----------------------------------------------------------------+
| [{"count": 1551, "usernames": ["simon1", "simon2", "simon3"]}] |
+----------------------------------------------------------------+
If you want to display the total count of rows that satisfy the filter conditions (and where username is not null) in an additional column in your resultset, then you could use window functions:
SELECT TOP 3
a.username,
COUNT(a.username) OVER() AS cnt
FROM relationships r
JOIN accounts a ON r.account = a.id
WHERE
r.following = 4
AND EXISTS (
SELECT 1 FROM relationships r1 WHERE r1.account = 8 AND r1.following = r.account
)
;
Side notes:
if username is not nullable, use COUNT(*) rather than COUNT(a.username): this is more efficient since it does not require the database to check every value for nullity
table aliases make the query easier to write, read and maintain
I usually prefer EXISTS over IN (but here this is mostly a matter of taste, as both techniques should work fine for your use case)
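To make the semantics concrete, here is a rough Ruby model of what the window count does. It is a sketch under assumptions: USERNAMES is made-up sample data standing in for the filtered join result, and with_window_count is a hypothetical helper. The point it illustrates is that the window count is computed over all filtered rows before TOP 3 trims the result, and that COUNT(username) skips NULLs while COUNT(*) does not.

```ruby
# Hypothetical rows of the filtered join, before TOP 3 is applied.
# One username is nil to show the difference between the two counts.
USERNAMES = ["simon1", nil, "simon2", "simon3", "simon4"].freeze

# Models COUNT(*) OVER() and COUNT(username) OVER():
# both totals are taken over the full filtered set, then only the
# first `top` rows are returned, each carrying the totals.
def with_window_count(usernames, top)
  total_rows     = usernames.size          # COUNT(*) counts every row
  total_non_null = usernames.compact.size  # COUNT(username) skips NULLs
  usernames.first(top).map do |u|
    { username: u, count_star: total_rows, count_username: total_non_null }
  end
end

with_window_count(USERNAMES, 3).each { |row| puts row.inspect }
```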

Is there a way to select results after a certain id in an order list?

I'm trying to implement a cursor-based paginating list based off of data from a Postgres database.
As an example, say I have a table with the following columns:
id | firstname | lastname
I want to paginate this data, which would be pretty simple if I only ever wanted to sort it by the id, but in my case, I want the option to sort by last name, and there's guaranteed to be multiple people with the same last name.
If I have a select statement like follows:
SELECT * FROM people
ORDER BY lastname ASC;
In that case, I could make my encoded cursor contain information about the lastname so I could pick up where I left off, but since there will be multiple users with the same last name, this will be buggy. Is there a way in SQL to get only the results after a certain id in an ordered list, where id is not the column by which the results are sorted?
Example results from the select statement:
1 | John | Doe
4 | John | Price
2 | Joe | White
6 | Jim | White
3 | Sam | White
5 | Sally | Young
If I wanted a page size of 3, I couldn't add WHERE lastname >= :lastname, as I'd have duplicate data in the list since it would return ids 2, 6, and 3 again during that call. In my case, it'd be helpful if I could add something to my query similar to AFTER id = 6, where it would skip everything until it finds that id in the ordered list.
Yes. If I understand correctly:
select t.*
from t
where (lastname, id) > (select t2.lastname, t2.id
from t t2
where t2.id = ?
)
order by t.lastname, t.id;
I think I would add firstname into the mix, but it is the same idea.
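The row-value comparison (lastname, id) > (x, y) compares element by element, the same way Ruby compares arrays. The following is a minimal in-memory sketch of that logic using the sample rows from the question; Person, PEOPLE, and page_after are hypothetical names introduced only for illustration, and ties on lastname are broken by id, so the order within "White" differs from the unordered sample output in the question.

```ruby
Person = Struct.new(:id, :firstname, :lastname)

# Sample rows from the question.
PEOPLE = [
  Person.new(1, "John",  "Doe"),
  Person.new(4, "John",  "Price"),
  Person.new(2, "Joe",   "White"),
  Person.new(6, "Jim",   "White"),
  Person.new(3, "Sam",   "White"),
  Person.new(5, "Sally", "Young")
].freeze

# Returns the next page of rows that sort strictly after the row with
# cursor_id, ordering by (lastname, id). A nil cursor means first page.
def page_after(people, cursor_id, page_size)
  sorted = people.sort_by { |p| [p.lastname, p.id] }
  return sorted.first(page_size) if cursor_id.nil?

  cursor = people.find { |p| p.id == cursor_id }
  return [] if cursor.nil? # unknown cursor: no rows, like an empty subquery

  key = [cursor.lastname, cursor.id]
  # Array#<=> compares lastname first, then id, like a SQL row comparison.
  sorted.select { |p| ([p.lastname, p.id] <=> key) > 0 }.first(page_size)
end

puts page_after(PEOPLE, nil, 3).map(&:id).inspect
puts page_after(PEOPLE, 2, 3).map(&:id).inspect
```

Because id is unique, the pair (lastname, id) gives every row a distinct position, which is what makes the cursor safe with duplicate last names.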
Limit and offset are used for pagination e.g.:
SELECT id, lastname, firstname FROM people
ORDER BY lastname, firstname, id
OFFSET 0
LIMIT 10
This returns rows 1 through 10. To retrieve the next page, set the offset to 10.
Here is the documentation:
https://www.postgresql.org/docs/9.6/static/queries-limit.html
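The offset for a given page follows from the page number and page size. A small sketch of that arithmetic, assuming 1-based page numbers; limit_offset is a hypothetical helper, not a library function:

```ruby
# Computes the LIMIT and OFFSET values for a 1-based page number.
def limit_offset(page, page_size)
  raise ArgumentError, "page must be >= 1" if page < 1
  raise ArgumentError, "page_size must be >= 1" if page_size < 1

  { limit: page_size, offset: (page - 1) * page_size }
end

puts limit_offset(1, 10).inspect # first page starts at offset 0
puts limit_offset(2, 10).inspect # second page starts at offset 10
```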

The best way to keep count data in postgres

I need to create a statistic for some aggregate data split by day.
For example:
select
(select count(*) from bananas) as bananas_count,
(select count(*) from apples) as apples_count,
(select count(*) from bananas where color = 'yellow') as yellow_bananas_count;
obviously I will get:
 bananas_count | apples_count | yellow_bananas_count
---------------+--------------+----------------------
           123 |          321 |                   15
but I need to get that data grouped by day; for example, we need to know how many bananas we had yesterday.
My first thought was to create a view, but in that case I would not be able to split by date (or I don't know how to do it).
I need a performant, database-side implementation of this task.

Postgresql union query with priority on one query

So I have a table with 2 columns
class_id | title
CS124 | computer tactics
CS101 | intro to computers
MATH157 | math stuff
CS234 | other CS stuff
FRENCH50 | TATICS of french
ENGR101 | engineering ETHICS
OTHER1 | other CS title
I want to do a sort of smart search for auto complete where a user searches for something.
Let's say they type 'CS' into the box. I want to search using both the class_id and title, with a limit of, let's say, 5 for this example. I first want to search for class_ids like 'CS%' with a limit of 5, ordered by class_id. This will return the 3 CS classes.
Then, if there is any room left in the limit, I want to search using title like '%CS%' and combine the results, with the class_id matches first, making sure that duplicates are removed from the bottom, like CS234, which would match on both queries.
So the end result for this query would be
CS101 | intro to computers
CS124 | computer tactics
CS234 | other CS stuff
ENGR101 | engineering ETHICS
FRENCH50 | TATICS of french
I am trying to do something like this
(select * from class_infos
where LOWER(class_id) like LOWER('CS%')
order by class_id)
union
(select * from class_infos
where LOWER(title) like LOWER('%CS%')
order by class_id)
limit 30
But it is not putting them in the right order or giving the class_id query priority. Does anyone have any suggestions?
Here is the sqlfiddle
http://sqlfiddle.com/#!15/5368b
Have you tried something like this?
SQL Fiddle Demo
SELECT *
FROM
(
(select 1 as priority, *
from class_infos
where LOWER(class_id) like LOWER('CS%'))
union
(select 2 as priority, *
from class_infos
where
LOWER(title) like LOWER('%CS%')
and not LOWER(class_id) like LOWER('CS%')
)
) as class
ORDER BY priority, class_id
limit 5
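The priority logic of this query can be checked against the sample data with a small in-memory sketch. CLASSES mirrors the table from the question, and search is a hypothetical helper written only to illustrate the ordering: class_id prefix matches first, then title matches that were not already found, each group sorted by class_id, cut off at the limit.

```ruby
# Sample rows from the question.
CLASSES = [
  { class_id: "CS124",    title: "computer tactics" },
  { class_id: "CS101",    title: "intro to computers" },
  { class_id: "MATH157",  title: "math stuff" },
  { class_id: "CS234",    title: "other CS stuff" },
  { class_id: "FRENCH50", title: "TATICS of french" },
  { class_id: "ENGR101",  title: "engineering ETHICS" },
  { class_id: "OTHER1",   title: "other CS title" }
].freeze

# Priority 1: class_id starts with the term (case-insensitive).
# Priority 2: title contains the term, excluding priority 1 rows.
def search(classes, term, limit)
  t = term.downcase
  by_id    = classes.select { |c| c[:class_id].downcase.start_with?(t) }
  by_title = classes.select { |c| c[:title].downcase.include?(t) } - by_id
  ordered  = by_id.sort_by { |c| c[:class_id] } + by_title.sort_by { |c| c[:class_id] }
  ordered.first(limit)
end

search(CLASSES, "CS", 5).each { |c| puts "#{c[:class_id]} | #{c[:title]}" }
```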

Need a Complex SQL Query

I need to make a rather complex query, and I need help bad. Below is an example I made.
Basically, I need a query that will return one row for each case_id where the type is support, status start, and date meaning the very first one created (so that in the example below, only the 2/1/2009 John's case gets returned, not the 3/1/2009). The search needs to be dynamic to the point of being able to return all similar rows with different case_id's etc from a table with thousands of rows.
There's more after that but I don't know all the details yet, and I think I can figure it out if you guys (and gals) can help me out here. :)
ID | Case_ID | Name | Date | Status | Type
48 | 450 | John | 6/1/2009 | Fixed | Support
47 | 450 | John | 4/1/2009 | Moved | Support
46 | 451 | Sarah | 3/1/2009 | |
45 | 432 | John | 3/1/2009 | Fixed | Critical
44 | 450 | John | 3/1/2009 | Start | Support
42 | 450 | John | 2/1/2009 | Start | Support
41 | 440 | Ben | 2/1/2009 | |
40 | 432 | John | 1/1/2009 | Start | Critical
...
Thanks a bunch!
Edit:
To answer some people's questions, I'm using SQL Server 2005. And the date is just plain date, not string.
Ok so now I got further in the problem. I ended up with Bliek's solution which worked like a charm. But now I ran into the problem that sometimes the status never starts, as it's solved immediately. I need to include this in as well. But only for a certain time period.
I imagine I'm going to have to check for the case table referenced by FK Case_ID here. So I'd need a way to check for each Case_ID created in the CaseTable within the past month, and then run a search for these in the same table and same manner as posted above, returning only the first result as before. How can I use the other table like that?
As usual I'll try to find the answer myself while waiting, thanks again!
Edit 2:
Seems this is the answer. I don't have access to the full DB yet so I can't fully test it, but it seems to be working with the dummy tables I created, to continue from Bliek's code's WHERE clause:
WHERE RowNumber = 1 AND Case_ID IN (SELECT Case_ID FROM CaseTable
WHERE (Date BETWEEN '2007/11/1' AND '2007/11/30'))
The date's screwed again but you get the idea I'm sure. Thanks for the help everyone! I'll get back if there're more problems, but I think with this info I can improvise my way through most of the SQL problems I currently have to deal with. :)
Maybe something like:
select Case_ID, Name, MIN(date), Status, Type
from table
where type = 'Support'
and status = 'Start'
group by Case_ID, Name, Status, Type
EDIT: You haven't provided a lot of details about what you really want, so I'd suggest that you read all the answers and choose one that suits your problem best. So far I'd say that Tomalak's answer is closest to what you're looking for...
SELECT
c.ID,
c.Case_ID,
c.Name,
c.Date,
c.Status,
c.Type
FROM
CaseTable c
WHERE
c.Type = 'Support'
AND c.Status = 'Start'
AND c.Date = (
SELECT MIN(Date)
FROM CaseTable
WHERE Case_ID = c.Case_ID AND Type = c.Type AND Status = c.Status)
/* GROUP BY only needed when for a given Case_ID several rows
exist that fulfill the WHERE clause */
GROUP BY
c.ID,
c.Case_ID,
c.Name,
c.Date,
c.Status,
c.Type
This query benefits greatly from indexes on the Case_ID, Date, Status and Type columns.
An added benefit is that the filter on Type and Status only needs to be set in one place.
As an alternative to the GROUP BY clause, you can do SELECT DISTINCT, which would increase readability (this may or may not affect overall performance, I suggest you measure both variants against each other). If you are sure that for no Case_ID in your table two rows exist that have the same Date, you won't need GROUP BY or SELECT DISTINCT at all.
In SQL Server 2005 and beyond I would use Common Table Expressions (CTE). This offers lots of possibilities like so:
With ResultTable (RowNumber
,ID
,Case_ID
,Name
,Date
,Status
,Type)
AS
(
SELECT Row_Number() OVER (PARTITION BY Case_ID
ORDER BY Date ASC)
,ID
,Case_ID
,Name
,Date
,Status
,Type
FROM CaseTable
WHERE Type = 'Support'
AND Status = 'Start'
)
SELECT ID
,Case_ID
,Name
,Date
,Status
,Type
FROM ResultTable
WHERE RowNumber = 1
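The effect of ROW_NUMBER() with PARTITION BY Case_ID and RowNumber = 1 is "keep the earliest matching row per case". A minimal in-memory sketch of that idea on the sample data follows. CaseRow, CASE_ROWS, and first_start_per_case are hypothetical names, and the dates are read as month/day/year, which is an assumption (the relative order is the same under day/month/year).

```ruby
require "date"

CaseRow = Struct.new(:id, :case_id, :name, :date, :status, :type)

# Sample rows from the question; empty Status/Type cells become nil.
CASE_ROWS = [
  CaseRow.new(48, 450, "John",  Date.new(2009, 6, 1), "Fixed", "Support"),
  CaseRow.new(47, 450, "John",  Date.new(2009, 4, 1), "Moved", "Support"),
  CaseRow.new(46, 451, "Sarah", Date.new(2009, 3, 1), nil,     nil),
  CaseRow.new(45, 432, "John",  Date.new(2009, 3, 1), "Fixed", "Critical"),
  CaseRow.new(44, 450, "John",  Date.new(2009, 3, 1), "Start", "Support"),
  CaseRow.new(42, 450, "John",  Date.new(2009, 2, 1), "Start", "Support"),
  CaseRow.new(41, 440, "Ben",   Date.new(2009, 2, 1), nil,     nil),
  CaseRow.new(40, 432, "John",  Date.new(2009, 1, 1), "Start", "Critical")
].freeze

# Filter by type and status, partition by case_id, and keep the row
# with the earliest date in each partition.
def first_start_per_case(rows, type: "Support", status: "Start")
  rows
    .select { |r| r.type == type && r.status == status }
    .group_by(&:case_id)
    .map { |_case_id, group| group.min_by(&:date) }
end

first_start_per_case(CASE_ROWS).each do |r|
  puts "#{r.id} | #{r.case_id} | #{r.name} | #{r.date}"
end
```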
Don't apologize for your date formatting, it makes more sense that way.
SELECT ID, Case_ID, Name, MIN(Date), Status, Type
FROM caseTable
WHERE Type = 'Support'
AND status = 'Start'
GROUP BY ID, Case_ID, Name, Status, Type