Time-based priority in Active Record query - SQL

I have a table of job listings, which are normally displayed ordered by the created_at field descending. I am in the process of adding a "featured" boolean flag which would give customers the ability to get more visibility for their job listing. I'd like to have featured listings pinned to the top of the search results if the job is less than X days old. How would I modify my existing query to support this?
Jobs.where("expiration_date >= ? and published = ?", Date.today, true).order("created_at DESC")
Current query pulls back all current, published jobs, ordered by created_at.

Unlike some other databases (like Oracle), PostgreSQL has a fully functional boolean type. You can use it directly in an ORDER BY clause without applying a CASE expression (those are great for more complex situations).
Sort order for boolean values is:
FALSE -> TRUE -> NULL
If you ORDER BY bool_expression DESC, you invert the order to:
NULL -> TRUE -> FALSE
If you want TRUE first and NULL last, use the NULLS LAST clause of ORDER BY:
ORDER BY (featured AND created_at > now() - interval '11 days') DESC NULLS LAST
, created_at DESC
Of course, NULLS LAST is only relevant if featured or created_at can be NULL. If the columns are defined NOT NULL, then don't bother.
Also, FALSE would be sorted before NULL. If you don't want to distinguish between these two, you are either back to a CASE statement, or you can throw in NULLIF() or COALESCE().
ORDER BY NULLIF((featured AND created_at > now() - interval '11 days'), FALSE) DESC NULLS LAST
, created_at DESC
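As a sanity check of the sort semantics, here is a small Python sketch (hypothetical in-memory rows, not the asker's actual schema) that mimics ORDER BY (featured AND created_at > cutoff) DESC NULLS LAST, created_at DESC by ranking TRUE before FALSE before NULL:

```python
from datetime import datetime, timedelta

now = datetime(2024, 1, 20)
cutoff = now - timedelta(days=11)

# (title, featured, created_at); featured may be None to model NULL
jobs = [
    ("old featured", True,  now - timedelta(days=30)),
    ("new featured", True,  now - timedelta(days=2)),
    ("new regular",  False, now - timedelta(days=1)),
    ("unknown flag", None,  now - timedelta(days=3)),
]

def sort_key(job):
    _, featured, created_at = job
    recent = created_at > cutoff
    # SQL three-valued logic: FALSE AND NULL = FALSE, TRUE AND NULL = NULL
    pinned = False if not recent else featured
    # DESC NULLS LAST: TRUE first (0), FALSE next (1), NULL last (2)
    rank = {True: 0, False: 1, None: 2}[pinned]
    return (rank, -created_at.timestamp())

print([title for title, *_ in sorted(jobs, key=sort_key)])
# ['new featured', 'new regular', 'old featured', 'unknown flag']
```

The recently featured job comes first, non-pinned jobs follow newest-first, and the NULL flag lands at the end, matching the NULLS LAST behavior described above.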
Performance
Note how I used:
created_at > now() - interval '11 days'
and not:
now() - created_at < interval '11 days'
In the first example, the expression to the right is a constant that is calculated once. Then an index can be utilized to look up matching rows. Very efficient.
The latter cannot usually be used with an index. A value has to be computed for every single row, before it can be checked against the constant expression to the right. Don't do this if you can avoid it. Ever!
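The effect is easy to observe with EXPLAIN. Here is a sketch using Python's built-in sqlite3 as a stand-in for Postgres (table and index names are made up): the sargable predicate is answered with an index search, while wrapping the column in an expression forces a scan:

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX jobs_created_at_idx ON jobs (created_at)")

# Constant cutoff computed once; the raw column is compared against it
cutoff = (datetime.now() - timedelta(days=11)).strftime("%Y-%m-%d %H:%M:%S")
sargable = "SELECT id FROM jobs WHERE created_at > ?"

# The column is wrapped in an expression: evaluated per row, no index use
not_sargable = (
    "SELECT id FROM jobs "
    "WHERE julianday('now') - julianday(created_at) < 11"
)

def plan(sql, params=()):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(r[-1] for r in rows)

print(plan(sargable, (cutoff,)))  # SEARCH ... INDEX ...
print(plan(not_sargable))         # SCAN ...
```

The exact plan wording differs between SQLite and Postgres (where you would use EXPLAIN ANALYZE), but the SEARCH-versus-SCAN distinction is the same idea.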

Not sure what you want to achieve here.
I guess you'll be paginating the results. If so, and you want to display featured jobs always on top regardless of the page, then you should pull them from the DB separately. If you just want to display them on the first page, order by featured like this:
Jobs.where("expiration_date >= ? and published = ?", Date.today, true).order("featured DESC, created_at DESC")
If you want to pull them separately :
featured_jobs = Jobs.where("expiration_date >= ? AND published = ? AND featured = ?", Date.today, true, true).order("created_at DESC")
regular_jobs = Jobs.where("expiration_date >= ? AND published = ? AND featured = ?", Date.today, true, false).order("created_at DESC") # paginate these too; depends on the gem you're using
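The two-query approach translates directly to raw SQL. A minimal sqlite3 sketch with made-up data (one query per bucket, each independently paginatable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (title TEXT, featured INTEGER, published INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", [
    ("a", 1, 1, "2024-01-03"),
    ("b", 0, 1, "2024-01-05"),
    ("c", 1, 1, "2024-01-01"),
    ("d", 0, 0, "2024-01-06"),  # unpublished: excluded from both buckets
])

def listing(featured):
    # one query per bucket; each can be paginated on its own
    rows = conn.execute(
        "SELECT title FROM jobs WHERE published = 1 AND featured = ? "
        "ORDER BY created_at DESC",
        (featured,),
    )
    return [title for (title,) in rows]

featured_jobs = listing(1)
regular_jobs = listing(0)
print(featured_jobs, regular_jobs)  # ['a', 'c'] ['b']
```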

Timestamp to date in SQL

Here is what I did:
Select count(check_id)
From Checks
Where timestamp::date > '2012-07-31'
Group by 1
Is it right to do it like I did or is there a better way? Should/could I have used the DateDIFF function in my WHERE clause? Something like: DATEDIFF(day, timestamp, '2012/07/31') > 0
Also, I need to figure out how I'd calculate the total rate of acceptance for this time period. Can anyone provide their expertise with this?
Is it right to do it like I did or is there a better way?
Using a cast like that is a perfectly valid way to convert a timestamp to a date. (The reference to datediff is moot: that function doesn't exist in PostgreSQL.)
However, the cast has one drawback: if there is an index on the column "timestamp" it won't be used.
But as you just want a range after a certain date, there is no reason to cast the column to begin with.
The following will achieve the same thing as your query, but can make use of an index on the column "timestamp" in case there is one and using it is considered beneficial by the optimizer.
Select count(distinct check_id)
From Checks
Where "timestamp" >= date '2012-07-31' + 1
Note the + 1, which moves the limit to the start of the next day; comparing against '2012-07-31' directly would include rows that are on that date but after midnight.
I removed the unnecessary group by from your query.
If you want to get a count per day, then you will need to include the day in the SELECT list. In that case casting is a good way to do it:
Select "timestamp"::date, count(distinct check_id)
From Checks
Where "timestamp" >= date '2012-07-31' + 1
group by "timestamp"::date
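To illustrate the boundary handling, here is a quick sqlite3 sketch with made-up rows (plain string comparison stands in for the Postgres date arithmetic): the cast predicate and the index-friendly range predicate select exactly the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checks (check_id INTEGER, ts TEXT)")
conn.executemany("INSERT INTO checks VALUES (?, ?)", [
    (1, "2012-07-31 00:00:00"),  # on the boundary date: excluded
    (2, "2012-07-31 23:59:59"),  # still on the boundary date: excluded
    (3, "2012-08-01 00:00:00"),  # first instant of the next day: included
    (4, "2012-08-02 12:00:00"),  # included
])

# Cast every row's timestamp to a date (defeats an index on ts)
cast_count = conn.execute(
    "SELECT count(check_id) FROM checks WHERE date(ts) > '2012-07-31'"
).fetchone()[0]

# Compare the raw column against a constant (index-friendly)
range_count = conn.execute(
    "SELECT count(check_id) FROM checks WHERE ts >= '2012-08-01'"
).fetchone()[0]

print(cast_count, range_count)  # 2 2
```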

Pagination based on time

I have a PostgreSQL database with posts and I am trying to implement pagination for it.
The table looks like this:
postid | title | author | created
where created has the type timestamp without time zone.
My query looks like
SELECT * from posts
WHERE extract(EPOCH FROM created) < :limit
ORDER BY created DESC
LIMIT 3
Here :limit is a long in Java which I pass as a parameter.
However, I always retrieve the same three newest posts, even if I have a limit smaller than the timestamp of the three posts. So I guess that the extract(EPOCH FROM created) part is wrong but I do not know how to fix it.
Your code looks fine, and should do what you want, provided that :limit is really what you think it is.
I would, however, suggest moving the conversion to the right operand rather than converting the stored value to epoch. This is much more efficient, and may take advantage of an index on the timestamp column:
SELECT *
from posts
WHERE created < date '1970-01-01' + :limit * interval '1 second'
ORDER BY created DESC
LIMIT 3
Or:
WHERE created < to_timestamp(:limit::bigint)
A possible problem is that :limit is given in milliseconds rather than seconds. If so:
WHERE created < to_timestamp((:limit/1000)::bigint)
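A sqlite3 sketch of the same pattern (made-up posts; SQLite's datetime(?, 'unixepoch') plays the role of Postgres's to_timestamp()): the epoch parameter is converted once, and the raw created column is compared against it:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (postid INTEGER, created TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [
    (1, "2024-01-01 00:00:00"),
    (2, "2024-01-02 00:00:00"),
    (3, "2024-01-03 00:00:00"),
    (4, "2024-01-04 00:00:00"),
])

# :limit in seconds since the epoch, e.g. taken from the previous page's
# last post (here: 2024-01-04 00:00:00 UTC)
limit = int(datetime(2024, 1, 4, tzinfo=timezone.utc).timestamp())

page = conn.execute(
    "SELECT postid FROM posts "
    "WHERE created < datetime(?, 'unixepoch') "
    "ORDER BY created DESC LIMIT 3",
    (limit,),
).fetchall()
print([postid for (postid,) in page])  # [3, 2, 1]
```

Passing milliseconds instead of seconds would shift the cutoff far into the future, which is exactly the "always the same three newest posts" symptom the asker describes.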

SELECT if exists where DATE=TODAY, if not where DATE=YESTERDAY

I have a table with some columns and a date column (that I partitioned the table on).
For example
[Amount, Date ]
[4 , 2020-4-1]
[3 , 2020-4-2]
[5 , 2020-4-4]
I want to get the latest Amount based on the Date.
I thought about doing a LIMIT 1 with ORDER BY, but is that optimized by BigQuery, or will it scan my entire table?
I want to avoid costs at all possible, I thought about doing a query based on the date today, and if nothing found search for yesterday, but I don't know how to do it in only one query.
Below is for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(amount ORDER BY `date` DESC LIMIT 1)[SAFE_OFFSET(0)]
FROM `project.dataset.table`
WHERE `date` >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
Note: above assumes your date field is of DATE data type.
If your date field is a partition, you can use it in WHERE clause to filter which partitions should be read in your query.
In your case, you could do something like:
SELECT Amount
FROM <your-table>
WHERE Date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
ORDER BY Date DESC
LIMIT 1
This query basically will:
Filter only today's and yesterday's partitions
Order the rows by your Date field, from the most recent to the oldest
Select the first element of the ordered list
If the table has a row with today's date, the query will return the data for today. If it doesn't, the query will return the data for yesterday.
Finally, I would like to attach here this reference regarding querying partitioned tables.
I hope it helps
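The fallback logic can be sketched with sqlite3 (the partition-pruning cost benefit is BigQuery-specific and not modeled here): restrict to the last two days, order descending, take one row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (amount INTEGER, d TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (4, "2020-04-01"), (3, "2020-04-02"), (5, "2020-04-04"),
])

def latest_amount(today):
    # look only at today's and yesterday's rows; the newest one wins
    row = conn.execute(
        "SELECT amount FROM t "
        "WHERE d BETWEEN date(?, '-1 day') AND ? "
        "ORDER BY d DESC LIMIT 1",
        (today, today),
    ).fetchone()
    return row[0] if row else None

print(latest_amount("2020-04-04"))  # 5 (a row exists for today)
print(latest_amount("2020-04-05"))  # 5 (falls back to yesterday's row)
print(latest_amount("2020-04-03"))  # 3 (no row today, uses 2020-04-02)
```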
The LIMIT clause stops the query when it has returned the number of results indicated.
I think the query should be something like this (I'm not sure what today()-1 would return, so I compute yesterday explicitly):
SELECT Amount
FROM <table> as t
WHERE date(t.Date) = current_date()
OR date(t.Date) = makedate(year(current_date()), dayofyear(current_date())-1);
Edited: Sorry, my answer is for MariaDB; I now see you asked about Google BigQuery, which I hadn't used. It looks like standard SQL, though, so I hope it has functions similar to the ones I posted.

Using a top 10 query to then search all records associated with them

I'm not super experienced with SQL in general, and I'm trying to accomplish a pretty specific task: I want to first run a query to get the IDs of all my units with the top number of hits, and then from that run again to get the messages and counts of all the types of hits for those IDs in a specific time period. For the first query, I have this:
SELECT entity, count(entity) as Count
from plugin_status_alerts
where entered BETWEEN now() - INTERVAL '14 days' AND now()
group by entity
order by count(entity) DESC
limit 10
which results in this return:
"38792";3
"39416";2
"37796";2
"39145";2
"37713";2
"37360";2
"37724";2
"39152";2
"39937";2
"39667";2
The idea is to then use that result set to then run another query that orders by entity and status_code. I tried something like this:
SELECT status_code, entity, COUNT(status_code) statusCount
FROM plugin_status_alerts
where updated BETWEEN now() - INTERVAL '14 days' AND now() AND entity IN
(SELECT id.entity, count(id.entity) as Count
from plugin_status_alerts id
where id.updated BETWEEN now() - INTERVAL '14 days' AND now()
group by id.entity
order by count(id.entity) DESC
limit 10
)
GROUP BY status_code, entity
but I get the error
ERROR: subquery has too many columns
I'm not sure if this is the route I should be going, or if maybe I should be trying a self join; either way, I'm not sure how to correct what's happening now.
Use a JOIN instead of IN (subquery). That's typically faster, and you can use additional values from the subquery if you need to (like the total count per entity):
SELECT entity, status_code, count(*) AS status_ct
FROM (
SELECT entity -- not adding count since you don't use it, but you could
FROM plugin_status_alerts
WHERE entered BETWEEN now() - interval '14 days' AND now()
GROUP BY entity
ORDER BY count(*) DESC, entity -- as tie breaker to get stable result
LIMIT 10
) sub
JOIN plugin_status_alerts USING (entity)
WHERE updated BETWEEN now() - interval '14 days' AND now()
GROUP BY 1, 2;
Notes
If you don't have future entries by design, you can simplify:
WHERE entered > now() - interval '14 days'
Since the subquery only returns a single column (entity), which is merged with the USING clause, column names are unambiguous and we don't need table qualification here.
LIMIT 10 after you sort by the count is likely to be ambiguous. Multiple rows can tie for the 10th row. Without additional items in ORDER BY, Postgres returns arbitrary picks, which may or may not be fine. But the result of the query can change between calls without any changes to the underlying data. Typically, that's not desirable and you should add columns or expressions to the list to break ties.
count(*) is a bit faster than count(status_code) and does the same, unless status_code can be NULL, in which case count(status_code) only counts non-null values (count() itself never returns NULL) instead of the actual row count, which is either useless or actively wrong here. Use count(*) either way.
GROUP BY 1, 2 is just syntactical shorthand. Details:
Select first row in each GROUP BY group?
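Here is the derived-table-plus-join pattern run end to end on sqlite3 with made-up alerts (the 14-day window is omitted to keep the data small, and LIMIT 2 stands in for LIMIT 10):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plugin_status_alerts (entity INTEGER, status_code TEXT)"
)
conn.executemany("INSERT INTO plugin_status_alerts VALUES (?, ?)", [
    (38792, "A"), (38792, "A"), (38792, "B"),
    (39416, "A"), (39416, "B"),
    (37796, "A"),
])

# top-2 entities by total alerts, then counts per (entity, status_code)
rows = conn.execute("""
    SELECT entity, status_code, count(*) AS status_ct
    FROM (
        SELECT entity
        FROM plugin_status_alerts
        GROUP BY entity
        ORDER BY count(*) DESC, entity  -- tie breaker for a stable result
        LIMIT 2
    ) AS sub
    JOIN plugin_status_alerts USING (entity)
    GROUP BY entity, status_code
    ORDER BY entity, status_code
""").fetchall()
print(rows)
# [(38792, 'A', 2), (38792, 'B', 1), (39416, 'A', 1), (39416, 'B', 1)]
```

Entity 37796 is dropped by the inner LIMIT, and only the surviving entities get their per-status counts in the outer query.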
When you plug your first query into the second and use it in the IN clause, you still return two columns where IN only wants one. Either do this:
SELECT status_code, entity, COUNT(status_code) statusCount
FROM plugin_status_alerts
where updated BETWEEN now() - INTERVAL '14 days' AND now()
AND entity IN (
SELECT id.entity
from plugin_status_alerts id
where id.updated BETWEEN now() - INTERVAL '14 days' AND now()
group by id.entity
order by count(id.entity) DESC
limit 10
)
GROUP BY status_code, entity
Or use the first query as a derived table and join with it.

Possible to use SQL to sort by date but put null dates at the back of the results set?

I have a bunch of tasks in a MySQL database, and one of the fields is "deadline date". Not every task has to have a deadline date.
I'd like to use SQL to sort the tasks by deadline date, but put the ones without a deadline date in the back of the result set. As it is now, the null dates show up first, then the rest are sorted by deadline date earliest to latest.
Any ideas on how to do this with SQL alone? (I can do it with PHP if needed, but an SQL-only solution would be great.)
Thanks!
Here's a solution using only standard SQL, not ISNULL(). That function is not standard SQL and may not work in other brands of RDBMS.
SELECT * FROM myTable
WHERE ...
ORDER BY CASE WHEN myDate IS NULL THEN 1 ELSE 0 END, myDate;
SELECT * FROM myTable
WHERE ...
ORDER BY ISNULL(myDate), myDate
SELECT foo, bar, due_date FROM tablename
ORDER BY CASE IFNULL(due_date, 0)
WHEN 0 THEN 1 ELSE 0 END, due_date
So you have two ORDER BY expressions. The first puts all non-NULLs in front; the second then sorts by due date.
The easiest way is using the minus operator with DESC.
SELECT * FROM request ORDER BY -date DESC
In MySQL, NULL values are considered lower in order than any non-NULL value, so when sorting in ascending (ASC) order NULLs are listed first, and in descending (DESC) order they are listed last.
When a - (minus) sign is added before the column name, NULL becomes -NULL.
Since -NULL == NULL, adding DESC makes all the rows sort by date in ascending order, with NULLs last.
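For comparison, the portable boolean-expression trick can be tried directly in Python's sqlite3 (made-up tasks; in SQLite, as in MySQL, a boolean expression is allowed straight in ORDER BY):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (name TEXT, deadline TEXT)")
conn.executemany("INSERT INTO tasks VALUES (?, ?)", [
    ("write report", "2024-03-01"),
    ("no deadline",  None),
    ("file taxes",   "2024-02-15"),
])

# (deadline IS NULL) is 0 for dated rows and 1 for NULL rows,
# so NULLs sort to the back; dated rows sort earliest-first
rows = conn.execute(
    "SELECT name FROM tasks ORDER BY (deadline IS NULL), deadline"
).fetchall()
print([name for (name,) in rows])
# ['file taxes', 'write report', 'no deadline']
```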