Select rows whose IDs are not found in another table under certain constraints - sql

The practical problem I'm solving is to display a list of items that have been updated recently that the user has not viewed recently.
I'm trying to return the items that are not contained in an item views table for a given user (let's say for this case user number 1). I've come up with:
SELECT * FROM items i
WHERE i.updated_at > date_sub(curdate(), interval 10 day)
AND i.id NOT IN (
SELECT item_id FROM item_views v
WHERE v.created_at > date_sub(curdate(), interval 10 day)
AND v.user_id = 1
)
This seems to work fine (is it even the best way to do it?). However, I run into issues when considering an item that was viewed 8 days ago and updated 3 days ago. Such an item is clearly new, but it wouldn't show up under this query. How should I approach adding this restriction?

If the updated_at column is available in your query, you could add a condition on it so that recently updated items are not filtered out. Better yet, check that the item's updated date is later than the date the user last viewed it.
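A minimal sketch of the "updated later than last viewed" idea, run against an in-memory SQLite database as a stand-in for MySQL. The table and column names follow the question; the sample rows, and the assumption that item_views.created_at holds the view time, are illustrative only.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, updated_at TEXT);
CREATE TABLE item_views (item_id INTEGER, user_id INTEGER, created_at TEXT);
INSERT INTO items VALUES
  (1, datetime('now', '-3 days')),   -- viewed 8 days ago, updated since
  (2, datetime('now', '-3 days')),   -- viewed 1 day ago, after the update
  (3, datetime('now', '-30 days'));  -- not updated recently
INSERT INTO item_views VALUES
  (1, 1, datetime('now', '-8 days')),
  (2, 1, datetime('now', '-1 days'));
""")

# Keep recently updated items for which the user has no view
# recorded at or after the update.
unseen_updates = [row[0] for row in conn.execute("""
SELECT i.id
FROM items i
WHERE i.updated_at > datetime('now', '-10 days')
  AND NOT EXISTS (
      SELECT 1 FROM item_views v
      WHERE v.item_id = i.id
        AND v.user_id = ?
        AND v.created_at >= i.updated_at
  )
ORDER BY i.id
""", (1,))]
print(unseen_updates)
```

In MySQL the same shape should carry over with date_sub(curdate(), interval 10 day) in place of the SQLite datetime() call.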

This is not an optimal solution, but as a quick answer, replacing the AND with OR should work.

Related

Skip the day the email was opened and count 7 days after open in big query SQL

I am trying to block off a window within my script that will attribute a sale to a 7-day window. The issue that I am having is that I want the seven-day window to not include the open date so open date = 0 and the sales window begins on day 1.
Here is the current way that I am creating that window -
and oh.Order_Date >= first_open_date.first_open
and oh.Order_Date <= first_open_date.first_open + 7
If you can provide some example data I can help with a more accurate answer, but for now I hope the below will share some ideas.
Please consider the below approach, where I'm assuming your 'opens' refer to tracking whether a user has opened a marketing campaign.
select orders.*,campaigns.campaign_name
from orders_table as orders
left join
(
select distinct user_id,timestamp as open_date,campaign_name from campaign_data
) as campaigns
on orders.user_id = campaigns.user_id and campaigns.open_date < orders.order_date and campaigns.open_date >= date_sub(orders.order_date,interval 7 day)
This example is based on something similar to what I've created for work in the past, which looks at each order date in the order table and then what campaigns were opened before that date.
You may also want to consider using a window statement like row_number or dense_rank with this if you wish to pull only the first or last campaign that was opened to answer questions like "What was the last google ad a user interacted with before placing an order".
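As a rough sketch of the row_number idea, the following picks the last campaign opened in the 7 days before each order, excluding opens on the order date itself. SQLite (via Python) stands in for BigQuery here; the table names follow the query above, while order_id and the sample rows are assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for BigQuery; order_id and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_table (order_id INTEGER, user_id INTEGER, order_date TEXT);
CREATE TABLE campaign_data (user_id INTEGER, timestamp TEXT, campaign_name TEXT);
INSERT INTO orders_table VALUES (100, 1, '2024-03-10'), (101, 1, '2024-03-01');
INSERT INTO campaign_data VALUES
  (1, '2024-03-01', 'spring_sale'),
  (1, '2024-03-05', 'newsletter'),
  (1, '2024-03-10', 'flash_deal');
""")

attributed = conn.execute("""
SELECT order_id, campaign_name
FROM (
    SELECT o.order_id, c.campaign_name,
           ROW_NUMBER() OVER (PARTITION BY o.order_id
                              ORDER BY c.timestamp DESC) AS rn
    FROM orders_table o
    JOIN campaign_data c
      ON c.user_id = o.user_id
     AND c.timestamp < o.order_date                    -- open day itself is excluded
     AND c.timestamp >= date(o.order_date, '-7 days')  -- at most 7 days back
) ranked
WHERE rn = 1   -- most recent qualifying open per order
ORDER BY order_id
""").fetchall()
print(attributed)
```

Order 101 has no row in the result because its only candidate campaign was opened on the order date. Window functions need SQLite 3.25 or newer.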
Hope this helps,
Tom

monitor the time taken for each entry in a sql table and notify using email if the time taken is more than 5 minutes

I have a table that contains product details. If it is a new product, the status will be 1.
Once it has been purchased, the status changes to 2.
My requirement is to send mail to the owner if the product remains in status 1 for more than 5 minutes.
Please help me proceed: what are the possible ways to do this?
Maybe you can add a field like "LastStatusChangedOn", which is a DateTime (or a DateTimeOffset if you need to account for different time zones).
Then just select all Products where the difference between the current time and LastStatusChangedOn is greater than 5 minutes.
Without the exact database structure, it's impossible to give a complete sample, but something like this?
SELECT * FROM Products WHERE DateDiff(minute, LastStatusChangedOn, getdate()) > 5
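A small sketch of how a scheduled job might find the products to notify about. It uses SQLite through Python rather than SQL Server, and the Id and Status columns, the sample rows, and the stale_new_products helper are all hypothetical. It also filters on status 1, which the requirement implies.

```python
import sqlite3

# SQLite stands in for SQL Server; Id, Status and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (Id INTEGER PRIMARY KEY, Status INTEGER, LastStatusChangedOn TEXT);
INSERT INTO Products VALUES
  (1, 1, datetime('now', '-10 minutes')),  -- new and stale
  (2, 1, datetime('now', '-1 minutes')),   -- new but still fresh
  (3, 2, datetime('now', '-10 minutes'));  -- already purchased
""")

def stale_new_products(conn, minutes=5):
    """Return ids of products that have stayed in status 1 longer than `minutes`."""
    rows = conn.execute("""
        SELECT Id FROM Products
        WHERE Status = 1
          AND LastStatusChangedOn < datetime('now', ?)
        ORDER BY Id
    """, (f"-{minutes} minutes",))
    return [r[0] for r in rows]

stale = stale_new_products(conn)
print(stale)
```

A job that runs every minute could call this and send one mail per returned id; the mail step is left out here because it depends on your environment.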

Postgres aggregate function by date range

I have two tables. The first is named page with the schema id and name. The second table is page_counts. It has the schema id, page_id (foreign key to page table), views, and date. Basically, I'm tracking how many views some pages get every single day. Views for each day are cumulative, so it will always be equal to or greater than views for the previous day.
I want to be able to track how many views a page gets by week. This comes down to taking the most recent day's views and subtracting from that the total number of views from a week before that day. I want to be able to do this over multiple weeks as well, so finding out the total number of views for the past week, total number starting from last week and going back one more week etc.
I looked into the postgres date functions, but not much is making sense. Thanks for the help
It is hard to do without data to test against, but it should be something like this:
select page_id, week,
views - lag(views, 1, 0) over (partition by page_id order by week) as views
from (
select page_id, date_trunc('week', "date") as week, max(views) as views
from page_counts pc
group by 1, 2
) s
order by 1, 2 desc
The subquery just groups by page and week, getting the corresponding maximum number of views.
Then the lag window function gets the number of views from the previous week for that page.
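To see the grouping and the lag step on concrete numbers, here is a sketch that runs the same query shape in SQLite through Python. SQLite has no date_trunc, so the week start (Monday) is computed with date modifiers instead; the sample rows are invented.

```python
import sqlite3

# SQLite stands in for Postgres; the rows are made up. 2024-01-01 is a Monday.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_counts (page_id INTEGER, views INTEGER, date TEXT);
INSERT INTO page_counts VALUES
  (1, 10, '2024-01-01'), (1, 25, '2024-01-07'),
  (1, 40, '2024-01-08'), (1, 70, '2024-01-14');
""")

weekly = conn.execute("""
SELECT page_id, week,
       views - LAG(views, 1, 0) OVER (PARTITION BY page_id ORDER BY week) AS views
FROM (
    -- 'weekday 0' moves to the Sunday ending the week, '-6 days' back to its Monday
    SELECT page_id,
           date("date", 'weekday 0', '-6 days') AS week,
           MAX(views) AS views
    FROM page_counts
    GROUP BY page_id, week
) s
ORDER BY page_id, week DESC
""").fetchall()
print(weekly)
```

Note that with a default of 0 in lag, the earliest week reports the cumulative total up to that point rather than a true weekly difference.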

Selecting records from the past three months

I have 2 tables from which I need to run a query to display the number of views a user had in the last 3 months from now.
So far I have come up with the following (all the field types are correct):
SELECT dbo_LU_USER.USERNAME
, Count(*) AS No_of_Sessions
FROM dbo_SDB_SESSION
INNER JOIN dbo_LU_USER
ON dbo_SDB_SESSION.FK_USERID = dbo_LU_USER.PK_USERID
WHERE (((DateDiff("m",[dbo_SDB_SESSION].[SESSIONSTART],Now()))=0
Or (DateDiff("m",[dbo_SDB_SESSION].[SESSIONSTART],Now()))=1
Or (DateDiff("m",[dbo_SDB_SESSION].[SESSIONSTART],Now()))=2))
GROUP BY dbo_LU_USER.USERNAME;
Basically, the code above displays all records within the past 3 months; however, it starts from the 1st day of the month and ends on the current date, but I need it to start 3 months prior to today's date.
Also to let you know this is SQL View in MS Access 2007 code.
Thanks in advance
Depending on how "strictly" you define your 3 months rule, you could make things a lot easier, and probably more efficient, by trying this:
SELECT dbo_LU_USER.USERNAME, Count(*) AS No_of_Sessions
FROM dbo_SDB_SESSION
INNER JOIN dbo_LU_USER
ON dbo_SDB_SESSION.FK_USERID = dbo_LU_USER.PK_USERID
WHERE [dbo_SDB_SESSION].[SESSIONSTART] Between DateAdd("d",-90,Now()) And Now()
GROUP BY dbo_LU_USER.USERNAME;
(Please understand that my MS SQL is a bit rusty, and I can't test this at the moment: the idea is to make the query scan all records whose date is between "TODAY-90 days" and "TODAY").
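The difference between the two filters can be checked with a little date arithmetic. The sketch below is plain Python, not Access: access_month_diff is a hypothetical helper that mimics how DateDiff("m", ...) counts month boundaries, and the fixed "today" is chosen only to make the output reproducible.

```python
from datetime import date, timedelta

def access_month_diff(start, end):
    """Mimic DateDiff("m", start, end): count month boundaries crossed."""
    return (end.year - start.year) * 12 + (end.month - start.month)

today = date(2024, 5, 20)  # fixed so the result is reproducible

# Walk back to the earliest date that still has a month difference of 0, 1 or 2.
d = today
while access_month_diff(d - timedelta(days=1), today) <= 2:
    d -= timedelta(days=1)
calendar_start = d

# The rolling window simply starts 90 days before today.
rolling_start = today - timedelta(days=90)

print(calendar_start, rolling_start)
```

The month-difference filter snaps to the first day of a month (2024-03-01 here), while the rolling window starts on 2024-02-20.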

What is the best way to calculate page hits per day in MySQL

On my blog, I display in the right nav the 10 most popular articles in terms of page hits. Here's how I get that:
SELECT *
FROM entries
WHERE is_published = 1
ORDER BY hits DESC, created DESC
LIMIT 10
What I would like to do is show the top 10 in terms of page hits per day. I'm using MySQL. Is there a way I can do this in the database?
BTW, The created field is a datetime.
UPDATE: I think I haven't made myself clear. What I want is for the blog post with 10,000 hits that was posted 1,000 days ago to have the same popularity as the blog post with 10 hits that was posted 1 day ago. In pseudo-code:
ORDER BY hits / days since posting
...where hits is just an int that is incremented each time the blog post is viewed.
OK, here's what I'm going to use:
SELECT *, AVG(
hits / DATEDIFF(NOW(), created)
) AS avg_hits
FROM entries
WHERE is_published = 1
GROUP BY id
ORDER BY avg_hits DESC, hits DESC, created DESC
LIMIT 10
Thanks, Stephen! (I love this site...)
I'm not entirely sure you can by using the table structure you suggest in your query. The only way I can think of is to get the top 10 by way of highest average hits per day. By doing that, your query becomes:
SELECT *, AVG(hits / DATEDIFF(NOW(), created)) as avg_hits
FROM entries
WHERE is_published = 1
GROUP BY id
ORDER BY avg_hits DESC
LIMIT 10
This query assumes your created field is of a DATETIME (or similar) data type.
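One edge case worth checking: for a post created today, DATEDIFF(NOW(), created) is 0, and in MySQL dividing by zero yields NULL. The sketch below shows one way to guard against that by treating every post as at least one day old. It runs on SQLite through Python as a stand-in for MySQL, with invented rows; in MySQL, GREATEST(DATEDIFF(NOW(), created), 1) would play the role of the two-argument MAX used here.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (id INTEGER PRIMARY KEY, hits INTEGER, created TEXT, is_published INTEGER);
INSERT INTO entries VALUES
  (1, 10000, datetime('now', '-1000 days'), 1),
  (2, 30,    datetime('now', '-2 days'),    1),
  (3, 5,     datetime('now'),               1),   -- posted today
  (4, 99999, datetime('now', '-1 days'),    0);   -- unpublished
""")

top = conn.execute("""
SELECT id,
       -- age in days, floored at 1 so a post created today does not divide by ~0
       hits / MAX(julianday('now') - julianday(created), 1.0) AS hits_per_day
FROM entries
WHERE is_published = 1
ORDER BY hits_per_day DESC, hits DESC, created DESC
LIMIT 10
""").fetchall()
print([row[0] for row in top])
```

No GROUP BY or AVG is needed in this form, since each row already has a single hits value.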
I guess you could have a hits_day_count column, which is incremented on each view, and a hits_day_current column, which stores when the last view was counted.
On each page view, you check whether hits_day_current falls on today's date. If not, reset the hit count. Then you increment the hits_day_count column, and set hits_day_current to the current datetime.
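Pseudo-code (Python):
from datetime import datetime

def record_hit(article_data):
    now = datetime.now()
    # Reset the daily counter if the last counted hit was on an earlier day
    if article_data['hits_day_current'].date() != now.date():
        article_data['hits_day_count'] = 0
    article_data['hits_day_count'] += 1
    article_data['hits_day_current'] = now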
The obvious problem with this is simple: timezones. The totals get reset at 00:00 wherever the server is located, which may not be useful.
A better solution would be a rolling 24-hour total. I'm not quite sure how to do this neatly. The easiest (although not so elegant) way would be to parse your web-server logs periodically. Get the last 24 hours of logs, count the number of requests to each article, and put those numbers in the database.
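A minimal sketch of the rolling 24-hour count, assuming the log lines have already been parsed into (timestamp, article) pairs. The hits_last_24h helper, the article names and the sample entries are hypothetical; real log parsing depends on your web server's format.

```python
from collections import Counter
from datetime import datetime, timedelta

def hits_last_24h(log_entries, now):
    """Count requests per article in the 24 hours ending at `now`."""
    cutoff = now - timedelta(hours=24)
    return Counter(article for ts, article in log_entries if cutoff < ts <= now)

# Hypothetical, already-parsed log entries.
now = datetime(2024, 5, 20, 12, 0)
log_entries = [
    (datetime(2024, 5, 20, 11, 0), "post-a"),
    (datetime(2024, 5, 19, 13, 0), "post-a"),
    (datetime(2024, 5, 19, 11, 0), "post-a"),  # older than 24 hours, ignored
    (datetime(2024, 5, 20, 9, 30), "post-b"),
]
rolling = hits_last_24h(log_entries, now)
print(rolling)
```

The resulting counts could then be written to a column such as hits_day_count on each periodic run.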