Getting single record in subquery - sql

I have a DB2 database with multiple user records, which are related to a newsletter record through a couple of intermediate tables.
My problem is that I need to get just the users where the latest newsletter they received was in the last week. I've been banging my head against this for hours and still haven't found a way to cleanly get the records I need. I thought this would be the solution, but I keep running into very generic errors that I don't really understand the cause of.
SELECT a.*, d.tech_id as newsletter FROM users a
JOIN user_profile b ON b.tech_id = a.profile_id
JOIN user_contact c ON c.user_id = b.tech_id
JOIN (
SELECT newsletters.tech_id, ROW_NUMBER() OVER (ORDER BY timestamp(tech_id) DESC) AS RN
FROM NEWSLETTERS
) d ON d.tech_id = c.newsletter_id
WHERE (timestamp(d.rn) < current_timestamp - 7 days)
Is there a better way to do this, or am I missing an obvious problem?
EDIT:
This is what I'd like to be doing, though it doesn't work right either:
SELECT a.* as newsletter FROM users a
WHERE (
SELECT MAX(timestamp(newsletters.tech_id))
FROM newsletters
WHERE newsletters.tech_id IN(
SELECT newsletter_id FROM user_contact WHERE user_contact.profile_id = a.tech_id
)
) < current_timestamp - 7 days
The structure is pretty straightforward. The users table has a foreign key, profile_id, which is keyed to user_profile.tech_id. The user_contact.user_id field is keyed to user_profile.tech_id, and the user_contact table has a foreign key, user_contact.newsletter_id, that is keyed to newsletters.tech_id.

RN is a row number generated by ordering on timestamp(tech_id). It is an integer, not a date, so it should not be compared with dates in the final WHERE clause. Also, your ROW_NUMBER function orders by timestamp(tech_id), which won't work as expected if tech_id is not a datetime data type.
For your requirement, ROW_NUMBER isn't needed.
Try the query below.
SELECT a.*, d.tech_id as newsletter
FROM users a
JOIN user_profile b ON b.tech_id = a.profile_id
JOIN user_contact c ON c.user_id = b.tech_id
JOIN NEWSLETTERS d ON d.tech_id = c.newsletter_id
WHERE (timestamp(d.datecolumn) < current_timestamp - 7 days)
-- replace d.datecolumn with the actual datetime column of NEWSLETTERS

It seems that you need to use GROUP BY. Try something like this:
SELECT a.*
FROM users a
JOIN (
SELECT b.tech_id AS profile_id, MAX(timestamp(d.datecol)) AS latest_newsletter
FROM user_profile b
JOIN user_contact c ON c.user_id = b.tech_id
JOIN NEWSLETTERS d ON d.tech_id = c.newsletter_id
GROUP BY b.tech_id
HAVING MAX(timestamp(d.datecol)) < current_timestamp - 7 days
) e ON e.profile_id = a.profile_id
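Here is a small sketch of the GROUP BY approach. It runs against SQLite through Python's built-in `sqlite3` module rather than DB2. The `sent_at` column on `newsletters` is a hypothetical stand-in for the unnamed date column, the rows are made up, and a fixed cutoff string replaces `current_timestamp - 7 days`. Following the question's EDIT query, it keeps users whose most recent newsletter is older than the cutoff.

```python
import sqlite3

def users_with_stale_latest_newsletter(cutoff):
    """Return (user id, latest newsletter date) for users whose latest newsletter is before `cutoff`.

    Assumption: newsletters.sent_at is a hypothetical ISO-8601 date column.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (tech_id INTEGER PRIMARY KEY, profile_id INTEGER);
        CREATE TABLE user_profile (tech_id INTEGER PRIMARY KEY);
        CREATE TABLE user_contact (user_id INTEGER, newsletter_id INTEGER);
        CREATE TABLE newsletters (tech_id INTEGER PRIMARY KEY, sent_at TEXT);

        INSERT INTO user_profile VALUES (10), (20);
        INSERT INTO users VALUES (1, 10), (2, 20);
        INSERT INTO newsletters VALUES (100, '2024-01-01'), (101, '2024-01-20');
        -- profile 10 got only the old newsletter; profile 20 also got the new one
        INSERT INTO user_contact VALUES (10, 100), (20, 100), (20, 101);
    """)
    rows = con.execute("""
        SELECT a.tech_id, e.latest
        FROM users a
        JOIN (
            SELECT b.tech_id AS profile_id, MAX(d.sent_at) AS latest
            FROM user_profile b
            JOIN user_contact c ON c.user_id = b.tech_id
            JOIN newsletters d ON d.tech_id = c.newsletter_id
            GROUP BY b.tech_id
            HAVING MAX(d.sent_at) < ?   -- filter on the per-profile maximum
        ) e ON e.profile_id = a.profile_id
        ORDER BY a.tech_id
    """, (cutoff,)).fetchall()
    con.close()
    return rows

print(users_with_stale_latest_newsletter("2024-01-15"))
```

The key choice is putting the date test in HAVING rather than WHERE. A WHERE filter would discard the newer newsletter before aggregation. User 2 would then pass, because that user's older newsletter is also before the cutoff.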

Related

How to pull the count of occurrences from 2 SQL tables

I am using Python on an SQLite3 DB I created. The DB is set up, and for now I'm just using the command line to try to get the SQL statement correct.
I have 2 tables.
Table 1 - users
user_id, name, message_count
Table 2 - messages
id, date, message, user_id
When I set up table two, I added this statement to the creation of my messages table, but I have no clue what, if anything, it does:
FOREIGN KEY (user_id) REFERENCES users (user_id)
What I am trying to do is return a list containing each user's name and message count for 2020. I have used this statement to get the TOTAL number of posts in 2020, and it works:
SELECT COUNT(*) FROM messages WHERE substr(date,1,4)='2020';
But I am struggling to figure out whether I should join the tables, or whether there is a way to pull just the info I need. The statement I want would look something like this:
SELECT name, COUNT(*) FROM users JOIN messages ON messages.user_id = users.user_id WHERE substr(date,1,4)='2020';
One option uses a correlated subquery:
select u.*,
(
select count(*)
from messages m
where m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
) as cnt_messages
from users u
This query would take advantage of an index on messages(user_id, date).
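You can try the correlated subquery without touching the real database. The sketch below uses Python's `sqlite3` with an in-memory database, and the rows are invented. It assumes `date` holds ISO-8601 text such as `2020-06-15`, which is what the `substr(date,1,4)` test implies. It also creates the `messages(user_id, date)` index mentioned above.

```python
import sqlite3

def message_counts_2020():
    """Count each user's 2020 messages with a correlated subquery."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, message_count INTEGER);
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY, date TEXT, message TEXT, user_id INTEGER,
            FOREIGN KEY (user_id) REFERENCES users (user_id)
        );
        -- composite index that the date-range predicate can use
        CREATE INDEX messages_user_date ON messages (user_id, date);

        INSERT INTO users VALUES (1, 'alice', 0), (2, 'bob', 0), (3, 'carol', 0);
        INSERT INTO messages (date, message, user_id) VALUES
            ('2019-12-31', 'old', 1),
            ('2020-01-01', 'hi', 1),
            ('2020-06-15', 'again', 1),
            ('2020-12-31', 'bye', 2),
            ('2021-01-01', 'new year', 2);
    """)
    rows = con.execute("""
        SELECT u.name,
               (SELECT COUNT(*)
                FROM messages m
                WHERE m.user_id = u.user_id
                  AND m.date >= '2020-01-01' AND m.date < '2021-01-01') AS cnt_messages
        FROM users u
        ORDER BY u.user_id
    """).fetchall()
    con.close()
    return rows

print(message_counts_2020())  # [('alice', 2), ('bob', 1), ('carol', 0)]
```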
You could also join and aggregate. If you want to include users that have no messages, a left join is appropriate:
select u.name, count(m.user_id) as cnt_messages
from users u
left join messages m
on m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
group by u.user_id, u.name
Note that it is more efficient to compare the date column against literal dates than to apply a function to it (which prevents the use of an index).
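The difference a left join makes shows up with a user who has no 2020 messages. The sketch below uses Python's `sqlite3` with invented rows. It runs the same aggregate once with `LEFT JOIN` and once with `INNER JOIN`. Counting `m.user_id` rather than `*` makes unmatched users count as 0.

```python
import sqlite3

def counts_2020(join_kind):
    """Aggregate 2020 message counts per user using a LEFT or INNER join."""
    assert join_kind in ("LEFT", "INNER")  # whitelist before interpolating into SQL
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE messages (id INTEGER PRIMARY KEY, date TEXT, user_id INTEGER);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO messages (date, user_id) VALUES
            ('2020-01-01', 1), ('2020-06-15', 1), ('2020-12-31', 2), ('2021-01-01', 2);
    """)
    # The date range sits in the ON clause, so it limits which messages are
    # joined without filtering out users who have none in range.
    rows = con.execute(f"""
        SELECT u.name, COUNT(m.user_id) AS cnt_messages
        FROM users u
        {join_kind} JOIN messages m
          ON m.user_id = u.user_id
         AND m.date >= '2020-01-01' AND m.date < '2021-01-01'
        GROUP BY u.user_id, u.name
        ORDER BY u.user_id
    """).fetchall()
    con.close()
    return rows

print(counts_2020("LEFT"))   # carol appears with 0
print(counts_2020("INNER"))  # carol is missing
```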
You are missing a GROUP BY clause to group by user:
SELECT u.user_id, u.name, COUNT(*) AS counter
FROM users u JOIN messages m
ON m.user_id = u.user_id
WHERE substr(m.date,1,4)='2020'
GROUP BY u.user_id, u.name

Finding days when users haven't created any entries

I have two tables: users and time_entries. time_entries has a foreign key to the users table. Users may create time entries with some amount of time in them. I want to write a query that returns the summed amounts of time over an arbitrary date range, grouped by user and date. That part is easy, but I also need to include days when nobody entered any time_entry. I tried creating an additional table called calendar with dates and left joining time_entries to it, but I couldn't retrieve a list of users who haven't entered any time_entry. Here is my query:
SELECT te.date, SUM(te.amount), user_name
FROM calendar c
LEFT JOIN time_entries te on c.date = te.date
RIGHT JOIN asp_net_users anu on te.user_id = anu.id
GROUP BY user_name, te.date
If you just want the days on which no user made any entry, you can use NOT EXISTS and a correlated subquery.
SELECT c.date
FROM calendar c
WHERE NOT EXISTS (SELECT *
FROM time_entries te
WHERE te.date = c.date);
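Below, the NOT EXISTS query runs against an in-memory SQLite database through Python's `sqlite3`. The calendar rows and time entries are invented, and dates are stored as ISO text.

```python
import sqlite3

def days_without_entries():
    """Calendar days on which nobody created a time entry (NOT EXISTS)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE calendar (date TEXT PRIMARY KEY);
        CREATE TABLE time_entries (id INTEGER PRIMARY KEY, user_id INTEGER,
                                   date TEXT, amount REAL);
        INSERT INTO calendar VALUES ('2019-10-01'), ('2019-10-02'), ('2019-10-03');
        INSERT INTO time_entries (user_id, date, amount) VALUES
            (1, '2019-10-01', 2.0), (2, '2019-10-03', 1.5);
    """)
    rows = con.execute("""
        SELECT c.date
        FROM calendar c
        WHERE NOT EXISTS (SELECT *
                          FROM time_entries te
                          WHERE te.date = c.date)
        ORDER BY c.date
    """).fetchall()
    con.close()
    return [d for (d,) in rows]

print(days_without_entries())  # ['2019-10-02']
```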
If you want every user along with the days on which they haven't made any entry, cross join the users and the days and then apply NOT EXISTS.
SELECT anu.user_name,
c.date
FROM asp_net_users anu
CROSS JOIN calendar c
WHERE NOT EXISTS (SELECT *
FROM time_entries te
WHERE te.user_id = anu.id
AND te.date = c.date);
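The cross join builds every (user, day) pair, and NOT EXISTS keeps only the pairs with no entry. Here is a sketch on invented data, using Python's `sqlite3` and the table names from the question:

```python
import sqlite3

def missing_user_days():
    """(user_name, date) pairs with no time entry, via CROSS JOIN + NOT EXISTS."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE asp_net_users (id INTEGER PRIMARY KEY, user_name TEXT);
        CREATE TABLE calendar (date TEXT PRIMARY KEY);
        CREATE TABLE time_entries (id INTEGER PRIMARY KEY, user_id INTEGER,
                                   date TEXT, amount REAL);
        INSERT INTO asp_net_users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO calendar VALUES ('2019-10-01'), ('2019-10-02'), ('2019-10-03');
        INSERT INTO time_entries (user_id, date, amount) VALUES
            (1, '2019-10-01', 2.0), (1, '2019-10-03', 1.0), (2, '2019-10-03', 1.5);
    """)
    rows = con.execute("""
        SELECT anu.user_name, c.date
        FROM asp_net_users anu
        CROSS JOIN calendar c
        WHERE NOT EXISTS (SELECT *
                          FROM time_entries te
                          WHERE te.user_id = anu.id
                            AND te.date = c.date)
        ORDER BY anu.user_name, c.date
    """).fetchall()
    con.close()
    return rows

print(missing_user_days())
```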
Thanks to sticky bit's examples, I was able to write the following query, which solves my problem:
SELECT c.date, a.id, COALESCE(sum(te.amount), 0)
FROM asp_net_users a
CROSS JOIN (SELECT *
FROM calendar
WHERE date BETWEEN '2019-10-01 00:00:00'::timestamp AND '2019-10-31 00:00:00'::timestamp) c
LEFT JOIN time_entries te on a.id = te.user_id AND c.date = te.date
WHERE a.department_guid = '95b7538d-3830-48d7-ba06-ad7c51a57191'
GROUP BY c.date, a.id
ORDER BY c.date
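The same pattern can be sketched in SQLite through Python's `sqlite3`: a cross join of users and days, then a LEFT JOIN of entries and `COALESCE(SUM(...), 0)`. SQLite has no `::timestamp` cast, so this sketch compares plain ISO date strings with BETWEEN. It also omits the `department_guid` filter. All rows are invented.

```python
import sqlite3

def daily_totals(start, end):
    """Per-day, per-user time totals, including zero for days without entries."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE asp_net_users (id INTEGER PRIMARY KEY, user_name TEXT);
        CREATE TABLE calendar (date TEXT PRIMARY KEY);
        CREATE TABLE time_entries (id INTEGER PRIMARY KEY, user_id INTEGER,
                                   date TEXT, amount REAL);
        INSERT INTO asp_net_users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO calendar VALUES ('2019-10-01'), ('2019-10-02'), ('2019-11-01');
        INSERT INTO time_entries (user_id, date, amount) VALUES
            (1, '2019-10-01', 2.0), (1, '2019-10-01', 1.0),
            (2, '2019-10-02', 4.0), (1, '2019-11-01', 5.0);
    """)
    rows = con.execute("""
        SELECT c.date, a.id, COALESCE(SUM(te.amount), 0.0) AS total
        FROM asp_net_users a
        CROSS JOIN (SELECT * FROM calendar WHERE date BETWEEN ? AND ?) c
        -- user and day both go in ON so missing entries yield NULL, not no row
        LEFT JOIN time_entries te ON a.id = te.user_id AND c.date = te.date
        GROUP BY c.date, a.id
        ORDER BY c.date, a.id
    """, (start, end)).fetchall()
    con.close()
    return rows

print(daily_totals("2019-10-01", "2019-10-31"))
```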

Joining a table that gets 500k+ new rows and selecting the newest in a fast query

I have a users table and a calls table; about 500k new rows are inserted into the calls table every day. Each call record has a user_id column, and I have about 700 users. What I need to do is select every user and get the created_at time of their most recent call. What is the fastest way of doing this?
select u.*, MAX(c.created_at)
from users u
left join calls c on u.user_id = c.user_id
group by u.user_id
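The GROUP BY will only work if user_id is the primary key of users, because selecting u.* while grouping by u.user_id alone relies on the other user columns being functionally dependent on it.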
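Here is a small sketch of the LEFT JOIN plus `MAX(created_at)` query, using Python's `sqlite3` with invented calls. The composite index on `calls(user_id, created_at)` is a hypothetical addition, not part of the answer. Whether a given engine uses it for the per-user maximum depends on that engine's optimizer.

```python
import sqlite3

def latest_call_per_user():
    """Most recent call time per user; users with no calls get None."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE calls (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
        -- hypothetical index that may let the engine read each user's max from the index
        CREATE INDEX calls_user_created ON calls (user_id, created_at);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO calls (user_id, created_at) VALUES
            (1, '2024-05-01 09:00:00'), (1, '2024-05-02 17:30:00'),
            (2, '2024-05-01 12:00:00');
    """)
    # u.name is allowed here because user_id is the primary key of users
    rows = con.execute("""
        SELECT u.user_id, u.name, MAX(c.created_at) AS last_call
        FROM users u
        LEFT JOIN calls c ON u.user_id = c.user_id
        GROUP BY u.user_id
        ORDER BY u.user_id
    """).fetchall()
    con.close()
    return rows

print(latest_call_per_user())
```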
Alternative answer:
Use NOT EXISTS to pick only the latest call:
select *
from users u
left join calls c on u.user_id = c.user_id
where not exists (select 1 from calls c2
where c2.created_at> c.created_at
and c2.user_id = c.user_id)
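To see how the NOT EXISTS variant behaves, here it is on invented data with Python's `sqlite3`. Bob has two calls with identical timestamps to show what happens on a tie.

```python
import sqlite3

def latest_calls_not_exists():
    """Keep each user's call only if no later call by the same user exists."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE calls (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO calls (user_id, created_at) VALUES
            (1, '2024-05-01 09:00:00'),
            (1, '2024-05-02 17:30:00'),
            (2, '2024-05-01 12:00:00'),
            (2, '2024-05-01 12:00:00');  -- exact tie with the previous call
    """)
    rows = con.execute("""
        SELECT u.user_id, c.id, c.created_at
        FROM users u
        LEFT JOIN calls c ON u.user_id = c.user_id
        WHERE NOT EXISTS (SELECT 1 FROM calls c2
                          WHERE c2.created_at > c.created_at
                            AND c2.user_id = c.user_id)
        ORDER BY u.user_id, c.id
    """).fetchall()
    con.close()
    return rows

print(latest_calls_not_exists())
```

Carol has no calls, yet she is still returned with NULLs: the subquery compares against NULL, finds nothing, and NOT EXISTS is true. Bob appears twice because neither of his tied calls is strictly later than the other.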

SQL join: selecting last record that meets a condition from the original table

I am new to SQL, so excuse any lapses in notation. A much simplified version of my problem is as follows. I have hospital admissions in table ADMISSIONS and need to collect the most recent outpatient claim of a certain type from table CLAIMS prior to the admission date:
SELECT a.ID , a.date, b.claim_date
FROM admissions as a
LEFT JOIN claims b on (a.ID=b.ID) and (a.date>b.claim_date)
LEFT JOIN claims c on ((a.ID=c.ID) and (a.date>c.claim_date))
and (b.claim_date<c.claim_date or b.claim_date=c.claim_date and b.ID<c.ID)
WHERE c.ID is NULL
The problem is that for some IDs I get many records with duplicate a.date, c.claim_date values.
My problem is similar to one discussed here
SQL join: selecting the last records in a one-to-many relationship
and elaborated on here
SQL Left join: selecting the last records in a one-to-many relationship
However, there is the added wrinkle of looking only for records in CLAIMS that occur prior to a.date and I think that is causing the problem.
Update
Times are not stored, just dates, and since a patient can have multiple records on the same day, it's an issue. There is another wrinkle, which is that I only want to look at a subset of CLAIMS (let's say claims.flag=TRUE). Here's what I tried last:
SELECT a.ID , a.date, b.claim_date
FROM admissions as a
LEFT JOIN (
select d.ID , max(d.claim_date) cdate
from claims as d
where d.flag=TRUE
group by d.ID
) as b on (a.ID=b.ID) and (b.claim_date < a.date)
LEFT JOIN claims c on ((a.ID=c.ID) and (c.claim_date < a.claim_date))
and c.flag=TRUE
and (b.claim_date<c.claim_date or b.claim_date=c.claim_date and b.ID<c.ID)
WHERE c.ID is NULL
However, this ran for a couple of hours before aborting (typically takes about 30 mins with LIMIT 10).
You may want to try using a subquery to solve this problem:
SELECT a.ID, a.date, b.claim_date
FROM admissions as a
LEFT JOIN claims b ON (a.ID = b.ID)
WHERE b.claim_date = (
SELECT MAX(c.claim_date)
FROM claims c
WHERE c.id = a.id -- Assuming that c.id is a foreign key to a.id
AND c.claim_date < a.date -- Claim date is less than admission date
);
An attempt to clarify with different IDs, and using an additional subquery to account for duplicate dates:
SELECT a.ID, a.patient_id, a.date, b.claim_id, b.claim_date
FROM admissions as a
LEFT JOIN claims b ON (a.patient_ID = b.patient_ID)
WHERE b.claim_id = (
SELECT MAX(c.claim_id) -- Max claim identifier (likely most recent if sequential)
FROM claims c
WHERE c.patient_ID = a.patient_ID
AND c.flag = TRUE
AND c.claim_date = (
SELECT MAX(d.claim_date)
FROM claims d
WHERE d.patient_id = c.patient_id
AND d.claim_date < a.date -- Claim date is less than admission date
AND d.flag = TRUE
)
)
AND b.flag = TRUE;
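Here is a sketch of the two-level MAX approach, run in SQLite through Python's `sqlite3`. It first finds the latest flagged claim date before the admission, then takes the highest claim_id on that date. The rows are invented. SQLite stores booleans as 0/1, so `flag = 1` stands in for `flag = TRUE`.

```python
import sqlite3

def latest_prior_flagged_claims():
    """Latest flagged claim before each admission, same-day ties broken by MAX(claim_id)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE admissions (ID INTEGER PRIMARY KEY, patient_id TEXT, date TEXT);
        CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, patient_id TEXT,
                             claim_date TEXT, flag INTEGER);
        INSERT INTO admissions VALUES (1, 'p1', '2020-03-10'), (2, 'p2', '2020-03-10');
        INSERT INTO claims VALUES
            (1, 'p1', '2020-03-01', 1),
            (2, 'p1', '2020-03-05', 1),
            (3, 'p1', '2020-03-05', 1),  -- same day as claim 2: tie
            (4, 'p1', '2020-03-08', 0),  -- not flagged
            (5, 'p1', '2020-03-12', 1),  -- after the admission
            (6, 'p2', '2020-03-15', 1);  -- p2 has no claim before admission
    """)
    rows = con.execute("""
        SELECT a.ID, a.patient_id, a.date, b.claim_id, b.claim_date
        FROM admissions AS a
        LEFT JOIN claims b ON a.patient_id = b.patient_id
        WHERE b.claim_id = (
            SELECT MAX(c.claim_id)              -- tie-break within the same day
            FROM claims c
            WHERE c.patient_id = a.patient_id
              AND c.flag = 1
              AND c.claim_date = (
                  SELECT MAX(d.claim_date)      -- latest flagged date before admission
                  FROM claims d
                  WHERE d.patient_id = c.patient_id
                    AND d.claim_date < a.date
                    AND d.flag = 1))
          AND b.flag = 1
        ORDER BY a.ID
    """).fetchall()
    con.close()
    return rows

print(latest_prior_flagged_claims())  # [(1, 'p1', '2020-03-10', 3, '2020-03-05')]
```

Admission 2 does not appear. Because the WHERE clause tests columns of b, the LEFT JOIN behaves like an inner join, and admissions with no qualifying earlier claim are dropped. Moving those conditions into the ON clause would keep such admissions, with NULL claim columns.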

Best way to construct this query? [duplicate]

Closed 10 years ago.
Possible Duplicate:
Retrieving the last record in each group
I have two tables set up similar to this (simplified for the question):
actions: id, user_id, action, time
users: id, name
I want to output the latest action for each user. I have no idea how to go about it.
I'm not great with SQL, but from what I've looked up, it should look something like the following, though I'm not sure:
SELECT `users`.`name`, *
FROM users, actions
JOIN < not sure what to put here >
ORDER BY `actions`.`time` DESC
< only one per user_id >
Any help would be appreciated.
SELECT * FROM users JOIN actions ON actions.id=(SELECT id FROM actions WHERE user_id=users.id ORDER BY time DESC LIMIT 1);
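Here is the correlated `ORDER BY ... LIMIT 1` join, run in SQLite through Python's `sqlite3` on invented actions. The inner `actions` table gets the alias `a2` so it is clear which columns belong to the subquery.

```python
import sqlite3

def latest_action_limit1():
    """Latest action per user via a correlated ORDER BY ... LIMIT 1 subquery."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE actions (id INTEGER PRIMARY KEY, user_id INTEGER,
                              action TEXT, time TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO actions (user_id, action, time) VALUES
            (1, 'login',  '2024-01-01 08:00'),
            (1, 'logout', '2024-01-01 17:00'),
            (2, 'login',  '2024-01-02 09:00');
    """)
    rows = con.execute("""
        SELECT users.name, actions.action, actions.time
        FROM users
        JOIN actions ON actions.id = (SELECT a2.id
                                      FROM actions a2
                                      WHERE a2.user_id = users.id
                                      ORDER BY a2.time DESC
                                      LIMIT 1)
        ORDER BY users.id
    """).fetchall()
    con.close()
    return rows

print(latest_action_limit1())
```

Because this is an inner join, carol (who has no actions) is not returned.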
You need to do a groupwise max; please refer to the examples here: http://jan.kneschke.de/projects/mysql/groupwise-max/
Here's an example I did for someone else that is similar to your requirements:
http://pastie.org/925108
select
u.user_id,
u.username,
latest.comment_id
from
users u
left outer join
(
select
max(comment_id) as comment_id,
user_id
from
user_comment
group by
user_id
) latest on u.user_id = latest.user_id;
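Below, that groupwise-max query runs against SQLite through Python's `sqlite3` on invented comments. An ORDER BY is added only to make the output deterministic. Treating `MAX(comment_id)` as "latest" assumes ids are assigned in increasing order.

```python
import sqlite3

def latest_comment_ids():
    """Highest comment_id per user; users without comments get None."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE user_comment (comment_id INTEGER PRIMARY KEY, user_id INTEGER);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO user_comment VALUES (10, 1), (11, 2), (12, 1);
    """)
    rows = con.execute("""
        SELECT u.user_id, u.username, latest.comment_id
        FROM users u
        LEFT OUTER JOIN (
            SELECT MAX(comment_id) AS comment_id, user_id
            FROM user_comment
            GROUP BY user_id
        ) latest ON u.user_id = latest.user_id
        ORDER BY u.user_id
    """).fetchall()
    con.close()
    return rows

print(latest_comment_ids())  # [(1, 'alice', 12), (2, 'bob', 11), (3, 'carol', None)]
```

To get the comment's other columns, join `user_comment` back on `comment_id`.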
select u.name, a.action, a.time
from users u, actions a
where u.id = a.user_id
and a.time in (select max(time) from actions where user_id = u.id group by user_id)
Note: untested, but this should be the pattern.
CREATE TABLE #Table (User_ID INT, Time DATETIME)
-- This gets the latest time for each user
INSERT INTO #Table (User_ID, Time)
SELECT z.user_id, MAX(z.time)
FROM actions z
GROUP BY z.user_id
-- Join back to get the resulting action
SELECT z.user_id, z.action
FROM actions z
INNER JOIN #Table x ON x.User_ID = z.user_id AND x.Time = z.time
This is the greatest-n-per-group problem that comes up frequently on Stack Overflow. Follow the tag for dozens of other posts on this problem.
Here's how to do it in MySQL given your schema with no subqueries and no GROUP BY:
SELECT u.*, a1.*
FROM users u JOIN actions a1 ON (u.id = a1.user_id)
LEFT OUTER JOIN actions a2 ON (u.id = a2.user_id AND a1.time < a2.time)
WHERE a2.id IS NULL;
In other words, show the user with her action such that if we search for another action with the same user and a later time, we find none.
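The self-join below runs in SQLite through Python's `sqlite3` on invented data. Bob's two actions share a timestamp, which shows that ties return more than one row. The optional `tie_break` clause is a hypothetical extension, not part of the answer. On equal times it treats the row with the higher id as later, so only one row per user survives.

```python
import sqlite3

def latest_actions(tie_break=False):
    """Latest action per user via LEFT OUTER JOIN ... WHERE a2.id IS NULL."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE actions (id INTEGER PRIMARY KEY, user_id INTEGER,
                              action TEXT, time TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO actions (user_id, action, time) VALUES
            (1, 'login',  '2024-01-01 08:00'),
            (1, 'logout', '2024-01-01 17:00'),
            (2, 'login',  '2024-01-02 09:00'),
            (2, 'upload', '2024-01-02 09:00');  -- same time as bob's login
    """)
    # hypothetical tie-breaker: on equal times, a higher id counts as later
    extra = "OR (a1.time = a2.time AND a1.id < a2.id)" if tie_break else ""
    rows = con.execute(f"""
        SELECT u.name, a1.action, a1.time
        FROM users u
        JOIN actions a1 ON (u.id = a1.user_id)
        LEFT OUTER JOIN actions a2
          ON (u.id = a2.user_id AND (a1.time < a2.time {extra}))
        WHERE a2.id IS NULL
        ORDER BY u.id, a1.id
    """).fetchall()
    con.close()
    return rows

print(latest_actions())
print(latest_actions(tie_break=True))
```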
It seems to me that the following will work:
WITH GetMaxTimePerUser (user_id, time) AS (
SELECT user_id, MAX(time)
FROM actions
GROUP BY user_id
)
SELECT u.name, a.action, u_maxtime.time
FROM actions AS a
INNER JOIN users AS u ON u.id=a.user_id
INNER JOIN GetMaxTimePerUser AS u_maxtime ON u_maxtime.user_id=u.id
WHERE a.time=u_maxtime.time
Using a temporary named result set (a common table expression, or CTE) instead of subqueries and an OUTER JOIN leaves the query most open to optimization. (A CTE is something like a VIEW, but it exists only virtually, inline in the query.)
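Here is the CTE approach in runnable form, using SQLite through Python's `sqlite3` with invented actions. It has the `AS` keyword before the CTE body, and the outer SELECT reads the time from the `u_maxtime` alias.

```python
import sqlite3

def latest_actions_cte():
    """Latest action per user by joining back to a per-user MAX(time) CTE."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE actions (id INTEGER PRIMARY KEY, user_id INTEGER,
                              action TEXT, time TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO actions (user_id, action, time) VALUES
            (1, 'login',  '2024-01-01 08:00'),
            (1, 'logout', '2024-01-01 17:00'),
            (2, 'login',  '2024-01-02 09:00');
    """)
    rows = con.execute("""
        WITH GetMaxTimePerUser (user_id, time) AS (
            SELECT user_id, MAX(time)
            FROM actions
            GROUP BY user_id
        )
        SELECT u.name, a.action, u_maxtime.time
        FROM actions AS a
        INNER JOIN users AS u ON u.id = a.user_id
        INNER JOIN GetMaxTimePerUser AS u_maxtime ON u_maxtime.user_id = u.id
        WHERE a.time = u_maxtime.time
        ORDER BY u.id
    """).fetchall()
    con.close()
    return rows

print(latest_actions_cte())
```

As with the other join-back forms, two actions sharing a user's maximum time would both be returned.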