SQL joining on 2 separate date columns

I am having trouble wrapping my head around a problem and the more I think about it the further away I get.
I have two tables that log user visits into two separate applications. For simplicity's sake, let's say there are only two columns, timestamp and userid (the key). Each row counts as a visit, so effectively there is a third, derived column for visits.
I am trying to create a table that in each row records the userid, the day (to_date format), total visits to app 1 and total visits to app 2 for that user on that day.
The issue is, when I join the tables together on userid and date, I get missing data. For example, if a user logged into application A on day X, but did not log into application B on day X, then joining on userid and day causes this record to be omitted since the date only exists in Application A's table.
How can I set this up where the final table would have a singular date column, userid, visits to app A and visits to app B, regardless if the user visited both applications on said day?
Hope this made sense, happy to elaborate if needed. Below is sort of what my SQL looks like as of now. Any thoughts appreciated!
with app_a_visits as (
select to_date(timestamp) as date, userid, count(*) as visits
from app_a
group by 1, 2),
app_b_visits as (
select to_date(timestamp) as date, userid, count(*) as visits
from app_b
group by 1, 2)
select a.date, a.userid, a.visits as app_a_visits, b.visits as app_b_visits
from app_a_visits a
full outer join app_b_visits b on a.userid = b.userid and a.date = b.date;

Use FULL OUTER JOIN and NVL/COALESCE
with app_a_visits(date, userid,visits) as (
select * from values
('2022-01-01'::date, 1, 100),
('2022-01-03'::date, 1, 100),
('2022-01-05'::date, 1, 100)
), app_b_visits(date, userid,visits) as (
select * from values
('2022-01-02'::date, 1, 200),
('2022-01-03'::date, 1, 200),
('2022-01-04'::date, 1, 200)
)
select
NVL(a.date, b.date) as date,
NVL(a.userid, b.userid) as userid,
a.visits as app_a_visits,
b.visits as app_b_visits
from app_a_visits a
full outer join app_b_visits b
on a.userid = b.userid and a.date = b.date
ORDER BY 1,2;
DATE       | USERID | APP_A_VISITS | APP_B_VISITS
-----------+--------+--------------+-------------
2022-01-01 |      1 |          100 |         null
2022-01-02 |      1 |         null |          200
2022-01-03 |      1 |          100 |          200
2022-01-04 |      1 |         null |          200
2022-01-05 |      1 |          100 |         null
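Applied back to the original CTEs, a minimal sketch (assuming Snowflake, since NVL, TO_DATE and the VALUES list above suggest it) looks like this; wrapping the visit columns in COALESCE as well turns the nulls into zeros:
with app_a_visits as (
select to_date(timestamp) as date, userid, count(*) as visits
from app_a
group by 1, 2),
app_b_visits as (
select to_date(timestamp) as date, userid, count(*) as visits
from app_b
group by 1, 2)
select
coalesce(a.date, b.date) as date,
coalesce(a.userid, b.userid) as userid,
coalesce(a.visits, 0) as app_a_visits,
coalesce(b.visits, 0) as app_b_visits
from app_a_visits a
full outer join app_b_visits b
on a.userid = b.userid and a.date = b.date;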

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have customer email + FirstName&LastName + Date of transaction
and the second table, which comes from a second system, contains all users, their sensitive data, and when they got registered in our system.
Simplified Structure:
Email | InstertDate (varchar)
My task is to count the difference in minutes between the rows inserted from sales (the first table) and the rows with the users and their sensitive data.
The issue is that the second table contains many rows per user, and I want to find the nearest-in-time row inserted in the 2nd table, because sometimes the difference may be a few minutes (a delay in either direction) and sometimes it can be a few days.
So for x email I have row in 1st table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows in the 2nd table, and the latest one is the one I want to use for the difference:
Email InstertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote some query, but I have no idea how to match the nearest row in the 2nd table:
SELECT DISTINCT TOP (100)
a.[E_MAIL]
,a.[NAME]
,a.[TRAN_DATETIME]
,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
,ABS(CONVERT(INT, CONVERT(Datetime,LEFT(a.[TRAN_DATETIME],10),120)) - CONVERT(INT, CONVERT(Datetime,LEFT(b.[INSERTDATE],10),120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL] = b.[EMAIL]
Totally untested: I'd need sample data and a database. The suspect area is the casting of dates and the date math, and since I don't know what RDBMS and version this is, consider the following "pseudo code".
We assign a row number ordered by the absolute difference in seconds between the two dates; the row with row number 1 for each email wins.
WITH CTE AS (
SELECT a.*, b.[INSERTDATE], row_number() over (PARTITION BY a.[E_MAIL]
ORDER BY abs(datediff(second, cast(a.[TRAN_DATETIME] as datetime), cast(b.[INSERTDATE] as datetime)))) AS RN
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b
on a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1
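If this is SQL Server (the TOP and bracket syntax suggest it), an OUTER APPLY that picks the single nearest row per email is another way to sketch it, assuming the varchar dates parse with a plain CAST:
SELECT
a.[E_MAIL],
a.[NAME],
a.[TRAN_DATETIME],
b.[INSERTDATE],
DATEDIFF(minute, b.[INSERTDATE], CAST(a.[TRAN_DATETIME] AS datetime2)) AS MinutesDiff
FROM [crm].[SalesSampleTable] a
OUTER APPLY (
-- nearest SensitiveTable row for this email, by absolute difference in seconds
SELECT TOP (1) CAST(s.[INSERTDATE] AS datetime2) AS [INSERTDATE]
FROM [crm].[SensitiveTable] s
WHERE s.[EMAIL] = a.[E_MAIL]
ORDER BY ABS(DATEDIFF(second, CAST(s.[INSERTDATE] AS datetime2), CAST(a.[TRAN_DATETIME] AS datetime2)))
) b;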

Vertica: Return zero if no records found

I am running this query to get the average number of logins per user for the last 3 months. If the user has logged in during the last 3 months, get their average; if not, return 0.
I have tried a number of different ways, but it seems like if the user has not logged in during the last 3 months, there are no records and count() does not return 0. It simply returns nothing.
1) select case count(*)
WHEN 0
THEN 0
ELSE count(creationTS) / 3
END as average
from table_name where creationTS >= add_months(now(), -3)
and userId = '110'
group by userId;
2) select COALESCE(count(creationTS)/3,0) as average
from table_name where creationTS >= add_months(now(), -3)
and userId = '110'
group by userId;
It gives the correct result if a record is found for the condition creationTS >= add_months(now(), -3), but if no record exists, it returns nothing. How can I return 0 in that case?
Try it like this:
a) get all distinct userid-s from the base table in a full-select.
b) left join that full-select back with the base table, on equality of the user id and login date not earlier than 3 months ago
c) count the found user id-s in the base table, getting NULL by default if the join fails, and use NVL() to force a 0 in case of NULL, and group by user id
WITH
-- sample input data,not part of real query
indata(userid,login_dt) AS (
SELECT 'arthur', DATE '2021-09-15'
UNION ALL SELECT 'arthur', DATE '2021-08-27'
UNION ALL SELECT 'arthur', DATE '2021-08-01'
UNION ALL SELECT 'trillian', DATE '2021-09-27'
UNION ALL SELECT 'trillian', DATE '2021-08-15'
UNION ALL SELECT 'trillian', DATE '2021-06-27'
UNION ALL SELECT 'ford', DATE '2021-02-27'
UNION ALL SELECT 'ford', DATE '2021-04-27'
)
,
userids AS (
SELECT DISTINCT
userid
FROM indata
)
SELECT
userids.userid
, NVL(COUNT(indata.userid),0) AS login_count
FROM userids
LEFT JOIN indata
ON userids.userid=indata.userid
AND login_dt >= ADD_MONTHS(CURRENT_DATE,-3)
GROUP BY
userids.userid
;
userid | login_count
----------+-------------
arthur | 3
ford | 0
trillian | 2
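Against the table and columns from the question (table_name, creationTS, userId), the same pattern would look roughly like this, untested; the distinct user list comes from the unfiltered table, so users with no logins in the last 3 months still show up with 0:
WITH userids AS (
SELECT DISTINCT userId
FROM table_name
)
SELECT
u.userId,
COUNT(t.creationTS) / 3 AS average
FROM userids u
LEFT JOIN table_name t
ON u.userId = t.userId
AND t.creationTS >= ADD_MONTHS(NOW(), -3)
-- add: WHERE u.userId = '110' to restrict to a single user
GROUP BY u.userId;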

Finding a min() date for one column and then using this to join with other tables that have a date LESS than this date

In short, I have two tables:
(1) pharmacy_claims (columns: user_id, date_service, claim_id, record_id, prescription)
(2) medical_claims (columns: user_id, date_service, provider, npi, cost)
I want to find user_id's in (1) that have a certain prescription value, find their earliest date_service (e.g. min(date_service)) and then use these user_id's with their earliest date of service as a cohort to pull all of their associated data from (2). Basically I want to find all of their medical_claims data PRIOR to the first time they were prescribed a given prescription in pharmacy_claims.
pharmacy_claims looks something like this:
user_id | prescription | date_service
1 a 2018-05-01
1 a 2018-02-11
1 a 2019-10-11
1 b 2018-07-12
2 a 2019-01-02
2 a 2019-03-10
2 c 2018-04-11
3 c 2019-05-26
So for instance, if I was interested in prescription = 'a', I would only want user_id 1 and 2 returned, with dates 2018-02-11 and 2019-01-02, respectively. Then I would want to pull user_id 1 and 2 from the medical_claims, and get all of their data PRIOR to these respective dates.
The way I tried to go about this was to build a temp table from the pharmacy_claims table to query the user_id's that have a given medication, and then left join this back to the table to create a cohort of user_id's with a date_service.
Here's what I did:
(1) Pulled all of the relevant data from the main pharmacy claims table:
CREATE TABLE user.temp_pharmacy_claims AS
SELECT user_id, claim_id, record_id, date_service
FROM dw.pharmacyclaims
WHERE date_service between '2018-01-01' and '2019-08-31'
This results in ~50,000 user_id's
(2) Created a table with just the user_id's and min(date_service):
CREATE TABLE user.temp_pharmacy_claims_index AS
SELECT distinct user_id, min(date_service) AS Min_Date
FROM user.temp_pharmacy_claims
GROUP BY 1
(3) Created a final table (to get the desired cohort):
CREATE TABLE user.temp_pharmacy_claims_final_index AS
SELECT a.user_id
FROM user.temp_pharmacy_claims a
LEFT JOIN user.temp_pharmacy_claims_index b
ON a.user_id = b.user_id
WHERE a.date_service < b.Min_Date
However, this gets me 0 results when there should be a few thousand. Is this set up correctly? It's probably not the most efficient approach, but it looks sound to me, so not sure what's going on.
I think you just want a correlated subquery:
select mc.*
from medical_claims mc
where mc.date_service < (select min(pc.date_service)
from pharmacy_claims pc
where pc.user_id = mc.user_id and
pc.prescription = ?
);
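If the first-prescription date is also needed in the output, an equivalent sketch joins against the per-user minimum (using prescription = 'a' from the example above; swap in whatever value you need):
select mc.*, fp.first_rx_date
from medical_claims mc
join (select user_id, min(date_service) as first_rx_date
      from pharmacy_claims
      where prescription = 'a'
      group by user_id
     ) fp
  on fp.user_id = mc.user_id
where mc.date_service < fp.first_rx_date;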

Adding in missing dates from results in SQL

I have a database that currently looks like this
Date     | valid_entry | profile
1/6/2015 | 1           | 1
3/6/2015 | 2           | 1
3/6/2015 | 2           | 2
5/6/2015 | 4           | 4
I am trying to grab the dates, but I need the query to also display dates that do not exist in the list, such as 2/6/2015.
This is a sample of what i need it to be:
Date     | valid_entry
1/6/2015 | 1
2/6/2015 | 0
3/6/2015 | 2
3/6/2015 | 2
4/6/2015 | 0
5/6/2015 | 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query will only display the dates that exist in the table. Is there a way I can populate the results with the dates that do not exist in it?
You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
select dt::date
from generate_series( (select min(date) from some_table), (select max(date) from some_table), interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry,0))
from all_dates ad
left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want to group by date and profile, otherwise there couldn't be two rows with 2015-06-03. You also don't seem to want where profile = 1, because that too wouldn't produce two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
You have to use a calendar table for this purpose. In this case you can create an in-line table with the dates required, then LEFT JOIN your table to it:
select t.d, count(valid_entry)
from (
SELECT '2015-06-01' AS d UNION ALL SELECT '2015-06-02' UNION ALL SELECT '2015-06-03' UNION ALL
SELECT '2015-06-04' UNION ALL SELECT '2015-06-05' UNION ALL SELECT '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
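To make that concrete, here is a minimal two-date sketch (hypothetical literals; depending on the RDBMS the strings may need an explicit cast to date). Only the position of db.profile = 1 differs:
-- filter in the ON clause: 2015-06-02 still appears, with a count of 0
select t.d, count(db.valid_entry)
from (SELECT '2015-06-01' AS d UNION ALL SELECT '2015-06-02') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
-- filter in the WHERE clause: db.profile is NULL for 2015-06-02, that row fails
-- the predicate, and the LEFT JOIN effectively becomes an INNER JOIN
select t.d, count(db.valid_entry)
from (SELECT '2015-06-01' AS d UNION ALL SELECT '2015-06-02') AS t
left join database AS db on t.d = db."date"
where db.profile = 1
group by t.d;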

Select rows in one table, adding column where MAX(Date) of rows in other, related table

I have a table containing a set of tasks to perform:
Task
ID Name
1 Washing Up
2 Hoovering
3 Dusting
The user can add one or more Notes to a Note table. Each note is associated with a task:
Note
ID ID_Task Completed(%) Date
11 1 25 05/07/2013 14:00
12 1 50 05/07/2013 14:30
13 1 75 05/07/2013 15:00
14 3 20 05/07/2013 16:00
15 3 60 05/07/2013 17:30
I want a query that will select the Task ID, Name and its % complete, which should be zero if there aren't any notes for it. The query should return:
ID Name Completed (%)
1 Washing Up 75
2 Hoovering 0
3 Dusting 60
I've really been struggling with the query for this, which I've read is a "greatest n per group" type problem, of which there are many examples on SO, none of which I can apply to my case (or at least fully understand). My intuition was to start by finding the MAX(Date) for each task in the note table:
SELECT ID_Task,
MAX(Date) AS Date
FROM
Note
GROUP BY
ID_Task
Annoyingly, I can't just add "Complete %" to the above query unless it's contained in a GROUP clause. Argh! I'm not sure how to jump through this hoop in order to somehow get the task table rows with the column appended to it. Here is my pathetic attempt, which fails as it only returns tasks with notes and then duplicates task records at that (one for each note, so it's a complete fail).
SELECT Task.ID,
Task.Name,
Note.Complete
FROM
Task
JOIN
(SELECT ID_Task,
MAX(Date) AS Date
FROM
Note
GROUP BY
ID_Task) AS InnerNote
ON
Task.ID = InnerNote.ID_Task
JOIN
Note
ON
Task.ID = Note.ID_Task
Can anyone help me please?
If we assume that tasks only become more complete, you can do this with a left outer join and aggregation:
select t.ID, t.Name, coalesce(max(n.complete), 0)
from tasks t left outer join
notes n
on t.id = n.id_task
group by t.id, t.name
If tasks can become "less complete" then you want the one with the last date. For this, you can use row_number():
select t.ID, t.Name, coalesce(n.complete, 0)
from tasks t left outer join
(select n.*, row_number() over (partition by id_task order by date desc) as seqnum
from notes n
) n
on t.id = n.id_task and n.seqnum = 1;
In this case, you don't need a group by, because the seqnum = 1 performs the same role.
How about this: just get the max of completed and group by task ID.
SELECT n.ID, n.`Name`, MAX(t.completed) AS completed
FROM `note` t RIGHT JOIN `task` n on ( t.ID_Task = n.ID )
GROUP BY n.ID, n.`Name`
OR
SELECT n.ID, n.`Name`,
(CASE WHEN MAX(t.completed) IS NULL THEN 0 ELSE MAX(t.completed) END) AS completed
FROM `note` t RIGHT JOIN `task` n on ( t.ID_Task = n.ID )
GROUP BY n.ID, n.`Name`
select a.ID,
a.Name,
isnull((select completed
from Note
where ID_Task = b.ID_Task
and Date = b.date),0)
from Task a
LEFT OUTER JOIN (select ID_Task,
max(date) date
from Note
group by ID_Task) b
ON a.ID = b.ID_Task;