The nearest row in the other table - sql

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have the customer's email, first and last name, and the date of the transaction.
The second table comes from a second system and contains all users, their sensitive data, and the time they were registered in our system.
Simplified Structure:
Email | InstertDate (varchar)
My task is to count the difference in minutes between the rows inserted from sales (first table) and the rows with users and their sensitive data.
The issue is that the second table contains many rows per user, and I want to find the row in the 2nd table that is nearest in time, because sometimes the difference is a few minutes (a delay, or the opposite of a delay) and sometimes it can be a few days.
So for x email I have row in 1st table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows in the 2nd table, and the latest is the one I want to compute the difference against:
Email InstertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote the following query, but I have no idea how to match the nearest row in the 2nd table:
SELECT DISTINCT TOP (100)
a.[E_MAIL]
,a.[NAME]
,a.[TRAN_DATETIME]
,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
,(ABS(CONVERT(INT, CONVERT(Datetime,LEFT(a.[TRAN_DATETIME],10),120))) - CONVERT(INT, CONVERT(Datetime,LEFT(b.[INSERTDATE],10),120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL] = b.[EMAIL]

Totally untested: I'd need sample data and a database to try this on. The suspect areas are the casting of the dates and the date math, since I don't know which RDBMS and version this is. Treat the following as pseudocode.
We assign a row number ordered by the absolute difference in seconds between the two dates; the rows with a row number of 1 win.
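WITH CTE AS (
SELECT A.*, B.*, row_number() over (
-- one partition per sale, so each sale keeps its own nearest row
PARTITION BY A.[E_MAIL], A.[TRAN_DATETIME]
-- smallest absolute difference first; datetime2 (SQL Server) accepts the 7 fractional digits in the sample
ORDER BY abs(datediff(second, cast(A.[TRAN_DATETIME] as datetime2), cast(B.[INSERTDATE] as datetime2)))) RN
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b
on a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1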
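The row-number idea above can be tried end to end with a small sketch. It is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, the table names sales and sensitive and the masked e-mail value are made up, the timestamp is stored without its fractional part, and SQLite's strftime('%s', ...) stands in for DATEDIFF. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory stand-ins for the two tables from the question (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (e_mail TEXT, name TEXT, tran_datetime TEXT);
CREATE TABLE sensitive (email TEXT, insertdate TEXT);
INSERT INTO sales VALUES ('p@example.eu', 'xxx xxx', '2021-10-04 00:03:09');
INSERT INTO sensitive VALUES
  ('p@example.eu', '2021-05-20 19:12:07'),
  ('p@example.eu', '2021-05-20 19:18:48'),
  ('p@example.eu', '2021-10-03 18:32:30');
""")

# Rank every candidate row per sale by absolute time distance, keep rank 1.
nearest = conn.execute("""
WITH diffs AS (
  SELECT a.e_mail, a.tran_datetime, b.insertdate,
         CAST(strftime('%s', a.tran_datetime) AS INTEGER)
           - CAST(strftime('%s', b.insertdate) AS INTEGER) AS diff_seconds
  FROM sales a
  LEFT JOIN sensitive b ON a.e_mail = b.email
), numbered AS (
  SELECT *, ROW_NUMBER() OVER (
              PARTITION BY e_mail, tran_datetime
              ORDER BY ABS(diff_seconds)) AS rn
  FROM diffs
)
SELECT e_mail, tran_datetime, insertdate, diff_seconds / 60 AS diff_minutes
FROM numbered
WHERE rn = 1
""").fetchall()

print(nearest)
```

For the sample rows this picks the 2021-10-03 18:32:30 registration, which is 330 whole minutes before the sale. A positive difference means the sale came after the registration row; a negative one means it came before.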

Related

SQL query with grouping and MAX

I have a table that looks like the following but also has more columns that are not needed for this instance.
ID DATE Random
-- -------- ---------
1 4/12/2015 2
2 4/15/2015 2
3 3/12/2015 2
4 9/16/2015 3
5 1/12/2015 3
6 2/12/2015 3
ID is the primary key
Random is a foreign key, but I am not actually using the table it points to.
I am trying to design a query that groups the results by Random, selects the MAX Date within each group, and then gives me the associated ID.
If I run the following query
select top 100 ID, Random, MAX(Date) from DateBase group by Random, Date, ID
I get duplicate Randoms since ID is the primary key and will always be unique.
The results I need would look something like this
ID DATE Random
-- -------- ---------
2 4/15/2015 2
4 9/16/2015 3
Another question: there could be times when many rows share the same date. What will MAX do in that case?
You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE s.random = t.random
AND s.date > t.date)
This selects only the rows for which no row with a later date exists for the same random value.
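As a quick check of the NOT EXISTS approach, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database via Python's sqlite3 module. The dates are rewritten as ISO strings so that they compare correctly as text; that storage choice and the use of SQLite are assumptions of the sketch, not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YourTable (id INTEGER PRIMARY KEY, date TEXT, random INTEGER);
INSERT INTO YourTable VALUES
  (1, '2015-04-12', 2), (2, '2015-04-15', 2), (3, '2015-03-12', 2),
  (4, '2015-09-16', 3), (5, '2015-01-12', 3), (6, '2015-02-12', 3);
""")

# Keep a row only if no other row with the same random has a later date.
latest_rows = conn.execute("""
SELECT t.id, t.date, t.random
FROM YourTable t
WHERE NOT EXISTS (SELECT 1 FROM YourTable s
                  WHERE s.random = t.random
                    AND s.date > t.date)
ORDER BY t.random
""").fetchall()

print(latest_rows)
```

This returns IDs 2 and 4, matching the result the question asks for. If two rows of the same random shared the latest date, both would be returned.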
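It can also be done using IN() on databases that support row-value comparisons (for example PostgreSQL, MySQL or Oracle; SQL Server does not):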
SELECT * FROM YourTable t
WHERE (t.random,t.date) in (SELECT s.random,max(s.date)
FROM YourTable s
GROUP BY s.random)
Or with a join:
SELECT t.* FROM YourTable t
INNER JOIN (SELECT s.random,max(s.date) as max_date
FROM YourTable s
GROUP BY s.random) tt
ON(t.date = tt.max_date and tt.random = t.random)
In SQL Server you could do something like the following,
select a.* from DateBase a inner join
(select Random,
MAX(dt) as dt from DateBase group by Random) as x
on a.dt =x.dt and a.random = x.random
This method will work in all versions of SQL as there are no vendor specifics (you'll need to format the dates using your vendor specific syntax)
You can do this in two stages:
The first step is to work out the max date for each random:
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
Now you can join back onto your table to get the max ID for each combination:
SELECT MAX(e.ID) AS ID
,e.DateField AS DateField
,e.Random
FROM Example AS e
INNER JOIN (
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
) data
ON data.MaxDateField = e.DateField
AND data.Random = e.Random
GROUP BY e.DateField, e.Random
SQL Fiddle example here: SQL Fiddle
To answer your second question:
If there are multiples of the same date, the MAX(e.ID) will simply choose the highest number. If you want the lowest, you can use MIN(e.ID) instead.
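To see the tie-breaking behaviour, the two-stage query can be run against the sample data with one extra, made-up row (ID 7) that shares the latest date for Random 3. This is a sketch on SQLite via Python's sqlite3 module, with dates stored as ISO strings; both are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Example (ID INTEGER PRIMARY KEY, DateField TEXT, Random INTEGER);
INSERT INTO Example VALUES
  (1, '2015-04-12', 2), (2, '2015-04-15', 2), (3, '2015-03-12', 2),
  (4, '2015-09-16', 3), (5, '2015-01-12', 3), (6, '2015-02-12', 3),
  (7, '2015-09-16', 3);  -- hypothetical tie with ID 4
""")

# Stage 1 (inner query): latest date per Random.
# Stage 2 (outer query): join back and let MAX(ID) break ties.
winners = conn.execute("""
SELECT MAX(e.ID) AS ID, e.DateField, e.Random
FROM Example AS e
INNER JOIN (
    SELECT MAX(DateField) AS MaxDateField, Random
    FROM Example
    GROUP BY Random
) data
  ON data.MaxDateField = e.DateField
 AND data.Random = e.Random
GROUP BY e.DateField, e.Random
ORDER BY e.Random
""").fetchall()

print(winners)
```

With the tie in place, Random 3 resolves to ID 7; swapping MAX(e.ID) for MIN(e.ID) would return ID 4 instead.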

Adding in missing dates from results in SQL

I have a database that currently looks like this
Date | valid_entry | profile
1/6/2015 | 1 | 1
3/6/2015 | 2 | 1
3/6/2015 | 2 | 2
5/6/2015 | 4 | 4
I am trying to grab the dates, but I need the query to also display dates that do not exist in the list, such as 2/6/2015.
This is a sample of what i need it to be:
Date | valid_entry
1/6/2015 1
2/6/2015 0
3/6/2015 2
3/6/2015 2
4/6/2015 0
5/6/2015 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query only displays the dates that exist in the table. Is there a way to populate the results with dates that do not exist there?
You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
select dt::date
from generate_series( (select min(date) from some_table), (select max(date) from some_table), interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry,0))
from all_dates ad
left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want group by date and profile, otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1 because that as well wouldn't generate two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
You have to use a calendar table for this purpose. In this case you can create an in-line table with the dates required, then LEFT JOIN your table to it:
select t.d, count(db.valid_entry)
from (
SELECT '2015-06-01' AS d UNION ALL SELECT '2015-06-02' UNION ALL SELECT '2015-06-03' UNION ALL
SELECT '2015-06-04' UNION ALL SELECT '2015-06-05' UNION ALL SELECT '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
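The effect of putting the predicate in ON versus WHERE can be checked with a small sketch. It assumes SQLite via Python's sqlite3 module, so generate_series() is swapped for a recursive CTE that builds the calendar; the column name entry_date and the ISO date strings are also choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_table (entry_date TEXT, valid_entry INTEGER, profile INTEGER);
INSERT INTO some_table VALUES
  ('2015-06-01', 1, 1), ('2015-06-03', 2, 1),
  ('2015-06-03', 2, 2), ('2015-06-05', 4, 4);
""")
first_day, last_day = conn.execute(
    "SELECT MIN(entry_date), MAX(entry_date) FROM some_table").fetchone()

# Recursive CTE producing one row per day between the two bound parameters.
CALENDAR = """
WITH RECURSIVE all_dates(d) AS (
  SELECT ?
  UNION ALL
  SELECT date(d, '+1 day') FROM all_dates WHERE d < ?
)
"""

# Predicate in the ON clause: days without a profile-1 row survive with 0.
filter_in_on = conn.execute(CALENDAR + """
SELECT ad.d, COALESCE(SUM(st.valid_entry), 0)
FROM all_dates ad
LEFT JOIN some_table st ON st.entry_date = ad.d AND st.profile = 1
GROUP BY ad.d
ORDER BY ad.d
""", (first_day, last_day)).fetchall()

# Predicate in the WHERE clause: unmatched days are filtered out again.
filter_in_where = conn.execute(CALENDAR + """
SELECT ad.d, COALESCE(SUM(st.valid_entry), 0)
FROM all_dates ad
LEFT JOIN some_table st ON st.entry_date = ad.d
WHERE st.profile = 1
GROUP BY ad.d
ORDER BY ad.d
""", (first_day, last_day)).fetchall()

print(filter_in_on)
print(filter_in_where)
```

The first query returns all five days from 2015-06-01 to 2015-06-05, with 0 for the days that have no profile-1 entry. The second returns only the two days that actually have a profile-1 row, which is the INNER JOIN behaviour the note warns about.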

Eliminate NULL records in distinct select statement

In SQL SERVER 2008
Relation : Employee
empid clock-in clock-out date Cmpid
1 10 11 17-06-2015 001
1 11 12 17-06-2015 NULL
1 12 1 NULL 001
2 10 11 NULL 002
2 11 12 NULL 002
I need to populate table temp :
insert into temp
select distinct empid,date from employee
This gives all 3 records since they are distinct, but what I need is
empid date CMPID
1 17-06-2015 001
2 NULL 002
Depending on the size and scope of your table, it might just be more prudent to add
WHERE columnName is not null AND columnName2 is not null to the end of your query.
NULL is different from any other date value. If you want to exclude the null records, you have to add an AND condition such as table.field is not null.
It sounds like what you want is a result table containing a row or tuple (relational databases don't have records) for every employee, with a date column showing the date on which they worked, or null if they didn't work. Right?
Something like this should do you:
select master.empid, detail.date
from ( select distinct
empid
from employee
) master
left join employee detail on detail.empid = master.empid
and detail.date is not null
The master virtual table gives you the set of distinct employees; the detail table gives you employees with non-null dates on which they worked. The left join gives you everything from master with any matches from detail blended in.
Rows in master with no matching rows in detail are returned once, with the contributing columns from detail set to null. Rows in master with matching rows in detail are repeated once for each such match, with the detail columns reflecting the matching row's values.
This will give you the lowest date or null for each empid
SELECT empid,
MIN(date) date,
MIN(cmpid) cmpid
FROM employee
GROUP BY empid
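The GROUP BY version works because aggregate functions such as MIN skip NULLs and return NULL only when every value in the group is NULL. A minimal sketch with the question's sample rows, run on SQLite via Python's sqlite3 module (the underscore column names and ISO date strings are assumptions of the sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (empid INTEGER, clock_in INTEGER, clock_out INTEGER,
                       date TEXT, cmpid TEXT);
INSERT INTO employee VALUES
  (1, 10, 11, '2015-06-17', '001'),
  (1, 11, 12, '2015-06-17', NULL),
  (1, 12,  1, NULL,         '001'),
  (2, 10, 11, NULL,         '002'),
  (2, 11, 12, NULL,         '002');
""")

# MIN ignores NULLs, so each employee collapses to one row;
# a group whose dates are all NULL yields NULL.
per_employee = conn.execute("""
SELECT empid, MIN(date) AS date, MIN(cmpid) AS cmpid
FROM employee
GROUP BY empid
ORDER BY empid
""").fetchall()

print(per_employee)
```

Employee 1 comes back with the 17 June date and company 001, and employee 2 with a NULL date (None in Python) and company 002, which is the shape requested in the question.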
try this
select distinct empid,date from employee where date is not null

Troubleshooting SQL Query

I have a Patient activity table that records every activity of the patient right from the time the patient got admitted to the hospital till the patient got discharged. Here is the table command
Create table activity
( activityid int PRIMARY KEY NOT NULL,
calendarid int,
admissionID int,
activitydescription varchar(100),
admitTime datetime,
dischargetime datetime,
foreign key (admissionID) references admission(admissionID)
)
The data looks like this:
activityID calendarid admissionID activitydescription admitTime dischargeTime
1 100 10 Patient Admitted 1/1/2013 10:15 -1
2 100 10 Activity 1 -1 -1
3 100 10 Activity 2 -1 -1
4 100 10 Patient Discharged -1 1/4/2013 13:15
For every calendarID defined, the set of admissionid repeats. For a given calendarid, the admissionsid(s) are unique. For my analysis, I want to write a query to display admissionid, calendarid, admitTime and dischargetime.
select admissionId, calendarid, admitTime=
(select distinct admitTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid),
dischargeTime=
(select distinct dischargeTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid)
from activity a
where calendarid=100
When I individually assign numbers, it works, otherwise it comes up with this message:
Subquery returned more than 1 value.
What am I doing wrong?
DISTINCT does not return 1 row, it returns all distinct rows given the columns you provided in the select clause. That's why you're getting more than one value back from the subquery.
What are you looking for out of the sub-query? If you use TOP 1 instead of DISTINCT, that should work, but it might not be what you're looking for.
Your error message tells a lot. Obviously, one (or both) of your projection subqueries (the SELECT DISTINCT queries) returns more than one value. Thus, the columns admitTime and dischargeTime, respectively, cannot be assigned a single value from the result.
One possibility would be to limit your subqueries to 1 row. However, this error might also indicate a structural problem in your DB design.
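A small sketch makes the cause visible. It assumes SQLite via Python's sqlite3 module, stores the times as text, and keeps the question's -1 placeholder as the string '-1'. The second query is a swapped-in alternative (conditional aggregation with MAX), not something proposed in the original answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (activityid INTEGER PRIMARY KEY, calendarid INTEGER,
                       admissionID INTEGER, activitydescription TEXT,
                       admitTime TEXT, dischargeTime TEXT);
INSERT INTO activity VALUES
  (1, 100, 10, 'Patient Admitted',   '2013-01-01 10:15', '-1'),
  (2, 100, 10, 'Activity 1',         '-1',               '-1'),
  (3, 100, 10, 'Activity 2',         '-1',               '-1'),
  (4, 100, 10, 'Patient Discharged', '-1',               '2013-01-04 13:15');
""")

# DISTINCT still yields two rows: the real time and the placeholder.
distinct_admit = conn.execute("""
SELECT DISTINCT admitTime FROM activity
WHERE admissionID = 10 AND calendarid = 100
ORDER BY admitTime
""").fetchall()

# Alternative: one row per admission, ignoring the placeholder values.
summary = conn.execute("""
SELECT admissionID, calendarid,
       MAX(CASE WHEN admitTime <> '-1' THEN admitTime END)         AS admitTime,
       MAX(CASE WHEN dischargeTime <> '-1' THEN dischargeTime END) AS dischargeTime
FROM activity
WHERE calendarid = 100
GROUP BY admissionID, calendarid
""").fetchall()

print(distinct_admit)
print(summary)
```

The DISTINCT subquery returns both '-1' and the real admit time, which is why SQL Server rejects it as a scalar subquery. The grouped query returns a single row per admission with both times filled in.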
Try:
select top 1 admitTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid
or, on databases that use LIMIT instead of TOP (such as MySQL or PostgreSQL):
select admitTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid
limit 1
This should get you what you want, with a bit less of a performance hit than subqueries:
select distinct a1.admissionId
,a1.calendarid
,a2.admitTime
,a3.dischargeTime
from activity a1
left join activity a2
on a1.calendarid = a2.calendarid
and a1.admissionID = a2.admissionID
and a2.admitTime <> -1
left join activity a3
on a1.calendarid = a3.calendarid
and a1.admissionID = a3.admissionID
and a3.dischargeTime <> -1
where a1.calendarid=100
Try this !
select admissionId, calendarid, admitTime=
(select top(1) admitTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid),
dischargeTime=
(select top(1) dischargeTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid)
from activity a
where calendarid=100

How to loop through a table and look for adjacent rows with identical values in one field and update another column conditionally in SQL?

I have a table that has a field called 'group_quartile', which uses the SQL NTILE() function to calculate which quartile each customer lies in on the basis of their activity scores. However, using this NTILE() function I find there are some customers who have the same activity score but are in different quartiles. I need to modify the 'group_quartile' column so that all customers with the same activity score lie in the same group_quartile.
A view of the table values :
Customer_id Product Activity_Score Group_Quartile
CH002 T 2328 1
CR001 T 268 1
CN001 T 178 1
MS006 T 45 2
ST001 T 21 2
CH001 T 0 2
CX001 T 0 3
KH001 T 0 3
MH002 T 0 4
SJ003 T 0 4
CN001 S 439 1
AC002 S 177 1
SC001 S 91 2
PV001 S 69 3
TS001 S 0 4
I used a CTE expression but it did not work.
My query only updates(from the above example) :
CX001 T 0 3
modified to
CX001 T 0 2
So only the first repeating activity score is checked and that row’s group_quartile is updated to 2.
I need to update all the below rows as well.
CX001 T 0 3
KH001 T 0 3
MH002 T 0 4
SJ003 T 0 4
I cannot use DENSE_RANK() instead of quartiles to segregate the records, as arranging the customers per product in approximately 4 quartiles is a business requirement.
From my understanding I need to loop through the table:
Find a row which has the same activity score and the same product as its predecessor but a different group_quartile.
Update the selected row's group_quartile to its predecessor's quartile value.
Then loop through the updated table again to look for any row matching the above condition, and update that row similarly.
The loop continues until all rows with the same activity score (for the same product) are in the same group_quartile.
--
THIS IS THE TABLE STRUCTURE I AM WORKING ON:
CREATE TABLE #custs
(
customer_id NVARCHAR(50),
PRODUCT NVARCHAR(50),
ACTIVITYSCORE INT,
GROUP_QUARTILE INT,
RANKED int,
rownum int
)
INSERT INTO #custs
-- adding a column to give row numbers(unique id) for each row
SELECT customer_id, PRODUCT, ACTIVITYSCORE,GROUP_QUARTILE,RANKED,
Row_Number() OVER(partition by product ORDER BY activityscore desc) N
FROM
-- rows derived form a parent table based on 'segmentation' column value
(SELECT customer_id, PRODUCT, ACTIVITYSCORE,
DENSE_RANK() OVER (PARTITION BY PRODUCT ORDER BY ACTIVITYSCORE DESC) AS RANKED,
NTILE(4) OVER(PARTITION BY PRODUCT ORDER BY ACTIVITYSCORE DESC) AS GROUP_QUARTILE
FROM #parent_score_table WHERE (SEGMENTATION = 'Large')
) as temp
ORDER BY PRODUCT
The method I used to achieve this partially is as follows :
-- The query finds the rows which have the same activity score as the previous row but a different GROUP_QUARTILE value.
-- I need to use a query to update this row.
-- Next, find any rows in this newly updated table that have the same activity score as the previous row but a different GROUP_QUARTILE value.
-- Continue to update the table in the above manner until all rows with the same activity score have been updated to have the same quartile value
I managed to find only the rows which have the same activity score as the previous row but a different GROUP_QUARTILE value, but I cannot loop through to find new rows that may match this updated row.
select t1.customer_id,t1.ACTIVITYSCORE,t1.PRODUCT, t1.RANKED, t1.GROUP_QUARTILE, t2.GROUP_QUARTILE as modified_quartile
from #custs t1, #custs t2
where (
t1.rownum = t2.rownum + 1
and t1.ACTIVITYSCORE = t2.ACTIVITYSCORE
and t1.PRODUCT = t2.PRODUCT
and not(t1.GROUP_QUARTILE = t2.GROUP_QUARTILE))
Can anyone help with what should be the t-sql statement for the above?
Cheers!
Assuming you've already worked out a basic Group_Quartile as indicated above, you can update the table with a query similar to the following:
update a
set Group_Quartile = coalesce(topq.Group_Quartile, a.Group_Quartile)
from activityScores a
outer apply
(
select top 1 Group_Quartile
from activityScores topq
where a.Product = topq.Product
and a.Activity_Score = topq.Activity_Score
order by Group_Quartile
) topq
SQL Fiddle with demo.
Edit after comment:
I think you did a lot of the work already by getting the Group_Quartile working.
For each row in the table, the statement above will join another row to it using the outer apply statement. Only one row will be joined back to the original table due to the top 1 clause.
So for each row, we are returning one more row. The extra row will be matched on Product and Activity_Score, and will be the row with the lowest Group_Quartile (order by Group_Quartile). Finally, we update the original row with this lowest Group_Quartile value so each row with the same Product and Activity_Score will now have the same, lowest possible Group_Quartile.
So SJ003, MH002, etc will all be matched to CH001 and be updated with the Group_Quartile value of CH001, i.e. 2.
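The same set-based idea can be reproduced outside SQL Server as a rough sketch. The version below assumes SQLite via Python's sqlite3 module, where OUTER APPLY is not available, so a correlated MIN() subquery stands in for the "top 1 ... order by Group_Quartile" lookup. The data is the sample from the question.

```python
import sqlite3

rows = [
    ("CH002", "T", 2328, 1), ("CR001", "T", 268, 1), ("CN001", "T", 178, 1),
    ("MS006", "T", 45, 2), ("ST001", "T", 21, 2), ("CH001", "T", 0, 2),
    ("CX001", "T", 0, 3), ("KH001", "T", 0, 3), ("MH002", "T", 0, 4),
    ("SJ003", "T", 0, 4), ("CN001", "S", 439, 1), ("AC002", "S", 177, 1),
    ("SC001", "S", 91, 2), ("PV001", "S", 69, 3), ("TS001", "S", 0, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE activityScores (
    Customer_id TEXT, Product TEXT, Activity_Score INTEGER, Group_Quartile INTEGER)""")
conn.executemany("INSERT INTO activityScores VALUES (?, ?, ?, ?)", rows)

# One statement, no loop: every row takes the lowest quartile found among
# rows with the same product and the same activity score.
conn.execute("""
UPDATE activityScores
SET Group_Quartile = (
    SELECT MIN(s.Group_Quartile)
    FROM activityScores s
    WHERE s.Product = activityScores.Product
      AND s.Activity_Score = activityScores.Activity_Score)
""")

zero_score_t = conn.execute("""
SELECT Customer_id, Group_Quartile FROM activityScores
WHERE Product = 'T' AND Activity_Score = 0
ORDER BY Customer_id
""").fetchall()
quartiles_s = [q for (q,) in conn.execute("""
SELECT Group_Quartile FROM activityScores
WHERE Product = 'S' ORDER BY Activity_Score DESC
""")]

print(zero_score_t)
print(quartiles_s)
```

After the update, all five product T customers with a score of 0 sit in quartile 2, while product S, which has no tied scores, keeps its original quartiles.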
It's hard to explain code! Another thing that might help is looking at the join without the update statement:
select a.*
, TopCustomer_id = topq.Customer_Id
, NewGroup_Quartile = topq.Group_Quartile
from activityScores a
outer apply
(
select top 1 *
from activityScores topq
where a.Product = topq.Product
and a.Activity_Score = topq.Activity_Score
order by Group_Quartile
) topq
SQL Fiddle without update.