Merging two rows of data and getting distinct count

Merging two rows of data and getting distinct count - sql

This question is in continuation to Merging every two rows of data in a column in SQL Server
My eventtable structure..
Id UserId EventId EventDateTime
1 1 A 18-06-2013 10:36
2 1 B 18-06-2013 10:40
3 1 C 18-06-2013 10:46
4 1 D 18-06-2013 10:50
5 1 A 18-06-2013 13:36
From the earlier question I got data in the following format..
UserId EventStart EventEnd
1 A B
1 B C
Now I would like to get the count of Unique 'EventStart' and 'EventEnd' and filter them by UserId and Date/s
The Report format is
EventStart EventEnd Count
A B 5
B C 3
I know that i could use the data from the previous Question query and store it in a table and try as suggested here
But it would be great if I could get the data straight from the 'eventtable' to the report format with the 'UserId' and 'Date' filters
Any help is sincerely appreciated..
Thanks..

If you just want to count the occurrences, use group by in essentially the same query:
select eventstart, eventend, count(*)
from (select userId, eventid as eventstart,
(select top 1 t2.eventid
from mytable t2
where t2.userid = t.userid and
t2.id > t.id
order by t2.id
) as eventend
from mytable t
) t
where eventend is not null
group by eventstart, eventend;

Related

How to retrieve historical data based on condition on one row?

I have a table historical_data
ID
Date
column_a
column_b
1
2011-10-01
a
a1
1
2011-11-01
w
w1
1
2011-09-01
a
a1
2
2011-01-12
q
q1
2
2011-02-01
d
d1
3
2011-11-01
s
s1
I need to retrieve the whole history of an id based on the date condition on any 1 row related to that ID.
date>='2011-11-01' should get me
ID
Date
column_a
column_b
1
2011-10-01
a
a1
1
2011-11-01
w
w1
1
2011-09-01
a
a1
3
2011-11-01
s
s1
I am aware you can get this by using a CTE or a subquery like
with selected_id as (
select id from historical_data where date>='2011-11-01'
)
select hd.* from historical_data hd
inner join selected_id si on hd.id = si.id
or
select * from historical_data
where id in (select id from historical_data where date>='2011-11-01')
In both these methods I have to query/scan the table ``historical_data``` twice.
I have indexes on both id and date so it's not a problem right now, but as the table grows this may cause issues.
The table above is a sample table, the table I have is about to touch 1TB in size with upwards of 600M rows.
Is there any way to achieve this by only querying the table once? (I am using Snowflake)

Using QUALIFY:
SELECT *
FROM historical_data
QUALIFY MAX(date) OVER(PARTITION BY id) >= '2011-11-01'::DATE;

How to fill missing dates by ID in a table in sql

I have table A which has Dates and EMPID eg below
date EMPID
8/06/19 1
8/07/19 1
8/08/19 1
8/09/19 1
8/07/19 2
8/09/19 2
8/12/19 2
I also have Table B which has a date range
date
...
8/05/19
8/06/19
8/07/19
8/08/19
8/09/19
8/10/19
8/11/19
8/12/19
8/13/19
...
My table A has missing dates and EMPID.
How can I merge the two tables to have the following table.
Date EMPID
8/05/19 1
8/06/19 1
8/07/19 1
8/08/19 1
8/09/19 1
8/10/19 1
8/11/19 1
8/12/19 1
8/13/19 1
8/05/19 2
8/06/19 2
8/07/19 2
8/08/19 2
8/09/19 2
8/10/19 2
8/11/19 2
8/12/19 2
8/13/19 2
Thanks in advance.
This is being used in a dataset(SQL) in SSRS.
P.S. I'm new to coding in SQL environment, My background is in ABAP

You can cross join the distinct empid coming from a with dates coming from b, as follows:
select b.date, a.empid
from (select distinct empid from a) a
cross join b
Or if you are looking to insert "missing" dates in a, then you can use the insert ... select syntax with a not exists condition:
insert into a (date, empid)
select b.date, a.empid
from (select distinct empid from a) a
cross join b
where not exists (select 1 from a a1 where a1.empid = a.empid and a1.date = b.date)

How do I join two tables in SQL limiting the second table data depending on the first one?

I have two tables looking something like this:
IMP DATE CAT
A 03/03/2016 1
B 04/04/2016 1
C 09/09/2016 2
D 01/01/2017 1
E 02/02/2017 1
F 03/03/2017 2
G 04/04/2017 2
===================
EXP DATE CAT
H 01/01/2016 1
I 05/05/2016 1
J 07/07/2016 2
K 11/11/2016 2
L 01/01/2017 1
M 03/03/2017 1
N 04/04/2017 2
O 05/05/2017 2
I want to join the first table to the second one but limit the lines joined from the second table by the latest date on the first table (per category).
The result I'm looking for would be every row in both tables except Item "M" (because Cat 1 in Table 1 has a latest date of February) and Item "O" (because Cat 2 in Table 1 has a latest date of April).
I've tried conditionalds within a where clause in the 2nd table but haven't got far.
Is there a simple way to do this? Any help is appreciated. I'm using SQL Server 2008 by the way.

Your description of the problem is specifically about using join. That suggests a query like this:
select . . .
from (select t1.*, max(t1.date) over (partition by t1.cat) as maxdate
from table1 t1
) t1 join
table2 t2
on t1.cat = t2.cat and t2.date <= t1.maxdate;

Desire output and its format is still not clear.
are you looking for this ?
;With CTE as
(
select *, row_number()over(partition by cat order by DATEs desc) rn
from #table1
)
--select * from cte
--where rn=1
select * from cte t1
left join #table2 t2
on t1.CAT=t2.CAT and t2.DATEs<=t1.DATEs
where rn=1

Join table by id and nearest date for every date

I have 2 tables:
TABLE 1
id date_measured value 1
1 01/01/2017 5
1 02/20/2017 6
1 04/01/2017 5
2 03/02/2017 5
2 04/02/2017 3
TABLE 2
id date_measured value 2
1 01/06/2017 5
1 03/01/2017 6
2 02/01/2017 5
2 03/09/2017 7
2 04/05/2017 4
I want to join it such that each id matches and the closest date matches so:
id date_measured1 value 1 date_measured2 value 2
1 01/01/2017 5 01/06/2017 5
1 02/20/2017 6 03/01/2017 6
2 02/01/2017 5 02/01/2017 5
2 03/02/2017 5 03/09/2017 7
2 04/02/2017 3 04/05/2017 4
etc. IE for each id for each date measured take the closest measured date in the other table and make it a row. Something closeish to
SELECT *
FROM table1 a
INNER JOIN table2 b
ON a.id = b.id
AND <date from a is closest date from b>
But I have no idea how to do the second part. Any suggestions?

In standard SQL, you can get the date using a correlated subquery:
select t1.*,
(select t2.date_measured
from table2 t2
where t2.id = t1.id
order by abs(t2.date_measured - t1.date_measured) asc
fetch first 1 row only
) as t2_date_measured
from table1 t1;
You can then join back to table2 to get additional information from that row.
The above is generic SQL (not necessarily standard SQL). Date/time functions tend to be peculiar to each database; so - may not work for the difference. Not all databases support fetch first 1 row only, but almost all support some mechanism for doing the same thing.

If you have window functions use ROW_NUMBER():
SQL DEMO I use postgresql so date function may vary on your rdbms
WITH cte as (
SELECT *,
t1.id as t1_id,
t1.date_measured as t1_date,
t1.value1,
t2.id as t2_id,
t2.date_measured as t2_date,
t2.value2,
date_part('day', age(t1.date_measured, t2.date_measured)) as days,
ROW_NUMBER() OVER (PARTITION BY t1.id, t1.date_measured
ORDER BY abs(date_part('day', age(t1.date_measured, t2.date_measured)))
) as rn
FROM table1 t1
JOIN table2 t2
ON t1.id = t2.id
)
SELECT *
FROM cte
WHERE rn = 1
ORDER BY t1_id, t1_date

MaxMin Function within Select Statement SQL 2012

I am having a few issues making a MAX function work within the select statement See example data below:
Table 1 Table 2
Visit_ID Car_ID Move_ID Visit_ID MoveStartDate MoveEndDate
A 1 1 A 25/07/2016 27/07/2016
B 2 2 A 28/07/2016 28/07/2016
C 1 3 B 19/07/2016 22/07/2016
D 3 4 D 28/06/2016 30/06/2016
I would like my select statement to pick the min start time and Max start time based on the Visit_ID so I would be expecting:
Result
Visit_ID Car_ID StartDate EndDate
A 1 25/07/2016 28/07/2016
B 2 19/07/2016 22/07/2016
So far I have tried I already have Inner Joins in my select statement:
,(MAX (EndDate) WHERE Visit.Visit_ID = Move.Visit_ID) AS End Date
I have looked at some other queries with a second select statement within the select so you end up with something like:
Select Visit_ID, Car_ID ,(Select MAX(EndDate) FULL OUTER JOIN Table 2 ON Table 1.Visit_ID = Table 2.Visit_ID Group By Table 1.Visit_ID) AS End Date
Hope I have provided enough info currently stumped.

If you also want Car_ID = 3 in the result:
select t1.Visit_ID, t1.Car_ID, MIN(MoveStartDate), MAX(MoveEndDate)
from table1 t1
join table2 t2 on t1.Visit_ID = t2.Visit_ID
group by t1.Visit_ID, t1.Car_ID
Returns:
SQL>select t1.Visit_ID, t1.Car_ID, MIN(MoveStartDate), MAX(MoveEndDate)
SQL&from table1 t1
SQL& join table2 t2 on t1.Visit_ID = t2.Visit_ID
SQL&group by t1.Visit_ID, t1.Car_ID;
visit_id car_id
======== =========== ==================== ====================
A 1 25/07/2016 28/07/2016
B 2 19/07/2016 22/07/2016
D 3 28/06/2016 30/06/2016
3 rows found

I did not check it but your can try this
WITH cte
AS
(select Move_ID,Visit_ID,min(MoveStartDate) AS mMS,MAX(MoveEndDate) AS mME
FROM Table_2
GROUP BY Move_ID,Visit_ID)
SELECT c.Move_ID,c.Visit_ID,T1.Car_ID,c.mMS,c.mME
FROM Table_1 as T1 JOIN cte as C
ON c.Visit_ID=T1.Visit_ID

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Merging two rows of data and getting distinct count - sql

Related

How to retrieve historical data based on condition on one row?

How to fill missing dates by ID in a table in sql

How do I join two tables in SQL limiting the second table data depending on the first one?

Join table by id and nearest date for every date

MaxMin Function within Select Statement SQL 2012

Categories

Resources