Selecting new distinct values over time (ORACLE SQL)


I want to select new distinct values and track them over time.
I have a table where each row represents a score awarded to a particular person.
- timestamp (when the score was awarded)
- name (which person received the score)
- score (what score the person received)
I want the result to look like:
The table above shows how many new distinct names appear on each day.
Because 6-NOV is the first day, all the names are new, hence 3 new names.
On 7-NOV Michael is the only new name, so the value is 1.
On 8-NOV we have 3 new names (Don, Alex, Tina).
And on 9-NOV no new names appear, as Jimmy and Sara have both been scored before.
Thanks for the help

Consider:
select t.timestamp, count(n.name) as new_names
from (select distinct timestamp from mytable) t
left join (select name, min(timestamp) timestamp from mytable group by name) n
on n.timestamp = t.timestamp
group by t.timestamp
order by t.timestamp
This works by generating a list of distinct timestamps from the table and then left joining it with an aggregate query that computes the first timestamp of each name. The final step is aggregation in the outer query. Counting n.name rather than * makes days without any new names show 0 instead of 1.
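To sanity-check this approach, the sketch below runs it against an in-memory SQLite database through Python's built-in sqlite3 module, used here only as a stand-in for Oracle. The sample rows are hypothetical: they were made up to reproduce the counts described in the question, with an arbitrary year and a made-up third name (Bob) for 6-NOV. The column is renamed `ts` to avoid `timestamp`, which is a type keyword.

```python
import sqlite3

# Hypothetical sample rows reproducing the counts described in the question
# (the year and the name "Bob" are invented for illustration).
ROWS = [
    ("2019-11-06", "Jimmy", 10), ("2019-11-06", "Sara", 7), ("2019-11-06", "Bob", 5),
    ("2019-11-07", "Michael", 8), ("2019-11-07", "Jimmy", 3),
    ("2019-11-08", "Don", 4), ("2019-11-08", "Alex", 6), ("2019-11-08", "Tina", 9),
    ("2019-11-09", "Jimmy", 2), ("2019-11-09", "Sara", 1),
]

def new_names_per_day(conn):
    # Distinct days LEFT JOINed to each name's first day; COUNT(n.name)
    # ignores the NULL row produced for days without first appearances.
    sql = """
        SELECT t.ts, COUNT(n.name) AS new_names
        FROM (SELECT DISTINCT ts FROM mytable) t
        LEFT JOIN (SELECT name, MIN(ts) AS ts FROM mytable GROUP BY name) n
          ON n.ts = t.ts
        GROUP BY t.ts
        ORDER BY t.ts
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ts TEXT, name TEXT, score INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
print(new_names_per_day(conn))
```

Using `COUNT(*)` instead would report 1 for 2019-11-09, because the outer join still produces one all-NULL row for that day.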

Find the minimum timestamp for each name, and then count how many names share each timestamp:
select timestamp, count(*) as new_names from
(select name, min(timestamp) as timestamp from mytable
group by name)
group by timestamp
order by timestamp
To include all days, even those without any new names:
select t.timestamp, nvl(new_names,0) as new_names from
(select timestamp, count(*) as new_names from
(select name, min(timestamp) as timestamp from mytable
group by name)
group by timestamp) c
RIGHT OUTER JOIN (select distinct timestamp from mytable) t
ON c.timestamp = t.timestamp
order by t.timestamp
To include dates that don't appear in the table at all, you need a list of dates from a calendar somewhere; put that table in place of the subquery I RIGHT OUTER JOINed to above.
You can also generate the dates on the fly:
select t.dt, nvl(new_names,0) as new_names from
(select timestamp, count(*) as new_names from
(select name, min(timestamp) as timestamp from mytable
group by name)
group by timestamp) c
RIGHT OUTER JOIN (
SELECT TRUNC (SYSDATE - ROWNUM - 1) dt
FROM DUAL CONNECT BY ROWNUM < 366
) t
ON c.timestamp = t.dt
order by t.dt
But you'd have to adjust the -1 and 366 to cover the date range you want, and it's much more standard to use a calendar table that already exists in your database.
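SQLite has no `CONNECT BY`, so this sketch swaps in a recursive CTE to build the calendar. The idea is the same: generate every date, then outer join the first-appearance counts to it. It is written as a LEFT JOIN from the calendar, which is equivalent to the RIGHT OUTER JOIN above with the tables swapped. The sample rows and the date range are hypothetical; 2019-11-10 has no rows at all, so it only appears because the calendar generates it.

```python
import sqlite3

# Hypothetical sample rows; no one scores on 2019-11-10.
ROWS = [
    ("2019-11-06", "Jimmy"), ("2019-11-06", "Sara"), ("2019-11-06", "Bob"),
    ("2019-11-07", "Michael"),
    ("2019-11-08", "Don"), ("2019-11-08", "Alex"), ("2019-11-08", "Tina"),
    ("2019-11-09", "Jimmy"), ("2019-11-09", "Sara"),
]

SQL = """
WITH RECURSIVE calendar(dt) AS (       -- stand-in for the CONNECT BY ROWNUM trick
    SELECT '2019-11-06'
    UNION ALL
    SELECT date(dt, '+1 day') FROM calendar WHERE dt < '2019-11-10'
),
firsts AS (                            -- first day on which each name appears
    SELECT name, MIN(ts) AS ts FROM mytable GROUP BY name
)
SELECT c.dt, COUNT(f.name) AS new_names
FROM calendar c
LEFT JOIN firsts f ON f.ts = c.dt
GROUP BY c.dt
ORDER BY c.dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ts TEXT, name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", ROWS)
result = conn.execute(SQL).fetchall()
print(result)
```

ISO-formatted date strings are used so that plain string comparison (`dt < '2019-11-10'`) and sorting follow calendar order.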

With MIN() window function:
select tt.firstdate, count(distinct tt.name) "new names"
from (
select t.*, min(timestamp) over (partition by name) firstdate
from tablename t
) tt
group by tt.firstdate
If you also want the dates where there are not any new names:
select t.timestamp, count(distinct tt.name) "new names"
from tablename t
left join (
select t.*, min(timestamp) over (partition by name) firstdate
from tablename t
) tt on tt.firstdate = t.timestamp
group by t.timestamp
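A small check of the window-function version, run through Python's sqlite3 (SQLite 3.25 or later is needed for window functions). The rows are hypothetical and the column is renamed `ts`. It shows why `COUNT(DISTINCT ...)` matters: every row of a day joins to every row of each name whose first day it is, so a plain count would be inflated.

```python
import sqlite3

# Hypothetical sample rows: (day, name). Jimmy and Sara return on later days.
ROWS = [
    ("2019-11-06", "Jimmy"), ("2019-11-06", "Sara"), ("2019-11-06", "Bob"),
    ("2019-11-07", "Michael"), ("2019-11-07", "Jimmy"),
    ("2019-11-08", "Don"), ("2019-11-08", "Alex"), ("2019-11-08", "Tina"),
    ("2019-11-09", "Jimmy"), ("2019-11-09", "Sara"),
]

# MIN() OVER (PARTITION BY name) tags every row with its name's first day
# without collapsing rows; the LEFT JOIN keeps days with no first appearances.
SQL = """
SELECT t.ts, COUNT(DISTINCT tt.name) AS new_names
FROM tablename t
LEFT JOIN (
    SELECT name, MIN(ts) OVER (PARTITION BY name) AS firstdate
    FROM tablename
) tt ON tt.firstdate = t.ts
GROUP BY t.ts
ORDER BY t.ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ts TEXT, name TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", ROWS)
result = conn.execute(SQL).fetchall()
print(result)
```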

To count only first appearances, use row_number() first:
select timestamp, sum(frst) as new_names
from (
select timestamp,
case when row_number()
over (partition by name order by timestamp) = 1
then 1 else 0 end frst
from scores)
group by timestamp
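Here is a minimal sketch of the row_number() flag, run through Python's sqlite3 (SQLite 3.25 or later). The rows are hypothetical and the column is renamed `ts`. Jimmy scores twice on his first day, to show that only one of his rows gets the flag.

```python
import sqlite3

# Hypothetical rows; Jimmy scores twice on his first day.
ROWS = [
    ("2019-11-06", "Jimmy", 10), ("2019-11-06", "Jimmy", 4),
    ("2019-11-06", "Sara", 7), ("2019-11-06", "Bob", 5),
    ("2019-11-07", "Michael", 8), ("2019-11-07", "Jimmy", 3),
    ("2019-11-08", "Don", 4), ("2019-11-08", "Alex", 6), ("2019-11-08", "Tina", 9),
    ("2019-11-09", "Jimmy", 2), ("2019-11-09", "Sara", 1),
]

# Each name's earliest row gets frst = 1 and every later row gets 0;
# summing the flags per day gives the number of new names on that day.
SQL = """
SELECT ts, SUM(frst) AS new_names
FROM (
    SELECT ts,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY name ORDER BY ts) = 1
                THEN 1 ELSE 0 END AS frst
    FROM scores
)
GROUP BY ts
ORDER BY ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ts TEXT, name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", ROWS)
result = conn.execute(SQL).fetchall()
print(result)
```

Because every day that has rows also has flags, a day with only returning names (2019-11-09 here) comes back as 0 without an extra join.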

Yet another option is to right join the distinct timestamps to the minimum timestamp of each name. This way, unmatched timestamps are also returned, with zero in the new_names column:
SELECT NVL(t1.timestamp,t2.timestamp) AS timestamp,
SUM(NVL2(t1.timestamp,1,0)) AS new_names
FROM (SELECT name, MIN(timestamp) AS timestamp from t group by name) t1
RIGHT JOIN (SELECT DISTINCT timestamp FROM t) t2
ON t2.timestamp = t1.timestamp
GROUP BY NVL(t1.timestamp,t2.timestamp)
ORDER BY timestamp

Related

Alternative way of full outer join

I am running this query
select * from
(select name, count(distinct id) as ids, date
from table1
group by name, date ) as tt
full outer join
(select st_name as name,count(distinct id) as ids, date
from table2
group by st_name, date) as ts
on tt.name= ts.name
and tt.ids = ts.ids
It runs successfully but I want to ask if there is an alternative more efficient way to run this query.

I assume that you want the days when the two numbers are not the same (that seems like the most reasonable thing to want from such a query), so this answers that question.
FULL OUTER JOIN should be fine. But an alternative is to try UNION ALL and aggregation:
select name, sum(ids_1) as ids_1, sum(ids_2) as ids_2, date
from ((select name, count(distinct id) as ids_1, NULL as ids_2, date
from table1
group by name, date
)
union all
(select st_name as name, NULL, count(distinct id) as ids_2, date
from table2
group by st_name, date
)
) u
group by name, date
having coalesce(sum(ids_1), 0) <> coalesce(sum(ids_2), 0)
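A sketch of the UNION ALL plus aggregation pattern, run through Python's sqlite3, assuming the goal stated above: names and dates where the two counts differ. The tables and rows are hypothetical, and `date` is renamed `dt`. Missing sides are coalesced to 0, so a name present in only one table still shows up. SQLite does not accept parentheses around the members of a UNION ALL, so they are dropped here.

```python
import sqlite3

# Hypothetical data: ann matches, bob differs, cat only exists in table2.
T1 = [("ann", 1, "d1"), ("ann", 2, "d1"), ("bob", 3, "d1")]
T2 = [("ann", 1, "d1"), ("ann", 2, "d1"), ("bob", 3, "d1"),
      ("bob", 4, "d1"), ("cat", 5, "d1")]

SQL = """
SELECT name, SUM(ids_1) AS ids_1, SUM(ids_2) AS ids_2, dt
FROM (
    -- one branch per table; the other table's count is left NULL
    SELECT name, COUNT(DISTINCT id) AS ids_1, NULL AS ids_2, dt
    FROM table1 GROUP BY name, dt
    UNION ALL
    SELECT st_name, NULL, COUNT(DISTINCT id), dt
    FROM table2 GROUP BY st_name, dt
) u
GROUP BY name, dt
HAVING COALESCE(SUM(ids_1), 0) <> COALESCE(SUM(ids_2), 0)
ORDER BY name, dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, id INTEGER, dt TEXT)")
conn.execute("CREATE TABLE table2 (st_name TEXT, id INTEGER, dt TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", T1)
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", T2)
result = conn.execute(SQL).fetchall()
print(result)
```

Each input table is scanned once and there is no join, which is the usual argument for this form over FULL OUTER JOIN; whether it is actually faster depends on the engine and indexes.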

Counting IDs for correct creation date time

I need to get the number of user IDs for each month, but a user should only be counted in a month if that month is the user's minimum (first) month.
So if customer A had a min(day) of 04/18, they would be counted only for that month and year.
My table looks like:
monthyear | id
02/18 | A32
04/19 | T39
05/19 | T39
04/19 | Y95
01/18 | A32
12/19 | I99
11/18 | OPT
09/19 | TT8
I was doing something like:
SELECT day, id
SUM(CASE WHEN month = min(day) THEN 1 ELSE 0)
FROM testtable
GROUP BY 1
But I'm not sure how to express this per user ID, so that a user ID counts as 1 only when their min(day) equals the day.
The goal table should be:
monthyear | count
01/18 | 1
02/18 | 0
11/18 | 1
04/19 | 2
05/19 | 0
09/19 | 1
12/19 | 1

Use window functions. Let me assume that your monthyear is really yearmonth, so it sorts correctly:
SELECT yearmonth, COUNT(*) as numstarts
FROM (SELECT tt.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY yearmonth) as seqnum
FROM testtable tt
) tt
WHERE seqnum = 1
GROUP BY yearmonth;
If you do have the absurd format of month-year, then you can use string manipulations. These depend on the database, but something like this:
SELECT monthyear, COUNT(*) as numstarts
FROM (SELECT tt.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY RIGHT(monthyear, 2), LEFT(monthyear, 2)) as seqnum
FROM testtable tt
) tt
WHERE seqnum = 1
GROUP BY monthyear;
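A sketch of the month-year variant against the question's own data, run through Python's sqlite3 (SQLite 3.25 or later). SQLite has no `LEFT()`/`RIGHT()`, so `substr()` builds the year-then-month sort key instead; that substitution is the only assumption.

```python
import sqlite3

# The question's rows, in MM/YY form.
ROWS = [("02/18", "A32"), ("04/19", "T39"), ("05/19", "T39"), ("04/19", "Y95"),
        ("01/18", "A32"), ("12/19", "I99"), ("11/18", "OPT"), ("09/19", "TT8")]

# substr(monthyear, 4, 2) is the year, substr(monthyear, 1, 2) the month;
# ordering by year then month puts each id's first month at seqnum = 1.
SQL = """
SELECT monthyear, COUNT(*) AS numstarts
FROM (
    SELECT tt.*,
           ROW_NUMBER() OVER (PARTITION BY id
                              ORDER BY substr(monthyear, 4, 2),
                                       substr(monthyear, 1, 2)) AS seqnum
    FROM testtable tt
) tt
WHERE seqnum = 1
GROUP BY monthyear
ORDER BY substr(monthyear, 4, 2), substr(monthyear, 1, 2)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (monthyear TEXT, id TEXT)")
conn.executemany("INSERT INTO testtable VALUES (?, ?)", ROWS)
result = conn.execute(SQL).fetchall()
print(result)
```

Months in which no one starts (02/18 and 05/19) are absent from this result; getting them as zeros needs an outer join to the list of months.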

I assumed that you have a column that's a date (needed for min() to work correctly). You can select the minimal date (subquery t2) for each id and then count only the rows that match through the left join; where there is no match, you get zero for that date (or monthyear, as in your data).
select
monthyear
,count(t2.id) as cnt
from testtable t1
left join (
select
min(date) as date
,id
from testtable
group by id
) t2
on t2.date = t1.date
and t2.id = t1.id
group by monthyear

You are looking for the number of new users each month, yes?
Here is one way to do it.
Note that I had to use TO_DATE and TO_CHAR to make sure the month/year text strings sorted correctly. If you use real DATE columns that would be unnecessary.
An additional complexity was adding the empty months (months with zero new users). Ideally, that would not be done with a SELECT DISTINCT on the base table to get all months.
create table x (
monthyear varchar2(20),
id varchar2(10)
);
insert into x values('02/18', 'A32');
insert into x values('04/19', 'T39');
insert into x values('05/19', 'T39');
insert into x values('04/19', 'Y95');
insert into x values('01/18', 'A32');
insert into x values('12/19', 'I99');
insert into x values('11/18', 'OPT');
insert into x values('09/19', 'TT8');
And the query:
with allmonths as(
select distinct monthyear from x
),
firstmonths as(
select id, to_char(min(to_date(monthyear, 'MM/YY')),'MM/YY') monthyear from x group by id
),
firstmonthcounts as(
select monthyear, count(*) cnt
from firstmonths group by monthyear
)
select am.monthyear, nvl(fmc.cnt, 0) as newusers
from allmonths am left join firstmonthcounts fmc on am.monthyear = fmc.monthyear
order by to_date(am.monthyear, 'MM/YY');
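The same CTE structure, sketched in Python's sqlite3 against the question's data. SQLite has no `TO_DATE`/`TO_CHAR`, so this version substitutes a sortable `YYMM` string key built with `substr()`. The allmonths, firstmonths and firstmonthcounts steps are otherwise the same, and the result reproduces the goal table, zero months included.

```python
import sqlite3

ROWS = [("02/18", "A32"), ("04/19", "T39"), ("05/19", "T39"), ("04/19", "Y95"),
        ("01/18", "A32"), ("12/19", "I99"), ("11/18", "OPT"), ("09/19", "TT8")]

# yymm = year || month, e.g. '02/18' -> '1802', so MIN() and ORDER BY
# follow calendar order (stand-in for TO_DATE/TO_CHAR).
SQL = """
WITH allmonths AS (
    SELECT DISTINCT monthyear,
           substr(monthyear, 4, 2) || substr(monthyear, 1, 2) AS yymm
    FROM x
),
firstmonths AS (
    SELECT id, MIN(substr(monthyear, 4, 2) || substr(monthyear, 1, 2)) AS yymm
    FROM x
    GROUP BY id
),
firstmonthcounts AS (
    SELECT yymm, COUNT(*) AS cnt FROM firstmonths GROUP BY yymm
)
SELECT am.monthyear, COALESCE(fmc.cnt, 0) AS newusers
FROM allmonths am
LEFT JOIN firstmonthcounts fmc ON am.yymm = fmc.yymm
ORDER BY am.yymm
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (monthyear TEXT, id TEXT)")
conn.executemany("INSERT INTO x VALUES (?, ?)", ROWS)
result = conn.execute(SQL).fetchall()
print(result)
```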

Show entire record from table with minimum timestamp in a group

I have been trying for about three hours to solve this problem but cannot find the solution.
How would I show the entire row (all 20 columns) for the first occurrence (minimum time) of each name in my table?
For example, I would like to do something like this, which does not work:
SELECT name, MIN(time), col1, col2, col3, col4
FROM table
GROUP BY name;

You have to first get the minimum time for each name, and then join back to your original table where the name/time matches.
To get the minimum time:
SELECT name, MIN(time) AS minTime
FROM myTable
GROUP BY name;
Then, get all columns:
SELECT m.*
FROM myTable m
JOIN(
SELECT name, MIN(time) AS minTime
FROM myTable
GROUP BY name) tmp ON tmp.name = m.name AND tmp.minTime = m.time;

Most databases support ANSI standard window functions. With these, you can just do:
select t.*
from (select t.*, row_number() over (partition by name order by time) as seqnum
from table t
) t
where seqnum = 1;

Use MIN() where you cannot GROUP?

I feel pretty dumb, but I'm stuck on an apparently very easy query. I have something like this, where every row is a user that watched a movie:
user_id date duration
1 01-01-01 62m
1 03-01-01 95m
2 02-01-01 58m
2 06-01-01 25m
2 08-01-01 95m
3 03-01-01 96m
Now, what I would like to have is a table where I have the first movie watched by each user and its duration. The problem is if I use MIN() then I have to GROUP both user_id and duration. But if I GROUP for duration as well, then I am basically going to have the same table back. How can I solve the problem?

You can use a ranking function like ROW_NUMBER:
WITH CTE AS
(
SELECT rn = ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date ASC),
user_id, date, duration
FROM dbo.TableName
)
SELECT user_id, date, duration FROM CTE WHERE rn = 1
The advantage of ROW_NUMBER is that you can change the logic easily. For example, if you want to reverse the logic and get the row of the last watched film per user, you just have to change ORDER BY date ASC to ORDER BY date DESC.
The advantage of the CTE (common table expression) is that you can also use it to delete or update these records, which is often done to identify or delete duplicates. You can first select, to see what will be deleted or updated, before you execute it.

Try this query. I haven't tested it.
SELECT user_id, date, duration FROM tablename n
WHERE NOT EXISTS(
SELECT date, user_id FROM tablename g
WHERE n.user_id = g.user_id AND g.date < n.date
);

Assuming there can only be a single record per user per date, it'd be something like this:
select t.*
from table t
inner join (
select user_id, min(date) mindate
from table
group by user_id
) t1
on t.user_id = t1.user_id
and t.date = t1.mindate

You can use ROW_NUMBER(), a ranking function that generates a sequential number for each row within a group, ordered by the column you choose. In this case, if there is a tie, only one record per user is selected; if you want all the tied records, use DENSE_RANK() instead of ROW_NUMBER().
SELECT user_id, date, duration
FROM
(
SELECT user_id, date, duration,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) rn
FROM tableName
) a
WHERE rn = 1
This also assumes that the date column has the DATE data type.
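A sketch of the tie behaviour, run through Python's sqlite3 (SQLite 3.25 or later). It uses the question's data, with dates converted to ISO on the assumption that they are DD-MM-YY and durations as whole minutes, plus one hypothetical row that ties user 1's first date.

```python
import sqlite3

# Question data (dates assumed DD-MM-YY, converted to ISO; durations in
# minutes) plus one hypothetical row tying user 1's first watch date.
ROWS = [
    (1, "2001-01-01", 62), (1, "2001-01-03", 95), (1, "2001-01-01", 40),
    (2, "2001-01-02", 58), (2, "2001-01-06", 25), (2, "2001-01-08", 95),
    (3, "2001-01-03", 96),
]

def first_watches(conn, ranking):
    # Only these function names are accepted, since the name is pasted into the SQL.
    if ranking not in ("ROW_NUMBER", "RANK", "DENSE_RANK"):
        raise ValueError(ranking)
    sql = f"""
        SELECT user_id, watch_date, duration
        FROM (
            SELECT user_id, watch_date, duration,
                   {ranking}() OVER (PARTITION BY user_id ORDER BY watch_date) AS rn
            FROM watched
        ) a
        WHERE rn = 1
        ORDER BY user_id, duration
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watched (user_id INTEGER, watch_date TEXT, duration INTEGER)")
conn.executemany("INSERT INTO watched VALUES (?, ?, ?)", ROWS)
for fn in ("ROW_NUMBER", "RANK", "DENSE_RANK"):
    print(fn, len(first_watches(conn, fn)))
```

ROW_NUMBER() keeps exactly one row per user, and which tied row wins is arbitrary. For the top position, RANK() and DENSE_RANK() both keep every tied row, so either works when you filter on `= 1`.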

If you are using SQL Server 2005 or later, you can use windowing functions.
SELECT *
FROM
(
SELECT user_id, date, duration, MIN(date) OVER(PARTITION BY user_id) AS MIN_DATE
FROM MY_TABLE
) AS RESULTS
WHERE date = MIN_DATE
The OVER clause with PARTITION BY "groups" by user_id and computes the min date per user_id without eliminating any rows. Then you select the rows whose date equals that min date, and you are left with the first date per user_id. This is a common trick once you know about windowing functions.

If you want the first watch_date per user, there should be no date before this date for this user:
SELECT *
FROM watched_movies wm
WHERE NOT EXISTS (
SELECT *
FROM watched_movies nx
WHERE nx.user_id = wm.user_id
AND nx.watch_date < wm.watch_date
);
Note: I replaced the date column by watch_date, since date is a reserved word (type name).
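A quick check of the NOT EXISTS form, run through Python's sqlite3 against the question's rows. As an assumption, the dates are converted to ISO strings (read as DD-MM-YY) so that `<` compares them chronologically, and durations are stored as whole minutes.

```python
import sqlite3

# Question data; dates assumed DD-MM-YY and converted to ISO strings.
ROWS = [
    (1, "2001-01-01", 62), (1, "2001-01-03", 95),
    (2, "2001-01-02", 58), (2, "2001-01-06", 25), (2, "2001-01-08", 95),
    (3, "2001-01-03", 96),
]

# Keep a row only if no earlier row exists for the same user.
SQL = """
SELECT user_id, watch_date, duration
FROM watched_movies wm
WHERE NOT EXISTS (
    SELECT *
    FROM watched_movies nx
    WHERE nx.user_id = wm.user_id
      AND nx.watch_date < wm.watch_date
)
ORDER BY user_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watched_movies (user_id INTEGER, watch_date TEXT, duration INTEGER)")
conn.executemany("INSERT INTO watched_movies VALUES (?, ?, ?)", ROWS)
result = conn.execute(SQL).fetchall()
print(result)
```

If two rows share a user's earliest date, neither is strictly earlier than the other, so both are returned.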

This should give you the duration of the first movie watched on the earliest date. It assumes the table also has a session_id column that orders the sessions within a day:
SELECT a.user_id, b.date, a.duration
FROM table a
INNER JOIN (SELECT user_id, min(date) date FROM table GROUP BY user_id) b ON a.user_id = b.user_id AND a.date = b.date
INNER JOIN (SELECT user_id, date, min(session_id) session_id FROM table GROUP BY user_id, date) c ON b.user_id = c.user_id AND b.date = c.date AND a.session_id = c.session_id

Try this:
WITH TABLE1
AS (SELECT
'1' AS USER_ID,
'01-01-01' AS DT,
62 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'1' AS USER_ID,
'03-01-01' AS DT,
95 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'02-01-01' AS DT,
58 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'06-01-01' AS DT,
25 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'08-01-01' AS DT,
95 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'3' AS USER_ID,
'03-01-01' AS DT,
96 AS DURATION
FROM
DUAL)
SELECT
*
FROM
(SELECT
USER_ID,
DT,
DURATION,
RANK ( ) OVER (PARTITION BY USER_ID ORDER BY DT ASC) AS ROW_RANK
FROM
TABLE1)
WHERE
ROW_RANK = 1

Use a sub-query to get the min date then join that back to the table to get all other relevant columns.
SELECT T2.user_id
,T2.date
,T2.duration
FROM YourTable T2
INNER JOIN
(
SELECT T1.user_id
,MIN(T1.date) as first_date
FROM YourTable T1
GROUP BY T1.user_id
) SQ
ON T2.user_id = sq.user_id
AND T2.date = sq.first_date

how do I query sql for a latest record date for each user

I have a table that is a collection of entries recording when a user was logged on.
username, date, value
--------------------------
brad, 1/2/2010, 1.1
fred, 1/3/2010, 1.0
bob, 8/4/2009, 1.5
brad, 2/2/2010, 1.2
fred, 12/2/2009, 1.3
etc..
How do I create a query that would give me the latest date for each user?
Update: I forgot that I needed to have a value that goes along with the latest date.

This is the simple old school approach that works with almost any db engine, but you have to watch out for duplicates:
select t.username, t.date, t.value
from MyTable t
inner join (
select username, max(date) as MaxDate
from MyTable
group by username
) tm on t.username = tm.username and t.date = tm.MaxDate
Using window functions will avoid any possible issues with duplicate records due to duplicate date values, so if your db engine allows it you can do something like this:
select x.username, x.date, x.value
from (
select username, date, value,
row_number() over (partition by username order by date desc) as _rn
from MyTable
) x
where x._rn = 1
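A sketch of the duplicate issue, run through Python's sqlite3 (SQLite 3.25 or later). It uses the question's data with dates converted to ISO (assuming M/D/YYYY), plus one hypothetical row that ties brad's latest date. The column is renamed `login_date`. The join-back query returns both tied rows; ROW_NUMBER() returns exactly one row per user.

```python
import sqlite3

# Question data (M/D/YYYY assumed, converted to ISO) plus a hypothetical tie for brad.
ROWS = [
    ("brad", "2010-01-02", 1.1), ("fred", "2010-01-03", 1.0),
    ("bob", "2009-08-04", 1.5), ("brad", "2010-02-02", 1.2),
    ("fred", "2009-12-02", 1.3), ("brad", "2010-02-02", 1.4),
]

JOIN_BACK_SQL = """
SELECT t.username, t.login_date, t.value
FROM logins t
INNER JOIN (
    SELECT username, MAX(login_date) AS max_date
    FROM logins
    GROUP BY username
) tm ON t.username = tm.username AND t.login_date = tm.max_date
ORDER BY t.username, t.value
"""

ROW_NUMBER_SQL = """
SELECT x.username, x.login_date, x.value
FROM (
    SELECT username, login_date, value,
           ROW_NUMBER() OVER (PARTITION BY username ORDER BY login_date DESC) AS rn
    FROM logins
) x
WHERE x.rn = 1
ORDER BY x.username
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (username TEXT, login_date TEXT, value REAL)")
conn.executemany("INSERT INTO logins VALUES (?, ?, ?)", ROWS)
join_rows = conn.execute(JOIN_BACK_SQL).fetchall()
rn_rows = conn.execute(ROW_NUMBER_SQL).fetchall()
print(len(join_rows), len(rn_rows))  # 4 3
```

Which of brad's two tied rows ROW_NUMBER() keeps is arbitrary; add a second ORDER BY key (for example `value DESC`) if it must be deterministic.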

Using window functions (works in Oracle, Postgres 8.4, SQL Server 2005, DB2, Sybase, Firebird 3.0, MariaDB 10.3)
select * from (
select
username,
date,
value,
row_number() over(partition by username order by date desc) as rn
from
yourtable
) t
where t.rn = 1

I see that most developers use an inline query without considering its impact on large data sets.
Simply, you can achieve this by:
SELECT a.username, a.date, a.value
FROM myTable a
LEFT OUTER JOIN myTable b
ON a.username = b.username
AND a.date < b.date
WHERE b.username IS NULL
ORDER BY a.date desc;

From my experience the fastest way is to take each row for which there is no newer row in the table.
Another advantage is that the syntax used is very simple, and that the meaning of the query is rather easy to grasp (take all rows such that no newer row exists for the username being considered).
NOT EXISTS
SELECT username, value
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS witness
WHERE witness.username = t.username AND witness.date > t.date
);
ROW_NUMBER
SELECT username, value
FROM (
SELECT username, value, row_number() OVER (PARTITION BY username ORDER BY date DESC) AS rn
FROM t
) t2
WHERE rn = 1
INNER JOIN
SELECT t.username, t.value
FROM t
INNER JOIN (
SELECT username, MAX(date) AS date
FROM t
GROUP BY username
) tm ON t.username = tm.username AND t.date = tm.date;
LEFT OUTER JOIN
SELECT t.username, t.value
FROM t
LEFT OUTER JOIN t AS w ON t.username = w.username AND t.date < w.date
WHERE w.username IS NULL
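A small cross-check, run through Python's sqlite3, that the NOT EXISTS and LEFT OUTER JOIN forms return the same rows on the question's data. The dates are converted to ISO (assuming M/D/YYYY) and the column is renamed `login_date`. Note that the columns must be qualified in the self-join, because `username` exists on both sides.

```python
import sqlite3

# Question data; dates assumed M/D/YYYY and converted to ISO strings.
ROWS = [
    ("brad", "2010-01-02", 1.1), ("fred", "2010-01-03", 1.0),
    ("bob", "2009-08-04", 1.5), ("brad", "2010-02-02", 1.2),
    ("fred", "2009-12-02", 1.3),
]

# Keep rows for which no newer row exists for the same user.
NOT_EXISTS_SQL = """
SELECT username, value
FROM t
WHERE NOT EXISTS (
    SELECT * FROM t AS witness
    WHERE witness.username = t.username AND witness.login_date > t.login_date
)
ORDER BY username
"""

# Same idea as an anti-join: a row with no newer partner gets NULLs for w.
LEFT_JOIN_SQL = """
SELECT t.username, t.value
FROM t
LEFT OUTER JOIN t AS w
  ON t.username = w.username AND t.login_date < w.login_date
WHERE w.username IS NULL
ORDER BY t.username
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (username TEXT, login_date TEXT, value REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
ne_rows = conn.execute(NOT_EXISTS_SQL).fetchall()
lj_rows = conn.execute(LEFT_JOIN_SQL).fetchall()
print(ne_rows == lj_rows, ne_rows)
```

Like the MAX() join-back, both forms return every tied row if a user has two entries on their latest date.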

To get the whole row containing the max date for the user:
select username, date, value
from tablename where (username, date) in (
select username, max(date) as date
from tablename
group by username
)

SELECT *
FROM MyTable T1
WHERE date = (
SELECT max(date)
FROM MyTable T2
WHERE T1.username=T2.username
)

This one should give you the correct result for your edited question.
The sub-query makes sure to find only rows of the latest date, and the outer GROUP BY will take care of ties. When there are two entries for the same date for the same user, it will return the one with the highest value.
SELECT t.username, t.date, MAX( t.value ) value
FROM your_table t
JOIN (
SELECT username, MAX( date ) date
FROM your_table
GROUP BY username
) x ON ( x.username = t.username AND x.date = t.date )
GROUP BY t.username, t.date
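A sketch of this tie-breaking behaviour, run through Python's sqlite3. It uses the question's data (dates converted to ISO, assuming M/D/YYYY, and the column renamed `login_date`) plus one hypothetical extra row that gives brad two entries on his latest date. The outer GROUP BY collapses them and keeps the higher value.

```python
import sqlite3

# Question data plus a hypothetical second entry for brad on 2010-02-02.
ROWS = [
    ("brad", "2010-01-02", 1.1), ("fred", "2010-01-03", 1.0),
    ("bob", "2009-08-04", 1.5), ("brad", "2010-02-02", 1.2),
    ("fred", "2009-12-02", 1.3), ("brad", "2010-02-02", 1.4),
]

# Inner query: latest date per user. Outer GROUP BY: one row per
# (user, latest date), keeping MAX(value) when that date has ties.
SQL = """
SELECT t.username, t.login_date, MAX(t.value) AS value
FROM your_table t
JOIN (
    SELECT username, MAX(login_date) AS max_date
    FROM your_table
    GROUP BY username
) x ON x.username = t.username AND x.max_date = t.login_date
GROUP BY t.username, t.login_date
ORDER BY t.username
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (username TEXT, login_date TEXT, value REAL)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)
result = conn.execute(SQL).fetchall()
print(result)
```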

If your database syntax supports it, TOP 1 WITH TIES can be a lifesaver in combination with ROW_NUMBER.
With the example data you provided, use this query:
SELECT TOP 1 WITH TIES
username, date, value
FROM user_log_in_attempts
ORDER BY ROW_NUMBER() OVER (PARTITION BY username ORDER BY date DESC)
It yields:
username | date | value
-----------------------------
bob | 8/4/2009 | 1.5
brad | 2/2/2010 | 1.2
fred | 1/3/2010 | 1.0
How it works:
ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...): for each username, the rows are numbered from the youngest (row number = 1) to the oldest (row number = high).
ORDER BY ROW_NUMBER() ... sorts the youngest row of each user to the top, followed by the second-youngest row of each user, and so on.
TOP 1 WITH TIES: because each user has a youngest row, those youngest rows are equal with respect to the sort criterion (all have row number = 1), so all of them are returned.
Tested with SQL-Server.

SELECT DISTINCT Username, Dates, value
FROM TableName t
WHERE Dates IN (SELECT MAX(t2.Dates) FROM TableName t2 WHERE t2.Username = t.Username)
Username Dates value
bob 2009-08-04 1.5
brad 2010-02-02 1.2
fred 2010-01-03 1.0

This is similar to one of the answers above, but in my opinion it is a lot simpler and tidier. It also shows a good use for the CROSS APPLY operator. For SQL Server 2005 and above:
select
a.username,
a.date,
a.value
from yourtable a
cross apply (select max(date) 'maxdate' from yourtable a1 where a.username=a1.username) b
where a.date=b.maxdate

You could also use the analytic RANK function:
with temp as
(
select username, date, RANK() over (partition by username order by date desc) as rnk from t
)
select username, date from temp where rnk = 1

SELECT MAX(DATE) AS dates
FROM assignment
JOIN paper_submission_detail ON assignment.PAPER_SUB_ID =
paper_submission_detail.PAPER_SUB_ID

SELECT mt.username, mt.date, mt.value
from MyTable mt
inner join (select username, max(date) date
from MyTable
group by username) sub
on sub.username = mt.username
and sub.date = mt.date
Would address the updated problem. It might not work so well on large tables, even with good indexing.

SELECT *
FROM ReportStatus c
inner join ( SELECT
MAX(Date) AS MaxDate
FROM ReportStatus ) m
on c.date = m.maxdate

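For Oracle, sort the result set in descending order first and then take the first record; this returns the single latest record in the whole table (not one per user). ROWNUM is assigned before ORDER BY is applied, so the sort must happen in a subquery:
select * from (
select * from mytable
order by date desc
)
where rownum = 1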

SELECT t1.username, t1.date, value
FROM MyTable as t1
INNER JOIN (SELECT username, MAX(date) AS date
FROM MyTable
GROUP BY username) as t2 ON t2.username = t1.username AND t2.date = t1.date

Select * from table1 where user=yourUserName and latest_date=(select Max(latest_date) from table1 where user=yourUserName)
The inner query returns the latest date for the given user; the outer query pulls all of that user's rows that match the inner query's result.

I used this approach to get the last record for each user in my table.
It was a query to get the last location of each salesman, based on the most recent time detected on their PDA devices.
CREATE FUNCTION dbo.UsersLocation()
RETURNS TABLE
AS
RETURN
Select GS.UserID, MAX(GS.UTCDateTime) 'LastDate'
From USERGPS GS
where year(GS.UTCDateTime) = YEAR(GETDATE())
Group By GS.UserID
GO
select gs.UserID, sl.LastDate, gs.Latitude , gs.Longitude
from USERGPS gs
inner join [USER] s on gs.SalesManNo = s.SalesmanNo
inner join dbo.UsersLocation() sl on gs.UserID= sl.UserID and gs.UTCDateTime = sl.LastDate
order by LastDate desc

My small compilation:
a self join is better than a nested select,
but group by doesn't give you the primary key, which is preferable for the join;
this key can be obtained with partition by in conjunction with first_value.
So, here is a query:
select
t.*
from
Table t inner join (
select distinct first_value(ID) over(partition by GroupColumn order by DateColumn desc) as ID
from Table
where FilterColumn = 'value'
) j on t.ID = j.ID
Pros:
Filter data with where statement using any column
select any columns from filtered rows
Cons:
Requires MS SQL Server 2012 or later.

I did something similar for my application.
Here is the query:
select distinct i.userId, i.statusCheck, l.userName
from internetstatus as i
inner join login as l on i.userID = l.userID
where i.nowtime = (select max(s.nowtime) from InternetStatus s where s.userID = i.userID);

Here's one way to return only the most recent record for each user in SQL Server:
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date DESC) AS rn
FROM your_table
)
SELECT *
FROM CTE
WHERE rn = 1;
This uses a common table expression (CTE) to assign a unique rn (row number) to each record for each user, based on the user_id and sorted in descending order by date. The final query then selects only the records with rn equal to 1, which represents the most recent record for each user.

SELECT * FROM TABEL1 WHERE DATE= (SELECT MAX(CREATED_DATE) FROM TABEL1)

You would use aggregate function MAX and GROUP BY
SELECT username, MAX(date), value FROM tablename GROUP BY username, value
