Marking records with 1 on first occurence of unique value - sql

I have a table that I'd like to add a column to that shows a 1 on the first occurrence of a given value for the record within the dataset.
So, for example, if I was using the ID field as where to look for unique occurrences, I'd want a "FirstOccur" column (like the one below) putting a 1 on the first occurrence of a unique ID value in the dataset and just ignoring (leaving as null) any other occurrence:
| ID | FirstOccur |
|------|--------------|
| 1 | 1 |
| 1 | |
| 1 | |
| 2 | 1 |
| 2 | |
| 3 | 1 |
| 4 | 1 |
| 4 | |
I have a working 2-step approach that first applies some ranking sql that will give me something like this:
| ID | FirstOccur |
|------|--------------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 4 | 1 |
| 4 | 2 |
..and I just apply some update SQL to null any value above 1 to get the desired result.
I was just wondering if there was a (simpler) one-hit approach.

Assuming you have a creation date or auto incremented id or something that specifies the ordering, you can do:
update t
set firstoccur = 1
where creationdate = (select min(creationdate)
from t as t2
where t2.id = t.id
);

Related

SQL create a new field sessions given the value of another field

I have problems approaching the following task.
Given a table like
| user_id | hit_id | new_session |
|---------------|--------------|--------------|
| 1 | 1 | 0 |
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 1 | 4 | 0 |
| ... | ... | ... |
| 5 | 19 | 0 |
where
the combination of user_id and hit_id is unique
new_session is a boolean that determines if the hit started a new session or not for this particular user
I want to create a new column, session_number that splits hit_ids into sessions, taking into account that:
the first row for each user_id, once ordered by hit_id asc gets a value of 1 for the new column session_number
as long as new_session is 0, the value of session_number stays the same
when new_session is 1, I have to sum up 1 to the actual session count
the logic works over a partition by user_id ordered by hit_id asc, and therefore once the user_id changes, the session count is reset
I have created a db-fiddle with some example data
The expected output for user_id = 1 (which cover multiple corner cases) would be:
| user_id | hit_id | new_session | session_number |
|---------------|--------------|--------------|----------------|
| 1 | 1 | 0 | 1 |
| 1 | 2 | 0 | 1 |
| 1 | 3 | 1 | 2 |
| 1 | 4 | 0 | 2 |
| 1 | 5 | 0 | 2 |
| 1 | 6 | 1 | 3 |
| 1 | 7 | 0 | 3 |
| 1 | 8 | 1 | 4 |
| 1 | 8 | 1 | 5 |
I have tried with a combination of lag(), rank(), and dense_rank(), but I always find a corner case that makes all the attempts unsuccessful. Additionally, I am totally sure that there is a very easy approach for that that I am not taking into account.
You can use a cumulative sum:
select pv.*,
(1 + sum(new_session) over (partition by user_id order by hit_id)) as session_number
from pageviews pv;
Here is a db-fiddle.

Filter SQL Server data according to its max value

I have one SQL Server 2008 table like:
+------+-------+--------------------------------------+
| id | level | content |
+------+-------+--------------------------------------+
| 1 | 1 | ... |
| 2 | 2 | ... |
| 1 | 2 | ... |
| 1 | 3 | ... |
| 2 | 1 | ... |
| 1 | 4 | ... |
| 3 | 1 | ... |
+------+-------+--------------------------------------+
For every id, it may have three, two or four levels saved in table like above. How can I get the data for every id:
every id has at most three records in final table
if the max level of one id is higher than 3, the three records' level is from max to max-3;
if the max level of one id is equal or less than 3, just keep them as they are.
so the final table which I would like to get is:
+------+-------+--------------------------------------+
| id | level | content |
+------+-------+--------------------------------------+
| 1 | 1 | ... |
| 2 | 2 | ... |
| 1 | 2 | ... |
| 1 | 3 | ... |
| 2 | 1 | ... |
| 3 | 1 | ... |
+------+-------+--------------------------------------+
How can I the lines? Thanks a lot.
I think you want the 3 latest levels per id. If so, you can use window functions like so:
select *
from (
select t.*, row_number() over(partition by id order by level desc) rn
from mytable t
) t
where rn <= 3

Count rows in table that are the same in a sequence

I have a table that looks like this
+----+------------+------+
| ID | Session_ID | Type |
+----+------------+------+
| 1 | 1 | 2 |
| 2 | 1 | 4 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |
| 6 | 3 | 2 |
| 7 | 3 | 1 |
+----+------------+------+
And I would like to count all occurences of a type that are in a sequence.
Output look some how like this:
+------------+------+-----+
| Session_ID | Type | cnt |
+------------+------+-----+
| 1 | 2 | 1 |
| 1 | 4 | 1 |
| 1 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 1 |
| 3 | 1 | 1 |
+------------+------+-----+
A simple group by like
SELECT session_id, type, COUNT(type)
FROM table
GROUP BY session_id, type
doesn't work, since I need to group only rows that are "touching".
Is this possible with a merge sql-select or will I need some sort of coding. Stored Procedure or Application side coding?
UPDATE Sequence:
If the following row has the same type, it should be counted (ordered by ID).
to determine the sequence the ID is the key with the session_ID, since I just want to group rows with the same session_ID.
So if there are 3 rows is in one session
row with the ID 1 has type 1,
and the second row has type 1
and row 3 has type 2
Input:
+----+------------+------+
| ID | Session_ID | Type |
+----+------------+------+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 2 |
+----+------------+------+
The squence is Row 1 to Row 2. This three row should output
Output:
+------------+------+-------+
| Session_ID | Type | count |
+------------+------+-------+
| 1 | 1 | 2 |
| 3 | 2 | 1 |
+------------+------+-------+
You can use a difference of id and row_number() to identify the gaps and then perform your count
;with cte as
(
Select *, id - row_number() over (partition by session_id,type order by id) as grp
from table
)
select session_id,type,count(*) as cnt
from cte
group by session_id,type,grp
order by max(id)

How to count unique occurences of column in Big Query

Given a table such as:
| ID | Value |
|-------------|
| 1 | "some" |
| 1 | "some" |
| 1 | "value"|
| 2 | "some" |
| 3 | "some" |
| 3 | "value |
| 3 | "value |
How can I count the number of unique occurrences of value for each ID?
So you end up with a table such as:
| ID | Value | number |
|-------------|--------|
| 1 | "some" | 2 |
| | "value"| 1 |
| 2 | "some" | 1 |
| 3 | "some" | 1 |
| | "value | 2 |
I attempted to use OVER(PARTITION BY ID order by Value) to separate the table by IDs and count the separate values. However this counts the number of unique occurences, but then adds them together. So I end up with a table such as:
| ID | Value | number |
|-------------|--------|
| 1 | "some" | 2 |
| 1 | "some" | 2 |
| 1 | "value"| 3 |
| 2 | "some" | 1 |
| 3 | "some" | 1 |
| 3 | "value | 3 |
| 3 | "value | 3 |
Is there a way to count the unique values like the second example I gave?
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, value, COUNT(1) number
FROM `project.dataset.table`
GROUP BY id, value
with result
Row id value number
1 1 some 2
2 1 value 1
3 2 some 1
4 3 value 2
5 3 some 1

SQL generate unique ID from rolling ID

I've been trying to find an answer to this for the better part of a day with no luck.
I have a SQL table with measurement data for samples and I need a way to assign a unique ID to each sample. Right now each sample has an ID number that rolls over frequently. What I need is a unique ID for each sample. Below is a table with a simplified dataset, as well as an example of a possible UID that would do what I need.
| Row | Time | Meas# | Sample# | UID (Desired) |
| 1 | 09:00 | 1 | 1 | 1 |
| 2 | 09:01 | 2 | 1 | 1 |
| 3 | 09:02 | 3 | 1 | 1 |
| 4 | 09:07 | 1 | 2 | 2 |
| 5 | 09:08 | 2 | 2 | 2 |
| 6 | 09:09 | 3 | 2 | 2 |
| 7 | 09:24 | 1 | 3 | 3 |
| 8 | 09:25 | 2 | 3 | 3 |
| 9 | 09:25 | 3 | 3 | 3 |
| 10 | 09:47 | 1 | 1 | 4 |
| 11 | 09:47 | 2 | 1 | 4 |
| 12 | 09:49 | 3 | 1 | 4 |
My problem is that rows 10-12 have the same Sample# as rows 1-3. I need a way to uniquely identify and group each sample. Having the row number or time of the first measurement on the sample would be good.
One other complication is that the measurement number doesn't always start with 1. It's based on measurement locations, and sometimes it skips location 1 and only has locations 2 and 3.
I am going to speculate that you want a unique number assigned to each sample, where now you have repeats.
If so, you can use lag() and a cumulative sum:
select t.*,
sum(case when prev_sample = sample then 0 else 1 end) over (order by row) as new_sample_number
from (select t.*,
lag(sample) over (order by row) as prev_sample
from t
) t;