SQL: create a new field session_number given the value of another field

I am having trouble approaching the following task.
Given a table like this:
| user_id | hit_id | new_session |
|---------------|--------------|--------------|
| 1 | 1 | 0 |
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 1 | 4 | 0 |
| ... | ... | ... |
| 5 | 19 | 0 |
where
- the combination of user_id and hit_id is unique
- new_session is a boolean that indicates whether the hit started a new session for this particular user
I want to create a new column, session_number, that splits the hit_ids into sessions according to these rules:
- the first row for each user_id, once ordered by hit_id asc, gets a value of 1 for the new column session_number
- as long as new_session is 0, the value of session_number stays the same
- when new_session is 1, session_number increases by 1 over the current session count
- the logic works over a partition by user_id ordered by hit_id asc, so the session count resets whenever the user_id changes
I have created a db-fiddle with some example data.
The expected output for user_id = 1 (which covers multiple corner cases) would be:
| user_id | hit_id | new_session | session_number |
|---------------|--------------|--------------|----------------|
| 1 | 1 | 0 | 1 |
| 1 | 2 | 0 | 1 |
| 1 | 3 | 1 | 2 |
| 1 | 4 | 0 | 2 |
| 1 | 5 | 0 | 2 |
| 1 | 6 | 1 | 3 |
| 1 | 7 | 0 | 3 |
| 1 | 8 | 1 | 4 |
| 1 | 9 | 1 | 5 |
I have tried combinations of lag(), rank(), and dense_rank(), but I always find a corner case that breaks the attempt. I am also quite sure there is a very easy approach that I am overlooking.

You can use a cumulative sum:
    select pv.*,
           (1 + sum(new_session) over (partition by user_id order by hit_id)) as session_number
    from pageviews pv;
Here is a db-fiddle.
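To check the cumulative sum against the expected output, here is a minimal sketch that runs the same query through Python's sqlite3 module. It assumes a SQLite build with window function support (3.25 or later). The sample rows are hypothetical: they mirror the user_id = 1 table from the question (taking the last hit_id to be 9, since the pair must be unique) and add a second user to show the reset.

```python
import sqlite3

# Hypothetical sample data: (user_id, hit_id, new_session).
hits = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 0), (1, 5, 0),
    (1, 6, 1), (1, 7, 0), (1, 8, 1), (1, 9, 1),
    (2, 1, 0), (2, 2, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pageviews (user_id INTEGER, hit_id INTEGER, new_session INTEGER)"
)
conn.executemany("INSERT INTO pageviews VALUES (?, ?, ?)", hits)

# Running total of new_session per user, shifted by 1 so sessions start at 1.
query = """
SELECT pv.user_id, pv.hit_id, pv.new_session,
       1 + SUM(new_session) OVER (PARTITION BY user_id ORDER BY hit_id) AS session_number
FROM pageviews pv
ORDER BY user_id, hit_id
"""
session_numbers = [(u, h, s) for u, h, _, s in conn.execute(query)]
for row in session_numbers:
    print(row)
```

One caveat worth keeping in mind: if a user's very first hit had new_session = 1, this expression would start that user at 2 rather than 1, so the first-row rule only holds when each user's first hit carries a 0.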

Related

Count rows in table that are the same in a sequence

I have a table that looks like this
+----+------------+------+
| ID | Session_ID | Type |
+----+------------+------+
| 1 | 1 | 2 |
| 2 | 1 | 4 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |
| 6 | 3 | 2 |
| 7 | 3 | 1 |
+----+------------+------+
And I would like to count all occurrences of a type that appear in a consecutive sequence.
The output should look something like this:
+------------+------+-----+
| Session_ID | Type | cnt |
+------------+------+-----+
| 1 | 2 | 1 |
| 1 | 4 | 1 |
| 1 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 1 |
| 3 | 1 | 1 |
+------------+------+-----+
A simple GROUP BY like
    SELECT session_id, type, COUNT(type)
    FROM table
    GROUP BY session_id, type
doesn't work, since I need to group only rows that are "touching".
Is this possible with a plain SQL SELECT, or will I need some sort of procedural code, such as a stored procedure or application-side logic?
UPDATE (what counts as a sequence):
If the following row (ordered by ID) has the same type, it should be counted as part of the same sequence.
The sequence is determined by the ID within a Session_ID, since I only want to group rows with the same Session_ID.
So if there are 3 rows in one session, where
row 1 has type 1,
row 2 has type 1,
and row 3 has type 2
Input:
+----+------------+------+
| ID | Session_ID | Type |
+----+------------+------+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 2 |
+----+------------+------+
The sequence is row 1 to row 2. These three rows should produce:
Output:
+------------+------+-------+
| Session_ID | Type | count |
+------------+------+-------+
| 1 | 1 | 2 |
| 1 | 2 | 1 |
+------------+------+-------+
You can use the difference between id and row_number() to identify each run of consecutive rows, and then perform your count per run:
    ;with cte as
    (
        select *, id - row_number() over (partition by session_id, type order by id) as grp
        from table
    )
    select session_id, type, count(*) as cnt
    from cte
    group by session_id, type, grp
    order by max(id)
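As a quick check of the id minus row_number() idea, the sketch below runs the query on the question's first sample table using Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name events is a hypothetical stand-in, since table is a reserved word. Note that the trick relies on ID values being consecutive: a gap in the IDs would split a run that should be counted together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, session_id INTEGER, type INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 1, 4), (3, 1, 2), (4, 2, 2), (5, 2, 2), (6, 3, 2), (7, 3, 1)],
)

# Rows in the same run of (session_id, type) share the same id - row_number value,
# so grouping by that difference counts each run separately.
query = """
WITH cte AS (
    SELECT id, session_id, type,
           id - ROW_NUMBER() OVER (PARTITION BY session_id, type ORDER BY id) AS grp
    FROM events
)
SELECT session_id, type, COUNT(*) AS cnt
FROM cte
GROUP BY session_id, type, grp
ORDER BY MAX(id)
"""
runs = conn.execute(query).fetchall()
for run in runs:
    print(run)
```

If the IDs can have gaps, a common variation is to subtract two row_number() values instead (one partitioned by session_id, one by session_id and type), which does not depend on the IDs themselves.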

Semi-transposing a table in Oracle

I am having trouble semi-transposing the table below based on the 'LENGTH' column. I am using an Oracle database, sample data:
+-----------+-----------+--------+------+
| PERSON_ID | PERIOD_ID | LENGTH | FLAG |
+-----------+-----------+--------+------+
| 1 | 1 | 4 | 1 |
| 1 | 2 | 3 | 0 |
| 2 | 1 | 4 | 1 |
+-----------+-----------+--------+------+
I would like to lengthen this table based on the LENGTH column, repeating each row once for every value from 1 up to LENGTH.
See the desired output table below:
+-----------+-----------+--------+------+
| PERSON_ID | PERIOD_ID | NUMBER | FLAG |
+-----------+-----------+--------+------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 1 |
| 1 | 1 | 3 | 1 |
| 1 | 1 | 4 | 1 |
| 1 | 2 | 1 | 0 |
| 1 | 2 | 2 | 0 |
| 1 | 2 | 3 | 0 |
| 2 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 2 | 1 | 3 | 1 |
| 2 | 1 | 4 | 1 |
+-----------+-----------+--------+------+
I typically work in Postgres, so Oracle is new to me.
I've found some solutions using the CONNECT BY clause, but they seem overly complicated, particularly when compared to the simple generate_series() function from Postgres.
A recursive CTE that subtracts 1 from length until 1 is reached should work. (The same approach works in Postgres too, BTW, should you need something cross-platform, although Postgres requires WITH RECURSIVE instead of plain WITH.)
    WITH cte (person_id,
              period_id,
              number_,
              flag)
    AS
    (
        SELECT person_id,
               period_id,
               length number_,
               flag
        FROM elbat
        UNION ALL
        SELECT person_id,
               period_id,
               number_ - 1 number_,
               flag
        FROM cte
        WHERE number_ > 1
    )
    SELECT *
    FROM cte
    ORDER BY person_id,
             period_id,
             number_;
db<>fiddle
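For anyone who wants to try the countdown logic without an Oracle instance, here is a small sketch that runs an equivalent recursive CTE in SQLite through Python's sqlite3 module. This is a stand-in for illustration, not the Oracle syntax itself: the query is written with WITH RECURSIVE, and the length column is quoted purely as a precaution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE elbat (person_id INTEGER, period_id INTEGER, "length" INTEGER, flag INTEGER)'
)
conn.executemany(
    "INSERT INTO elbat VALUES (?, ?, ?, ?)",
    [(1, 1, 4, 1), (1, 2, 3, 0), (2, 1, 4, 1)],
)

query = """
WITH RECURSIVE cte (person_id, period_id, number_, flag) AS (
    -- anchor member: every source row starts at its full length
    SELECT person_id, period_id, "length", flag FROM elbat
    UNION ALL
    -- recursive member: count down by 1 until 1 is reached
    SELECT person_id, period_id, number_ - 1, flag FROM cte WHERE number_ > 1
)
SELECT person_id, period_id, number_, flag
FROM cte
ORDER BY person_id, period_id, number_
"""
expanded = conn.execute(query).fetchall()
for row in expanded:
    print(row)
```

With the three sample rows (lengths 4, 3, and 4) this yields 11 rows, matching the desired output table in the question.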

SQL generate unique ID from rolling ID

I've been trying to find an answer to this for the better part of a day with no luck.
I have a SQL table with measurement data for samples and I need a way to assign a unique ID to each sample. Right now each sample has an ID number that rolls over frequently. What I need is a unique ID for each sample. Below is a table with a simplified dataset, as well as an example of a possible UID that would do what I need.
| Row | Time | Meas# | Sample# | UID (Desired) |
| 1 | 09:00 | 1 | 1 | 1 |
| 2 | 09:01 | 2 | 1 | 1 |
| 3 | 09:02 | 3 | 1 | 1 |
| 4 | 09:07 | 1 | 2 | 2 |
| 5 | 09:08 | 2 | 2 | 2 |
| 6 | 09:09 | 3 | 2 | 2 |
| 7 | 09:24 | 1 | 3 | 3 |
| 8 | 09:25 | 2 | 3 | 3 |
| 9 | 09:25 | 3 | 3 | 3 |
| 10 | 09:47 | 1 | 1 | 4 |
| 11 | 09:47 | 2 | 1 | 4 |
| 12 | 09:49 | 3 | 1 | 4 |
My problem is that rows 10-12 have the same Sample# as rows 1-3. I need a way to uniquely identify and group each sample. Having the row number or time of the first measurement on the sample would be good.
One other complication is that the measurement number doesn't always start with 1. It's based on measurement locations, and sometimes it skips location 1 and only has locations 2 and 3.
I am going to speculate that you want a unique number assigned to each sample, where now you have repeats.
If so, you can use lag() and a cumulative sum:
    select t.*,
           sum(case when prev_sample = sample then 0 else 1 end) over (order by row) as new_sample_number
    from (select t.*,
                 lag(sample) over (order by row) as prev_sample
          from t
         ) t;
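The following sketch replays the lag() plus cumulative sum approach on the question's twelve rows, using Python's sqlite3 module (SQLite 3.25 or later for window functions). The table name measurements and the column name row_no are hypothetical choices made here to avoid keyword clashes; only the Row and Sample# values come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (row_no INTEGER, sample INTEGER)")
samples = [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1]  # Sample# column for rows 1..12
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    list(enumerate(samples, start=1)),
)

# A row starts a new sample whenever its sample number differs from the previous
# row's (or there is no previous row); the running sum of those flags is the UID.
query = """
SELECT row_no, sample,
       SUM(CASE WHEN prev_sample = sample THEN 0 ELSE 1 END)
           OVER (ORDER BY row_no) AS new_sample_number
FROM (SELECT row_no, sample,
             LAG(sample) OVER (ORDER BY row_no) AS prev_sample
      FROM measurements) AS m
ORDER BY row_no
"""
uids = [uid for _, _, uid in conn.execute(query)]
print(uids)
```

Because the comparison only looks at Sample#, it does not matter whether a sample's measurement numbers start at 1 or skip a location. It does assume that two consecutive samples never share the same Sample#.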

How to select if similar field count is the maximum in the table?

I want to select groups from a table only when every row in the group has the required value in another column.
For example:
| user_id | team_id | isOk |
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 5 | 2 | 1 |
| 6 | 2 | 1 |
| 7 | 2 | 1 |
| 8 | 3 | 1 |
| 9 | 3 | 1 |
| 10 | 3 | 1 |
| 11 | 3 | 0 |
So I want to select teams 1 and 2, because all of their rows have the value 1 in the isOk column.
I tried this query:
    SELECT Team
    FROM _Table1
    WHERE isOk= 1
    GROUP BY Team
    HAVING COUNT(*) > 3
But with this I still have to hard-code a row count, which may or may not be the maximum for a given team.
Thanks in advance.
Is this what you are looking for?
    select team
    from _table1
    group by team
    having min(isOk) = 1;
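To see why min(isOk) = 1 is enough, here is a short sketch on the question's sample data, run through Python's sqlite3 module. It assumes isOk is stored as an integer 0 or 1 and uses the team_id column name from the sample table rather than Team.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _table1 (user_id INTEGER, team_id INTEGER, isOk INTEGER)")
members = [
    (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 1),
    (5, 2, 1), (6, 2, 1), (7, 2, 1),
    (8, 3, 1), (9, 3, 1), (10, 3, 1), (11, 3, 0),
]
conn.executemany("INSERT INTO _table1 VALUES (?, ?, ?)", members)

# If the smallest isOk in a team is 1, no row in that team can be 0,
# so no hard-coded row count is needed.
query = """
SELECT team_id
FROM _table1
GROUP BY team_id
HAVING MIN(isOk) = 1
ORDER BY team_id
"""
ok_teams = [team for (team,) in conn.execute(query)]
print(ok_teams)
```

Team 3 drops out because user 11 has isOk = 0, regardless of how many members each team has.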

Marking records with 1 on first occurrence of unique value

I have a table that I'd like to add a column to that shows a 1 on the first occurrence of a given value for the record within the dataset.
So, for example, if I was using the ID field as where to look for unique occurrences, I'd want a "FirstOccur" column (like the one below) putting a 1 on the first occurrence of a unique ID value in the dataset and just ignoring (leaving as null) any other occurrence:
| ID | FirstOccur |
|------|--------------|
| 1 | 1 |
| 1 | |
| 1 | |
| 2 | 1 |
| 2 | |
| 3 | 1 |
| 4 | 1 |
| 4 | |
I have a working 2-step approach that first applies some ranking sql that will give me something like this:
| ID | FirstOccur |
|------|--------------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 4 | 1 |
| 4 | 2 |
...and then I just apply some update SQL to null any value above 1 to get the desired result.
I was just wondering if there is a simpler, one-hit approach.
Assuming you have a creation date or auto incremented id or something that specifies the ordering, you can do:
    update t
    set firstoccur = 1
    where creationdate = (select min(creationdate)
                          from t as t2
                          where t2.id = t.id
                         );
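Here is a minimal sketch of that correlated update, run in SQLite through Python's sqlite3 module. The creationdate values are invented for illustration; only the ID pattern comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, creationdate TEXT, firstoccur INTEGER)")
rows = [  # hypothetical creation dates, one per record
    (1, "2020-01-01"), (1, "2020-01-02"), (1, "2020-01-03"),
    (2, "2020-01-04"), (2, "2020-01-05"),
    (3, "2020-01-06"),
    (4, "2020-01-07"), (4, "2020-01-08"),
]
conn.executemany("INSERT INTO t (id, creationdate) VALUES (?, ?)", rows)

# Mark only the earliest row per id; all other rows keep firstoccur = NULL.
conn.execute("""
UPDATE t
SET firstoccur = 1
WHERE creationdate = (SELECT MIN(t2.creationdate)
                      FROM t AS t2
                      WHERE t2.id = t.id)
""")
marked = conn.execute(
    "SELECT id, firstoccur FROM t ORDER BY id, creationdate"
).fetchall()
for row in marked:
    print(row)
```

This relies on the ordering column being unique within each id. If two rows for the same id shared the earliest creationdate, both would be marked with 1.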