SQL Identify Unique User from multiple columns - sql

Is there any way to create a unique identifier column (User) so that if Email, User_ID or Subscription_ID matches between any of the rows, those rows get the same identifier and I can group and aggregate by it later? The unique identifier doesn't have to be incrementing, but I thought that could be a way of implementing it.
Sample data (call it the subscriptions table):
|Email |User_ID | Subscription_ID |
|--------------|-----------|--------------------|
|12#gmail.com |20 | 56 |
|12#gmail.com |30 | 86 |
|13#gmail.com |20 | 96 |
|14#gmail.com |22 | 96 |
|15#gmail.com |80 | 12 |
Desired Result:
|Email |User_ID | Subscription_ID | User |
|--------------|-----------|--------------------|------|
|12#gmail.com |20 | 56 | 1 |
|12#gmail.com |30 | 86 | 1 |
|13#gmail.com |20 | 96 | 1 |
|14#gmail.com |22 | 96 | 1 |
|15#gmail.com |80 | 12 | 2 |
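One way to read the desired result is as connected components: two rows belong to the same user if they share an Email, a User_ID or a Subscription_ID, directly or through a chain of other rows. The sketch below illustrates that reading with a small union-find in Python rather than SQL. The function name `assign_users` and the tuple layout are assumptions made for illustration only.

```python
def assign_users(rows):
    """Label each row with a group number; rows that share any
    column value (directly or transitively) get the same number."""
    parent = list(range(len(rows)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller row index as root so labels follow row order.
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (column position, value) -> first row index with that value
    for idx, row in enumerate(rows):
        for col, value in enumerate(row):
            key = (col, value)  # keyed by column so User_ID 20 != Subscription_ID 20
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    labels = {}
    result = []
    for idx in range(len(rows)):
        root = find(idx)
        labels.setdefault(root, len(labels) + 1)
        result.append(labels[root])
    return result


# (Email, User_ID, Subscription_ID) taken from the sample table
rows = [
    ("12#gmail.com", 20, 56),
    ("12#gmail.com", 30, 86),
    ("13#gmail.com", 20, 96),
    ("14#gmail.com", 22, 96),
    ("15#gmail.com", 80, 12),
]
users = assign_users(rows)
print(users)  # [1, 1, 1, 1, 2]
```

In pure SQL the same idea usually needs a recursive query or an iterative update, because the match can pass through intermediate rows (row 4 only joins the first group via row 3).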

Related

Find how many x-day intervals have passed between an outset date and another date

In SQL, how can I calculate how many x-day intervals have passed since an outset date?
Consider 2023-01-11 as an example for such an "outset date". For any subsequent date, I want to know how many 4-day intervals have passed since the outset date.
For example:
💚 2023-01-13 should return 1, because it's the first 4-day interval.
💙 2023-01-18 should return 2, because it's the second 4-day interval.
💜 2023-02-02 should return 6, because it's the sixth 4-day interval.
## # January 2023
## Su Mo Tu We Th Fr Sa
## -----------------------------------------------------------------------
## |1 |2 |3 |4 |5 |6 |7 |
## | | | | | | | |
## | | | | | | | |
## | | | | | | | |
## | | | | | | | |
## -----------------------------------------------------------------------
## |8 |9 |10 |11 |12 |13 |14 |
## | | | | | | | |
## | | | |*outset* | |💚 | |
## | | | |<<=======|=========|=========|=======>>|
## | | | | | | | |
## -----------------------------------------------------------------------
## |15 |16 |17 |18 |19 |20 |21 |
## | | | | | | | |
## | | | |💙 | | | |
## |<<+++++++|+++++++++|+++++++++|+++++++>>|<<#######|#########|#########|
## | | | | | | | |
## -----------------------------------------------------------------------
## |22 |23 |24 |25 |26 |27 |28 |
## | | | | | | | |
## | | | | | | | |
## |#######>>|<<#######|#########|#########|#######>>|<<*******|*********|
## | | | | | | | |
## -----------------------------------------------------------------------
## |29 |30 |31 |1 |2 |3 |4 |
## | | | | | | | |
## | | | | |💜 | | |
## |*********|*******>>|<<~~~~~~~|~~~~~~~~~|~~~~~~~~~|~~~~~~~>>| |
## | | | | | | | |
## -----------------------------------------------------------------------
So if I have the corresponding SQL table:
CREATE TABLE my_tbl (outset_date DATE, date_of_interest DATE);
INSERT INTO my_tbl (outset_date, date_of_interest)
VALUES ('2023-01-11', '2023-01-13'),
('2023-01-11', '2023-01-18'),
('2023-01-11', '2023-02-02');
How can I write a select statement to get the desired output:
-- desired output
-- +──────────────+───────────────────+─────────────────────────────────+
-- | outset_date | date_of_interest | how_many_intervals_have_passed |
-- +──────────────+───────────────────+─────────────────────────────────+
-- | 2023-01-11 | 2023-01-13 | 1 |
-- | 2023-01-11 | 2023-01-18 | 2 |
-- | 2023-01-11 | 2023-02-02 | 6 |
-- +──────────────+───────────────────+─────────────────────────────────+
If there isn't an "idiomatic" SQL syntax for this, I'd opt for either MySQL or PostgreSQL. Thanks!
To count the difference between two dates in days, you need to subtract the earlier date from the later one, i.e. "2023-01-13" - "2023-01-11" = 2
In your case, you need the number of days between the two dates including both the first and the last date, which means adding 1 day to the difference, i.e. "2023-01-13" - "2023-01-11" + 1 = 3
To get the 4-day interval in which a date lies, add 3 to that inclusive day count and then perform integer division by 4, i.e. for inclusive counts (1, 2, 3, 4) it will be (4/4, 5/4, 6/4, 7/4), which equals 1 for all.
For Postgres try the following:
select *,
(date_of_interest - outset_date + 4) / 4 as expected
from my_tbl
The + 4 here is +1 to calculate the difference between the two dates inclusively as mentioned above, and +3 to perform the integer division.
For MySQL, it will be (datediff(date_of_interest, outset_date) + 4) div 4, where div operator is used to perform the integer division.
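The arithmetic above can be checked outside the database. This is a minimal sketch in Python, assuming the date of interest is never earlier than the outset date; the function name `intervals_passed` is made up for the example.

```python
from datetime import date
import math


def intervals_passed(outset, day, size=4):
    """Number of the size-day interval that `day` falls in, counting the
    outset date as day 1 of interval 1. Assumes day >= outset."""
    diff = (day - outset).days          # plain difference in days
    # +1 makes the count inclusive, +(size - 1) turns floor division into a ceiling
    return (diff + size) // size


outset = date(2023, 1, 11)
for d in (date(2023, 1, 13), date(2023, 1, 18), date(2023, 2, 2)):
    print(d, intervals_passed(outset, d))
# 2023-01-13 1
# 2023-01-18 2
# 2023-02-02 6

# The integer-division form agrees with an explicit ceiling of the inclusive count.
for diff in range(0, 40):
    day = date.fromordinal(outset.toordinal() + diff)
    assert intervals_passed(outset, day) == math.ceil((diff + 1) / 4)
```

The boundary days are worth checking: 2023-01-14 (difference 3) is still interval 1, while 2023-01-15 (difference 4) starts interval 2.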
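The basic solution for MySQL (the + 1 counts the outset date itself, so that a date exactly 4 days after the outset falls in the second interval):
SELECT
outset_date,
date_of_interest,
CEIL((DATEDIFF(date_of_interest, outset_date) + 1) / 4) AS how_many_intervals_have_passed
FROM my_tbl;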
PostgreSQL solution below (again with + 1 so that the outset date itself is counted):
SELECT
outset_date,
date_of_interest,
CEIL((date_of_interest - outset_date + 1)::numeric / 4) AS how_many_intervals_have_passed
FROM my_tbl;

How can I get row index in sql

Table Contents are:
|Emp_ID | Name |
|-------|-------|
|1 | xyz |
|23 | pqq |
|22 | wdd |
|12 | fdv |
Here I want the row index where Emp_ID is greater than 15.
Result should return 2 and 3
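A table has no inherent row order, so a "row index" only exists relative to some ordering. The sketch below assumes insertion order is what is meant and uses SQLite (through Python's sqlite3 module) as a stand-in database; the table name `emp` and the use of SQLite's `rowid` as the ordering column are assumptions, and another database would need its own ordering column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (Emp_ID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "xyz"), (23, "pqq"), (22, "wdd"), (12, "fdv")],
)

# Number every row first, then filter; filtering inside the subquery
# would renumber only the surviving rows.
query = """
SELECT row_index
FROM (
    SELECT Emp_ID,
           ROW_NUMBER() OVER (ORDER BY rowid) AS row_index
    FROM emp
) AS numbered
WHERE Emp_ID > 15
ORDER BY row_index
"""
row_indexes = [r[0] for r in conn.execute(query)]
print(row_indexes)  # [2, 3]
```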

Need a simple query to calculate sequence length in SQL Server

I have this view that represents the status of each user's connections to a system, as in the table below:
---------------------------------------
|id | date | User | Connexion |
|1 | 01/01/2018 | A | 1 |
|2 | 02/01/2018 | A | 0 |
|3 | 03/01/2018 | A | 1 |
|4 | 04/01/2018 | A | 1 |
|5 | 05/01/2018 | A | 0 |
|6 | 06/01/2018 | A | 0 |
|7 | 07/01/2018 | A | 0 |
|8 | 08/01/2018 | A | 1 |
|9 | 09/01/2018 | A | 1 |
|10 | 10/01/2018 | A | 1 |
|11 | 11/01/2018 | A | 1 |
---------------------------------------
The target output is the length of each run of consecutive succeeded or failed connections, ordered by date, so the output would look like this:
---------------------------------------------------------------
|StartDate | EndDate | User | Connexion | Length|
|01/01/2018 | 01/01/2018 | A | 1 | 1 |
|02/01/2018 | 02/01/2018 | A | 0 | 1 |
|03/01/2018 | 04/01/2018 | A | 1 | 2 |
|05/01/2018 | 07/01/2018 | A | 0 | 3 |
|08/01/2018 | 11/01/2018 | A | 1 | 4 |
---------------------------------------------------------------
This is what is called a gaps-and-islands problem. The best solution for your version is a difference of row numbers:
select user, min(date), max(date), connexion, count(*) as length
from (select t.*,
row_number() over (partition by user order by date) as seqnum,
row_number() over (partition by user, connexion order by date) as seqnum_uc
from t
) t
group by user, connexion, (seqnum - seqnum_uc);
Why this works is a little tricky to explain. Generally, I find that if you stare at the results of the subquery, you'll see how the difference is constant for the groups that you care about.
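To see the constant difference for yourself, it helps to print the subquery next to the grouped result. This is a minimal sketch using SQLite through Python's sqlite3 module instead of SQL Server. It assumes the dates are consecutive days in January 2018 stored as ISO strings, and it renames the columns to `usr` and `dt` (hypothetical names) to avoid the keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dt TEXT, usr TEXT, connexion INTEGER)")
flags = [1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1]  # Connexion column of the sample data
conn.executemany(
    "INSERT INTO t VALUES (?, ?, 'A', ?)",
    [(i, f"2018-01-{i:02d}", flag) for i, flag in enumerate(flags, start=1)],
)

inner = """
SELECT usr, dt, connexion,
       ROW_NUMBER() OVER (PARTITION BY usr ORDER BY dt) AS seqnum,
       ROW_NUMBER() OVER (PARTITION BY usr, connexion ORDER BY dt) AS seqnum_uc
FROM t
"""

# Inspect the two row numbers: their difference stays constant inside each run.
for dt, connexion, seqnum, seqnum_uc in conn.execute(
    f"SELECT dt, connexion, seqnum, seqnum_uc FROM ({inner}) AS s ORDER BY dt"
):
    print(dt, connexion, seqnum, seqnum_uc, seqnum - seqnum_uc)

# Group on that difference to collapse each run into one row.
islands = conn.execute(
    f"""
    SELECT usr, MIN(dt), MAX(dt), connexion, COUNT(*) AS length
    FROM ({inner}) AS s
    GROUP BY usr, connexion, seqnum - seqnum_uc
    ORDER BY MIN(dt)
    """
).fetchall()
for row in islands:
    print(row)
```

The difference alone is not unique (rows 2, 3 and 4 all give 1), which is why connexion has to be part of the GROUP BY as well.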
Note: You should not use user or date for the names of columns. These are keywords in SQL (of one type or another). If you do use them, you have to clutter up your SQL with escape characters, which just makes the code harder to write, read, and debug.

Stored Procedure IF Exist Update Else Insert, if match

I got a lot of help from different articles, for which I thank you a lot.
But right now I have a case where I need your support, namely with an SQL procedure that not only checks whether data exists (to update or insert it), but also has to match it against 2 other tables to check whether the data matches, and then insert the same row with different values for some columns into a separate table.
I hope the example below makes this clearer.
Main row data : db.Table1
|rowID|PurchaseDate|ProducID | ProductName | CustomerID | Qty | UnitType|
|row1 |09.09.2018 |206 | Prod1 | 1 | 10 | bl. |
|row2 |09.09.2018 |207 | Prod2 | 2 | 15 | bl. |
|row3 |12.09.2018 |203 | Prod5 | 5 | 5 | lk. |
|row4 |15.09.2018 |207 | Prod2 | 6 | 10 | lk. |
|row5 |20.09.2018 |207 | Prod2 | 8 | 3 | Pk. |
|row6 |20.09.2018 |203 | Prod5 | 8 | 6 | Pk. |
|row7 |20.09.2018 |205 | Prod0 | 2 | 5 | J. |
to match with: db.Table2
|CustomerID| CustomerName|
|1 | Customer1 |
|2 | Customer2 |
|3 | Customer3 |
|4 | Customer4 |
|5 | Customer5 |
to match with: db.Table3
|ProducID| ProductName| SubProdNAME|
|205 | Prod0 | Prod101 |
|205 | Prod0 | Prod202 |
|204 | Prod01 | Prod1001 |
|204 | Prod01 | Prod2002 |
to final table: db.TableFIN
|rowID| PurchaseDate|ProducID|ProductName|CustomerID|Qty|UnitType |Stage|
|row1 | 09.09.2018 | 206 | Prod1 | 1 |10 | bl. | DONE|
|row1 | 09.09.2018 | 206 | Prod1 | 1 |10 | bl. | NONE|
|row2 | 09.09.2018 | 207 | Prod2 | 2 |15 | bl. | DONE|
|row2 | 09.09.2018 | 207 | Prod2 | 2 |15 | bl. | NONE|
|row3 | 12.09.2018 | 203 | Prod5 | 5 |5 | lk. | DONE|
|row3 | 12.09.2018 | 203 | Prod5 | 5 |5 | lk. | NONE|
|row4 | 15.09.2018 | 207 | Prod2 | 6 |10 | lk. | DONE|
|row4 | 15.09.2018 | 207 | Prod2 | 6 |0 | lk. | NONE|
|row5 | 20.09.2018 | 207 | Prod2 | 8 |3 | Pk. | DONE|
|row5 | 20.09.2018 | 207 | Prod2 | 8 |0 | Pk. | NONE|
|row6 | 20.09.2018 | 203 | Prod5 | 8 |6 | Pk. | DONE|
|row6 | 20.09.2018 | 203 | Prod5 | 8 |0 | Pk. | NONE|
|row7 | 20.09.2018 | 205 | Prod101 | 3 |5 | bundle| DONE|
|row7 | 20.09.2018 | 205 | Prod101 | 3 |5 | bundle| NONE|
|row7 | 20.09.2018 | 205 | Prod202 | 3 |5 | bundle| DONE|
|row7 | 20.09.2018 | 205 | Prod202 | 3 |5 | bundle| NONE|
So, basically, what I need is to insert each row twice depending on Stage: one row with Stage DONE and a second with NONE. In addition, if the CustomerID matches, the Qty value is equal in both rows; otherwise Qty is 0 for NONE and the original value for DONE.
For ProducID, if it matches the product, then we have to insert 4 rows, as in the table above, again matching CustomerID and product when updating/inserting the stage and values.
Your support is highly appreciated.
Thank you in advance!

Derive and Update Column Value based on Row Value SQL Server

So I have a Request History table whose versions I would like to flag (a version is based on the end of a cycle). I was able to mark the end of each cycle, but somehow I couldn't update the version value of the rows associated with each cycle. Here is an example:
|history_id | Req_id | StatID | Time |EndCycleDate |
|-------------|---------|-------|---------- |-------------|
|1 | 1 |18 | 3/26/2017 | NULL |
|2 | 1 | 19 | 3/26/2017 | NULL |
|3 | 1 |20 | 3/30/2017 | NULL |
|4 |1 | 23 |3/30/2017 | NULL |
|5 | 1 |35 |3/30/2017 | 3/30/2017 |
|6 | 1 |33 |4/4/2017 | NULL |
|7 | 1 |34 |4/4/2017 | NULL |
|8 | 1 |39 |4/4/2017 | NULL |
|9 | 1 |35 |4/4/2017 | 4/4/2017 |
|10 | 1 |33 |4/5/2017 | NULL |
|11 | 1 |34 |4/6/2017 | NULL |
|12 | 1 |39 |4/6/2017 | NULL |
|13 | 1 |35 |4/7/2017 | 4/7/2017 |
|14 | 1 |33 |4/8/2017 | NULL |
|15 | 1 | 34 |4/8/2017 | NULL |
|16 | 2 |18 |3/28/2017 | NULL |
|17 | 2 |26 |3/28/2017 | NULL |
|18 | 2 |20 |3/30/2017 | NULL |
|19 | 2 |23 |3/30/2017 | NULL |
|20 | 2 |35 |3/30/2017 | 3/30/2017 |
|21 | 2 |33 |4/12/2017 | NULL |
|22 | 2 |34 |4/12/2017 | NULL |
|23 | 2 |38 |4/13/2017 | NULL |
Now what I would like to achieve is to derive a new column, namely VER, and update its value like the following:
|history_id | Req_id | StatID | Time |EndCycleDate | VER |
|-------------|---------|-------|---------- |-------------|------|
|1 | 1 |18 | 3/26/2017 | NULL | 1 |
|2 | 1 | 19 | 3/26/2017 | NULL | 1 |
|3 | 1 |20 | 3/30/2017 | NULL | 1 |
|4 |1 | 23 |3/30/2017 | NULL | 1 |
|5 | 1 |35 |3/30/2017 | 3/30/2017 | 1 |
|6 | 1 |33 |4/4/2017 | NULL | 2 |
|7 | 1 |34 |4/4/2017 | NULL | 2 |
|8 | 1 |39 |4/4/2017 | NULL | 2 |
|9 | 1 |35 |4/4/2017 | 4/4/2017 | 2 |
|10 | 1 |33 |4/5/2017 | NULL | 3 |
|11 | 1 |34 |4/6/2017 | NULL | 3 |
|12 | 1 |39 |4/6/2017 | NULL | 3 |
|13 | 1 |35 |4/7/2017 | 4/7/2017 | 3 |
|14 | 1 |33 |4/8/2017 | NULL | 4 |
|15 | 1 | 34 |4/8/2017 | NULL | 4 |
|16 | 2 |18 |3/28/2017 | NULL | 1 |
|17 | 2 |26 |3/28/2017 | NULL | 1 |
|18 | 2 |20 |3/30/2017 | NULL | 1 |
|19 | 2 |23 |3/30/2017 | NULL | 1 |
|20 | 2 |35 |3/30/2017 | 3/30/2017 | 1 |
|21 | 2 |33 |4/12/2017 | NULL | 2 |
|22 | 2 |34 |4/12/2017 | NULL | 2 |
|23 | 2 |38 |4/13/2017 | NULL | 2 |
One method that comes really close is a cumulative count:
select t.*,
count(endCycleDate) over (partition by req_id order by history_id) as ver
from t;
However, this doesn't get the value when the endCycle date is defined exactly right. And the value starts at 0. Most of these problems are fixed with a windowing clause:
select t.*,
(count(endCycleDate) over (partition by req_id
order by history_id
rows between unbounded preceding and 1 preceding) + 1
) as ver
from t;
But that misses the value on the first row. So, here is a method that actually works. It counts the cycle ends backward and then subtracts that from the total number of cycle ends to get the versions in ascending order:
select t.*,
(1 + count(endCycleDate) over (partition by req_id) -
count(endCycleDate) over (partition by req_id
order by history_id desc)
) as ver
from t;
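The backward-count idea can be checked against the sample data. This is a minimal sketch using SQLite through Python's sqlite3 module as a stand-in for SQL Server. It assumes the total being subtracted from is the number of non-NULL endCycleDate values per req_id, and it loads only the columns the calculation needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (history_id INTEGER, req_id INTEGER, endCycleDate TEXT)")

# history_id -> EndCycleDate for the rows that close a cycle; all others are NULL.
end_rows = {5: "2017-03-30", 9: "2017-04-04", 13: "2017-04-07", 20: "2017-03-30"}
rows = [(h, 1 if h <= 15 else 2, end_rows.get(h)) for h in range(1, 24)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT history_id,
       1 + COUNT(endCycleDate) OVER (PARTITION BY req_id)
         - COUNT(endCycleDate) OVER (PARTITION BY req_id
                                     ORDER BY history_id DESC) AS ver
FROM t
ORDER BY history_id
"""
versions = [ver for _, ver in conn.execute(query)]
print(versions)
```

For req_id 1 there are 3 cycle ends in total. Row 1 still has all 3 at or after it, so ver = 1 + 3 - 3 = 1; row 14 has none left, so ver = 1 + 3 - 0 = 4. The row that closes a cycle counts itself, which keeps it in the version it ends.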