Need a simple query to calculate sequence length in SQL Server

I have a view that represents the status of each user's connections to a system, as in the table below:
---------------------------------------
|id | date | User | Connexion |
|1 | 01/01/2018 | A | 1 |
|2 | 02/01/2018 | A | 0 |
|3 | 03/01/2018 | A | 1 |
|4 | 04/01/2018 | A | 1 |
|5 | 05/01/2018 | A | 0 |
|6 | 06/01/2018 | A | 0 |
|7 | 07/01/2018 | A | 0 |
|8 | 08/01/2018 | A | 1 |
|9 | 09/01/2018 | A | 1 |
|10 | 10/01/2018 | A | 1 |
|11 | 11/01/2018 | A | 1 |
---------------------------------------
The goal is to get the length of each run of consecutive succeeded or failed connections, ordered by date, so the output would look like this:
---------------------------------------------------------------
|StartDate | EndDate | User | Connexion | Length |
|01/01/2018 | 01/01/2018 | A | 1 | 1 |
|02/01/2018 | 02/01/2018 | A | 0 | 1 |
|03/01/2018 | 04/01/2018 | A | 1 | 2 |
|05/01/2018 | 07/01/2018 | A | 0 | 3 |
|08/01/2018 | 11/01/2018 | A | 1 | 4 |
---------------------------------------------------------------

This is what is called a gaps-and-islands problem. The best solution for your version is a difference of row numbers:
select min([date]) as StartDate, max([date]) as EndDate, [user], connexion, count(*) as Length
from (select t.*,
             row_number() over (partition by [user] order by [date]) as seqnum,
             row_number() over (partition by [user], connexion order by [date]) as seqnum_uc
      from t
     ) t
group by [user], connexion, (seqnum - seqnum_uc)
order by [user], StartDate;
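Why this works is a little tricky to explain. Within a run of consecutive rows with the same connexion value, both row numbers go up by 1 on every row, so their difference stays constant. When other values interrupt the run, seqnum keeps counting through them while seqnum_uc for this value does not, so the next run of the same value gets a larger difference. If you stare at the results of the subquery, you'll see that the difference is constant for exactly the groups you care about. Runs of different connexion values can share the same difference, which is why connexion is also in the group by.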
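A minimal sketch of the query above, run against the sample data in an in-memory SQLite database from Python (SQLite stands in for SQL Server here and needs version 3.25 or later for window functions). The column names user_name and conn_date are assumptions that sidestep the keyword issue from the note below, and the dates are stored as ISO strings so that text order matches date order. It prints the seqnum - seqnum_uc values first, then the grouped runs.

```python
import sqlite3

# Sample data from the question; dates as ISO strings so text order == date order.
# Column names user_name / conn_date are assumptions (they avoid USER / DATE).
rows = [
    (1, "2018-01-01", "A", 1), (2, "2018-01-02", "A", 0),
    (3, "2018-01-03", "A", 1), (4, "2018-01-04", "A", 1),
    (5, "2018-01-05", "A", 0), (6, "2018-01-06", "A", 0),
    (7, "2018-01-07", "A", 0), (8, "2018-01-08", "A", 1),
    (9, "2018-01-09", "A", 1), (10, "2018-01-10", "A", 1),
    (11, "2018-01-11", "A", 1),
]

con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
con.execute("CREATE TABLE t (id INTEGER, conn_date TEXT, user_name TEXT, connexion INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# The subquery from the answer: overall row number and row number per connexion value.
inner = """
SELECT t.*,
       row_number() OVER (PARTITION BY user_name ORDER BY conn_date) AS seqnum,
       row_number() OVER (PARTITION BY user_name, connexion ORDER BY conn_date) AS seqnum_uc
FROM t
"""

# "Stare at the subquery": the difference is constant inside each run.
diffs = [d for (d,) in con.execute(f"SELECT seqnum - seqnum_uc FROM ({inner}) s ORDER BY id")]
print(diffs)  # [0, 1, 1, 1, 3, 3, 3, 4, 4, 4, 4]

# Group by user, connexion and the difference to get one row per run.
islands = con.execute(f"""
SELECT min(conn_date) AS start_date, max(conn_date) AS end_date,
       user_name, connexion, count(*) AS length
FROM ({inner}) s
GROUP BY user_name, connexion, seqnum - seqnum_uc
ORDER BY user_name, start_date
""").fetchall()
for row in islands:
    print(row)
```

Note how rows 2 and 3 both have a difference of 1 but different connexion values; grouping by connexion as well keeps them apart.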
Note: You should not use user or date for the names of columns. These are keywords in SQL (of one type or another). If you do use them, you have to clutter up your SQL with escape characters, which just makes the code harder to write, read, and debug.

Related

Find how many x-day intervals have passed between an outset date and another date

In SQL, how can I calculate how many x-day intervals have passed since an outset date?
Consider 2023-01-11 as an example of such an "outset date". For any subsequent date, I want to know how many 4-day intervals have passed since the outset date.
For example:
💚 2023-01-13 should return 1, because it's the first 4-day interval.
💙 2023-01-18 should return 2, because it's the second 4-day interval.
💜 2023-02-02 should return 6, because it's the sixth 4-day interval.
January 2023 (outset on Wednesday 11 January), split into 4-day intervals:
Interval 1: 11 Jan to 14 Jan (💚 13 Jan)
Interval 2: 15 Jan to 18 Jan (💙 18 Jan)
Interval 3: 19 Jan to 22 Jan
Interval 4: 23 Jan to 26 Jan
Interval 5: 27 Jan to 30 Jan
Interval 6: 31 Jan to 3 Feb (💜 2 Feb)
So if I have the corresponding SQL table:
CREATE TABLE my_tbl (outset_date DATE, date_of_interest DATE);
INSERT INTO my_tbl (outset_date, date_of_interest)
VALUES ('2023-01-11', '2023-01-13'),
('2023-01-11', '2023-01-18'),
('2023-01-11', '2023-02-02');
How can I write a select statement to get the desired output:
-- desired output
-- +──────────────+───────────────────+─────────────────────────────────+
-- | outset_date | date_of_interest | how_many_intervals_have_passed |
-- +──────────────+───────────────────+─────────────────────────────────+
-- | 2023-01-11 | 2023-01-13 | 1 |
-- | 2023-01-11 | 2023-01-18 | 2 |
-- | 2023-01-11 | 2023-02-02 | 6 |
-- +──────────────+───────────────────+─────────────────────────────────+
If there isn't an "idiomatic" SQL syntax for this, I'd opt for either MySQL or PostgreSQL. Thanks!
To count the difference between two dates in days, subtract the earlier date from the later one, i.e. "2023-01-13" - "2023-01-11" = 2.
In your case, you need the number of days between the two dates counting both the first and the last date, so you add 1 to the difference in days, i.e. "2023-01-13" - "2023-01-11" + 1 = 3.
To get the 4-day interval in which a date lies, add 3 to this inclusive day count and then perform integer division by 4, which rounds up. For inclusive counts of 1, 2, 3 and 4 this gives 4/4, 5/4, 6/4 and 7/4, all of which equal 1; a count of 5 gives 8/4 = 2, the start of the second interval.
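The same arithmetic as a small Python sketch. interval_number is a hypothetical helper name, and it assumes the date of interest is on or after the outset date.

```python
from datetime import date

def interval_number(outset: date, day: date, width: int = 4) -> int:
    """1-based index of the width-day interval that contains `day`.

    Hypothetical helper; assumes day >= outset and counts the outset date
    itself as the first day of interval 1.
    """
    inclusive_days = (day - outset).days + 1       # 11 Jan..13 Jan -> 3 days
    return (inclusive_days + width - 1) // width   # integer division, rounding up

outset = date(2023, 1, 11)
for d in (date(2023, 1, 13), date(2023, 1, 18), date(2023, 2, 2)):
    print(d, interval_number(outset, d))  # -> 1, 2, 6
```

With width = 4 the expression is (diff + 1 + 3) // 4, i.e. the (diff + 4) / 4 used in the SQL below.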
For Postgres try the following:
select *,
(date_of_interest - outset_date + 4) / 4 as expected
from my_tbl
The + 4 here is + 1 to count both dates inclusively, as mentioned above, and + 3 to make the integer division round up.
For MySQL, it will be (datediff(date_of_interest, outset_date) + 4) div 4, where the div operator performs integer division.
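To try the integer-division formula without a Postgres or MySQL server, here is a sketch that runs it in SQLite from Python. This swaps in SQLite's julianday() for the date difference, since SQLite has neither date subtraction nor DATEDIFF; with two integer operands, SQLite's / truncates like MySQL's div.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_tbl (outset_date DATE, date_of_interest DATE)")
con.executemany(
    "INSERT INTO my_tbl (outset_date, date_of_interest) VALUES (?, ?)",
    [("2023-01-11", "2023-01-13"),
     ("2023-01-11", "2023-01-18"),
     ("2023-01-11", "2023-02-02")],
)

# julianday() difference stands in for Postgres's date - date / MySQL's DATEDIFF;
# with integer operands, SQLite's "/" truncates like MySQL's DIV.
rows = con.execute("""
    SELECT outset_date,
           date_of_interest,
           (CAST(julianday(date_of_interest) - julianday(outset_date) AS INTEGER) + 4) / 4
               AS how_many_intervals_have_passed
    FROM my_tbl
    ORDER BY date_of_interest
""").fetchall()
for row in rows:
    print(row)
```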
The basic solution for MySQL (the + 1 counts the outset date itself, so that the outset date falls in interval 1 and a date exactly 4, 8, ... days later starts a new interval):
SELECT
    outset_date,
    date_of_interest,
    CEIL((DATEDIFF(date_of_interest, outset_date) + 1) / 4) how_many_intervals_have_passed
FROM my_tbl;
PostgreSQL solution below:
SELECT
    outset_date,
    date_of_interest,
    CEIL((date_of_interest - outset_date + 1)::numeric / 4) how_many_intervals_have_passed
FROM my_tbl;
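A quick cross-check of the two styles of formula in plain Python, treating diff as the day difference between the two dates (assumed non-negative). The ceiling form agrees with (diff + 4) div 4 for every diff only when it includes the + 1; plain CEIL(diff / 4) gives 0 on the outset date itself and lags by one whenever the date is exactly 4, 8, 12, ... days later, e.g. 2023-01-15, which the calendar in the question puts in interval 2.

```python
import math

def by_integer_division(diff: int) -> int:
    # (diff + 4) DIV 4: +1 for the inclusive count, +3 to round up
    return (diff + 4) // 4

def by_ceiling(diff: int) -> int:
    # CEIL((diff + 1) / 4): the +1 puts the outset date in interval 1
    return math.ceil((diff + 1) / 4)

def by_ceiling_without_plus_one(diff: int) -> int:
    # CEIL(diff / 4): one short whenever diff % 4 == 0 (including diff == 0)
    return math.ceil(diff / 4)

# diff = days between outset_date and date_of_interest (assumed >= 0)
mismatches = [d for d in range(400) if by_integer_division(d) != by_ceiling(d)]
lagging = [d for d in range(13) if by_ceiling_without_plus_one(d) != by_integer_division(d)]
print(mismatches)  # []
print(lagging)     # [0, 4, 8, 12]
```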

How to group a set of records (i.e. into an invoice / billing cycle)

For a set of invoice stage records per project, I'm trying to determine a billing cycle using the ID of the starting invoice stage.
Here's the table, InvoiceStages:
|ID| Project | StageDate | InvoiceStage | StageFlag | BillCycle |
|1 | abc123 | 10-May-18 | Finance | S | 1 |
|2 | abc123 | 15-May-18 | Review Draft | | 1 |
|4 | abc123 | 19-May-18 | Approved - NO Changes | | 1 |
|7 | abc123 | 21-May-18 | Final Invoice | E | 1 |
|9 | abc123 | 05-Jun-18 | Finance | S | 9 |
|12| abc123 | 07-Jun-18 | Review Draft | | 9 |
|15| abc123 | 09-Jun-18 | Approved - With Changes | | 9 |
|21| abc123 | 10-Jun-18 | Review Draft | | 9 |
|25| abc123 | 12-Jun-18 | Approved - NO Changes | | 9 |
|40| abc123 | 13-Jun-18 | Final Invoice | E | 9 |
|3 | xyz789 | 15-May-18 | Finance | S | 3 |
|5 | xyz789 | 19-May-18 | Review Draft | | 3 |
|6 | xyz789 | 20-May-18 | Approved - NO Changes | | 3 |
|8 | xyz789 | 22-May-18 | Final Invoice | E | 3 |
|10| xyz789 | 06-Jun-18 | Finance | S | 10 |
|11| xyz789 | 07-Jun-18 | Review Draft | | 10 |
|18| xyz789 | 09-Jun-18 | Approved - NO Changes | | 10 |
|22| xyz789 | 11-Jun-18 | Final Invoice | E | 10 |
I've looked at LAG / LEAD but wasn't sure if that would be the best option.
Select
ID
, Project
, StageDate
, InvoiceStage
, StageFlag
, ?? As BillCycle
From InvoiceStages
I expect the output for BillCycle to be the ID of the first record where StageFlag = 'S', for all records up to and including the end stage 'E'. The next set then starts again at the next record with StageFlag = 'S', whose ID becomes the new BillCycle.
You can assign a group using a cumulative sum (a running count of the 'S' flags within each project) and then use a window function over that group to get the ID of its 'S' row:
select i.*,
       max(case when i.stageflag = 'S' then i.id end) over (partition by i.project, i.grp) as BillCycle
from (select i.*,
             sum(case when i.stageflag = 'S' then 1 else 0 end) over (partition by i.project order by i.stagedate) as grp
      from InvoiceStages i
     ) i;
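A sketch of the cumulative-sum idea run against the question's sample rows in SQLite from Python (window functions need SQLite 3.25 or later). It takes the ID of the single 'S' row in each group, which is what BillCycle should hold. Dates are rewritten as ISO strings, the InvoiceStage column is dropped because the query does not use it, and blank flags are stored as NULL; these are assumptions about the real table.

```python
import sqlite3

# (ID, Project, StageDate, StageFlag, expected BillCycle) from the question;
# InvoiceStage omitted, blank StageFlag stored as NULL (assumptions).
data = [
    (1, "abc123", "2018-05-10", "S", 1), (2, "abc123", "2018-05-15", None, 1),
    (4, "abc123", "2018-05-19", None, 1), (7, "abc123", "2018-05-21", "E", 1),
    (9, "abc123", "2018-06-05", "S", 9), (12, "abc123", "2018-06-07", None, 9),
    (15, "abc123", "2018-06-09", None, 9), (21, "abc123", "2018-06-10", None, 9),
    (25, "abc123", "2018-06-12", None, 9), (40, "abc123", "2018-06-13", "E", 9),
    (3, "xyz789", "2018-05-15", "S", 3), (5, "xyz789", "2018-05-19", None, 3),
    (6, "xyz789", "2018-05-20", None, 3), (8, "xyz789", "2018-05-22", "E", 3),
    (10, "xyz789", "2018-06-06", "S", 10), (11, "xyz789", "2018-06-07", None, 10),
    (18, "xyz789", "2018-06-09", None, 10), (22, "xyz789", "2018-06-11", "E", 10),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE InvoiceStages (id INTEGER, project TEXT, stagedate TEXT, stageflag TEXT)")
con.executemany("INSERT INTO InvoiceStages VALUES (?, ?, ?, ?)", [row[:4] for row in data])

# Step 1: grp = running count of 'S' rows per project, so each cycle gets its own number.
# Step 2: within (project, grp), take the id of the single 'S' row.
result = con.execute("""
SELECT id,
       max(CASE WHEN stageflag = 'S' THEN id END) OVER (PARTITION BY project, grp) AS billcycle
FROM (SELECT i.*,
             sum(CASE WHEN stageflag = 'S' THEN 1 ELSE 0 END)
                 OVER (PARTITION BY project ORDER BY stagedate) AS grp
      FROM InvoiceStages i) i
ORDER BY project, stagedate
""").fetchall()
print(result)
```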
If the ids increase along with the date, you can do this without a subquery, using a running maximum of the 'S' ids:
select i.*,
       max(case when stageflag = 'S' then id end) over (partition by project order by stagedate) as BillCycle
from invoicestages i;
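The subquery-free version relies on a running maximum, and that depends on the window's ORDER BY: with it, each row only sees the 'S' ids up to its own StageDate; without it, the frame is the whole partition and every row of a project gets the project's largest 'S' id. A sketch comparing the two in SQLite from Python, on the question's sample rows (dates as ISO strings, blank flags as NULL, InvoiceStage omitted; all assumptions).

```python
import sqlite3

# (ID, Project, StageDate, StageFlag, expected BillCycle) from the question.
data = [
    (1, "abc123", "2018-05-10", "S", 1), (2, "abc123", "2018-05-15", None, 1),
    (4, "abc123", "2018-05-19", None, 1), (7, "abc123", "2018-05-21", "E", 1),
    (9, "abc123", "2018-06-05", "S", 9), (12, "abc123", "2018-06-07", None, 9),
    (15, "abc123", "2018-06-09", None, 9), (21, "abc123", "2018-06-10", None, 9),
    (25, "abc123", "2018-06-12", None, 9), (40, "abc123", "2018-06-13", "E", 9),
    (3, "xyz789", "2018-05-15", "S", 3), (5, "xyz789", "2018-05-19", None, 3),
    (6, "xyz789", "2018-05-20", None, 3), (8, "xyz789", "2018-05-22", "E", 3),
    (10, "xyz789", "2018-06-06", "S", 10), (11, "xyz789", "2018-06-07", None, 10),
    (18, "xyz789", "2018-06-09", None, 10), (22, "xyz789", "2018-06-11", "E", 10),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE InvoiceStages (id INTEGER, project TEXT, stagedate TEXT, stageflag TEXT)")
con.executemany("INSERT INTO InvoiceStages VALUES (?, ?, ?, ?)", [row[:4] for row in data])

# Running maximum: each row sees only the 'S' ids up to its own StageDate.
running = con.execute("""
SELECT id,
       max(CASE WHEN stageflag = 'S' THEN id END)
           OVER (PARTITION BY project ORDER BY stagedate) AS billcycle
FROM InvoiceStages
ORDER BY project, stagedate
""").fetchall()

# Without ORDER BY the frame is the whole partition: every row gets the largest 'S' id.
whole = con.execute("""
SELECT id,
       max(CASE WHEN stageflag = 'S' THEN id END) OVER (PARTITION BY project) AS billcycle
FROM InvoiceStages
ORDER BY project, stagedate
""").fetchall()

print(running)
print(whole)
```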

Selecting all rows in which id is distinct

Hi, I need some advice on how to write a SELECT statement that returns whole rows, using the phone number as the measure of "distinction".
Example of what I have:
|ID |Name |Phone Number| Address |
| | | | |
|1 |John | 1234567 | A.Road 1 |
|1 |John | 1234567 | B.Road 2 |
|2 |Jane | 7654321 | C.Road 3 |
|3 |Jim | 7654321 | C.road 3 |
Example of what I want:
|ID |Name |Phone Number| Address |
| | | | |
|1 |John | 1234567 | A.Road 1 |
|2 |Jane | 7654321 | C.Road 3 |
It doesn't matter which of the rows SQL picks for the result; what matters is that the whole row is returned and that each phone number appears only once. I hope that makes clear what I'm trying to do.
ANSI SQL supports the row_number() function, which is the typical solution: number the rows within each phone number and keep only the first one:
select t.*
from (select t.*,
row_number() over (partition by phone_number order by id) as seqnum
from t
) t
where seqnum = 1;
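A sketch running the row_number() query on the question's sample rows in SQLite from Python. The table name t and column name phone_number follow the answer, and storing phone numbers as text is an assumption. Because both John rows share id 1, which of his two addresses survives is not determined; only the id, name and phone of each kept row are predictable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Phone numbers kept as TEXT so leading zeros would survive (assumption).
con.execute("CREATE TABLE t (id INTEGER, name TEXT, phone_number TEXT, address TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "John", "1234567", "A.Road 1"),
    (1, "John", "1234567", "B.Road 2"),
    (2, "Jane", "7654321", "C.Road 3"),
    (3, "Jim", "7654321", "C.road 3"),
])

# Number the rows within each phone number (lowest id first) and keep row 1.
result = con.execute("""
SELECT id, name, phone_number, address
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY phone_number ORDER BY id) AS seqnum
      FROM t) s
WHERE seqnum = 1
ORDER BY phone_number
""").fetchall()
for row in result:
    print(row)
```

Adding a tie-breaker to the ORDER BY inside OVER (for example, address) would make the choice between John's rows deterministic.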

Change in a foreign table affects previous reports

I'm stuck on an issue. I have these tables:
tbl_color
+------------+
|id | name |
|---|--------|
|1 | Red |
|---|--------|
|2 | Blue |
|---|--------|
|3 | Black |
+------------+
tbl_clothes
+----------------+
|id | name |
| 1 | Pant |
| 2 | Shirt |
| 3 | T-shirt |
+----------------+
tb_sales
+---------------------------------------+
|id | id_cloth | id_color | sales_date |
|---|----------|-----------|------------|
|1 | 1 | 1 | 2016/1/1 |
|---|----------|-----------|------------|
|2 | 1 | 3 | 2016/1/1 |
|---|----------|-----------|------------|
|3 | 1 | 1 | 2016/2/2 |
+---------------------------------------+
Then I change one row of tbl_color (Red becomes Orange), so the table becomes:
tbl_color
+---------------------------+
|id | name | modified_on |
|----|--------|-------------|
|1 | Orange | 2016/3/2 |
|----|--------|-------------|
|2 | Blue | 2016/1/2 |
|----|--------|-------------|
|3 | Black | 2016/1/2 |
+---------------------------+
So when I want to get a report of the sales on 2016/1/1:
SELECT *
FROM tb_sales
JOIN tbl_clothes ON tbl_clothes.id = tb_sales.id_cloth
JOIN tbl_color ON tbl_color.id = tb_sales.id_color
WHERE sales_date = '2016/1/1'
I get a report showing the modified color names (Orange instead of Red), not the ones the original sales were made with.
How can I handle this issue?

Select distinct values from multiple columns and sum their matching quantities

I have this table named Orders.
Each row of the table represents an order made by a customer (prod means product):
+-----------------------------------------------------------------------------------+
| prod_1 | prod_1_qty | prod_2 |prod_2_qty | prod_3 | prod_3_qty |
|-------------------------------------------------------|---------------------------|
| chair | 3 | board |9 | bed |4 |
| board | 8 | door |2 | desk |2 |
| chair | 2 | window |1 | door |6 |
| desk | 4 | chair |3 | sofa |1 |
I would like to write a query that returns the quantity of each product ordered like this:
+---------------------------+
| product | product_qty |
|---------------------------|
| chair | 8 |
| board | 17 |
| door | 8 |
| window | 1 |
| sofa | 1 |
| bed | 4 |
| desk | 6 |
Is there any way to achieve this using T-SQL, and if so, what is the query one would use to do this?
SELECT x.product
     , SUM(x.product_qty) AS product_qty
FROM (
    SELECT prod_1 AS product, prod_1_qty AS product_qty FROM Orders
    UNION ALL
    SELECT prod_2, prod_2_qty FROM Orders
    UNION ALL
    SELECT prod_3, prod_3_qty FROM Orders
) x
GROUP BY x.product;
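A sketch of that UNION ALL unpivot on the question's four orders, run in SQLite from Python; the query itself is plain SQL, so it should carry over to T-SQL unchanged. The table name Orders and the prod_n / prod_n_qty column names follow the question's sample, and the column types are assumptions.

```python
import sqlite3

orders = [
    ("chair", 3, "board", 9, "bed", 4),
    ("board", 8, "door", 2, "desk", 2),
    ("chair", 2, "window", 1, "door", 6),
    ("desk", 4, "chair", 3, "sofa", 1),
]

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Orders (prod_1 TEXT, prod_1_qty INTEGER,
                                    prod_2 TEXT, prod_2_qty INTEGER,
                                    prod_3 TEXT, prod_3_qty INTEGER)""")
con.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?, ?)", orders)

# Stack the three (product, qty) column pairs into one pair, then aggregate.
# UNION ALL (not UNION) keeps duplicate (product, qty) pairs from different columns.
query = """
SELECT x.product, SUM(x.product_qty) AS product_qty
FROM (SELECT prod_1 AS product, prod_1_qty AS product_qty FROM Orders
      UNION ALL
      SELECT prod_2, prod_2_qty FROM Orders
      UNION ALL
      SELECT prod_3, prod_3_qty FROM Orders) x
GROUP BY x.product
"""
totals = dict(con.execute(query).fetchall())
print(totals)
```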