SQL group by but order is important

Is there a way to group rows where the order of the groups matters?
Let's assume I have a table of hardware that is assigned to users. The hardware has states such as broken, ok, and service. I want to group this table to find out how long each user had the item; the state is not important.
What I have:
+----+-------+--------+------------+------------+
| id | owner | state | from | to |
+----+-------+--------+------------+------------+
| 1 | ow1 | ok | 01.02.2019 | 04.06.2019 |
| 2 | ow1 | broken | 04.06.2019 | 12.06.2019 |
| 3 | srvc | fixing | 12.06.2019 | 17.06.2019 |
| 4 | ow1 | ok | 17.06.2019 | null | -- null - still has
+----+-------+--------+------------+------------+
But I want to have:
+-------+------------+------------+
| owner | from | to |
+-------+------------+------------+
| ow1 | 01.02.2019 | 12.06.2019 | -- here we have min and max dates before state changed
| srvc | 12.06.2019 | 17.06.2019 |
| ow1 | 17.06.2019 | null | -- null - still has
+-------+------------+------------+
How can I write a query to achieve this result?

This looks like a gaps and islands problem. One solution is as follows:
Mark rows where the owner changes (differs from the previous row) with a value of 1
Group each 1 together with the 0s that follow it, using a running sum
I usually do this:
WITH cte1 AS (
SELECT *
, CASE WHEN owner = LAG(owner) OVER (PARTITION BY hardware_id ORDER BY [from]) THEN 0 ELSE 1 END AS chg
FROM t
), cte2 AS (
SELECT *
, SUM(chg) OVER (PARTITION BY hardware_id ORDER BY [from]) AS grp
FROM cte1
)
SELECT owner
, hardware_id
, grp
, MIN([from])
, MAX([to])
FROM cte2
GROUP BY owner, hardware_id, grp
I have assumed that you want separate results for every piece of hardware; remove the hardware_id column if that is not the case.
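The two steps above can be tried end to end with a minimal sketch that runs the same idea through Python's built-in sqlite3 module. The column names date_from and date_to, the ISO date strings, and the absence of a hardware_id column are assumptions made for this illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, owner TEXT, state TEXT, date_from TEXT, date_to TEXT);
INSERT INTO t VALUES
  (1, 'ow1',  'ok',     '2019-02-01', '2019-06-04'),
  (2, 'ow1',  'broken', '2019-06-04', '2019-06-12'),
  (3, 'srvc', 'fixing', '2019-06-12', '2019-06-17'),
  (4, 'ow1',  'ok',     '2019-06-17', NULL);
""")

query = """
WITH marked AS (
  -- chg = 1 on the first row and whenever the owner differs from the previous row
  SELECT *,
         CASE WHEN owner = LAG(owner) OVER (ORDER BY date_from) THEN 0 ELSE 1 END AS chg
  FROM t
), grouped AS (
  -- the running sum of chg gives each consecutive run of one owner its own number
  SELECT *, SUM(chg) OVER (ORDER BY date_from) AS grp
  FROM marked
)
SELECT owner, MIN(date_from) AS first_from, MAX(date_to) AS last_to
FROM grouped
GROUP BY owner, grp
ORDER BY grp
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

MAX ignores NULLs, so the last run returns NULL only because its single row is still open; a run that mixes closed rows with an open one would need extra handling.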

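Try the option below with UNION ALL.
SELECT owner, [from], [to]
FROM your_table
WHERE [to] IS NULL
UNION ALL
SELECT owner, MIN([from]), MAX([to])
FROM your_table
WHERE [to] IS NOT NULL
GROUP BY owner
-- Caveat: closed periods are grouped by owner only, so this matches the sample data
-- but would merge two separate closed periods that belong to the same owner.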


How to compare the current row with all the others in PostgreSQL?

I have a table like this
| id | state | updatedate |
|:--------|:---------------|:------------|
| 1 | state_review | 1668603529 |
| 1 | state_review | 1668601821 |
| 1 | state_review_2 | 1668601821 |
| 2 | state_review | 1668601709 |
| 2 | state_review | 1668600822 |
| 2 | state_review_2 | 1668600747 |
| 3 | state_review | 1668559849 |
| 3 | state_review_2 | 1668539849 |
| 3 | state_review | 1668529849 |
| 3 | state_review_2 | 1661599849 |
| 3 | state_review | 1668599849 |
I'm trying to count the first occurrence of a state change for each id, based on provided values. I have two incoming states: from (state_review) and to (state_review_2).
In this particular case there would be only three state changes that go
from state_review -> state_review_2
The resulting table would look like this:
| amount |
|:--------|
| 3 |
I suspect a window function might help with this, but I'm not sure how to compare the current state with all the others. The states have to be ordered by id.
I was trying to use this query, but it doesn't work: instead of counting the latest unique transitions it counts all of them. If the first transition found doesn't match the given states, the entire section for that id should be skipped.
SELECT
COUNT(DISTINCT (
CASE
WHEN
(
q.state = 'state_review'
AND 'state_review' != 'state_review_2'
)
THEN
ID
END
)) AS amount
FROM
(
SELECT
id,
state
FROM
states_table
WHERE
updatedate >= 1668603529
AND updatedate <= 1671599849
AND
(
state = 'state_review'
OR state = 'state_review_2'
)
ORDER BY
id, updatedate DESC
)
AS q
Transitions between 2 predefined states can be obtained with a LAG function.
Example with state_review and state_review_2
SELECT *
FROM(
SELECT ID, LAG(State) OVER (PARTITION BY ID ORDER BY updatedate) As FromState, State, updatedate
FROM States_table
) T
WHERE FromState = 'state_review' AND state = 'state_review_2'
You can do variations of the above:
To avoid double-counting when an id transitioned from state S1 to state S2 several times, use DISTINCT in the sub-query and drop updatedate, like so:
SELECT DISTINCT ID, LAG(State) OVER (PARTITION BY ID ORDER BY updatedate) As FromState, State
And of course, use SELECT COUNT(*) instead if all you want is the count.
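As a rough illustration of the LAG approach with double-counting removed, here is a small sketch run through Python's built-in sqlite3 module. The sample rows are made up for the example (they are not the question's data), and COUNT(DISTINCT id) stands in for the DISTINCT sub-query; window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states_table (id INTEGER, state TEXT, updatedate INTEGER);
INSERT INTO states_table VALUES
  (1, 'state_review',   100), (1, 'state_review_2', 200),  -- one transition
  (2, 'state_review',   100), (2, 'state_review_2', 200),
  (2, 'state_review',   300), (2, 'state_review_2', 400),  -- same transition twice
  (3, 'state_review_2', 100), (3, 'state_review',   200);  -- reverse direction only
""")

query = """
SELECT COUNT(DISTINCT id) AS amount
FROM (
  -- pair every row with the state of the previous row for the same id
  SELECT id,
         LAG(state) OVER (PARTITION BY id ORDER BY updatedate) AS from_state,
         state
  FROM states_table
) AS t
WHERE from_state = 'state_review' AND state = 'state_review_2'
"""

amount = conn.execute(query).fetchone()[0]
print(amount)
```

Id 2 makes the transition twice but is counted once, and id 3 is not counted because it only moves in the opposite direction.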

postgresql aggregate by max string length

I have a one to many relationship. In this case, it's a pipelines entity that can have many segments. The segments entity has a column to list the wells associated with this pipeline. This column is purely informational, and is only updated from a regulatory source as a comma separated list, so the data type is text.
What I want to do is to list all the pipelines and show the segment column that has the most associated wells. Each well is identified with a standardized land location (text is the same length for each well). I am also doing other aggregate functions on the segments, so my query looks something like this (I have to simplify it because it's pretty large):
SELECT pipelines.*, max(segments.associated_wells), min(segments.days_without_production), max(segments.production_water_m3)
FROM pipelines
JOIN segments ON segments.pipeline_id = pipelines.id
GROUP BY pipelines.id
This selects the associated_wells that has the highest alphabetical value, which makes sense, but is not what I want.
max(length(segments.associated_wells)) will select the record I want, but only show the length. I need to show the column value.
How can I aggregate based on the string length but show the value?
Here's an example of what I am expecting:
Segment entity:
| id | pipeline_id | associated_wells | days_without_production | production_water_m3 |
|----|-------------|--------------------------|-------------------------|---------------------|
| 1 | 1 | 'location1', 'location2' | 30 | 2.3 |
| 2 | 1 | 'location1' | 15 | 1.4 |
| 3 | 2 | 'location1' | 20 | 1.8 |
Pipeline entity:
| id | name |
|----|-------------|
| 1 | 'Pipeline1' |
| 2 | 'Pipeline2' |
Desired Query Result:
| id | name | associated_wells | days_without_production | production_water_m3 |
|----|-------------|--------------------------|-------------------------|---------------------|
| 1 | 'Pipeline1' | 'location1', 'location2' | 15 | 2.3 |
| 2 | 'Pipeline2' | 'location1' | 20 | 1.8 |
If I understand correctly, you want DISTINCT ON:
SELECT DISTINCT ON (p.id) p.*, s.*
FROM pipelines p JOIN
segments s
ON s.pipeline_id = p.id
ORDER BY p.id, LENGTH(s.associated_wells) DESC;
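DISTINCT ON is specific to PostgreSQL, and the query above returns the chosen segment's own values rather than the per-pipeline aggregates from the question. One possible portable variant, sketched here with Python's built-in sqlite3 module, swaps in ROW_NUMBER() to pick the longest associated_wells and computes the aggregates as window functions. The aliases min_days and max_water are hypothetical names, and SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pipelines (id INTEGER, name TEXT);
CREATE TABLE segments (id INTEGER, pipeline_id INTEGER, associated_wells TEXT,
                       days_without_production INTEGER, production_water_m3 REAL);
INSERT INTO pipelines VALUES (1, 'Pipeline1'), (2, 'Pipeline2');
INSERT INTO segments VALUES
  (1, 1, 'location1, location2', 30, 2.3),
  (2, 1, 'location1',            15, 1.4),
  (3, 2, 'location1',            20, 1.8);
""")

query = """
SELECT p.id, p.name, s.associated_wells, s.min_days, s.max_water
FROM pipelines p
JOIN (
  SELECT pipeline_id,
         associated_wells,
         MIN(days_without_production) OVER (PARTITION BY pipeline_id) AS min_days,
         MAX(production_water_m3)     OVER (PARTITION BY pipeline_id) AS max_water,
         -- rn = 1 marks the segment with the longest well list in each pipeline
         ROW_NUMBER() OVER (PARTITION BY pipeline_id
                            ORDER BY LENGTH(associated_wells) DESC) AS rn
  FROM segments
) s ON s.pipeline_id = p.id AND s.rn = 1
ORDER BY p.id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two segments of a pipeline tie on length, ROW_NUMBER() picks one of them arbitrarily unless a tie-breaker is added to the ORDER BY.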
Alternatively, keep normalising: verticalise the locations/associated wells by cross joining with a series of integers, and then group twice:
WITH
segment(seg_id,pipeline_id,associated_wells,days_without_production,production_water_m3) AS (
SELECT 1,1,'location1, location2',30,2.3
UNION ALL SELECT 2,1,'location1',15,1.4
UNION ALL SELECT 3,2,'location1',20,1.8
)
,
pipeline(pipeline_id,name) AS (
SELECT 1,'Pipeline1'
UNION ALL SELECT 2,'Pipeline2'
)
,
i(i) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
)
,
location AS (
SELECT
seg_id
, i AS loc_id
, SPLIT_PART(associated_wells,', ',i) AS location
FROM segment CROSS JOIN i
WHERE SPLIT_PART(associated_wells,', ',i) <>''
)
,
pregroup AS (
SELECT
segment.pipeline_id
, location.location
, MIN(days_without_production) AS days_without_production
, MAX(production_water_m3) AS production_water_m3
FROM segment
JOIN pipeline USING(pipeline_id)
JOIN location USING(seg_id)
GROUP BY 1,2
)
SELECT
pipeline_id
, STRING_AGG(location,',') AS locations
, MIN(days_without_production) AS days_without_production
, MAX(production_water_m3) AS production_water_m3
FROM pregroup
GROUP BY 1;
pipeline_id | locations | days_without_production | production_water_m3
-------------+---------------------+-------------------------+---------------------
1 | location1,location2 | 15 | 2.3
2 | location1 | 20 | 1.8

Calculate a field in Sub Query using fields of Main Query

I'm trying to calculate a lapse in products with a SQL query:
SELECT DISTINCT ID_NO, START_DT, END_DT, TERM_NO
FROM DB1.TABLE1
ORDER BY ID_NO, START_DT;
I want to calculate the LAPSE when an ID has a second term. It would be the number of days between the END_DT of Term 1 and the START_DT of Term 2.
I can do this easily in Excel, but I'm new to writing any advanced SQL. Can I get some direction or a sample to achieve this? I tried to google it, but I'm having a hard time coming up with the correct search phrases.
You can use lag():
select t1.*,
nullif((start_dt - lag(end_dt) over (partition by id_no order by start_dt)), 0) as lapse
from table1 t1;
You can do it with a simple left self join:
select
t.*, t.start_dt - tt.end_dt as lapse
from tablename t left join tablename tt
on tt.id_no = t.id_no and tt.term_no = 1 and t.term_no = 2
You may change the ON clause to:
on tt.id_no = t.id_no and t.term_no - tt.term_no = 1
if the term_no column also contains other values like 1, 2, 3, 4....
Results:
ID_NO | START_DT | END_DT | TERM_NO | LAPSE
:------- | :-------- | :-------- | ------: | ----:
48965787 | 13-DEC-17 | 13-DEC-18 | 1 |
48965787 | 30-DEC-18 | 13-DEC-19 | 2 | 17
57896248 | 17-JAN-18 | 17-JAN-19 | 1 |
57896248 | 17-JAN-19 | 17-JAN-20 | 2 | 0
78515698 | 16-JUN-18 | 16-JUN-19 | 1 |
78515698 | 01-AUG-19 | 16-JUN-20 | 2 | 46
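For comparison, the lag() approach can be sketched on the same six rows with Python's built-in sqlite3 module. The ISO date strings and the use of julianday() for date arithmetic are assumptions specific to SQLite (the original answers subtract dates directly), and the nullif() wrapper is left out here so that a lapse of 0 stays 0, as in the results shown above. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id_no INTEGER, start_dt TEXT, end_dt TEXT, term_no INTEGER);
INSERT INTO table1 VALUES
  (48965787, '2017-12-13', '2018-12-13', 1),
  (48965787, '2018-12-30', '2019-12-13', 2),
  (57896248, '2018-01-17', '2019-01-17', 1),
  (57896248, '2019-01-17', '2020-01-17', 2),
  (78515698, '2018-06-16', '2019-06-16', 1),
  (78515698, '2019-08-01', '2020-06-16', 2);
""")

query = """
SELECT id_no,
       term_no,
       -- days between this term's start and the previous term's end (NULL for term 1)
       CAST(julianday(start_dt)
            - julianday(LAG(end_dt) OVER (PARTITION BY id_no ORDER BY start_dt))
            AS INTEGER) AS lapse
FROM table1
ORDER BY id_no, start_dt
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```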

How to display output like this in SQL?

I have a table like the one shown below:
+----------------+-------+----------+---------+
| Name | Model | system | ItemTag |
+----------------+-------+----------+---------+
| Alarm Id | T58 | ASC | |
+----------------+-------+----------+---------+
| Door Lock | F48 | ASC | |
+----------------+-------+----------+---------+
| AlarmSounder | T58 | ASC | |
+----------------+-------+----------+---------+
| Card Reader | K12 | ASC | |
+----------------+-------+----------+---------+
| Magnetic Lock | F48 | ASC | |
+----------------+-------+----------+---------+
| T2 Card Reader | K12 | ASC | |
+----------------+-------+----------+---------+
| Power Supply | Null | ASC | |
+----------------+-------+----------+---------+
| Battery | Null| ASC | |
+----------------+-------+----------+---------+
Now I want to display the data like this:
+-------------+-------+--------+--------+
| Name | Model | system | count |
+-------------+-------+--------+--------+
| Alarm | T58 | ASC | 2 |
+-------------+-------+--------+--------+
| Door Lock | F58 | ASC | 2 |
+-------------+-------+--------+--------+
| Card Reader | K12 | ASC | 2 |
+-------------+-------+--------+--------+
|Power supply | Null | ASC | 1 |
+-------------+-------+--------+--------+
| Battery | Null | ASC | 1 |
+-------------+-------+--------+--------+
How to do it in SQL?
Update:
I have also included rows with a null Model as my second update.
You could use window functions:
SELECT Name, Model, system, cnt AS count
FROM (SELECT *, COUNT(*) OVER(PARTITION BY Model) AS cnt,
ROW_NUMBER() OVER(PARTITION BY Model ORDER BY ...) AS rn
FROM your_tab) AS sub
WHERE rn = 1;
Keep in mind that you need a column to sort by, so a column such as an id or a timestamp should be used to pick the first value in each group.
EDIT:
As I have different Names relating to the null Model, how can I separate them out?
SELECT Name, Model, system, cnt AS count
FROM (SELECT *, COUNT(*) OVER(PARTITION BY Model) AS cnt,
ROW_NUMBER() OVER(PARTITION BY Model ORDER BY id) AS rn
FROM my_tab
WHERE Model IS NOT NULL) AS sub
WHERE rn = 1
UNION ALL
SELECT Name, Model, system, 1
FROM my_tab
WHERE Model IS NULL;
You can have a simple query as below
SELECT MIN(Name) Name,
Model,
system,
COUNT(*) [count]
FROM yourtable
GROUP BY Model, system
Result
Name Model system count
Door Lock F58 ASC 2
Card Reader K12 ASC 2
Alarm Id T58 ASC 2
lad2025's solution simplified: calculate both the NULL and NOT NULL rows in a single step and add some logic for the NULL rows:
SELECT Name, Model, system,
CASE WHEN Model IS NULL THEN 1 ELSE cnt END AS count
FROM
(
SELECT *,
COUNT(*) OVER(PARTITION BY Model) AS cnt,
ROW_NUMBER() OVER(PARTITION BY Model ORDER BY Name) AS rn
FROM my_tab
) AS sub
WHERE rn = 1 -- one row per model
OR Model IS NULL; -- all rows for the NULL model
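The single-step variant can be checked against the question's rows with a small sketch using Python's built-in sqlite3 module. The id column is an assumption (the question's table has no ordering column), and it is used here instead of Name so that the first row of each model follows insertion order; the alias item_count is also hypothetical. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_tab (id INTEGER, name TEXT, model TEXT, system TEXT);
INSERT INTO my_tab VALUES
  (1, 'Alarm Id',       'T58', 'ASC'),
  (2, 'Door Lock',      'F48', 'ASC'),
  (3, 'AlarmSounder',   'T58', 'ASC'),
  (4, 'Card Reader',    'K12', 'ASC'),
  (5, 'Magnetic Lock',  'F48', 'ASC'),
  (6, 'T2 Card Reader', 'K12', 'ASC'),
  (7, 'Power Supply',   NULL,  'ASC'),
  (8, 'Battery',        NULL,  'ASC');
""")

query = """
SELECT name, model, system,
       CASE WHEN model IS NULL THEN 1 ELSE cnt END AS item_count
FROM (
  SELECT *,
         COUNT(*)     OVER (PARTITION BY model)             AS cnt,
         ROW_NUMBER() OVER (PARTITION BY model ORDER BY id) AS rn
  FROM my_tab
) AS sub
WHERE rn = 1            -- one row per model
   OR model IS NULL     -- every row that has no model
ORDER BY id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```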

SQL - How to find which page is the first for users?

I have a table like this:
+----------+-------------------------------------+----------------------------------+
| user_id | time | url |
+----------+-------------------------------------+----------------------------------+
| 1 | 02.04.2017 8:56 | www.landingpage.com/ |
| 1 | 02.04.2017 8:57 | www.landingpage.com/about-us |
| 1 | 02.04.2017 8:58 | www.landingpage.com/faq |
| 2 | 02.04.2017 6:34 | www.landingpage.com/about-us |
| 2 | 02.04.2017 6:35 | www.landingpage.com/how-to-order |
| 3 | 03.04.2017 9:11 | www.landingpage.com/ |
| 3 | 03.04.2017 9:12 | www.landingpage.com/contact |
| 3 | 03.04.2017 9:13 | www.landingpage.com/about-us |
| 3 | 03.04.2017 9:14 | www.landingpage.com/our-legacy |
| 3 | 03.04.2017 9:15 | www.landingpage.com/ |
+----------+-------------------------------------+----------------------------------+
I want to figure out which page is the first for most users (the first page a user sees when they come to the site) and count the number of times it is viewed as the first page.
Is there a way to write a query to do this? I guess I need to use
MIN(time)
in conjunction with grouping but I don't know how.
So regarding the sample I provided it should be like:
url url_count
---------------------------------------------------
www.landingpage.com/ 2
www.landingpage.com/about-us 1
Thanks!
You're correct, you'll need to use the min() aggregate function within a subselect.
select
my_table.url
from
my_table
where
my_table.time = (
select
min(t.time)
from
my_table t
where
t.user_id = my_table.user_id
)
Replace my_table with whatever your table is actually named.
To include how many pages the user has seen, you'll need something like this:
select
my_table.url
, (
select
count(t.url)
from
my_table t
where
t.user_id = my_table.user_id
) as url_count
from
my_table
where
my_table.time = (
select
min(t.time)
from
my_table t
where
t.user_id = my_table.user_id
)
SELECT *
FROM my_table
WHERE (user_id, time) IN
(
SELECT user_id, min(time)
FROM my_table
GROUP BY user_id -- first visit per user, not per url
);
You can query as below:
Select top (1) with ties *
from yourtable
order by row_number() over(partition by user_id order by [time])
You can use outer query to get the same as below:
Select * from (
Select *, RowN = row_number() over(partition by user_id order by [time]) from yourtable) a
Where a.RowN = 1
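To get the counts the question asks for, the first-row-per-user result can be grouped by url. The sketch below does this with Python's built-in sqlite3 module; the table name visits, the column name visit_time, and the ISO timestamp strings are assumptions made for the illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (user_id INTEGER, visit_time TEXT, url TEXT);
INSERT INTO visits VALUES
  (1, '2017-04-02 08:56', 'www.landingpage.com/'),
  (1, '2017-04-02 08:57', 'www.landingpage.com/about-us'),
  (1, '2017-04-02 08:58', 'www.landingpage.com/faq'),
  (2, '2017-04-02 06:34', 'www.landingpage.com/about-us'),
  (2, '2017-04-02 06:35', 'www.landingpage.com/how-to-order'),
  (3, '2017-04-03 09:11', 'www.landingpage.com/'),
  (3, '2017-04-03 09:12', 'www.landingpage.com/contact'),
  (3, '2017-04-03 09:15', 'www.landingpage.com/');
""")

query = """
SELECT url, COUNT(*) AS url_count
FROM (
  -- rn = 1 marks the earliest page view of each user
  SELECT url,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY visit_time) AS rn
  FROM visits
) AS first_views
WHERE rn = 1
GROUP BY url
ORDER BY url_count DESC, url
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```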