In SQL, is there a way to compare two adjacent rows? In other words, if C2 = BEM and C3 = Compliance, or if C4 = Compliance and C5 = BEM, then return true. But if consecutive rows are identical, as in C6 = BEM and C7 = BEM, then return fail.
Check out the lead() and lag() functions.
They work most reliably with a sorting field. Your sample does not appear to contain such a field, so I added one in my second solution.
The coalesce() function handles the first row, which does not have a preceding row.
Solution 1 without sort field
create table data
(
A nvarchar(10)
);
insert into data (A) values
('BEM'),
('Compliance'),
('BEM'),
('Compliance'),
('BEM'),
('Compliance'),
('Compliance'),
('Compliance');
select d.A,
coalesce(lag(d.A) over(order by (select null)), '') as lag_A,
case
when d.A <> coalesce(lag(d.A) over(order by (select null)), '')
then 'Ok'
else 'Fail'
end as B
from data d;
Solution 2 with sort field
create table data2
(
Sort int,
A nvarchar(10)
);
insert into data2 (Sort, A) values
(1, 'BEM'),
(2, 'Compliance'),
(3, 'BEM'),
(4, 'Compliance'),
(5, 'BEM'),
(6, 'Compliance'),
(7, 'Compliance'),
(8, 'Compliance');
select d.A,
case
when d.A <> coalesce(lag(d.A) over(order by d.Sort), '')
then 'Ok'
else 'Fail'
end as B
from data2 d
order by d.Sort;
As a starter: a SQL table represents an unordered set of rows; there is no inherent ordering. Assuming that you have a column that defines the ordering of the rows, say id, and that your values are stored in column col, you can use lead() and a case expression as follows:
select col,
case when col = lead(col, 1, col) over(order by id)
then 'Fail' else 'OK'
end as status
from mytable t
Assuming you have some sort of column that you can use to determine the row order then you can use the LEAD window function to get the next value.
SELECT
[A],
CASE
WHEN [A] = LEAD([A], 1, [A]) OVER (ORDER BY SomeSortIndex) THEN 'Fail'
ELSE 'Ok'
END [B]
FROM src
The additional parameters in the LEAD function specify the row offset and default value in case there is no additional row. By using the current value as the default it will cause the condition to be true and display Fail like the last result in your example.
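As a sanity check, the same LEAD(A, 1, A) pattern can be run on the sample data in any engine with window functions; this hedged sketch uses Python's sqlite3 (SQLite 3.25+), with `src` and `SomeSortIndex` taken as the assumed names from the answer above.

```python
import sqlite3

# Re-running the LEAD(A, 1, A) pattern on the sample data; SQLite 3.25+
# supports the same three-argument lead() window function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (SomeSortIndex INTEGER, A TEXT);
INSERT INTO src VALUES
  (1,'BEM'), (2,'Compliance'), (3,'BEM'), (4,'Compliance'),
  (5,'BEM'), (6,'Compliance'), (7,'Compliance'), (8,'Compliance');
""")
rows = conn.execute("""
SELECT A,
       CASE WHEN A = LEAD(A, 1, A) OVER (ORDER BY SomeSortIndex)
            THEN 'Fail' ELSE 'Ok'
       END AS B
FROM src
ORDER BY SomeSortIndex;
""").fetchall()
for a, b in rows:
    print(a, b)
```

The last row falls back to its own value as the default, so it compares equal and reports Fail, as described above.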
Related
I have a table with a column for timestamps and another column for statuses. I want to grab the timestamp when the status is CHECKED_IN as well as the timestamp when it is COMPLETED, combined into one row. When I try to use a CASE statement, the result is split into two rows, one per status, each with NULL in the other column. I want it to return a single row with a value in each timestamp column rather than two rows.
CASE WHEN aud.STATUS_DESCRIPTION = 'CHECKED_IN' THEN aud.STATUS_DATETIME
END AS "Check-In Time",
CASE WHEN aud.STATUS_DESCRIPTION = 'COMPLETED' THEN aud.STATUS_DATETIME
END AS "Completed Time",
Table with statuses and timestamps
What my case statement is returning
Thank you for any help
I'd start with a self-join.
SELECT
checked.STATUS_DATETIME as CHECKED_IN_TIME,
completed.STATUS_DATETIME as COMPLETED_TIME
FROM
yourtable as checked
JOIN
yourtable as completed
ON ....
This is just an example of how to use PIVOT, as an addition to Simeon's answer, using the sample data from the image provided.
Table creation and data insertion:
create or replace temporary table _temp (
ts timestamp_ntz,
_status varchar
);
insert into _temp
values ('2021-12-11 11:12:03','created'),
('2021-12-11 11:12:03','checked_in'),
('2021-12-11 11:22:49','progress'),
('2021-12-11 11:55:03','completed');
Pivot query:
select *
from _temp
pivot(max(ts) for _status in ('checked_in', 'completed')) as p;
Result:
'checked_in' 'completed'
2021-12-11 11:12:03.000 2021-12-11 11:55:03.000
Note that I've used the MAX aggregate function, which can be replaced by other aggregate functions. This will always return a single row when there are only 2 pivoted columns; to get a better sense of PIVOT, add another column and take a look at the examples provided in Pivot's doc.
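For engines that lack PIVOT, the same single-row result can be sketched with conditional aggregation. This is a hedged stand-in using Python's sqlite3 and the sample data above, not Snowflake syntax:

```python
import sqlite3

# Emulating PIVOT(MAX(ts) FOR _status IN (...)) with MAX over CASE expressions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE _temp (ts TEXT, _status TEXT);
INSERT INTO _temp VALUES
  ('2021-12-11 11:12:03','created'),
  ('2021-12-11 11:12:03','checked_in'),
  ('2021-12-11 11:22:49','progress'),
  ('2021-12-11 11:55:03','completed');
""")
row = conn.execute("""
SELECT MAX(CASE WHEN _status = 'checked_in' THEN ts END) AS checked_in,
       MAX(CASE WHEN _status = 'completed'  THEN ts END) AS completed
FROM _temp;
""").fetchone()
print(row)
```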
This is happening because your data is across many rows.
You either need to do some form of aggregation, using GROUP BY and an aggregate function like MIN/MAX,
or you need to classify the data you want and then use a PIVOT to do the aggregation for you.
The first might look like:
SELECT
some_column_a,
some_column_b,
MAX(IFF( aud.status_description = 'CHECKED_IN', aud.status_datetime, null)) as check_in_time
MAX(IFF( aud.status_description = 'COMPLETED', aud.status_datetime, null)) as complete_time
FROM table
GROUP BY some_column_a, some_column_b
ORDER BY some_column_a, some_column_b;
So, adding a working example:
WITH data AS (
SELECT to_date(column1) as STATUS_DATETIME,
column2 as STATUS_DESCRIPTION,
column3 as customer_id
FROM VALUES
('2021-12-11 11:12:03','CREATED', 1),
('2021-12-11 11:12:03','CHECKED_IN', 1),
('2021-12-11 11:22:49','PROGRESS', 1),
('2021-12-11 11:55:03','COMPLETED', 1),
('2021-10-11 11:55:03','COMPLETED', 0)
)
SELECT
aud.customer_id,
MAX(IFF( aud.status_description = 'CHECKED_IN', aud.status_datetime, null)) as check_in_time,
MAX(IFF( aud.status_description = 'COMPLETED', aud.status_datetime, null)) as complete_time
FROM data as aud
GROUP BY 1
ORDER BY 1;
This example works well if you have many customer_id's and many entries per customer_id. If, however, your table size is small and you never have two records in the "completed" state, then the join can work.
WITH data AS (
SELECT to_date(column1) as STATUS_DATETIME,
column2 as STATUS_DESCRIPTION,
column3 as customer_id
FROM VALUES
('2021-12-11 11:12:03','CREATED', 1),
('2021-12-11 11:12:03','CHECKED_IN', 1),
('2021-12-11 11:22:49','PROGRESS', 1),
('2021-12-11 11:55:03','COMPLETED', 1),
('2021-10-11 11:55:03','COMPLETED', 0)
)
SELECT
checked.customer_id,
checked.status_datetime as check_in_time,
completed.status_datetime as complete_time
FROM data as checked
JOIN data as completed
ON checked.customer_id = completed.customer_id
AND checked.STATUS_DESCRIPTION = 'CHECKED_IN'
AND completed.STATUS_DESCRIPTION = 'COMPLETED'
;
The place the join does not work is if you do not have both "completed" and "checked_in" rows. With the above SQL there is no result row for customer_id 0, because it has only a single COMPLETED record and nothing to join with.
So for that you need a full outer join, and then it makes sense to move the filters into CTEs (or sub-selects), like so:
WITH data AS (
SELECT to_date(column1) as STATUS_DATETIME,
column2 as STATUS_DESCRIPTION,
column3 as customer_id
FROM VALUES
('2021-12-11 11:12:03','CREATED', 1),
('2021-12-11 11:12:03','CHECKED_IN', 1),
('2021-12-11 11:22:49','PROGRESS', 1),
('2021-12-11 11:55:03','COMPLETED', 1),
('2021-10-11 11:55:03','COMPLETED', 0)
), completed_data AS (
SELECT STATUS_DATETIME, STATUS_DESCRIPTION, customer_id
FROM data
WHERE STATUS_DESCRIPTION = 'COMPLETED'
), checked_in_data AS (
SELECT STATUS_DATETIME, STATUS_DESCRIPTION, customer_id
FROM data
WHERE STATUS_DESCRIPTION = 'CHECKED_IN'
)
SELECT
COALESCE(checked.customer_id, completed.customer_id) AS customer_id,
checked.status_datetime as check_in_time,
completed.status_datetime as complete_time
FROM checked_in_data as checked
FULL OUTER JOIN completed_data as completed
ON checked.customer_id = completed.customer_id
ORDER BY 1,2;
which gives the output:
CUSTOMER_ID  CHECK_IN_TIME  COMPLETE_TIME
          0  (null)         2021-10-11
          1  2021-12-11     2021-12-11
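For engines without FULL OUTER JOIN support (for example SQLite before 3.39), the same result can be emulated with a LEFT JOIN plus the unmatched rows from the other side. A runnable sketch, with names following the example above and dates truncated to days:

```python
import sqlite3

# Emulating FULL OUTER JOIN: LEFT JOIN, then UNION ALL the completed rows
# that have no matching check-in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (status_datetime TEXT, status_description TEXT, customer_id INTEGER);
INSERT INTO data VALUES
  ('2021-12-11','CREATED',1), ('2021-12-11','CHECKED_IN',1),
  ('2021-12-11','PROGRESS',1), ('2021-12-11','COMPLETED',1),
  ('2021-10-11','COMPLETED',0);
""")
rows = conn.execute("""
WITH checked   AS (SELECT * FROM data WHERE status_description = 'CHECKED_IN'),
     completed AS (SELECT * FROM data WHERE status_description = 'COMPLETED')
SELECT c.customer_id, c.status_datetime AS check_in_time,
       p.status_datetime AS complete_time
FROM checked c LEFT JOIN completed p ON p.customer_id = c.customer_id
UNION ALL
SELECT p.customer_id, NULL, p.status_datetime  -- completed rows with no check-in
FROM completed p
WHERE NOT EXISTS (SELECT 1 FROM checked c WHERE c.customer_id = p.customer_id)
ORDER BY customer_id;
""").fetchall()
for r in rows:
    print(r)
```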
I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.
CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);
I'm trying to get output like
Foo OtherKeyId
A NULL
B 1
C 1
C 2
This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.
I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work
-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1
This is for a modern SQL Server: Microsoft SQL Server 2019
You can use a GROUP BY expression with HAVING clause like below one
SELECT [Foo],[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1
Hmmm . . . I think you want filtering:
select t.*
from #tempx1 t
where t.otherkeyid is not null or
not exists (select 1
from #tempx1 t2
where t2.foo = t.foo and t2.otherkeyid is not null
);
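Running that filtering approach against the sample rows, adapted to SQLite syntax for a quick, runnable check:

```python
import sqlite3

# The NOT EXISTS clause keeps a NULL row only when its Foo group has no
# non-NULL OtherKeyId at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tempx1 (Id INTEGER, Foo TEXT, OtherKeyId INTEGER);
INSERT INTO tempx1 VALUES
  (1,'A',NULL), (2,'B',NULL), (3,'B',1),
  (4,'C',NULL), (5,'C',1), (6,'C',2);
""")
rows = conn.execute("""
SELECT Foo, OtherKeyId
FROM tempx1 t
WHERE t.OtherKeyId IS NOT NULL
   OR NOT EXISTS (SELECT 1 FROM tempx1 t2
                  WHERE t2.Foo = t.Foo AND t2.OtherKeyId IS NOT NULL)
ORDER BY Foo, OtherKeyId;
""").fetchall()
print(rows)
```

The result matches the desired output: A keeps its NULL, while the NULL rows for B and C are dropped.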
My actual problem is a bit more complicated than presented here. I ended up using the idea from Barbaros Özhan's solution to count the number of items. This ends up with two inner queries on the data set with two different GROUP BYs. I'm able to get the results I need on my real dataset using a query like the following:
SELECT
a.[Foo],
b.[OtherKeyId]
FROM (
SELECT
[Foo],
COUNT([OtherKeyId]) [C]
FROM #tempx1 t
GROUP BY [Foo]
) a
JOIN (
SELECT
[Foo],
[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
(b.[OtherKeyId] IS NULL AND a.[C] = 0)
OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)
I am working with a data warehouse table that can be split into claimed rows and computed rows.
I suspect that the computed rows are perfect duplicates of the claimed rows (with the exception of the claimed/computed column).
I tried to test this using the EXCEPT clause:
SELECT a, b, c FROM table WHERE clm_cmp_cd = 'clm'
EXCEPT
SELECT a, b, c FROM table WHERE clm_cmp_cd = 'cmp'
But all of the records were returned. I don't believe that this is possible, and I suspect it's due to null values.
Is there a way to compare the records which will compare nulls to nulls?
edit: the solution should work with an arbitrary number of fields, with varying types. In this case, I have ~100 fields, 2/3 of which may have null values. This is a data warehouse, and some degree of denormalization must be expected.
edit: I tested the query while limiting myself to non-null columns, and I got the result I expected (nothing).
But, I would still like to compare fields which potentially contain null values.
Your supposition would appear to be false. You might try this:
select a, b, c,
sum(case when clm_cmp_cd = 'clm' then 1 else 0 end) as num_clm,
sum(case when clm_cmp_cd = 'cmp' then 1 else 0 end) as num_cmp
from t
group by a, b, c;
This will show you the values of the three columns and the number of matches of each type.
Your problem is probably that values that look alike are not exactly the same. This could be due to slight differences in floating-point numbers, or to unmatched characters in strings, such as leading spaces.
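One way to see the unmatched-character failure mode: a single trailing space makes two values that print alike land in different groups. Sketched with sqlite3, with column names following the query above:

```python
import sqlite3

# 'abc' and 'abc ' look the same when printed but compare unequal,
# so GROUP BY splits them into two groups.
conn = sqlite3.connect(":memory:")
rows = conn.execute("""
WITH t(a, clm_cmp_cd) AS (VALUES ('abc', 'clm'), ('abc ', 'cmp'))
SELECT a,
       SUM(CASE WHEN clm_cmp_cd = 'clm' THEN 1 ELSE 0 END) AS num_clm,
       SUM(CASE WHEN clm_cmp_cd = 'cmp' THEN 1 ELSE 0 END) AS num_cmp
FROM t
GROUP BY a;
""").fetchall()
print(rows)  # two groups, not one
```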
Let's look how Db2 works with NULL values in GROUP BY and INTERSECT:
with t(a, b, clm_cmp_cd) as (values
( 1, 1, 'clm')
, ( 1, 1, 'cmp')
, (null, 1, 'clm')
, (null, 1, 'cmp')
, ( 2, 1, 'cmp')
)
select a, b
from t
where clm_cmp_cd='clm'
intersect
select a, b
from t
where clm_cmp_cd='cmp';
with t(a, b, clm_cmp_cd) as (values
( 1, 1, 'clm')
, ( 1, 1, 'cmp')
, (null, 1, 'clm')
, (null, 1, 'cmp')
, ( 2, 1, 'cmp')
)
select a, b
from t
where clm_cmp_cd in ('clm', 'cmp')
group by a, b
having count(1)>1;
Both queries return the same result:
A B
-- --
1 1
<null> 1
NULL values are treated as the same by these operators.
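The same NULL-matching behaviour of INTERSECT can be checked on other engines too; for example, this sqlite3 sketch mirrors the Db2 query above:

```python
import sqlite3

# Set operations treat NULLs as not distinct, so (NULL, 1) intersects
# with (NULL, 1) even though NULL = NULL is not true in a WHERE clause.
conn = sqlite3.connect(":memory:")
rows = conn.execute("""
WITH t(a, b, clm_cmp_cd) AS (VALUES
  (1, 1, 'clm'), (1, 1, 'cmp'), (NULL, 1, 'clm'), (NULL, 1, 'cmp'), (2, 1, 'cmp'))
SELECT a, b FROM t WHERE clm_cmp_cd = 'clm'
INTERSECT
SELECT a, b FROM t WHERE clm_cmp_cd = 'cmp';
""").fetchall()
print(rows)
```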
If you have too many columns in your table to specify them manually in your query, you may produce the column list with the following query:
select listagg(colname, ', ')
from syscat.columns
where tabschema='MYSCHEMA' and tabname='TABLE' and colname<>'CLM_CMP_CD';
I have a set of columns CODE_1-10, which contain diagnostic codes. I want to create a set of variables CODE_GROUP_1-17, which indicate whether or not one of some particular set of diagnostic codes matches any of the CODE_1-10 variables. For example, CODE_GROUP_1 = 1 if any of CODE_1-10 match either '123' or '456', and CODE_GROUP_2 = 1 if any of CODE_1-10 match '789','111','333','444' or 'foo'.
Here's an example of how you could do this using table value constructors.
CASE WHEN (SELECT count(value.val)
FROM (VALUES (CODE_1)
, (CODE_2)
, (CODE_3)
, (CODE_4)
, (CODE_5)
, (CODE_6)
, (CODE_7)
, (CODE_8)
, (CODE_9)
, (CODE_10)
) AS value(val)
WHERE value.val in ('123', '456')
) > 0 THEN 1 ELSE 0 END AS CODE_GROUP_1,
CASE WHEN (SELECT count(value.val)
FROM (VALUES (CODE_1)
, (CODE_2)
, (CODE_3)
, (CODE_4)
, (CODE_5)
, (CODE_6)
, (CODE_7)
, (CODE_8)
, (CODE_9)
, (CODE_10)
) AS value(val)
WHERE value.val in ('789','111','333','444','foo')
) > 0 THEN 1 ELSE 0 END AS CODE_GROUP_2
I am wondering if there is another way to do this that is more efficient. Is there a way to make a CLR UDF that takes an array of CODE_1-10, and outputs a set of columns CODE_GROUP_1-17?
You could at least avoid the repetition of FROM (VALUES ...) like this:
SELECT
CODE_GROUP_1 = COUNT(DISTINCT CASE WHEN val IN ('123', '456') THEN 1 END),
CODE_GROUP_2 = COUNT(DISTINCT CASE WHEN val IN ('789','111','333','444','foo') THEN 1 END),
...
FROM
(
VALUES
(CODE_1),
(CODE_2),
(CODE_3),
(CODE_4),
(CODE_5),
(CODE_6),
(CODE_7),
(CODE_8),
(CODE_9),
(CODE_10)
) AS value(val)
If CODE_1, CODE_2 etc. are column names, you can use the above query as a derived table in CROSS APPLY:
SELECT
...
FROM
dbo.atable -- table containing CODE_1, CODE_2 etc.
CROSS APPLY
(
SELECT ... -- the above query
) AS x
;
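A minimal, runnable sketch of the COUNT(DISTINCT CASE ...) flags over an unpivoted VALUES list. The CODE_1-10 columns are collapsed into a three-value list purely for illustration:

```python
import sqlite3

# COUNT(DISTINCT CASE ...) yields 1 if any value matched the group's
# code list, and 0 otherwise (the CASE returns NULL for non-matches,
# which COUNT ignores).
conn = sqlite3.connect(":memory:")
row = conn.execute("""
WITH codes(val) AS (VALUES ('123'), ('789'), ('xyz'))
SELECT
  COUNT(DISTINCT CASE WHEN val IN ('123', '456') THEN 1 END) AS code_group_1,
  COUNT(DISTINCT CASE WHEN val IN ('789', '111', '333', '444', 'foo') THEN 1 END) AS code_group_2
FROM codes;
""").fetchone()
print(row)
```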
Can you create 2 new tables with the columns appended as rows? So one table would be dxCode with a source column if you need to retain the 1-10 value and the dx code and whatever key field(s) you need, the other table would be dxGroup with your 17 groups, the source groupID if you need it, and your target dx values.
Then to determine which codes are in which groups, you can join on your dx fields.
I have this:
-- -- -- --
01 A1 B1 99
01 A1 B1 98
02 A2 B2 97
02 A2 B2 96
I need this:
-- -- -- --
01 A1 B1 99
98
02 A2 B2 97
96
------------
I cannot repeat the data that I will present in an Excel file; my result needs to be exactly like this.
In my actual table, the last column holds form responses, and the first columns (those that must not repeat) are customer data such as phone and name.
The end result of this "query" will populate a "DataTable" and will be presented in a file "xlsx".
Thanks for sharing knowledge ^^
If you have SQL Server 2012+:
SELECT
ISNULL(NULLIF(Column1,LAG(Column1) OVER(ORDER BY Column1)),'')
,ISNULL(NULLIF(Column2,LAG(Column2) OVER(ORDER BY Column1,Column2)),'')
,ISNULL(NULLIF(Column3,LAG(Column3) OVER(ORDER BY Column1,Column2,Column3)),'')
,Column4
FROM #mytable
ORDER BY Column1,Column2,Column3,Column4 DESC
It's a little messy, but you can do it in the database. You basically make a subquery that gets the largest value, and then join that to the regular table and blank out values that don't match. I created your sample set like this:
CREATE TABLE mytable (N1 VARCHAR(2), A VARCHAR(2), B VARCHAR(2), N2 VARCHAR(2))
INSERT INTO mytable VALUES
('01', 'A1', 'B1', '99'),
('01', 'A1', 'B1', '98'),
('02', 'A2', 'B2', '97'),
('02', 'A2', 'B2', '96')
And then was able to get the result like this:
SELECT
CASE WHEN O.N2 = I.N2 THEN O.N1 ELSE '' END,
CASE WHEN O.N2 = I.N2 THEN O.A ELSE '' END,
CASE WHEN O.N2 = I.N2 THEN O.B ELSE '' END,
O.N2
FROM
(SELECT MAX(N2) AS N2, N1, A, B FROM mytable GROUP BY N1, A, B) I
INNER JOIN mytable O
ON O.A = I.A AND O.B = I.B AND O.N1 = I.N1
ORDER BY O.N1 ASC
We can use ROW_NUMBER to get a sequence and substitute '' for all rows where the sequence is greater than 1:
with CTE
AS
( SELECT ID, ColumnA, ColumnB, value,ROW_NUMBER() over ( PARTITION by id order by id) as seq
FROM tableA
)
, CTE1
AS
(
select id, ColumnA, ColumnB, value, seq from CTE where seq =1
UNION
SELECT id ,'','', value , seq from CTE where seq >1
)
SELECT case when seq >1 THEN NULL ELSE id END as id, columnA, columnB, value from CTE1
You can achieve what you want using a query.
You haven't provided DDL, so I am going to assume your columns are called a, b, c and d respectively:
; WITH cte AS (
SELECT a
, b
, c
, d
, Row_Number() OVER (PARTITION BY a, b, c ORDER BY d) As sequence
FROM your_table
)
SELECT CASE WHEN sequence = 1 THEN a ELSE '' END As a
, CASE WHEN sequence = 1 THEN b ELSE '' END As b
, CASE WHEN sequence = 1 THEN c ELSE '' END As c
, d
FROM cte
ORDER
BY a
, b
, c
, d
The idea is to assign an incremental counter to each row, that restarts after each change of a + b + c.
We then use a conditional expression to show a value or not (basically, only show it on the first instance of each group).
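A runnable check of this approach, adapted to SQLite; note that d is ordered descending here so 99 precedes 98, matching the desired output shown in the question:

```python
import sqlite3

# ROW_NUMBER restarts per (a, b, c) group; the CASE blanks out the
# repeated columns on every row after the first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (a TEXT, b TEXT, c TEXT, d TEXT);
INSERT INTO your_table VALUES
  ('01','A1','B1','99'), ('01','A1','B1','98'),
  ('02','A2','B2','97'), ('02','A2','B2','96');
""")
rows = conn.execute("""
WITH cte AS (
  SELECT a, b, c, d,
         ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY d DESC) AS seq
  FROM your_table
)
SELECT CASE WHEN seq = 1 THEN a ELSE '' END,
       CASE WHEN seq = 1 THEN b ELSE '' END,
       CASE WHEN seq = 1 THEN c ELSE '' END,
       d
FROM cte
ORDER BY a, b, c, d DESC;
""").fetchall()
for r in rows:
    print(r)
```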
The analytic ROW_NUMBER() function is good for this. I've made up column names because you didn't supply any. To assign a row number by customer, use something like this:
SELECT
Name,
Phone,
Address,
Response,
ROW_NUMBER() OVER (PARTITION BY Name, Phone, Address ORDER BY Response) AS CustRow
FROM myTable
That will assign row number within each customer. Try it yourself and I think it will make sense.
You can put it into a subquery or CTE from there and only show customer ID information like name, phone, and address when you're on the first row for each customer:
SELECT
CASE WHEN CustRow = 1 THEN Name ELSE '' END AS Name,
CASE WHEN CustRow = 1 THEN Phone ELSE '' END AS Phone,
CASE WHEN CustRow = 1 THEN Address ELSE '' END AS Address,
Response
FROM (
SELECT
Name,
Phone,
Address,
Response,
ROW_NUMBER() OVER (PARTITION BY Name, Phone, Address ORDER BY Response) AS CustRow
FROM myTable) custSubquery
ORDER BY Name, Phone, Address
The custSubquery on the second-to-last line is because SQL Server requires all subqueries to be aliased, even if the alias isn't used.
The most important thing is to determine how your last column will be ordered for display and to make sure that it's consistent in the ROW_NUMBER() function as well as the final ORDER BY.
If you need more help, please supply table and column names, and specify how results are ordered within each customer.