Comparing Column Values and returning ERROR or OK - sql

I need to verify a source system against a destination system and ensure the values match between them. The problem is that the source system is a total mess and is proving hard to validate.
I've got the following sample data where all rows should be OK, but they're showing as ERROR. Does anyone know a way of doing a comparison that would result in an OK for all of the rows below?
CREATE TABLE #testdata (
ID INT
,ValueSource VARCHAR(800)
,ValueDestination VARCHAR(800)
,Value_Varchar_Check AS (
CASE
WHEN coalesce(ValueSource, '0') = coalesce(ValueDestination, '0')
THEN 'OK'
ELSE 'ERROR'
END
)
)
INSERT INTO #testdata (
ID
,ValueSource
,ValueDestination
)
SELECT 1
,'hepatitis c,other (specify)'
,'hepatitis c, other (specify)'
UNION ALL
SELECT 2
,'lung problems / asthma,lung problems / asthma'
,'lung problems / asthma'
UNION ALL
SELECT 3
,'lung problems / asthma,diabetes'
,'diabetes, lung problems / asthma'
UNION ALL
SELECT 4
,'seizures/epilepsy,hepatitis c,seizures/epilepsy'
,'hepatitis c, seizures/epilepsy'

I don't think you can write this as a computed column, as it is quite a tricky thing to compute. If you are using SQL Server 2016 or later, you can use STRING_SPLIT (and, from 2017, TRIM) to convert the ValueSource and ValueDestination values into tables and then sort them alphabetically using a query like this:
SELECT DISTINCT ID, TRIM(value) AS value,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
FROM #testdata
CROSS APPLY STRING_SPLIT(ValueSource, ',')
For ValueSource, this produces:
ID value rn
1 hepatitis c 1
1 other (specify) 2
2 lung problems / asthma 1
3 diabetes 1
3 lung problems / asthma 2
4 hepatitis c 1
4 seizures/epilepsy 2
You can then FULL OUTER JOIN those two tables on ID, value and rn, and detect an error when there are null values from either side (since that implies that the values for a given ID and rn don't match):
WITH t1 AS (
SELECT DISTINCT ID, TRIM(value) AS value,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
FROM #testdata
CROSS APPLY STRING_SPLIT(ValueSource, ',')
),
t2 AS (
SELECT DISTINCT ID, TRIM(value) AS value,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
FROM #testdata
CROSS APPLY STRING_SPLIT(ValueDestination, ',')
)
SELECT COALESCE(t1.ID, t2.ID) AS ID,
CASE WHEN COUNT(CASE WHEN t1.value IS NULL OR t2.value IS NULL THEN 1 END) > 0 THEN 'Error'
ELSE 'OK'
END AS Status
FROM t1
FULL OUTER JOIN t2 ON t2.ID = t1.ID AND t2.rn = t1.rn AND t2.value = t1.value
GROUP BY COALESCE(t1.ID, t2.ID)
Output (for your sample data):
ID Status
1 OK
2 OK
3 OK
4 OK
Demo on SQLFiddle
You can then use the entire query above as a CTE (call it t3) to update your original table. Note that Value_Varchar_Check would have to be a regular column rather than the computed column from your CREATE TABLE, since computed columns can't be updated:
UPDATE t
SET t.Value_Varchar_Check = t3.Status
FROM #testdata t
JOIN t3 ON t.ID = t3.ID
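For reference, here is a minimal sketch of the complete statement, with the same CTEs as above wrapped into the UPDATE (assuming Value_Varchar_Check has been changed to a plain VARCHAR column):
WITH t1 AS (
    SELECT DISTINCT ID, TRIM(value) AS value,
        DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
    FROM #testdata
    CROSS APPLY STRING_SPLIT(ValueSource, ',')
), t2 AS (
    SELECT DISTINCT ID, TRIM(value) AS value,
        DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
    FROM #testdata
    CROSS APPLY STRING_SPLIT(ValueDestination, ',')
), t3 AS (
    SELECT COALESCE(t1.ID, t2.ID) AS ID,
        CASE WHEN COUNT(CASE WHEN t1.value IS NULL OR t2.value IS NULL THEN 1 END) > 0
             THEN 'Error' ELSE 'OK'
        END AS Status
    FROM t1
    FULL OUTER JOIN t2 ON t2.ID = t1.ID AND t2.rn = t1.rn AND t2.value = t1.value
    GROUP BY COALESCE(t1.ID, t2.ID)
)
UPDATE t
SET t.Value_Varchar_Check = t3.Status
FROM #testdata t
JOIN t3 ON t.ID = t3.ID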
Output:
ID ValueSource ValueDestination Value_Varchar_Check
1 hepatitis c,other (specify) hepatitis c, other (specify) OK
2 lung problems / asthma,lung problems / asthma lung problems / asthma OK
3 lung problems / asthma,diabetes diabetes, lung problems / asthma OK
4 seizures/epilepsy,hepatitis c,seizures/epilepsy hepatitis c, seizures/epilepsy OK
Demo on SQLFiddle


How to add rows to a specific number multiple times in the same query

I already asked for help on a part of my problem here.
That gave me 10 rows per result whether they were filled or not. But now I'm facing something else: I need to do it multiple times in the same query result.
WITH NUMBERS AS
(
SELECT 1 rowNumber
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 6
UNION ALL
SELECT 7
UNION ALL
SELECT 8
UNION ALL
SELECT 9
UNION ALL
SELECT 10
)
SELECT DISTINCT sp.SLC_ID, c.rowNumber, c.PCE_ID
FROM SELECT_PART sp
LEFT JOIN (
SELECT b.*
FROM NUMBERS
LEFT OUTER JOIN (
SELECT a.*
FROM (
SELECT SELECT_PART.SLC_ID, ROW_NUMBER() OVER (ORDER BY SELECT_PART.SLC_ID) as
rowNumber, SELECT_PART.PCE_ID
FROM SELECT_PART
WHERE SELECT_PART.SLC_ID = (must be the same as sp.SLC_ID and can't hardcode it)
) a
) b
ON b.rowNumber = NUMBERS.rowNumber
) c ON c.SLC_ID = sp.SLC_ID
ORDER BY sp.SLC_ID, c.rowNumber
It works fine for the first 10 lines, but each subsequent SLC_ID only gets one empty line.
I need it to look like this:
SLC_ID rowNumber PCE_ID
1 1 0001
1 2 0002
1 3 NULL
1 ... ...
1 10 NULL
2 1 0011
2 2 0012
2 3 0013
2 ... ...
2 10 0020
3 1 0021
3 ... ...
Really need it that way to build a report.
Instead of manually building a query-specific number list where you have to include every possible number you need (1 through 10 in this case), create a numbers table.
DECLARE @UpperBound INT = 1000000;
;WITH cteN(Number) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id]) - 1
FROM sys.all_columns AS s1
CROSS JOIN sys.all_columns AS s2
)
SELECT [Number] INTO dbo.Numbers
FROM cteN WHERE [Number] <= @UpperBound;
CREATE UNIQUE CLUSTERED INDEX CIX_Number ON dbo.Numbers([Number])
WITH
(
FILLFACTOR = 100, -- in the event server default has been changed
DATA_COMPRESSION = ROW -- if Enterprise & table large enough to matter
);
Source: mssqltips
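With that table in place, the ad-hoc NUMBERS CTE can be replaced by a simple range filter, for example:
SELECT [Number] AS rowNumber
FROM dbo.Numbers
WHERE [Number] BETWEEN 1 AND 10;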
Alternatively, if you can't add objects to the database, use a table that already exists in SQL Server.
WITH NUMBERS AS
(
SELECT DISTINCT Number as rowNumber FROM master..spt_values where type = 'P'
)
SELECT DISTINCT sp.SLC_ID, c.rowNumber, c.PCE_ID
FROM SELECT_PART sp
LEFT JOIN (
SELECT b.*
FROM NUMBERS
LEFT OUTER JOIN (
SELECT a.*
FROM (
SELECT SELECT_PART.SLC_ID, ROW_NUMBER() OVER (ORDER BY SELECT_PART.SLC_ID) as
rowNumber, SELECT_PART.PCE_ID
FROM SELECT_PART
WHERE SELECT_PART.SLC_ID = (must be the same as sp.SLC_ID and can't hardcode it)
) a
) b
ON b.rowNumber = NUMBERS.rowNumber
) c ON c.SLC_ID = sp.SLC_ID
ORDER BY sp.SLC_ID, c.rowNumber
NOTE: The max value for this solution is 2047, since master..spt_values with type = 'P' only contains the numbers 0 through 2047.
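With either numbers source, the uncorrelatable placeholder in your query can be avoided entirely: cross join the distinct SLC_IDs with the numbers 1 to 10, then left join the numbered rows back. A sketch (table and column names taken from your query, dbo.Numbers from above, and ordering within each SLC_ID assumed to be by PCE_ID):
SELECT s.SLC_ID, n.rowNumber, p.PCE_ID
FROM (SELECT DISTINCT SLC_ID FROM SELECT_PART) s
CROSS JOIN (
    SELECT Number AS rowNumber FROM dbo.Numbers
    WHERE Number BETWEEN 1 AND 10
) n
LEFT JOIN (
    SELECT SLC_ID, PCE_ID,
        ROW_NUMBER() OVER (PARTITION BY SLC_ID ORDER BY PCE_ID) AS rowNumber
    FROM SELECT_PART
) p ON p.SLC_ID = s.SLC_ID AND p.rowNumber = n.rowNumber
ORDER BY s.SLC_ID, n.rowNumber;
This returns exactly 10 rows per SLC_ID, with PCE_ID as NULL where there is no matching row.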

Consolidate information (time series) from two tables

MS SQL Server
I have two tables with different accounts from the same customer:
Table1:
ID  ACCOUNT  FROM        TO
1   A        01.10.2019  01.12.2019
1   A        01.02.2020  09.09.9999
and table2:
ID  ACCOUNT  FROM        TO
1   B        01.12.2019  01.01.2020
As a result I want a table that summarizes the history of this customer and shows when he had an active account and when he didn't.
Result:
ID  FROM        TO          ACTIV Y/N
1   01.10.2019  01.01.2020  Y
1   02.01.2020  31.01.2020  N
1   01.02.2020  09.09.9999  Y
Can someone help me with some ideas how to proceed?
This is the typical gaps and islands problem, and it's not usually easy to solve.
You can achieve your goal using the query below; I will explain it a little.
You can test on this db<>fiddle.
First of all... I have unified your two tables into one to simplify the query.
-- ##table1
select 1 as ID, 'A' as ACCOUNT, convert(date,'2019-10-01') as F, convert(date,'2019-12-01') as T into ##table1
union all
select 1 as ID, 'A' as ACCOUNT, convert(date,'2020-02-01') as F, convert(date,'9999-09-09') as T
-- ##table2
select 1 as ID, 'B' as ACCOUNT, convert(date,'2019-12-01') as F, convert(date,'2020-01-01') as T into ##table2
-- ##table3
select * into ##table3 from ##table1 union all select * from ##table2
You can then get your gaps and islands using, for example, a query like this.
It combines a recursive CTE that generates a calendar (cte_cal) with lag and lead operations that fetch the previous/next record information needed to build the gaps.
with
cte_cal as (
select min(F) as D from ##table3
union all
select dateadd(day,1,D) from cte_cal where D <= '2021-01-01'
),
table4 as (
select t1.ID, t1.ACCOUNT, t1.F, isnull(t2.T, t1.T) as T, lag(t2.F, 1,null) over (order by t1.F) as SUP
from ##table3 t1
left join ##table3 t2
on t1.T=t2.F
)
select
ID,
case when T = D then F else D end as "FROM",
isnull(dateadd(day,-1,lead(D,1,null) over (order by D)),'9999-09-09') as "TO",
case when case when T = D then F else D end = F then 'Y' else 'N' end as "ACTIV Y/N"
from (
select *
from cte_cal c
cross apply (
select t.*
from table4 t
where t.SUP is null
and (
c.D = t.T or
c.D = dateadd(day,1,t.T)
)
) t
union all
select F, * from table4 where T = '9999-09-09'
) p
order by 1
option (maxrecursion 0)
Dates like '9999-09-09' must be treated as exceptions; otherwise I would have to generate a calendar all the way to that date, and the query would take a long time to resolve.
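For comparison, here is a minimal calendar-free sketch of the same gaps-and-islands logic, reusing the unified ##table3 from above: merge overlapping or adjacent intervals into islands with a running start-flag, then derive the inactive gaps with LEAD.
with starts as (
    select t.ID, t.F, t.T,
        case when exists (
                select 1 from ##table3 t2
                where t2.ID = t.ID
                  and t2.F < t.F
                  and t2.T >= dateadd(day, -1, t.F)
             ) then 0 else 1 end as is_start
    from ##table3 t
), grouped as (
    select ID, F, T,
        sum(is_start) over (partition by ID order by F) as grp
    from starts
), merged as (
    select ID, min(F) as F, max(T) as T
    from grouped
    group by ID, grp
)
select ID, F as [FROM], T as [TO], 'Y' as [ACTIV Y/N] from merged
union all
select ID, dateadd(day, 1, T), dateadd(day, -1, next_F), 'N'
from (select *, lead(F) over (partition by ID order by F) as next_F from merged) g
where next_F is not null
order by 1, 2;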

How to unnest and pivot two columns in BigQuery

Say I have a BQ table containing the following information:
id  test.name  test.score
1   a          5
    b          7
2   a          8
    c          3
Where test is nested. How would I pivot test into the following table?
id  a     b     c
1   5     7     null
2   8     null  3
I cannot pivot test directly, as I get the following error message at pivot(test): Table-valued function not found. Previous questions (1, 2) don't deal with nested columns or are outdated.
The following query looks like a useful first step:
select a.id, t
from `table` as a,
unnest(test) as t
However, this just provides me with:
id  test.name  test.score
1   a          5
1   b          7
2   a          8
2   c          3
Conditional aggregation is a good approach. If your tables are large, you might find that this has the best performance:
select t.id,
(select max(tt.score) from unnest(t.test) tt where tt.name = 'a') as a,
(select max(tt.score) from unnest(t.test) tt where tt.name = 'b') as b,
(select max(tt.score) from unnest(t.test) tt where tt.name = 'c') as c
from `table` t;
The reason I recommend this is that it avoids the outer aggregation. The unnest() happens without shuffling the data around, and I have found that this is a big win in terms of performance.
One option could be using conditional aggregation:
select id,
max(case when test.name='a' then test.score end) as a,
max(case when test.name='b' then test.score end) as b,
max(case when test.name='c' then test.score end) as c
from (
select a.id, t
from `table` as a,
unnest(test) as t
) A
group by id
Below is a generic/dynamic way to handle your case:
EXECUTE IMMEDIATE (
SELECT """
SELECT id, """ ||
STRING_AGG("""MAX(IF(name = '""" || name || """', score, NULL)) AS """ || name, ', ')
|| """
FROM `project.dataset.table` t, t.test
GROUP BY id
"""
FROM (
SELECT DISTINCT name
FROM `project.dataset.table` t, t.test
ORDER BY name
)
);
Applied to the sample data from your question, the output is:
Row  id  a  b     c
1    1   5  7     null
2    2   8  null  3
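Note that BigQuery has since gained a native PIVOT operator, so a sketch like the following may also work. The trick is to unnest test first, since (as the error message in the question shows) PIVOT cannot be applied to the nested field directly:
select *
from (
  select id, t.name, t.score
  from `project.dataset.table`, unnest(test) as t
)
pivot (max(score) for name in ('a', 'b', 'c'))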

Getting Number of Common Values from 2 comma-separated strings

I have a table that contains comma-separated values in a column in Postgres.
ID PRODS
--------------------------------------
1 ,142,10,75,
2 ,142,87,63,
3 ,75,73,2,58,
4 ,142,2,
Now I want a query where I can give a comma-separated string and it will tell me the number of matches between the input string and the string present in the row.
For instance, for input value ',142,87,', I want the output like
ID PRODS No. of Match
------------------------------------------------------------------------
1 ,142,10,75, 1
2 ,142,87,63, 2
3 ,75,73,2,58, 0
4 ,142,2, 1
Try this:
SELECT
*,
ARRAY(
SELECT
*
FROM
unnest(string_to_array(trim(both ',' from prods), ','))
WHERE
unnest = ANY(string_to_array(',142,87,', ','))
)
FROM
prods_table;
Output is:
1 ,142,10,75, {142}
2 ,142,87,63, {142,87}
3 ,75,73,2,58, {}
4 ,142,2, {142}
Add the cardinality(anyarray) function to the last column to get just the number of matches.
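For example, a sketch of the same query with the ARRAY expression wrapped in cardinality() (the u(val) alias is just for readability):
SELECT
    id, prods,
    cardinality(ARRAY(
        SELECT u.val
        FROM unnest(string_to_array(trim(both ',' from prods), ',')) AS u(val)
        WHERE u.val = ANY(string_to_array(',142,87,', ','))
    )) AS "No. of Match"
FROM prods_table;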
And consider changing your database design.
Check This.
select T.*,
COALESCE(No_of_Match, 0) AS "No. of Match"
from TT T Left join
(
select ID,count(ID) No_of_Match
from (
select ID,unnest(string_to_array(trim(t.prods, ','), ',')) A
from TT t)a
Where A in ('142','87')
group by ID
)B
On T.Id=b.id
Demo Here
If you install the intarray extension, this gets quite easy:
select id, prods, cardinality(string_to_array(trim(prods, ','), ',')::int[] & array[142,87])
from bad_design;
Otherwise it's a bit more complicated:
select bd.id, bd.prods, m.matches
from bad_design bd
join lateral (
select bd.id, count(v.p) as matches
from unnest(string_to_array(trim(bd.prods, ','), ',')) as l(p)
left join (
values ('142'),('87') --<< these are your input values
) v(p) on l.p = v.p
group by bd.id
) m on m.id = bd.id
order by bd.id;
Online example: http://rextester.com/ZIYS97736
But you should really fix your data model.
with data as
(
select *,
unnest(string_to_array(trim(both ',' from prods), ',') ) as v
from myTable
),
counts as
(
select id, count(t) as c from data
left join
( select unnest(string_to_array(',142,87,', ',') ) as t) tmp on tmp.t = data.v
group by id
order by id
)
select t1.id, t1.prods, t2.c as "No. of Match"
from myTable t1
inner join counts t2 on t1.id = t2.id;
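As several answers note, the real fix is the data model: store one product per row instead of a comma-separated string. A minimal sketch of such a design (table and column names are made up for illustration):
CREATE TABLE row_prods (
    row_id  int NOT NULL,  -- the ID from the original table
    prod_id int NOT NULL,
    PRIMARY KEY (row_id, prod_id)
);

-- counting matches then becomes a plain filtered aggregate
-- (left join back to the ID list if you need to keep zero-match rows):
SELECT row_id, count(*) AS "No. of Match"
FROM row_prods
WHERE prod_id IN (142, 87)
GROUP BY row_id;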

SELECT DISTINCT for data groups

I have following table:
ID Data
1 A
2 A
2 B
3 A
3 B
4 C
5 D
6 A
6 B
etc. In other words, I have groups of data per ID. You will notice that the data group (A, B) occurs multiple times. I want a query that can identify the distinct data groups and number them, such as:
DataID Data
101 A
102 A
102 B
103 C
104 D
So DataID 102 would resemble data (A,B), DataID 103 would resemble data (C), etc. In order to be able to rewrite my original table in this form:
ID DataID
1 101
2 102
3 102
4 103
5 104
6 102
How can I do that?
PS. Code to generate the first table:
CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'
In my opinion you have to create a custom aggregate that concatenates data (in the case of strings, a CLR approach is recommended for perf reasons).
Then I would group by ID and select distinct from the grouping, adding a ROW_NUMBER() or DENSE_RANK() function, your choice. Anyway, it should look like this:
with groupings as (
select concat(data) groups
from Table1
group by ID
)
select groups, row_number() over (order by groups) from groupings
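On SQL Server 2017 or later, STRING_AGG removes the need for a custom aggregate. A minimal sketch along the same lines (the 100 offset merely mimics the 101, 102, ... numbering from the question):
WITH groupings AS (
    SELECT id,
        STRING_AGG(data, ',') WITHIN GROUP (ORDER BY data) AS groups
    FROM #t1
    GROUP BY id
)
SELECT id, 100 + DENSE_RANK() OVER (ORDER BY groups) AS DataID
FROM groupings;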
The following query using CASE will give you the result shown below.
From there on, getting the distinct datagroups and proceeding further should not really be a problem.
SELECT
id,
MAX(CASE data WHEN 'A' THEN data ELSE '' END) +
MAX(CASE data WHEN 'B' THEN data ELSE '' END) +
MAX(CASE data WHEN 'C' THEN data ELSE '' END) +
MAX(CASE data WHEN 'D' THEN data ELSE '' END) AS DataGroups
FROM #t1
GROUP BY id
ID DataGroups
1 A
2 AB
3 AB
4 C
5 D
6 AB
However, this kind of logic will only work if the "Data" values are both fixed and known beforehand.
In your case, you do say that is the case. However, considering that you also say there are 1000 of them, this will frankly be a ridiculous-looking query for sure :-)
LuckyLuke's suggestion above would frankly be the more generic, and probably saner, way to implement the solution in your case.
From your sample data (having added the missing 2,'A' tuple), the following gives the renumbered (and uniqueified) data:
with NonDups as (
select t1.id
from #t1 t1 left join #t1 t2
on t1.id > t2.id and t1.data = t2.data
group by t1.id
having COUNT(t1.data) > COUNT(t2.data)
), DataAddedBack as (
select ID,data
from #t1 where id in (select id from NonDups)
), Renumbered as (
select DENSE_RANK() OVER (ORDER BY id) as ID,Data from DataAddedBack
)
select * from Renumbered
Giving:
1 A
2 A
2 B
3 C
4 D
I think then, it's a matter of relational division to match up rows from this output with the rows in the original table.
Just to share my own dirty solution that I'm using for the moment:
SELECT DISTINCT t1.id, D.data
FROM #t1 t1
CROSS APPLY (
    SELECT CAST(Data AS VARCHAR) + ','
    FROM #t1 t2
    WHERE t2.id = t1.id
    ORDER BY Data ASC
    FOR XML PATH('')
) D (Data)
And then proceeding analogously to LuckyLuke's solution.