Getting Results from inner join differences on same row - sql

I currently have SQL returning the result set below:
WORKFLOWID UNMATCHEDVALUE MATCHEDADDRESS EXCEPTIONREASON
1001 UNIQUE ADDRESS1 (null)
1001 UNIQUE ADDRESS2 Some Value
What I am looking for is a result like this
WORKFLOWID UNMATCHEDVALUE MATCHEDADDRESS EXCEPTIONREASON MATCHEDADDRESS2 EXCEPTIONREASON2
1001 UNIQUE ADDRESS1 (null) ADDRESS2 Some Value
So the "variant" columns are MatchedAddress and ExceptionReason; the other columns are the same for each record. Note that each workflow_id always returns exactly 2 rows.
I have also created a fiddle to show the schema.
http://sqlfiddle.com/#!6/f7cde/3

Try this:
;WITH CTE AS
(
SELECT ws.id as WorkflowStepId,
ws.workflow_id as WorkflowId,
sg.unmatchValue as UnmatchedValue,
geo_address as MatchedAddress,
ws.exception_Value as ExceptionReason,
ROW_NUMBER() OVER(PARTITION BY ws.workflow_id ORDER BY ws.id) as RN
FROM workflow_step as ws
INNER JOIN workflow as gw
ON ws.workflow_id = gw.id
INNER JOIN super_group as sg
ON gw.super_group_id = sg.id
INNER JOIN alias on
ws.id = alias.workflow_step_id
)
SELECT WorkflowId,
UnmatchedValue,
MIN(CASE WHEN RN = 1 THEN MatchedAddress END) MatchedAddress,
MIN(CASE WHEN RN = 1 THEN ExceptionReason END) ExceptionReason,
MIN(CASE WHEN RN = 2 THEN MatchedAddress END) MatchedAddress2,
MIN(CASE WHEN RN = 2 THEN ExceptionReason END) ExceptionReason2
FROM CTE
GROUP BY WorkflowId,
UnmatchedValue
ORDER BY workflowId
Here is the modified sqlfiddle.
The results are:
╔════════════╦════════════════╦════════════════╦═════════════════╦═════════════════╦══════════════════╗
║ WORKFLOWID ║ UNMATCHEDVALUE ║ MATCHEDADDRESS ║ EXCEPTIONREASON ║ MATCHEDADDRESS2 ║ EXCEPTIONREASON2 ║
╠════════════╬════════════════╬════════════════╬═════════════════╬═════════════════╬══════════════════╣
║ 1001 ║ UNIQUE ║ ADDRESS1 ║ (null) ║ ADDRESS2 ║ Some Value ║
╚════════════╩════════════════╩════════════════╩═════════════════╩═════════════════╩══════════════════╝
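The row-numbering plus conditional-aggregation pattern above can be tried without the original schema. The following is a minimal sketch, assuming an in-memory SQLite database (3.25 or later, for window functions) and a hypothetical flattened table `step_result` that stands in for the joined result set from the question; the table and column names are illustrative, not the asker's real schema.

```python
import sqlite3

# Hypothetical flattened table standing in for the joined rows in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE step_result (
    step_id INTEGER, workflow_id INTEGER, unmatched_value TEXT,
    matched_address TEXT, exception_reason TEXT
);
INSERT INTO step_result VALUES
    (1, 1001, 'UNIQUE', 'ADDRESS1', NULL),
    (2, 1001, 'UNIQUE', 'ADDRESS2', 'Some Value');
""")

query = """
WITH numbered AS (
    -- Number the rows of each workflow so they can be addressed as 1 and 2.
    SELECT *, ROW_NUMBER() OVER (PARTITION BY workflow_id ORDER BY step_id) AS rn
    FROM step_result
)
SELECT workflow_id,
       unmatched_value,
       MIN(CASE WHEN rn = 1 THEN matched_address END)  AS matched_address,
       MIN(CASE WHEN rn = 1 THEN exception_reason END) AS exception_reason,
       MIN(CASE WHEN rn = 2 THEN matched_address END)  AS matched_address2,
       MIN(CASE WHEN rn = 2 THEN exception_reason END) AS exception_reason2
FROM numbered
GROUP BY workflow_id, unmatched_value
ORDER BY workflow_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each CASE expression yields NULL for rows with a different row number, and MIN ignores NULLs, so every output column picks its value from exactly one of the two input rows.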

Try this:
SELECT ws.workflow_id as WorkflowId, sg.unmatchValue as UnmatchedValue,
MAX(CASE WHEN ws.id = 1 THEN geo_address END) as MatchedAddress1,
MAX(CASE WHEN ws.id = 2 THEN geo_address END) as MatchedAddress2,
MAX(CASE WHEN ws.id = 1 THEN ws.exception_Value END) as ExceptionReason1,
MAX(CASE WHEN ws.id = 2 THEN ws.exception_Value END) as ExceptionReason2
FROM workflow_step as ws
INNER JOIN workflow as gw on ws.workflow_id = gw.id
INNER JOIN super_group as sg on gw.super_group_id = sg.id
inner JOIN alias on ws.id = alias.workflow_step_id
GROUP BY ws.workflow_id, sg.unmatchValue
SQL FIDDLE DEMO

Since I can't comment, I just wanted to point out that the answer given by Lamak uses a Common Table Expression (CTE). CTEs are also the usual way to write recursive queries in SQL, although the one in that answer is not recursive; it only names the intermediate row-numbered result.

This assumes you only have 2 address types. If you have more I would recommend creating a pivot table.
select a.*, b.MATCHEDADDRESS2, b.EXCEPTIONREASON2
from
(Select WORKFLOWID,UNMATCHEDVALUE,MATCHEDADDRESS,EXCEPTIONREASON
from "Your Table"
where MATCHEDADDRESS='ADDRESS1') a
join
(Select WORKFLOWID,UNMATCHEDVALUE,MATCHEDADDRESS as MATCHEDADDRESS2,EXCEPTIONREASON as EXCEPTIONREASON2
from "Your Table"
where MATCHEDADDRESS='ADDRESS2') b
on a.WORKFLOWID=b.WORKFLOWID
and a.UNMATCHEDVALUE = b.UNMATCHEDVALUE

Related

Returning MIN and MAX values and ignoring nulls - populate null values with preceding non-null value

Using a table of events, I need to return the date and type for:
the first event
the most recent (non-null) event
The most recent event could have null values, which in that case needs to return the most recent non-null value
I found a few articles as well as posts here on SO that are similar (maybe even identical) but am not able to decode or understand the solution - i.e.
Fill null values with last non-null amount - Oracle SQL
https://www.itprotoday.com/sql-server/last-non-null-puzzle
https://koukia.ca/common-sql-problems-filling-null-values-with-preceding-non-null-values-ad538c9e62a6
Table is as follows - there are additional columns, but I am only including 3 for the sake of simplicity. Also note that the first Type and Date could be null. In this case returning null is desired.
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║
║ A ║ Update ║ 2019-04-02 ║
║ A ║ null ║ null ║
╚═══════╩════════╩════════════╝
The output should be:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║ Update ║ 2019-04-02 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
The first method I tried was to join the table to itself using a subquery that finds the MIN and MAX dates using case statements:
select
Email,
max(case when T1.Date = T2.Min_Date then T1.Type end) as FirstType,
max(case when T1.Date = T2.Min_Date then T1.Date end) as FirstDate,
max(case when T1.Date = T2.Max_Date then T1.Type end) as LastType,
max(case when T1.Date = T2.Max_Date then T1.Date end) as LastDate,
from
T1
join
(select
EmailAddress,
max(Date) as Max_Date,
min(Date) as Min_Date
from
Table1
group by
Email
) T2
on
T1.Email = T2.Email
group by
T1.Email
This seemed to work for the MIN values, but the MAX values would return null.
To solve the problem of returning the last non-null value I attempted this:
select
EmailAddress,
max(Date) over (partition by EmailAddress rows unbounded preceding) as LastDate,
max(Type) over (partition by EmailAddress rows unbounded preceding) as LastType
from
T1
group by
EmailAddress,
Date,
Type
However, this gives a result of 3 rows, instead of 1.
I'll admit I don't quite understand analytic functions since I have not had to deal with them at length. Any help would be greatly appreciated.
Edit:
The aforementioned example is an accurate representation of what the data could look like, however the below example is the exact sample data that I am using.
Sample:
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║
║ A ║ null ║ null ║
╚═══════╩════════╩════════════╝
Desired Outcome:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║ Create ║ 2019-04-01 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
Additional Use-Case:
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ null ║ null ║
║ A ║ Create ║ 2019-04-01 ║
╚═══════╩════════╩════════════╝
Desired Outcome:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ null ║ null ║ Create ║ 2019-04-01 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
Use window functions and conditional aggregation:
select t.email,
max(case when seqnum = 1 then type end) as first_type,
max(case when seqnum = 1 then date end) as first_date,
max(case when seqnum_nonull = 1 and type is not null then type end) as last_type,
max(case when seqnum_nonull = 1 and type is not null then date end) as last_date
from (select t.*,
row_number() over (partition by email order by date) as seqnum,
row_number() over (partition by email, (case when type is null then 1 else 2 end) order by date desc) as seqnum_nonull
from t
) t
group by t.email;
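To see the window-function approach run end to end, here is a minimal sketch using an in-memory SQLite database (3.25 or later). It assumes a hypothetical `seq` column that records the order of events, because the sample data has no column that orders a NULL date reliably; the `events` table name and email B are also illustrative. The second row number counts backwards within the non-null rows, so its value 1 marks the most recent non-null event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- seq is an assumed explicit ordering column; it is not in the original table.
CREATE TABLE events (email TEXT, seq INTEGER, type TEXT, date TEXT);
INSERT INTO events VALUES
    ('A', 1, 'Create', '2019-04-01'),
    ('A', 2, 'Update', '2019-04-02'),
    ('A', 3, NULL,     NULL),
    ('B', 1, NULL,     NULL),
    ('B', 2, 'Create', '2019-04-01');
""")

query = """
SELECT email,
       MAX(CASE WHEN rn_first = 1 THEN type END) AS first_type,
       MAX(CASE WHEN rn_first = 1 THEN date END) AS first_date,
       MAX(CASE WHEN rn_last = 1 AND type IS NOT NULL THEN type END) AS last_type,
       MAX(CASE WHEN rn_last = 1 AND type IS NOT NULL THEN date END) AS last_date
FROM (
    SELECT e.*,
           -- 1 = first event per email, NULL or not
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY seq) AS rn_first,
           -- 1 = latest event within the NULL group and within the non-NULL group
           ROW_NUMBER() OVER (
               PARTITION BY email, (CASE WHEN type IS NULL THEN 1 ELSE 2 END)
               ORDER BY seq DESC
           ) AS rn_last
    FROM events AS e
) AS numbered
GROUP BY email
ORDER BY email
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Email A reproduces the first desired outcome (Create / Update), and email B reproduces the additional use case where the first event is NULL but the last non-null event is returned.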
As Spark SQL window functions support the NULLS LAST|FIRST syntax, you could use that and then specify a pivot with multiple aggregates for rn values 1 and 2. I could do with seeing some more sample data, but this works for your dataset:
%sql
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp;
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
)
SELECT *
FROM cte
PIVOT ( MAX(date), MAX(type) FOR rn In ( 1, 2 ) )
Rename the columns by supplying your required parts in the query, eg
-- Pivot and rename columns
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
)
SELECT *
FROM cte
PIVOT ( MAX(date) AS Date, MAX(type) AS Type FOR rn In ( 1 First, 2 Last ) )
Alternately supply a column list, eg
-- Pivot and rename columns
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
), cte2 AS
(
SELECT *
FROM cte
PIVOT ( MAX(date) AS Date, MAX(type) AS Type FOR rn In ( 1 First, 2 Last ) )
)
SELECT *
FROM cte2 AS (Email, FirstDate, FirstType, LastDate, LastType)
This simple query uses ROW_NUMBER to assign a row number to the dataset ordered by the date column, but using the NULLS LAST syntax to ensure null rows appear last in the numbering. The PIVOT then converts the rows to columns.

Merge 1st and Second Row, 3rd and 4th Row and so on

Let's say I have a table with these rows:
Id Value
----------
1 a
1 b
1 c
1 d
1 e
1 f
and the expected result should be,
Id Value1 Value2
-------------------
1 a b
1 c d
1 e f
I am very confused here.
Ok, there's definitely a simpler way to do this, but this works:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Value)
FROM dbo.YourTable
)
SELECT Id,
MIN(CASE WHEN RN % 2 = 1 THEN Value END) Value1,
MIN(CASE WHEN RN % 2 = 0 THEN Value END) Value2
FROM CTE
GROUP BY Id,
RN - ((RN - 1) % 2);
This is the result:
╔════╦════════╦════════╗
║ Id ║ Value1 ║ Value2 ║
╠════╬════════╬════════╣
║ 1 ║ a ║ b ║
║ 1 ║ c ║ d ║
║ 1 ║ e ║ f ║
╚════╩════════╩════════╝
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY value) AS RowNum
FROM YourTable
)
SELECT
c1.id
, c1.value as Value1
, c2.value as Value2
FROM cte c1
LEFT JOIN cte c2 ON c1.rownum = c2.rownum - 1
WHERE c1.RowNum % 2 = 1
select Id
,min(Value) as Value1
,max(Value) as Value2
from (select Id,Value
,(row_number () over
(partition by Id order by Value)+1)/2 as group_id
from mytable as t
) t
group by Id
,group_id
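The answers above all rely on the same idea: number the rows, then derive a pair number from the row number. A minimal sketch of that idea, assuming an in-memory SQLite database (3.25 or later) and the table name `mytable` from the last answer, looks like this. Integer division maps row numbers 1 and 2 to group 1, 3 and 4 to group 2, and so on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, value TEXT);
INSERT INTO mytable VALUES
    (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e'), (1, 'f');
""")

query = """
SELECT id,
       MIN(CASE WHEN rn % 2 = 1 THEN value END) AS value1,  -- odd row of the pair
       MIN(CASE WHEN rn % 2 = 0 THEN value END) AS value2   -- even row of the pair
FROM (
    SELECT id, value,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
    FROM mytable
) AS numbered
GROUP BY id, (rn + 1) / 2   -- integer division: rows (1,2) -> 1, (3,4) -> 2, ...
ORDER BY id, value1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

If a group has an odd number of rows, the last pair simply gets NULL in value2.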

How to write the query?

I have one table that contains customers (The goal of this table was to be able to add fields without DB-Update). The table looks like this:
CustId Property PropertyValue
1 Name Smith
1 Email smith@gmail.com
2 Name Donalds
2 Email donalds@gmail.com
3 Name john
(The customer 3 has no entry for "Email" in the table)
Expected result: I want one line per customer for the Email property, and if the customer has no email, still one line with NULL.
CustId Property PropertyValue
1 Email smith@gmail.com
2 Email donalds@gmail.com
3 Email NULL
Does anyone have a solution?
Query 1
Select t1.CustId
, ISNULL(t2.Property ,'Email') AS Property
, t2.PropertyValue
FROM TableName t1
LEFT JOIN TableName t2 ON t1.CustId = t2.CustId
AND t2.Property = 'Email'
WHERE t1.Property = 'Name'
Result Set 1
╔════════╦══════════╦═══════════════════╗
║ CustId ║ Property ║ PropertyValue ║
╠════════╬══════════╬═══════════════════╣
║ 1 ║ Email ║ smith@gmail.com ║
║ 2 ║ Email ║ donalds@gmail.com ║
║ 3 ║ Email ║ NULL ║
╚════════╩══════════╩═══════════════════╝
Query 2
Another query for a more readable result set should look something like....
Select t1.CustId
, t1.PropertyValue [CustomerName]
, t2.PropertyValue [CustomerEmail]
FROM TableName t1
LEFT JOIN TableName t2 ON t1.CustId = t2.CustId
AND t2.Property = 'Email'
WHERE t1.Property = 'Name'
Result Set 2
╔════════╦══════════════╦═══════════════════╗
║ CustId ║ CustomerName ║ CustomerEmail ║
╠════════╬══════════════╬═══════════════════╣
║ 1 ║ Smith ║ smith@gmail.com ║
║ 2 ║ Donalds ║ donalds@gmail.com ║
║ 3 ║ john ║ NULL ║
╚════════╩══════════════╩═══════════════════╝
DECLARE @t TABLE (
CustId INT,
Property VARCHAR(50),
PropertyValue VARCHAR(50)
)
INSERT INTO @t (CustId, Property, PropertyValue)
VALUES
(1, 'Name', 'Smith'),
(1, 'Email', 'smith@gmail.com'),
(2, 'Name', 'Donalds'),
(2, 'Email', 'donalds@gmail.com'),
(3, 'Name', 'john')
SELECT CustId
, Name = 'Email'
, Value = MAX(CASE WHEN Property = 'Email' THEN PropertyValue END)
FROM @t
GROUP BY CustId
You can do it using a derived table containing all possible IDs, and then left joining it to only the Email rows of the original table:
SELECT t.custID,'EMAIL',s.PropertyValue
FROM(SELECT DISTINCT custID
FROM YourTable) t
LEFT OUTER JOIN YourTable s
ON(t.custID = s.custID and s.property = 'Email')
Can also be done with a correlated query:
SELECT DISTINCT t.CustID,'EMAIL',
(SELECT s.PropertyValue
FROM YourTable s
WHERE s.custID = t.custID and s.Property = 'Email')
FROM YourTable t
Self join with same table, property passed via variable
DECLARE @prop nvarchar(max) = 'Email'
SELECT DISTINCT c.CustId, @prop as Property, c1.PropertyValue
FROM yourtable c
LEFT JOIN yourtable c1
ON c.CustId = c1.CustId and c1.Property = @prop
Output will be as you posted in your question.
SELECT CustId
, 'Email' AS Property
, MIN(CASE WHEN Property = 'Email' THEN PropertyValue END) AS PropertyValue
FROM TableName
GROUP BY CustId;
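The derived-table variant is easy to check in isolation. Below is a minimal sketch, assuming an in-memory SQLite database and a hypothetical table `customer_property` loaded with the sample rows from the question. The Email filter sits in the ON clause rather than in WHERE, which is what keeps customer 3 in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table name; the rows are the sample data from the question.
CREATE TABLE customer_property (cust_id INTEGER, property TEXT, property_value TEXT);
INSERT INTO customer_property VALUES
    (1, 'Name',  'Smith'),
    (1, 'Email', 'smith@gmail.com'),
    (2, 'Name',  'Donalds'),
    (2, 'Email', 'donalds@gmail.com'),
    (3, 'Name',  'john');
""")

query = """
SELECT ids.cust_id, 'Email' AS property, e.property_value
FROM (SELECT DISTINCT cust_id FROM customer_property) AS ids
LEFT JOIN customer_property AS e
       ON e.cust_id = ids.cust_id
      AND e.property = 'Email'   -- filter in ON, so unmatched customers survive
ORDER BY ids.cust_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Moving `e.property = 'Email'` into a WHERE clause would turn the outer join back into an inner join and drop the customer without an email.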

select data that has at least P and R

I have a table named Table1 as shown below:
ID AccountNo Trn_cd
1 123456 P
2 123456 R
3 123456 P
4 12345 P
5 111 R
6 111 R
7 5625 P
I would like to display the records whose AccountNo appears more than once (duplicates) and whose Trn_cd values include both P and R.
In this case the output should be at this way:
ID AccountNo Trn_cd
1 123456 P
2 123456 R
3 123456 P
I have written this SQL, but it does not give the result I want:
select * from Table1
where AccountNo IN
(select accountno from table1
where trn_cd = 'P' or trn_cd = 'R'
group by AccountNo having count(*) > 1)
Result as below which AccountNo 111 shouldn't appear because there is no trn_cd P for 111:
ID AccountNo Trn_cd
1 123456 P
2 123456 R
3 123456 P
5 111 R
6 111 R
Any idea?
Use aggregation for this. To get the account numbers:
select accountNo
from table1
group by accountNo
having count(*) > 1 and
sum(case when trn_cd = 'P' then 1 else 0 end) > 0 and
sum(case when trn_cd = 'R' then 1 else 0 end) > 0
To get the account information, use a join or in statement:
select t.*
from table1 t
where t.accountno in (select accountNo
from table1
group by accountNo
having count(*) > 1 and
sum(case when trn_cd = 'P' then 1 else 0 end) > 0 and
sum(case when trn_cd = 'R' then 1 else 0 end) > 0
)
This problem is called Relational Division.
This can be solved by keeping only the records whose Trn_CD is P or R, grouping them by AccountNo, and then keeping the accounts for which COUNT(DISTINCT Trn_CD) = 2.
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT AccountNo
FROM TableName
WHERE Trn_CD IN ('P','R')
GROUP BY AccountNo
HAVING COUNT(DISTINCT Trn_CD) = 2
) b ON a.AccountNO = b.AccountNo
SQLFiddle Demo
SQL of Relational Division
OUTPUT
╔════╦═══════════╦════════╗
║ ID ║ ACCOUNTNO ║ TRN_CD ║
╠════╬═══════════╬════════╣
║ 1 ║ 123456 ║ P ║
║ 2 ║ 123456 ║ R ║
║ 3 ║ 123456 ║ P ║
╚════╩═══════════╩════════╝
For faster performance, add an INDEX on column AccountNo.
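As a quick check of the COUNT(DISTINCT ...) approach, here is a minimal sketch against an in-memory SQLite database loaded with the sample rows from the question. The snake_case column names are an assumption made for the example; the logic is the same as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, account_no INTEGER, trn_cd TEXT);
INSERT INTO table1 VALUES
    (1, 123456, 'P'), (2, 123456, 'R'), (3, 123456, 'P'),
    (4, 12345,  'P'), (5, 111,    'R'), (6, 111,    'R'),
    (7, 5625,   'P');
""")

query = """
SELECT t.id, t.account_no, t.trn_cd
FROM table1 AS t
JOIN (
    -- Accounts that have both codes: two distinct values among P and R.
    SELECT account_no
    FROM table1
    WHERE trn_cd IN ('P', 'R')
    GROUP BY account_no
    HAVING COUNT(DISTINCT trn_cd) = 2
) AS both_codes ON both_codes.account_no = t.account_no
ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Account 111 is excluded because its two rows share a single distinct code, and an account with both codes necessarily has more than one row, so no separate COUNT(*) > 1 test is needed.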

Group by does not show all the rows

I have a table tblPersonaldata and tblStudentsadmitted
tblPersonalData
UID Name Gender
------------------------
E1 xyz M
E2 pqr M
E3 mno M
tblStudentsadmitted
UID Status Stage
----------------------
E1 Y 1
E2 Y 2
E3 Y 1
Now I want the data like this:
Gender Stage1 Stage2
M 2 1
But in this case I don't get a row for the female gender. I want a row for the female gender even if it has no data.
I have tried this:
select
case
when gender='M' then 'Male'
when gender='F' then 'Female'
end as Gender,
sum(case when Stage=1 then 1 else 0 end) as Stage1,
sum(case when Stage=2 then 1 else 0 end) as Stage2
from tblPersonaldata A inner join
tblStudentsadmitted B on A.UID=B.UID
where B.Status='Y'
group by Gender
SELECT CASE WHEN a.Gender = 'M' THEN 'Male' ELSE 'Female' END Gender,
SUM(CASE WHEN Stage = 1 THEN 1 ELSE 0 END) Stage1,
SUM(CASE WHEN Stage = 2 THEN 1 ELSE 0 END) Stage2
FROM personal a
LEFT JOIN studentadmitted b
ON a.UID = b.UID AND b.Status = 'Y'
GROUP BY a.Gender
SQLFiddle Demo
SELECT CASE WHEN c.Gender = 'M' THEN 'Male' ELSE 'Female' END Gender,
SUM(CASE WHEN Stage = 1 THEN 1 ELSE 0 END) Stage1,
SUM(CASE WHEN Stage = 2 THEN 1 ELSE 0 END) Stage2
FROM (SELECT 'F' Gender UNION SELECT 'M' Gender) c
LEFT JOIN personal a
ON a.Gender = c.Gender
LEFT JOIN studentadmitted b
ON a.UID = b.UID AND b.Status = 'Y'
GROUP BY c.Gender
SQLFiddle Demo
OUTPUT
╔════════╦════════╦════════╗
║ GENDER ║ STAGE1 ║ STAGE2 ║
╠════════╬════════╬════════╣
║ Female ║ 0 ║ 0 ║
║ Male ║ 2 ║ 1 ║
╚════════╩════════╩════════╝
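The second query works because the list of genders comes from a derived table rather than from the data, so a gender with no people still produces a row. A minimal sketch of that query, assuming an in-memory SQLite database and the table names `personal` and `studentadmitted` used in the answer, with the sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personal (uid TEXT, name TEXT, gender TEXT);
CREATE TABLE studentadmitted (uid TEXT, status TEXT, stage INTEGER);
INSERT INTO personal VALUES ('E1', 'xyz', 'M'), ('E2', 'pqr', 'M'), ('E3', 'mno', 'M');
INSERT INTO studentadmitted VALUES ('E1', 'Y', 1), ('E2', 'Y', 2), ('E3', 'Y', 1);
""")

query = """
SELECT CASE WHEN g.gender = 'M' THEN 'Male' ELSE 'Female' END AS gender,
       SUM(CASE WHEN s.stage = 1 THEN 1 ELSE 0 END) AS stage1,
       SUM(CASE WHEN s.stage = 2 THEN 1 ELSE 0 END) AS stage2
FROM (SELECT 'F' AS gender UNION ALL SELECT 'M') AS g   -- fixed list of genders
LEFT JOIN personal AS p
       ON p.gender = g.gender
LEFT JOIN studentadmitted AS s
       ON s.uid = p.uid AND s.status = 'Y'
GROUP BY g.gender
ORDER BY g.gender
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Grouping by the derived table's column, not by `p.gender`, is what keeps the 'F' row; the ELSE 0 branch turns its unmatched (NULL) stage values into zero counts.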
In SQL Server, you can use the PIVOT function to generate the result:
select gender,
Stage1,
Stage2
from
(
select
c.gender,
'Stage'+cast(stage as varchar(10)) Stage
from (values ('F'),('M')) c (gender)
left join tblpersonaldata p
on c.gender = p.gender
left join tblStudentsadmitted s
on p.uid = s.uid
and s.Status='Y'
)src
pivot
(
count(stage)
for stage in (Stage1, Stage2)
) piv
See SQL Fiddle with Demo.
Since you are using SQL Server 2008, this query uses a VALUES table value constructor to generate the list of genders that you want in the final result set:
from (values ('F'),('M')) c (gender)
Then by using a LEFT JOIN on the other tables the final result will return a row for both the M and F values.
This can also be written using a UNION ALL to generate the list of genders:
select gender,
Stage1,
Stage2
from
(
select
c.gender,
'Stage'+cast(stage as varchar(10)) Stage
from
(
select 'F' gender union all
select 'M' gender
) c
left join tblpersonaldata p
on c.gender = p.gender
left join tblStudentsadmitted s
on p.uid = s.uid
and s.Status='Y'
)src
pivot
(
count(stage)
for stage in (Stage1, Stage2)
) piv
See SQL Fiddle with Demo
The result of both is:
| GENDER | STAGE1 | STAGE2 |
----------------------------
| F | 0 | 0 |
| M | 2 | 1 |
This also works. It uses left joins from a derived table with two records, one for each gender (M and F).
Fiddle demo
select t.g Gender,
isnull(sum(case when Stage = 1 then 1 end),0) Stage1,
isnull(sum(case when Stage = 2 then 1 end),0) Stage2
from (values ('M'),('F')) t(g)
left join personal a on t.g = a.gender
left join studentadmitted b on a.uid = b.uid and b.Status = 'Y'
group by t.g
order by t.g
| GENDER | STAGE1 | STAGE2 |
----------------------------
| F | 0 | 0 |
| M | 2 | 1 |
SELECT GENDER, [0] AS 'STAGE 0', [1] AS 'STAGE 1', [2] AS 'STAGE 2'
FROM
(
SELECT P.UID, GENDER, CASE WHEN STAGE IS NULL THEN 0 ELSE STAGE END STAGE
FROM tblPersonaldata P
LEFT JOIN tblStudentsadmitted S ON P.UID = S.UID
) AS A
PIVOT
(
COUNT (UID) FOR STAGE IN ([0],[1],[2])
) P