SQL CASE statement returns duplicate values

Here is how my data looks
title value
------------
t1 v1
t2 v2
t3 v3
Now I want t1 and t2 to be inferred as the same value t12. So, I do:
SELECT
CASE
WHEN title = 't1' OR title = 't2'
THEN 't12'
ELSE title
END AS inferred_title,
COUNT(value)
FROM
my_table
GROUP BY
inferred_title;
I expected the output to be:
inferred title values
-----------------------
t12 2
t3 1
But what I end up getting is:
inferred title values
--------------------------
t12 1
t12 1
t3 1
How do I make it behave the way I want it to? I don't want the duplicated rows.

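The problem is scoping. Your table presumably has a column named inferred_title of its own, so GROUP BY inferred_title binds to that column rather than to the alias in the SELECT list. Either use a different column alias or repeat the expression: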
SELECT (CASE WHEN title IN ('t1', 't2') THEN 't12'
ELSE title
END) AS inferred_title,
COUNT(value)
FROM my_table
GROUP BY (CASE WHEN title IN ('t1', 't2') THEN 't12'
ELSE title
END);
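To check the repeated-expression form against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever DBMS the question uses, and the in-memory table is an assumption built from the three rows shown in the question.

```python
import sqlite3

# In-memory stand-in for my_table, filled with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (title TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("t1", "v1"), ("t2", "v2"), ("t3", "v3")],
)

# Group by the full CASE expression, so no existing column name
# can shadow the alias used in the SELECT list.
rows = conn.execute(
    """
    SELECT CASE WHEN title IN ('t1', 't2') THEN 't12' ELSE title END AS inferred_title,
           COUNT(value) AS n
    FROM my_table
    GROUP BY CASE WHEN title IN ('t1', 't2') THEN 't12' ELSE title END
    ORDER BY inferred_title
    """
).fetchall()
print(rows)
```

With the sample rows this yields one row for t12 with a count of 2 and one row for t3 with a count of 1.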

Do the "merge" case in a derived table (sub-query), group by its result:
SELECT inferred_title, COUNT(value)
FROM
(
SELECT CASE WHEN title = 't1' OR title = 't2' THEN 't12'
ELSE title
END AS inferred_title,
value
FROM my_table
) dt
GROUP BY inferred_title;
This saves you some typing, is less error prone and easier to maintain - and is
ANSI SQL compliant!

Select Title, COUNT(Title) AS Totals
From my_table
Group By Title
Having COUNT(Title)>1
Order By 2 desc

Related

SQL Select a specific value in the group

I have this following table
Dept---------- Sub_Dept---- Dept Type
Sales.............Advertising........A
Sales.............Marketing......... B
Sales.............Analytics.......... C
Operations.....IT..................... C
Operations.....Settlement........C
The result should be: if any row of a department has department type A, change every row of that department to A; otherwise keep the types as they are.
Dept---------- Sub_Dept---- Dept Type
Sales.............Advertising........A
Sales.............Marketing......... A
Sales.............Analytics.......... A
Operations.....IT..................... C
Operations.....Settlement........C
Anybody can give a suggestion on this? I thought of using the GROUP BY but have to output the Sub Department as well
Thanks a lot
I would do:
update t
set depttype = 'A'
where exists (select 1 from t t2 where t2.dept = t.dept and t2.depttype = 'A') and
t.depttype <> 'A';
If you just want a select, then do:
select t.*,
(case when sum(case when depttype = 'A' then 1 else 0 end) over (partition by dept) > 0
then 'A'
else depttype
end) as new_depttype
from t;
Use below query
select a11.dept, a11.Sub_Dept, (case when a12.min_dep_type='A' then 'A' else a11.dep_type end) as dep_type
from tab a11
JOIN (select dept, min(dep_type) min_dep_type from tab group by dept) a12
on a11.dept = a12.dept
Try this:
update table
set depttype= case when dept in (select dept from table where depttype='a') then 'a' else depttype end
This should work:
select a.dept, a.sub_dept,
case when b.dept is not null then 'A' else dept_type end as dept_type
from aTable a
left join(
select distinct Dept from aTable where dept_type = 'A'
)
b on b.dept = a.dept
You could use analytic functions to check whether the specific value exists in the group.
Try below query:
SELECT t.Dept,
t.Sub_Dept,
NVL(MIN(CASE WHEN t.Dept_Type = 'A'
THEN Dept_Type END) OVER (PARTITION BY t.Dept), t.Dept_Type) AS Dept_Type
FROM table_1 t
Using the analytic function MIN(), you can search for the value of 'A' (if it does exist inside the group). MIN works for non-null values only, so if you don't have any 'A' in the group, the result will be NULL.
At this point, you can use NVL to choose whether to print the value found in the group or the actual dept_type of the row.
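A minimal sketch of the same idea, run through Python's sqlite3 module so it can be tried without an Oracle instance. Two things here are assumptions: SQLite has no NVL, so COALESCE stands in for it, and window functions need SQLite 3.25 or later. The table contents are the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (dept TEXT, sub_dept TEXT, dept_type TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [
        ("Sales", "Advertising", "A"),
        ("Sales", "Marketing", "B"),
        ("Sales", "Analytics", "C"),
        ("Operations", "IT", "C"),
        ("Operations", "Settlement", "C"),
    ],
)

# MIN(...) OVER (PARTITION BY dept) is 'A' when the department has at least
# one 'A' row and NULL otherwise; COALESCE then falls back to the row's own type.
result = conn.execute(
    """
    SELECT dept,
           sub_dept,
           COALESCE(MIN(CASE WHEN dept_type = 'A' THEN dept_type END)
                        OVER (PARTITION BY dept),
                    dept_type) AS new_dept_type
    FROM table_1
    ORDER BY dept, sub_dept
    """
).fetchall()
for row in result:
    print(row)
```

Every Sales row comes back with type A, while the Operations rows keep C.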

Comparing two tables that doesn't have unique key

I need to compare the data in two tables and check which attributes are mismatching. The tables have the same definition, but the problem is that I don't have a unique key to compare on. I tried to use
CONCAT(table1.A, table1.B)
= CONCAT(table2.A, table2.B)
but I am still getting duplicate rows. I also tried NVL on a few columns, but that didn't work:
SELECT
UT.cat,
PD.cat
FROM
EM UT, EM_63 PD
WHERE
NVL(UT.cat, 1) = NVL(PD.cat, 1) AND
NVL(UT.AT_NUMBER, 1) = NVL(PD.AT_NUMBER, 1) AND
NVL(UT.OFFSET, 1) = NVL(PD.OFFSET, 1) AND
NVL(UT.PROD, 1) = NVL(PD.PROD, 1)
;
There are 34k records in one table and 35k records in the other, but if I run the above query, it returns 3 million rows.
Columns in table:
COUNTRY
CATEGORY
TYPE
DESCRIPTION
Sample data :
Table 1 :
COUNTRY CATEGORY TYPE DESCRIPTION
US C T1 In
IN A T2 OUT
B C T2 IN
Y C T1 INOUT
Table 2:
COUNTRY CATEGORY TYPE DESCRIPTION
US C T2 In
IN B T2 Out
Q C T2 IN
Expected output:
column Matched unmatched
COUNTRY 2 1
CATEGORY 2 1
TYPE 2 1
DESCRIPTION 3 0
In the most general case (when you may have duplicate rows, and you want to see which rows exist in one table but not in the other, and ALSO which rows may exist in both tables, but the row exists 3 times in the first table but 5 times in the other):
This is a very common problem with a settled "best solution" which for some reason it seems most people are still not aware of, even though it was developed on AskTom many years ago and has been presented numerous times.
You do NOT need a join, you do not need a unique key of any kind, and you don't need to read either table more than once. The idea is to add two columns to show from which table each row comes, do a UNION ALL, then GROUP BY all the columns except the "source" columns and show the count for each table. Something like this:
select count(t_1) as count_table_1, count(t_2) as count_table_2, col1, col2, ...
from (
select 'x' as t_1, null as t_2, col1, col2, ...
from table_1
union all
select null as t_1, 'x' as t_2, col1, col2, ...
from table_2
)
group by col1, col2, ...
having count(t_1) != count(t_2)
;
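A small runnable sketch of this UNION ALL plus GROUP BY comparison, using Python's sqlite3 module. The two-column tables and their rows are made up for illustration and are not the asker's data; one row is deliberately duplicated in table_1 to show that differing counts are reported too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE TABLE table_2 (col1 INTEGER, col2 TEXT)")
# Hypothetical data: (1, 'a') appears twice in table_1 but once in table_2.
conn.executemany("INSERT INTO table_1 VALUES (?, ?)",
                 [(1, "a"), (1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO table_2 VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (4, "d")])

# Tag each row with its source, stack both tables, then count per source.
# COUNT ignores NULLs, so each counter only sees rows from its own table.
diff = conn.execute(
    """
    SELECT COUNT(t_1) AS count_table_1, COUNT(t_2) AS count_table_2, col1, col2
    FROM (
        SELECT 'x' AS t_1, NULL AS t_2, col1, col2 FROM table_1
        UNION ALL
        SELECT NULL AS t_1, 'x' AS t_2, col1, col2 FROM table_2
    )
    GROUP BY col1, col2
    HAVING COUNT(t_1) != COUNT(t_2)
    ORDER BY col1
    """
).fetchall()
for row in diff:
    print(row)
```

The row (2, 'b') is absent from the output because it occurs once in each table; the other three combinations are listed with their per-table counts.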
Start with this query to check if these 4 columns form a key.
select occ_total,occ_ut,occ_pd
,count(*) as records
from (select count (*) as occ_total
,count (case tab when 'UT' then 1 end) as occ_ut
,count (case tab when 'PD' then 1 end) as occ_pd
from (select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD from EM
union all select 'PD' ,cat,AT_NUMBER,OFFSET,PROD from EM_63 PD
) t
group by cat,AT_NUMBER,OFFSET,PROD
) t
group by occ_total,occ_ut,occ_pd
order by records desc
;
After you have chosen your "key",you can use the following query to see the attributes' values
select count (*) as occ_total
,count (case tab when 'UT' then 1 end) as occ_ut
,count (case tab when 'PD' then 1 end) as occ_pd
,count (distinct att1) as cnt_dst_att1
,count (distinct att2) as cnt_dst_att2
,count (distinct att3) as cnt_dst_att3
,...
,listagg (case tab when 'UT' then att1 end) within group (order by att1) as att1_vals_ut
,listagg (case tab when 'PD' then att1 end) within group (order by att1) as att1_vals_pd
,listagg (case tab when 'UT' then att2 end) within group (order by att2) as att2_vals_ut
,listagg (case tab when 'PD' then att2 end) within group (order by att2) as att2_vals_pd
,listagg (case tab when 'UT' then att3 end) within group (order by att3) as att3_vals_ut
,listagg (case tab when 'PD' then att3 end) within group (order by att3) as att3_vals_pd
,...
from (select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from EM
union all select 'PD' ,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from EM_63 PD
) t
group by cat,AT_NUMBER,OFFSET,PROD
;
The problem with CONCAT is that you can get invalid matches if your data looks similar to this:
table1.A = '123'
table1.B = '456'
concatenates to: '123456'
table2.A = '12'
table2.B = '3456'
concatenates also to: '123456'
You have to compare the fields individually: table1.A = table2.A AND table1.B = table2.B
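The collision is easy to reproduce. The sketch below evaluates both comparisons on the literal values from the example, using SQLite through Python's sqlite3 module; the || operator stands in for CONCAT, which is an assumption about syntax only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# First column: compare the concatenated strings.
# Second column: compare field by field.
concat_equal, fields_equal = conn.execute(
    """
    SELECT ('123' || '456') = ('12' || '3456'),
           ('123' = '12') AND ('456' = '3456')
    """
).fetchone()
print(concat_equal, fields_equal)
```

The concatenated comparison reports a match (1) while the field-by-field comparison does not (0).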

Select rows having the same features than others

I've the following table with 3 columns: Id, FeatureName and Value:
Id FeatureName Value
-- ----------- -----
1 AAA 10
1 ABB 12
1 BBB 12
2 AAA 15
2 ABB 12
2 ACD 7
3 AAA 10
3 ABB 12
3 CCC 12
.............
Each Id has different features and each Feature has a value for that Id.
I need a query that returns the Ids that have exactly the same features and values as a given Id, taking into account only the features whose names start with 'A'. For example, searching the table above for Id=1 should return Id=3, which has the same features starting with 'A' with the same values.
I found a couple of different ways to do this, but all of them are very slow when the table has many rows (hundreds of thousands or more).
The way I obtain the best performance is using the next query:
select a2.Id
from (select a.FeatureName, a.Value
from Table1 a
where a.Id = 1) a1,
(select a.Id, a.FeatureName, a.Value
from Table1 a
where a.FeatureName like 'A%') a2
where a1.FeatureName = a2.FeatureName
and a1.value = a2.value
group by a2.Id
having count(*) = 2
intersect
select a.Id
from Table1 a
where a.FeatureName like 'A%'
group by a.Id
having count(*)= 2
Here 2 is #nFeatures, the number of features of Id=1 whose names start with 'A'; I count them before running this query. The intersection excludes Ids that match all of Id=1's 'A' features but also have other features whose names start with 'A'.
I think that the slowest part is the second subquery:
select a.Id, a.FeatureName, a.Value
from Table1 a
where a.FeatureName like 'A%'
but I don't know how to make it faster. Maybe I will have to play with the indexes.
Any idea of how could I write a fast query for this purpose?
So you want all rows where the combination of FeatureName and Value is not unique? You can use EXISTS:
SELECT t.*
FROM dbo.Table1 t
WHERE t.FeatureName LIKE 'A%'
AND EXISTS(SELECT 1 FROM dbo.Table1 t2
WHERE t.Id <> t2.ID
AND t.FeatureName = t2.FeatureName
AND t.Value = t2.Value)
how could I write a fast query for this purpose?
If it's not fast enough create an index on FeatureName + Value.
I tried to eliminate the join with MyTable again to select the data for the ID's that have matching FeatureName and Value values. Here's the query:
with joined_set as
(
SELECT
mt1.*, mt2.id as mt2_id, mt2.featurename as mt2_FeatureName, mt2.value as mt2_value
from
(
select *
from mytable
where featurename like 'A%'
) mt1
left join
(
select *
from mytable
where featurename like 'A%'
) mt2
on mt2.id <> mt1.id and mt2.FeatureName = mt1.featurename and mt2.value = mt1.value
)
select distinct id
from joined_set
where id not in
(select id
from joined_set
group by id
having SUM(
CASE
WHEN mt2_id is null THEN 1
ELSE 0
END
) <> 0
);
Here is the SQL Fiddle demo. It has an extra condition in the inline view mt2, to perform this search only for id = 1.
I'm a little dense this morning, I'm not sure if you wanted just the ID's or...
Here's my take on it...
You could probably move the where FeatureName like 'A%' into the inner query to filter the data on the initial table scan.
with dupFeatures (FeatureName, Value, dupCount)
as
(
select FeatureName, Value, count(*) as dupCount from MyTable
group by FeatureName, Value
having count(*) > 1
)
select MyTable.Id, dupFeatures.FeatureName,dupFeatures.Value
from dupFeatures
join MyTable on (MyTable.FeatureName = dupFeatures.FeatureName and
MyTable.Value = dupFeatures.Value )
where dupFeatures.FeatureName like 'A%'
order by FeatureName, Value, Id
A general solution is
With Rows As (
select id
, FeatureName
, Value
, rows = Count(id) OVER (PARTITION BY id)
FROM test
WHERE FeatureName LIKE 'A%')
SELECT a.id aID, b.id bID
FROM Rows a
INNER JOIN Rows b ON a.id < b.id and a.FeatureName = b.FeatureName
and a.Value = b.Value
and a.rows = b.rows
GROUP BY a.id, b.id
HAVING COUNT(*) = MAX(a.rows)
ORDER BY a.id, b.id
To limit the solution to a single Id, just add a WHERE condition on a.ID to the main query. The CTE is needed to get the correct number of rows for each Id.
SQLFiddle demo: in the demo I changed the test data slightly to add another pair of Ids that have only one of the features of Ids 1 and 3.
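A sketch of this pairwise-matching idea, run against the question's sample rows through Python's sqlite3 module (window functions need SQLite 3.25 or later). The CTE name feats and the column name n are illustrative. The sketch compares values as well as feature names and requires every 'A' feature to match, and it assumes each (Id, FeatureName) pair occurs at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, FeatureName TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, "AAA", 10), (1, "ABB", 12), (1, "BBB", 12),
        (2, "AAA", 15), (2, "ABB", 12), (2, "ACD", 7),
        (3, "AAA", 10), (3, "ABB", 12), (3, "CCC", 12),
    ],
)

# feats keeps only the 'A' features and records how many each id has (n).
# Two ids match when they have the same n and every feature/value pair
# of one is found in the other, i.e. the number of joined rows equals n.
pairs = conn.execute(
    """
    WITH feats AS (
        SELECT id, FeatureName, Value,
               COUNT(*) OVER (PARTITION BY id) AS n
        FROM test
        WHERE FeatureName LIKE 'A%'
    )
    SELECT a.id AS a_id, b.id AS b_id
    FROM feats a
    JOIN feats b
      ON a.id < b.id
     AND a.FeatureName = b.FeatureName
     AND a.Value = b.Value
     AND a.n = b.n
    GROUP BY a.id, b.id
    HAVING COUNT(*) = MAX(a.n)
    ORDER BY a.id, b.id
    """
).fetchall()
print(pairs)
```

On the sample data the only matching pair is Id 1 with Id 3.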

Replace NULL with values

Here is my challenge:
I have a log table that gets a new record every time a record is changed, but only the changed value is set: all the unchanged fields in the new row simply hold NULL.
Now I would like to replace each NULL value with the value above it that is NOT a NULL value like below:
Source table: Task_log
ID Owner Status Flag
1 Bob Registrar T
2 Sue NULL NULL
3 NULL NULL F
4 Frank Admission T
5 NULL NULL F
6 NULL NULL T
Desired output table: Task_log
ID Owner Status Flag
1 Bob Registrar T
2 Sue Registrar T
3 Sue Registrar F
4 Frank Admission T
5 Frank Admission F
6 Frank Admission T
How do I write a query which will generate the desired output table?
One of the window functions introduced in SQL Server 2012 is FIRST_VALUE, which does what its name says and can be partitioned through the OVER clause. Before using it, each column has to be divided into blocks, where a block begins at each row in which that column has a non-NULL value.
With Block As (
Select ID
, Owner
, OBlockID = SUM(Case When Owner Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
, Status
, SBlockID = SUM(Case When Status Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
, Flag
, FBlockID = SUM(Case When Flag Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
From Task_log
)
Select ID
, Owner = FIRST_VALUE(Owner) OVER (PARTITION BY OBlockID ORDER BY ID)
, Status = FIRST_VALUE(Status) OVER (PARTITION BY SBlockID ORDER BY ID)
, Flag = FIRST_VALUE(Flag) OVER (PARTITION BY FBlockID ORDER BY ID)
FROM Block
The UPDATE query is easily derived
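For readers without SQL Server at hand, the block technique can be tried with Python's sqlite3 module, as sketched below. This assumes SQLite 3.25 or later for window functions and uses standard AS aliases instead of the T-SQL name = expression form; the rows are the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Task_log (ID INTEGER, Owner TEXT, Status TEXT, Flag TEXT)")
conn.executemany(
    "INSERT INTO Task_log VALUES (?, ?, ?, ?)",
    [
        (1, "Bob", "Registrar", "T"),
        (2, "Sue", None, None),
        (3, None, None, "F"),
        (4, "Frank", "Admission", "T"),
        (5, None, None, "F"),
        (6, None, None, "T"),
    ],
)

# The running SUM increases by one at every non-NULL value, so each block
# starts with a real value followed by the NULL rows that should inherit it.
# FIRST_VALUE then copies that leading value to the whole block.
filled = conn.execute(
    """
    WITH Block AS (
        SELECT ID, Owner, Status, Flag,
               SUM(CASE WHEN Owner  IS NULL THEN 0 ELSE 1 END) OVER (ORDER BY ID) AS OBlockID,
               SUM(CASE WHEN Status IS NULL THEN 0 ELSE 1 END) OVER (ORDER BY ID) AS SBlockID,
               SUM(CASE WHEN Flag   IS NULL THEN 0 ELSE 1 END) OVER (ORDER BY ID) AS FBlockID
        FROM Task_log
    )
    SELECT ID,
           FIRST_VALUE(Owner)  OVER (PARTITION BY OBlockID ORDER BY ID) AS Owner,
           FIRST_VALUE(Status) OVER (PARTITION BY SBlockID ORDER BY ID) AS Status,
           FIRST_VALUE(Flag)   OVER (PARTITION BY FBlockID ORDER BY ID) AS Flag
    FROM Block
    ORDER BY ID
    """
).fetchall()
for row in filled:
    print(row)
```

The printed rows correspond to the desired output table in the question.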
As I mentioned in my comment, I would try to fix the process that is creating the records rather than fixing the junk data. If that is not an option, the code below should get you pointed in the right direction.
UPDATE t1
set t1.owner = COALESCE(t1.owner, t2.owner),
t1.Status = COALESCE(t1.status, t2.status),
t1.Flag = COALESCE(t1.flag, t2.flag)
FROM Task_log as t1
INNER JOIN Task_log as t2
ON t1.id = (t2.id + 1)
where t1.owner is null
OR t1.status is null
OR t1.flag is null
I can think of several approaches.
You could use a combination of COALESCE with an array aggregate function. Unfortunately it doesn't look like SQL Server supports array_agg natively (although some nice people have developed some workarounds).
You could also use a subselect for each column.
SELECT id,
(SELECT TOP 1 owner FROM ... WHERE id <= outer_id AND owner IS NOT NULL ORDER BY id DESC) AS owner,
-- other columns
You could probably do something with window functions, too.
A vanilla solution would be:
select id
, owner
, coalesce(owner, ( select owner from t t2
where id = (select max(id) from t t3
where id < t1.id and owner is not null))
) as new_owner
, flag
, coalesce(flag, ( select flag from t t2
where id = (select max(id) from t t3
where id < t1.id and flag is not null))
) as new_flag
from t t1
Rather inefficient, but should work on most DBMS

Joining two select queries from the same table

The table contains an ID column, valueHeading column and a value column. I want to separate the value column into two new columns called valueHeading1 and valueHeading2 depending on which type of valueHeading the value has.
So I want to join this select:
Edit: Full join
SELECT ID
,valueHeading
,value as 'valueHeading1'
FROM table1
WHERE valueHeading = 'valueHeading1'
With This select:
SELECT ID
,value as 'valueHeading2'
FROM table1
WHERE valueHeading = 'valueHeading2'
on their respective ID's. How do I do this?
Edit to illustrate what I want to do:
Original table:
ID valueHeading value
0 valueHeading1 a
0 valueHeading2 a
1 valueHeading1 ab
1 valueHeading2 NULL
2 valueHeading1 abcd
2 valueHeading2 abc
New Table:
ID valueHeading1 valueHeading2
0 a a
1 ab NULL
2 abcd abc
If you only need the join, use this. Using CASE WHEN is an elegant alternative if you don't need a join.
SELECT * FROM
(SELECT ID
,valueHeading
,value as 'valueHeading1'
FROM table1
WHERE valueHeading = 'valueHeading1') AS TAB_1,
(SELECT ID
,value as 'valueHeading2'
FROM table1
WHERE valueHeading = 'valueHeading2') AS TAB_2
WHERE TAB_1.ID = TAB_2.ID
Try something like :
SELECT ID
, CASE WHEN valueHeading = 'valueHeading1' THEN value ELSE NULL END AS valueHeading1
, CASE WHEN valueHeading = 'valueHeading2' THEN value ELSE NULL END AS valueHeading2
FROM table1
WHERE valueHeading IN ('valueHeading1', 'valueHeading2')
If you want to regroup all values on one row for each ID, you can try :
SELECT ID
, MAX(CASE WHEN valueHeading = 'valueHeading1' THEN value ELSE NULL END) AS valueHeading1
, MAX(CASE WHEN valueHeading = 'valueHeading2' THEN value ELSE NULL END) AS valueHeading2
FROM table1
WHERE valueHeading IN ('valueHeading1', 'valueHeading2')
GROUP BY ID
HAVING MAX(CASE WHEN valueHeading = 'valueHeading1' THEN value ELSE NULL END) IS NOT NULL
OR MAX(CASE WHEN valueHeading = 'valueHeading2' THEN value ELSE NULL END) IS NOT NULL
See SQLFiddle. I also tried on Oracle 11g and MSSQL 2012, and it works each time.
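A runnable sketch of the grouped version on the question's sample rows, using Python's sqlite3 module as a stand-in DBMS. The HAVING clause is left out here on purpose, so that an ID whose headings are all NULL would still appear; that is a simplification, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, valueHeading TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (0, "valueHeading1", "a"),
        (0, "valueHeading2", "a"),
        (1, "valueHeading1", "ab"),
        (1, "valueHeading2", None),
        (2, "valueHeading1", "abcd"),
        (2, "valueHeading2", "abc"),
    ],
)

# Each CASE keeps the value only for its own heading; MAX ignores the NULLs
# produced for the other heading and collapses the rows of one ID into one.
pivoted = conn.execute(
    """
    SELECT ID,
           MAX(CASE WHEN valueHeading = 'valueHeading1' THEN value END) AS valueHeading1,
           MAX(CASE WHEN valueHeading = 'valueHeading2' THEN value END) AS valueHeading2
    FROM table1
    WHERE valueHeading IN ('valueHeading1', 'valueHeading2')
    GROUP BY ID
    ORDER BY ID
    """
).fetchall()
for row in pivoted:
    print(row)
```

The NULL stored for ID 1 under valueHeading2 comes back as None, matching the "New Table" in the question.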
In SQL Server 2005 and later you can use PIVOT:
SELECT ID, valueHeading1, valueHeading2
FROM
(
SELECT *
FROM dbo.test28
WHERE valueHeading IN ('valueHeading1', 'valueHeading2')
) x
PIVOT
(
MAX(value)
FOR valueHeading IN ([valueHeading1], [valueHeading2])
) p
A self join could be a simple solution:
SELECT DISTINCT t1.ID, t1.value AS valueHeading1, t2.value AS valueHeading2
FROM table1 t1
INNER JOIN table1 t2 ON t1.ID = t2.ID
WHERE t1.valueHeading = 'valueHeading1'
AND t2.valueHeading = 'valueHeading2'