Remove slightly different duplicates - sql

The goal is to avoid showing duplicates that differ only slightly. Here is an example:
I want to remove what is in green
This is the query I am currently using, but it is not quite accurate.
SELECT Package
FROM myTable
WHERE Package NOT IN (SELECT Package FROM myTable WHERE Package NOT LIKE '%_AC')

With just your sample data to go on you could try using a not exists criteria, such as
select *
from t
where not exists (
select * from t t2
where t.package like Concat(t2.package,'%')
and t.package != t2.package
);
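To make the NOT EXISTS idea above easy to try, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The table name t, the sample values, and the helper name unique_packages are assumptions for illustration only; SQLite uses || instead of Concat(), and its LIKE treats _ and % inside the stored names as wildcards, so real data containing those characters may need escaping.

```python
import sqlite3


def unique_packages(packages):
    """Return the packages that do not extend another, shorter package name."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (package TEXT)")
    con.executemany("INSERT INTO t VALUES (?)", [(p,) for p in packages])
    rows = con.execute(
        """
        SELECT package
        FROM t
        WHERE NOT EXISTS (
            SELECT 1
            FROM t AS t2
            WHERE t.package LIKE t2.package || '%'  -- t starts with t2
              AND t.package <> t2.package           -- ignore the row itself
        )
        ORDER BY package
        """
    ).fetchall()
    con.close()
    return [r[0] for r in rows]


# Hypothetical sample data: 'ABC_AC' is a slight variation of 'ABC'.
print(unique_packages(["ABC", "ABC_AC", "XYZ"]))  # -> ['ABC', 'XYZ']
```

Only the longer variant is dropped, because the shorter name is a prefix of it and not the other way round.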

A couple of ideas. If you have a unique column, you could use a NOT EXISTS:
SELECT P
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable e
WHERE (YT.P LIKE e.P + '%'
OR e.P LIKE YT.P + '%')
AND YT.ID != e.ID);
If not, then you could use a CTE and a windowed COUNT:
WITH CTE AS(
SELECT YT1.*,
COUNT(YT2.P) OVER (PARTITION BY YT1.P) AS C
FROM dbo.YourTable YT1
JOIN dbo.YourTable YT2 ON YT1.P LIKE YT2.P + '%'
OR YT2.P LIKE YT1.P + '%') --Will always JOIN to itself
SELECT P
FROM CTE
WHERE C = 1;

Related

SQL Server Join - With INFO_SCHEMA information

I have the first table:
select COLUMN_NAME
from Emerald_Data.INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = N'tbl_Client_List_Pricing'
Don't mind the numbering in the Column_Name. I was doing this while testing because I need the order to remain as they are in the table. Not by ASC, DESC.
Anyhow, I don't know how to use the row numbers on the left that the system provides to JOIN another table without a condition.
Here is Table 2:
You can see that the left row numbers are my linking value but I don't know how to use that system index value as a condition in my JOIN.
Alternatively, another way to join these two tables without a condition, while keeping the Table 1 information in its correct position and unaffected by ordering, would be much appreciated.
Thank you!
-Chase
I guess you are looking for row_number. Use row_number to number the results of the two queries, then join on the matching row numbers. Your query would be something like
with query_1 as (
select COLUMN_NAME
, rn = row_number() over (order by cast(left(COLUMN_NAME, 3) as int))
from Emerald_Data.INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = N'tbl_Client_List_Pricing'
)
, query_2 as (
select
*, rn = row_number() over (order by (select null))
from
Table_2
)
select
*
from
query_1 q1
join query_2 q2 on q1.rn = q2.rn
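As a rough illustration of pairing two result sets by row number, the sketch below does the same thing in SQLite from Python. The table names cols and table_2, the num and label columns, and the three-digit prefix on the column names are assumptions made up for the example; window functions need SQLite 3.25 or newer. Unlike order by (select null), both sides here are numbered by an explicit ordering, so the pairing is deterministic.

```python
import sqlite3


def pair_by_row_number(column_names, table2_rows):
    """Pair each column name with a table_2 row that has the same row number."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cols (column_name TEXT)")
    con.execute("CREATE TABLE table_2 (num INTEGER, label TEXT)")
    con.executemany("INSERT INTO cols VALUES (?)", [(c,) for c in column_names])
    con.executemany("INSERT INTO table_2 VALUES (?, ?)", table2_rows)
    rows = con.execute(
        """
        WITH q1 AS (
            SELECT column_name,
                   ROW_NUMBER() OVER (
                       ORDER BY CAST(substr(column_name, 1, 3) AS INTEGER)
                   ) AS rn
            FROM cols
        ),
        q2 AS (
            SELECT label, ROW_NUMBER() OVER (ORDER BY num) AS rn
            FROM table_2
        )
        SELECT q1.column_name, q2.label
        FROM q1
        JOIN q2 ON q1.rn = q2.rn
        ORDER BY q1.rn
        """
    ).fetchall()
    con.close()
    return rows


print(pair_by_row_number(
    ["002-City", "001-Name", "010-Zip"],
    [(3, "third"), (1, "first"), (2, "second")],
))
# -> [('001-Name', 'first'), ('002-City', 'second'), ('010-Zip', 'third')]
```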
select COLUMN_NAME from Emerald_Data.INFORMATION_SCHEMA.COLUMNS
inner join Table_2 on Num = cast(LEFT(COLUMN_NAME, CHARINDEX('-', COLUMN_NAME) - 1) AS int)
where TABLE_NAME = N'tbl_Client_List_Pricing'
You could also use the sys.all_columns view, which exposes the position of each column as column_id, and JOIN it with Table2:
SELECT *
FROM sys.all_columns c
INNER JOIN Table2 t ON t.Num = c.column_id
WHERE OBJECT_NAME(object_id) = 'tbl_Client_List_Pricing'

Count distinct records in one column with multiple values in another column

I'm pretty sure this is an easy question, but I'm having trouble wording it.
I need to count the total number of values in one column based on distinct criteria in another column.
Example:
A CD
B ABC
C AD
D A
Would yield:
A 3
B 1
C 2
D 2
First, you shouldn't be storing lists of things in a string.
But, sometimes one is stuck with this format. In your example, you seem to have a table with all possible values. If so you can use a join:
select e.col1, count(e2.col2)
from example e left join
example e2
on charindex(e.col1, e2.col2) > 0
group by e.col1;
Note: this counts the rows containing the value rather than the number of occurrences. If a value can appear more than once in a single row, the query is a bit more complicated.
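The self-join can be checked against the sample data from the question with a small SQLite sketch. The table name example and the columns col1 and col2 follow the answer above; the helper name count_rows_containing is made up, and SQLite's instr(haystack, needle) stands in for SQL Server's charindex(needle, haystack).

```python
import sqlite3


def count_rows_containing(rows):
    """For each col1 value, count the rows whose col2 contains it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE example (col1 TEXT, col2 TEXT)")
    con.executemany("INSERT INTO example VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT e.col1, COUNT(e2.col2)
        FROM example AS e
        LEFT JOIN example AS e2
               ON instr(e2.col2, e.col1) > 0  -- col2 contains the letter
        GROUP BY e.col1
        ORDER BY e.col1
        """
    ).fetchall()
    con.close()
    return result


sample = [("A", "CD"), ("B", "ABC"), ("C", "AD"), ("D", "A")]
print(count_rows_containing(sample))
# -> [('A', 3), ('B', 1), ('C', 2), ('D', 2)]
```

Because of the LEFT JOIN, a value that appears in no row still shows up with a count of 0.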
Here is how you can do it:
DECLARE @t TABLE ( c1 CHAR(1), c2 VARCHAR(5) )
INSERT INTO @t
VALUES ( 'A', 'CD' ),
( 'B', 'ABC' ),
( 'C', 'AD' ),
( 'D', 'A' )
SELECT t.c1 ,
SUM(count) AS count
FROM @t t
CROSS APPLY ( SELECT LEN(c2) - LEN(REPLACE(c2, t.c1, '')) AS count
FROM @t
WHERE c2 LIKE '%' + t.c1 + '%'
) ca
GROUP BY t.c1
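The length-difference trick in this answer counts occurrences rather than rows, which only matters when a letter repeats inside one string. A minimal SQLite sketch of that difference follows; the sample row 'ABA' is invented to show a repeat, and a LEFT JOIN with COALESCE is swapped in for CROSS APPLY (which SQLite does not have), so letters with no match report 0 instead of disappearing.

```python
import sqlite3


def count_occurrences(rows):
    """For each c1 value, count how many times it occurs across all c2 strings."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (c1 TEXT, c2 TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT t.c1,
               -- removing the letter shortens the string by one per occurrence
               COALESCE(SUM(length(u.c2) - length(replace(u.c2, t.c1, ''))), 0)
        FROM t
        LEFT JOIN t AS u ON instr(u.c2, t.c1) > 0
        GROUP BY t.c1
        ORDER BY t.c1
        """
    ).fetchall()
    con.close()
    return result


# 'A' occurs twice in 'ABA', so it is counted 4 times over 3 matching rows.
print(count_occurrences([("A", "CD"), ("B", "ABA"), ("C", "AD"), ("D", "A")]))
# -> [('A', 4), ('B', 1), ('C', 1), ('D', 2)]
```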
Assuming the table is called yourtable and the fields look like so:
fielda fieldb
A CD
B ABC
C AD
D A
Code
SELECT a.fielda, (SELECT COUNT(b.fieldb)
FROM yourtable b
WHERE b.fieldb LIKE '%' + a.fielda + '%') AS counter
FROM yourtable a
You can use a correlated subquery with LIKE
Sample Data
with cte(a,b) as
(
select 'A','CD'
union all select 'B','ABC'
union all select'C','AD'
union all select'D','A'
)
Query
select a,(select count(*) from cte c2 where b like '%' + c1.a +'%')
from cte c1
group by a
Output
A 3
B 1
C 2
D 2
Use a correlated sub-query when counting. Use LIKE to find the rows to count.
select t1.col1, (select count(*) from tablename t2
where t2.col2 like '%' || t1.col1 ||'%')
from tablename t1
|| is ANSI SQL concatenation. Some products use concat(), or + instead.
Looks like you need a self join there but the trick would be to use a pattern match on the join rather than an equi-join...
create table x1(c1 char(1) primary key, c2 varchar(5) not null);
select x1.c1, count(*)
from x1 x1
join x1 x2 on x2.c2 like '%' || x1.c1 || '%'
group by x1.c1
order by 1;

Written a subquery that can return more than one field without using the Exists

The query below is supposed to pull records for fields with the max date.
I am getting an error
You have written a subquery that can return more than one field without using EXISTS reserved word in the Main query's FROM clause. Revise the SELECT statement of the subquery to request only one column.
Code:
SELECT *
FROM TableName
WHERE (((([Project_Name], [Date])) IN (SELECT Project_Name, MAX(Date)
FROM TableName
GROUP BY Project)));
You're probably thinking of a nested subquery used as a table, like the one below:
select a.*, b.col1, b.col2
from FirstTable A
join (Select Id, firstcolumn as col1, secondcolumn as col2
from SecondTable) B on b.ID = a.ID
This works pretty much like a regular join, except that you are joining to a subquery. Hope that helps.
SELECT A.*
FROM TableName A
INNER JOIN (select Project_Name, max([Date]) AS MaxDate
from TableName
group by Project_Name) B
ON A.[Project_Name] = B.[Project_Name]
AND A.[Date] = B.MaxDate
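The join against a grouped subquery can be tried end to end with the sketch below, which uses SQLite from Python instead of the database in the question. The Note column, the sample rows, and the helper name latest_per_project are assumptions; dates are stored as ISO strings so that MAX picks the latest one.

```python
import sqlite3


def latest_per_project(rows):
    """Return the row with the most recent date for every project."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TableName (Project_Name TEXT, [Date] TEXT, Note TEXT)")
    con.executemany("INSERT INTO TableName VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT a.Project_Name, a.[Date], a.Note
        FROM TableName AS a
        JOIN (
            SELECT Project_Name, MAX([Date]) AS MaxDate
            FROM TableName
            GROUP BY Project_Name
        ) AS b
          ON a.Project_Name = b.Project_Name
         AND a.[Date] = b.MaxDate
        ORDER BY a.Project_Name
        """
    ).fetchall()
    con.close()
    return result


print(latest_per_project([
    ("Alpha", "2020-01-01", "old"),
    ("Alpha", "2020-03-01", "new"),
    ("Beta", "2019-05-05", "only"),
]))
# -> [('Alpha', '2020-03-01', 'new'), ('Beta', '2019-05-05', 'only')]
```

If two rows of a project share the same latest date, both are returned.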
A version using EXISTS() looks like this:
SELECT *
FROM TableName AS A
WHERE EXISTS(
SELECT * FROM (
SELECT B.Project_Name, MAX( B.Date ) AS MaxDate
FROM TableName AS B
GROUP BY B.Project_Name ) AS C
WHERE C.Project_Name = A.Project_Name AND C.MaxDate = A.Date
);
Although I have the feeling this will have poorer performance than a JOIN because the GROUP BY statement might have to be executed for each record and each call to the EXISTS() function...

Select rows having distinct values for two fields

Pardon me for the title. I have a table like this:
There will be thousands of rows, and I want to select the rows that share the same group_id but whose vr_debit and vr_credit values are not equal; i.e., in the image shown, none of the rows satisfy this criterion. If there are two rows, say (6, 500.000, 0) and (6, 0, 600.000), I want them as the result. Hope you get the idea.
Thank you.
Calculate each group using SUM() which is an aggregate function and filter them using HAVING clause.
SELECT GROUP_ID, SUM(vr_debit) totalDebit, SUM(vr_credit) totalCredit
FROM TableName
GROUP BY GROUP_ID
HAVING SUM(vr_debit) <> SUM(vr_credit)
If you want to get the uncalculated rows, you can join the table to the subquery.
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT GROUP_ID
FROM TableName
GROUP BY GROUP_ID
HAVING SUM(vr_debit) <> SUM(vr_credit)
) b ON a.GROUP_ID = b.GROUP_ID
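Here is a small sketch of the HAVING filter joined back to the detail rows, run in SQLite from Python. The sample rows are invented: group 5 balances (500 debit against 500 credit) while group 6 does not, mirroring the (6, 500.000, 0) and (6, 0, 600.000) case from the question. The helper name unbalanced_rows is hypothetical.

```python
import sqlite3


def unbalanced_rows(rows):
    """Return all rows of groups whose total debit differs from total credit."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TableName (group_id INTEGER, vr_debit REAL, vr_credit REAL)")
    con.executemany("INSERT INTO TableName VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT a.group_id, a.vr_debit, a.vr_credit
        FROM TableName AS a
        JOIN (
            SELECT group_id
            FROM TableName
            GROUP BY group_id
            HAVING SUM(vr_debit) <> SUM(vr_credit)  -- keep unbalanced groups only
        ) AS b ON a.group_id = b.group_id
        ORDER BY a.rowid
        """
    ).fetchall()
    con.close()
    return result


print(unbalanced_rows([
    (5, 500.0, 0.0), (5, 0.0, 500.0),
    (6, 500.0, 0.0), (6, 0.0, 600.0),
]))
# -> [(6, 500.0, 0.0), (6, 0.0, 600.0)]
```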
Perhaps:
SELECT group_ID,
vr_debit,
vr_credit
FROM
dbo.TableName T1
WHERE
EXISTS(
SELECT 1 FROM dbo.TableName T2
WHERE T1.group_ID = T2.group_ID
AND T1.vr_debit <> T2.vr_debit
AND T1.vr_credit<> T2.vr_credit
AND T1.vr_debit <> T2.vr_credit
)
Also you can use this option
SELECT *
FROM dbo.test64 t
WHERE EXISTS (
SELECT 1
FROM dbo.test64 t2
WHERE t.group_id = t2.group_id
HAVING SUM(t2.vr_debit) - SUM(t2.vr_credit) != 0
)

Finding unique combinations of columns

I'm trying to write a select query but am having trouble, probably because I'm not familiar with SQL Server (usually use MySQL).
Basically what I need to do is find the number of unique combinations of 2 columns, one a Varchar and one a Double.
There are fewer rows in one than in the other, so I've been trying to figure out the right way to do this.
Essentially pretend Table.Varchar has in it:
Table.Varchar
--------------
apple
orange
and Table.Float has in it:
Table.Float
--------------
1
2
3.
How could I write a query which returns
QueryResult
-------------
apple1
apple2
apple3
orange1
orange2
orange3
It has been a long day at work and I think I'm just overthinking this. What I've tried so far is to concatenate the two columns and then count, but it's not working. Any ideas on a better way to go about this?
Select T1.VarcharField + CAST(T2.FloatField as Varchar(10)) as [Concat]
from Table.Varchar T1
CROSS JOIN Table.Float T2
This way, you generate all the combined values.
Then group by the combined value and use COUNT:
select T.Concat, count(*) from
(Select T1.VarcharField + CAST(T2.FloatField as Varchar(10)) as [Concat]
from Table.Varchar T1
CROSS JOIN Table.Float T2) T
group by T.Concat order by count(*) asc
If they are in the same table:
SELECT a.Field1, b.Field2
FROM [Table] a
CROSS JOIN [Table] b
or if they are in separate tables:
SELECT a.Field1, b.Field2
FROM [Table1] a
CROSS JOIN [Table2] b
Keep in mind that the above queries will match ALL records from the first table with ALL records from the second table, creating a cartesian product.
This will eliminate duplicates:
DECLARE @Varchar TABLE(v VARCHAR(32));
DECLARE @Float TABLE(f FLOAT);
INSERT @Varchar SELECT 'apple'
UNION ALL SELECT 'orange'
UNION ALL SELECT 'apple';
INSERT @Float SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3;
SELECT v.v + CONVERT(VARCHAR(12), f.f)
FROM @Varchar AS v
CROSS JOIN @Float AS f
GROUP BY v.v, f.f;
A cross join is a join where each record in one table is combined with each record of the other table. Select the distinct values from the table and join them.
select x.Varchar, y.Float
from (select distinct Varchar from theTable) x
cross join (select distinct Float from theTable) y
To find the number of combinations you don't have to actually return all combinations, just count them.
select
(select count(distinct Varchar) from theTable) *
(select count(distinct Float) from theTable)
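Both ideas from this answer (listing the distinct combinations and merely counting them) can be compared with a short SQLite sketch. The column names name and amount stand in for the Varchar and Float columns, and the sample rows and the helper name combinations are assumptions for illustration.

```python
import sqlite3


def combinations(rows):
    """Return (list of distinct pairs, number of pairs computed without listing)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE theTable (name TEXT, amount REAL)")
    con.executemany("INSERT INTO theTable VALUES (?, ?)", rows)
    pairs = con.execute(
        """
        SELECT x.name, y.amount
        FROM (SELECT DISTINCT name FROM theTable) AS x
        CROSS JOIN (SELECT DISTINCT amount FROM theTable) AS y
        ORDER BY x.name, y.amount
        """
    ).fetchall()
    total = con.execute(
        """
        SELECT (SELECT COUNT(DISTINCT name) FROM theTable)
             * (SELECT COUNT(DISTINCT amount) FROM theTable)
        """
    ).fetchone()[0]
    con.close()
    return pairs, total


# 'apple' appears twice but is only counted once: 2 names x 3 amounts.
pairs, total = combinations([("apple", 1.0), ("orange", 2.0), ("apple", 3.0)])
print(len(pairs), total)  # -> 6 6
```

The product of the two distinct counts always matches the number of rows the cross join would produce, so the second query avoids materialising the combinations at all.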
Try This
Possible Combinations.
SELECT
DISTINCT T1.VarField+CONVERT(VARCHAR(12),T2.FtField) --Get Unique Combinations
FROM Table1 T1 CROSS JOIN Table2 T2 --From all possible combinations
WHERE T1.VarField IS NOT NULL AND T2.FtField IS NOT NULL --Making code NULL Proof
and to just get the Possible Combinations Count
SELECT Count(DISTINCT T1.VarcharField + CONVERT(VARCHAR(12), T2.FloatField))
FROM Table1 T1
CROSS JOIN Table2 T2
WHERE T1.VarcharField IS NOT NULL AND T2.FloatField IS NOT NULL