I have a sql table with some values and a lot of filters
ID | Name | Filter1 | Filter2 | Filter3 | Filter4 ... and so on...
As now the filters have been set as int and I am running a query as follows to get the data required
select Name
from tblABC
where Filter1=1 and Filter2 = 7 and Filter3 = 33 ... and so on...'
My issue is that I want a filter column to hold multiple numbers. eg:- row no 3 will have numbers 8 and 13 in Filter1 cell, so that when I run a query for 8 or 13 I get the same result.
ie I want both the below queries to return the same result.
select... where Filter1=8
select... where Filter1=13
How can this be done? I tried converting the Filter columns to nvarchar and entering data as .8.13. where '.' where was used as separators. After this, running a query 'select... where Filter1 LIKE '%.8.%' is working for me.. But there are like 12 Filter columns and when such a string search is run in large volumes, wouldn't it make the query slow. What would be a more efficient way of doing this?
I am using Microsoft SQL 2014
A more efficient way, hmm. Separating tblABC from the filters would be my suggested way to go, even if it's not the most efficient way it will make up for it in maintenance (and it sure is more efficient than using like with wildcards for it).
tblABC ID Name
1 Somename
2 Othername
tblABCFilter ID AbcID Filter
1 1 8
2 1 13
3 1 33
4 2 5
How you query this data depends on your required output of course. One way is to just use the following:
SELECT tblABC.Name FROM tblABC
INNER JOIN tblABCFilter ON tblABC.ID = tblABCFilter.AbcID
WHERE tblABCFilter.Filter = 33
This will return all Name with a Filter of 33.
If you want to query for several Filters:
SELECT tblABC.Name FROM tblABC
INNER JOIN tblABCFilter ON tblABC.ID = tblABCFilter.AbcID
WHERE tblABCFilter.Filter IN (33,7)
This will return all Name with Filter in either 33 or 7.
I have created a small example fiddle.
I'm going to post a solution I use. I use a split function ( there are a lot of SQL Server split functions all over the internet)
You can take as example
CREATE FUNCTION [dbo].[SplitString]
(
#List NVARCHAR(MAX),
#Delim VARCHAR(255)
)
RETURNS TABLE
AS
RETURN ( SELECT [Value] FROM
(
SELECT
[Value] = LTRIM(RTRIM(SUBSTRING(#List, [Number],
CHARINDEX(#Delim, #List + #Delim, [Number]) - [Number])))
FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(#List)
AND SUBSTRING(#Delim + #List, [Number], LEN(#Delim)) = #Delim
) AS y
);
and run your query like this
select Name
from tblABC
where Filter1 IN (
SELECT * FROM SplitString(#DatatoFilter,',') and
Filter2 (IN (
SELECT * FROM SplitString(#DatatoFilter,',') and
..so on.
If you have hunderds of thousands of records it may not perform very well. But it should work.
My personal aproch would be a stored procedure and temp tables. Create a temp table with all the values you want to use as filter
SELECT *
INTO #Filter1
FROM SplitString(#DatatoFilter,',')
SELECT *
INTO #Filter2
FROM SplitString(#DatatoFilter,',')
then the final select
SELECT * FROM yourtable
WHERE Filter1 IN (SELECT DISTINCT Part FROM #Filter1) and
Filter2 IN (SELECT DISTINCT Part FROM #Filter2)
I don't think it makes any big difference from the first query, but it is easier to read.
Another solution which you can try is to convert the columns to XML. Its better than converting the columns to VARCHAR. You can use .exist to get only the records matching your criteria. Something like this.
DECLARE #table1 TABLE
(
[ID] int, [Name] varchar(9),Filter1 XML
)
INSERT INTO #table1
([ID], [Name],Filter1)
VALUES
(1, 'Somename','<Filter>8</Filter>'),
(2, 'Othername','<Filter>8</Filter><Filter>13</Filter>'),
(3, 'Thirdname','<Filter>25</Filter>')
DECLARE #FilterValue INT = 8
SELECT Filter1.query('/Filter'),*
FROM #table1
WHERE Filter1.exist('/Filter[. = sql:variable("#FilterValue")]') = 1
EDIT
You can even use the XML column to store all 12 of your filters. So this filter xml column which store all your filters and their multiple values.
DECLARE #table1 TABLE
(
[ID] int, [Name] varchar(9),Filter XML
)
INSERT INTO #table1
([ID], [Name],Filter)
VALUES
(1, 'Somename','<Filter ID = "1"><FilterVal>8</FilterVal></Filter><Filter ID = "2"><FilterVal>3</FilterVal><FilterVal>12</FilterVal></Filter>'),
(2, 'Othername','<Filter ID = "1"><FilterVal>8</FilterVal><FilterVal>13</FilterVal></Filter><Filter ID = "2"><FilterVal>8</FilterVal><FilterVal>13</FilterVal></Filter>'),
(3, 'Thirdname','<Filter ID = "2"><FilterVal>12</FilterVal><FilterVal>25</FilterVal></Filter><Filter ID = "3"><FilterVal>33</FilterVal></Filter>')
DECLARE #Filter1Value INT = 8
DECLARE #Filter2Value INT = 12
SELECT *
FROM #table1
WHERE Filter.exist('/Filter[#ID = 1]/FilterVal[. = sql:variable("#Filter1Value")]') = 1
AND Filter.exist('/Filter[#ID = 2]/FilterVal[. = sql:variable("#Filter2Value")]') = 1
Related
I have such a table (simplified for exhibit) with SQLServer 2012:
ParentId
Val
11111
1
11111
2
22222
1
22222
2
22222
3
33333
1
Fiddle here : http://sqlfiddle.com/#!18/67a210/1
I'm using a SP with a parameter #filterIds, which contains a string like 1, 2, 3 (note the delimiter is comma and space).
I need a request to get all the lines with a same ParentId which has a Val of all values of #filtersIds.
If #filtersId is 1, 2, 3, result must be 22222 because it's the only one which has lines with Val = 1, Val = 2 and Val = 3.
If #filtersId is 1, 2, results must be 11111 and 22222 because they both have lines with Val = 1 and Val = 2.
If #filtersId is 1, 4, there's no result at all because there's no ParentId with Val = 1 and Val = 4.
I tried with some JOIN but it seems over-complicated for a such a simple request. Is there some quick-and-easy solution I haven't think about ?
You can build a filter table and find all the parentID values that don't have a filter criteria that isn't met.
Note that you can bypass the first step (where I build the filter list cteFilter from values in the data) using the function STRING_SPLIT if your SQL is new enough, but I showed the version you'll need for older SQL.
Also note that using integer values matched against a string filter presents an unusual problem - you can have a value match a part of the intended filter value, i.e. if your filter is '100, 250' you need to take steps to insure a value of 25 does not match. Adding delimiters around both the filter string and the strings generated from the values will allow you to test for only a complete match.
Thanks for providing sample data in Fiddle, it made it easier to answer your question.
DECLARE #Filter nvarchar(50) = '1, 2, 3';
--DECLARE #Filter nvarchar(50) = '1, 2';
create table #Sample(parentId int not null, Val int not null)
insert into #Sample(parentId, Val)
values (11111, 1), (11111, 2), (22222, 1), (22222, 2), (22222, 3), (33333, 1)
;with cteVals as ( --Don't apply functions more than necessary
SELECT DISTINCT Val FROM #Sample
), cteFilter as (--NOTE: For QL Server 2016 (13.x) and later, use string_split to generate this list!
SELECT Val --Otherwise, build a list by finding all Val values in your filter string
FROM cteVals --Next line is complicated to insure matching start and end values
--even if values are of different lengths, we don't want 25 to match filter '100, 250'!
WHERE CHARINDEX(CONCAT(', ', FORMAT(Val, '#'),', '), CONCAT(', ', #Filter, ', ')) > 0
)SELECT DISTINCT parentId --Find all of the ParentIDs
FROM #Sample as S
WHERE NOT EXISTS ( --That don't have any filter requirements ...
SELECT * FROM cteFilter as F
WHERE NOT EXISTS ( --such that the filter
SELECT * FROM #Sample as S2 --isn't in the sample data
WHERE S2.Val = F.Val --matching the filter value
AND S2.parentId = S.parentId --and also matching the parentID in question
)
)
DROP TABLE #Sample
EDIT: Credit to Joe Celko, I used his article https://www.red-gate.com/simple-talk/databases/sql-server/t-sql-programming-sql-server/divided-we-stand-the-sql-of-relational-division/ in troubleshooting my solution, which originally did not work
This is a case of Relational Division With Remainder. There are a number of solutions.
An implementation that is usually more efficient than the other answer gave (a doubly-nested NOT EXISTS) is to join the input list, group by ParentId and ensure that you have as many matches as there are rows in the input list.
You should pass your data in as a Table Valued Parameter. As a hack you can convert to a table variable using STRING_SPLIT, but I'd advise you not to if possible.
Let us assume you have a Table Parameter #input of the form:
CREATE TYPE dbo.IntList TABLE (Value int PRIMARY KEY /* must be unique */);
Then you can do as follows:
DECLARE #count int = (SELECT COUNT(*) FROM #input);
SELECT
t.ParentID
FROM MyTable t
JOIN #input i ON i.Value = t.val
GROUP BY
t.ParentID
HAVING COUNT(*) = #count;
db<>fiddle
You may also want to take a look at other Relational Division techniques.
I have a situation where I need to compare the value of a column with 2 columns from my settings table.
Currently I have this query which works
declare #t int = 3
select 1
where #t = (select s.RelationGDGMID from dbo.tblSettings s )
or
#t = (select s.RelationGTTID from dbo.tblSettings s )
But I wonder if I can make this without reading tblSettings 2 times, and then I tried this
declare #t int = 3
select 1
where #t in (select s.RelationGDGMID, s.RelationGTTID from dbo.tblSettings s )
and this does not compiles, it returns
Only one expression can be specified in the select list when the
subquery is not introduced with EXISTS
So how can I do this without reading tblSettings 2 times, well one solution would be using the EXISTS like the error hints me
declare #t int = 3
select 1
where exists (select 1 from dbo.tblSettings s where s.RelationGDGMID = #t or s.RelationGTTID = #t)
and yes that works, only reads tblSettings once, so I can use this.
But I still wonder if there is a way to make it work with the IN operator
After all, when I do this
declare #t int = 3
select 1
where #t in (3, 1)
that works without problems,
so why does
where #t in (select s.RelationGDGMID, s.RelationGTTID from dbo.tblSettings s )
not works, when in fact it also returns (3, 1) ?
One way to do it would be to use UNION if the columns are of the same type.
where #t in (select s1.RelationGDGMID from dbo.tblSettings s1 UNION
select s2.RelationGTTID from dbo.tblSettings s2)
The reason this works is because it is returning one value set (1 column with values). The reason where #t in (3, 1) works is because this the same, it is returning one value set (value 3 and value 1).
That said I would prefer the EXISTS over IN as this could produce a better query plan.
Table1
ID Name Tags
----------------------------------
1 Customer1 Tag1,Tag5,Tag4
2 Customer2 Tag2,Tag6,Tag4,Tag11
3 Customer5 Tag6,Tag5,Tag10
and Table2
ID Name Tags
----------------------------------
1 Product1 Tag1,Tag10,Tag6
2 Product2 Tag2,Tag1,Tag5
3 Product5 Tag1,Tag2,Tag3
what is the best way to join Table1 and Table2 with Tags column?
It should look at the tags column which coma seperated on table 2 for each coma seperated tag on the tags column in the table 1
Note: Tables are not full-text indexed.
The best way is not to have comma separated values in a column. Just use normalized data and you won't have trouble with querying like this - each column is supposed to only have one value.
Without this, there's no way to use any indices, really. Even a full-text index behaves quite different from what you might thing, and they are inherently clunky to use - they're designed for searching for text, not meaningful data. In the end, you will not get much better than something like
where (Col like 'txt,%' or Col like '%,txt' or Col like '%,txt,%')
Using a xml column might be another alternative, though it's still quite a bit silly. It would allow you to treat the values as a collection at least, though.
I don't think there will ever be an easy and efficient solution to this. As Luaan pointed out, it is a very bad idea to store data like this : you lose most of the power of SQL when you squeeze what should be individual units of data into a single cell.
But you can manage this at the slight cost of creating two user-defined functions. First, use this brilliant recursive technique to split the strings into individual rows based on your delimiter :
CREATE FUNCTION dbo.TestSplit (#sep char(1), #s varchar(512))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(#sep, #s)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(#sep, #s, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT pn AS SplitIndex,
SUBSTRING(#s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS SplitPart
FROM Pieces
)
Then, make a function that takes two strings and counts the matches :
CREATE FUNCTION dbo.MatchTags (#a varchar(512), #b varchar(512))
RETURNS INT
AS
BEGIN
RETURN
(SELECT COUNT(*)
FROM dbo.TestSplit(',', #a) a
INNER JOIN dbo.TestSplit(',', #b) b
ON a.SplitPart = b.SplitPart)
END
And that's it, here is a test roll with table variables :
DECLARE #A TABLE (Name VARCHAR(20), Tags VARCHAR(100))
DECLARE #B TABLE (Name VARCHAR(20), Tags VARCHAR(100))
INSERT INTO #A ( Name, Tags )
VALUES
( 'Customer1','Tag1,Tag5,Tag4'),
( 'Customer2','Tag2,Tag6,Tag4,Tag11'),
( 'Customer5','Tag6,Tag5,Tag10')
INSERT INTO #B ( Name, Tags )
VALUES
( 'Product1','Tag1,Tag10,Tag6'),
( 'Product2','Tag2,Tag1,Tag5'),
( 'Product5','Tag1,Tag2,Tag3')
SELECT * FROM #A a
INNER JOIN #B b ON dbo.MatchTags(a.Tags, b.Tags) > 0
I developed a solution as follows:
CREATE TABLE [dbo].[Table1](
Id int not null,
Name nvarchar(250) not null,
Tag nvarchar(250) null,
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Table2](
Id int not null,
Name nvarchar(250) not null,
Tag nvarchar(250) null,
) ON [PRIMARY]
GO
get sample data for Table1, it will insert 28000 records
INSERT INTO Table1
SELECT CustomerID,CompanyName, (FirstName + ',' + LastName)
FROM AdventureWorks.SalesLT.Customer
GO 3
sample data for Table2.. i need same tags for Table2
declare #tag1 nvarchar(50) = 'Donna,Carreras'
declare #tag2 nvarchar(50) = 'Johnny,Caprio'
get sample data for Table2, it will insert 9735 records
INSERT INTO Table2
SELECT ProductID,Name, (case when(right(ProductID,1)>=5) then #tag1 else #tag2 end)
FROM AdventureWorks.SalesLT.Product
GO 3
My Solution
create TABLE #dt (
Id int IDENTITY(1,1) PRIMARY KEY,
Tag nvarchar(250) NOT NULL
);
I've create temp table and i will fill with Distinct Tag-s in Table1
insert into #dt(Tag)
SELECT distinct Tag
FROM Table1
Now i need to vertical table for tags
create TABLE #Tags ( Tag nvarchar(250) NOT NULL );
Now i'am fill #Tags table with While, you can use Cursor but while is faster
declare #Rows int = 1
declare #Tag nvarchar(1024)
declare #Id int = 0
WHILE #Rows>0
BEGIN
Select Top 1 #Tag=Tag,#Id=Id from #dt where Id>#Id
set #Rows =##RowCount
if #Rows>0
begin
insert into #Tags(Tag) SELECT Data FROM dbo.StringToTable(#Tag, ',')
end
END
last step : join Table2 with #Tags
select distinct t.*
from Table2 t
inner join #Tags on (',' + t.Tag + ',') like ('%,' + #Tags.Tag + ',%')
Table rowcount= 28000 Table2 rowcount=9735 select is less than 2 second
I use this kind of solution with paths of trees. First put a comma at the very begin and at the very end of the string. Than you can call
Where col1 like '%,' || col2 || ',%'
Some database index the column also for the like(postgres do it partially), therefore is also efficient. I don't know sqlserver.
Can somebody help me with this little task? What I need is a stored procedure that can find duplicate letters (in a row) in a string from a table "a" and after that make a new table "b" with just the id of the string that has a duplicate letter.
Something like this:
Table A
ID Name
1 Matt
2 Daave
3 Toom
4 Mike
5 Eddie
And from that table I can see that Daave, Toom, Eddie have duplicate letters in a row and I would like to make a new table and list their ID's only. Something like:
Table B
ID
2
3
5
Only 2,3,5 because that is the ID of the string that has duplicate letters in their names.
I hope this is understandable and would be very grateful for any help.
In your answer with stored procedure, you have 2 mistakes, one is missing space between column name and LIKE clause, second is missing single quotes around search parameter.
I first create user-defined scalar function which return 1 if string contains duplicate letters:
EDITED
CREATE FUNCTION FindDuplicateLetters
(
#String NVARCHAR(50)
)
RETURNS BIT
AS
BEGIN
DECLARE #Result BIT = 0
DECLARE #Counter INT = 1
WHILE (#Counter <= LEN(#String) - 1)
BEGIN
IF(ASCII((SELECT SUBSTRING(#String, #Counter, 1))) = ASCII((SELECT SUBSTRING(#String, #Counter + 1, 1))))
BEGIN
SET #Result = 1
BREAK
END
SET #Counter = #Counter + 1
END
RETURN #Result
END
GO
After function was created, just call it from simple SELECT query like following:
SELECT
*
FROM
(SELECT
*,
dbo.FindDuplicateLetters(ColumnName) AS Duplicates
FROM TableName) AS a
WHERE a.Duplicates = 1
With this combination, you will get just rows that has duplicate letters.
In any version of SQL, you can do this with a brute force approach:
select *
from t
where t.name like '%aa%' or
t.name like '%bb%' or
. . .
t.name like '%zz%'
If you have a case sensitive collation, then use:
where lower(t.name) like '%aa%' or
. . .
Here's one way.
First create a table of numbers
CREATE TABLE dbo.Numbers
(
number INT PRIMARY KEY
);
INSERT INTO dbo.Numbers
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number > 0;
Then with that in place you can use
SELECT *
FROM TableA
WHERE EXISTS (SELECT *
FROM dbo.Numbers
WHERE number < LEN(Name)
AND SUBSTRING(Name, number, 1) = SUBSTRING(Name, number + 1, 1))
Though this is an old post it's worth posting a solution that will be faster than a brute force approach or one that uses a scalar udf (which generally drag down performance). Using NGrams8K this is rather simple.
--sample data
declare #table table (id int identity primary key, [name] varchar(20));
insert #table([name]) values ('Mattaa'),('Daave'),('Toom'),('Mike'),('Eddie');
-- solution #1
select id
from #table
cross apply dbo.NGrams8k([name],1)
where charindex(replicate(token,2), [name]) > 0
group by id;
-- solution #2 (SQL 2012+ solution using LAG)
select id
from
(
select id, token, prevToken = lag(token,1) over (partition by id order by position)
from #table
cross apply dbo.NGrams8k([name],1)
) prep
where token = prevToken
group by id; -- optional id you want to remove possible duplicates.
another burte force way:
select *
from t
where t.name ~ '(.)\1';
Is there any way to group by all the columns of a table without specifying the column names? Like:
select * from table group by *
The DISTINCT Keyword
I believe what you are trying to do is:
SELECT DISTINCT * FROM MyFooTable;
If you group by all columns, you are just requesting that duplicate data be removed.
For example a table with the following data:
id | value
----+----------------
1 | foo
2 | bar
1 | foo
3 | something else
If you perform the following query which is essentially the same as SELECT * FROM MyFooTable GROUP BY * if you are assuming * means all columns:
SELECT * FROM MyFooTable GROUP BY id, value;
id | value
----+----------------
1 | foo
3 | something else
2 | bar
It removes all duplicate values, which essentially makes it semantically identical to using the DISTINCT keyword with the exception of the ordering of results. For example:
SELECT DISTINCT * FROM MyFooTable;
id | value
----+----------------
1 | foo
2 | bar
3 | something else
If you are using SqlServer the distinct keyword should work for you. (Not sure about other databases)
declare #t table (a int , b int)
insert into #t (a,b) select 1, 1
insert into #t (a,b) select 1, 2
insert into #t (a,b) select 1, 1
select distinct * from #t
results in
a b
1 1
1 2
I wanted to do counts and sums over full resultset. I achieved grouping by all with GROUP BY 1=1.
Short answer: no. GROUP BY clauses intrinsically require order to the way they arrange your results. A different order of field groupings would lead to different results.
Specifying a wildcard would leave the statement open to interpretation and unpredictable behaviour.
nope. are you trying to do some aggregation? if so, you could do something like this to get what you need
;with a as
(
select sum(IntField) as Total
from Table
group by CharField
)
select *, a.Total
from Table t
inner join a
on t.Field=a.Field
No because this fundamentally means that you will not be grouping anything. If you group by all columns (and have a properly defined table w/ a unique index) then SELECT * FROM table is essentially the same thing as SELECT * FROM table GROUP BY *.
Here is my suggestion:
DECLARE #FIELDS VARCHAR(MAX), #NUM INT
--DROP TABLE #FIELD_LIST
SET #NUM = 1
SET #FIELDS = ''
SELECT
'SEQ' = IDENTITY(int,1,1) ,
COLUMN_NAME
INTO #FIELD_LIST
FROM Req.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = N'new340B'
WHILE #NUM <= (SELECT COUNT(*) FROM #FIELD_LIST)
BEGIN
SET #FIELDS = #FIELDS + ',' + (SELECT COLUMN_NAME FROM #FIELD_LIST WHERE SEQ = #NUM)
SET #NUM = #NUM + 1
END
SET #FIELDS = RIGHT(#FIELDS,LEN(#FIELDS)-1)
EXEC('SELECT ' + #FIELDS + ', COUNT(*) AS QTY FROM [Req].[dbo].[new340B] GROUP BY ' + #FIELDS + ' HAVING COUNT(*) > 1 ')
You can use Group by All but be careful as Group by All will be removed from future versions of SQL server.