Search for occurrences of string in one table field in the field of another


Let's say I want to find mentions of names listed in one table within another. So for instance I have this table:
ID | Name
----+-----------------------
1 | PersonA
2 | PersonB
3 | PersonC
4 | PersonD
Now I want to search a field in another table for these persons' names and produce a count for each. Here's what I've tried, to no avail:
select
Name,
sum(
select
count(*)
from Posts
where Posts.Body like '%[^N]' + [Name] + '%'
) as [Count]
from NamesTable
order by Name;
I am using Data Explorer here on SE, so whatever syntax will work there is what I need. I'm not sure how to get this working or if this is even the best approach.

Your query is very close. You just don't need the sum() in the outer query:
select Name,
(select count(*)
from Posts
where Posts.Body like '%[^N]' + [Name] + '%'
) as [Count]
from NamesTable
order by Name;
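To try the correlated-subquery pattern outside Data Explorer, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration. SQLite concatenates with || rather than +, and its LIKE has no [^N] character class, so that part of the pattern is left out here. Note that this counts posts that mention a name, not the number of mentions inside each post.

```python
import sqlite3

# Hypothetical miniature versions of NamesTable and Posts, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NamesTable (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Body TEXT);
    INSERT INTO NamesTable (ID, Name) VALUES
        (1, 'PersonA'), (2, 'PersonB'), (3, 'PersonC');
    INSERT INTO Posts (Body) VALUES
        ('PersonA replied to PersonB'),
        ('Thanks, PersonA!'),
        ('Nothing relevant here');
""")

# Correlated subquery: for each name, count the posts whose body contains it.
# No outer SUM() is needed because the subquery already returns one number per name.
name_counts = conn.execute("""
    SELECT n.Name,
           (SELECT COUNT(*)
              FROM Posts p
             WHERE p.Body LIKE '%' || n.Name || '%') AS MentionCount
      FROM NamesTable n
     ORDER BY n.Name
""").fetchall()

print(name_counts)
```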

Related

How can I delete completely duplicate rows from a query, without having a unique value for it?

I'm having an issue getting information from an MS Access database table. I need a count of a code, but duplicate rows must not be taken into account, which means that I need to delete all duplicate rows.
Here's an example to illustrate what I need:
Code | Name
12 | George
20 | John
12 | George
33 | John
First I need to delete both rows that share the same code, and then I need a count per name over the rest of the table. For example, this is the result I'm expecting:
Name | Count
John | 2
I already have a query that does this, but it takes around 1 hour to return around 5000 rows, and I need something more efficient. My query:
select name, count(*) from Table
where name = '" + input_name + "'
and code in (select code from Table group by code
having count(code) = 1)
group by name
order by count(name) desc;
I would appreciate any suggestion.

Rather than using in, I might suggest filtering the original dataset in a subquery, e.g.:
select u.name, count(*)
from (select t.code, t.name from yourtable t group by t.code, t.name having count(*) = 1) u
group by u.name
Here, change yourtable to the name of your table.
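As a quick check of the subquery approach against the sample data from the question, here is a small sketch run through Python's sqlite3 module rather than MS Access. The table name yourtable is the placeholder from the answer, and the behaviour in Access itself is assumed to be equivalent but is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (code INTEGER, name TEXT);
    INSERT INTO yourtable (code, name) VALUES
        (12, 'George'), (20, 'John'), (12, 'George'), (33, 'John');
""")

# The inner query keeps only (code, name) pairs that occur exactly once,
# which drops both 'George' rows; the outer query then counts per name.
unique_counts = conn.execute("""
    SELECT u.name, COUNT(*) AS cnt
      FROM (SELECT t.code, t.name
              FROM yourtable t
             GROUP BY t.code, t.name
            HAVING COUNT(*) = 1) u
     GROUP BY u.name
""").fetchall()

print(unique_counts)
```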

Best practice for joining 2 tables using LIKE operator or better approach

I have 2 tables that have to be processed once a day in data warehouse.
MessageTable
Id integer primary key
Message varchar(max)
Example:
Id | Message
1 | Hi! This is the first message.
2 | the last message.
PartTable
PartId integer primary key
Words varchar(100)
Example:
PartId | Words
1 | This
2 | message, first
3 | last
Table 1 contains messages to be compared with Table 2 in order to know which parts each message belongs to.
So the above example should return this:
Id | MessageId | PartId
1 | 1 | 1
2 | 1 | 2
3 | 2 | 3
Because message (id 1) contains the keyword "This" as well as "message" and "first", it belongs to parts 1 and 2.
When the keywords in a part are separated by commas, all of them need to be found in the message, irrespective of order.
The stored procedure I roughly made for this process looks like this.
INSERT INTO ResultTable(MessageId, PartId)
SELECT m.Id AS MessageId, p.PartId AS PartId
FROM MessageTable m, PartTable p
WHERE
(SELECT COUNT(VALUE) FROM STRING_SPLIT(p.Words, ',') WHERE CHARINDEX(CONCAT(' ', VALUE, ' '), m.Message) > 0) = (SELECT COUNT(VALUE) FROM STRING_SPLIT(p.Words, ','))
This SQL statement seems to work, although I haven't tested it thoroughly. But it doesn't look like good practice.
Should I use a more relational approach for PartTable, like below? Then all the word rows for a part would have to be found in a message to determine that the message belongs to the part.
Id | PartId | Word
1 | 1 | This
2 | 2 | message
3 | 2 | first
4 | 3 | last
I can create this table using STRING_SPLIT on PartTable, or PartTable can be refactored. But I don't see a way to join this table with MessageTable. Also, I expect there to be a lot of rows in MessageTable.
Can anyone give me any help on this?
Thanks,

Hmmmm . . . You can combine all parts and messages and split the parts into words. A where clause can be used for filtering, so only matches are included. A final aggregation and counting returns the message/part pairs where all words match:
select m.id, pt.partid
from messagetable m cross join
     parttable pt cross apply
     string_split(pt.words, ',') s
where m.message like '%' + s.value + '%'
group by m.id, pt.partid
having count(*) = (select count(*)
                   from parttable pt2 cross apply
                        string_split(pt2.words, ',') s2
                   where pt2.partid = pt.partid
                  );
This is not efficient and it is very hard to optimize in SQL Server given your data structure.
A better structure for the parttable would be an improvement for the query:
select m.id, ptn.partid
from messagetable m join
     (select ptn.*, count(*) over (partition by partid) as cnt
      from parttablenormalized ptn
     ) ptn
     on m.message like '%' + ptn.word + '%'
group by m.id, ptn.partid, cnt
having count(*) = cnt;
However performance might not change much. You would need to denormalize message as well for a speedier query.
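The normalized-table idea can be tried on the question's sample data with the sketch below, which uses Python's sqlite3 module instead of SQL Server. It is an illustration under a few assumptions: the table PartTableNormalized and its rows are hypothetical, || replaces + for concatenation, and a grouped subquery stands in for the window function so that it runs on older SQLite builds. SQLite's LIKE is also case-insensitive for ASCII by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MessageTable (Id INTEGER PRIMARY KEY, Message TEXT);
    CREATE TABLE PartTableNormalized (PartId INTEGER, Word TEXT);
    INSERT INTO MessageTable (Id, Message) VALUES
        (1, 'Hi! This is the first message.'),
        (2, 'the last message.');
    INSERT INTO PartTableNormalized (PartId, Word) VALUES
        (1, 'This'), (2, 'message'), (2, 'first'), (3, 'last');
""")

# Join each message to every word it contains, then keep only the
# (message, part) pairs where the number of matched words equals the
# total number of words defined for that part.
matches = conn.execute("""
    SELECT m.Id, w.PartId
      FROM MessageTable m
      JOIN PartTableNormalized w
        ON m.Message LIKE '%' || w.Word || '%'
      JOIN (SELECT PartId, COUNT(*) AS cnt
              FROM PartTableNormalized
             GROUP BY PartId) c
        ON c.PartId = w.PartId
     GROUP BY m.Id, w.PartId, c.cnt
    HAVING COUNT(*) = c.cnt
     ORDER BY m.Id, w.PartId
""").fetchall()

print(matches)
```

Message 2 contains "message" but not "first", so it is not matched to part 2.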

SQL Group By and changing the grouped values

I have a table like this:
1 | Madrid | 45000
2 | Berlin | 35000
3 | Berlin | 65000
Now I want to show a result like this:
1 | Madrid | 45000
2 | Berlin | "Different Values"
So basically I want to use a "Group By" and if it is grouped, then change the value of some columns to a manual string.
I thought about using a view, updating all values in this view to the string where I have duplicates of the grouped column, and then running the real query.
I even thought about implementing an assembly in SQL Server that does this, but I can't find any good tutorials on it, only that it can be done.
Or does someone have an even better idea? (The real tables used here are huge and the SQL query sometimes takes up to 3 minutes to run, so I kept this example simple. I didn't want to work with counts on every grouped column, because that could take more than just a few minutes.)

Something like this should work
select min(id) as id,
name,
case when count(*) = 1
then cast(sum(value) as varchar)
else 'Different values'
end as value
from your_table
group by name

I would do this as:
select min(id) as id, city,
(case when min(value) = max(value)
then cast(max(value) as varchar(255))
else 'Different Values'
end) as result
from t
group by city;
In fact, I might use something more informative than "different values", such as the range:
select min(id) as id, city,
(case when min(value) = max(value)
then cast(max(value) as varchar(255))
else cast(min(value) as varchar(255)) + '-' + cast(max(value) as varchar(255))
end) as result
from t
group by city;
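The min/max comparison can be checked on the question's three rows with a short sketch using Python's sqlite3 module. The table name t and the column names id, city and value follow the query above; the cast target is TEXT here because that is SQLite's string type, which is an adaptation rather than the SQL Server syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, city TEXT, value INTEGER);
    INSERT INTO t (id, city, value) VALUES
        (1, 'Madrid', 45000), (2, 'Berlin', 35000), (3, 'Berlin', 65000);
""")

# If the smallest and largest value in a group are equal, every row in the
# group has the same value, so it can be shown; otherwise show a marker.
grouped = conn.execute("""
    SELECT MIN(id) AS id, city,
           CASE WHEN MIN(value) = MAX(value)
                THEN CAST(MAX(value) AS TEXT)
                ELSE 'Different Values'
           END AS result
      FROM t
     GROUP BY city
     ORDER BY MIN(id)
""").fetchall()

print(grouped)
```

Comparing min and max also treats a group of several identical values as a single value, which a count(*) = 1 test would not.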

postgreSQL - get most frequent value from many columns

I have a table hobbies:
+++++++++++++++++++++++++++++++
+ hobby_1 | hobby_2 | hobby_3 +
+---------+---------+---------+
+ music | soccer | [null] +
+ movies | music | cars +
+ cats | dogs | music +
+++++++++++++++++++++++++++++++
I want to get the most frequently used value. The answer would be music.
I know the query to get the most frequent value for one column:
SELECT hobby_1, COUNT(*) FROM hobbies
GROUP BY hobby_1
ORDER BY count(*) DESC;
But how do I get the most frequent value when combining all columns?

You need to unpivot the data. Here is one method:
select h.hobby, count(*)
from ((select hobby_1 as hobby from hobbies) union all
(select hobby_2 as hobby from hobbies) union all
(select hobby_3 as hobby from hobbies)
) h
group by h.hobby
order by count(*) desc;
However, you should really fix your data structure. Having multiple columns only distinguished by a number is usually a sign of a problem with the data structure. You should have a table with one row for each hobby.
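Here is a small sketch of the unpivot on the question's data, run through Python's sqlite3 module instead of PostgreSQL. The IS NOT NULL filter and the LIMIT 1 are additions for illustration: the first keeps empty hobby slots out of the ranking, and the second returns only the top value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hobbies (hobby_1 TEXT, hobby_2 TEXT, hobby_3 TEXT);
    INSERT INTO hobbies (hobby_1, hobby_2, hobby_3) VALUES
        ('music', 'soccer', NULL),
        ('movies', 'music', 'cars'),
        ('cats', 'dogs', 'music');
""")

# UNION ALL stacks the three columns into one, keeping duplicates,
# so an ordinary GROUP BY can count each hobby across all columns.
most_frequent = conn.execute("""
    SELECT h.hobby, COUNT(*) AS cnt
      FROM (SELECT hobby_1 AS hobby FROM hobbies
            UNION ALL
            SELECT hobby_2 FROM hobbies
            UNION ALL
            SELECT hobby_3 FROM hobbies) h
     WHERE h.hobby IS NOT NULL
     GROUP BY h.hobby
     ORDER BY cnt DESC, h.hobby
     LIMIT 1
""").fetchone()

print(most_frequent)
```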

Unique results from database?

I am selecting all badge numbers from a database where category is equal to 1.
category | badge number
0 | 1
1 | 1
2 | 5
1 | 1
Sometimes the category is duplicated. Is there a way to only get unique badge numbers from the database?
So above there are two 1's in category, each with badge number 1. How can I make sure the result only gives '1' rather than '1,1'?

Use the DISTINCT keyword in the SELECT statement.
SELECT DISTINCT badge_number FROM Your_Table WHERE category = 1
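For a quick check against the question's four rows, here is a sketch using Python's sqlite3 module. The table name badges and the column name badge_number are assumptions, since the question does not give the real names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE badges (category INTEGER, badge_number INTEGER);
    INSERT INTO badges (category, badge_number) VALUES
        (0, 1), (1, 1), (2, 5), (1, 1);
""")

# Without DISTINCT the two category-1 rows would both be returned.
all_rows = conn.execute(
    "SELECT badge_number FROM badges WHERE category = 1"
).fetchall()

# DISTINCT collapses identical result rows into one.
distinct_rows = conn.execute(
    "SELECT DISTINCT badge_number FROM badges WHERE category = 1"
).fetchall()

print(all_rows, distinct_rows)
```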

Use the distinct keyword in your select.
select distinct badge_number from table_name where category = 1

Have you tried Select Distinct :
SELECT DISTINCT [badge number] from table
where Category=1
http://www.w3schools.com/sql/sql_distinct.asp

Select Distinct Badgenumber from table where Category = 1

SELECT DISTINCT BadgeNumber FROM dbo.TableName
Where Category = 1
Edited:
Ohh, there are so many posts already .... !!
