SQL - reverse like. Is it possible? - sql

I have a db table containing this data:
| id | curse_word |
--------------------
| 1  | d*mn       |
| 2  | sh*t       |
| 3  | as*h*le    |
I am creating a website that behaves like an online forum, where people can create discussion threads and talk about them. To be able to post you need to register, and to register you need a forum username. We want to prevent usernames from containing curse words anywhere in them. These curse words are defined in our database.
To check for this using the table above, I thought of using an SQL query with LIKE.
But what if the registered username is something like "somesh*ttyperson"? Since it contains the word "sh*t", the username should not be allowed. So this is something like using an SQL query with LIKE in reverse.
I tried the following query, but it doesn't work:
select * from table where "somesh*ttyperson" LIKE curse_word
How can I make it work?

Although I'd give Tomalak's comment some consideration, here's a solution that might fit your needs:
SELECT COUNT(*) FROM curse_words
WHERE "somesh*ttyperson" LIKE CONCAT('%', curse_word, '%');
In this way you are actually composing a LIKE comparison term for each of the curse words by prepending and appending a % (e.g. %sh*t%).
LIKE might be a bit expensive to query if you plan on having millions of curse words but I think it's reasonable to assume you aren't.
All you have to do now is test for this result being strictly equal to 0 to let the nickname through, or forbid it otherwise.
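As a minimal sketch of this answer's query, here is the same check run through SQLite from Python (SQLite concatenates with `||` where MySQL would use `CONCAT`; the table and data follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE curse_words (id INTEGER PRIMARY KEY, curse_word TEXT)")
conn.executemany("INSERT INTO curse_words (curse_word) VALUES (?)",
                 [("d*mn",), ("sh*t",), ("as*h*le",)])

def is_allowed(username):
    # Reverse LIKE: the candidate username is the left-hand side, and each
    # stored curse word becomes a %...% pattern on the right-hand side.
    # SQLite uses || for concatenation; MySQL would use CONCAT('%', curse_word, '%').
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM curse_words WHERE ? LIKE '%' || curse_word || '%'",
        (username,),
    ).fetchone()
    return count == 0

print(is_allowed("somesh*ttyperson"))  # False -- contains "sh*t"
print(is_allowed("niceperson"))        # True
```

Note that `*` is a literal character in a LIKE pattern (only `%` and `_` are wildcards), so the starred words match exactly as stored.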


What is the best way to search multiple columns of a database for user/client inputted information? [closed]

I have data in the following format:
+---------+---------+----------+-----------+-----------+-----------+
| id | title | author | keyword_1 | keyword_2 | keyword_3 |
+---------+---------+----------+-----------+-----------+-----------+
I am looking to store it in a database so I can search by title, keyword_1, keyword_2, or keyword_3.
An example would be
+---------+------------------+-----------+-------------+-------------+-----------+
| id | title | author | keyword_1 | keyword_2 | keyword_3 |
+---------+------------------+-----------+-------------+-------------+-----------+
| 123 | Learn Java 101 | John Doe | java | programming | software |
+---------+------------------+-----------+-------------+-------------+-----------+
On the front end, there is a form where the user inputs a title and/or keywords. The database needs to be queried for this information. But the user input will likely not be an exact match, so we need to do some kind of regex or fuzzy matching. The payload might look something like:
{
  title: "Learn Java",
  author: "Jon Doee",
  keyword1: "computers",
  keyword2: "softwar",
  keyword3: null,
}
I realize there are some built-in operations, for example, in Postgres we have LIKE and Levenshtein(). However, I'm not sure if this is the right approach. It seems like a very expensive operation to compare a keyword with all three columns.
Surely there must be a clean way to do this. I am posting here because I want to check whether this is or is not the path that I should go down.
From an architectural standpoint is this the correct way to store the data? I thought about using a document-based system and I'm not sure that that would be much better or worse.
I'm somewhat new to all this and would appreciate some guidance on what is recommended.
Thanks!
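For a feel of what the levenshtein() function mentioned above (from Postgres's fuzzystrmatch extension) actually measures, here is a minimal edit-distance sketch in Python, applied to the question's own fuzzy inputs:

```python
def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance: the minimum number of single-character
    # insertions, deletions, and substitutions needed to turn a into b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("Jon Doee", "John Doe"))   # 2 (insert 'h', drop final 'e')
print(levenshtein("softwar", "software"))    # 1 (insert final 'e')
```

A fuzzy search then becomes "keep rows whose distance to the input is below some threshold", which is exactly the expensive full-scan the question worries about.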
I would start with a normalized relational model:
Books:
| id  | title          | author   |
| 123 | Learn Java 101 | John Doe |
Then:
BookKeywords:
| book_id | keyword     |
| 123     | java        |
| 123     | programming |
| 123     | software    |
One particularly valuable feature of this data model is that you can have a Keywords table and validate that only valid keywords go into this table.
This is the "normal" way to store multiple values per entity.
After you have mastered this, you can think about alternative structures. For instance:
Storing the keywords as a text field and using text search can work well under some circumstances.
Storing the keywords as an array can work well under some circumstances.
Storing the keywords in JSON can work well under some circumstances.
But start with what the SQL language was designed to support -- separate entities in tables.
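Here is a sketch of that normalized model in SQLite (via Python); table and column names are illustrative, and searching by keyword becomes a plain join instead of OR-ing three columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
-- A keywords table lets you validate that only known keywords are used.
CREATE TABLE keywords (keyword TEXT PRIMARY KEY);
CREATE TABLE book_keywords (
    book_id INTEGER REFERENCES books(id),
    keyword TEXT REFERENCES keywords(keyword),
    PRIMARY KEY (book_id, keyword)
);
INSERT INTO books VALUES (123, 'Learn Java 101', 'John Doe');
INSERT INTO keywords VALUES ('java'), ('programming'), ('software');
INSERT INTO book_keywords VALUES (123, 'java'), (123, 'programming'), (123, 'software');
""")

# One keyword column to query, however many keywords each book has.
rows = conn.execute("""
    SELECT b.id, b.title
    FROM books b
    JOIN book_keywords bk ON bk.book_id = b.id
    WHERE bk.keyword = ?
""", ("java",)).fetchall()
print(rows)  # [(123, 'Learn Java 101')]
```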
When you are using an RDBMS and have a clear idea of what information you will be storing, why would you prefer a document store?
In an RDBMS, datatypes like json, xml, etc. are generally used when the information isn't relational, or when the purpose is just storing and retrieving with few modifications.
Looking at your table, the relational approach will generally give you faster results than a document approach when dealing with huge amounts of data.
Yes, LIKE operations are a bit expensive; alternatives are REGEXP or SIMILAR TO (for Postgres), and you should know where to use which. You can always create a pattern-matching index on the columns you use in the WHERE clause, and a GIN/GiST index for columns that store more than a couple of words, e.g. title.
If there are continuous updates or deletes being performed, consider maintenance operations on the table: setting correct vacuum parameters, analyzing the table, and rebuilding/recreating indexes.
If there are millions of records being stored, use table partitioning.
Your requirement is pretty standard, and I don't see any need to store this as documents.

Oracle SQL Like not working for hyphenated words

I have a table column that holds the description which looks like this:
id   description
--------------------------------------
100  ... post-doctorate ...
200  ... postdoctorate ...
300  ... post doctorate ...
I implemented a searching mechanism where users can search for keywords and somehow I'm having issues searching for these words. Even though I'm using LIKE in my WHERE clause I can't seem to include all 3 rows above.
Query
WHERE description LIKE '%post-doctorate%'
I would like to be able to search all 3 of them using any of the variations illustrated as my keyword.
I also looked at using SOUNDEX but even that doesn't work.
Please Note
Kindly ignore the fact that I'm not using parameterized queries in here. I'm aware of what it is and how to use it but this was an old project I created.
A method using like is:
WHERE description LIKE '%post%doctorate%'
But that is much more general than you want. So, use regular expressions:
WHERE REGEXP_LIKE(description, 'post[- ]?doctorate')
Or, if you want to allow any character to appear at most once:
WHERE REGEXP_LIKE(description, 'post(.)?doctorate')
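The character-class pattern can be sanity-checked outside the database too; a quick sketch in Python confirms it matches all three variants from the question:

```python
import re

descriptions = [
    "... post-doctorate ...",
    "... postdoctorate ...",
    "... post doctorate ...",
]
# Same idea as the REGEXP_LIKE pattern: an optional hyphen or space
# between the two words.
pattern = re.compile(r"post[- ]?doctorate")
matches = [d for d in descriptions if pattern.search(d)]
print(len(matches))  # 3
```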

Read sql query via xml file

I have a financial application that has a large set of rules to check. The files are stored in SQL Server, and this is a web application using C#. Each file must be checked against these rules, and there are hundreds of rules to consider. These rules change every few weeks to months. My thought was to store the rules in an xml file and have my code-behind read the xml and dynamically generate the SQL queries against the file. For testing purposes we are hard-coding these rules, but we would like to move to an architecture that better accommodates rule changes. I think xml is a good way to go here, but I'd appreciate advice from those who have gone down similar roads before.
The complexity of each rule check is small; they are generally just simple statements such as: "If A && B && (C || D), then write output string to log file".
My thought would be to code up the query in xml (A && B && (C || D)) and attach a string to that node in the xml. If the query is successful the string is written, if the query is not successful no string is written.
Thoughts?
In response to a comment, here is a more specific example:
The database has an entity called 'assets'. There are a number of asset types supported, such as checking, savings, 401k, IRA, etc. An example of a rule we want to check would be: "If the file has a 401k, append warning text to the report." That example is for a really simple case.
We also get into more complex and dynamic cases where, for a short period of time, a rule may be applied to deny files with clients in specific states with specific property types. A classic example is to not allow condominiums in Florida. This rule may exist for a while, then be removed.
The pool of rules are constantly changing based on the discretion of large lending banks. We need to be able to make these rule changes outside of the source code for the site. Thus the idea of using xml and have the C# parse the xml and apply the rules dynamically was my idea. Does this help clarify the application and its needs?
Could you just have a table with SQL in it? You could then formalise it a bit by having the SQL return a particular structure.
So your table of checks might be:
id | checkGroup    | checkName      | sql
1  | '401k checks' | '401k present' | select '401k present', count(*), 'remove 401k' from assets where x like '401k%'
You could insist that the SQL in the sql column returns something of the format:
ruleName       | count | comment
'401k present' | 85    | 'remove 401k'
You could have different types of rules. When I have done something similar to this, I have not returned totals; instead I have returned something more like:
table    | id | ruleBroken     | comment
'assets' | 1  | '401k present' | 'remove 401k'
This would use a query more like:
select
    'assets'
    ,id
    ,'401k present'
    ,'remove 401k'
from
    assets
where
    x like '401k%'
This makes it easier to generate interactive reports where the aggregate functions are done by the report (e.g. SSRS), allowing drill-down to problem records.
The queries that validate the rules can either be run within a stored procedure that selects the queries out and uses EXEC to execute them, or they can be run from your application code one by one.
Some of the columns (e.g. rule name) can be populated by the calling stored procedure or code.
The comment and rule name in this example are basically the same, but it can be handy to keep the comment separate and put a CASE statement in there. For example, when failing validation rules on fields that should not be blank if you have a 401k, a CASE statement can report which fields are missing in the comment.
If you want end users or non-devs to create the rules, you could look at generating the WHERE clause in code: let the user select a table and rule name and build the WHERE clause through some interface, then save it to your rule table and you are good to go.
If all of your rules return one set format, you can have one report template for all rules; equally, if you have exactly three types of rule, you could have three return formats and three report templates. Basically, I like formalising the result structure because it allows much more reuse elsewhere.
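Here is a minimal sketch of this rules-in-a-table approach, using SQLite through Python rather than the answer's T-SQL/SSRS setup; the table and column names are illustrative. Each rule's SQL returns the (table, id, ruleBroken, comment) shape described above, and the runner just executes every stored rule:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assets (id INTEGER PRIMARY KEY, x TEXT);
INSERT INTO assets VALUES (1, '401k retirement'), (2, 'checking');

-- One row per rule; the sql column holds a query returning
-- (table, id, ruleBroken, comment) for every violating record.
CREATE TABLE checks (id INTEGER PRIMARY KEY, checkGroup TEXT, checkName TEXT, sql TEXT);
INSERT INTO checks VALUES (1, '401k checks', '401k present',
  'SELECT ''assets'', id, ''401k present'', ''remove 401k''
   FROM assets WHERE x LIKE ''401k%''');
""")

# Run every stored rule and collect the violations it reports.
violations = []
for (rule_sql,) in conn.execute("SELECT sql FROM checks ORDER BY id"):
    violations.extend(conn.execute(rule_sql).fetchall())
print(violations)  # [('assets', 1, '401k present', 'remove 401k')]
```

Adding, changing, or retiring a rule is then just a row edit in the checks table, with no application redeploy.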

SQL - Calculating columns using dynamic functions

I'm trying to create a set of data that I'm going to write out to a file; it's essentially a report composed of various fields from a number of different tables. Some columns need processing done on them, some can just be selected.
Different users will likely want different processing performed on certain columns, and in the future I'll probably need to add additional functions for computed columns.
I'm considering the cleanest/most flexible approach to storing and using all the different functions I'm likely to need for these computed columns. I've got two ideas in my head, but I'm hoping there might be a much more obvious solution I'm missing.
For a simple, slightly odd example, a Staff table:
Employee | DOB        | VacationDays
Frank    | 01/01/1970 | 25
Mike     | 03/03/1975 | 24
Dave     | 05/02/1980 | 30
I'm thinking I'd either end up with a query like
SELECT NameFunction(Employee, optionID),
       DOBFunction(DOB, optionID),
       VacationFunction(VacationDays, optionID)
FROM Staff
With user defined functions, where the optionID would be used in a case statement inside the functions to decide what processing to perform.
Or I'd want to make the way the data is returned customisable using a lookup table of other functions:
ID | Name                  | Description
1  | ShortName             | Obtains 3-letter abbreviation of employee name
2  | LongDOB               | Returns DOB in format ~ 1st January 1970
3  | TimeStampDOB          | Returns timestamp for DOB
4  | VacationSeconds       | Returns seconds of vacation time
5  | VacationBusinessHours | Returns number of business hours of vacation
Which seems neater, but I'm not sure how I'd formulate the query, presumably using dynamic SQL? Is there a sensible alternative?
The functions will be used on a few thousand rows.
The closest answer I've found was in this thread:
Call dynamic function name in SQL
I'm not a huge fan of dynamic SQL, although in this case I think it might be the best way to get the result I'm after?
Any replies appreciated,
Thanks,
Chris
I would go for the second solution. You could even use real stored proc names in your lookup table.
create proc ShortName (
    @param varchar(50)
) as
begin
    select 'ShortName: ' + @param
end
go

declare @proc sysname = 'ShortName'
exec @proc 'David'
In the end, you should go with whichever is faster, so you should try both ways (and any other way someone might come up with) and decide after that.
I like the first option better, as long as your functions don't have extra selects against a table. You may not even need the user-defined functions if they are not going to be reused in a different report.
I prefer to use dynamic SQL only to improve a query's performance, such as adding dynamic ordering or adding/removing complex WHERE conditions.
But these are all subjective opinions, the best thing is try, compare, and decide.
Actually, this isn't a question of what's faster. It is a question of what makes the code cleaner, particularly for adding new functionality (new columns, new column formats, re-ordering them).
Don't think of your second approach as "using dynamic SQL", since that tends to have negative connotations. Instead, think of it as a data-driven approach. You want to build a table that describes the columns that users can get, and the formats. This is great! Users can then provide a list of columns, and you'll have a magical stored procedure that combines the information from the users with the information in your metadata table, and produces the desired result.
I'm a big fan of data-driven approaches, and dynamic SQL is the best SQL tool I've found so far for implementing them.
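As a sketch of that data-driven idea in application code (names are illustrative, not from the original): the lookup table becomes a mapping from option name to formatter, and the report is assembled from whichever options the user picks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (employee TEXT, dob TEXT, vacation_days INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
    ("Frank", "1970-01-01", 25),
    ("Mike",  "1975-03-03", 24),
    ("Dave",  "1980-02-05", 30),
])

# The lookup table of computed-column options, keyed by option name.
# Each formatter takes a (employee, dob, vacation_days) row.
formatters = {
    "ShortName":             lambda row: row[0][:3].upper(),
    "VacationSeconds":       lambda row: row[2] * 86400,
    "VacationBusinessHours": lambda row: row[2] * 8,
}

def report(option_names):
    # Users pick columns by name; new options are just new dictionary entries.
    rows = conn.execute("SELECT employee, dob, vacation_days FROM staff").fetchall()
    return [[formatters[name](row) for name in option_names] for row in rows]

print(report(["ShortName", "VacationSeconds"]))
# [['FRA', 2160000], ['MIK', 2073600], ['DAV', 2592000]]
```

The same shape works in pure SQL with a metadata table and dynamic SQL; the point is that adding a new computed column touches data (one dictionary or table entry), not the report query.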

Query sql on string

I have a db of users that each have a record like this.
I would like to run a query on data like
CN=aaa, OU=Domain,OU=User, OU=bbbbbb,OU=Department, OU=cccc, OU=AUTO, DC=dddddd, DC=com
and I need to group all users that have the same OU=Department.
How can I write the select with a substring to search for a department?
My idea for a solution is to create another table like this:
---------------------------------------------------
ldapstring | society | site
---------------------------------------------------
"CN=aaa, OU=Domain,OU=User, OU=bbbbbb,OU=Department, OU=cccc, OU=AUTO, DC=dddddd, DC=com" | societyName1 | societySite1
and to compare the string against these in the new table using LIKE. But how can I get the society and site when the LIKE matches?
Please help me
You could always do ColumnName LIKE '%OU=Department%'.
Regardless, I think this needs to be normalized into a better table, if possible. Multivalue columns should be avoided as much as possible.
If you aren't dealing with a database, the next best thing would be a regular expression.
Maybe you should look into MySQL regular expressions. I haven't used them myself, but just wanted to suggest it :-)
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
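If the grouping ends up in application code rather than SQL, a regular expression can pull the OU components out of the DN. A sketch in Python, using the question's example string:

```python
import re

dn = ("CN=aaa, OU=Domain,OU=User, OU=bbbbbb,OU=Department, "
      "OU=cccc, OU=AUTO, DC=dddddd, DC=com")
# Capture the value of every OU= component (up to the next comma);
# the department OU is one of them, and grouping users by that value
# replaces the substring/LIKE gymnastics.
ous = re.findall(r"OU=([^,]+)", dn)
print(ous)  # ['Domain', 'User', 'bbbbbb', 'Department', 'cccc', 'AUTO']
```

Parsing once and storing the department in its own normalized column, as the answer above suggests, then makes GROUP BY department trivial.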