How to do nickname searches in Postgres [closed] - sql

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Hi guys, I have a question about nickname searches.
I have a very large database, and my Accounts entity has a firstname column. When a user searches for an account by first name, they may be using a nickname. For example, searching for Bob should also return Robert.
The way I would think to do this would be to create a table called nickname, with two columns, the nickname, and name. That way we map bob->robert.
Then, when doing the query, the where clause would look like this: "WHERE firstname IN (SELECT name FROM nickname WHERE nickname = 'bob')"
I have two problems. The first is that the query above seems very inefficient and likely to be slow over large data sets (I could be wrong here, so please tell me if so; by large data set I mean 14 million rows).
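A minimal sketch of the lookup-table idea, assuming a simplified accounts table and B-tree indexes on both lookup columns. All table, column, and index names below are illustrative assumptions, not a real schema. With indexes like these, the subquery is a small index probe rather than a scan, though running EXPLAIN ANALYZE against the real data is the only way to confirm how the planner behaves.

```sql
-- Hypothetical, simplified schema; every name here is an assumption.
CREATE TABLE accounts (
    id        bigserial PRIMARY KEY,
    firstname text NOT NULL
);

CREATE TABLE nickname (
    nickname text NOT NULL,   -- what the user may type, e.g. 'bob'
    name     text NOT NULL,   -- the formal name it maps to, e.g. 'robert'
    PRIMARY KEY (nickname, name)
);

-- Expression index so case-insensitive lookups on firstname can use an index.
CREATE INDEX accounts_firstname_lower_idx ON accounts (lower(firstname));

INSERT INTO nickname (nickname, name)
VALUES ('bob', 'robert'), ('rob', 'robert');

-- Match the typed name itself plus every formal name it maps to.
SELECT a.*
FROM accounts AS a
WHERE lower(a.firstname) IN (
    SELECT 'bob'
    UNION ALL
    SELECT n.name FROM nickname AS n WHERE n.nickname = 'bob'
);
```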
The second problem is where to get the nickname data from. This is the only thing I have found so far: https://code.google.com/p/nickname-and-diminutive-names-lookup/downloads/list
Any help would be greatly appreciated.

One option would be to use full text search instead:
http://www.postgresql.org/docs/current/static/textsearch.html
This would allow you to add custom dictionaries, among other colorful features:
http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html
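As a rough sketch of what a custom dictionary could look like here, the following uses the built-in synonym template. It assumes a synonym file named nicknames.syn has already been placed in the server's tsearch_data directory, and that an accounts table with a firstname column exists; the dictionary, configuration, and index names are made up for illustration.

```sql
-- Assumes $SHAREDIR/tsearch_data/nicknames.syn exists on the server,
-- with one "nickname formal_name" pair per line, for example:
--   bob robert
--   rob robert
CREATE TEXT SEARCH DICTIONARY nickname_syn (
    TEMPLATE = synonym,
    SYNONYMS = nicknames
);

-- Start from the "simple" configuration so names are not stemmed.
CREATE TEXT SEARCH CONFIGURATION first_names (COPY = simple);

-- Try the synonym dictionary first, then fall back to "simple".
ALTER TEXT SEARCH CONFIGURATION first_names
    ALTER MAPPING FOR asciiword, word
    WITH nickname_syn, simple;

-- GIN index on the normalized first name (hypothetical accounts table).
CREATE INDEX accounts_firstname_fts_idx
    ON accounts USING gin (to_tsvector('first_names', firstname));

-- 'bob' is rewritten to 'robert' by the synonym dictionary before matching.
SELECT *
FROM accounts
WHERE to_tsvector('first_names', firstname)
      @@ plainto_tsquery('first_names', 'bob');
```

With this mapping, a nickname that maps to several formal names would need a different approach, since a synonym entry replaces a token with a single substitute.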

I had to solve a similar problem. We had a table with name variations that were linked to an individual. This was for a database of authors.
We then created a mapping table with soundex and double metaphone entries for these names (pre-generated) then did queries against that table to find individuals.
If you're not familiar with soundex or double metaphone, they are phonetic algorithms that match misspellings and similar-sounding names. Soundex is well known for its use in indexing US Census records.
In our case, we already had the variations of every name the person published as rather than a generic list of names. However, the soundex algorithm can help with similar spellings. You'll still need to get a nickname list from somewhere, but this should help with the performance.
The reason I suggested two algorithms is that we had a lot of collisions using just one, but with both of those together it was a rather good filter. Double Metaphone worked better for non Western European names.
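A rough sketch of such a mapping table in Postgres, assuming the fuzzystrmatch extension is available (it provides soundex() and dmetaphone()). The table and column names are made up for illustration and are not the schema described above.

```sql
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Hypothetical pre-generated mapping table: one row per name variation.
CREATE TABLE name_phonetic (
    person_id   bigint NOT NULL,
    name        text   NOT NULL,
    soundex_key text   NOT NULL,
    dmeta_key   text   NOT NULL
);

-- Compute both phonetic keys once, at load time.
INSERT INTO name_phonetic (person_id, name, soundex_key, dmeta_key)
SELECT v.person_id, v.name, soundex(v.name), dmetaphone(v.name)
FROM (VALUES (1, 'Robert'), (2, 'Rupert'), (3, 'Roberta')) AS v(person_id, name);

CREATE INDEX name_phonetic_keys_idx
    ON name_phonetic (soundex_key, dmeta_key);

-- Requiring both keys to match filters out many single-algorithm collisions.
SELECT person_id, name
FROM name_phonetic
WHERE soundex_key = soundex('Robbert')
  AND dmeta_key   = dmetaphone('Robbert');
```

Storing the keys instead of computing them per query keeps the search to an indexed equality lookup, which is where the performance benefit comes from.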
I'd suggest adding a front end element to let your customer service people (or customers) add their nickname too. Customers can help you build up a nickname list and you can use known nicknames to help with fuzzy searches on others eventually.

Related

Do I need to create sample tables with data always to recreate the issue in questions? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 months ago.
As a beginner in the Stack Overflow community, can someone help me with the following questions?
What are the possible ways to write queries and give answers to Stack Overflow questions?
Do I need to create databases with sample data locally from scratch to answer them?
Are there any online tools for writing queries against dummy data that help to answer questions quickly?
I'd really appreciate your answers on this.
I believe you are looking for something like this
What are the possible ways to create code samples and give answers to stackoverflow questions?
https://dbfiddle.uk/
http://sqlfiddle.com/
https://www.db-fiddle.com/
Here you can create objects, insert data, and write code to simulate "the answer" to a question.
Do I need to create local projects from scratch to answer them?
I am not sure what you are asking here, but no.
Is there any online tools that can create code samples with dummy data (in case of a SQL question) help to answer the questions quickly?
https://generatedata.com/
https://www.onlinedatagenerator.com/
This question, and any possible answers, are likely to be partly opinion-based (which is something that should be avoided on Stack Overflow).
But some points should be generally correct and not opinion-based. Whenever you ask a SQL question, take care of the following:
Describe your issue as precisely and clearly as possible, but avoid adding unnecessary or confusing information.
Tag the DB you are using.
Always show the sample input and the expected outcome.
Show what you tried in order to solve your issue (in the best case, you already have a query that is almost correct) and explain why it's not working yet.
If possible, provide a DB fiddle to allow people to replicate your issue.
There will be further points (maybe someone will provide another answer or comment on this one), so overall you can just ask yourself "Have I done everything I can so that people can understand my issue and provide a solution?" and make sure the answer is "Yes!".
Let's make a very simple example:
Your sample input is:
Table yourtable
column1  column2
1        2
2        2
3        4
0        10
And you want to find this outcome:
column1  column2
2        2
0        10
The rows you want to receive should have identical values in column1 and column2, OR have column1 = 0.
Now you also tell us that you tried this query:
SELECT column1, column2
FROM yourtable
WHERE column1 = column2
AND column1 = 0;
But this query doesn't find any rows. Remember this is a really simple example and of course, you would have found the reason. But since we just want to illustrate the way of asking questions, now you have already shown us sample input, expected outcome and your failing idea how to get this result.
Now, you could also add a fiddle example with all this information.
db<>fiddle
Well, perfect! Now people can easily understand and replicate your issue and just need to think about a solution and check this solution using the fiddle you've shown. And then you will likely get a correct answer in a short time and don't have to ask plenty of questions because your question was too unclear.
So in this simple case, someone would tell you that you need OR instead of AND.
SELECT column1, column2
FROM yourtable
WHERE column1 = column2
OR column1 = 0;
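For completeness, here is a sketch of the setup script you might paste into a fiddle for this example. The int column types are an assumption, since the sample above only shows the values.

```sql
-- Sample table; the int column types are assumed.
CREATE TABLE yourtable (
    column1 int,
    column2 int
);

INSERT INTO yourtable (column1, column2)
VALUES (1, 2), (2, 2), (3, 4), (0, 10);

-- The failing attempt: no row has column1 = column2 AND column1 = 0,
-- so this returns no rows.
SELECT column1, column2
FROM yourtable
WHERE column1 = column2
AND column1 = 0;

-- The corrected query returns the rows (2, 2) and (0, 10).
SELECT column1, column2
FROM yourtable
WHERE column1 = column2
OR column1 = 0;
```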
After googling for an answer, I could find one site to write queries with sample data in it (sample data already uploaded there). It is very helpful to quickly find answers. So, I am gonna post it here.
https://data.stackexchange.com/stackoverflow/query/edit/1604254

SQL naming convention: Adding data type to column name [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
As a developer, a common mistake that I keep on repeating is assuming the data type of a column. I have read multiple articles regarding SQL column naming convention but have not seen any reference regarding data type as part of the column name - specifically for SQL Server.
E.g. Revenue_f for float, Organization_v for varchar, AccountNumber_i for integer and so on.
This must have been thought of before, but I want to know the reason why it is not being used, or get an expert's input on the matter; pointing me to the right article or documentation will be greatly appreciated.
That is a horrible naming convention. Consider how awful it would be if you need to change AccountNumber to a character datatype. Do you then go back and rename the column and change every single query everywhere? Or do you leave the suffix in your column name even though it is no longer accurate? If you want to know the datatype of a column the ONLY way is to look at the definition of the table.
Also, a single character really is kind of useless. How do you handle nvarchar vs varchar? And what about the scale?
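As a sketch of what "looking at the definition" can mean in SQL Server, the standard INFORMATION_SCHEMA.COLUMNS view reports the type, length, precision, and scale of each column. The table name Accounts below is a hypothetical stand-in.

```sql
-- 'Accounts' is a hypothetical table name used for illustration.
SELECT COLUMN_NAME,
       DATA_TYPE,                  -- distinguishes varchar from nvarchar
       CHARACTER_MAXIMUM_LENGTH,   -- length for character types
       NUMERIC_PRECISION,
       NUMERIC_SCALE               -- scale for decimal/numeric types
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Accounts'
ORDER BY ORDINAL_POSITION;
```

This carries details that a one-letter suffix cannot, and it stays accurate when a column's type changes.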
P.S. Even though I wrote an answer I am voting to close this question because it is primarily opinion based and as such is considered off topic for SO.

Why does SQL want you to select the column and THEN refer to the table? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
So this isn't a technical question, but rather questioning why a language is designed the way it is.
I've been learning SQL and one thing that's been bothering me greatly is how SQL asks you to name the column you want and THEN name the table you want to get it from. To me, it would make more sense to refer to the parent body (which is the table) and THEN the column it has. But SQL seems to force users to do it the other way around. Why?
I'm just curious as to why the language is designed this way.
SELECT column
FROM table
why not
FROM table
SELECT column
SQL tries to mimic English language to some extent, so that it feels natural to formulate the query.
In spoken English you would say something like "I want the names of the employees". You would not say "I want of the employees their names" or something like that.
But you are right, it might have been a good idea to have the query represent the order of execution. And "From the employee table I want the names" would not be so far off the mark :-)
SQL is a declarative (descriptive) language, not a procedural language. It describes the result set being produced. And, you can think of that result set as a report, with column headers.
As such, the basic querying construct returns those column headers. The rest of the query describes how they are produced.
You may find this post useful. Starting with FROM is the most logical way to think about a query (Why would anyone write SELECT before knowing what to SELECT from?). However, SQL guidelines were designed as if your query were a command. Thus, you are commanding the system to SELECT the data for you, and the FROM further specifies that command.
Of course, the actual execution is distinct from the lexical and logical orders above.
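A small illustration of the gap between written order and logical order, using a hypothetical employees table. In PostgreSQL and SQL Server, for example, WHERE is evaluated before the SELECT list, so a column alias defined in SELECT is not visible there, while ORDER BY is evaluated afterwards and can use it.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE employees (
    name   varchar(100),
    salary decimal(10, 2)
);

-- Rejected by PostgreSQL and SQL Server: the alias does not exist yet
-- when WHERE is evaluated.
-- SELECT name, salary * 12 AS annual_salary
-- FROM employees
-- WHERE annual_salary > 50000;

-- Accepted: repeat the expression in WHERE; ORDER BY may use the alias
-- because it is logically evaluated after SELECT.
SELECT name, salary * 12 AS annual_salary
FROM employees
WHERE salary * 12 > 50000
ORDER BY annual_salary DESC;
```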

Database table columns names [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I have noticed that some people name database columns something like h_id, e_user, f_address, etc., instead of, for example, id, user, address.
Is that some kind of security measure? Or are these abbreviations of words?
It's because there might be many id fields, like user_id and category_id, so they use prefixes to keep the code understandable.
As for column names like f_address, they are just a shortcut for, say, first address. It doesn't have anything to do with security. To increase query readability, give fields proper names so that people can understand what data a column stores just by seeing its name.
If there are fields like category_id and sub_category_id, the meaning is clear from the field name, but if I denote them as c_id and s_id, they are hard to decipher.
Well, 'User' is a security object in SQL Server, so using that is kind of scary. 'ID' and 'address' are way too generic to provide any semantics when used as attribute names.
If a purpose of the design is to be maintainable and readable, then there are some words that simply don't work.
Definitely not security related.
Some use it for readability or speed when writing queries: with short table aliases, you have to remember which alias you gave to which table. For example:
select a.name, b.name from table1 a join table2 b on a.id=b.id
Here you have to remember that table1 is aliased as a, table2 as b, and so on.
If you instead name columns tablename_field (which you can shorten to the first letter of the table name), you never have duplicate field names when writing join queries.
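To make the trade-off concrete, here is a small sketch with two hypothetical tables whose columns carry a one-letter table prefix. All names are illustrative.

```sql
-- Hypothetical tables with prefixed column names.
CREATE TABLE users (
    u_id   int PRIMARY KEY,
    u_name varchar(100)
);

CREATE TABLE emails (
    e_id      int PRIMARY KEY,
    e_user_id int,
    e_address varchar(255)
);

-- Every column name is unique across both tables, so the join needs
-- no table aliases and no qualification.
SELECT u_name, e_address
FROM users
JOIN emails ON e_user_id = u_id;
```

The same result is available with unprefixed columns and aliases (for example u.name and e.address), so this is a matter of convention rather than necessity.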

Contains() function falters with strings of numbers?

For some background information, I'm creating an application that searches against a couple of indexed tables to retrieve some records. It isn't overtly complex to the point of say Google, but it's good enough for the purpose it serves, barring this strange issue.
I'm using the Contains() function, and it's going very well, except when the search contains strings of numbers. Now, I'm only passing in a string; no numerical datatypes are being passed in anywhere, only characters. We're searching against a collection of emails, each appended with a custom ID when shot off from a workflow. So while testing, we decided to search via number strings.
In our test, we isolated a number, 0042600006, which belongs to one and only one email subject. However, when using our query we are getting results for 0042600001, 0042600002, etc. The query is as follows (with some generic columns standing in):
SELECT description, subject FROM tableA WHERE CONTAINS((subject), '0042600006')
We've tried every possible combination: '0042600006*', '"0042600006"' and '"0042600006*"'.
I think it's just a limitation of the function, but I thought this would probably be the best place for answers. Thanks in advance.
Asked this same question recently. Please see the insightful answer someone left me here
Essentially what this user says to do is to turn off the noise words (Microsoft has included integers 0-9 as noise in the Full Text Search). Hope you can use this awesome tool with integers as I now am!
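A sketch of what turning off the noise words can look like on SQL Server 2008 or later, where noise words are managed as stoplists. It assumes the full-text index already exists on tableA, and that changing the stoplist may trigger a repopulation of the index.

```sql
-- Inspect the system stopwords for English (LCID 1033).
SELECT stopword
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033;

-- Disable the stoplist for the existing full-text index on tableA
-- (assumed to exist), so that no tokens are discarded as noise.
ALTER FULLTEXT INDEX ON tableA SET STOPLIST = OFF;

-- Then search for the exact term.
SELECT description, subject
FROM tableA
WHERE CONTAINS(subject, '"0042600006"');
```

Disabling the stoplist affects every query against that index, so common words will be indexed too.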
Try adding language 1033 as an additional parameter. That worked for my solution.
SELECT description, subject FROM tableA WHERE CONTAINS((subject), '0042600006', language 1033)
Try using LIKE with % wildcards instead (% is a LIKE wildcard; CONTAINS does not treat it as one):
SELECT description, subject FROM tableA WHERE subject LIKE '%0042600006%'