Is there a way to compare the similarity between sentences in sql? I have large dataset and I need to identify instances where there are similar words in a two or more setences.
How do I tell SQL to only return the values below?
From what I have googled, there may be a way to do this using a Full-Text Search and Semantic Search, but I have been able to find an article that addresses what I am trying to achieve.
Could someone in the group, provide me example or point to an article that could help me? Better yet, is what I am trying to do even achievable in SQL.
No, there is not.
Part of the problem is that "similarity" is a complex setup and this requires a program to analyze the sentence POSSIBLY with months of programming. You give pretty simplistic examples - grats. Even that is not as easy as you think. What about "the small boy wear red t-shirt" - would small boy be a difference or not?
This requires a LOT of work, and a LOT of definition, or a LOT of training of possibly a multi layer neural network.
SQL generally is awful at string manipulation - the best you get is SOUNDEX and that just compares 4 letters of the first word (RTFM, it is actually QUITE interesting how it works, but it makes it absolutely unsuitable for anything like comparing sentences.
So, no - this is simply way outside the scope of anything in SQL, you will have to download the data and use an out of SQL approach (which is also a LOT more fit for this type of work).
You can obviously work around that with simplistic SQL such as #ASH was suggesting - but this is not looking for "similar sentences" but working around specific markers that ARE SPECIFIC FOR YOUR DATA SET. THis is overfitting and bypassing answering the question you have asked.
You can try SOUNDEX function. Google SOUNDEX and then understand if this works for your case. The query is:
SELECT *
FROM your_table
WHERE SOUNDEX(Sentence) = SOUNDEX(Sentence);
I know that continuous query is a query which is registered once and it is evaluated continuously over a data stream. But, I don't understand what does incremental query means. I am reading about continuous data streams and the way we query for a specific pattern in the stream.
Can anyone explain me - what is an incremental query? Explanation with an example will be really helpful
Although after googling a lot, I find some definitions, but none of them explains clearly.
UPDATE:
I don't find the exact paper now in which I found this term, but in this paper I can find it on page no. 6.
You might already have researched incremental algorithm, I think it is what you're looking for.
I have never heard of an 'incremental' query. However that sounds a lot like doctrine's schema update command here in symfony's doc
Food for thought until someone come up with a better answer :)
I'm on a project where I was asked to take a quick peek at some reporting SQL (in a SQL Server 2K5 environment) and was surprised at what I found: 4 to 5 levels of subquerys, distinct clauses, unions, and NoLock hints (which were needed because the SQL was running so long it was blocking standard processing) - all in the same set!.
Because I (foolishly :) mentioned that I thought the SQL was inefficient I've been labeled the "expert" and have been tasked with creating a test for a couple of interviewees to do that will assess their SQL optimizing abilities. I'm hoping someone can point me to some URLS, or maybe provide a list that I can use to help weed out the good from the bad.
Since you mentioned the SQL Server 2005 environment:
More SQL Server interview questions than you possibly could have imagined:
The classic set. Most interviewees will probably have studied these...maybe a good way to gauge who has prepared.
Another classic
Questions from one of the original Stack Overflow DBAs
Another link for best SQL questions and answers
I would give them a Query plan (via EXPLAIN, or whatever your flavor of SQL uses as a keyword) and see if they can decipher what it means, what the weak points are, and how to improve the query.
Look at MySQL's Explain Documentation for help using MySQL's explain and what it means.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Improve this question
There was this old sql server tool called Lectoneth or something like that, you'd put sql queries in it, and it would rewrite it for you.
I think quest bought them out, but I can't find where to download a free copy of that software.
Really helps when you have no dba, and have lots of sql queries to rewrite.
Thanks
Craig
Doesn't ring a bell, and presumably you've seen, but nothing obvious on Quest's website
Perhaps a tool like Red Gate's SQL Prompt would help - the Pro edition does SQL reformatting.
Edit
Think i've found what you're looking for, mentioned here - LECCO SQL Expert. The link to the Lecco website does indeed direct to quest, but a 404.
LECCO SQL Expert is the only complete
SQL performance tuning and
optimization solution offering
problematic SQL detection and
automatic SQL rewrite. With its
built-in Artificial Intelligence (AI)
based Feedback Searching Engine, LECCO
SQL Expert reduces the effort required
to optimize SQL and makes even the
most junior programmer an expert.
Developers use LECCO SQL Expert to
optimize SQL during application
development. DBAs eliminate
problematic SQL before users
experience application performance
problems by using LECCO SQL Expert in
production systems.
Looks like it's no longer about - all mentions of I could find indicated it supported up to SQL 2000, and stale links - looks like it wasn't a free tool. As said in my comments, I think this kind of thing is a skill well worth possessing and would benefit in the long run to not relay on a tool to try and do it for you.
I wasn't aware of this tool before now, so I have picked up something from this question - got me intrigued!
Final Update:
To confirm, that product has indeed gone as Lecco was acquired some years ago now. Thanks to Brent Ozar for confirmation.
I think you're looking for a product that's been merged into Toad for SQL Server. The commercial version of Toad has a SQL Optimizer feature that tries lots of ways to rewrite your SQL statements, then tests them to find which ways are the fastest.
You can download Toad here:
http://www.toadsoft.com/
But be aware that that feature is a paid-version-only feature.
Well rather than spending your time looking for a magic bullet, why not spend some time learning performance tuning (you will need a book, this is too complex for the Internet generally). Plus it is my belief that if you want ot write decent new code, you need to understand performance in databases. There is no reason to be unable to write code that avvoids the most common problems.
First, rewrite every query to use ANSII syntax anytime you open it up to revise it for any other reason. Code review all SQl changes and do not pass the code review unless explicit joins were used.
Your first step in performance tuning to identify which queries and procs are causing the trouble. You can use tools that will tell you the worst performing queries in terms of overall time, but don't forget to tune the queries that are run frequnetly as well. Cutting seconds off a query that runs thousands of times a day can really speed things up. Also since you are in oprod already, likely your users are complaining about certain areas, those areas should be looked at first.
Things to look for that cause performance problems:
Cursors
Correlated subqueries
Views that call views
Lack of proper indexing
Functions (especially scalar function that make the query run row by row insted of through a set)
Where clauses that aren't sargeable
EAV tables
Returning more data than you need (If you have anything with select * and a join, immediately fix that.)
Reusing sps that act on one record to loop throuhg a large group of records
Badly designed autogenerated complex queries from ORMs
Incorrect data types resulting in the need to be continually be converting data in order to use it.
Since you have the old style syntax it is highly likely you have a lot of accidental cross joins
Use of distinct when it can be replaced with a derived table instead
Use of union when Union all would work
Bad table design that requires difficult construction of queries that can never perform well. If you find yourself frequently joining to the same table multiple times to get the data you need, then look at the design of the tables.
Also since you have used implicit joins you need to be aware that even in SQL Server 2000 the left and right implicit syntax does not work correctly. Sometimes this interprets as a cross join instead of a left join or right join. I would make it a priority to find and fix all of these queries immediately as they may currently be returning an incorrect result set. Bad data results are even worse that slow data returns.
Good luck.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I work in a group of about 25 developers. I'm responsible for coming up with the database design (tables, views, etc) and am called apon for performance tuning when necessary.
There are a couple of different applications that connect. Database access is via JDBC, hibernate, and iBatis SQL maps. Developers with various levels of experience write SQL statements.
What guidelines would you give to developers to write good SQL?
By good I mean: correct, performs well, easy to understand and maintain.
These are just meant to be easy to follow guidelines - I want to get people onto the right track for the majority of situations. We will break these guidelines when it makes sense.
EDIT: We have in place code reviews for all source commits (SQL, java, etc) enforced through a jira workflow.
If you have 25 developers writing SQL queries against your database you are in quite a bit of trouble. Guidelines are not worth much when your junior developers are learning SQL and checking in a mess.
I would like to offer 4 recommendations
Use an ORM of sorts so your all your devs write less SQL.
Invest in training, buy books, send people to courses.
Have all the SQL reviewed by the senior SQL developers, by all, I mean every SQL statement, no exceptions. This way your senior guys will be able to teach the juniors over time.
Have a single person, who lives and breaths Oracle, responsible for the database. By responsible I mean knows every query, understands all the structure and is able to give expert advice.
Here are some additional things you may add to your existing guidelines/checklist.
Have you tested your queries on a large data set? How was performance?
Have you performed a quick index review on the tables that are being accessed? Are all the right indexes in place? Do you recommend and new indexes?
For high volume queries, are any covering indexes required?
Are you using "NOT IN" in cases where a "LEFT JOIN" should be used?
Is your work transactionally sound? Are you missing a transaction somewhere?
Here's what I already have in my guidelines.
Work in sets, not row by row
The best way to make something go quicker is to avoid doing work you do not have to do
Databases love to join
Fully qualify and specify column names (so SQL does not break when additional columns are added)
Select only the data you need (never select *, never more rows than you require, never every column just becaues it's there)
How to use rownum to limit resultsets
Bind Variables vs Literals (use bind variables in all but a few special cases related to skewed data)
Avoid functions or calculations on columns in the WHERE clause (except for a special case of function based index)
Use ORDER BY for all queries returning more than one row (this is mostly for testability)
Each of these points is expanded a bit in the actual guidelines I've written out with an example relevant to our database schema.
Read Tom Kyte's books. He explains how you can write fast code and how you can measure performance and scalability. If you have a problem you can probably find the answer on the "ask tom"-site.
Introduce basic style guide that covers:
naming (of everything - tables, columns, procedures, aliases, ...) .
formatting style
line width
what reserved words require new line (e.g where)
are reserved word capitalized or small caps
indenting
...
Here are some examples:
Oracle PL/SQL Programming, Fourth Edition. There is older, 2nd edition - available online
SQL and PL/SQL Coding Standards
Be very strict about naming, it will be easier for you to read other people's code.
As formatting is concerned, there are tools available that can format automatically, so maybe you don't need very detailed description here.
If you are a database developer, you need to know what an EXECUTION PLAN is. If you don't then go mine coal or something.
Before developing:
first, you think what best EXECUTION PLAN will be,
second you create tables and indexes, and
third you use hints to persuade the optimizer to come out with the plan you made.
You do use hints. Forget automatic optimization, it's a marketing myth. No optimizer knows your data better than you and never will.
There are no "programmers who create queries" and "system administrators who create indexes". Programmers program, system administrators make backups (or whatever they make).
Triggers are evil.
Prefix you columns, tables and views (SELECT prs_name FROM t_person)
Make lines and indent
An hour long presentation on some Oracle fundamentals (eg parsing, SGA vs PGA). "Do this" rules may or may not apply to your situation. Give them an understanding of what the DB side does, and they at least have a basis on which to make a decision.
Plus Code reviews.
Pair-program. Any advantange it provides for agile development in general, at least doubles for SQL development.
Second choice, code reviews for all SQL.
Along with the recommendation to have queries reviewed by senior programmers, if you can get the buy-in, have code reviews which involve as many team members as possible.
I'm by no means a guru but here are my tips:
Don't use ORDER BY unless you really need an ordered list as it incurs a performance hit.
Understand the explain plan and also recognise that the plan on your development environment is often different from your production environment. Don't expect it to accurately reflect real life performance
The pros of using hints is that you get to choose your explain plan, the cons of using hints is that the optimal plan may change over time and you might be choosing a plan that is suboptimal in the long term
Make sure the developers know when to use INNER JOIN, OUTER JOIN, [NOT] IN, [NOT] EXISTS - you can put in place a lot of processes but one or two Cartesian products will bring production performance to its knees
Ensure your developers understand indexes - what they are, when they should be used, when they should be avoided
Have a DBA monitor the most executed queries and the most expensive queries and highlight these as candidates for optimisation
Peer review
Coding standards (especially code comments on particularly long/complex queries)
Unit testing
Don't write SQL if you can help it, use HQL (or JPQL is on Java EE) whenever possible
Don't use SELECT *
Pick your internet sources wisely (e.g. asktom.oracle.com)
Don't use cursors
Don't do string concatenation in SQL
Write queries such that they use indexes (fundamentally this means base WHERE predicates on the indexes that exist)
use MERGE instead of other awkward 'upsert' type logic
When working with dates, make sure you understand how they're stored in Oracle vs. how they are stored in Java, especially when it relates to TimeZone. Depending on the Calendar/Date types, this information can be stripped out, remapped to the TZ of the default locale, etc.
Most importantly: Don't use the excuse of being a developer for not knowing how to write good SQL, and how the database works. You don't have to be a DBA, but you need to invest in your own training to make yourself suitable for the task. By the same token, your company needs to invest in that as well.
I don't mean to say that these "Don'ts" always apply. It's just that, if you're talking about a developer who is not comfortable with Oracle, they need to know what they're doing before they start deciding whether those types of things are necessary and appropriate.