New table or using commas? - sql

I want to learn your suggestions about that:
Using commas for product's categories
OR Creating a 3rd table.
Which one is more useful?
Thanks alot for your suggestions.

Creating a 3rd table is definitely the better option.
Using comma-delimited values is only going to cause you problems later down the line. It will also make writing SQL queries a lot more complex than they need, and the performance will be costed for this.

Related

How to create random SQLqueries based on database?

For testing purposes I require a large amount of queries.
Creating this manually is not an option, so I am searching a tool which will do this automatically.
Sadly, the only solution I found (sqlsmith), is limited to postgres and SQLite.
Are there any similar tools for SQL-Server?
"I do not know from what random place people will want to travel to a random other place, so instead, let's create roads for every possible combination of origin and destination".
That sounds kind of insane, doesn't it? The same applies to what you seem to be wanting to achieve. You basically are hoping to find a tool that generates random queries against your database so you can feed them to the tuning advisor, which will then suggest query optimization indexes for hypothetical queries.
If you want to performance tune your database, you should have a pretty good idea of the type of questions your users will be throwing at it, as well as the structure of your data. Typical questions that will help you get started would be things like:
What is the most common search my users would do against this table?
What criteria are they most likely to use?
Which columns are guaranteed or likely to contain unique data in every row?
Which columns will most likely have a low selectivity of data? (I.e. Male/Female)
are you looking for generate random data for multiple tables ? we generally use redgate data genearator tool for the same.
for SQL tuning purpose I would suggest
https://www.brentozar.com/blitzindex/
http://www.nguyenlamminhdieu.com/zone/213/news/vi-VN/zone/213/news/351-database-engine-tuning-advisor-in-sql-server.aspx

Does Table-Valued Function (SQL) create table on each call? [performance]

Okay this might sound a noob question, but SQL isn't my really strength, so I am requesting some help here.
I am trying to implement something, but I am concerned about performance issues.
The problem I am trying to fix is something like this:
I have a column with a lot of data separated by commas ","
Something like this: data1,data2,data3,data57
What I need is looping through each piece of data separated by commas for all the records, and then do something with that single piece data, do you get it?
I found a solution that can actually help me, but I am worried about system performance, because I might need to make multiple calls to this function using different parameters!
Does a table is created on each call I made to the Table-Valued Function (UDF) or does the sql server saves it as cache? [maybe I would rather need a temporary table?]
Thank you for your help in advance!
Note: The data is not mine, and I should use it as is, so suggesting to change the database is out of question (however I know that would be the best scenario).
a
Note2: The purpose of this question/problem is to import initial data to the database, performance may not be a serious problem since it won't run many times, but still I wanna regard that issue, and do it the best way I can!
User defined, table-valued functions that are composed of multiple statements, as the one you found is, will create an object in the tempdb system database, populate it and then dispose of it when the object goes out of scope.
If you want to run this multiple times over the same parameters, you might consider creating a table variable and caching the result in that yourself. If you're going to be calling it on different lists on comma-separated values though, there's not a great way of avoiding the overhead. SQL Server isn't really built for lots of string manipulation.
Generally, for one-off jobs, the performance implications of this tempdb usage is not going to be a major concern for you. It's more concerning when it's a common pattern in the day-to-day of the database life.
I'd suggest trying, if you can, on a suitably sized subset of the data to gauge the performance of your solution.
Since you say you're on SQL Server 2016, you can make use of the new STRING_SPLIT function, something like
SELECT t.Column1, t.Column2, s.value
FROM table t
CROSS APPLY STRING_SPLIT(t.CsvColumn, ',') s
May get you close to where you want, without the need to define a new function. Note, your database needs to be running under the 2016 compatibility level (130) for this to be available, simply running on SQL 2016 isn't enough (they often do this with new features to avoid the risk of backwards-compatibility-breaking changes).

Managing very large SQL queries

I'm looking for some ideas managing very large SQL queries in Oracle.
My employer is looking to build very wide reports ( 150 - 200 ) columns of data per report.
Each item is a sub-query or an element from a view. The data has to be real time, so DW style batch processing is not an option. We also don't use any BI tools , just a java app that generates Excel ( its a requirement to output data in Excel)
The query also contains unions as feeds from other systems.
The queries result in very large SQL ( about 1500 lines) that is very difficult to manage.
What strategies can I employ to make the work more manageable?
It is also not a performance problem. I was able to optimize the query to be very efficient , its mostly width of the query , managing 200 columns is a challenge in itself.
I deal with queries this length daily and here is some of what helps me out in manitaining them:
First alias every single one of the those columns. When you are building it you may know where each one came from but when it is time to make a change, it is really helpful to know exactly where each column came from. This applies to join conditions, group by and where conditions as well as the select columns.
Organize in easily understandable and testable chunks. I use temp tables to pull things that make sense together and so I can see the results before the final query while in test mode.
This brings me to test mode. If I have chunks of data, I design the proc with a test mode and then query individual temp tables when in test mode, so I can see where the data went wrong if there is a bug. Not sure how Oracle works but in SQL Server, I make this the last parameter and give it a default value, so that it doesn't need to be passed in by the application.
Consider logging the execution details and the values of passed in parameters and certainly log any error messages. This will help tremendously when you have to troubleshoot why this report that has functioned perfectly for six years doesn't work for this one user.
Put columns on a separate line for each one and do the same for where clauses. At times you may have to troublshoot by commenting out joins until you find the one that is causing the problem. It is easier if you can easily comment out the associated fields as well.
If you don't have a technical design document, then at least use comments to explain your thought process. You want to understand the whys not the hows in any comments. This stuff is hard to come back to later and understand even when you wrote it. Give your future self some help.
In developing from scratch, I put the select list in and then comment all but the first item. Then I build the query only until I get that value - testing until I am sure what I got was correct. Then I add the next one and whatever joins or where conditions I might need to get it. Test again making sure it is right. (Oops why did that go from 1000 records to 20000 when I added that? Hmm maybe there is something I need to handle there or is that right?) By adding only one thing at a time, you will find an error in the logic much faster and be much more confident of your results. It will also take you less time than trying to build a massive query in one go.
Finally, there is no substitute for understanding your data. There are plently of complex queries that work but do not give the correct answer. Know if you need an inner join or a left join. Know what where conditions you need to get the records you want. Know how to handle the records when you have a one-to-many relationship (this may require push back on the requirements); should you have 3 lines (one for each child record), or should you put that data in a comma delimited list or should you pick only one of the many records and have one line using aggregation. If the latter, what is the criteria for choosing the record you want to keep?
Without seeing the specifics of your problem, here are a couple of ideas that immediately come to mind:
If you are looking purely for management, I might suggest organizing your subqueries as a number of views and then referencing those views in your final query.
For performance on the other hand you may want to consider creating temp tables or even materialized views (which are fixed views) to break up the heavier parts of your process.
If your queries require an enormous amount of subquerying in order to gain usable data, you might need to rethink your database design and possibly create a number of datamarts to easily access reporting data. Think of these as mini-warehouses sans the multi-year trended data.
Finally, I know you said you don't use any BI tools but this problem certainly seems like one that might make sense by organizing your data into "cubes" or Business Object "universes". It might be worthwhile to at least entertain the cost of bringing on a BI tool vs. the programming hours to support the current setup.

Prevent use of pre ANSI-92 old syntax

I wonder if there's a way to prevent the creation of objects that contain old ansi sintax of join, maybe server triggers, can anyone help me?
You can create a DDL trigger and mine the eventdata() XML for the content of the proc. If you can detect the old syntax using some fancy string-parsing functions (maybe looking for commas between known table names or looking for *= or =*), then you can roll back the creation of the proc or function.
First reaction - code reviews and a decent QA process!
I've had some success looking at sys.syscomments.text. A simple where text like '%*=%' should do. Be aware that long SQL strings may be split across multiple rows. I realise this won't prevent objects getting in there in the first place. But then DDL triggers won't tell you how big your current problem is.
Although I fully understand your effort, I believe that this type of actions is the wrong way of getting where you want. First of all, you might get into serious trouble with your boss and, depending of where you work, get fired.
Second, as stated before, doing code reviews, explaining why the old syntax sucks. You have to have a decent reason why one should avoid the *= stuff. 'Because you don't like it' is not a feasible argument. In fact, there are quite some articles around showing that certain problems are just not solvable using this type of syntax.
Third, you might want to point out that separating conditions into grouping (JOIN ... ON...) and filtering conditions (WHERE...) increases the readability and might therefore be an options.
Collect your arguments and convince your colleagues rather than punishing them in quite an arrogant way.

How do I organize my queries in Oracle?

I have a base query made (Thanks Justin Cave!)
Now I have to use that query and join it to different tables and sub queries many times over to do checks against our data. Additional queries are likely to be added in the future. So in the end there will be maybe two dozen checks for the data and the findings will be summarized in an SSRS report. If this were in MSSQL, I would put the results of the queries into a temp table and finally run a select on the temp table. Doing as much research as possible I've decided that the best way would be to use the WITH clause and joining with the other temp tables and queries to get results, then Unionize all of the queries together to get my result. However this seems like it is going to be extremely messy and large. I'd use Global Temporary Tables, but they seem to be frowned upon in Oracle. Perhaps you have a better method for modularizing and organizing this?
Per our licensing agreement we are not able to add new tables in oracle (so I am told), but we are able to add view, stored procedures and functions.
Thanks in advance!
If materialized views are not forbidden to use, you can use them to get all the advantages of a temp table.
But unless you need results of a sub-query in several different queries, you can just use as many independent sub-queries per query as you like, and operate on them as if these were tables. Most of the time you'll have pretty decent query plans.
Also, in my eyes, using a global temp table to speed up analysis 10x is worth it — as long as you don't expose sensitive data to someone not trusted.
Roll them all up into various stored procedures and enclose them in Oracle packages.
Then you can have a package for each logic area of your application. E.g. PKG_USERS, PKG_ACCOUNTS, etc.
It is also easier to track changes because you can put these under version control and see all changes at a glance.
It works for me, hopefully it helps you...