What does this SELECT statement do?

I am looking at some code and came across something extremely unfamiliar to me; Google did not prove fruitful either, so I was wondering if anyone could explain what the following code does? It does not refer to any of my tables or databases, so I assume it's general code and I needn't provide my database layout. Thanks very much.
The code:
SELECT ROW_NUMBER() OVER (ORDER BY Object_ID) AS weeks
FROM SYS.OBJECTS

It will select the numbers from 1 to N, where N is the number of rows in sys.objects. It does not guarantee a sort order.
Probably this code is intended to provide all week numbers (omg!) under the assumption that there are at least 52 rows in sys.objects.
This code, however, will return more than 52 rows, and the result is not guaranteed to be ordered. I recommend you get rid of this nastiness.
Edit: As an alternative, I'd create the following table: CREATE TABLE Weeks (WeekNumber TINYINT NOT NULL PRIMARY KEY) and fill it appropriately. This will be even faster than selecting from sys.objects because this custom table is smaller and correctly sorted.
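For illustration, here is a minimal sketch of that approach in T-SQL; the recursive fill query is my own addition, not part of the original answer:
CREATE TABLE Weeks (WeekNumber TINYINT NOT NULL PRIMARY KEY);

-- Fill it once with the week numbers 1 through 52.
WITH n (WeekNumber) AS (
    SELECT 1
    UNION ALL
    SELECT WeekNumber + 1 FROM n WHERE WeekNumber < 52
)
INSERT INTO Weeks (WeekNumber)
SELECT WeekNumber FROM n;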

The developer is using records in the system catalog view sys.objects to get a list of sequential numbers, using the ROW_NUMBER() window function and aliasing the column as "weeks". The number of rows in the sys.objects view changes based on the objects defined in the database, so why someone would do this is beyond me...
If a simple list of sequential numbers is needed, there are more predictable ways to get them.
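For instance, on SQL Server 2022 and later (an assumption: this feature postdates the thread), the built-in GENERATE_SERIES does this predictably:
-- Exactly 52 numbers, guaranteed, with no dependence on system tables.
SELECT value AS weeks
FROM GENERATE_SERIES(1, 52);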

How can I create a temporary numbers table with SQL?

So I came upon a question where someone asked for a list of unused account numbers. The query I wrote for it works, but it is kind of hacky and relies on the existence of a table with more records than existing accounts:
WITH tmp AS (
    SELECT ROW_NUMBER() OVER (ORDER BY cusno) AS a
    FROM custtable
    FETCH FIRST 999999 ROWS ONLY
)
SELECT tmp.a
FROM tmp
WHERE a NOT IN (SELECT cusno FROM custtable)
This works because customer numbers are reused and there are significantly more records than unique customer numbers. But, like I said, it feels hacky and I'd like to just generate a temporary table with 1 column and x records that are numbered 1 through x. I looked at some recursive solutions, but all of it looked way more involved than the solution I wound up using. Is there an easier way that doesn't rely on existing tables?
I think the simple answer is no. To be able to make a determination of absence, the platform needs to know the expected data set. You can either generate that as a temporary table or data set at runtime - using the method you've used (or a variation thereof) - or you can create a reference table once, and compare against it each time. I'd favour the latter - a table with a single column of integers won't put much of a dent in your disk space and it doesn't make sense to compute an identical result set over and over again.
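As a sketch of that one-time reference table in DB2, reusing custtable and cusno from the question (the Numbers table name and its size are my assumptions):
CREATE TABLE Numbers (n INTEGER NOT NULL PRIMARY KEY);

-- Populate once; DB2 may emit a warning about the recursive CTE.
INSERT INTO Numbers (n)
WITH gen (n) AS (
    SELECT 1 FROM sysibm.sysdummy1
    UNION ALL
    SELECT n + 1 FROM gen WHERE n < 999999
)
SELECT n FROM gen;

-- Finding unused account numbers then becomes a simple anti-join.
SELECT n
FROM Numbers
WHERE n NOT IN (SELECT cusno FROM custtable);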
Here's a really good article from Aaron Bertrand that deals with this very issue:
https://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
(Edit: The queries in that article are T-SQL-specific, but they should be easily adaptable to DB2, and the underlying analysis is relevant regardless of platform.)
If you want to find all unused account numbers, you can do it like this:
with MaxNumber as
(
    select max(cusno) as MaxID from custtable
),
RecurseNumber (id) as
(
    values 1
    union all
    select id + 1 from RecurseNumber cross join MaxNumber
    where id < MaxID
)
select f1.* from RecurseNumber f1 exception join custtable f2 on f1.id = f2.cusno

How do I generate a (dummy) column of arbitrary length with MonetDB?

I would like to run the equivalent of PostgreSQL's
SELECT * FROM GENERATE_SERIES(1, 10000000)
I've read this:
http://blog.jooq.org/2013/11/19/how-to-create-a-range-from-1-to-10-in-sql/
But most suggestions there don't really take an arbitrary length: the shape of the query depends on the length, rather than just having a single number you replace. Also, some suggestions do not apply to MonetDB. So, what's my best course of action (if any)?
Notes:
- I'm using a version from February 2013. Answers about more recent features are also welcome, but are not exactly what I'm looking for.
- Assume the existing tables don't have enough lines; and do not assume that, say, a Cartesian product of the longest table with itself is sufficient (or alternatively, maybe that's too costly to perform).
Try with:
SELECT value
FROM sys.generate_series(initial_value, end_value, offset);
I have to report that the function is quite unstable in the Jul2015 release, as it causes the server process to crash. Hope you have better luck.
If you want to generate an arbitrary numeric value, you can use:
SELECT rand();
Forgive me; I've never worked with MonetDB before. But the documentation leads me to believe you can solve this with the ROW_NUMBER function and a pre-populated table like SYS.COLUMNS.
SELECT ROW_NUMBER() OVER () AS rownum
FROM SYS.COLUMNS;
This falls into jooq.org's category of just taking random records from a “large enough” table.
PostgreSQL's generate_series function is elegant, but non-standard. It's absent in other mainstream engines like SQL Server, Oracle, and MySQL. Your version of MonetDB doesn't have it either.
MonetDB does have the ROW_NUMBER function, a close equivalent in standard SQL. It assigns a sequential integer to rows in a result set. It will output the correct values, but it needs some rows in your database already. A chicken and egg problem!
SYS.COLUMNS is a system metadata table that contains one row for every column in your database. Most "empty" relational databases still have hundreds of system columns that appear in tables like these.
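Before relying on it, a quick sanity check (my own suggestion) is to count how many rows the view actually offers:
SELECT COUNT(*) AS available_rows
FROM SYS.COLUMNS;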
If the first query produces more rows than you need, you can push it into a subquery and filter the intermediate result.
SELECT rownum
FROM (
    SELECT ROW_NUMBER() OVER () AS rownum
    FROM SYS.COLUMNS
) AS tally
WHERE rownum >= 1 AND rownum <= 10;
But what if you need to generate more rows than you have in SYS.COLUMNS? Unfortunately, the shape of the query does depend on how many rows you want to generate.
A common workaround in the Microsoft SQL Server community would be to join SYS.COLUMNS to itself. This will produce an intermediate table containing the square of the number of rows in the table. In practice, it's probably more rows than you'll ever need.
With a self-join, the solution looks like this:
SELECT rownum
FROM (
    SELECT ROW_NUMBER() OVER () AS rownum
    FROM SYS.COLUMNS AS a, SYS.COLUMNS AS b
) AS tally
WHERE rownum >= 1 AND rownum <= 100000;
Hopefully these queries are also relevant in the MonetDB world!

SQL Select is very slow with CHARINDEX

I am using SQL Server and have a table with 2 columns:
myId varchar(80)
cchunk varchar(max)
Basically it stores large chunks of text, which is why I need varchar(max).
My problem is when I do a query like this:
select *
from tbchunks
where
(CHARINDEX('mystring1',tbchunks.cchunk)< CHARINDEX('mystring2',tbchunks.cchunk))
AND
CHARINDEX('mystring2',tbchunks.cchunk) - CHARINDEX('mystring1',tbchunks.cchunk) <=10
It takes about 3 seconds to complete; the table has only about 500,000 records, and the data returned from the above query is anywhere between 0 and 800 rows.
I have a nonclustered index on the myid column; it made SELECT COUNT(*) fast, but it didn't help with the above query.
I tried using full-text search, but it was slow. I also tried splitting the text in cchunk into smaller parts and adding an id column to connect the split chunks, but I ended up with a table of 2 million records of split text chunks (I did that so I could add an index), and the query was even slower.
EDIT:
- modified the table to include a primary key (int)
- created a fulltext catalog with "Accent Sensitive = true"
- created a fulltext index on my table on the column "cchunk"
- ran the same query above, and it ended up taking 22 seconds, which is much slower
UPDATE
Thanks everyone for suggesting full-text (@Aaron Bertrand, thanks!). I converted my query to this:
SELECT *
FROM tbchunks AS FT_TBL
INNER JOIN CONTAINSTABLE(tbchunks, cchunk, '(mystring1 NEAR mystring2)') AS KEY_TBL
    ON FT_TBL.cID = KEY_TBL.[KEY]
By the way, cID is the primary key I added later.
Anyway, I am getting broad results, and I notice that the higher the returned RANK column, the better the results. My question is: when does RANK start to become accurate?
An index isn't going to help with CHARINDEX at all. An index on a particular column can only quickly find rows where the value in the indexed column matches a searched value exactly. I'm actually quite surprised that query only takes 3 seconds, given that it has to read every single row four times (or at the very least, twice).
Well, as good as the ideas presented here were, nobody managed to really solve my problem, but the helpful tips led me to the solution, which I would love to share.
Using full-text search really was the answer, like many mentioned, but I managed to use CONTAINS in combination with NEAR so it can totally replace my current SQL query and provide awesome speed:
SELECT * FROM tbchunks WHERE CONTAINS(cchunk, 'NEAR((mystring1, mystring2), 3, TRUE)')

Fast Way To Estimate Rows By Criteria

I have seen a few posts detailing fast ways to "estimate" the number of rows in a given SQL table without using COUNT(*). However, none of them seem to really solve the problem if you need to estimate the number of rows which satisfy a given criteria. I am trying to find a way of estimating the number of rows which satisfy given criteria, but the information for these criteria is scattered around two or three tables. Of course a SELECT COUNT(*) with the NOLOCK hint and a few joins will do, and I can afford under- or over-estimating the total records. The problem is that this kind of query will be running every 5-10 minutes or so, and since I don't need the actual number, only an estimate, I would like to trade off accuracy for speed.
The solution, if any, may be "SQL Server"-specific. In fact, it must be compatible with SQL Server 2005. Any hints?
There is no easy way to do this. You can get an estimate for the total number of rows in a table, e.g. from system catalog views.
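For instance, on SQL Server 2005 you can read that table-level estimate from the catalog without scanning the table (the table name is a placeholder):
-- Rows in the heap (index_id 0) or clustered index (index_id 1).
SELECT SUM(p.rows) AS estimated_rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.YourTable')
  AND p.index_id IN (0, 1);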
But there's no way to do this for a given set of criteria in a WHERE clause - either you would have to keep counts for each set of criteria and values, or you'd have to use black magic to find that out. The only place where SQL Server keeps something that goes in that direction is the statistics it maintains on indices. Those contain information about which values occur how frequently in an index - but I quite honestly have no idea if (and how) you could leverage that information in your own queries.
If you really must know the number of rows matching a certain criteria, you need to do a count of some sort - either a SELECT COUNT(*) FROM dbo.YourTable WHERE (yourcriteria) or something else.
Something else could be something like this:
- wrap your SELECT statement in a CTE (Common Table Expression)
- define a ROW_NUMBER() in that CTE, ordering your data by some column (or set of columns)
- add a second ROW_NUMBER() to that CTE that orders your data by the same column (or columns), but in the opposite direction (DESC vs. ASC)
Something like this:
;WITH YourDataCTE AS
(
    SELECT
        (list of columns you need),
        ROW_NUMBER() OVER (ORDER BY <your column>) AS 'RowNum',
        ROW_NUMBER() OVER (ORDER BY <your column> DESC) AS 'RowNum2'
    FROM dbo.YourTable
    WHERE <your conditions here>
)
SELECT *
FROM YourDataCTE
Doing this, you get the following effect:
- the first row in your result set contains your usual data columns
- its first ROW_NUMBER() (RowNum) will contain the value 1
- its second ROW_NUMBER() (RowNum2) will contain the total number of rows that match the criteria
It's surprisingly good at dealing with small to mid-size result sets. I haven't tried yet how it holds up with really large result sets, but it might be something to investigate and see if it works.
Possible solutions:
If the number of matching rows is small in comparison to the total number of rows in the table, then adding indexes that cover the WHERE condition will help, and the query will be very fast.
If the number of matching rows is close to the total number of rows in the table, indexes will not help much. You could instead implement a trigger that maintains a 'conditional count table': whenever a row matching the condition is added, you increment the value in that table, and when such a row is deleted, you decrement it. Your periodic job then queries this small summary count table, as sketched below.
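A rough sketch of that trigger idea (all object names are hypothetical, the <your conditions here> placeholder follows the earlier example, and UPDATEs that change whether a row matches would need extra handling):
CREATE TABLE ConditionalCount (cnt INT NOT NULL);

CREATE TRIGGER trg_YourTable_Count
ON dbo.YourTable
AFTER INSERT, DELETE
AS
BEGIN
    UPDATE ConditionalCount
    SET cnt = cnt
            + (SELECT COUNT(*) FROM inserted WHERE <your conditions here>)
            - (SELECT COUNT(*) FROM deleted WHERE <your conditions here>);
END;

-- The periodic job then reads one tiny row instead of counting.
SELECT cnt FROM ConditionalCount;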

MySQL - Selecting data from multiple tables all with same structure but different data

Ok, here is my dilemma: I have a database set up with about 5 tables, all with the exact same data structure. The data is separated in this manner for localization purposes and to split up a total of about 4.5 million records.
A majority of the time only one table is needed and all is well. However, sometimes data is needed from 2 or more of the tables and it needs to be sorted by a user defined column. This is where I am having problems.
data columns:
id, band_name, song_name, album_name, genre
MySQL statement:
SELECT * from us_music, de_music where `genre` = 'punk'
MySQL spits out this error:
#1052 - Column 'genre' in where clause is ambiguous
Obviously, I am doing this wrong. Anyone care to shed some light on this for me?
I think you're looking for the UNION clause, a la
(SELECT * from us_music where `genre` = 'punk')
UNION
(SELECT * from de_music where `genre` = 'punk')
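Since the question also needs sorting by a user-defined column, an ORDER BY can be applied to the combined result; band_name is just one of the columns from the question, and UNION ALL skips the duplicate check when the tables can't overlap:
(SELECT * from us_music where `genre` = 'punk')
UNION ALL
(SELECT * from de_music where `genre` = 'punk')
ORDER BY band_name;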
It sounds like you'd be happier with a single table. The fact that the five tables have the same schema, and sometimes need to be presented as if they came from one table, points to putting it all in one table.
Add a new column which can be used to distinguish among the five languages (I'm assuming it's language that is different among the tables since you said it was for localization). Don't worry about having 4.5 million records. Any real database can handle that size no problem. Add the correct indexes, and you'll have no trouble dealing with them as a single table.
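A sketch of what that single table might look like; the locale column and index are my assumptions, while the data columns come from the question:
CREATE TABLE music (
    id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    locale     CHAR(2) NOT NULL,   -- e.g. 'us', 'de'
    band_name  VARCHAR(255),
    song_name  VARCHAR(255),
    album_name VARCHAR(255),
    genre      VARCHAR(64),
    KEY idx_genre_locale (genre, locale)
);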
Any of the above answers are valid, or an alternative way is to qualify the column name with the table name - eg:
SELECT * from us_music, de_music where `us_music`.`genre` = 'punk' AND `de_music`.`genre` = 'punk'
The column is ambiguous because it appears in both tables, so you need to specify the WHERE (or sort) field fully, such as us_music.genre or de_music.genre. However, you'd usually only specify two tables like that if you were going to join them together in some fashion. The structure you're dealing with is occasionally referred to as a partitioned table, although it's usually done to separate the dataset into distinct files as well, rather than to just split the dataset arbitrarily. If you're in charge of the database structure and there's no good reason to partition the data, then I'd build one big table with an extra "origin" field containing a country code, but you're probably doing it for a legitimate performance reason.
Either use a UNION to combine the tables you're interested in (http://dev.mysql.com/doc/refman/5.0/en/union.html) or use the MERGE storage engine (http://dev.mysql.com/doc/refman/5.1/en/merge-storage-engine.html).
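For the MERGE route, a sketch along the lines of that manual page (it requires the underlying tables to be identical MyISAM tables; the column types here are guesses):
CREATE TABLE all_music (
    id         INT NOT NULL,
    band_name  VARCHAR(255),
    song_name  VARCHAR(255),
    album_name VARCHAR(255),
    genre      VARCHAR(64),
    KEY (genre)
) ENGINE=MERGE UNION=(us_music, de_music) INSERT_METHOD=LAST;

-- Queries against all_music now span both tables transparently.
SELECT * FROM all_music WHERE genre = 'punk';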
Your original attempt to span both tables creates an implicit JOIN. This is frowned upon by most experienced SQL programmers because it separates the tables being combined from the condition of how they are combined.
The UNION is a good solution for the tables as they are, but there should be no reason they can't be put into the one table with decent indexing. I've seen adding the correct index to a large table increase query speed by three orders of magnitude.
A UNION statement can cause long execution times on huge data sets. It is better to perform the select in two steps (see the sketch below):
- select the ids
- then select from the main table with them
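A sketch of that two-step pattern, reusing us_music from the earlier question (the temporary table name is a placeholder):
-- Step 1: collect only the matching ids into a narrow temporary table.
CREATE TEMPORARY TABLE matching_ids
SELECT id FROM us_music WHERE genre = 'punk';

-- Step 2: fetch the wide rows by joining on the small id list.
SELECT m.*
FROM us_music AS m
JOIN matching_ids AS t ON t.id = m.id;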