I have a simple SSIS package that imports data from a flat file into a SQL Server table (SQL Server 2005). The file contains 70k rows and the table has no primary key. The import succeeds, but when I open the SQL Server table, the order of the rows is different from that of the file. Looking closely, I see that the data in the table is sorted by the first column by default. Why is this happening, and how can I avoid the default sort?
Thanks.
You cannot rely on ordering unless you specify order by in your SQL query. SQL is a relational algebra that works with sets. Those sets are unordered. Database tables do not have an intrinsic ordering.
It may well be that the sets are ordered due to the way the data is retrieved from the tables. This may be based on primary key, order of insertion, clustered key, seemingly random order based on the execution plan of the query or the actual data in the table or even the phase of the moon.
Bottom line, if you want a specific order, use order by. If you don't want a specific order, the DBMS is free to deliver your rows in any order, including one based on the first column.
If you really want them sorted depending on the position in the import file, you should add another column to the table to store an increasing number based on its position in that file. Then use order by using that column. But that's a pretty arbitrary sort order, you're generally better off choosing one that makes more sense to the data (transaction ID, date/time, customer number or whatever else you have).
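A minimal sketch of that extra-column approach, using Python's built-in sqlite3 module as a stand-in for the SSIS package and SQL Server. The table name import_rows, the column line_no and the two-column comma-separated file layout are assumptions made for illustration only.

```python
import sqlite3


def load_preserving_file_order(lines):
    """Load comma-separated lines into a table, recording each line's position."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE import_rows ("
        " line_no INTEGER NOT NULL,"  # position of the row in the source file
        " first_col TEXT,"
        " second_col TEXT)"
    )
    rows = []
    for line_no, line in enumerate(lines, start=1):
        first_col, second_col = line.rstrip("\n").split(",")
        rows.append((line_no, first_col, second_col))
    conn.executemany("INSERT INTO import_rows VALUES (?, ?, ?)", rows)
    return conn


# Hypothetical file content, deliberately not sorted by the first column.
file_lines = ["zebra,1", "apple,2", "mango,3"]
conn = load_preserving_file_order(file_lines)

# File order is only guaranteed because of the explicit ORDER BY on line_no.
in_file_order = conn.execute(
    "SELECT first_col FROM import_rows ORDER BY line_no"
).fetchall()
print(in_file_order)
```

Without the ORDER BY line_no clause the database would again be free to return the rows in any order, even though the position is stored.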
If you want to avoid the default sort (however variable that may be), use a specific sort.
In general no order is applied if there is no ordering in the select query.
What I have noticed is that the table results might come back in the order of the primary key, but this is not guaranteed either.
So all in all, if you do not specify an ordering, no ordering can be assumed.
Related
I have the following table structure
id name
--------------
10991 Shoug
10990 Moneera
10989 Abc
10988 xyz
id is the primary key column. As you can see, the id is in decreasing order (i.e. select * from users returns the records in this order) because of the way the data was inserted.
How do I resort the table in the ascending order of the primary key permanently? Preferably with SQL alone?
I found this answer but it's not working for me. I am using Postgres as the database.
How do I resort the table in the ascending order of the primary key permanently?
You are missing a fundamental concept about relational databases. In SQL, tables represent unordered sets. There is no "permanent" ordering. The only ordering is provided by the order by clause on a query.
Some databases support a concept called "clustering"/"clustered indexes". This means that the data on data pages is actually ordered according to some key. In these databases, even when using a table with a clustered index, you are still not guaranteed that the data is returned in any particular order. Unless you use ORDER BY.
Postgres does not maintain a clustered ordering as rows are inserted or updated, so even this is not available.
Apparently not possible in Postgres.
Curious what your use case was for this. I was researching any downside to using a UUID as the PK in Postgres, which led me to your post. Since pg does not physically reorder the data according to the PK, I concluded that only the size (vs. a traditional sequence) would be the difference. I'm more used to SQL Server creating a clustered index and reordering data by default.
As your link in your OP suggests, you can force it to reorder, but it is not persistent. Postgres also has a CLUSTER command to do this. This does lock your table, though.
https://www.postgresql.org/docs/current/sql-cluster.html
https://dba.stackexchange.com/questions/38710/how-does-postgresql-physically-order-new-records-on-disk-after-a-cluster-on-pri
Suppose the following rows are inserted into a table in chronological order:
row1, row2, row3, row4, ..., row1000, row1001.
After a while, we delete/remove the latest row1001.
As in this post: How to get Top 5 records in SqLite?
If the below command is run:
SELECT * FROM <table> LIMIT 1;
Will it assuredly provide the "row1000"?
If not, is there an efficient way to get the latest row(s) without traversing all the rows, i.e. without using the combination of ORDER BY and DESC?
[Note: For now I am using "SQLite", but it will be interesting for me to know about SQL in general as well.]
You're misunderstanding how SQL works. You're thinking row-by-row which is wrong. SQL does not "traverse rows" as per your concern; it operates on data as "sets".
Others have pointed out that relational database cannot be assumed to have any particular ordering, so you must use ORDER BY to explicitly specify ordering.
However, something not mentioned yet: to ensure the query performs efficiently, you need to create an appropriate index.
Whether you have an index or not, the correct query is:
SELECT <cols>
FROM <table>
ORDER BY <sort-cols> [DESC] LIMIT <no-rows>
Note that if you don't have an index the database will load all data and probably sort in memory to find the TOP n.
If you do have the appropriate index, the database will use the best index available to retrieve the TOP n rows as efficiently as possible.
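The effect of the index can be sketched with Python's built-in sqlite3 module. The table events, the column created_at and the index name idx_events_created_at are assumptions for illustration; the printed plan text depends on the SQLite version, so no particular output is claimed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " created_at INTEGER NOT NULL,"  # hypothetical sort column
    " payload TEXT)"
)
# Insert row1 .. row1001 in chronological order, then delete the latest one.
conn.executemany(
    "INSERT INTO events (created_at, payload) VALUES (?, ?)",
    [(ts, f"row{ts}") for ts in range(1, 1002)],
)
conn.execute("DELETE FROM events WHERE created_at = 1001")

LATEST = "SELECT payload FROM events ORDER BY created_at DESC LIMIT 1"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(LATEST)
conn.execute("CREATE INDEX idx_events_created_at ON events (created_at)")
plan_with_index = plan(LATEST)

# The result is the same either way; only the work done to find it differs.
latest_payload = conn.execute(LATEST).fetchone()[0]
print(plan_without_index)
print(plan_with_index)
print(latest_payload)
```

Comparing the two printed plans shows whether the engine needs a separate sort step or can walk the index directly.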
Note that the SQLite documentation is very clear on the matter. The section on ORDER BY explains that ordering is undefined. And nothing in the section on LIMIT contradicts this (it simply constrains the number of rows returned).
If a SELECT statement that returns more than one row does not have an ORDER BY clause, the order in which the rows are returned is undefined.
This behaviour is also consistent with the ANSI standard and all major SQL implementations. Note that any database vendor that guaranteed any kind of ordering would have to sacrifice performance to the detriment of queries trying to retrieve data but not caring about order. (Not good for business.)
As a side note, making flawed assumptions about ordering is an easy mistake (similar to flawed assumptions about uninitialised local variables).
RDBMS implementations are very likely to make ordering appear consistent. They follow a certain algorithm for adding data, a certain algorithm for retrieving data. And as a result, their operations are highly repeatable (it's what we love (and hate) about computers). So things repeatably look the same.
Theoretical examples:
Inserting a row results in the row being added to the next available free space. So data appears sequential. But an update would have to move the row to a new location if it no longer fits.
The DB engine might retrieve data sequentially from clustered index pages and seem to use the clustered index as the 'natural ordering' ... until one day a page split puts one of the pages in a different location.
Or a new version of the DBMS might cache certain data for performance, and suddenly the order changes.
Real-world example:
The MS SQL Server 6.5 implementation of GROUP BY had the side effect of also sorting by the group-by columns. When MS (in version 7 or 2000) implemented some performance improvements, GROUP BY would, by default, return data in a hashed order. Many people blamed MS for breaking their queries when in fact they had made false assumptions and failed to ORDER BY their results as needed.
This is why the only guarantee of a specific ordering is to use the ORDER BY clause.
No. Table records have no inherent order. So it is undefined which row(s) to get with a LIMIT clause without an ORDER BY.
SQLite in its current implementation may return the latest inserted row, but even if that were the case you must not rely on it.
Give a table a datetime column or some sortkey, if record order is important for you.
In SQL, data is stored in tables unordered. What comes out first one day might not be the same the next.
ORDER BY, or some other specific selection criterion, is required to guarantee the correct value.
I am new to NoSQL databases and I have changed my database schema from storing dates as a UTC timestamp string, to a UNIX timestamp (number), in hopes that I can create either a scan or a query expression to find the 1000 most recent items in the table. I have yet to find a simple snippet of code to accomplish this using the AWSDynamoDBQueryExpression class. Scan doesn't appear to have any sort mechanism but query might. Any ideas?
There is no ORDER BY functionality in DynamoDB. If you want to run a top N query you'll have to perform a scan and then order the results yourself.
Mark B is right that query results can be ordered by the sort key but that is only within the context of a query. Queries are inherently limited to a single partition key.
If your table is small then you can get away with creating a Global Secondary Index on the table in which the partition key is an attribute that is the same for all items, and then use the timestamp attribute as the sort key. But keep in mind that this will break down once your table gets bigger. And if you're doing that you might as well not be using Dynamo. You're better off with a relational database on RDS.
First you need to make sure the timestamp field is the sort key for your DynamoDB table (or the sort key for a Global Secondary Index on that table). Then you just need to run a query. From the documentation:
Query results are always sorted by the sort key value. If the data
type of the sort key is Number, the results are returned in numeric
order; otherwise, the results are returned in order of UTF-8 bytes. By
default, the sort order is ascending. To reverse the order, set the
ScanIndexForward parameter to false.
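As a rough sketch of such a query, the function below only builds the request parameters for DynamoDB's Query operation and does not contact AWS. The table name Items, the index name bucket-created_at-index and the constant partition attribute bucket are hypothetical; the sketch assumes a Global Secondary Index whose partition key is that constant attribute and whose sort key is the numeric timestamp.

```python
def build_latest_items_query(table_name, index_name, bucket_value, limit=1000):
    """Build Query parameters that return the newest items first.

    Assumes (hypothetically) an index whose partition key is the attribute
    'bucket' and whose sort key is the numeric timestamp attribute.
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # A Query must pin the partition key to a single value.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "bucket"},
        "ExpressionAttributeValues": {":pk": {"S": bucket_value}},
        # False reverses the default ascending sort-key order.
        "ScanIndexForward": False,
        # Upper bound on the number of items evaluated per request.
        "Limit": limit,
    }


params = build_latest_items_query("Items", "bucket-created_at-index", "all")
print(params)
```

With boto3, a dictionary like this could be passed as client.query(**params) on a low-level DynamoDB client. A single response may still be cut short by DynamoDB's response size limit, in which case paging with LastEvaluatedKey would be needed to reach the full 1000 items.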
I was wondering if anyone knows if there is a way to sort the data that already resides in a database. That is, I want to sort what is there but NOT retrieve it in a query.
I am asking because I have a list of things in this database, that I would like to add to in future and would like to order it once I've added them.
So what I mean is, I would like to not have to download all the data; sort it; then put it back onto the database.
Thanks in advance.
If there is only one column in your table then this is fine and you can simply sort the data in that table.
But if there is more than one column in your table then it depends on the other columns as well (i.e. you need to specify which column you want to sort by). Also, if there is a primary key attached to the table then it's not possible, as the primary key would be in ascending order by default. In that case you can only get the order you want while selecting the data from the table.
(My suggestion is to sort the data while selecting your records from the table as that would be easy and will have less risk)
EDIT:
To make my point clear, the best way to achieve what you are trying to do is simply to use ORDER BY ASC or ORDER BY DESC while you are selecting the data from your table.
If you are creating a new table then you can create an index-organized table to ensure that the data is stored ordered by the index.
I was wondering if anyone knows if there is a way to sort the data that already resides in a database.
Normal relational tables, called heap-organized tables, store rows in any order, i.e. unsorted. You sort the rows if required only when you fetch them, not when you store them. And the only way to guarantee the sorting while retrieving the rows is to use an ORDER BY clause.
This question is related to a previous question,
Grouping by or iterating through partitions in SQL
I need to support SQL Server 2005, 2008, 2008 R2, and cannot depend on the Enterprise or full version (must work in SQL Server Express).
What I am trying to do is create a user-defined computed column that is essentially either a row_number() or dense_rank() over a partition by clause. This needs to act like an index, in that whenever rows are added to the table, this user-defined column is automatically generated.
I looked over the following Microsoft link that explains how to create a column based on a function, http://msdn.microsoft.com/en-us/library/ms186755.aspx. It doesn't quite get there.
It may not be possible, especially without the full version of SQL Server. Any thoughts?
The main feature of partitioning is splitting a single object into multiple relatively independent database objects. Partitioning allows you to:
switch partitions from one table to another immediately, because it becomes a meta data operation, rather than physical copy.
lock single partition instead of the whole table, thus giving several processes simultaneous independent access to each partition.
spread a single table over multiple db files, storing each partition in its own file
compress data and change some other properties for each partition individually.
It cannot be done by any other means.
So I think you are looking for a way to assign an ID to each combination of some key, which consists of multiple columns. There are a number of ways to do it, depending on the result you want to achieve:
Create an index on several columns. It may be clustered, it may be on primary key columns. It will aid searches and get you data sorted (in case of clustered index).
Create separate table with all your primary key columns and assign each combination a unique key (eg using IDENTITY property). Insert this key to your main table to use as "partition id". Insert values automatically using a trigger or a join.
Please note that SQL Server can use only one column for partitioning.
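The second option above (a separate key table filled in by a trigger) can be sketched as follows. This uses Python's built-in sqlite3 module and SQLite trigger syntax purely as an illustration, not T-SQL; the table names group_keys and items, the key columns region and category, and the trigger name are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per distinct key combination; group_id plays the "partition id" role.
    CREATE TABLE group_keys (
        group_id INTEGER PRIMARY KEY,
        region   TEXT NOT NULL,
        category TEXT NOT NULL,
        UNIQUE (region, category)
    );

    CREATE TABLE items (
        id       INTEGER PRIMARY KEY,
        region   TEXT NOT NULL,
        category TEXT NOT NULL,
        group_id INTEGER
    );

    -- Whenever a row is added, register its key combination if it is new
    -- and copy the matching id back onto the inserted row.
    CREATE TRIGGER items_assign_group AFTER INSERT ON items
    BEGIN
        INSERT OR IGNORE INTO group_keys (region, category)
        VALUES (NEW.region, NEW.category);

        UPDATE items
        SET group_id = (SELECT group_id FROM group_keys
                        WHERE region = NEW.region AND category = NEW.category)
        WHERE id = NEW.id;
    END;
    """
)

sample = [("EU", "books"), ("EU", "toys"), ("EU", "books"), ("US", "books")]
conn.executemany("INSERT INTO items (region, category) VALUES (?, ?)", sample)

assigned = conn.execute("SELECT group_id FROM items ORDER BY id").fetchall()
print(assigned)
```

Rows sharing the same (region, category) pair receive the same group_id, which is the behaviour a dense_rank() over those columns would give at insert time. In SQL Server the same idea would need an IDENTITY column and a T-SQL trigger or join, as the answer describes.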