Re copy and re-order table - sql

I am using SQLite. I want to create a new table ordered differently than an existing table's data. I have tried the following, but it does not work in either the Mozilla SQLite admin tool or SQLite Manager. What am I doing wrong?
INSERT INTO temp (SnippetID, LibraryID,Name, BeforeSelection, AfterSelection, ReplaceSelection, NewDocument, NewDocumentLang, Sort)
SELECT (SnippetID, LibraryID,Name, BeforeSelection, AfterSelection, ReplaceSelection, NewDocument, NewDocumentLang, Sort)
FROM Snippets ORDER BY LibraryID;
Thanks - JZ

My question to you is a simple "Why?". SQL is a relational algebra and data sets only have order when you specify it on extraction.
It makes no sense to talk about the order of your data, since your database is free to impose whatever order it likes. Any decent database will not care one bit about the order in which records are inserted, only about the keys, constraints and other properties that can be used to efficiently store and retrieve the data.
Only when you extract data with a select ... order by does the order have to be fixed. If you want to be able to efficiently extract data ordered by your LibraryID column, simply index it and use an ORDER BY clause when extracting the data.
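A minimal sketch of that advice against the table from the question (column list taken from the original query):

CREATE INDEX idx_snippets_libraryid ON Snippets (LibraryID);

SELECT SnippetID, LibraryID, Name, BeforeSelection, AfterSelection,
       ReplaceSelection, NewDocument, NewDocumentLang, Sort
FROM Snippets
ORDER BY LibraryID;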

By default the table is "ordered" on the primary key, or the first column in the table, which is often the primary key anyway. When you do
select * from mytable
it uses this default "order". Everything Pax says is true though, and I +1 it.

Insert a record at the particular row in a psql table [duplicate]

I am just wondering if I can maintain the order of insertion of data in SQL Server.
I am working on my own project, which is a kind of blog site with a number of posts. I will save my posts to my SQL Server database, but I want them in the order of insertion.
Question: I understand that if I use an auto-incrementing integer in SQL Server as the primary key, I can maintain the order of insertion. But I want to use a Guid for the primary key instead of an identity, and a Guid does not seem to maintain the order of insertion.
Should I use both an auto-incrementing integer for the order of insertion and a Guid for identity?
Or is there any other way to maintain the order of insertion with a Guid set as the primary key?
Related question: the reason I want to maintain the order of insertion is that this way I don't have to use an ORDER BY clause, which causes an extra sorting step.
But shouldn't I trust the order of data returned from the database without any ordering clauses like ORDER BY?
Should I always use some ordering clause to order my data? Then what if I have a massive amount of data to order? How can I handle that situation?
Sorry for too many questions in one post, but I believe they are all related.
You are misguided.
SQL tables represent unordered sets. If you want a result set in a particular order, then you need to use an ORDER BY clause in the query. (The optimizer might not actually perform a sort for the ORDER BY if it finds another way, such as reading an index, to return the results in order.)
You can have an identity column that is not the primary key. But actually, you can have both an identity column and a guid column, with the former as the primary key and the latter as a unique key. Another solution is to have a CreatedAt datetime column. You can use this for ordering... or even as a clustered index key if you really wanted to.
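A minimal sketch of that design, assuming a simple posts table (all names here are illustrative, not taken from the question):

CREATE TABLE dbo.Posts (
    PostId    INT IDENTITY(1,1) PRIMARY KEY,                     -- preserves insertion order
    PostGuid  UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() UNIQUE,  -- public identifier
    CreatedAt DATETIME NOT NULL DEFAULT GETDATE(),               -- alternative ordering key
    Title     NVARCHAR(200) NOT NULL
);

-- Retrieval still needs an explicit ORDER BY to guarantee order:
SELECT PostId, PostGuid, Title
FROM dbo.Posts
ORDER BY PostId;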
When you specify ORDER BY in an INSERT...SELECT statement, SQL Server will assign IDENTITY values in the order specified. However, that does not mean rows are necessarily inserted in that order.
If you need rows returned in a specific order, you must use ORDER BY to guarantee that ordering. Indexes can be leveraged to return ordered data efficiently, depending on the particulars of the query.
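As a hypothetical illustration of the identity-assignment behaviour described above (the table names are invented; dbo.Posts is the sketch from the previous answer):

CREATE TABLE dbo.OrderedPosts (
    PostId INT IDENTITY(1,1) PRIMARY KEY,
    Title  NVARCHAR(200) NOT NULL
);

INSERT INTO dbo.OrderedPosts (Title)
SELECT Title
FROM dbo.Posts
ORDER BY CreatedAt;
-- IDENTITY values are assigned in CreatedAt order,
-- but the physical insertion order is still not guaranteed.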

Create database model using column names (no FK, no relation)

I have a database with no FKs, no PKs, and no documentation to show which tables relate to each other. So I'm having a hard time reverse engineering the physical data to work out how one table relates to another. Does anyone know of a tool to create a model based only on the names of columns?
Example:
Table A has a column named ID_NTP_BAZINGA that relates to table B through a column with the same name, ID_NTP_BAZINGA.
Hm, it is a hard task (and I have kind of done it myself), but I can think of some hints to automate your work; for example, commands like sp_msforeachdb or sp_msforeachtable might be handy.
To discover FK relations in the way you mentioned, you could use a query like this:
select * from (
    select object_name(object_id) as [table],
           name,
           count(*) over (partition by name) as [cnt]
    from [DB_name].sys.columns
) a
where cnt > 1
which, in your case, would return (among others)
Table A | ID_NTP_BAZINGA
Table B | ID_NTP_BAZINGA
and already give you some insight.
For PK candidates, one could use sp_msforeachtable with dynamic SQL checking whether count(distinct <column>) is equal to count(*) - this would tell you whether the values are unique. Finally, you would run count(*) with an IS NOT NULL filter in the WHERE clause, so you'll know if you have NULLs in a particular column.
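For a single known table and column, that check might look like this (a sketch only; MyTable and MyColumn are placeholders):

SELECT COUNT(*)                    AS total_rows,
       COUNT(DISTINCT MyColumn)    AS distinct_values,
       COUNT(*) - COUNT(MyColumn)  AS null_count   -- COUNT(col) skips NULLs
FROM MyTable;

-- MyColumn is a PK candidate when total_rows = distinct_values and null_count = 0.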
These are some general hints; the exact queries you will have to write yourself. This answer should get you started.
This is data archaeology. It's ironic because one of the reasons for developing databases in the first place was to guarantee that data would be better documented than it had been in files and records. One of the best handles on the data model in this case is the application code.
Look at the SQL used by the application, especially the queries. Retrieval is when data gets turned into information, and that's where you'll get your clues. Pay special attention to the ON and WHERE clauses. You may be able to figure out which columns were functioning as PKs and FKs, even if they weren't declared as such.
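For example, finding a query like the following in the application code (a hypothetical sketch using the column name from the question) would be a strong hint at an undeclared FK relationship:

SELECT a.*, b.*
FROM TableA a
INNER JOIN TableB b
    ON b.ID_NTP_BAZINGA = a.ID_NTP_BAZINGA;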

How to create a Primary Key on quasi-unique data keys?

I have a nightly SSIS process that exports a TON of data from an AS400 database system. Due to bugs in the AS400 DB software, occasional duplicate keys are inserted into data tables. Every time a new duplicate is added to an AS400 table, it kills my nightly export process. This issue has moved from being a nuisance to being a problem.
What I need is an option to insert only unique data. If there are duplicates, select the first encountered of the duplicate rows. Is there SQL syntax available that could help me do this? I know of the DISTINCT ROW clause, but that doesn't work in my case because for most of the offending records, the entirety of the data is non-unique except for the fields which comprise the PK.
In my case, it is more important for my primary keys to remain unique in my SQL Server DB cache than to have a full snapshot of the data. Is there something I can do to force this constraint on the export in SSIS/SQL Server without crashing the process?
EDIT
Let me further clarify my request. What I need is to ensure that the data in my exported SQL Server tables maintains the same keys that are maintained in the AS400 data tables. In other words, creating a unique row-count identifier wouldn't work, nor would inserting all of the data without a primary key.
If a bug in the AS400 software allows for mistaken, duplicate PKs, I want to either ignore those rows or, preferably, select just one of the rows with the duplicate key, but not both of them.
This filtering should probably happen in the SELECT statement of my SSIS project, which connects to the mainframe through an ODBC connection.
I suspect that there may not be a "simple" solution to my problem. I'm hoping, however, that I'm wrong.
Since you are using SSIS, you are presumably using an OLE DB Source to fetch the data from the AS400 and an OLE DB Destination to insert the data into SQL Server.
Let's assume that you don't have any other transformations.
Add a Sort transformation after the OLE DB Source. In the Sort transformation, there is a check box option at the bottom to remove duplicate rows based on a given set of column values. Check all the fields, but don't select the primary key that comes from the AS400. This will eliminate the duplicate rows while still inserting the data you need.
I hope that is what you are looking for.
In SQL Server 2005 and above:
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY almost_unique_field ORDER BY id) AS rn
    FROM import_table
) q
WHERE rn = 1
There are several options.
If you use the IGNORE_DUP_KEY option (http://www.sqlservernation.com/home/creating-indexes-with-ignore_dup_key.html) on your primary key, SQL Server will issue a warning and only the duplicate records will fail.
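A minimal sketch of that option (the index and table names are illustrative):

CREATE UNIQUE INDEX IX_ImportTable_PK
    ON import_table (pk_field)
    WITH (IGNORE_DUP_KEY = ON);

-- Inserts that would duplicate pk_field are now skipped with a warning
-- instead of failing the whole statement.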
You can also group/roll up your data, but this can get very expensive. What I mean by that is:
SELECT Id, MAX(value1), MAX(value2), MAX(value3)  -- etc.
FROM staging_table
GROUP BY Id
Another option is to add an identity column (and cluster on this for an efficient join later) to your staging table and then create a mapping in a temp table. The mapping table would be:
CREATE TABLE #mapping
(
    RowID INT PRIMARY KEY CLUSTERED,
    PKID  INT
)

INSERT INTO #mapping (RowID, PKID)
SELECT MIN(RowID), PKID
FROM staging_table
GROUP BY PKID

INSERT INTO presentation_table
SELECT S.*
FROM staging_table S
INNER JOIN #mapping M
    ON S.RowID = M.RowID
If I understand you correctly, you have duplicated PKs that have different data in the other fields.
First, put the data from the other database into a staging table. I find it easier to research issues with imports (especially large ones) if I do this. Actually, I use two staging tables (and for this case I strongly recommend it): one with the raw data and one with only the data I intend to import into my system.
Now you can use an Execute SQL task to grab one of the records for each key (see @Quassnoi's answer for an idea of how to do that; you may need to adjust his query for your situation). Personally, I put an identity column into my staging table so I can identify which is the first or last occurrence of duplicated data. Then put the record you chose for each key into your second staging table. If you are using an exception table, copy the records you are not moving into it, and don't forget a reason code for the exception ("Duplicated key", for instance).
Now that you have only one record per key in the staging table, your next task is to decide what to do about the other data that is not unique. If there are two different business addresses for the same customer, which do you choose? This is a matter of business rule definition, not strictly speaking SSIS or SQL code. You must define the business rules for how to choose the data when records need to be merged (what you are doing is the equivalent of a de-duping process). If you are lucky, there is a date field or another way to determine which is the newest or oldest data, and that is the data they want you to use. In that case, once you have selected just one record, you are done with the initial transform.
More likely, though, you may need different rules for each field to choose the correct data. In this case you write SSIS transforms in a data flow or Execute SQL tasks to pick the correct data and update the staging table.
Once you have the exact records you want to import, run the data flow to move them into the correct production tables.

Set sort-order field based on alphabetically ordering of another field

I've recently added a couple of fields to some tables in my database (SQL Server 2005) to allow users to customize the sort order of the rows. I've followed this pattern for all of the tables:
-- Alter the InvoiceStatus table
ALTER TABLE [dbo].[InvoiceStatus] ADD [Disabled] bit NOT NULL DEFAULT 0
GO
ALTER TABLE [dbo].[InvoiceStatus] ADD [SortOrder] int NOT NULL DEFAULT 0
GO
-- Use the primary key as the default sort order
UPDATE [dbo].[InvoiceStatus]
SET [SortOrder] = [InvoiceStatusId]
GO
Normally, as you can see, I've used the primary key as the default sort order. Now, however, I am in a situation where I would like to use the alphabetical ordering of a text field in the table as the default sort order.
Using the above table as an example (which has a text field [InvoiceStatusName]), is there a similar nice and short query I could write to use the alphabetical ordering of [InvoiceStatusName] as the default sort order?
Update:
The question has already been answered, but some have pointed out that this solution might not be ideal, so I just want to add some context for future reference. This is an old system (not legacy-old, but it has been around for quite a few years) in use in a handful of different places.
There are several lists/drop-downs in the application with your typical "status" types (such as invoice status, order status, customer type, etc.). Back when the system was first written, these were standard values used everywhere (not meant to be changed in any way), but some users have started to request the ability to add new statuses, remove those no longer in use, and specify a custom sort order (one status might be more frequently used, and it is thus nice to have it at the top of the list).
The easiest way I found to do this (without having to mess around with too much of the old code) was to add two new fields, Disabled and SortOrder, to all the relevant tables. The Disabled field is used to "hide" unused types (they cannot be deleted because of referential integrity, and the values they hold also need to be kept), and the SortOrder field is there so the users can specify their own custom sort order. Since all the relevant tables share these same two columns, it was very easy to make a simple interface to handle the sorting (and disabling) in a generic way.
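For illustration, populating a drop-down then looks something like this (a sketch based on the pattern described above):

SELECT InvoiceStatusId, InvoiceStatusName
FROM dbo.InvoiceStatus
WHERE [Disabled] = 0
ORDER BY SortOrder;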
;WITH so AS
(
    SELECT SortOrder,
           ROW_NUMBER() OVER (ORDER BY InvoiceStatusName) AS rn
    FROM dbo.InvoiceStatus
)
UPDATE so SET SortOrder = rn
You could use the ROW_NUMBER() function to map the sorted name to an integer.
UPDATE ivs
SET SortOrder = ivsn.Number
FROM dbo.InvoiceStatus ivs
INNER JOIN (
    SELECT InvoiceStatusID,
           Number = ROW_NUMBER() OVER (ORDER BY InvoiceStatusName)
    FROM dbo.InvoiceStatus
) ivsn ON ivsn.InvoiceStatusID = ivs.InvoiceStatusID
You should ask yourself, though, if this scheme is the best solution for your problem. The implementation as-is doesn't scale well.
The way you have chosen to implement table sorting is unusual and apparently restricted to sorting by integers only.
Really, I'd recommend you redesign your sorting sub-system in another way.
E.g. have a meta-data table somewhere that holds the name of the column that users want to sort by, and then use ORDER BY on that column in the relevant queries, as sketched below.
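A hedged sketch of that idea (the table and column names are invented for illustration; SQL Server 2005 syntax):

CREATE TABLE dbo.SortPreference
(
    TableName  SYSNAME PRIMARY KEY,
    SortColumn SYSNAME NOT NULL
)

DECLARE @col SYSNAME, @sql NVARCHAR(MAX)
SELECT @col = SortColumn FROM dbo.SortPreference WHERE TableName = N'InvoiceStatus'
SET @sql = N'SELECT * FROM dbo.InvoiceStatus ORDER BY ' + QUOTENAME(@col)
EXEC sp_executesql @sql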
Edit: Oh wait, we're talking about lookup tables and letting users change the order. Now that I've got my head around that, your implementation makes a lot more sense to me.

How to avoid the same record inserted twice in MongoDB (using Mongoid) or ActiveRecord (Rails using MySQL)?

For example, suppose we are doing analytics, recording page_type, item_id, date, pageviews, and timeOnPage.
It seems that there are several ways to avoid it. Is there an automatic way?
Create an index on the fields that uniquely identify the record, for example [page_type, item_id, date], and make the index unique, so that adding the same record again will be rejected.
Or make the above the primary index, which is unique, if the DB or framework supports it. In Rails, though, the auto-incrementing ID (1, 2, 3, 4, ...) is usually the primary key.
Or query for the record using [page_type, item_id, date], and then update that record if it already exists (or do nothing if pageviews and timeOnPage already have the same values); if the record doesn't exist, insert a new record with this data. But if we need to query the record this way, it looks like we need an index on these 3 fields anyway.
Or insert new records all the time, but when querying for the values, use something like
select * from analytics where ... order by created_at desc limit 1
that is, get the newest created record and ignore the rest. But this only works for fetching a single record; it is not feasible when summing up values (doing aggregates), such as select sum(pageviews) or select count(*).
Is there also some automatic solution besides using the methods above?
Jian,
Your first option seems viable to me, and it is the simplest way. MongoDB supports this feature by default.
On insert it will check for the unique combination; if it already exists, it will ignore the insert and write an "E11000 duplicate key error index" message to the server log. Otherwise it will proceed with the normal insertion.
But it seems this will not work in the case of bulk inserts: if any duplicate is there, the entire batch will fail. A quick Google search turns up an existing MongoDB JIRA ticket reporting this bug. It's still open.
I can't speak for Mongoid/MongoDB, but if you wish to enforce a uniqueness constraint in a relational database, you should create a uniqueness constraint. That's what they're there for! In MySQL, that is equivalent to a unique index; you could specify it as CONSTRAINT ... UNIQUE (col1, col2), but this will just create a unique index anyway.
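A minimal sketch of that in MySQL, using the fields from the question (the table name analytics is assumed):

ALTER TABLE analytics
    ADD CONSTRAINT uq_analytics UNIQUE (page_type, item_id, `date`);

-- Subsequent inserts of the same (page_type, item_id, date) combination
-- will now fail with a duplicate-key error.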