How to avoid inserting duplicate data in a column in SQLite?

I have a table named category and the table looks like
cat_id cat_name
1 Science
2 Arts
and another table named item which looks like
item_id item_name cat_id
1 physics 1
2 literature 2
3 chemistry 1
Please note that cat_id in the item table is a foreign key to the category table.
Now, if I add math as an item_name under the Arts category, it should insert successfully, but if I try to insert the same data again, it should not be inserted. Please note also that I only have cat_name and item_name, so my query fetches the cat_id from the category table using cat_name and inserts into the item table like this:
insert into item (item_name,cat_id) select 'math',category.cat_id from category where category.cat_name = 'Arts'
But if I run this query again, it inserts the same item math a second time. I want to prevent that. What should I do?

Reyjohn, can't you define the column as unique? That will prevent duplicate values.
I don't use sqlite but my impression is it uses similar syntax to MySQL, but without some of the stricter checks.
As per this answer:
sqlite - How to get INSERT OR IGNORE to work
You could use INSERT OR IGNORE statement.
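To make the two suggestions above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The UNIQUE (item_name, cat_id) constraint is an assumption about what "same data" means in the question; if an item name must be unique across all categories, the constraint would go on item_name alone. The helper names make_db and add_item are made up for the example.

```python
import sqlite3

def make_db():
    """Build the question's two tables in memory, plus an assumed UNIQUE rule."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_name TEXT UNIQUE);
        CREATE TABLE item (
            item_id   INTEGER PRIMARY KEY,
            item_name TEXT NOT NULL,
            cat_id    INTEGER NOT NULL REFERENCES category(cat_id),
            UNIQUE (item_name, cat_id)  -- assumption: one name per category
        );
        INSERT INTO category (cat_id, cat_name) VALUES (1, 'Science'), (2, 'Arts');
        INSERT INTO item (item_name, cat_id)
            VALUES ('physics', 1), ('literature', 2), ('chemistry', 1);
    """)
    return conn

def add_item(conn, item_name, cat_name):
    """Insert the item unless the UNIQUE constraint would be violated."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO item (item_name, cat_id) "
        "SELECT ?, cat_id FROM category WHERE cat_name = ?",
        (item_name, cat_name),
    )
    return cur.rowcount  # number of rows actually inserted
```

Calling add_item(conn, 'math', 'Arts') twice should leave a single math row; the second call is silently skipped instead of raising an error.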

SQLite doesn't have "INSERT OR UPDATE" which would be perfect for your case.
But you can emulate it with "INSERT OR REPLACE" - which is supported by SQLite - together with a "UNION". I developed this technique because I wanted to issue one single query and let the SQLite engine do everything.
INSERT OR REPLACE INTO item (item_id, item_name, cat_id)
SELECT item_id, item_name, cat_id FROM (
SELECT item_id, item_name, 'new_category' AS cat_id FROM item
WHERE item_name='math'
UNION
SELECT NULL, 'math', 'new_category'
) ORDER BY item_id DESC LIMIT 1;
So you are basically making one "SELECT WHERE item_name='math'" to get the existing record (if it exists) and then you concatenate this result (UNION) with a SELECT that generates a new record with a NULL item_id.
The trick then is the "ORDER BY item_id DESC LIMIT 1" at the end, where the actual result for the INSERT statement is exactly 1 record: the one that existed, with same item_id and a 'new_category' or a new one with a NULL item_id, which forces SQLite to create one for you.
You can also check whether the record exists first (with a SELECT) and then decide if you need an INSERT or an UPDATE. But this leads to 2 queries and additional code on the client side. That's why I developed the technique above.
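For anyone who wants to try the INSERT OR REPLACE plus UNION technique, here is a small runnable sketch with Python's sqlite3 module. It assumes item_id is an INTEGER PRIMARY KEY (so a NULL id makes SQLite generate one) and uses a numeric cat_id in place of the 'new_category' placeholder. Like the query above, it matches on item_name only. The names UPSERT_SQL, upsert_item and upsert_demo are invented for the example.

```python
import sqlite3

# The first branch of the UNION names the third column cat_id so that the
# outer SELECT can refer to it.
UPSERT_SQL = """
INSERT OR REPLACE INTO item (item_id, item_name, cat_id)
SELECT item_id, item_name, cat_id FROM (
    SELECT item_id, item_name, :cat_id AS cat_id FROM item
    WHERE item_name = :name
    UNION
    SELECT NULL, :name, :cat_id
) ORDER BY item_id DESC LIMIT 1
"""

def upsert_item(conn, name, cat_id):
    """Replace the row with this item_name if it exists, else insert a new one."""
    conn.execute(UPSERT_SQL, {"name": name, "cat_id": cat_id})

def upsert_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (item_id INTEGER PRIMARY KEY, item_name TEXT, cat_id INTEGER)"
    )
    upsert_item(conn, "math", 2)  # no row yet: inserted with a generated id
    upsert_item(conn, "math", 2)  # row exists: replaced in place
    upsert_item(conn, "math", 1)  # row exists: cat_id is overwritten
    return conn.execute("SELECT item_id, item_name, cat_id FROM item").fetchall()
```

The ORDER BY item_id DESC works because SQLite sorts NULL before any other value, so in descending order the existing row (if any) comes first and the NULL-id row is only picked when nothing matched.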

Related

SQL unpivot & insert

Sorry for the lack of info -- SQL Server 2008.
I'm struggling to get a couple of column values from table A into a new row in table B for each row in A where a column isn't null.
Table A's structure is as:
UserID | ClientUserID | ClientSessionID | (and a load of other irrelevant columns)
Table B:
UserID | Name | Value
I want to create rows in table B for each non-null ClientUserID or ClientSessionID in A - using the column name as B's "Name", and column value as "B's Value".
I'm struggling to write my "unpivot" statement - just getting the syntax correct! I'm trying to follow along with some samples but can't
Here's my SQL query so far - any further help would be appreciated (just getting this SELECT is frustrating me, let alone doing the insert!)
SELECT UserID, ClientUserID, ClientSessionID FROM websiteuser WHERE ClientSessionID IS NOT null
This gives me the rows that I need to perform actions upon -- but I just can't get the syntax correct for UNPIVOTing this data and turning it into my insert.
You can unpivot records in this fashion by using UNION to get each new row:
INSERT INTO TableB (UserID, Name, Value)
SELECT UserID, 'ClientUserID' AS Name, ClientUserID AS Value
FROM TableA
WHERE ClientUserID IS NOT NULL
UNION ALL
SELECT UserID, 'ClientSessionID' AS Name, ClientSessionID AS Value
FROM TableA
WHERE ClientSessionID IS NOT NULL
I am using UNION ALL in this case because UNION implies a DISTINCT operation across the entire set, which should normally be unnecessary when unpivoting unique records.
If your ClientUserID and ClientSessionID columns are not the same datatype, you may have to cast one or both to the same.
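Here is a small self-contained sketch of the UNION ALL unpivot, run against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server 2008. The column types and sample rows are made up for illustration; the CAST shows the datatype point above for a numeric ClientSessionID.

```python
import sqlite3

def unpivot_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Column types and rows are assumptions for the example.
        CREATE TABLE TableA (UserID INTEGER, ClientUserID TEXT, ClientSessionID INTEGER);
        CREATE TABLE TableB (UserID INTEGER, Name TEXT, Value TEXT);
        INSERT INTO TableA VALUES (1, 'u-1', 500), (2, NULL, 501), (3, NULL, NULL);
    """)
    # One SELECT per source column; NULLs are filtered out in each branch.
    conn.execute("""
        INSERT INTO TableB (UserID, Name, Value)
        SELECT UserID, 'ClientUserID', ClientUserID
        FROM TableA WHERE ClientUserID IS NOT NULL
        UNION ALL
        SELECT UserID, 'ClientSessionID', CAST(ClientSessionID AS TEXT)
        FROM TableA WHERE ClientSessionID IS NOT NULL
    """)
    return conn.execute(
        "SELECT UserID, Name, Value FROM TableB ORDER BY UserID, Name"
    ).fetchall()
```

User 3 has no non-null values, so no rows are created for it in TableB.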

Delete duplicates with no primary key

I want to delete rows that have a duplicated value in one column (Product), which will then be used as the primary key.
The column is of type nvarchar, and we don't want to have 2 rows for one product.
The database is a large one, with thousands of rows that we need to remove.
For each set of duplicates, we want to keep the first item and remove the second one as the duplicate.
There is no primary key yet, and we want to create one after removing the duplicates.
Then the Product column could be our primary key.
The database is SQL Server CE.
I tried several methods, and mostly getting error similar to :
There was an error parsing the query. [ Token line number = 2,Token line offset = 1,Token in error = FROM ]
A method which I tried :
DELETE FROM TblProducts
FROM TblProducts w
INNER JOIN (
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
)Dup ON w.Product = Dup.Product
The approach I would prefer, and am trying to learn and adapt my code to, is something like this
(it's not correct yet):
SELECT Product, COUNT(*) TotalCount
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
--
;WITH cte -- These 3 lines are the lines I have more doubt on them
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Product
ORDER BY ( SELECT 0)) RN
FROM Word)
DELETE FROM cte
WHERE RN > 1
If you have two DIFFERENT records with the same Product column, then you can SELECT the unwanted records with some criterion, e.g.
CREATE TABLE victims AS
SELECT MAX(entryDate) AS date, Product, COUNT(*) AS dups FROM ProductsTable WHERE ...
GROUP BY Product HAVING dups > 1;
Then you can do a DELETE JOIN between ProductsTable and victims.
Or also you can select Product only, and then do a DELETE for some other JOIN condition, for example having an invalid CustomerId, or EntryDate NULL, or anything else. This works if you know that there is one and only one valid copy of Product, and all the others are recognizable by the invalid data.
Suppose you instead have IDENTICAL records (or you have both identical and non-identical, or you may have several dupes for some product and you don't know which). You run exactly the same query. Then, you run a SELECT query on ProductsTable and SELECT DISTINCT all products matching the product codes to be deduped, grouping by Product, and choosing a suitable aggregate function for all fields (if identical, any aggregate should do. Otherwise I usually try for MAX or MIN). This will "save" exactly one row for each product.
At that point you run the DELETE JOIN and kill all the duplicated products. Then, simply reimport the saved and deduped subset into the main table.
Of course, between the DELETE JOIN and the INSERT SELECT, you will have the DB in an unstable state, with every product that had at least one duplicate missing entirely.
Another way which should work in MySQL:
-- Create an empty table
CREATE TABLE deduped AS SELECT * FROM ProductsTable WHERE false;
CREATE UNIQUE INDEX deduped_ndx ON deduped(Product);
-- DROP duplicate rows, Joe the Butcher's way
INSERT IGNORE INTO deduped SELECT * FROM ProductsTable;
ALTER TABLE ProductsTable RENAME TO ProductsBackup;
ALTER TABLE deduped RENAME TO ProductsTable;
-- TODO: Copy all indexes from ProductsTable on deduped.
NOTE: the way above DOES NOT WORK if you want to distinguish "good records" and "invalid duplicates". It only works if you have redundant DUPLICATE records, or if you do not care which row you keep and which you throw away!
EDIT:
You say that "duplicates" have invalid fields. In that case you can modify the above with a sorting trick:
SELECT * FROM ProductsTable ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;
Then if you have only one row per product, all well and good, it will get selected. If you have more, the one for which (FieldWhichShouldNotBeNULL IS NULL) is FALSE (i.e. the one where FieldWhichShouldNotBeNULL is actually not null, as it should be) will be selected first, and inserted. All others will bounce, silently due to the IGNORE clause, against the uniqueness of Product. Not a really pretty way to do it (and check I didn't mix true with false in my clause!), but it ought to work.
EDIT
actually more of a new answer
This is a simple table to illustrate the problem
CREATE TABLE ProductTable ( Product varchar(10), Description varchar(10) );
INSERT INTO ProductTable VALUES ( 'CBPD10', 'C-Beam Prj' );
INSERT INTO ProductTable VALUES ( 'CBPD11', 'C Proj Mk2' );
INSERT INTO ProductTable VALUES ( 'CBPD12', 'C Proj Mk3' );
There is no index yet, and no primary key. We could still declare Product to be primary key.
But something bad happens. Two new records get in, and both have NULL description.
Yet, the second one is a valid product since we knew nothing of CBPD14 before now, and therefore we do NOT want to lose this record completely. We do want to get rid of the spurious CBPD10 though.
INSERT INTO ProductTable VALUES ( 'CBPD10', NULL );
INSERT INTO ProductTable VALUES ( 'CBPD14', NULL );
A rude DELETE FROM ProductTable WHERE Description IS NULL is out of the question, it would kill CBPD14 which isn't a duplicate.
So we do it like this. First get the list of duplicates:
SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1;
We assume that: "There is at least one good record for every set of bad records".
We check this assumption by positing the opposite and querying for it. If all is copacetic we expect this query to return nothing.
SELECT Dups.Product FROM ProductTable
RIGHT JOIN ( SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1 ) AS Dups
ON (ProductTable.Product = Dups.Product
AND ProductTable.Description IS NOT NULL)
WHERE ProductTable.Description IS NULL;
To further verify, I insert two records that represent this mode of failure; now I do expect the query above to return the new code.
INSERT INTO ProductTable VALUES ( 'AC5', NULL ), ( 'AC5', NULL );
Now the "check" query indeed returns,
AC5
So, the generation of Dups looks good.
I proceed now to delete all duplicate records that are not valid. If there are duplicate, valid records, they will stay duplicate unless some condition may be found, distinguishing among them one "good" record and declaring all others "invalid" (maybe repeating the procedure with a different field than Description).
But ay, there's a rub. Currently, you cannot delete from a table and select from the same table in a subquery ( http://dev.mysql.com/doc/refman/5.0/en/delete.html ). So a little workaround is needed:
CREATE TEMPORARY TABLE Dups AS
SELECT Product, COUNT(*) AS Duplicates
FROM ProductTable GROUP BY Product HAVING Duplicates > 1;
DELETE ProductTable FROM ProductTable JOIN Dups USING (Product)
WHERE Description IS NULL;
Now this will delete all invalid records, provided that they appear in the Dups table.
Therefore our CBPD14 record will be left untouched, because it does not appear there. The "good" record for CBPD10 will be left untouched because it's not true that its Description is NULL. All the others - poof.
Let me state again that if a record has no valid records and yet it is a duplicate, then all copies of that record will be killed - there will be no survivors.
To avoid this, you may first SELECT (using the query above, the check "which should return nothing") the rows representing this mode of failure into another TEMPORARY TABLE, then INSERT them back into the main table after the deletion (using transactions might be in order).
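The cleanup above can be tried end to end with the following sketch. It runs on SQLite through Python's sqlite3 module, not on MySQL, so two things are adapted and should be read as assumptions: the multi-table DELETE ... JOIN is replaced by DELETE ... WHERE Product IN (subquery), and HAVING uses COUNT(*) directly. The sample rows are the ones from this answer.

```python
import sqlite3

def remove_invalid_duplicates(conn):
    """Delete NULL-description rows, but only for products that have duplicates."""
    conn.executescript("""
        CREATE TEMPORARY TABLE Dups AS
            SELECT Product, COUNT(*) AS Duplicates
            FROM ProductTable GROUP BY Product HAVING COUNT(*) > 1;
        -- SQLite has no DELETE ... JOIN, so the join becomes an IN subquery.
        DELETE FROM ProductTable
            WHERE Description IS NULL
              AND Product IN (SELECT Product FROM Dups);
        DROP TABLE Dups;
    """)

def dedupe_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ProductTable (Product varchar(10), Description varchar(10));
        INSERT INTO ProductTable VALUES
            ('CBPD10', 'C-Beam Prj'), ('CBPD11', 'C Proj Mk2'), ('CBPD12', 'C Proj Mk3'),
            ('CBPD10', NULL), ('CBPD14', NULL),
            ('AC5', NULL), ('AC5', NULL);
    """)
    remove_invalid_duplicates(conn)
    return conn.execute(
        "SELECT Product, Description FROM ProductTable ORDER BY Product"
    ).fetchall()
```

In the result, CBPD14 survives with its NULL description, the spurious CBPD10 is gone, and both AC5 rows are deleted because neither was a valid record, which is the "no survivors" case described above.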
Create a new table by scripting the old one out and renaming it. Also script all objects (indexes etc.) from the old table to the new one. Insert the keepers into the new table. If your database is in the bulk-logged or simple recovery model, this operation will be minimally logged. Drop the old table and then rename the new one to the old name.
The advantage of this over a delete will be that the insert can be minimally logged. Deletes do double work because not only does the data get deleted, but the delete has to be written to the transaction log. For big tables, minimally logged inserts will be much faster than deletes.
If the table is not that big, you have some downtime, and you have SQL Server Management Studio, you can put an identity field on the table using the GUI. Now you have a situation like your CTE, except that the rows themselves are truly distinct. So now you can do the following:
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 etc
This gives you the 'good' row (the one with the lowest ID) from each group of duplicates. Now you can wrap this query with a DELETE FROM query.
DELETE FROM lhs
FROM table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
and lhs.MyTempIDField not in (
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 etc
)
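The same "keep the lowest ID in each group" idea can be sketched in a runnable form. This is not SQL Server or SQL Server CE: it uses SQLite through Python's sqlite3 module, and SQLite's implicit rowid stands in for the temporary identity field, so treat it as an illustration of the technique only. The Note column and sample rows are made up.

```python
import sqlite3

def keep_first_per_product(conn):
    """Delete every row except the lowest-rowid row of each Product."""
    cur = conn.execute("""
        DELETE FROM TblProducts
        WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM TblProducts GROUP BY Product
        )
    """)
    return cur.rowcount  # number of duplicate rows removed

def keep_first_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TblProducts (Product TEXT, Note TEXT)")
    conn.executemany(
        "INSERT INTO TblProducts VALUES (?, ?)",
        [("A", "first"), ("B", "only"), ("A", "second"), ("A", "third")],
    )
    removed = keep_first_per_product(conn)
    # With the duplicates gone, Product can carry a unique index (or key).
    conn.execute("CREATE UNIQUE INDEX ux_product ON TblProducts(Product)")
    rows = conn.execute(
        "SELECT Product, Note FROM TblProducts ORDER BY Product"
    ).fetchall()
    return removed, rows
```

Because the subquery keeps one ID per group, products without duplicates are untouched and no self-join is needed.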
Try this:
DELETE FROM TblProducts
WHERE Product IN
(
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1)
This suffers from the defect that it deletes ALL the records with a duplicated Product. What you probably want to do is delete all but one of each group of records with a given Product. It might be worthwhile to copy all the duplicates to a separate table first, and then somehow remove duplicates from that table, then apply the above, and then copy remaining products back to the original table.

Normalizing a table, from one to the other

I'm trying to normalize a mysql database....
I currently have a table that contains 11 columns for "categories". The first column is a user_id and the other 10 are category_id_1 - category_id_10. Some rows may only contain a category_id up to category_id_1 and the rest might be NULL.
I then have a table that has 2 columns, user_id and category_id...
What is the best way to transfer all of the data into separate rows in table 2 without adding a row for columns that are NULL in table 1?
thanks!
You can create a single query to do all the work, it just takes a bit of copy and pasting, and adjusting the column name:
INSERT INTO table2
SELECT * FROM (
SELECT user_id, category_id_1 AS category_id FROM table1
UNION ALL
SELECT user_id, category_id_2 FROM table1
UNION ALL
SELECT user_id, category_id_3 FROM table1
) AS T
WHERE category_id IS NOT NULL;
Since you only have to do this 10 times, and you can throw the code away when you are finished, I would think that this is the easiest way.
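If the copy and pasting feels error-prone, the ten branches can also be generated. The sketch below builds the same UNION ALL statement in Python and runs it on an in-memory SQLite database instead of MySQL; the helper names and the sample rows are assumptions made for the example, while table1, table2 and the category_id_N column names come from the question.

```python
import sqlite3

N_COLUMNS = 10  # category_id_1 .. category_id_10, as in the question

def build_insert_sql(n_columns=N_COLUMNS):
    """Return one INSERT ... SELECT with a UNION ALL branch per category column."""
    selects = [
        f"SELECT user_id, category_id_{i} AS category_id FROM table1"
        for i in range(1, n_columns + 1)
    ]
    union = "\n    UNION ALL\n    ".join(selects)
    return (
        "INSERT INTO table2 (user_id, category_id)\n"
        f"SELECT user_id, category_id FROM (\n    {union}\n) AS T\n"
        "WHERE category_id IS NOT NULL"
    )

def normalize_demo():
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"category_id_{i} INTEGER" for i in range(1, N_COLUMNS + 1))
    conn.execute(f"CREATE TABLE table1 (user_id INTEGER, {cols})")
    conn.execute("CREATE TABLE table2 (user_id INTEGER, category_id INTEGER)")
    conn.execute(
        "INSERT INTO table1 (user_id, category_id_1, category_id_2) VALUES (1, 7, 9)"
    )
    conn.execute("INSERT INTO table1 (user_id, category_id_1) VALUES (2, 7)")
    conn.execute(build_insert_sql())
    return conn.execute(
        "SELECT user_id, category_id FROM table2 ORDER BY user_id, category_id"
    ).fetchall()
```

print(build_insert_sql()) shows the generated statement, which should also be usable as plain SQL outside Python.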
One table for users:
users(id, name, username, etc)
One for categories:
categories(id, category_name)
One to link the two, including any extra information you might want on that join.
categories_users(user_id, category_id)
-- or with extra information --
categories_users(user_id, category_id, date_created, notes)
To transfer the data across to the link table would be a case of writing a series of SQL INSERT statements. There's probably some awesome way to do it in one go, but since there are only 10 category columns, just copy-and-paste IMO:
INSERT INTO categories_users
SELECT user_id, 1
FROM old_categories
WHERE category_1 IS NOT NULL

Approach to a Bin Packing sql problem

I have a problem in sql where I need to generate a packing list from a list of transactions.
Data Model
The transactions are stored in a table that contains:
transaction id
item id
item quantity
Each transaction can have multiple items (and coincidentally multiple rows with the same transaction id). Each item then has a quantity from 1 to N.
Business Problem
The business requires that we create a packing list, where each line item in the packing list contains the count of each item in the box.
Each box can only contain 160 items (they all happen to be the same size/weight). Based on the total count of the order we need to split items into different boxes (sometimes splitting even the individual item's collection into two boxes)
So the challenge is to take that data schema and come up with the result set that includes how many of each item belong in each box.
I am currently brute forcing this in some not so pretty ways and wondering if anyone has an elegant/simple solution that I've overlooked.
Example In/Out
We really need to isolate how many of each item end up in each box...for example:
Order 1:
100 of item A
100 of item B
140 of item C
This should result in three rows in the result set:
Box 1: A (100), B (60)
Box 2: B (40), C (120)
Box 3: C (20)
Ideally the query would be smart enough to put all of C together, but at this point - we're not too concerned with that.
How about something like
SELECT SUM([Item quantity]) as totalItems
, SUM([Item quantity]) / 160 as totalBoxes
, MOD(SUM([Item Quantity]), 160) amountInLastBox
FROM [Transactions]
GROUP BY [Transaction Id]
Let me know what fields in the resultset you're looking for and I could come up with a better one
I was looking for something similar, and all I could achieve was expanding the rows to the number of item counts in a transaction and grouping them into bins. Not very elegant, though. Moreover, because string aggregation is still very cumbersome in SQL Server (Oracle, I miss you!), I have to leave out the last part, that is, putting the counts in one single row.
My solution is as follows:
Example transactions table:
INSERT INTO transactions
(trans_id, item, cnt) VALUES
('1','A','50'),
('2','A','140'),
('3','B','100'),
('4','C','80');
GO
Create a dummy sequence table, which contains numbers from 1 to 1000 (I assume that maximum number allowed for an item in a single transaction is 1000):
CREATE TABLE numseq (n INT NOT NULL IDENTITY) ;
GO
INSERT numseq DEFAULT VALUES ;
WHILE SCOPE_IDENTITY() < 1000 INSERT numseq DEFAULT VALUES ;
GO
Now we can generate a temporary table from transactions table, in which each transaction and item exist "cnt" times in a subquery, and then give numbers to the bins using division, and group by bin number:
SELECT bin_id, item, count(*) count_in_bin
INTO result
FROM (
SELECT t.item, ((row_number() over (order by t.item, s.n) - 1) / 160) + 1 as bin_id
FROM transactions t
INNER JOIN numseq s
ON t.cnt >= s.n -- join conditionally to repeat transaction rows "cnt" times
) a
GROUP BY bin_id, item
ORDER BY bin_id, item
GO
Result is:
bin_id item count_in_bin
1 A 160
2 A 30
2 B 100
2 C 30
3 C 50
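To check the arithmetic behind this result without a database, here is a plain Python sketch of the same idea: expand every transaction into single units, order them by item, and number the bins with integer division. The function name pack and the tuple layout are made up for the example.

```python
from collections import Counter

def pack(transactions, box_size=160):
    """transactions: iterable of (trans_id, item, cnt) tuples.

    Returns sorted (bin_id, item, count_in_bin) tuples.
    """
    # One list entry per physical unit, ordered by item (like ORDER BY t.item).
    units = sorted(item for _, item, cnt in transactions for _ in range(cnt))
    # Unit number i (0-based) lands in bin i // box_size + 1.
    counts = Counter((i // box_size + 1, item) for i, item in enumerate(units))
    return sorted((bin_id, item, n) for (bin_id, item), n in counts.items())
```

For the sample data, pack([(1, 'A', 50), (2, 'A', 140), (3, 'B', 100), (4, 'C', 80)]) gives the same five rows as the result table above.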
In Oracle, the last step would be as simple as that:
SELECT bin_id, WM_CONCAT(item || '(' || count_in_bin || ')') contents
FROM result
GROUP BY bin_id
This isn't the prettiest answer but I am using a similar method to keep track of stock items through an order process, and it is easy to understand, and may lead to you developing a better method than I have.
I would create a table called "PackedItem" or something similar. The columns would be:
packed_item_id (int) - Primary Key, Identity column
trans_id (int)
item_id (int)
box_number (int)
Each record in this table represents 1 physical unit you will ship.
Let's say someone adds a line to transaction 4 with 20 of item 12. I would add 20 records to the PackedItem table, all with the transaction ID, the item ID, and a NULL box number. If a line is updated, you need to add or remove records from the PackedItem table so that there is always a 1:1 correlation.
When the time comes to ship, you can simply
SELECT TOP 160 * FROM PackedItem WHERE trans_id = 4 AND box_number IS NULL
and set the box_number on those records to the next available box number, until no records remain where the box_number is NULL. This is possible using one fairly complicated UPDATE statement inside a WHILE loop - which I don't have the time to construct fully.
You can now easily get your desired packing list by querying this table as follows:
SELECT box_number, item_id, COUNT(*) AS Qty
FROM PackedItem
WHERE trans_id = 4
GROUP BY box_number, item_id
Advantages - easy to understand, fairly easy to implement.
Pitfalls - if the table gets out of sync with the lines on the Transaction, the final result can be wrong; This table will get many records in it and will be extra work for the server. Will need each ID field to be indexed to keep performance good.
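The box assignment loop that was left unwritten above could look roughly like this. It is a sketch on SQLite through Python's sqlite3 module, so TOP 160 becomes a LIMIT inside a subquery and the WHILE loop lives in Python. It assumes box numbers start at 1 for each transaction and that units are boxed in packed_item_id order; assign_boxes and packing_demo are invented names.

```python
import sqlite3

BOX_SIZE = 160

def assign_boxes(conn, trans_id, box_size=BOX_SIZE):
    """Fill boxes of box_size units until no unboxed units remain."""
    box_number = 0
    while True:
        cur = conn.execute(
            """
            UPDATE PackedItem SET box_number = ?
            WHERE packed_item_id IN (
                SELECT packed_item_id FROM PackedItem
                WHERE trans_id = ? AND box_number IS NULL
                ORDER BY packed_item_id LIMIT ?
            )
            """,
            (box_number + 1, trans_id, box_size),
        )
        if cur.rowcount == 0:  # nothing left to box
            break
        box_number += 1
    return box_number  # number of boxes used

def packing_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE PackedItem (
            packed_item_id INTEGER PRIMARY KEY,
            trans_id INTEGER, item_id INTEGER, box_number INTEGER
        )
    """)
    # The question's example order: 100 of item 1, 100 of item 2, 140 of item 3.
    for item_id, qty in [(1, 100), (2, 100), (3, 140)]:
        conn.executemany(
            "INSERT INTO PackedItem (trans_id, item_id) VALUES (?, ?)",
            [(4, item_id)] * qty,
        )
    boxes = assign_boxes(conn, 4)
    rows = conn.execute("""
        SELECT box_number, item_id, COUNT(*) AS Qty
        FROM PackedItem WHERE trans_id = 4
        GROUP BY box_number, item_id
        ORDER BY box_number, item_id
    """).fetchall()
    return boxes, rows
```

With that order, the packing list comes out as three boxes: 100 + 60, 40 + 120, and 20, matching the example in the question.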

Insert results of subquery into table with a constant

The outline of the tables in question is as follows:
I have a table, let's call it join, that has two columns, both foreign keys to other tables. Let's call the two columns userid and buildingid, so join looks like
+--------------+
| join |
|--------------|
|userid |
|buildingid |
+--------------+
I basically need to insert a bunch of rows into this table. Each user will be assigned to multiple buildings by having multiple entries in this table. So user 13 might be assigned to buildings 1, 2, and 3 by the following
13 1
13 2
13 3
I'm trying to figure out how to do this in a query if the building numbers are constant, that is, I'm assigning a group of people to the same buildings. Basically, (this is wrong) I want to do
insert into join (userid, buildingid) values ((select userid from users), 1)
Does that make sense? I've also tried using
select 1
The error I'm running into is that the subquery returns more than one result. I also attempted to create a join, basically with a static select query that was also unsuccessful.
Any thoughts?
Thanks,
Chris
Almost! When you want to insert the values returned by a query, don't try to put them in the values clause. insert can take a select in place of the values clause!
insert into join (userid, buildingid)
select userid, 1 from users
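Here is a runnable sketch of that INSERT ... SELECT with a constant, using Python's sqlite3 module. Since join is a reserved word, the table name is quoted in the SQL. The loop over several building ids and the sample user ids are additions for illustration, as is the function name assign_all_users.

```python
import sqlite3

def assign_all_users(conn, building_ids):
    """Give every user in users one row per building id."""
    for building_id in building_ids:
        conn.execute(
            'INSERT INTO "join" (userid, buildingid) SELECT userid, ? FROM users',
            (building_id,),
        )

def assign_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY)")
    conn.execute('CREATE TABLE "join" (userid INTEGER, buildingid INTEGER)')
    conn.executemany("INSERT INTO users (userid) VALUES (?)", [(13,), (14,)])
    assign_all_users(conn, [1, 2, 3])
    return conn.execute(
        'SELECT userid, buildingid FROM "join" ORDER BY userid, buildingid'
    ).fetchall()
```

The constant is passed as a bound parameter here, but a literal 1 in the SELECT list, as in the statement above, behaves the same way.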
Also, in the spirit of learning more, you can create a table that doesn't exist by using the following syntax:
select userid, 1 as buildingid
into join
from users
That only works if the table doesn't already exist, but it's a quick and dirty way to create table copies!