Trying to do a SQL MERGE statement, but it errors

I've got a stored procedure that accepts batches of product info. For each product, it should either insert it into the DB or update it if it already exists.
A product is defined by a composite key -> ProductCompanyId (where the product came from) and a ProductId (the unique Id, per company).
I'm trying to do a MERGE query. It works perfectly, until the batch has the same composite key in it more than once.
For example, let's imagine I have 10 products. 2 of these have the same composite key but different prices.
I thought the first 'row' would be inserted, with the 2nd row being an update.
Here is some full repro SQL code to show you what I've done.
It's far too hard to try and make sure there are only unique composite keys per batch. So, is there anything I can do?
I'm using SQL Server 2012, so I'm not sure if that's an issue.

Instead of using #MergeData directly as the source of your MERGE statement, you could (should) rewrite it to use a subquery or CTE that filters out the duplicate rows and chooses the correct one to be used in the MERGE:
WITH CTE_MergeData AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ProductCompanyId, ProductId ORDER BY ProductId DESC) AS RN -- find some better order
    FROM #MergeData
)
MERGE INTO #SomeTable T
USING (SELECT * FROM CTE_MergeData WHERE RN = 1) S
    ON T.ProductCompanyId = S.ProductCompanyId AND T.ProductId = S.ProductId
WHEN MATCHED THEN
    UPDATE
    SET T.Name = S.Name,
        T.Price = S.Price
WHEN NOT MATCHED THEN
    INSERT (ProductCompanyId, ProductId, Name, Price)
    VALUES (S.ProductCompanyId, S.ProductId, S.Name, S.Price)
OUTPUT S.ProductCompanyId, S.ProductId, Inserted.Id INTO #MergeProductsResults;
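A quick way to sanity-check which duplicate the RN = 1 filter keeps is to run the ranking on its own. The sketch below is purely illustrative: the demo table, the hypothetical RowSequence column (standing in for whatever marks the "newest" row in a real batch, such as a timestamp or batch position) and the sample rows are not part of the original code.

-- Hypothetical staging table; RowSequence marks the position of a row within the batch
CREATE TABLE #MergeDataDemo
(
    ProductCompanyId int,
    ProductId        int,
    Name             nvarchar(100),
    Price            decimal(10, 2),
    RowSequence      int
);

INSERT INTO #MergeDataDemo VALUES
    (1, 100, 'Widget', 9.99, 1),
    (1, 100, 'Widget', 8.99, 2);  -- same composite key, later row in the batch

-- Ordering by RowSequence DESC keeps the last occurrence of each composite key
SELECT *
FROM
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ProductCompanyId, ProductId
                              ORDER BY RowSequence DESC) AS RN
    FROM #MergeDataDemo
) d
WHERE RN = 1;

Whatever column actually identifies the "newest" row in the real batch should go where RowSequence is used here.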


store select result in variable and use in where condition

I have 3 SQL queries that I put together using UNION.
The first 2 queries return unique Order IDs, but the third query repeats Order IDs from the first 2. I need to exclude those results in query 3.
Example:
QUERY 1:
SELECT DISTINCT
    ORDER_ID,
    PRODUCT
FROM
    ORDERS
WHERE TYPE = 'A'
query 1 sample data {12121212,13131313}
QUERY 2:
SELECT DISTINCT
    ORDER_ID,
    PRODUCT
FROM
    ORDERS
WHERE CATEGORY = 'X'
Query 2 sample data {14141414,15151515}
QUERY 3:
SELECT DISTINCT
    ORDER_ID,
    PRODUCT
FROM
    ORDERS
WHERE TYPE = 'C'
Query 3 sample data {17171717,12121212,14141414}
So query 3 repeats data from queries 1 and 2. The real data is much larger. What I am trying to do is use the results of the first 2 queries to exclude those rows from query 3:
query1
union
query2
union
SELECT DISTINCT
ORDER_ID,
PRODUCT
FROM
ORDERS
WHERE TYPE = 'C'
AND
ORDER_ID NOT IN (Variable1 = ORDER_ID IN QUERY1, Variable2 = ORDER_ID IN QUERY2)
Desired data {12121212,13131313,14141414,15151515,16161616,17171717}
How can I store variables to use in the WHERE condition?
Appreciate any help.
It seems (though it is really hard to guess with any certainty) that you want to do hierarchical data gathering, where data is selected from the first available source (from a prioritized list) that has it, and later sources get ignored.
Best of all, I think, would be a persistent table holding it all:
ORDER_ID, PRODUCT, Source_Priority
Such a table would track up-to-date data from all the sources (for example, by an AFTER UPDATE OR INSERT OR DELETE trigger on ORDERS and the other sources), and then your query would be reformulated as "query the new table, get the rows with the maximum-per-ORDER_ID value of Source_Priority", which has been asked many times and is solved by one JOIN in Firebird 2 or by window functions in Firebird 3.
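As a hedged sketch only (the table name PRIORITIZED_ORDERS is an assumption, and a larger Source_Priority is taken to mean a more preferred source, matching the "maximum per ORDER_ID" wording above), the Firebird 3 window-function form could look like this:

SELECT ORDER_ID, PRODUCT
FROM (
    SELECT ORDER_ID, PRODUCT,
           ROW_NUMBER() OVER (PARTITION BY ORDER_ID
                              ORDER BY Source_Priority DESC) AS RN
    FROM PRIORITIZED_ORDERS
) RANKED
WHERE RN = 1;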
If you do not want to have a dedicated persistent prioritized table, you can do it using:
a Global Temporary Table to do the sorting
https://firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-ddl-tbl.html#fblangref25-ddl-tbl-gtt
the MERGE command to do conditional inserts (clearing the table before it)
https://firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-dml-merge.html
a SELECT to read the data from the prepared GTT
If you want to pretend that this is one query, not 2+N, then you can hide the statements inside
a named Stored Procedure or an anonymous EXECUTE BLOCK https://firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-dml-execblock.html
Something like this:
CREATE GLOBAL TEMPORARY TABLE TMP_ORDERS
(  ORDER_ID integer PRIMARY KEY,
   PRODUCT VarChar(10) );
And then replace your "one true query" with this sequence (maybe hidden inside an EXECUTE BLOCK if need be):
DELETE FROM TMP_ORDERS;
INSERT INTO TMP_ORDERS
SELECT DISTINCT
    ORDER_ID,
    PRODUCT
FROM
    ORDERS
WHERE TYPE = 'A';
MERGE INTO TMP_ORDERS AS Dest
USING (
    SELECT DISTINCT
        ORDER_ID,
        PRODUCT
    FROM
        ORDERS
    WHERE CATEGORY = 'X'
) AS Src
ON Dest.ORDER_ID = Src.ORDER_ID
WHEN NOT MATCHED THEN
    INSERT (ORDER_ID, PRODUCT) VALUES (Src.ORDER_ID, Src.PRODUCT);
MERGE INTO TMP_ORDERS AS Dest
USING (
    SELECT DISTINCT
        ORDER_ID,
        PRODUCT
    FROM
        ORDERS
    WHERE TYPE = 'C'
) AS Src
ON Dest.ORDER_ID = Src.ORDER_ID
WHEN NOT MATCHED THEN
    INSERT (ORDER_ID, PRODUCT) VALUES (Src.ORDER_ID, Src.PRODUCT);
SELECT * FROM TMP_ORDERS;
P.S. But why did your partial queries even contain DISTINCT? Isn't ORDER_ID already your primary key there, which can never be non-distinct to start with?
You can try this:
select * from
(SELECT DISTINCT
ORDER_ID,
PRODUCT
FROM
ORDERS
where TYPE='A') as q1,
(SELECT DISTINCT
ORDER_ID,
PRODUCT
FROM
ORDERS
WHERE CATEGORY='X'
) as q2,
(SELECT DISTINCT
ORDER_ID,
PRODUCT
FROM
ORDERS
where TYPE='C') as q3
WHERE q1.order_id=q2.order_id

How to make all rows in a table identical with the exception of 1 field?

I am trying to make it so all the users have the same items because I am doing an experiment with my app and need the experimental control of flattened data.
I used the following SQL statement in my last attempt:
insert into has (email,id,qty,price,item_info,image)
select 'b#localhost.com',id,qty,price,item_info,image
from
(
select * from has
where email != 'b#localhost.com'
) as o
where o.id not in
(
select id from has
where email = 'b#localhost.com'
);
This should add all items which 'b#localhost.com' does not already have but other users do have, to 'b#localhost.com's inventory. (the 'has' table)
However, I get the following error:
The statement was aborted because it would have caused a duplicate key value in a unique or primary key constraint or unique index
I understand what this error means, but I do not understand why it is occurring. My statement inserts all records which that email doesn't already have, so there should not be any duplicate id/email pairs.
The database structure is shown below, circles are attributes, squares are tables, and diamonds are relationship sets. The HAS table is a relationship set on USER_ACCOUNT and ITEM, where the primary keys are email and id respectively.
Please try the following...
INSERT INTO has ( email,
id,
qty,
price,
item_info,
image )
SELECT a.email,
b.id,
a.qty,
a.price,
a.item_info,
a.image
FROM has a
CROSS JOIN has b
JOIN
(
SELECT email,
id
FROM has
) c ON a.email = c.email AND
b.id <> c.id;
The CROSS JOIN appends a copy of each row of has to each row of has. Please see http://www.w3resource.com/sql/joins/cross-join.php for more information on CROSS JOINs.
Each row of the resulting dataset will have two id fields, two email fields, two qty fields, etc. By giving each instance of has an alias we create the fields a.id, b.id, a.email, etc.
We then compare each combination of "old" email address and "new" id to the list of existing email/id combinations, and insert the old values, with the new id replacing the old one, into has.
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
http://www.w3resource.com/sql/joins/cross-join.php for more information on CROSS JOINs
https://www.w3schools.com/sql/sql_in.asp for more information on WHERE's IN operator
https://www.w3schools.com/sql/sql_groupby.asp for more information on GROUP BY
I think the issue here is not that the code is trying to insert something which already exists, but that it's trying to insert more than one thing with the same PK. In lieu of a response to my comment, here is one way to get around the issue:
INSERT INTO has (email,id,qty,price,item_info,image)
SELECT
'b#localhost.com',
source.id,
source.qty,
source.price,
source.item_info,
source.image
FROM
(
SELECT email, id, qty, price, item_info, image FROM has
) as source
JOIN
(
SELECT min(email) as min_email, id FROM has GROUP BY id
) as filter ON
filter.min_email = source.email
WHERE
source.id not in
(
SELECT id from has WHERE email = 'b#localhost.com'
);
The key difference from your original code is my extra join to the subquery I've aliased as filter. This limits you to inserting the has details from a single email per id. There are other ways to do the same, but I figured that this would be a safe bet for being supported by Derby.
I removed the WHERE clause from the source sub-query as that is handled by the final WHERE.
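To confirm that diagnosis before running the INSERT, a quick check along these lines (column names taken from the question; the query itself is an assumption, not part of the original answer) lists the ids that are supplied by more than one other email and would therefore collide on the primary key:

SELECT id, COUNT(DISTINCT email) AS supplying_emails
FROM has
WHERE email <> 'b#localhost.com'
GROUP BY id
HAVING COUNT(DISTINCT email) > 1;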

How to modify query to walk entire table rather than a single

I wrote several SQL queries and executed them against my table. Each individual query worked. I kept adding functionality until I got a really ugly working query. The problem is that I have to manually change a value every time I want to use it. Can you assist in making this query automatic rather than “manual”?
I am working with DB2.
The table below shows customers (cid) from 1 to 3. 'club' is a book seller, and 'qnty' is the number of books the customer bought from each 'club'. The full table has 45 customers.
The image below shows all the table elements for the first 3 users (cid=1 OR cid=2 OR cid=3). The final purpose of all my queries (once combined) is to find the single 'club' with the largest 'qnty' for each 'cid'. So for 'cid=1' the 'club' is Readers Digest with a 'qnty' of 3. For 'cid=2' the 'club' is YRB Gold with a 'qnty' of 5. And so on until cid 45 is reached.
To give you a background on what I did here are my queries:
(Query 1-starting point for cid=1)
SELECT * FROM yrb_purchase WHERE cid=1
(Query 2 - find the 'club' with the highest 'qnty' for cid=1)
SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC
(Query 3 – combine the record from the above query with its cid)
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club
(Query 4 – make sure there is only one record for cid=1)
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club FETCH FIRST ROWS ONLY
To get the 'club' with the highest 'qnty' for customer 2, I would simply change the text cid=1 to cid=2 in the last query above. My query seems to always produce the correct results. My question is, how do I modify my query to get the results for all 'cid's from 1 to 45 in a single table? How do I get a table with all the cid values, along with the club which sold each cid the most books and how many books were sold, all within one table? Please keep in mind I am hoping you can modify my query as opposed to providing a better query.
If you decide that my query is way too ugly (I agree with you) and choose to provide another query, please be aware that I just started learning SQL and may not be able to understand your query. You should be aware that I already asked this question: "For common elements, how to find the value based on two columns? SQL", but I was not able to make the answer work (due to my SQL limitations, not because the answer wasn't good); and in the absence of a working answer I could not reverse engineer it to understand how it works.
Thanks in advance
****************************EDIT #1*******************************************
The results of the answer are:
You could use OLAP/Window Functions to achieve this:
SELECT
cid,
club,
qnty
FROM
(
SELECT
cid,
club,
qnty,
ROW_NUMBER() OVER (PARTITION BY cid order by qnty desc) as cid_club_rank
FROM
(
SELECT
cid,
club,
sum(qnty) as qnty
FROM yrb_purchase
GROUP BY cid, club
) as sub1
) as sub2
WHERE cid_club_rank = 1
The innermost statement (sub1) just grabs a total quantity for each cid/club combination. The next level (sub2) assigns a row number to each club within a cid, ordering by the quantity from highest to lowest. The outermost query then keeps only the records where that row number is 1.
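One possible refinement, not part of the original answer: if two clubs tie on quantity for the same cid and you want to keep both, RANK() can be swapped in for ROW_NUMBER(), since RANK() gives tied rows the same rank. A hedged sketch:

SELECT cid, club, qnty
FROM (
    SELECT cid, club, qnty,
           RANK() OVER (PARTITION BY cid ORDER BY qnty DESC) AS cid_club_rank
    FROM (
        SELECT cid, club, SUM(qnty) AS qnty
        FROM yrb_purchase
        GROUP BY cid, club
    ) AS sub1
) AS sub2
WHERE cid_club_rank = 1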

How can I SUM records from one table to another after multiplying two columns

I have a table called orderItems which has two columns, quantity and unit price. It also has a foreign key, ordernumber, in that same table.
I have another table called ordergroup with primary key ordernumber, which contains a SavedTotal column: the order total based on quantity * unit price for all order item rows that reference that ordernumber.
What I struggle with is the SQL query that can get all order items for a given ordernumber and calculate the total cost.
I have managed to do the multiplication but I am missing the SUM; here is my SQL query (SQL Server) so far:
UPDATE OrderGroupNew
set OrderGroupNew.SavedTotal = OrderItemNew.UnitPrice*OrderItemNew.QUANTITY
FROM OrderItemNew
inner join OrderGroupNew on OrderItemNew.OrderNumber=OrderGroupNew.OrderNumber
Any help is appreciated.
UPDATE OrderGroupNew
SET SavedTotal = (
SELECT SUM(UnitPrice * Quantity)
FROM OrderItemNew
WHERE OrderNumber = OrderGroupNew.OrderNumber
)
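One caveat worth adding (an assumption about the data, not something stated in the question): if an order has no rows in OrderItemNew, the correlated subquery above returns NULL and SavedTotal is set to NULL. Wrapping the subquery in COALESCE keeps such totals at zero:

UPDATE OrderGroupNew
SET SavedTotal = COALESCE(
    (
        SELECT SUM(UnitPrice * Quantity)
        FROM OrderItemNew
        WHERE OrderNumber = OrderGroupNew.OrderNumber
    ), 0)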
You can use a CTE as well:
;WITH o AS (
    SELECT OrderItemNew.OrderNumber AS OrderNumber,
           SUM(OrderItemNew.UnitPrice * OrderItemNew.QUANTITY) AS OrderSum
    FROM OrderItemNew
    GROUP BY OrderItemNew.OrderNumber
)
UPDATE OrderGroupNew
SET OrderGroupNew.SavedTotal = o.OrderSum
FROM o
INNER JOIN OrderGroupNew ON o.OrderNumber = OrderGroupNew.OrderNumber
Well, the 1st answer is correct too. Choose the best in terms of performance :) (don't know which will be the best, to be honest!)

Is this the optimum query for what I'm trying to do in MySQL?

I'm running a number of queries that merge constantly-changing data into a master table and one of the queries (below) seems to be running quite slowly.
The setup is as follows: the products table and the products_temp table have identical structures. New data goes into the products_temp table, then I run queries similar to the one below to merge the new data with the master products table.
INSERT INTO products ( name, brand, price, feeds_id, img_url, referral_url, productid, isbn, ean, upc )
SELECT name, brand, price, feeds_id, img_url, referral_url, productid, isbn, ean, upc
FROM products_temp
WHERE feeds_id = 449
AND productid NOT IN (
SELECT productid
FROM products
WHERE feeds_id = 449
)
Both of these tables have indexes on feeds_id, but I have a feeling that isn't making any difference.
As an example, products may contain over 3.5 million rows and products_temp may contain 50,000 rows to merge.
So my question really is how long should that take? How quick can I make it?
It is; this technique is called the shadow table trick.
You may drop the index on feeds_id and add a unique key on (feeds_id, productid) in the main table. That way you will be able to use INSERT IGNORE for merging. Pay attention to the order of fields in the index: feeds_id must come first, so that a search by feeds_id can still use this index.
NOT IN may cause a slowdown. Depending on what's inside the parentheses, the query may get stuck in the 'preparing' state.
If you still experience slowdowns, use EXPLAIN or the profiling feature.
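A minimal sketch of that approach, assuming the table and column names from the question and that no conflicting duplicates already exist in products (both assumptions):

-- One-time change: composite unique key with feeds_id first, so feeds_id lookups can still use it
ALTER TABLE products ADD UNIQUE KEY uq_feed_product (feeds_id, productid);

-- Merge step: rows whose (feeds_id, productid) pair already exists are silently skipped
INSERT IGNORE INTO products ( name, brand, price, feeds_id, img_url, referral_url, productid, isbn, ean, upc )
SELECT name, brand, price, feeds_id, img_url, referral_url, productid, isbn, ean, upc
FROM products_temp
WHERE feeds_id = 449;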
Try refactoring the query and setting it up as a LEFT JOIN, checking for NULL on the right side:
INSERT INTO products ( name, brand, price, feeds_id,
img_url, referral_url, productid, isbn, ean, upc )
SELECT A.name, A.brand, A.price,
A.feeds_id, A.img_url, A.referral_url,
A.productid, A.isbn, A.ean, A.upc
FROM
(SELECT * FROM products_temp A WHERE feeds_id = 449) A
LEFT JOIN
(SELECT productid FROM products WHERE feeds_id = 449) B
USING (productid)
WHERE B.productid IS NULL;
Also make sure you have this index:
ALTER TABLE products_temp ADD INDEX feeds_id (feeds_id);
You should generally avoid WHERE x NOT IN (SELECT ...). The MySQL query optimizer handles subqueries poorly and may, for example, ignore indexes.
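If you want to check what the optimizer actually does with either version on your own data, prefixing the SELECT part with EXPLAIN shows the chosen indexes (a standard MySQL facility; the exact output columns vary by version):

EXPLAIN
SELECT A.productid
FROM (SELECT * FROM products_temp WHERE feeds_id = 449) A
LEFT JOIN (SELECT productid FROM products WHERE feeds_id = 449) B
    USING (productid)
WHERE B.productid IS NULL;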