Help with MySQL statement - sql

I have written the following SQL statement in MySQL:
USE my_database;
SELECT * FROM some_table WHERE some_column IN (1, 2, 3);
This returns a set of rows that have a column value which is a key into a row of another table (call it some_other_table).
a b c d <--this is the column with the key
1
2
3
For each of those key values, I want to look up the matching rows in the other table and do something to them (for example, null out some column).
Any help is appreciated.

Yes, you can use the multiple-table UPDATE syntax:
UPDATE some_other_table
JOIN some_table ON (some_table.some_key = some_other_table.id)
SET some_other_table.some_field = NULL
WHERE some_table.some_column IN (1, 2, 3);
Example:
CREATE TABLE some_table (id int, some_column int, some_key int);
CREATE TABLE some_other_table (id int, some_field int);
INSERT INTO some_table VALUES (1, 1, 1);
INSERT INTO some_table VALUES (2, 2, 2);
INSERT INTO some_table VALUES (3, 3, 3);
INSERT INTO some_table VALUES (4, 4, 4);
INSERT INTO some_table VALUES (5, 5, 5);
INSERT INTO some_other_table VALUES (1, 10);
INSERT INTO some_other_table VALUES (2, 20);
INSERT INTO some_other_table VALUES (3, 30);
INSERT INTO some_other_table VALUES (4, 40);
Before:
SELECT * FROM some_table;
+------+-------------+----------+
| id | some_column | some_key |
+------+-------------+----------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
+------+-------------+----------+
5 rows in set (0.00 sec)
SELECT * FROM some_other_table;
+------+------------+
| id | some_field |
+------+------------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 4 | 40 |
+------+------------+
4 rows in set (0.00 sec)
After:
SELECT * FROM some_table;
+------+-------------+----------+
| id | some_column | some_key |
+------+-------------+----------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
+------+-------------+----------+
5 rows in set (0.00 sec)
SELECT * FROM some_other_table;
+------+------------+
| id | some_field |
+------+------------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
| 4 | 40 |
+------+------------+
4 rows in set (0.00 sec)
UPDATE: Further to comments below.
Another example:
CREATE TABLE amir_effective_reference (class int, inst int, rln int, rclass int, rinst int, chg int, typ int);
CREATE TABLE amir_effective_change (chg int, txn int, rltn int, entry int, effective int);
INSERT INTO amir_effective_reference VALUES (1, 100, 1, 50, 20, 10, 5000);
INSERT INTO amir_effective_change VALUES (10, 100, 100, 500, 200);
Result:
UPDATE amir_effective_change
JOIN amir_effective_reference ON (amir_effective_reference.chg = amir_effective_change.chg)
SET amir_effective_change.effective = NULL
WHERE amir_effective_change.rltn IN (100);
SELECT * FROM amir_effective_change;
+------+------+------+-------+-----------+
| chg | txn | rltn | entry | effective |
+------+------+------+-------+-----------+
| 10 | 100 | 100 | 500 | NULL |
+------+------+------+-------+-----------+
1 row in set (0.00 sec)
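The JOIN form of UPDATE shown above is not standard SQL, so it may be worth knowing that the same change can usually be written with an IN subquery. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and reuses the table and column names from the first example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (id int, some_column int, some_key int);
    CREATE TABLE some_other_table (id int, some_field int);
    INSERT INTO some_table VALUES (1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5);
    INSERT INTO some_other_table VALUES (1,10),(2,20),(3,30),(4,40);
""")

# Subquery form: null out some_field for every row whose id is referenced
# by a some_table row with some_column in (1, 2, 3).
conn.execute("""
    UPDATE some_other_table
    SET some_field = NULL
    WHERE id IN (SELECT some_key
                 FROM some_table
                 WHERE some_column IN (1, 2, 3))
""")
conn.commit()

rows = conn.execute(
    "SELECT id, some_field FROM some_other_table ORDER BY id"
).fetchall()
print(rows)
```

The UPDATE statement inside the sketch should also be accepted by MySQL as written, because the subquery reads from a different table than the one being updated.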

Related

SQL Eliminate Duplicates whilst merging additional table

I have two tables, ADDRESSES and an additional table CONTACTS. Each row in CONTACTS has a SUPERID, which is the ID of the ADDRESSES row it belongs to.
I want to identify duplicates (same last name, first name and birthday) in the ADDRESSES table and merge the contacts of these duplicates onto the latest address (latest DATECREATE or highest ID).
Afterwards the other duplicates should be deleted.
My approach for merging the contacts does not work, though. Deleting the duplicates works.
This is my approach. I would be grateful for help finding what is wrong here.
Thank you!
UPDATE dbo.CONTACTS
SET SUPERID = ADDRESSES.ID FROM dbo.ADDRESSES
inner join CONTACTS on ADDRESSES.ID = CONTACTS.SUPERID
WHERE ADDRESSES.id in (
SELECT id FROM dbo.ADDRESSES
WHERE EXISTS(
SELECT NULL FROM ADDRESSES AS tmpcomment
WHERE dbo.ADDRESSES.FIRSTNAME0 = tmpcomment.FIRSTNAME0
AND dbo.ADDRESSES.LASTNAME0 = tmpcomment.LASTNAME0
and dbo.ADDRESSES.BIRTHDAY1 = tmpcomment.BIRTHDAY1
HAVING dbo.ADDRESSES.id > MIN(tmpcomment.id)
))
DELETE FROM ADDRESSES
WHERE id in (
SELECT id FROM dbo.ADDRESSES
WHERE EXISTS(
SELECT NULL FROM ADDRESSES AS tmpcomment
WHERE dbo.ADDRESSES.FIRSTNAME0 = tmpcomment.FIRSTNAME0
AND dbo.ADDRESSES.LASTNAME0 = tmpcomment.LASTNAME0
and dbo.ADDRESSES.BIRTHDAY1 = tmpcomment.BIRTHDAY1
HAVING dbo.ADDRESSES.id > MIN(tmpcomment.id)
)
)
Here is a sample for understanding the issue.
ADDRESSES
| ID | DATECREATE | LASTNAME0 | FIRSTNAME0 | BIRTHDAY1 |
|:-----------|------------:|:------------:|------------:|:------------:|
| 1 | 19.07.2011 | Arthur | James | 05.05.1980 |
| 2 | 23.08.2012 | Arthur | James | 05.05.1980 |
| 3 | 11.12.2015 | Arthur | James | 05.05.1980 |
| 4 | 22.10.2016 | Arthur | James | 05.05.1980 |
| 6 | 20.12.2014 | Doyle | Peter | 01.01.1950 |
| 7 | 09.01.2016 | Doyle | Peter | 01.01.1950 |
|:-----------|------------:|:------------:|------------:|:------------:|
CONTACTS
| ID | SUPERID |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 4 |
| 7 | 4 |
| 8 | 6 |
| 9 | 6 |
| 10 | 6 |
| 11 | 7 |
The result shall be like this
ADDRESSES
| ID | DATECREATE | LASTNAME0 | FIRSTNAME0 | BIRTHDAY1 |
|:-----------|------------:|:------------:|------------:|:------------:|
| 4 | 22.10.2016 | Arthur | James | 05.05.1980 |
| 7 | 09.01.2016 | Doyle | Peter | 01.01.1950 |
CONTACTS
| ID | SUPERID |
| 1 | 4 |
| 2 | 4 |
| 3 | 4 |
| 4 | 4 |
| 5 | 4 |
| 6 | 4 |
| 7 | 4 |
| 8 | 7 |
| 9 | 7 |
| 10 | 7 |
| 11 | 7 |
My approach would use a temporary table:
/*
CREATE TABLE addresses
([ID] int, [DATECREATE] varchar(10), [LASTNAME0] varchar(6), [FIRSTNAME0] varchar(5), [BIRTHDAY1] datetime);
INSERT INTO addresses
([ID], [DATECREATE], [LASTNAME0], [FIRSTNAME0], [BIRTHDAY1])
VALUES
(1, '19.07.2011', 'Arthur', 'James', '1980-05-05 00:00:00'),
(2, '23.08.2012', 'Arthur', 'James', '1980-05-05 00:00:00'),
(3, '11.12.2015', 'Arthur', 'James', '1980-05-05 00:00:00'),
(4, '22.10.2016', 'Arthur', 'James', '1980-05-05 00:00:00'),
(6, '20.12.2014', 'Doyle', 'Peter', '1950-01-01 00:00:00'),
(7, '09.01.2016', 'Doyle', 'Peter', '1950-01-01 00:00:00');
CREATE TABLE contacts
([ID] int, [SUPERID] int);
INSERT INTO contacts
([ID], [SUPERID])
VALUES
(1, 1),
(2, 1),
(3, 2),
(4, 2),
(5, 3),
(6, 4),
(7, 4),
(8, 6),
(9, 6),
(10, 6),
(11, 7);
*/
DROP TABLE IF EXISTS #t; --SQL Server 2016+ only; on older versions use IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
SELECT id as oldid, MAX(id) OVER(PARTITION BY lastname0, firstname0, birthday1) as newid INTO #t
FROM
addresses;
/*now #t contains data like
1, 4
2, 4
3, 4
4, 4
6, 7
7, 7*/
--remove the ones we don't need to change
DELETE FROM #t WHERE oldid = newid;
BEGIN TRANSACTION;
SELECT * FROM addresses;
SELECT * FROM contacts;
--now #t is the list of contact changes we need to make, so make those changes
UPDATE contacts
SET contacts.superid = #t.newid
FROM
contacts INNER JOIN #t ON contacts.superid = #t.oldid;
--now scrub the old addresses with no contact records. This catches all such records, not just those in #t
--(if contacts.superid can be NULL, add WHERE superid IS NOT NULL to the subquery, otherwise NOT IN matches nothing)
DELETE FROM addresses WHERE id NOT IN (SELECT DISTINCT superid FROM contacts);
--alternative that only cleans up the records affected by this operation; pick one of the two DELETEs
DELETE FROM addresses WHERE id IN (SELECT oldid FROM #t);
SELECT * FROM addresses;
SELECT * FROM contacts;
ROLLBACK TRANSACTION;
Please note: I have tested this and it produces the results you want, but I advise caution when copying an update/delete query off the internet and running it. I've wrapped the changes in a transaction that selects the data before and after and then rolls back, so nothing gets wrecked. Run it on a test db first, though!
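For anyone who wants to try the idea without a SQL Server instance, here is a rough sketch of the same three steps (build the old-to-new id map, re-point the contacts, delete the merged addresses). It is an illustration under assumptions, not the query above: it runs on Python's built-in sqlite3 module, swaps the window function for a GROUP BY join, uses a TEMP table named t in place of #t, and leaves out the DATECREATE column because the highest ID decides which address survives.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (id int, lastname0 text, firstname0 text, birthday1 text);
    CREATE TABLE contacts (id int, superid int);
    INSERT INTO addresses VALUES
        (1,'Arthur','James','1980-05-05'), (2,'Arthur','James','1980-05-05'),
        (3,'Arthur','James','1980-05-05'), (4,'Arthur','James','1980-05-05'),
        (6,'Doyle','Peter','1950-01-01'),  (7,'Doyle','Peter','1950-01-01');
    INSERT INTO contacts VALUES
        (1,1),(2,1),(3,2),(4,2),(5,3),(6,4),(7,4),(8,6),(9,6),(10,6),(11,7);

    -- Map every duplicate address (oldid) to the surviving one (newid).
    CREATE TEMP TABLE t AS
    SELECT a.id AS oldid, m.newid
    FROM addresses AS a
    JOIN (SELECT lastname0, firstname0, birthday1, MAX(id) AS newid
          FROM addresses
          GROUP BY lastname0, firstname0, birthday1) AS m
      ON  m.lastname0 = a.lastname0
      AND m.firstname0 = a.firstname0
      AND m.birthday1 = a.birthday1
    WHERE a.id <> m.newid;

    -- Re-point contacts, then delete the addresses that were merged away.
    UPDATE contacts
    SET superid = (SELECT newid FROM t WHERE t.oldid = contacts.superid)
    WHERE superid IN (SELECT oldid FROM t);

    DELETE FROM addresses WHERE id IN (SELECT oldid FROM t);
""")

address_ids = [r[0] for r in conn.execute("SELECT id FROM addresses ORDER BY id")]
contacts = conn.execute("SELECT id, superid FROM contacts ORDER BY id").fetchall()
print(address_ids)
print(contacts)
```

Because the database lives in memory, nothing persists after the script ends, which plays the same protective role as the ROLLBACK in the original.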

SQL count occurrences of values grouped by external tables references

What is the best approach in terms of performance and maintainability to count the number of occurrences of the same value in a table, grouping the results with the same reference that groups the entries of the table?
Let's say I have three tables (the concepts have been simplified to represent a scenario similar to the one I'm working on):
|----------| |----------------| |-----------------------------------|
| MEAL | | RECIPE | | INGREDIENT_ENTRY |
|----------| |----------------| |-----------------------------------|
| ID | ... | | ID | ID_m | ...| | ID | ID_r | amount and description|
|----------| |----------------| |-----------------------------------|
| 1 | ... | | 1 | 1 | ...| | 1 | 1 | '15gr of yeast' |
| 2 | ... | | 2 | 2 | ...| | 2 | 4 | '2 eggs' |
| 3 | ... | | 3 | 3 | ...| | 3 | 1 | '300cl of water' |
| 4 | ... | | 4 | 4 | ...| | 4 | 2 | '300cl of beer' |
|----------| | 5 | 1 | ...| | 5 | 3 | '250cl of milk' |
| 6 | 4 | ...| | 6 | 5 | '100gr of biscuits' |
| 7 | 5 | ...| | 7 | 2 | '15gr of yeast' |
| 8 | 6 | ...| | 8 | 1 | '500gr of flour' |
|----------------| | 9 | 2 | '500gr of flour' |
| 10 | 2 | '10gr of salt' |
| 11 | 4 | '15gr of yeast' |
|-----------------------------------|
The same MEAL can be cooked with a different RECIPE, and each RECIPE is made of different INGREDIENT_ENTRYs, organized in the same RECIPE by sharing the same ID_r value.
INGREDIENT_ENTRY.[amount and description] is a column of type VARCHAR(MAX), this is the value that must be compared.
In the example, making the query with (MEAL 1,RECIPE 1):
It has 3 ingredients (1,3,8), and shares:
Two ingredients with RECIPE 2 (7,9) -> and so can be found in MEAL 2
One ingredient with RECIPE 4 (11) -> and so can be found in MEAL 3
Result should look something like:
|------| |--------| |-------|
| MEAL | | RECIPE | | COUNT |
|------| |--------| |-------|
| 2 | | 2 | | 2 |
| 4 | | 4 | | 1 |
|------| |--------| |-------|
I'm experimenting with views to reduce SQL complexity, but I cannot make it with a single SQL statement and I would like to avoid going back and forth to code (C#) and perform multiple queries (for example query for every ingredient, and reconcile results with HashMaps or similar).
Please, note that I cannot modify the DB structure.
You can find common ingredients using EXISTS. Below I have used a common table expression so that the joins needed to get back to a meal ID only have to be written once:
DECLARE @SelectedMealID INT = 1;
WITH LinkedData AS
(
SELECT MealID = r.ID_m,
RecipeID = r.ID,
Ingredient = i.[amount and description]
FROM RECIPE AS r
INNER JOIN INGREDIENT_ENTRY AS i
ON i.ID_r = r.ID
)
SELECT a.MealID,
a.RecipeID,
CommonIngedients = COUNT(*)
FROM LinkedData AS a
WHERE a.MealID != @SelectedMealID
AND EXISTS
( SELECT 1
FROM LinkedData AS b
WHERE b.Ingredient = a.Ingredient
AND b.MealID = @SelectedMealID
)
GROUP BY a.MealID, a.RecipeID;
I have tested this with the below sample:
-- GENERATE TABLES AND DATA
DECLARE @Meal TABLE (ID INT);
INSERT @Meal (ID) VALUES (1), (2), (3), (4);
DECLARE @Recipe TABLE (ID INT, ID_m INT);
INSERT @Recipe (ID, ID_m)
VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 1), (6, 4), (7, 5), (8, 6);
DECLARE @Ingredient TABLE (ID INT, ID_r INT, AmountAndDescription VARCHAR(MAX));
INSERT @Ingredient (ID, ID_R, AmountAndDescription)
VALUES
(1, 1, '15gr of yeast'), (2, 4, '2 eggs'),
(3, 1, '300cl of water'), (4, 2, '300cl of beer'),
(5, 3, '250cl of milk'), (6, 5, '100gr of biscuits'),
(7, 2, '15gr of yeast'), (8, 1, '500gr of flour'),
(9, 2, '500gr of flour'), (10, 2, '10gr of salt'),
(11, 4, '15gr of yeast');
-- TEST QUERY
DECLARE @SelectedMealID INT = 1;
WITH LinkedData AS
(
SELECT MealID = r.ID_m,
RecipeID = r.ID,
Ingredient = i.AmountAndDescription
FROM @Recipe AS r
INNER JOIN @Ingredient AS i
ON i.ID_r = r.ID
)
SELECT a.MealID,
a.RecipeID,
CommonIngedients = COUNT(*)
FROM LinkedData AS a
WHERE a.MealID != @SelectedMealID
AND EXISTS
( SELECT 1
FROM LinkedData AS b
WHERE b.Ingredient = a.Ingredient
AND b.MealID = @SelectedMealID
)
GROUP BY a.MealID, a.RecipeID;
OUTPUT
MealID RecipeID CommonIngedients
------------------------------------------
2 2 2
4 4 1
N.B. The explanation in the question differs slightly from this output, but I think the question contains a typo (it states that Recipe 4 relates to Meal 3, which doesn't appear to be the case in the sample data).
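Since the question mentions wanting to avoid round trips from application code, here is a hedged sketch of calling the same EXISTS query once with a bound parameter. It is written against Python's built-in sqlite3 module rather than SQL Server and C#, so the lower-case table names, the description column, the :meal parameter and the common_ingredients helper are all assumptions made for the illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipe (id int, id_m int);
    CREATE TABLE ingredient_entry (id int, id_r int, description text);
    INSERT INTO recipe VALUES (1,1),(2,2),(3,3),(4,4),(5,1),(6,4),(7,5),(8,6);
    INSERT INTO ingredient_entry VALUES
        (1,1,'15gr of yeast'),  (2,4,'2 eggs'),
        (3,1,'300cl of water'), (4,2,'300cl of beer'),
        (5,3,'250cl of milk'),  (6,5,'100gr of biscuits'),
        (7,2,'15gr of yeast'),  (8,1,'500gr of flour'),
        (9,2,'500gr of flour'), (10,2,'10gr of salt'),
        (11,4,'15gr of yeast');
""")

QUERY = """
WITH linked AS (
    SELECT r.id_m AS meal_id, r.id AS recipe_id, i.description AS ingredient
    FROM recipe AS r
    JOIN ingredient_entry AS i ON i.id_r = r.id
)
SELECT a.meal_id, a.recipe_id, COUNT(*) AS common_ingredients
FROM linked AS a
WHERE a.meal_id <> :meal
  AND EXISTS (SELECT 1 FROM linked AS b
              WHERE b.ingredient = a.ingredient AND b.meal_id = :meal)
GROUP BY a.meal_id, a.recipe_id
ORDER BY a.meal_id, a.recipe_id
"""

def common_ingredients(meal_id):
    """Return (meal, recipe, count) rows for recipes sharing ingredients with meal_id."""
    return conn.execute(QUERY, {"meal": meal_id}).fetchall()

print(common_ingredients(1))
```

Using EXISTS rather than a join means an ingredient that appears in several recipes of the selected meal is still counted only once per matching row.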

Optimization of a sql-query with exists

I have a table:
+----+---------+-----------+--------------+-----------+
| id | item_id | attr_name | string_value | int_value |
+----+---------+-----------+--------------+-----------+
| 1 | 1 | 1 | prop_str_1 | NULL |
| 2 | 1 | 2 | prop_str_2 | NULL |
| 3 | 1 | 3 | NULL | 2 |
| 4 | 2 | 1 | prop_str_1 | NULL |
| 5 | 2 | 2 | prop_str_3 | NULL |
| 6 | 2 | 3 | NULL | 2 |
| 7 | 3 | 1 | prop_str_4 | NULL |
| 8 | 3 | 2 | prop_str_2 | NULL |
| 9 | 3 | 3 | NULL | 1 |
+----+---------+-----------+--------------+-----------+
And I want to select the item_id values that have specific values for their attributes. This is complicated by the fact that the match has to hold for several attributes at once. So far I have only managed it using exists:
select *
from item_attribute as attr
where (attr_name = 1 and string_value = 'prop_str_1')
and exists
(select item_id
from item_attribute
where item_id = attr.item_id and attr_name = 2 and string_value = 'prop_str_2')
But the number of attributes can grow, and the number of nested exists queries grows with it.
How can I rewrite this query to reduce the nested queries?
UPD:
create table item_attribute(
id int not null,
item_id int not null,
attr_name int not null,
string_value varchar(50),
int_value int,
primary key (id)
);
insert into item_attribute values (1, 1, 1, 'prop_str_1', NULL);
insert into item_attribute values (2, 1, 2, 'prop_str_2', NULL);
insert into item_attribute values (3, 1, 3, NULL, 2);
insert into item_attribute values (4, 2, 1, 'prop_str_1', NULL);
insert into item_attribute values (5, 2, 2, 'prop_str_3', NULL);
insert into item_attribute values (6, 2, 3, NULL, 2);
insert into item_attribute values (7, 3, 1, 'prop_str_4', NULL);
insert into item_attribute values (8, 3, 2, 'prop_str_2', NULL);
insert into item_attribute values (9, 3, 3, NULL, 1);
See if this works for you. In essence it does the same thing: the first qualifier is still that attribute 1 has string_value 'prop_str_1', but instead of exists it self-joins the attribute table on the same item_id for the second attribute and string.
select
attr.*
from
item_attribute attr
JOIN item_attribute attr2
ON attr.item_id = attr2.item_id
and attr2.attr_name = 2
and attr2.string_value = 'prop_str_2'
where
attr.attr_name = 1
and attr.string_value = 'prop_str_1'
I would also have an index on your table on (attr_name, string_value, item_id) to speed up the where and join conditions.
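Since the self-join still needs one extra join per attribute, a different technique that is sometimes used for this kind of table is to filter on all wanted pairs at once and keep the items that matched every one of them (GROUP BY with HAVING COUNT). The sketch below is not the approach from the answer above, only an illustration under assumptions: it runs on Python's built-in sqlite3 module, the items_matching helper is hypothetical, and it returns item_id values rather than whole rows.

```python
import sqlite3

# In-memory SQLite database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_attribute (
        id int PRIMARY KEY, item_id int NOT NULL, attr_name int NOT NULL,
        string_value varchar(50), int_value int);
    INSERT INTO item_attribute VALUES
        (1,1,1,'prop_str_1',NULL), (2,1,2,'prop_str_2',NULL), (3,1,3,NULL,2),
        (4,2,1,'prop_str_1',NULL), (5,2,2,'prop_str_3',NULL), (6,2,3,NULL,2),
        (7,3,1,'prop_str_4',NULL), (8,3,2,'prop_str_2',NULL), (9,3,3,NULL,1);
""")

def items_matching(wanted):
    """Return item_ids that have every attr_name -> string_value pair in wanted."""
    if not wanted:
        raise ValueError("at least one attribute is required")
    # One OR-ed predicate per requested attribute; values are bound, not inlined.
    predicate = " OR ".join(["(attr_name = ? AND string_value = ?)"] * len(wanted))
    params = [v for pair in wanted.items() for v in pair]
    sql = (
        "SELECT item_id FROM item_attribute "
        f"WHERE {predicate} "
        "GROUP BY item_id "
        "HAVING COUNT(DISTINCT attr_name) = ? "
        "ORDER BY item_id"
    )
    return [row[0] for row in conn.execute(sql, params + [len(wanted)])]

print(items_matching({1: "prop_str_1", 2: "prop_str_2"}))
```

Adding a third attribute only adds one more entry to the dictionary; the shape of the query stays the same.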

Query Maximum VARRAY value

How can I query for the maximum value inside a varray?
create type myWave as varray(10) of int;
create table foo (id number, yVals myWave);
insert into foo values (1, myWave(1, 8, 5));
insert into foo values (2, myWave(1, 3, 4));
insert into foo values (3, myWave(9, 5, 9));
insert into foo values (4, myWave(8, 2));
Incorrect SQL: SELECT id, MAX(yVals) maxY FROM foo
Desired output:
| id | maxY |
|----|------|
| 1 | 8 |
| 2 | 4 |
| 3 | 9 |
| 4 | 8 |
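Unnest the varray with TABLE() so that each element becomes a row (exposed as column_value), then take the maximum per id:
SELECT t1.ID, MAX(t2.column_value) AS maxY FROM foo t1, TABLE(t1.yVals) t2 GROUP BY t1.ID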
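To make the two steps in that query easier to see, here is a small plain-Python illustration of what it computes. It does not talk to Oracle at all: the lists stand in for the varray values and the variable names are made up for the example.

```python
# Rows of foo as (id, yVals); plain Python lists stand in for the VARRAY.
foo = [(1, [1, 8, 5]), (2, [1, 3, 4]), (3, [9, 5, 9]), (4, [8, 2])]

# Step 1: unnest, as TABLE(t1.yVals) does: one (id, column_value) pair per element.
unnested = [(row_id, value) for row_id, values in foo for value in values]

# Step 2: GROUP BY id and take MAX(column_value) within each group.
max_by_id = {}
for row_id, value in unnested:
    max_by_id[row_id] = max(value, max_by_id.get(row_id, value))

print(sorted(max_by_id.items()))
```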

How to select shipments that have one activity but doesn't have another one

Simplified version of a table
Table ActivityHistory:
ActivityHistoryid(PK) | ShipmentID | ActivityCode | Datetime
1 | 1 | CodeA |
2 | 1 | CodeB |
3 | 1 | CodeC |
4 | 2 | CodeA |
5 | 3 | CodeA |
6 | 3 | CodeB |
7 | 4 | CodeC |
This table contains the list of activities that occurred to given shipments.
Task: I need to select the shipments (shipment IDs) that have a "CodeA" activity and don't have a "CodeC" activity.
In this example, shipment IDs 2 and 3 match the criteria.
Table Shipment: (ShipmentID(PK), other shipment related columns)
Thank you.
Try this one -
Query:
DECLARE @temp TABLE
(
ActivityHistoryid INT
, ShipmentID INT
, ActivityCode VARCHAR(20)
)
INSERT INTO @temp (ActivityHistoryid, ShipmentID, ActivityCode)
VALUES
(1, 1, 'CodeA'),
(2, 1, 'CodeB'),
(3, 1, 'CodeC'),
(4, 2, 'CodeA'),
(5, 3, 'CodeA'),
(6, 3, 'CodeB'),
(7, 4, 'CodeC')
SELECT *
FROM @temp t
WHERE ActivityCode = 'CodeA'
AND NOT EXISTS(
SELECT 1
FROM @temp t2
WHERE t2.ActivityCode = 'CodeC'
AND t2.ShipmentID = t.ShipmentID
)
Output:
ActivityHistoryid ShipmentID ActivityCode
----------------- ----------- --------------------
4 2 CodeA
5 3 CodeA
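If only the distinct shipment IDs are needed, conditional aggregation is another way to express "has CodeA but no CodeC". The following is a sketch under assumptions rather than a replacement for the NOT EXISTS query above: it uses Python's built-in sqlite3 module with an in-memory table named ActivityHistory instead of SQL Server.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ActivityHistory (
        ActivityHistoryid int, ShipmentID int, ActivityCode varchar(20));
    INSERT INTO ActivityHistory VALUES
        (1,1,'CodeA'), (2,1,'CodeB'), (3,1,'CodeC'), (4,2,'CodeA'),
        (5,3,'CodeA'), (6,3,'CodeB'), (7,4,'CodeC');
""")

# Count CodeA and CodeC rows per shipment; keep shipments with at least
# one CodeA row and no CodeC rows.
shipment_ids = [row[0] for row in conn.execute("""
    SELECT ShipmentID
    FROM ActivityHistory
    GROUP BY ShipmentID
    HAVING SUM(CASE WHEN ActivityCode = 'CodeA' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN ActivityCode = 'CodeC' THEN 1 ELSE 0 END) = 0
    ORDER BY ShipmentID
""")]
print(shipment_ids)
```

Unlike the NOT EXISTS version, this returns each shipment once even if it has several CodeA rows.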