Liquibase script to delete table element based on other table element

I am fairly unfamiliar with Liquibase scripts.
I have two tables, tableA and tableB.
tableB contains elements that have a tableA_fk value, meaning that each one points to an element of tableA.
tableA contains elements that always come in groups of two. One element of each pair points to the pk of the other element (relatedpk).
I want to delete all the elements of tableA whose "someValue" field is NULL and that have no element of tableB pointing to them.
The elements can only be removed as a pair, never one at a time.
Example:
tableA:
+----+---------------------+-----------+-----------+
| pk | name | someValue | relatedpk |
+----+---------------------+-----------+-----------+
| 1 | ElementA | 1 | NULL |
| 2 | ElementA | 1 | 1 |
| 3 | ElementB | NULL | NULL |
| 4 | ElementB | NULL | 3 |
| 5 | ElementC | 3 | NULL |
| 6 | ElementC | 3 | 5 |
| 7 | ElementD | NULL | NULL |
| 8 | ElementD | NULL | 7 |
| 9 | ElementE | NULL | NULL |
| 10 | ElementE | NULL | 9 |
+----+---------------------+-----------+-----------+
tableB:
+----+------------------------------+-----------+
| pk | name | tableA_fk |
+----+------------------------------+-----------+
| 1 | Value1 | 2 |
| 2 | Value2 | 3 |
| 3 | Value3 | 9 |
+----+------------------------------+-----------+
In this example I want to remove ElementD with pk=7,8 from tableA.
Reason:
ElementA cannot be removed because
someValue != null
ElementB cannot be removed because
tableA_fk = 3 for element Value2 in tableB
ElementC cannot be removed because
someValue != null
ElementD can be removed because
someValue=NULL
No element of tableB points to either of these two elements of tableA.
ElementE cannot be removed because
tableA_fk = 9 for element Value3 in tableB
Is it possible to implement something like that in a Liquibase script?
Something like this:
<changeSet id="remove-elements">
<delete tableName="tableA">
<where>ConditionToRemoveTheCorrectELements</where>
</delete>
</changeSet>

Rather than trying to use the <delete tableName> tag, I would suggest that you just write the SQL required and use a <sql> or <sqlFile> change tag.
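As a rough sketch of what "the SQL required" might look like, the snippet below runs one candidate DELETE against the example data from the question, using Python's built-in sqlite3 module as a stand-in database. The statement is an assumption, not something taken from the answer: it treats COALESCE(relatedpk, pk) as the identifier of a pair and assumes both rows of a pair share the same someValue, as they do in the example. The same statement text could be placed inside a <sql> change in the changeSet. Some databases (MySQL, for instance) reject a DELETE whose subquery reads the table being deleted from, so the subquery may need wrapping in a derived table there.

```python
import sqlite3

# In-memory SQLite database used only to exercise the statement; the real
# target database and its SQL dialect are not stated in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (pk INTEGER PRIMARY KEY, name TEXT, someValue INTEGER, relatedpk INTEGER);
CREATE TABLE tableB (pk INTEGER PRIMARY KEY, name TEXT, tableA_fk INTEGER);
INSERT INTO tableA (pk, name, someValue, relatedpk) VALUES
    (1, 'ElementA', 1, NULL),    (2, 'ElementA', 1, 1),
    (3, 'ElementB', NULL, NULL), (4, 'ElementB', NULL, 3),
    (5, 'ElementC', 3, NULL),    (6, 'ElementC', 3, 5),
    (7, 'ElementD', NULL, NULL), (8, 'ElementD', NULL, 7),
    (9, 'ElementE', NULL, NULL), (10, 'ElementE', NULL, 9);
INSERT INTO tableB (pk, name, tableA_fk) VALUES
    (1, 'Value1', 2), (2, 'Value2', 3), (3, 'Value3', 9);
""")

# COALESCE(relatedpk, pk) yields the same value for both rows of a pair.
# A row is deleted when its someValue is NULL and no row of its pair is
# referenced from tableB.
DELETE_SQL = """
DELETE FROM tableA
WHERE someValue IS NULL
  AND COALESCE(relatedpk, pk) NOT IN (
      SELECT COALESCE(a.relatedpk, a.pk)
      FROM tableA a
      JOIN tableB b ON b.tableA_fk = a.pk
  )
"""
conn.execute(DELETE_SQL)

remaining = [row[0] for row in conn.execute("SELECT pk FROM tableA ORDER BY pk")]
print(remaining)
```

With the example data, only the ElementD pair (pk 7 and 8) is removed.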

Related

Match and set child's ID based on condition

I've got an auto-incrementing primary key column, ID, and have to find each parent's children and set their ParentID based on the values of two other columns (Condition1 and Condition2).
The parent row of each group always has Condition2 = 1 (and the same Condition1 value as its children).
Initial table
+---------------------------------------------
| ID | ParentID | Condition1 | Condition2 |
+---------------------------------------------
| 1 | null | 1000 | 1 |
| 2 | null | 1000 | null |
| 3 | null | 1000 | null |
| 4 | null | 2000 | 1 |
| 5 | null | 2000 | null |
| 6 | null | 2000 | null |
| 7 | null | 3000 | 1 |
| 8 | null | 3000 | null |
| 9 | null | 3000 | null |
+---------------------------------------------
Desired Output
+---------------------------------------------
| ID | ParentID | Condition1 | Condition2 |
+---------------------------------------------
| 1 | 1 | 1000 | 1 |
| 2 | 1 | 1000 | null |
| 3 | 1 | 1000 | null |
| 4 | 4 | 2000 | 1 |
| 5 | 4 | 2000 | null |
| 6 | 4 | 2000 | null |
| 7 | 7 | 3000 | 1 |
| 8 | 7 | 3000 | null |
| 9 | 7 | 3000 | null |
+---------------------------------------------
My current code updates only one row for each new ID:
update u
set u.ParentID = u.ID
from [db].[dbo].[tbl] u
inner join [db].[dbo].[tbl] on
u.Condition2 = 1 and u.Condition1 = u.Condition1

I think one intuitive way of doing this is a correlated subquery:
UPDATE u
SET u.ParentID = (SELECT u2.ID FROM [db].[dbo].[tbl] u2 WHERE u2.Condition1 = u.Condition1 AND u2.Condition2 = 1)
FROM [db].[dbo].[tbl] u
I don't mean that this is better than what you're trying to do, just that I think it's very intuitive and easy to understand if you're having issues.
I think the solution you are looking for is:
update u
set u.ParentID = u2.ID
from [db].[dbo].[tbl] u
inner join [db].[dbo].[tbl] u2 on
u2.Condition2 = 1 and u.Condition1 = u2.Condition1
The important difference here is that I have given both tables in the self-join a name (u and u2). You don't want to set u.ParentID = u.ID (as in your question); you want to set u.ParentID = u2.ID (note the 2). Similarly, you don't want to join the tables on u.Condition1 = u.Condition1 (since that compares a column with itself and is always true in this example) or on u.Condition2 = 1 (since that applies the condition to the table you're updating, not the table you joined). Even though it is a self-join, you need to be clear about which copy of the table you are referencing. In your query, u refers to the table being updated, not to the table on the right-hand side of the join.
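To check the logic against the sample data, here is a small sketch that runs the correlated-subquery form of the update in SQLite through Python's sqlite3 module. SQLite is only a stand-in here (the question targets SQL Server), and the unqualified table name tbl replaces [db].[dbo].[tbl]; the rows are the ones from the question's "Initial table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (ID INTEGER PRIMARY KEY, ParentID INTEGER,
                  Condition1 INTEGER, Condition2 INTEGER);
INSERT INTO tbl (ID, ParentID, Condition1, Condition2) VALUES
    (1, NULL, 1000, 1), (2, NULL, 1000, NULL), (3, NULL, 1000, NULL),
    (4, NULL, 2000, 1), (5, NULL, 2000, NULL), (6, NULL, 2000, NULL),
    (7, NULL, 3000, 1), (8, NULL, 3000, NULL), (9, NULL, 3000, NULL);
""")

# For every row, look up the row in the same Condition1 group that has
# Condition2 = 1 and copy its ID into ParentID. The inner copy of the
# table gets its own alias (u2) so the two copies are never confused.
conn.execute("""
UPDATE tbl
SET ParentID = (SELECT u2.ID
                FROM tbl AS u2
                WHERE u2.Condition1 = tbl.Condition1
                  AND u2.Condition2 = 1)
""")

parent_ids = [row[0] for row in conn.execute("SELECT ParentID FROM tbl ORDER BY ID")]
print(parent_ids)
```

Every row, including each parent itself, ends up with the ID of its group's Condition2 = 1 row, which matches the "Desired Output" table.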

Normalization settings table (with duplicates in the same cell and with different types in the same column)

I have settings table like this:
id | user_id | key | value
1 | 1 | show brand logos | true
2 | 1 | brand ids | 1,3,4
3 | 1 | search type | advanced
I use this and it works somehow, but I have read about normalization, and this table breaks even the rules of first normal form: attributes should be single-valued (there is a cell with 1,3,4) and an attribute's domain should not change (there are Booleans, strings and integers in the same column).
Do you have any ideas on how to normalize this table? I have searched a lot, but without success.

Create a key table:
id | key
1 | show brand logos
2 | brand ids
3 | search type
Add three rows instead of one for the multi-valued setting, with one typed value column per data type:
id | user_id | key | value_bool| value_str| value_num
1 | 1 | 1 | true | null | null
2 | 1 | 2 | null | null | 1
2 | 1 | 2 | null | null | 3
2 | 1 | 2 | null | null | 4
3 | 1 | 3 | null | advanced | null
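A minimal sketch of that layout, run in SQLite through Python's sqlite3 module. The table names setting_key and user_setting and the column name key_id are made up for illustration, and the row id is an auto-assigned surrogate key here (so it is unique per row, unlike the repeated 2 in the table above). Booleans are stored as 0/1 because SQLite has no separate Boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical names: setting_key, user_setting, key_id.
CREATE TABLE setting_key (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE user_setting (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    key_id     INTEGER NOT NULL REFERENCES setting_key(id),
    value_bool INTEGER,   -- 0 or 1
    value_str  TEXT,
    value_num  INTEGER
);
INSERT INTO setting_key (id, name) VALUES
    (1, 'show brand logos'), (2, 'brand ids'), (3, 'search type');
-- One row per value: the former '1,3,4' cell becomes three rows.
INSERT INTO user_setting (user_id, key_id, value_bool, value_str, value_num) VALUES
    (1, 1, 1,    NULL,       NULL),
    (1, 2, NULL, NULL,       1),
    (1, 2, NULL, NULL,       3),
    (1, 2, NULL, NULL,       4),
    (1, 3, NULL, 'advanced', NULL);
""")

# Reading a multi-valued setting is now an ordinary query, with no string splitting.
brand_ids = [row[0] for row in conn.execute("""
    SELECT s.value_num
    FROM user_setting AS s
    JOIN setting_key AS k ON k.id = s.key_id
    WHERE s.user_id = 1 AND k.name = 'brand ids'
    ORDER BY s.value_num
""")]
print(brand_ids)
```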

Trigger to sort and update table in SQL Server

I have a table like this. Id is the primary key, so obviously the table is ordered by Id.
+----+----------+--------+
| Id | OrderNo | Col2 |
+----+----------+--------+
| 1 | 3 | Value3 |
| 2 | 1 | Value1 |
| 3 | 2 | Value2 |
+----+----------+--------+
What I want to do is to reorder the table after every new insert by OrderNo, as follows:
+----+----------+--------+
| Id | OrderNo | Col2 |
+----+----------+--------+
| 2 | 1 | Value1 |
| 3 | 2 | Value2 |
| 1 | 3 | Value3 |
| 4 | 4 | Value4 |
+----+----------+--------+
Is this possible? I looked at this question; it says to drop and re-create the table. Is there any way I can do it in one go?
As a matter of fact, I tried to execute the first query in the accepted answer, but SSMS gives me an error.

Use a view to have the data sorted in any manner you like independent of what has been input.
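A small sketch of the idea, using SQLite through Python's sqlite3 module as a stand-in for SQL Server. The view name tbl_view and the helper rows_in_order are hypothetical. The stored rows are never rearranged; the order is imposed when the data is read. The ORDER BY is applied to the query that reads the view rather than written inside it, because SQL Server rejects an ORDER BY in a view definition unless TOP or OFFSET is also specified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (Id INTEGER PRIMARY KEY, OrderNo INTEGER, Col2 TEXT);
INSERT INTO tbl (Id, OrderNo, Col2) VALUES
    (1, 3, 'Value3'), (2, 1, 'Value1'), (3, 2, 'Value2');
-- Hypothetical view name; it simply exposes the columns to read.
CREATE VIEW tbl_view AS SELECT Id, OrderNo, Col2 FROM tbl;
""")

def rows_in_order(connection):
    """Read the view sorted by OrderNo, regardless of insertion order."""
    return connection.execute(
        "SELECT Id, OrderNo, Col2 FROM tbl_view ORDER BY OrderNo"
    ).fetchall()

# A new insert needs no trigger and no table rebuild.
conn.execute("INSERT INTO tbl (Id, OrderNo, Col2) VALUES (4, 4, 'Value4')")

ordered_ids = [row[0] for row in rows_in_order(conn)]
print(ordered_ids)
```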

Marking records with 1 on first occurrence of unique value

I have a table that I'd like to add a column to that shows a 1 on the first occurrence of a given value for the record within the dataset.
So, for example, if I was using the ID field as where to look for unique occurrences, I'd want a "FirstOccur" column (like the one below) putting a 1 on the first occurrence of a unique ID value in the dataset and just ignoring (leaving as null) any other occurrence:
| ID | FirstOccur |
|------|--------------|
| 1 | 1 |
| 1 | |
| 1 | |
| 2 | 1 |
| 2 | |
| 3 | 1 |
| 4 | 1 |
| 4 | |
I have a working 2-step approach that first applies some ranking sql that will give me something like this:
| ID | FirstOccur |
|------|--------------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 4 | 1 |
| 4 | 2 |
...and then I just apply some update SQL to null any value above 1 to get the desired result.
I was just wondering if there was a (simpler) one-hit approach.

Assuming you have a creation date or auto incremented id or something that specifies the ordering, you can do:
update t
set firstoccur = 1
where creationdate = (select min(creationdate)
from t as t2
where t2.id = t.id
);
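Here is the same one-step update exercised in SQLite through Python's sqlite3 module, as a sketch. The question does not say which column defines the ordering, so an auto-incremented column named seq is assumed here in place of creationdate; the id values are the ones from the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- seq is an assumed ordering column standing in for creationdate.
CREATE TABLE t (seq INTEGER PRIMARY KEY, id INTEGER, firstoccur INTEGER);
INSERT INTO t (id) VALUES (1), (1), (1), (2), (2), (3), (4), (4);
""")

# Mark a row only if it is the earliest row (smallest seq) for its id;
# every other row keeps firstoccur = NULL.
conn.execute("""
UPDATE t
SET firstoccur = 1
WHERE seq = (SELECT MIN(t2.seq)
             FROM t AS t2
             WHERE t2.id = t.id)
""")

flags = [row[0] for row in conn.execute("SELECT firstoccur FROM t ORDER BY seq")]
print(flags)
```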

LIMIT the return of non-unique values

I have two tables. Posts and Replies. Think of posts as a blog entry while replies are the comments.
I want to display X number of posts and then the latest three comments for each of the posts.
My replies has a foreign key "post_id" which matches the "id" of every post.
I am trying to create a main page that has something along the lines of
Post
--Reply
--Reply
--Reply
Post
--Reply
and so on and so forth. I can accomplish this by using a for loop in my template and discarding the unneeded replies, but I hate grabbing data from a db that I won't use. Any ideas?

This is actually a pretty interesting question.
HA HA DISREGARD THIS, I SUCK
On edit: this answer works, but on MySQL it becomes tediously slow when the number of parent rows is as few as 100. However, see below for a performant fix.
Obviously, you can run this query once per post: select * from comments where post_id = $id order by id desc limit 3. That creates a lot of overhead, as you end up doing one database query per post, the dreaded N+1 queries.
If you want to get all posts at once (or some subset with a where) the following will surprisingly work. It assumes that comments have a monotonically increasing id (as a datetime is not guaranteed to be unique), but allows for comment ids to be interleaved among posts.
Since an auto_increment id column is monotonically increasing, if your comment table has such an id, you're all set.
First, create this view. In the view, I call post parent and comment child:
create view parent_top_3_children as
select a.*,
(select max(id) from child where parent_id = a.id) as maxid,
(select max(id) from child where id < maxid
and parent_id = a.id) as maxidm1,
(select max(id) from child where id < maxidm1
and parent_id = a.id) as maxidm2
from parent a;
maxidm1 is just "max id minus 1"; maxidm2, "max id minus 2" -- that is, the second and third greatest child ids within a particular parent id.
Then join the view to whatever you need from the comment (I'll call that text):
select a.*,
b.text as latest_comment,
c.text as second_latest_comment,
d.text as third_latest_comment
from parent_top_3_children a
left outer join child b on (b.id = a.maxid)
left outer join child c on (c.id = a.maxidm1)
left outer join child d on (d.id = a.maxidm2);
Naturally, you can add whatever where clause you want to that, to limit the posts: where a.category = 'foo' or whatever.
Here's what my tables look like:
mysql> select * from parent;
+----+------+------+------+
| id | a | b | c |
+----+------+------+------+
| 1 | 1 | 1 | NULL |
| 2 | 2 | 2 | NULL |
| 3 | 3 | 3 | NULL |
+----+------+------+------+
3 rows in set (0.00 sec)
And a portion of child. Parent 1 has no children:
mysql> select * from child;
+----+-----------+------+------+------+------+
| id | parent_id | a | b | c | d |
+----+-----------+------+------+------+------+
. . . .
| 18 | 3 | NULL | NULL | NULL | NULL |
| 19 | 2 | NULL | NULL | NULL | NULL |
| 20 | 2 | NULL | NULL | NULL | NULL |
| 21 | 3 | NULL | NULL | NULL | NULL |
| 22 | 2 | NULL | NULL | NULL | NULL |
| 23 | 2 | NULL | NULL | NULL | NULL |
| 24 | 3 | NULL | NULL | NULL | NULL |
| 25 | 2 | NULL | NULL | NULL | NULL |
+----+-----------+------+------+------+------+
24 rows in set (0.00 sec)
And the view gives us this:
mysql> select * from parent_top_3;
+----+------+------+------+-------+---------+---------+
| id | a | b | c | maxid | maxidm1 | maxidm2 |
+----+------+------+------+-------+---------+---------+
| 1 | 1 | 1 | NULL | NULL | NULL | NULL |
| 2 | 2 | 2 | NULL | 25 | 23 | 22 |
| 3 | 3 | 3 | NULL | 24 | 21 | 18 |
+----+------+------+------+-------+---------+---------+
3 rows in set (0.21 sec)
The explain plan for the view is only slightly hairy:
mysql> explain select * from parent_top_3;
+----+--------------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 2 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 3 | |
| 5 | DEPENDENT SUBQUERY | child | ALL | PRIMARY | NULL | NULL | NULL | 24 | Using where |
| 4 | DEPENDENT SUBQUERY | child | ALL | PRIMARY | NULL | NULL | NULL | 24 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ALL | NULL | NULL | NULL | NULL | 24 | Using where |
+----+--------------------+------------+------+---------------+------+---------+------+------+-------------+
However, if we add an index on parent_id, it gets better:
mysql> create index pid on child(parent_id);
mysql> explain select * from parent_top_3;
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 2 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 3 | |
| 5 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 2 | Using where |
| 4 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 2 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ref | pid | pid | 5 | util.a.id | 2 | Using where |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
5 rows in set (0.04 sec)
As noted above, this begins to fall apart when the number of parent rows is as few as 100, even if we index into parent using its primary key:
mysql> select * from parent_top_3 where id < 10;
+----+------+------+------+-------+---------+---------+
| id | a | b | c | maxid | maxidm1 | maxidm2 |
+----+------+------+------+-------+---------+---------+
| 1 | 1 | 1 | NULL | NULL | NULL | NULL |
| 2 | 2 | 2 | NULL | 25 | 23 | 22 |
| 3 | 3 | 3 | NULL | 24 | 21 | 18 |
| 4 | NULL | 1 | NULL | 65 | 64 | 63 |
| 5 | NULL | 2 | NULL | 73 | 72 | 71 |
| 6 | NULL | 3 | NULL | 113 | 112 | 111 |
| 7 | NULL | 1 | NULL | 209 | 208 | 207 |
| 8 | NULL | 2 | NULL | 401 | 400 | 399 |
| 9 | NULL | 3 | NULL | 785 | 784 | 783 |
+----+------+------+------+-------+---------+---------+
9 rows in set (1 min 3.11 sec)
(Note that I intentionally test on a slow machine, with data saved on a slow flash disk.)
Here's the explain, looking for exactly one id (and the first one, at that):
mysql> explain select * from parent_top_3 where id = 1;
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
| 2 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 1000 | |
| 5 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 179 | Using where |
| 4 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 179 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ref | pid | pid | 5 | util.a.id | 179 | Using where |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
5 rows in set (56.01 sec)
Over 56 seconds for one row, even on my slow machine, is unacceptable by two orders of magnitude.
So can we save this query? It works, it's just too slow.
Here's the explain plan for the modified query. It looks as bad or worse:
mysql> explain select * from parent_top_3a where id = 1;
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 100 | Using where |
| 2 | DERIVED | <derived4> | ALL | NULL | NULL | NULL | NULL | 100 | |
| 4 | DERIVED | <derived6> | ALL | NULL | NULL | NULL | NULL | 100 | |
| 6 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 100 | |
| 7 | DEPENDENT SUBQUERY | child | ref | pid | pid | 5 | util.a.id | 179 | Using where |
| 5 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | a.id | 179 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | a.id | 179 | Using where |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
7 rows in set (0.05 sec)
But it completes three orders of magnitude faster, in 1/20th of a second!
How do we get to the much speedier parent_top_3a? We create three views, each one dependent on the previous one:
create view parent_top_1 as
select a.*,
(select max(id) from child where parent_id = a.id)
as maxid
from parent a;
create view parent_top_2 as
select a.*,
(select max(id) from child where parent_id = a.id and id < a.maxid)
as maxidm1
from parent_top_1 a;
create view parent_top_3a as
select a.*,
(select max(id) from child where parent_id = a.id and id < a.maxidm1)
as maxidm2
from parent_top_2 a;
Not only does this work much more quickly, it's legal on RDBMSes other than MySQL.
Let's increase the number of parent rows to 12800 and the number of child rows to 1536 (most blog posts don't get comments, right? ;) ):
mysql> select * from parent_top_3a where id >= 20 and id < 40;
+----+------+------+------+-------+---------+---------+
| id | a | b | c | maxid | maxidm1 | maxidm2 |
+----+------+------+------+-------+---------+---------+
| 39 | NULL | 2 | NULL | NULL | NULL | NULL |
| 38 | NULL | 1 | NULL | NULL | NULL | NULL |
| 37 | NULL | 3 | NULL | NULL | NULL | NULL |
| 36 | NULL | 2 | NULL | NULL | NULL | NULL |
| 35 | NULL | 1 | NULL | NULL | NULL | NULL |
| 34 | NULL | 3 | NULL | NULL | NULL | NULL |
| 33 | NULL | 2 | NULL | NULL | NULL | NULL |
| 32 | NULL | 1 | NULL | NULL | NULL | NULL |
| 31 | NULL | 3 | NULL | NULL | NULL | NULL |
| 30 | NULL | 2 | NULL | 1537 | 1536 | 1535 |
| 29 | NULL | 1 | NULL | 1529 | 1528 | 1527 |
| 28 | NULL | 3 | NULL | 1513 | 1512 | 1511 |
| 27 | NULL | 2 | NULL | 1505 | 1504 | 1503 |
| 26 | NULL | 1 | NULL | 1481 | 1480 | 1479 |
| 25 | NULL | 3 | NULL | 1457 | 1456 | 1455 |
| 24 | NULL | 2 | NULL | 1425 | 1424 | 1423 |
| 23 | NULL | 1 | NULL | 1377 | 1376 | 1375 |
| 22 | NULL | 3 | NULL | 1329 | 1328 | 1327 |
| 21 | NULL | 2 | NULL | 1281 | 1280 | 1279 |
| 20 | NULL | 1 | NULL | 1225 | 1224 | 1223 |
+----+------+------+------+-------+---------+---------+
20 rows in set (1.01 sec)
Note that these timings are for MyISAM tables; I'll leave it to someone else to do timings on InnoDB.
But using PostgreSQL, on a similar but not identical data set, we get similar timings on where predicates involving parent's columns:
postgres=# select (select count(*) from parent) as parent_count, (select count(*)
from child) as child_count;
parent_count | child_count
--------------+-------------
12289 | 1536
postgres=# select * from parent_top_3a where id >= 20 and id < 40;
id | a | b | c | maxid | maxidm1 | maxidm2
----+---+----+---+-------+---------+---------
20 | | 18 | | 1464 | 1462 | 1461
21 | | 88 | | 1463 | 1460 | 1457
22 | | 72 | | 1488 | 1486 | 1485
23 | | 13 | | 1512 | 1510 | 1509
24 | | 49 | | 1560 | 1558 | 1557
25 | | 92 | | 1559 | 1556 | 1553
26 | | 45 | | 1584 | 1582 | 1581
27 | | 37 | | 1608 | 1606 | 1605
28 | | 96 | | 1607 | 1604 | 1601
29 | | 90 | | 1632 | 1630 | 1629
30 | | 53 | | 1631 | 1628 | 1625
31 | | 57 | | | |
32 | | 64 | | | |
33 | | 79 | | | |
34 | | 37 | | | |
35 | | 60 | | | |
36 | | 75 | | | |
37 | | 34 | | | |
38 | | 87 | | | |
39 | | 43 | | | |
(20 rows)
Time: 91.139 ms

Sounds like you just want the LIMIT clause for a SELECT statement:
SELECT comment_text, other_stuff FROM comments WHERE post_id = POSTID ORDER BY comment_time DESC LIMIT 3;
You'll have to run this query once per post you want to show comments for. There are a few ways to get around that, if you're willing to sacrifice maintainability and your sanity in the Quest for Ultimate Performance:
1. As above, one query per post to retrieve comments. Simple, but probably not all that fast.
2. Retrieve a list of post_ids that you want to show comments for, then retrieve all comments for those posts, and filter them client-side (or you could do it server-side if you had windowing functions, I think, though those aren't in MySQL). Simple on the server side, but the client-side filtering will be ugly, and you're still moving a lot of data from server to client, so this probably won't be all that fast either.
3. As #1, but use an unholy UNION ALL of as many queries as you have posts to display, so you're running one abominable query instead of N small ones. Ugly, but it'll be faster than options 1 or 2. You'll still have to do a bit of filtering client-side, but careful writing of the UNION will make that much easier than the filtering required for #2, and no wasted data will be sent over the wire. It'll make for an ugly query, though.
4. Join the posts and comments table, partially pivoting the comments. This is pretty clean if you only need one comment, but if you want three it'll get messy quickly. Great on the client side, but even worse SQL than #3, and probably harder for the server, to boot.
At the end of the day, I'd go with option 1, the simple query above, and not worry about the overhead of doing it once per post. If you only needed one comment, then the join option might be acceptable, but you want three and that rules it out. If windowing functions ever get added to MySQL (they're in release 8.4 of PostgreSQL), option 2 might become palatable or even preferable. Until that day, though, just pick the simple, easy-to-understand query.
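For readers whose database does have windowing functions, the server-side variant of option 2 might look roughly like the sketch below. It runs in SQLite through Python's sqlite3 module, which needs an SQLite library of version 3.25 or later for ROW_NUMBER(). The table and column names (replies, post_id, id) are assumptions based on the question, and the reply id is used as the "latest first" ordering in place of a comment_time column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE replies (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL);
-- Post 1 has four replies, post 2 has two; ids are interleaved.
INSERT INTO replies (id, post_id) VALUES
    (1, 1), (2, 2), (3, 1), (4, 1), (5, 1), (6, 2);
""")

# Number the replies of each post from newest (rn = 1) to oldest,
# then keep only the first three per post.
TOP3_SQL = """
SELECT post_id, id
FROM (
    SELECT post_id, id,
           ROW_NUMBER() OVER (PARTITION BY post_id ORDER BY id DESC) AS rn
    FROM replies
) AS ranked
WHERE rn <= 3
ORDER BY post_id, id DESC
"""
latest = conn.execute(TOP3_SQL).fetchall()
print(latest)
```

This returns at most three rows per post in a single query, so nothing has to be discarded on the client.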

Although there may be a clever way to get this in one query with no schema changes I'm guessing it wouldn't be performant anyway. Edit: Looks like tpdi has the clever solution. It looks potentially pretty fast, but I'd be curious to see a benchmark on specific databases.
Given the constraints of high performance and minimal data transfer I have two suggestions.
Solution with no schema changes or maintenance
First:
SELECT * FROM Posts
Collect the ids, then:
SELECT id, post_id FROM Replies WHERE post_id IN (?) ORDER BY id DESC
Finally, loop through those rows, keeping only the first 3 reply ids for each post_id, then do:
SELECT * FROM Replies WHERE id IN (?)
More efficient solution if you are willing to maintain a few cache columns
The second solution assumes that there are far more reads than writes: you can minimize lookups by storing the last three reply ids on the Posts table every time you add a Reply. In that case you would simply add three columns, last_reply_id, second_reply_id and third_reply_id, or some such. Then you can look up the replies either with two queries like:
SELECT * FROM Posts
Collect the ids from those fields, then:
SELECT * FROM Replies WHERE id IN (?)
If you have those fields you could also manually construct a triple join, which would get the data in one query, although the field list would be quite verbose. Something like
SELECT Posts.*, r1.title, r2.title ... FROM Posts
LEFT JOIN Replies AS r1 ON Posts.last_reply_id = r1.id
LEFT JOIN Replies AS r2 ON Posts.second_reply_id = r2.id
...
Which you prefer probably depends on your ORM or language.
