Dynamically removing records when certain columns = 0; data cleansing

I have a simple card table:
CREATE TABLE `users_individual_cards` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` char(36) NOT NULL,
`individual_card_id` int(11) NOT NULL,
`own` int(10) unsigned NOT NULL,
`want` int(10) unsigned NOT NULL,
`trade` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`,`individual_card_id`),
KEY `user_id_2` (`user_id`),
KEY `individual_card_id` (`individual_card_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
I have ajax to add and remove the records based on OWN, WANT, and TRADE. However, if the user removes all of the OWN, WANT, and TRADE cards, they go to zero but it will leave the record in the database. I would prefer to have the record removed. Is checking after each "update" to see if all the columns = 0 the only way to do this? Or can I set a conditional trigger with something like:
// pseudo SQL
AFTER update IF (OWN = 0, WANT = 0, TRADE = 0) DELETE
What is the best way to do this? Can you help with the syntax?

Why not just fire two queries from PHP (or other front end)?
UPDATE `users_individual_cards` ...
DELETE FROM `users_individual_cards` WHERE ... (same condition) AND own + want + trade = 0
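The two-statement approach can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for MySQL; the helper name set_counts and the sample rows are assumptions, not part of the original application. Wrapping both statements in one transaction keeps other readers from seeing the all-zero row, although the MyISAM engine in the question does not support transactions, so that part only applies after a switch to InnoDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users_individual_cards (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id TEXT NOT NULL,
        individual_card_id INTEGER NOT NULL,
        own INTEGER NOT NULL,
        want INTEGER NOT NULL,
        trade INTEGER NOT NULL,
        UNIQUE (user_id, individual_card_id)
    )
""")
# Made-up sample rows for the demonstration.
conn.executemany(
    "INSERT INTO users_individual_cards "
    "(user_id, individual_card_id, own, want, trade) VALUES (?, ?, ?, ?, ?)",
    [("u1", 7, 1, 0, 0), ("u1", 8, 2, 1, 0)],
)


def set_counts(conn, user_id, card_id, own, want, trade):
    """Update one card row, then delete it if every counter is zero."""
    with conn:  # commits both statements together, or rolls both back
        conn.execute(
            "UPDATE users_individual_cards SET own = ?, want = ?, trade = ? "
            "WHERE user_id = ? AND individual_card_id = ?",
            (own, want, trade, user_id, card_id),
        )
        conn.execute(
            "DELETE FROM users_individual_cards "
            "WHERE user_id = ? AND individual_card_id = ? "
            "AND own = 0 AND want = 0 AND trade = 0",
            (user_id, card_id),
        )


set_counts(conn, "u1", 7, 0, 0, 0)  # all counters zero: the row is removed
set_counts(conn, "u1", 8, 1, 1, 0)  # still non-zero: the row is kept
remaining = conn.execute(
    "SELECT individual_card_id, own, want, trade "
    "FROM users_individual_cards ORDER BY individual_card_id"
).fetchall()
```

Because the DELETE reuses the same key condition as the UPDATE, it can only ever touch the row that was just changed.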

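The trigger would be:
DELIMITER $$
CREATE TRIGGER users_individual_cards_trigger
AFTER UPDATE ON users_individual_cards
FOR EACH ROW
BEGIN
-- column names take backticks; 'OWN' in single quotes is a string literal
DELETE FROM users_individual_cards
WHERE `own` = 0 AND `want` = 0 AND `trade` = 0;
END$$
DELIMITER ;
The solution through the delete query will be better, because not all versions of MySQL support triggers, and MySQL does not let a trigger modify the table it is defined on, so this DELETE fails with error 1442 when the trigger fires.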

Related

Constraint violation when merging into table

I have a staging table and a data warehouse table, and merging them keeps giving me a constraint violation. I can't figure out why, since the combination of DRIVERID and RACEID should be unique. Why do I get a primary key constraint violation?
Table
CREATE TABLE QUALIFYING (
QUALIFYID DECIMAL(18,0) IDENTITY NOT NULL,
RACEID DECIMAL(18,0) DEFAULT '0' NOT NULL,
DRIVERID DECIMAL(18,0) DEFAULT '0' NOT NULL,
CONSTRUCTORID DECIMAL(18,0) DEFAULT '0' NOT NULL,
DRIVERNUMBER DECIMAL(18,0) DEFAULT '0' NOT NULL,
DRIVERPOSITION DECIMAL(18,0) DEFAULT NULL,
Q1 VARCHAR(255) UTF8 DEFAULT NULL,
Q2 VARCHAR(255) UTF8 DEFAULT NULL,
Q3 VARCHAR(255) UTF8 DEFAULT NULL,
PRIMARY KEY(QUALIFYID)
);
Staging
CREATE OR REPLACE TABLE STGQUALIFYING(
raceId int DEFAULT '0' NOT NULL,
driverId int DEFAULT '0' NOT NULL,
constructorId int DEFAULT '0' NOT NULL,
driverNumber int DEFAULT '0' NOT NULL,
driverPosition int DEFAULT NULL,
q1 varchar(255) DEFAULT NULL,
q2 varchar(255) DEFAULT NULL,
q3 varchar(255) DEFAULT NULL,
PRIMARY KEY(RACEID, DRIVERID)
);
SQL
MERGE INTO QUALIFYING c
USING STGQUALIFYING n
ON
(n.RACEID = c.RACEID AND n.DRIVERID = c.DRIVERID)
WHEN MATCHED THEN
UPDATE SET
CONSTRUCTORID = n.CONSTRUCTORID, DRIVERNUMBER = n.DRIVERNUMBER, DRIVERPOSITION = n.DRIVERPOSITION, Q1 = n.Q1, Q2 = n.Q2, Q3 = n.Q3
WHEN NOT MATCHED THEN
INSERT (RACEID, DRIVERID, CONSTRUCTORID, DRIVERNUMBER, DRIVERPOSITION, Q1, Q2, Q3) VALUES
(RACEID, DRIVERID, CONSTRUCTORID, DRIVERNUMBER, DRIVERPOSITION, Q1, Q2, Q3);

The EXASolution user manual says:
The content of an identity column applies to the following rules:
If you specify an explicit value for the identity column while inserting a row, then this value is inserted.
In all other cases monotonically increasing numbers are generated by the system, but gaps can occur between the numbers.
and
You should not mistake an identity column with a constraint, i.e. identity columns do not guarantee unique values. But the values are unique as long as values are inserted only implicitly and are not changed manually.
You've put a primary key constraint on your identity column, so it must be unique. Since you are getting duplicates from your merge, either (a) you have, at some point, provided explicit values as in the first bullet above or updated a value manually, and the monotonically increasing sequence has reached a point where it is clashing with those existing values; or (b) there's a bug in their merge. The former seems more likely.
You can look at the most recently inserted value if you have one, or do a temporary insert of a new row (with merge) to see whether it creates a row successfully and, if so, whether you already have ID values higher than the one it allocates for that new row. If there are no higher values already, and insert works while merge continues to fail consistently, then it sounds like something you'd need to raise with EXASolution.
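Scenario (a) can be illustrated with a toy model. This is not EXASolution code: the IdentityTable class is hypothetical, and it assumes an explicitly supplied ID does not advance the generator, which is one reading of the manual text quoted above. It only shows why a manually inserted value that lies ahead of the counter eventually produces a primary key violation, and how to spot such values.

```python
class IdentityTable:
    """Toy model: an identity counter plus a primary-key uniqueness check."""

    def __init__(self):
        self.rows = {}
        self.next_id = 1

    def insert(self, payload, explicit_id=None):
        if explicit_id is None:
            new_id = self.next_id  # implicit value from the generator
            self.next_id += 1
        else:
            new_id = explicit_id  # assumption: the generator is not advanced
        if new_id in self.rows:
            raise ValueError(f"primary key violation on id {new_id}")
        self.rows[new_id] = payload
        return new_id

    def ids_ahead_of_counter(self):
        """Existing IDs the generator has not reached yet (future clashes)."""
        return sorted(i for i in self.rows if i >= self.next_id)


table = IdentityTable()
table.insert("row 1")                      # generated id 1
table.insert("row 2")                      # generated id 2
table.insert("manual row", explicit_id=4)  # explicit value ahead of the counter
table.insert("row 3")                      # generated id 3
ahead = table.ids_ahead_of_counter()       # the check suggested above

try:
    table.insert("row 4")                  # generator hands out 4, already taken
    clashed = False
except ValueError:
    clashed = True
```

On the real table, the equivalent check is to compare the highest existing QUALIFYID with the value the next implicit insert receives.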

What to join on

I have a table which associates an id to a latitude and a longitude.
For every id in that table, I'm trying to find the closest ids and store them in another table with the travel time, either if the route doesn't already exist or if the travel time is shorter (a route exists if there is an entry in transfers).
I'm currently using:
6371 * SQRT(POW( RADIANS(stop_lon - %lon) * COS(RADIANS(stop_lat + %lat)/2), 2) + POW(RADIANS(stop_lat - %lat), 2)) AS distance
to find this distance.
It works pretty well, but I don't know what to join on (for the self join).
How should I do this?
Here is the 'SHOW CREATE TABLE' output for the relevant tables:
CREATE TABLE `stops` (
`stop_id` int(10) NOT NULL,
`stop_name` varchar(100) NOT NULL,
`stop_desc` text,
`stop_lat` decimal(20,16) DEFAULT NULL,
`stop_lon` decimal(20,16) DEFAULT NULL,
PRIMARY KEY (`stop_id`),
FULLTEXT KEY `stop_name` (`stop_name`),
FULLTEXT KEY `stop_desc` (`stop_desc`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 PAGE_CHECKSUM=1
CREATE TABLE `transfers` (
`transfer_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`from_stop_id` int(10) NOT NULL,
`to_stop_id` int(10) NOT NULL,
`transfer_time` int(10) NOT NULL,
PRIMARY KEY (`transfer_id`),
UNIQUE KEY `transfer_id` (`transfer_id`),
KEY `to_stop_id` (`to_stop_id`),
KEY `from_stop_id` (`from_stop_id`)
) ENGINE=InnoDB AUTO_INCREMENT=81810 DEFAULT CHARSET=utf8 PAGE_CHECKSUM=1
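One way to read the self join being asked about is to pair every stop with every other stop, joining on a.stop_id <> b.stop_id, and compute the distance per pair. The sketch below assumes that reading. It uses Python's built-in sqlite3 module rather than MySQL, registers the distance expression as a function named approx_km (a hypothetical name), and runs on three made-up coordinates.

```python
import math
import sqlite3


def approx_km(lat1, lon1, lat2, lon2):
    """Equirectangular approximation, same shape as the SQL expression above."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians(lat1 + lat2) / 2)
    y = math.radians(lat2 - lat1)
    return 6371 * math.sqrt(x * x + y * y)


conn = sqlite3.connect(":memory:")
conn.create_function("approx_km", 4, approx_km)
conn.execute(
    "CREATE TABLE stops (stop_id INTEGER PRIMARY KEY, stop_lat REAL, stop_lon REAL)"
)
# Made-up coordinates, only for the demonstration.
conn.executemany(
    "INSERT INTO stops VALUES (?, ?, ?)",
    [(1, 48.8500, 2.3500), (2, 48.8510, 2.3500), (3, 48.9000, 2.3500)],
)

# Self join: every ordered pair of distinct stops, closest candidates first.
pairs = conn.execute("""
    SELECT a.stop_id, b.stop_id,
           approx_km(a.stop_lat, a.stop_lon, b.stop_lat, b.stop_lon) AS distance
    FROM stops AS a
    JOIN stops AS b ON a.stop_id <> b.stop_id
    ORDER BY a.stop_id, distance
""").fetchall()

# Keep the first (closest) candidate for each origin stop.
nearest = {}
for from_id, to_id, km in pairs:
    nearest.setdefault(from_id, (to_id, km))
```

A join like this compares every pair of stops, so on a large table it is worth adding a bounding-box condition on stop_lat and stop_lon to the ON clause before computing distances.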

Perhaps:
FROM transfers AS a
JOIN transfers AS b ON b.from_stop_id = a.to_stop_id
There is to be a third table? And it does not parallel either of the existing ones? Let me see if I have the right model: stops is like airports. transfers is like waiting in an airport for your next leg of a flight. But transfers fails to have the stop_id of itself; this is confusing. And the third_table would be the flight time/distance between stops?
Or maybe a transfer is just a flight from one airport to another? And there is no delay while waiting for the next leg?
Other notes:
PRIMARY KEY (`transfer_id`),
UNIQUE KEY `transfer_id` (`transfer_id`),
Since a PRIMARY KEY is a UNIQUE KEY, the latter is redundant (and wasteful); DROP it.
decimal(20,16) is overkill.
Datatype            Bytes  Resolution
------------------  -----  --------------------------------
DECIMAL(6,4)/(7,4)      7  16 m     52 ft   Houses/Businesses
DECIMAL(8,6)/(9,6)      9  16 cm    1/2 ft  Friends in a mall
DECIMAL(20,16)         20  microscopic

Why does this MySQL Create Table statement fail?

Using the mySQLAdmin tool, I try to create a table. The tool generates the SQL statement and then reports "Can't create table" with no other clue as to what the error is!
Here it is:
CREATE TABLE `C121535_vubridge`.`Products` (
`pr_ID` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`pr_Name` VARCHAR(45) NOT NULL,
`pr_Type` VARCHAR(2) NOT NULL COMMENT 'H=Hand Series V=VuBridge software E=Event Subs S=Sponsoring',
`pr_AuthorID` INTEGER UNSIGNED COMMENT '= m_ID (for Bridge Hand Series',
`pr_SponsorID` INTEGER UNSIGNED NOT NULL,
`pr_DateCreation` DATETIME NOT NULL,
`pr_Price` FLOAT NOT NULL,
`pr_DescriptionText` TEXT,
`pr_Description` VARCHAR(245),
PRIMARY KEY (`pr_ID`),
CONSTRAINT `FK_prAuthor` FOREIGN KEY `FK_prAuthor` (`pr_AuthorID`)
REFERENCES `Members` (`m_ID`)
ON DELETE SET NULL
ON UPDATE NO ACTION,
CONSTRAINT `FK_Sponsor` FOREIGN KEY `FK_Sponsor` (`pr_SponsorID`)
REFERENCES `Members` (`m_ID`)
ON DELETE SET NULL
ON UPDATE NO ACTION
) ENGINE = InnoDB;
Can someone help?

The CREATE TABLE works for me if I omit the foreign key references:
CREATE TABLE `Products` (
`pr_ID` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`pr_Name` VARCHAR(45) NOT NULL,
`pr_Type` VARCHAR(2) NOT NULL COMMENT 'H=Hand Series V=VuBridge software E=Event Subs S=Sponsoring',
`pr_AuthorID` INTEGER UNSIGNED COMMENT '= m_ID (for Bridge Hand Series',
`pr_SponsorID` INTEGER UNSIGNED NOT NULL,
`pr_DateCreation` DATETIME NOT NULL,
`pr_Price` FLOAT NOT NULL,
`pr_DescriptionText` TEXT,
`pr_Description` VARCHAR(245),
PRIMARY KEY (`pr_ID`)
)
...so I'm inclined to believe that C121535_vubridge.MEMBERS does not already exist. C121535_vubridge.MEMBERS needs to be created before the CREATE TABLE statement for the PRODUCTS table is run.

Just split up the create table and try one part at the time. This way you should be able to identify a single line that it fails on.

I do note in the reference manual that if a symbol subclause is given for the CONSTRAINT clause (in your case, the back-quoted strings before FOREIGN KEY in each clause, FK_prAuthor and FK_Sponsor), it has to be unique over the database. Are they? If not, the symbol can be omitted and InnoDB will assign one automatically.
Similarly, the tables your FKs refer to may not have the structure that this create statement expects. Note also that pr_SponsorID is declared NOT NULL while its foreign key uses ON DELETE SET NULL; InnoDB rejects that combination.

Table sync and copy into other table

I have two tables, Table A and Table B, with identical structure. Every 10 minutes I need to check whether any changes (new or updated records) have happened in Table A and copy them into Table B. I also need to add an entry to Table C whenever I see a difference or a new record.
I also need to log any new records in Table A to Table B and Table C.
I am planning to join the tables and compare the records, but if I do that I might miss the new records. Is there a better way to do this kind of sync? It has to be done in SQL; I cannot use other tools like SSIS.

Here's what I came up with in making some simple tables in SQL:
# create some sample tables and data
DROP TABLE IF EXISTS alpha;
DROP TABLE IF EXISTS beta;
DROP TABLE IF EXISTS charlie;
CREATE TABLE `alpha` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`data` VARCHAR(32) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MYISAM DEFAULT CHARSET=latin1;
CREATE TABLE `beta` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`data` VARCHAR(32) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MYISAM DEFAULT CHARSET=latin1;
CREATE TABLE `charlie` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`data` VARCHAR(32) DEFAULT NULL,
`type` VARCHAR(16) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MYISAM DEFAULT CHARSET=latin1;
INSERT INTO alpha (data) VALUES ("a"), ("b"), ("c"), ("d"), ("e");
INSERT INTO beta (data) VALUES ("a"), ("b"), ("c");
# note new records of A, log in C
INSERT INTO charlie (data, type)
(SELECT data, "NEW"
FROM alpha
WHERE id NOT IN
(SELECT id
FROM beta));
# insert new records of A into B
INSERT INTO beta (id, data)
(SELECT id, data
FROM alpha
WHERE id NOT IN
(SELECT id
FROM beta));
# make a change in alpha only
UPDATE alpha
SET data = "x"
WHERE data = "c";
# note changed records of A, log in C
INSERT INTO charlie (data, type)
(SELECT alpha.data, "CHANGE"
FROM alpha, beta
WHERE alpha.data != beta.data
AND alpha.id = beta.id);
# update changed records of A in B
UPDATE beta, alpha
SET beta.data = alpha.data
WHERE alpha.data != beta.data
AND alpha.id = beta.id;
You would of course have to expand this for the type of data, number of fields, etc. but this is a basic concept if it helps.
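The same four steps can be bundled into one repeatable routine. The sketch below is an assumption-laden illustration rather than a drop-in solution: it runs on Python's built-in sqlite3 module instead of MySQL, the function name sync is hypothetical, and it copies id along with data so that rows in beta stay matched to rows in alpha. SQLite's IS NOT is a NULL-safe comparison; in MySQL the equivalent is NOT (a <=> b).

```python
import sqlite3


def sync(conn):
    """Copy new and changed rows from alpha to beta, logging each in charlie."""
    with conn:  # one transaction for the whole pass
        # Log, then copy, rows whose id exists in alpha but not in beta.
        conn.execute("""
            INSERT INTO charlie (data, type)
            SELECT a.data, 'NEW' FROM alpha AS a
            WHERE NOT EXISTS (SELECT 1 FROM beta AS b WHERE b.id = a.id)
            ORDER BY a.id
        """)
        conn.execute("""
            INSERT INTO beta (id, data)
            SELECT a.id, a.data FROM alpha AS a
            WHERE NOT EXISTS (SELECT 1 FROM beta AS b WHERE b.id = a.id)
        """)
        # Log, then apply, rows whose data differs (NULL-safe comparison).
        conn.execute("""
            INSERT INTO charlie (data, type)
            SELECT a.data, 'CHANGE' FROM alpha AS a
            JOIN beta AS b ON b.id = a.id
            WHERE a.data IS NOT b.data
            ORDER BY a.id
        """)
        conn.execute("""
            UPDATE beta
            SET data = (SELECT a.data FROM alpha AS a WHERE a.id = beta.id)
            WHERE EXISTS (SELECT 1 FROM alpha AS a
                          WHERE a.id = beta.id AND a.data IS NOT beta.data)
        """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alpha (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE beta (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE charlie (id INTEGER PRIMARY KEY, data TEXT, type TEXT);
    INSERT INTO alpha (data) VALUES ('a'), ('b'), ('c'), ('d'), ('e');
    INSERT INTO beta (data) VALUES ('a'), ('b'), ('c');
""")

sync(conn)  # first pass picks up the new rows d and e
conn.execute("UPDATE alpha SET data = 'x' WHERE data = 'c'")
sync(conn)  # second pass picks up the change from c to x

log = conn.execute("SELECT data, type FROM charlie ORDER BY id").fetchall()
beta_rows = conn.execute("SELECT id, data FROM beta ORDER BY id").fetchall()
```

Running sync on a schedule covers the 10-minute requirement; rows deleted from alpha are not handled here.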

It's a pity that you can't use SSIS (not allowed?) because it's built for this kind of thing. Anyway, using pure SQL you should be able to do something like the following: if your tables have a created/updated timestamp column, then you could query Table B for the highest one and get all records from Table A with timestamps higher than that one.
If there's no timestamp to use, hopefully there's a PK like an int that can be used in the same way.
Hope that helps?
Valentino.
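The timestamp idea can be sketched like this, assuming both tables carry an updated_at column that is set on every insert and update. The table names, column names, and sample rows are made up for the illustration, and the code uses Python's built-in sqlite3 module; INSERT OR REPLACE is SQLite syntax, for which MySQL offers REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, data TEXT, updated_at TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, data TEXT, updated_at TEXT);
    INSERT INTO table_a VALUES
        (1, 'a',  '2024-01-01 10:00:00'),
        (2, 'b2', '2024-01-01 10:20:00'),
        (3, 'c',  '2024-01-01 10:25:00');
    INSERT INTO table_b VALUES
        (1, 'a', '2024-01-01 10:00:00'),
        (2, 'b', '2024-01-01 10:05:00');
""")


def pull_since_last_sync(conn):
    """Copy rows from table_a that are newer than anything in table_b."""
    with conn:
        # Highest timestamp already present in the target table.
        (high_water,) = conn.execute(
            "SELECT COALESCE(MAX(updated_at), '') FROM table_b"
        ).fetchone()
        changed = conn.execute(
            "SELECT id, data, updated_at FROM table_a "
            "WHERE updated_at > ? ORDER BY id",
            (high_water,),
        ).fetchall()
        # Insert new rows and overwrite changed ones by primary key.
        conn.executemany(
            "INSERT OR REPLACE INTO table_b (id, data, updated_at) VALUES (?, ?, ?)",
            changed,
        )
    return changed


changed = pull_since_last_sync(conn)
table_b = conn.execute("SELECT id, data FROM table_b ORDER BY id").fetchall()
```

One caveat of this design: a row written to Table A with exactly the same timestamp as the current high-water mark is skipped, so the timestamp needs enough resolution for the write rate.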

I would try using a trigger or transactional replication.

Hopefully you have a good unique key that is used in both tables. To get new records you can do the following:
SELECT * FROM tableA
WHERE NOT EXISTS (SELECT * FROM tableB WHERE tableB.pkey = tableA.pkey)

How do you setup Post Revisions/History Tracking with ORM?

I am trying to figure out how to set up a revisions system for posts and other content. I figured that would mean it would need to work with a basic belongs_to/has_one/has_many/has_many_through ORM (any good ORM should support this).
I was thinking that I could have some tables like these (with matching models):
[[POST]] (has_many (text) through (revisions))
id
title
[[Revisions]] (belongs_to posts/text)
id
post_id
text_id
date
[[TEXT]]
id
body
user_id
Where I could join THROUGH the revisions table to get the latest TEXT body. But I'm kind of foggy on how it will all work. Has anyone set up something like this?
Basically, I need to be able to load an article and request the latest content entry.
// Get the post row
$post = new Model_Post($id);
// Get the latest revision (JOIN through revisions to TEXT) and print that body.
$post->text->body;
Having the ability to shuffle back in time to previous revisions and removing revisions would also be a big help.
At any rate, these are just ideas of how I think that some kind of history tracking would work. I'm open to any form of tracking I just want to know what the best-practice is.
:EDIT:
Moving forward, two tables seem to make the most sense. Since I plan to store two copies of the text, this will also help to save space. The first table, posts, will store the data of the current revision for fast reads without any joins. The post's body will be the value of the matching revision's text field, but processed through markdown/bbcode/tidy/etc. This will allow me to retain the original text (for the next edit) without having to store that text twice in one revision row (or having to re-parse it each time I display it).
So fetching will be ORM friendly. Then for creates/updates I will have to handle revisions separately and then just update the post object with the new current revision values.
CREATE TABLE IF NOT EXISTS `posts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`published` tinyint(1) unsigned DEFAULT NULL,
`allow_comments` tinyint(1) unsigned DEFAULT NULL,
`user_id` int(11) NOT NULL,
`title` varchar(100) NOT NULL,
`body` text NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `published` (`published`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
CREATE TABLE IF NOT EXISTS `postsrevisions` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`post_id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`is_current` tinyint(1) unsigned DEFAULT NULL,
`date` datetime NOT NULL,
`title` varchar(100) NOT NULL,
`text` text NOT NULL,
`image` varchar(200) NOT NULL,
PRIMARY KEY (`id`),
KEY `post_id` (`post_id`),
KEY `user_id` (`user_id`),
KEY `is_current` (`is_current`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
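The create/update path described in the edit could look roughly like this. It is a sketch under assumptions: the tables are trimmed to the columns that matter, Python's built-in sqlite3 module stands in for MySQL, and render and save_revision are hypothetical helpers (render is only a placeholder for the markdown/bbcode step).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        body TEXT NOT NULL
    );
    CREATE TABLE postsrevisions (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        is_current INTEGER,
        date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        title TEXT NOT NULL,
        text TEXT NOT NULL
    );
    INSERT INTO posts (id, title, body) VALUES (1, 'Untitled', '');
""")


def render(raw_text):
    """Hypothetical stand-in for the markdown/bbcode/tidy processing."""
    return "<p>" + raw_text + "</p>"


def save_revision(conn, post_id, title, raw_text):
    """Store the raw text as a new revision and refresh the cached post row."""
    with conn:  # all three statements succeed or fail together
        conn.execute(
            "UPDATE postsrevisions SET is_current = NULL WHERE post_id = ?",
            (post_id,),
        )
        conn.execute(
            "INSERT INTO postsrevisions (post_id, is_current, title, text) "
            "VALUES (?, 1, ?, ?)",
            (post_id, title, raw_text),
        )
        conn.execute(
            "UPDATE posts SET title = ?, body = ? WHERE id = ?",
            (title, render(raw_text), post_id),
        )


save_revision(conn, 1, "Hello", "first draft")
save_revision(conn, 1, "Hello", "second draft")

post = conn.execute("SELECT title, body FROM posts WHERE id = 1").fetchone()
current_raw = conn.execute(
    "SELECT text FROM postsrevisions WHERE post_id = 1 AND is_current = 1"
).fetchall()
revision_count = conn.execute("SELECT COUNT(*) FROM postsrevisions").fetchone()[0]
```

Reads then touch only posts, while the next edit form loads the unprocessed text from the current row in postsrevisions.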

Your Revisions table as you have shown it models a many-to-many relationship between Posts and Text. This is probably not what you want, unless a given row in Text may provide the content for multiple rows in Posts. This is not how most CMS architectures work.
You certainly don't need three tables. I have no idea why you think this is needed for 3NF. The point of 3NF is that an attribute should not depend on a non-key attribute, it doesn't say you should split into multiple tables needlessly.
So you might only need a one-to-many relationship between two tables: Posts and Revisions. That is, for each post, there can be multiple revisions, but a given revision applies to only one post. Others have suggested two alternatives for finding the current post:
A flag column in Revisions to note the current revision. Changing the current revision is as simple as changing the flag to true in the desired revision and to false in the formerly current revision.
A foreign key in Posts to the revision that is current for the given post. This is even simpler, because you can change the current revision in one update instead of two. But circular foreign key references can cause problems vis-a-vis backup & restore, cascading updates, etc.
You could even implement the revision system using a single table:
CREATE TABLE PostRevisions (
post_revision_id SERIAL PRIMARY KEY,
post_id INT NOT NULL,
is_current TINYINT NULL,
date DATE,
title VARCHAR(80) NOT NULL,
text TEXT NOT NULL,
UNIQUE KEY (post_id, is_current)
);
I'm not sure it's duplication to store the title with each revision, because the title could be revised as much as the text, couldn't it?
The column is_current should be either 1 or NULL. A unique constraint doesn't count NULLs, so you can have only one row where is_current is 1 and an unlimited number of rows where it's NULL.
This does require updating two rows to make a revision current, but you gain some simplicity by reducing the model to a single table. This is a great advantage when you're using an ORM.
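Here is a small demonstration of the 1-or-NULL flag and the two-row update. It is a sketch on Python's built-in sqlite3 module, which, like MySQL, treats NULLs as distinct in a unique index; the make_current helper and the sample rows are assumptions for illustration, and the date column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PostRevisions (
        post_revision_id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        is_current INTEGER,
        title TEXT NOT NULL,
        text TEXT NOT NULL,
        UNIQUE (post_id, is_current)
    )
""")
# Two old revisions (NULL flag) and one current revision (flag 1).
conn.executemany(
    "INSERT INTO PostRevisions (post_id, is_current, title, text) VALUES (?, ?, ?, ?)",
    [(1, None, "Hello", "v1"), (1, None, "Hello", "v2"), (1, 1, "Hello", "v3")],
)
conn.commit()

# A second current row for the same post violates the unique key.
try:
    conn.execute(
        "INSERT INTO PostRevisions (post_id, is_current, title, text) "
        "VALUES (1, 1, 'Hello', 'v4')"
    )
    second_current_allowed = True
except sqlite3.IntegrityError:
    second_current_allowed = False


def make_current(conn, post_id, revision_id):
    """Clear the old flag first, then set the new one, in one transaction."""
    with conn:
        conn.execute(
            "UPDATE PostRevisions SET is_current = NULL "
            "WHERE post_id = ? AND is_current = 1",
            (post_id,),
        )
        conn.execute(
            "UPDATE PostRevisions SET is_current = 1 "
            "WHERE post_id = ? AND post_revision_id = ?",
            (post_id, revision_id),
        )


make_current(conn, 1, 2)  # roll back to the second revision
current = conn.execute(
    "SELECT text FROM PostRevisions WHERE post_id = 1 AND is_current = 1"
).fetchall()
```

The order of the two updates matters: setting the new flag before clearing the old one would briefly create two current rows and trip the unique key.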
You can create a view to simplify the common case of querying current posts:
CREATE VIEW Posts AS SELECT * FROM PostRevisions WHERE is_current = 1;
update: Re your updated question: I agree that proper relational design would encourage two tables so that you could make a few attributes of a Post invariant for all that post's revisions. But most ORM tools assume an entity exists in a single table, and ORM's are clumsy at joining rows from multiple tables to constitute a given entity. So I would say if using an ORM is a priority, you should store the posts and revisions in a single table. Sacrifice a little bit of relational correctness to support the assumptions of the ORM paradigm.
Another suggestion is to consider Dimensional Modeling. This is a school of database design to support OLAP and data warehousing. It uses denormalization judiciously, so you can usually organize data in a Star Schema. The main entity (the "Fact Table") is represented by a single table, so this would be a win for an ORM-centric application design.

You'd probably be better off in this case to put a CurrentTextID on your Post table to avoid having to figure out which revision is current (an alternative would be a flag on Revision, but I think a CurrentTextID on the post will give you easier queries).
With the CurrentTextID on the Post, your ORM should place a single property (CurrentText) on your Post class which would allow you to access the current text with essentially the statement you provided.
Your ORM should also give you some way to load the Revisions based on the Post; If you want more details about that then you should include information about which ORM you are using and how you have it configured.
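At the SQL level, the CurrentTextID idea might look like the following sketch. The table and column names (post, post_text, current_text_id) and both helper functions are assumptions for illustration, and the code runs on Python's built-in sqlite3 module rather than through any particular ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_text (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        body TEXT NOT NULL
    );
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        current_text_id INTEGER REFERENCES post_text (id)
    );
    INSERT INTO post (id, title) VALUES (1, 'Hello');
    INSERT INTO post_text (id, post_id, body) VALUES (10, 1, 'first'), (11, 1, 'second');
    UPDATE post SET current_text_id = 11 WHERE id = 1;
""")


def current_body(conn, post_id):
    """Follow the pointer on the post row to its current text."""
    row = conn.execute(
        "SELECT t.body FROM post AS p "
        "JOIN post_text AS t ON t.id = p.current_text_id "
        "WHERE p.id = ?",
        (post_id,),
    ).fetchone()
    return row[0] if row else None


def revert_to(conn, post_id, text_id):
    """Point the post at an earlier text row, if that row belongs to the post."""
    with conn:
        conn.execute(
            "UPDATE post SET current_text_id = ? "
            "WHERE id = ? AND EXISTS ("
            "    SELECT 1 FROM post_text WHERE id = ? AND post_id = ?)",
            (text_id, post_id, text_id, post_id),
        )


before = current_body(conn, 1)
revert_to(conn, 1, 10)
after = current_body(conn, 1)
```

Switching revisions is a single UPDATE on the post row, which is the "easier queries" advantage mentioned above.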

I think two tables would suffice here: a post table and its revisions. If you're not worried about duplicating data, a single (de-normalized) table could also work.

For anyone interested, here is how WordPress handles revisions using a single MySQL posts table.
CREATE TABLE IF NOT EXISTS `wp_posts` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`post_author` bigint(20) unsigned NOT NULL DEFAULT '0',
`post_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_content` longtext NOT NULL,
`post_title` text NOT NULL,
`post_excerpt` text NOT NULL,
`post_status` varchar(20) NOT NULL DEFAULT 'publish',
`comment_status` varchar(20) NOT NULL DEFAULT 'open',
`ping_status` varchar(20) NOT NULL DEFAULT 'open',
`post_password` varchar(20) NOT NULL DEFAULT '',
`post_name` varchar(200) NOT NULL DEFAULT '',
`to_ping` text NOT NULL,
`pinged` text NOT NULL,
`post_modified` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_modified_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`post_content_filtered` text NOT NULL,
`post_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
`guid` varchar(255) NOT NULL DEFAULT '',
`menu_order` int(11) NOT NULL DEFAULT '0',
`post_type` varchar(20) NOT NULL DEFAULT 'post',
`post_mime_type` varchar(100) NOT NULL DEFAULT '',
`comment_count` bigint(20) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `post_name` (`post_name`),
KEY `type_status_date` (`post_type`,`post_status`,`post_date`,`ID`),
KEY `post_parent` (`post_parent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
