How to use SQL - INSERT...ON DUPLICATE KEY UPDATE?

I have a script which captures tweets and puts them into a database. I will be running the script on a cron job and then displaying the tweets on my site from the database, to avoid hitting the limit on the Twitter API.
I don't want duplicate tweets in my database. I understand I can use 'INSERT...ON DUPLICATE KEY UPDATE' to achieve this, but I don't quite understand how to use it.
My database structure is as follows.
Table - Hash
id (auto_increment)
tweet
user
user_url
And currently my SQL to insert is as follows:
$tweet = $clean_content[0];
$user_url = $clean_uri[0];
$user = $clean_name[0];
$query='INSERT INTO hash (tweet, user, user_url) VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'")';
mysql_query($query);
How would I correctly use 'INSERT...ON DUPLICATE KEY UPDATE' to insert only if it doesn't exist, and update if it does?
Thanks

You need some UNIQUE KEY on your table. If user_url is the tweet URL, then this should fit (every tweet has a unique URL; the tweet ID would be better).
CREATE TABLE `hash` (
`user_url` ...,
...,
UNIQUE KEY `user_url` (`user_url`)
);
and it's better to use INSERT IGNORE in your case:
$query='INSERT IGNORE INTO hash (tweet, user, user_url) VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'")';
ON DUPLICATE KEY UPDATE is useful when you need to update the existing row, but here you only want to insert once.
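For illustration, a minimal sketch of the difference between the two statements, assuming the UNIQUE KEY on user_url shown above exists (the sample values are made up):
-- INSERT IGNORE: silently skips the new row when user_url already exists.
INSERT IGNORE INTO hash (tweet, user, user_url)
VALUES ('hello world', 'someuser', 'http://twitter.com/someuser/status/1');
-- INSERT ... ON DUPLICATE KEY UPDATE: inserts, or updates the named
-- columns of the existing row when user_url already exists.
INSERT INTO hash (tweet, user, user_url)
VALUES ('hello again', 'someuser', 'http://twitter.com/someuser/status/1')
ON DUPLICATE KEY UPDATE tweet = VALUES(tweet);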

Try using:
$query='INSERT INTO hash (tweet, user, user_url)
VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'")
ON DUPLICATE KEY UPDATE tweet = VALUES(tweet)';

ON DUPLICATE KEY UPDATE doesn't seem to be the right solution here, as you don't want to update if the value is already in the table.
I would use Twitter's own status ID (which is unique for each tweet) instead of your hash id. Add that as a field on your table, and define it as the primary key (or as a unique index). Then use REPLACE INTO, including the status ID from Twitter.
This has the advantage that you can always track your record back to a unique Tweet on twitter, so you could easily get more information about the Tweet later if you need to.
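A minimal sketch of that approach, assuming a new status_id column for Twitter's status ID (the column name and sample values are illustrative):
ALTER TABLE hash ADD COLUMN status_id BIGINT;
ALTER TABLE hash ADD UNIQUE KEY unq_hash_status_id (status_id);
-- REPLACE INTO deletes any existing row with the same status_id,
-- then inserts the new one.
REPLACE INTO hash (status_id, tweet, user, user_url)
VALUES (123456789, 'hello world', 'someuser', 'http://twitter.com/someuser');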

Related

Storing logs in postgres database as text vs json type

Let's say we want to create a table to store logs of user activity in a database. I can think of 2 ways of doing this:
A table having a single row for each log entry that contains a log id, a foreign key to the user, and the log content. This way we will have a separate row for each activity that happens.
A table having a single row for the activity of each unique user (foreign key to the user) and a log id. We can have a JSON-type column to store the logs associated with each user. Each time an activity occurs, we fetch the associated row and update its JSON column by appending the new activity.
Approach 1 provides a clean way of adding new log entries without needing to update old ones, but querying such a table for one user's activity means scanning the entire table.
Approach 2 adds complexity when adding a new activity, since we would have to fetch and update the JSON object, but querying would return just a single row.
I need help to understand if one approach can be clearly advantageous over the other.
Databases are optimized to store and retrieve small rows from a big table. So go for the first solution. Indexes make joins like that fast.
Lumping all data for a user into a single JSON object won't make you happy: each update would have to read, modify and write the whole JSON, which is not efficient at all.
If your logs change a lot in terms of properties, I would create a table with log_id, user_id (FK), and the log in JSON format, with each row as one activity.
It won't be a performance problem if you index your table. In PostgreSQL you can index fields inside a JSON column.
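For example, a sketch of an expression index on a field inside the JSON column (the table, column, and field names are assumptions):
CREATE TABLE user_activity_log (
    log_id bigserial PRIMARY KEY,
    user_id int NOT NULL,
    log json NOT NULL
);
-- ->> extracts a field as text; the index covers that expression.
CREATE INDEX idx_user_activity_log_action ON user_activity_log ((log ->> 'action'));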
Approach 2 will become slower with each update, as the column grows. Also, querying will be more complex.
Also consider a logging framework that can parse semi-structured data into database columns, such as Serilog.
Otherwise I would also recommend your option 1: a single row per log entry with an index on user_id. I'd suggest adding a timestamp column so the query engine can sort events in order without having to parse the JSON itself for a timestamp:
CREATE TABLE user_log
(
log_id bigint, -- (PRIMARY KEY),
log_ts timestamp NOT NULL DEFAULT(now()),
user_id int NOT NULL, --REFERENCES users(user_id),
log_content json
);
CREATE INDEX ON user_log(user_id);
SELECT user_id, log_ts, log_content ->> 'action' AS user_action FROM user_log WHERE user_id = ? ORDER BY log_ts;

Enforce unique constraint only on some queries

Suppose I've got an accounts table, and in that table I've got a profile_id column. I'd like to be able to have multiple rows in the table with the same profile_id, but I'd also like to be able to say "insert only if no other row has this profile_id value." Normally I'd use a unique constraint for this, but that offers me no way to override it when I expect the profile_id to exist.
Is there a way to (without race conditions) have an insert error or fail if a value already exists, or alternatively, have an insert ignore a constraint and enter values that don't match the constraint?
You could build a partial index; it is at least a partial solution to your problem. That is, define a column that says something like:
only_uniques_allowed
And then build the index:
create unique index unq_t_profile_id_uniques on accounts (profile_id) where only_uniques_allowed;
Then, any row that has only_uniques_allowed set to true would have a unique profile_id among the rows with this setting.
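Putting it together, a PostgreSQL sketch (accounts and profile_id follow the question; the flag column is the one suggested above):
-- The flag column referenced by the partial index:
ALTER TABLE accounts ADD COLUMN only_uniques_allowed boolean NOT NULL DEFAULT false;
-- Succeeds any number of times for the same profile_id:
INSERT INTO accounts (profile_id, only_uniques_allowed) VALUES (1, false);
-- Fails with a unique violation once a flagged row for profile_id 1 exists:
INSERT INTO accounts (profile_id, only_uniques_allowed) VALUES (1, true);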
However, you don't really want to enforce a constraint, so perhaps what you want to do is better done at the application level. Or perhaps it suggests that you need a different data model.

SQL Server Database value duplication

I have a user-editing table in the admin page of my website.
I want to check for duplicate username and email values when I update a row. Every row refers to a different user and has its own id. I want the username and email values in a row to be unique (every id has its own stats). How can I check for the duplication? (I work with myadohelper.)
Hoping for a quick answer, thanks
The best way to do this is by setting up a unique constraint/index in the database.
alter table t add constraint unq_t_username_email unique (username, email);
An attempt to add a row that already exists will result in an error.
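For the update path, a sketch of a pre-check so the admin page can report the duplicate instead of catching the constraint error (the id column and parameter names are assumptions):
-- Count rows other than the one being edited that already use these values:
SELECT COUNT(*)
FROM t
WHERE username = @username AND email = @email AND id <> @id;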

How to store all user settings in one database with a unique id

I'm making an app where I have a server-side SQL database to store the settings of all users.
I'm not sure how to make each user unique, so that the database knows who is who.
The database stores this data for each user in a row: id, email, county, age and gender.
So I'm thinking the best way is to key each user by his/her email, which is unique, so that when the settings are updated or read, the SQL knows which row to fetch.
How should I go about this?
And how would I then output the right data to the right user?
An entity in the database should have a primary key. I understand that in your design the id field is going to be the primary key. Usually this is an auto-generated integer; this is called a surrogate key. In this case you need to tell the table that the email field must be unique as well. You can do that by creating a unique index on this field. The unique index will prevent the creation of two different users with the same email. With this approach you can query the table by either id or email.
An alternative is a natural key. In this case email would be the primary key of your table, so you wouldn't have the id field. With this approach you query the table by email, which is the unique identifier of each user.
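A sketch of the surrogate-key approach with a unique index on email (MySQL syntax; the table name and column types are assumptions):
CREATE TABLE user_settings (
    id INT AUTO_INCREMENT PRIMARY KEY,         -- surrogate key
    email VARCHAR(255) NOT NULL,
    county VARCHAR(100),
    age INT,
    gender VARCHAR(10),
    UNIQUE KEY unq_user_settings_email (email) -- no two users share an email
);
-- Fetch settings by either key:
SELECT county, age, gender FROM user_settings WHERE email = 'user@example.com';
SELECT county, age, gender FROM user_settings WHERE id = 42;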

INSERT INTO a table from another table, but with the correct user ID for each comment a user adds

I have two tables: users and comments. They both have the column IDUsers. When a user adds a comment, I want their user ID to come from the IDUsers column of the users table and go into the IDUsers column of the comments table.
However, the comment is being added at the same time, so I'm also using INSERT INTO for the new information. I'm also using ColdFusion, if that makes much difference.
Hope everyone understands what I'm saying, and thanks for the help.
What exactly is the problem? When a user is posting a comment, you know their user ID. So use that UserID to do the insert into the comments table.
If you're generating the user object on the fly, as part of the same transaction (make sure it's a single transaction! It's important - synchronization issues!), use the ColdFusion equivalent of mysql_insert_id() (http://www.coldfusionmuse.com/index.cfm/2005/8/8/identity%20after%20insert). After you insert into the users table, get the ID of the inserted record, then use that ID as the foreign key in the comments record.
You will know the userId while the user is adding a comment; use that userId in the insert statement. And to make sure that only userIds from the users table end up in the comments table, you can make userId in the comments table a foreign key referencing the users table.
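As a sketch, the insert can pull the user's ID from the users table in one statement (the comment column and the username lookup are assumptions):
-- Insert a comment, looking up the poster's ID by username:
INSERT INTO comments (IDUsers, commentText)
SELECT u.IDUsers, 'Nice post!'
FROM users u
WHERE u.username = 'someuser';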