Apache Solr update without unique key

I know how to make a Solr atomic update based on the document's unique key, but I don't know whether it is possible to update a batch of documents based on another field (not the unique key).
Below is an example of what I need:
Say I have the fields id (unique key), name, and status. I want to update
the "name" in all documents where "status" is X.
Can I do that, or am I forced to use the unique key?
Thanks.

You cannot do that. An atomic update requires the unique key and modifies only one document at a time. From a previous discussion:
"That is not a feature available in Solr.
You can update a full document or do a partial update of a single
document based on its unique key."
http://lucene.472066.n3.nabble.com/Update-multiple-documents-in-one-query-td4070337.html
As discussed in that thread, you would probably need to write a script that retrieves each matching document and issues the atomic updates separately.
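A minimal sketch of such a script, in Python with only the standard library. The core URL, the page size, and the function names are assumptions made for illustration; the `{"set": ...}` modifier is Solr's atomic-update syntax, and each command still addresses exactly one document by its unique key. Only the first page of matches is fetched here, so a real script would need to page through larger result sets.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical core URL; adjust to your own Solr instance.
SOLR_URL = "http://localhost:8983/solr/mycore"


def build_atomic_updates(ids, field, value):
    """Build one atomic-update command per document, keyed by unique key."""
    return [{"id": doc_id, field: {"set": value}} for doc_id in ids]


def fetch_ids(query, rows=500):
    """Return the unique keys of documents matching `query` (first page only)."""
    params = urllib.parse.urlencode(
        {"q": query, "fl": "id", "rows": rows, "wt": "json"}
    )
    with urllib.request.urlopen(f"{SOLR_URL}/select?{params}") as resp:
        docs = json.load(resp)["response"]["docs"]
    return [doc["id"] for doc in docs]


def send_updates(commands):
    """POST the atomic updates to the update handler and commit."""
    req = urllib.request.Request(
        f"{SOLR_URL}/update?commit=true",
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Against a running Solr instance, the question's example would be roughly `send_updates(build_atomic_updates(fetch_ids("status:X"), "name", "new name"))`.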

Related

Saving change history

Background:
I am trying to solve a simple problem. I have a database with two tables: one stores text (something like articles), and the other stores the category to which the text belongs. Users can change the text, and I need to record who made each change and when. When saving, the user also writes a comment on the change, which I store as well.
What I have done so far:
I added another table in which I store everything related to a change: who made it and when, the comment, and the ID of the text to which it applies.
What is the problem:
Deleting a text also needs to be recorded in the history. But the history records have a foreign key constraint on the text, so I have to delete all the history associated with a text before deleting it, or I get an error.
What else I have tried:
I tried adding a "Deleted" attribute to the text table, so that the row is not physically deleted and only the flag "Deleted" = 1 is set. That way I can keep the history, including the moment of deletion. But this creates another problem: the text table has a "Name" attribute that must be unique. If the record is not physically deleted, inserting a new record with a "Name" that already exists raises a uniqueness error, even though the old record with that name is considered deleted.
Question:
What approaches make it possible to keep the change history in another table even after records are deleted from the main table, while still enforcing uniqueness of some attributes of the main table and maintaining data integrity?
I would be grateful for any options and hints.
A good practice is to use a unique identifier such as a UUID as the primary key for your primary record (i.e. your text record). That way, you can safely soft-delete the primary record and keep any associated metadata without fear of collisions in the future.
If you need to enforce uniqueness of certain attributes (such as the Name you mentioned), you can create a unique secondary index (a non-clustered index in SQL Server terminology) on that column. Then, when performing the soft delete, set the Name to NULL and record the old Name value in another column. In SQL Server (2008 and later), a plain unique index allows only one NULL, so to allow multiple NULL values you need to create what is called a filtered index, which explicitly excludes NULL values.
In other words, your schema would consist of something like this:
a UUID as primary key for the text record
change metadata would have a foreign key relation to text record via the UUID
a Name column with a non-clustered UNIQUE index (on SQL Server, filtered to ignore NULL values)
a DeletedName column that will store the Name when record is deleted
a Deleted bit column that can be NULL for non-deleted records and set to 1 for deleted
When you do a soft-delete, you would execute an atomic transaction that would:
set the DeletedName = Name
set Name = NULL (so as not to break the UNIQUE index)
mark record as deleted by setting Deleted = 1
There are other ways too but this one seems to be easily achievable based on what you already have.
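The soft-delete steps above can be sketched as follows. This is an illustration only: it uses SQLite through Python's sqlite3 module instead of SQL Server, and the table, column, and function names are made up. In SQLite a plain UNIQUE column already allows several NULL values, so no filtered index is needed in this sketch.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE texts (
    id           TEXT PRIMARY KEY,   -- UUID stored as text
    name         TEXT UNIQUE,        -- set to NULL when the row is soft-deleted
    deleted_name TEXT,               -- keeps the old name after deletion
    deleted      INTEGER             -- NULL for live rows, 1 for deleted rows
);
CREATE TABLE history (
    id      INTEGER PRIMARY KEY,
    text_id TEXT NOT NULL REFERENCES texts(id),
    author  TEXT NOT NULL,
    comment TEXT NOT NULL
);
""")


def create_text(name, author):
    """Insert a text and its first history entry in one transaction."""
    text_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO texts (id, name) VALUES (?, ?)", (text_id, name))
        conn.execute(
            "INSERT INTO history (text_id, author, comment) VALUES (?, ?, ?)",
            (text_id, author, "created"),
        )
    return text_id


def soft_delete(text_id, author):
    """Free the name, flag the row as deleted, and record the deletion."""
    with conn:
        conn.execute(
            "UPDATE texts SET deleted_name = name, name = NULL, deleted = 1 "
            "WHERE id = ?",
            (text_id,),
        )
        conn.execute(
            "INSERT INTO history (text_id, author, comment) VALUES (?, ?, ?)",
            (text_id, author, "deleted"),
        )
```

Because the row is never physically removed, the foreign key from the history table stays valid, and the freed name can be used by a new text with a different UUID.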
In my opinion, you can do it in one of two ways:
Create an audit table that mirrors the main table and adds an action field, and fill it from DELETE, INSERT, and UPDATE triggers on the main table.
ArticlesTable(Id,Name) -> AuditArticlesTable(Id,Name,Action,User,ModifiedDate)
Use a filtered unique index (https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-filtered-indexes?view=sql-server-ver15) on the "Name" field, so that a name can be reused when the only other row holding it is marked as deleted.
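Both ideas can be tried out together in a small sketch. It assumes SQLite (via Python's sqlite3) rather than SQL Server, where the equivalent of a filtered index is a partial index declared with a WHERE clause. The index, trigger, and function names are hypothetical, and the User column is left out because a SQLite trigger has no notion of the application user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0
);
-- Partial unique index: uniqueness applies only to rows that are not deleted.
CREATE UNIQUE INDEX ux_articles_live_name ON articles(name) WHERE deleted = 0;

CREATE TABLE audit_articles (
    id            INTEGER,
    name          TEXT,
    action        TEXT,
    modified_date TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER trg_articles_insert AFTER INSERT ON articles
BEGIN
    INSERT INTO audit_articles (id, name, action)
    VALUES (NEW.id, NEW.name, 'INSERT');
END;
CREATE TRIGGER trg_articles_update AFTER UPDATE ON articles
BEGIN
    INSERT INTO audit_articles (id, name, action)
    VALUES (NEW.id, NEW.name, 'UPDATE');
END;
CREATE TRIGGER trg_articles_delete AFTER DELETE ON articles
BEGIN
    INSERT INTO audit_articles (id, name, action)
    VALUES (OLD.id, OLD.name, 'DELETE');
END;
""")


def insert_article(name):
    """Insert a live article; return False if a live row already has that name."""
    try:
        with conn:
            conn.execute("INSERT INTO articles (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, a second live "Intro" is rejected, but once the first one has been updated to deleted = 1 the name becomes available again, and every successful change lands in audit_articles.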

Multiple filters in Rows.Find()

I have code that deletes records by pulling them from the database and then updating them with the delete flag set to "Y". My problem is that previously deleted items still come up in my search.
This is what I use to get the table row:
Datatableadapter.getData().Rows.Find(ID.Text)
This automatically searches on the primary key of the table. Now I also want to add a delete-flag filter to the search criteria. Please suggest how to do that.
Find is meant for searching by the entity key or a composite key. Use Where to search by additional criteria. You can either put both conditions in your Where, or use Find so that you leverage the clustered index (ideally), and then apply Where to enforce your business rule that the element has not been deleted.
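The two options can be illustrated with a language-neutral sketch. It is written in Python rather than .NET, with a dict keyed by primary key standing in for the table; the column name DeleteFlag and both function names are assumptions, not part of the original code.

```python
def find_active(rows_by_id, record_id):
    """Key lookup first (like Find), then the delete-flag rule."""
    row = rows_by_id.get(record_id)
    if row is None or row.get("DeleteFlag") == "Y":
        return None
    return row


def where_active(rows_by_id, record_id):
    """Single filter with both conditions (like Where)."""
    matches = [
        row
        for key, row in rows_by_id.items()
        if key == record_id and row.get("DeleteFlag") != "Y"
    ]
    return matches[0] if matches else None
```

Both return the same row; the first form goes straight to the key and only then checks the flag, while the second scans every row.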

How do I create a unique public id column which is not primary key

I read somewhere that it is bad to use your db table's primary key as a public identifier online. However, I would like my users to link to a specific object in the table.
How do I add a unique identifier column to my table that is unrelated to the primary key (which is an auto-increment integer)?
My initial idea is to use a PHP script to generate random hexadecimal values of suitable length (there will be about 100 000-200 000 items in the table at most, I think) and then insert them. But then I don't know whether they would be unique...
You can use a GUID (Globally Unique IDentifier) to uniquely identify a record. The number of possible GUIDs is so high that the chance of duplicating one is next to nothing. Similarly, the chance of someone guessing a GUID is so low that they are generally safe to display to the user (for example www.yoursite.com?id=21EC20203AEA1069A2DD08002B30309D).
If you're using PHP, you can use the com_create_guid function. Note: this function is only available in PHP 5 and later. For PHP 4, look at uniqid.
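As a rough sketch of the idea, here is the same approach in Python with SQLite instead of PHP. The table and function names are hypothetical. A random version 4 UUID is stored in its own UNIQUE column, so the database would reject the insert in the unlikely event of a collision, while the auto-increment primary key stays internal.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE items (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal key, never exposed
    public_id TEXT NOT NULL UNIQUE,               -- random identifier used in URLs
    title     TEXT NOT NULL
)""")


def add_item(title):
    """Insert an item and return the public identifier to hand out."""
    public_id = uuid.uuid4().hex  # 32 hex characters, 122 random bits
    with conn:
        conn.execute(
            "INSERT INTO items (public_id, title) VALUES (?, ?)",
            (public_id, title),
        )
    return public_id


def find_item(public_id):
    """Look an item up by its public identifier, not by its primary key."""
    row = conn.execute(
        "SELECT title FROM items WHERE public_id = ?", (public_id,)
    ).fetchone()
    return row[0] if row else None
```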

Rails & Postgresql create a field that auto increments from 1

I'm looking to create a field called id_visual in my table orders which starts at 1 and auto increments from there. I could create a method in my model to do it but I thought there must be a better more foolproof way. Any ideas?
From what I can tell, you want a secondary id based on the primary id? The identity key can only be table-based and cannot depend on another key. You will have to do this in code and save it to a new field in before_create. The easiest way is, for each order you want to number, to count all orders whose primary key is less than or equal to that of the order you are working with. It's a simple one-query calculation.
This is something your database should be providing at some level, either in a transaction or with some other locking.
Take a look at this question for some ways to get postgres configured to auto-increment a column:
PostgreSQL Autoincrement
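In PostgreSQL itself the usual tool is a sequence or a serial column. For trying the idea out without a server, the sketch below lets the database assign the next number inside the INSERT statement, using SQLite through Python as a stand-in. The schema and function name are assumptions, and the MAX + 1 approach relies on the UNIQUE constraint to catch collisions if several writers insert at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE orders (
    id        TEXT PRIMARY KEY,
    id_visual INTEGER NOT NULL UNIQUE   -- human-friendly counter starting at 1
)""")


def create_order(order_id):
    """Insert an order; the database computes the next id_visual."""
    with conn:  # one transaction per order
        conn.execute(
            "INSERT INTO orders (id, id_visual) "
            "SELECT ?, COALESCE(MAX(id_visual), 0) + 1 FROM orders",
            (order_id,),
        )
    return conn.execute(
        "SELECT id_visual FROM orders WHERE id = ?", (order_id,)
    ).fetchone()[0]
```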

How to use SQL - INSERT...ON DUPLICATE KEY UPDATE?

I have a script which captures tweets and puts them into a database. I will be running the script as a cron job and then displaying the tweets on my site from the database, to avoid hitting the limit on the Twitter API.
I don't want duplicate tweets in my database. I understand I can use 'INSERT...ON DUPLICATE KEY UPDATE' to achieve this, but I don't quite understand how to use it.
My database structure is as follows.
Table - Hash
id (auto_increment)
tweet
user
user_url
And currently my SQL to insert is as follows:
$tweet = $clean_content[0];
$user_url = $clean_uri[0];
$user = $clean_name[0];
$query='INSERT INTO hash (tweet, user, user_url) VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'")';
mysql_query($query);
How would I correctly use 'INSERT...ON DUPLICATE KEY UPDATE' to insert only if it doesn't exist, and update if it does?
Thanks
You need a UNIQUE KEY on your table. If user_url is the tweet URL, then this should fit (every tweet has a unique URL, though the tweet id would be better).
CREATE TABLE `hash` (
`user_url` ...,
...,
UNIQUE KEY `user_url` (`user_url`)
);
It is better to use INSERT IGNORE in your case:
$query='INSERT IGNORE INTO hash (tweet, user, user_url) VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'")';
ON DUPLICATE KEY UPDATE is useful when you need to update an existing row and insert only when the row does not exist yet.
Try using:
$query='INSERT INTO hash (tweet, user, user_url)
VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'")
ON DUPLICATE KEY UPDATE tweet = VALUES(tweet)';
ON DUPLICATE KEY UPDATE doesn't seem to be the right solution here, as you don't want to update if the value is already in the table.
I would use Twitter's own unique Status ID field (which should be unique for each tweet) instead of your hash id. Add that as a field on your table, and define it as the primary key (or as a unique index). Then use REPLACE INTO, including the status ID from Twitter.
This has the advantage that you can always track your record back to a unique tweet on Twitter, so you could easily get more information about the tweet later if you need to.
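The difference between skipping a duplicate and overwriting it can be seen in a small sketch keyed on the status ID. It uses SQLite through Python rather than MySQL and PHP, so the spelling differs slightly: SQLite writes INSERT OR IGNORE where MySQL writes INSERT IGNORE, while REPLACE INTO exists in both. The table and function names are made up, and the parameterized queries replace the string concatenation used in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE tweets (
    status_id INTEGER PRIMARY KEY,   -- Twitter's own ID for the tweet
    tweet     TEXT NOT NULL,
    user      TEXT NOT NULL,
    user_url  TEXT NOT NULL
)""")


def insert_if_new(status_id, tweet, user, user_url):
    """Insert unless the status ID is already stored; return True if inserted."""
    before = conn.total_changes
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO tweets (status_id, tweet, user, user_url) "
            "VALUES (?, ?, ?, ?)",
            (status_id, tweet, user, user_url),
        )
    return conn.total_changes > before


def insert_or_replace(status_id, tweet, user, user_url):
    """Insert, or overwrite the stored row that has the same status ID."""
    with conn:
        conn.execute(
            "REPLACE INTO tweets (status_id, tweet, user, user_url) "
            "VALUES (?, ?, ?, ?)",
            (status_id, tweet, user, user_url),
        )
```

For a cron job that only needs to avoid duplicates, the first function is enough; the second is the one to reach for when a re-fetched tweet should overwrite the stored copy.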