SQL to batch re-tag items - sql

I've got a MySQL database with typical schema for tagging items:
item (1->N) item_tag (N->1) tag
Each tag has a name and a count of how many items have that tag
ie:
item
(
item_id (UNIQUE KEY)
)
item_tag
(
item_id (NON-UNIQUE INDEXED),
tag_id (NON-UNIQUE INDEXED)
)
tag
(
tag_id (UNIQUE KEY)
name
count
)
I need to write a maintenance routine to batch re-tag one or more existing tags to a single new or existing other tag. I need to make sure that after the retag, no items have duplicate tags and I need to update the counts on each tag record to reflect the number of actual items using that tag.
Looking for suggestions on how to implement this efficiently...

If I understood you correctly, you could try something like this:
/* new tag/item table clustered PK optimised for group by tag_id
or tag_id = ? queries !! */
drop table if exists tag_item;
create table tag_item
(
tag_id smallint unsigned not null,
item_id int unsigned not null,
primary key (tag_id, item_id), -- clustered PK innodb only
key (item_id)
)
engine=innodb;
-- populate new table with distinct tag/items
insert ignore into tag_item
select tag_id, item_id from item_tag order by tag_id, item_id;
-- update counters
update tag inner join
(
select
tag_id,
count(*) as counter
from
tag_item
group by
tag_id
) c on tag.tag_id = c.tag_id
set
tag.count = c.counter; -- the tag table's column is named count

An index/constraint on the item_tag table can prevent duplicate tags; or create the table with a composite primary key using both item_id and tag_id.
As to the counts, drop the count column from the tag table and create a VIEW to get the results:
CREATE VIEW tag_counts AS SELECT t.tag_id, t.name, COUNT(it.item_id) AS count FROM tag t LEFT JOIN item_tag it ON it.tag_id = t.tag_id GROUP BY t.tag_id, t.name
Then your count is always up to date.
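A minimal sketch of both suggestions, using Python's built-in sqlite3 module in place of MySQL (an assumption made only so the example is self-contained; the sample rows are invented, and the derived column is called item_count here just to keep it visually distinct from the COUNT function). The composite primary key rejects duplicate pairs, and the view derives the counts on demand:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item_tag (
        item_id INTEGER NOT NULL,
        tag_id  INTEGER NOT NULL,
        PRIMARY KEY (item_id, tag_id)   -- one row per (item, tag) pair
    );
    -- LEFT JOIN keeps tags that no item uses, with a count of 0
    CREATE VIEW tag_counts AS
        SELECT t.tag_id, t.name, COUNT(it.item_id) AS item_count
        FROM tag t
        LEFT JOIN item_tag it ON it.tag_id = t.tag_id
        GROUP BY t.tag_id, t.name;
""")
conn.executemany("INSERT INTO tag (tag_id, name) VALUES (?, ?)",
                 [(1, "sql"), (2, "mysql"), (3, "unused")])
# INSERT OR IGNORE is SQLite's spelling of MySQL's INSERT IGNORE;
# the second (10, 1) pair is silently dropped by the primary key.
conn.executemany(
    "INSERT OR IGNORE INTO item_tag (item_id, tag_id) VALUES (?, ?)",
    [(10, 1), (10, 1), (11, 1), (10, 2)])

tag_counts = conn.execute(
    "SELECT name, item_count FROM tag_counts ORDER BY tag_id").fetchall()
print(tag_counts)
```

The trade-off is that the view recomputes the aggregate on every read, which may or may not be acceptable depending on how often counts are displayed.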

This is what I've got so far, which seems to work but I don't have enough data yet to know how well it performs. Comments welcome.
Some notes:
Had to add a unique id field to the item_tags table to get the duplicate tag cleanup working.
Added support for tag aliases so that there's a record of retagged tags.
I didn't mention this before but each item also has a published flag and only published items should affect the count field on tags.
The code uses C#, subsonic+linq + "coding horror", but is fairly self explanatory.
The code:
public static void Retag(string new_tag, List<string> old_tags)
{
// Check new tag name is valid
if (!Utils.IsValidTag(new_tag))
{
throw new RuleException("NewTag", string.Format("Invalid tag name - {0}", new_tag));
}
// Start a transaction
using (var scope = new SimpleTransactionScope(megDB.GetInstance().Provider))
{
// Get the new tag
var newTag = tag.SingleOrDefault(x => x.name == new_tag);
// If the new tag is an alias, remap to the alias instead
if (newTag != null && newTag.alias != null)
{
newTag = tag.SingleOrDefault(x => x.tag_id == newTag.alias.Value);
}
// Get the old tags
var oldTags = new List<tag>();
foreach (var old_tag in old_tags)
{
// Ignore same tag
if (string.Compare(old_tag, new_tag, true)==0)
continue;
var oldTag = tag.SingleOrDefault(x => x.name == old_tag);
if (oldTag != null)
oldTags.Add(oldTag);
}
// Redundant?
if (oldTags.Count == 0)
return;
// Simple rename?
if (oldTags.Count == 1 && newTag == null)
{
oldTags[0].name = new_tag;
oldTags[0].Save();
scope.Complete();
return;
}
// Create new tag?
if (newTag == null)
{
newTag = new tag();
newTag.name = new_tag;
newTag.Save();
}
// Build a comma separated list of old tag id's for use in sql 'IN' clause
var sql_old_tags = string.Join(",", (from t in oldTags select t.tag_id.ToString()).ToArray());
// Step 1 - Retag, allowing duplicates for now
var sql = @"
UPDATE item_tags
SET tag_id=@newtagid
WHERE tag_id IN (" + sql_old_tags + @");
";
// Step 2 - Delete the duplicates
sql += @"
DELETE t1
FROM item_tags t1, item_tags t2
WHERE t1.tag_id=t2.tag_id
AND t1.item_id=t2.item_id
AND t1.item_tag_id > t2.item_tag_id;
";
// Step 3 - Update the use count of the destination tag
sql += @"
UPDATE tags
SET tags.count=
(
SELECT COUNT(items.item_id)
FROM items
INNER JOIN item_tags ON item_tags.item_id = items.item_id
WHERE items.published=1 AND item_tags.tag_id=@newtagid
)
WHERE
tag_id=@newtagid;
";
// Step 4 - Zero the use counts of the old tags and alias the old tag to the new tag
sql += @"
UPDATE tags
SET tags.count=0,
alias=@newtagid
WHERE tag_id IN (" + sql_old_tags + @");
";
// Do it!
megDB.CodingHorror(sql, newTag.tag_id, newTag.tag_id, newTag.tag_id, newTag.tag_id).Execute();
scope.Complete();
} // end of transaction scope
}
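For anyone who wants to try the four steps without the C# and SubSonic plumbing, here is a rough sketch of the same sequence in Python with the built-in sqlite3 module. It is not the code above: the `retag` function and the sample rows are made up for illustration, it takes tag ids rather than names, and Step 2 is rewritten with a `MIN(item_tag_id)` subquery because SQLite has no multi-table DELETE.

```python
import sqlite3

def retag(conn, new_tag_id, old_tag_ids):
    """Merge old_tag_ids into new_tag_id inside one transaction."""
    if not old_tag_ids:
        return
    marks = ",".join("?" * len(old_tag_ids))  # placeholders only, never values
    with conn:  # commits on success, rolls back on an exception
        # Step 1: retag, allowing duplicates for now
        conn.execute(
            f"UPDATE item_tags SET tag_id = ? WHERE tag_id IN ({marks})",
            [new_tag_id, *old_tag_ids])
        # Step 2: keep only the lowest item_tag_id of each (item, tag) pair
        conn.execute("""
            DELETE FROM item_tags
            WHERE item_tag_id NOT IN (
                SELECT MIN(item_tag_id) FROM item_tags
                GROUP BY item_id, tag_id)""")
        # Step 3: recount published items for the destination tag
        conn.execute("""
            UPDATE tags SET count = (
                SELECT COUNT(*) FROM items
                JOIN item_tags ON item_tags.item_id = items.item_id
                WHERE items.published = 1 AND item_tags.tag_id = ?)
            WHERE tag_id = ?""", (new_tag_id, new_tag_id))
        # Step 4: zero the old tags and alias them to the destination tag
        conn.execute(
            f"UPDATE tags SET count = 0, alias = ? WHERE tag_id IN ({marks})",
            [new_tag_id, *old_tag_ids])

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, published INTEGER NOT NULL);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT,
                       count INTEGER DEFAULT 0, alias INTEGER);
    CREATE TABLE item_tags (item_tag_id INTEGER PRIMARY KEY,
                            item_id INTEGER, tag_id INTEGER);
    INSERT INTO items VALUES (1, 1), (2, 1), (3, 0);
    INSERT INTO tags (tag_id, name)
        VALUES (1, 'db'), (2, 'database'), (3, 'rdbms');
    INSERT INTO item_tags (item_id, tag_id)
        VALUES (1, 1), (1, 2), (2, 3), (3, 2);
""")

retag(conn, 1, [2, 3])
links = conn.execute(
    "SELECT item_id, tag_id FROM item_tags ORDER BY item_id").fetchall()
tag_rows = conn.execute(
    "SELECT tag_id, count, alias FROM tags ORDER BY tag_id").fetchall()
print(links, tag_rows)
```

Item 1 started with both 'db' and 'database', so it ends up with a single 'db' row, and the unpublished item 3 is retagged but left out of the count.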

Related

Trying to make query on condition

I have read most of the answers to similar questions here, but none of them solved my problem, and I cannot find anything online that helps.
I am trying to query on the condition user_id = session user_id, but when I use an INNER JOIN I get the error
ambiguous column name
for this:
public List<CartModelClass>getCarts1(){
SQLiteDatabase db = getReadableDatabase();
SQLiteQueryBuilder qb = new SQLiteQueryBuilder();
String[] sqlSelect = { "ID" , "user_id", "food_id", "quantity", "price", "origin", "destination","description","company_name","search_id"};
String sqltable2 = "OrderDetails LEFT JOIN OrderDetails WHERE user_id LIKE '%%' ";
qb.setTables(sqltable2);
Cursor c = qb.query(db,sqlSelect, null, null ,null ,null ,null);
final List<CartModelClass> result = new ArrayList<>();
if (c.moveToFirst()) {
do {
result.add(new CartModelClass(
c.getString(c.getColumnIndex("user_id")),
c.getString(c.getColumnIndex("food_id")),
c.getString(c.getColumnIndex("quantity")),
c.getString(c.getColumnIndex("price")),
c.getString(c.getColumnIndex("origin")),
c.getString(c.getColumnIndex("destination")),
c.getString(c.getColumnIndex("description")),
c.getString(c.getColumnIndex("company_name")),
c.getString(c.getColumnIndex("search_id"))
));
} while (c.moveToNext());
}
return result;
}
So I replaced the INNER JOIN with just the table and a user_id LIKE '%%' condition, but now I only get the user_id of the last user who added to the cart, and the data for all users is shown.
I want to show only the cart rows where user_id = session user_id, so that I can use it here:
loadListFood
private void loadListFood(){
sessionManager= new SessionManager(getActivity());
final Hashmap<String, String> user = sessionManager.getUserDetail();
user.get(USER_ID);
listdata = new Database(this.getContext.getCarts1());
for(CartModelClass order : listdata)
user_id = order.getUser_id
if(user.get(USER_ID).equals(user_id)){
listdata = new Database(this.getContext()).getCarts();
adapter = new CartAdapter(listdata, this.getContext());
recyclerView.setAdapter(adapter);
int total = 0;
for (CartModelClass order : listdata) {
total += (Integer.parseInt(order.getPrice())) * (Integer.parseInt(order.getQuantity()));
Locale locale = new Locale("en", "US");
NumberFormat fmt = NumberFormat.getCurrencyInstance(locale);
txtTotalPrice.setText(fmt.format(total));
}
}else {
Toast.makeText(getContext(), "No Cart Added", Toast.LENGTH_LONG).show();
}
}
You are self joining the table OrderDetails.
In this case you must set aliases to both copies of the table, like:
OrderDetails as o1 LEFT JOIN OrderDetails as o2 ...
Now in the ON clause you must qualify the column names properly, like:
ON o1.user_id = o2.something
If you don't, you get that error message, because the column name user_id could belong to either of the 2 copies of the table.
Also:
What is session user_id? Is it a column name?
If it is then the problem is that it contains a space in its name.
Enclose it in square brackets, so the statement should be:
OrderDetails as o1 LEFT JOIN OrderDetails as o2
ON o1.user_id = o2.[session user_id]
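If the goal is only to list the cart rows of the logged-in user, a self join may not be needed at all; a plain WHERE clause with a bound parameter might be enough. A minimal sketch of that idea, written with Python's built-in sqlite3 module so it can be run anywhere (the sample rows and the `session_user_id` variable are invented for illustration; on Android the same condition could presumably be passed as the selection and selectionArgs arguments of the query call):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE OrderDetails (
    ID INTEGER PRIMARY KEY, user_id TEXT, food_id TEXT,
    quantity TEXT, price TEXT)""")
conn.executemany(
    "INSERT INTO OrderDetails (user_id, food_id, quantity, price) "
    "VALUES (?, ?, ?, ?)",
    [("7", "f1", "2", "5"), ("8", "f2", "1", "9"), ("7", "f3", "1", "4")])

session_user_id = "7"  # hypothetical: would come from the session manager

# Only one copy of the table is involved, so user_id is unambiguous.
cart = conn.execute(
    "SELECT food_id, quantity, price FROM OrderDetails "
    "WHERE user_id = ? ORDER BY ID",
    (session_user_id,)).fetchall()

# Same total as in loadListFood: price * quantity summed over the rows.
total = sum(int(price) * int(quantity) for _, quantity, price in cart)
print(cart, total)
```

With the filter done in SQL, the list returned is already limited to one user, so no per-row comparison is needed afterwards.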

Displaying different values from same table and column after comparing

public partial class UserProfile : System.Web.UI.Page
{
private static int _userId = 0 ;
protected void Page_Load(object sender, EventArgs e)
{
string FoB;
if (Session["user"] != null)
{
_userId = DataManager.GetUserId(Session["user"].ToString());
}
string connection = WebConfigurationManager.ConnectionStrings["Database"].ConnectionString;
SqlConnection conn = new SqlConnection(connection);
SqlCommand comm = new SqlCommand("SELECT * from dbo.Rating where UserID=_userId", conn);
SqlDataReader reader;
conn.Open();
reader = comm.ExecuteReader();
reader.Read();
FoB = reader["GenreID"].ToString();
if(FoB=="1" )
{
FB.Text = reader["RatingValue"].ToString();
};
}
while (reader.HasRows);
reader.Close();
conn.Close();
}
}
I have a table named Rating. It has 4 columns:
RatingID, UserID, GenreID, RatingValue
I want to display rating value on a label based on the current user logged in and different rating value against different Genres. UserID and GenreID are foreign keys from table Genre and User.
Edit (comment)
CREATE TABLE [dbo].[Rating] (
[RatingID] INT IDENTITY (1, 1) NOT NULL,
[UserID] INT NULL, [GenreID] INT NOT NULL,
[RatingValue] INT NOT NULL,
PRIMARY KEY CLUSTERED ([RatingID] ASC),
CONSTRAINT [FK_Rating_Genre] FOREIGN KEY ([GenreID])
REFERENCES [dbo].[Genre] ([GenreID])
ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT [FK_Rating_User] FOREIGN KEY ([UserID])
REFERENCES [dbo].[User] ([UserID])
ON DELETE CASCADE ON UPDATE CASCADE );
I want to show 8 different rating values of 8 different genres by 1 single current user.
If I understand you correctly, you want to map your label to different columns, based on another, discriminating column:
using (var conn = new SqlConnection(connection))
using (var comm = new SqlCommand("SELECT * from dbo.Rating where UserID=@userId", conn))
{
comm.Parameters.AddWithValue("@userId", _userId);
conn.Open();
using (var reader = comm.ExecuteReader())
{
if (reader.HasRows && reader.Read())
{
FB.Text = reader["GenreID"].ToString() == "1"
? reader["RatingValue"].ToString()
: reader["SomeOtherColumn"].ToString();
}
}
}
If one or more of the columns to be mapped reside in another table other than Rating, you'll need to join to that table - we'll need to see your table structures to help you.
Edit, re displaying 8 Genres
I've assumed the User and Genre tables both have a column Name - join to these tables to look up the rating. The GROUP and MAX will eliminate any cases where the same user has more than one rating in the same Genre (switch the MAX to AVG or MIN if you need otherwise). Top 8 will restrict the genres. So adjust the Sql like so:
SELECT TOP 8 u.Name AS UserName, g.Name as GenreName, MAX(r.RatingValue) AS TopRating
FROM dbo.Rating r
INNER JOIN dbo.[User] u
ON r.UserId = u.UserID
INNER JOIN dbo.[Genre] g
ON r.GenreID = g.GenreID
WHERE r.UserID = @userId
GROUP BY u.Name, g.Name
ORDER BY g.Name;
Now, for the user interface, you won't be able to display a table in a single label. The easiest option would be to bind the result of the reader directly to a new GridView control on your WebForm:
using (var reader = comm.ExecuteReader())
{
if (reader.HasRows)
{
gridView.DataSource = reader;
gridView.DataBind();
}
}
This will show a table with 3 columns matching the selected columns, and up to 8 rows.
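To see what the GROUP BY and MAX do to duplicate ratings, here is a small sketch using Python's built-in sqlite3 module rather than SQL Server (so TOP 8 becomes LIMIT 8, and the join to the User table is left out for brevity). The sample rows and the `current_user_id` variable are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Genre  (GenreID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Rating (RatingID INTEGER PRIMARY KEY, UserID INTEGER,
                         GenreID INTEGER NOT NULL,
                         RatingValue INTEGER NOT NULL);
    INSERT INTO Genre VALUES (1, 'Rock'), (2, 'Jazz');
    -- user 1 rated Rock twice; user 2 must not show up in the result
    INSERT INTO Rating (UserID, GenreID, RatingValue)
        VALUES (1, 1, 3), (1, 1, 5), (1, 2, 4), (2, 1, 1);
""")

current_user_id = 1  # hypothetical: would come from the session

ratings = conn.execute("""
    SELECT g.Name AS GenreName, MAX(r.RatingValue) AS TopRating
    FROM Rating r
    JOIN Genre g ON g.GenreID = r.GenreID
    WHERE r.UserID = ?          -- qualified, since several tables have ids
    GROUP BY g.Name
    ORDER BY g.Name
    LIMIT 8""", (current_user_id,)).fetchall()
print(ratings)
```

Each genre appears once, carrying the highest of the user's ratings for it.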

Relating ID's in relational table

I have these two insertion queries in Perl using the DBI module and DBD::mysql.
This one inserts fields url, html_extr_text, concord_file, and sys_time into table article:
my @fields = (qw(url html_extr_text concord_file sys_time));
my $fieldlist = join ", ", @fields;
my $field_placeholders = join ", ", map {'?'} @fields;
my $insert_query = qq{
INSERT INTO article ($fieldlist)
VALUES ($field_placeholders)
};
my $sth = $dbh->prepare($insert_query);
my $id_article;
my @id_articles;
# $#array is the last valid index, so the loop does not run past the end
foreach my $article_index (0 .. $#output_concord_files_prepare) {
$field_placeholders = $sth->execute(
$url_prepare[$article_index],
$html_pages_files_extended[$article_index],
$output_concord_files_prepare[$article_index],
$sys_time_prepare[$article_index]);
$id_article = $dbh->last_insert_id(undef, undef, 'article', 'id_article');
push @id_articles, $id_article;
if ($field_placeholders != 1) {
die "Error inserting records, only [$field_placeholders] got inserted: " . $sth->errstr;
}
}
print "@id_articles\n";
And this one inserts field event into table event:
@fields = (qw(event));
$fieldlist = join ", ", @fields;
$field_placeholders = join ", ", map {'?'} @fields;
$insert_query = qq{
INSERT INTO event ($fieldlist)
VALUES ($field_placeholders)
};
$sth = $dbh->prepare($insert_query);
my $id_event;
my @id_events;
# $#array is the last valid index, so the loop does not run past the end
foreach my $event_index (0 .. $#event_prepare){
$field_placeholders = $sth->execute($event_prepare[$event_index]);
$id_event = $dbh->last_insert_id(undef, undef, 'event', 'id_event');
push @id_events, $id_event;
if ($field_placeholders != 1){
die "Error inserting records, only [$field_placeholders] got inserted: " . $sth->errstr;
}
}
print "@id_events\n";
I'd like to create a third table for a one-to-many relationship, because one article contains multiple events. I have this file:
output_concord/concord.0.txt -> earthquake
output_concord/concord.0.txt -> avalanche
output_concord/concord.0.txt -> snowfall
output_concord/concord.1.txt -> avalanche
output_concord/concord.1.txt -> rock fall
output_concord/concord.1.txt -> mud slide
output_concord/concord.4.txt -> avalanche
output_concord/concord.4.txt -> rochfall
output_concord/concord.4.txt -> topple
...
As you can see, I collect the IDs of each entry using LAST_INSERT_ID. However, I don't really know how to take the next step.
Using this file, how can I insert the IDs from the two previous tables into a third table, 'article_event_index'?
It would be something like this:
$create_query = qq{
create table article_event_index(
id_article int(10) NOT NULL,
id_event int(10) NOT NULL,
primary key (id_article, id_event),
foreign key (id_article) references article (id_article),
foreign key (id_event) references event (id_event)
)
};
$dbh->do($create_query);
Which will contain relationships following the pattern
1-1, 1-2, 1-3, 2-4, 3-5 ...
I'm a newbie to Perl and databases so it's hard to formulate what I want to do. I hope I was clear enough.
Something like this should do what you need (untested, but it does compile).
It starts by building Perl hashes to relate concord files to article IDs and events to event IDs. Then the file is read, and a pair of IDs is inserted into the new table for each relationship that can be found in the existing tables.
Note that the hashes are there only to avoid a long sequence of
SELECT id_article FROM article WHERE concord_file = ?
and
SELECT id_event FROM event WHERE event = ?
statements.
use strict;
use warnings;
use DBI;
use constant RELATIONSHIP_FILE => 'relationships.txt';
my $dbh = DBI->connect('DBI:mysql:database', 'user', 'pass')
or die $DBI::errstr;
$dbh->do('DROP TABLE IF EXISTS article_event_index');
$dbh->do(<< 'END_SQL');
CREATE TABLE article_event_index (
id_article INT(10) NOT NULL,
id_event INT(10) NOT NULL,
PRIMARY KEY (id_article, id_event),
FOREIGN KEY (id_article) REFERENCES article (id_article),
FOREIGN KEY (id_event) REFERENCES event (id_event)
)
END_SQL
my $articles = $dbh->selectall_hashref(
'SELECT id_article, concord_file FROM article',
'concord_file'
);
my $events = $dbh->selectall_hashref(
'SELECT id_event, event FROM event',
'event'
);
open my $fh, '<', RELATIONSHIP_FILE
or die sprintf qq{Unable to open "%s": %s}, RELATIONSHIP_FILE, $!;
my $insert_sth = $dbh->prepare('INSERT INTO article_event_index (id_article, id_event) VALUES (?, ?)');
while (<$fh>) {
chomp;
my ($concord_file, $event) = split /\s*->\s*/;
next unless defined $event;
unless (exists $articles->{$concord_file}) {
warn qq{No article record for concord file "$concord_file"};
next;
}
my $id_article = $articles->{$concord_file}{id_article};
unless (exists $events->{$event}) {
warn qq{No event record for event "$event"};
next;
}
my $id_event = $events->{$event}{id_event};
$insert_sth->execute($id_article, $id_event);
}
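The same lookup-then-insert idea can be sketched in Python with the built-in sqlite3 module, which may help if the Perl hash-of-hashes syntax is unfamiliar. This is only an illustration under assumptions: the sample rows are invented, a list named `relationship_lines` stands in for the relationship file, and unknown names are skipped silently where the Perl version warns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id_article INTEGER PRIMARY KEY, concord_file TEXT);
    CREATE TABLE event   (id_event   INTEGER PRIMARY KEY, event TEXT);
    CREATE TABLE article_event_index (
        id_article INTEGER NOT NULL REFERENCES article (id_article),
        id_event   INTEGER NOT NULL REFERENCES event (id_event),
        PRIMARY KEY (id_article, id_event));
    INSERT INTO article (concord_file) VALUES
        ('output_concord/concord.0.txt'), ('output_concord/concord.1.txt');
    INSERT INTO event (event) VALUES
        ('earthquake'), ('avalanche'), ('rock fall');
""")

# Lookup tables built once, instead of one SELECT per line of the file
articles = dict(conn.execute("SELECT concord_file, id_article FROM article"))
events = dict(conn.execute("SELECT event, id_event FROM event"))

relationship_lines = [  # stands in for the lines of the relationship file
    "output_concord/concord.0.txt -> earthquake",
    "output_concord/concord.0.txt -> avalanche",
    "output_concord/concord.1.txt -> avalanche",
    "output_concord/concord.1.txt -> rock fall",
    "output_concord/concord.9.txt -> topple",  # unknown article and event
]

pairs = []
for line in relationship_lines:
    concord_file, sep, event = (p.strip() for p in line.partition("->"))
    if not sep or concord_file not in articles or event not in events:
        continue  # malformed line or no matching record
    pairs.append((articles[concord_file], events[event]))

with conn:  # one transaction for all link rows
    conn.executemany(
        "INSERT OR IGNORE INTO article_event_index VALUES (?, ?)", pairs)

index_rows = conn.execute(
    "SELECT id_article, id_event FROM article_event_index "
    "ORDER BY id_article, id_event").fetchall()
print(index_rows)
```

Note that the event table holds each event name once here; if the same event were inserted several times, the lookup would keep only one of its ids.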

NHibernate Criteria API help needed please

I'm trying to create a criteria query that grabs "RejectedRecords" that either were uploaded by a given user and are not flagged as deleted, or whose Facility is in the list of facilities the user is assigned to (user.UserFacilities). I have the first part working fine (by user and not deleted), but I'm not sure how to add the OR clause that takes records whose facility is in the user's collection of facilities. In SQL it would look like:
SELECT *
FROM RejectedRecords
WHERE (UserUploaded = 1 AND IsDeleted = 0)
OR FacilityId IN (SELECT FacilityId FROM UserFacility WHERE UserId = 1)
Here's my attempt in C# (Not sure how to perform the subquery):
public IList<RejectedRecord> GetRejectedRecordsByUser(User u)
{
return base._session.CreateCriteria(typeof(RejectedRecord))
.Add(
(
Expression.Eq(RejectedRecord.MappingNames.UserUploaded, u)
&& Expression.Eq(RejectedRecord.MappingNames.IsDeleted, false)
)
)
.List<RejectedRecord>();
}
The key is to use Disjunction and Conjunction combined with a Subquery.
var facilityIdQuery = DetachedCriteria.For<UserFacility>()
.Add(Expression.Eq("User.Id", u))
.SetProjection(Projections.Property("Facility.Id"));
var results = session.CreateCriteria<RejectedRecord>()
.Add(
Restrictions.Disjunction()
.Add(
Restrictions.And(
Restrictions.Eq(RejectedRecord.MappingNames.UserUploaded, u),
Restrictions.Eq(RejectedRecord.MappingNames.IsDeleted, false)
)
)
.Add(Subqueries.PropertyIn("FacilityId",facilityIdQuery))
).List();

How To Split Pipe-Delimited Column and insert each value into new table Once?

I have an old database with a gazillion records (more or less) that have a single tags column (with tags being pipe-delimited) that looks like so:
Breakfast
Breakfast|Brunch|Buffet|Burger|Cakes|Crepes|Deli|Dessert|Dim Sum|Fast Food|Fine Wine|Spirits|Kebab|Noodles|Organic|Pizza|Salad|Seafood|Steakhouse|Sushi|Tapas|Vegetarian
Breakfast|Brunch|Buffet|Burger|Deli|Dessert|Fast Food|Fine Wine|Spirits|Noodles|Pizza|Salad|Seafood|Steakhouse|Vegetarian
Breakfast|Brunch|Buffet|Cakes|Crepes|Dessert|Fine Wine|Spirits|Salad|Seafood|Steakhouse|Tapas|Teahouse
Breakfast|Brunch|Burger|Crepes|Salad
Breakfast|Brunch|Cakes|Dessert|Dim Sum|Noodles|Pizza|Salad|Seafood|Steakhouse|Vegetarian
Breakfast|Brunch|Cakes|Dessert|Dim Sum|Noodles|Pizza|Salad|Seafood|Vegetarian
Breakfast|Brunch|Deli|Dessert|Organic|Salad
Breakfast|Brunch|Dessert|Dim Sum|Hot Pot|Seafood
Breakfast|Brunch|Dessert|Dim Sum|Seafood
Breakfast|Brunch|Dessert|Fine Wine|Spirits|Noodles|Pizza|Salad|Seafood
Breakfast|Brunch|Dessert|Fine Wine|Spirits|Salad|Vegetarian
Is there a way one could retrieve each tag and insert it into a new table tag_id | tag_nm using MySQL only?
Here is my attempt which uses PHP..., I imagine this could be more efficient with a clever MySQL query. I've placed the relationship part of it there too. There's no escaping and error checking.
$rs = mysql_query('SELECT `venue_id`, `tag` FROM `venue` AS a');
while ($row = mysql_fetch_array($rs)) {
$tag_array = explode('|',$row['tag']);
$venueid = $row['venue_id'];
foreach ($tag_array as $tag) {
$rs2 = mysql_query("SELECT `tag_id` FROM `tag` WHERE tag_nm = '$tag'");
$tagid = 0;
while ($row2 = mysql_fetch_array($rs2)) $tagid = $row2['tag_id'];
if (!$tagid) {
mysql_query("INSERT INTO `tag` (`tag_nm`) VALUES ('$tag')");
$tagid = mysql_insert_id();
}
mysql_query("INSERT INTO `venue_tag_rel` (`venue_id`, `tag_id`) VALUES ($venueid, $tagid)");
}
}
After finding there is no official split function, I solved the issue using only MySQL, like so:
First, I created the function strSplit:
CREATE FUNCTION strSplit(x varchar(21845), delim varchar(255), pos int) returns varchar(255)
return replace(
replace(
substring_index(x, delim, pos),
substring_index(x, delim, pos - 1),
''
),
delim,
''
);
Second, I inserted the new tags into my new table (real names and columns changed to keep it simple):
INSERT IGNORE INTO tag (SELECT null, strSplit(`Tag`,'|',1) AS T FROM `old_venue` GROUP BY T)
Rinse and repeat, increasing pos by one for each column (in this case I had a maximum of 8 separators).
Third, to get the relationships:
INSERT INTO `venue_tag_rel`
(Select a.`venue_id`, b.`tag_id` from `old_venue` a, `tag` b
WHERE
(
a.`Tag` LIKE CONCAT('%|',b.`tag_nm`)
OR a.`Tag` LIKE CONCAT(b.`tag_nm`,'|%')
OR a.`Tag` LIKE CONCAT(CONCAT('%|',b.`tag_nm`),'|%')
OR a.`Tag` LIKE b.`tag_nm`
)
)
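For comparison only (it is not a MySQL-only solution), the same normalization can be sketched in a few lines of Python with the built-in sqlite3 module, which avoids having to repeat the insert once per tag position. The table layout mirrors the names used above; the sample rows are invented, and the UNIQUE constraint on tag_nm is an assumption that makes INSERT OR IGNORE skip tags already seen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_venue (venue_id INTEGER PRIMARY KEY, Tag TEXT);
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag_nm TEXT UNIQUE);
    CREATE TABLE venue_tag_rel (venue_id INTEGER, tag_id INTEGER,
                                PRIMARY KEY (venue_id, tag_id));
    INSERT INTO old_venue (Tag) VALUES
        ('Breakfast'), ('Breakfast|Brunch|Salad'), ('Brunch|Dim Sum');
""")

rows = conn.execute(
    "SELECT venue_id, Tag FROM old_venue ORDER BY venue_id").fetchall()

with conn:  # one transaction for the whole migration
    for venue_id, tags in rows:
        # split on the pipe and drop empty pieces such as a trailing '|'
        for name in filter(None, (t.strip() for t in tags.split("|"))):
            conn.execute(
                "INSERT OR IGNORE INTO tag (tag_nm) VALUES (?)", (name,))
            (tag_id,) = conn.execute(
                "SELECT tag_id FROM tag WHERE tag_nm = ?", (name,)).fetchone()
            conn.execute(
                "INSERT OR IGNORE INTO venue_tag_rel VALUES (?, ?)",
                (venue_id, tag_id))

tag_names = [r[0] for r in
             conn.execute("SELECT tag_nm FROM tag ORDER BY tag_id")]
rel_rows = conn.execute(
    "SELECT venue_id, tag_id FROM venue_tag_rel "
    "ORDER BY venue_id, tag_id").fetchall()
print(tag_names, rel_rows)
```

Each distinct tag is stored once, and each venue gets one relationship row per tag it listed.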