Get and manipulate data from elasticsearch

Get and manipulate data from elasticsearch - asp.net-core

I am new to elasticsearch so I will need some help. Unfortunately, I didnt found the answer in other topics here on SO.
I have some .net core application which I inherited and now there is a need to implement some changes.
I already have a method of getting data from elasticsearch, but after getting them, I am not sure how to change it and use it in application.
To be precise, I need to parse first and last name and to remove special characters, specific serbian latin letters like "šđžčć" etc... I already have a method for this parsing written but not sure how to call it...
So, my question is can I and how can I do this?
What I have now is the following:
var result = await _elasticClient.SearchAsync<CachedUserEntity>(
s =>
s.Index(_aliasName)
.Query(q => andQuery));
CachedUserEntity, among others, contains property about FirstName and LastName.
Inside results.Documents, I am getting the data about FirstName and LastName from elasticsearch, but I am not sure how to access it in order to update it via aformentioned NameParser ...
Sorry if the question is too easy, not to say stupid :)

I wont use updateByQuery here, for some reasons. I would scroll on documents (i use matchAll on my exemple, you obviously need to replace it with your query), or, if you dont know how to identify documents to update, only update usefull documents in UpdateManyWithIndex/UpdateManyPartial function.
For performance, we have to update severals documents at once, so we use bulk/updateMany function.
You can use both solution, the classic update, or the second (partial update) with an object containing the targeteds fields.
On server sides, both solutions will have the same cost / performance.
var searchResponse = Client.Search<CachedUserEntity>(s => s
.Query(q => q
MatchAll()
)
.Scroll("10s")
);
while (searchResponse.Documents.Any())
{
List<CachedUserEntity> NewSearchResponse = RemoveChar(searchResponse);
UpdateManyWithIndex<CachedUserEntity>(NewSearchResponse, _aliasName);
searchResponse = Client.Scroll<Project>("2h", searchResponse.ScrollId);
}
public void UpdateManyWithIndex<C>(List<C> obj, string index) where C : class {
var bulkResponse = Client.Bulk(b => b
.Index(index).Refresh(Elasticsearch.Net.Refresh.WaitFor) // explicitly provide index name
.UpdateMany<C>(obj, (bu, d) => bu.Doc(d)));
}
Or, using partial update object
Note: in this case Indix is already set on my client (add .index if needed)
var searchResponse = Client.Search<CachedUserEntity>(s => s
.Query(q => q
MatchAll()
)
.Scroll("2h")
);
while (searchResponse.Documents.Any())
{
List<object> listPocoPartialObj = GetPocoPartialObjList(searchResponse);
UpdateManyPartial(listPocoPartialObj);
searchResponse = Client.Scroll<Project>("2h", searchResponse.ScrollId);
}
private List<object> GetPocoPartialObjList(List<CachedUserEntity> cachedList) {
List<object> listPoco = new List<object>();
//note if you dont have cachedList.Id, take a look at result.source, comments if needed
foreach (var eltCached in cachedList) {
listPoco.Add( new object() { Id = cachedList.Id, FirstName = YOURFIELDWITHOUTSPECIALCHAR, LastName = YOURSECONDFIELDWITHOUTSPECIALCHAR});
}
return listPoco;
}
public bool UpdateManyPartial(List<object> partialObj)
{
var bulkResponse = Client.Bulk(b => b
.Refresh(Elasticsearch.Net.Refresh.WaitFor)
.UpdateMany(partialObj, (bu, d) => bu.Doc(d))
);
if (!bulkResponse.IsValid)
{
GetErrorMsgs(bulkResponse);
}
return (bulkResponse?.IsValid == true);
}

Related

Update context in SQL Server from ASP.NET Core 2.2

_context.Update(v) ;
_context.SaveChanges();
When I use this code then SQL Server adds a new record instead of updating the
current context
[HttpPost]
public IActionResult PageVote(List<string> Sar)
{
string name_voter = ViewBag.getValue = TempData["Namevalue"];
int count = 0;
foreach (var item in Sar)
{
count = count + 1;
}
if (count == 6)
{
Vote v = new Vote()
{
VoteSarparast1 = Sar[0],
VoteSarparast2 = Sar[1],
VoteSarparast3 = Sar[2],
VoteSarparast4 = Sar[3],
VoteSarparast5 = Sar[4],
VoteSarparast6 = Sar[5],
};
var voter = _context.Votes.FirstOrDefault(u => u.Voter == name_voter && u.IsVoted == true);
if (voter == null)
{
v.IsVoted = true;
v.Voter = name_voter;
_context.Add(v);
_context.SaveChanges();
ViewBag.Greeting = "رای شما با موفقیت ثبت شد";
return RedirectToAction(nameof(end));
}
v.IsVoted = true;
v.Voter = name_voter;
_context.Update(v);
_context.SaveChanges();
return RedirectToAction(nameof(end));
}
else
{
return View(_context.Applicants.ToList());
}
}

You need to tell the DbContext about your entity. If you do var vote = new Vote() vote has no Id. The DbContext see this and thinks you want to Add a new entity, so it simply does that. The DbContext tracks all the entities that you load from it, but since this is just a new instance, it has no idea about it.
To actually perform an update, you have two options:
1 - Load the Vote from the database in some way; If you get an Id, use that to find it.
// Loads the current vote by its id (or whatever other field..)
var existingVote = context.Votes.Single(p => p.Id == id_from_param);
// Perform the changes you want..
existingVote.SomeField = "NewValue";
// Then call save normally.
context.SaveChanges();
2 - Or if you don't want to load it from Db, you have to manually tell the DbContext what to do:
// create a new "vote"...
var vote = new Vote
{
// Since it's an update, you must have the Id somehow.. so you must set it manually
Id = id_from_param,
// do the changes you want. Be careful, because this can cause data loss!
SomeField = "NewValue"
};
// This is you telling the DbContext: Hey, I control this entity.
// I know it exists in the DB and it's modified
context.Entry(vote).State = EntityState.Modified;
// Then call save normally.
context.SaveChanges();
Either of those two approaches should fix your issue, but I suggest you read a little bit more about how Entity Framework works. This is crucial for the success (and performance) of your apps. Especially option 2 above can cause many many issues. There's a reason why the DbContext keep track of entities, so you don't have to. It's very complicated and things can go south fast.
Some links for you:
ChangeTracker in Entity Framework Core
Working with Disconnected Entity Graph in Entity Framework Core

NHibernate check for same join in QueryOver

Using QueryOver I am creating a query like this
BulkActionItem bulkActionItemAlias1 = null;
BulkActionItem bulkActionItemAlias2 = null;
var query = GetSession().QueryOver<Student>(() => studentAlias)
.JoinAlias(() => studentAlias.BulkNotifications, () => bulkActionItemAlias1, NHibernate.SqlCommand.JoinType.LeftOuterJoin);
if (query.UnderlyingCriteria.GetCriteriaByAlias("bulkActionItemAlias2") == null
query = query.JoinAlias(() => studentAlias.BulkNotifications, () => bulkActionItemAlias2, NHibernate.SqlCommand.JoinType.LeftOuterJoin);
This will crash because I have the same join twice with different aliases. Is it possible to check if a join already exists on a query, even with a different alias?

I haven't found a built-in way to accomplish this. Typically I use an out parameter with extension methods to keep track of what tables are part of the query. For example:
bool joinedOnBulkNotifications;
BulkNotification notificationAlias = null;
Expression<Func<object>> aliasExpr = () => notificationAlias;
var query = GetSession().QueryOver<Student>(() => studentAlias)
.FilterByBulkNotificationStatus(
someCondition, aliasExpr, out joinedOnBulkNotifications);
public static class QueryExtensions
{
public static IQueryOver<Student, Student> FilterByBulkNotificationStatus(
this IQueryOver<Student, Student> query,
bool someCondition,
Expression<Func<object>> aliasExpr,
out bool joinedOnBulkNotifications)
{
joinedOnBulkNotifications = false;
if (someCondition)
{
joinedOnBulkNotifications = true;
query.JoinAlias(s => s.BulkNotifications, aliasExpr);
}
return query;
}
}
The issue is that you might need to reuse the alias you created later. You might be tempted to pass in a BulkNotification and use that, but this only works if the parameter name matches the name of the variable you pass to the extension method. NHibernate uses the name of the variable to create an alias name, so if these two do not match, you'll get an error. Because of this, you need to wrap the alias in an Expression and use that instead.
This isn't a very clean option, so I hope someone has a better solution.

Using boolean fields with Magento ORM

I am working on a backend edit page for my custom entity. I have almost everything working, including saving a bunch of different text fields. I have a problem, though, when trying to set the value of a boolean field.
I have tried:
$landingPage->setEnabled(1);
$landingPage->setEnabled(TRUE);
$landingPage->setEnabled(0);
$landingPage->setEnabled(FALSE);
None seem to persist a change to my database.
How are you supposed to set a boolean field using magento ORM?
edit
Looking at my database, mysql is storing the field as a tinyint(1), so magento may be seeing this as an int not a bool. Still can't get it to set though.

This topic has bring curiosity to me. Although it has been answered, I'd like to share what I've found though I didn't do intense tracing.
It doesn't matter whether the cache is enabled / disabled, the table schema will be cached.
It will be cached during save process.
Mage_Core_Model_Abstract -> save()
Mage_Core_Model_Resource_Db_Abstract -> save(Mage_Core_Model_Abstract $object)
Mage_Core_Model_Resource_Db_Abstract
public function save(Mage_Core_Model_Abstract $object)
{
...
//any conditional will eventually call for:
$this->_prepareDataForSave($object);
...
}
protected function _prepareDataForSave(Mage_Core_Model_Abstract $object)
{
return $this->_prepareDataForTable($object, $this->getMainTable());
}
Mage_Core_Model_Resource_Abstract
protected function _prepareDataForTable(Varien_Object $object, $table)
{
$data = array();
$fields = $this->_getWriteAdapter()->describeTable($table);
foreach (array_keys($fields) as $field) {
if ($object->hasData($field)) {
$fieldValue = $object->getData($field);
if ($fieldValue instanceof Zend_Db_Expr) {
$data[$field] = $fieldValue;
} else {
if (null !== $fieldValue) {
$fieldValue = $this->_prepareTableValueForSave($fieldValue, $fields[$field]['DATA_TYPE']);
$data[$field] = $this->_getWriteAdapter()->prepareColumnValue($fields[$field], $fieldValue);
} else if (!empty($fields[$field]['NULLABLE'])) {
$data[$field] = null;
}
}
}
}
return $data;
}
See the line: $fields = $this->_getWriteAdapter()->describeTable($table);
Varien_Db_Adapter_Pdo_Mysql
public function describeTable($tableName, $schemaName = null)
{
$cacheKey = $this->_getTableName($tableName, $schemaName);
$ddl = $this->loadDdlCache($cacheKey, self::DDL_DESCRIBE);
if ($ddl === false) {
$ddl = parent::describeTable($tableName, $schemaName);
/**
* Remove bug in some MySQL versions, when int-column without default value is described as:
* having default empty string value
*/
$affected = array('tinyint', 'smallint', 'mediumint', 'int', 'bigint');
foreach ($ddl as $key => $columnData) {
if (($columnData['DEFAULT'] === '') && (array_search($columnData['DATA_TYPE'], $affected) !== FALSE)) {
$ddl[$key]['DEFAULT'] = null;
}
}
$this->saveDdlCache($cacheKey, self::DDL_DESCRIBE, $ddl);
}
return $ddl;
}
As we can see:
$ddl = $this->loadDdlCache($cacheKey, self::DDL_DESCRIBE);
will try to load the schema from cache.
If the value is not exists: if ($ddl === false)
it will create one: $this->saveDdlCache($cacheKey, self::DDL_DESCRIBE, $ddl);
So the problem that occurred in this question will be happened if we ever save the model that is going to be altered (add column, etc).
Because it has ever been $model->save(), the schema will be cached.
Later after he add new column and "do saving", it will load the schema from cache (which is not containing the new column) and resulting as: the data for new column is failed to be saved in database

Delete var/cache/* - your DB schema is cached by Magento even though the new column is already added to the MySQL table.

Proper Way to Retrieve More than 128 Documents with RavenDB

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and used as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}

It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries, Lucence queries and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.

The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default

The Take(n) function will only give you up to 1024 by default. However, you can use it in pair with Skip(n) to get all
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
do
{
nextGroupOfPoints = session.Query<T>().Statistics(out stats).Where(whereClause).Skip(i * ElementTakeCount + skipResults).Take(ElementTakeCount).ToList();
i++;
skipResults += stats.SkippedResults;
points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging

Number of request per session is a separate concept then number of documents retrieved per call. Sessions are short lived and are expected to have few calls issued over them.
If you are getting more then 10 of anything from the store (even less then default 128) for human consumption then something is wrong or your problem is requiring different thinking then truck load of documents coming from the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you have need to perform data aggregation, create map/reduce index which results in aggregated data e.g.:
Index:
from post in docs.Posts
select new { post.Author, Count = 1 }
from result in results
group result by result.Author into g
select new
{
Author = g.Key,
Count = g.Sum(x=>x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count")(x=>x.Author)();

You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
while (enumerator.MoveNext())
{
var user = enumerator.Current.Document;
// do something
}
}
Example index:
public class MyUserIndex: AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.

Adding a index to collection using EF 4.1 and XAML (For a highscore table)

I have a webservice I call from a WP7 app. I get a list of high scores in a table (name/score).. What is the simpliest way to add a 3rd column on the far left which is simply the row?
Do I need to add a property to the entity? Is there someway to get the row #?
I tried these things below with no success..
[OperationContract]
public List<DMHighScore> GetScores()
{
using (var db = new DMModelContainer())
{
// return db.DMHighScores.ToList();
var collOrderedHighScoreItem = (from o in db.DMHighScores
orderby o.UserScore ascending
select new
{
o.UserName,
o.UserScore
}).Take(20);
var collOrderedHighScoreItem2 = collOrderedHighScoreItem.AsEnumerable().Select((x, i) => new DMHighScoreDTO
{
UserName = x.UserName,
UserScore = x.UserScore
}).ToList();
}
}
[DataContract]
public class DMHighScoreDTO
{
int Rank;
string UserName;
string UserScore;
}

So lets assume you want to load top 100 users in leaderboard and you want to have their rank included:
[OperationContract]
public List<ScoreDto> GetTop100()
{
// Linq to entities query
var query = (from u from context.Users
order by u.Score
select new
{
u.Name,
u.Score
}).Take(100);
// Linq to objects query from working on 100 records loaded from DB
// Select with index doesn't work in linq to entities
var data = query.AsEnumerable().Select((x, i) => new ScoreDto
{
Rank = i + 1,
Name = x.Name,
Score = x.Score
}).ToList();
return data;
}

what will the row number be used for? if this is for ordering might I suggest adding a column named Order, then map the column to your entity.
if you require a row index, you could also call the .ToList() on the query and fetch the index locations for each entity.
Edit:
you could add the Rank property and set it to Ignore. This will enable you to go through the collection set the rank with a simple for loop. This will also not be persisted in the database. It will also not have any required columns in the database.
It does add an extra iteration.
the other way to go about it. This would be to add the rank number in the generated UI and not in the data collection being used to bind.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Get and manipulate data from elasticsearch - asp.net-core

Related

Update context in SQL Server from ASP.NET Core 2.2

NHibernate check for same join in QueryOver

Using boolean fields with Magento ORM

Proper Way to Retrieve More than 128 Documents with RavenDB

Adding a index to collection using EF 4.1 and XAML (For a highscore table)

Categories

Resources