Do I need an index when DynamoDB returns only a small amount of data?

Say the DynamoDB table has items of the following format:
{
    "id": "<id>",
    "field-1": "<field-1-value>",
    "field-2": "<field-2-value>",
    "field-3": "<field-3-value>",
    "field-4": "<field-4-value>",
    "metadata": {
        "subfield-1": "<subfield-1-value>",
        "subfield-2": "<subfield-2-value>"
    }
}
I have a partition key on the id attribute and a sort key on field-1, say. Now suppose that, for the same id, we want a search capability on the subfield-1 value. Can that be done easily in DynamoDB without creating any index? The maximum number of rows for each id would be 70, so it looks like a small set of data.
Please let me know your views.

Yes, this can be achieved without an index: you can use a FilterExpression to filter on metadata.subfield-1. Keep in mind that a filter is applied after the key condition has read the items, but with at most 70 items per id that post-read cost is negligible.
Example:
var params = {
    TableName : 'yourTableName',
    KeyConditionExpression : 'id = :idval',
    // "subfield-1" contains a hyphen, so it needs a name placeholder;
    // the expression path drills into the metadata map
    FilterExpression : 'metadata.#sf1 = :subField1Val',
    ExpressionAttributeNames : {
        '#sf1' : 'subfield-1'
    },
    ExpressionAttributeValues : {
        ':idval' : '7',
        ':subField1Val' : 'somevalue'
    }
};
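For completeness, here is a minimal sketch of executing that query, assuming the AWS SDK for JavaScript (v2) DocumentClient and the params object above:

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

docClient.query(params, function(err, data) {
    if (err) {
        console.error(err);
    } else {
        // data.Items holds the items for this id whose
        // metadata.subfield-1 matched the filter
        console.log(data.Items);
    }
});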

Related

How to update a certain row value using a gota dataframe?

I'm trying to parse data from a certain server, and I need to export it to Excel or CSV.
Before I export, I need to do some post-processing, such as merging values between parsed rows.
For example, there are two series out of all the data:
series#1 - {Name: "MATH", Student:"Zay", Id:"MATH-123", Date:"12/25/2022", Status:"Good"}
series#2 - {Name: "MATH", Student:"Zay", Id:"MATH-124", Date:"12/26/2022", Status:"Bad"}
What I want to do is update series#1's Status to
{Name: "MATH", Student:"Zay", Id:"MATH-123,MATH-124", Date:"12/25/2022,12/26/2022", Status:"Bad"}
Id, Date ==> combined with ","
Status ==> changed to the latest result
Right now I'm using the Filter method of DataFrames:
type MyDataSet struct {
    Name    string
    Student string
    Id      string
    Date    string
    Status  string
}

totalDF := series1_result                   // overall result dataframe
df := dataframe.LoadStructs(series2_result) // new dataframe which needs to be compared to the previous data in totalDF
length := df.Nrow()
for i := 0; i < length; i++ {
    name := df.Subset(i).Col("Name")
    student := df.Subset(i).Col("Student")
    query := totalDF.Filter(
        dataframe.F{
            Colname:    "Name",
            Comparator: series.Eq,
            Comparando: name,
        },
    ).Filter(
        dataframe.F{
            Colname:    "Student",
            Comparator: series.Eq,
            Comparando: student,
        },
    )
    if query.Nrow() == 0 {
        totalDF = totalDF.Concat(df.Subset(i))
    } else {
        newDF := dataframe.LoadStructs([]MyDataSet{
            {
                Name:    query.Col("Name").String(),
                Student: query.Col("Student").String(),
                Id:      query.Col("Id").String() + "," + df.Subset(i).Col("Id").String(),
                Date:    query.Col("Date").String() + "," + df.Subset(i).Col("Date").String(),
                Status:  df.Subset(i).Col("Status").String(),
            },
        })
        // it's not updated, as query is not a pointer
        query.Set(series.Ints([]int{0}), newDF)
    }
}
Even though I updated the values on the result of query, the change is not reflected in totalDF.
How can I query the data from totalDF and update the data on totalDF itself?
How can I get the index number of a filtered item using the Filter function?
Should I implement a search function instead of using the Filter function?
I would really appreciate it if you could help me.
Thanks everyone!
Merry Christmas!
*I tried to find out from the official docs, but every method returns a value, not a pointer.

Filtering with the LIKE operator on an integer column

I'm using MikroORM for DB-related operations. My DB entity has a number field:
@Property({ defaultRaw: 'srNumber', type: 'number' })
srNumber!: number;
and the corresponding DB column (PostgreSQL) is:
srNumber (int8)
The query input for the where param in MikroORM EntityRepository's findAndCount(where, options) is:
repository.findAndCount({"srNumber":{"$like":"%1000%"}}, options)
It translates to:
select * from table1 where srNumber like '%1000%'
The problem here is that since the srNumber column is not a string, there is a type mismatch and the query fails. Casting it, as in CAST(srNumber AS TEXT) LIKE '%1000%', would work in the DB.
Is there any way to somehow specify the field casting here?
You can use custom SQL fragments in the query. To get around the strictly typed FilterQuery, you can use expr, which is just an identity function (it returns its parameter), so it only has an effect on TS checks.
Something like this should work:
import { expr } from '@mikro-orm/core';

const res = await repo.findAndCount({
    [expr('cast(srNumber as text)')]: { $like: '%1000%' },
}, options);
https://mikro-orm.io/docs/entity-manager/#using-custom-sql-fragments
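As a usage note, findAndCount resolves to a tuple of the matching entities and the total count, so the result is typically destructured. A minimal sketch, where repo and the pagination options are assumptions for illustration:

const [rows, total] = await repo.findAndCount(
    { [expr('cast(srNumber as text)')]: { $like: '%1000%' } },
    { limit: 20, offset: 0 },
);
console.log(`matched ${total} rows, returning ${rows.length}`);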

Updating a value in an array in MongoDB from Java

I have a couple of documents in MongoDB, as follows:
{
    "_id" : ObjectId("54901212f315dce7077204af"),
    "Date" : ISODate("2014-10-20T04:00:00.000Z"),
    "Type" : "Twitter",
    "Entities" : [
        {
            "ID" : 4,
            "Name" : "test1",
            "Sentiment" : {
                "Value" : 20,
                "Neutral" : 1
            }
        },
        {
            "ID" : 5,
            "Name" : "test5",
            "Sentiment" : {
                "Value" : 10,
                "Neutral" : 1
            }
        }
    ]
}
Now I want to update the document that has Entities.ID = 4 by setting Sentiment.Value to (Sentiment.Value + 4) / 2, for example; in the document above, after the update we would have (20 + 4) / 2 = 12.
I wrote the following code, but I am stuck in the if statement, as you can see:
DBCollection collectionG = db.getCollection("GraphDataCollection");
int entityID = 4;
String entityName = "test";
BasicDBObject queryingObject = new BasicDBObject();
queryingObject.put("Entities.ID", entityID);
DBCursor cursor = collectionG.find(queryingObject);
if (cursor.hasNext()) {
    BasicDBObject existingDocument = new BasicDBObject("Entities.ID", entityID);
    // not sure how to update the Sentiment.Value for Entities.ID = 4
}
First I thought I should unwind the Entities array to get the value of Sentiment, but if I do that, how can I wind it back up and update the document with the same format as it has now, but with the new Sentiment value?
I also found this link:
MongoDB - Update objects in a document's array (nested updating)
but I could not understand it, since it is not written as a Java query.
Can anyone explain how I can do this in Java?
You need to do this in two steps:
1. Find all the _id values of the documents that contain an Entities element with ID 4. During the find, project only the sub-document that matched the query, so that we can consume just its Sentiment.Value; use the positional operator ($) for this purpose.
2. Instead of hitting the database once per matched record to update it, use the Bulk API to queue up the updates and execute them at the end.
Create the Bulk operation Writer:
BulkWriteOperation bulk = col.initializeUnorderedBulkOperation();
Find all the records that contain the value 4 in their Entities.ID field. A plain match against this query would return the whole document, but we only want the document's _id (so that we can update that same document later) and the Entities element whose ID is 4. There may be n other Entities elements, but they do not matter; to get only the element that matches the query, we use the positional operator $.
DBObject find = new BasicDBObject("Entities.ID", 4);
DBObject project = new BasicDBObject("Entities.$", 1);
DBCursor cursor = col.find(find, project);
What the above code would return is, for example, the document below (since our example assumes only a single input document). If you notice, it contains only the one Entities element that matched our query; Value is still 20 because the update has not run yet.
{
    "_id" : ObjectId("54901212f315dce7077204af"),
    "Entities" : [
        {
            "ID" : 4,
            "Name" : "test1",
            "Sentiment" : {
                "Value" : 20,
                "Neutral" : 1
            }
        }
    ]
}
Iterate each record to queue up for update:
while (cursor.hasNext()) {
    BasicDBObject doc = (BasicDBObject) cursor.next();
    // thanks to the $ projection, Entities holds exactly one element here
    int curVal = ((BasicDBObject) ((BasicDBObject) ((BasicDBList) doc.get("Entities"))
            .get(0)).get("Sentiment")).getInt("Value");
    int updatedValue = (curVal + 4) / 2;
    DBObject query = new BasicDBObject("_id", doc.get("_id"))
            .append("Entities.ID", 4);
    DBObject update = new BasicDBObject("$set",
            new BasicDBObject("Entities.$.Sentiment.Value", updatedValue));
    bulk.find(query).update(update);
}
Finally, execute the updates:
bulk.execute();
You need to do a find() and an update(), and not simply an update(), because currently MongoDB does not allow you to reference a document field's value, modify it, and update it with the computed value in a single update query.
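For what it's worth, this restriction was lifted in MongoDB 4.2 with aggregation-pipeline updates, so on a modern server the read-modify-write can be a single statement. A sketch in mongo-shell JavaScript (the collection name is taken from the question; the modern Java driver accepts the same pipeline as a List<Bson>):

// Rewrite the Entities array: recompute Sentiment.Value only for the
// element whose ID is 4; every other element passes through unchanged.
db.GraphDataCollection.updateMany(
    { "Entities.ID": 4 },
    [{
        $set: {
            Entities: {
                $map: {
                    input: "$Entities",
                    as: "e",
                    in: {
                        $cond: [
                            { $eq: ["$$e.ID", 4] },
                            { $mergeObjects: ["$$e", { Sentiment: { $mergeObjects: ["$$e.Sentiment", { Value: { $divide: [{ $add: ["$$e.Sentiment.Value", 4] }, 2] } }] } }] },
                            "$$e"
                        ]
                    }
                }
            }
        }
    }]
);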

RavenDB Get document count after BulkInsertOperations

I am using RavenDB to bulk load some documents. Is there a way to get the count of documents loaded into the database?
For insert operations I am doing:
BulkInsertOperation _bulk = docStore.BulkInsert(null,
    new BulkInsertOptions { CheckForUpdates = true });
foreach (MyDocument myDoc in docCollection)
    _bulk.Store(myDoc);
_bulk.Dispose();
And right after that I call the following:
session.Query<MyDocument>().Count();
but I always get a number which is less than the count I see in raven studio.
By default, the query you are doing limits itself to a sane number of results, part of RavenDB's promise to be safe by default and not stream back millions of records.
In order to get the number of documents of a specific type in your database, you need a special map-reduce index whose job it is to track the counts for each document type. Because this type of index deals directly with document metadata, it's easier to define it in Raven Studio than to create it in code.
The source for that index is in this question but I'll copy it here:
// Index Name: Raven/DocumentCollections
// Map Query
from doc in docs
let Name = doc["@metadata"]["Raven-Entity-Name"]
where Name != null
select new { Name, Count = 1 }

// Reduce Query
from result in results
group result by result.Name into g
select new { Name = g.Key, Count = g.Sum(x => x.Count) }
Then to access it in your code you would need a class that mimics the structure of the anonymous type created by both the Map and Reduce queries:
public class Collection
{
    public string Name { get; set; }
    public int Count { get; set; }
}
Then, as Ayende notes in the answer to the previously linked question, you can get results from the index like this:
session.Query<Collection>("Raven/DocumentCollections")
    .Where(x => x.Name == "MyDocument")
    .FirstOrDefault();
Keep in mind, however, that indexes are updated asynchronously so after bulk-inserting a bunch of documents, the index may be stale. You can force it to wait by adding .Customize(x => x.WaitForNonStaleResults()) right after the .Query(...).
Raven Studio actually gets this data from the index Raven/DocumentsByEntityName, which exists for every database, by sidestepping normal queries and reading metadata on the index. You can emulate that like this:
QueryResult result = docStore.DatabaseCommands.Query("Raven/DocumentsByEntityName",
    new Raven.Abstractions.Data.IndexQuery
    {
        Query = "Tag:MyDocument",
        PageSize = 0
    },
    includes: null,
    metadataOnly: true);
var totalDocsOfType = result.TotalResults;
That QueryResult contains a lot of useful data:
{
    Results: [ ],
    Includes: [ ],
    IsStale: false,
    IndexTimestamp: "2013-11-08T15:51:25.6463491Z",
    TotalResults: 3,
    SkippedResults: 0,
    IndexName: "Raven/DocumentsByEntityName",
    IndexEtag: "01000000-0000-0040-0000-00000000000B",
    ResultEtag: "BA222B85-627A-FABE-DC7C-3CBC968124DE",
    Highlightings: { },
    NonAuthoritativeInformation: false,
    LastQueryTime: "2014-02-06T18:12:56.1990451Z",
    DurationMilliseconds: 1
}
A lot of that is the same data you get on any query if you request statistics, like this:
RavenQueryStatistics stats;
session.Query<Course>()
    .Statistics(out stats)
    // Rest of query

PouchDB query like SQL

With CouchDB it is possible to do SQL-like queries. http://guide.couchdb.org/draft/cookbook.html says:
How you would do this in SQL:
SELECT field FROM table WHERE value="searchterm"
How you can do this in CouchDB:
Use case: get a result (which can be a record or set of records) associated with a key ("searchterm").
To look something up quickly, regardless of the storage mechanism, an index is needed. An index is a data structure optimized for quick search and retrieval. CouchDB’s map result is stored in such an index, which happens to be a B+ tree.
To look up a value by "searchterm", we need to put all values into the key of a view. All we need is a simple map function:
function(doc) {
    if (doc.value) {
        emit(doc.value, null);
    }
}
This creates a list of documents that have a value field sorted by the data in the value field. To find all the records that match "searchterm", we query the view and specify the search term as a query parameter:
/database/_design/application/_view/viewname?key="searchterm"
How can I do this with PouchDB? The API provides methods to create temporary views, but how can I customize the GET request with key="searchterm"?
You just add your attribute settings to the options object:
var searchterm = "boop";
db.query({
    map: function(doc) {
        if (doc.value) {
            emit(doc.value, null);
        }
    }
}, { key: searchterm }, function(err, res) { /* ... */ });
See http://pouchdb.com/api.html#query_database for more info.
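If you prefer promises over callbacks, the same temporary view can be queried like this (a minimal sketch; db and searchterm are as above):

db.query(function (doc) {
    if (doc.value) {
        emit(doc.value, null);
    }
}, { key: searchterm }).then(function (res) {
    // res.rows holds one row per document whose value matched the key
    console.log(res.rows);
}).catch(function (err) {
    console.error(err);
});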
Using regex:
import PouchDB from 'pouchdb';
import PouchDBFind from 'pouchdb-find';
...
PouchDB.plugin(PouchDBFind);
const db = new PouchDB(dbName);
db.createIndex({ index: { fields: ['description'] } });
....
const { docs, warning } = await db.find({ selector: { description: { $regex: /OVO/ } } });
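Finally, a hedged sketch of consuming that destructured result: docs is the array of matching documents, and warning, when present, indicates the query could not be served efficiently by an index (the regex match is applied in memory):

if (warning) {
    console.warn(warning);
}
docs.forEach(doc => console.log(doc.description));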