Abstract view of how distinct queries are implemented in NoSQL - sql

I am developing a system on Google Datastore with a kind, Posts, that has two properties:
1. message (string)
2. hashtags (list)
I want to query the distinct hashtags together with the number of posts that use each one. For example, say the posts are:
    [
        {
            "message": "msg1",
            "tags": ["abc", "cde", "efr"]
        },
        {
            "message": "msg2",
            "tags": ["abc", "efgh", "efk"]
        },
        {
            "message": "msg3",
            "tags": ["abc", "efgh", "efr"]
        }
    ]
The output should be:
    {
        "abc": 3,
        "cde": 1,
        "efk": 1,
        "efgh": 2,
        "efr": 2
    }
But with Datastore, a NoSQL store, I can't query this directly. I would have to load all the posts and compute the distinct tags myself, which is time-consuming.
I have seen a distinct function, db.collection.distinct(), which I think might address this problem. If this has to be done on any NoSQL store, what would be the optimal solution?

Unfortunately, projection queries with 'distinct on' will only return a single result per distinct value (https://cloud.google.com/datastore/docs/concepts/queries#projection_queries). It will not provide a count of each distinct value. You'll need to do the count yourself, but you can use a projection query to save cost by only returning the tag values instead of the full entities.
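A minimal sketch of the client-side count described above, written in Python. It assumes the tag lists have already been fetched (for example through a projection query on the tags property); the count_tags name and the in-memory posts data are illustrative and not part of any Datastore API.

```python
from collections import Counter
from typing import Iterable


def count_tags(tag_lists: Iterable[Iterable[str]]) -> Counter:
    """Count how many posts carry each tag."""
    counts = Counter()
    for tags in tag_lists:
        # set() ensures a tag repeated inside one post is counted once
        counts.update(set(tags))
    return counts


# Hypothetical data mirroring the three posts in the question.
posts = [
    ["abc", "cde", "efr"],
    ["abc", "efgh", "efk"],
    ["abc", "efgh", "efr"],
]
tag_counts = count_tags(posts)
```

The work is still linear in the number of tag values read, so for large kinds it may be worth maintaining the counts incrementally when posts are written instead of recomputing them on every request.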


CakePHP 3 ORM - Sorting based on two columns in two different related tables

Currently I'm trying to implement timeline functionality, which requires sorting on the created column in two related tables and updating the parent table (in my case pictures) accordingly.
More specifically, I have a pictures table which has many comments. I want to sort the pictures by the most recent timestamp in the created column of either the comments or the pictures table.
I have the following query which retrieves the necessary data but it isn't ordered properly:
    public function getPicturesAndCommentsOfUser($userId){
        return $this->find()
            ->contain([
                'Comments' => function ($q){
                    return $q
                        ->contain(['Users' => function ($q) {
                            return $q->select($this->select);
                        }])
                        ->order(['Comments.created' => 'ASC']);
                },
                'Users' => function ($q) {
                    return $q->select($this->select);
                },
                'Albums'
            ])
            ->matching('Albums.Users', function ($q) use ($userId) {
                return $q
                    ->where(['Users.id' => $userId]);
            });
    }
My question is how to combine the ordering of Pictures.created and Comments.created. I already tried calling the order function both inside ->contain(['Comments']) and at the outermost part of the chain, after the last matching call. I can't figure out how to relate the two tables to each other so that I can sort on both of them.
Moreover, I read in other sources (like this one) that I could use a union statement, but all the information I can find about that option says it works on unrelated tables, not related ones.
Can anyone give me some directions on how to solve this?
First of all, your purpose should be clear. Which ordering has priority for you: Pictures.created or Comments.created? You cannot give both the same priority. When you do something like this:
    $this->Comments->find()
        ->order(['Pictures.created' => 'ASC'])
        ->order(['Comments.created' => 'ASC']);
Successive order() calls are appended to the ORDER BY clause, so the rows are sorted by Pictures.created first. Comments.created only decides the order among rows whose Pictures.created values are equal.
There's another thing to consider. Build the query on the table with more rows. If each Picture hasMany Comments, try building your query on Comments. I don't know how the results are sorted when you build the query on Pictures: the retrieved rows are not sorted by Pictures.id, but when the result is converted into entities, Comments are nested inside Pictures and any earlier ordering is lost.
Finally, don't make things complicated. Pushing PHP logic into the query doesn't improve performance, and it hurts readability. So if you need to apply PHP logic, first retrieve all the data and then process it with a few simple foreach loops.
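The "retrieve first, then process" approach can be sketched as follows. The example is in Python rather than PHP purely for illustration, and the dictionary layout (created, comments) is an assumed shape for the fetched pictures, not something CakePHP returns verbatim.

```python
from datetime import datetime


def latest_activity(picture: dict) -> datetime:
    """Most recent timestamp among the picture itself and its comments."""
    stamps = [picture["created"]]
    stamps.extend(c["created"] for c in picture.get("comments", []))
    return max(stamps)


def sort_timeline(pictures: list) -> list:
    """Order pictures so the one with the newest activity comes first."""
    return sorted(pictures, key=latest_activity, reverse=True)


# Hypothetical, already-fetched data.
pictures = [
    {"id": 1, "created": datetime(2016, 1, 1),
     "comments": [{"created": datetime(2016, 3, 1)}]},
    {"id": 2, "created": datetime(2016, 2, 1), "comments": []},
    {"id": 3, "created": datetime(2016, 1, 15),
     "comments": [{"created": datetime(2016, 4, 1)}]},
]
timeline_ids = [p["id"] for p in sort_timeline(pictures)]
```

Sorting in application code is only reasonable when the result set is small enough to load at once; for paginated timelines the ordering would have to move back into SQL.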

Query fast without search, slow with search, but with search fast in SSMS

I have this function that takes data from the database and also supports search. The problem is that when I search through Entity Framework it's slow, but if I take the same query from the log and run it in SSMS it's fast. I should also say that there are a lot of movies: 388262. I also tried adding an index on title in the Movie table, but it didn't help.
Query I use in SSMS:
    SELECT *
    FROM Movie
    WHERE title LIKE '%pirate%'
    ORDER BY @@ROWCOUNT
    OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY
Entity Framework code (_movieRepository.GetAll() returns an IQueryable, not all movies):
    public IActionResult Index(MovieIndexViewModel vm) {
        IQueryable<Movie> query = _movieRepository.GetAll().AsNoTracking();
        if (!string.IsNullOrWhiteSpace(vm.Search)) {
            query = query.Where(m => m.title.ToLower().Contains(vm.Search.ToLower()));
        }
        vm.TotalItemCount = query.Count();
        vm.Movies = query.Skip(_pageSize * (vm.Page - 1)).Take(_pageSize);
        vm.PageSize = _pageSize;
        return View(vm);
    }
Caveat: I don't have much experience with Entity Framework.
However, you might find useful debugging tips in the Entity Framework Performance Article from Simple Talk. Looking at what you've posted, you might be able to improve your query performance by:
1. Choosing only the specific columns you're interested in (it sounds like you're only interested in querying the 'Title' column).
2. Paying special attention to your data types. You might want to convert your NVARCHAR columns to VARCHAR(40) (or some appropriate character limit).
3. Removing all of the ToLower() calls:
    if (!string.IsNullOrWhiteSpace(vm.Search)) {
        query = query.Where(m => m.title.Contains(vm.Search));
    }
SQL Server (unlike C#) is not case-sensitive by default (though you can configure it to be). Your query forces SQL Server to lowercase the title of every record in the table before doing the comparison.

Elastic search query filter

I am new to Elasticsearch.
How can I convert the following SQL statement into an Elasticsearch query?
    select sum(totaldevicecount), datasource from
        (select distinct oskey, custkey, productkey,
            timekey, totaldevicecount, datasource from es_reporting_data_new)
    group by datasource;
Thanks
After simplifying your query to the following:
    select sum(totaldevicecount), datasource from es_reporting_data_new group by datasource;
the ES query would be:
    {
        "aggs": {
            "data_source": {
                "terms": {
                    "field": "datasource"
                },
                "aggs": {
                    "total_device_count": {
                        "sum": {
                            "field": "totaldevicecount"
                        }
                    }
                }
            }
        }
    }
For more details see also Elastic Search Sum aggregation with group by and where condition.
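As a rough illustration of how the aggregation above might be used from client code, the Python sketch below builds the same request body and flattens the buckets of a response into a plain mapping. The function names and the sample response values are hypothetical, and the "size": 0 setting is an added assumption that suppresses the document hits; the bucket layout follows the standard shape of a terms aggregation with a nested sum.

```python
def build_sum_by_datasource_query() -> dict:
    """Request body: group by datasource, sum totaldevicecount per group."""
    return {
        "size": 0,  # assumption: return aggregations only, no document hits
        "aggs": {
            "data_source": {
                "terms": {"field": "datasource"},
                "aggs": {
                    "total_device_count": {"sum": {"field": "totaldevicecount"}}
                },
            }
        },
    }


def totals_by_datasource(response: dict) -> dict:
    """Flatten the terms buckets into {datasource: summed device count}."""
    buckets = response["aggregations"]["data_source"]["buckets"]
    return {b["key"]: b["total_device_count"]["value"] for b in buckets}


# Hypothetical response, trimmed to the fields read above.
sample_response = {
    "aggregations": {
        "data_source": {
            "buckets": [
                {"key": "web", "doc_count": 2, "total_device_count": {"value": 30.0}},
                {"key": "mobile", "doc_count": 1, "total_device_count": {"value": 12.0}},
            ]
        }
    }
}
totals = totals_by_datasource(sample_response)
```

Note that the simplified SQL drops the inner SELECT DISTINCT, so if the index contains duplicate rows the sums can differ from the original statement.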
If I remember correctly: a terms query for distinct, filters for GROUP BY, and aggregations for nested selects and sums.
As an answer has already been provided, I'll add an extra bit.
I would highly recommend reading the Elasticsearch documentation, assuming you need to convert many queries like this.
The Elasticsearch query syntax differs from SQL, so an in-depth understanding of it will help you convert SQL queries where that is possible.
Logstash supports JDBC inputs, so one way is to execute the SQL queries there and load the result set into ELK. You can use the queries directly or, for larger queries, views. The output is ingested directly into ELK.
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
The other way, if you want to query the indices, is to write SQL queries directly in Dev Tools, like below:
    POST _xpack/sql?format=txt
    {
        "query": "DESCRIBE \"indice-name-*\""
    }
    POST _xpack/sql
    {
        "query": "Select data1, data2 from \"indice-name-*\" where data1 = 'ABC' and ID = '11223333'"
    }

Couchbase wildcard / variable keys in view

After doing some research and testing with Couchbase, I am getting some good results.
However, it seems strange that views must be created ahead of time and are not very flexible.
Basically, if I have a view like this:
    function(doc, meta) {
        emit([doc.name, doc.location, doc.gender, doc.birthYear, doc.birthMonth], null);
    }
And I want to query by different keys, such as name = "John" and gender = "M".
It doesn't seem I can do startKey = ["John", {}, "M"], endKey = ["John", {}, "M", {}].
Similarly, what if I just want to filter the above by gender and birth month?
It seems I have to manually create an individual view for every possible type of query, which with lots of data points is less than optimal.
I haven't seen any questions addressing this. I also looked into passing arguments to map or reduce to do this dynamically, but that can't be done. I'd be stuck pulling ALL records across all group levels and then having to sort and aggregate the data manually.
Can this be done?
Thank you
As of Couchbase version 4.x you have the N1QL query language. You can specify filter criteria to select your JSON documents without having any views in place.
So, as per your example, you should be able to issue a query like this:
    SELECT *
    FROM your_bucket_name
    WHERE name = 'John' AND gender = 'M'
Here is a N1QL tutorial to get a feel for it.
Yet another way is to use the Couchbase integration with Elasticsearch and execute the search query in the Elasticsearch engine, which will return all the keys it found based on your search criteria.
http://www.couchbase.com/communities/n1ql
N1QL is a richer language for querying Couchbase data, and it is not as limited as views.
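To see why the startKey/endKey attempt in the question cannot work, it may help to model a view index as a sorted list of composite keys. The Python sketch below is only an analogy: tuples stand in for the emitted arrays, the HIGH sentinel stands in for {} (which sorts after strings in view collation), and the sample keys are invented.

```python
from bisect import bisect_left, bisect_right

# Hypothetical emitted keys (name, location, gender), kept sorted like a view index.
index = sorted([
    ("John", "Austin", "M"),
    ("John", "Boston", "F"),
    ("John", "Denver", "M"),
    ("Mary", "Austin", "F"),
])

HIGH = "\uffff"  # stand-in for {}: compares greater than any ordinary string here


def range_scan(start: tuple, end: tuple) -> list:
    """Return the contiguous slice of the index between start and end."""
    return index[bisect_left(index, start):bisect_right(index, end)]


# A prefix query works: every "John" key sits in one contiguous run.
by_name = range_scan(("John",), ("John", HIGH))

# The attempted range starts after every ("John", <location>, ...) key,
# because the sentinel in the second position sorts after all locations.
attempt = range_scan(("John", HIGH, "M"), ("John", HIGH, "M", HIGH))

# The rows actually wanted are not adjacent in the index.
wanted = [k for k in index if k[0] == "John" and k[2] == "M"]
```

A range scan can only return one contiguous slice, so filtering on a non-leading key element needs either a view whose key starts with that element or a query language such as N1QL.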

CakePHP query additions in controller

I am migrating raw PHP code to CakePHP and have some problems. As I have big problems converting queries to the ORM, I temporarily use raw SQL. All is going well, but I have ended up with ugly code and don't really know how to make it beautiful. I made a DealersController and added a function advanced($condition = null) (it will be called from AJAX with parameters 1-15 and 69). The function looks like:
    switch ($condition) {
        case '1':
            $cond_query = ' AND ( (d.email = \'\' OR d.email IS NULL) )';
            break;
        case '2':
            $cond_query = ' AND (d.id IN (SELECT dealer_id FROM dealer_logo))';
            break;
        // There are many cases, some long, some like these two
    }
    if ($user_group == 'group_1') {
        $query = 'LONG QUERY WITH 6+ TABLES JOINING' . $cond_query;
    } elseif ($user_group == 'group_2') {
        $query = 'A LITTLE BIT DIFFERENT LONG QUERY WITH 6+ TABLES JOINING' . $cond_query;
    } else {
        $query = 'A LITTLE MORE BIT DIFFERENT LONG QUERY WITH 10+ TABLES JOINING' . $cond_query;
    }
    // THERE IS $this->Dealer->query($query); and so on
So, as you can see, the code looks ugly. I have two options:
1) Pull the query additions out and make model methods for every condition, separating the conditions into functions. But this is not DRY, because the three big main queries are almost the same, and if I need to change something in one, I will need to change 16+ queries.
2) Make small reusable model methods/queries which fetch small pieces of data from the DB, then stop using raw SQL and compose those methods. It would be good, but the performance will be low and I need it as high as possible.
Please give me advice. Thank you!
If you're concerned about how CakePHP makes a database query for every joined table, you might find that the Linkable behaviour can help you reduce the number of queries (where the joins are simple associations on the one table).
Otherwise, I find that creating simple database querying methods at the Model level to get your smaller pieces of information, and then combining them afterwards, is a good approach. It allows you to clearly outline what your code does (through inline documentation). If you can migrate to using CakePHP's find methods instead of raw queries, you will be using the conditions array syntax. So one way you could approach your problem is to have public functions on your Model classes which append their appropriate conditions to an inputted conditions array. For example:
    class SomeModel extends AppModel {
        ...
        public function addEmailCondition(&$conditions) {
            $conditions['OR'] = array(
                'alias.email_address' => null,
                'alias.email_address =' => ''
            );
        }
    }
You would call these functions to build up one large conditions array which you can then use to retrieve the data you want from your controller (or from the model if you want to contain it all at the model layer). Note that in the above example, the conditions array is being passed by reference, so it can be edited in place. Also note that any existing 'OR' conditions in the array will be overwritten by this function: your real solution would have to be smarter in terms of merging your new conditions with any existing ones.
Don't worry about 'hypothetical' performance issues. Once you've tried the queries and found them too slow, then you can worry about how to increase performance. But for starters, try to write the code as cleanly as possible.
You also might want to consider splitting up that function advanced() call into multiple Controller Actions that are grouped by the similarity of their condition query.
Finally, in case you haven't already checked it out, here's the Book's entry on retrieving data from models. There might be some tricks you hadn't seen before: http://book.cakephp.org/view/1017/Retrieving-Your-Data
If the base part of the query is the same, you could have a function to generate that part of the query, and then use other small functions to append the different where conditions, etc.
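One way to picture this suggestion is a lookup of base queries per user group plus a lookup of condition fragments, combined by a single function. The sketch is in Python for brevity; the dealers table name, the d.active column, and the two base queries are placeholders for the long real queries, which the question does not show.

```python
from typing import Optional

# Placeholder base queries keyed by user group (the real ones join 6+ tables).
BASE_QUERIES = {
    "group_1": "SELECT d.id FROM dealers d WHERE 1 = 1",
    "default": "SELECT d.id FROM dealers d WHERE d.active = 1",
}

# Condition fragments keyed by the AJAX parameter; all are fixed strings.
CONDITIONS = {
    "1": " AND (d.email = '' OR d.email IS NULL)",
    "2": " AND d.id IN (SELECT dealer_id FROM dealer_logo)",
}


def build_query(user_group: str, condition: Optional[str] = None) -> str:
    """Pick the base query for the group and append one optional fragment."""
    base = BASE_QUERIES.get(user_group, BASE_QUERIES["default"])
    # Unknown or missing conditions add nothing instead of raising
    return base + CONDITIONS.get(condition, "")


query = build_query("group_1", "2")
```

Because the fragments come from a fixed table rather than from user input, the request parameter only selects a key and is never concatenated into the SQL itself.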