GSI on DynamoDB - indexing

I have a DynamoDB table for order processing with the following structure.
I have a few more properties, and I want to filter orders based on the status. For now I use the scan method to get orders by status, but I would like to use a query statement for more efficient execution. How can I implement a query statement to filter orders based on the status property?
Thank you

You need a GSI with status as the partition key...
Then in the query, you tell it to use the GSI via IndexName. Note that status is a reserved word in DynamoDB, so it has to be aliased through ExpressionAttributeNames:
{
    "TableName": "YourTable",
    "IndexName": "StatusIndex",
    "KeyConditionExpression": "#s = :v_status",
    "ExpressionAttributeNames": {
        "#s": "status"
    },
    "ExpressionAttributeValues": {
        ":v_status": {"S": "FILLED"}
    }
}
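If the index doesn't exist yet, it can be created on an existing table with an UpdateTable request. A minimal sketch, reusing the names from the query above (the throughput numbers are placeholders; omit ProvisionedThroughput for on-demand tables):
{
    "TableName": "YourTable",
    "AttributeDefinitions": [
        {"AttributeName": "status", "AttributeType": "S"}
    ],
    "GlobalSecondaryIndexUpdates": [
        {
            "Create": {
                "IndexName": "StatusIndex",
                "KeySchema": [
                    {"AttributeName": "status", "KeyType": "HASH"}
                ],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 5,
                    "WriteCapacityUnits": 5
                }
            }
        }
    ]
}
Keep in mind that a GSI is eventually consistent, and only items that actually have a status attribute appear in it.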

Related

How to optimize EF query?

I have a query that returns 31,000 records but runs very slowly with EF. How should I optimize it?
[HttpGet]
public IActionResult SearchInTaminJobs(string term)
{
    var query = _context.TaminJobs
        .Where(Ad => Ad.jobName.Contains(term))
        .Select(c => c.jobName + c.jobCode)
        .ToList();
    return Ok(query);
}
The query I wrote checks every record in the database. Instead, I want a query that returns only the first 20 records whose jobName matches the term, without checking all the records in the database.
I believe it's not an EF problem. Your query uses Contains, which means a full table scan: it goes through all records and, for each one, checks whether jobName contains the text fragment. Since it's a substring check, you can't simply build an index on this column, so the query is inherently slow. Returning many records also contributes to the execution time. Try to add more selective filtering before the Contains check (by some indexed id or similar).
To confirm that the query itself is slow, run this raw SQL query in SSMS: select jobName + jobCode from TaminJobs where jobName like '%YOUR_TERM_HERE%'.
You can also limit the number of records returned: _context.TaminJobs.Where(Ad => Ad.jobName.Contains(term)).Take(20)
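Putting both suggestions together, a sketch of the action (note this swaps Contains for StartsWith, changing the match from substring to prefix; a prefix search translates to LIKE 'term%', which can use an index on jobName):
[HttpGet]
public IActionResult SearchInTaminJobs(string term)
{
    // StartsWith is index-friendly, unlike Contains (LIKE '%term%'),
    // and Take(20) translates to TOP(20), so only 20 rows come back.
    var query = _context.TaminJobs
        .Where(ad => ad.jobName.StartsWith(term))
        .Select(c => c.jobName + c.jobCode)
        .Take(20)
        .ToList();
    return Ok(query);
}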

Can I filter on a nested field as fast as on a top-level field?

I am not sure how to optimize a table schema when using nested structs.
Imagine I have a table in BigQuery with the following schema:
USER
    firstName: string
    lastName: string
    accountID: string
    posts: [
        {
            title: string
            body: string
            postID: string
        }
    ]
If I want to SELECT users who have a post with title = "Hello World!", will it be a much slower query than SELECTing users whose firstName = "Jose"? In other words, do I lose the speed benefits of columnar storage if I query a nested value?
Would it be better to create a separate table for each type of query? In other words, have a User table with nested Posts when I want to filter by the User's top level attributes, and also have a Post table with nested Users when I want to filter by Post attributes?
If I want to SELECT users who have a post with title = "Hello World!", will it be a much slower query than SELECTing users whose firstName = "Jose"?
No, it will not be much slower. Both will be equally slow. But please note: slow is quite a relative notion - what one person would consider fast, another would consider slow, and vice versa. If you are looking for subsecond responses, BigQuery is not your choice! But if you are looking for seconds, you will get them, and you will definitely enjoy the power of BigQuery.
In other words, do I lose the speed benefits of columnar storage if I query a nested value?
You actually still leverage the speed of columnar storage here, even for nested values.
Would it be better to create a separate table for each type of query?
No, it will not be better - ideally (with BigQuery) you should keep your data as denormalized as you can. Obviously it is still up to you to keep some level of normalization, but its cost will be JOIN performance, just as the cost of denormalization is redundantly stored data.
Recommendation:
select *
from USER
where exists(select 1 from unnest(posts) where title = 'Hello World!')
Comparison:
Filtering in the nested structure is faster than creating another POST table. This kind of strategy is also called a denormalized table; you can check the link below.
Denormalization
Irrespective of the type of database, when filtering on a nested field (even in a columnar system like BigQuery) you are essentially issuing an UNNEST statement to do any filtering from within the nested column. This means you will be performing at least n x m operations (where n is the number of rows and m is the number of entries in the nested column per row).
For instance, to run your desired query, you will have to do:
select * from `mydataset.USERS`, unnest(posts) as x
where x.title = "Hello World!"
This being said, yes, the ideal way to manage your data in a relational database system is to structure it accordingly. In your case you can always save the posts in a separate table, which can be built with the following query:
select accountID, x.postID, x.title, x.body
from `mydataset.USERS`, UNNEST(posts) as x
And then use JOIN to get your desired data:
select U.accountID, P.postID, P.title, P.body
from `mydataset.USERS` U
join `mydataset.posts` P on U.accountID = P.accountID
where P.title = "Hello World!"
Hope it helps.

optimize sql scan query to get results from postgres db

I'm working on some small SQL logic.
I have one table, messages, containing message_id and accountid as columns.
Data keeps coming into this table, each row with a unique message_id.
My goal is to copy the messages table's data into another database, from a Postgres (source) DB to a Postgres (destination) DB.
For this I have set up an ETL job, which transfers the data.
Here comes the problem: in the Postgres (source) DB where the messages table is located, message_id is not stored in sorted order. The data looks like this .....
My ETL job runs every half an hour. Whenever it runs, it takes data from the source DB to the destination DB on the basis of message_id. In the destination DB I have a stored procedure that gets max(message_id) from the messages table and stores that value in another table. In the ETL I use that value in the query fired against the source DB, to fetch only the rows with a message_id greater than the one recorded in the destination DB.
So it's a kind of incremental-load process using ETL. But the query I'm using to get data from the source DB is like this (http://prnt.sc/b3u5il):
SELECT * FROM (SELECT * FROM MESSAGES ORDER BY message_id) as a WHERE message_id >"+context.vid+"
This query scans the whole table every time it runs, so it takes a long time to execute. I'm getting my desired results, but is there any way to perform this process faster?
Can anyone help me optimize this query (I don't know whether it's possible or not)? Any other suggestions are welcome.
Thanks
The most efficient way to improve performance in your case is to add an index on your sort column, in this case message_id.
That way your query will perform an index scan instead of the full table scan that is hampering performance.
You can create an index using the following statement:
CREATE INDEX index_name
ON table_name (column_name)
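For the table in the question that would be (the index name is just illustrative):
CREATE INDEX messages_message_id_idx ON messages (message_id);
With this index in place, the WHERE message_id > ... filter can be answered with an index scan instead of reading every row.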
Yes.
If message_id is not the leading column in the primary key or a secondary index, then create an index:
... ON MESSAGES (message_id)
And eliminate the inline view:
SELECT m.*
FROM MESSAGES m
WHERE m.message_id > ?
ORDER BY m.message_id
Create a B-tree index:
You can adjust the ordering of a B-tree index by including the options ASC, DESC, NULLS FIRST, and/or NULLS LAST when creating the index; for example:
CREATE INDEX test2_info_nulls_low ON test2 (info NULLS FIRST);
CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST);

Aggregation query from two tables

We have a schema:
stores: {_id, name}
getways: {_id, store_id}
txes: {tx_id, getway_id}
A store has many getways, and a getway has many txes.
We need the count of txes for a specific store.
SQL:
SELECT count(*)
FROM txes
WHERE getway_id IN (SELECT _id
                    FROM getways
                    WHERE store_id = xxxx)
How can I write it as a Mongo query?
I'm writing this query in Jaspersoft Studio's Mongo query editor.
I don't think MongoDB supports relational subqueries. You need to use database references (manual references or DBRefs) to accomplish your task.
Check out this link.
http://docs.mongodb.org/manual/reference/database-references/
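With manual references, one way is to do it in two steps in the mongo shell - a sketch, assuming the collection and field names from the question (xxxx stands for the actual store id):
// 1. Collect the _ids of the store's getways
var getwayIds = db.getways.find({ store_id: xxxx }, { _id: 1 })
                          .toArray()
                          .map(function (g) { return g._id; });
// 2. Count the txes that reference any of those getways
db.txes.count({ getway_id: { $in: getwayIds } });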

Criteria API query generation

I have a pretty straightforward task: query a table and filter by some parameter (this parameter is a foreign key to another table).
As an example, Table1 contains the following fields:
id, name, description, company_id;
I have a method that takes company_id as input (not a Company object) and returns all matching records from Table1.
The criteria query looks as follows:
DetachedCriteria criteria = DetachedCriteria.forClass(Table1.class)
.add(Restrictions.eq("company.id", companyId));
The problem is that the generated query is too complex: it joins a couple of tables to do this, and such a query is not "production ready".
Is there any way to build the criteria so that it generates SQL like this?:
SELECT * from table1 where company_id =?
I suppose you use EAGER instead of LAZY fetching in some of your object mappings. If you don't actually need EAGER, use LAZY; it should generate a simpler query for your DetachedCriteria.
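For example, if the mapping currently looks something like the sketch below (entity and field names assumed from the question), switching the fetch type to LAZY lets Hibernate filter on the foreign-key column directly:
@Entity
public class Table1 {
    @Id
    private Long id;
    private String name;
    private String description;

    // With FetchType.LAZY the company row is not fetched eagerly,
    // so the criteria can translate to "where company_id = ?"
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "company_id")
    private Company company;
}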