Thanks for looking at my query. I have 20k+ unique identification id that I is provided by client, I want to look for all these id's in MongoDB using single query. I tried looking using $in but then it does not seems feasible to put all the 20K Id in $in and search. Is there a better version of achieving.
If the id field is indexed, an IN query should be very fast, but i don't think it is a good idea to perform a query with 20k ids in one time, as it may consume quite a lot of resources like memory, you can split the ids into multiple groups with a reasonable size and do the query separately and you still can perform the queries parallelly in application level.
Consider importing your 20k+ id into a collection(says using mongoimport etc). Then perform a $lookup from your root collection to the search collection. Depending on whether the $lookup result is empty array or not, you can proceed with your original operation that requires $in.
Here is Mongo playground for your reference.
They say there are no stupid questions, but this might be an exception.
I understand that BigQuery, being a columnar database, does a full table scan for any query over a specific column.
I also understand that query results can be cached or a named table can be created with the results of a query.
However I also see tabledata.list() in the documentation, and I'm unsure of how this fits in with query costs. Once a table is created from a query, am I free to access that table without cost through the API?
Let's say, for example, I run a query that is grouped by UserID, and I want to then present the results of that query to individual users based on that ID. As far as I understand there are two obvious ways of getting out the appropriate row for doing so.
I can write another query over the destination table with a WHERE userID=xxx clause
I can use the tabledata.list() endpoint to get all the (potentially paginated) data and get the appropriate row myself in my code
Where situation 1 would incur a query cost, and situation 2 would not? Am I getting this right?
Tabledata.list API is free as it actually does not use BigQuery Engine at all
so you are right for both 1 and 2
I'm writing a report that pulls relational data from two tables in the same database via ORM. In my case, I have a groups table and a users table. I want to get all of the users in a certain group so I take a reference to the group entity I'm interested in and get the users in that group with call to one of ColdFusion's auto-generated getter methods (getUsers):
group.getUsers();
This returns an array of User entities. Great! That's exactly what I want.
My question is:
What have I really returned here? Is this an array of ALL of the users in this group? Or does this list have a default limit on it?
Bonus questions: If there is no limit, what if there are thousands (or hundreds of thousands) of users? Will that simply take longer to return? If there is a limit, can I set the limit myself or can I perform any other type of filtering on the returned array? How would one do that?
Thanks!
I an trying to find a way to determine whether or not an SQL SELECT query A is prone to return a subset of the results returned by another query B. Furthermore, this needs to be acomplished from the queries alone, without having access to the respective result sets.
For example, the query SELECT * from employee WHERE salary >= 1000 will return a subset of the results of query SELECT * from employee. I need to find an automated way to perform this validation for any two queries A and B, without accessing the database that stores the data.
If it is unfeasable to achieve this without the aid of an RDBMS, we can assume that I have access to a local, but empty RDBMS, but with the data stored somewhere else. In addition, this check must be done in code, either using an algorithm or a library. The language I am using is Java, but other language will also do.
Many thanks in advance.
I don't know how deep you want to get into parsing queries, but basically you can say that there are two general ways of making a subset of a query (given that source table and projection(select) staying the same):
using where clause to add condition to row values
using having clause to add conditions to aggregated values
So you can say that if you have two objects that represent queries and say they look something close to this:
{
'select': { ... },
'from': {},
'where': {},
'orderby': {}
}
and they have select, from and orderby to be the same, but one have extra condition in the where clause , you have a subset.
One way you might be able to determine if a query is a subset of another is by examining their source tables. If you don't have access to the data itself, this can be tricky. This question references using Snowflake joins to generate database diagrams based on a query without having access to the data itself:
Generate table relationship diagram from existing schema (SQL Server)
If your query is 800 characters or less, the tool is free to use: https://snowflakejoins.com/index.html
I tested it out using the AdventureWorks database and these two queries:
SELECT * FROM HumanResources.Employee
SELECT * FROM HumanResources.Employee WHERE EmployeeID < 200
When I plugged both of them into the Snowflake Joins text editor, this is what was generated:
SnowflakeJoins DB Diagram example
Hope that helps.
Suppose I have two tables:
Group
(
id integer primary key,
someData1 text,
someData2 text
)
GroupMember
(
id integer primary key,
group_id foreign key to Group.id,
someData text
)
I'm aware that my SQL syntax is not correct :) Hopefully is clear enough. My problem is this: I want to load a group record and all the GroupMember records associated with that group. As I see it, there are two options.
A single query:
SELECT Group.id, Group.someData1, Group.someData2 GroupMember.id, GroupMember.someData
FROM Group INNER JOIN GroupMember ...
WHERE Group.id = 4;
Two queries:
SELECT id, someData2, someData2
FROM Group
WHERE id = 4;
SELECT id, someData
FROM GroupMember
WHERE group_id = 4;
The first solution has the advantage of only being one database round trip, but has the disadvantage of returning redundant data (All group data is duplicated for every group member)
The second solution returns no duplicate data but involves two round trips to the database.
What is preferable here? I suppose there's some threshold such that if the group sizes become sufficiently large, the cost of returning all the redundant data is going to be greater than the overhead involved with an additional database call. What other things should I be thinking about here?
Thanks,
Jordan
If you actually want the results joined, I believe it is always more efficient to do the joining at the server level. The SQL processor is designed to match sets of data.
If you really want the results of 2 sql statements, you can always send two statements in one batch separated by a semicolon, and get two resultsets back with one round trip to the DB.
How the data is finally used is an important and unknown factor.
I suggest the single query method for most applications. Proper indexing will keep the query more efficient than the two query method.
The single query method also has the benefit of remaining valid if you need to select more than one group.
If you are only ever going to be retreiving a single group record with each request to the database then i would go with the second option. If you are retrieving multiple group records and associated group member records, go with the join as it will be much quicker.
In general, it depends on what type of data you are trying to display.
If you are showing a single group and all its members, performance differences between the two options would be negligible.
If you are showing many groups and all of their members, the overhead of having to make a roundtrip to the database for each successive group will quickly outweigh any benefit you got from receiving a little less data.
Some other things you might want to consider in you reasoning
Result Set Size - For many groups and members, your result set size may become a limiting factor as the size to retrieve and keep it in memory increases. This is likely to occur with the second option. You may want to consider paging the data, so that you are only retrieving a certain subset at a time.
Lazy Loading - If you are only getting the members of some groups, or a user is requesting the members one group at a time, consider Lazy Loading. This means only making the additional query to get the group's members when needed. This makes sense only in certain use cases, but it can be much more effective than retrieving all data up front.
Depending on the type of database and your frontend application, you can return the results of two SQL statements on one trip (A stored procedure in SQL Server 2005 for example).
If you are creating a report that requires many fields from the Group table, you may not want the increased amount of data with the first query.
If this is some type of data entry app, you've probably already presented the Group data to the user, so they could fill in the group id on the where clause (or preferably via some parameter) and now they need the member results.
It really, really, really depends on what use you will make of the data.
For insatnce, if you were assembling a list of group members for a mail shot, and you need the group name for each letter you're going to send to a member, and you have no use for the Group level then the single joined query makes a lot of sense.
But if, say, you're coding a master-detail screen or report, with a page for each group and displaying information at both the Group and the Member levels then the two separate queries is probably most useful.
Unless you are retrieving quite large amounts of data (tens of thousands of groups with hundreds of memebers per group, or similar orders of magnitude) it is unlikely you are going to see much difference between performances of the two approaches.
On a simple query like this I would to try to perform it in one query. The overhead of two database calls will probably exceed the additional SQL processing time from the query.
A UNION clause will do this for you:
SELECT id, someData1, someData2
FROM Group
WHERE id = 4
UNION
SELECT id, someData, null
FROM GroupMember
WHERE group_id = 4;