How to do a union query using JetBrains Exposed - Kotlin

I am facing a problem when trying to do a union query. I have multiple sharded database tables and a list of all the table names. I want to build one union query over all the tables by looping through the list.
Code example:
fun main() {
    val tables = listOf("table_1", "table_2", "table_3")
    var finalQuery: Query? = null
    for (table in tables) {
        var query = Model.selectAll()
        if (finalQuery == null) {
            finalQuery = query
        } else {
            finalQuery.union(query)
        }
    }
    println(finalQuery.get().toList())
}
The problem is that, at the end of the loop, finalQuery only holds the value of the last query. It looks as if the query object inside the loop references the same memory every time.
I have tried the clone() method, but the result is the same. Currently I run a separate query against each table, so with 3 tables I make 3 queries. I want to reduce that to a single union call.

The problem starts here:
finalQuery.union(query)
union() doesn't mutate the original query; it returns a new one.
This wouldn't work, though:
finalQuery = finalQuery.union(query)
That's because union() returns a Union, not a Query.
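If you want to keep your approach, you can try a cast:
finalQuery = finalQuery.union(query) as? Query
Be aware that as? silently yields null instead of failing when the result is not actually a Query, so verify the class hierarchy of Union in your Exposed version before relying on this.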
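Sidenote:
I guess this
var query = Model.selectAll()
is a typo, since by your logic it should select from the current table:
val query = table.selectAll()
This assumes tables holds Exposed Table objects rather than plain name strings, because selectAll() is called on a Table, not on a String.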
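For illustration outside Exposed, here is a small C++ sketch of the same accumulate-and-reassign pattern. The helper buildUnionSql is hypothetical (it is not an Exposed API); it simply builds the SQL text of one union over the sharded tables, keeping the value returned at each step instead of discarding it as the loop in the question does.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Exposed: builds the SQL text of a single
// union over several tables. Each iteration reassigns the accumulator,
// mirroring `finalQuery = finalQuery.union(query)`.
std::string buildUnionSql(const std::vector<std::string>& tables, bool all = false) {
    const std::string op = all ? " UNION ALL " : " UNION ";
    std::string sql;
    for (const auto& table : tables) {
        const std::string select = "SELECT * FROM " + table;
        // Keep the combined value; dropping it would leave sql unchanged.
        sql = sql.empty() ? select : sql + op + select;
    }
    return sql;
}
```

With all = true the sketch emits UNION ALL, which skips duplicate elimination; for shards that never share rows that is usually the cheaper choice.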

Related

Efficient hash or something which represents a set of records

Imagine a table in SQL Server with the following columns (this is a simplification):
ID: int
Dimension1: int
Dimension2: int
Dimension3: string
Dimension4: string
...
Dimension30: string
The table can become large (millions of records). We often run queries like:
select ID from Table where Dimension1 = 1 and Dimension2 = 2
It is easy for SQL Server to get lost in criteria like this and pick the wrong query plan, which causes performance problems.
I am wondering whether there is some smart hash function or something similar that would allow searching the table efficiently. That is, when we want to find all records matching Dimension1 and Dimension2 criteria, we would filter on a single column that identifies which records to return.
If you run this query very often, have some memory to spare, and can capture every change to Dimension1 and Dimension2, you could do the filtering outside the database.
Totally untested C++ code:
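#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

using DimType = int;
using IdType = int;
// One index per dimension: (value, ID) pairs sorted by value, then by ID
using DimIndex = std::vector<std::pair<DimType, IdType>>;
using IdVec = std::vector<IdType>;
DimIndex dim1, dim2; // prefilled and sorted
using indexIt = DimIndex::const_iterator;
// in case we don't have std::span
using spanV = std::pair<indexIt, indexIt>;
spanV FindValue(const DimIndex& index, DimType value) {
    // is_sorted(index): all entries with this value form one contiguous range
    auto lo = std::lower_bound(index.begin(), index.end(), value,
        [](const std::pair<DimType, IdType>& e, DimType v) { return e.first < v; });
    auto hi = std::upper_bound(lo, index.end(), value,
        [](DimType v, const std::pair<DimType, IdType>& e) { return v < e.first; });
    return {lo, hi};
}
IdVec Intersection(spanV set1, spanV set2) {
    // within one value the entries are sorted by ID, so both ID lists are sorted
    IdVec ids1, ids2, res;
    for (auto it = set1.first; it != set1.second; ++it) ids1.push_back(it->second);
    for (auto it = set2.first; it != set2.second; ++it) ids2.push_back(it->second);
    std::set_intersection(ids1.begin(), ids1.end(), ids2.begin(), ids2.end(),
                          std::back_inserter(res));
    return res;
}
IdVec Intersection(const DimIndex& set1, const DimIndex& set2, DimType value1, DimType value2) {
    return Intersection(FindValue(set1, value1), FindValue(set2, value2));
}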
Add templates to generalize.
FindValue is O(log N); copying and intersecting the IDs is linear in the sizes of the two matching ranges.
For this query:
select ID from Table where Dimension1 = 1 and Dimension2 = 2
You want an index on (Dimension1, Dimension2). Because both conditions are equalities, the column order is irrelevant, and you can include ID as well.
I'm not sure what you mean by "get lost in criteria". I'm pretty sure SQL Server will identify the right index in this case, regardless of how many indexes you have.
What you might have is a more complicated WHERE clause. If it contains more conditions, particularly ones connected by OR, the index might not be used.
I suspect your actual query is more complicated, not that SQL Server is getting "confused".
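To illustrate why a single composite index is enough for this query, here is a rough in-memory analogy in C++. It is an assumption for illustration only, not how SQL Server physically stores indexes: a std::multimap keyed by the pair (Dimension1, Dimension2) keeps all matching IDs in one contiguous range, so one equal_range seek finds them without scanning. The names CompositeIndex and seek are hypothetical.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an index on (Dimension1, Dimension2) -> ID.
// Entries are ordered by the composite key, so all rows matching
// Dimension1 = a AND Dimension2 = b are adjacent.
using CompositeIndex = std::multimap<std::pair<int, int>, int>;

std::vector<int> seek(const CompositeIndex& index, int dim1, int dim2) {
    std::vector<int> ids;
    // One O(log N) lookup for the start of the range, then a walk over matches only.
    auto range = index.equal_range(std::make_pair(dim1, dim2));
    for (auto it = range.first; it != range.second; ++it) ids.push_back(it->second);
    return ids;
}
```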

LINQ .Except behavior not as expected

I have two tables: tableA and tableB, both have an attribute "CommissionNumber" which contains strings in the form of D123456789 (one letter followed by a fixed number of digits).
I need to find the commission numbers in tableA that are not in tableB.
In SQL, this could look like:
SELECT *
FROM tableA
WHERE CommissionNumber NOT IN
(
select CommissionNumber from tableB
)
This gives me no results. However, if I try this:
var tableA= dbContext.tableA.Select(x => x.CommissionNumber).ToList();
var tableB= dbContext.tableB.Select(x => x.CommissionNumber).ToList();
IEnumerable<string> missingFiles = tableA.Except(tableB);
I get 92 hits. I don't understand what's wrong: my SQL query or my use of the .Except function?
Any ideas?
You have created two LINQ queries that load all data from both tables into memory and then applied Except. That is a poor choice if you care about performance.
If you want the generated SQL to match your query, write the LINQ query accordingly. The LINQ equivalent of the IN operator is Contains:
var query = dbContext.tableA
.Where(x => !dbContext.tableB.Select(b => b.CommissionNumber).Contains(x.CommissionNumber));
Or use EXISTS, whose LINQ analogue is Any:
var query = dbContext.tableA
.Where(x => !dbContext.tableB.Any(b => b.CommissionNumber == x.CommissionNumber));
You can also achieve the same result with a LEFT JOIN:
var query =
from a in dbContext.tableA
join b in dbContext.tableB on a.CommissionNumber equals b.CommissionNumber into gj
from b in gj.DefaultIfEmpty()
where b.CommissionNumber == null
select a;
gsharp was on the right track. My data had case-sensitivity issues, which the native SQL query ignored but which were taken into account once the comparison ran in C# (the in-memory Except after ToList()).
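To see how casing alone can yield hits on one side and none on the other, here is a C++ sketch (function names hypothetical) that computes "values in A missing from B" once with exact string comparison, which is what an in-memory Except does with the default string equality, and once with ASCII case folding as a stand-in for a case-insensitive database collation. Unlike Except, this sketch does not remove duplicates from A.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Lower-cases a copy of s (ASCII only) to emulate a case-insensitive collation.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Values of `a` not found in `b`, like NOT IN. With caseInsensitive = false the
// comparison is exact; with true, "D123" and "d123" count as equal.
std::vector<std::string> missingFrom(const std::vector<std::string>& a,
                                     const std::vector<std::string>& b,
                                     bool caseInsensitive) {
    auto key = [&](const std::string& s) { return caseInsensitive ? toLower(s) : s; };
    std::set<std::string> present;
    for (const auto& s : b) present.insert(key(s));
    std::vector<std::string> result;
    for (const auto& s : a)
        if (!present.count(key(s))) result.push_back(s);
    return result;
}
```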

Slow Entity Framework query, but fast generated SQL

Please consider this model. It's for a fitness center management app:
ADHERANT is the members table
INSCRIPTION is the subscription table
SEANCE is the individual sessions table
The SEANCE table contains very few rows (around 7000).
Now the query:
var q = from n in ctx.SEANCES
select new SeanceJournalType()
{
ID_ADHERANT = n.INSCRIPTION.INS_ID_ADHERANT,
ADH_NOM = n.INSCRIPTION.ADHERANT.ADH_NOM,
ADH_PRENOM = n.INSCRIPTION.ADHERANT.ADH_PRENOM,
ADH_PHOTO = n.INSCRIPTION.ADHERANT.ADH_PHOTO,
SEA_DEBUT = n.SEA_DEBUT
};
var h = q.ToList();
This takes around 3 seconds, which is an eternity, while the same generated SQL query is almost instantaneous:
SELECT
1 AS "C1",
"C"."INS_ID_ADHERANT" AS "INS_ID_ADHERANT",
"E"."ADH_NOM" AS "ADH_NOM",
"E"."ADH_PRENOM" AS "ADH_PRENOM",
"E"."ADH_PHOTO" AS "ADH_PHOTO",
"B"."SEA_DEBUT" AS "SEA_DEBUT"
FROM "TMP_SEANCES" AS "B"
LEFT OUTER JOIN "INSCRIPTIONS" AS "C" ON "B"."INS_ID_INSCRIPTION" = "C"."ID_INSCRIPTION"
LEFT OUTER JOIN "ADHERANTS" AS "E" ON "C"."INS_ID_ADHERANT" = "E"."ID_ADHERANT"
Any idea what's going on, or how to fix it?
Thanks
Optimizing this needs some research.
If you neglect the data transfer from the database to the server, then, as Ivan Stoev suggested, calling the ToList method is the expensive part.
As for improving the performance, it depends on your needs:
1. If you need add/delete functionality on the server side, it is probably best to stick with the list.
2. If you don't need add/delete, consider ICollection, or, even better:
3. If you have more conditions that will customize the query further, it is best to use IQueryable.
For example, customizing the query to select a single record based on a condition:
var q = from n in ctx.SEA.... // your query without ToList()
var filtered = q.Where(x => x.Id == 1); // some condition, let's say x.Id == 1
Only one record will then be transferred from the database to the server. With the ToList conversion, all the records are transferred to the server first, and only then is the condition evaluated.
That said, IQueryable is not always the best choice; it depends on your business needs.
for more references check this and this

Create a LINQ for SQL query

I'm learning Linq and using MVC. I have written a SQL query which I need to convert to a LINQ query.
select TokenID,TokenAsset,packet from TokenTable where id = 6 and packet = ''
and TokenID not in (select TokenID from TokenTable where id=6 and packet <> '')
group by TokenID,TokenAsset,Packet
I'd appreciate help converting the above query to a LINQ query. I know the SQL query isn't efficient, so it would be even better if you could help me fix that too.
Try this one:
var result = Tokens
    .Where(x => x.Id == 6 &&
                x.Packet == "" &&
                !Tokens.Any(y => y.TokenID == x.TokenID &&
                                 y.Id == 6 &&
                                 y.Packet != ""))
    // LINQ has no ThenGroupBy: group by a composite key instead
    .GroupBy(x => new { x.TokenID, x.TokenAsset, x.Packet })
    .Select(g => g.Key);
Note: I assume that the collection Tokens holds all the tokens you have.
Firstly, your SQL query can just be:
select distinct TokenID, TokenAsset, packet
from TokenTable
where id = 6 and packet = ''
The GROUP BY is not that useful since there are no aggregated columns: all selected columns are in the GROUP BY clause, so DISTINCT achieves the same.
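The secondary AND condition on TokenID is redundant only if each TokenID has a single row with id = 6. For any one row it is exclusive to the first condition, but it still removes a TokenID whose other id = 6 rows have a non-empty packet, so keep it if that can happen in your data.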
use this LINQ query:
var results = dbcontext.TokenTables
.Where(t => t.id == 6 && t.Packet == "")
.Select(t => new { t.TokenId, t.TokenAsset, t.Packet }).Distinct();
Project only the columns you need, so the call stays fast by avoiding extra data transfer.
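To check whether the TokenID NOT IN filter can be dropped for your data, here is a C++ sketch that emulates the original query in memory. The Token struct and selectTokens function are hypothetical stand-ins for TokenTable; a std::set of tuples provides the DISTINCT (equivalently, GROUP BY without aggregates) semantics, and the withNotIn flag toggles the subquery condition.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical row type holding only the TokenTable columns used in the query.
struct Token {
    int id;
    std::string tokenId, tokenAsset, packet;
};

using Row = std::tuple<std::string, std::string, std::string>;

// Emulates: SELECT DISTINCT TokenID, TokenAsset, packet FROM TokenTable
//           WHERE id = 6 AND packet = ''
//           [AND TokenID NOT IN (SELECT TokenID FROM TokenTable WHERE id = 6 AND packet <> '')]
std::set<Row> selectTokens(const std::vector<Token>& table, bool withNotIn) {
    std::set<std::string> excluded;  // TokenIDs having a non-empty packet for id 6
    for (const auto& t : table)
        if (t.id == 6 && !t.packet.empty()) excluded.insert(t.tokenId);

    std::set<Row> result;  // std::set removes duplicate rows, like DISTINCT
    for (const auto& t : table)
        if (t.id == 6 && t.packet.empty() && !(withNotIn && excluded.count(t.tokenId)))
            result.emplace(t.tokenId, t.tokenAsset, t.packet);
    return result;
}
```

If selectTokens(rows, true) and selectTokens(rows, false) agree for your data, dropping the subquery is safe; they differ as soon as one TokenID has both an empty and a non-empty packet for id 6.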

Creating filter with SQL queries

I am trying to create a filter with SQL queries but am having trouble with numeric values linking to other tables.
Every time I try to link to another table, it takes the same record and repeats it for every element in the other table.
For example, here is the query:
SELECT ELEMENTS.RID,TAXONOMIES.SHORT_DESCRIPTION,[type],ELEMENT_NAME,ELEMENT_ID,SUBSTITUTION_GROUPS.DESCRIPTION,namespace_prefix,datatype_localname
FROM ELEMENTS,SUBSTITUTION_GROUPS,TAXONOMIES,SCHEMAS,DATA_TYPES
WHERE ELEMENTS.TAXONOMY_ID = TAXONOMIES.RID AND ELEMENTS.ELEMENT_SCHEMA_ID = SCHEMAS.RID AND
ELEMENTS.DATA_TYPE_ID = DATA_TYPES.RID
AND ELEMENTS.SUBSTITUTION_GROUP_ID = 0
The last line is the actual filtering criteria.
Here is an example result:
There should only be ONE result (the item has an RID of 0). But it repeats a copy of that one record for every row in the substitution groups table (there are 4).
Here is my database schema for reference. The lines indicate relationships between tables and the circles indicate the values I want:
You forgot the join condition between ELEMENTS and SUBSTITUTION_GROUPS in your query, so every matching ELEMENTS row is paired with every SUBSTITUTION_GROUPS row.
SELECT
ELEMENTS.RID,TAXONOMIES.SHORT_DESCRIPTION,[type],ELEMENT_NAME,ELEMENT_ID,SUBSTITUTION_GROUPS.DESCRIPTION,namespace_prefix,datatype_localname
FROM
ELEMENTS,SUBSTITUTION_GROUPS,TAXONOMIES,SCHEMAS,DATA_TYPES
WHERE
ELEMENTS.TAXONOMY_ID = TAXONOMIES.RID AND ELEMENTS.ELEMENT_SCHEMA_ID = SCHEMAS.RID
AND ELEMENTS.DATA_TYPE_ID = DATA_TYPES.RID
AND ELEMENTS.SUBSTITUTION_GROUP_ID = SUBSTITUTION_GROUPS.RID
AND ELEMENTS.SUBSTITUTION_GROUP_ID = 0
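The duplication can be reproduced with a small C++ sketch. The Element and SubstitutionGroup structs are hypothetical, trimmed to the columns involved: listing both tables in FROM without a join predicate behaves like a nested loop over every pair (a cross join), and the predicate ELEMENTS.SUBSTITUTION_GROUP_ID = SUBSTITUTION_GROUPS.RID keeps only the matching pairs.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified rows with only the columns used in the join.
struct Element { int rid; int substitutionGroupId; };
struct SubstitutionGroup { int rid; std::string description; };

// Nested loop over every (element, group) pair, like a comma-separated FROM.
// withJoinPredicate adds the missing equality between the two tables.
std::vector<std::pair<int, std::string>> joinRows(
        const std::vector<Element>& elements,
        const std::vector<SubstitutionGroup>& groups,
        bool withJoinPredicate) {
    std::vector<std::pair<int, std::string>> rows;
    for (const auto& e : elements) {
        if (e.substitutionGroupId != 0) continue;  // the original filter criterion
        for (const auto& g : groups)
            if (!withJoinPredicate || e.substitutionGroupId == g.rid)
                rows.emplace_back(e.rid, g.description);
    }
    return rows;
}
```

With one qualifying element and four substitution groups, the version without the predicate returns four copies of the same element, matching the symptom in the question.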