SQLite3 query slow in Swift - sql

The following SQLite3 statement is very slow in my Swift code (see below).
Any idea why?
Currently, it takes approx. 30 seconds to get from breakpoint-1 to breakpoint-2:
// .. breakpoint-1
while sqlite3_step(statement) == SQLITE_ROW {
// .. breakpoint-2
}
My SQLite DB is 450 MB in size and the query is quite complex.
I am simply not sure whether this while loop can be made faster somehow.
Here is the query:
let query1 = """
SELECT DISTINCT st.departure_time AS time, t.trip_headsign, r.route_desc, t.trip_id
FROM stops s
INNER JOIN stop_times st ON st.stop_id = s.stop_id
INNER JOIN trips t ON t.trip_id = st.trip_id
INNER JOIN calendar c ON c.service_id = t.service_id
INNER JOIN routes r ON r.route_id = t.route_id
WHERE c.\(weekDay) = 1
AND st.departure_time > '\(departureTime)'
AND st.stop_id LIKE '\(stop_id)%'
AND s.stop_name <> t.trip_headsign
ORDER BY st.departure_time ASC
LIMIT 80
"""
Also, after maddy's input, I created a new DB, this time indexed (see the columns below). After indexing, the DB doubled in size.
But with the above query and the newly indexed DB, the query is unfortunately still just as slow (> 30 sec).
How would I need to change the query now that the DB is indexed?
How can I increase query speed?
These are the indexes:
CREATE INDEX departure_time_IDX ON stop_times(departure_time);
CREATE INDEX stop_times_id_IDX ON stop_times(stop_id);
CREATE INDEX stop_times_trip_id_IDX ON stop_times(trip_id);
CREATE INDEX stop_name_IDX ON stops(stop_name);
CREATE INDEX stop_id_IDX ON stops(stop_id);
CREATE INDEX trip_headsign_IDX ON trips(trip_headsign);
CREATE INDEX trip_id_IDX ON trips(trip_id);
CREATE INDEX trip_route_id_IDX ON trips(route_id);
CREATE INDEX service_id_IDX ON trips(service_id);
CREATE INDEX route_desc_IDX ON routes(route_desc);
CREATE INDEX route_id_IDX ON routes(route_id);
CREATE INDEX cal_service_id_IDX ON calendar(service_id);
CREATE INDEX monday_IDX ON calendar(monday);
CREATE INDEX tuesday_IDX ON calendar(tuesday);
CREATE INDEX wednesday_IDX ON calendar(wednesday);
CREATE INDEX thursday_IDX ON calendar(thursday);
CREATE INDEX friday_IDX ON calendar(friday);
CREATE INDEX saturday_IDX ON calendar(saturday);
CREATE INDEX sunday_IDX ON calendar(sunday);
Here is the EXPLAIN QUERY PLAN
I have the query down to 3 seconds if I eliminate the INNER JOINs of trips and routes. Of course, I need those two for the result I expect out of this. But at least it shows that the INNER JOINs of those two extra tables are slowing things down.
Also, note that stop_times is the biggest table of all
(then comes trips, and the others are all smaller).
The 3-second query looks like this:
let faster_query = """
SELECT DISTINCT st.departure_time AS time
FROM stops s
INNER JOIN stop_times st ON st.stop_id = s.stop_id
WHERE st.departure_time > '19:51:00'
AND st.stop_id LIKE '8505000%'
ORDER BY st.departure_time ASC
LIMIT 80
"""
The EXPLAIN QUERY PLAN for this faster_query looks like this:
Here is the whole code for the SQLite3 query
// Open SQLite database
var db: OpaquePointer? = nil
if let path = self.departure_filePath?.path {
    if sqlite3_open(path, &db) == SQLITE_OK {
        var statement: OpaquePointer? = nil
        // Run SELECT query from db
        if sqlite3_prepare_v2(db, query1, -1, &statement, nil) == SQLITE_OK {
            // Loop through all results from query1
            // .. breakpoint-1
            while sqlite3_step(statement) == SQLITE_ROW {
                // .. breakpoint-2
                // Column 0 is st.departure_time (aliased as "time")
                if let ID = sqlite3_column_text(statement, 0) {
                    IDString = String(cString: ID)
                } else {
                    print("ID not found", terminator: "")
                    sqlite3_finalize(statement)
                    sqlite3_close(db)
                    return nil
                }
            }
        }
        // Release the prepared statement (a no-op if prepare failed)
        sqlite3_finalize(statement)
    }
    // Close the connection; sqlite3_open may allocate a handle even on failure
    sqlite3_close(db)
}
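One thing that may be worth testing (a sketch only, not verified against this database): SQLite's LIKE is case-insensitive by default, and in that mode a prefix match such as st.stop_id LIKE '8505000%' generally cannot use an ordinary index on stop_id. A half-open range comparison on the same prefix can, so the condition could become st.stop_id >= ? AND st.stop_id < ?. The Java helper below computes the upper bound of that range. The class and method names are hypothetical, and it assumes stop_id is stored as plain ASCII text.

```java
final class PrefixRange {
    // Smallest string that sorts after every string starting with `prefix`
    // under byte-wise comparison: increment the last character.
    // Assumption: non-empty ASCII prefix whose last character is below 0x7F.
    static String upperBound(String prefix) {
        if (prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        int last = prefix.length() - 1;
        char c = prefix.charAt(last);
        if (c >= 0x7F) {
            throw new IllegalArgumentException("last character must be ASCII below 0x7F");
        }
        return prefix.substring(0, last) + (char) (c + 1);
    }

    // Hypothetical WHERE fragment with two bind parameters:
    // first the prefix itself (lower bound), then upperBound(prefix).
    static String rangeCondition(String column) {
        return column + " >= ? AND " + column + " < ?";
    }
}
```

For the prefix 8505000 the two bound values would be 8505000 and 8505001. Whether this actually helps here depends on the query plan, so it is worth comparing EXPLAIN QUERY PLAN before and after the change.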

Related

How to get the number of rows returned in a query before fetching the rows?

I need to perform a query and store the result in an array, but I have to know the number of rows to build the array.
The first idea is to append the row count to each row:
with truequery as (
select ...
)
select q.*, (select count(*) from truequery)
from truequery q
The second idea is to move the cursor to the end and then back to the beginning. The example is in Java:
PreparedStatement stmt = conn.prepareStatement(query, ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
//...
ResultSet rs = stmt.executeQuery();
rs.last();
int size = rs.getRow();
rs.beforeFirst();
Entity[] list = new Entity[size];
for (int i=0; rs.next(); i++){
list[i] = new Entity(rs.getString(), ...);
}
If the query has order by or group by, the RDBMS is likely to know the number of rows before returning the first row.
You can store all the entities in a java.util.List and then convert it to an array. For example:
PreparedStatement stmt = conn.prepareStatement(query,
ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
ResultSet rs = stmt.executeQuery();
List<Entity> l = new ArrayList<Entity>();
while (rs.next()) {
l.add(new Entity(rs.getString(), ...));
}
Entity[] list = l.toArray(new Entity[0]);
The overhead of the List should be minimal, since you want to load the whole query in memory anyway.

Methods to read hbase table using spark scala

I am using the catalog method to read data from HBase and store it in a DataFrame, as described in "Read HBase table with where clause using Spark",
but I am wondering whether there is a more efficient way to do this.
The problem statement is:
scan HBase table_a
scan HBase table_b (the mapping table)
check if the col_1 value is present in table_b; if yes, get the parent_id from the mapping table
if not, check if col_2 is present in table_b; if yes, get the parent_id from the mapping table
save the result in a file.
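The lookup steps above can be sketched in plain Java, with an in-memory map standing in for the mapping table. All names here are hypothetical, and the sketch assumes table_b is small enough to hold in memory, which the question does not state.

```java
import java.util.Map;
import java.util.Optional;

final class ParentLookup {
    // mapping: rowkey of table_b -> parent_id (hypothetical in-memory copy of table_b)
    static Optional<String> parentId(Map<String, String> mapping, String col1, String col2) {
        // Try col_1 first.
        String parent = (col1 == null) ? null : mapping.get(col1);
        // Fall back to col_2 only if col_1 produced no match.
        if (parent == null && col2 != null) {
            parent = mapping.get(col2);
        }
        // Empty when neither column is present in the mapping.
        return Optional.ofNullable(parent);
    }
}
```

Each row of table_a then costs at most two hash lookups instead of taking part in a join on a computed key.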
I am able to do this using the above method, but because I am using a join like the one below
select * from a join b where (case when a.duns is null then a.ig else a.duns end) = b.rowkey
it takes forever.
Please help.
import org.apache.hadoop.hbase.{HBaseConfiguration,
HTableDescriptor,HColumnDescriptor,HConstants,TableName,CellUtil}
import org.apache.hadoop.hbase.client.{HBaseAdmin,
Result,Put,HTable,ConnectionFactory,Connection,Get,Scan}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
val hconf = HBaseConfiguration.create()
hconf.set("hbase.zookeeper.quorum","localhost")
hconf.set("hbase.zookeeper.property.clientPort","2181")
val admin = new HBaseAdmin(hconf)
val hconn=ConnectionFactory.createConnection(hconf)
var tabName_string= admin.getTableNames("student")(0) // enter table name
val table = new HTable(hconf,tabName_string) // create table connection
var data= table.get(new Get(Bytes.toBytes("row-id97"))) // row ID
def getHBaseRowData (x: org.apache.hadoop.hbase.Cell, hint: Int )= {
if(hint == 1){
((Bytes.toString(x.getRow())), Bytes.toString(CellUtil.cloneQualifier(x)))
} else if(hint == 2) {
((Bytes.toString(x.getRow())),Bytes.toString(CellUtil.cloneValue(x)))
} else if(hint == 3) {
((Bytes.toString(x.getRow())),Bytes.toString(CellUtil.cloneFamily(x)))
} else if(hint == 4) {
((Bytes.toString(x.getRow())),(Bytes.toString(CellUtil.cloneQualifier(x))), (Bytes.toString(CellUtil.cloneFamily(x))), (Bytes.toString(CellUtil.cloneValue(x))))
} else
("Wrong Hint")
}
data.rawCells().foreach(x=> println(getHBaseRowData(x,4)))

SQL performance linq indexes

I have a SQL database where I store routes. I store route information in one table and the coordinates in another table.
Right now I have around 40 routes with 50k coordinates in total across all routes.
I use the following LINQ code to get the data:
var query = (from b in db.routes
             select new
             {
                 name = b.name,
                 id = b.route_id,
                 coor = b.coordinates.Select(c => new
                 {
                     seq = c.sequence,
                     lat = c.position.Latitude,
                     lon = c.position.Longitude
                 })
             });
This query takes 4.5 sec to execute, which I find kind of slow.
I'm new to indexes. Right now both primary keys are clustered indexes, and the foreign keys are normal (non-clustered?) indexes that I created with the following SQL command:
CREATE INDEX IX_route on [db].[coordinates] (route_id)
Is my database slow or is this normal for this amount of data?
If you can deal with your results flattened, this query might give you better performance:
var query = from b in db.routes
from c in b.coordinates
select new
{
name = b.name,
id = b.route_id,
coor = new
{
seq = c.sequence,
lat = c.position.Latitude,
lon = c.position.Longitude
}
};
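If the nested shape is still needed afterwards, the flattened rows can be regrouped in memory. Here is a minimal sketch of that idea, written in Java rather than C#. The class and field names are hypothetical and simply mirror the anonymous type above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RouteGrouping {
    // One flattened row: the route fields are repeated for every coordinate.
    static final class Row {
        final int routeId;
        final String name;
        final int seq;
        final double lat;
        final double lon;

        Row(int routeId, String name, int seq, double lat, double lon) {
            this.routeId = routeId;
            this.name = name;
            this.seq = seq;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Group rows by route id, keeping first-seen route order and the
    // original row order inside each route.
    static Map<Integer, List<Row>> byRoute(List<Row> rows) {
        Map<Integer, List<Row>> grouped = new LinkedHashMap<>();
        for (Row r : rows) {
            grouped.computeIfAbsent(r.routeId, k -> new ArrayList<>()).add(r);
        }
        return grouped;
    }
}
```

The grouping is a single pass over the rows, so for about 50k coordinates it should be cheap compared with the database round trip.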

Problems translating an SQL query to LINQ in VS Lightswitch

So, I am having an issue with an SQL query that I translated to LINQ (and it works, I tested it), but that same LINQ query does not work in Lightswitch. Of course I did not expect it to work straight away, but I am struggling to convert it properly.
Here is an image of the tables that I base my query on:
http://dl.dropbox.com/u/46287356/tables.PNG
(sorry for outside link, but not enough rep points :))
The SQL query is the following:
SELECT WorkingUnits.Name AS WUName, ContractPositions.WUInstanceId,
Materials.Cost, BillingValues.Value, BillingValues.PricePerUnit
FROM WorkingUnits
INNER JOIN
Materials ON WorkingUnits.Id = Materials.Material_WorkingUnit
INNER JOIN ContractPositions ON
Materials.Id = ContractPositions.ContractPosition_Material
INNER JOIN BillingValues ON
ContractPositions.Id = BillingValues.BillingValue_ContractPosition
Now, I have transformed this to LINQ in the following way:
var query = from wu in this.DataWorkspace.ApplicationData.WorkingUnits
join m in this.DataWorkspace.ApplicationData.Materials on
new { Id = WorkingUnits.Id } equals new { Id = m.Material_WorkingUnit }
join cp in this.DataWorkspace.ApplicationData.ContractPositions on
new { Id = m.Id } equals new { Id = cp.ContractPosition_Material }
join bv in this.DataWorkspace.ApplicationData.BillingValues on
new { Id = cp.Id } equals new { Id = bv.BillingValue_ContractPosition }
select new
{
usage = bv.Value * bv.PricePerUnit,
totalCost = (bv.Value * bv.PricePerUnit) * m.Cost,
amount = (bv.Value*bv.PricePerUnit) * m.Cost / wu.WUPrice
};
Notice that I have changed a few things, like the selection of columns, as I do not need that in Lightswitch.
So while this works against the SQL server, Lightswitch complains that I must consider explicitly specifying the type of the range variable 'WorkingUnits'.
I tried to cast it, but then there are other errors such as:
'int' does not contain a definition for 'Id' and no extension method 'Id'
accepting a first argument of type 'int' could be found (are you missing
a using directive or an assembly reference?)
So my question is: how do I properly convert that query so that it works?
Also, assuming my database is set up correctly, do I even need to use joins in the LINQ?
Any ideas are appreciated!
Try something like this:
var query = from wu in this.DataWorkspace.ApplicationData.WorkingUnits
join m in this.DataWorkspace.ApplicationData.Materials on
wu.Id equals m.WorkingUnitID
join cp in this.DataWorkspace.ApplicationData.ContractPositions on
m.Id equals cp.ContractPosition_Material
join bv in this.DataWorkspace.ApplicationData.BillingValues on
cp.Id equals bv.BillingValue_ContractPosition
select new
{
usage = bv.Value * bv.PricePerUnit,
totalCost = (bv.Value * bv.PricePerUnit) * m.Cost,
amount = (bv.Value*bv.PricePerUnit) * m.Cost / wu.WUPrice
};
What about starting at the bottom (BillingValues) and working your way up using the entity references?
For example:
var query = from bv in this.DataWorkspace.ApplicationData.BillingValues
let m = bv.ContractPosition.Material
let wu = m.WorkingUnit
select new
{
usage = bv.Value * bv.PricePerUnit,
totalCost = (bv.Value * bv.PricePerUnit) * m.Cost,
amount = (bv.Value*bv.PricePerUnit) * m.Cost / wu.WUPrice
};

I have some quite simple SQL that I am trying to change to either Linq or LLBLGen

I want to do something like this...
SELECT DISTINCT T1.*
FROM T1
INNER JOIN T2 ON T2.ID1 = T1.ID1
INNER JOIN T3 ON T3.ID2 = T2.ID2
--FOLLOWING CAN BE ADDED MULTIPLE TIMES (LOOPS IN C#?)
INNER JOIN T2 AS T2A ON T3.ID2 = T2A.ID2
INNER JOIN T1 AS T1A ON T1A.ID1 = T2A.ID1
--END MULTI
WHERE T1.ID1 = 1
AND T3.ID3 = 2
AND T3.ID4 = 3
--THE FOLLOWING CONDITIONS WILL ALSO BE FOR EVERY SET OF EXTRA JOINS (LOOPS IN C#?)
AND T1A.ID1 = 4
AND T1A.ID5 = 5
--END MULTI
...in either Linq or LLBLGen Code. Any help would be greatly appreciated!
Here is the LLBLGen code I have so far...
IPredicateExpression filter = new PredicateExpression();
filter.Add(ProductTypeOptionAttributeFields.OptionId == dl.Key);
filter.AddWithAnd(ProductTypeOptionAttributeCombinationFields.ProductTypeId == DataSource.DataItem.ProductTypeId);
filter.AddWithAnd(ProductTypeOptionAttributeCombinationFields.ProductId == DataSource.ProductID);
bucket.PredicateExpression.Add(filter);
bucket.Relations.Add(ProductTypeOptionAttributeEntity.Relations.ProductTypeOptionAttributeCombinationProfileEntityUsingProductTypeOptionAttributeId, JoinHint.Inner);
bucket.Relations.Add(ProductTypeOptionAttributeCombinationProfileEntity.Relations.ProductTypeOptionAttributeCombinationEntityUsingProductTypeOptionAttributeCombinationId, JoinHint.Inner);
var filtered = _dropdowns.Where(k => ((DropDownList)k.Value[1]).SelectedValue != "-1" && k.Key != dl.Key);
foreach (var filteredDdl in filtered)
{
IPredicateExpression subFilter = new PredicateExpression();
subFilter.AddWithAnd(ProductTypeOptionAttributeFields.AttributeId == int.Parse(((DropDownList)filteredDdl.Value[1]).SelectedValue));
subFilter.AddWithAnd(ProductTypeOptionAttributeFields.OptionId == filteredDdl.Key);
bucket.PredicateExpression.AddWithAnd(subFilter);
}
ProductTypeOptionAttributeCollection attrs = new ProductTypeOptionAttributeCollection();
attrs.GetMulti(bucket.PredicateExpression, -1, null, bucket.Relations);
And here is the actual query I want...
SELECT DISTINCT PTOA.*
FROM ProductTypeOptionAttribute AS PTOA
INNER JOIN ProductTypeOPtionAttributeCombinationProfile AS PTOACP ON PTOACP.ProductTypeOPtionAttributeID = PTOA.AttributeID
INNER JOIN ProductTypeOPtionAttributeCombination AS PTOAC ON PTOAC.CombinationID = PTOACP.ProductTypeOptionAttributeCombinationID
--FOLLOWING CAN BE ADDED MULTIPLE TIMES (LOOPS IN C#?)
INNER JOIN ProductTypeOPtionAttributeCombinationProfile AS PTOACP2 ON PTOAC.CombinationID = PTOACP2.ProductTypeOptionAttributeCombinationID
INNER JOIN ProductTypeOPtionAttribute AS PTOA2 ON PTOACP2.ProductTypeOPtionAttributeID = PTOA2.AttributeID
--END MULTI
WHERE PTOA.OptionID = 59
AND PTOAC.ProductTypeID = 11
AND PTOAC.ProductID = 218
--THE FOLLOWING CONDITIONS WILL ALSO BE FOR EVERY SET OF EXTRA JOINS (LOOPS IN C#?)
AND PTOA2.AttributeID = 42
AND PTOA2.OptionID = 58
--END MULTI
Cheers
LLBLGen Tips:
Use SQL Server Profiler to view the SQL emitted (set a breakpoint right after your call to GetMulti, then watch your trace)
You have a lot of complicated UI logic and casts/converts that could fail. My personal preference would be to move those to separate code.
You don't need JoinHint.Inner as that is the default
(Personal preference) use a RelationCollection instead of the bucket.
I don't entirely understand your situation (especially the multiple joins to the same table?), but this may work. I think you want subFilter.AddWithOr instead of .AddWithAnd.
IPredicateExpression filter = new PredicateExpression();
filter.Add(ProductTypeOptionAttributeFields.OptionId == dl.Key);
filter.AddWithAnd(ProductTypeOptionAttributeCombinationFields.ProductTypeId == DataSource.DataItem.ProductTypeId);
filter.AddWithAnd(ProductTypeOptionAttributeCombinationFields.ProductId == DataSource.ProductID);
IRelationCollection relations = new RelationCollection();
relations.Add(ProductTypeOptionAttributeEntity.Relations.ProductTypeOptionAttributeCombinationProfileEntityUsingProductTypeOptionAttributeId);
relations.Add(ProductTypeOptionAttributeCombinationProfileEntity.Relations.ProductTypeOptionAttributeCombinationEntityUsingProductTypeOptionAttributeCombinationId);
var filtered = _dropdowns.Where(k => ((DropDownList)k.Value[1]).SelectedValue != "-1" && k.Key != dl.Key);
foreach (var filteredDdl in filtered)
{
IPredicateExpression subFilter = new PredicateExpression();
subFilter.AddWithOr(ProductTypeOptionAttributeFields.AttributeId == int.Parse(((DropDownList)filteredDdl.Value[1]).SelectedValue));
subFilter.AddWithOr(ProductTypeOptionAttributeFields.OptionId == filteredDdl.Key);
filter.AddWithAnd(subFilter);
}
ProductTypeOptionAttributeCollection attrs = new ProductTypeOptionAttributeCollection();
attrs.GetMulti(filter, relations);
The basic SQL query you presented should be fairly easy to reproduce in LINQ.
from t in T1
where t.ID1 == 1
select t;
If you're not already using it, download the free LINQPad (http://www.linqpad.net/). It's got loads of examples to get you up to speed.
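The repeated join and condition blocks from the question can also be generated in a loop and bound as parameters. This is a rough sketch in Java rather than LINQ or LLBLGen, using the generic T1/T2/T3 names from the question. The alias scheme (T2A1, T1A1, ...) and the class name are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiJoinQuery {
    final String sql;            // SQL text with ? placeholders
    final List<Integer> params;  // values in placeholder order

    MultiJoinQuery(String sql, List<Integer> params) {
        this.sql = sql;
        this.params = params;
    }

    // extra: one {id1, id5} pair per additional join set.
    static MultiJoinQuery build(int id1, int id3, int id4, List<int[]> extra) {
        StringBuilder joins = new StringBuilder();
        StringBuilder where = new StringBuilder(" WHERE T1.ID1 = ? AND T3.ID3 = ? AND T3.ID4 = ?");
        List<Integer> params = new ArrayList<>(List.of(id1, id3, id4));
        for (int i = 0; i < extra.size(); i++) {
            // Each join set gets its own pair of aliases.
            String t2 = "T2A" + (i + 1);
            String t1 = "T1A" + (i + 1);
            joins.append(" INNER JOIN T2 AS ").append(t2)
                 .append(" ON T3.ID2 = ").append(t2).append(".ID2")
                 .append(" INNER JOIN T1 AS ").append(t1)
                 .append(" ON ").append(t1).append(".ID1 = ").append(t2).append(".ID1");
            where.append(" AND ").append(t1).append(".ID1 = ?")
                 .append(" AND ").append(t1).append(".ID5 = ?");
            params.add(extra.get(i)[0]);
            params.add(extra.get(i)[1]);
        }
        String sql = "SELECT DISTINCT T1.* FROM T1"
                + " INNER JOIN T2 ON T2.ID1 = T1.ID1"
                + " INNER JOIN T3 ON T3.ID2 = T2.ID2"
                + joins + where;
        return new MultiJoinQuery(sql, params);
    }
}
```

Calling build(1, 2, 3, ...) with the single pair {4, 5} reproduces the shape of the example query with one extra join set. The same loop structure should carry over to whichever query API ends up being used.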