SQLite3 query takes a lot of time, how would you do it? - sql

I have a sqlite3 DB table of around 250 000 rows, and my code is written in Python. I need to filter it in a very specific way, and it takes far too long.
Table is as follows:
self.cur.execute("""create table DetectedVehicles(IdD INTEGER PRIMARY KEY,
CLCode INT,
DetectionTime INT,
PlateNo VARCHAR)""")
It's a table for automatic plate number recognition result filtering.
And I need to filter it to get (in plain, SQL-like terms :) ):
Get rows from table DetectedVehicles where vehicles were observed at
CLCode="X" before they were observed at CLCode="Y".
(implicitly: they were observed at both of them)
So I need to get a list of detected vehicles that crossed specific CLCodes in the proper sequence, i.e. X before Y.
I managed to create something that works, but the query takes about 10 seconds. Is there a faster way?
The code goes here:
self.cur.execute('select distinct PlateNo from DetectedVehicles where CLCode=? intersect select PlateNo from DetectedVehicles where CLCode=?',(CountLocationNo[0],CountLocationNo[1]))
PlatesTab=list(self.cur)
Results=[]
for Plate in PlatesTab:
    PlateQ1='select * from DetectedVehicles where PlateNo in (?) and ((select DetectionTime from DetectedVehicles where CLCode = ? and PlateNo in (?) ) < (select DetectionTime from DetectedVehicles where CLCode = ? and PlateNo in (?)))'
    R=list(self.cur.execute(PlateQ1,(Plate,CountLocationNo[0],Plate,CountLocationNo[1],Plate)))
    if R:
        TimesOD=self.curST2.execute('select DetectionTime from DetectedVehicles where PlateNo in (?) and (CLCode= ? or CLCode=?)',(Plate,CountLocationNo[0],CountLocationNo[1])).fetchall()
        if TimesOD:
            TravelTimes.append(TimesOD[1][0]-TimesOD[0][0])
            DetectionTimes.append(TimesOD[0][0])
        for i in R:
            Results.append(i[0])
Results=tuple(Results)
QueryCL=' intersect select * from DetectedVehicles where IDd in ' + str(Results)
Thanks in advance

You can do it all in a single query.
select
dvPoint1.PlateNo, dvPoint1.DetectionTime, dvPoint2.DetectionTime
from
DetectedVehicles dvPoint1
inner join DetectedVehicles dvPoint2
on dvPoint1.PlateNo = dvPoint2.PlateNo
and dvPoint1.CLCode = ? and dvPoint2.CLCode = ?
and dvPoint1.DetectionTime < dvPoint2.DetectionTime
You will want an index on (PlateNo, DetectionTime, CLCode), or (CLCode, PlateNo). Try them both to see which is faster. An index on PlateNo on its own may be enough.
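To make the single-query idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the index name idx_cl_plate and the choice of the (CLCode, PlateNo) index are assumptions for illustration only; on the real table you would compare the candidate indexes yourself. EXPLAIN QUERY PLAN prints whatever plan SQLite chooses, so you can check whether an index is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DetectedVehicles("
    "IdD INTEGER PRIMARY KEY, CLCode INT, DetectionTime INT, PlateNo VARCHAR)"
)

# Hypothetical sample data: (CLCode, DetectionTime, PlateNo)
rows = [
    (1, 100, "AAA111"),  # seen at 1, later at 2 -> should match
    (2, 160, "AAA111"),
    (2, 100, "BBB222"),  # seen at 2 first, then at 1 -> should not match
    (1, 150, "BBB222"),
    (1, 120, "CCC333"),  # seen only at 1 -> should not match
]
conn.executemany(
    "INSERT INTO DetectedVehicles(CLCode, DetectionTime, PlateNo) VALUES (?, ?, ?)",
    rows,
)

# One of the candidate indexes mentioned above (the name is made up)
conn.execute("CREATE INDEX idx_cl_plate ON DetectedVehicles(CLCode, PlateNo)")

QUERY = """
SELECT p1.PlateNo, p1.DetectionTime, p2.DetectionTime
FROM DetectedVehicles p1
JOIN DetectedVehicles p2
  ON p1.PlateNo = p2.PlateNo
 AND p1.CLCode = ? AND p2.CLCode = ?
 AND p1.DetectionTime < p2.DetectionTime
"""

# Vehicles seen at location 1 before location 2
matches = conn.execute(QUERY, (1, 2)).fetchall()
print(matches)

# Ask SQLite how it intends to run the query
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1, 2)).fetchall()
for step in plan:
    print(step)
```

The travel time per vehicle is then just the difference between the two returned detection times, so the per-plate loop is no longer needed.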

Try:
select distinct x.*
from DetectedVehicles x
join DetectedVehicles y
on x.PlateNo = y.PlateNo and
x.DetectionTime < y.DetectionTime
where x.CLCode=? and y.CLCode=?
or:
select x.*
from DetectedVehicles x
where exists
(select 1
from DetectedVehicles y
where x.PlateNo = y.PlateNo and
x.DetectionTime < y.DetectionTime and
x.CLCode=? and y.CLCode=?)
I would normally expect the latter query to execute more quickly, but it would be worth running both to check.
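A small sketch, assuming made-up sample data, showing that the two queries return the same rows. It also shows why the join version needs DISTINCT: a vehicle seen twice at the second location produces two joined rows. It says nothing about which one is faster on the real table; that still has to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DetectedVehicles("
    "IdD INTEGER PRIMARY KEY, CLCode INT, DetectionTime INT, PlateNo VARCHAR)"
)
# Hypothetical data: AAA111 is seen once at location 1 and twice at location 2
conn.executemany(
    "INSERT INTO DetectedVehicles(CLCode, DetectionTime, PlateNo) VALUES (?, ?, ?)",
    [
        (1, 100, "AAA111"),
        (2, 150, "AAA111"),
        (2, 170, "AAA111"),
        (2, 90, "BBB222"),   # BBB222 went the other way round
        (1, 95, "BBB222"),
    ],
)

JOIN_SQL = """
SELECT DISTINCT x.*
FROM DetectedVehicles x
JOIN DetectedVehicles y
  ON x.PlateNo = y.PlateNo AND x.DetectionTime < y.DetectionTime
WHERE x.CLCode = ? AND y.CLCode = ?
"""

EXISTS_SQL = """
SELECT x.*
FROM DetectedVehicles x
WHERE x.CLCode = ?
  AND EXISTS (SELECT 1
              FROM DetectedVehicles y
              WHERE y.PlateNo = x.PlateNo
                AND y.CLCode = ?
                AND x.DetectionTime < y.DetectionTime)
"""

COUNT_PLAIN_JOIN_SQL = """
SELECT COUNT(*)
FROM DetectedVehicles x
JOIN DetectedVehicles y
  ON x.PlateNo = y.PlateNo AND x.DetectionTime < y.DetectionTime
WHERE x.CLCode = ? AND y.CLCode = ?
"""

params = (1, 2)
join_rows = sorted(conn.execute(JOIN_SQL, params))
exists_rows = sorted(conn.execute(EXISTS_SQL, params))
plain_join_count = conn.execute(COUNT_PLAIN_JOIN_SQL, params).fetchone()[0]

print(join_rows == exists_rows)   # both forms agree on this data
print(plain_join_count)           # joined rows before DISTINCT removes duplicates
```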

Thank you all for the feedback.
I'm posting this as an answer to present the timing results:
1. fastest total (query 1.80s, fetchall 0.20s, total: 2s)
select distinct x.*
from DetectedVehicles x
join DetectedVehicles y
on x.PlateNo = y.PlateNo and
x.DetectionTime < y.DetectionTime
where x.CLCode=? and y.CLCode=?
2. (query 1.83s, fetchall 0.19s, total: 2.02s)
select
dvPoint1.PlateNo, dvPoint1.DetectionTime, dvPoint2.DetectionTime
from
DetectedVehicles dvPoint1
inner join DetectedVehicles dvPoint2
on dvPoint1.PlateNo = dvPoint2.PlateNo
and dvPoint1.CLCode = ? and dvPoint2.CLCode = ?
and dvPoint1.DetectionTime < dvPoint2.DetectionTime
3. (query 1.82s, fetchall 1.09s, total: 2.91s)
select x.*
from DetectedVehicles x
where exists
(select 1
from DetectedVehicles y
where x.PlateNo = y.PlateNo and
x.DetectionTime < y.DetectionTime and
x.CLCode=? and y.CLCode=?)
So thanks @Mark Bannister for your answer, I'm going to accept it.
However, one issue remains:
cur.fetchall() takes an enormously long time, and I need to get the results. How should I do it? (For just 100 rows it takes around 2 minutes with each of your solutions.)
Solved: download a new sqlite.dll to your python/dlls folder ... don't ask me why. See: Join with Pythons sqlite module is slower than doing it manually
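For anyone hitting the same thing, a minimal sketch of two checks that may help. First, sqlite3.sqlite_version reports which SQLite library Python is actually linked against, which is what swapping the DLL changes. Second, as far as I understand it, the module produces rows lazily, so part of the query work can show up in the fetch step rather than in execute(); fetching in batches with fetchmany() at least lets you start processing before everything is computed. The table t and the batch size of 100 are made up for the example.

```python
import sqlite3

# Version of the SQLite library Python is linked against
# (this is what changes when the DLL is replaced)
print("SQLite library version:", sqlite3.sqlite_version)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(n INTEGER)")
conn.executemany("INSERT INTO t(n) VALUES (?)", [(i,) for i in range(1000)])

cur = conn.execute("SELECT n FROM t ORDER BY n")

# Fetch in batches instead of one big fetchall()
total = 0
while True:
    batch = cur.fetchmany(100)
    if not batch:
        break
    total += len(batch)  # process the batch here

print(total)
```

This does not make a slow query fast by itself; it only helps to see where the time goes and to avoid holding the whole result in memory.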

Related

Query always gives a result although it shouldn't

I'd like to print the number of a room (stanza) only if that room has a reservation (prenotazione) between two dates in order to display an error message in php when the variable with the result is set. The problem is my query seems to check if any room has a reservation set between these dates and always gives output for any room asked.
SELECT stanze.num_stanza
FROM stanze, prenotazioni
WHERE prenotazioni.num_stanza=stanze.num_stanza
AND prenotazioni.check_in
BETWEEN '20190615'
AND '20190620'
OR prenotazioni.check_out
BETWEEN '20190615'
AND '20190620'
AND stanze.num_stanza='100'
Usually it is not good to have WHERE clauses like x OR y AND z. AND binds more tightly than OR, so this is evaluated as x OR (y AND z): if x is true, the row is selected no matter what z is. Try (x OR y) AND z instead.
e.g.
WHERE prenotazioni.num_stanza = stanze.num_stanza
AND (prenotazioni.check_in BETWEEN '20190615' AND '20190620' OR prenotazioni.check_out BETWEEN '20190615' AND '20190620')
AND stanze.num_stanza = '100'
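A quick way to see the effect is to run both versions against a tiny table. This is only a sketch using SQLite through Python, with made-up data: room 200 has a reservation in the period and room 100 has none, yet the ungrouped query still reports rooms when asked about room 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stanze(num_stanza INTEGER PRIMARY KEY);
CREATE TABLE prenotazioni(num_stanza INTEGER, check_in TEXT, check_out TEXT);
INSERT INTO stanze VALUES (100), (200);
-- Hypothetical data: only room 200 is reserved in the period
INSERT INTO prenotazioni VALUES (200, '20190616', '20190618');
""")

# Original shape: evaluated as (join AND check_in ...) OR (check_out ... AND room = 100)
UNGROUPED = """
SELECT s.num_stanza
FROM stanze s, prenotazioni p
WHERE p.num_stanza = s.num_stanza
  AND p.check_in BETWEEN '20190615' AND '20190620'
   OR p.check_out BETWEEN '20190615' AND '20190620'
  AND s.num_stanza = 100
"""

# Date conditions grouped explicitly
GROUPED = """
SELECT s.num_stanza
FROM stanze s, prenotazioni p
WHERE p.num_stanza = s.num_stanza
  AND (p.check_in BETWEEN '20190615' AND '20190620'
       OR p.check_out BETWEEN '20190615' AND '20190620')
  AND s.num_stanza = 100
"""

ungrouped_rows = sorted(conn.execute(UNGROUPED))
grouped_rows = sorted(conn.execute(GROUPED))

print(ungrouped_rows)  # rooms reported although room 100 has no reservation
print(grouped_rows)    # empty: room 100 is free in that period
```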
It is not clear what you want exactly -- a full overlap or a partial overlap.
Either way, you can use EXISTS. The following is for a partial overlap:
SELECT s.num_stanza
FROM stanze s
WHERE EXISTS (SELECT 1
FROM prenotazioni p
WHERE p.num_stanza = s.num_stanza AND
p.check_in < '20190620' AND
p.check_out >= '20190615'
) AND
s.num_stanza = 100 -- looks like number so it probably is
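A minimal sketch of the overlap test, run on SQLite through Python with invented reservations. The helper room_is_booked is a hypothetical name, not part of any library. Room 100 has a reservation that spans the whole requested period, a case where neither check_in nor check_out falls between the two dates but the stay still overlaps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prenotazioni(num_stanza INTEGER, check_in TEXT, check_out TEXT);
-- Hypothetical data
INSERT INTO prenotazioni VALUES (100, '20190610', '20190625'); -- spans the period
INSERT INTO prenotazioni VALUES (200, '20190601', '20190605'); -- ends before it
""")


def room_is_booked(conn, room, start, end):
    """Return True if any reservation for `room` overlaps [start, end)."""
    row = conn.execute(
        """
        SELECT EXISTS (SELECT 1
                       FROM prenotazioni p
                       WHERE p.num_stanza = ?
                         AND p.check_in < ?
                         AND p.check_out >= ?)
        """,
        (room, end, start),
    ).fetchone()
    return bool(row[0])


booked_100 = room_is_booked(conn, 100, "20190615", "20190620")
booked_200 = room_is_booked(conn, 200, "20190615", "20190620")
print(booked_100, booked_200)
```

Dates are stored as YYYYMMDD text here, as in the question, so plain string comparison orders them correctly.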

Can't get all of the data I want out of a join

I have a hive table that has some http sessions that I need to analyze. One column has a http session ID that is consistent throughout the entire session.
I'm trying to find all rows that are part of sessions where one of a certain set of actions was performed AND the session ended in a timeout.
set hive.cli.print.header=true;
SELECT * FROM
(SELECT DISTINCT id, x_date, y
FROM log
WHERE ((to_date(x_date)) >= (date_sub(current_date, 1)))
AND y like '%timeout%') u
JOIN
(SELECT id, x_date, y, z, q, a
FROM log
WHERE ((to_date(x_date)) >= (date_sub(current_date, 1)))
AND z in ('1', '2', '3', '4')) o
ON u.id = o.id
ORDER BY u.id, o.x_date;
What I'm trying to find is all rows where
id = 123 and y like '%timeout%'
AND (id = 123 and z in ('1','2','3','4'))
What I am currently getting is something like
if (id = 123 and y like '%timeout%')
select * where (id = 123 and z in ('1','2','3','4'))
The expected output should be much larger than the actual output, as I should get many lines that only have id = 123.
The problem is I need this for all IDs that meet both criteria, so I have to actually find all of the IDs first :)
I hope this makes sense, I feel like I may have worded the question in a confusing manner.
Try this. It would work in standard SQL; I'm not well versed in Hive, but it should work based on what I've read.
SELECT id, x_date, y, z, q, a
FROM log
WHERE z IN ('1','2','3','4')
AND id IN (
SELECT id
FROM log
WHERE ((to_date(x_date)) >= (date_sub(current_date, 1)))
AND y like '%timeout%')
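If the goal is every row of a session that has both a timeout and one of the listed actions, one way to express it is to filter on the session id twice. The sketch below runs on SQLite through Python rather than Hive, leaves out the date filter, and uses invented log rows, so treat it as an illustration of the idea and not as tested HiveQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log(id INTEGER, x_date TEXT, y TEXT, z TEXT)")
# Hypothetical sessions
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?, ?)",
    [
        (123, "2019-06-01 10:00", "login", "1"),            # listed action
        (123, "2019-06-01 10:05", "browse", "9"),
        (123, "2019-06-01 10:30", "session timeout", "0"),  # timeout
        (456, "2019-06-01 11:00", "login", "2"),            # action, no timeout
        (456, "2019-06-01 11:10", "logout", "0"),
        (789, "2019-06-01 12:00", "browse", "9"),           # timeout, no action
        (789, "2019-06-01 12:20", "session timeout", "0"),
    ],
)

ALL_ROWS_OF_MATCHING_SESSIONS = """
SELECT id, x_date, y, z
FROM log
WHERE id IN (SELECT id FROM log WHERE y LIKE '%timeout%')
  AND id IN (SELECT id FROM log WHERE z IN ('1', '2', '3', '4'))
ORDER BY id, x_date
"""

session_rows = conn.execute(ALL_ROWS_OF_MATCHING_SESSIONS).fetchall()
for row in session_rows:
    print(row)
```

Only session 123 meets both criteria, and all of its rows come back, including the ones that are neither a timeout nor a listed action.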

get most current record if*

I have a set of data for service jobs, and I want to identify customers that still have an old part installed (part x). If a customer already has the new replacement part (part y), I don't want them to appear in my results. The best way I can describe it is to think of a recall. Every job has a number, and that number always increases with each new job. So I'm looking for customers where part x has been installed and part y has not. Every customer has a customer number that all of their jobs are associated with. In my example, customers 12373, 12369 and 12349 would all show up on my list, but customer 12365 would not, because they were upgraded to part y on a numerically higher job number.
Any help would be great; I'm new to SQL.
My version :)
SELECT
t1.*
FROM
`table` AS t1
LEFT JOIN (
SELECT
`Customer Number`
FROM
`table`
WHERE
`Parts` > 'part x'
) AS t2
ON ( t1.`Customer Number` = t2.`Customer Number` )
WHERE
t1.`Parts` = 'part x'
AND t2.`Customer Number` IS NULL
General SQL syntax. It can be a bit shorter in some modern DBMSes, but performance should be good enough when an index on (customerNumber, jobNumber) exists.
select customerNumber, jobNumber, parts
from theTable t1
where parts='part x' and not exists (
select 1
from theTable t2
where t2.customerNumber = t1.customerNumber
and t2.jobNumber > t1.jobNumber
and t2.parts='part y')
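Here is the NOT EXISTS query run against a tiny table, as a sketch on SQLite through Python. The customer numbers come from the question; the job numbers are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theTable(customerNumber INTEGER, jobNumber INTEGER, parts TEXT)"
)
# Customer numbers from the question, job numbers made up
conn.executemany(
    "INSERT INTO theTable VALUES (?, ?, ?)",
    [
        (12373, 1, "part x"),
        (12369, 2, "part x"),
        (12349, 3, "part x"),
        (12365, 4, "part x"),
        (12365, 5, "part y"),  # upgraded on a later job
    ],
)
conn.execute("CREATE INDEX idx_cust_job ON theTable(customerNumber, jobNumber)")

STILL_ON_OLD_PART = """
SELECT customerNumber, jobNumber, parts
FROM theTable t1
WHERE parts = 'part x'
  AND NOT EXISTS (SELECT 1
                  FROM theTable t2
                  WHERE t2.customerNumber = t1.customerNumber
                    AND t2.jobNumber > t1.jobNumber
                    AND t2.parts = 'part y')
ORDER BY customerNumber
"""

recall_customers = [row[0] for row in conn.execute(STILL_ON_OLD_PART)]
print(recall_customers)
```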

Database (Oracle 11g) query optimization for joins

So I am trying to optimize a bunch of queries which are taking a lot of time. What I am trying to figure out is how to create an index on columns from different tables.
Here is a simple version of my problem.
What I did
After Googling I looked into bitmap index but I am not sure if this is the right way to solve the issue
Issue
There is a many-to-many relationship between Student(sid,...) and Report(rid, year, isdeleted).
StudentReport(id, sid, rid) is the join table
Query
Select *
from Report
inner join StudentReport on Report.rid = StudentReport.rid
where Report.isdeleted = 0 and StudentReport.sid = x and Report.year = y
What is the best way to create an index?
Please try this:
with TMP_REP AS (
Select * from Report where Report.isdeleted = 0 AND Report.year = y
)
,TMP_ST_REP AS(
Select *
from StudentReport where StudentReport.sid = x
)
SELECT * FROM TMP_REP R, TMP_ST_REP S WHERE S.rid = R.rid
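On the indexing part of the question: an ordinary index covers columns of a single table, so one common approach is to index each table for its own side of the join. The sketch below is an assumption on my part, not something measured on Oracle 11g. It uses SQLite through Python only to have something runnable, the index name idx_sr_sid_rid is made up, and it assumes rid is the primary key of Report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Report(rid INTEGER PRIMARY KEY, year INTEGER, isdeleted INTEGER);
CREATE TABLE StudentReport(id INTEGER PRIMARY KEY, sid INTEGER, rid INTEGER);
-- Hypothetical data
INSERT INTO Report VALUES (1, 2019, 0), (2, 2019, 1), (3, 2018, 0);
INSERT INTO StudentReport VALUES (1, 7, 1), (2, 7, 2), (3, 7, 3), (4, 8, 1);
-- Find a student's rows quickly, then look up each report by its key
CREATE INDEX idx_sr_sid_rid ON StudentReport(sid, rid);
""")

QUERY = """
SELECT Report.rid
FROM Report
INNER JOIN StudentReport ON Report.rid = StudentReport.rid
WHERE Report.isdeleted = 0 AND StudentReport.sid = ? AND Report.year = ?
"""

report_ids = conn.execute(QUERY, (7, 2019)).fetchall()
print(report_ids)

# Inspect the plan the engine picks; on Oracle you would use its own
# plan tooling and real data volumes instead
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (7, 2019)):
    print(step)
```

Whether this beats other options on the real data is something only the actual execution plan can tell you.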

Using output of one sql query into another

I have 2 SQL queries in single line as follows:
SELECT * FROM (SELECT NameCode,Name FROM tblNames) AS X, (SELECT SUM(Mo+Tu) FROM tblFieldDays WHERE tblFieldDays.NameCode =36)
The first query, i.e. (SELECT NameCode,Name FROM tblNames), gives a list of users.
Now I want to calculate the sum of Mo+Tu, i.e. SUM(Mo+Tu), for each user returned by the first query.
That is, I want to supply each NameCode from the first query instead of the value 36, which is hard-coded here just as an example.
I also tried to use an IN clause as follows:
SELECT * FROM (SELECT NameCode,Name FROM tblNames) AS X, (SELECT SUM(Mo+Tu) FROM tblFieldDays WHERE tblFieldDays.NameCode IN (X.NameCode)) AS Y
But that didn't work.
Can anyone help?
Thanks.
SELECT users.NameCode,
users.Name,
UserFieldDays = SUM(fieldDays.Mo + fieldDays.Tu)
FROM tblNames users
JOIN tblFieldDays fieldDays ON users.NameCode = fieldDays.NameCode
GROUP BY users.NameCode, users.Name
This is probably what you are looking for:
SELECT NameCode, Name,
(SELECT SUM(MO + TU)
FROM tblFieldDays Y
WHERE Y.NameCode IN (X.NameCode))
FROM tblNames X;
This statement selects all names and code from your table tblNames and adds the sum with a sub select.
Check out this Fiddle.
Hope this helps ... Cheers!
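For comparison, here is a sketch of both approaches from this thread on SQLite through Python, with invented names and values. The correlated subquery keeps users that have no rows in tblFieldDays (their sum comes back as NULL, shown as None), while the inner join with GROUP BY leaves them out. The alias is written with AS here because the column = expression alias form is not accepted by SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblNames(NameCode INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tblFieldDays(NameCode INTEGER, Mo INTEGER, Tu INTEGER);
-- Hypothetical data: user 38 has no field days at all
INSERT INTO tblNames VALUES (36, 'Ann'), (37, 'Bob'), (38, 'Cy');
INSERT INTO tblFieldDays VALUES (36, 1, 2), (36, 3, 4), (37, 5, 0);
""")

CORRELATED = """
SELECT X.NameCode, X.Name,
       (SELECT SUM(Y.Mo + Y.Tu)
        FROM tblFieldDays Y
        WHERE Y.NameCode = X.NameCode) AS UserFieldDays
FROM tblNames X
ORDER BY X.NameCode
"""

JOINED = """
SELECT users.NameCode, users.Name,
       SUM(fieldDays.Mo + fieldDays.Tu) AS UserFieldDays
FROM tblNames users
JOIN tblFieldDays fieldDays ON users.NameCode = fieldDays.NameCode
GROUP BY users.NameCode, users.Name
ORDER BY users.NameCode
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
joined_rows = conn.execute(JOINED).fetchall()
print(correlated_rows)
print(joined_rows)
```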