JOINs work very slowly - SQL

In SQL Server 2008 R2 I have this SQL query:
SELECT
IceCrossing.WaterwayName as WaterWayName,
IceCrossing.Segment_ID as Segment_ID,
the_geom = Track.Track
FROM dbo.IceCrossing
LEFT JOIN Track ON IceCrossing.Segment_ID=Track.Segment_ID
I want to select all rows from IceCrossing and, if a row with the same Segment_ID exists in Track, show it in the result. And there is a problem with the JOIN: the query takes 4-5 seconds to return my 260 rows. I tried to change it:
SELECT
IceCrossing.WaterwayName as WaterWayName,
IceCrossing.Segment_ID as Segment_ID,
the_geom = Track.Track
FROM dbo.Track
RIGHT JOIN IceCrossing ON Track.Segment_ID=IceCrossing.Segment_ID
But it takes the same time.
Is it possible to make it faster without changing anything in the database or table structures?
UPDATE
More info.
Track - 209 rows.
IceCrossing - 259 rows.
Segment_ID type - [uniqueidentifier]
How can I check the indexes on these?
UPDATE2
As I understand it, my problem is in the the_geom field, because this query:
SELECT
IceCrossing.WaterwayName as WaterWayName,
IceCrossing.Segment_ID as Segment_ID
FROM dbo.IceCrossing
LEFT JOIN Track ON IceCrossing.Segment_ID=Track.Segment_ID
Works within a second.
the_geom type - geometry; its values are like very long strings.
What can I do in this case?

The join is fine. You may need an index, either on Track(Segment_ID) or IceCrossing(Segment_ID).
With that volume of data, I'm surprised that the query could take so long. Have you run the query multiple times and gotten consistent results? Is anything else running on the server?
There is no difference in performance between the left outer join and right outer join. They do the same thing.
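For example, nonclustered indexes on the join key might look like this (a sketch; the index names are illustrative):
CREATE NONCLUSTERED INDEX IX_Track_Segment_ID ON dbo.Track (Segment_ID);
CREATE NONCLUSTERED INDEX IX_IceCrossing_Segment_ID ON dbo.IceCrossing (Segment_ID);
To answer the question about checking indexes: EXEC sp_helpindex 'dbo.Track'; lists the indexes that already exist on a table.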

Have you tried a simple select * from Track and select * from IceCrossing? If you have a huge amount of data in one of your columns (for example, varbinary(max)), it may not be the query that is slow, but receiving all the data on the client side.
Try this query:
select
I.Segment_ID,
T.Segment_ID
from dbo.IceCrossing as I
left outer join Track as T on T.Segment_ID = I.Segment_ID
How long does it take to execute?
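If that also runs within a second, the time is likely spent transferring the geometry values to the client. You could check how large they are (a sketch; DATALENGTH returns the stored size in bytes):
select
AVG(DATALENGTH(Track)) as avg_bytes,
MAX(DATALENGTH(Track)) as max_bytes
from dbo.Track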

Related

SQL Query - Joining and Aggregating

I need to run a query every hour against a table, joining and aggregating data from another table with millions of rows.
select f.master_con,
s.containers
from
(
select master_con
from shipped
where start_time >= a and start_time <= a+1
) f,
(
select master_con,
count(distinct container) as containers
from picked
) s
where f.master_con = s.master_con
The query above sorta works; the exact syntax may not be correct because I wrote it from memory.
In the subquery 's' I only want to count container for each master_con from the 'f' query, and I think my query runs for a long time because I'm counting container for all master_con values and only then joining to the master_con values from 'f'.
Is there a better, more efficient way to write this type of query?
(In the end, I'll sum(containers) from this query above to get total containers shipped during that hour)
Most likely, there is. Can you provide some simplified sample table structures? Additionally, the join method being used has been moving towards deprecation for some time; you should declare your joins explicitly. The query below should be an improvement. A left outer join was used so that you get all of the shipped records that meet your criteria and keep them even if they aren't in the picked table. Change it to an inner join if you want them gone.
SELECT shipped.master_con,
COUNT(DISTINCT picked.container) AS containers
FROM shipped LEFT OUTER JOIN
picked ON picked.master_con = shipped.master_con
WHERE shipped.start_time BETWEEN a AND a+1
GROUP BY shipped.master_con
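The question mentions summing the containers afterwards; that could be done by wrapping the query above (a sketch; the alias t is illustrative):
SELECT SUM(t.containers) AS total_containers
FROM (SELECT shipped.master_con,
COUNT(DISTINCT picked.container) AS containers
FROM shipped LEFT OUTER JOIN
picked ON picked.master_con = shipped.master_con
WHERE shipped.start_time BETWEEN a AND a+1
GROUP BY shipped.master_con) AS t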

Why do multiple joins take forever?

I believe I found a bug in Google BigQuery but I'm not sure.
I am hoping someone could offer a workaround.
The table I'm running on has only 200K rows of data.
In my attempts to do a funnel analysis, I stumbled upon the following bizarre behaviour:
This takes ~3 seconds:
SELECT
COUNT(DISTINCT Q0._user_id) AS step0
FROM
(SELECT _user_id FROM [5629499534213120.201501]) AS Q0
LEFT OUTER JOIN
(SELECT _user_id, _time FROM [5629499534213120.201501] WHERE _os=='Windows') AS Q1
ON (Q0._user_id=Q1._user_id)
This takes ~3 minutes:
SELECT
COUNT(DISTINCT Q0._user_id) AS step0
FROM
(SELECT _user_id FROM [5629499534213120.201501]) AS Q0
LEFT OUTER JOIN
(SELECT _user_id, _time FROM [5629499534213120.201501] WHERE _os=='Windows') AS Q1
ON (Q0._user_id=Q1._user_id)
LEFT OUTER JOIN
(SELECT _user_id, _time FROM [5629499534213120.201501] WHERE _country=='de') AS Q2
ON (Q0._user_id=Q2._user_id)
Meaning, adding one more LEFT JOIN makes the query unbelievably slow (we're talking about only 200K rows of data).
Obviously, I have simplified the SELECT statement so you can focus on the main issue (the real SELECT statement I used is far more complicated).
Does anyone know what the problem is, or a workaround for it?
I responded to this on the BigQuery issue tracker, but I'm reposting my answer here:
I'm a BigQuery engineer and I looked up your query in our logs.
What you're seeing is a join explosion.
You did a 3-way self-join with non-unique keys. The field "_user_id" had a single value that matched 3937 rows on the left, 1388 rows in the first join, and 1388 rows in the second join.
That means you're creating 3937 * 1388 * 1388, or about 7.5 billion, output rows. (You then did a count distinct over them to reduce the output size, but the intermediate values needed to be created first.)
It is not surprising that creating 7.5 billion intermediate rows would take a couple of minutes, especially since they all came from a single key, and hence had to be produced by a single worker task.
My guess is that it would be possible to restructure your query to avoid the join explosion.
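For instance, one restructuring along those lines would be to deduplicate _user_id on each side before joining, so no single key can multiply the rows (a sketch; the unused _time column is dropped):
SELECT
COUNT(DISTINCT Q0._user_id) AS step0
FROM
(SELECT _user_id FROM [5629499534213120.201501] GROUP BY _user_id) AS Q0
LEFT OUTER JOIN
(SELECT _user_id FROM [5629499534213120.201501] WHERE _os=='Windows' GROUP BY _user_id) AS Q1
ON (Q0._user_id=Q1._user_id)
LEFT OUTER JOIN
(SELECT _user_id FROM [5629499534213120.201501] WHERE _country=='de' GROUP BY _user_id) AS Q2
ON (Q0._user_id=Q2._user_id)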
I am not familiar with BigQuery specifically, but I suspect the inner queries (SELECT _user_id, _time FROM [...) are retrieving the entire table.
What about rewording the query as follows:
SELECT
COUNT(DISTINCT Q0._user_id) AS step0
FROM
[5629499534213120.201501] AS Q0
LEFT OUTER JOIN [5629499534213120.201501] AS Q1
ON (Q0._user_id=Q1._user_id)
LEFT OUTER JOIN [5629499534213120.201501] AS Q2
ON (Q0._user_id=Q2._user_id)
WHERE Q1._os=='Windows'
AND Q2._country=='de'
As far as I can tell, the result should be the same; wording it like this should hopefully allow the database to use indexes (if the database is properly normalized).

Query does not behave as expected

I have a query:
select count(*) as total
from sheet_record right join
(select * from sheet_record limit 10) as sr
on 1=1;
If I understood correctly (which I think I did not), a right join is supposed to return all rows from the right table in conjunction with the left table. There should be at least 10 rows. But the query returns only 1 row with 1 column, 'total'. And it doesn't matter whether it is a left, full, or inner join; the result is always the same.
If I reverse the tables and use a left join with a small modification of the query, then it works correctly (the modifications don't matter, because in this case I get exactly what I expected to get). But I am interested in finding out what I actually didn't understand about joins and why this query does not work as expected.
You are getting back a single row because the select contains an aggregation function, turning this into an aggregation query. The count it returns should be 10 times the number of rows in the sheet_record table.
Your query is effectively a cross join. So, if you did:
select *
from sheet_record right join
(select * from sheet_record limit 10) as sr
on 1=1;
You would get 10 rows for each record in sheet_record. Each of those records would have additional columns from one of ten records from the same table.
You are using a count(*) function without any groupings. This will pretty much always result in retrieving a single row back. Try running your query without the count() to see if you get something closer to what you expect.
Eventually, with the help of the commentators, I understood what was wrong. Not wrong actually, but what exactly I was not catching.
-- this code below works fine. The query will return page 15 with 10 records in it.
select * from sheet_record inner join (select count(*) as total from sheet_record) as sr on 1=1 limit 10 offset 140;
I was thinking that a join takes the table on the left and joins it with the right table. But while I was working on the script above, I had a view (a table built by a subquery) on the right side instead of a pure table, and I was thinking of the left side as a view as well, made by (select * from sheet_record), which was a mistake.
The idea is to get a set of records from table X with an additional column holding the total number of records in the table.
(This is a common problem when a table has to be shown in a UI with paging. To know how many pages should still be available, I need to know how many records there are in total, so I can calculate how many pages are left.)
I think it should be something like:
select * from (
(here is some subquery which builds a view using the count(*) function on some table X; it will be used as the left table)
right join
(here is some subquery which gets some set of records from table X with limit and offset)
on 1=1 -- because I need all rows from the right table (view), in all cases it should be true
)
The query with a right join will be a bit more complicated.
I am using postgres.
So eventually I managed to get the result with a right join:
select * from (select count(*) as total from sheet_record) as srt right join (select * from sheet_record limit 10 offset 140) as sr on 1=1;
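Since the on 1=1 join is effectively a cross join (as noted above), the same result can be written slightly more directly in Postgres:
select * from (select count(*) as total from sheet_record) as srt cross join (select * from sheet_record limit 10 offset 140) as sr;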

Why is an IN statement with a list of items faster than an IN statement with a subquery?

I'm having the following situation:
I've got a quite complex view from which I have to select a couple of records.
SELECT * FROM VW_Test INNER JOIN TBL_Test ON VW_Test.id = TBL_Test.id
WHERE VW_Test.id IN (1000,1001,1002,1003,1004,[etc])
This returns a result practically instantly (currently with 25 items in that IN statement). However, when I use the following query, it slows down drastically.
SELECT * FROM VW_Test INNER JOIN TBL_Test ON VW_Test.id = TBL_Test.id
WHERE VW_Test.id IN (SELECT id FROM TBL_Test)
With 25 records in the TBL_Test this query takes about 5 seconds. I've got an index on that id in the TBL_Test.
Anyone got an idea why this happens and how to get performance up?
EDIT: I forgot to mention that this subquery
SELECT id FROM TBL_Test
returns a result instantly as well.
Well, when using a subquery, the database engine first has to generate the results of the subquery before it can do anything else, which takes time. If you have a predefined list, this does not need to happen and the engine can simply use those values as-is. At least, this is how I understand it.
How to improve performance: do away with the subquery. I don't think you even need the IN clause in this case. The INNER JOIN should suffice.
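In other words, something like this (a sketch of that suggestion, using the question's tables):
SELECT * FROM VW_Test INNER JOIN TBL_Test ON VW_Test.id = TBL_Test.id
The join already restricts VW_Test to ids that exist in TBL_Test, which is all the IN (SELECT id FROM TBL_Test) condition was adding.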

SQL Server, fetching data from multiple joined tables. Why is it slow?

I have a performance problem when retrieving data from SQL Server.
My sql query looks something like this:
SELECT
table_1.id,
table_1.value,
table_2.id,
table_2.value,...,
table_20.id,
table_20.value
From table_1
INNER JOIN table_2
ON table_1.id = table_2.table_1_id
INNER JOIN table_3
ON table_2.id = table_3.table_2_id...
WHERE table_1.row_number BETWEEN 1 AND 20
So, I am fetching 20 results.
This query takes about 5 seconds to execute.
When I select only table_1.id, it returns results instantly.
Because of that, I guess the problem is not in the JOINs; it is in retrieving data from multiple tables.
Any suggestions on how I could speed up this query?
Assuming your tables are designed properly (have a useful primary key etc.), then the first thing I would check is this:
are there indices on each of the foreign key columns in the child tables?
SQL Server does not automatically create indices on the foreign key columns - yet those are indeed very helpful for speeding up your JOINs.
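For instance, using the question's simplified names (a sketch; your real column names may differ):
CREATE NONCLUSTERED INDEX IX_table_2_table_1_id ON table_2 (table_1_id);
CREATE NONCLUSTERED INDEX IX_table_3_table_2_id ON table_3 (table_2_id);
-- ...and so on, one per child table's foreign key column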
Other than that: just look at the query plans! They should tell you everything about this query - what indices are being used (or not), what operations are being executed to get the results....
Without knowing a lot more about your tables, their structure and the data they contain (how much? What kind of values? etc.), there's really not much we can do to help here....
BETWEEN can really slow a query; what do you want to achieve with it?
Also:
Do you have an index on the columns you are joining on?
You could use WITH (NOLOCK) on the tables.
Check the execution plan to see what's taking so long.
How about this one:
SELECT
table_1.id,
table_1.value,
table_2.id,
table_2.value,...,
table_20.id,
table_20.value
FROM
table_1
INNER JOIN table_2 ON table_1.id = table_2.table_1_id AND table_1.row_number BETWEEN 1 AND 20
INNER JOIN table_3 ON table_2.id = table_3.table_2_id
I mean that before joining to another table, you restrict the range of data.