Speed up SQLite query, can I do it without a union? - sql

Hi everybody of the stackoverflow community! I've been visiting this site for years and here comes my first post
Lets say I have a database with three tables:
groups (GroupID,GroupType,max1,size)
candies (candyID,name,selected)
members (groupID,nameID)
Example: The candy factory.
In the candy factory 10 types of candy bags are produced out of 80 different candies.
So: There are 10 unique group types(bags) with 3 different sizes: (4,5,6); a group is combination out of 80 unique candies.
Out of this I make a database, (with some rules about which candy combinations gets into a group).
At this point I have a database with 40791 unique candy bags.
Now I want to compare a collection of candies with all the candy bags in the DB, as a result I want the bags out of the DB which are missing 3 or less candies with the compare collection.
-- restore candy status
update candies set selected = 0, blacklisted = 0;
-- set status for candies to be selected
update candies set selected = 1 where name in ('candy01','candy02','candy03','candy04');
select groupId, GroupType, max, count(*) as remainingNum, group_concat(name,', ') as remaining
from groups natural join members natural join candies
where not selected
group by groupid having count(*) <= 3
UNION -- Union with groups which dont have any remaining candies and have a 100% match
select groupid, GroupType, max, 0 as remainingNum, "" as remaining
from groups natural join members natural join candies
where selected
group by groupid having count(*) =groups.size;
The above query does this. But the thing I am trying to accomplish is to do this without the union, because speed is of the essence. And also I am new to sql and are very eager to learn/see new methods.
Greetings, Rutger

I'm not 100% sure about what you are accomplishing through these queries, so I haven't looked at a fundamentally different approach. If you can include example data to demonstrate your logic, I can have a look at that. But, in terms of simply combining your two queries, I can do that. There is a note of caution first, however...
SQL is compiled in to query plans. If the query plan for each query is significantly different from the other, combining them into a single query may be a bad idea. What you may end up with is a single plan that works for both cases, but is not very efficient for either. One poor plan can be a lot worse than two good plans => Shorter, more compact, code does not always give faster code.
You can put selected in to your GROUP BY instead of your WHERE clause; the fact that you have two UNIONed queries shows that you are treating them as two separate groups already.
Then, the only difference between your queries is the filter on count(*), which you can accommodate with a CASE WHEN statement...
SELECT
groups.groupID,
groups.GroupType,
groups.max,
CASE WHEN Candies.Selected = 0 THEN count(*) ELSE 0 END as remainingNum,
CASE WHEN Candies.Selected = 0 THEN group_concat(candies.name,', ') ELSE '' END as remaining
FROM
groups
INNER JOIN
members
ON members.GroupID = groups.GroupID
INNER JOIN
candies
ON Candies.CandyID = members.CandyID
GROUP BY
Groups.GroupID,
Groups.GroupType,
Groups.max,
Candies.Selected
HAVING
CASE
WHEN Candies.Selected = 0 AND COUNT(*) <= 3 THEN 1
WHEN Candies.Selected = 1 AND COUNT(*) = Groups.Size THEN 1
ELSE 0
END
=
1
The layout changes are simply because I disagree with using NATURAL JOIN for maintenance reasons. They are a short-cut in initial build and a potential disaster in later development. But that's a different issue, you can read about it on line if you feel you want to.

Don't update the database when you're doing a select, your first update update candies set selected = 0, blacklisted = 0; will apply to the entire table, and rewrite every record. You should try without using selected and also changing your union to UNION ALL. Further to this, you try inner join instead of natural join (but I don't know your schema for candy to members)
select groupId, GroupType, max, count(*) as remainingNum, group_concat(name,', ') as remaining
from groups
inner join members on members.groupid = groups.groupid
inner join candies on candies.candyid = member.candyid
where name NOT in ('candy01','candy02','candy03','candy04')
group by groups.groupid
having count(*) <= 3
UNION ALL -- Union with groups which dont have any remaining candies and have a 100% match
select groupid, GroupType, max, 0 as remainingNum, "" as remaining
from groups
inner join members on members.groupid = groups.groupid
inner join candies on candies.candyid = member.candyid
where name in ('candy01','candy02','candy03','candy04')
group by groupid
having count(*) = groups.size;
This should at least perform better than updating all records in the table before querying it.

Related

Sql Left or Right Join One To Many Pagination

I have one main table and join other tables via left outer or right outer outer join.One row of main table have over 30 row in join query as result. And I try pagination. But the problem is I can not know how many rows will it return for one main table row result.
Example :
Main table first row result is in my query 40 rows.
Main table second row result is 120 row.
Problem(Question) UPDATE:
For pagination I need give the pagesize the count of select result. But I can not know the right count for my select result. Example I give page no 1 and pagesize 50, because of this I cant get the right result.I need give the right pagesize for my main table top 10 result. Maybe for top 10 row will the result row count 200 but my page size is 50 this is the problem.
I am using Sql 2014. I need it for my ASP.NET project but is not important.
Sample UPDATE :
it is like searching an hotel for booking. Your main table is hotel table. And the another things are (mediatable)images, (mediatable)videos, (placetable)location and maybe (commenttable)comments they are more than one rows and have one to many relationship for the hotel. For one hotel the result will like 100, 50 or 10 rows for this all info. And I am trying to paginate this hotels result. I need get always 20 or 30 or 50 hotels for performance in my project.
Sample Query UPDATE :
SELECT
*
FROM
KisiselCoach KC
JOIN WorkPlace WP
ON KC.KisiselCoachId = WP.WorkPlaceOwnerId
JOIN Album A
ON KC.KisiselCoachId = A.AlbumId
JOIN Media M
ON A.AlbumId = M.AlbumId
LEFT JOIN Rating R
ON KC.KisiselCoachId = R.OylananId
JOIN FrUser Fr
ON KC.CoachId = Fr.UserId
JOIN UserJob UJ
ON KC.KisiselCoachId = UJ.UserJobOwnerId
JOIN Job J
ON UJ.JobId = J.JobId
JOIN UserExpertise UserEx
ON KC.KisiselCoachId = UserEx.UserExpertiseOwnerId
JOIN Expertise Ex
ON UserEx.ExpertiseId = Ex.ExpertiseId
Hotel Table :
HotelId HotelName
1 Barcelona
2 Berlin
Media Table :
MediaID MediaUrl HotelId
1 www.xxx.com 1
2 www.xxx.com 1
3 www.xxx.com 1
4 www.xxx.com 1
Location Table :
LocationId Adress HotelId
1 xyz, Berlin 1
2 xyz, Nice 1
3 xyz, Sevilla 1
4 xyz, Barcelona 1
Comment Table :
CommentId Comment HotelId
1 you are cool 1
2 you are great 1
3 you are bad 1
4 hmm you are okey 1
This is only sample! I have 9999999 hotels in my database. Imagine a hotel maybe it has 100 images maybe zero. I can not know this. And I need get 20 hotels in my result(pagination). But 20 hotels means 1000 rows maybe or 100 rows.
First, your query is poorly written for readability flow / relationship of tables. I have updated and indented to try and show how/where tables related in hierarchical relativity.
You also want to paginate, lets get back to that. Are you intending to show every record as a possible item, or did you intend to show a "parent" level set of data... Ex so you have only one instance per Media, Per User, or whatever, then once that entry is selected you would show details for that one entity? if so, I would do a query of DISTINCT at the top-level, or at least grab the few columns with a count(*) of child records it has to show at the next level.
Also, mixing inner, left and right joins can be confusing. Typically a right-join means you want the records from the right-table of the join. Could this be rewritten to have all required tables to the left, and non-required being left-join TO the secondary table?
Clarification of all these relationships would definitely help along with the context you are trying to get out of the pagination. I'll check for comments, but if lengthy, I would edit your original post question with additional details vs a long comment.
Here is my SOMEWHAT clarified query rewritten to what I THINK the relationships are within your database. Notice my indentations showing where table A -> B -> C -> D for readability. All of these are (INNER) JOINs indicating they all must have a match between all respective tables. If some things are NOT always there, they would be changed to LEFT JOINs
SELECT
*
FROM
KisiselCoach KC
JOIN WorkPlace WP
ON KC.KisiselCoachId = WP.WorkPlaceOwnerId
JOIN Album A
ON KC.KisiselCoachId = A.AlbumId
JOIN Media M
ON A.AlbumId = M.AlbumId
LEFT JOIN Rating R
ON KC.KisiselCoachId = R.OylananId
JOIN FrUser Fr
ON KC.CoachId = Fr.UserId
JOIN UserJob UJ
ON KC.KisiselCoachId = UJ.UserJobOwnerId
JOIN Job J
ON UJ.JobId = J.JobId
JOIN UserExpertise UserEx
ON KC.KisiselCoachId = UserEx.UserExpertiseOwnerId
JOIN Expertise Ex
ON UserEx.ExpertiseId = Ex.ExpertiseId
Readability of a query is a BIG help for yourself, and/or anyone assisting or following you. By not having the "on" clauses near the corresponding joins can be very confusing to follow.
Also, which is your PRIMARY table where the rest are lookup reference tables.
ADDITION PER COMMENT
Ok, so I updated a query which appears to have no context to the sample data and what you want in your post. That said, I would start with a list of hotels only and a count(*) of things per hotel so you can give SOME indication of how much stuff you have in detail. Something like
select
H.HotelID,
H.HotelName,
coalesce( MedSum.recs, 0 ) as MediaItems,
coalesce( LocSum.recs, 0 ) as NumberOfLocations,
coalesce( ComSum.recs, 0 ) as NumberOfLocations
from
Hotel H
LEFT JOIN
( select M.HotelID,
count(*) recs
from Media M
group by M.HotelID ) MedSum
on H.HotelID = MedSum.HotelID
LEFT JOIN
( select L.HotelID,
count(*) recs
from Location L
group by L.HotelID ) LocSum
on H.HotelID = LocSum.HotelID
LEFT JOIN
( select C.HotelID,
count(*) recs
from Comment C
group by C.HotelID ) ComSum
on H.HotelID = ComSum.HotelID
order by
H.HotelName
--- apply any limit per pagination
Now this will return every hotel at a top-level and the total count of things per the hotel per the individual counts which may or not exist hence each sub-check is a LEFT-JOIN. Expose a page of 20 different hotels. Now, as soon as one person picks a single hotel, you can then drill-into the locations, media and comments per that one hotel.
Now, although this COULD work, having to do these counts on an every-time query might get very time consuming. You might want to add counter columns to your main hotel table representing such counts as being performed here. Then, via some nightly process, you could re-update the counts ONCE to get them primed across all history, then update counts only for those hotels that have new activity since entered the date prior. Not like you are going to have 1,000,000 posts of new images, new locations, new comments in a day, but of 22,000, then those are the only hotel records you would re-update counts for. Each incremental cycle would be short based on only the newest entries added. For the web, having some pre-aggregate counts, sums, etc is a big time saver where practical.

Count of how many times id occurs in table SQL regexp

Hi I have a redshift table of articles that has a field on it that can contain many accounts. So there is a one to many relationship between articles to accounts.
However I want to create a new view where it lists the partner id's in one column and in another column a count of how many times the partner id appears in the articles table.
I've attempted to do this using regex and created a new redshift view, but am getting weird results where it doesn't always build properly. So one day it will say a partner appears 15 times, then the next 17, then the next 15, when the partner id count hasn't actually changed.
Any help would be greatly appreciated.
SELECT partner_id,
COUNT(DISTINCT id)
FROM (SELECT id,
partner_ids,
SPLIT_PART(partner_ids,',',i) partner_id
FROM positron_articles a
LEFT JOIN util.seq_0_to_500 s
ON s.i < regexp_count (partner_ids,',') + 2
OR s.i = 1
WHERE i > 0
AND regexp_count (partner_ids,',') = 0
ORDER BY id)
GROUP BY 1;
Let's start with some of the more obvious things and see if we can start to glean other information.
Next GROUP BY 1 on your outer query needs to be GROUP BY partner_id.
Next you don't need an order by in your INNER query and the database engine will probably do a better job optimizing performance without it so remove ORDER BY id.
If you want your final results to be ordered then add an ORDER BY partner_id or similar clause after your group by of your OUTER query.
It looks like there are also problems with how you are splitting a partnerid from partnerids but I am not positive about that because I need to understand your view and the data it provides to know how that affects your record count for partnerid.
Next your LEFT JOIN statement on the util.seq_0_to_500 I am pretty sure you can drop off the s.i = 1 as the first condition will satisfy that as well because 2 is greater than 1. However your left join really acts more like an inner join because you then exclude any non matches from positron_articles that don't have a s.i > 0.
Oddly then your entire join and inner query gets kind of discarded because you only want articles that have no commas in their partnerids: regexp_count (partner_ids,',') = 0
I would suggest posting the code for your util.seq_0_to_500 and if you have a partner table let use know about that as well because you can probably get your answer a lot easier with that additional table depending on how regexp_count works. I suspect regex_count(partnerids,partnerid) exampleregex_count('12345,678',1234) will return greater than 0 at which point you have no choice but to split the delimited strings into another table before counting or building a new matching function.
If regex_count only matches exact between commas and you have a partner table your query could be as easy as this:
SELECT
p.partner_id
,COUNT(a.id) AS ArticlesAppearedIn
FROM
positron_articles a
LEFT JOIN PARTNERTABLE p
ON regexp_count(a.partnerids,p.partnerid) > 0
GROUP BY
p.partner_id
I will actually correct myself as I just thought of a way to join a partner table without regexp_count. So if you have a partner table this might work for you. If not you will need to split strings. It basically tests to see if the partnerid is the entire partnerids, at the beginning, in the middle, or at the end of partnerids. If one of those is met then the records is returned.
SELECT
p.partner_id
,COUNT(a.id) AS ArticlesAppearedIn
FROM
PARTNERTABLE p
INNER JOIN positron_articles a
ON
(
CASE
WHEN a.partnerids = CAST(p.partnerid AS VARCHAR(100)) THEN 1
WHEN a.partnerids LIKE p.partnerid + ',%' THEN 1
WHEN a.partnerids LIKE '%,' + p.partnerid + ',%' THEN 1
WHEN a.partnerids LIKE '%,' + p.partnerid THEN 1
ELSE 0
END
) = 1
GROUP BY
p.partner_id

Flatten multiple query results with same ID to single row?

I'm curious about something in a SQL Server database. My current query pulls data about my employer's items for sale. It finds information for just under 105,000 items, which is correct. However, it returns over 155,000 rows, because each item has other things related to it. Right now, I run that data through a loop in Python, manually flattening it out by checking if the item the loop is working on is the same one it just worked on. If it is, I start filling in that item's extra information. Ideally, the SQL would return all this data already put into one row.
Here is an overview of the setup. I'm leaving out a few details for simplicity's sake, since I'm curious about the general theory, not looking for something I can copy and paste.
Item: contains the item ID, SKU, description, vendor ID, weight, and dimensions.
AttributeName: contains attr_id and attr_text. For instance, "color", "size", or "style".
AttributeValue: contains attr_value_id and attr_text. For instance, "blue" or "small".
AttributeAssign: contains item_id and attr_id. This ties attribute names to items.
attributeValueAssign: contains item_id and attr_value_id, tying attribute values to items.
A series of attachments is set up in a similar way, but with attachment and attachmentAssignment. Attachments can have only values, no names, so there is no need for the extra complexity of a third table as there is with attributes.
Vendor is simple: the ID is used in the item table. That is:
select item_id, vendorName
from item
join vendor on vendor_id = item.vendorNumber
gets you the name of an item's vendor.
Now, the fun part: items may or may not have vendors, attributes, or attachments. If they have either of the latter two, there's no way to know how many they have. I've seen items with 0 attributes and items with 5. Attachments are simpler, as there can only be 0 or 1 per item, but the possibility of 0 still demands an outer left join so I am guaranteed to get all the items.
That's how I get multiple rows per item. If an item has three attrigbutes, I get either four or seven rows for just that item--I'm not sure if it's a row per name/value or a row per name AND a row per value. Either way, this is the kind of thing I'd like to stop. I want each row in my result set to contain all attributes, with a cap at seven and null for any missing attribute. That is, something like:
item_id; item_title; item_sku; ... attribute1_name; attribute1_value; attribute2_name; attribute2_value; ... attribute7_value
1; some random item; 123-45; ... color; blue; size; medium; ... null
Right now, I'd get multiple rows for that, such as (only ID and attributes):
ID; attribute 1 name; attribute 1 value; attribute 2 name; attribute 2 value
1; color; blue; null; null
1; color; blue; size; medium
I'm after the second row only--all the information put together into one row per unique item ID. Currently, though, I get multiple rows, and Python has to put everything together. I'm outputting this to a spreadsheet, so information about an item has to be on that item's row.
I can just keep using Python if this is too much bother. But I wondered if there was a way to do it that would be relatively easy. My script works fine, and execution time isn't a concern. This is more for my own curiosity than a need to get anything working. Any thoughts on how--or if--this is possible?
Here is #WCWedin's answer modified to use a CTE.
WITH attrib_rn as
(
select
*, row_number() over(partition by item_id order by name, attribute_id) as row_number
from attributes
)
select
i.item_id,
attr1.name as attribute1_name, attr1.value as attribute1_value,
...
attr7.name as attribute7_name, attr7.value as attribute7_value
from items i
left join attrib_rn as attr1 ON attr1.item_id = i.item_id AND attr1.row_number = 1
left join attrib_rn as attr2 ON attr2.item_id = i.item_id AND attr2.row_number = 2
left join attrib_rn as attr3 ON attr3.item_id = i.item_id AND attr3.row_number = 3
left join attrib_rn as attr4 ON attr4.item_id = i.item_id AND attr4.row_number = 4
left join attrib_rn as attr5 ON attr5.item_id = i.item_id AND attr5.row_number = 5
left join attrib_rn as attr6 ON attr6.item_id = i.item_id AND attr6.row_number = 6
left join attrib_rn as attr7 ON attr7.item_id = i.item_id AND attr7.row_number = 7
Since you only want the first 7 attributes and you want to keep all of the logic in the SQL query, you're probably looking at using row_number. Subqueries will do the job directly with multiple joins, and the performance will probably be pretty good since you're only joining so many times.
select
i.item_id,
attr1.name as attribute1_name, attr1.value as attribute1_value,
...
attr7.name as attribute7_name, attr7.value as attribute7_value
from
items i
left join (
select
*, row_number() over(partition by item_id order by name, attribute_id) as row_number
from
attributes
) AS attr1 ON
attr1.item_id = i.item_id
AND attr1.row_number = 1
...
left join (
select
*, row_number() over(partition by item_id order by name, attribute_id) as row_number
from
attributes
) AS attr7 ON
attr7.item_id = i.item_id
AND attr7.row_number = 7
In SQL Server, you can tackle this with a subquery containing 'ROW_NUMBER() OVER', and a few CASE statements to map the top 7 into columns.
A little tricky, but post your full query that returns the big list and I'll demonstrate how to transpose it.

How to make a query to obtain only results that have N number within a range of values?

I'm trying to extract nutrient data in MS Access 2007 from the USDA food database, freely available at http://www.ars.usda.gov/Services/docs.htm?docid=24912
I need records that have ALL nutrients from NUT_DATA.Nutr_No . Those records have values between '501' and '511' . But I wish to exclude incomplete records that have missing values.
Currently, Baby food banana has all from nutrient 501 to 511, but Baby food Beverage has only 9 of the nutrients listed, and many others are like that.
As a last resort, I guess it would be acceptable to have all records, showing null for missing values, as long as each FOOD_DES.Long_Desc has exactly 11 records, one for each NUT_DATA.Nutr_No OR NUTR_DEF.NutrDesc (which correspond to each other).
SELECT
FOOD_DES.NDB_No, FOOD_DES.FdGrp_Cd, FOOD_DES.Long_Desc, NUT_DATA.Nutr_No, NUTR_DEF.NutrDesc, NUT_DATA.Nutr_Val, WEIGHT.Amount, WEIGHT.Msre_Desc, WEIGHT.Gm_Wgt, [WEIGHT]![Amount] & " " & [WEIGHT]![Msre_Desc] AS msre
FROM
NUTR_DEF inner JOIN ((FOOD_DES INNER JOIN NUT_DATA ON FOOD_DES.NDB_No=NUT_DATA.NDB_No) INNER JOIN WEIGHT ON FOOD_DES.NDB_No=WEIGHT.NDB_No) ON NUTR_DEF.Nutr_No=NUT_DATA.Nutr_No
WHERE
(NUT_DATA.Nutr_No between '501' and '511' ) and ((WEIGHT.Seq)="1") and NUT_DATA.Nutr_Val > '0' and
// this part is me out of ideas trying stuff, but didn't help
EXISTS (SELECT 1
FROM
NUTR_DEF inner JOIN ((FOOD_DES INNER JOIN NUT_DATA ON FOOD_DES.NDB_No=NUT_DATA.NDB_No) INNER JOIN WEIGHT ON FOOD_DES.NDB_No=WEIGHT.NDB_No) ON NUTR_DEF.Nutr_No=NUT_DATA.Nutr_No
WHERE count FOOD_DES.Long_Desc = "11" )
//end wild of experimentation
ORDER BY FOOD_DES.Long_Desc, NUTR_DEF.SR_Order;
This is a sample of the data. I just copied the most important columns. The red is not what I'm looking for because it doesn't have all 11 nutrients. I can paste on the google doc the whole table if someone thinks that would help.
https://docs.google.com/spreadsheets/d/1FghDD59wy2PYlpsqUlYVc3Ulwvy4MMLagpBUYtvLBfI/edit?usp=sharing
As your starting point, identify which food items have values > 0 for all 11 of those nutrients. Check whether this simpler GROUP BY query shows you the correct items:
SELECT ndat.NDB_No
FROM
NUT_DATA AS ndat
INNER JOIN WEIGHT AS wt
ON ndat.NDB_No = wt.NDB_No
WHERE
ndat.Nutr_Val>0
AND ndat.Nutr_No IN('501','502','503','504','505','506','507','508','509','510','511')
AND wt.Seq='1'
GROUP BY ndat.NDB_No
HAVING Count(ndat.Nutr_No)=11;
Note you could use Val(ndat.Nutr_No) Between 501 And 511 as the Nutr_No restriction, which would give you a more concise statement. However, evaluating Val() for every row of the table means that approach would forego the performance benefit of indexed retrieval ... so that version of the query should be noticeably slower.
Save that query and create a new query which joins it to the base tables for the additional data you need from other columns. Or use it as a subquery instead of a named query if you prefer.

How do you explicitly show rows which have count(*) equal to 0

The query I'm running in DB2
select yrb_customer.name,
yrb_customer.city,
CASE count(*) WHEN 0 THEN 0 ELSE count(*) END as #UniClubs
from yrb_member, yrb_customer
where yrb_member.cid = yrb_customer.cid and yrb_member.club like '%Club%'
group by yrb_customer.name, yrb_customer.city order by count(*)
Shows me people which are part of clubs which has the word 'Club' in it, and it shows how many such clubs they are part of (#UniClubs) along with their name and City. However for students who are not part of such a club, I would still like for them to show up but just have 0 instead of them being hidden which is what's happening right now. I cannot get this functionality with count(*). Can somebody shed some light? I can explain further if the above is not clear enough.
I'm not familiar with DB2 so I'm taking a stab in the dark, but try this:
select yrb_customer.name,
yrb_customer.city,
CASE WHEN yrb_member.club like '%Club% THEN count(*) ELSE 0 END as #UniClubs
from yrb_member, yrb_customer
where yrb_member.cid = yrb_customer.cid
group by yrb_customer.name, yrb_customer.city order by count(*)
Basically you don't want to filter for %Club% in your WHERE clause because you want ALL rows to come back.
You're going to want a LEFT JOIN:
SELECT yrb_customer.name, yrb_customer.city,
COUNT(yrb_member.club) as clubCount
FROM yrb_customer
LEFT JOIN yrb_member
ON yrb_member.cid = yrb_customer.cid
AND yrb_member.club LIKE '%Club%
GROUP BY yrb_customer.name, yrb_customer.city
ORDER BY clubCount
Also, if the tuple (yrb_customer.name, yrb_customer.city) is unique (or is supposed to be - are you counting all students with the same name as the same person?), you might get better performance out of the following:
SELECT yrb_customer.name, yrb_customer.city,
COALESCE(club.count, 0)
FROM yrb_customer
LEFT JOIN (SELECT cid, COUNT(*) as count
FROM yrb_member
WHERE club LIKE '%Club%
GROUP BY cid) club
ON club.cid = yrb_customer.cid
ORDER BY club.count
The reason that your original results were being hidden was because in your original query, you have an implicit inner join, which of course requires matching rows. The implicit-join syntax (comma-separated FROM clause) is great for inner (regular) joins, but is terrible for left-joins, which is what you really needed. The use of the implicit-join syntax (and certain types of related filtering in the WHERE clause) is considered deprecated.