Parsing simplified SQL queries with Perl into SQLite - sql

I am trying to turn a "simplified" SQL query into a working SQLite one to use against XnView's databases, meaning the database layout is at best suboptimal for what I'm trying to do AND I can't change anything about that.
Example would be "(cat_10 and cat_12) and (cat_5 or cat_7)".
This should be used against the table "t3", which has the fields "if" (fileID) and "ic" (categoryID).
The entries look like this:
if, ic
7, 10
7, 12
7, 4
9, 10
9, 12
9, 5
10, 10
10, 12
10, 7
The simplified query above should only select the files 9 and 10 as 7 does have the wanted categories 10 and 12 but has neither 5 nor 7.
The actual problem now is building that hell of a query statement because it took me already some hours to simply get an AND between two categories working.
SELECT if FROM t3 WHERE ic IN (10, 12) GROUP BY if HAVING count(if) = 2
This gives me all fileIDs that contain category 10 and 12, but I have no idea how I should combine that with the remaining " and (cat_5 or cat_7)".
When I planned these simplified sql statements (made by a click-it-together-builder made in html and js) I was planning to simply replace "cat_5" with "t3.ic = 5" and leave the rest as it is.
Of course I didn't foresee that it wouldn't work, as WHERE checks each row as a whole and a single row can't have ic = 5 AND ic = 7. That pretty much broke everything.
So I'm wondering if anyone would have an idea how I could translate these simple queries in actual working ones, keeping in mind that it might not be limited to ( x and y ) pairs.
Edit: I worked out how to do the example I've given, I think at least:
SELECT if FROM t3 WHERE ic IN (10, 12) GROUP BY if HAVING count(if) = 2
INTERSECT
SELECT if FROM t3 WHERE ic IN (5, 7) GROUP BY if
But the main problem now is resolving the ( ) in the right order.
Edit 2: I think I'll give grouping the categories into one field with group_concat() a try. Then I should be able to simply do cats LIKE '% 5 %' AND ..., which would give small blocks I could easily throw together; then just the brackets, and it should work. Highlighting the 'should'.

Your original query doesn't do what is intended. WHERE ic IN (10, 12) GROUP BY if HAVING count(if) = 2 would also return a file that has category 10 twice but not 12 at all, which goes against your textual description of what you want. This is where an inner query to fetch results for 12 and 10 is needed. You can test that your query fails in the fiddle link I have posted below.
It's a bit tricky, but this is the most straightforward way I would express it:
SELECT DISTINCT ifc
FROM t3
WHERE ifc IN (
SELECT ifc
FROM t3
WHERE ic = 10
GROUP BY ifc
HAVING COUNT(*) > 0
INTERSECT
SELECT ifc
FROM t3
WHERE ic = 12
GROUP BY ifc
HAVING COUNT(*) > 0
)
AND ic IN (5, 7)
Try fiddle
I did not bring in any optimization; you may try your own. The fiddle link is for Postgres, but this should work in SQLite too (I did not get SQLite to work in my browser :( ).
Edit: CL. points out an interesting thing about not having to include HAVING clauses in the inner query which is true. I was interpreting OP's requirement in SQL terms with an intent to make things clear without thinking of any optimizations.
Here is a better looking query:
SELECT DISTINCT ifc
FROM t3
WHERE ifc IN (
SELECT ifc
FROM t3
WHERE ic = 10
INTERSECT
SELECT ifc
FROM t3
WHERE ic = 12
)
AND ic IN (5, 7)
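The duplicate-row pitfall raised at the start of this answer can be reproduced with a small, hypothetical data set. This is only a sketch using Python's sqlite3 module and an in-memory table (the column is quoted as "if" to stay clear of keyword issues). Whether XnView's t3 can really contain the same (if, ic) pair twice is an assumption, not something the question confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t3 ("if" INTEGER, ic INTEGER)')
# Hypothetical rows: file 1 has category 10 twice and no 12; file 2 has both.
conn.executemany("INSERT INTO t3 VALUES (?, ?)",
                 [(1, 10), (1, 10), (2, 10), (2, 12)])

def file_ids(having):
    """Run the question's GROUP BY query with the given HAVING condition."""
    sql = ('SELECT "if" FROM t3 WHERE ic IN (10, 12) '
           'GROUP BY "if" HAVING ' + having + ' ORDER BY "if"')
    return [row[0] for row in conn.execute(sql)]

counted_rows = file_ids('COUNT("if") = 2')         # counts rows, duplicates included
counted_cats = file_ids("COUNT(DISTINCT ic) = 2")  # counts distinct categories only
print(counted_rows, counted_cats)
```

If duplicates are possible, counting distinct categories is one way to keep the GROUP BY form; the INTERSECT form above sidesteps the issue entirely.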

Ok I got it working as I originally planned surprisingly.
SELECT Folders.Pathname || Images.Filename AS File FROM Images
JOIN Folders ON Images.FolderID = Folders.FolderID
LEFT JOIN (
SELECT f, Cats, t4.if AS Tagged FROM t2
JOIN (
SELECT if, ' ' || group_concat(ic,' ') || ' ' AS Cats FROM t3 GROUP BY if
) st3 ON t2.i = st3.if
LEFT JOIN t4 ON t2.i = t4.if
) st2 ON File = st2.f
$selectWhereImage $sqlqry
ORDER BY ModifiedDate $order LIMIT $offset, $limit
I know this is one hell of a query but it combines all things I'd be looking for (category ids, tagged or not, rating, color) sortable by date with the full filepath as result.
It's probably a horrible way to do it but if anyone finds a better working way where I can simply replace placeholders like "cat_5" while keeping the rest like it is, needed for brackets and operators, then I'm all ears :D
Oh, and $selectWhereImage contains just a longer WHERE that limits File to ending with an image format. $sqlqry is the refitted query from above: cat_5 just turns into cats LIKE '% 5 %'. Due to the additional spaces left and right of cats, I can match any number without finding "1" in "10", since " 1 " isn't in " 10 " :D
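For readers who want to try the placeholder replacement in isolation, here is a minimal sketch in Python rather than Perl. It uses the sample rows from the question; the helper names to_like and matching_files are made up for illustration, and the expression is not validated, so it should only ever come from a trusted builder.

```python
import re
import sqlite3

def to_like(expr):
    """Replace every cat_N placeholder with a LIKE test; brackets and operators stay."""
    return re.sub(r"\bcat_(\d+)\b",
                  lambda m: "cats LIKE '% " + m.group(1) + " %'", expr)

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t3 ("if" INTEGER, ic INTEGER)')
conn.executemany("INSERT INTO t3 VALUES (?, ?)",
                 [(7, 10), (7, 12), (7, 4), (9, 10), (9, 12), (9, 5),
                  (10, 10), (10, 12), (10, 7)])

def matching_files(expr):
    # One row per file, with its categories as a space-padded string.
    sql = f"""
        SELECT "if" FROM (
            SELECT "if", ' ' || group_concat(ic, ' ') || ' ' AS cats
            FROM t3 GROUP BY "if"
        ) WHERE {to_like(expr)} ORDER BY "if"
    """
    return [row[0] for row in conn.execute(sql)]

result = matching_files("(cat_10 and cat_12) and (cat_5 or cat_7)")
only_one = matching_files("cat_1")  # must not match 10 or 12
print(result, only_one)
```

The padding spaces are what make the second call come back empty: ' 1 ' never occurs inside ' 10 12 4 '.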

A hackish approach which would be simpler and I believe faster is
SELECT DISTINCT ifc
FROM t3
WHERE ifc IN (
SELECT ifc
FROM t3
WHERE ic = 10
)
AND ifc IN (
SELECT ifc
FROM t3
WHERE ic = 12
)
AND ic IN (5, 7)

If you have to use an intersect as you have done you should change your upper query which is wrong. Since you have to ensure every if has a 10 and 12 as ic then you can't get away without two separate queries for that. Something like:
SELECT ifc
FROM t3
WHERE ifc IN (
SELECT ifc
FROM t3
WHERE ic = 10
)
AND ifc IN (
SELECT ifc
FROM t3
WHERE ic = 12
)
INTERSECT
SELECT ifc FROM t3 WHERE ic IN (5, 7)
The INTERSECT will handle the grouping here, so you don't have to add it explicitly, but this will not be as efficient as my other queries. If you have to do away with subqueries, you can use JOIN:
SELECT DISTINCT t.ifc
FROM t3 AS t
JOIN t3 AS v ON v.ifc = t.ifc
JOIN t3 AS p ON p.ifc = t.ifc
WHERE v.ic = 10 AND p.ic = 12 AND t.ic IN (5, 7)
The second one has the advantage that it works on databases that don't know INTERSECT, like MySQL.
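Coming back to the question's wish to replace placeholders like cat_5 while leaving brackets and operators alone: one alternative, not taken from any answer here, is to turn each placeholder into an EXISTS test. The sketch below is in Python with sqlite3; the names ALLOWED, to_exists and matching_files are hypothetical, and the column is called ifc as in the answers above.

```python
import re
import sqlite3

# Only whitespace, brackets, and/or/not and cat_N placeholders are accepted.
ALLOWED = re.compile(r"(?:\s|\(|\)|\b(?:and|or|not)\b|\bcat_\d+\b)*", re.IGNORECASE)

def to_exists(expr):
    """Turn each cat_N into an EXISTS test against the outer row's file id."""
    if not ALLOWED.fullmatch(expr):
        raise ValueError("unexpected token in expression")
    return re.sub(
        r"\bcat_(\d+)\b",
        lambda m: ("EXISTS (SELECT 1 FROM t3 AS c "
                   f"WHERE c.ifc = f.ifc AND c.ic = {m.group(1)})"),
        expr, flags=re.IGNORECASE)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t3 (ifc INTEGER, ic INTEGER)")
conn.executemany("INSERT INTO t3 VALUES (?, ?)",
                 [(7, 10), (7, 12), (7, 4), (9, 10), (9, 12), (9, 5),
                  (10, 10), (10, 12), (10, 7)])

def matching_files(expr):
    sql = (f"SELECT DISTINCT f.ifc FROM t3 AS f "
           f"WHERE {to_exists(expr)} ORDER BY f.ifc")
    return [row[0] for row in conn.execute(sql)]

both = matching_files("(cat_10 and cat_12) and (cat_5 or cat_7)")
negated = matching_files("cat_4 or (cat_7 and not cat_5)")
print(both, negated)
```

Because every placeholder becomes a self-contained boolean, arbitrary nesting and NOT work without resolving the brackets by hand. How this performs on a large XnView database has not been measured here.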

Improve the speed of this string_agg?

I have data of the following shape:
BOM -- 500 rows, 4 cols
PartProject -- 2.6mm rows, 4 cols
Project -- 1000 rows, 5 cols
Part -- 200k rows, 18 cols
Yet when I try to do string_agg, my code takes well over 10 minutes to execute on 500 rows. How can I improve this query? (The data is not available.)
select
BOM.*,
childParentPartProjectName
into #tt2 -- tt for some testing
from #tt1 AS BOM -- tt for some testing
-- cross applys for string agg many to one
CROSS APPLY (
SELECT childParentPartProjectName = STRING_AGG(PROJECT_childParentPart.NAME, ', ') WITHIN GROUP (ORDER BY PROJECT_childParentPart.NAME)
FROM (
SELECT DISTINCT PROJECT3.NAME
FROM [dbo].[Project] PROJECT3
LEFT JOIN [dbo].[Part] P3 on P3.ITEM_NUMBER = BOM.childParentPart
LEFT JOIN [dbo].[PartProject] PP3 on PP3.SOURCE_ID = P3.ID
WHERE PP3.RELATED_ID = PROJECT3.ID and P3.CURRENT = 1
) PROJECT_childParentPart ) PROJECT3
The subquery (within a subquery) you have has a code "smell" to it that it's been written with intention, but not correctly.
Firstly you have 2 LEFT JOINs in the subquery, however, both the tables aliased as P3 and PP3 are required to have a non-NULL value; that is impossible if no related row is found. This means the JOINs are implicit INNER JOINs.
Next you have a DISTINCT against a single column when SELECTing from multiple tables; this seems wrong. DISTINCT is very expensive and the fact you are using it implies that either NAME is not unique or that due to your implicit INNER JOINs you are getting duplicate rows. I assume it's the latter. As a result, you should very likely be using an EXISTS, not INNER JOINs.
The following is very much a guess, but I suspect it will be more performant.
SELECT BOM.*, --Replace this with an explicit list of the columns you need
SA.childParentPartProjectName
INTO #tt2
FROM #tt1 BOM
CROSS APPLY (SELECT STRING_AGG(Prj.NAME, ', ') WITHIN GROUP (ORDER BY Prj.NAME) AS childParentPartProjectName
FROM dbo.Project Prj --Don't use an alias that is longer than the object name
WHERE EXISTS (SELECT 1
FROM dbo.Part P
JOIN dbo.PartProject PP ON P.ID = PP.SOURCE_ID
WHERE PP.Related_ID = Prj.ID
AND P.ITEM_NUMBER = BOM.childParentPart
AND P.Current = 1)) SA;
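The duplicate-row argument above can be seen on a toy data set. This sketch uses Python's sqlite3 rather than SQL Server, with invented rows in which two current parts share one item number; it shows only the row-count difference between joining and EXISTS, not any timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (ID INTEGER, NAME TEXT);
CREATE TABLE Part (ID INTEGER, ITEM_NUMBER TEXT, "CURRENT" INTEGER);
CREATE TABLE PartProject (SOURCE_ID INTEGER, RELATED_ID INTEGER);
INSERT INTO Project VALUES (100, 'Alpha'), (101, 'Beta');
-- Hypothetical: two current parts carry the same item number.
INSERT INTO Part VALUES (1, 'A', 1), (2, 'A', 1);
INSERT INTO PartProject VALUES (1, 100), (2, 100), (1, 101);
""")

# Joining repeats a project once per matching part, hence the DISTINCT.
joined = [r[0] for r in conn.execute("""
    SELECT Prj.NAME
    FROM Project Prj
    JOIN PartProject PP ON PP.RELATED_ID = Prj.ID
    JOIN Part P ON P.ID = PP.SOURCE_ID
    WHERE P.ITEM_NUMBER = 'A' AND P."CURRENT" = 1
    ORDER BY Prj.NAME""")]

# EXISTS only asks whether a match is there, so each project appears once.
exists = [r[0] for r in conn.execute("""
    SELECT Prj.NAME
    FROM Project Prj
    WHERE EXISTS (SELECT 1
                  FROM Part P
                  JOIN PartProject PP ON P.ID = PP.SOURCE_ID
                  WHERE PP.RELATED_ID = Prj.ID
                    AND P.ITEM_NUMBER = 'A'
                    AND P."CURRENT" = 1)
    ORDER BY Prj.NAME""")]
print(joined, exists)
```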

SQL query: Iterate over values in table and use them in subquery

I have a simple SQL table containing some values, for example:
id | value (table 'values')
----------
0 | 4
1 | 7
2 | 9
I want to iterate over these values, and use them in a query like so:
SELECT value[0], x1
FROM (some subquery where value[0] is used)
UNION
SELECT value[1], x2
FROM (some subquery where value[1] is used)
...
etc
In order to get a result set like this:
4 | x1
7 | x2
9 | x3
It has to be in SQL as it will actually represent a database view. Of course the real query is a lot more complicated, but I tried to simplify the question while keeping the essence as much as possible.
I think I have to select from values and join the subquery, but as the value should be used in the subquery I'm lost on how to accomplish this.
Edit: I oversimplified my question; in reality I want to have 2 rows from the subquery and not only one.
Edit 2: As suggested I'm posting the real query. I simplified it a bit to make it clearer, but it's a working query and the problem is there. Note that I have hardcoded the value '2' in this query two times. I want to replace that with values from a different table, in the example table above I would want a result set of the combined results of this query with 4, 7 and 9 as values instead of the currently hardcoded 2.
SELECT x.fantasycoach_id, SUM(round_points)
FROM (
SELECT DISTINCT fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
LEFT JOIN fantasyworld_fantasyformation AS ff ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
AND _rr.sequence <= 2 /* HARDCODED USE OF VALUE */
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
AND fpc.round_sequence = 2 /* HARDCODED USE OF VALUE */
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Edit 3: I'm using PostgreSQL.
SQL works with tables as a whole, which basically involves set operations. There is no explicit iteration, and generally no need for any. In particular, the most straightforward implementation of what you described would be this:
SELECT value, (some subquery where value is used) AS x
FROM values
Do note, however, that a correlated subquery such as that is very hard on query performance. Depending on the details of what you're trying to do, it may well be possible to structure it around a simple join, an uncorrelated subquery, or a similar, better-performing alternative.
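As a rough illustration of the two shapes mentioned here, the sketch below runs a correlated subquery and an equivalent join with GROUP BY side by side. It uses Python's sqlite3 instead of PostgreSQL, the table is named vals because VALUES is a keyword, and the events table is invented purely for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vals (id INTEGER, value INTEGER);
CREATE TABLE events (seq INTEGER, points INTEGER);  -- hypothetical table
INSERT INTO vals VALUES (0, 4), (1, 7), (2, 9);
INSERT INTO events VALUES (1, 10), (5, 20), (8, 30), (9, 40);
""")

# Correlated subquery: evaluated once per row of vals.
correlated = conn.execute("""
    SELECT v.value,
           (SELECT SUM(e.points) FROM events e WHERE e.seq <= v.value) AS x
    FROM vals v
    ORDER BY v.value""").fetchall()

# The same result expressed as a join plus aggregation.
joined = conn.execute("""
    SELECT v.value, SUM(e.points) AS x
    FROM vals v
    LEFT JOIN events e ON e.seq <= v.value
    GROUP BY v.value
    ORDER BY v.value""").fetchall()
print(correlated, joined)
```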
Update:
In view of the update to the question indicating that the subquery is expected to yield multiple rows for each value in table values, contrary to the example results, it seems a better approach would be to just rewrite the subquery as the main query. If it does not already do so (and maybe even if it does) then it would join table values as another base table.
Update 2:
Given the real query now presented, this is how the values from table values could be incorporated into it:
SELECT x.fantasycoach_id, SUM(round_points) FROM
(
SELECT DISTINCT
fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
-- one row for each combination of coach and value:
CROSS JOIN values
LEFT JOIN fantasyworld_fantasyformation AS ff
ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr
ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff
ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
-- use the value obtained from values:
AND _rr.sequence <= values.value
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
-- use the value obtained from values again:
AND fpc.round_sequence = values.value
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Note in particular the CROSS JOIN which forms the cross product of two tables; this is the same thing as an INNER JOIN without any join predicate, and it can be written that way if desired.
The overall query could be at least a bit simplified, but I do not do so because it is a working example rather than an actual production query, so it is unclear what other changes would translate to the actual application.
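To see what the CROSS JOIN contributes, here is a stripped-down sketch in Python's sqlite3 with three invented tables (coach, vals, points) standing in for the real schema. One thing worth checking when adapting it: if the outer query groups by coach only, the rows for different values are summed together, so this sketch selects and groups by the value as well. That is an assumption about the wanted output, not something the question states outright.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coach (id INTEGER);
CREATE TABLE vals (value INTEGER);
CREATE TABLE points (coach_id INTEGER, round_sequence INTEGER, round_points INTEGER);
INSERT INTO coach VALUES (1), (2);
INSERT INTO vals VALUES (4), (7);
INSERT INTO points VALUES (1, 4, 5), (1, 7, 6), (2, 4, 1);
""")

# CROSS JOIN: every coach paired with every value.
pairs = conn.execute(
    "SELECT c.id, v.value FROM coach c CROSS JOIN vals v ORDER BY c.id, v.value"
).fetchall()

# The value then drives the join condition, as in the query above.
totals = conn.execute("""
    SELECT c.id, v.value, COALESCE(SUM(p.round_points), 0) AS total
    FROM coach AS c
    CROSS JOIN vals AS v
    LEFT JOIN points AS p
           ON p.coach_id = c.id AND p.round_sequence = v.value
    GROUP BY c.id, v.value
    ORDER BY c.id, v.value""").fetchall()
print(pairs, totals)
```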
In the example I create two tables. See how the outer table has an alias that you use in the inner select?
SQL Fiddle Demo
SELECT T.[value], (SELECT [property] FROM Table2 P WHERE P.[value] = T.[value])
FROM Table1 T
This is a better way for performance
SELECT T.[value], P.[property]
FROM Table1 T
INNER JOIN Table2 p
on P.[value] = T.[value];
Table 2 can be a QUERY instead of a real table
Third Option
Using a cte to calculate your values and then join back to the main table. This way you have the subquery logic separated from your final query.
WITH cte AS (
SELECT
T.[value],
T.[value] * T.[value] as property
FROM Table1 T
)
SELECT T.[value], C.[property]
FROM Table1 T
INNER JOIN cte C
on T.[value] = C.[value];
It might be helpful to extract the computation to a function that is called in the SELECT clause and is executed for each row of the result set
Here's the documentation for CREATE FUNCTION for SQL Server. It's probably similar to whatever database system you're using, and if not you can easily Google for it.
Here's an example of creating a function and using it in a query:
CREATE FUNCTION DoComputation(@parameter1 int)
RETURNS int
AS
BEGIN
-- Do some calculations here and return the function result.
-- This example returns the value of @parameter1 squared.
-- You can add additional parameters to the function definition if needed
DECLARE @Result int
SET @Result = @parameter1 * @parameter1
RETURN @Result
END
Here is an example of using the example function above in a query.
SELECT v.value, dbo.DoComputation(v.value) as ComputedValue
FROM [Values] v
ORDER BY value
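The same per-row idea can be tried without a SQL Server instance. SQLite has no CREATE FUNCTION statement, so this sketch swaps in a different mechanism: Python's sqlite3 lets the application register a scalar function with create_function. The table contents are the sample values from the question.

```python
import sqlite3

def do_computation(x):
    """Stand-in computation: return the square of the input."""
    return x * x

conn = sqlite3.connect(":memory:")
# Register the Python function under the SQL name DoComputation (1 argument).
conn.create_function("DoComputation", 1, do_computation)

conn.execute('CREATE TABLE "Values" (id INTEGER, value INTEGER)')
conn.executemany('INSERT INTO "Values" VALUES (?, ?)', [(0, 4), (1, 7), (2, 9)])

rows = conn.execute(
    'SELECT v.value, DoComputation(v.value) AS ComputedValue '
    'FROM "Values" AS v ORDER BY v.value'
).fetchall()
print(rows)
```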

SQL SERVER SELECT sum values

The problem is that when I try to get sums from 2 different tables, using a condition from a third table, the sums come out wrong. So I tried SELECT SUM() AS t1, (SELECT SUM() ...) AS t2, and this way the t1 and t2 results are correct. Now I want to add t1 and t2 together.
Here is the code:
SELECT
SUM(daa.[price]) AS t1,
(
SELECT SUM(dap.[price]) AS suma
FROM fydtr.dbo.[sales] AS dap,
[fydtr].[dbo].[work info] AS di
WHERE YEAR(di.[end of work datetime]) = 2013
AND MONTH(di.[end of work datetime]) = 12
AND di.[state] = 'e'
AND di.[reg. nr.] = dap.[reg. nr.]
) AS t2
FROM [fydtr].[dbo].[work sale] AS daa,
fydtr.dbo.[work info] AS dbi
WHERE YEAR(dbi.[end of work datetime]) = 2013
AND MONTH(dbi.[end of work datetime]) = 12
AND dbi.[state] = 'e'
AND dbi.[reg. nr.] = daa.[reg. nr.]
It gives this result:
t1 340
t2 509
And I need to sum these and get 849 as t3.
What about something like this.
select t1, t2, t1 + t2 t3
from (
the query from your question
) temp
Not completely clear, due to missing input data. But I assume you're searching for something like this:
select sum(sale.price) as t1
, sum(sales.price) as t2
, sum(sales.price) + sum(sale.price) as t3
from [work info] as info
left outer join [work sale] as sale on (info.state = 'e' and info.[reg. nr.] = sale.[reg. nr.])
left outer join [sales] as sales on (info.state = 'e' and info.[reg. nr.] = sales.[reg. nr.])
where year(info.[end of work datetime]) = 2013
and month(info.[end of work datetime]) = 12
This depends on the relations between your tables, e.g. in my example, I'm assuming that there's only one entry per [reg. nr.] in all tables. Otherwise you could use "window functions", or UNIONS or CTE's (http://msdn.microsoft.com/en-us/library/ms175972.aspx). You might need to supply more context to get the answer you're searching for.
My given query is probably a little bit cleaner than your query. If that's not an issue, or if my assumption is wrong, then Dan Bracuk's answer helps you out.
And you should probably look at the column names, too. They're a little bit too complex in my opinion :)
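Both suggestions can be compared on invented data. The sketch below uses Python's sqlite3 with simplified table and column names (work_info, work_sale, sales, reg_nr, end_dt are stand-ins for the bracketed originals), and rows chosen so that the separate sums match the 340 and 509 from the question. It also shows why the double join depends on having one entry per registration number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work_info (reg_nr INTEGER, state TEXT, end_dt TEXT);
CREATE TABLE work_sale (reg_nr INTEGER, price INTEGER);
CREATE TABLE sales (reg_nr INTEGER, price INTEGER);
INSERT INTO work_info VALUES (1, 'e', '2013-12-05'), (2, 'e', '2013-12-20'),
                             (3, 'x', '2013-12-21');
INSERT INTO work_sale VALUES (1, 100), (1, 140), (2, 100), (3, 999);
INSERT INTO sales VALUES (1, 209), (2, 300), (3, 50);
""")

# Each sum is computed on its own, then added in an outer query.
separate = conn.execute("""
    SELECT t1, t2, t1 + t2 AS t3
    FROM (SELECT
            (SELECT SUM(s.price) FROM work_sale s
               JOIN work_info i ON i.reg_nr = s.reg_nr
              WHERE i.state = 'e'
                AND strftime('%Y-%m', i.end_dt) = '2013-12') AS t1,
            (SELECT SUM(s.price) FROM sales s
               JOIN work_info i ON i.reg_nr = s.reg_nr
              WHERE i.state = 'e'
                AND strftime('%Y-%m', i.end_dt) = '2013-12') AS t2
         ) AS sums""").fetchone()

# Joining both tables at once repeats sales rows when work_sale has
# several rows for the same reg_nr, which inflates the second sum.
double_join = conn.execute("""
    SELECT SUM(ws.price), SUM(s.price)
    FROM work_info i
    LEFT JOIN work_sale ws ON i.state = 'e' AND ws.reg_nr = i.reg_nr
    LEFT JOIN sales s ON i.state = 'e' AND s.reg_nr = i.reg_nr
    WHERE strftime('%Y-%m', i.end_dt) = '2013-12'""").fetchone()
print(separate, double_join)
```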

SQL Query Assistance

This query gives all the templateId that are assigned to product id less than 5 and that's the expected output. But we wanted to achieve that without using the sub query in the where clause (Highlighted in red).
If I just remove the subquery, then the output will be all the templateIds from the templateproduct table. We don't want that. What we want is the template IDs that are only assigned to products 1 to 5. So our expected output is:
100
102
Today we are achieving this using an additional subquery; we want to achieve the same result without using the subquery.
We are using SQL Server 2008.
You can do this using a LEFT JOIN in place of a subquery, and check for NULL.
SELECT DISTINCT tp.template_id
FROM templateproduct tp
LEFT JOIN templateproduct tp2
ON tp.template_id = tp2.template_id AND tp2.prod_id IN (6, 7, 8, 9, 10)
WHERE tp.prod_id < 5
AND tp2.template_id IS NULL
You can do a similar thing using GROUP BY and check that there are 0 matching templates linking to the excluded product ids:
SELECT tp.template_id
FROM templateproduct tp
LEFT JOIN templateproduct tp2
ON tp.template_id = tp2.template_id AND tp2.prod_id IN (6, 7, 8, 9, 10)
WHERE tp.prod_id < 5
GROUP BY tp.template_id
HAVING COUNT(tp2.template_id) = 0
Depending on your data and indexing, this may or may not be more efficient than a subquery - I suggest you try it out. In any event, there is no reason to have any INNER JOINs at all to get the results you are seeking:
SELECT DISTINCT tp.template_id
FROM templateproduct tp
WHERE tp.prod_id < 5
AND tp.template_id NOT IN (
SELECT tp2.template_id FROM templateproduct tp2 WHERE tp2.prod_id IN (6, 7, 8, 9, 10)
)
Try these out and see which performs better for you. And of course, check the query plan to see why this is.
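A quick way to check that the LEFT JOIN and NOT IN variants agree is to run them on a small table. This sketch uses Python's sqlite3 with made-up rows, chosen so the answer matches the expected output 100 and 102 from the question; it says nothing about which form is faster on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE templateproduct (template_id INTEGER, prod_id INTEGER)")
# Hypothetical rows: template 101 is also linked to product 7, 103 only to 8.
conn.executemany("INSERT INTO templateproduct VALUES (?, ?)",
                 [(100, 1), (100, 3), (101, 2), (101, 7), (102, 4), (103, 8)])

# Anti-join: keep rows for which no excluded product was found.
left_join = [r[0] for r in conn.execute("""
    SELECT DISTINCT tp.template_id
    FROM templateproduct tp
    LEFT JOIN templateproduct tp2
           ON tp.template_id = tp2.template_id
          AND tp2.prod_id IN (6, 7, 8, 9, 10)
    WHERE tp.prod_id < 5
      AND tp2.template_id IS NULL
    ORDER BY tp.template_id""")]

# Subquery form of the same condition.
not_in = [r[0] for r in conn.execute("""
    SELECT DISTINCT tp.template_id
    FROM templateproduct tp
    WHERE tp.prod_id < 5
      AND tp.template_id NOT IN (
          SELECT tp2.template_id FROM templateproduct tp2
          WHERE tp2.prod_id IN (6, 7, 8, 9, 10))
    ORDER BY tp.template_id""")]
print(left_join, not_in)
```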
It's because your where clause should be tp.prod_id <= 5
Not 100% sure about the layout of your columns but something like this perhaps:
SELECT
TemplateID,
--Other fields here
FROM
Product pr INNER JOIN TemplateProduct tp on pr.prod_id = tp.prod_id
INNER JOIN Template tr on tp.templateID = tr.templateID AND tr.prod_id = pr.prod_id
WHERE
tp.prod_id < 5

sql parameterised cte query

I have a query like the following
select *
from (
select *
from callTableFunction(@paramPrev)
.....< a whole load of other joins, wheres , etc >........
) prevValues
full join
(
select *
from callTableFunction(@paramCurr)
.....< a whole load of other joins, wheres , etc >........
) currValues on prevValues.Field1 = currValues.Field1
....<other joins with the same subselect as the above two with different parameters passed in
where ........
group by ....
The following subselect is common to all the subselects in the query, bar the @param passed to the table function.
select *
from callTableFunction(@param)
.....< a whole load of other joins, wheres , etc >........
One option is for me to convert this into a function and call the function, but I don't like this as I may be changing the subselect query quite often. So I am wondering if there is an alternative using a CTE, like:
with sometable(@param1) as
(
select *
from callTableFunction(@param)
.....< a whole load of other joins, wheres , etc >........
)
select
sometable(@paramPrev) prevValues
full join sometable(@currPrev) currValues on prevValues.Field1 = currValues.Field1
where ........
group by ....
Is there any syntax like this or technique I can use like this.
This is in SQL Server 2008 R2
Thanks.
What you're trying to do is not supported syntax - CTE's cannot be parameterised in this way.
See books online - http://msdn.microsoft.com/en-us/library/ms175972.aspx.
(values in brackets after a CTE name are an optional list of output column names)
If there are only two parameter values (@paramPrev and @currPrev), you might be able to make the code a little easier to read by splitting them into two CTEs - something like this:
with prevCTE as (
select *
from callTableFunction(@paramPrev)
.....< a whole load of other joins, wheres , etc >........
)
, curCTE as (
select *
from callTableFunction(@currPrev)
.....< a whole load of other joins, wheres , etc >........
)
select *
from prevCTE prevValues
full join curCTE currValues on prevValues.Field1 = currValues.Field1
where ........
group by ....
You should be able to wrap the subqueries up as parameterized inline table-valued functions, and then use them with an OUTER JOIN:
CREATE FUNCTION wrapped_subquery(@param int) -- assuming it's an int type, change if necessary...
RETURNS TABLE
RETURN
SELECT * FROM callTableFunction(@param)
.....< a whole load of other joins, wheres , etc ........
GO
SELECT *
FROM
wrapped_subquery(@paramPrev) prevValues
FULL OUTER JOIN wrapped_subquery(@currPrev) currValues ON prevValues.Field1 = currValues.Field1
WHERE ........
GROUP BY ....
After failing to assign scalar variables before WITH, I finally got a working solution using a stored procedure and a temp table:
create proc hours_absent(@wid nvarchar(30), @start date, @end date)
as
with T1 as(
select c from t
),
T2 as(
select c from T1
)
select c from T2
order by 1, 2
OPTION(MAXRECURSION 365)
Calling the stored procedure:
if object_id('tempdb..#t') is not null drop table #t
create table #t([month] date, hours float)
insert into #t exec hours_absent '9001', '2014-01-01', '2015-01-01'
select * from #t
There may be some differences between my example and what you want depending on how your subsequent ON statements are formulated. Since you didn't specify, I assumed that all the subsequent joins were against the first table.
In my example I used literals rather than @prev, @current, but you can easily substitute variables in place of literals to achieve what you want.
-- Standin function for your table function to create working example.
CREATE FUNCTION TestMe(
@parm int)
RETURNS TABLE
AS
RETURN
(SELECT @parm AS N, 'a' AS V UNION ALL
SELECT @parm + 1, 'b' UNION ALL
SELECT @parm + 2, 'c' UNION ALL
SELECT @parm + 2, 'd' UNION ALL
SELECT @parm + 3, 'e');
go
-- This calls TestMe first with 2 then 4 then 6... (what you don't want)
-- Compare these results with those below
SELECT t1.N AS AN, t1.V as AV,
t2.N AS BN, t2.V as BV,
t3.N AS CN, t3.V as CV
FROM TestMe(2)AS t1
FULL JOIN TestMe(4)AS t2 ON t1.N = t2.N
FULL JOIN TestMe(6)AS t3 ON t1.N = t3.N;
-- Put your @vars in place of 2,4,6 adding select statements as needed
WITH params
AS (SELECT 2 AS p UNION ALL
SELECT 4 AS p UNION ALL
SELECT 6 AS p)
-- This CTE encapsulates the call to TestMe (and any other joins)
,AllData
AS (SELECT *
FROM params AS p
OUTER APPLY TestMe(p.p)) -- See! only coded once
-- Add any other necessary joins here
-- Select needs to deal with all the columns with identical names
SELECT d1.N AS AN, d1.V as AV,
d2.N AS BN, d2.V as BV,
d3.N AS CN, d3.V as CV
-- d1 gets limited to values where p = 2 in the where clause below
FROM AllData AS d1
-- Outer joins require the ANDs to restrict row multiplication
FULL JOIN AllData AS d2 ON d1.N = d2.N
AND d1.p = 2 AND d2.p = 4
FULL JOIN AllData AS d3 ON d1.N = d3.N
AND d1.p = 2 AND d2.p = 4 AND d3.p = 6
-- Since AllData actually contains all the rows we must limit the results
WHERE(d1.p = 2 OR d1.p IS NULL)
AND (d2.p = 4 OR d2.p IS NULL)
AND (d3.p = 6 OR d3.p IS NULL);
What you want to do is akin to a pivot and so the complexity of the needed query is similar to creating a pivot result without using the pivot statement.
Were you to use PIVOT, duplicate rows (such as those I included in this example) would be aggregated. This is also a solution for doing a pivot where aggregation is unwanted.
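The "code the subselect once, tag each row with its parameter" idea can be tried outside SQL Server too. Below is a reduced sketch in Python's sqlite3 with two parameters instead of three. SQLite has no table-valued functions, so an invented offsets table stands in for TestMe, and a LEFT JOIN stands in for the FULL JOIN because older SQLite versions (before 3.39) lack it. The results are therefore not identical to the T-SQL version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for TestMe(@parm): it returns @parm + d for each row here.
CREATE TABLE offsets (d INTEGER, v TEXT);
INSERT INTO offsets VALUES (0, 'a'), (1, 'b'), (2, 'c'), (2, 'd'), (3, 'e');
""")

rows = conn.execute("""
    WITH params(p) AS (VALUES (2), (4)),
    -- The shared subselect is written once and keyed by its parameter p.
    all_data AS (
        SELECT params.p, params.p + offsets.d AS n, offsets.v
        FROM params CROSS JOIN offsets
    )
    SELECT prev.n, prev.v, cur.n, cur.v
    FROM all_data AS prev
    LEFT JOIN all_data AS cur
           ON cur.n = prev.n AND cur.p = 4
    WHERE prev.p = 2
    ORDER BY prev.n, prev.v""").fetchall()
print(rows)
```

Note that the duplicate N value (the 'c' and 'd' rows) survives as two result rows, which is the non-aggregating behaviour described above.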