SQL query: Iterate over values in table and use them in subquery - sql

I have a simple SQL table containing some values, for example:
id | value (table 'values')
----------
0 | 4
1 | 7
2 | 9
I want to iterate over these values, and use them in a query like so:
SELECT value[0], x1
FROM (some subquery where value[0] is used)
UNION
SELECT value[1], x2
FROM (some subquery where value[1] is used)
...
etc
In order to get a result set like this:
4 | x1
7 | x2
9 | x3
It has to be in SQL as it will actually represent a database view. Of course the real query is a lot more complicated, but I tried to simplify the question while keeping the essence as much as possible.
I think I have to select from values and join the subquery, but as the value should be used in the subquery I'm lost on how to accomplish this.
Edit: I oversimplified my question; in reality I want to have 2 rows from the subquery and not only one.
Edit 2: As suggested I'm posting the real query. I simplified it a bit to make it clearer, but it's a working query and the problem is there. Note that I have hardcoded the value '2' in this query two times. I want to replace that with values from a different table, in the example table above I would want a result set of the combined results of this query with 4, 7 and 9 as values instead of the currently hardcoded 2.
SELECT x.fantasycoach_id, SUM(round_points)
FROM (
SELECT DISTINCT fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
LEFT JOIN fantasyworld_fantasyformation AS ff ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
AND _rr.sequence <= 2 /* HARDCODED USE OF VALUE */
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
AND fpc.round_sequence = 2 /* HARDCODED USE OF VALUE */
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Edit 3: I'm using PostgreSQL.

SQL works with tables as a whole, which basically involves set operations. There is no explicit iteration, and generally no need for any. In particular, the most straightforward implementation of what you described would be this:
SELECT value, (some subquery where value is used) AS x
FROM values
Do note, however, that a correlated subquery such as that is very hard on query performance. Depending on the details of what you're trying to do, it may well be possible to structure it around a simple join, an uncorrelated subquery, or a similar, better-performing alternative.
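As a rough illustration of the correlated-subquery form, here is a minimal sketch that runs on Python's built-in sqlite3 module instead of PostgreSQL, purely so it can be tried anywhere. The table is called vals here because VALUES is a keyword in SQLite, and the scores table with its columns is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vals (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO vals VALUES (0, 4), (1, 7), (2, 9);
    -- hypothetical data that the subquery reads from
    CREATE TABLE scores (round INTEGER, points INTEGER);
    INSERT INTO scores VALUES (3, 10), (5, 20), (8, 30), (9, 40);
""")

# The scalar subquery refers to v.value, so it is evaluated per row of vals.
rows = conn.execute("""
    SELECT v.value,
           (SELECT SUM(s.points) FROM scores s WHERE s.round <= v.value) AS x
    FROM vals v
    ORDER BY v.id
""").fetchall()
print(rows)
```

No loop is written anywhere: the outer SELECT supplies each value in turn, which is the set-based equivalent of the iteration asked about.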
Update:
In view of the update to the question indicating that the subquery is expected to yield multiple rows for each value in table values, contrary to the example results, it seems a better approach would be to just rewrite the subquery as the main query. If it does not already do so (and maybe even if it does) then it would join table values as another base table.
Update 2:
Given the real query now presented, this is how the values from table values could be incorporated into it:
SELECT x.fantasycoach_id, x.value, SUM(round_points)
FROM (
    SELECT DISTINCT
        fc.id AS fantasycoach_id,
        -- carry the value through so the result can be reported per value:
        values.value AS value,
        ffv.formation_id AS formation_id,
        fpc.round_sequence AS round_sequence,
        round_points,
        fpc.fantasyplayer_id
    FROM fantasyworld_FantasyCoach AS fc
    -- one row for each combination of coach and value:
    CROSS JOIN values
    LEFT JOIN fantasyworld_fantasyformation AS ff
        ON ff.id = (
            SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
            FROM fantasyworld_fantasyformationvalidity
            LEFT JOIN realworld_round AS _rr
                ON _rr.id = round_id
            LEFT JOIN fantasyworld_fantasyformation AS _ff
                ON _ff.id = formation_id
            WHERE is_valid = TRUE
              AND _ff.coach_id = fc.id
              -- use the value obtained from values:
              AND _rr.sequence <= values.value
        )
    LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
        ON ffp.formation_id = ff.id
    LEFT JOIN dbcache_fantasyplayercache AS fpc
        ON ffp.player_id = fpc.fantasyplayer_id
        -- use the value obtained from values again:
        AND fpc.round_sequence = values.value
    LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
        ON ffv.formation_id = ff.id
) x
-- group by the value as well; otherwise the sums for all values are merged per coach:
GROUP BY fantasycoach_id, value
Note in particular the CROSS JOIN which forms the cross product of two tables; this is the same thing as an INNER JOIN without any join predicate, and it can be written that way if desired.
The overall query could be at least a bit simplified, but I do not do so because it is a working example rather than an actual production query, so it is unclear what other changes would translate to the actual application.
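To see the CROSS JOIN idea in isolation, here is a small sketch using Python's sqlite3 module as a stand-in for PostgreSQL. The tables vals, coach and points, and all their columns, are made up for illustration and are far simpler than the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vals (value INTEGER);
    INSERT INTO vals VALUES (1), (2);
    CREATE TABLE coach (id INTEGER PRIMARY KEY);
    INSERT INTO coach VALUES (10), (20);
    CREATE TABLE points (coach_id INTEGER, round_sequence INTEGER, round_points INTEGER);
    INSERT INTO points VALUES (10, 1, 5), (10, 2, 7), (10, 2, 1), (20, 1, 3);
""")

# CROSS JOIN pairs every coach with every value; the LEFT JOIN then uses the
# value as a join condition, and grouping by both keeps one row per pair.
rows = conn.execute("""
    SELECT c.id, v.value, SUM(p.round_points) AS total
    FROM coach c
    CROSS JOIN vals v
    LEFT JOIN points p
           ON p.coach_id = c.id
          AND p.round_sequence = v.value
    GROUP BY c.id, v.value
    ORDER BY c.id, v.value
""").fetchall()
print(rows)
```

Coach 20 has no points for value 2, so the LEFT JOIN still produces that combination with a NULL total (None in Python) rather than dropping it.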

In this example I create two tables. Note how the outer table has an alias that the inner SELECT refers to.
SELECT T.[value], (SELECT [property] FROM Table2 P WHERE P.[value] = T.[value])
FROM Table1 T
A join is usually the better-performing way to write this:
SELECT T.[value], P.[property]
FROM Table1 T
INNER JOIN Table2 p
on P.[value] = T.[value];
Table2 can be a subquery (a derived table) instead of a real table.
Third option:
Use a CTE to calculate your values and then join it back to the main table. This way the subquery logic is kept separate from your final query.
WITH cte AS (
SELECT
T.[value],
T.[value] * T.[value] as property
FROM Table1 T
)
SELECT T.[value], C.[property]
FROM Table1 T
INNER JOIN cte C
on T.[value] = C.[value];

It might be helpful to extract the computation into a function that is called in the SELECT clause and executed for each row of the result set.
Here's the documentation for CREATE FUNCTION for SQL Server. It's probably similar to whatever database system you're using, and if not you can easily Google for it.
Here's an example of creating a function and using it in a query:
CREATE FUNCTION DoComputation(@parameter1 int)
RETURNS int
AS
BEGIN
    -- Do some calculations here and return the function result.
    -- This example returns the value of @parameter1 squared.
    -- You can add additional parameters to the function definition if needed.
    DECLARE @Result int
    SET @Result = @parameter1 * @parameter1
    RETURN @Result
END
Here is an example of using the example function above in a query.
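-- SQL Server requires a schema-qualified name when calling a scalar function;
-- dbo is assumed here as the schema the function was created in.
SELECT v.value, dbo.DoComputation(v.value) AS ComputedValue
FROM [Values] v
ORDER BY v.value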
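The same per-row function idea can be tried without a SQL Server instance. The sketch below uses Python's sqlite3 module, which lets a Python callable be registered as a SQL function through Connection.create_function; the table vals and the function body are illustrative assumptions, not part of the original question.

```python
import sqlite3

def do_computation(parameter1):
    """Return the input squared, mirroring the T-SQL example above."""
    return parameter1 * parameter1

conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL name, taking exactly one argument.
conn.create_function("DoComputation", 1, do_computation)
conn.executescript("""
    CREATE TABLE vals (value INTEGER);
    INSERT INTO vals VALUES (4), (7), (9);
""")

# The function is invoked once for each row produced by the query.
rows = conn.execute("""
    SELECT v.value, DoComputation(v.value) AS ComputedValue
    FROM vals v
    ORDER BY v.value
""").fetchall()
print(rows)
```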

Related

How to get different data from two different tables in SQL query?

I have two tables named Soft and Web, each containing multiple rows. I want the rows that differ between the two tables. For example:
The Soft table contains 5 rows, i.e.
The Web table also contains 5 rows, i.e.
Now I want this output, i.e.
I have written a query, but unfortunately it didn't succeed. Here is my query:
SELECT DISTINCT soft.GSTNo AS SoftGST
,web.GSTNo AS WebGST
,soft.InvoiceNumber AS SoftInvoice
,web.InvoiceNumber AS WebInvoice
,soft.Rate AS SoftRate
,web.Rate AS WebRate
FROM soft
LEFT OUTER JOIN web ON web.GstNo = soft.GSTNo
AND web.InvoiceNumber = soft.invoicenumber
AND web.rate = soft.rate
I also tried an inner join, but that didn't work either.
You can achieve this with:
;WITH cte_soft AS
(SELECT * FROM soft
EXCEPT
SELECT * FROM web)
,cte_web AS
(SELECT * FROM web
EXCEPT
SELECT * FROM soft)
SELECT *
FROM
(SELECT gst softgst, NULL webgst, invoice softinvoice, NULL webinvoice, rate softrate, NULL webrate
FROM cte_soft
UNION ALL
SELECT NULL, gst, NULL, invoice, NULL , rate
FROM cte_web) tbl
ORDER BY coalesce(softgst, webgst),coalesce(softinvoice,webinvoice)
You can use full join:
SELECT s.gst as softgst, w.gst as webgst,
s.invoice as softinvoice, w.invoice as webinvoice,
s.rate as softrate, w.rate as webrate
FROM soft s FULL JOIN
web w
ON s.gst = w.gst AND s.invoice = w.invoice AND s.rate = w.rate
WHERE s.gst IS NULL OR w.gst IS NULL
ORDER BY COALESCE(s.gst, w.gst), COALESCE(s.invoice, w.invoice);
No subqueries or CTEs are needed. This is really just a slight variant of your query.
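For readers who want to experiment, here is a hedged sketch of the EXCEPT-based approach using Python's sqlite3 module, chosen because EXCEPT works there on any reasonably recent version. The sample rows are invented, and an extra tie-breaking sort key (softgst IS NULL) is added so the output order is deterministic; neither is from the original answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE soft (gst TEXT, invoice TEXT, rate INTEGER);
    CREATE TABLE web  (gst TEXT, invoice TEXT, rate INTEGER);
    INSERT INTO soft VALUES ('G1', 'A1', 100), ('G2', 'A2', 200), ('G3', 'A3', 300);
    INSERT INTO web  VALUES ('G1', 'A1', 100), ('G2', 'A2', 250), ('G4', 'A4', 400);
""")

# Rows only in soft fill the soft* columns, rows only in web fill the web*
# columns; rows present in both tables drop out of both EXCEPT results.
rows = conn.execute("""
    WITH only_soft AS (SELECT * FROM soft EXCEPT SELECT * FROM web),
         only_web  AS (SELECT * FROM web  EXCEPT SELECT * FROM soft)
    SELECT *
    FROM (
        SELECT gst AS softgst, NULL AS webgst,
               invoice AS softinvoice, NULL AS webinvoice,
               rate AS softrate, NULL AS webrate
        FROM only_soft
        UNION ALL
        SELECT NULL, gst, NULL, invoice, NULL, rate
        FROM only_web
    ) AS diff
    ORDER BY COALESCE(softgst, webgst),
             COALESCE(softinvoice, webinvoice),
             softgst IS NULL  -- tie-breaker: soft row before web row
""").fetchall()
for row in rows:
    print(row)
```

The row ('G1', 'A1', 100) exists in both tables and is therefore absent from the result, while the G2 invoice appears twice because its rate differs between the two sources.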

Problems shortening a SQL query

I am trying to make a query that works with a temp table work without that temp table.
I tried doing a join in the subquery without the temp table, but I don't get the same results as the query with the temp table.
This is the query with the temp table that works as I want:
create table #results(
RowId id_t,
LastUpdatedAt date_T
)
insert into #results
select H.RowId, H.LastUpdatedAt from MemberCarrierMap M Join MemberCarrierMapHistory H on M.RowId = H.RowId
update MemberCarrierMap
set CreatedAt = (select MIN(LastUpdatedAt) from #results r where r.rowId = MemberCarrierMap.rowId)
Where CreatedAt is null;
and here is the query I tried without the temp table that doesn't work like the above:
update MemberCarrierMap
set CreatedAt = (select MIN(MH.LastUpdatedAt) from MemberCarrierMapHistory MH join MemberCarrierMap M on MH.RowId = M.RowId where MH.RowId = M.RowId )
Where CreatedAt is null;
I was expecting the second query to work like the first, but it does not. Any suggestions on how to achieve what the first query does without the temp table?
This should work:
update M
set M.CreatedAt = (select MIN(MH.LastUpdatedAt) from MemberCarrierMapHistory MH WHERE MH.RowId = M.RowId)
FROM MemberCarrierMap M
Where M.CreatedAt is null;
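The difference between the failing attempt and the correlated version can be reproduced on a toy data set. The sketch below uses Python's sqlite3 module rather than SQL Server, so the UPDATE refers to the target table by name instead of using UPDATE ... FROM; the sample dates are invented.

```python
import sqlite3

def fresh_db():
    """Build a small in-memory copy of the two tables with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE MemberCarrierMap (RowId INTEGER PRIMARY KEY, CreatedAt TEXT);
        CREATE TABLE MemberCarrierMapHistory (RowId INTEGER, LastUpdatedAt TEXT);
        INSERT INTO MemberCarrierMap VALUES (1, NULL), (2, NULL), (3, '2020-01-01');
        INSERT INTO MemberCarrierMapHistory VALUES
            (1, '2021-05-01'), (1, '2021-03-01'),
            (2, '2022-07-01'), (3, '2019-01-01');
    """)
    return conn

SELECT_ALL = "SELECT RowId, CreatedAt FROM MemberCarrierMap ORDER BY RowId"

# The question's attempt: the inner alias M is a second copy of the table, so
# the subquery never refers to the row being updated and yields one global minimum.
conn = fresh_db()
conn.execute("""
    UPDATE MemberCarrierMap
    SET CreatedAt = (SELECT MIN(MH.LastUpdatedAt)
                     FROM MemberCarrierMapHistory MH
                     JOIN MemberCarrierMap M ON MH.RowId = M.RowId
                     WHERE MH.RowId = M.RowId)
    WHERE CreatedAt IS NULL
""")
uncorrelated = conn.execute(SELECT_ALL).fetchall()

# Correlated version: the subquery is tied to the row being updated.
conn = fresh_db()
conn.execute("""
    UPDATE MemberCarrierMap
    SET CreatedAt = (SELECT MIN(MH.LastUpdatedAt)
                     FROM MemberCarrierMapHistory MH
                     WHERE MH.RowId = MemberCarrierMap.RowId)
    WHERE CreatedAt IS NULL
""")
correlated = conn.execute(SELECT_ALL).fetchall()

print(uncorrelated)
print(correlated)
```

In the first run, rows 1 and 2 both receive the overall minimum date, which belongs to row 3's history; in the second run each row receives the minimum of its own history.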
Your question is more or less a duplicate of this answer. There, you will find multiple solutions. But the ones that implement correlated subqueries are less performant than the one that simply uses an uncorrelated aggregation subquery inside a join.
Applying it to your situation, you will have this:
update m
set m.createdAt = hAgg.minVal
from memberCarrierMap m
join (
    -- earliest history entry per row, matching the MIN used in the question
    select rowId, min(lastUpdatedAt) as minVal
    from memberCarrierMapHistory
    group by rowId
) as hAgg
    on m.rowId = hAgg.rowId
where m.createdAt is null;
Basically, it's more performant because it is more expensive to run aggregations and filterings on a row-by-row basis (which is what happens in a correlated subquery) than to just get the aggregations out of the way all at once (joins tend to happen early in processing) and perform the match afterwards.

Best way to compare two sets of data w/ SQL

What I have is a query that grabs a set of data. This query is run at a certain time. Then, 30 minutes later, I have another query (same syntax) that runs and grabs that same set of data. Finally, I have a third query (which is the query in question) that compares both sets of data. The records it pulls out are the ones that satisfy: "FEDVIP_Active" was FALSE in the first data set and TRUE in the second data set, OR "UniqueID" didn't exist in the first data set and does in the second data set AND FEDVIP_Active is TRUE. I'm questioning the performance of the query below that does the comparison. It times out after 30 minutes. Is there anything you can see that I should be doing differently to make it run more efficiently? The two nearly identical data sets I'm comparing have around a million records each.
First query that grabs the initial set of data:
select Unique_ID, First_Name, FEDVIP_Active, Email_Primary
from Master_Subscribers_Prospects
Second query is exactly the same as the first.
Then, the third query below compares the data:
select
a.FEDVIP_Active,
a.Unique_ID,
a.First_Name,
a.Email_Primary
from
Master_Subscribers_Prospects_1 a
inner join
Master_Subscribers_Prospects_2 b
on 1 = 1
where a.FEDVIP_Active = 1 and b.FEDVIP_Active = 0 or
(b.Unique_ID not in (select Unique_ID from Master_Subscribers_Prospects_1) and b.FEDVIP_Active = 1)
If I understand correctly, you want all records from the second data set where the corresponding unique id in the first data set is not active (either by not existing or by having the flag set to not active).
I would suggest exists:
select a.*
from Master_Subscribers_Prospects_1 a
where a.FEDVIP_Active = 1 and
not exists (select 1
from Master_Subscribers_Prospects_2 b
where b.Unique_ID = a.Unique_ID and
b.FEDVIP_Active = 1
);
For performance, you want an index on Master_Subscribers_Prospects_2(Unique_ID, FEDVIP_Active).
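A small runnable sketch of the NOT EXISTS pattern, using Python's sqlite3 module as a stand-in database. The table names snap_1 and snap_2 and their contents are hypothetical; the query has the same shape as the one above, including the suggested composite index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snap_1 (Unique_ID INTEGER, FEDVIP_Active INTEGER);
    CREATE TABLE snap_2 (Unique_ID INTEGER, FEDVIP_Active INTEGER);
    -- composite index so the NOT EXISTS probe can be answered from the index
    CREATE INDEX idx_snap_2 ON snap_2 (Unique_ID, FEDVIP_Active);
    INSERT INTO snap_1 VALUES (1, 1), (2, 1), (3, 1), (4, 0);
    INSERT INTO snap_2 VALUES (1, 1), (2, 0), (4, 0);
""")

# Keep active rows of snap_1 that have no active counterpart in snap_2.
# One NOT EXISTS covers both "flag differs" (id 2) and "row missing" (id 3).
rows = conn.execute("""
    SELECT a.Unique_ID
    FROM snap_1 a
    WHERE a.FEDVIP_Active = 1
      AND NOT EXISTS (SELECT 1
                      FROM snap_2 b
                      WHERE b.Unique_ID = a.Unique_ID
                        AND b.FEDVIP_Active = 1)
    ORDER BY a.Unique_ID
""").fetchall()
print(rows)
```

Each outer row triggers at most one index lookup, so the work grows roughly with the size of one table rather than with the product of both, which is the problem with the ON 1 = 1 join.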
An inner join on 1 = 1 is a disguised cross join, and the number of rows a cross join produces can grow rapidly: it is the product of the number of rows in the two relations involved. With around a million rows on each side, that is on the order of 10^12 rows. For performance you want to keep intermediate results as small as possible.
Also, EXISTS often performs better than IN when the subquery returns a large number of rows.
But I think you don't need IN or EXISTS at all.
Assuming unique_id identifies a record and is not null, you could left join the second table to the first one on their common unique_ids. Then, for a row of the second table, the first table's unique_id in the join result is null if and only if no record with that unique_id exists in the first table, so you can check for that.
SELECT b.fedvip_active,
b.unique_id,
b.first_name,
b.email_primary
FROM master_subscribers_prospects_2 b
LEFT JOIN master_subscribers_prospects_1 a
ON b.unique_id = a.unique_id
WHERE (a.fedvip_active = 1
       AND b.fedvip_active = 0)
   OR (a.unique_id IS NULL
       AND b.fedvip_active = 1);
For that query indexes on master_subscribers_prospects_1 (unique_id, fedvip_active) and master_subscribers_prospects_2 (unique_id, fedvip_active) might also help to speed things up.
Doing an inner select in the WHERE clause is often bad for performance.
Here is an equivalent version with a left join that might work for you.
select
a.FEDVIP_Active,
a.Unique_ID,
a.First_Name,
a.Email_Primary
from
Master_Subscribers_Prospects_1 a
inner join
Master_Subscribers_Prospects_2 b on 1 = 1
left join Master_Subscribers_Prospects_1 sa on sa.Unique_ID = b.Unique_ID
where (a.FEDVIP_Active = 1 and b.FEDVIP_Active = 0) or
(sa.Unique_ID is null and b.FEDVIP_Active = 1)

Exists, or Within

I'm writing some filtering logic that first checks whether there's a value in the filter table and, if there is, returns the filtered values. When there isn't a value in the filter table, it should just return everything. The following query does this correctly, but I have to write the same select statement twice.
select *
from personTbl
where (not exists (select filterValue from filterTable where filterType = 'name') or
personTbl.name in (select filterValue from filterTable where filterType = 'name'))
Is there some better way to do this that will return true if the table is empty, or the value is contained within it?
One approach is to do a left outer join to your filter-subquery, and then select all the rows where the join either failed (meaning that the subquery returned no rows) or succeeded and had the right value:
SELECT personTbl.*
FROM personTbl
LEFT
OUTER
JOIN ( SELECT DISTINCT filterValue
FROM filterTable
WHERE filterType = 'name'
) filter
ON 1 = 1
WHERE filter.filterValue = personTbl.name
OR filter.filterValue IS NULL
;
To be honest, I'm not sure if the above is a good idea: it's not very intuitive [1], and I'm not sure how well it will perform, but you can judge for yourself.
[1] As evidence of its unintuitiveness, witness the mistaken claim below that it doesn't work. As of this writing, that comment has garnered two upvotes. So although the query is correct, it clearly inspires people to great confidence that it's wrong. Which is a nice party trick, but not generally desirable in production code.
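Since the LEFT OUTER JOIN ... ON 1 = 1 query is easy to doubt, here is a small sketch that exercises both cases with Python's sqlite3 module. The sample names are invented; the table and column names follow the question.

```python
import sqlite3

QUERY = """
    SELECT p.name
    FROM personTbl p
    LEFT OUTER JOIN (SELECT DISTINCT filterValue
                     FROM filterTable
                     WHERE filterType = 'name') f
                 ON 1 = 1
    WHERE f.filterValue = p.name
       OR f.filterValue IS NULL
    ORDER BY p.name
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personTbl (name TEXT);
    CREATE TABLE filterTable (filterType TEXT, filterValue TEXT);
    INSERT INTO personTbl VALUES ('Ann'), ('Bob'), ('Cid');
""")

# Case 1: no filter rows, so every person is padded with a NULL filterValue.
unfiltered = [r[0] for r in conn.execute(QUERY)]

# Case 2: filter rows exist, so only the matching names survive the WHERE.
conn.execute("INSERT INTO filterTable VALUES ('name', 'Ann'), ('name', 'Cid')")
filtered = [r[0] for r in conn.execute(QUERY)]

print(unfiltered)
print(filtered)
```

One caveat worth keeping in mind: a filter row whose filterValue is itself NULL would also satisfy the IS NULL branch and let every person through.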
You can use a collection to try to make the query more intuitive (and only require a single select from the filter table):
CREATE TYPE filterlist IS TABLE OF VARCHAR2(100);
/
SELECT p.*
FROM PersonTbl p
INNER JOIN
( SELECT CAST(
MULTISET(
SELECT filterValue
FROM filterTable
WHERE filterType = 'name'
)
AS filterlist
) AS filters
FROM DUAL ) f
ON ( f.filters IS EMPTY OR p.name MEMBER OF f.filters );

How do I INSERT INTO where many fields have their own Select Statements?

I created a table and I am in the process of inserting rows from another table into it. However, some of these rows require joins to other tables. To my knowledge, this means using a subquery in the statement. The problem is that subqueries only return one result, where I may have many. I want to return -1 where no record exists. Here is the example I am using, but it is not working:
INSERT INTO [BDW_ReportPrototype].[dbo].[CustomerCreditFact]
( [MortgageDimID]
,[LeaseDimID]
,[OREODimID]
,[OfficerTypeDimID] )
SELECT
--[MortgageDimID]
-2
--LeaseDimID
,-2
--OREODimID
,-2
,CASE WHEN OfficerTypeDimID IS NULL THEN -1 ELSE OfficerTypeDimID END
FROM Staging_FDB_LN_CPDM_Daily LCD
LEFT OUTER JOIN ERMA..OfficerTypeDim OTD on OTD.OfficerNum = LCD.OFFICER
FROM dbo.Staging_FDB_LN_CPDM_Daily
Try this SQL statement:
SELECT CASE WHEN OfficerTypeDimID IS NULL THEN -1 ELSE OfficerTypeDimID END
FROM Staging_FDB_LN_CPDM_Daily LCD
LEFT OUTER JOIN ERMA..OfficerTypeDim OTD on OTD.OfficerNum = LCD.OFFICER
I would rework your query like the following.
First of all, use a LEFT OUTER JOIN in your query instead of the subqueries. This type of join returns every row from the left table whether or not a matching row exists in the "other" table.
Now that you know you'll have all your rows, you'll want to check whether there is a value or not. The COALESCE function is a shorthand that is easier to maintain than CASE: it takes a list of values (column names, variables or hard-coded values) and returns the first non-null one. Here we supply -1 as the fallback:
INSERT INTO
[BDW_ReportPrototype].[dbo].[CustomerCreditFact]
(
[OfficerTypeDimID]
)
SELECT
-- coalesce returns the first non-null value
COALESCE(OTD.OfficerTypeDimID, -1) AS OfficerTypeDimID
FROM
dbo.Staging_FDB_LN_CPDM_Daily LCD
LEFT OUTER JOIN
ERMA..OfficerTypeDim OTD
ON OTD.OfficerNum = LCD.OFFICER
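The LEFT OUTER JOIN plus COALESCE pattern can be checked on a toy schema. This sketch uses Python's sqlite3 module; the tables staging, officer_dim and fact, and the loan_id column, are simplified stand-ins invented for the example rather than the real warehouse tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (loan_id INTEGER, officer TEXT);
    CREATE TABLE officer_dim (officer_type_dim_id INTEGER, officer_num TEXT);
    CREATE TABLE fact (loan_id INTEGER, officer_type_dim_id INTEGER);
    INSERT INTO staging VALUES (1, 'A'), (2, 'B'), (3, 'Z');
    INSERT INTO officer_dim VALUES (100, 'A'), (200, 'B');

    -- Every staging row is kept; a missing dimension row becomes -1.
    INSERT INTO fact (loan_id, officer_type_dim_id)
    SELECT s.loan_id, COALESCE(d.officer_type_dim_id, -1)
    FROM staging s
    LEFT OUTER JOIN officer_dim d ON d.officer_num = s.officer;
""")

rows = conn.execute(
    "SELECT loan_id, officer_type_dim_id FROM fact ORDER BY loan_id"
).fetchall()
print(rows)
```

Officer 'Z' has no dimension row, so loan 3 is inserted with -1 instead of being dropped or left NULL.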
Maybe something along these lines:
INSERT INTO [BDW_ReportPrototype].[dbo].[CustomerCreditFact]
([OfficerTypeDimID])
Select OfficerTypeDimID
from ERMA..OfficerTypeDim OTD
inner JOIN Staging_FDB_LN_CPDM_Daily LCD
on OTD.OfficerNum = LCD.OFFICER
UNION ALL
SELECT -1
FROM dbo.Staging_FDB_LN_CPDM_Daily LCD
WHERE NOT EXISTS
(
Select OfficerTypeDimID from ERMA..OfficerTypeDim
OTD
WHERE
OTD.OfficerNum = LCD.OFFICER
)