I have a somewhat large query (~8 tables being joined, 100 lines of SQL or so) that I use frequently to join many different data sources together. I would like to be able to pass parameters into the query so that my teammates and I can work from the same code base and change only the time period that we are looking at.
Example code (obviously my actual code is more involved than this, but these are some of the things that I need to do):
SELECT t1.* , x.col1, x.SUM_col3
FROM table1 t1
LEFT JOIN
(
SELECT t2.col1, t3.col2, SUM(t3.col3) as SUM_col3
FROM table2 t2
INNER JOIN table3 t3
ON t2.PI = t3.SI
WHERE t3.col2 NOT LIKE :parameter1
GROUP BY 1,2
QUALIFY ROW_NUMBER() OVER(PARTITION BY t2.col1 ORDER BY t3.col1) = 1
) x
ON t1.col1 = x.col1
WHERE t1.START_DATE >= :parameter2
Solutions I have considered:
Using the '?' prefix in order to enter the parameters at run time. I find this method to be quite inefficient as the entered parameter is retained in the code after running the code.
Using Dynamic SQL to create a procedure which can then be called. I'm struggling to get this method to work, as it seems to be intended for relatively short queries.
How can I best structure my query and code so that I can run something like CALL MY_QUERY(:parameter1,:parameter2), in such a way that it either creates the resulting table in memory (less preferred) or returns a result set that I can then store or use myself (more preferred)?
What you want is a macro in Teradata. A macro is pretty much just a parameterized view, which is exactly what you are describing here.
CREATE MACRO myMacro (parameter1 VARCHAR(20), parameter2 DATE)
AS
(
SELECT t1.*, x.col1, x.SUM_col3
FROM table1 t1
LEFT JOIN
(
SELECT t2.col1, t3.col2, SUM(t3.col3) AS SUM_col3
FROM table2 t2
INNER JOIN table3 t3
ON t2.PI = t3.SI
WHERE t3.col2 NOT LIKE :parameter1
GROUP BY 1,2
QUALIFY ROW_NUMBER() OVER(PARTITION BY t2.col1 ORDER BY t3.col1) = 1
) x
ON t1.col1 = x.col1
WHERE t1.START_DATE >= :parameter2;
);
To call it:
Execute myMacro('Harold', DATE '2015-01-01');
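If the result set is consumed from a client program rather than inside Teradata, the same pattern (one shared query text, values bound at call time) could look roughly like the sketch below. This is only an illustration under several assumptions: it uses Python's built-in sqlite3 module as a stand-in database, the tables and rows are made up, the query is a simplified version of the one above (SQLite has no QUALIFY), and my_query is a hypothetical helper name.

```python
import sqlite3

# Hypothetical stand-in data; SQLite is used only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, start_date TEXT);
    CREATE TABLE table3 (col1 INTEGER, col2 TEXT, col3 INTEGER);
    INSERT INTO table1 VALUES (1, '2014-06-01'), (2, '2015-03-01'), (3, '2015-07-01');
    INSERT INTO table3 VALUES (2, 'Harold', 10), (2, 'Maude', 5), (3, 'Maude', 7);
""")

# The query text is written once; only the bound values change per call.
MY_QUERY = """
    SELECT t1.col1, x.sum_col3
    FROM table1 t1
    LEFT JOIN (
        SELECT col1, SUM(col3) AS sum_col3
        FROM table3
        WHERE col2 NOT LIKE :parameter1
        GROUP BY col1
    ) x ON t1.col1 = x.col1
    WHERE t1.start_date >= :parameter2
    ORDER BY t1.col1
"""

def my_query(parameter1, parameter2):
    """Run the shared query with the caller's parameters and return all rows."""
    params = {"parameter1": parameter1, "parameter2": parameter2}
    return conn.execute(MY_QUERY, params).fetchall()

rows = my_query("Harold", "2015-01-01")
print(rows)
```

Each teammate calls my_query with their own values, and nothing is written back into the shared query text.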
I'm new to SQL, and I'm trying to determine if you can define objects in it like you can in R. For instance, I have a query where I need to make sure the date is the same in each table.
SELECT *
FROM table_1 AS t1
LEFT JOIN table_2 AS t2
ON t1.id = t2.id
AND t2.date = '2021-02-14'
WHERE t1.date = '2021-02-14';
I want to run this query over and over, but it's cumbersome to change the date each time and opens the chance for typos. If this were R, I could do something like:
my_date <- '2021-02-14'
SELECT *
FROM table_1 AS t1
LEFT JOIN table_2 AS t2
ON t1.id = t2.id
AND t2.date = my_date
WHERE t1.date = my_date;
And only have to change it in one place. Is something like this possible in SQL? If not, what would you recommend as the easiest way to avoid copying and pasting or using Control + F?
Currently, my main focus is on dates, but if you have a solution that would work for any data type (e.g., a number value that's used in many WHERE clauses), that'd be awesome.
Update:
A coworker shared with me his solution that works and is extendable beyond dates. You make a common-table expression (CTE) at the start of your query that contains your filters. Then, plug it in throughout, so you only need to change it in one place. For instance:
WITH filt AS (
SELECT
*
FROM (
VALUES
('2021-02-14')
) AS tbl (date_filter)
)
SELECT
*
FROM table_1
WHERE
date IN (
SELECT
date_filter
FROM filt
)
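Here is a minimal sketch of how the same CTE could feed the filter to both tables of the original two-table query, so the date is still written only once. It is run through Python's sqlite3 module purely to keep it self-contained; the sample rows are invented, and the VALUES-in-CTE spelling shown is the one SQLite accepts, which may differ slightly from your database.

```python
import sqlite3

# Invented sample rows for the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, date TEXT);
    CREATE TABLE table_2 (id INTEGER, date TEXT);
    INSERT INTO table_1 VALUES (1, '2021-02-14'), (2, '2021-02-14'), (3, '2021-02-15');
    INSERT INTO table_2 VALUES (1, '2021-02-14'), (2, '2021-02-13');
""")

# The date appears once, in the CTE; both tables are filtered against it.
query = """
    WITH filt(date_filter) AS (VALUES ('2021-02-14'))
    SELECT t1.id, t2.id
    FROM filt
    JOIN table_1 AS t1
      ON t1.date = filt.date_filter
    LEFT JOIN table_2 AS t2
      ON t2.id = t1.id AND t2.date = filt.date_filter
    ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Putting the table_2 condition in the ON clause, rather than in WHERE, keeps the LEFT JOIN from silently turning into an inner join.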
Always had a hard time wrapping my head around GROUP BY functionality, and this one is no exception.
I have a simple Join query as such
Select t1.g1, t1.g2, t2.id, t2.datetime, t3.name
From ((table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Inner Join table3 t3 on t1.fld2=t3.fld2)
Order By t2.datetime, t2.id
This returns my data as expected. Here are some sample rows that illustrate what I am trying to retrieve with Group By...
t1.g1 | t1.g2 | t2.id | t2.datetime | t3.name
726   | 4506  | 32    | 9/12/2021   | nameA
726   | 4506  | 33    | 9/12/2021   | nameB
726   | 4506  | 30    | 9/13/2021   | nameC
I want to grab ONLY the first row in each Group of t1.g1, t1.g2.
So, I try the following:
Select t1.g1, t1.g2, FIRST(t2.id), FIRST(t2.datetime), FIRST(t3.name)
From ((table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Inner Join table3 t3 on t1.fld2=t3.fld2)
Group By t1.g1, t1.g2
Order By FIRST(t2.datetime), FIRST(t2.id)
For the example Group above, this returns the following record...
t1.g1 | t1.g2 | t2.id | t2.datetime | t3.name
726   | 4506  | 30    | 9/13/2021   | nameC
So ORDER BY is applied after the grouping is done, not before, or so it seems. That may be why the SQL keywords come in the order they do (SELECT, FROM, WHERE, GROUP BY, ORDER BY), which makes sense if my assumption is correct. I think it finds t2.id=30 ahead of the other 726/4506 records because t2.id is the primary key of table2.
So, now I try a nested Query, wherein my first query above returns the data in the correct order and the outside query groups and grabs the first record.
Select t1.g1, t1.g2, FIRST(t2.id), FIRST(t2.datetime), FIRST(t3.name)
FROM (
Select t1.g1, t1.g2, t2.id, t2.datetime, t3.name
From ((table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Inner Join table3 t3 on t1.fld2=t3.fld2)
Order By t2.datetime, t2.id
)
Group By t1.g1, t1.g2
Order By FIRST(t2.datetime), FIRST(t2.id)
Same results! I am at a loss to understand how this is happening. So, if anyone can shed light on the order of functioning under-the-covers for Access SQL in this instance I would love to know. On my 2nd query (nested Select), it seems as though I am ordering the target data such that after Grouping the FIRST() aggregate function should select the first row found in the inner result set. But that is not happening.
And of course, if anyone can tell me how to return the row I am after ...
t1.g1 | t1.g2 | t2.id | t2.datetime | t3.name
726   | 4506  | 32    | 9/12/2021   | nameA
That is all I really need.
I want to grab ONLY the first row in each Group of t1.g1, t1.g2.
You don't want aggregation. You want to filter the data. In this case, a correlated subquery does what you want:
Select t1.g1, t1.g2, t2.id, t2.datetime, t3.name
From (table1 t1 Inner Join
table2 t2
on t1.fld1 = t2.fld1
) Inner Join
table3 t3
on t1.fld2 = t3.fld2
where t2.id = (select top 1 tt2.id
from (table1 tt1 Inner Join
table2 tt2
on tt1.fld1 = tt2.fld1
) Inner Join
table3 tt3
on tt1.fld2 = tt3.fld2
where tt1.g1 = t1.g1 and tt1.g2 = t1.g2
order by tt2.datetime, tt2.id
);
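To check the logic of the correlated subquery against the sample rows from the question, here is a small sketch run through Python's sqlite3 module. It is not Access SQL: SQLite spells TOP 1 as LIMIT 1 and does not need the nested parentheses around joins. The fld1/fld2 values are invented so that the three sample rows join the way the question shows, and the join to table3 inside the subquery is left out because it does not affect which id is picked here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (fld1 INTEGER, fld2 INTEGER, g1 INTEGER, g2 INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, fld1 INTEGER, datetime TEXT);
    CREATE TABLE table3 (fld2 INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 1, 726, 4506), (2, 2, 726, 4506), (3, 3, 726, 4506);
    INSERT INTO table2 VALUES (32, 1, '2021-09-12'), (33, 2, '2021-09-12'), (30, 3, '2021-09-13');
    INSERT INTO table3 VALUES (1, 'nameA'), (2, 'nameB'), (3, 'nameC');
""")

# For every joined row, the subquery finds the id that sorts first
# (by datetime, then id) within the same (g1, g2) group; only that row is kept.
query = """
    SELECT t1.g1, t1.g2, t2.id, t2.datetime, t3.name
    FROM table1 t1
    JOIN table2 t2 ON t1.fld1 = t2.fld1
    JOIN table3 t3 ON t1.fld2 = t3.fld2
    WHERE t2.id = (SELECT tt2.id
                   FROM table1 tt1
                   JOIN table2 tt2 ON tt1.fld1 = tt2.fld1
                   WHERE tt1.g1 = t1.g1 AND tt1.g2 = t1.g2
                   ORDER BY tt2.datetime, tt2.id
                   LIMIT 1)
"""
rows = conn.execute(query).fetchall()
print(rows)
```

On this data the filter keeps the 9/12/2021 row with id 32, which is the row the question asks for.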
Here is a solution that scales well (6s on 250k recs in t2) and does what I am asking for.
I could not get Gordon's answer to work in Access, although it seems like it should. I also have my doubts about how well it would perform with 250k recs in t2. I would love to test a solution like Gordon's, if I could figure out how to get Access to take it.
See problem description for an example on exactly which record I am after. I only need t2.id from the result set. This was not stated originally, but I don't see how that changes the problem statement or solution. I could be wrong there. I still need t3.name, but it can be retrieved later using t2.id.
But I still need to pick the record GROUP'd BY t1.g1, t1.g2 that comes first when all records are sorted by t2.dateandtime, t2.id. Or stated another way, amongst all records with the same t1.g1+t1.g2, I need exactly the first record when the group is sorted by "t2.dateandtime, t2.id".
Perhaps I am thinking about this solution to my problem all wrong, and there are better ways to resolve this with SQL; if so, I would love to hear it.
What I seem to have learned is that GROUP BY groups records together, but once they are grouped, the individual records are no longer addressable: you can only extract other fields through an aggregate function (MIN, MAX, SUM, etc.). Importantly, FIRST does not return the value from a record you can predict, because the ORDER BY clause has not been applied yet.
With all that said, here is my solution that works.
1. I removed reference to the Join on t3 as with t2.id I can retrieve all the other info I need from t3 after the fact, using t2.id.
2. Don't need to select 't1.g1, t1.g2', that is superfluous. I originally thought that any Group By fields had to be specified in the Select clause also.
3. I combine t2.dateandtime and t2.id into a Text field and use MIN to Select the data/record I am after once it is GROUP'd BY. No need to Sort my result set, as the record with the MIN value of t2.dateandtime, then t2.id has been chosen! Thus satisfying my condition and selection of the correct record.
4. Since all I need is t2.id returned for further processing, I extract t2.id from the String built in #3 and convert back to Long data type.
Here is the brief and simple query:
Select
MIN(Format(t2.dateandtime, "yyyymmddhhmmss") & '_' & Format(t2.id, '000000')) as dt_id,
CLNG(MID(dt_id, INSTR(dt_id, '_') + 1)) as id
From
(table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Group By
t1.g1, t1.g2
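The composite-key trick can be tried outside Access as well. Below is a rough equivalent run through Python's sqlite3 module on the sample rows; the SQLite functions strftime, printf, substr and instr stand in for Access's Format, MID and INSTR, and the fld1 values are invented. Note that the zero-padding to six digits assumes ids stay below one million, as the Access version does too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (fld1 INTEGER, g1 INTEGER, g2 INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, fld1 INTEGER, datetime TEXT);
    INSERT INTO table1 VALUES (1, 726, 4506), (2, 726, 4506), (3, 726, 4506);
    INSERT INTO table2 VALUES (32, 1, '2021-09-12'), (33, 2, '2021-09-12'), (30, 3, '2021-09-13');
""")

# Inner query: build a sortable 'yyyymmddhhmmss_id' string per row and take
# the smallest one per group. Outer query: cut the id back out of that string.
query = """
    SELECT g1, g2,
           CAST(substr(dt_id, instr(dt_id, '_') + 1) AS INTEGER) AS id
    FROM (
        SELECT t1.g1, t1.g2,
               MIN(strftime('%Y%m%d%H%M%S', t2.datetime)
                   || '_' || printf('%06d', t2.id)) AS dt_id
        FROM table1 t1
        JOIN table2 t2 ON t1.fld1 = t2.fld1
        GROUP BY t1.g1, t1.g2
    )
    ORDER BY g1, g2
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the string is fixed-width, comparing it as text gives the same order as sorting by date and time first and id second.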
The following query takes approximately 30 seconds to give results.
table1 contains ~20m lines
table2 contains ~10000 lines
I'm trying to find a way to improve performances. Any ideas ?
declare @PreviousMonthDate datetime
select @PreviousMonthDate = (SELECT DATEADD(MONTH, DATEDIFF(MONTH, '19000101', GETDATE()) - 1, '19000101') as [PreviousMonthDate])
select
distinct(t1.code), t1.ent, t3.lib, t3.typ from table1 t1, table2 t3
where (select min(t2.dat) from table1 t2 where t2.code=t1.code) >@PreviousMonthDate
and t1.ent in ('XXX')
and t1.code=t3.cod
and t1.dat>@PreviousMonthDate
Thanks
This is your query, more sensibly written:
select distinct t1.code, t1.ent, t2.lib, t2.typ
from table1 t1 join
table2 t2
on t1.code = t2.cod
where not exists (select 1
from table1 tt1
where tt1.code = t1.code and
tt1.dat <= @PreviousMonthDate
) and
t1.ent = 'XXX' and
t1.dat > @PreviousMonthDate;
For this query, you want the following indexes:
table1(ent, dat, code) -- for the where
table1(code, dat) -- for the subquery
table2(cod, lib, typ) -- for the join
Notes:
Table aliases should make sense. t3 for table2 is cognitively dissonant, even though I know these are made up names.
not exists (especially with the right indexes) should be faster than the aggregation subquery.
The indexes will satisfy the where clause, reducing the data needed for filtering.
select distinct is a statement. distinct is not a function, so the parentheses do nothing.
Never use comma in the FROM clause. Always use proper, explicit, standard JOIN syntax.
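As a sanity check that the NOT EXISTS form selects the same rows as the MIN subquery, here is a small sketch run through Python's sqlite3 module. The rows, the fixed cutoff date and the index names are invented for illustration, and SQLite's optimizer is not SQL Server's, so this only demonstrates that the two forms agree, not how fast either one is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (code TEXT, ent TEXT, dat TEXT);
    CREATE TABLE table2 (cod TEXT, lib TEXT, typ TEXT);
    INSERT INTO table1 VALUES
        ('A', 'XXX', '2021-02-01'), ('A', 'XXX', '2021-03-01'),
        ('B', 'XXX', '2020-12-01'), ('B', 'XXX', '2021-02-01'),
        ('C', 'YYY', '2021-02-05');
    INSERT INTO table2 VALUES ('A', 'libA', 'typ1'), ('B', 'libB', 'typ1'), ('C', 'libC', 'typ2');
    -- The three indexes suggested above (index names are made up).
    CREATE INDEX ix_t1_ent_dat_code ON table1 (ent, dat, code);
    CREATE INDEX ix_t1_code_dat ON table1 (code, dat);
    CREATE INDEX ix_t2_cod_lib_typ ON table2 (cod, lib, typ);
""")
params = {"cutoff": "2021-01-01"}  # stands in for @PreviousMonthDate

min_version = """
    SELECT DISTINCT t1.code, t1.ent, t2.lib, t2.typ
    FROM table1 t1 JOIN table2 t2 ON t1.code = t2.cod
    WHERE (SELECT MIN(tt1.dat) FROM table1 tt1 WHERE tt1.code = t1.code) > :cutoff
      AND t1.ent = 'XXX' AND t1.dat > :cutoff
"""
not_exists_version = """
    SELECT DISTINCT t1.code, t1.ent, t2.lib, t2.typ
    FROM table1 t1 JOIN table2 t2 ON t1.code = t2.cod
    WHERE NOT EXISTS (SELECT 1 FROM table1 tt1
                      WHERE tt1.code = t1.code AND tt1.dat <= :cutoff)
      AND t1.ent = 'XXX' AND t1.dat > :cutoff
"""
rows_min = conn.execute(min_version, params).fetchall()
rows_not_exists = conn.execute(not_exists_version, params).fetchall()
print(rows_min, rows_not_exists)
```

Code B is dropped by both forms because it has a row on or before the cutoff, and code C is dropped by the ent filter.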
In Oracle 11g Express, I have the following query:
select t1.product_name, SUM(t1.product_cost_per_month)
FROM table1 t1 INNER JOIN table2 t2 on (t1.product_name = t2.product_name)
WHERE t2.date > sysdate
GROUP BY t1.product_name
This works: it returns the summed monthly cost of each product, grouped by product, after a certain date (I just use sysdate here as an example).
However, I would like to display some additional description about each product, e.g. the vendor. So I use this code:
select t1.product_name, SUM(t1.product_cost_per_month), t2.vendor
FROM table1 t1 INNER JOIN table2 t2 ON (t1.product_name = t2.product_name)
WHERE t2.date > sysdate
GROUP BY t1.product_name
This doesn't work because every selected column must either appear in the GROUP BY clause or have an aggregate function applied to it, but an aggregate function for something like "vendor" seems meaningless. So is there a way to do this?
I am probably going to write a short pl/sql routine to solve, but I am wondering if there is a purely SQL way to do this?
Vendor should also be included in the GROUP BY clause.
GROUP BY t1.product_name, t2.vendor
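A quick sketch of the full query with vendor added to the GROUP BY, run through Python's sqlite3 module rather than Oracle. The rows are invented, and a bound :as_of value replaces sysdate so the output does not depend on when it is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (product_name TEXT, product_cost_per_month INTEGER);
    CREATE TABLE table2 (product_name TEXT, vendor TEXT, date TEXT);
    INSERT INTO table1 VALUES ('widget', 10), ('widget', 15), ('gadget', 7), ('gizmo', 99);
    INSERT INTO table2 VALUES
        ('widget', 'Acme', '2030-01-01'),
        ('gadget', 'Globex', '2030-01-01'),
        ('gizmo', 'Initech', '2000-01-01');
""")

# vendor is listed in GROUP BY, so it can be selected without an aggregate.
query = """
    SELECT t1.product_name, SUM(t1.product_cost_per_month), t2.vendor
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.product_name = t2.product_name
    WHERE t2.date > :as_of
    GROUP BY t1.product_name, t2.vendor
    ORDER BY t1.product_name
"""
rows = conn.execute(query, {"as_of": "2024-01-01"}).fetchall()
print(rows)
```

If a product can have more than one vendor in table2, this produces one row per product and vendor pair, and the join repeats each cost row once per matching table2 row.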
Another technique to achieve what you're doing would be a nested query:
SELECT t1.product_name,
(
select sum(product_cost_per_month)
from table2 t2
where
t1.product_name = t2.product_name
and t2.date > sysdate
) as total_product_cost,
t1.another_field,
t1.another_field2,
t1.another_field3
FROM table1 t1
(Apologies for any errors, I didn't test this but this should give you the gist of it)
I have to execute a SQL made from some users and show its results. An example SQL could be this:
SELECT t1.*, t2.* FROM table1 t1, table2 t2 WHERE t1.id = t2.id
This SQL works fine as it is, but I need to manually add pagination and show the rownum, so the SQL ends up like this.
SELECT z.*
FROM(
SELECT y.*, ROWNUM rn
FROM (
SELECT t1.*, t2.* FROM table1 t1, table2 t2 WHERE t1.id = t2.id
) y
WHERE ROWNUM <= 50) z
WHERE rn > 0
This throws an exception: "ORA-00918: column ambiguously defined" because both table1 and table2 contain a field with the same name ("id").
What could be the best way to avoid this?
Regards.
UPDATE
In the end, we had to go the ugly way and parse each incoming SQL statement before executing it. Basically, we resolved asterisks to discover which fields we needed to add, and aliased every field with a unique id. This introduced a performance penalty, but our client understood it was the only option given the requirements.
I will mark Lex's answer as accepted, as it's the solution we ended up working on.
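For anyone curious, the asterisk-expansion idea looks roughly like the sketch below. It is a simplified illustration, not our production parser: it runs on SQLite through Python's sqlite3 module, reads column names with SQLite's PRAGMA table_info (on Oracle you would query the data dictionary instead), uses LIMIT in place of ROWNUM, and aliased_columns is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, price REAL);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 9.5), (2, 3.0);
""")

def aliased_columns(table, alias):
    """Expand alias.* into 'alias.col AS alias_col' using the table's metadata."""
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return [f"{alias}.{col} AS {alias}_{col}" for col in cols]

# Replace 't1.*, t2.*' with an explicit list in which every name is unique.
select_list = ", ".join(aliased_columns("table1", "t1") + aliased_columns("table2", "t2"))
inner = f"SELECT {select_list} FROM table1 t1, table2 t2 WHERE t1.id = t2.id"

# The wrapped (paginated) query no longer sees two columns called 'id'.
paged = f"SELECT y.* FROM ({inner}) y ORDER BY y.t1_id LIMIT 50"
cursor = conn.execute(paged)
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(column_names, rows)
```

The expensive part in practice is the metadata lookup for every incoming statement, which is where the performance penalty mentioned above comes from.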
I think you have to specify aliases for (at least one of) table1.id and table2.id, and possibly for any other matching column names as well.
So instead of SELECT t1.*, t2.* FROM table1 t1, table2 t2 use something like:
SELECT t1.id t1id, t2.id t2id, [rest of columns] FROM table1 t1, table2 t2
I'm not familiar with Oracle syntax, but I think you'll get the idea.
I was searching for an answer to something similar. I was referencing an aliased sub-query that had a couple of NULL columns. I had to alias the NULL columns because I had more than one:
select a.*, t2.column, t2.column, t2.column
from (select t1.column, t1.column, NULL as null_1, NULL as null_2, t1.column from t1
where t1.column='VALUE') a
left outer join t2 on t2.column=a.column;
Once I aliased the NULL columns in the sub-query, as above, it worked fine.
If you could modify the query syntactically (or get the users to do so) to use explicit JOIN syntax with the USING clause, this would automatically fix the problem at hand:
SELECT t1.*, t2.*
FROM table1 t1
JOIN table2 t2 USING (id)
The USING clause does the same as ON t1.id = t2.id (or the implicit JOIN you have in the question), except that only one id column remains in the result, thereby eliminating your problem.
You would still run into problems if there are more columns with identical names that are not included in the USING clause. Aliases as described by @Lex are indispensable then.
Use the NVL function, which replaces null values, to fix this.
SELECT z.*
FROM(
SELECT y.*, ROWNUM rn
FROM (
SELECT t1.*, t2.* FROM table1 t1, table2 t2 WHERE
NVL(t1.id,0) = NVL(t2.id,0)
) y
WHERE ROWNUM <= 50) z
WHERE rn > 0