How to reuse a subquery in SQL?

I have query like the following
select columns
from (select columns1
      from result_set
      where condition_common and condition1) as subset1
join (select columns2
      from result_set
      where condition_common and condition2) as subset2
  on subset1.somekey = subset2.somekey
I want to somehow reuse
select columns
from result_set
where condition_common
I have oversimplified the above query, but the select above is in reality huge and complicated. I don't want the burden of making sure both copies stay in sync.
I don't have any means of programmatically reusing it. T-SQL is ruled out. I can only write simple queries. This is an app limitation.
Is there a way to reuse the same subquery in a single statement?

Use a Common Table Expression (CTE) if you're using SQL Server 2005+:
with cte as (
    select columns
    from result_set
    where condition_common
)
select columns
from cte as subset1
join cte as subset2
  on subset1.somekey = subset2.somekey
where otherconditions
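If creating database objects is an option (the question's app limitation may rule this out), a view is another way to share the common subquery across statements; a minimal sketch, with an illustrative view name:
create view common_subset as
    select columns
    from result_set
    where condition_common;

select columns
from common_subset as subset1
join common_subset as subset2
  on subset1.somekey = subset2.somekey
where otherconditions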

Related

Slow PostgreSQL query in DBeaver

I am using DBeaver to query a PostgreSQL database.
I have this query; it simply selects the highest id per Enterprise_Nbr. The query works but is really slow. Is there any way I can rewrite the query to improve performance?
I am using the query tool DBeaver because I don't have direct access to PostgreSQL. The ultimate goal is to link the PostgreSQL database with Power BI.
select *
from public.address 
where "ID"  in (select max("ID")
from public.address a 
group by "Enterprise_Nbr")
Queries for greatest-n-per-group problems are typically faster when written with Postgres' proprietary DISTINCT ON () operator:
select distinct on ("Enterprise_Nbr") *
from public.address
order by "Enterprise_Nbr", "ID" desc;
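If you are able to add indexes, DISTINCT ON usually benefits from one that matches the ORDER BY; a sketch, with an illustrative index name:
create index address_enterprise_nbr_id_idx
    on public.address ("Enterprise_Nbr", "ID" desc);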
Your query can be rewritten as: for each value of Enterprise_Nbr, retrieve the row for which no other row exists with the same Enterprise_Nbr and a greater ID.
SELECT *
FROM public.address a
WHERE NOT EXISTS (
    SELECT 1
    FROM public.address b
    WHERE b."Enterprise_Nbr" = a."Enterprise_Nbr" AND b."ID" > a."ID"
)

SQL CTE for a numbers table is fast for static value, but slow for table value

I'm trying to output the values of a table multiple times, based on a column in that table.
I tried to use CTE to make a numbers table on the fly:
WITH cte AS
(
    SELECT
        ROW_NUMBER() OVER (ORDER BY (select 0)) AS i
    FROM
        sys.columns c1 CROSS JOIN sys.columns c2 CROSS JOIN sys.columns c3
)
select *
from myTable, cte
WHERE i <= myTable.timesToRepeatColumn
  and myTable.id = '209386'
This SQL seems to take forever to run, so it seems to be trying to run the entire CTE before joining.
If I replace myTable.timesToRepeatColumn with a static value (say 10000), the query returns virtually instantly. So it seems to be doing the where i <= before fully cross-joining the CTE's table.
How can I tell SQL to do the where statement first like it does with a static number?
You can use a recursive CTE to achieve your goal:
WITH cte AS (
    -- anchor row: start the counter at timesToRepeatColumn
    SELECT id, timesToRepeatColumn AS level
    FROM myTable
    WHERE myTable.id = '209386'
    UNION ALL
    -- recursive step: count down until one row per repetition has been produced
    SELECT id, level - 1
    FROM cte
    WHERE level > 1
)
SELECT t.*, c.level
FROM cte c
JOIN myTable t ON t.id = c.id
OPTION (MAXRECURSION 0)  -- required when timesToRepeatColumn can exceed 100
CTEs in SQL Server are not necessarily run 'independently'. SQL (in SQL Server, etc) is declarative, which means you tell it what you want, not how to do it.
If the query optimiser determines that it can do it better by doing something differently, it will.
A good example is
IF EXISTS(SELECT * FROM test) PRINT 'X';
IF (SELECT COUNT(*) FROM test) > 0 PRINT 'Y';
IF (SELECT COUNT(*) FROM test) > 1 PRINT 'Z';
If it was doing what you told it do, the query plans for the second and third would basically be the same. However, when you run it, the query plans for the first and second are the same; the third differs.
When you hard-code the value (e.g., 10,000), the query optimiser can use that hardcoded value to determine what to do. In this case, it probably determines it doesn't need to run the full CTE, just run it until you get 10,000 rows.
However, if you use a value that can vary (e.g., myTable.timesToRepeatColumn), then the query optimiser often builds a query plan that would work for any value. As such, it makes a query plan that is not fantastic for your situation - probably creating the full CTE in memory before using it. If sys.columns has 100 rows, that's 100^3 = 1,000,000 rows it creates. If it has 1,000 rows, that's 1000^3 = 1,000,000,000. You likely have more than 1,000 rows.
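One workaround consistent with this is to capture the repeat count in a variable first, so the optimiser only has to generate a fixed number of rows; a sketch against the question's myTable (not guaranteed to produce the fastest plan):
DECLARE @n int;
SET @n = (SELECT timesToRepeatColumn FROM myTable WHERE id = '209386');

WITH cte AS
(
    -- TOP (@n) caps the numbers CTE instead of materialising the full cross join
    SELECT TOP (@n) ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS i
    FROM sys.columns c1 CROSS JOIN sys.columns c2 CROSS JOIN sys.columns c3
)
select *
from myTable, cte
WHERE i <= myTable.timesToRepeatColumn
  and myTable.id = '209386'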

SQL: how do you look for missing ids?

Suppose I have a table with lots of rows identified by a unique ID. Now I have a (rather large) user-input list of ids (not a table) that I want to check are already in the database.
So I want to output the ids that are in my list, but not in the table. How do I do that with SQL?
EDIT: I know I can do that with a temporary table, but I'd really like to avoid that if possible.
EDIT: Same comment for using an external programming language.
Try this:
SELECT t1.id FROM your_list t1
LEFT JOIN your_table t2
ON t1.id = t2.id
WHERE t2.id IS NULL
It is hardly possible to write a single pure and general SQL query for your task, since it requires working with a list (which is not a relational concept, and the standard set of list operations is too limited). For some DBMSs it is possible to write a single SQL query, but it will utilize the SQL dialect of the DBMS and will be specific to it.
You haven't mentioned:
which RDBMS will be used;
what is the source of the IDs.
So I will assume PostgreSQL is used and that the IDs to be checked are loaded into a (temporary) table.
Consider the following:
CREATE TABLE test (id integer, value char(1));
INSERT INTO test VALUES (1,'1'), (2,'2'), (3,'3');
CREATE TABLE temp_table (id integer);
INSERT INTO temp_table VALUES (1),(5),(10);
You can get your results like this:
SELECT * FROM temp_table WHERE NOT EXISTS (
SELECT id FROM test WHERE id = temp_table.id);
or
SELECT * FROM temp_table WHERE id NOT IN (SELECT id FROM test);
or
SELECT * FROM temp_table LEFT JOIN test USING (id) WHERE test.id IS NULL;
You can pick any option; depending on your data volumes, performance may differ.
Just a note: some RDBMSs have a limit on the number of expressions that can be listed literally inside an IN() construct, so keep this in mind (I hit this several times with ORACLE).
EDIT: In order to match constraints of no temp tables and no external languages you can use the following construct:
SELECT DISTINCT b.id
FROM test a
RIGHT JOIN (
    SELECT 1 id UNION ALL
    SELECT 5 UNION ALL
    SELECT 10) b ON a.id = b.id
WHERE a.id IS NULL;
Unfortunately, you'll have to generate lots of SELECT x UNION ALL entries to build a single-column, many-row table here. I use UNION ALL to avoid an unnecessary sorting step.
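On PostgreSQL, a VALUES list in the FROM clause is a more compact way to build the same inline table; a sketch using the same sample IDs:
SELECT b.id
FROM (VALUES (1), (5), (10)) AS b(id)
LEFT JOIN test a ON a.id = b.id
WHERE a.id IS NULL;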

SQL select from data in query where this data is not already in the database?

I want to check my database for records that I already have recorded before making a web service call.
Here is what I imagine the query to look like, I just can't seem to figure out the syntax.
SELECT *
FROM (1,2,3,4) as temp_table
WHERE temp_table.id
LEFT JOIN table ON id IS NULL
Is there a way to do this? What is a query like this called?
I want to pass in a list of ids to MySQL and have it spit out the ids that are not already in the database.
Use:
SELECT x.id
FROM (SELECT @param_1 AS id
      FROM DUAL
      UNION ALL
      SELECT @param_2
      FROM DUAL
      UNION ALL
      SELECT @param_3
      FROM DUAL
      UNION ALL
      SELECT @param_4
      FROM DUAL) x
LEFT JOIN TABLE t ON t.id = x.id
WHERE t.id IS NULL
If you need to support a varying number of parameters, you can either use:
a temporary table to populate & join to (see the sketch below), or
MySQL's Prepared Statements to dynamically construct the UNION ALL statement.
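For example, the temporary-table variant might look like this; a sketch, with illustrative table and column names:
CREATE TEMPORARY TABLE id_list (id INT PRIMARY KEY);
INSERT INTO id_list (id) VALUES (1), (2), (3), (4);

SELECT l.id
FROM id_list l
LEFT JOIN your_table t ON t.id = l.id
WHERE t.id IS NULL;

DROP TEMPORARY TABLE id_list;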
To confirm I've understood correctly, you want to pass in a list of numbers and see which of those numbers isn't present in the existing table? In effect:
SELECT Item
FROM IDList I
LEFT JOIN TABLE T ON I.Item=T.ID
WHERE T.ID IS NULL
You look like you're OK with building this query on the fly, in which case you can do this with a numbers / tally table by changing the above into
SELECT Number
FROM (SELECT Number FROM Numbers WHERE Number IN (1,2,3,4)) I
LEFT JOIN TABLE T ON I.Number=T.ID
WHERE T.ID IS NULL
This is relatively prone to SQL Injection attacks though because of the way the query is being built. It'd be better if you could pass in '1,2,3,4' as a string and split it into sections to generate your numbers list to join against in a safer way - for an example of how to do that, see http://www.sqlteam.com/article/parsing-csv-values-into-multiple-rows
All of this presumes you've got a numbers / tally table in your database, but they're sufficiently useful in general that I'd strongly recommend you do.
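If you don't have a numbers table yet, it only needs to be seeded once; a sketch for MySQL 8+ (the table name and range are illustrative):
CREATE TABLE Numbers (Number INT PRIMARY KEY);

-- seed 1..1000 with a recursive CTE; raise cte_max_recursion_depth for larger ranges
INSERT INTO Numbers (Number)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 1000
)
SELECT n FROM seq;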
SELECT * FROM table where id NOT IN (1,2,3,4)
I would probably just do:
SELECT id
FROM table
WHERE id IN (1,2,3,4);
And then process the list of results, removing any returned by the query from your list of "records to submit".
How about a nested query? This may work. If not, it may get you in the right direction.
SELECT * FROM table WHERE id NOT IN (
SELECT id FROM table WHERE 1
);

SQLServer SQL query with a row counter

I have a SQL query, that returns a set of rows:
SELECT id, name FROM users where group = 2
I need to also include a column that has an incrementing integer value, so the first row needs to have a 1 in the counter column, the second a 2, the third a 3, and so on.
The query shown here is just a simplified example, in reality the query could be arbitrarily complex, with several joins and nested queries.
I know this could be achieved using a temporary table with an autonumber field, but is there a way of doing it within the query itself ?
For starters, something along the lines of:
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY my_order_column) AS Row_Counter
FROM my_table
However, it's important to note that the ROW_NUMBER() OVER (ORDER BY ...) construct only determines the values of Row_Counter, it doesn't guarantee the ordering of the results.
Unless the SELECT itself has an explicit ORDER BY clause, the results could be returned in any order, dependent on how SQL Server decides to optimise the query. (See this article for more info.)
The only way to guarantee that the results will always be returned in Row_Counter order is to apply exactly the same ordering to both the SELECT and the ROW_NUMBER():
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY my_order_column) AS Row_Counter
FROM my_table
ORDER BY my_order_column -- exact copy of the ordering used for Row_Counter
The above pattern will always return results in the correct order and works well for simple queries, but what about an "arbitrarily complex" query with perhaps dozens of expressions in the ORDER BY clause? In those situations I prefer something like this instead:
SELECT t.*
FROM
(
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY ...) AS Row_Counter -- complex ordering
FROM my_table
) AS t
ORDER BY t.Row_Counter
Using a nested query means that there's no need to duplicate the complicated ORDER BY clause, which means less clutter and easier maintenance. The outer ORDER BY t.Row_Counter also makes the intent of the query much clearer to your fellow developers.
In SQL Server 2005 and up, you can use the ROW_NUMBER() function, which has options for the sort order and the groups over which the counts are done (and reset).
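For example, reusing the question's simplified users query, a per-group counter that restarts for each group might look like this; a sketch, with an illustrative ORDER BY column:
SELECT id, name, [group],
       ROW_NUMBER() OVER (PARTITION BY [group] ORDER BY name) AS row_counter
FROM users;  -- [group] is bracketed because GROUP is a reserved word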
The simplest way is to use a variable row counter. However, it takes two actual SQL statements: one to set the variable, and then the query itself, as follows:
SET @n = 0;
SELECT @n := @n + 1, a.* FROM tablename a
Your query can be as complex as you like, with joins etc. I usually make this a stored procedure. You can have all kinds of fun with the variable, even use it to calculate against field values. The key is the :=.
Here's a different approach.
If you have several tables of data that are not joinable, or for some reason you don't want to count all the rows at the same time but still want them to be part of the same row count, you can create a table that does the job for you.
Example:
create table #test (
    rowcounter int identity,
    invoicenumber varchar(30)
)
insert into #test(invoicenumber) select [column] from [Table1]
insert into #test(invoicenumber) select [column] from [Table2]
insert into #test(invoicenumber) select [column] from [Table3]
select * from #test
drop table #test