SQL Select/Group by a list of column names - sql

is there a way to do such thing?
suppose that i have the following table called columnstogroupby
ColumnsToGroupBy
Column1
Column2
Coulmn3
and i have this query
Select sum(someColumn) Total,Column1,Column2,Column3 From MyTable Group By Column1,Column2,Column3
is there a way the i can create this query dynamically using the columnstogroupby table?
note that the sum(someColumn) Total is always the same, i just need to include all columns in the columnstogroupby in my select and group by clauses, is that possible?

Yes, this is possible using dynamic SQL.
See the MSDN help page for sp_executesql and be sure to consider the security implications.
A Google search for SQL Server dynamic returns plenty of examples too.

Related

When does the aliasing take effect? [duplicate]

I have a doubt and question regarding alias in sql. If i want to use the alias in same query can i use it. For eg:
Consider Table name xyz with column a and b
select (a/b) as temp , temp/5 from xyz
Is this possible in some way ?
You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server which nearly all of my SQL experience is limited to. But you can however do the following.
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it may be more useful. It can come in handy when the expressions are long and you want to group on them too because the GROUP BY clause requires you to re-state the expression.
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.
You can use Oracle with statement too. There are similar statements available in other DBs too. Here is the one we use for Oracle.
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This has a performance advantage, particularly if you have a complex queries involving several nested queries, because the WITH statement is evaluated only once and used in subsequent statements.
Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per #Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression -- have in mind the person who will inherit your code).
Note that code you posted would work on the MS Access Database Engine (and would assign a meaningless correlation name such as Expr1 to your second expression) but then again it is not a real SQL product.
Its possible I guess:
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);
This is now available in Amazon Redshift
E.g.
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/
You might find W3Schools "SQL Alias" to be of good help.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
Regarding using the Alias further in the query, depending on the database you are using it might be possible.

Is it possible to have an SQL query that uses AGG functions in this way?

Assuming I have the following aggregate functions:
AGG1
AGG2
AGG3
AGG4
Is it possible to write valid SQL (in a db agnostic way) like this:
SELECT [COL1, COL2 ....], AGG1(param1), AGG2(param2) FROM [SOME TABLES]
WHERE [SOME CRITERIA]
HAVING AGG3(param2) >-1 and AGG4(param4) < 123
GROUP BY COL1, COL2, ... COLN
ORDER BY COL1, COLN ASC
LIMIT 10
Where COL1 ... COLN are columns in the tables being queried, and param1 ... paramX are parameters passed to the AGG funcs.
Note: AGG1 and AGG2 are returned in the results as columns (but do not appear in the HAVING CLAUSE, and AGG3 and AGG4 appear in the HAVING CLAUSE but are not returned in the result set.
Ideally, I want a DB agnostic answer to the solution, but if I have to be tied to a db, I am using PostgreSQL (v9.x).
Edit
Just a matter of clarification: I am not opposed to using GROUP BY in the query. My SQL is not very good, so the example SQL above may have been slightly misleading. I have edited the pseudo sql statement above to hopefully make my intent more clear.
The main thing I wanted to find out was whether a select query that used AGG functions could:
Have agg functions values in the returned column without them being specified in a HAVING clause.
Have agg functions specified in a HAVING clause, but are not returned in the result set.
From the answers I have received so far, it would seem the answer to both questions is YES. The only think I have to do to correct my SQL is to add a GROUP BY clause to make sure that the returned rows are unique.
PostgreSQL major version include the first digit after the dot, thus "PostgreSQL (v9.x)" is not specific enough. As #kekekela said, there is no (cheap) completely db agnostic way. Even between PostgreSQL 9.0 and 9.1 there is an important syntactical difference.
If you had only the grouped values AGG1(param1), AGG2(param2) you would get away without providing an explicit GROUP BY clause. Since you mix grouped and non-grouped columns you have to provide a GROUP BY clause with all non-grouped columns that appear in the SELECT. That's true for any version of PostgreSQL. Read about GROUP BY and HAVING it in the manual.
Starting with version 9.1, however, once you list a primary key in the GROUP BY you can skip additional columns for this table and still use them in the SELECT list. The release notes for version 9.1 tell us:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
Concerning parameters
Do you intend to feed a constant value to an aggregate function? What's the point? The docs tell us
An aggregate function computes a single result from multiple input rows.
Or do you want those parameters to be column names? That kind of dynamic SQL works as long as the statement is generated before committing to the database. Does not work for prepared statements or simple sql or plpgsql functions. You have to use EXECUTE in a plpgsql function for that purpose.
As safeguard against SQLi use the USING $1, $2 syntax for values and quote_ident() for your column or table names.
The only way to aggregate over columns without using GROUP BY is to use windowing functions. You left out details of your problem, but the following might be what you are looking for:
SELECT *
FROM (
SELECT [COL1, COL2 ....],
AGG1(param1) over (partition by some_grouping_column) as agg1,
AGG2(param2) over (partition by some_grouping_column) as agg2,
row_number() over () as rn
FROM [SOME TABLES]
WHERE [SOME CRITERIA]
ORDER BY COL1
) t
WHERE AGG3 >-1
AND AGG4 < 123
AND rn <= 10
ORDER BY col1
This is standard ANSI SQL and works on most database including PostgreSQL (since 8.4).
Note that you do not need to use the same grouping column for both aggregates in the partition by clause.
If you want to stick with ANSI SQL then you should use the row_number() function to limit the result. If you run this only on PostgreSQL (or other DBMS that support LIMIT in some way) move the LIMIT cause into the derived table (the inner query)
That should work from a high level perspective, except you'd need COL1, COL2 etc in a GROUP BY statement or else they won't be valid in the SELECT list. Having AGG1, etc in the SELECT list and not in the HAVING is not a problem.
As far as db agnostic, you're going to have to tweak syntax no matter what you do (the LIMIT for example is going to be different in PostgreSQL, SQL SERVER and Oracle that I know off the top of my head), but you could build logic to construct the statements properly for each provided your high-level representation is solid.

Using alias in query and using it

I have a doubt and question regarding alias in sql. If i want to use the alias in same query can i use it. For eg:
Consider Table name xyz with column a and b
select (a/b) as temp , temp/5 from xyz
Is this possible in some way ?
You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server which nearly all of my SQL experience is limited to. But you can however do the following.
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it may be more useful. It can come in handy when the expressions are long and you want to group on them too because the GROUP BY clause requires you to re-state the expression.
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.
You can use Oracle with statement too. There are similar statements available in other DBs too. Here is the one we use for Oracle.
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This has a performance advantage, particularly if you have a complex queries involving several nested queries, because the WITH statement is evaluated only once and used in subsequent statements.
Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per #Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression -- have in mind the person who will inherit your code).
Note that code you posted would work on the MS Access Database Engine (and would assign a meaningless correlation name such as Expr1 to your second expression) but then again it is not a real SQL product.
Its possible I guess:
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);
This is now available in Amazon Redshift
E.g.
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/
You might find W3Schools "SQL Alias" to be of good help.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
Regarding using the Alias further in the query, depending on the database you are using it might be possible.

Is there efficient SQL to query a portion of a large table

The typical way of selecting data is:
select * from my_table
But what if the table contains 10 million records and you only want records 300,010 to 300,020
Is there a way to create a SQL statement on Microsoft SQL that only gets 10 records at once?
E.g.
select * from my_table from records 300,010 to 300,020
This would be way more efficient than retrieving 10 million records across the network, storing them in the IIS server and then counting to the records you want.
SELECT * FROM my_table is just the tip of the iceberg. Assuming you're talking a table with an identity field for the primary key, you can just say:
SELECT * FROM my_table WHERE ID >= 300010 AND ID <= 300020
You should also know that selecting * is considered poor practice in many circles. They want you specify the exact column list.
Try looking at info about pagination. Here's a short summary of it for SQL Server.
Absolutely. On MySQL and PostgreSQL (the two databases I've used), the syntax would be
SELECT [columns] FROM table LIMIT 10 OFFSET 300010;
On MS SQL, it's something like SELECT TOP 10 ...; I don't know the syntax for offsetting the record list.
Note that you never want to use SELECT *; it's a maintenance nightmare if anything ever changes. This query, though, is going to be incredibly slow since your database will have to scan through and throw away the first 300,010 records to get to the 10 you want. It'll also be unpredictable, since you haven't told the database which order you want the records in.
This is the core of SQL: tell it which 10 records you want, identified by a key in a specific range, and the database will do its best to grab and return those records with minimal work. Look up any tutorial on SQL for more information on how it works.
When working with large tables, it is often a good idea to make use of Partitioning techniques available in SQL Server.
The rules of your partitition function typically dictate that only a range of data can reside within a given partition. You could split your partitions by date range or ID for example.
In order to select from a particular partition you would use a query similar to the following.
SELECT <Column Name1>…/*
FROM <Table Name>
WHERE $PARTITION.<Partition Function Name>(<Column Name>) = <Partition Number>
Take a look at the following white paper for more detailed infromation on partitioning in SQL Server 2005.
http://msdn.microsoft.com/en-us/library/ms345146.aspx
I hope this helps however please feel free to pose further questions.
Cheers, John
I use wrapper queries to select the core query and then just isolate the ROW numbers that i wish to take from the query - this allows the SQL server to do all the heavy lifting inside the CORE query and just pass out the small amount of the table that i have requested. All you need to do is pass the [start_row_variable] and the [end_row_variable] into the SQL query.
NOTE: The order clause is specified OUTSIDE the core query [sql_order_clause]
w1 and w2 are TEMPORARY table created by the SQL server as the wrapper tables.
SELECT
w1.*
FROM(
SELECT w2.*,
ROW_NUMBER() OVER ([sql_order_clause]) AS ROW
FROM (
<!--- CORE QUERY START --->
SELECT [columns]
FROM [table_name]
WHERE [sql_string]
<!--- CORE QUERY END --->
) AS w2
) AS w1
WHERE ROW BETWEEN [start_row_variable] AND [end_row_variable]
This method has hugely optimized my database systems. It works very well.
IMPORTANT: Be sure to always explicitly specify only the exact columns you wish to retrieve in the core query as fetching unnecessary data in these CORE queries can cost you serious overhead
Use TOP to select only a limited amont of rows like:
SELECT TOP 10 * FROM my_table WHERE ID >= 300010
Add an ORDER BY if you want the results in a particular order.
To be efficient there has to be an index on the ID column.

Use a LIKE clause in part of an INNER JOIN

Can/Should I use a LIKE criteria as part of an INNER JOIN when building a stored procedure/query? I'm not sure I'm asking the right thing, so let me explain.
I'm creating a procedure that is going to take a list of keywords to be searched for in a column that contains text. If I was sitting at the console, I'd execute it as such:
SELECT Id, Name, Description
FROM dbo.Card
WHERE Description LIKE '%warrior%'
OR
Description LIKE '%fiend%'
OR
Description LIKE '%damage%'
But a trick I picked up a little while go to do "strongly typed" list parsing in a stored procedure is to parse the list into a table variable/temporary table, converting it to the proper type and then doing an INNER JOIN against that table in my final result set. This works great when sending say a list of integer IDs to the procedure. I wind up having a final query that looks like this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblExclusiveCard ON dbo.Card.Id = #tblExclusiveCard.CardId
I want to use this trick with a list of strings. But since I'm looking for a particular keyword, I am going to use the LIKE clause. So ideally I'm thinking I'd have my final query look like this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblKeyword ON dbo.Card.Description LIKE '%' + #tblKeyword.Value + '%'
Is this possible/recommended?
Is there a better way to do something like this?
The reason I'm putting wildcards on both ends of the clause is because there are "archfiend", "beast-warrior", "direct-damage" and "battle-damage" terms that are used in the card texts.
I'm getting the impression that depending on the performance, I can either use the query I specified or use a full-text keyword search to accomplish the same task?
Other than having the server do a text index on the fields I want to text search, is there anything else I need to do?
Try this
select * from Table_1 a
left join Table_2 b on b.type LIKE '%' + a.type + '%'
This practice is not ideal. Use with caution.
Your first query will work but will require a full table scan because any index on that column will be ignored. You will also have to do some dynamic SQL to generate all your LIKE clauses.
Try a full text search if your using SQL Server or check out one of the Lucene implementations. Joel talked about his success with it recently.
try it...
select * from table11 a inner join table2 b on b.id like (select '%'+a.id+'%') where a.city='abc'.
Its works for me.:-)
It seems like you are looking for full-text search. Because you want to query a set of keywords against the card description and find any hits? Correct?
Personally, I have done it before, and it has worked out well for me. The only issues i could see is possibly issues with an unindexed column, but i think you would have the same issue with a where clause.
My advice to you is just look at the execution plans between the two. I'm sure that it will differ which one is better depending on the situation, just like all good programming problems.
#Dillie-O
How big is this table?
What is the data type of Description field?
If either are small a full text search will be overkill.
#Dillie-O
Maybe not the answer you where looking for but I would advocate a schema change...
proposed schema:
create table name(
nameID identity / int
,name varchar(50))
create table description(
descID identity / int
,desc varchar(50)) --something reasonable and to make the most of it alwase lower case your values
create table nameDescJunc(
nameID int
,descID int)
This will let you use index's without have to implement a bolt on solution, and keeps your data atomic.
related: Recommended SQL database design for tags or tagging
a trick I picked up a little while go
to do "strongly typed" list parsing in
a stored procedure is to parse the
list into a table variable/temporary
table
I think what you might be alluding to here is to put the keywords to include into a table then use relational division to find matches (could also use another table for words to exclude). For a worked example in SQL see Keyword Searches by Joe Celko.
Performance will be depend on the actual server than you use, and on the schema of the data, and the amount of data. With current versions of MS SQL Server, that query should run just fine (MS SQL Server 7.0 had issues with that syntax, but it was addressed in SP2).
Have you run that code through a profiler? If the performance is fast enough and the data has the appropriate indexes in place, you should be all set.
LIKE '%fiend%' will never use an seek, LIKE 'fiend%' will. Simply a wildcard search is not sargable
Try this;
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblKeyword ON dbo.Card.Description LIKE '%' +
CONCAT(CONCAT('%',#tblKeyword.Value),'%') + '%'