I'm not particularly familiar with SQL, but my team asked me to look at this series of SQL statements and see whether it is possible to reduce them to just 1 or 2. I looked at it, and I don't believe so, but I don't have the experience or the knowledge of SQL tricks to be sure.
So all of the statements have pretty much the same format:
select
id, count(*)
from
table
where
x_date between to_date(start_date) and to_date(end_date)
group by
id
Here x_date is the only thing that changes (start_date and end_date are just placeholders I typed to make this a bit more readable). There are 10 statements in total, 7 of which follow exactly this format.
Of the 3 that differ, one looks like this:
select
id, count(*)
from
table
where
x_date between to_date(start_date) and to_date(end_date)
and userid not like 'AUTOPEND'
group by
id
And the other 2 look like this:
select
id, count(*)
from
table
where
x_date between to_date(start_date) and to_date(end_date)
group by
id, x_code
where x_code differs between them.
They want to use this data for statistical analysis, but they insist on running the queries manually by typing them in. The way I see it, I can't really combine these statements: they all group by the same field (except the last 2), so everything gets merged in the results, which makes the results useless for the analysis.
Am I thinking about this correctly, or is there some way to do what they asked? Can I make 1 or 2 SQL statements output more than one result table each?
Oh, I almost forgot: I believe this is Oracle (PL/SQL), using SQL Developer.
You are trying to get multiple aggregates with different grouping sets in a single query. Oracle handles this with its extended GROUP BY features: GROUPING SETS, ROLLUP and CUBE. There are a few ways to solve your specific problem, but these extended grouping functions are the right tool for the job. Going forward, they should be more maintainable, and often faster, than running many separate queries.
http://www.oracle-base.com/articles/misc/rollup-cube-grouping-functions-and-grouping-sets.php
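SQLite, which ships with Python's standard library, has no GROUPING SETS, ROLLUP or CUBE, so the sketch below uses it only as a runnable stand-in: it emulates GROUPING SETS ((id), (id, x_code)) with UNION ALL plus a label column, so one statement returns several "tables" that can be told apart afterwards. The table name events and its sample rows are hypothetical. In Oracle itself, the native form would be GROUP BY GROUPING SETS ((id), (id, x_code)), with GROUPING_ID() distinguishing the sets.

```python
import sqlite3

# Hypothetical sample data: "events" stands in for the question's "table".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, x_code TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "A"), (1, "A"), (1, "B"), (2, "A")],
)

# Emulate GROUPING SETS ((id), (id, x_code)) with UNION ALL.
# The grouping_set column records which "query" each row came from,
# so the combined result can be split apart again for analysis.
rows = conn.execute(
    """
    SELECT 'by_id' AS grouping_set, id, NULL AS x_code, COUNT(*) AS n
    FROM events
    GROUP BY id
    UNION ALL
    SELECT 'by_id_code', id, x_code, COUNT(*)
    FROM events
    GROUP BY id, x_code
    ORDER BY grouping_set, id, x_code
    """
).fetchall()

for row in rows:
    print(row)
```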
Since the first and second examples group by the same column, you can nest a CASE expression inside an aggregate function:
SELECT id, COUNT(*),
SUM(CASE WHEN userid NOT LIKE 'AUTOPEND' THEN 1 ELSE 0 END) AS "NotAUTOPEND"
FROM table
WHERE x_date BETWEEN to_date(start_date) AND to_date(end_date)
GROUP BY id
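The same CASE-inside-SUM idea extends to the 7 statements that differ only in which date column they filter on: each statement becomes one conditional count, and the table is scanned once. A minimal sketch using Python's sqlite3 module as a runnable stand-in for Oracle; the events table, its opened/closed columns and the date range are hypothetical, and ISO date strings replace to_date().

```python
import sqlite3

# Hypothetical table: opened and closed stand in for the different
# x_date columns the original statements filter on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, userid TEXT, opened TEXT, closed TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "2024-01-05", "2024-02-10"),
        (1, "AUTOPEND", "2024-01-20", "2024-01-25"),
        (2, "bob", "2023-12-30", "2024-01-15"),
    ],
)

# One pass over the table; each SUM(CASE ...) plays the role of one of the
# separate statements. ISO date strings compare correctly as text.
rows = conn.execute(
    """
    SELECT id,
           SUM(CASE WHEN opened BETWEEN :s AND :e THEN 1 ELSE 0 END) AS opened_cnt,
           SUM(CASE WHEN closed BETWEEN :s AND :e THEN 1 ELSE 0 END) AS closed_cnt,
           SUM(CASE WHEN opened BETWEEN :s AND :e
                     AND userid NOT LIKE 'AUTOPEND' THEN 1 ELSE 0 END) AS opened_not_autopend
    FROM events
    GROUP BY id
    ORDER BY id
    """,
    {"s": "2024-01-01", "e": "2024-01-31"},
).fetchall()
print(rows)
```

One design difference: because the date filter moves out of WHERE into the CASE expressions, ids with no matching rows still appear with zero counts; a WHERE clause that ORs the conditions together would drop them if that matters.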
Related
My problem is a bit hard to explain, so here's an example of what I'd like to do:
SELECT AVG(num) - num, date
FROM my_tbl
What I'm doing here is accessing both the result of an aggregation (AVG(num)) and the value from a specific row (num) at the same time.
This example doesn't work; here's one that does the same thing and does work:
SELECT (SELECT AVG(num) FROM my_tbl) - num
FROM my_tbl
My problem with that second query is that my_tbl is queried twice, as if it were two different tables.
So my question is: is it possible to make 2 different types of selection, and query the table only once?
Use a window function:
SELECT AVG(num) OVER () - num, date
FROM my_tbl;
Whether this reads the table once or twice is up to the optimizer. I'm not sure this will be any faster than your version. But it is a more concise way to write the query.
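A small runnable check of the window-function approach, using Python's sqlite3 module as a stand-in database (window functions need SQLite 3.25 or later). The sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_tbl (num REAL, date TEXT)")
conn.executemany(
    "INSERT INTO my_tbl VALUES (?, ?)",
    [(10, "2024-01-01"), (20, "2024-01-02"), (30, "2024-01-03")],
)

# The empty OVER () makes the window the whole result set, so AVG(num)
# is computed once over all rows but attached to every row, where it can
# be combined with that row's own num.
rows = conn.execute(
    "SELECT AVG(num) OVER () - num AS diff, date FROM my_tbl ORDER BY date"
).fetchall()
print(rows)
```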
I have finally succeeded in summing the results of two SQL SUM queries. One small step for this guy :)
My question relates to the last character in the code (Z):
SELECT SUM(hr)
FROM (SELECT SUM(amount) AS hr
FROM Try_again.dbo.tuesday_practice_database
WHERE account_name LIKE 'concessions'
UNION
SELECT SUM(amount) AS hr
FROM Try_again.dbo.tuesday_practice_database
WHERE account_name LIKE 'salaries')Z
The code won't work unless there is something following the closing parenthesis.
I'm just curious why that is: why does there need to be a character following the )?
It doesn't seem to matter what the character is; the code works either way. But it breaks if I leave it blank.
Thank you
SELECT sum(hr)
FROM
(
Select sum(amount) as hr
from Try_again.dbo.tuesday_practice_database
where account_name like 'concessions'
union
Select sum(amount) as hr
from Try_again.dbo.tuesday_practice_database
where account_name like 'salaries'
) z
To understand this better, look at the version of your query above. It's the same code, just reformatted to illustrate what is happening. Notice that the outer FROM clause retrieves data from a subquery. In this context, SQL requires the subquery to have a name of some kind, so the z is added as an alias to meet that requirement. You could put anything you wanted there, but since the name doesn't matter to us, a single-letter placeholder is fine.
It works just like the alias in the following query:
select * from table1 as z
By the way, you didn't use any wildcards in your LIKE conditions! I think you should rewrite the query like this:
select sum(hr) from
(
Select sum(amount) as hr from Try_again.dbo.tuesday_practice_database
where account_name like '%concessions%'
union all
Select sum(amount) as hr from Try_again.dbo.tuesday_practice_database
where account_name like '%salaries%'
) AS z
If you don't want wildcards, you should avoid LIKE altogether. Use IN instead and rewrite the query to this simpler one:
Select sum(amount) from Try_again.dbo.tuesday_practice_database
where account_name in('concessions', 'salaries')
Just do this:
SELECT SUM(amount) as hr
FROM Try_again.dbo.tuesday_practice_database
WHERE account_name IN ('concessions', 'salaries')
The existing code has a bug: it will collapse your two SUM()s into one if they are both the same amount, because UNION removes duplicate rows. You could fix this with UNION ALL instead of UNION, but since the LIKE conditions don't contain any wildcards like % or _, we can do better and simplify the whole thing down to one IN() condition, with no need for any subqueries.
Now, if you wanted to allow more variance in your matches (i.e., LIKE '%concessions%' and LIKE '%salaries%'), then UNION ALL is more helpful... just be warned that leading % wildcards in a LIKE condition are very bad for performance (they prevent an ordinary index seek) and should be avoided when possible. Often this means changing the schema in some way, such as adding a table named something like AccountCategories that groups each account into a specific category you can target exactly in your query.
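The UNION bug described above is easy to reproduce. This sketch uses Python's sqlite3 module with a hypothetical practice table (standing in for Try_again.dbo.tuesday_practice_database) whose two accounts happen to sum to the same amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE practice (account_name TEXT, amount INTEGER)")
# Hypothetical data chosen so both accounts sum to 100.
conn.executemany(
    "INSERT INTO practice VALUES (?, ?)",
    [("concessions", 60), ("concessions", 40), ("salaries", 100)],
)


def total(set_op):
    """Run the question's query shape with set_op ("UNION" or "UNION ALL")."""
    sql = f"""
        SELECT SUM(hr) FROM (
            SELECT SUM(amount) AS hr FROM practice WHERE account_name LIKE 'concessions'
            {set_op}
            SELECT SUM(amount) AS hr FROM practice WHERE account_name LIKE 'salaries'
        ) AS z
    """
    return conn.execute(sql).fetchone()[0]


with_union = total("UNION")          # duplicate 100 rows collapse into one
with_union_all = total("UNION ALL")  # both rows are kept
with_in = conn.execute(
    "SELECT SUM(amount) FROM practice WHERE account_name IN ('concessions', 'salaries')"
).fetchone()[0]

print(with_union, with_union_all, with_in)
```

UNION keeps only one copy of the duplicate 100 row, so the outer SUM is wrong; UNION ALL and the single IN() query both return 200.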
But all of that side-steps the actual question: what is that z character?
In this case, it's an alias. You're using a subquery to union the two smaller queries together. In certain contexts, the SQL language requires subqueries to have a name. This includes using the subquery (derived table) as the target of a FROM, JOIN, or APPLY expression. You can use any name you want — it doesn't matter to the functioning of the query, since it's never referenced again — and so a simple single-letter placeholder, like z, is good enough.
In other PostgreSQL-derived DBMSes (e.g., Netezza), I can do something like this without errors:
select store_id
,sum(sales) as total_sales
,count(distinct(txn_id)) as d_txns
,total_sales/d_txns as avg_basket
from my_tlog
group by 1
That is, I can use aggregate values within the same SQL query that defines them.
However, when I try the same sort of thing on Amazon Redshift, I get the error "Column total_sales does not exist..." That's correct, of course; it's not really a column. But is there a way to preserve this idiom rather than restructuring the query? I ask because there would be a lot of code to change.
Thanks.
You simply need to repeat the expressions (or use a subquery or CTE):
select store_id,
sum(sales) as total_sales,
count(distinct txn_id) as d_txns,
sum(sales)/count(distinct txn_id) as avg_basket
from my_tlog
group by store_id;
Most databases do not support reusing column aliases within the same SELECT list. There are (at least) two reasons:
The designers of the database engine do not want to specify the order of processing of expressions in the select.
There is ambiguity when a column alias is also a valid column in a table in the from clause.
Personally, I love this construct in Netezza. It is compact and the syntax is not ambiguous: any duplicate column name defaults to the (new) alias in the current query, and if you need to reference the column of the underlying table, you simply put the table name in front of the column. The example above would become:
select store_id
,sum(sales) as sales -- duplicate name
,count(distinct(txn_id)) as d_txns
,my_tlog.sales/d_txns as avg_basket --- this illustrates but may not make sense
from my_tlog
group by 1
I recently moved away from SQL Server, and on that database I used a construct like this to avoid repeating the expressions:
Select *, total_sales/d_txns as avg_basket
From (
select store_id
,sum(sales) as total_sales
,count(distinct(txn_id)) as d_txns
from my_tlog
group by store_id
) x
Most (if not all) databases support this construct, and have done so for 10 years or more.
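A sketch of the CTE variant mentioned in the first answer, run through Python's sqlite3 module rather than Redshift; the sample rows are hypothetical. The aggregates are defined once and their aliases are reused in the outer SELECT, which is the same idea as the derived-table (x) version above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_tlog (store_id INTEGER, txn_id INTEGER, sales REAL)")
conn.executemany(
    "INSERT INTO my_tlog VALUES (?, ?, ?)",
    [(1, 100, 5.0), (1, 100, 3.0), (1, 101, 4.0), (2, 200, 10.0)],
)

# Define the aggregates once in a CTE, then reuse their aliases outside it.
# sales is REAL here; with integer columns, some engines would perform
# integer division in total_sales / d_txns.
rows = conn.execute(
    """
    WITH per_store AS (
        SELECT store_id,
               SUM(sales) AS total_sales,
               COUNT(DISTINCT txn_id) AS d_txns
        FROM my_tlog
        GROUP BY store_id
    )
    SELECT store_id, total_sales, d_txns, total_sales / d_txns AS avg_basket
    FROM per_store
    ORDER BY store_id
    """
).fetchall()
print(rows)
```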
I would like to query a DB2 table and get all the rows returned by the query, plus the total number of rows returned, in a separate column.
E.g., if the table contains columns 'id' and 'user_id', assuming 100 rows, the result of the query would appear in this format: (id) | (user_id) | 100.
I do not wish to use a GROUP BY clause in the query (just in case you are confused about what I am asking). Also, I could not find an example here: http://mysite.verizon.net/Graeme_Birchall/cookbook/DB2V97CK.PDF.
Also, if there is a more efficient way of getting both of these results (values + count), I would welcome any ideas. My environment uses Zend Framework 1.x, which does not have an ODBC adapter for DB2. (See issue http://framework.zend.com/issues/browse/ZF-905.)
If I understand what you are asking for, then the answer should be
select t.*, g.tally
from mytable t,
(select count(*) as tally
from mytable
) as g;
If this is not what you want, then please give an actual example of desired output, supposing there are 3 to 5 records, so that we can see exactly what you want.
You would use window/analytic functions for this:
select t.*, count(*) over() as NumRows
from table t;
This will work for whatever kind of query you have.
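Both answers can be compared on the same data. This sketch uses Python's sqlite3 module instead of DB2 (window functions need SQLite 3.25 or later), with a hypothetical three-row mytable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

# Window-function version: the empty OVER () covers the whole result.
window_rows = conn.execute(
    "SELECT t.*, COUNT(*) OVER () AS num_rows FROM mytable t ORDER BY id"
).fetchall()

# Cross-join version from the first answer: a one-row derived table g.
cross_rows = conn.execute(
    """
    SELECT t.*, g.tally
    FROM mytable t, (SELECT COUNT(*) AS tally FROM mytable) AS g
    ORDER BY id
    """
).fetchall()

print(window_rows)
print(cross_rows)
```

The two forms are not always interchangeable: with a WHERE clause, COUNT(*) OVER () counts only the rows that survive the filter, while the derived table g counts the whole table unless the same filter is repeated inside it.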
Why are column ordinals legal for ORDER BY but not for GROUP BY? That is, can anyone tell me why this query
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY OrgUnitID
cannot be written as
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY 1
When it's perfectly legal to write a query like
SELECT OrgUnitID FROM Employee AS e ORDER BY 1
?
I'm really wondering if there's something subtle about the relational calculus, or something, that would prevent the grouping from working right.
The thing is, my example is pretty trivial. It's common that the column that I want to group by is actually a calculation, and having to repeat the exact same calculation in the GROUP BY is (a) annoying and (b) makes errors during maintenance much more likely. Here's a simple example:
SELECT DATEPART(YEAR,LastSeenOn), COUNT(*)
FROM Employee AS e
GROUP BY DATEPART(YEAR,LastSeenOn)
I would think that SQL's normalization rule of representing data only once in the database ought to extend to code as well. I'd want to write that calculation expression only once (in the SELECT column list) and be able to refer to it by ordinal in the GROUP BY.
Clarification: I'm specifically working on SQL Server 2008, but I wonder about an overall answer nonetheless.
One reason is that ORDER BY is the last clause processed in a SQL query. Here is the logical order of operations:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
So by the time ORDER BY runs, the columns of the SELECT clause already exist, and you can refer to them by ordinal position.
EDIT: added this based on the comment.
Take this for example:
create table test (a int, b int)
insert test values(1,2)
go
The query below parses without a problem, but it won't run:
select a as b, b as a
from test
order by 6
Here is the error:
Msg 108, Level 16, State 1, Line 3
The ORDER BY position number 6 is out of range of the number of items in the select list.
This also parses fine:
select a as b, b as a
from test
group by 1
But it blows up with this error:
Msg 164, Level 15, State 1, Line 3
Each GROUP BY expression must contain at least one column that is not an outer reference.
There are a lot of elementary inconsistencies in SQL, and the use of scalars is one of them. For example, anyone might expect
select * from countries
order by 1
and
select * from countries
order by 1.00001
to be similar queries (the difference between the two can be made infinitesimally small, after all), but they are not.
I'm not sure if the standard specifies if it is valid, but I believe it is implementation-dependent. I just tried your first example with one SQL engine, and it worked fine.
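Use aliases (this only works in databases that accept a SELECT-list alias in GROUP BY, which SQL Server does not; keep the alias unquoted, because GROUP BY 'seen_year' would group by a string constant):
SELECT DATEPART(YEAR,LastSeenOn) AS seen_year, COUNT(*) AS count
FROM Employee AS e
GROUP BY seen_year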
** EDIT **
If grouping by an alias is not allowed in your database, here's a workaround:
SELECT seen_year
, COUNT(*) AS Total
FROM (
SELECT DATEPART(YEAR,LastSeenOn) as seen_year, *
FROM Employee AS e
) AS inline_view
GROUP BY seen_year
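A runnable sketch of that inline-view workaround using Python's sqlite3 module; strftime('%Y', ...) stands in for SQL Server's DATEPART(YEAR, ...), and the sample Employee rows are hypothetical. The year expression is written only once, in the inner query, and the outer query groups by its alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER, LastSeenOn TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, "2022-03-01"), (2, "2022-11-15"), (3, "2023-06-30")],
)

# The computed column is defined once inside inline_view; the outer
# GROUP BY can then refer to it by its plain column name.
rows = conn.execute(
    """
    SELECT seen_year, COUNT(*) AS Total
    FROM (
        SELECT strftime('%Y', LastSeenOn) AS seen_year, *
        FROM Employee AS e
    ) AS inline_view
    GROUP BY seen_year
    ORDER BY seen_year
    """
).fetchall()
print(rows)
```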
Databases that don't support this are basically choosing not to. I understand the order of processing of the various steps, but it is very easy (as many databases have shown) to parse the SQL, understand it, and apply the translation for you. Where it's really a pain is when a column is a long CASE expression: having to repeat that in the GROUP BY clause is super annoying. Yes, you can use the nested-query workaround someone demonstrated above, but at this point, not supporting GROUP BY column numbers is simply a lack of care for users.