I have more experience using Access, where I would build up my analysis in small parts and query each new view.
I'm now trying to do something that must be simple in SQL.
If I have a query of the format:
SELECT events.people, COUNT(events.eventType) AS usersAndEventCount
FROM theDB.events
WHERE event_id = 884
GROUP BY people
And I then want to query usersAndEventCount
like so:
Select usersAndEventCount.people, usersAndEventCount.events
FROM [from where actually?]
Tried from:
usersAndEventCount;
events
theDB.events
This must seem very basic to SQL users on SO, but in my mind it's much easier to break down the larger query into these subqueries.
How would I query usersAndEventCount in the same query?
Your statement "then want to query usersAndEventCount" does not make sense, because usersAndEventCount is a column - at least in your first example. You cannot "query" a column.
But from the example you have given it seems you want something like this:
Select usersAndEventCount.people, usersAndEventCount.events
FROM (
SELECT events.people,
COUNT(events.eventType) AS events
FROM theDB.events
WHERE event_id = 884
GROUP BY people
) as usersAndEventCount
This is called a "derived table" in SQL.
In pure SQL, you can use nested queries (AKA sub-queries). Just enclose your first query in () brackets, so your query will look like this:
Select usersAndEventCount.people, usersAndEventCount.events
FROM (SELECT events.people, COUNT(events.eventType) AS events
FROM theDB.events
WHERE event_id = 884
GROUP BY people) usersAndEventCount
Alternatively, to save the first query and reuse it in several places like you were doing in Access, you can save it as a View or Stored Procedure depending on the database system you're using. If you want an example, let me know the database system you're using.
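In most databases the view version would look roughly like this (a sketch only; exact syntax and naming rules vary by system):
CREATE VIEW usersAndEventCount AS
SELECT events.people,
       COUNT(events.eventType) AS events
FROM theDB.events
WHERE event_id = 884
GROUP BY people;
-- the saved view can then be queried like any table:
SELECT people, events
FROM usersAndEventCount;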
I am not 100% sure, because I cannot test it at the moment, but I think it should work.
uaec is the alias for the subquery. You can use this alias in the main query:
Select uaec.people, uaec.usersAndEventCount
FROM (SELECT events.people, COUNT(events.eventType) AS usersAndEventCount
FROM theDB.events
WHERE event_id = 884
GROUP BY people) uaec
I have finally succeeded in summing the results of two SQL SUM queries. One small step for this guy :)
My question relates to the last character in the code (Z):
SELECT SUM(hr)
FROM (SELECT SUM(amount) AS hr
FROM Try_again.dbo.tuesday_practice_database
WHERE account_name LIKE 'concessions'
UNION
SELECT SUM(amount) AS hr
FROM Try_again.dbo.tuesday_practice_database
WHERE account_name LIKE 'salaries')Z
The code won't work unless there is something following the closing parenthesis.
I'm just curious why that is, and why there needs to be a character following the ).
It seems that it doesn't matter what the character is, the code will work. But the code breaks if I leave it blank.
Thank you
SELECT sum(hr)
FROM
(
Select sum(amount) as hr
from Try_again.dbo.tuesday_practice_database
where account_name like 'concessions'
union
Select sum(amount) as hr
from Try_again.dbo.tuesday_practice_database
where account_name like 'salaries'
) z
To understand better, look at the above version of your query. It's the same code, just reformatted to help illustrate what is happening. If you notice, the parent FROM clause retrieves data from a sub-query. In this context, SQL requires the subquery to have a name of some kind, and so the z is added as an alias to meet that requirement. You could put anything you wanted there, but since the name doesn't matter to us a single-letter placeholder is fine.
Just like the following query:
select * from table1 as z
By the way, you didn't use a wildcard in your LIKE clause! I think you should rewrite the query like below:
select sum(hr) from
(
Select sum(amount) as hr from Try_again.dbo.tuesday_practice_database
where account_name like '%concessions%'
union all
Select sum(amount) as hr from Try_again.dbo.tuesday_practice_database
where account_name like '%salaries%'
) AS z
If you don't want to use wildcards, then you should avoid using LIKE. Use IN instead and rewrite it as this simpler query:
Select sum(amount) from Try_again.dbo.tuesday_practice_database
where account_name in('concessions', 'salaries')
Just do this:
SELECT SUM(amount) as hr
FROM Try_again.dbo.tuesday_practice_database
WHERE account_name IN ('concessions', 'salaries')
The existing code has a bug, where it will consolidate your two SUM()s if they are both the same amount. You could also fix this with UNION ALL instead of just UNION, but since the LIKE conditions don't have any placeholders like % or _ we can do better and simplify this whole thing down to just one IN() condition, with no need for any subqueries.
Now, if you wanted to allow more variance in your matches ( ie LIKE '%concessions%' and LIKE '%salaries%'), then the UNION ALL is more helpful... just be warned that leading % placeholders in a LIKE condition are very bad for performance, and should be avoided when possible. Often this means changing the schema in some way, such as adding a table named something like AccountCategories that group each account into a specific category that you can target exactly in your query.
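As a sketch of that schema idea (the AccountCategories table and its columns are hypothetical, not something in your current database):
CREATE TABLE AccountCategories (
    account_name varchar(100) PRIMARY KEY,   -- one row per account
    category     varchar(50)  NOT NULL       -- e.g. 'concessions', 'salaries'
);
SELECT SUM(t.amount) AS hr
FROM Try_again.dbo.tuesday_practice_database t
JOIN AccountCategories c ON c.account_name = t.account_name
WHERE c.category IN ('concessions', 'salaries');
This lets the query match exact category values instead of relying on leading-% LIKE patterns.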
But all of that side-steps the actual question: what is that z character?
In this case, it's an alias. You're using a subquery to union the two smaller queries together. In certain contexts, the SQL language requires subqueries to have a name. This includes using the subquery (derived table) as the target of a FROM, JOIN, or APPLY expression. You can use any name you want — it doesn't matter to the functioning of the query, since it's never referenced again — and so a simple single-letter placeholder, like z, is good enough.
In other PostgreSQL-based DBMSes (e.g., Netezza) I can do something like this without errors:
select store_id
,sum(sales) as total_sales
,count(distinct(txn_id)) as d_txns
,total_sales/d_txns as avg_basket
from my_tlog
group by 1
I.e., I can use aggregate values within the same SQL query that defined them.
However, when I go to do the same sort of thing on Amazon Redshift, I get the error "Column total_sales does not exist..." Which it doesn't, that's correct; it's not really a column. But is there a way to preserve this idiom, rather than restructuring the query? I ask because there would be a lot of code to change.
Thanks.
You simply need to repeat the expressions (or use a subquery or CTE):
select store_id,
sum(sales) as total_sales,
count(distinct txn_id) as d_txns,
sum(sales)/count(distinct txn_id) as avg_basket
from my_tlog
group by store_id;
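For reference, the CTE alternative mentioned above might look like this (a sketch; Redshift supports the WITH clause):
with totals as (
    select store_id,
           sum(sales) as total_sales,
           count(distinct txn_id) as d_txns
    from my_tlog
    group by store_id
)
select store_id,
       total_sales,
       d_txns,
       total_sales / d_txns as avg_basket
from totals;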
Most databases do not support the re-use of column aliases in the select. The reason is twofold (at least):
The designers of the database engine do not want to specify the order of processing of expressions in the select.
There is ambiguity when a column alias is also a valid column in a table in the from clause.
Personally I love this construct in Netezza. It is compact and the syntax is not ambiguous: any 'duplicate' column name will default to the (new) alias in the current query, and if you need to reference the column of the underlying table, simply put the table name in front of the column. The above example would become:
select store_id
,sum(sales) as sales ---- duplicate name
,count(distinct(txn_id)) as d_txns
,my_tlog.sales/d_txns as avg_basket --- this illustrates but may not make sense
from my_tlog
group by 1
I recently moved away from SQL Server, and on that database I used a construct like this to avoid repeating the expressions:
Select *, total_sales/d_txns as avg_basket
From (
select store_id
,sum(sales) as total_sales
,count(distinct(txn_id)) as d_txns
from my_tlog
group by 1
)x
Most (if not all) databases will support this construct, and have done so for 10 years or more.
I'm not particularly familiar with SQL, but my team asked me to take a look at this series of SQL statements and see if it is possible to reduce it down to just one or two. I looked at it and I don't believe so, but I don't quite have the experience or knowledge of the tricks of SQL.
So all of the statements have pretty much the same format
select
id, count(*)
from
table
where
x_date between to_date(start_date) and to_date(end_date)
group by
id
where x_date is the only thing that changes. (Start_date and end_date are just what I typed here to make it a bit more readable). There are 10 statements total, 7 of which are exactly this format.
Of the 3 different ones, one of them looks like this:
select
id, count(*)
from
table
where
x_date between to_date(start_date) and to_date(end_date)
and userid not like 'AUTOPEND'
group by
id
and the other 2 look like this:
select
id, count(*)
from
table
where
x_date between to_date(start_date) and to_date(end_date)
group by
id, x_code
Where x_code differs between them.
They want to use this data for statistical analysis, but they insist on manually using a query and typing it in. The way I see it is that I can't really combine these statements because they are all grouping by the same field (except the last 2), so it all gets combined in the results, making the results useless for the analysis.
Am I thinking about it right, or is there some way to do like they asked? Can I make a 1 or 2 sql statements output more than 1 table each?
Oh, I almost forgot: I believe this is Oracle PL/SQL using SQL Developer.
You are trying to get multiple aggregates with different grouping sets in a single query. This is called a ROLLUP or a CUBE. There are a few ways to solve your specific problem, but the extended grouping functions are the right tool for the job. Going forward, it will be more maintainable and faster.
http://www.oracle-base.com/articles/misc/rollup-cube-grouping-functions-and-grouping-sets.php
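For example, queries that share the same WHERE clause but group differently - one by id, one by id and an x_code column - could be collapsed with grouping sets roughly like this (a sketch using the placeholder names from the question):
select id, x_code, count(*)
from table
where x_date between to_date(start_date) and to_date(end_date)
group by grouping sets ((id), (id, x_code))
Rows that come from the id-only grouping set will show NULL for x_code; the GROUPING() function can distinguish them if needed.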
Since the first and second examples are grouping by the same thing, you can use a CASE statement nested in an aggregate function:
SELECT id, COUNT(*),
       SUM( CASE WHEN userid not like 'AUTOPEND' THEN 1 ELSE 0 END) AS NotAUTOPEND
FROM table
WHERE x_date between to_date(start_date) and to_date(end_date)
GROUP BY id
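If the seven queries really do differ only in which date column is filtered, the same trick can fold them into one pass as well. A sketch, using hypothetical column names x_date1 and x_date2 for two of those date columns:
SELECT id,
       COUNT(CASE WHEN x_date1 BETWEEN to_date(start_date) AND to_date(end_date) THEN 1 END) AS cnt_date1,
       COUNT(CASE WHEN x_date2 BETWEEN to_date(start_date) AND to_date(end_date) THEN 1 END) AS cnt_date2
FROM table
GROUP BY id
Note that the date filter moves out of the WHERE clause, so this reads every row of the table; whether that is acceptable depends on its size.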
I'm writing a function in ColdFusion that returns the first couple of records that match the user's input, as well as the total count of matching records in the entire database. The function will be used to feed an autocomplete, so speed/efficiency are its top concerns. For example, if the function receives input "bl", it might return {sampleMatches:["blue", "blade", "blunt"], totalMatches:5000}
I attempted to do this in a single query for speed purposes, and ended up with something that looked like this:
select record, count(*) over ()
from table
where criteria like :criteria
and rownum <= :desiredCount
The problem with this solution is that count(*) over () always returns the value of :desiredCount. I saw a similar question to mine here, but my app will not have permissions to create a temp table. So is there a way to solve my problem in one query? Is there a better way to solve it? Thanks!
I'm writing this off the top of my head, so you should definitely time it, but I believe that the following CTE
only requires you to write the conditions once
only returns the number of records you specify
has the correct total count added to each record
and is evaluated only once
SQL Statement
WITH q AS (
SELECT record
FROM table
WHERE criteria like :criteria
)
SELECT q1.*, q2.*
FROM q q1
CROSS JOIN (
SELECT COUNT(*) FROM q
) q2
WHERE rownum <= :desiredCount
A nested subquery should return the results you want:
select record, cnt
from (select record, count(*) over () cnt
from table
where criteria like :criteria)
where rownum <= :desiredCount
This will, however, force Oracle to completely process the query in order to generate the accurate count. This seems unlikely to be what you want if you're trying to do an autocomplete particularly when Oracle may decide that it would be more efficient to do a table scan on table if :criteria is just b since that predicate isn't selective enough. Are you really sure that you need a completely accurate count of the number of results? Are you sure that your table is small enough/ your system is fast enough/ your predicates are selective enough for that to be a requirement that you could realistically meet? Would it be possible to return a less-expensive (but less-accurate) estimate of the number of rows? Or to limit the count to something smaller (say, 100) and have the UI display something like "and 100+ more results"?
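If a capped count is acceptable, the "100+ more results" idea could be implemented roughly like this (not part of the original answer, just a sketch building on the query above, with 101 as an arbitrary cap):
select record, cnt
from (select record, count(*) over () cnt
      from (select record
            from table
            where criteria like :criteria
            and rownum <= 101))   -- stop counting after 101 matches
where rownum <= :desiredCount
If cnt comes back as 101, the UI can display "100+ results" instead of an exact number.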
You cannot (should not) put non-aggregates in the SELECT line of a GROUP BY query.
I would, however, like to access one of the non-aggregates associated with the max. In plain English, I want a table with the id of the oldest row of each kind.
CREATE TABLE stuff (
id int,
kind int,
age int
);
This query gives me the information I'm after:
SELECT kind, MAX(age)
FROM stuff
GROUP BY kind;
But it's not in the most useful form. I really want the id associated with each row so I can use it in later queries.
I'm looking for something like this:
SELECT id, kind, MAX(age)
FROM stuff
GROUP BY kind;
That outputs the same thing as this (working, but join-based) query:
SELECT stuff.*
FROM
stuff,
( SELECT kind, MAX(age)
FROM stuff
GROUP BY kind) maxes
WHERE
stuff.kind = maxes.kind AND
stuff.age = maxes.age
It really seems like there should be a way to get this information without needing to join. I just need the SQL engine to remember the other columns when it's calculating the max.
You can't get the id of the row that MAX found, because there might be more than one id with the maximum age.
You cannot (should not) put non-aggregates in the SELECT line of a GROUP BY query.
You can, and have to, define what you are grouping by for the aggregate function to return the correct result.
MySQL (and SQLite) decided in their infinite wisdom to go against the spec and accept GROUP BY clauses that omit columns listed in the SELECT - this effectively makes such queries non-portable.
It really seems like there should be a way to get this information without needing to join.
Without access to the analytic/ranking/windowing functions that MySQL doesn't support, the self join to a derived table/inline view is the most portable means of getting the result you desire.
I think it's indeed tempting to ask the system to solve the problem in one pass rather than having to do the job twice (find the max, and then find the corresponding id). You can do it using CONCAT (as suggested in the article Naktibalda referred to); I'm not sure it would be more efficient.
SELECT MAX( CONCAT( LPAD(age, 10, '0'), '-', id) )
FROM stuff
GROUP BY kind;
It should work; you have to split the result to get the age and the id.
(That's really ugly though)
In recent databases you can use analytic functions like max() over (partition by ...) to solve this problem:
select id, kind, age as max_age from (
select id, kind, age, max(age) over (partition by kind) as mage
from table)
where age = mage
This can then be done in a single pass.
PostgreSQL's DISTINCT ON will be useful here.
SELECT DISTINCT ON (kind) kind, id, age
FROM stuff
ORDER BY kind, age DESC;
This groups by kind and returns the first row of each group in the given order. As we have ordered by age in descending order, we get the row with the max age for each kind.
P.S. The columns in DISTINCT ON must appear first in the ORDER BY.
You have to have a join because the aggregate function MAX retrieves many rows and chooses the max.
So you need a join to pick the one row that the aggregate function has found.
To put it a different way: how would you expect the query to behave if you replaced MAX with SUM?
An inner join might be more efficient than your subquery, though.
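For illustration, the question's own workaround written as an explicit inner join (functionally the same; max_age is just an alias added so the join condition can reference it):
SELECT stuff.*
FROM stuff
INNER JOIN ( SELECT kind, MAX(age) AS max_age
             FROM stuff
             GROUP BY kind ) maxes
        ON stuff.kind = maxes.kind
       AND stuff.age = maxes.max_age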