Select of calculated value always returns row - sql

I have a database (running on postgres 9.3) of bookings of resources. This database contains a table reservations which contains, among other values, the start and stop time of each reservation (as timestamp with time zone).
Now I need to know how many hours of reservations a given company has that are still active or lie in the future, i.e. the total duration of all these reservations added together.
I have put together the following query that does the job:
SELECT EXTRACT(EPOCH FROM Sum(stop-start))/3600 AS total
FROM (reservations JOIN partners ON partner = email)
WHERE stop > now() AND company = 'givencompany'
This works well when the given company has reservations in the future. The problem is that when the company doesn't have any reservations, the query still returns a row, but the column total is empty. I would like it to return no row at all in that case (or a row containing 0, if returning nothing is too complicated).
Is this possible to accomplish with a different SELECT or another modification to the database, or does the consuming application have to check for null every time?
Sorry if my question is trivial, but I am very new to databases altogether.
Edit
I found out that I can default the returned value to 0 by using COALESCE, but I would much prefer it if no row were returned.

Short answer: just add HAVING Sum(stop-start) IS NOT NULL at the end of the query.
Long answer:
This query has no explicit GROUP BY, but since it aggregates the rows with sum(), it's implicitly turned into a GROUP BY query, with all the rows matching the WHERE condition taken as one group.
See the doc on SELECT :
without GROUP BY, an aggregate produces a single value computed across
all the selected rows
And about the HAVING clause:
The presence of HAVING turns a query into a grouped query even if
there is no GROUP BY clause. This is the same as what happens when the
query contains aggregate functions but no GROUP BY clause. All the
selected rows are considered to form a single group, and the SELECT
list and HAVING clause can only reference table columns from within
aggregate functions. Such a query will emit a single row if the HAVING
condition is true, zero rows if it is not true.
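The single-group behaviour is easy to observe in a scratch database. The following is a minimal sketch using Python's sqlite3 rather than Postgres, with a hypothetical, simplified reservations table (plain hour numbers instead of timestamps, no partners join). Because some SQLite versions reject HAVING without GROUP BY, the last query filters the NULL total in an outer SELECT instead; that is a stand-in for the HAVING clause, not the same statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified schema: start/stop are hour numbers, not timestamps.
conn.execute("CREATE TABLE reservations (company TEXT, start_h REAL, stop_h REAL)")
conn.execute("INSERT INTO reservations VALUES ('acme', 10, 12.5)")

plain_sql = (
    "SELECT SUM(stop_h - start_h) AS total "
    "FROM reservations WHERE company = ?"
)
coalesce_sql = (
    "SELECT COALESCE(SUM(stop_h - start_h), 0) AS total "
    "FROM reservations WHERE company = ?"
)
# Outer filter used here in place of HAVING SUM(...) IS NOT NULL.
filtered_sql = (
    "SELECT total FROM ("
    "  SELECT SUM(stop_h - start_h) AS total "
    "  FROM reservations WHERE company = ?"
    ") WHERE total IS NOT NULL"
)

# No matching rows: the aggregate still produces one row, holding NULL.
no_match = conn.execute(plain_sql, ("nobody",)).fetchall()
# COALESCE keeps the single row but replaces NULL with 0.
coalesced = conn.execute(coalesce_sql, ("nobody",)).fetchall()
# Filtering on the aggregate result removes the row entirely.
filtered_empty = conn.execute(filtered_sql, ("nobody",)).fetchall()
filtered_match = conn.execute(filtered_sql, ("acme",)).fetchall()

print(no_match, coalesced, filtered_empty, filtered_match)
```

With a company that has no reservations, the first query yields a single row containing NULL, the COALESCE variant yields a single row containing 0, and the filtered variant yields no rows.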

SQL Server 2005 - exclude rows with consecutive duplicate values in 1 field

I have a source table with 2 fields, a date, and a status code. I need a query to remove duplicate consecutive status codes, keeping only the row with the first date of a different status. For example:
Date Status
10/02/2004 A
10/12/2004 B
10/14/2004 B
11/22/2004 C
11/23/2004 C
12/03/2004 C
03/05/2006 B
The desired result set would be:
10/02/2004 A
10/12/2004 B
11/22/2004 C
03/05/2006 B
The main problem is that all the grouping functions (GROUP BY and ROW_NUMBER() OVER) don't seem to care about order, so in the example, all the "B" status records would be grouped together, which is incorrect, since the status changes from non-"B" to "B" two different times.
This problem is easy to solve using a cursor based loop to produce the result. Just remember the current value in a variable, and test each record as you loop. That works perfectly, but is dreadfully slow (over 20 minutes on real data).
This needs to run on SQL Server 2005 and later, so some newer windowing functions are not available. Is there a way to do this using a set-based query, that would presumably run much faster? It seems like it should be a simple thing to do, but maybe not. Other similar questions on SO seem to rely on additional ID or Sequence fields that we do not have available.
The reason regular grouping doesn't help in this situation is that the grouping criterion needs to reference fields in 2 different records to determine whether a group break should occur. Since SQL 2005 lags behind the newer versions, we don't have a LAG function to look at the prior record's value. Instead, we need to do a self join to get access to the prior record. To do that, we create a temporary sequence field in a CTE using ROW_NUMBER(), then use that generated sequence in the self join to look at the prior record. We end up with something like:
;WITH tmp AS (
SELECT myDate,myStatus,ROW_NUMBER() OVER (ORDER BY myDate) as seq
FROM myTable )
SELECT tmp.* FROM tmp LEFT JOIN tmp t2 ON t2.seq = tmp.seq-1
WHERE t2.seq is null OR t2.myStatus!=tmp.myStatus
So, even though the original data doesn't have a sequence column, we can generate it on the fly in order to be able to find the prior record (if any) for any given other record using the self join. Then we get the desired result of selecting only the records where the status has changed from the prior record.
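To check the self-join against the sample data from the question, here is a small sketch that runs the same query shape through Python's sqlite3 (an assumption for convenience; the answer targets SQL Server 2005). The dates are stored as ISO strings so that they sort chronologically, and the aliases cur and prev are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (myDate TEXT, myStatus TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [
        ("2004-10-02", "A"),
        ("2004-10-12", "B"),
        ("2004-10-14", "B"),
        ("2004-11-22", "C"),
        ("2004-11-23", "C"),
        ("2004-12-03", "C"),
        ("2006-03-05", "B"),
    ],
)

query = """
WITH tmp AS (
    -- Generate the missing sequence column on the fly.
    SELECT myDate, myStatus, ROW_NUMBER() OVER (ORDER BY myDate) AS seq
    FROM myTable
)
SELECT cur.myDate, cur.myStatus
FROM tmp AS cur
LEFT JOIN tmp AS prev ON prev.seq = cur.seq - 1
-- Keep the first row, and every row whose status differs from the prior row.
WHERE prev.seq IS NULL OR prev.myStatus != cur.myStatus
ORDER BY cur.seq
"""

changes = conn.execute(query).fetchall()
for row in changes:
    print(row)
```

The second run of "B" survives because each row is compared only with its immediate predecessor, not with every other row that has the same status.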

In an SQL query, is "TOP 1" a reliable substitute for aggregate functions such as MIN() or MAX()..?

Is "TOP 1" a reliable substitute for aggregate functions such as MIN() and MAX()..? Shown below is a basic query in Access 2007, to determine the first time a customer ordered a certain product during a certain month. The DBMS system is probably irrelevant, since this question could apply to any system.
In this query, "TOP 1" is used in combination with ORDER BY on the date field. This returns one record, that being the oldest by date. But...is this ok..? What can go wrong..? Is there a better way..?
SELECT TOP 1 DAACCT, DAITEM, DAQTY, DAIDAT
FROM fqlOrdersGrandHistory
WHERE DAACCT="T7414" AND DAITEM="45234" AND (DAIDAT>=20170501 AND DAIDAT<=20170531)
ORDER BY DAIDAT;
Reliable is a subjective term. Yes, TOP 1 will give you a result, as will MAX() or MIN(). It depends on what you are after.
If you look for a specific user only (as you appear to be in this case) and sort by DATE ascending and use TOP 1, you will get all of the details for that one record. However, if you are looking for the first purchase of every user in the table, then TOP 1 will only give you the info for the very first person who made an order.
On the other hand, if you use SELECT DAACCT, MIN(DAIDAT) FROM table GROUP BY DAACCT then you will get the earliest purchase date for each user. This assumes you are storing DAIDAT as a date format with a time component, not just the date value itself. If you store only the date, several orders can share the same earliest date, and you open yourself up to multiple possible records.
TL;DR: If you stick with the concept of the query 1) looking for a very specific user for 2) a very specific product and 3) your dates are stored as proper dates, TOP 1 will be better to use than an aggregate function. If one of these three conditions is not met, reevaluate.
TOP is used to limit the fetched rows, and yes, it's fine unless you have multiple records with the same data. I'm not sure how it relates to Min() or Max(): you use the Min() or Max() aggregate functions when you are grouping the rows using Group By. Even if you don't specify a Group By, grouping happens on the entire result set.
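The difference between the two approaches can be seen on a few rows. This is a rough sketch using Python's sqlite3, where LIMIT 1 stands in for Access's TOP 1 (SQLite has no TOP); the orders table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for fqlOrdersGrandHistory, dates as yyyymmdd integers.
conn.execute(
    "CREATE TABLE orders (DAACCT TEXT, DAITEM TEXT, DAQTY INTEGER, DAIDAT INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("T7414", "45234", 5, 20170512),
        ("T7414", "45234", 2, 20170503),
        ("A1000", "45234", 1, 20170520),
        ("A1000", "45234", 7, 20170509),
    ],
)

# One account: ORDER BY + LIMIT 1 returns the whole earliest row.
first_for_account = conn.execute(
    "SELECT DAACCT, DAITEM, DAQTY, DAIDAT FROM orders "
    "WHERE DAACCT = 'T7414' ORDER BY DAIDAT LIMIT 1"
).fetchall()

# All accounts: LIMIT 1 still returns only one row in total.
first_overall = conn.execute(
    "SELECT DAACCT, DAITEM, DAQTY, DAIDAT FROM orders ORDER BY DAIDAT LIMIT 1"
).fetchall()

# All accounts: MIN with GROUP BY returns the earliest date per account.
first_per_account = conn.execute(
    "SELECT DAACCT, MIN(DAIDAT) FROM orders GROUP BY DAACCT ORDER BY DAACCT"
).fetchall()

print(first_for_account)
print(first_overall)
print(first_per_account)
```

The grouped query gives one row per account but only the columns that are grouped or aggregated, whereas the LIMIT 1 query gives every column of a single row.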
It is OK if either one field or a combination of fields of the selected fields is unique.
If not, the result set will contain all the records where the field or combination match. To avoid this, always include, say, an autonumber field in the selected fields.

Postgres - Group by without having to aggregate?

This is hard to explain, but say I have this query:
SELECT *
FROM "late_fee_tiers"
And it returns this:
I have a validation in code set up to prevent duplicate days from being saved (notice there are 2 rows of days = 2).
I want my query to double-check there are only unique rows of day, and if there are multiple, select the first one (so it should return 3 rows with 2,3,5).
My first thought is to use GROUP BY day, while selecting a MIN("id").
The problem is, I don't understand SQL enough, because it forces me to add different aggregator functions to every single column... but what if I don't want to do that? I want THAT row to be "chosen" according to the single aggregator function I define, I don't need multiple aggregators creating some weird hybrid row. I just want the MIN() function to choose that 1 row and fill in all the rest of the values for that row.
What function do I use to do this, or how would I do it?
Thanks
You want to use DISTINCT ON:
select distinct on (day) *
from "late_fee_tiers"
order by day, id;
Why day is also required in the order by:
From the official documentation:
The DISTINCT ON expression(s) must match the leftmost ORDER BY
expression(s). The ORDER BY clause will normally contain additional
expression(s) that determine the desired precedence of rows within
each DISTINCT ON group.
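DISTINCT ON is specific to Postgres. For comparison, here is a sketch of a portable query that picks the same rows (the one with the lowest id per day), run through Python's sqlite3. It is a different technique from DISTINCT ON, and the amount column and the sample rows are assumptions, since the question's actual table contents are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; only "id" and "day" are named in the question.
conn.execute("CREATE TABLE late_fee_tiers (id INTEGER PRIMARY KEY, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO late_fee_tiers VALUES (?, ?, ?)",
    [(1, 2, 10), (2, 2, 12), (3, 3, 15), (4, 5, 20)],
)

# Keep, for each day, the complete row whose id is the smallest in that day.
query = """
SELECT id, day, amount
FROM late_fee_tiers
WHERE id IN (SELECT MIN(id) FROM late_fee_tiers GROUP BY day)
ORDER BY day
"""

tiers = conn.execute(query).fetchall()
for row in tiers:
    print(row)
```

The subquery does the grouping, and the outer query returns whole rows, so no "hybrid" row built from several aggregates can appear.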

Counting results in SQLite, given query with functions

As you may (or may not) already know, SQLite does not provide information about the total number of results from a query. One has to wrap the query in SELECT count(*) FROM (original query); in order to get the row count.
This worked perfectly fine for me, until one of the users created a custom SQL function (you can define your own functions in SQLite) that does an INSERT into another, unrelated table. Then he executed this query:
SELECT customFunction() FROM primaryTable WHERE primaryKeyColumnId = 1;
The query always returns 1 row, that is certain. It turns out that customFunction() was called twice (and inserted 2 rows into that other table), because my application ran his query as usual and then ran count(*) on that query as a follow-up.
How should I approach this problem? How can I execute only the original query and still get a row count from SQLite?
I'm using SQLite (3.13.0) C API.
You either have to remove such function calls from the query, or you cannot get the row count before actually having stepped through all the result rows.
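Counting while stepping can be sketched as follows. This uses Python's sqlite3 module rather than the C API from the question, and a list append stands in for the function's INSERT side effect; both are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE primaryTable (primaryKeyColumnId INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO primaryTable VALUES (1)")

calls = []

def custom_function():
    # Stand-in for the side effect (the INSERT into another table).
    calls.append(1)
    return len(calls)

# Register the Python callable as a zero-argument SQL function.
conn.create_function("customFunction", 0, custom_function)

row_count = 0
cursor = conn.execute(
    "SELECT customFunction() FROM primaryTable WHERE primaryKeyColumnId = 1"
)
# Iterating the cursor steps through the result; count rows as they arrive.
for _row in cursor:
    row_count += 1

print(row_count, len(calls))
```

The query runs only once, so the side effect happens once per result row, but the count is known only after the last row has been fetched.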

SQL - Using MAX in a WHERE clause

Assume value is an int and the following query is valid:
SELECT blah
FROM table
WHERE attribute = value
Though MAX(expression) returns int, the following is not valid:
SELECT blah
FROM table
WHERE attribute = MAX(expression)
Of course the desired effect can be achieved using a subquery, but my question is why SQL was designed this way: is there some reason why this sort of thing is not allowed? Students coming from programming languages where you can always replace a value of some data type by a function call that returns that type find this issue confusing. Is there an explanation one can give them rather than just saying "that's the way it is"?
It's just because of the order of operations of a query:
1. FROM clause
2. WHERE clause
3. GROUP BY clause
4. HAVING clause
5. SELECT clause
6. ORDER BY clause
WHERE just filters the rows returned by FROM. An aggregate function like MAX() can't have a result returned because it hasn't even been applied to anything.
That's also why you can't use aliases defined in the SELECT clause in a WHERE clause, but you can use aliases defined in the FROM clause.
A where clause checks every row to see if it matches the conditions specified.
A max computes a single value from a row set. If you put a max, or any other aggregate function, into a where clause, how can SQL Server figure out which rows the max function can use until the where clause has finished its filtering?
This deals with the order that SQL Server processes commands in. It runs the WHERE clause before a GROUP BY or any aggregate. Since a where clause runs first, SQL Server can't tell if a row will be included in an aggregate until it processes the where. That is what the HAVING clause is for. HAVING runs after the GROUP BY and the WHERE and can include MAX since you have already filtered out the rows you don't want to use. See http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm for a good explanation of the order in which SQL commands run.
Maybe this works:
SELECT blah
FROM table
WHERE attribute = (SELECT MAX(expression) FROM table1)
The WHERE clause is specifically designed to test conditions against raw data (individual rows of the table). However, MAX is an aggregate function over multiple rows of data. Basically, without a sub-select, the WHERE clause knows nothing about any rows in the table except for the current row. So how can you determine the maximum value over a whole bunch of rows when you don't even know what those rows are?
Yes, it's a little bit of a simplification, especially when dealing with joins, but the same principle applies. WHERE is always row-by-row, so that's all it really knows about.
Even if you have a GROUP BY clause, the WHERE clause still only processes one row at a time in the raw data before grouping. It doesn't know the value of a column in any other rows, so it has no way of knowing which row has the maximum value.
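For students, it can help to see both forms fail and succeed side by side. Below is a minimal sketch with Python's sqlite3 (the question is engine-agnostic; the items table and its rows are hypothetical). SQLite rejects the aggregate in WHERE when it prepares the statement, while the subquery form computes the maximum first and then compares row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for "table" in the question.
conn.execute("CREATE TABLE items (blah TEXT, attribute INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("a", 1), ("b", 3), ("c", 3)],
)

# An aggregate directly in WHERE is rejected by the engine.
try:
    conn.execute("SELECT blah FROM items WHERE attribute = MAX(attribute)")
    rejected = False
except sqlite3.Error as exc:
    rejected = True
    print("rejected:", exc)

# The subquery is evaluated over the whole table; WHERE then tests each row
# against that single value.
with_subquery = conn.execute(
    "SELECT blah FROM items "
    "WHERE attribute = (SELECT MAX(attribute) FROM items) "
    "ORDER BY blah"
).fetchall()
print(with_subquery)
```

Note that the subquery form returns every row that ties for the maximum, here both "b" and "c".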
Assuming this is MS SQL Server, the following would work.
SELECT TOP 1 blah
FROM table
ORDER BY expression DESC