What is the meaning of a constant in a SELECT query? - sql

Consider the two queries below:
1)
USE AdventureWorks
GO
SELECT a.ProductID, a.ListPrice
FROM Production.Product a
WHERE EXISTS (SELECT 1 FROM Sales.SalesOrderDetail b
WHERE b.ProductID = a.ProductID)
2)
USE AdventureWorks
GO
SELECT a.ProductID, a.Name, b.SalesOrderID
FROM Production.Product a LEFT OUTER JOIN Sales.SalesOrderDetail b
ON a.ProductID = b.ProductID
ORDER BY 1
My only question is: what is the meaning of the number 1 in these queries? What happens if I change it to 2 or something else?
Thanks for helping

In the first case it does not matter; you can select a 2 or anything, really, because it is an existence query. In general selecting a constant can be used for other things besides existence queries (it just drops the constant into a column in the result set), but existence queries are where you are most likely to encounter a constant.
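You can check that the constant inside EXISTS does not matter by running the first query with different select lists. The sketch below is a minimal version that assumes a tiny, hypothetical copy of the two AdventureWorks tables in an in-memory SQLite database. SQLite has no Production/Sales schemas, so the table names are unqualified. Every variant returns the same products, because EXISTS only asks whether the subquery produces a row.

```python
import sqlite3

# Hypothetical miniature versions of the two AdventureWorks tables (no schemas in SQLite).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ListPrice REAL);
    CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER);
    INSERT INTO Product VALUES (1, 9.99), (2, 19.99), (3, 4.50);
    INSERT INTO SalesOrderDetail VALUES (100, 1), (101, 1), (102, 3);
""")

def products_with_sales(select_list):
    # Only the select list inside EXISTS changes; the outer query stays identical.
    # select_list comes from the fixed tuple below, never from user input.
    sql = f"""
        SELECT a.ProductID, a.ListPrice
        FROM Product a
        WHERE EXISTS (SELECT {select_list} FROM SalesOrderDetail b
                      WHERE b.ProductID = a.ProductID)
        ORDER BY a.ProductID
    """
    return conn.execute(sql).fetchall()

results = {expr: products_with_sales(expr) for expr in ("1", "2", "NULL", "b.SalesOrderID")}
for expr, rows in results.items():
    print(f"SELECT {expr:<15} -> {rows}")
```

Even `SELECT NULL` works: the subquery still yields a row (one containing NULL), and that row is all EXISTS checks for.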
For example, given a table called person containing four columns, id, firstname, lastname, and birthdate, you can write a query like this:
select firstname, 'YAY'
from person
where month(birthdate) = 6;
and this would return something like
firstname 'YAY'
---------------
Ani YAY
Sipho YAY
Hiro YAY
It's not useful, but it is possible. The idea is that in a select statement you select expressions, which can be not only column names but constants and function calls, too. A more likely case is:
select lastname||','||firstname, year(birthdate)
from person;
Here the || is the string concatenation operator, and year is a function I made up.
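Both queries can be run in SQLite with small changes. The sketch below assumes a hypothetical person table with made-up rows. SQLite has neither month() nor year(), so strftime('%m', ...) and strftime('%Y', ...) stand in for them. The constant 'YAY' is copied into every row, and the second query returns computed expressions instead of bare columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, birthdate TEXT);
    INSERT INTO person VALUES
        (1, 'Ani',   'Smith',   '1990-06-14'),
        (2, 'Sipho', 'Dlamini', '1985-06-02'),
        (3, 'Hiro',  'Tanaka',  '2001-06-30'),
        (4, 'Lena',  'Berg',    '1979-11-05');
""")

# A constant in the select list is simply repeated in every result row.
june = conn.execute("""
    SELECT firstname, 'YAY'
    FROM person
    WHERE strftime('%m', birthdate) = '06'
    ORDER BY id
""").fetchall()

# Expressions: || concatenates strings; strftime('%Y', ...) stands in for year().
labels = conn.execute("""
    SELECT lastname || ',' || firstname, CAST(strftime('%Y', birthdate) AS INTEGER)
    FROM person
    ORDER BY id
""").fetchall()

print(june)
print(labels)
```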
The reason you sometimes see 1 in existence queries is this. Suppose you only wanted to know whether there was a person whose name started with 'H', but you didn't care who this person was. You can say
select id
from person
where lastname like 'H%';
but since we don't need the id, you can also say
select 1
from person
where lastname like 'H%';
because all you care about is whether you get a non-empty result set.
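From application code, the "did I get any rows?" check can be written as below. This is a sketch that assumes a hypothetical person table in SQLite and a helper name invented for illustration. It selects the constant 1, adds LIMIT 1 so the engine can stop at the first match, and tests whether a row came back. SQLite's LIKE is case-insensitive for ASCII letters by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    INSERT INTO person VALUES (1, 'Ani', 'Smith'), (2, 'Hiro', 'Hayashi');
""")

def someone_with_lastname_prefix(prefix):
    # Hypothetical helper: select a constant, since only the presence of a row matters.
    # LIMIT 1 lets SQLite stop after the first matching row.
    row = conn.execute(
        "SELECT 1 FROM person WHERE lastname LIKE ? || '%' LIMIT 1", (prefix,)
    ).fetchone()
    return row is not None

print(someone_with_lastname_prefix("H"))  # True
print(someone_with_lastname_prefix("Q"))  # False
```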
In the second case, the 1 is a column number; it means you want your results sorted by the value in the first column. Changing that to a 2 would order by the second column.
By the way, another place where constants are selected is when you are dumping from a relational database into a highly denormalized CSV file that you will be processing in NoSQL-like systems.

In the second case the 1 is not a literal at all. Rather, it is an ordinal number indicating that the result set should be sorted by its first column. If you changed the 1 to a 4, the query would fail with an error because the result set has only three columns.
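The ordinal behaviour can be shown with the second query against hypothetical sample tables in SQLite. ORDER BY 1 sorts by ProductID, ORDER BY 2 sorts by Name, and ORDER BY 4 is rejected because only three columns are selected. SQL Server's error message differs from SQLite's, but both refuse the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER);
    INSERT INTO Product VALUES (1, 'Washer'), (2, 'Bolt'), (3, 'Nut');
    INSERT INTO SalesOrderDetail VALUES (100, 3), (101, 1);
""")

base = """
    SELECT a.ProductID, a.Name, b.SalesOrderID
    FROM Product a LEFT OUTER JOIN SalesOrderDetail b
      ON a.ProductID = b.ProductID
"""

by_first = conn.execute(base + " ORDER BY 1").fetchall()   # sorted by ProductID
by_second = conn.execute(base + " ORDER BY 2").fetchall()  # sorted by Name

try:
    conn.execute(base + " ORDER BY 4").fetchall()  # there is no fourth column
    out_of_range_error = None
except sqlite3.OperationalError as exc:
    out_of_range_error = str(exc)

print(by_first)
print(by_second)
print(out_of_range_error)
```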

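BTW, the reason people use a constant like 1 instead of an actual column is to avoid the I/O of fetching the column value. In an ordinary query this may improve performance. Inside an EXISTS subquery, however, most optimizers ignore the select list anyway, so the choice makes little or no difference there.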

Related

CASE WHEN PostgreSQL only working for one case

I'm writing a query that will go into some golang backend code to autopopulate some billing fields that we have. Basically there's a standard fee and a reduced fee. There's a table of these fees, the ID number, and the effective date.
If the ID number is in the list, I want to just select the fee for that ID number.
If the ID number is not in the list, I want to select the standard fee.
On the statements we get, we are given the MemberID. The MemberID joins with a CorporateID (CorpID) in a CorporateLinks table. If the MemberID is not in the CorporateLinks table at all, I want to select the standard fee.
Here is my query:
SELECT
CASE
WHEN cl.CorpID IN (SELECT CorpID FROM ConversionFactor) THEN MAX(cf.ConversionFactor)
WHEN MemberID NOT IN (SELECT MemberID FROM CorporateLinks) THEN MAX(cf.ConversionFactor)
ELSE MAX(cf.ConversionFactor)
END
FROM ConversionFactor cf
LEFT JOIN CorporateLinks cl
ON cf.CorpID = cl.CorpID
WHERE EffectiveDate = (SELECT EffectiveDate FROM ConversionFactor
WHERE EffectiveDate < $2
ORDER BY EffectiveDate DESC LIMIT 1)
AND MemberID = $1
GROUP BY cl.CorpID, cl.MemberID;
When the MemberID maps to a CorpID and the CorpID is in the list, it returns it perfectly.
When the MemberID is NOT in CorporateLinks, it returns an empty field.
I haven't found a test case where the MemberID is in CorporateLinks but CorpID is not in ConversionFactor (ELSE Case).
I'm not sure where I'm going wrong. I'm not very well versed in using CASE WHEN statements in queries, I've only used them in functions before to perform regex operations.
There are several questionable things about your query.
The parts relevant to the discussion look like:
SELECT CASE WHEN ... IN (SELECT ...)
THEN conversionfactor
WHEN ... NOT IN (SELECT ...)
THEN max(conversionfactor)
ELSE max(conversionfactor)
END
FROM ...
GROUP BY ..., conversionfactor, ...;
Observations:
There can be only a single value of conversionfactor in each group, because that column is part of the GROUP BY clause.
So it makes no sense to write max(conversionfactor) - it is going to be the same as conversionfactor.
The second THEN branch and the ELSE branch both return max(conversionfactor), so the second WHEN clause is superfluous.
Since all three branches return the same value, the whole CASE expression can be replaced with conversionfactor, because that is always going to be the result.
But your actual question is why the CASE expression returns an "empty field".
From the above discussion that would mean that conversionfactor is either an empty string (if it is a string type) or NULL.
Nothing in the query prevents that. You have to examine your data and look for NULL values or empty strings in that column. The CASE expression is useless, but it is not at fault for that.
You want a LEFT JOIN. The default join type, if you don't specify one, is INNER JOIN. With an inner join, a row with no matching entry in ConversionFactor is omitted from the result set completely, rather than just having the relevant column(s) set to NULL. Also, your WHERE clause explicitly filters on MemberId. On rows where the outer join found no match in CorporateLinks, MemberId is NULL, so MemberId = $1 can never be true and those rows are discarded. In effect, the filter turns the LEFT JOIN back into an inner join.
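The WHERE-clause effect can be reproduced with the sketch below. It uses cut-down, hypothetical versions of the two tables in SQLite, with EffectiveDate omitted, and a literal member ID in place of $1. Member M2 has no CorporateLinks row. Filtering on cl.MemberID in WHERE returns nothing. Putting the same condition in the ON clause keeps every ConversionFactor row, with NULL for the unmatched CorporateLinks columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ConversionFactor (CorpID TEXT, ConversionFactor REAL);
    CREATE TABLE CorporateLinks (CorpID TEXT, MemberID TEXT);
    INSERT INTO ConversionFactor VALUES ('C1', 0.8), ('C2', 0.9);
    INSERT INTO CorporateLinks VALUES ('C1', 'M1');  -- M2 has no link at all
""")

filter_in_where = """
    SELECT cf.CorpID, cf.ConversionFactor, cl.MemberID
    FROM ConversionFactor cf
    LEFT JOIN CorporateLinks cl ON cf.CorpID = cl.CorpID
    WHERE cl.MemberID = ?
"""
filter_in_on = """
    SELECT cf.CorpID, cf.ConversionFactor, cl.MemberID
    FROM ConversionFactor cf
    LEFT JOIN CorporateLinks cl ON cf.CorpID = cl.CorpID AND cl.MemberID = ?
    ORDER BY cf.CorpID
"""

# Unlinked member: NULL-extended rows fail "cl.MemberID = ?", so nothing comes back.
where_rows = conn.execute(filter_in_where, ("M2",)).fetchall()
# Same condition in ON: every ConversionFactor row survives, with MemberID NULL.
on_rows = conn.execute(filter_in_on, ("M2",)).fetchall()

print(where_rows)
print(on_rows)
```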
You are also using sub-selects when it's not clear you need them. Once you switch to the LEFT JOIN, see what the output looks like with a simple selection:
SELECT cf.*, cl.*
FROM ConversionFactor cf
LEFT JOIN CorporateLinks cl USING (CorpID)
WHERE cf.EffectiveDate < $2
LIMIT 100 -- Limit for a sanity check to prevent too many results as you debug
Once you can see your whole result set, it should be easier to work out how you want to filter and aggregate later by editing which columns you want returned.
It's quite possible that you don't need the CASE expression at all.

Oracle SQL Developer(4.0.0.12)

First time posting here, I hope it goes well.
I'm trying to write a query in Oracle SQL Developer that returns a customer_ID from one table and the time of the payment from another. I'm pretty sure the problem lies in my logic (it has been a long time since I used SQL, and that was back in school, so I'm a bit rusty). I wanted to list the IDs as DISTINCT and ORDER BY the dates ASCENDING, so that only the first date would show up.
However, the returned table contains the same IDs twice, or even more often in some cases. I even found the same ID and the same DATE a few times while scrolling through it.
If you would like to know more please ask!
SELECT DISTINCT
FIRM.customer.CUSTOMER_ID,
FIRM.account_recharge.X__INSDATE FELTOLTES
FROM
FIRM.customer
INNER JOIN FIRM.account
ON FIRM.customer.CUSTOMER_ID = FIRM.account.CUSTOMER
INNER JOIN FIRM.account_recharge
ON FIRM.account.ACCOUNT_ID = FIRM.account_recharge.ACCOUNT
WHERE
FIRM.account_recharge.X__INSDATE BETWEEN TO_DATE('14-01-01', 'YY-MM-DD') AND TO_DATE('14-12-31', 'YY-MM-DD')
ORDER
BY FELTOLTES
Your query behaves like this because a CUSTOMER_ID really does have more than one X__INSDATE, so each (CUSTOMER_ID, X__INSDATE) pair is a distinct row. If you need only the first date, don't use DISTINCT and ORDER BY. Instead, select MIN(X__INSDATE) and use GROUP BY CUSTOMER_ID.
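Here is a sketch of the MIN plus GROUP BY rewrite, run against hypothetical miniature tables in SQLite. SQLite has no DATE type, so dates are stored as ISO-8601 text, which compares correctly as strings. In Oracle you would keep the TO_DATE calls and the FIRM. prefixes. Each customer now appears once, with the earliest recharge in 2014.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (CUSTOMER_ID INTEGER PRIMARY KEY);
    CREATE TABLE account (ACCOUNT_ID INTEGER PRIMARY KEY, CUSTOMER INTEGER);
    CREATE TABLE account_recharge (ACCOUNT INTEGER, X__INSDATE TEXT);
    INSERT INTO customer VALUES (1), (2);
    INSERT INTO account VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO account_recharge VALUES
        (10, '2014-03-05'), (11, '2014-01-20'), (10, '2014-07-01'),
        (20, '2014-02-14'), (20, '2013-12-31');
""")

# One row per customer: the earliest recharge date within 2014.
first_recharge = conn.execute("""
    SELECT c.CUSTOMER_ID, MIN(r.X__INSDATE) AS FELTOLTES
    FROM customer c
    JOIN account a ON c.CUSTOMER_ID = a.CUSTOMER
    JOIN account_recharge r ON a.ACCOUNT_ID = r.ACCOUNT
    WHERE r.X__INSDATE BETWEEN '2014-01-01' AND '2014-12-31'
    GROUP BY c.CUSTOMER_ID
    ORDER BY FELTOLTES
""").fetchall()

print(first_recharge)
```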
SELECT DISTINCT FIRM.customer.CUSTOMER_ID,
FIRM.account_recharge.X__INSDATE FELTOLTES
Distinct is applied to both the columns together, which means you will get a distinct ROW for the set of values from the two columns. So, basically the distinct refers to all the columns in the select list.
It is equivalent to a SELECT without DISTINCT but with a GROUP BY clause.
It means,
select distinct a, b....
is equivalent to,
select a, b...group by a, b
If you want the desired output, then CONCATENATE the columns. The DISTINCT will then work on the single concatenated value.
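The DISTINCT/GROUP BY equivalence can be checked on a throwaway, hypothetical table in SQLite. Both forms remove only the exact duplicate (1, 'x'). The value 1 in column a still appears twice, because DISTINCT looks at the whole (a, b) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b TEXT);
    INSERT INTO t VALUES (1, 'x'), (1, 'x'), (1, 'y'), (2, 'x');
""")

distinct_rows = conn.execute("SELECT DISTINCT a, b FROM t ORDER BY a, b").fetchall()
grouped_rows = conn.execute("SELECT a, b FROM t GROUP BY a, b ORDER BY a, b").fetchall()

# DISTINCT applies to the (a, b) pair, so a = 1 still shows up twice.
print(distinct_rows)
print(grouped_rows)
```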

Identify if at least one row with given condition exists

Employee table has ID and NAME columns. Names can be repeated. I want to find out if there is at least one row with name like 'kaushik%'.
So query should return true/false or 1/0.
Is it possible to find this using a single query?
If we try something like
select count(1) from employee where name like 'kaushik%'
this does not return true/false.
Also, we are iterating over all the records in the table. Is there a way in plain SQL to stop checking further records as soon as the first record that satisfies the condition is fetched?
Or can such a thing only be handled in a PL/SQL block?
EDIT *
The first approach provided by Justin looks like the correct answer:
SELECT COUNT(*) FROM employee WHERE name like 'kaushik%' AND rownum = 1
Commonly, you'd express this as either
SELECT COUNT(*)
FROM employee
WHERE name like 'kaushik%'
AND rownum = 1
where the rownum = 1 predicate allows Oracle to stop looking as soon as it finds the first matching row or
SELECT 1
FROM dual
WHERE EXISTS( SELECT 1
FROM employee
WHERE name like 'kaushik%' )
where the EXISTS clause allows Oracle to stop looking as soon as it finds the first matching row.
The first approach is a bit more compact but, to my eye, the second approach is a bit more clear since you really are looking to determine whether a particular row exists rather than trying to count something. But the first approach is pretty easy to understand as well.
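Both of these approaches can be tried outside Oracle with small substitutions. The sketch below assumes a hypothetical employee table in SQLite, which has neither rownum nor dual. LIMIT 1 inside a subquery plays the role of rownum = 1. SQLite also accepts EXISTS directly in the select list, where it evaluates to 1 or 0; Oracle has traditionally needed the FROM dual wrapper shown above. The helper names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO employee VALUES (1, 'kaushik s'), (2, 'kaushik r'), (3, 'anita');
""")

def exists_flag(prefix):
    # EXISTS as an expression: 1 if any row matches, otherwise 0.
    return conn.execute(
        "SELECT EXISTS (SELECT 1 FROM employee WHERE name LIKE ? || '%')", (prefix,)
    ).fetchone()[0]

def count_first_match(prefix):
    # SQLite analogue of "AND rownum = 1": count over at most one matching row.
    return conn.execute(
        "SELECT COUNT(*) FROM (SELECT 1 FROM employee WHERE name LIKE ? || '%' LIMIT 1)",
        (prefix,),
    ).fetchone()[0]

print(exists_flag("kaushik"), count_first_match("kaushik"))  # 1 1
print(exists_flag("zoe"), count_first_match("zoe"))          # 0 0
```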
How about:
select max(case when name like 'kaushik%' then 1 else 0 end)
from employee
Or, what might be more efficient since like can use indexes:
select count(x)
from (select 1 as x
from employee
where name like 'kaushik%'
) t
where rownum = 1
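The MAX(CASE ...) variant can be sketched in SQLite with a hypothetical employee table and an invented helper name. It evaluates the CASE on every row, so unlike EXISTS it cannot stop at the first match. It also has one edge case: on an empty table MAX returns NULL rather than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO employee VALUES (1, 'anita'), (2, 'kaushik r');
""")

def max_case_flag(prefix):
    # Computes 0 or 1 for every row, then keeps the largest value.
    return conn.execute(
        "SELECT MAX(CASE WHEN name LIKE ? || '%' THEN 1 ELSE 0 END) FROM employee",
        (prefix,),
    ).fetchone()[0]

print(max_case_flag("kaushik"))  # 1
print(max_case_flag("zoe"))      # 0
```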
Since you require the SQL query to return 1 or 0, you can try the following query:
select count(1) from dual
where exists(SELECT 1
FROM employee
WHERE name like 'kaushik%')
Because the query above uses EXISTS, Oracle scans the employee table only until it encounters the first record whose name starts with 'kaushik', and then returns 1 without scanning the rest of the table. If no record matches, it returns 0.
select 1
from dual
where exists ( select name
from employee
where name like 'kaushik%'
)

Custom Sorting in SQL order by clause?

Here is the situation that I am trying to solve:
I have a query that could return a set of records. The field being sorted by could have a number of different values - for the sake of this question we will say that the value could be A, B, C, D, E or Z
Now depending on the results of the query, the sorting needs to behave as follows:
If only A-E records are found then sorting them "naturally" is okay. But if a Z record is in the results, then it needs to be the first result in the query, but the rest of the records should be in "natural" sort order.
For instance, if A C D are found, then the result should be
A
C
D
But if A B D E Z are found then the result should be sorted:
Z
A
B
D
E
Currently, the query looks like:
SELECT NAME, SOME_OTHER_FIELDS FROM TABLE ORDER BY NAME
I know I could write a sort function to do what I want, but I can't use one here because the results are handled by a third-party library to which I just pass the SQL query. The library processes the results itself, and there seem to be no hooks that would let me sort the results first and then hand them over. It has to run the SQL query itself, and I have no access to its source code.
So for all of you SQL gurus out there, can you provide a query for me that will do what I want?
How do you identify the Z record? What sets it apart? Once you understand that, add it to your ORDER BY clause.
SELECT name, *
FROM [table]
WHERE (x)
ORDER BY
(
CASE
WHEN (record matches Z) THEN 0
ELSE 1
END
),
name
This way, only the Z record will match the first ordering, and all other records will be sorted by the second-order sort (name). You can exclude the second-order sort if you really don't need it.
For example, if Z is the character string 'Bob', then your query might be:
SELECT name, *
FROM [table]
WHERE (x)
ORDER BY
(
CASE
WHEN name='Bob' THEN 0
ELSE 1
END
), name
My examples are for T-SQL, since you haven't mentioned which database you're using.
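The same CASE-in-ORDER-BY pattern works unchanged in SQLite. The sketch below assumes a hypothetical one-column items table and uses the question's two sample inputs. The first sort key puts 'Z' ahead of everything else, and the second key sorts the rest by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")

def custom_sorted(names):
    # Reload the hypothetical table with the given values, then apply the two-key sort.
    conn.execute("DELETE FROM items")
    conn.executemany("INSERT INTO items VALUES (?)", [(n,) for n in names])
    rows = conn.execute("""
        SELECT name FROM items
        ORDER BY CASE WHEN name = 'Z' THEN 0 ELSE 1 END, name
    """).fetchall()
    return [r[0] for r in rows]

print(custom_sorted(["C", "A", "D"]))            # ['A', 'C', 'D']
print(custom_sorted(["E", "Z", "B", "A", "D"]))  # ['Z', 'A', 'B', 'D', 'E']
```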
There are a number of ways to solve this problem and the best solution depends on a number of factors that you don't discuss such as the nature of those A..Z values and what database product you're using.
If you have only a single value that has to sort on top, you can ORDER BY an expression that maps that value to the lowest possible sort value (with CASE or IIF or IFEQ, depending on your database).
If you have several different special sort values you could ORDER BY a more complicated expression or you could UNION together several SELECTs, with one SELECT for the default sorts and an extra SELECT for each special value. The SELECTs would include a sort column.
Finally, if you have quite a few values you can put the sort values into a separate table and JOIN that table into your query.
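A minimal sketch of the separate-table approach in SQLite is below. The lookup table sort_rank, its pos column, and the default position 100 for values without an entry are all hypothetical choices made for this illustration. The LEFT JOIN keeps unranked rows, and COALESCE gives them a shared default so they fall back to alphabetical order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (name TEXT);
    INSERT INTO items VALUES ('B'), ('Z'), ('A'), ('Y'), ('D');
    -- Hypothetical lookup table: only the special values need an entry.
    CREATE TABLE sort_rank (name TEXT PRIMARY KEY, pos INTEGER);
    INSERT INTO sort_rank VALUES ('Z', 0), ('Y', 1);
""")

rows = conn.execute("""
    SELECT i.name
    FROM items i
    LEFT JOIN sort_rank s ON s.name = i.name
    ORDER BY COALESCE(s.pos, 100), i.name  -- unranked values share a default position
""").fetchall()
ordered = [r[0] for r in rows]

print(ordered)  # ['Z', 'Y', 'A', 'B', 'D']
```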
Not sure what DB you use - the following works for Oracle:
SELECT
NAME,
SOME_OTHER_FIELDS,
DECODE (NAME, 'Z', 0, 1) SORTFIELD
FROM TABLE
ORDER BY DECODE (NAME, 'Z', 0, 1) ASC, NAME

Why can't I GROUP BY 1 when it's OK to ORDER BY 1?

Why are column ordinals legal for ORDER BY but not for GROUP BY? That is, can anyone tell me why this query
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY OrgUnitID
cannot be written as
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY 1
When it's perfectly legal to write a query like
SELECT OrgUnitID FROM Employee AS e ORDER BY 1
?
I'm really wondering if there's something subtle about the relational calculus, or something, that would prevent the grouping from working right.
The thing is, my example is pretty trivial. It's common that the column that I want to group by is actually a calculation, and having to repeat the exact same calculation in the GROUP BY is (a) annoying and (b) makes errors during maintenance much more likely. Here's a simple example:
SELECT DATEPART(YEAR,LastSeenOn), COUNT(*)
FROM Employee AS e
GROUP BY DATEPART(YEAR,LastSeenOn)
I would think that SQL's normalization rule of representing data only once in the database ought to extend to code as well. I'd want to write that calculation expression only once (in the SELECT column list) and be able to refer to it by ordinal in the GROUP BY.
Clarification: I'm specifically working on SQL Server 2008, but I wonder about an overall answer nonetheless.
One of the reasons is that ORDER BY is the last thing that runs in a SQL query. Here is the logical order of operations:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
So once you have the columns from the SELECT clause, you can refer to them by ordinal position.
EDIT, added this based on the comment
Take this for example
create table test (a int, b int)
insert test values(1,2)
go
The query below parses without a problem, but it won't run:
select a as b, b as a
from test
order by 6
here is the error
Msg 108, Level 16, State 1, Line 3
The ORDER BY position number 6 is out of range of the number of items in the select list.
This also parses fine
select a as b, b as a
from test
group by 1
But it blows up with this error
Msg 164, Level 15, State 1, Line 3
Each GROUP BY expression must contain at least one column that is not an outer reference.
There are a lot of elementary inconsistencies in SQL, and the use of scalars is one of them. For example, anyone might expect
select * from countries
order by 1
and
select * from countries
order by 1.00001
to be similar queries (the difference between the two can be made infinitesimally small, after all), but they are not.
I'm not sure if the standard specifies if it is valid, but I believe it is implementation-dependent. I just tried your first example with one SQL engine, and it worked fine.
Use aliases:
SELECT DATEPART(YEAR,LastSeenOn) AS seen_year, COUNT(*) AS count
FROM Employee AS e
GROUP BY seen_year
** EDIT **
if GROUP BY alias is not allowed for you, here's a solution / workaround:
SELECT seen_year
, COUNT(*) AS Total
FROM (
SELECT DATEPART(YEAR,LastSeenOn) as seen_year, *
FROM Employee AS e
) AS inline_view
GROUP
BY seen_year
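The inline-view workaround can be tried in SQLite with a hypothetical Employee table. strftime('%Y', ...) stands in for SQL Server's DATEPART(YEAR, ...). The year expression is written once, in the inner query, and the outer query groups by its alias. SQLite, unlike SQL Server, also accepts GROUP BY 1 directly, and the sketch compares the two results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (id INTEGER PRIMARY KEY, LastSeenOn TEXT);
    INSERT INTO Employee (LastSeenOn) VALUES
        ('2007-03-01'), ('2008-05-12'), ('2008-11-30'), ('2009-01-02');
""")

# The year expression appears once, inside the inline view; the outer query groups by its alias.
rows = conn.execute("""
    SELECT seen_year, COUNT(*) AS Total
    FROM (
        SELECT strftime('%Y', LastSeenOn) AS seen_year
        FROM Employee
    ) AS inline_view
    GROUP BY seen_year
    ORDER BY seen_year
""").fetchall()

# SQLite-specific: an integer in GROUP BY refers to a result column, as in ORDER BY.
by_ordinal = conn.execute("""
    SELECT strftime('%Y', LastSeenOn), COUNT(*)
    FROM Employee
    GROUP BY 1
    ORDER BY 1
""").fetchall()

print(rows)  # [('2007', 1), ('2008', 2), ('2009', 1)]
print(by_ordinal == rows)
```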
Databases that don't support this are basically choosing not to. I understand the order in which the various steps are processed, but it is very easy (as many databases have shown) to parse the SQL, understand it, and apply the translation for you. Where it's really a pain is when a column is a long CASE expression: having to repeat that in the GROUP BY clause is super annoying. Yes, you can use the nested-query workaround demonstrated above, but at this point, not supporting GROUP BY column numbers is just a lack of care for your users.