Using part of the select clause without rewriting it - sql

I am using an Oracle SQL database and I am trying to count the number of terms starting with each letter in a dictionary.
Here is my query:
SELECT Substr(Lower(Dict.Term),0,1) AS Initialchar,
Count(Lower(Dict.Term))
FROM Dict
GROUP BY Substr(Lower(Dict.Term),0,1)
ORDER BY Substr(Lower(Dict.Term),0,1);
This query works as expected, but I'm not really happy that I have to rewrite the long "Substr(Lower(Dict.Term),0,1)" in the GROUP BY and ORDER BY clauses. Is there any way to reuse the one I defined in the SELECT part?
Thanks

You can use a subquery. Because Oracle follows the SQL standard, substr() starts counting at 1. Although Oracle does explicitly allow 0 ("If position is 0, then it is treated as 1"), I find it misleading because "0" and "1" refer to the same position.
So:
select first_letter, count(*)
from (select d.*, substr(lower(d.term), 1, 1) as first_letter
from dict d
) d
group by first_letter
order by first_letter;
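The subquery approach can be tried out with a minimal sketch like the one below. It uses Python's standard-library sqlite3 module, so SQLite stands in for Oracle here (an assumption made only to keep the example runnable), and the sample terms are made up.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict (term TEXT)")
conn.executemany(
    "INSERT INTO dict (term) VALUES (?)",
    [("Apple",), ("avocado",), ("Banana",), ("berry",), ("blue",), ("Cherry",)],
)

# The expression is written once, inside the derived table, and reused by name.
counts = conn.execute("""
    SELECT first_letter, COUNT(*)
    FROM (SELECT substr(lower(d.term), 1, 1) AS first_letter
          FROM dict d) t
    GROUP BY first_letter
    ORDER BY first_letter
""").fetchall()

# The original form, with the expression repeated in every clause.
repeated = conn.execute("""
    SELECT substr(lower(term), 1, 1), COUNT(*)
    FROM dict
    GROUP BY substr(lower(term), 1, 1)
    ORDER BY substr(lower(term), 1, 1)
""").fetchall()

print(counts)
```

Both queries should return the same rows; the derived table only changes where the expression is spelled out.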

Not directly. The output columns can only be referred to in the ORDER BY clause, but not used in any other way. The only way would be to make it into a subselect, but it wouldn't be any clearer and might cause issues with performance.

I prefer subquery factoring for this purpose.
with init as (
select substr(lower(d.term), 1, 1) as Initialchar
from dict d)
select Initialchar, count(*)
from init
group by Initialchar
order by Initialchar;
Contrary to the opposing view, IMO this makes the query much clearer and defines a natural order, especially when using more subqueries.
I'm not aware of any performance caveats, but there are some limitations; for example, it is not possible to use a WITH clause within another WITH clause: ORA-32034: unsupported use of WITH clause.
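When one named subquery has to build on another, the steps can usually be listed one after another in a single WITH clause instead of being nested. The sketch below shows that chained form using Python's sqlite3 module; SQLite is only a stand-in for Oracle, and the second step (`counts`) and the `n > 1` filter are hypothetical additions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict (term TEXT)")
conn.executemany(
    "INSERT INTO dict (term) VALUES (?)",
    [("Apple",), ("avocado",), ("Banana",), ("berry",), ("blue",), ("Cherry",)],
)

# Two factored subqueries in one WITH clause: "counts" reads from "init".
rows = conn.execute("""
    WITH init AS (
        SELECT substr(lower(d.term), 1, 1) AS initialchar
        FROM dict d
    ),
    counts AS (
        SELECT initialchar, COUNT(*) AS n
        FROM init
        GROUP BY initialchar
    )
    SELECT initialchar, n
    FROM counts
    WHERE n > 1
    ORDER BY initialchar
""").fetchall()

print(rows)
```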

Related

Does CnosDB support nested subqueries with aggregate functions or any equivalent solution?

Currently, I cannot do this in CnosQL:
SELECT mean(max("speed")) FROM "wind" GROUP BY time(1m)
I would like to compute the average of the maximum speed per minute over a period of time.
Apparently, CnosDB supports nested subqueries with aggregate functions.
You can try with
SELECT MEAN("max") FROM (SELECT MAX("speed") FROM "wind" GROUP BY time(1m) )
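The shape of that nested query (aggregate per minute in the inner query, then average the results in the outer one) can be sketched outside CnosDB as well. The example below uses Python's sqlite3 module purely as a stand-in: the `time(1m)` bucket is emulated with integer division on epoch seconds, and the table layout and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: "time" holds epoch seconds, "speed" the measurement.
conn.execute("CREATE TABLE wind (time INTEGER, speed REAL)")
conn.executemany(
    "INSERT INTO wind (time, speed) VALUES (?, ?)",
    [(0, 3.0), (30, 5.0), (60, 7.0), (90, 2.0), (120, 9.0)],
)

# Inner query: maximum speed per one-minute bucket.
# Outer query: mean of those per-minute maxima.
mean_of_max = conn.execute("""
    SELECT AVG(max_speed)
    FROM (SELECT MAX(speed) AS max_speed
          FROM wind
          GROUP BY time / 60)
""").fetchone()[0]

print(mean_of_max)
```

With the sample data the per-minute maxima are 5, 7 and 9, so the outer average is 7.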
You are using the wrong syntax for nested subqueries with aggregate functions.
The right syntax is:
SELECT_clause FROM ( SELECT_clause FROM ( SELECT_statement ) [...] ) [...]
Maybe you can try this again.
I think it is supported. However, your query seems problematic because of the "" around the measurement name. Try:
SELECT mean(max("speed")) FROM wind GROUP BY time(1m)

SQLite alias (AS) not working in the same query

I'm stuck on an (apparently) extremely trivial task that I can't make work, and I really see no choice but to ask for advice.
I used to deal with PHP/MySQL more than 10 years ago and I might be quite rusty now that I'm dealing with an SQLite DB using Qt5.
Basically I'm selecting some records while wanting to make some math operations on the fetched columns. I recall (and re-read some documentation and examples) that the keyword "AS" is going to conveniently rename (alias) a value.
So for example I have this query, where "X" is an integer number that I render into this big Qt string before executing it with a QSqlQuery. This query lets me select all the electronic components used in a Project and calculate how many of them to order (rounding to the nearest multiple of 5) and the total price per component.
SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category,
Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer,
Inventory.price, UsedItems.qty_used as used_qty,
UsedItems.qty_used*X AS To_Order,
ROUND((UsedItems.qty_used*X/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5,
Inventory.price*Nearest5 AS TotPrice
FROM Inventory
LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
WHERE UsedItems.pid='1'
ORDER BY RefDes, value ASC
So, for example, I aliased UsedItems.qty_used as used_qty. At first I tried to use it in the next field, multiplying it by X, writing "used_qty*X AS To_Order" ... Query failed. Well, no worries, I had just put the original tab.field name and it worked.
Going further, I have a complex calculation and I want to use its result on the next field, but the same issue popped out: if I alias "ROUND(...)" AS Nearest5, and then try to use this value by multiplying it in the next field, the query will fail.
Please note: the query WORKS, but ONLY if I don't use aliases in the following fields, namely if I don't use the alias Nearest5 in the TotPrice field. I just want to avoid re-writing the whole ROUND(...) thing for the TotPrice field.
What am I missing/doing wrong? Either SQLite does not support aliases on the same query or I am using a wrong syntax and I am just too stuck/confused to see the mistake (which I'm sure it has to be really stupid).
Column aliases defined in a SELECT cannot be used:
For other expressions in the same SELECT.
For filtering in the WHERE.
For conditions in the FROM clause.
Many databases also restrict their use in GROUP BY and HAVING.
All databases support them in ORDER BY.
This is how SQL works. The issue is two things:
The logical order of processing clauses in the query (i.e. how they are compiled). This affects the scoping of parameters.
The order of processing expressions in the SELECT. This is indeterminate. There is no requirement for the ordering of parameters.
For a simple example, what should x refer to in this example?
select x as a, y as x
from t
where x = 2;
By not allowing duplicates, SQL engines do not have to make a choice. The value is always t.x.
You can try with nested queries.
A SELECT query can be nested in another SELECT query within the FROM clause;
multiple queries can be nested, for example by following the following pattern:
SELECT *,[your last Expression] AS LastExp From (SELECT *,[your Middle Expression] AS MidExp FROM (SELECT *,[your first Expression] AS FirstExp FROM yourTables));
Mind the order in which the expressions become available: expressions defined in the innermost SELECT can be used by every enclosing query,
while an intermediate expression can only be used by the queries that enclose it.
For your case, your query may be:
SELECT *, PRC*Nearest5 AS TotPrice
FROM (SELECT *, ROUND((To_Order/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5
FROM (SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category,
Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer,
Inventory.price AS PRC, UsedItems.qty_used*X AS To_Order
FROM Inventory
LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
WHERE UsedItems.pid='1'))
ORDER BY RefDes, value ASC
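To see the difference in isolation, here is a cut-down sketch using Python's sqlite3 module. The two tables are reduced to a few hypothetical columns, the `CAST((X > 0) AS INT)` factor is left out for brevity, and `X = 4` is an arbitrary value; only the aliasing behaviour is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the two tables from the question.
conn.executescript("""
    CREATE TABLE Inventory (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE UsedItems (cid INTEGER, pid INTEGER, qty_used INTEGER);
    INSERT INTO Inventory VALUES (1, 'resistor', 0.5), (2, 'capacitor', 2.0);
    INSERT INTO UsedItems VALUES (1, 1, 3), (2, 1, 7);
""")
X = 4  # arbitrary multiplier for the example

# Using an alias in another expression of the same SELECT list is rejected.
try:
    conn.execute(
        "SELECT qty_used * ? AS To_Order, To_Order * 2 AS Doubled FROM UsedItems",
        (X,),
    )
    alias_error = None
except sqlite3.OperationalError as exc:
    alias_error = str(exc)

# Each nesting level may use the aliases defined by the level below it.
rows = conn.execute("""
    SELECT name, To_Order, Nearest5, price * Nearest5 AS TotPrice
    FROM (SELECT *, ROUND((To_Order / 5) + 0.5) * 5 AS Nearest5
          FROM (SELECT i.name, i.price, u.qty_used * ? AS To_Order
                FROM Inventory i
                JOIN UsedItems u ON i.id = u.cid
                WHERE u.pid = 1))
    ORDER BY name
""", (X,)).fetchall()

print(alias_error)
print(rows)
```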

Using analytical Count(distinct) on Vertica is not supported

After thorough Google research, it seems that Vertica DB simply does not support count(distinct <col>) over(<partition by>), as it causes:
"ERROR 4249: Only MIN/MAX are allowed to use DISTINCT ... MIN/MAX are allowed to use DISTINCT"
I'm looking for an easy workaround for this one.
Meanwhile, I'm using joins or nested queries.
For example:
select campaign_id, segment_id, COUNT(DECODE(rank, 1, 1, NULL)) over()
from (select campaign_id, segment_id, row_number() over(partition by segment_id) rank
from cs)
But my query is very long and I need to invent tricks all over the way. Any idea for a better approach?
Thanks!
(Working at HPE? Please implement this, as you did for all common analytical functions!)
I had to build a similar nested counting structure for counting distinct values cumulatively, over a date range. It boiled down to a similar gathering up of ROW_NUMBER() = 1 rows, though I used CASE:
COUNT(CASE WHEN rank = 1 THEN userID END) OVER (...)
It wasn't pretty to look at, but it was mercifully not slow.
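A small sketch of that ROW_NUMBER() = 1 pattern, run against SQLite through Python's sqlite3 module rather than Vertica (an assumption made for runnability; it needs an SQLite build with window functions, 3.25 or later). The `cs` rows are made up, and the row-number column is called `rn` to avoid clashing with the RANK function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cs (campaign_id INTEGER, segment_id TEXT)")
conn.executemany(
    "INSERT INTO cs (campaign_id, segment_id) VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (2, "c"), (2, "c")],
)

# Number the duplicates of each (campaign, segment) pair, then count only the
# first row of every pair: this emulates COUNT(DISTINCT segment_id) OVER (...).
windowed = conn.execute("""
    SELECT DISTINCT campaign_id,
           COUNT(CASE WHEN rn = 1 THEN segment_id END)
               OVER (PARTITION BY campaign_id) AS distinct_segments
    FROM (SELECT campaign_id, segment_id,
                 ROW_NUMBER() OVER (PARTITION BY campaign_id, segment_id
                                    ORDER BY segment_id) AS rn
          FROM cs)
    ORDER BY campaign_id
""").fetchall()

# Reference result from a plain GROUP BY with COUNT(DISTINCT ...).
grouped = conn.execute("""
    SELECT campaign_id, COUNT(DISTINCT segment_id)
    FROM cs
    GROUP BY campaign_id
    ORDER BY campaign_id
""").fetchall()

print(windowed)
```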
"I need to invent tricks all over the way"
Yeah, I think that just happens when you bump into missing features.

LINQ to SQL - When do I need to Select and when can I omit the Select?

I've been playing around with LINQ to SQL and I just have a couple of simple questions:
When do I need Select on the end of my query?
When can I omit the Select?
Here are my example queries:
Dim pageRoute = From r In db.PageRoutes Where r.PageId = pageId Order By r.Id Descending
Dim dp = From r In db.DownloadPageOnlineOnlies Where r.PageId = pageId Order By r.Weight Descending, r.Id Ascending
Dim download = (From r In db.Downloads Where r.Id = id).First
Are any of them technically wrong?
Could they be improved with a Select or something else?
In a nutshell, I don't understand when I would need either:
Select r
Select r.AColumnINeed, r.BColumnINeed (does this improve performance?)
Thanks.
P.S. I like to write my LINQ queries on one line unless they are really big.
The Select portion of the LINQ statement is completely optional if and only if you want to get a full object out of the collection that matches your Where clause. If you want individual value(s) from an object in the collection being LINQ'd through, then you need to use a Select clause.
I personally always put the Select r clause on the end out of pure habit, but I have come across a few issues with other people's code when I have Option Strict On and they did not when they wrote the LINQ. Leaving the Select clause off yields multiple late-binding errors if you decide to turn Option Strict On in the future for whatever reason.
So in short, you don't need the Select clause, but you are only helping yourself later on down the line if you decide to turn Option Strict On. And it makes your code much more readable, in my opinion.
Let's have a table with 20 columns. Take a query WITH select (2 columns) and one WITHOUT. The execution plan of the two can be different and there's much less data to transfer from the database server in the former case.
It is good practice to use Select when your query needs a more precise result. If you only want a, or a and b, from your query, use Select: it saves memory allocation for the query, since it knows the exact number of values you are going to return.

Group by SQL statement

So I got this statement, which works fine:
SELECT MAX(patient_history_date_bio) AS med_date, medication_name
FROM biological
WHERE (patient_id = 12)
GROUP BY medication_name
But, I would like to have the corresponding medication_dose also. So I type this up
SELECT MAX(patient_history_date_bio) AS med_date, medication_name, medication_dose
FROM biological
WHERE (patient_id = 12)
GROUP BY medication_name
But it gives me an error saying:
"Column 'biological.medication_dose' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.".
So I try adding medication_dose to the GROUP BY clause, but then it gives me extra rows that I don't want.
I would like to get the latest row for each medication in my table. (The latest row is determined by the max function, getting the latest date).
How do I fix this problem?
Use:
SELECT b.medication_name,
b.patient_history_date_bio AS med_date,
b.medication_dose
FROM BIOLOGICAL b
JOIN (SELECT y.medication_name,
MAX(y.patient_history_date_bio) AS max_date
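FROM BIOLOGICAL y
WHERE y.patient_id = ? -- same patient as the outer query, so another patient's later date cannot hide this patient's rows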
GROUP BY y.medication_name) x ON x.medication_name = b.medication_name
AND x.max_date = b.patient_history_date_bio
WHERE b.patient_id = ?
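A runnable sketch of this join-on-the-maximum pattern, using Python's sqlite3 module instead of the asker's database (an assumption), with made-up rows. One deliberate choice in this sketch is to restrict the inner query to the same patient as the outer one, so that a later date recorded for a different patient cannot mask this patient's latest row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE biological (
        patient_id INTEGER,
        medication_name TEXT,
        medication_dose INTEGER,
        patient_history_date_bio TEXT  -- ISO dates, so MAX() orders them correctly
    )
""")
conn.executemany(
    "INSERT INTO biological VALUES (?, ?, ?, ?)",
    [
        (12, "drugA", 10, "2020-01-01"),
        (12, "drugA", 20, "2020-06-01"),
        (12, "drugB", 5, "2020-03-01"),
        (13, "drugA", 99, "2021-01-01"),  # other patient, later date
    ],
)

# Find the latest date per medication, then join back to fetch the full row.
latest = conn.execute("""
    SELECT b.medication_name,
           b.patient_history_date_bio AS med_date,
           b.medication_dose
    FROM biological b
    JOIN (SELECT y.medication_name,
                 MAX(y.patient_history_date_bio) AS max_date
          FROM biological y
          WHERE y.patient_id = :pid
          GROUP BY y.medication_name) x
      ON x.medication_name = b.medication_name
     AND x.max_date = b.patient_history_date_bio
    WHERE b.patient_id = :pid
    ORDER BY b.medication_name
""", {"pid": 12}).fetchall()

print(latest)
```

Each medication appears once, with the dose taken from the row that carries the latest date.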
If you really have to, as one quick workaround, you can apply an aggregate function to your medication_dose such as MAX(medication_dose).
However, note that this is normally an indication that you are either building the query incorrectly, or that you need to refactor/normalize your database schema. In your case, it looks like you are tackling the query incorrectly. The correct approach should be the one suggested by OMG Ponies in another answer.
You may be interested in checking out the following interesting article which describes the reasons behind this error:
But WHY Must That Column Be Contained in an Aggregate Function or the GROUP BY clause?
You need to put max(medication_dose) in your select. Group by returns a result set that contains distinct values for fields in your group by clause, so apparently you have multiple records that have the same medication_name, but different doses, so you are getting two results.
By putting in max(medication_dose) it will return the maximum dose value for each medication_name. You can use any aggregate function on dose (max, min, avg, sum, etc.)