Snowflake: SQL compilation error: not a valid group by expression

I'm trying to call a window function inside of a case statement, as such:
SELECT
DISTINCT properties.property_id,
COALESCE(MAX(CASE
WHEN units_count.unit_type = 'NORMAL' THEN units_count.unit_count
END) OVER (PARTITION BY properties.property_id),
0)::INT AS normal_units_count
FROM units_count
JOIN properties ON units_count.property_id = properties.property_id
I'm receiving the following error:
SQL compilation error: [IFF(UNITS_COUNT.UNIT_TYPE = 'NORMAL', UNITS_COUNT.UNIT_COUNT, SYSTEM$NULL_TO_FIXED(null))] is not a valid group by expression
I've tried adding a qualify clause to remove the MAX() function:
SELECT
DISTINCT properties.property_id,
COALESCE(CASE
WHEN units_count.unit_type = 'NORMAL' THEN units_count.unit_count
END,
0)::INT AS normal_units_count
FROM units_count
JOIN properties ON units_count.property_id = properties.property_id
QUALIFY units_count.unit_count = MAX(units_count.unit_count) OVER (PARTITION BY properties.property_id)
The code executes, but the qualify clause results in unwanted filtering for other fields. Can I keep the existing logic (with MAX()) or do I need to include a qualify clause?

The DISTINCT is in effect a grouping operation, but your MAX with its OVER clause is evaluated for every row.
Which implies that, as "basic SQL", this should work:
SELECT
p.property_id,
MAX(IFF(uc.unit_type = 'NORMAL', uc.unit_count, 0)) AS max_units_count
FROM units_count AS uc
JOIN properties AS p
ON uc.property_id = p.property_id
GROUP BY 1
but as your "other fields" implies you are selecting other things and without knowing what/how you are doing that is hard to see what you are wanting todo.
also your max(unit_count) does not so much feel like it is a normal_unit_count.
But you example needs to pull in a second+ column of what you are wanting to do to see how they should be co-handled. But I would be inclined with zero information to suggest you use a CTE to find the per property_id the MAX unit_count and then join that result to a second read from your data. Because with zero insights to the other operations and how they could re-use that read, I have experienced it is better to GROUP/JOIN then WINDOW(over partition)/ANYVALUE. But that is another option.
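A minimal sketch of that CTE-then-join shape (untested, the CTE name normal_counts is just for illustration, and assuming the per-property MAX of the 'NORMAL' unit count is what you are after):
WITH normal_counts AS (
    -- compute the per-property value once
    SELECT uc.property_id,
           MAX(IFF(uc.unit_type = 'NORMAL', uc.unit_count, 0)) AS normal_units_count
    FROM units_count AS uc
    GROUP BY uc.property_id
)
SELECT p.property_id,
       nc.normal_units_count
       -- other per-property columns would be joined in here
FROM properties AS p
JOIN normal_counts AS nc
    ON nc.property_id = p.property_id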
Or, keeping your window-function shape, this might work (without testing):
SELECT
DISTINCT p.property_id,
ANY_VALUE(COALESCE(MAX(CASE
WHEN uc.unit_type = 'NORMAL' THEN uc.unit_count
END) OVER (PARTITION BY p.property_id),
0)::INT) AS normal_units_count
FROM units_count AS uc
JOIN properties AS p
ON uc.property_id = p.property_id

Related

How to cast only part of a table using a single SQL command in PostgreSQL

In a PostgreSQL table I have several pieces of information stored as text. A type column describes what kind of information is stored in each row. The application is prepared to fetch the IDs of the rows with a single command.
I got into trouble when I tried to compare the information (a bigint stored as a string) with an external value (e.g. '9' > '11'). When I tried to cast the column, the database returned an error (not all values in the column are castable, e.g. datetime or plain text). Also, when I try to cast only the result of a query, I get a cast error.
I get the rows with castable values using this command:
SELECT information.id as id, item.information::bigint as item
FROM information
INNER JOIN item
ON information.id = item.informationid
WHERE information.type = 'task'
The resulting rows show only text that is castable. But when I use this query inside another command, it results in an error:
SELECT x.id FROM (
SELECT information.id as id, item.information::bigint as item
FROM information
INNER JOIN item
ON information.id = item.informationid
WHERE information.type = 'task'
) AS x
WHERE x.item > '0'::bigint
According to the error, the database tried to cast all rows in the table.
Technically, this happens because the optimizer thinks WHERE x.item > '0'::bigint is a much more efficient filter than information.type = 'task', so during the table scan the x.item > '0'::bigint condition is chosen as the predicate. That reasoning is not wrong, but it drops you into this seemingly illogical trouble.
The suggestion by Gordon to use CASE WHEN inf.type = 'task' THEN i.information::bigint END avoids this, but it can get in the way when you want to use the query as a sub-query, since the same condition then has to be written twice.
A funny trick I tried is to use OUTER APPLY:
SELECT x.* FROM (SELECT 1 AS dummy) dummy
OUTER APPLY (
SELECT information.id as id, item.information::bigint AS item
FROM information
INNER JOIN item
ON information.id = item.informationid
WHERE information.type = 'task'
) x
WHERE x.item > '0'::bigint
Sorry that I only verified the SQL Server version of this. I understand PostgreSQL has no OUTER APPLY, but the equivalent should be:
SELECT x.* FROM (SELECT 1 AS dummy) dummy
LEFT JOIN LATERAL (
SELECT information.id as id, item.information::bigint AS item
FROM information
INNER JOIN item
ON information.id = item.informationid
WHERE information.type = 'task'
) x ON true
WHERE x.item > '0'::bigint
(reference is this question)
Finally, a tidier but less flexible method is to add an optimizer hint that turns this behaviour off, forcing the optimizer to run the query as it is written.
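PostgreSQL itself has no query hints, but (untested here, and not from the original answer) a MATERIALIZED CTE in PostgreSQL 12+ acts as an optimization fence and should give a similar effect. A rough sketch:
-- MATERIALIZED forces the CTE to be evaluated as written, so the outer
-- filter that compares the cast value cannot be pushed into it.
WITH tasks AS MATERIALIZED (
    SELECT information.id AS id, item.information::bigint AS item
    FROM information
    INNER JOIN item
        ON information.id = item.informationid
    WHERE information.type = 'task'
)
SELECT tasks.id
FROM tasks
WHERE tasks.item > '0'::bigint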
This is unfortunate. Try using a case expression:
SELECT inf.id as id,
(CASE WHEN inf.type = 'task' THEN i.information::bigint END) as item
FROM information inf JOIN
item i
ON inf.id = i.informationid
WHERE inf.type = 'task';
There is no guarantee that the WHERE filter is applied before the SELECT. However, CASE does guarantee the order of evaluation, so it is safe.
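Building on that, a minimal sketch (untested) of the sub-query shape from the question with the CASE guard inside it, so the outer filter only ever sees values that were cast under the type = 'task' condition:
SELECT x.id
FROM (
    SELECT inf.id AS id,
           -- the cast only runs when type = 'task', so pushing the outer
           -- filter into the scan can no longer hit uncastable values
           CASE WHEN inf.type = 'task' THEN i.information::bigint END AS item
    FROM information inf
    JOIN item i
        ON inf.id = i.informationid
    WHERE inf.type = 'task'
) AS x
WHERE x.item > '0'::bigint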

Why does my SQL query not apply my WHERE not-equal condition?

I used a CTE (temp table) to compute data across periods. With my query, I want to check whether my already-aggregated data is equal to the sum computed in my CTE. However, when using WHERE data_already_aggregated <> data_computed I get rows where the values in those columns appear equal. I tried != as well, and also =. The last option worked, but not the way I want.
Here is my code:
WITH trimestriel AS(
SELECT "donnéestrim".annee,
"donnéestrim".trimestre,
"donnéestrim".codescpi,
scpi.scpi,
"donnéestrim"."rd_t",
SUM("donnéestrim".rd_t) OVER (PARTITION BY "donnéestrim".annee, "donnéestrim".codescpi) AS "trim_sum_totalYear_rd_t",
row_number() over (partition BY scpi.codescpi, annee) AS "row_number"
FROM "donnéestrim"
LEFT JOIN scpi ON scpi.codescpi = "donnéestrim".codescpi
ORDER BY "donnéestrim".annee)
SELECT "annuel".codescpi,
scpi.scpi ,
"annuel".annee,
trimestriel."trimestre",
"annuel".revdisavpl AS "annuel_revdisavpl",
"trim_sum_totalYear_rd_t",
"rd_t" AS "trim_rd_t"
FROM "annuel"
LEFT JOIN scpi ON scpi.codescpi = "annuel".codescpi
LEFT JOIN trimestriel ON trimestriel.codescpi = "annuel".codescpi AND trimestriel.annee=annuel.annee
WHERE "annuel".annee = '2018' and scpi.codescpi = '129' and "annuel".revdisavpl != "trim_sum_totalYear_rd_t"
As @JNevill suggested, it's a data type problem. A double precision value is only displayed to a certain precision, but stores more detail than it shows; casting exposes it. I cast it to real, and I could then see the gaps between my values.
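To illustrate the underlying issue, here is a small hypothetical example (not the question's data): two double precision values can look identical when printed yet differ in the low-order digits, so <> is true even though the displayed values seem equal:
-- hypothetical illustration of display precision vs stored value
SELECT 0.1::double precision * 3 AS computed,
       0.3::double precision AS stored,
       (0.1::double precision * 3) <> 0.3::double precision AS really_different;
-- really_different is true: the values differ by roughly 5.6e-17,
-- even though both may display as 0.3 depending on display settings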

SQL Group by CASE result

I have a simple SQL query on IBM DB2. I'm trying to run something as below:
select case when a.custID = 42285 then 'Credit'
            when a.unitID <> '' then 'Sales'
            when a.unitID = '' then 'Refund'
            else a.unitID end TYPE,
       sum(a.value) as Total
from transactions a
group by a.custID, a.unitID
This query runs, but I have a problem with group by a.custID: I'd prefer not to have it, yet the query won't run unless it's present. I want to group by the result of the CASE expression, not by the columns behind it. So I'm looking for something like:
group by TYPE
However, adding group by TYPE reports the error "Column or global variable TYPE not found", and removing a.custID from the group section reports "Column custID or expression in SELECT list not valid".
Is this possible at all, or do I need to rework my CASE expression and avoid the custID column? At the moment I'm also getting a grouping based on the custID column, even though it's not present in the SELECT.
I understand why the grouping works as it does; I'm just wondering if it's possible to get rid of the custID grouping while still using custID within the CASE expression.
If you want terseness of code, you could use a subquery here:
SELECT TYPE, SUM(value) AS Total
FROM
(
SELECT CASE WHEN a.custID = 42285 THEN 'Credit'
WHEN a.unitID <> '' THEN 'Sales'
WHEN a.unitID = '' THEN 'Refund'
ELSE a.unitID END TYPE,
value
FROM transactions a
) t
GROUP BY TYPE;
The alternative to this would be to just repeat the CASE expression in the GROUP BY clause, which is ugly, but should still work. Note that some databases (e.g. MySQL) have overloaded GROUP BY and do allow aliases to be used in at least some cases.
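For completeness, a sketch of that alternative, repeating the CASE expression verbatim in the GROUP BY clause (DB2 does not let you reference the alias TYPE there):
SELECT CASE WHEN a.custID = 42285 THEN 'Credit'
            WHEN a.unitID <> '' THEN 'Sales'
            WHEN a.unitID = '' THEN 'Refund'
            ELSE a.unitID END TYPE,
       SUM(a.value) AS Total
FROM transactions a
GROUP BY CASE WHEN a.custID = 42285 THEN 'Credit'
              WHEN a.unitID <> '' THEN 'Sales'
              WHEN a.unitID = '' THEN 'Refund'
              ELSE a.unitID END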

single-row subquery returns more than one row. Query not working with main query

I have to display several cell values in one cell, so I am using this query:
select LISTAGG(fc.DESCRIPTION, ';' || chr(10)) WITHIN GROUP (ORDER BY fc.SWITCH_NAME) AS DESCRIP
from "ORS".SWITCH_OPERATIONS fc
group by fc.SWITCH_NAME
It works fine on its own. But when I merge it into my main (complete) query, I get the error: Error code 1427, SQL state 21000: ORA-01427: single-row subquery returns more than one row
Here is my complete query:
SELECT
TRACK_EVENT.LOCATION,
TRACK_EVENT.ELEMENT_NAME,
(select COUNT(*)
 from ORS.TRACK_EVENT b
 where b.ELEMENT_NAME = sw.SWITCH_NAME
   and b.ELEMENT_TYPE = 'SWITCH'
   and b.EVENT_TYPE = 'I'
   and (b.ELEMENT_STATE = 'NORMAL' OR b.ELEMENT_STATE = 'REVERSE')) as COUNTER,
(select COUNT(*)
 from ORS.SWITCH_OPERATIONS fc
 where TRACK_EVENT.ELEMENT_NAME = fc.SWITCH_NAME
   and fc.NO_CORRESPONDENCE = 1) as FAIL_COUNT,
(select MAX(cw.COMMAND_TIME)
 from ORS.SWITCH_OPERATIONS cw
 where TRACK_EVENT.ELEMENT_NAME = cw.SWITCH_NAME
   and cw.NO_CORRESPONDENCE = 1
 group by cw.SWITCH_NAME) as FAILURE_DATE,
(select LISTAGG(fc.DESCRIPTION, ';' || chr(10)) WITHIN GROUP (ORDER BY fc.SWITCH_NAME) AS DESCRIP
 from "ORS".SWITCH_OPERATIONS fc
 group by fc.SWITCH_NAME)
FROM
ORS.SWITCH_OPERATIONS sw,
ORS.TRACK_EVENT TRACK_EVENT
WHERE
sw.SEQUENCE_ID = TRACK_EVENT.SEQUENCE_ID
Not only are subqueries in the SELECT list required to return exactly one row (as is any subquery used for a scalar comparison, like < or =), but their use in that context tends to make the database execute them RBAR: row-by-agonizing-row. That is, they're slower and consume more resources than they should.
Generally, unless the result set outside the subquery contains only a few rows, you want to construct subqueries as part of a table reference. That is, something like:
SELECT m.n, m.z, aliasForSomeTable.a, aliasForSomeTabe.bSum
FROM mainTable m
JOIN (SELECT a, SUM(b) AS bSum
FROM someTable
GROUP BY a) aliasForSomeTable
ON aliasForSomeTable.a = m.a
This benefits you in other ways too: it's easier to get multiple columns out of the same table reference, for example.
Assuming that LISTAGG(...) can be included with other aggregate functions, you can change your query to look like this:
SELECT Track_Event.location, Track_Event.element_name,
Counted_Events.counter,
Failure.fail_count, Failure.failure_date, Failure.descrip
FROM ORS.Track_Event
JOIN ORS.Switch_Operations
ON Switch_Operations.sequence_id = Track_Event.sequence_id
LEFT JOIN (SELECT element_name, COUNT(*) AS counter
FROM ORS.Track_Event
WHERE element_type = 'SWITCH'
AND event_type = 'I'
AND element_state IN ('NORMAL', 'REVERSE')
GROUP BY element_name) Counted_Events
ON Counted_Events.element_name = Switch_Operations.switch_name
LEFT JOIN (SELECT switch_name,
COUNT(CASE WHEN no_correspondence = 1 THEN '1' END) AS fail_count,
MAX(CASE WHEN no_correspondence = 1 THEN command_time END) AS failure_date,
LISTAGG(description, ';' || CHR(10)) WITHIN GROUP (ORDER BY command_time) AS descrip
FROM ORS.Switch_Operations
GROUP BY switch_name) Failure
ON Failure.switch_name = Track_Event.element_name
This query was written to (attempt to) preserve the semantics of your original query. I'm not completely sure that's what you actually need, but without sample starting data and desired results I have no way to tell how else to improve it. For instance, I'm a little suspicious of the need for Switch_Operations in the outer query, and of the fact that LISTAGG(...) runs over rows where no_correspondence <> 1. I did change the ordering of LISTAGG(...), because the original ORDER BY column would not have done anything (the order was the same as the grouping), so it would not have been a stable sort.
Single-row subquery returns more than one row.
This error message is self-descriptive.
The returned field can't have multiple values, and your subquery returns more than one row.
In your complete query you specify the fields to be returned. The last field expects a single value from the subquery, but gets multiple rows instead.
I have no clue about the data you're working with, but you either have to ensure that the subquery returns only one row (for example by correlating it on the switch name, as sketched below), or redesign the wrapping query (possibly using joins where appropriate).
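A minimal sketch (untested) of the correlated form, keeping only the failing LISTAGG column: with the subquery tied to the outer row's ELEMENT_NAME and no GROUP BY, the aggregate returns exactly one row per outer row:
SELECT TRACK_EVENT.LOCATION,
       TRACK_EVENT.ELEMENT_NAME,
       (SELECT LISTAGG(fc.DESCRIPTION, ';' || chr(10)) WITHIN GROUP (ORDER BY fc.SWITCH_NAME)
        FROM "ORS".SWITCH_OPERATIONS fc
        WHERE fc.SWITCH_NAME = TRACK_EVENT.ELEMENT_NAME) AS DESCRIP
FROM ORS.TRACK_EVENT TRACK_EVENT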

Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

I'm trying to select a bunch of patients with their unit and division, and I want to group the result by unit name, but this code doesn't execute and gives the error in the title of this question.
SELECT TOP (100) PERCENT
Pat.PatName AS Name,
srvDiv.sgMType AS Perkhidmatan,
Pat.PatMRank AS Pangkat,
Pat.PatMilitaryID AS [No# Tentera],
unt.untName AS Unit,
fct.pesStatusCode as StatusCode,
fct.pesSignedDate AS SignedDate
FROM dbo.FactPES AS fct
INNER JOIN dbo.DimPatient AS Pat ON fct.pesPatID = Pat.PatID
LEFT OUTER JOIN dbo.DimUnit AS unt ON fct.pesUnitID = unt.untID
LEFT OUTER JOIN dbo.DimServiceDiv AS srvDiv ON fct.pesServiceDivID = srvDiv.sgID
GROUP BY unt.untName
HAVING (deas.diDate BETWEEN
CONVERT(DATETIME, #FromDate, 102)
AND
CONVERT(DATETIME, #ToDate, 102))
I assume it's because unt.untName comes from my LEFT JOIN, so maybe I can't use it outside the join? I'm a bit confused, because when I write it like this it works:
GROUP BY unt.untName, Pat.PatName, srvDiv.sgMType,
Pat.PatMRank, Pat.PatMilitaryID, unt.untName,
fct.pesStatusCode, fct.pesSignedDate
Any help is appreciated
First, please don't use TOP (100) PERCENT; it hurts even to read.
Second, your query contains no aggregate function, no SUM or COUNT for example. When you say you want to "group by unit name", I suspect you may simply want the results sorted by unit name. In that case, you want ORDER BY instead. (The advice from others to study what GROUP BY does is well taken.)
Finally, you might not need those CONVERT functions at the end, depending on your DBMS.
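A minimal sketch of that "sort instead of group" suggestion, reusing the question's own columns (the date filter from the original is omitted here for brevity):
SELECT Pat.PatName AS Name,
       srvDiv.sgMType AS Perkhidmatan,
       Pat.PatMRank AS Pangkat,
       Pat.PatMilitaryID AS [No# Tentera],
       unt.untName AS Unit,
       fct.pesStatusCode AS StatusCode,
       fct.pesSignedDate AS SignedDate
FROM dbo.FactPES AS fct
INNER JOIN dbo.DimPatient AS Pat ON fct.pesPatID = Pat.PatID
LEFT OUTER JOIN dbo.DimUnit AS unt ON fct.pesUnitID = unt.untID
LEFT OUTER JOIN dbo.DimServiceDiv AS srvDiv ON fct.pesServiceDivID = srvDiv.sgID
ORDER BY unt.untName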
Whenever a column appears in the SELECT list without being wrapped in an aggregate function, it must also be present in the GROUP BY clause; and if you do not want to group by it, use it inside an aggregate function in the SELECT instead.
So in your case, the second GROUP BY stated in your question will work.
Read this to understand more about GROUP BY.
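And a minimal sketch of the grouped alternative (my own illustration): if what you actually need is one row per unit, every other column has to be aggregated, for example counting patients per unit:
SELECT unt.untName AS Unit,
       COUNT(*) AS PatientCount
FROM dbo.FactPES AS fct
INNER JOIN dbo.DimPatient AS Pat ON fct.pesPatID = Pat.PatID
LEFT OUTER JOIN dbo.DimUnit AS unt ON fct.pesUnitID = unt.untID
GROUP BY unt.untName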