CASE WHEN PostgreSQL only working for one case - sql

I'm writing a query that will go into some golang backend code to autopopulate some billing fields that we have. Basically there's a standard fee and a reduced fee. There's a table of these fees, the ID number, and the effective date.
If the ID number is in the list, I want to just select the fee for that ID number.
If the ID number is not in the list, I want to select the standard fee.
On the statements we get, we are given the MemberID. The MemberID joins with a CorporateID (CorpID) in a CorporateLinks table. If the MemberID is not in the CorporateLinks table at all, I want to select the standard fee.
Here is my query:
SELECT
CASE
WHEN cl.CorpID IN (SELECT CorpID FROM ConversionFactor) THEN MAX(cf.ConversionFactor)
WHEN MemberID NOT IN (SELECT MemberID FROM CorporateLinks) THEN MAX(cf.ConversionFactor)
ELSE MAX(cf.ConversionFactor)
END
FROM ConversionFactor cf
LEFT JOIN CorporateLinks cl
ON cf.CorpID = cl.CorpID
WHERE EffectiveDate = (SELECT EffectiveDate FROM ConversionFactor
WHERE EffectiveDate < $2
ORDER BY EffectiveDate DESC LIMIT 1)
AND MemberID = $1
GROUP BY cl.CorpID, cl.MemberID;
When the MemberID maps to a CorpID and the CorpID is in the list, it returns it perfectly.
When the MemberID is NOT in CorporateLinks, it returns an empty field.
I haven't found a test case where the MemberID is in CorporateLinks but CorpID is not in ConversionFactor (ELSE Case).
I'm not sure where I'm going wrong. I'm not very well versed in using CASE WHEN statements in queries, I've only used them in functions before to perform regex operations.

There are several questionable things about your query.
The parts relevant to the discussion look like:
SELECT CASE WHEN ... IN (SELECT ...)
THEN conversionfactor
WHEN ... NOT IN (SELECT ...)
THEN max(conversionfactor)
ELSE max(conversionfactor)
END
FROM ...
GROUP BY ..., conversionfactor, ...;
Observations:
There can be only a single value of conversionfactor in each group, because that column is part of the GROUP BY clause.
So it makes no sense to write max(conversionfactor) - it is going to be the same as conversionfactor.
The second THEN branch and the ELSE branch both return max(conversionfactor), so the second WHEN clause is superfluous.
Since all three branches return the same value, the whole CASE expression can be replaced with conversionfactor, because that is always going to be the result.
But your actual question is why the CASE expression returns an "empty field".
From the above discussion that would mean that conversionfactor is either an empty string (if it is a string type) or NULL.
Now there is no reason why this shouldn't be the case. You have to examine your data and look for NULL values or empty strings in that column. The CASE expression is useless, but it is not at fault for that.

You want to use a LEFT JOIN. The default join type if you don't specify one is an INNER JOIN, and so if there is no entry in ConversionFactor that matches, the result set will omit the row completely, not just set the relevant column(s) to NULL. Also, your WHERE clause explicitly filters on MemberID. For rows where MemberID is NULL that filter is never true, so you will never see those rows either.
You are also using sub-selects when it's not clear you need them. Once you switch to the LEFT JOIN see what the output looks like with a simple selection:
SELECT cf.*, cl.*
FROM ConversionFactor cf
LEFT JOIN CorporateLinks cl USING (CorpID)
WHERE cf.EffectiveDate < $2
LIMIT 100 -- Limit for a sanity check to prevent too many results as you debug
Once you can see your whole result set, it should be easier to work out how you want to filter and aggregate later by editing which columns you want returned.
It's quite possible that you don't need the CASE statement at all.
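To make the point about the WHERE filter concrete, here is a minimal sketch run against an in-memory SQLite database from Python rather than PostgreSQL. It leaves out the EffectiveDate logic, and it assumes (hypothetically, since the question does not say how the standard fee is stored) that the standard fee is kept under CorpID 0. It shows that filtering on the joined MemberID returns no row at all for an unlinked member, while a COALESCE around two scalar subqueries falls back to the standard fee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ConversionFactor (CorpID INTEGER, ConversionFactor REAL);
CREATE TABLE CorporateLinks (MemberID INTEGER, CorpID INTEGER);
-- Assumption: CorpID 0 holds the standard fee (hypothetical convention).
INSERT INTO ConversionFactor VALUES (0, 100.0), (7, 80.0);
INSERT INTO CorporateLinks VALUES (1, 7);
""")

# Shape of the original query: the WHERE filter on the joined table
# discards every row for a member that has no link.
filtered = """
SELECT cf.ConversionFactor
FROM ConversionFactor cf
LEFT JOIN CorporateLinks cl ON cf.CorpID = cl.CorpID
WHERE cl.MemberID = ?
"""

# Alternative: look up the member's fee, fall back to the standard fee.
with_fallback = """
SELECT COALESCE(
    (SELECT cf.ConversionFactor
     FROM CorporateLinks cl
     JOIN ConversionFactor cf ON cf.CorpID = cl.CorpID
     WHERE cl.MemberID = ?),
    (SELECT ConversionFactor FROM ConversionFactor WHERE CorpID = 0))
"""

linked_rows = conn.execute(filtered, (1,)).fetchall()
unlinked_rows = conn.execute(filtered, (2,)).fetchall()
linked_fee = conn.execute(with_fallback, (1,)).fetchone()[0]
unlinked_fee = conn.execute(with_fallback, (2,)).fetchone()[0]
```

Member 1 is linked and gets the reduced fee either way; member 2 is not linked, so the first query yields an empty result while the second yields the standard fee.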

Related

Postgres Error: More than one row returned by a subquery used as an expression

I have two separate databases. I am trying to update a column in one database to the values of a column from the other database:
UPDATE customer
SET customer_id=
(SELECT t1 FROM dblink('port=5432, dbname=SERVER1 user=postgres password=309245',
'SELECT store_key FROM store') AS (t1 integer));
This is the error I am receiving:
ERROR: more than one row returned by a subquery used as an expression
Any ideas?
Technically, to remove the error, add LIMIT 1 to the subquery to return at most 1 row. The statement would still be nonsense.
... 'SELECT store_key FROM store LIMIT 1' ...
Practically, you want to match rows somehow instead of picking an arbitrary row from the remote table store to update every row of your local table customer.
I assume a text column match_name in both tables (UNIQUE in store) for the sake of this example:
... 'SELECT store_key FROM store
WHERE match_name = ' || quote_literal(customer.match_name) ...
But that's an extremely expensive way of doing things.
Ideally, you completely rewrite the statement.
UPDATE customer c
SET customer_id = s.store_key
FROM dblink('port=5432, dbname=SERVER1 user=postgres password=309245'
, 'SELECT match_name, store_key FROM store')
AS s(match_name text, store_key integer)
WHERE c.match_name = s.match_name
AND c.customer_id IS DISTINCT FROM s.store_key;
This remedies a number of problems in your original statement.
Obviously, the basic error is fixed.
It's typically better to join in additional relations in the FROM clause of an UPDATE statement than to run correlated subqueries for every individual row.
When using dblink, the above becomes a thousand times more important. You do not want to call dblink() for every single row, that's extremely expensive. Call it once to retrieve all rows you need.
With correlated subqueries, if no row is found in the subquery, the column gets updated to NULL, which is almost always not what you want. In my updated query, the row only gets updated if a matching row is found. Else, the row is not touched.
Normally, you wouldn't want to update rows, when nothing actually changes. That's expensively doing nothing (but still produces dead rows). The last expression in the WHERE clause prevents such empty updates:
AND c.customer_id IS DISTINCT FROM s.store_key
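The difference between an unguarded correlated subquery and an update that only touches matching, changed rows can be sketched without dblink. The following uses an in-memory SQLite database from Python as a stand-in, with both tables local; the match_name column is the same assumption made above, and SQLite's null-safe IS NOT plays the role of IS DISTINCT FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (match_name TEXT UNIQUE, store_key INTEGER);
CREATE TABLE customer (match_name TEXT, customer_id INTEGER);
INSERT INTO store VALUES ('alpha', 10);
INSERT INTO customer VALUES ('alpha', 1), ('orphan', 2);
""")

def customer_ids(c):
    """Return {match_name: customer_id} for every customer row."""
    return dict(c.execute("SELECT match_name, customer_id FROM customer"))

# Guarded: only rows with a matching store row and a different key change.
conn.execute("""
UPDATE customer
SET customer_id = (SELECT s.store_key FROM store s
                   WHERE s.match_name = customer.match_name)
WHERE EXISTS (SELECT 1 FROM store s
              WHERE s.match_name = customer.match_name
                AND s.store_key IS NOT customer.customer_id)
""")
guarded = customer_ids(conn)

# Unguarded: every row is updated, unmatched rows are set to NULL.
conn.execute("""
UPDATE customer
SET customer_id = (SELECT s.store_key FROM store s
                   WHERE s.match_name = customer.match_name)
""")
unguarded = customer_ids(conn)
```

After the guarded statement the 'orphan' row still holds its old value; after the unguarded one it has been overwritten with NULL.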
Related:
How do I (or can I) SELECT DISTINCT on multiple columns?
The fundamental problem can often be simply solved by changing an = to IN, in cases where you've got a one-to-many relationship. For example, if you wanted to update or delete a bunch of accounts for a given customer:
WITH accounts_to_delete AS
(
SELECT account_id
FROM accounts a
INNER JOIN customers c
ON a.customer_id = c.id
WHERE c.customer_name='Some Customer'
)
-- this fails if "Some Customer" has multiple accounts, but works if there's 1:
DELETE FROM accounts
WHERE accounts.guid =
(
SELECT account_id
FROM accounts_to_delete
);
-- this succeeds with any number of accounts:
DELETE FROM accounts
WHERE accounts.guid IN
(
SELECT account_id
FROM accounts_to_delete
);
This means your nested SELECT returns more than one row.
You need to add a proper WHERE clause to it.
This error means that the SELECT store_key FROM store query has returned two or more rows in the SERVER1 database. If you would like to update all customers, use a join instead of a scalar = operator. You need a condition to "connect" customers to store items in order to do that.
If you wish to update all customer_ids to the same store_key, you need to supply a WHERE clause to the remotely executed SELECT so that the query returns a single row.
Use LIMIT 1 so that the subquery returns only one row.
Example
customerId = (select id from enumeration where enumeration.name = 'Ready To Invoice' limit 1)
The subquery produces more than one row, and that has to be handled explicitly. The issue can be resolved by making the subquery return a single row, for example by:
1. limiting the query to return one single row
2. selecting an aggregate such as "select max(column)", which returns a single row

"plan should not reference subplan's variable" error

I need to update 2 columns in a table with values from another table
UPDATE transakcje t SET s_dzien = s_dzien0, s_cena = s_cena0
FROM
(SELECT c.price AS s_cena0, c.dzien AS s_dzien0 FROM ciagle c
WHERE c.dzien = t.k_dzien ORDER BY s_cena0 DESC LIMIT 1) AS zza;
But I got an error:
plan should not reference subplan's variable.
DB structure is as simple as possible: transakcje has k_dzien, k_cena, s_dzien, s_cena and ciagle has fields price, dzien.
I'm running PostgreSQL 9.3.
Edit
I want to update all records from transakcje.
For each row I must find one row from ciagle with same dzien and maximum price and save this price and dzien into transakcje.
In ciagle there are many rows with the same dzien (column is not distinct).
Problem
The form you had:
UPDATE tbl t
SET ...
FROM (SELECT ... WHERE col = t.col LIMIT 1) sub
... is illegal to begin with. As the error message tells you, a subquery cannot reference the table in the UPDATE clause. Items in the FROM list generally cannot reference other items on the same level (except with LATERAL in Postgres 9.3 or later). And the table in the UPDATE clause can never be referenced by subqueries in the FROM clause (and that hasn't changed in Postgres 9.3).
Even if that was possible the result would be nonsense for two reasons:
The subquery with LIMIT 1 produces exactly one row (total), while you obviously want a specific value per dzien:
one row from ciagle with same dzien
Once you amend that and compute one price per dzien, you would end up with something like a cross join unless you add a WHERE condition to unambiguously join the result from the subquery to the table to be updated. Quoting the manual on UPDATE:
In other words, a target row shouldn't join to more than one row from
the other table(s). If it does, then only one of the join rows will be
used to update the target row, but which one will be used is not readily predictable.
Solution
All of this taken into account your query could look like this:
UPDATE transakcje t
SET s_dzien = c.dzien
, s_cena = c.price
FROM (
SELECT DISTINCT ON (dzien)
dzien, price
FROM ciagle
ORDER BY dzien, price DESC
) c
WHERE t.k_dzien = c.dzien
AND (t.s_dzien IS DISTINCT FROM c.dzien OR
t.s_cena IS DISTINCT FROM c.price)
Get the highest price for every dzien in ciagle in a subquery with DISTINCT ON. Details:
Select first row in each GROUP BY group?
Like @wildplasser commented, if all you need is the highest price, you could also use the aggregate function max() instead of DISTINCT ON:
...
FROM (
SELECT dzien, max(price) AS price
FROM ciagle
GROUP BY dzien
) c
...
transakcje ends up with the same value in s_dzien and k_dzien where related rows are present in ciagle.
The added WHERE clause prevents empty updates, which you probably don't want: only cost and no effect (except for exotic special cases with triggers et al.) - a common oversight.
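As a quick check that the max() variant of the subquery yields exactly one row per dzien, here is a small sketch against an in-memory SQLite database from Python (SQLite has no DISTINCT ON, so only the aggregate form is shown). The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ciagle (dzien TEXT, price INTEGER);
-- Invented sample data: two prices on the first day, one on the second.
INSERT INTO ciagle VALUES
    ('2014-01-01', 5),
    ('2014-01-01', 9),
    ('2014-01-02', 3);
""")

# One row per dzien, carrying the highest price for that day.
highest_per_day = conn.execute("""
SELECT dzien, max(price) AS price
FROM ciagle
GROUP BY dzien
ORDER BY dzien
""").fetchall()
```

Because each dzien appears only once in this result, joining it to transakcje on k_dzien cannot match a target row more than once.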

SQL Nested Where with Sums

I've run into a syntax issue with SQL. What I'm trying to do here is add together all of the amounts paid on each order (paid each) and then select only those orders whose total is greater than the sum of paid each for a specific order# (1008). I've been trying to move around lots of different things here and I'm not having any luck.
This is what I have right now, though I've had many different things. Trying to use this simply returns an SQL statement not ended properly error. Any help you guys could give would be greatly appreciated. Do I have to use DISTINCT anywhere here?
SELECT ORDER#,
TO_CHAR(SUM(PAIDEACH), '$999.99') AS "Amount > Order 1008"
FROM ORDERITEMS
GROUP BY ORDER#
WHERE TO_CHAR > (SUM (PAIDEACH))
WHERE ORDER# = 1008;
Some versions of SQL regard the hash character (#) as the beginning of a comment. Others use double hyphen (--) and some use both. So, my first thought is that your ORDER# field is named incorrectly (though I can't imagine the engine would let you create a field with that name).
You have two WHERE keywords, which isn't allowed. If you have multiple WHERE conditions, you must link them together using boolean logic, with AND and OR keywords.
You have your WHERE condition after GROUP BY which should be reversed. Specify WHERE conditions before GROUP BY.
One of your WHERE conditions makes no sense. TO_CHAR > (SUM(paideach)): TO_CHAR() is a function which as far as I know is an Oracle function that converts numeric values to strings according to a specified format. The equivalent in SQL Server is CAST or CONVERT.
I'm guessing that you are trying to write a query that finds orders with amounts exceeding a particular value, but it's not very clear because one of your WHERE conditions specifies that the order number should be 1008, which would presumably only return one record.
The query should probably look more like this:
SELECT ORDER#,
SUM(paideach) AS amount
FROM orderitems
GROUP BY ORDER#
HAVING SUM(paideach) > 999.99;
This would select records from the orderitems table where the sum of paideach exceeds 999.99.
I'm not sure how order 1008 fits into things, so you will have to elaborate on that.
Others have commented on some of the things wrong with your query. I'll try to give more explicit hints about what I think you need to do to get the result I think you're looking for.
The problem seems to break into distinct sections, first finding the total for each order which you're close to and I think probably started from:
SELECT ORDER#, SUM(PAIDEACH) AS AMOUNT
FROM ORDERITEMS
GROUP BY ORDER#;
... finding the total for a specific order:
SELECT SUM(PAIDEACH)
FROM ORDERITEMS
WHERE ORDER# = 1008;
... and combining them, which is where you're stuck. The simplest way, and hopefully something you've recently been taught, is to use the HAVING clause, which comes after the GROUP BY and acts as a kind of filter that can be applied to the aggregated columns (which you can't do in the WHERE clause). If you had a fixed amount you could do this:
SELECT ORDER#, SUM(PAIDEACH) AS AMOUNT
FROM ORDERITEMS
GROUP BY ORDER#
HAVING SUM(PAIDEACH) > 5;
(Note that, as @Bridge indicated, you can't use the column alias, AMOUNT, in the HAVING clause; you have to repeat the aggregate function SUM.) But you don't have a fixed value, you want to use the actual total for order 1008, so you need to replace that fixed value with another query. I'll let you take that last step...
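The WHERE versus HAVING distinction for the fixed-value case can be tried out with a small sketch. It runs against an in-memory SQLite database from Python rather than Oracle, uses the hypothetical column name order_no in place of ORDER#, and invented sample amounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orderitems (order_no INTEGER, paideach NUMERIC);
-- Invented sample data.
INSERT INTO orderitems VALUES (1001, 3), (1001, 4), (1002, 2), (1008, 5);
""")

# An aggregate in WHERE is rejected: WHERE is evaluated before grouping.
try:
    conn.execute(
        "SELECT order_no FROM orderitems "
        "WHERE SUM(paideach) > 5 GROUP BY order_no"
    )
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# HAVING filters on the aggregated value after GROUP BY.
over_five = conn.execute("""
SELECT order_no, SUM(paideach) AS amount
FROM orderitems
GROUP BY order_no
HAVING SUM(paideach) > 5
ORDER BY order_no
""").fetchall()
```

Only order 1001 (total 7) passes the HAVING filter; order 1008 totals exactly 5 and is excluded.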
I'm not familiar with Oracle, and since it's homework I won't give you the answers, just a few ideas of what I think is wrong.
select statement should only have one where statement - can have more than one condition of course, just separated by logical operators (anything that evaluates to true will be included). E.g. : WHERE (column1 > column2) AND (column3 = 100)
GROUP BY clauses should come after WHERE clauses
You can't refer to columns you've aliased in the select in the where clause of the same statement by their aliased name. For example this won't work:
SELECT column1 as hello
FROM table1
WHERE hello = 1
If there's a group by, the columns you're selecting should be the same as in that statement (or aggregates of those). This page does a better explanation of this than I do.

What is the meaning of a constant in a SELECT query?

Considering the 2 below queries:
1)
USE AdventureWorks
GO
SELECT a.ProductID, a.ListPrice
FROM Production.Product a
WHERE EXISTS (SELECT 1 FROM Sales.SalesOrderDetail b
WHERE b.ProductID = a.ProductID)
2)
USE AdventureWorks
GO
SELECT a.ProductID, a.Name, b.SalesOrderID
FROM Production.Product a LEFT OUTER JOIN Sales.SalesOrderDetail b
ON a.ProductID = b.ProductID
ORDER BY 1
My only question is: what is the meaning of the number 1 in those queries? What happens if I change it to 2 or something else?
Thanks for helping
In the first case it does not matter; you can select a 2 or anything, really, because it is an existence query. In general selecting a constant can be used for other things besides existence queries (it just drops the constant into a column in the result set), but existence queries are where you are most likely to encounter a constant.
For example, given a table called person containing four columns, id, firstname, lastname, and birthdate, you can write a query like this:
select firstname, 'YAY'
from person
where month(birthdate) = 6;
and this would return something like
name 'YAY'
---------------
Ani YAY
Sipho YAY
Hiro YAY
It's not useful, but it is possible. The idea is that in a select statement you select expressions, which can be not only column names but constants and function calls, too. A more likely case is:
select lastname||','||firstname, year(birthdate)
from person;
Here the || is the string concatenation operator, and year is a function I made up.
The reason you sometimes see 1 in existence queries is this. Suppose you only wanted to know whether there was a person whose name started with 'H', but you didn't care who this person was. You can say
select id
from person
where lastname like 'H%';
but since we don't need the id, you can also say
select 1
from person
where lastname like 'H%';
because all you care about is whether or not you get a non-empty result set.
In the second case, the 1 is a column number; it means you want your results sorted by the value in the first column. Changing that to a 2 would order by the second column.
By the way, another place where constants are selected is when you are dumping from a relational database into a highly denormalized CSV file that you will be processing in NOSQL-like systems.
In the second case the 1 is not a literal at all. Rather, it is an ordinal number, indicating that the resultset should be sorted by its first column. If you changed the 1 to 4 the query would fail with an error because the resultset only has three columns.
BTW, the reason you use a constant like 1 instead of using an actual column is you avoid the I/O of actually getting the column value. This may improve performance.
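Both roles of the number can be checked with a small sketch. It uses an in-memory SQLite database from Python instead of SQL Server, and the product and detail tables are simplified, hypothetical stand-ins for the AdventureWorks tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER, name TEXT);
CREATE TABLE detail (product_id INTEGER);
INSERT INTO product VALUES (1, 'b'), (2, 'a'), (3, 'c');
INSERT INTO detail VALUES (1), (3);
""")

def products_with_sales(constant):
    """EXISTS only tests for rows, so the selected constant is irrelevant."""
    sql = f"""
    SELECT p.id FROM product p
    WHERE EXISTS (SELECT {constant} FROM detail d WHERE d.product_id = p.id)
    ORDER BY p.id
    """
    return conn.execute(sql).fetchall()

exists_with_1 = products_with_sales(1)
exists_with_2 = products_with_sales(2)

# In ORDER BY, a bare integer is a column position, not a value.
by_first_column = conn.execute(
    "SELECT name, id FROM product ORDER BY 1").fetchall()
by_second_column = conn.execute(
    "SELECT name, id FROM product ORDER BY 2").fetchall()
```

The two EXISTS queries return the same products, while changing the ORDER BY ordinal changes which column drives the sort.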

SQL Query to get latest price

I have a table containing prices for a lot of different "things" in a MS SQL 2005 table. There are hundreds of records per thing per day and the different things get price updates at different times.
ID uniqueidentifier not null,
ThingID int NOT NULL,
PriceDateTime datetime NOT NULL,
Price decimal(18,4) NOT NULL
I need to get today's latest prices for a group of things. The below query works but I'm getting hundreds of rows back and I have to loop through them and only extract the latest one per ThingID. How can I (e.g. via a GROUP BY) say that I want the latest one per ThingID? Or will I have to use subqueries?
SELECT *
FROM Thing
WHERE ThingID IN (1,2,3,4,5,6)
AND PriceDate > cast( convert(varchar(20), getdate(), 106) as DateTime)
UPDATE: In an attempt to hide complexity I declared the ID column as an int. In real life it is a GUID (and not the sequential kind). I have updated the table def above to use uniqueidentifier.
I think the only solution with your table structure is to work with a subquery:
SELECT *
FROM Thing
WHERE ID IN (SELECT max(ID) FROM Thing
WHERE ThingID IN (1,2,3,4)
GROUP BY ThingID)
(Given the highest ID also means the newest price)
However I suggest you add a "IsCurrent" column that is 0 if it's not the latest price or 1 if it is the latest. This will add the possible risk of inconsistent data, but it will speed up the whole process a lot when the table gets bigger (if it is in an index). Then all you need to do is to...
SELECT *
FROM Thing
WHERE ThingID IN (1,2,3,4)
AND IsCurrent = 1
UPDATE
Okay, Markus updated the question to show that ID is a uniqueid, not an int. That makes writing the query even more complex.
SELECT T.*
FROM Thing T
JOIN (SELECT ThingID, max(PriceDateTime) AS PriceDateTime
FROM Thing
WHERE ThingID IN (1,2,3,4)
GROUP BY ThingID) X ON X.ThingID = T.ThingID
AND X.PriceDateTime = T.PriceDateTime
WHERE T.ThingID IN (1,2,3,4)
I'd really suggest using either a "IsCurrent" column or go with the other suggestion found in the answers and use "current price" table and a separate "price history" table (which would ultimately be the fastest, because it keeps the price table itself small).
(I know that the ThingID at the bottom is redundant. Just try if it is faster with or without that "WHERE". Not sure which version will be faster after the optimizer did its work.)
I would try something like the following subquery and forget about changing your data structures.
SELECT
*
FROM
Thing
WHERE
(ThingID, PriceDateTime) IN
(SELECT
ThingID,
max(PriceDateTime )
FROM
Thing
WHERE
ThingID IN (1,2,3,4)
GROUP BY
ThingID
)
Edit: the above is ANSI SQL, and I'm now guessing that having more than one column in a subquery doesn't work in T-SQL. Marius, I can't test the following, but try:
SELECT
p.*
FROM
Thing p,
(SELECT ThingID, max(PriceDateTime) AS PriceDateTime FROM Thing WHERE ThingID IN (1,2,3,4) GROUP BY ThingID) m
WHERE
p.ThingId = m.ThingId
and p.PriceDateTime = m.PriceDateTime
another option might be to change the date to a string and concatenate with the id so you have only one column. This would be slightly nasty though.
If the subquery route was too slow I would look at treating your price updates as an audit log and maintaining a ThingPrice table - perhaps as a trigger on the price updates table:
ThingID int not null,
UpdateID int not null,
PriceDateTime datetime not null,
Price decimal(18,4) not null
The primary key would just be ThingID and "UpdateID" is the "ID" in your original table.
Since you are using SQL Server 2005, you can use the new (CROSS|OUTER) APPLY clause. The APPLY clause lets you join a table with a table-valued function.
To solve the problem, first define a table valued function to retrieve the top n rows from Thing for a specific id, date ordered:
CREATE FUNCTION dbo.fn_GetTopThings(@ThingID AS INT, @n AS INT)
RETURNS TABLE
AS
RETURN
SELECT TOP(@n) *
FROM Thing
WHERE ThingID = @ThingID
ORDER BY PriceDateTime DESC
GO
and then use the function to retrieve the top 1 records in a query:
SELECT *
FROM Thing t
CROSS APPLY dbo.fn_GetTopThings(t.ThingID, 1)
WHERE t.ThingID IN (1,2,3,4,5,6)
The magic here is done by the APPLY clause, which applies the function to every row in the left result set, then joins with the result set returned by the function, then returns the final result set. (Note: to do a left-join-like apply, use OUTER APPLY, which returns all rows from the left side, while CROSS APPLY returns only the rows that have a match in the right side.)
BlaM:
Because I can't post comments yet (due to low rep points), not even on my own answers ^^, I'll answer in the body of the message:
- Even when it uses table-valued functions, the APPLY clause is optimized internally by SQL Server so that it doesn't call the function for every row in the left result set. Instead it takes the inner SQL from the function and converts it into a join with the rest of the query, so the performance is equivalent to, or even better than, that of a query using subqueries (if SQL Server chooses the plan well and further optimizations can be done). In my personal experience APPLY has no performance issues when the database is properly indexed and statistics are up to date (just as a normal query with subqueries behaves under such conditions).
It depends on the nature of how your data will be used, but if the old price data will not be used nearly as regularly as the current price data, there may be an argument here for a price history table. This way, non-current data may be archived off to the price history table (probably by triggers) as the new prices come in.
As I say, depending on your access model, this could be an option.
I'm converting the uniqueidentifier to a binary so that I can get a MAX of it.
This should make sure that you won't get duplicates from multiple records with identical ThingIDs and PriceDateTimes:
SELECT * FROM Thing WHERE CONVERT(BINARY(16),Thing.ID) IN
(
SELECT MAX(CONVERT(BINARY(16),Thing.ID))
FROM Thing
INNER JOIN
(SELECT ThingID, MAX(PriceDateTime) LatestPriceDateTime FROM Thing
WHERE PriceDateTime >= CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME)
GROUP BY ThingID) LatestPrices
ON Thing.ThingID = LatestPrices.ThingID
AND Thing.PriceDateTime = LatestPrices.LatestPriceDateTime
GROUP BY Thing.ThingID, Thing.PriceDateTime
) AND Thing.ThingID IN (1,2,3,4,5,6)
Since ID is not sequential, I assume you have a unique index on ThingID and PriceDateTime so only one price can be the most recent for a given item.
This query will get all of the items in the list IF they were priced today. If you remove the where clause for PriceDate you will get the latest price regardless of date.
SELECT *
FROM Thing thi
WHERE thi.ThingID IN (1,2,3,4,5,6)
AND thi.PriceDateTime =
(SELECT MAX(maxThi.PriceDateTime)
FROM Thing maxThi
WHERE maxThi.PriceDateTime >= CAST( CONVERT(varchar(20), GETDATE(), 106) AS DateTime)
AND maxThi.ThingID = thi.ThingID)
Note that I changed ">" to ">=" since you could have a price right at the start of a day
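The correlated MAX subquery can be tried on a tiny data set. This sketch uses an in-memory SQLite database from Python rather than SQL Server, stores the timestamps as ISO-8601 text, and drops the "priced today" filter so the result does not depend on the current date; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Thing (ThingID INTEGER, PriceDateTime TEXT, Price REAL);
-- Invented sample data: two prices for thing 1, one for thing 2.
INSERT INTO Thing VALUES
    (1, '2024-01-01 09:00', 10.0),
    (1, '2024-01-01 17:00', 12.5),
    (2, '2024-01-01 08:00', 7.0);
""")

# Keep a row only if its timestamp is the latest one for its ThingID.
latest = conn.execute("""
SELECT thi.ThingID, thi.PriceDateTime, thi.Price
FROM Thing thi
WHERE thi.ThingID IN (1, 2)
  AND thi.PriceDateTime =
      (SELECT MAX(maxThi.PriceDateTime)
       FROM Thing maxThi
       WHERE maxThi.ThingID = thi.ThingID)
ORDER BY thi.ThingID
""").fetchall()
```

Each ThingID comes back once, with its most recent price. If two rows for the same thing shared the same latest timestamp, both would be returned, which is why the unique index on ThingID and PriceDateTime matters.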
It must work without using a global PK column (for complex primary keys for example):
SELECT t1.*, t2.PriceDateTime AS bigger FROM Prices t1
LEFT JOIN Prices t2 ON t1.ThingID = t2.ThingID AND t1.PriceDateTime < t2.PriceDateTime
WHERE t2.PriceDateTime IS NULL
Try this (provided you only need the latest price, not the identifier or datetime of that price)
SELECT ThingID, (SELECT TOP 1 Price FROM Thing WHERE ThingID = T.ThingID ORDER BY PriceDateTime DESC) Price
FROM Thing T
WHERE ThingID IN (1,2,3,4) AND DATEDIFF(D, PriceDateTime, GETDATE()) = 0
GROUP BY ThingID
Maybe I misunderstood the task, but what about:
SELECT ID, ThingID, max(PriceDateTime), Price
FROM Thing GROUP BY ThingID