Unique constraint in Postgres based on last non-null value - sql

I have a table that looks like the following: create table prices_history (id0 serial primary key, product_id0 int, time_added timestamptz, listed_price numeric)
I would like to insert a new price for a particular product_id0 only when the most recent price for that product (the row with max(time_added)) is distinct from the price I'm about to insert. Currently, I'm doing this through the following query, assuming that I want to insert a price of 9.50 for the product with id 101:
insert into prices_history (product_id0, time_added, listed_price)
(
select 101, NOW(), 9.50 where not exists (
select * from (
select distinct on (product_id0) * from prices_history order by product_id0, time_added desc
) x where product_id0=101 and listed_price=9.50
)
) returning id0
Is there a better query to solve this problem?
I'm using Postgres v9.6.8 on Ubuntu 16.04 LTS.

Instead of using WHERE NOT EXISTS, I found that a more sustainable solution to do bulk inserts was LEFT JOIN..WHERE NULL. This involves doing a left join with the data I wanted to insert, choosing the data where there was no matching data in the old table. In the following example, say I had the following pricing data (represented as JSON):
[{"price": 11.99, "product_id0": 2},
{"price": 10.50, "product_id0": 3},
{"price": 10.00, "product_id0": 4}]
The following query would insert the subset of this data that is new information:
insert into prices_history (product_id0, time_added, listed_price)
(
select new_product_id0, new_time_added, new_price from
(select unnest(array[11.99, 10.50, 10.00]) as new_price, unnest(array[2,3,4]) as new_product_id0, now() as new_time_added) new_prices left join
(select distinct on (product_id0) * from prices_history order by product_id0, time_added desc) old_prices
on old_prices.product_id0 = new_prices.new_product_id0 and old_prices.listed_price= new_prices.new_price
where old_prices.product_id0 is null and old_prices.listed_price is null
) returning id0
This new query seems to work well in the current deployment.
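To try the "insert only when the latest price differs" rule without a Postgres instance, here is a minimal sketch using Python's sqlite3 module. SQLite is only a stand-in (an assumption for illustration): it has no DISTINCT ON, so the latest row is found with a max(time_added) subquery instead, and the helper name insert_if_changed is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table prices_history ("
    " id0 integer primary key,"
    " product_id0 int, time_added text, listed_price numeric)"
)


def insert_if_changed(product_id, price, ts):
    """Insert the price only if it differs from the product's latest stored price."""
    cur = conn.execute(
        """insert into prices_history (product_id0, time_added, listed_price)
           select ?, ?, ?
           where not exists (
               select 1 from prices_history p
               where p.product_id0 = ?
                 and p.listed_price = ?
                 and p.time_added = (select max(time_added)
                                     from prices_history
                                     where product_id0 = ?)
           )""",
        (product_id, ts, price, product_id, price, product_id),
    )
    return cur.rowcount == 1  # True when a row was actually inserted


results = [
    insert_if_changed(101, 9.50, "2024-01-01"),  # first price for the product
    insert_if_changed(101, 9.50, "2024-01-02"),  # same as latest price
    insert_if_changed(101, 9.75, "2024-01-03"),  # price changed
    insert_if_changed(101, 9.50, "2024-01-04"),  # changed back
]
```

Note that the price returning to an earlier value still counts as a change, because only the most recent row is compared.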

Related

Compare a single-column row-set with another single-column row set in Oracle SQL

Is there an Oracle SQL operator or function that checks whether two result sets are exactly the same? My current idea is to use the MINUS operator in both directions, but I am looking for a better-performing solution. One result set is fixed (see below); the other depends on the records.
Very important: I am not allowed to change the schema or structure, so CREATE TABLE, CREATE TYPE, etc. are not available to me. The solution must also work on Oracle 11g.
The schema for SQL Fiddle is:
CREATE TABLE DETAILS (ID INT, MAIN_ID INT, VALUE INT);
INSERT INTO DETAILS VALUES (1,1,1);
INSERT INTO DETAILS VALUES (2,1,2);
INSERT INTO DETAILS VALUES (3,1,3);
INSERT INTO DETAILS VALUES (4,1,4);
INSERT INTO DETAILS VALUES (5,2,1);
INSERT INTO DETAILS VALUES (6,2,2);
INSERT INTO DETAILS VALUES (7,3,1);
INSERT INTO DETAILS VALUES (7,3,2);
Now this is my SQL query, which does the job (it selects the MAIN_IDs whose VALUEs are exactly the same as the given list):
SELECT DISTINCT D.MAIN_ID FROM DETAILS D WHERE NOT EXISTS
(SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID
MINUS
SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2)))
AND NOT EXISTS
(SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2))
MINUS
SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID)
The SQL Fiddle link: http://sqlfiddle.com/#!4/25dde/7/0
If you use a collection (rather than a VARRAY) then you can aggregate the values into a collection and directly compare two collections:
CREATE TYPE int_list AS TABLE OF INT;
Then:
SELECT main_id
FROM details
GROUP BY main_id
HAVING CAST( COLLECT( value ) AS int_list ) = int_list( 1, 2 );
Outputs:
| MAIN_ID |
| ------: |
| 2 |
| 3 |
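If creating a type is not an option, the same "group has exactly these values" check can also be written with plain aggregates. The sketch below is an illustration, not the approach from this answer: it uses Python's sqlite3 module as a stand-in for Oracle (an assumption), and the names wanted and matching_ids are hypothetical. A group matches when it has as many distinct values as the list and none of its values fall outside the list.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table details (id int, main_id int, value int)")
conn.executemany(
    "insert into details values (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4),
     (5, 2, 1), (6, 2, 2), (7, 3, 1), (7, 3, 2)],
)

wanted = [1, 2]  # the fixed list to compare each group against
placeholders = ",".join("?" * len(wanted))
query = f"""
    select main_id
    from details
    group by main_id
    having count(distinct value) = ?
       and sum(case when value in ({placeholders}) then 0 else 1 end) = 0
    order by main_id
"""
# First parameter: number of distinct wanted values; then the values themselves.
matching_ids = [row[0] for row in conn.execute(query, [len(set(wanted)), *wanted])]
```

With the sample rows from the question this selects the same MAIN_IDs as the COLLECT query.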
Update
Based on your expanded fiddle in comments, you can use:
SELECT B.ID
FROM BUSINESS_DATA B
INNER JOIN BUSINESS_NAME N
ON ( B.NAME_ID=N.ID )
WHERE N.NAME='B1'
AND EXISTS (
SELECT business_id
FROM ORDERS O
LEFT OUTER JOIN TABLE(
SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' )
) d
ON ( o.orderdate = d.COLUMN_VALUE )
WHERE O.BUSINESS_ID=B.ID
GROUP BY business_id
HAVING COUNT( CASE WHEN d.COLUMN_VALUE IS NULL THEN 1 END ) = 0
AND COUNT( DISTINCT o.orderdate )
= ( SELECT COUNT(DISTINCT COLUMN_VALUE) FROM TABLE( SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' ) ) )
)
(Note: Do not implicitly create dates from strings; it will cause the query to fail, without there being any changes to the query text, if a user changes their NLS_DATE_FORMAT session parameter. Instead use TO_DATE with an appropriate format model or a DATE literal.)

Firebird select from table distinct one field

The question I asked yesterday was simplified but I realize that I have to report the whole story.
I have to extract data from 4 different tables in a Firebird 2.5 database, and the following query works:
SELECT
PRODUZIONE_T.CODPRODUZIONE,
PRODUZIONE_T.NUMEROCOMMESSA as numeroco,
ANGCLIENTIFORNITORI.RAGIONESOCIALE1,
PRODUZIONE_T.DATACONSEGNA,
PRODUZIONE_T.REVISIONE,
ANGUTENTI.NOMINATIVO,
ORDINI_T.DATA
FROM PRODUZIONE_T
LEFT OUTER JOIN ORDINI_T ON PRODUZIONE_T.CODORDINE=ORDINI_T.CODORDINE
INNER JOIN ANGCLIENTIFORNITORI ON ANGCLIENTIFORNITORI.CODCLIFOR=ORDINI_T.CODCLIFOR
LEFT OUTER JOIN ANGUTENTI ON ANGUTENTI.IDUTENTE = PRODUZIONE_T.RESPONSABILEUC
ORDER BY right(numeroco,2) DESC, left(numeroco,3) desc
rows 1 to 500;
However, the query returns duplicate rows (two or more per NUMEROCOMMESSA) because of the REVISIONE column.
How do I select, for each NUMEROCOMMESSA, only the row with the maximum REVISIONE value?
This should work:
select TAB1.COD, TAB1.N_ORDER, TAB1.S_DATE, TAB1.REVISION
FROM TAB1
JOIN
(
select N_ORDER, MAX(REVISION) as REVISION
FROM TAB1
Group By N_ORDER
) m on m.N_ORDER = TAB1.N_ORDER and m.REVISION = TAB1.REVISION
Here you go - http://sqlfiddle.com/#!6/ce7cf/4
Sample data (as you set it in your original question):
create table TAB1 (
cod integer primary key,
n_order varchar(10) not null,
s_date date not null,
revision integer not null );
alter table tab1 add constraint UQ1 unique (n_order,revision);
insert into TAB1 values ( 1, '001/18', '2018-02-01', 0 );
insert into TAB1 values ( 2, '002/18', '2018-01-31', 0 );
insert into TAB1 values ( 3, '002/18', '2018-01-30', 1 );
The query:
select *
from tab1 d
join ( select n_ORDER, MAX(REVISION) as REVISION
FROM TAB1
Group By n_ORDER ) m
on m.n_ORDER = d.n_ORDER and m.REVISION = d.REVISION
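For a quick local check of the join-on-max query, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite is used only as a stand-in for Firebird (an assumption), and the variable name latest is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Firebird here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tab1 (
        cod integer primary key,
        n_order varchar(10) not null,
        s_date date not null,
        revision integer not null,
        unique (n_order, revision)
    );
    insert into tab1 values (1, '001/18', '2018-02-01', 0);
    insert into tab1 values (2, '002/18', '2018-01-31', 0);
    insert into tab1 values (3, '002/18', '2018-01-30', 1);
""")

# Keep only the row with the highest revision for each n_order.
latest = conn.execute("""
    select d.cod, d.n_order, d.revision
    from tab1 d
    join (select n_order, max(revision) as revision
          from tab1
          group by n_order) m
      on m.n_order = d.n_order and m.revision = d.revision
    order by d.cod
""").fetchall()
```

Order '002/18' has revisions 0 and 1, so only the row with cod 3 survives for it, alongside the single row of '001/18'.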
Suggestions:
Google and read the classic book: "Understanding SQL" by Martin Gruber
Read Firebird SQL reference: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25.html
Here is one more solution, using window functions, which were introduced in Firebird 3 - http://sqlfiddle.com/#!6/ce7cf/13
I do not have Firebird 3 at hand, so I cannot check whether there is some unexpected incompatibility; try it at home :-D
SELECT * FROM
(
SELECT
TAB1.*,
ROW_NUMBER() OVER (
PARTITION BY n_order
ORDER BY revision DESC
) AS rank
FROM TAB1
) d
WHERE rank = 1
Read documentation
https://community.modeanalytics.com/sql/tutorial/sql-window-functions/
https://www.firebirdsql.org/file/documentation/release_notes/html/en/3_0/rnfb30-dml-windowfuncs.html
Which of the three solutions (including Gordon's) is faster depends on the specific database: the real data, the existing indexes, and the selectivity of those indexes.
While window functions make a join-less query possible, I am not sure it would be faster on real data: it may ignore an index on the (order, revision) tuple and do a full scan before the rank = 1 condition is applied, whereas the first solution would most probably use the index to get the maximums without reading every row in the table.
The Firebird-support mailing list suggested a way to break out of the loop and use a single query. The trick is to combine window functions with a CTE (common table expression): http://sqlfiddle.com/#!18/ce7cf/2
WITH TMP AS (
SELECT
*,
MAX(revision) OVER (
PARTITION BY n_order
) as max_REV
FROM TAB1
)
SELECT * FROM TMP
WHERE revision = max_REV
If you want the max revision number in Firebird:
select t.*
from tab1 t
where t.revision = (select max(t2.revision) from tab1 t2 where t2.n_order = t.n_order);
For performance, you want an index on tab1(n_order, revision). With such an index, performance should be competitive with any other approach.

SQL aliasing with FROM AS

SELECT A.barName AS BarName1, B.barName AS BarName2
FROM (
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
) AS A, B
WHERE A.count = B.count
I'm trying to do a self join on this table that I created, but I'm not sure how to alias the table twice in this format (i.e. FROM AS). Unfortunately, this is a school assignment where I can't create any new tables. Anyone have experience with this syntax?
edit: For clarification, I'm using PostgreSQL 8.4. The schema for the tables I'm dealing with is as follows:
Drinkers(name, addr, hobby, frequent)
Bars(name, addr, owner)
Beers(name, brewer, alcohol)
Drinks(drinkerName, drinkerAddr, beerName, rating)
Sells(barName, beerName, price, discount)
Favorites(drinkerName, drinkerAddr, barName, beerName, season)
Again, this is for a school assignment, so I'm given read-only access to the above tables.
What I'm trying to find is pairs of bars (Name1, Name2) that sell the same set of drinks. My thinking in doing the above was to try and find pairs of bars that sell the same number of drinks, then list the names and drinks side by side (BarName1, Drink1, BarName2, Drink2) to try and compare if they are indeed the same set.
You have not mentioned what RDBMS you use.
If Oracle or MS SQL, you can do something like this (I use my sample data table, but you can try it with your tables):
create table some_data (
parent_id int,
id int,
name varchar(10)
);
insert into some_data values(1, 2, 'val1');
insert into some_data values(2, 3, 'val2');
insert into some_data values(3, 4, 'val3');
with data as (
select * from some_data
)
select *
from data d1
left join data d2 on d1.parent_id = d2.id
In your case this query
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
should be placed in a WITH clause and referenced twice from the main query, as A and B.
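As a sketch of that idea, the per-bar count goes into one CTE that is then joined to itself under two aliases. The example runs on an in-memory SQLite database through Python's sqlite3 module, purely as a stand-in for the asker's database (an assumption); the bar and beer rows are made up for illustration, and the CTE name counts is hypothetical.

```python
import sqlite3

# In-memory SQLite as a stand-in engine; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Sells (barName text, beerName text, price real, discount real);
    insert into Sells values
        ('Alpha', 'Lager', 4.0, 0), ('Alpha', 'Stout', 5.0, 0),
        ('Bravo', 'Lager', 4.5, 0), ('Bravo', 'Stout', 5.5, 0),
        ('Cove',  'Lager', 4.0, 0);
""")

# The CTE is defined once and referenced twice, as a and b.
pairs = conn.execute("""
    with counts as (
        select barName, count(*) as cnt
        from Sells
        group by barName
    )
    select a.barName, b.barName
    from counts a
    join counts b
      on a.cnt = b.cnt
     and a.barName < b.barName   -- skip self-pairs and mirrored duplicates
    order by 1, 2
""").fetchall()
```

This only pairs bars that sell the same number of drinks; checking that the sets of drinks are actually identical is a further step.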
It is slightly unclear what you are trying to achieve. Are you looking for a list of bar names, with how many times each appears in the table? If so, there are a couple of ways you could do this. Firstly:
SELECT SellsA.barName AS BarName1, SellsB.count AS Count
FROM
(
SELECT DISTINCT barName
FROM Sells
) SellsA
LEFT JOIN
(
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
) AS SellsB
ON SellsA.barName = SellsB.barName
Secondly, if you are using MSSQL:
SELECT barName, MAX(rn) AS Count
FROM
(
SELECT barName,
ROW_NUMBER() OVER(PARTITION BY barName ORDER BY barName) as rn
FROM Sells
) CountSells
GROUP BY barName
Thirdly, you could avoid a self-join in MSSQL, by using OVER():
SELECT DISTINCT
barName,
COUNT(*) OVER(PARTITION BY barName) AS Count
FROM Sells

How to retrieve the id of the inserted row when using upsert with a WITH clause in Postgres 9.5?

I'm trying to do an upsert query in Postgres 9.5 using "WITH"
with s as (
select id
from products
where product_key = 'test123'
), i as (
insert into products (product_key, count_parts)
select 'test123', 33
where not exists (select 1 from s)
returning id
)
update products
set product_key='test123', count_parts=33
where id = (select id from s)
returning id
Apparently I'm retrieving the id only on updates and get nothing on insertions, even though I know the insertions succeeded.
I need to modify this query so that I get the id on both insertions and updates.
Thanks!
It isn't clear to me why the WITH starts with a SELECT, but the reason you only get the id from the UPDATE is that the outer statement never selects what the INSERT returns.
As mentioned (and linked) in the comments, Postgres 9.5 supports the INSERT .. ON CONFLICT clause, which is a much cleaner way to do this.
And some examples of before and after 9.5:
Before 9.5: common way using WITH
WITH u AS (
UPDATE products
SET product_key='test123', count_parts=33
WHERE product_key = 'test123'
RETURNING id
),i AS (
INSERT
INTO products ( product_key, count_parts )
SELECT 'test123', 33
WHERE NOT EXISTS( SELECT 1 FROM u )
RETURNING id
)
SELECT *
FROM ( SELECT id FROM u
UNION SELECT id FROM i
) r;
After 9.5: using INSERT .. ON CONFLICT
INSERT INTO products ( product_key, count_parts )
VALUES ( 'test123', 33 )
ON CONFLICT ( product_key ) DO
UPDATE
SET product_key='test123', count_parts=33
RETURNING id;
UPDATE:
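As hinted in a comment, the INSERT .. ON CONFLICT approach has a slight drawback.
If the table uses an auto-increment (serial) key and this query runs often, WITH might be the better option, because INSERT .. ON CONFLICT consumes a sequence value on every attempt, even when it ends up performing the update, which leaves gaps in the ids.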
See more: https://stackoverflow.com/a/39000072/1161463
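The "update first, insert only if nothing was updated, return the id either way" flow can also be sketched from application code. The example below is an illustration under stated assumptions: it uses Python's sqlite3 module with an in-memory database as a stand-in for Postgres, the helper name upsert_product is hypothetical, and it is not safe against concurrent writers the way ON CONFLICT is.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table products ("
    " id integer primary key, product_key text unique, count_parts int)"
)


def upsert_product(product_key, count_parts):
    """Update the row if product_key exists, otherwise insert it; return its id."""
    cur = conn.execute(
        "update products set count_parts = ? where product_key = ?",
        (count_parts, product_key),
    )
    if cur.rowcount == 0:  # nothing updated, so the key is new
        conn.execute(
            "insert into products (product_key, count_parts) values (?, ?)",
            (product_key, count_parts),
        )
    row = conn.execute(
        "select id from products where product_key = ?", (product_key,)
    ).fetchone()
    return row[0]


first_id = upsert_product("test123", 33)   # takes the insert path
second_id = upsert_product("test123", 40)  # takes the update path
```

Both calls return the same id, which is the behaviour the question asks for.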

Postgres: GROUP BY several columns

I have two tables in this example.
( example column name )
First is the product
product_id | product_text
Second table is Price.
price_productid | price_datestart | price_price
Let's say I have multiple price_datestart values for the same product. How can I get the current price?
If I use GROUP BY in Postgres with all the selected columns, two rows may come back for the same product, because the price_datestart column differs.
Example :
product_id : 1
product_text : "Apple Iphone"
price_productid : 1
price_datestart :"2013-10-01"
price_price :"99"
price_productid : 1
price_datestart :"2013-12-01"
price_price :"75"
If I try this :
SELECT price_productid,price_datestart,price_price,product_text,product_id
WHERE price_datestart > now()
GROUP BY price_productid,price_datestart,price_price,product_text,product_id
ORDER BY price_datestart ASC
It will give me a result, but two rows and I need one.
Use distinct on syntax. If you want current price:
select distinct on (p.productid)
p.productid, pr.product_text, p.price, p.datestart
from Price as p
left outer join Product as pr on pr.productid = p.productid
where p.datestart <= now()
order by p.productid, p.datestart desc
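To see the "latest start date that is not in the future" rule at work on the question's sample rows, here is a minimal sketch using Python's sqlite3 module. SQLite is a stand-in (an assumption) and has no DISTINCT ON, so a correlated max() subquery plays that role; the helper price_as_of and its as_of parameter are hypothetical, and a fixed date replaces now() so the result is reproducible.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table product (product_id int, product_text text);
    create table price (price_productid int, price_datestart text, price_price numeric);
    insert into product values (1, 'Apple Iphone');
    insert into price values (1, '2013-10-01', 99);
    insert into price values (1, '2013-12-01', 75);
""")


def price_as_of(as_of):
    """Return one row per product: the price whose start date is the latest one <= as_of."""
    return conn.execute("""
        select p.product_id, p.product_text, pr.price_datestart, pr.price_price
        from product p
        join price pr on pr.price_productid = p.product_id
        where pr.price_datestart = (
            select max(price_datestart)
            from price
            where price_productid = p.product_id
              and price_datestart <= ?
        )
    """, (as_of,)).fetchall()


november = price_as_of("2013-11-15")
january = price_as_of("2014-01-01")
```

A date before the first price_datestart returns no row for the product, which matches the inner-join semantics chosen here.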
You have a few problems, but GROUP BY is not one of them.
First, although you have a datestart you don't have a dateend. I'd change datestart to be a daterange, for example:
CREATE TABLE product
(
product_id int
,product_text text
);
CREATE TABLE price
(
price_productid int
,price_daterange TSRANGE
,price_price NUMERIC(10,2)
);
The TSRANGE allows you to set up validity of your price over a given range, for example:
INSERT INTO product VALUES(1, 'phone');
INSERT INTO price VALUES(1, '[2013-08-01 00:00:00,2013-10-01 00:00:00)', 199);
INSERT INTO price VALUES(1, '[2013-10-01 00:00:00,2013-12-01 00:00:00)', 99);
INSERT INTO price VALUES(1, '[2013-12-01 00:00:00,)', 75);
And that makes your SELECT much simpler, for example:
SELECT price_productid,price_daterange,price_price,product_text,product_id
FROM product, price
WHERE price_daterange @> now()::timestamp
AND product_id = price_productid
This also has the benefit of allowing you to query for any arbitrary time by swapping out now() for another date.
You should read up on range types in PostgreSQL, as they are very powerful. The example above is not complete: it should also have an exclusion constraint on price_productid and price_daterange (backed by a GiST index) to ensure that no product has overlapping price ranges.