Postgres: GROUP BY several columns

I have two tables in this example (the column names are examples).
The first is Product:
product_id | product_text
The second table is Price:
price_productid | price_datestart | price_price
Say I have multiple price_datestart values for the same product. How can I get the current price?
If I use GROUP BY in Postgres with all the selected columns, two rows may come back for the same product, because the price_datestart values differ.
Example :
product_id : 1
product_text : "Apple Iphone"
price_productid : 1
price_datestart :"2013-10-01"
price_price :"99"
price_productid : 1
price_datestart :"2013-12-01"
price_price :"75"
If I try this:
SELECT price_productid,price_datestart,price_price,product_text,product_id
FROM product, price
WHERE product_id = price_productid
AND price_datestart > now()
GROUP BY price_productid,price_datestart,price_price,product_text,product_id
ORDER BY price_datestart ASC
It gives me a result, but with two rows, and I need one.

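Use the DISTINCT ON syntax. It keeps only the first row of each group in ORDER BY order, so sorting by start date descending keeps the most recent price that has already started. If you want the current price:
select distinct on (p.price_productid)
p.price_productid, pr.product_text, p.price_price, p.price_datestart
from price as p
left outer join product as pr on pr.product_id = p.price_productid
where p.price_datestart <= now()
order by p.price_productid, p.price_datestart desc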
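DISTINCT ON is specific to PostgreSQL. As a rough, portable sketch of the same "latest row per product" idea, the Python snippet below runs a correlated subquery against an in-memory SQLite database. The helper names make_price_db and current_prices, and the as_of parameter that stands in for now(), are assumptions made for illustration; the table and column names follow the question.

```python
import sqlite3


def make_price_db():
    """Build an in-memory database holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (product_id INTEGER, product_text TEXT);
        CREATE TABLE price (
            price_productid INTEGER,
            price_datestart TEXT,      -- ISO dates, so string order = date order
            price_price     NUMERIC
        );
        INSERT INTO product VALUES (1, 'Apple Iphone');
        INSERT INTO price VALUES (1, '2013-10-01', 99);
        INSERT INTO price VALUES (1, '2013-12-01', 75);
        """
    )
    return conn


def current_prices(conn, as_of):
    """Return one row per product: the price with the latest start date <= as_of."""
    sql = """
        SELECT pr.product_id, pr.product_text, p.price_price, p.price_datestart
        FROM price AS p
        JOIN product AS pr ON pr.product_id = p.price_productid
        WHERE p.price_datestart = (
            SELECT MAX(p2.price_datestart)
            FROM price AS p2
            WHERE p2.price_productid = p.price_productid
              AND p2.price_datestart <= ?
        )
        ORDER BY pr.product_id
    """
    return conn.execute(sql, (as_of,)).fetchall()
```

Passing a date between the two start dates returns the 99 row, and a date after the second start date returns the 75 row. Products with no price started yet are simply absent from the result.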

You have a few problems, but GROUP BY is not one of them.
First, although you have a datestart, you don't have a dateend. I'd change datestart to a range column (a TSRANGE here), for example:
CREATE TABLE product
(
product_id int
,product_text text
);
CREATE TABLE price
(
price_productid int
,price_daterange TSRANGE
,price_price NUMERIC(10,2)
);
The TSRANGE allows you to set up validity of your price over a given range, for example:
INSERT INTO product VALUES(1, 'phone');
INSERT INTO price VALUES(1, '[2013-08-01 00:00:00,2013-10-01 00:00:00)', 199);
INSERT INTO price VALUES(1, '[2013-10-01 00:00:00,2013-12-01 00:00:00)', 99);
INSERT INTO price VALUES(1, '[2013-12-01 00:00:00,)', 75);
And that makes your SELECT much simpler, for example:
SELECT price_productid,price_daterange,price_price,product_text,product_id
FROM product, price
WHERE price_daterange @> now()::timestamp
AND product_id = price_productid
This also has the benefit of allowing you to query for any arbitrary time by swapping out now() for another date.
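To make the containment test concrete, here is a small Python sketch that emulates a half-open range [start, end) with two plain columns in SQLite, since SQLite has no range type. The column names valid_from and valid_to and the helpers make_range_db and price_at are hypothetical; a NULL valid_to stands for an open-ended range. It is only meant to show the lookup logic, not to replace TSRANGE.

```python
import sqlite3


def make_range_db():
    """In-memory table with the three validity periods from the INSERTs above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE price (
            price_productid INTEGER,
            valid_from TEXT,   -- inclusive lower bound
            valid_to   TEXT,   -- exclusive upper bound, NULL = unbounded
            price_price NUMERIC
        );
        INSERT INTO price VALUES (1, '2013-08-01 00:00:00', '2013-10-01 00:00:00', 199);
        INSERT INTO price VALUES (1, '2013-10-01 00:00:00', '2013-12-01 00:00:00', 99);
        INSERT INTO price VALUES (1, '2013-12-01 00:00:00', NULL, 75);
        """
    )
    return conn


def price_at(conn, product_id, moment):
    """Return the price whose [valid_from, valid_to) range contains moment, or None."""
    row = conn.execute(
        """
        SELECT price_price
        FROM price
        WHERE price_productid = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR ? < valid_to)
        """,
        (product_id, moment, moment),
    ).fetchone()
    return None if row is None else row[0]
```

Because the upper bound is exclusive, a moment exactly on a boundary belongs to the later period, which matches the '[...)' notation used in the INSERT statements.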
You should read up on range types in PostgreSQL, as they are very powerful. The example above is not complete: it should also have an exclusion constraint on price_daterange (backed by a GiST index) to ensure that the ranges for any one product do not overlap.

Related

Compare a single-column row-set with another single-column row set in Oracle SQL

Is there an Oracle SQL operator or function that checks whether two result sets are exactly the same? My current idea is to use the MINUS operator in both directions, but I am looking for a better and more performant solution. One result set is fixed (see below); the other depends on the records.
Very important: I am not allowed to change the schema or structure, so CREATE TABLE, CREATE TYPE, etc. are not an option for me. The solution must also work on Oracle 11g.
The schema for SQL Fiddle is:
CREATE TABLE DETAILS (ID INT, MAIN_ID INT, VALUE INT);
INSERT INTO DETAILS VALUES (1,1,1);
INSERT INTO DETAILS VALUES (2,1,2);
INSERT INTO DETAILS VALUES (3,1,3);
INSERT INTO DETAILS VALUES (4,1,4);
INSERT INTO DETAILS VALUES (5,2,1);
INSERT INTO DETAILS VALUES (6,2,2);
INSERT INTO DETAILS VALUES (7,3,1);
INSERT INTO DETAILS VALUES (7,3,2);
This is my current SQL query, which does the job (it selects the MAIN_IDs whose VALUEs are exactly the same as the given list):
SELECT DISTINCT D.MAIN_ID FROM DETAILS D WHERE NOT EXISTS
(SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID
MINUS
SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2)))
AND NOT EXISTS
(SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2))
MINUS
SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID)
The SQL Fiddle link: http://sqlfiddle.com/#!4/25dde/7/0
If you use a nested-table collection type (rather than a VARRAY), you can aggregate the values into a collection and compare two collections directly:
CREATE TYPE int_list AS TABLE OF INT;
Then:
SELECT main_id
FROM details
GROUP BY main_id
HAVING CAST( COLLECT( value ) AS int_list ) = int_list( 1, 2 );
Outputs:
| MAIN_ID |
| ------: |
| 2 |
| 3 |
Update
Based on your expanded fiddle in comments, you can use:
SELECT B.ID
FROM BUSINESS_DATA B
INNER JOIN BUSINESS_NAME N
ON ( B.NAME_ID=N.ID )
WHERE N.NAME='B1'
AND EXISTS (
SELECT business_id
FROM ORDERS O
LEFT OUTER JOIN TABLE(
SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' )
) d
ON ( o.orderdate = d.COLUMN_VALUE )
WHERE O.BUSINESS_ID=B.ID
GROUP BY business_id
HAVING COUNT( CASE WHEN d.COLUMN_VALUE IS NULL THEN 1 END ) = 0
AND COUNT( DISTINCT o.orderdate )
= ( SELECT COUNT(DISTINCT COLUMN_VALUE) FROM TABLE( SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' ) ) )
)
(Note: Do not implicitly create dates from strings; it will cause the query to fail, without there being any changes to the query text, if a user changes their NLS_DATE_FORMAT session parameter. Instead use TO_DATE with an appropriate format model or a DATE literal.)
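The counting idea used in the HAVING clause (no value outside the list, and as many distinct values as the list has) can be tried on the original DETAILS data. The Python sketch below does this against an in-memory SQLite database rather than Oracle, so it only illustrates the logic; make_details_db and main_ids_matching are hypothetical helper names, and the wanted list is assumed to be non-empty.

```python
import sqlite3


def make_details_db():
    """In-memory copy of the DETAILS rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE details (id INTEGER, main_id INTEGER, value INTEGER)")
    conn.executemany(
        "INSERT INTO details VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4),
         (5, 2, 1), (6, 2, 2), (7, 3, 1), (7, 3, 2)],
    )
    return conn


def main_ids_matching(conn, wanted):
    """Return the main_ids whose set of values equals the set `wanted`."""
    wanted = sorted(set(wanted))
    placeholders = ",".join("?" * len(wanted))
    sql = f"""
        SELECT main_id
        FROM details
        GROUP BY main_id
        -- first condition: no value outside the wanted set
        -- second condition: every wanted value is present
        HAVING COUNT(CASE WHEN value NOT IN ({placeholders}) THEN 1 END) = 0
           AND COUNT(DISTINCT value) = ?
        ORDER BY main_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]
```

With the list (1, 2) this returns the same MAIN_IDs as the MINUS-based query, 2 and 3, in a single pass over the table.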

How to merge two tables and average them (hourly vs daily tables)

I have the following tables:
CREATE TABLE a (DATE TEXT, PRICE INTEGER);
INSERT INTO a VALUES
('2019-04-27', 10), ('2019-04-29', 20), ('2019-04-30', 30), ('2019-05-01', 40);
CREATE TABLE b (DATE TEXT, PRICE INTEGER);
INSERT INTO b VALUES
('2019-04-27 01:00', 1), ('2019-04-27 02:30', 3), ('2019-04-27 18:00', 2),
('2019-04-28 17:00', 2), ('2019-04-28 21:00', 5),
('2019-04-29 17:00', 50), ('2019-04-29 21:00', 10),
('2019-04-30 17:00', 10), ('2019-04-30 21:00', 20),
('2019-05-01 17:00', 40), ('2019-05-01 21:00', 10),
('2019-05-02 17:00', 10), ('2019-05-02 21:00', 6);
I need to merge these two tables so that table b is averaged to daily values and the result has two columns: the date (every date must be present) and the price (NULL if there are no observations for that date). I tried several left joins, but I do not know how to average the hourly data to daily values.
Could you help?
Please run the query below (as in the SQL Fiddle):
select DATE(c.date) as date, avg(c.price) as avg_price
from
(select date, price
from a
union all
select date, price
from b
) as c
group by DATE(c.date);
I suspect that you want a result set with two columns. I'm not a fan of having the date be in a string datatype, but you can use string functions for what you want:
select date, sum(price_a) as price_a, avg(price_b) as price_b
from (select a.date, a.price as price_a, null as price_b
from a
union all
select substr(b.date, 1, 10), null, price
from b
) ab
group by date;
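As a cross-check, the same result can be sketched with joins: average b per day first, build the list of all days from both tables, then left join both sides onto it. The Python snippet below runs this against an in-memory SQLite database; make_ab_db, daily_merge and the CTE names daily_b and all_days are hypothetical names chosen for this example.

```python
import sqlite3


def make_ab_db():
    """In-memory copy of tables a (daily) and b (hourly) from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (date TEXT, price INTEGER)")
    conn.execute("CREATE TABLE b (date TEXT, price INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?, ?)", [
        ("2019-04-27", 10), ("2019-04-29", 20), ("2019-04-30", 30), ("2019-05-01", 40),
    ])
    conn.executemany("INSERT INTO b VALUES (?, ?)", [
        ("2019-04-27 01:00", 1), ("2019-04-27 02:30", 3), ("2019-04-27 18:00", 2),
        ("2019-04-28 17:00", 2), ("2019-04-28 21:00", 5),
        ("2019-04-29 17:00", 50), ("2019-04-29 21:00", 10),
        ("2019-04-30 17:00", 10), ("2019-04-30 21:00", 20),
        ("2019-05-01 17:00", 40), ("2019-05-01 21:00", 10),
        ("2019-05-02 17:00", 10), ("2019-05-02 21:00", 6),
    ])
    return conn


def daily_merge(conn):
    """Return (day, price_a, avg_price_b) for every day present in a or b."""
    sql = """
        WITH daily_b AS (
            -- the first 10 characters of the text timestamp are the day
            SELECT substr(date, 1, 10) AS day, AVG(price) AS avg_price
            FROM b
            GROUP BY substr(date, 1, 10)
        ),
        all_days AS (
            SELECT date AS day FROM a
            UNION
            SELECT day FROM daily_b
        )
        SELECT d.day, a.price, daily_b.avg_price
        FROM all_days AS d
        LEFT JOIN a ON a.date = d.day
        LEFT JOIN daily_b ON daily_b.day = d.day
        ORDER BY d.day
    """
    return conn.execute(sql).fetchall()
```

Days that exist in only one table come back with NULL (None in Python) in the other column, for example 2019-04-28 has no daily price and 2019-05-02 has none either.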

Rotate rows into columns with column names not coming from the row

I've looked at some answers but none of them seem to be applicable to me.
Basically I have this result set:
RowNo | Id | OrderNo |
1 101 1
2 101 10
I just want to convert this to
| Id | OrderNo_0 | OrderNo_1 |
101 1 10
I know I should probably use PIVOT. But the syntax is just not clear to me.
The order numbers are always two. To make things clearer
And if you want to use PIVOT then the following works with the data provided:
declare @Orders table (RowNo int, Id int, OrderNo int)
insert into @Orders (RowNo, Id, OrderNo)
select 1, 101, 1 union all select 2, 101, 10
select Id, [1] OrderNo_0, [2] OrderNo_1
from (
select RowNo, Id, OrderNo
from @Orders
) SourceTable
pivot (
sum(OrderNo)
for RowNo in ([1],[2])
) as PivotTable
Reference: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
Note: To build each row in the result set, the pivot function groups by the columns not being pivoted. Therefore you need an aggregate function on the column that is being pivoted. You won't notice it in this instance because you have unique rows to start with, but if you had multiple rows with the same RowNo and Id you would find the aggregation comes into play.
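To see the aggregation come into play, the Python sketch below emulates the pivot with conditional aggregation (an aggregate over CASE expressions) in an in-memory SQLite database, since SQLite has no PIVOT operator. The helper names make_orders_db and pivot_orders are hypothetical, and the extra duplicate row used in the usage note is made up purely to show the effect.

```python
import sqlite3


def make_orders_db():
    """In-memory Orders table holding the two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (RowNo INTEGER, Id INTEGER, OrderNo INTEGER)")
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [(1, 101, 1), (2, 101, 10)])
    return conn


def pivot_orders(conn, aggregate="SUM"):
    """Turn RowNo 1 and 2 into columns OrderNo_0 and OrderNo_1, one row per Id."""
    if aggregate not in {"SUM", "MIN", "MAX"}:
        raise ValueError("aggregate must be SUM, MIN or MAX")
    # The aggregate name is checked against a fixed set before being
    # interpolated, so no user input reaches the SQL text.
    sql = f"""
        SELECT Id,
               {aggregate}(CASE WHEN RowNo = 1 THEN OrderNo END) AS OrderNo_0,
               {aggregate}(CASE WHEN RowNo = 2 THEN OrderNo END) AS OrderNo_1
        FROM Orders
        GROUP BY Id
        ORDER BY Id
    """
    return conn.execute(sql).fetchall()
```

With the two sample rows every aggregate gives (101, 1, 10). If a second row with RowNo 1 and OrderNo 5 is added for Id 101, SUM yields 6 in the first column while MAX yields 5, which is the effect the note describes.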
As you say there are only ever two order numbers per ID, you could join the result set to itself on the ID column. For the purposes of the example below, I'm assuming your result set is merely selecting from a single Orders table, but it should be easy enough to replace this with your existing query.
SELECT o1.ID, o1.OrderNo AS [OrderNo_0], o2.OrderNo AS [OrderNo_1]
FROM Orders AS o1
INNER JOIN Orders AS o2
-- < rather than <> so each pair is returned once, smaller order number first
ON (o1.ID = o2.ID AND o1.OrderNo < o2.OrderNo)
From your sample data, the simplest approach is to use the MIN and MAX functions.
SELECT Id,min(OrderNo) OrderNo_0,MAX(OrderNo) OrderNo_1
FROM T
GROUP BY Id

Unique constraint in Postgres based on last non-null value

I have a table that looks like the following: create table prices_history (id0 serial primary key, product_id0 int, time_added timestamptz, listed_price numeric)
I would like to insert a new price for a particular product_id0 only when the price recorded at max(time_added) for that product_id0 is distinct from the price I'm about to insert. Currently, I'm doing this through the following query, assuming that I want to insert a price of 9.50 for the product with id 101:
insert into prices_history (product_id0, time_added, listed_price)
(
select 101, NOW(), 9.50 where not exists (
select * from (
select distinct on (product_id0) * from prices_history order by product_id0, time_added desc
) x where product_id0=101 and listed_price=9.50
)
) returning id0
Is there a better query to solve this problem?
I'm using Postgres v9.6.8 on Ubuntu 16.04 LTS.
Instead of using WHERE NOT EXISTS, I found that a more sustainable solution to do bulk inserts was LEFT JOIN..WHERE NULL. This involves doing a left join with the data I wanted to insert, choosing the data where there was no matching data in the old table. In the following example, say I had the following pricing data (represented as JSON):
[{"price": 11.99, "product_id0": 2},
{"price": 10.50, "product_id0": 3},
{"price": 10.00, "product_id0": 4}]
The following query would insert the subset of this data that is new information:
insert into prices_history (product_id0, time_added, listed_price)
(
select new_product_id0, new_time_added, new_price from
(select unnest(array[11.99, 10.50, 10.00]) as new_price, unnest(array[2,3,4]) as new_product_id0, now() as new_time_added) new_prices left join
(select distinct on (product_id0) * from prices_history order by product_id0, time_added desc) old_prices
on old_prices.product_id0 = new_prices.new_product_id0 and old_prices.listed_price= new_prices.new_price
where old_prices.product_id0 is null and old_prices.listed_price is null
) returning id0
This new query seems to work well in the current deployment.
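For reference, the single-row "insert only if the latest price differs" rule can be sketched as follows. This Python snippet uses an in-memory SQLite database instead of Postgres, compares against the row with the greatest time_added per product, and takes the timestamp as a parameter so the behaviour is reproducible. The helper names make_history_db and insert_if_changed, and the dates in the usage note, are assumptions made for illustration.

```python
import sqlite3


def make_history_db():
    """Empty in-memory prices_history table mirroring the question's columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE prices_history (
               id0 INTEGER PRIMARY KEY,
               product_id0 INTEGER,
               time_added TEXT,
               listed_price REAL
           )"""
    )
    return conn


def insert_if_changed(conn, product_id, price, time_added):
    """Insert the price unless it equals the product's most recent price.

    Returns True when a row was inserted, False when it was skipped.
    """
    before = conn.total_changes
    conn.execute(
        """INSERT INTO prices_history (product_id0, time_added, listed_price)
           SELECT ?, ?, ?
           WHERE NOT EXISTS (
               SELECT 1
               FROM prices_history AS h
               WHERE h.product_id0 = ?
                 AND h.listed_price = ?
                 AND h.time_added = (
                     SELECT MAX(time_added)
                     FROM prices_history
                     WHERE product_id0 = ?
                 )
           )""",
        (product_id, time_added, price, product_id, price, product_id),
    )
    return conn.total_changes > before
```

Calling it twice with 9.50 for product 101 inserts only the first time; a later 10.00 is inserted, and a return to 9.50 after that is inserted again because only the latest price is compared.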

Product price comparison in sql

I have a table like the one created below. I add product prices to it daily, with different seller names:
create table Product_Price
(
id int,
dt date,
SellerName varchar(20),
Product varchar(10),
Price money
)
insert into Product_Price values (1, '2012-01-16','Sears','AA', 32)
insert into Product_Price values (2, '2012-01-16','Amazon', 'AA', 40)
insert into Product_Price values (3, '2012-01-16','eBay','AA', 27)
insert into Product_Price values (4, '2012-01-17','Sears','BC', 33.2)
insert into Product_Price values (5, '2012-01-17','Amazon', 'BC',30)
insert into Product_Price values (6, '2012-01-17','eBay', 'BC',51.4)
insert into Product_Price values (7, '2012-01-18','Sears','DE', 13.5)
insert into Product_Price values (8, '2012-01-18','Amazon','DE', 11.1)
insert into Product_Price values (9, '2012-01-18', 'eBay','DE', 9.4)
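I want a result like this for n sellers (as more sellers are added to the table):
| DT | PRODUCT | Sears[My Site] | Amazon | Ebay | Lowest Price |
| --------- | ------- | -------------: | -----: | ---: | ------------ |
| 1/16/2012 | AA | 32 | 40 | 27 | Ebay |
| 1/17/2012 | BC | 33.2 | 30 | 51.4 | Amazon |
| 1/18/2012 | DE | 7.5 | 11.1 | 9.4 | Sears |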
I think this is what you're looking for.
It's kind of ugly, but here's a little breakdown.
This block allows you to get a dynamic list of your values. (Can't remember who I stole this from, but it's awesome. Without this, pivot really isn't any better than a big giant case statement approach to this.)
DECLARE @cols AS VARCHAR(MAX)
DECLARE @query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' +
QUOTENAME(SellerName)
FROM Product_Price
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
, 1, 1, '')
Your @cols variable comes out like so:
[Amazon],[eBay],[Sears]
Then you need to build a string of your entire query:
select @query =
'select piv1.*, tt.sellername from (
select *
from
(select dt, product, SellerName, sum(price) as price from product_price group by dt, product, SellerName) t1
pivot (sum(price) for SellerName in (' + @cols + '))as bob
) piv1
inner join
(select t2.dt,t2.sellername,t1.min_price from
(select dt, min(price) as min_price from product_price group by dt) t1
inner join (select dt,sellername, sum(price) as price from product_price group by dt,sellername) t2 on t1.dt = t2.dt and t1.min_price = t2.price) tt
on piv1.dt = tt.dt
'
The piv1 derived table gets you the pivoted values. The cleverly named tt derived table gets you the seller who has the minimum price for each day.
(Told you it was kind of ugly.)
And finally, you run your query:
execute(@query)
And you get:
| DT | PRODUCT | AMAZON | EBAY | SEARS | SELLERNAME |
| ---------- | ------- | -----: | ---: | ----: | ---------- |
| 2012-01-16 | AA | 40 | 27 | 32 | eBay |
| 2012-01-17 | BC | 30 | 51.4 | 33.2 | Amazon |
| 2012-01-18 | DE | 11.1 | 9.4 | 13.5 | eBay |
I would think that if you have a reporting tool that can do crosstabs, this would be a heck of a lot easier to do there.
The problem is this requirement:
I want result like this for n number of sellers
If you have a fixed, known number of columns for your results, there are several techniques to PIVOT your data. But if the number of columns is not known, you're in trouble. The SQL language really wants you to be able to describe the exact nature of the result set for the select list in terms of the number and types of columns up front.
It sounds like you can't do that. This leaves you with two options:
Query the data to know how many stores you have and their names, and then use that information to build a dynamic sql statement.
(Preferred option) Perform the pivot in client code.
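A minimal sketch of the second option, pivoting in client code, might look like this in Python. It assumes the rows have already been fetched as (dt, product, seller, price) tuples by a plain SELECT; the function name pivot_prices and the shape of the returned dictionaries are choices made for this example only.

```python
from collections import defaultdict


def pivot_prices(rows):
    """Pivot (dt, product, seller, price) rows into one record per (dt, product).

    Returns (sellers, table): the sorted seller names, which act as the dynamic
    column list, and a list of dicts holding the per-seller prices plus the
    name of the cheapest seller.
    """
    rows = list(rows)  # the rows are walked twice below
    sellers = sorted({seller for _, _, seller, _ in rows})

    grid = defaultdict(dict)
    for dt, product, seller, price in rows:
        grid[(dt, product)][seller] = price

    table = []
    for (dt, product), prices in sorted(grid.items()):
        table.append({
            "dt": dt,
            "product": product,
            # None marks a seller with no price for this date and product
            "prices": {s: prices.get(s) for s in sellers},
            "lowest": min(prices, key=prices.get),
        })
    return sellers, table


# The nine sample rows from the question.
SAMPLE_ROWS = [
    ("2012-01-16", "AA", "Sears", 32), ("2012-01-16", "AA", "Amazon", 40),
    ("2012-01-16", "AA", "eBay", 27), ("2012-01-17", "BC", "Sears", 33.2),
    ("2012-01-17", "BC", "Amazon", 30), ("2012-01-17", "BC", "eBay", 51.4),
    ("2012-01-18", "DE", "Sears", 13.5), ("2012-01-18", "DE", "Amazon", 11.1),
    ("2012-01-18", "DE", "eBay", 9.4),
]
```

Because the seller list is computed from the data, a new seller appears as a new key without any change to the query or the code.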
This is something that would probably work well with a PIVOT. Microsoft's docs are actually pretty useful on PIVOT and UNPIVOT.
http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
Basically it allows you to pick a column, in your case SellerName, and pivot that out so that the elements of the column themselves become columns in the new result. The values that go in the new "Ebay", "Amazon", etc. columns would be an aggregate that you choose - in this case the MAX or MIN or AVG of the price.
For the final "Lowest Price" column you'd likely be best served by doing a subquery in your main query which finds the lowest value per product/date and then joining that back in to get the SellerName. Something like:
SELECT
Product_Price.dt
,Product_Price.Product
,Product_Price.SellerName AS MinimumSellerName
FROM
(SELECT
MIN(Price) AS min_price
,Product
,dt
FROM Product_Price
GROUP BY
Product
,dt) min_price
INNER JOIN Product_Price
ON min_price.Product = Product_Price.Product
AND min_price.dt = Product_Price.dt
AND min_price.min_price = Product_Price.Price
Then just put the pivot around that and include the MinimumSellerName column, just as you include the date and product.