Hi, I have a problem I have been working on for a while now. Let's say I have a view, call it room_price, that looks like this:
room | people | price | hotel
1 | 1 | 200 | A
2 | 2 | 99 | A
3 | 3 | 95 | A
4 | 1 | 90 | B
5 | 6 | 300 | B
I am looking for the lowest price in a given hotel for x people.
For 1 person I would expect:
hotel | price
A | 200
B | 90
For 2 I would have:
hotel | price
A | 99
This is because hotel B has no rooms that fit exactly 2 people. A room for 6 cannot be used for fewer (or more) than 6 people.
For hotel A the price is 99 because I use room 2.
For 6 the result should be:
hotel | price
A | 394
B | 300
So for hotel A I take rooms 1, 2 and 3, and for hotel B the lowest price is the single room 5 for 300.
I did it with the restriction that the people must fit into at most 3 rooms, which is acceptable, but my query is too slow :( It looks something like this:
select a.hotel,a.price+a1.price+a2.price
from room_price a, room_price a1, room_price a2
where
a.room<> a1.room
and a1.room<> a2.room
and a.room<> a2.room
and a.hotel = a1.hotel
and a.hotel = a2.hotel
After that I did a GROUP BY hotel and took min(price), and it worked ... but running the query that produces room_price three times and then taking the Cartesian product of the results took too much time. There are around 5000 rows in room_price, and the SQL that generates this data is rather complicated (it takes start and end dates, multiple prices, currency exchange...).
I can use SQL, custom functions ... or anything that makes this fast, but I would prefer to stay at the database level without processing the data in the application (I am using Java), as I will be extending this further to add more data to the query.
I would be grateful for any help.
Query itself:
WITH RECURSIVE
setup as (
SELECT 3::INT4 as people
),
room_sets AS (
SELECT
n.hotel,
array[ n.room ] as rooms,
n.price,
n.people
FROM
setup s,
room_price n
WHERE
n.people <= s.people
UNION ALL
SELECT
rs.hotel,
rs.rooms || n.room,
rs.price + n.price as price,
rs.people + n.people as people
FROM
setup s,
room_sets rs
join room_price n using (hotel)
WHERE
n.room > rs.rooms[ array_upper( rs.rooms, 1 )]
AND rs.people + n.people <= s.people
),
results AS (
SELECT
DISTINCT ON (rs.hotel)
rs.*
FROM
room_sets rs,
setup s
WHERE
rs.people = s.people
ORDER BY
rs.hotel, rs.price
)
SELECT * FROM results;
Tested it on this dataset:
CREATE TABLE room_price (
room INT4 NOT NULL,
people INT4 NOT NULL,
price INT4 NOT NULL,
hotel TEXT NOT NULL,
PRIMARY KEY (hotel, room)
);
copy room_price FROM stdin WITH DELIMITER ',';
1,1,200,A
2,2,99,A
3,3,95,A
4,1,90,B
5,6,300,B
\.
Please note that it will become much slower as you add more rooms to your table.
Ah, to change how many people you want results for, change the setup part.
Wrote detailed explanation on how it works.
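To cross-check the recursive query on a small data set, a brute-force reference in Python can be handy. This is only a sketch: cheapest_per_hotel and the max_rooms cap (taken from the "at most 3 rooms" restriction in the question) are my own names and assumptions, not part of the SQL above.

```python
from itertools import combinations

# Sample rows from the question: (room, people, price, hotel)
ROOM_PRICE = [
    (1, 1, 200, "A"),
    (2, 2, 99, "A"),
    (3, 3, 95, "A"),
    (4, 1, 90, "B"),
    (5, 6, 300, "B"),
]

def cheapest_per_hotel(rows, people, max_rooms=3):
    """Return {hotel: lowest total price} over room sets holding exactly `people`."""
    by_hotel = {}
    for room, capacity, price, hotel in rows:
        by_hotel.setdefault(hotel, []).append((room, capacity, price))

    best = {}
    for hotel, rooms in by_hotel.items():
        for size in range(1, max_rooms + 1):
            # combinations() yields each set of rooms once, never a reordering
            for combo in combinations(rooms, size):
                if sum(capacity for _, capacity, _ in combo) != people:
                    continue
                total = sum(price for _, _, price in combo)
                if hotel not in best or total < best[hotel]:
                    best[hotel] = total
    return best

print(cheapest_per_hotel(ROOM_PRICE, 6))  # {'A': 394, 'B': 300}
```

Like the SQL, it only counts sets whose capacities add up to exactly the requested number of people, so hotel B drops out for 2 people.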
It looks like your query as typed is incorrect with the FROM clause... it looks like aliases are out of whack
from room_price a, room_price,a1 room_price,room_price a2
and should be
from room_price a, room_price a1, room_price a2
That MIGHT be giving the query a false alias / extra table giving some sort of Cartesian product making it hang....
--- ok on the FROM clause...
Additionally, and just a thought... Since the "Room" appears to be an internal auto-increment ID column, it will never be duplicated, such as Room 100 in hotel A and Room 100 in hotel B. Your query to do <> on the room make sense so you are never comparing across the board on all 3 tables...
Why not force the a1 and a2 joins to only qualify for room GREATER than "a" room. Otherwise you'll be re-testing the same conditions over and over. From your example data, just on hotel A, you have room IDs of 1, 2 and 3. You are thus comparing
a a1 a2
1 2 3
1 3 2
2 1 3
2 3 1
3 1 2
3 2 1
Would it help to only compare where "a1" is always greater than "a" and "a2" is always greater than "a1"? Then the single test of
a a1 a2
1 2 3
would give the same result as all the rest, and thus cut your result down to one record in this case... But then, how can you really compare against a hotel with only TWO rooms, like hotel B? You would NEVER get an answer, since your qualification for rooms is
a <> a1 AND
a <> a2 AND
a1 <> a2
You may want to try cutting down to only a single self-join for a1, a2 and keep the compare only to the two, such as
select a1.hotel, a1.price + a2.price
from room_price a1, room_price a2
where a1.hotel = a2.hotel
and a2.room > a1.room
For hotel "A", you would thus have final result comparisons of
a1 a2
1 2
1 3
2 3
and for hotel "B"
a1 a2
4 5
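The difference between the <> conditions and the > conditions is the difference between permutations and combinations. A small Python illustration using the hotel A room IDs (the variable names are just for this sketch):

```python
from itertools import combinations, permutations

rooms_hotel_a = [1, 2, 3]  # room IDs for hotel A from the question

# a <> a1 AND a1 <> a2 AND a <> a2: every ordering of the same three rooms
ordered_triples = list(permutations(rooms_hotel_a, 3))

# a < a1 < a2: each set of three rooms exactly once
unique_triples = list(combinations(rooms_hotel_a, 3))

# a2.room > a1.room: each pair of rooms exactly once
unique_pairs = list(combinations(rooms_hotel_a, 2))

print(len(ordered_triples))  # 6
print(unique_triples)        # [(1, 2, 3)]
print(unique_pairs)          # [(1, 2), (1, 3), (2, 3)]
```

For n rooms in a hotel the three-way <> join produces n(n-1)(n-2) rows, while the ordered version produces one sixth of that.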
The use of <> is going to have a rather large impact once you start looking at larger data sets, especially if the prior filtering doesn't drastically reduce their size. By using it you may prevent the query from being optimised to use indexes, and the view may not use its indexes either, because SQL will attempt to run the filters for the query and the view against the tables in as few statements as possible (pending optimisations done by the engine).
I would ideally start with the view and confirm it's properly optimised. Just looking at the query itself, this version has a better chance of being optimised:
SELECT
a.hotel, a.price + a1.price + a2.price
FROM
room_price a,
room_price a1,
room_price a2
WHERE
(a.room > a1.room OR a.room < a1.room) AND
(a1.room > a2.room OR a1.room < a2.room) AND
(a.room > a2.room OR a.room < a2.room) AND
a.hotel = a1.hotel AND
a.hotel = a2.hotel
It appears to return the same results, but I'm not sure how you implement this query in your overall solution. So consider just the nature of the changes to the existing query and what you have done already.
Hopefully that helps. If not, you might need to look at what the view is doing and how it works: a view that returns results from a temp table or table variable can't use indexes either. In that case, generating an indexed temp table might be better for you.
I'm working in a SQL Server database. I have a table with an alphanumeric field that is always exactly 5 characters long and never contains special characters. This table has roughly 100K rows.
I have another table with a string field that may or may not contain these characters. This table currently has roughly 2500 possible formats, but those can be both added to and modified. Unfortunately, I don't have access to the data used to determine what should be in the field.
Table1.Model
A1234
B1234
A6485
16849
A4665
99999
Table2.StringField
I have purchased model number A1234 after returning B6485
I have purchased model number 16849 after we thought about 99999
I have purchased model number B1234 before also looking at A1234
I returned A4665 and never purchased anything else
I have no money and don’t buy anything
I am looking to scrape the model numbers from these. I am currently using a case statement which accounts for basically 20 of the possible formats. I add on to the case statement as I find new scenarios that might appear in my data.
Simplified version of my current logic (SUBSTRING positions are 1-based in SQL Server):
SELECT
  CASE WHEN stringfield LIKE 'I have purchased model number%' THEN SUBSTRING(stringfield, 31, 5)
       WHEN stringfield LIKE 'I returned%' THEN SUBSTRING(stringfield, 12, 5)
       ELSE 'N/A'
  END AS Model1,
  CASE WHEN stringfield LIKE 'I have purchased model number%return%' THEN SUBSTRING(stringfield, 53, 5)
       WHEN stringfield LIKE 'I have purchased model number%' THEN SUBSTRING(stringfield, 60, 5)
       ELSE 'N/A'
  END AS Model2
FROM Table2
Expected results:
I have purchased model number A1234 after returning B6485
Model1 = A1234 Model2=B6485
I have purchased model number 16849 after we thought about 99999
Model1 = 16849 Model2=99999
I have purchased model number B1234 before also looking at A1234
Model1 = B1234 Model2=A1234
I returned A4665 and never purchased anything else
Model1 = A4665 Model2=N/A
I have no money and don’t buy anything
Model1 = N/A Model2=N/A
I am putting the various scenarios into a reference table so that I can just update that as needed.
Is there a better way to do this? It's not a huge deal to just keep an eye on things and make updates as necessary. But it's just one more item on my list of things that needs to be maintained.
Thanks in advance.
One thing that I forgot to mention is that there is sometimes another substring of the field that is like A14351835410571982 - and I don't want anything from that string.
The things that I've thought about trying are:
1. Cross join from Table1 to itself and then say: if stringfield like '%value1%value2%' then value1 and value2. But that is 100k x 100k combinations, which seems prohibitively large.
2. Search stringfield for anything that's 5 characters long, followed by a space, a period or a comma, that is either all numbers or a single letter and 4 numbers, and then somehow get the first string and the second string in that order.
3. A combination of the first two: identify all 5-character strings in all records, then cross join them and match with wildcards. This would probably be about 20k values instead of 100k.
4. Continue down the path that I'm currently on and just do it with brute force.
** Note: I am a report analyst, not a developer, so I know enough SQL to be dangerous. I can typically follow along with up to mid-complexity SQL but might need help with anything above that.
Here is an example of how a combination of string_split and cross apply can get the models from the strings.
create table Models (
code char(5) primary key,
name varchar(30) not null
);
insert into Models (code, name) values
('A1234','Model A4')
, ('B6485','Model B5')
, ('16849','Model 49')
, ('99999','Model Five9')
, ('B1234','Model B4')
, ('A4665','Model A5')
;
create table Comments (
id int identity(1,1) primary key,
comment varchar(max)
);
insert into Comments (comment) values
('I have purchased model number A1234 after returning B6485')
, ('I have purchased model number 16849 after we thought about 99999')
, ('I have purchased model number B1234 before also looking at A1234')
, ('I returned A4665 and never purchased anything')
, ('I have no money and don’t buy anything')
;
create table #tmpCommentCodes (
id int identity(1,1) primary key,
comment_id int,
model_code varchar(max)
);
insert into #tmpCommentCodes (comment_id, model_code)
select c.id, ca.code
from Comments c
cross apply (
select value as code
from string_split(c.comment, ' ') spl
where value COLLATE Latin1_General_BIN like '[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9]'
) ca;
select tmp.*, model.name
from #tmpCommentCodes as tmp
left join Models as model
on model.code = tmp.model_code
id | comment_id | model_code | name
-: | ---------: | :--------- | :----------
1 | 1 | A1234 | Model A4
2 | 1 | B6485 | Model B5
3 | 2 | 16849 | Model 49
4 | 2 | 99999 | Model Five9
5 | 3 | B1234 | Model B4
6 | 3 | A1234 | Model A4
7 | 4 | A4665 | Model A5
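The same token filter can be prototyped outside the database to try it against new comment formats. Below is a rough Python sketch of the idea (split on spaces, keep tokens that are exactly 5 uppercase letters or digits). extract_codes is a name made up for this example, and stripping a trailing period or comma is an extra assumption that the SQL above does not make.

```python
import re

# Exactly five characters, each an uppercase letter or a digit
MODEL_TOKEN = re.compile(r"[A-Z0-9]{5}")

def extract_codes(comment):
    """Return candidate model codes in the order they appear in the comment."""
    codes = []
    for token in comment.split(" "):
        token = token.strip(".,")  # assumption: tolerate trailing punctuation
        if MODEL_TOKEN.fullmatch(token):
            codes.append(token)
    return codes

print(extract_codes("I have purchased model number A1234 after returning B6485"))
# ['A1234', 'B6485']
```

Because the match has to cover the whole token, a long string such as A14351835410571982 is ignored. The candidates still need to be checked against the Models table, as the left join above does.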
Then the temporary table can be used to replace the codes.
For example:
WITH RCTE_COMMENTS AS
(
SELECT TOP 1 WITH TIES
c.id AS comment_id
, 0 AS lvl
, tmp.id AS tmpId
, REPLACE(c.comment, CONCAT(' ', tmp.model_code), CONCAT(' ', model.name)) AS comment
FROM Comments AS c
LEFT JOIN #tmpCommentCodes AS tmp
ON tmp.comment_id = c.id
LEFT JOIN Models AS model
ON model.code = tmp.model_code
ORDER BY ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY tmp.id)
UNION ALL
SELECT c.comment_id
, c.lvl+1
, tmp.id
, REPLACE(c.comment, ' '+tmp.model_code, ' '+model.name)
FROM RCTE_COMMENTS AS c
JOIN #tmpCommentCodes AS tmp
ON tmp.comment_id = c.comment_id
AND tmp.id = c.tmpId + 1
JOIN Models AS model
ON model.code = tmp.model_code
)
SELECT TOP 1 WITH TIES comment_id
, REPLACE(comment, 'model number ', '') AS comment
FROM RCTE_COMMENTS
ORDER BY ROW_NUMBER() OVER (PARTITION BY comment_id ORDER BY tmpId DESC)
comment_id | comment
---------: | :-----------------------------------------------------------
1 | I have purchased Model A4 after returning Model B5
2 | I have purchased Model 49 after we thought about Model Five9
3 | I have purchased Model B4 before also looking at Model A4
4 | I returned Model A5 and never purchased anything
5 | I have no money and don’t buy anything
db<>fiddle here
Say I have the following data:
Passes
ID | Pass_code
-----------------
100 | 2xBronze
101 | 1xGold
102 | 1xSilver
103 | 2xSteel
Passengers
ID | Passengers
-----------------
100 | 2
101 | 5
102 | 1
103 | 3
I want to count the passes and then create tickets, with output like:
ID 100 | 2 pass (bronze)
ID 101 | 5 pass (because it is gold, we count all passengers)
ID 102 | 1 pass (silver)
ID 103 | 2 pass (steel)
I was thinking of something like the code below; however, I am unsure how to finish my CASE statement. I want to substring pass_code so that we get the number of passes, e.g. '2xBronze' should give me 2. Then for ID 103, we have 2 passes and 3 customers, so we should output 2.
Also, is there a way to first find '2xbronze' if pass_code contained lots of other things, such as '101001, 1xbronze, FirstClass'? This may change, so I don't want to substring by position. Could we search for '2xbronze' and then pull out the 2?
SELECT
CASE
WHEN Passes.pass_code like '%gold%' THEN Passengers.passengers
WHEN Passes.pass_code like '%steel%' THEN SUBSTRING(passes.pass_code, 1,1)
WHEN Passes.pass_code like '%bronze%' THEN SUBSTRING(passes.pass_code, 1,1)
WHEN Passes.pass_code like '%silver%' THEN SUBSTRING(passes.pass_code, 1,1)
else 0 end as no,
Passes.ID,
Passes.Pass_code,
Passengers.Passengers
FROM Passes
JOIN Passengers ON Passes.ID = Passengers.ID
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=db698e8562546ae7658270e0ec26ca54
So, assuming you are indeed using Oracle (as your DB fiddle implies):
You can do some string magic by finding the position of a separator character (in your case the x), then substringing based on that. Obviously this has its problems, and x is a bad separator character as well, but it works for your current set.
WITH PASSCODESPLIT AS
(
SELECT PASSES.ID,
TO_Number(SUBSTR(PASSES.PASS_CODE, 0, (INSTR(PASSES.PASS_CODE, 'x')) - 1)) AS NrOfPasses,
SUBSTR(PASSES.PASS_CODE, (INSTR(PASSES.PASS_CODE, 'x')) + 1) AS PassType
FROM Passes
)
SELECT
PASSCODESPLIT.ID,
CASE
WHEN LOWER(PASSCODESPLIT.PassType) = 'gold' THEN Passengers.Passengers
ELSE PASSCODESPLIT.NrOfPasses
END AS NrOfPasses,
PASSCODESPLIT.PassType,
Passengers.Passengers
FROM PASSCODESPLIT
INNER JOIN Passengers ON PASSCODESPLIT.ID = Passengers.ID
ORDER BY PASSCODESPLIT.ID ASC
Gives the result of:
ID NROFPASSES PASSTYPE PASSENGERS
100 2 bronze 2
101 5 gold 5
102 1 silver 1
103 2 steel 3
As can also be seen in this fiddle
But I would strongly advise you to fix your table design. Having multiple attributes in the same column leads to troubles like these. And the more variables/variations you start storing, the more 'magic' you need to keep doing.
In this particular example I see no reason why you don't simply have 3 columns in Passes, which would also let you add new columns going forward, e.g. to keep track of first class.
You can extract the numbers using regexp_substr(). So I think this does what you want:
SELECT (CASE WHEN p.pass_code LIKE '%gold%'
THEN pp.passengers
ELSE TO_NUMBER(REGEXP_SUBSTR(p.pass_code, '^[0-9]+'))
END) as num,
p.ID, p.Pass_code, pp.Passengers
FROM Passes p JOIN
Passengers pp
ON p.ID = pp.ID;
Here is a db<>fiddle.
This converts the leading digits in the code to a number. Also note the use of table aliases to simplify the query.
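For the second part of the question (a pass_code such as '101001, 1xbronze, FirstClass'), anchoring on the leading digits is not enough. One option is to search for the "digits, x, pass type" pattern anywhere in the string. Here is a sketch of that logic in Python; the list of pass types and the function name passes_to_issue are assumptions for illustration.

```python
import re

# One or more digits, an "x", then a known pass type, anywhere in the string
PASS_PATTERN = re.compile(r"(\d+)x(bronze|silver|gold|steel)", re.IGNORECASE)

def passes_to_issue(pass_code, passengers):
    """Gold counts every passenger; other types use the number in the code."""
    match = PASS_PATTERN.search(pass_code)
    if match is None:
        return 0  # mirrors the ELSE 0 branch in the question
    count = int(match.group(1))
    pass_type = match.group(2).lower()
    return passengers if pass_type == "gold" else count

print(passes_to_issue("2xSteel", 3))                       # 2
print(passes_to_issue("101001, 1xbronze, FirstClass", 4))  # 1
```

The standalone 101001 is skipped because it is not followed by x and a pass type.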
I take a Database course in which we have listings of AirBnBs and need to be able to do some SQL queries in the Relationship-Model we made from the data, but I struggle with one in particular :
I have two tables that we are interested in, Billing and Amenities. The first one have the id and price of listings, the second have id and wifi (let's say, to simplify, that it equals 1 if there is Wifi, 0 otherwise). Both have other attributes that we don't really care about here.
So the query is, "What is the difference in the average price of listings with and without Wifi ?"
My idea was to build two joined tables, one with listings that have wifi, the other without, and compare them easily:
SELECT avg(B.price - A.price) as averagePrice
FROM (
SELECT Billing.price, Billing.id
FROM Billing
INNER JOIN Amenities
ON Billing.id = Amenities.id
WHERE Amenities.wifi = 0
) A, (
SELECT Billing.price, Billing.id
FROM Billing
INNER JOIN Amenities
ON Billing.id = Amenities.id
WHERE Amenities.wifi = 1) B
WHERE A.id = B.id;
Obviously this doesn't work... I am pretty sure that there is a far easier solution, though. What am I missing?
(And by the way, is there a way to compute the absolute value of the price difference?)
I hope that I was clear enough, thank you for your time!
Edit: As mentioned in the comments, I forgot to say that both tables have id as their primary key, so there is one row per listing.
Just use conditional aggregation:
SELECT AVG(CASE WHEN a.wifi = 0 THEN b.price END) as avg_no_wifi,
AVG(CASE WHEN a.wifi = 1 THEN b.price END) as avg_wifi
FROM Billing b JOIN
Amenities a
ON b.id = a.id
WHERE a.wifi IN (0, 1);
You can subtract one expression from the other with - if you want the difference instead of the two separate values.
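As a quick way to try this, here is the same conditional aggregation run against an in-memory SQLite database from Python, with the subtraction wrapped in ABS() for the absolute difference. The three sample rows are invented for the demo, and SQLite stands in for whatever database the course uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Billing   (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE Amenities (id INTEGER PRIMARY KEY, wifi INTEGER);
    -- hypothetical sample rows, not from the real data set
    INSERT INTO Billing   VALUES (1, 1500), (2, 1700), (3, 1800);
    INSERT INTO Amenities VALUES (1, 1), (2, 1), (3, 0);
""")

avg_wifi, avg_no_wifi, abs_diff = conn.execute("""
    SELECT AVG(CASE WHEN a.wifi = 1 THEN b.price END),
           AVG(CASE WHEN a.wifi = 0 THEN b.price END),
           ABS(AVG(CASE WHEN a.wifi = 1 THEN b.price END)
             - AVG(CASE WHEN a.wifi = 0 THEN b.price END))
    FROM Billing b
    JOIN Amenities a ON b.id = a.id
    WHERE a.wifi IN (0, 1)
""").fetchone()

print(avg_wifi, avg_no_wifi, abs_diff)  # 1600.0 1800.0 200.0
```

A CASE without an ELSE yields NULL for the other rows, and AVG ignores NULLs, which is why each average only sees its own group.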
Let's assume we're working with data like the following (problems with your data model are noted below):
Billing
+------------+---------+
| listing_id | price |
+------------+---------+
| 1 | 1500.00 |
| 2 | 1700.00 |
| 3 | 1800.00 |
| 4 | 1900.00 |
+------------+---------+
Amenities
+------------+------+
| listing_id | wifi |
+------------+------+
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
+------------+------+
Notice that I changed "id" to "listing_id" to make it clear what it was (using "id" as an attribute name is problematic anyways). Also, note that one listing doesn't have an entry in the Amenities table. Depending on your data, that may or may not be a concern (again, refer to the bottom for a discussion of your data model).
Based on this data, your averages should be as follows:
Listings with wifi average $1600 (Listings 1 and 2).
Listings without wifi average $1800 (just Listing 3).
So the difference would be $200.
To achieve this result in SQL, it may be helpful to first get the average cost per amenity (whether wifi is offered). This would be obtained with the following query:
SELECT
Amenities.wifi AS has_wifi,
AVG(Billing.price) AS avg_cost
FROM Billing
INNER JOIN Amenities ON
Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
which gives you the following results:
+----------+-----------------------+
| has_wifi | avg_cost |
+----------+-----------------------+
| 0 | 1800.0000000000000000 |
| 1 | 1600.0000000000000000 |
+----------+-----------------------+
So far so good. So now we need to calculate the difference between these 2 rows. There are a number of different ways to do this, but one is to use a CASE expression to make one of the values negative, and then simply take the SUM of the result (note that I'm using a CTE, but you can also use a sub-query):
WITH
avg_by_wifi(has_wifi, avg_cost) AS
(
SELECT Amenities.wifi, AVG(Billing.price)
FROM Billing
INNER JOIN Amenities ON
Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
)
SELECT
ABS(SUM
(
CASE
WHEN has_wifi = 1 THEN avg_cost
ELSE -1 * avg_cost
END
))
FROM avg_by_wifi
which gives us the expected value of 200.
Now regarding your data model:
If both your Billing and Amenities table only have 1 row for each listing, it makes sense to combine them into 1 table. For example: Listings(listing_id, price, wifi)
However, this is still problematic, because you probably have a bunch of other amenities you want to model (pool, sauna, etc.) So you might want to model a many-to-many relationship between listings and amenities using an intermediate table:
Listings(listing_id, price)
Amenities(amenity_id, amenity_name)
ListingsAmenities(listing_id, amenity_id)
This way, you could list multiple amenities for a given listing without having to add additional columns. It also becomes easy to store additional information about an amenity: What's the wifi password? How deep is the pool? etc.
Of course, using this model makes your original query (difference in average cost of listings by wifi) a bit trickier, but definitely still doable.
I have 3 tables (see below), Table A describes a product, Table B holds inventory information for different dates, and Table C holds the price of each product for different dates.
Table A
------------------
product_id product_name
1 book
2 pencil
3 stapler
... ...
Table B
------------------
product_id date_id quantity
1 2012-12-01 100
1 2012-12-02 110
1 2012-12-03 90
2 2012-12-01 98
2 2012-12-02 50
... ... ...
Table C
-------------------
product_id date_id price
1 2012-12-01 10.29
1 2012-12-02 12.12
2 2012-12-02 32.98
3 2012-12-01 10.12
In many parts of my Java application I would like to know the dollar value of each product, so I end up running the following query:
select
a.product_name,
b.date_id,
b.quantity * c.price as total
from A a
join B b on a.product_id = b.product_id
join C c on a.product_id = c.product_id and b.date_id = c.date_id
where b.date_id = ${date_input}
I had an idea today that I could make the query above be a view (minus the date condition), then query the view for a specific date so my queries would look like
select * from view where date_id = ${date_input}
I'm not sure where the appropriate level of abstraction for such logic is. Should it be in java code (read from a pref file), or encoded into a view in the database?
The only reason I don't want to put it as a view is that as time goes by the join will become expensive as there will be more and more dates to cover, and I'm usually only interested in the past month's worth of data. Perhaps a stored proc is better? Would that be a good place to abstract this logic?
If views are implemented correctly you should never see worse performance in a case like this, where the query would be the same without the view. Adding more dates will not hurt performance just because you go through the view.
Make the view, it is the correct abstraction in this case.
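A small way to convince yourself that the view is only a named query: the sketch below builds the three tables from the question in an in-memory SQLite database, defines the view, and checks that filtering the view returns the same rows as the original inline query. The view name product_value is made up, SQLite is only a stand-in for your real database, and the date is passed as a bound parameter rather than spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE B (product_id INTEGER, date_id TEXT, quantity INTEGER);
    CREATE TABLE C (product_id INTEGER, date_id TEXT, price REAL);
    INSERT INTO A VALUES (1, 'book'), (2, 'pencil'), (3, 'stapler');
    INSERT INTO B VALUES (1, '2012-12-01', 100), (1, '2012-12-02', 110),
                         (1, '2012-12-03', 90),  (2, '2012-12-01', 98),
                         (2, '2012-12-02', 50);
    INSERT INTO C VALUES (1, '2012-12-01', 10.29), (1, '2012-12-02', 12.12),
                         (2, '2012-12-02', 32.98), (3, '2012-12-01', 10.12);
    -- product_value is a hypothetical view name
    CREATE VIEW product_value AS
        SELECT a.product_name, b.date_id, b.quantity * c.price AS total
        FROM A a
        JOIN B b ON a.product_id = b.product_id
        JOIN C c ON a.product_id = c.product_id AND b.date_id = c.date_id;
""")

date_input = "2012-12-02"

via_view = conn.execute(
    "SELECT product_name, date_id, total FROM product_value "
    "WHERE date_id = ? ORDER BY product_name",
    (date_input,),
).fetchall()

inline = conn.execute("""
    SELECT a.product_name, b.date_id, b.quantity * c.price AS total
    FROM A a
    JOIN B b ON a.product_id = b.product_id
    JOIN C c ON a.product_id = c.product_id AND b.date_id = c.date_id
    WHERE b.date_id = ?
    ORDER BY a.product_name
""", (date_input,)).fetchall()

print(via_view == inline)                 # True
print([name for name, _, _ in via_view])  # ['book', 'pencil']
```

On your real database it is still worth comparing the execution plans of the two forms to confirm the date filter is pushed into the view.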
I have the following data from 2 tables Notes (left) and scans (right) :
Imagine the picker and packers were all varying, like you can have JOHN, JANE etc.
I need a query that outputs like so :
On a given date range :
Name - Picked (units) - Packed (units)
MASI - 15 - 21
JOHN - 21 - 32
etc.
I can't figure out how to even start this, any tips will be helpful thanks.
Without a "worker" table that lists each Picker/Packer individually, I think you'd need something like this...
SELECT
CASE WHEN action.name = 'Picker' THEN scans.Picker ELSE scans.Packer END AS worker,
SUM(CASE WHEN action.name = 'Picker' THEN notes.Units ELSE 0 END) AS PickedUnits,
SUM(CASE WHEN action.name = 'Packer' THEN notes.Units ELSE 0 END) AS PackedUnits
FROM
notes
INNER JOIN
scans
ON scans.PickNote = notes.Number
CROSS JOIN
(
SELECT 'Picker' AS name
UNION ALL SELECT 'Packer' AS name
)
AS action
GROUP BY
CASE WHEN action.name = 'Picker' THEN scans.Picker ELSE scans.Packer END
(This is actually just an algebraic rearrangement of the answer that @RaphaëlAlthaus posted at the same time as me. Both use UNION to work out the Picker values and the Packer values separately. If you have separate indexes on scans.Picker and scans.Packer then I would expect mine MAY be the slowest. If you don't have those two indexes then I would expect mine to be the fastest. I recommend creating the indexes and testing on a realistic data set.)
EDIT
Actually, what I would recommend is a change to scans table completely; normalise it.
Your de-normalised set has one row per PickNote, with fields picker and packer.
A normalised set would have two rows per PickNote with fields role and worker.
id | PickNote | Role | Worker
------+----------+------+--------
01 | PK162675 | Pick | MASI
02 | PK162675 | Pack | MASI
03 | PK162676 | Pick | FRED
04 | PK162676 | Pack | JOHN
This allows you to create simple indexes and simple queries.
You may initially baulk at the extra, seemingly unnecessary rows, but it will yield simpler queries, faster queries, better maintainability, increased flexibility, etc, etc.
In short, this normalisation may cost a little extra space, but it pays back dividends forever.
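To show how simple the query becomes with the normalised layout, here is a sketch using an in-memory SQLite database from Python. The table name scan_roles and all of the rows are invented for the example (the original data was only posted as an image), so treat the numbers as placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows; only the column layout follows the answer above.
    CREATE TABLE notes (Number TEXT PRIMARY KEY, Units INTEGER);
    CREATE TABLE scan_roles (PickNote TEXT, Role TEXT, Worker TEXT);
    INSERT INTO notes VALUES ('PK162675', 15), ('PK162676', 6);
    INSERT INTO scan_roles VALUES
        ('PK162675', 'Pick', 'MASI'), ('PK162675', 'Pack', 'MASI'),
        ('PK162676', 'Pick', 'FRED'), ('PK162676', 'Pack', 'JOHN');
""")

rows = conn.execute("""
    SELECT r.Worker,
           SUM(CASE WHEN r.Role = 'Pick' THEN n.Units ELSE 0 END) AS PickedUnits,
           SUM(CASE WHEN r.Role = 'Pack' THEN n.Units ELSE 0 END) AS PackedUnits
    FROM scan_roles r
    JOIN notes n ON n.Number = r.PickNote
    GROUP BY r.Worker
    ORDER BY r.Worker
""").fetchall()

print(rows)  # [('FRED', 6, 0), ('JOHN', 0, 6), ('MASI', 15, 15)]
```

With one row per role there is no need for the CROSS JOIN or UNION trick: a single GROUP BY on Worker is enough, and a single index on Worker covers both pickers and packers.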
SELECT name, SUM(nbPicked) Picked, SUM(nbPacked) Packed
FROM
(SELECT s.Picker name, SUM(n.Units) nbPicked, 0 nbPacked
FROM Notes n
INNER JOIN scans s ON s.PickNote = n.Number
--WHERE s.ProcessedOn BETWEEN x and y
GROUP BY s.Picker
UNION ALL
SELECT s.Packer name, 0 nbPicked, SUM(n.Units) nbPacked
FROM Notes n
INNER JOIN scans s ON s.PickNote = n.Number
--WHERE s.ProcessedOn BETWEEN x and y
GROUP BY s.Packer) t
GROUP BY name;