I have 3 tables (see below): Table A describes a product, Table B holds inventory information for different dates, and Table C holds the price of each product for different dates.
Table A
------------------
product_id product_name
1 book
2 pencil
3 stapler
... ...
Table B
------------------
product_id date_id quantity
1 2012-12-01 100
1 2012-12-02 110
1 2012-12-03 90
2 2012-12-01 98
2 2012-12-02 50
... ... ...
Table C
-------------------
product_id date_id price
1 2012-12-01 10.29
1 2012-12-02 12.12
2 2012-12-02 32.98
3 2012-12-01 10.12
In many parts of my Java application I want to know the dollar value of each product, so I end up running the following query:
select
a.product_name,
b.date_id,
b.quantity * c.price as total
from A a
join B b on a.product_id = b.product_id
join C c on a.product_id = c.product_id and b.date_id = c.date_id
where b.date_id = ${date_input}
I had an idea today: I could turn the query above into a view (minus the date condition) and then query the view for a specific date, so my queries would look like:
select * from view where date_id = ${date_input}
I'm not sure where the appropriate level of abstraction for such logic is. Should it be in Java code (read from a pref file), or encoded into a view in the database?
The only reason I hesitate to make it a view is that, as time goes by, the join will become expensive because there will be more and more dates to cover, and I'm usually only interested in the past month's worth of data. Perhaps a stored proc is better? Would that be a good place to abstract this logic?
If views are implemented correctly, you should never see worse performance in a case like this, where the query would be the same without the view. More dates will not hurt performance just because you go through the view: the date condition is still applied when you query it, exactly as in your original query.
Make the view; it is the correct abstraction in this case.
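To make the suggestion concrete, here is a minimal sketch that creates the question's query as a view in an in-memory SQLite database and loads the sample rows from Tables A, B and C. SQLite is only a stand-in, since the question doesn't name its database; the view name `product_value` and the helper `value_on` are hypothetical. The date filter is applied when the view is queried, and a bound parameter replaces the `${date_input}` substitution.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE B (product_id INTEGER, date_id TEXT, quantity INTEGER);
CREATE TABLE C (product_id INTEGER, date_id TEXT, price REAL);

INSERT INTO A VALUES (1, 'book'), (2, 'pencil'), (3, 'stapler');
INSERT INTO B VALUES (1, '2012-12-01', 100), (1, '2012-12-02', 110),
                     (1, '2012-12-03', 90), (2, '2012-12-01', 98),
                     (2, '2012-12-02', 50);
INSERT INTO C VALUES (1, '2012-12-01', 10.29), (1, '2012-12-02', 12.12),
                     (2, '2012-12-02', 32.98), (3, '2012-12-01', 10.12);

-- The question's query as a view, minus the date condition.
CREATE VIEW product_value AS
SELECT a.product_name, b.date_id, b.quantity * c.price AS total
FROM A a
JOIN B b ON a.product_id = b.product_id
JOIN C c ON a.product_id = c.product_id AND b.date_id = c.date_id;
""")


def value_on(date_id):
    """Return (product_name, total) rows for one date, read through the view."""
    # Bound parameter instead of string substitution like ${date_input}.
    return conn.execute(
        "SELECT product_name, total FROM product_value "
        "WHERE date_id = ? ORDER BY product_name",
        (date_id,),
    ).fetchall()


rows_dec02 = value_on("2012-12-02")
print(rows_dec02)
```

Products with no matching price row for a date (pencil on 2012-12-01) drop out because of the inner joins, exactly as in the original query. With or without the view, an index covering B.date_id (not shown here) is what would let the database avoid reading every date.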
I'm taking a database course in which we have Airbnb listings and need to write some SQL queries against the relational model we built from the data, but I'm struggling with one in particular:
I have two tables that we are interested in, Billing and Amenities. The first has the id and price of each listing; the second has the id and wifi (to simplify, let's say wifi equals 1 if there is Wifi, 0 otherwise). Both have other attributes that we don't care about here.
So the query is: "What is the difference in the average price of listings with and without Wifi?"
My idea was to build two joined tables, one with listings that have wifi and the other with listings that don't, and then compare them:
SELECT avg(B.price - A.price) as averagePrice
FROM (
SELECT Billing.price, Billing.id
FROM Billing
INNER JOIN Amenities
ON Billing.id = Amenities.id
WHERE Amenities.wifi = 0
) A, (
SELECT Billing.price, Billing.id
FROM Billing
INNER JOIN Amenities
ON Billing.id = Amenities.id
WHERE Amenities.wifi = 1) B
WHERE A.id = B.id;
Obviously this doesn't work... I'm pretty sure there is a far easier solution, though. What am I missing?
(And by the way, is there a way to compute the absolute value of the price difference?)
I hope I was clear enough. Thank you for your time!
Edit: As mentioned in the comments, I forgot to say that both tables have id as their primary key, so there is one row per listing.
Just use conditional aggregation:
SELECT AVG(CASE WHEN a.wifi = 0 THEN b.price END) as avg_no_wifi,
AVG(CASE WHEN a.wifi = 1 THEN b.price END) as avg_wifi
FROM Billing b JOIN
Amenities a
ON b.id = a.id
WHERE a.wifi IN (0, 1);
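You can subtract one expression from the other (using -) if you want the difference instead of the individual averages, and wrap the subtraction in ABS() if you want its absolute value.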
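A minimal sketch of this conditional-aggregation query, run against an in-memory SQLite database (a stand-in for whatever engine the course uses) with the sample prices from the other answer. Listing 4 has no Amenities row, so the inner join drops it. The extra columns show the difference and its absolute value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Billing (id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE Amenities (id INTEGER PRIMARY KEY, wifi INTEGER);
-- Sample data: listings 1 and 2 have wifi, 3 does not, 4 has no Amenities row.
INSERT INTO Billing VALUES (1, 1500), (2, 1700), (3, 1800), (4, 1900);
INSERT INTO Amenities VALUES (1, 1), (2, 1), (3, 0);
""")

# CASE without ELSE yields NULL for the other group, and AVG ignores NULLs,
# so each AVG only sees the prices of one wifi value.
avg_no_wifi, avg_wifi, diff, abs_diff = conn.execute("""
SELECT AVG(CASE WHEN a.wifi = 0 THEN b.price END) AS avg_no_wifi,
       AVG(CASE WHEN a.wifi = 1 THEN b.price END) AS avg_wifi,
       AVG(CASE WHEN a.wifi = 1 THEN b.price END)
         - AVG(CASE WHEN a.wifi = 0 THEN b.price END) AS diff,
       ABS(AVG(CASE WHEN a.wifi = 1 THEN b.price END)
         - AVG(CASE WHEN a.wifi = 0 THEN b.price END)) AS abs_diff
FROM Billing b
JOIN Amenities a ON b.id = a.id
WHERE a.wifi IN (0, 1)
""").fetchone()
print(avg_no_wifi, avg_wifi, diff, abs_diff)
```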
Let's assume we're working with data like the following (problems with your data model are noted below):
Billing
+------------+---------+
| listing_id | price |
+------------+---------+
| 1 | 1500.00 |
| 2 | 1700.00 |
| 3 | 1800.00 |
| 4 | 1900.00 |
+------------+---------+
Amenities
+------------+------+
| listing_id | wifi |
+------------+------+
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
+------------+------+
Notice that I changed "id" to "listing_id" to make it clear what it was (using "id" as an attribute name is problematic anyway). Also, note that one listing doesn't have an entry in the Amenities table. Depending on your data, that may or may not be a concern (again, refer to the bottom for a discussion of your data model).
Based on this data, your averages should be as follows:
Listings with wifi average $1600 (Listings 1 and 2)
Listings without wifi (just listing 3) average $1800.
So the difference would be $200.
To achieve this result in SQL, it may be helpful to first get the average cost per amenity (whether wifi is offered). This would be obtained with the following query:
SELECT
Amenities.wifi AS has_wifi,
AVG(Billing.price) AS avg_cost
FROM Billing
INNER JOIN Amenities ON
Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
which gives you the following results:
+----------+-----------------------+
| has_wifi | avg_cost |
+----------+-----------------------+
| 0 | 1800.0000000000000000 |
| 1 | 1600.0000000000000000 |
+----------+-----------------------+
So far so good. So now we need to calculate the difference between these 2 rows. There are a number of different ways to do this, but one is to use a CASE expression to make one of the values negative, and then simply take the SUM of the result (note that I'm using a CTE, but you can also use a sub-query):
WITH
avg_by_wifi(has_wifi, avg_cost) AS
(
SELECT Amenities.wifi, AVG(Billing.price)
FROM Billing
INNER JOIN Amenities ON
Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
)
SELECT
ABS(SUM
(
CASE
WHEN has_wifi = 1 THEN avg_cost
ELSE -1 * avg_cost
END
))
FROM avg_by_wifi
which gives us the expected value of 200.
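The two queries above can be checked end to end with a small sketch. It assumes SQLite as a stand-in engine and loads the sample Billing and Amenities rows shown earlier; an ORDER BY is added to the first query only to make its output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Billing (listing_id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE Amenities (listing_id INTEGER PRIMARY KEY, wifi INTEGER);
INSERT INTO Billing VALUES (1, 1500.00), (2, 1700.00), (3, 1800.00), (4, 1900.00);
INSERT INTO Amenities VALUES (1, 1), (2, 1), (3, 0);
""")

# Step 1: average price per wifi value (listing 4 is dropped by the inner join).
per_group = conn.execute("""
SELECT Amenities.wifi AS has_wifi, AVG(Billing.price) AS avg_cost
FROM Billing
INNER JOIN Amenities ON Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
ORDER BY has_wifi
""").fetchall()

# Step 2: negate the no-wifi average, sum both rows, take the absolute value.
(difference,) = conn.execute("""
WITH avg_by_wifi(has_wifi, avg_cost) AS (
    SELECT Amenities.wifi, AVG(Billing.price)
    FROM Billing
    INNER JOIN Amenities ON Amenities.listing_id = Billing.listing_id
    GROUP BY Amenities.wifi
)
SELECT ABS(SUM(CASE WHEN has_wifi = 1 THEN avg_cost ELSE -1 * avg_cost END))
FROM avg_by_wifi
""").fetchone()
print(per_group, difference)
```

One caveat of the SUM trick: if the data contains only one wifi value, the query returns that group's average rather than a difference, so it may be worth checking that both rows exist.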
Now regarding your data model:
If both your Billing and Amenities table only have 1 row for each listing, it makes sense to combine them into 1 table. For example: Listings(listing_id, price, wifi)
However, this is still problematic, because you probably have a bunch of other amenities you want to model (pool, sauna, etc.), so you might want to model a many-to-many relationship between listings and amenities using an intermediate table:
Listings(listing_id, price)
Amenities(amenity_id, amenity_name)
ListingsAmenities(listing_id, amenity_id)
This way, you could list multiple amenities for a given listing without having to add additional columns. It also becomes easy to store additional information about an amenity: What's the wifi password? How deep is the pool? etc.
Of course, using this model makes your original query (difference in average cost of listings by wifi) a bit trickier, but definitely still doable.
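As a sketch of how that query might look under the many-to-many model, assuming SQLite and a hypothetical amenity row named 'wifi' in the Amenities table: each listing is flagged by whether a matching ListingsAmenities row exists, and the two averages are then compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Listings (listing_id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE Amenities (amenity_id INTEGER PRIMARY KEY, amenity_name TEXT);
CREATE TABLE ListingsAmenities (
    listing_id INTEGER REFERENCES Listings(listing_id),
    amenity_id INTEGER REFERENCES Amenities(amenity_id),
    PRIMARY KEY (listing_id, amenity_id)
);
INSERT INTO Listings VALUES (1, 1500.00), (2, 1700.00), (3, 1800.00);
-- Hypothetical amenity names.
INSERT INTO Amenities VALUES (1, 'wifi'), (2, 'pool');
-- Listings 1 and 2 offer wifi; listing 3 only has a pool.
INSERT INTO ListingsAmenities VALUES (1, 1), (2, 1), (3, 2);
""")

# EXISTS evaluates to 1 or 0 in SQLite, giving each listing a has_wifi flag.
(difference,) = conn.execute("""
WITH flagged AS (
    SELECT l.price,
           EXISTS (
               SELECT 1
               FROM ListingsAmenities la
               JOIN Amenities am ON am.amenity_id = la.amenity_id
               WHERE la.listing_id = l.listing_id
                 AND am.amenity_name = 'wifi'
           ) AS has_wifi
    FROM Listings l
)
SELECT ABS(AVG(CASE WHEN has_wifi = 1 THEN price END)
         - AVG(CASE WHEN has_wifi = 0 THEN price END))
FROM flagged
""").fetchone()
print(difference)
```

Unlike the two-table version, a listing with no amenity rows at all counts here as a listing without wifi.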
In a DB2 Database, I want to do the following simple mathematics using a SQL query:
AvailableStock = SupplyStock - DemandStock
SupplyStock is stored in one table, with one row per product; let's call this table the Supply table.
So the Supply table has this data:
ProductID | SupplyStock
---------------------
109 10
244 7 edit: exclude this product from the search
DemandStock is stored in a separate table Demand, where demand is logged as each customer logs demand during a customer order journey. Example data from the Demand table:
ProductID | DemandStock
------------------------
109 1
244 4 edit: exclude this product
109 6
109 2
So in our heads, if I want to calculate the AvailableStock for product '109', Supply is 10, Demand for product 109 totals to 9, and so Available stock is 1.
How do I do this in one select query in DB2 SQL?
The knowledge I have so far of some of the imagined steps in PseudoCode:
I select SupplyStock where product ID = '109'
I select sum(DemandStock) where product ID = '109'
I subtract DemandStock from SupplyStock
I present this as a resulting AvailableStock
The results will look like this:
Product ID | AvailableStock
109 1
I'd love to get this selected in one SQL select query.
Edit: I've since received an answer (that was almost perfect) and realised the question missed out some information.
This information:
We need to exclude data from products we don't want to select data for, and we also need to specifically select product 109.
My apologies, this was omitted from the original question.
I've since added a WHERE clause to select the product, and this works for me. But for future reference, perhaps the answer should include this information too.
You do this using a join to bring the tables together and group by to aggregate the results of the join:
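-- coalesce() treats "no demand rows" (a NULL sum after the left join) as zero demand
select s.ProductId, s.SupplyStock,
       coalesce(sum(d.DemandStock), 0) as TotalDemand,
       (s.SupplyStock - coalesce(sum(d.DemandStock), 0)) as Available
from Supply s left join
     Demand d
     on s.ProductId = d.ProductId
where s.ProductId = 109
group by s.ProductId, s.SupplyStock;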
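Here is a sketch of the same join-and-aggregate query with the question's sample data, using SQLite in place of DB2 (an assumption; the SQL itself is standard). Product 300 is a hypothetical extra row with no demand, included to show why COALESCE matters after a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supply (ProductID INTEGER PRIMARY KEY, SupplyStock INTEGER);
CREATE TABLE Demand (ProductID INTEGER, DemandStock INTEGER);
-- 300 is a hypothetical product with supply but no demand rows.
INSERT INTO Supply VALUES (109, 10), (244, 7), (300, 5);
INSERT INTO Demand VALUES (109, 1), (244, 4), (109, 6), (109, 2);
""")


def available_stock(product_id):
    """AvailableStock = SupplyStock - total DemandStock for one product."""
    # The LEFT JOIN keeps the product even with no demand rows; COALESCE turns
    # the resulting NULL sum into 0 so AvailableStock equals SupplyStock.
    return conn.execute("""
        SELECT s.ProductID,
               s.SupplyStock - COALESCE(SUM(d.DemandStock), 0) AS AvailableStock
        FROM Supply s
        LEFT JOIN Demand d ON s.ProductID = d.ProductID
        WHERE s.ProductID = ?
        GROUP BY s.ProductID, s.SupplyStock
    """, (product_id,)).fetchall()


print(available_stock(109))
```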
I have two classes Apartment and AdditionalSpace representing tables as below.
Apartment table
ID AREA SOLD
---- ------ ----
1 100 1
2 200 0
AdditionalSpace table
ID AREA APARTMENTID
---- ------ -----------
10 10 1
11 10 1
12 10 1
20 20 2
21 20 2
As you can see, the Apartment table has a one-to-many relationship with the AdditionalSpace table, i.e. Apartment.ID=AdditionalSpace.APARTMENTID.
Question: How do I retrieve the total area of a sold apartment, including its additional space area?
The SQL I have used so far to retrieve a similar result is:
select sum(apt.area + coalesce(ads.adsarea, 0))  -- coalesce keeps apartments without additional space
from apartment apt
left outer join (select sum(area) as adsarea, apartmentid
                 from additionalspace
                 group by apartmentid) ads
  on ads.apartmentid = apt.id
where apt.sold = 1
I am struggling to find a way to implement the above scenario via the Criteria API instead of SQL/HQL. Please suggest. Thanks.
I don't think this is possible in criteria. The closest I can see is to simply get the size of the apartment and the sum of the additional areas as two columns in your result, like this:
Criteria criteria = session.createCriteria(Apartment.class,"a");
criteria.createAlias("additionalSpaces", "ads");
criteria.setProjection(Projections.projectionList()
.add(Projections.property("area"))
.add(Projections.groupProperty("a.id"))
.add(Projections.sum("ads.area")));
Alternatively, if you still want to use Hibernate but are happy to write it in HQL, you can do the following:
select ads.apartment.id,max(a.area)+sum(ads.area)
from Apartment a
join a.additionalSpaces ads
group by ads.apartment.id
This works because HQL lets you use + to add the two projections together, but I don't know of an analogous method in the Projections API.
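For comparison, here is a plain-SQL version of the HQL query, run in SQLite against the sample rows from the question. The `total_area` helper and its `sold` parameter are hypothetical. The LEFT JOIN plus COALESCE keeps apartments that have no additional space, which the inner join in the HQL would drop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE apartment (id INTEGER PRIMARY KEY, area INTEGER, sold INTEGER);
CREATE TABLE additionalspace (id INTEGER PRIMARY KEY, area INTEGER, apartmentid INTEGER);
INSERT INTO apartment VALUES (1, 100, 1), (2, 200, 0);
INSERT INTO additionalspace VALUES (10, 10, 1), (11, 10, 1), (12, 10, 1),
                                   (20, 20, 2), (21, 20, 2);
""")


def total_area(sold):
    """Per apartment: its own area plus the sum of its additional spaces."""
    # MAX(a.area) is just the apartment's own area (one value per group),
    # mirroring max(a.area) + sum(ads.area) in the HQL.
    return conn.execute("""
        SELECT a.id, MAX(a.area) + COALESCE(SUM(ads.area), 0) AS total_area
        FROM apartment a
        LEFT JOIN additionalspace ads ON ads.apartmentid = a.id
        WHERE a.sold = ?
        GROUP BY a.id
        ORDER BY a.id
    """, (sold,)).fetchall()


print(total_area(1))
```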
Happy Friday folks,
I'm trying to write an SSRS report displaying data from three tables (actually about 12, but only three are relevant) that have awkward relationships, and the SQL query behind the data is proving difficult.
There are three entities involved: a Purchase Order, a Sales Order, and a Delivery. The problem is that a Purchase Order can have many sales orders, and also many deliveries which are NOT linked to the sales orders...that would be too easy.
Both the Sales Order and Delivery tables can be linked to the Purchase Order table by foreign keys and an intermediate table each.
I basically need to list purchase orders with their sales orders and deliveries next to them, with NULLs in any fields that don't apply, so that the output makes sense in SSRS and to a human reader. For example, for a purchase order with 2 sales orders and 4 delivery dates:
PO SO Delivery
1234 ABC 05/10
1234 DEF 09/10
1234 NULL 10/12
1234 NULL 14/12
The above (when grouped by PO) will tell the users there are two sales orders and four (unlinked) delivery dates.
Likewise if there are more SOs than deliveries, we need NULLs in the Delivery column;
PO SO Delivery
1234 ABC 03/08
1234 DEF NULL
1234 GHI NULL
1234 JKL NULL
Above would be the case with 4 SOs and one delivery date.
Using Left Outer joins alone gives too much duplication - in this case 8 rows, as it gives 4 delivery dates for each match on the sales order;
PO SO Delivery
1234 ABC 05/10
1234 ABC 09/10
1234 ABC 10/12
1234 ABC 14/12
1234 DEF 05/10
1234 DEF 09/10
1234 DEF 10/12
1234 DEF 14/12
It's fine that the PO column is duplicated as SSRS can visually group that - but the SO/Delivery fields can't be allowed to duplicate as this can't be got rid of in the report - if I group the column in SSRS by SO then it still spits out 4 delivery dates for each one.
The only situation in which our query works nicely is when there is just one SO per PO. In that case the single PO and SO numbers are duplicated together for x deliveries and can both be neatly grouped in SSRS. Unfortunately this is a rare occurrence in the data.
I've thought of trying to use some sort of windowing function or CROSS APPLY, but both fall down because they repeat for every PO number listed and end up spitting out too much data.
I'm at the point of thinking this just isn't set-based enough to be doable in SQL; I know the data is horrible.
Any help much appreciated.
EDIT: basic SQL Fiddle link to the table schemas, with many irrelevant columns omitted: http://sqlfiddle.com/#!2/5ba16
Example data...
Purchase Order
PO_Number Style
1001 Black work boots
1002 Green hat
1006 Red Scarf
Sales Order
Sales_order_number PO_number Qty Retailer
A100-21 1001 15 Walmart
A100-22 1001 29 Walmart
A200-31 1006 1000 Asda
Delivery
Delivery_ID Delivery_Date PO_number
1543285 10/05/2014 1001
1543286 12/05/2014 1001
1543287 17/05/2014 1001
1543288 21/05/2014 1002
If you assign row numbers to the elements in salesorders and deliveries, you can link on that.
Something like this
-- T-SQL table variables are named with @ (# is for temp tables)
declare @salesorders table (po int, so varchar(10))
declare @deliveries table (po int, delivery date)
declare @purchaseorders table (po int)
insert @purchaseorders values (123),(456)
insert @salesorders values (123,'a'),(123,'b'),(456,'c')
insert @deliveries values (123,'2014-1-1'),(456,'2014-2-1'),(456,'2014-2-1')
select *
from
(
-- pair every PO with a run of integers (master..spt_values with type 'P' is commonly used as a numbers table)
select numbers.number, p.po, so.so, d.delivery from @purchaseorders p
cross join (Select number from master..spt_values where type='p') numbers
-- the nth sales order of a PO lines up with number n
left join (select *,ROW_NUMBER() over (partition by po order by so) sor from @salesorders ) so
on p.po = so.po and numbers.number = so.sor
-- the nth delivery of the same PO lines up with the same n
left join (select * , ROW_NUMBER() over (partition by po order by delivery) dor from @deliveries) d
on p.po = d.po and numbers.number = d.dor
) v
-- drop numbers that matched neither a sales order nor a delivery
where so is not null or delivery is not null
order by po,number
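A variant of the same row-number idea, sketched in SQLite (3.25 or later, for ROW_NUMBER) with the question's example data. Instead of a numbers table such as master..spt_values, it builds the (PO, row number) slots as a UNION of the row numbers from both sides, so each PO gets exactly as many rows as its longer list. Dates are stored as ISO strings (converted from the question's dd/mm/yyyy) so they sort chronologically; the table and column names are assumptions based on the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_order (po_number INTEGER PRIMARY KEY, style TEXT);
CREATE TABLE sales_order (sales_order_number TEXT PRIMARY KEY, po_number INTEGER);
CREATE TABLE delivery (delivery_id INTEGER PRIMARY KEY, delivery_date TEXT, po_number INTEGER);
INSERT INTO purchase_order VALUES (1001, 'Black work boots'), (1002, 'Green hat'), (1006, 'Red Scarf');
INSERT INTO sales_order VALUES ('A100-21', 1001), ('A100-22', 1001), ('A200-31', 1006);
-- ISO dates so that ORDER BY sorts them chronologically.
INSERT INTO delivery VALUES (1543285, '2014-05-10', 1001), (1543286, '2014-05-12', 1001),
                            (1543287, '2014-05-17', 1001), (1543288, '2014-05-21', 1002);
""")

rows = conn.execute("""
WITH so AS (
    SELECT po_number, sales_order_number,
           ROW_NUMBER() OVER (PARTITION BY po_number ORDER BY sales_order_number) AS rn
    FROM sales_order
),
d AS (
    SELECT po_number, delivery_date,
           ROW_NUMBER() OVER (PARTITION BY po_number ORDER BY delivery_date) AS rn
    FROM delivery
),
slots AS (
    -- Every (PO, row number) present on either side; UNION removes duplicates.
    SELECT po_number, rn FROM so
    UNION
    SELECT po_number, rn FROM d
)
SELECT p.po_number, so.sales_order_number, d.delivery_date
FROM purchase_order p
JOIN slots s ON s.po_number = p.po_number
LEFT JOIN so ON so.po_number = s.po_number AND so.rn = s.rn
LEFT JOIN d ON d.po_number = s.po_number AND d.rn = s.rn
ORDER BY p.po_number, s.rn
""").fetchall()
for row in rows:
    print(row)
```

A PO with neither sales orders nor deliveries produces no rows here; making the join to slots a LEFT JOIN would keep such POs as a single all-NULL row.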
Hypothetical situation: I work for a custom sign-making company, and some of our clients have submitted more sign designs than they're currently using. I want to know what signs have never been used.
3 tables involved:
table A - signs for a company
sign_pk (unique) | company_pk | sign_description
1                | 1          | small
2                | 1          | large
3                | 2          | medium
4                | 2          | jumbo
5                | 3          | banner
table B - company locations
company_pk | company_location (unique)
1          | 987
1          | 876
2          | 456
2          | 123
table C - signs at locations (it's a bit of a stretch, but each row can have 2 signs, and it's a one-to-many relationship from company location to signs at locations)
company_location | front_sign | back_sign
987              | 1          | 2
987              | 2          | 1
876              | 2          | 1
456              | 3          | 4
123              | 4          | 3
So, a.company_pk = b.company_pk and b.company_location = c.company_location. What I want to try and find is how to query and get back that sign_pk 5 isn't at any location. Querying each sign_pk against all of the front_sign and back_sign values is a little impractical, since all the tables have millions of rows. Table a is indexed on sign_pk and company_pk, table b on both fields, and table c only on company locations. The way I'm trying to write it is along the lines of "each sign belongs to a company, so find the signs that are not the front or back sign at any of the locations that belong to the company tied to that sign."
My original plan was:
Select a.sign_pk
from a, b, c
where a.company_pk = b.company_pk
and b.company_location = c.company_location
and a.sign_pk *= c.front_sign
group by a.sign_pk having count(c.front_sign) = 0
just to do the front sign, and then repeat for the back, but that won't run because c is an inner member of an outer join, and also in an inner join.
This whole thing is fairly convoluted, but if anyone can make sense of it, I'll be your best friend.
How about something like this:
SELECT DISTINCT sign_pk
FROM table_a
WHERE sign_pk NOT IN
(
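SELECT front_sign AS sign
FROM table_c
WHERE front_sign IS NOT NULL  -- NOT IN returns no rows if the list contains a NULL
UNION                         -- UNION already removes duplicates
SELECT back_sign AS sign
FROM table_c
WHERE back_sign IS NOT NULL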
)
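A quick check of this approach in SQLite with the sample signs. It uses the question's column name back_sign and filters out NULLs, since NOT IN yields no rows at all when the subquery returns a NULL. Note that this version does not check which company owns a sign: any appearance as a front or back sign, anywhere, counts as used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (sign_pk INTEGER PRIMARY KEY, company_pk INTEGER, sign_description TEXT);
CREATE TABLE table_c (company_location INTEGER, front_sign INTEGER, back_sign INTEGER);
INSERT INTO table_a VALUES (1, 1, 'small'), (2, 1, 'large'), (3, 2, 'medium'),
                           (4, 2, 'jumbo'), (5, 3, 'banner');
INSERT INTO table_c VALUES (987, 1, 2), (987, 2, 1), (876, 2, 1),
                           (456, 3, 4), (123, 4, 3);
""")

# Signs that never appear as a front or back sign at any location.
UNUSED_SQL = """
SELECT sign_pk
FROM table_a
WHERE sign_pk NOT IN (
    SELECT front_sign FROM table_c WHERE front_sign IS NOT NULL
    UNION
    SELECT back_sign FROM table_c WHERE back_sign IS NOT NULL
)
ORDER BY sign_pk
"""
print(conn.execute(UNUSED_SQL).fetchall())
```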
ANSI outer join is your friend here. *= has dodgy semantics and should be avoided.
select distinct a.sign_pk, a.company_pk
from a join b on a.company_pk = b.company_pk
left outer join c on b.company_location = c.company_location
and (a.sign_pk = c.front_sign or a.sign_pk = c.back_sign)
where c.company_location is null
Note that the where clause is a filter on the rows returned by the join, so it says "do the joins, but give me only the rows that didn't to join to c"
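One caveat with the query above: because a is inner-joined to b, a sign whose company has no locations (sign 5, company 3, in the sample data) never reaches the WHERE clause, and a sign used at one of its company's locations but not at another would be returned for the unused location. Here is a sketch of a variant that keeps the ANSI outer joins but aggregates per sign, checked in SQLite with the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (sign_pk INTEGER PRIMARY KEY, company_pk INTEGER, sign_description TEXT);
CREATE TABLE b (company_pk INTEGER, company_location INTEGER UNIQUE);
CREATE TABLE c (company_location INTEGER, front_sign INTEGER, back_sign INTEGER);
INSERT INTO a VALUES (1, 1, 'small'), (2, 1, 'large'), (3, 2, 'medium'),
                     (4, 2, 'jumbo'), (5, 3, 'banner');
INSERT INTO b VALUES (1, 987), (1, 876), (2, 456), (2, 123);
INSERT INTO c VALUES (987, 1, 2), (987, 2, 1), (876, 2, 1), (456, 3, 4), (123, 4, 3);
""")

UNUSED_SQL = """
SELECT a.sign_pk, a.company_pk
FROM a
LEFT OUTER JOIN b ON a.company_pk = b.company_pk
LEFT OUTER JOIN c ON b.company_location = c.company_location
                 AND (a.sign_pk = c.front_sign OR a.sign_pk = c.back_sign)
GROUP BY a.sign_pk, a.company_pk
-- COUNT(column) skips NULLs: 0 means no location of the company shows the sign.
HAVING COUNT(c.company_location) = 0
ORDER BY a.sign_pk
"""
print(conn.execute(UNUSED_SQL).fetchall())
```

Because the check happens after grouping, a sign only qualifies if it matched no location at all, rather than if it is missing from any single location.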
In my experience, an outer join is almost always faster than NOT EXISTS and NOT IN, though it is worth comparing the query plans on your own data.
I would be tempted to create a temp table for the inner join and then outer join that.
But it really depends on the size of your data sets.
Yes, the schema design is flawed, but we can't always fix that!