Need help with a complex Join statement in SQL

How can you join a table with a sparse set of dates to another table with an exhaustive set of dates, such that the gaps between the sparse dates take the value of the previous sparse date?
Illustrative example:
PRICE table (sparse dates):
date        itemid  price
2008-12-04  1       $1
2008-12-11  1       $3
2008-12-15  1       $7
VOLUME table (exhaustive dates):
date        itemid  volume_amt
2008-12-04  1       12345
2008-12-05  1       23456
2008-12-08  1       34567
2008-12-09  1       ...
2008-12-10  1
2008-12-11  1
2008-12-12  1
2008-12-15  1
2008-12-16  1
2008-12-17  1
2008-12-18  1
Desired result:
date        price  volume_amt
2008-12-04  $1     12345
2008-12-05  $1     23456
2008-12-08  $1     34567
2008-12-09  $1     ...
2008-12-10  $1
2008-12-11  $3
2008-12-12  $3
2008-12-15  $7
2008-12-16  $7
2008-12-17  $7
2008-12-18  $7
Update:
A couple people have suggested a correlated subquery that accomplishes the desired result. (Correlated subquery = a subquery that contains a reference to the outer query.)
This will work; however, I should have noted that the platform I'm using is MySQL, for which correlated subqueries are poorly optimized. Any way to do it without using a correlated subquery?

This isn't as simple as a single LEFT OUTER JOIN to the sparse table, because you want the NULLs left by the outer join to be filled with the most recent price.
EXPLAIN SELECT v.`date`, v.volume_amt, p1.item_id, p1.price
FROM Volume v JOIN Price p1
ON (v.`date` >= p1.`date` AND v.item_id = p1.item_id)
LEFT OUTER JOIN Price p2
ON (v.`date` >= p2.`date` AND v.item_id = p2.item_id
AND p1.`date` < p2.`date`)
WHERE p2.item_id IS NULL;
This query matches Volume to all rows in Price that are earlier, and then uses another join to make sure we find only the most recent price.
I tested this on MySQL 5.0.51. It uses neither correlated subqueries nor group by.
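For reference, here's a minimal schema sketch to reproduce this; the original post doesn't include DDL, so the column types below are assumptions:
CREATE TABLE Price (
  `date`  DATE NOT NULL,          -- sparse dates
  item_id VARCHAR(20) NOT NULL,   -- assumed type
  price   DECIMAL(10,2) NOT NULL  -- assumed type
);
CREATE TABLE Volume (
  `date`     DATE NOT NULL,       -- exhaustive dates
  item_id    VARCHAR(20) NOT NULL,
  volume_amt INT NOT NULL         -- assumed type
);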
edit: Updated the query to match on item_id as well as date. This seems to work too. I created an index on (date) and an index on (date, item_id), and the EXPLAIN plan was identical. An index on (item_id, date) may be better in this case. Here's the EXPLAIN output for that:
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref             | rows | Extra                                |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
|  1 | SIMPLE      | p1    | ALL  | item_id       | NULL    | NULL    | NULL            |    6 |                                      |
|  1 | SIMPLE      | v     | ref  | item_id       | item_id | 22      | test.p1.item_id |    3 | Using where                          |
|  1 | SIMPLE      | p2    | ref  | item_id       | item_id | 22      | test.v.item_id  |    1 | Using where; Using index; Not exists |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
But I have a very small data set, and the best plan may differ on larger ones. You should experiment, checking the EXPLAIN output against a data set of realistic size.
edit: I pasted the wrong EXPLAIN output before. The one above is corrected, and shows better use of the (item_id, date) index.
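If you want to try the (item_id, date) index mentioned above, a sketch (the index names here are mine, not from the original test):
CREATE INDEX idx_price_item_date  ON Price (item_id, `date`);
CREATE INDEX idx_volume_item_date ON Volume (item_id, `date`);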

Assuming there is only 1 price per date/itemid:
select v.date, v.itemid, p.price
from volume v
join price p on p.itemid = v.itemid
where p.date = (select max(p2.date)
                from price p2
                where p2.itemid = v.itemid
                  and p2.date <= v.date);

SELECT v.[date], p.price, v.volume_amt
FROM volume v
LEFT JOIN price p ON p.itemid = v.itemid
AND p.[date] = (
    SELECT MAX([date])
    FROM price p2
    WHERE p2.[date] <= v.[date] AND p2.itemid = v.itemid
)

SELECT Volume.date, Volume.itemid, Price.price, Volume.volume_amt
FROM Volume
LEFT OUTER JOIN Price
ON Volume.date = Price.date AND Volume.itemid = Price.itemid
Probably. My SQL-fu is weak.

This method works in Oracle. I don't know about other databases, and you didn't specify. If this exact syntax doesn't work in your database, I would guess there are similar techniques.
dev> select * from price;
AS_OF ID AMOUNT
----------- ---------- ----------
04-Dec-2008 1 1
11-Dec-2008 1 2
15-Dec-2008 1 3
dev> select * from volume;
DAY ID VOLUME
----------- ---------- ----------
05-Dec-2008 1 1
06-Dec-2008 1 2
07-Dec-2008 1 3
08-Dec-2008 1 4
09-Dec-2008 1 5
10-Dec-2008 1 6
11-Dec-2008 1 7
12-Dec-2008 1 8
13-Dec-2008 1 9
14-Dec-2008 1 10
15-Dec-2008 1 11
16-Dec-2008 1 12
17-Dec-2008 1 13
18-Dec-2008 1 14
19-Dec-2008 1 15
20-Dec-2008 1 16
21-Dec-2008 1 17
22-Dec-2008 1 18
23-Dec-2008 1 19
dev> select day, volume, amount from (
       select day, volume,
              (select max(as_of) from price p where p.id = v.id and as_of <= day) price_as_of
       from volume v
     )
     join price on as_of = price_as_of
     order by day;
DAY VOLUME AMOUNT
----------- ---------- ----------
05-Dec-2008 1 1
06-Dec-2008 2 1
07-Dec-2008 3 1
08-Dec-2008 4 1
09-Dec-2008 5 1
10-Dec-2008 6 1
11-Dec-2008 7 2
12-Dec-2008 8 2
13-Dec-2008 9 2
14-Dec-2008 10 2
15-Dec-2008 11 3
16-Dec-2008 12 3
17-Dec-2008 13 3
18-Dec-2008 14 3
19-Dec-2008 15 3
20-Dec-2008 16 3
21-Dec-2008 17 3
22-Dec-2008 18 3
23-Dec-2008 19 3
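On databases with analytic functions (Oracle included), the same fill-forward can also be sketched without the scalar subquery, using LAST_VALUE ... IGNORE NULLS over an outer join. This is just an alternative sketch against the tables above, worth comparing plans before preferring it; note that days before the first price would stay NULL:
select v.day, v.volume,
       last_value(p.amount ignore nulls)
         over (partition by v.id order by v.day) as amount  -- carries the last known price forward
from volume v
left join price p on p.id = v.id and p.as_of = v.day        -- exact-date match; gaps come back NULL
order by v.day;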


SQL that returns all the permutations of a summed column

Shot in the dark here. I'd personally struggle to come up with a simple SQL statement to do the following (if it can even be done), so I thought I'd throw this out here:
Let's say we have the following data:
ID VALUE
-- -----
1 60
2 60
3 60
4 60
And I wanted to find all the permutations of records that SUM to 120. Meaning, the results would be 6 rows:
1 AND 2
1 AND 3
1 AND 4
--2 AND 1 (already used)
2 AND 3
2 AND 4
--3 AND 1 (already used)
--3 AND 2 (already used)
3 AND 4
They actually want a "random sampling" of that result-set, but I need to know if I can even get that result-set. Of course, the real data wouldn't be that easy (everything 60), and the question was posed as "10 records that add up to 5 minutes" (the field is a duration field), which leads to other questions about how to handle that. But let me see if I can start with just getting the permutations before getting more sophisticated.
Thanks.
These are combinations, not permutations. If you want all 2-way combinations, then use a self-join:
select t1.*, t2.*
from t t1 join
     t t2
     on t1.id < t2.id and
        t1.value + t2.value = 120;
For an approximately 10% random sample, you can use:
select t1.*, t2.*
from t t1 join
     t t2
     on t1.id < t2.id and
        t1.value + t2.value = 120
where rand() < 0.1;
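(Note that rand() filters each pair independently, so this returns roughly, not exactly, 10% of the matching pairs.)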
select l.id, r.id, l.value+r.value as sum
from t l
inner join t r
on l.id < r.id
where l.value+r.value = 120
order by l.id, r.id
rextester demo: http://rextester.com/FWCLT49699
returns:
+----+----+-----+
| id | id | sum |
+----+----+-----+
| 1 | 2 | 120 |
| 1 | 3 | 120 |
| 1 | 4 | 120 |
| 2 | 3 | 120 |
| 2 | 4 | 120 |
| 3 | 4 | 120 |
+----+----+-----+
Table Tvalues
ID VALUE
-- -----
1 60
2 60
3 60
4 60
Select A.ID, B.ID from TValues A
join TValues B on B.ID != A.ID
where
(A.Value+B.Value) = 120
and
A.ID < B.ID -- eliminates dups; if (1,3) is printed, (3,1) will not be printed
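For what it's worth, the same self-join pattern extends to larger groups by adding one join per extra element. A sketch for 3-way combinations summing to 180 (my extrapolation, not from the answers above; the number of joins grows with the group size):
select t1.id, t2.id, t3.id, t1.value + t2.value + t3.value as sum
from t t1
join t t2 on t2.id > t1.id   -- enforce id ordering to avoid duplicate groupings
join t t3 on t3.id > t2.id
where t1.value + t2.value + t3.value = 180;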

SQL: Add values to STDEVP calculation

I have the following table.
Key | Count | Amount
----| ----- | ------
1 | 2 | 10
1 | 2 | 15
2 | 5 | 1
2 | 5 | 2
2 | 5 | 3
2 | 5 | 50
2 | 5 | 20
3 | 3 | 5
3 | 3 | 4
3 | 3 | 5
Sorry, I couldn't figure out how to make the above a table.
I'm running this on SQL Server Management Studio 2012.
I'd like the STDEVP of the Amount column, but if the number of records for a given key is less than some value 'x' (there will never be more than x records for a given key), then I want to add zeros to account for the remainder.
For example, if 'x' is 6:
for key 1, I need stdevp(10,15,0,0,0,0)
for key 2, I need stdevp(1,2,3,50,20,0)
for key 3, I need stdevp(5,4,5,0,0,0)
I just need to be able to add zeros to the calculation. I could insert records to my table, but that seems rather tedious.
This seems complicated -- padding data for each key. Here is one approach:
with xs as (
      select 0 as val, 1 as n
      union all
      select 0, n + 1
      from xs
      where xs.n < 6
)
select k.[key], stdevp(coalesce(t.amount, 0)) as amount_stdevp
from xs cross join
     (select distinct [key] from t) k left join
     (select t.*, row_number() over (partition by [key] order by [key]) as seqnum
      from t
     ) t
     on t.[key] = k.[key] and t.seqnum = xs.n
group by k.[key];
The idea is that the cross join generates 6 rows for each key. Then the left join brings in available rows, up to the maximum.
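Alternatively, since the padding values are all zero, you can skip generating rows entirely: with x values in total, the zeros add nothing to the sums, so the population standard deviation is sqrt(sum(a*a)/x - (sum(a)/x)^2). A sketch of that shortcut, assuming x = 6 and the same table t as above:
select [key],
       sqrt(sum(amount * amount) / 6.0          -- E[X^2] over all 6 slots; zeros contribute nothing
            - power(sum(amount) / 6.0, 2)) as stdevp_padded  -- minus (E[X])^2
from t
group by [key];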

How to calculate the value of a previous row from the count of another column

I want to add a column that keeps a running total: each row's value is its count added to the previous row's sum. Below is the query. I tried using ROLLUP, but it does not serve the purpose.
select to_char(register_date,'YYYY-MM') as "registered_in_month"
,count(*) as Total_count
from CMSS.USERS_PROFILE a
where a.pcms_db != '*'
group by (to_char(register_date,'YYYY-MM'))
order by to_char(register_date,'YYYY-MM')
This is what i get
registered_in_month TOTAL_COUNT
-------------------------------------
2005-01 1
2005-02 3
2005-04 8
2005-06 4
But what I would like to display is below, including the months which have count as 0
registered_in_month TOTAL_COUNT SUM
------------------------------------------
2005-01 1 1
2005-02 3 4
2005-03 0 4
2005-04 8 12
2005-05 0 12
2005-06 4 16
To include the missing months in your result, first you need a complete list of months. To do that, find the earliest and latest month and then use a hierarchical query to generate the complete list.
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
)
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date;
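Against the sample data above, this generates the six months 2005-01 through 2005-06, including the months with no registrations.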
Once you have all the months, you can outer join them to your table. To get the cumulative sum, simply add up the counts using SUM as an analytic function.
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
),
y(all_months) as (
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date
)
select to_char(a.all_months,'yyyy-mm') registered_in_month,
count(b.register_date) total_count,
sum(count(b.register_date)) over (order by a.all_months) "sum"
from y a left outer join users_profile b
on a.all_months = trunc(b.register_date,'month')
group by a.all_months
order by a.all_months;
Output:
| REGISTERED_IN_MONTH | TOTAL_COUNT | SUM |
|---------------------|-------------|-----|
| 2005-01             |           1 |   1 |
| 2005-02             |           3 |   4 |
| 2005-03             |           0 |   4 |
| 2005-04             |           8 |  12 |
| 2005-05             |           0 |  12 |
| 2005-06             |           4 |  16 |

Help with optimising SQL query

Hi, I need some help with this problem.
I am working on a web application, and for the database I am using SQLite. Can someone help me optimize one query so that it runs fast? =)
I have table x:
ID | ID_DISH | ID_INGREDIENT
1 | 1 | 2
2 | 1 | 3
3 | 1 | 8
4 | 1 | 12
5 | 2 | 13
6 | 2 | 5
7 | 2 | 3
8 | 3 | 5
9 | 3 | 8
10 | 3 | 2
....
ID_DISH is the id of a dish, and ID_INGREDIENT is an ingredient that the dish is made of;
so in my case the dish with id 1 is made with the ingredients with ids 2, 3, 8 and 12.
This table has more than 15000 rows, and my question is:
I need a query that fetches the ids of dishes ordered by the count of ingredients (ascending) that I haven't yet added to my algorithm.
Example: foo(2,4)
will return rows in this order:
ID_DISH | count(stillMissing)
10      | 2
1       | 3
The dish with id 10 contains ingredients 2 and 4 and is missing 2 more; the dish with id 1 is missing 3.
My query is:
SELECT
    t2.ID_dish,
    (SELECT COUNT(*) as c
     FROM dishIngredient as t1
     WHERE t1.ID_ingredient NOT IN (2,4)
       AND t1.ID_dish = t2.ID_dish
     GROUP BY ID_dish) as c
FROM dishIngredient as t2
WHERE t2.ID_ingredient IN (2,4)
GROUP BY t2.ID_dish
ORDER BY c ASC
It works, but it is slow.
select ID_DISH, sum(ID_INGREDIENT not in (2, 4)) stillMissing
from x
group by ID_DISH
having stillMissing != count(*)
order by stillMissing
This is the solution; my previous query took 5-20 seconds, this one takes about 80 ms.
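A note on why this works: SQLite evaluates ID_INGREDIENT not in (2, 4) as 0 or 1, so the SUM counts the ingredients each dish still needs, and the HAVING clause drops dishes that contain none of the supplied ingredients. If it needs to be faster still, an index on the grouped columns may help (a sketch; the index name is mine):
create index if not exists idx_x_dish_ingredient on x (ID_DISH, ID_INGREDIENT);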
This is from memory, as I don't know the SQL dialect of sqlite.
SELECT DISTINCT T1.ID_DISH, COUNT(T1.ID_INGREDIENT) as COUNT
FROM dishIngredient as T1 LEFT JOIN dishIngredient as T2
ON T1.ID_DISH = T2.ID_DISH
WHERE T2.ID_INGREDIENT IN (2,4)
GROUP BY T1.ID_DISH
ORDER BY T1.ID_DISH

SQL AVG(COUNT(*))?

I'm trying to find out the average number of times a value appears in a column, group it based on another column and then perform a calculation on it.
I have 3 tables a little like this
DVD
ID | NAME
1 | 1
2 | 1
3 | 2
4 | 3
COPY
ID | DVDID
1 | 1
2 | 1
3 | 2
4 | 3
5 | 1
LOAN
ID | DVDID | COPYID
1 | 1 | 1
2 | 1 | 2
3 | 2 | 3
4 | 3 | 4
5 | 1 | 5
6 | 1 | 5
7 | 1 | 5
8 | 1 | 2
etc
Basically, I'm trying to find all the copy ids that appear in the loan table LESS times than the average number of times for all copies of that DVD.
So in the example above, copy 5 of dvd 1 appears 3 times, copy 2 twice and copy 1 once so the average for that DVD is 2. I want to list all the copies of that (and each other) dvd that appear less than that number in the Loan table.
I hope that makes a bit more sense...
Thanks
Similar to dotjoe's solution, but using an analytic function to avoid the extra join. May be more or less efficient.
with
loan_copy_total as
(
select dvdid, copyid, count(*) as cnt
from loan
group by dvdid, copyid
),
loan_copy_avg as
(
select dvdid, copyid, cnt, avg(cnt) over (partition by dvdid) as copy_avg
from loan_copy_total
)
select *
from loan_copy_avg lca
where cnt <= copy_avg;
This should work in Oracle:
create view dvd_count_view as
select dvdid, count(1) as howmanytimes
from loan
group by dvdid;
select avg(howmanytimes) from dvd_count_view;
Untested...
with
loan_copy_total as
(
select dvdid, copyid, count(*) as cnt
from loan
group by dvdid, copyid
),
loan_copy_avg as
(
select dvdid, avg(cnt) as copy_avg
from loan_copy_total
group by dvdid
)
select lct.*, lca.copy_avg
from loan_copy_avg lca
inner join loan_copy_total lct on lca.dvdid = lct.dvdid
and lct.cnt <= lca.copy_avg;
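Against the sample data in the question, this returns copies 1 and 2 of dvd 1 (counts 1 and 2 against an average of 2) plus the single copies of dvds 2 and 3; with a strict < instead of <=, only copy 1 of dvd 1 would qualify, which is closer to the "less than the average" wording.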