How to select value in a table based on date range that may change - sql

I can't seem to find a question that specifically addresses the requirements of what I'm trying to do here, but apologies if this isn't the case.
I have an example table:
type|start_date|end_date |cost|
bananas|2019-01-01|2019-01-31|100
bananas|2019-02-01|2019-02-28|95
juice |null |null |55
And so on. The point of the table is that an item (food or anything else) can have a different price in different date ranges. However, some items, e.g. juice, are not affected by the date (i.e. their price stays constant).
In another table, I want to lookup a value based on the date range in the table above.
For example:
customerId|transaction_date|item |quantity|
abc123 |2019-01-25 |bananas|4
abc126 |2019-02-06 |bananas|4
abc128 |2019-02-09 |juice |1
So the first two customers bought bananas, but one in January and one in February.
Then another customer bought juice.
How can I return the correct corresponding cost based on my other table?
I have tried a WHERE clause with the date BETWEEN the two bounds, but this excludes any items whose dates don't exist in the pricing table.
I've also tried to approach this with CASE, but to no avail.
What would be the best approach to this?
I would expect the result to yield a joined table with the following:
customerId|transaction_date|item |quantity|cost|
abc123 |2019-01-25 |bananas|4 |100
abc126 |2019-02-06 |bananas|4 |95
abc128 |2019-02-09 |juice |1 |55
where the cost column is joined using the criteria above.

You could use ISNULL on the left join.
SELECT P.*, C.cost
FROM Purchases P
LEFT JOIN Product_Costs C ON P.item = C.type
AND P.transaction_date >= ISNULL(C.start_date, P.transaction_date)
AND P.transaction_date <= ISNULL(C.end_date, P.transaction_date);
Modified Nick's fiddle: http://sqlfiddle.com/#!18/e366b/6
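The NULL-tolerant range join above can also be tried without a SQL Server instance. Below is a minimal sketch using Python's built-in sqlite3 module and the question's sample rows; it assumes COALESCE as a portable stand-in for ISNULL and ISO-formatted date strings so that text comparison orders dates correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product_Costs (type TEXT, start_date TEXT, end_date TEXT, cost INTEGER);
INSERT INTO Product_Costs VALUES
  ('bananas', '2019-01-01', '2019-01-31', 100),
  ('bananas', '2019-02-01', '2019-02-28', 95),
  ('juice', NULL, NULL, 55);
CREATE TABLE Purchases (customerId TEXT, transaction_date TEXT, item TEXT, quantity INTEGER);
INSERT INTO Purchases VALUES
  ('abc123', '2019-01-25', 'bananas', 4),
  ('abc126', '2019-02-06', 'bananas', 4),
  ('abc128', '2019-02-09', 'juice', 1);
""")

# A NULL bound falls back to the transaction date itself,
# so that side of the range comparison is always satisfied.
rows = conn.execute("""
SELECT P.customerId, P.transaction_date, P.item, P.quantity, C.cost
FROM Purchases P
LEFT JOIN Product_Costs C ON P.item = C.type
  AND P.transaction_date >= COALESCE(C.start_date, P.transaction_date)
  AND P.transaction_date <= COALESCE(C.end_date, P.transaction_date)
ORDER BY P.customerId
""").fetchall()

for row in rows:
    print(row)
```

Because it is a LEFT JOIN, a purchase with no matching price row would still be returned, with a NULL cost.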

You can use a correlated subquery:
select t2.*,
(select t1.cost
from table1 t1
where t1.type = t2.item and
t2.transaction_date >= t1.start_date and
t2.transaction_date <= t1.end_date
) as cost
from table2 t2;
Or, you can use a left join:
select t2.*, t1.cost
from table2 t2 left join
table1 t1
on t1.type = t2.item and
t2.transaction_date >= t1.start_date and
t2.transaction_date <= t1.end_date ;
EDIT:
You can handle NULL values with OR:
select t2.*, t1.cost
from table2 t2 left join
table1 t1
on t1.type = t2.item and
(t2.transaction_date >= t1.start_date or
t1.start_date is null
) and
(t2.transaction_date <= t1.end_date or
t1.end_date is null
);

You could use an outer apply:
SELECT p.*,
c.cost
FROM Purchases AS p
OUTER APPLY (
SELECT TOP 1 c.cost
FROM Product_Costs AS c
WHERE p.transaction_date BETWEEN ISNULL(c.start_date, p.transaction_date) AND ISNULL(c.end_date, p.transaction_date)
AND c.type = p.Item
) AS c;
I've created a Sql Fiddle using your example data: http://sqlfiddle.com/#!18/e366b/2/0

Related

Join Tables Where Absolute of Occurrences Equal to Absolute of Single Occurrence

Problem: I am currently trying to join two tables in Access where the absolute value of the total of multiple occurrences of one field in one table is equal to the absolute value of the occurrence in another table.
Am I able to use the Abs function in the WHERE statement? Everything I saw involved the function being used in the SELECT statement. Do I need to create separate queries to get the absolute values? It would also work if I were to check to see if they balance out rather than getting the absolute value.
In one table, a certain value will repeat multiple times while it will only appear once in the other table. How can I get the absolute values of the totals in order to compare it to the single occurrence in the other table? Thanks!
Table 1
Reference|Amount
55555|$15
55555|$20
Table 2
Reference|Amount
55555|-$35
If these are equal or if they balance out, they should appear on a query. If these aren't equal but the reference number and a partial amount appears, they should appear on another query.
The matching query is relatively simple. Just aggregate and compare the values:
select t2.reference, t2.amount
from (select reference, sum(amount) as amount
from table2
group by reference
) as t2 inner join
(select reference, sum(amount) as amount
from table1
group by reference
) as t1
on t1.reference = t2.reference
where t2.amount + t1.amount = 0;
However, the non-matches are much trickier -- because presumably a reference could be missing from either table. And MS Access does not support full join. One method is:
select t2.reference, t2.amount, t1.amount
from (select reference, sum(amount) as amount
from table2
group by reference
) as t2 left join
(select reference, sum(amount) as amount
from table1
group by reference
) as t1
on t1.reference = t2.reference
where t2.amount + t1.amount <> 0 or t1.amount is null
union all
select t1.reference, null, sum(amount)
from table1 as t1
where not exists (select 1 from table2 as t2 where t2.reference = t1.reference)
group by t1.reference;
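To see the aggregate-and-compare idea on data, here is a minimal sketch using Python's sqlite3 module rather than Access. The reference 66666 and its amounts are made up for illustration, amounts are stored as plain integers, and only references present in both tables are covered; references missing from one side still need the left join plus union all approach above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (reference TEXT, amount INTEGER);
CREATE TABLE table2 (reference TEXT, amount INTEGER);
-- 55555 comes from the question; 66666 is a hypothetical partial match
INSERT INTO table1 VALUES ('55555', 15), ('55555', 20), ('66666', 10);
INSERT INTO table2 VALUES ('55555', -35), ('66666', -25);
""")

# Sum each table per reference first, then compare the two totals.
totals = """
SELECT t2.reference, t2.amount AS t2_amount, t1.amount AS t1_amount
FROM (SELECT reference, SUM(amount) AS amount FROM table2 GROUP BY reference) AS t2
JOIN (SELECT reference, SUM(amount) AS amount FROM table1 GROUP BY reference) AS t1
  ON t1.reference = t2.reference
"""

# Totals that cancel out are matches; anything else is a partial amount.
balanced = conn.execute(totals + " WHERE t2.amount + t1.amount = 0").fetchall()
unbalanced = conn.execute(totals + " WHERE t2.amount + t1.amount <> 0").fetchall()

print("balanced:", balanced)
print("unbalanced:", unbalanced)
```

Checking that the sums cancel avoids Abs entirely, which also prevents two same-signed totals from being treated as a match.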
Aggregate data in Table1 and join to Table2.
Consider:
SELECT Table2.*, SumAmt FROM Table2
INNER JOIN (SELECT Reference, Sum(Amount) AS SumAmt FROM Table1 GROUP BY Reference) AS T
ON Table2.Reference = T.Reference
WHERE SumAmt = -Table2.Amount;
You can list if the amounts match or not, no Abs is needed:
Select
Table2.Reference,
Table2.Amount,
(T.Total = -Table2.Amount) As T1Match
From
Table2
Inner Join
(Select
Table1.Reference,
Sum(Table1.Amount) As Total
From
Table1
Group By
Table1.Reference) As T
On T.Reference = Table2.Reference
You can use a subquery as follows:
select t1.reference from
(select reference, sum(amount) as amount from table1 group by reference) t1
inner join table2 t2
on t1.reference = t2.reference
and abs(t1.amount) = abs(t2.amount)

How to join two SQL tables by extracting maximum numbers from one then into another?

As others have commented, I'm now going to add some code:
Imported tables
table3
Case No. is the primary key. Each row records one patient on a report date. Depending on whether the patient is imported or local, the corresponding cumulative column increases. On some days there are no cases, so a date like 25/01/2020 is skipped.
table2
Report date has no duplicate.
Now, I want to join the tables. Example outcome: [image not included]
The maximum cumulative of each date is joined into the new table. So although 26/01/2020 of table3 shows the increase from 6, 7, to 8, I only want the highest cumulative number there.
Thanks for letting me know how my previous query could be improved. Your opinion helps me a lot.
I have tried Gordon Linoff's by substituting the actual names (which I initially omitted because I thought they were ambiguous).
His code is as follows (I've upvoted):
SELECT t3.`Report date`,
max(max(t3.cumulative_local)) over (order by t3.`Report date`),
max(max(t3.cumulative_import)) over (order by t3.`Report date`)
from table3 t3 left join
table2 t2
using (`Report date`)
group by t2.`Report date`;
But I got an error
Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'new.t3.Report date' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Anyways I am now experimenting. Both answers helped. If you know how to fix 1055, let me know, or if you could propose another solution. Thanks
I think you just want aggregation and window functions:
select t1.date,
max(max(cumulativea)) over (order by t1.date),
max(max(cumulativeb)) over (order by t1.date)
from table1 t1 left join
table2 t2
on t1.date = t2.date
group by t1.date;
This returns the maximum values of the two columns up to each date, which is, I think, what you are trying to describe.
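As a rough illustration of "maximum up to each date", here is a minimal sketch using Python's sqlite3 module. It swaps the window function for a correlated subquery, which gives the same running maximum on this data; the table names (dates, cases), the column names, and the sample rows are all hypothetical, with dates stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical stand-ins: dates has one row per day, cases has one row per patient
CREATE TABLE dates (report_date TEXT PRIMARY KEY);
CREATE TABLE cases (case_no INTEGER PRIMARY KEY, report_date TEXT, cumulative_local INTEGER);
INSERT INTO dates VALUES ('2020-01-24'), ('2020-01-25'), ('2020-01-26');
INSERT INTO cases VALUES
  (1, '2020-01-24', 5),
  (2, '2020-01-26', 6),
  (3, '2020-01-26', 7),
  (4, '2020-01-26', 8);
""")

# For each calendar day, take the highest cumulative value reported on or before it.
rows = conn.execute("""
SELECT d.report_date,
       (SELECT MAX(c.cumulative_local)
        FROM cases c
        WHERE c.report_date <= d.report_date) AS max_local
FROM dates d
ORDER BY d.report_date
""").fetchall()

for row in rows:
    print(row)
```

The day with no cases (2020-01-25) carries the previous maximum forward, and the day with several cases keeps only its highest value.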
I don't understand why you have cumulA and cumulB in table1. I suppose they are there to store the max cumulA and cumulB for each day.
You must first self-join table2 to find the max for each date (with a GROUP BY date):
SELECT t2.id, t2.date, cA
FROM t2
JOIN (
SELECT MAX(cumulA) AS cA, date AS d2
FROM t2
GROUP BY date
) AS td
ON t2.cumulA=td.cA
AND t2.date=d2
ORDER BY t2.date
Then left join table1 to the result of the self-join on table2 so that every day is present.
SELECT * FROM `t1` LEFT JOIN t2 ON t1.date = t2.date ORDER BY t1.date
Here are the two joins combined:
SELECT * FROM `t1` LEFT JOIN (
SELECT t2.id, t2.date, cA
FROM t2
JOIN (
SELECT MAX(cumulA) AS cA, date AS d2
FROM t2
GROUP BY date
) AS td
ON t2.cumulA=td.cA
AND t2.date=d2
ORDER BY t2.date
) AS tt
ON t1.date = tt.date ORDER BY t1.date
Do the same for cumulB.
Afterwards (I suppose), you INSERT the result into table1.
I hope this answers your question.
Good luck with the rest.
_Teddy_

Need help in postgreSQL query

I have two tables, table1 and table2. I have written the query with some condition as follows
Select t2.employee_id,
t2.adddate,
t2.previousaleave
from table2 as t2, table1 as t1
WHERE t1.enddate IS NULL
OR t1.enddate>t2.adddate
AND t2.adddate<=now()
AND t2.leavetype='annualleave'
If I run this, the conditions do not work: it selects all the empids of table t2. I checked, and the problem is with the t1.enddate IS NULL condition, since the enddate column can be either
some date
or null.
I need to get the empid if t1.enddate IS NULL and the other conditions succeed. Leave type is distinct within each empid (each employee has only one row for annualleave). Is there an alternative way to do this?
AND binds more tightly than OR, so you need parentheses whenever you mix the two.
For example:
Select t2.employee_id,t2.adddate,t2.previousaleave
from table2 as t2
Inner join table1 as t1 ON ............. ?????
WHERE (t1.enddate IS NULL OR t1.enddate>t2.adddate) AND t2.adddate<=now()
AND t2.leavetype='annualleave'
BUT ALSO NOTE you don't appear to have joined the tables. Always use explicit ANSI join syntax to avoid creating an accidental Cartesian product of the rows from the tables.
If empid exists in both tables try
Select t2.employee_id,t2.adddate,t2.previousaleave
from table2 as t2
Inner join table1 as t1 ON t2.empid = t1.empid
WHERE (t1.enddate IS NULL OR t1.enddate>t2.adddate) AND t2.adddate<=now()
AND t2.leavetype='annualleave'
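The precedence issue is easy to check in isolation. Here is a minimal sketch using Python's sqlite3 module, where a, b and c are just illustrative stand-ins for "enddate IS NULL", "enddate > adddate" and "the remaining conditions"; SQLite is used here only as a convenient engine, but AND binding tighter than OR is standard SQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# a: enddate IS NULL is true, b: enddate > adddate is false,
# c: the remaining conditions are false (illustrative values only)
a, b, c = 1, 0, 0

# Without parentheses this is parsed as a OR (b AND c), so a alone makes it true.
unparenthesised = conn.execute("SELECT ? OR ? AND ?", (a, b, c)).fetchone()[0]

# With parentheses the remaining conditions must hold as well.
grouped = conn.execute("SELECT (? OR ?) AND ?", (a, b, c)).fetchone()[0]

print(unparenthesised, grouped)
```

This is why every row with a NULL enddate passed the original WHERE clause regardless of the other conditions.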
SELECT t2.employee_id, t2.adddate, t2.previousaleave
FROM table2 AS t2
INNER JOIN table1 as t1
ON (
( t1.enddate IS NULL OR t1.enddate > t2.adddate )
AND t2.adddate <= now()
AND t2.leavetype = 'annualleave'
)
Is it possible for t2.adddate to be > now()? That seems awkward.

SQL Join based on dates- Table2.Date=Next date after Table1.Date

I have two separate tables which I want to join based on dates. However, I don't want the dates in the two tables to be equal to one another. I want each date (and accompanying value) from one table to be joined with the next available date after it in the second table.
I've put an example of the problem below:
Table 1:
Date Value
2015-04-13 A
2015-04-10 B
2015-04-09 C
2015-04-08 D
Table 2:
Date Value
2015-04-13 E
2015-04-10 F
2015-04-09 G
2015-04-08 H
Desired Output Table:
Table1.Date Table2.Date Table1.Value Table2.Value
2015-04-10 2015-04-13 B E
2015-04-09 2015-04-10 C F
2015-04-08 2015-04-09 D G
I'm at a bit of a loose end as to where to even start with this, hence the lack of a current SQL starting point!
Hopefully that is clear. I found this related question that comes close, but I get lost when incorporating it into a join statement!
SQL - Select next date query
Any help is much appreciated!
M.
EDIT: An important consideration is that the next date will not always be simply 1 day later. The query needs to find the next available date. This was in the original question, but I've updated my example to reflect it.
Since you want the next available date, and that might not necessarily be the following date (eg. date + 1) you'll want to use a correlated subquery with either min or top 1.
This will give you the desired output:
;WITH src AS (
SELECT
Date,
NextDate = (SELECT MIN(Date) FROM Table2 WHERE Date > t1.Date)
FROM table1 t1
)
SELECT src.Date, src.NextDate, t1.Value, t2.Value
FROM src
JOIN Table1 t1 ON src.Date = t1.Date
JOIN Table2 t2 ON src.NextDate = t2.Date
WHERE src.NextDate IS NOT NULL
ORDER BY src.Date DESC
Sample SQL Fiddle
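For anyone without SQL Server to hand, the same "next available date" lookup can be sketched with Python's sqlite3 module. This version puts the correlated MIN subquery directly in the join condition instead of a CTE, which is an adaptation rather than the exact query above, and it assumes the dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Date TEXT, Value TEXT);
CREATE TABLE Table2 (Date TEXT, Value TEXT);
INSERT INTO Table1 VALUES
  ('2015-04-13', 'A'), ('2015-04-10', 'B'), ('2015-04-09', 'C'), ('2015-04-08', 'D');
INSERT INTO Table2 VALUES
  ('2015-04-13', 'E'), ('2015-04-10', 'F'), ('2015-04-09', 'G'), ('2015-04-08', 'H');
""")

# Join each Table1 row to the Table2 row holding the earliest later date.
# When no later date exists the subquery yields NULL and the row drops out.
rows = conn.execute("""
SELECT t1.Date, t2.Date, t1.Value, t2.Value
FROM Table1 t1
JOIN Table2 t2
  ON t2.Date = (SELECT MIN(n.Date) FROM Table2 n WHERE n.Date > t1.Date)
ORDER BY t1.Date DESC
""").fetchall()

for row in rows:
    print(row)
```

The gap between 2015-04-10 and 2015-04-13 is handled naturally, since nothing assumes the next date is exactly one day later.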
Try this:
select [Table 1].Date,[Table 1].Value,[Table 2].date,[Table 2].Value
from [Table 1]
join [Table 2]
on dateadd(dd,1,[Table 1].date) = [Table 2].date
I'd go with a cross apply:
SELECT t1.*, t2.*
FROM Table1 t1
CROSS APPLY (
SELECT TOP 1 *
FROM Table2 t2
WHERE t2.Date > t1.Date
ORDER BY t2.Date) t2
ORDER BY t1.Date DESC

hive sql aggregate

I have two tables in Hive, t1 and t2
> describe t1;
date_id string
> describe t2;
messageid string
createddate string
userid int
> select * from t1 limit 3;
2011-01-01 00:00:00
2011-01-02 00:00:00
2011-01-03 00:00:00
> select * from t2 limit 3;
87211389 2011-01-03 23:57:01 13864753
87211656 2011-01-03 23:57:59 13864769
87211746 2011-01-03 23:58:25 13864785
What I want is to count previous three-day distinct userid for a given date.
For example, for date 2011-01-03, I want to count distinct userid from 2011-01-01 to 2011-01-03.
for date 2011-01-04, I want to count distinct userid from 2011-01-02 to 2011-01-04
I wrote the following query. But it does not return three-day result. It returns distinct userid per day instead.
SELECT to_date(t1.date_id), count(distinct t2.userid) FROM t1 JOIN t2
ON (to_date(t2.createddate) = to_date(t1.date_id))
WHERE date_sub(to_date(t2.createddate),0) > date_sub(to_date(t1.date_id), 3)
AND to_date(t2.createddate) <= to_date(t1.date_id)
GROUP by to_date(t1.date_id);
`to_date()` and `date_sub()` are date functions in Hive.
However, the following part has no effect:
WHERE date_sub(to_date(t2.createddate),0) > date_sub(to_date(t1.date_id), 3)
AND to_date(t2.createddate) <= to_date(t1.date_id)
EDIT: One solution can be (but it is super slow):
SELECT to_date(t3.date_id), count(distinct t3.userid) FROM
(
SELECT * FROM t1 LEFT OUTER JOIN t2
WHERE
(date_sub(to_date(t2.createddate),0) > date_sub(to_date(t1.date_id), 3)
AND to_date(t2.createddate) <= to_date(t1.date_id)
)
) t3
GROUP by to_date(t3.date_id);
UPDATE: Thanks for all the answers. They are good.
But Hive is a bit different from SQL. Unfortunately, they cannot be used in Hive.
My current solution is to use UNION ALL.
SELECT * FROM t1 JOIN t2 ON (to_date(t1.date_id) = to_date(t2.createddate))
UNION ALL
SELECT * FROM t1 JOIN t2 ON (to_date(t1.date_id) = date_add(to_date(t2.createddate), 1))
UNION ALL
SELECT * FROM t1 JOIN t2 ON (to_date(t1.date_id) = date_add(to_date(t2.createddate), 2))
Then I group by and count. In this way, I can get what I want.
Although it is not elegant, it is much more efficient than a cross join.
The following should work in standard SQL:
SELECT
to_date(t1.date_id),
count(distinct t2.userid)
FROM
t1
LEFT JOIN
t2
ON to_date(t2.createddate) >= date_sub(to_date(t1.date_id), 2)
AND to_date(t2.createddate) < date_add(to_date(t1.date_id), 1)
GROUP BY
to_date(t1.date_id)
It will, however, be slow, because you are storing dates as strings and then using to_date() to convert them to dates. This means that indexes can't be used, and the SQL engine can't do anything clever to reduce the effort being expended.
As a result, every possible combination of rows needs to be compared. If you have 100 entries in T1 and 10,000 entries in T2, your SQL engine is processing a million combinations.
If you store these values as dates, you don't need to_date(). And if you index the dates, the SQL engine can quickly home in on the range of dates being specified.
NOTE: The format of the ON clause means that you do not need to round t2.createddate down to a daily value.
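To check the range join on a small data set, here is a minimal sketch using Python's sqlite3 module. It is not Hive: SQLite's date() function with day modifiers stands in for to_date(), date_sub() and date_add(), and the message ids, user ids and timestamps are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (date_id TEXT);
CREATE TABLE t2 (messageid TEXT, createddate TEXT, userid INTEGER);
INSERT INTO t1 VALUES ('2011-01-03 00:00:00'), ('2011-01-04 00:00:00');
-- hypothetical messages: user 1 posts on two different days
INSERT INTO t2 VALUES
  ('m1', '2011-01-01 10:00:00', 1),
  ('m2', '2011-01-02 11:00:00', 2),
  ('m3', '2011-01-03 23:57:01', 1),
  ('m4', '2011-01-04 09:00:00', 3);
""")

# Each t1 day is joined to every message from that day and the two days before it.
rows = conn.execute("""
SELECT date(t1.date_id) AS day, COUNT(DISTINCT t2.userid) AS users
FROM t1
LEFT JOIN t2
  ON date(t2.createddate) >= date(t1.date_id, '-2 days')
 AND date(t2.createddate) <  date(t1.date_id, '+1 day')
GROUP BY date(t1.date_id)
ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```

User 1 appears twice inside the window for 2011-01-03 but is counted once, which is the behaviour the question asks for.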
EDIT Why your code didn't work...
SELECT to_date(t1.date_id), count(distinct t2.userid) FROM t1 JOIN t2
ON (to_date(t2.createddate) = to_date(t1.date_id))
WHERE date_sub(to_date(t2.createddate),0) > date_sub(to_date(t1.date_id), 3)
AND to_date(t2.createddate) <= to_date(t1.date_id)
GROUP by to_date(t1.date_id);
This joins t1 to t2 with an ON clause of (to_date(t2.createddate) = to_date(t1.date_id)). As the join is an inner JOIN, the values in t2.createddate must fall on the same day as t1.date_id.
The WHERE clause allows a much wider range (3 days). But the ON clause of the JOIN has already restricted your data down to a single day.
The example I gave above simply takes your WHERE clause and puts it in place of the old ON clause.
EDIT
Hive doesn't allow <= and >= in the ON clause? Are you really locked in to using Hive?
If you really are, what about BETWEEN?
SELECT
to_date(t1.date_id),
count(distinct t2.userid)
FROM
t1
LEFT JOIN
t2
ON to_date(t2.createddate) BETWEEN date_sub(to_date(t1.date_id), 2) AND to_date(t1.date_id)
GROUP BY
to_date(t1.date_id)
Alternatively, refactor your table of dates to enumerate the dates you want to include...
TABLE t1 (calendar_date, inclusive_date) =
{ 2011-01-03, 2011-01-01
2011-01-03, 2011-01-02
2011-01-03, 2011-01-03
2011-01-04, 2011-01-02
2011-01-04, 2011-01-03
2011-01-04, 2011-01-04
2011-01-05, 2011-01-03
2011-01-05, 2011-01-04
2011-01-05, 2011-01-05 }
SELECT
to_date(t1.calendar_date),
count(distinct t2.userid)
FROM
t1
LEFT JOIN
t2
ON to_date(t2.createddate) = to_date(t1.inclusive_date)
GROUP BY
to_date(t1.calendar_date)
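The (calendar_date, inclusive_date) rows do not have to be typed by hand. Here is a minimal Python sketch that generates them for a trailing 3-day window; the function name window_pairs and its parameters are hypothetical, and loading the rows into Hive is left out.

```python
from datetime import date, timedelta


def window_pairs(first_day, last_day, window=3):
    """Yield (calendar_date, inclusive_date) rows for a trailing window of `window` days."""
    day = first_day
    while day <= last_day:
        # Oldest day in the window first, ending with the calendar date itself.
        for offset in range(window - 1, -1, -1):
            yield day.isoformat(), (day - timedelta(days=offset)).isoformat()
        day += timedelta(days=1)


pairs = list(window_pairs(date(2011, 1, 3), date(2011, 1, 5)))

for calendar_date, inclusive_date in pairs:
    print(calendar_date, inclusive_date)
```

With the window expanded into rows like this, the join only needs an equality condition, which is the point of the refactoring above.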
You need a subquery.
Try something like this (I cannot test it because I don't have Hive):
SELECT to_date(t1.date_id), count(distinct t2.userid) FROM t1 JOIN t2
ON (to_date(t2.createddate) = to_date(t1.date_id))
WHERE t2.messageid in
(
select t2.messageid from t2 where
date_sub(to_date(t2.createddate),0) > date_sub(to_date(t1.date_id), 3)
AND
to_date(t2.createddate) <= to_date(t1.date_id)
)
GROUP by to_date(t1.date_id);
The key is that, with the subquery, the right records in t2 are selected for each date in t1.
EDIT:
To force the subquery into the FROM clause, you could try this:
SELECT to_date(t1.date_id), count(distinct t2.userid) FROM t1 JOIN
(select userid, createddate from t2 where
date_sub(to_date(t2.createddate),0) > date_sub(to_date(t1.date_id), 3)
AND
to_date(t2.createddate) <= to_date(t1.date_id)
) as t2
ON (to_date(t2.createddate) = to_date(t1.date_id))
GROUP by to_date(t1.date_id);
but I don't know whether it will work.
I am making an assumption that t1 is used to define the 3 day period. I suspect the puzzling approach is due to Hive's shortcomings.
This allows you to have an arbitrary number of 3 day periods.
Try the following 2 queries
SELECT substring(t1.date_id,1,10), count(distinct t2.userid)
FROM t1
JOIN t2
ON substring(t2.createddate,1,10) >= date_sub(substring(t1.date_id,1,10), 2)
AND substring(t2.createddate,1,10) <= substring(t1.date_id,1,10)
GROUP BY t1.date_id
--or--
SELECT substring(t1.date_id,1,10), count(distinct t2.userid)
FROM t1
JOIN t2
ON t2.createddate like concat(substring(t1.date_id ,1,10), '%')
OR t2.createddate like concat(substring(date_sub(t1.date_id, 1) ,1,10), '%')
OR t2.createddate like concat(substring(date_sub(t1.date_id, 2) ,1,10), '%')
GROUP BY t1.date_id
The latter minimizes the function calls on the t2 table. I am also assuming that t1 is the smaller of the 2.
substring should return the same result as to_date. According to the documentation, https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions, to_date returns a string data type.
Support for date data types seems minimal but I am not familiar with hive.