SQL to get minimum of two different fields - sql

I have two different tables to track location of equipment. The "equipment" table tracks the current location and when it was installed there. If the equipment was previously at a different location, that information is kept in the "locationHistory" table. There is one row per equip_id in the equipment table. There can be 0 or more entries for each equip_id in the locationHistory table.
equipment
equip_id
current_location
install_date_at_location
locationHistory
equip_id
location
install_date
pickup_date
I want an SQL query that gets the date of the FIRST install_date for each piece of equipment...
Example:
equipment
=========
equip_id | current_location | install_date_at_location
123 | location1 | 1/23/2011
locationHistory
===============
equip_id | location | install_date | pickup_date
123 | location2 | 1/1/2011 | 1/5/2011
123 | location3 | 1/7/2011 | 1/20/2011
Should return: 123, 1/1/2011
Thoughts?

You could UNION queries that each look at one of the two fields, then take the MIN of the result.
Or you can use CASE together with MIN for the same effect:
select e.equip_id,
       MIN(CASE WHEN h.install_date < e.install_date_at_location
                THEN h.install_date
                ELSE e.install_date_at_location
           END) as first_install_date
from equipment e
left join locationHistory h on h.equip_id = e.equip_id
group by e.equip_id

Well, the critical piece of information is whether the install_date_at_location in equipment can ever be earlier than the historical information in locationHistory. If that's not possible, you can do:
SELECT * FROM locationHistory L INNER JOIN
(SELECT equip_id, MIN(install_date) AS firstDate
 FROM locationHistory
 GROUP BY equip_id) AS F
ON L.equip_id = F.equip_id AND L.install_date = F.firstDate
But if you have to worry about both tables, you need to create a view that normalizes the tables for you, and then apply the query against the view:
CREATE VIEW normalLocations (equip_id, location, install_date) AS
SELECT equip_id, current_location, install_date_at_location FROM equipment
UNION ALL
SELECT equip_id, location, install_date FROM locationHistory;
SELECT * FROM normalLocations L INNER JOIN
(SELECT equip_id, MIN(install_date) AS firstDate
 FROM normalLocations
 GROUP BY equip_id) AS F
ON L.equip_id = F.equip_id AND L.install_date = F.firstDate

A simple way to do it is:
SELECT U.Equip_ID, MIN(U.Install_Date)
FROM (SELECT E.Equip_ID, E.Install_Date_At_Location AS Install_Date
      FROM Equipment AS E
      UNION
      SELECT L.Equip_ID, L.Install_Date
      FROM LocationHistory AS L
     ) AS U
GROUP BY U.Equip_ID
This could pull a lot of rows from the LocationHistory table, but it isn't clear that it is worth 'optimizing' by applying a GROUP BY and MIN to the second half of the UNION, because the outer query immediately redoes the grouping once the equipment rows are combined in.
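For reference, that pre-aggregated variant of the second half would look something like this (a sketch only; whether it actually helps depends on the optimizer and the data volumes):
SELECT U.Equip_ID, MIN(U.Install_Date)
FROM (SELECT E.Equip_ID, E.Install_Date_At_Location AS Install_Date
      FROM Equipment AS E
      UNION ALL
      -- pre-aggregate the history per equipment before the outer MIN
      SELECT L.Equip_ID, MIN(L.Install_Date)
      FROM LocationHistory AS L
      GROUP BY L.Equip_ID
     ) AS U
GROUP BY U.Equip_ID
UNION ALL is enough here, since the outer MIN collapses any duplicate dates anyway.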

Related

Join future dates to table which only has dates until current day

I have these two tables:
table1: name (string), actual (double), yyyy_mm_dd (date)
table2: name (string), predicted (double), yyyy_mm_dd (string)
table1 contains data from 2018-01-01 up until the current day, table2 contains predicted data for the year of 2020. My problem is that table1 doesn’t have any date values past the present date, so I get duplicate data when joining like below:
SELECT
kpi.yyyy_mm_dd,
kpi.name,
kpi.actual as actual,
pre.predicted as predicted
FROM
schema1.table1 kpi
LEFT JOIN
schema1.table2 pre
ON pre.name = kpi.name --AND pre.yyyy_mm_dd = kpi.yyyy_mm_dd
WHERE
kpi.yyyy_mm_dd >= '2019-12-09'
Output:
+----------+------------+----------+-------------+
|yyyy_mm_dd| name |actual |predicted |
+----------+------------+----------+-------------+
|2019-12-10| Company | 100000 | 925,180 |
|2019-12-10| Company | 100000 | 1,145,723 |
|2019-12-10| Company | 100000 | 456,359 |
+----------+------------+----------+-------------+
If I uncomment the AND condition in my join clause, I won’t get the predicted values as my first table has no 2020 data. How can I join these tables together without duplicating actual values? actual should be null for days which haven't happened yet.
I think you want UNION ALL and not a JOIN:
SELECT
yyyy_mm_dd,
name,
actual as actual,
NULL as predicted
FROM schema1.table1
WHERE yyyy_mm_dd >= '2019-12-09'
UNION ALL
SELECT
yyyy_mm_dd,
name,
NULL as actual,
predicted as predicted
FROM schema1.table2
Hive supports full join:
SELECT COALESCE(kpi.yyyy_mm_dd, pre.yyyy_mm_dd) as yyyy_mm_dd,
COALESCE(kpi.name, pre.name) as name,
kpi.actual as actual,
pre.predicted as predicted
FROM (SELECT kpi.*
FROM schema1.table1 kpi
WHERE kpi.yyyy_mm_dd >= '2019-12-09'
) kpi FULL JOIN
schema1.table2 pre
ON kpi.name = pre.name AND
kpi.yyyy_mm_dd = pre.yyyy_mm_dd
Try using a GROUP BY clause in your query; something like the below might solve your problem:
SELECT
kpi.yyyy_mm_dd,
kpi.name,
kpi.actual as actual,
max(pre.predicted) as predicted
FROM
schema1.table1 kpi
LEFT JOIN
schema1.table2 pre
ON pre.name = kpi.name
group by kpi.yyyy_mm_dd, kpi.name, kpi.actual

Finding a min() date for one column and then using this to join with other tables that have a date LESS than this date

In short, I have two tables:
(1) pharmacy_claims (columns: user_id, date_service, claim_id, record_id, prescription)
(2) medical_claims (columns: user_id, date_service, provider, npi, cost)
I want to find user_id's in (1) that have a certain prescription value, find their earliest date_service (e.g. min(date_service)) and then use these user_id's with their earliest date of service as a cohort to pull all of their associated data from (2). Basically I want to find all of their medical_claims data PRIOR to the first time they were prescribed a given prescription in pharmacy_claims.
pharmacy_claims looks something like this:
user_id | prescription | date_service
1 | a | 2018-05-01
1 | a | 2018-02-11
1 | a | 2019-10-11
1 | b | 2018-07-12
2 | a | 2019-01-02
2 | a | 2019-03-10
2 | c | 2018-04-11
3 | c | 2019-05-26
So for instance, if I was interested in prescription = 'a', I would only want user_id 1 and 2 returned, with dates 2018-02-11 and 2019-01-02, respectively. Then I would want to pull user_id 1 and 2 from the medical_claims, and get all of their data PRIOR to these respective dates.
The way I tried to go about this was to build a temp table from the pharmacy_claims table to query the user_id's that have a given medication, and then left join this back to that table to create a cohort of user_id's with a date_service.
Here's what I did:
(1) Pulled all of the relevant data from the main pharmacy claims table:
CREATE TABLE user.temp_pharmacy_claims AS
SELECT user_id, claim_id, record_id, date_service
FROM dw.pharmacyclaims
WHERE date_service between '2018-01-01' and '2019-08-31'
This results in ~50,000 user_id's
(2) Created a table with just the user_id's and a min(date_service):
CREATE TABLE user.temp_pharmacy_claims_index AS
SELECT distinct user_id, min(date_service) AS Min_Date
FROM user.temp_pharmacy_claims
GROUP BY 1
(3) Created a final table (to get the desired cohort):
CREATE TABLE user.temp_pharmacy_claims_final_index AS
SELECT a.user_id
FROM user.temp_pharmacy_claims a
LEFT JOIN user.temp_pharmacy_claims_index b
ON a.user_id = b.user_id
WHERE a.date_service < b.Min_Date
However, this gets me 0 results when there should be a few thousand. Is this set up correctly? It's probably not the most efficient approach, but it looks sound to me, so not sure what's going on.
I think you just want a correlated subquery:
select mc.*
from medical_claims mc
where mc.date_service < (select min(pc.date_service)
                         from pharmacy_claims pc
                         where pc.user_id = mc.user_id and
                               pc.prescription = ?
                        );
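If you also want the cohort itself (each user_id paired with its first date for the given prescription), the same logic can be written with a derived table instead of a correlated subquery. A sketch, using the column names from the question and prescription 'a' as the example value:
SELECT mc.*
FROM medical_claims mc
INNER JOIN (
    -- first service date per user for the prescription of interest
    SELECT user_id, MIN(date_service) AS first_rx_date
    FROM pharmacy_claims
    WHERE prescription = 'a'
    GROUP BY user_id
) cohort ON cohort.user_id = mc.user_id
WHERE mc.date_service < cohort.first_rx_date;
The derived table is the cohort (user_id plus earliest date), and the outer query pulls the medical claims that happened before that date.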

Adding in missing dates from results in SQL

I have a database that currently looks like this
Date | valid_entry | profile
1/6/2015 | 1 | 1
3/6/2015 | 2 | 1
3/6/2015 | 2 | 2
5/6/2015 | 4 | 4
I am trying to grab the dates, but I need the query to also display dates that do not exist in the list, such as 2/6/2015.
This is a sample of what i need it to be:
Date | valid_entry
1/6/2015 | 1
2/6/2015 | 0
3/6/2015 | 2
3/6/2015 | 2
4/6/2015 | 0
5/6/2015 | 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query will only display the dates that exist in the table. Is there a way to populate the results with the dates that do not exist in there?
You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
select dt::date
from generate_series( (select min(date) from some_table), (select max(date) from some_table), interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry,0))
from all_dates ad
left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want group by date and profile, otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1 because that as well wouldn't generate two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
You have to use a calendar table for this purpose. In this case you can create an in-line table with the dates required, then LEFT JOIN your table to it:
select t.d AS "date", count(db.valid_entry)
from (
SELECT '2015-06-01' AS d UNION ALL SELECT '2015-06-02' UNION ALL SELECT '2015-06-03' UNION ALL
SELECT '2015-06-04' UNION ALL SELECT '2015-06-05' UNION ALL SELECT '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
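To see the difference, compare with a version that filters in the WHERE clause instead (a sketch against the same in-line calendar table):
select t.d AS "date", count(db.valid_entry)
from (SELECT '2015-06-01' AS d UNION ALL SELECT '2015-06-02') AS t
left join database AS db on t.d = db."date"
-- unmatched calendar dates have db.profile = NULL, and NULL = 1 is not true,
-- so those dates are discarded here and the gaps reappear
where db.profile = 1
group by t.d;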

SQL Inner Join query

I have the following table structures:
cust_info
cust_id
cust_name
bill_info
bill_id
cust_id
bill_amount
bill_date
paid_info
paid_id
bill_id
paid_amount
paid_date
Now my output should display records between two bill_dates (1 Jan 2013 to 1 Feb 2013) as a single row per bill, as follows:
cust_name | bill_id | bill_amount | tpaid_amount | bill_date | balance
where tpaid_amount is the total paid for a particular bill_id.
For example,
for bill_id abcd, bill_amount is 10000 and the user pays 2000 one time and 3000 a second time,
which means the paid_info table contains two entries for the same bill_id:
bill_id | paid_amount
abcd | 2000
abcd | 3000
so, tpaid_amount = 2000 + 3000 = 5000 and balance = 10000 - tpaid_amount = 10000 - 5000 = 5000
Is there any way to do this with a single query (inner joins)?
You'd want to join the 3 tables, then group them by bill ids and other relevant data, like so.
-- the select line, as well as getting your columns to display, is where you'll work
-- out your computed columns, or what are called aggregate functions, such as tpaid and balance
SELECT c.cust_name, p.bill_id, b.bill_amount, SUM(p.paid_amount) AS tpaid, b.bill_date, b.bill_amount - SUM(p.paid_amount) AS balance
-- joining up the 3 tables here on the id columns that point to the other tables
FROM cust_info c INNER JOIN bill_info b ON c.cust_id = b.cust_id
INNER JOIN paid_info p ON p.bill_id = b.bill_id
-- between pretty much does what it says
WHERE b.bill_date BETWEEN '2013-01-01' AND '2013-02-01'
-- in group by, we not only need to join rows together based on which bill they're for
-- (bill_id), but also any column we want to select in SELECT.
GROUP BY c.cust_name, p.bill_id, b.bill_amount, b.bill_date
A quick overview of group by: It will take your result set and smoosh rows together, based on where they have the same data in the columns you give it. Since each bill will have the same customer name, amount, date, etc, we are fine to group by those as well as the bill id, and we'll get a record for each bill. If we wanted to group it by p.paid_amount, though, since each payment would have a different one of those (possibly), you'd get a record for each payment as opposed to for each bill, which isn't what you'd want. Once group by has smooshed these rows together, you can run aggregate functions such as SUM(column). In this example, SUM(p.paid_amount) totals up all the payments that have that bill_id to work out how much has been paid. For more information, please look at W3Schools chapter on group by in their SQL tutorials.
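As a quick illustration of that last point, here is what grouping by the payment amount as well would do (a sketch against the same tables):
-- adding p.paid_amount to the GROUP BY splits each bill back into one row per payment,
-- so SUM(p.paid_amount) no longer totals the whole bill
SELECT p.bill_id, p.paid_amount, SUM(p.paid_amount) AS tpaid
FROM bill_info b
INNER JOIN paid_info p ON p.bill_id = b.bill_id
GROUP BY p.bill_id, p.paid_amount
-- for bill abcd this returns two rows, (abcd, 2000) and (abcd, 3000),
-- instead of the single row (abcd, 5000) that grouping by bill_id alone would give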
Hope I've understood this correctly and that this helps you.
This will do the trick:
select
cust_name,
bill_id,
bill_amount,
sum(paid_amount),
bill_date,
bill_amount - sum(paid_amount)
from
cust_info
left outer join bill_info
on cust_info.cust_id=bill_info.cust_id
left outer join paid_info
on bill_info.bill_id=paid_info.bill_id
where
bill_info.bill_date between X and Y
group by
cust_name,
bill_id,
bill_amount,
bill_date

How to Select and Order By columns not in Group By SQL statement - Oracle

I have the following statement:
SELECT
IMPORTID,Region,RefObligor,SUM(NOTIONAL) AS SUM_NOTIONAL
From
Positions
Where
ID = :importID
GROUP BY
IMPORTID, Region,RefObligor
Order BY
IMPORTID, Region,RefObligor
There are some extra columns in table Positions that I want in the output as "display data" but that I don't want in the group by statement.
These are Site, Desk
Final output would have the following columns:
IMPORTID,Region,Site,Desk,RefObligor,SUM(NOTIONAL) AS SUM_NOTIONAL
Ideally I'd want the data sorted like:
Order BY
IMPORTID,Region,Site,Desk,RefObligor
How to achieve this?
It does not make sense to include columns that are not part of the GROUP BY clause. Consider: if you have MIN(X) and MAX(Y) in the SELECT clause, which row should the other (non-grouped) columns come from?
If your Oracle version is recent enough, you can use SUM() OVER() to show the grouped SUM against every data row.
SELECT
IMPORTID,Site,Desk,Region,RefObligor,
SUM(NOTIONAL) OVER(PARTITION BY IMPORTID, Region,RefObligor) AS SUM_NOTIONAL
From
Positions
Where
ID = :importID
Order BY
IMPORTID,Region,Site,Desk,RefObligor
Alternatively, you need to make an aggregate out of the Site, Desk columns
SELECT
IMPORTID,Region,Min(Site) Site, Min(Desk) Desk,RefObligor,SUM(NOTIONAL) AS SUM_NOTIONAL
From
Positions
Where
ID = :importID
GROUP BY
IMPORTID, Region,RefObligor
Order BY
IMPORTID, Region,Min(Site),Min(Desk),RefObligor
I believe this is what you want:
select
IMPORTID,
Region,
Site,
Desk,
RefObligor,
Sum(Sum(Notional)) over (partition by IMPORTID, Region, RefObligor)
from
Positions
group by
IMPORTID, Region, Site, Desk, RefObligor
order by
IMPORTID, Region, RefObligor, Site, Desk;
... but it's hard to tell without further information and/or test data.
A great blog post that covers this dilemma in detail is here:
http://bernardoamc.github.io/sql/2015/05/04/group-by-non-aggregate-columns/
Here are some snippets of it:
Given:
CREATE TABLE games (
game_id serial PRIMARY KEY,
name VARCHAR,
price BIGINT,
released_at DATE,
publisher TEXT
);
INSERT INTO games (name, price, released_at, publisher) VALUES
('Metal Slug Defense', 30, '2015-05-01', 'SNK Playmore'),
('Project Druid', 20, '2015-05-01', 'shortcircuit'),
('Chroma Squad', 40, '2015-04-30', 'Behold Studios'),
('Soul Locus', 30, '2015-04-30', 'Fat Loot Games'),
('Subterrain', 40, '2015-04-30', 'Pixellore');
SELECT * FROM games;
game_id | name | price | released_at | publisher
---------+--------------------+-------+-------------+----------------
1 | Metal Slug Defense | 30 | 2015-05-01 | SNK Playmore
2 | Project Druid | 20 | 2015-05-01 | shortcircuit
3 | Chroma Squad | 40 | 2015-04-30 | Behold Studios
4 | Soul Locus | 30 | 2015-04-30 | Fat Loot Games
5 | Subterrain | 40 | 2015-04-30 | Pixellore
(5 rows)
Trying to get something like this:
SELECT released_at, name, publisher, MAX(price) as most_expensive
FROM games
GROUP BY released_at;
But name and publisher cannot be selected like this, because they are ambiguous when aggregating...
Let’s make this clear:
Selecting the MAX(price) does not select the entire row.
The database can't know which row to pick, and when it can't give the right answer every time for a given query, it should give us an error, and that's what it does!
Ok… Ok… It’s not so simple, what can we do?
Use an inner join to get the additional columns
SELECT g1.name, g1.publisher, g1.price, g1.released_at
FROM games AS g1
INNER JOIN (
SELECT released_at, MAX(price) as price
FROM games
GROUP BY released_at
) AS g2
ON g2.released_at = g1.released_at AND g2.price = g1.price;
Or use a left outer self-join to get the additional columns, and then keep only the rows where the joined copy's price is NULL (i.e. no row with a higher price exists):
SELECT g1.name, g1.publisher, g1.price, g2.price, g1.released_at
FROM games AS g1
LEFT OUTER JOIN games AS g2
ON g1.released_at = g2.released_at AND g1.price < g2.price
WHERE g2.price IS NULL;
Hope that helps.
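Not from the blog post, but the same result can also be obtained with a window function, which avoids the self-join entirely (a sketch; this syntax works in PostgreSQL and Oracle):
SELECT name, publisher, price, released_at
FROM (
    -- attach the per-date maximum price to every row, then keep only the rows that match it
    SELECT g.*, MAX(price) OVER (PARTITION BY released_at) AS max_price
    FROM games g
) t
WHERE price = max_price;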