LEFT JOIN by closest value condition - SQL

I have this query
SELECT
loc.proceso,
loc.codigo_municipio,
loc.codigo_concejo,
loc.concejo,
(CASE
WHEN loc.poblacion IS NOT NULL THEN loc.poblacion
ELSE pob.valor
END) AS poblacion
FROM develop.031401_elecciones_dimension_localizacion_electoral AS loc
LEFT JOIN develop.031401_elecciones_dimension_proceso_electoral AS proc
ON loc.proceso = proc.proceso
LEFT JOIN develop.020101_t05 AS pob
ON loc.codigo_municipio = CAST(pob.cmun AS INT) AND pob.year = proc.anno_eleccion
In the second LEFT JOIN, I would like to change the condition pob.year = proc.anno_eleccion so that it does not only look for the exact year. Instead, I would like to get the closest year stored in my pob table. For example, the first year stored in pob is 2003, so I want all the entries in loc whose year is lower than 2003 to be matched with the 2003 row when performing the join. Conversely, the last year stored in pob is 2020, so I want the entries in loc whose year is 2021 (or later) to be matched with the 2020 row from my pob table. When the exact year exists in the pob table, it should be used for the join.

1. If you want the most recent year (nearest to now)
I can't think of a direct join, but you can try using the ROW_NUMBER() function to sort the data by year and pick the first row per municipality to join:
(WHERE rn = 1 keeps only the first row of each partition, which prevents duplicates.)
LEFT JOIN
    (SELECT T.* FROM
        (SELECT t05.*,
                ROW_NUMBER() OVER (PARTITION BY t05.cmun ORDER BY t05.year DESC) AS rn
         FROM develop.020101_t05 AS t05) AS T
     WHERE T.rn = 1) AS pob
    ON loc.codigo_municipio = CAST(pob.cmun AS INT)
    -- no year condition here: pob already holds only the latest year per cmun
2. If you want the year nearest to your data
Even though it's not best practice, you can join using comparison operators in the join condition. Then take the difference between the two years, sort by that difference ascending, and pick the first row with the ROW_NUMBER() function. For example:
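SELECT * FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY a.Id ORDER BY a.Year - b.Year) AS RowNumber,
a.Id,
a.Year AS AYear,   -- distinct aliases: two columns both named Year would clash in the derived table
b.Year AS BYear,
a.Year - b.Year AS YearDiff
FROM a
LEFT JOIN b ON a.Id = b.Id AND a.Year >= b.Year) AS T
WHERE RowNumber = 1
Note that with a.Year >= b.Year, rows whose year is earlier than every year in b get NULL. To also snap those to the first year in b (as the question asks), drop the a.Year >= b.Year condition and order by ABS(a.Year - b.Year) instead.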
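A minimal runnable sketch of the absolute-difference variant, using Python's built-in sqlite3 module as a stand-in database (window functions need SQLite 3.25 or later). The tables loc and pob, their columns, and the sample rows are hypothetical, loosely modelled on the question. The join has no year condition; ROW_NUMBER() ranks each loc row's candidates by ABS(loc.year - pob.year), so an exact year wins (distance 0), years before the first pob year snap to it, and years after the last one snap to the last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loc (id INTEGER, cmun INTEGER, year INTEGER);
    CREATE TABLE pob (cmun INTEGER, year INTEGER, valor INTEGER);
    -- hypothetical sample data: one year before, one inside, one after the pob range
    INSERT INTO loc VALUES (1, 10, 1999), (2, 10, 2010), (3, 10, 2021);
    INSERT INTO pob VALUES (10, 2003, 300), (10, 2010, 310), (10, 2020, 320);
""")

NEAREST_YEAR_SQL = """
SELECT id, loc_year, pob_year, valor
FROM (
    SELECT loc.id,
           loc.year AS loc_year,
           pob.year AS pob_year,
           pob.valor,
           ROW_NUMBER() OVER (
               PARTITION BY loc.id
               -- smallest distance first; ties go to the earlier pob year
               ORDER BY ABS(loc.year - pob.year), pob.year
           ) AS rn
    FROM loc
    LEFT JOIN pob ON pob.cmun = loc.cmun   -- no year condition in the join
) AS t
WHERE rn = 1
ORDER BY id
"""

rows = conn.execute(NEAREST_YEAR_SQL).fetchall()
for row in rows:
    print(row)
```

A year exactly halfway between two pob years goes to the earlier one here because of the secondary pob.year sort key; that tie-break is an assumption, so reverse it if the later year should win.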

Related

Look up dynamic value from range in another table

I have 2 tables. The first one is a detail table with years and actual limit values. The second table has max control limits but only for certain years.
[Image: Table 1 and Table 2]
What I want is to list all of the detail records and pull in the control-limit values, where each control limit applies to every year up to (but not including) the next year listed in the control table.
Desired results:
[Image: desired results]
I have tried this query but it duplicates 2015, 2016 and 2017.
SELECT d.id, d.yeard, d.value, c.Column1
FROM detailTbl d
RIGHT OUTER JOIN controlTbl c ON d.dated <= c.datec
You can use the ROW_NUMBER() function as follows to remove the duplicates:
with cte as
(
Select D.id,D.yeard,D.val, C.limitVal,
row_number() over (partition by D.id order by C.yeard desc) as rn from
detailTbl D left join controlTbl C
on D.yeard>=C.yeard
)
Select B.id,B.yeard,B.val,B.limitVal from
cte B where B.rn=1 order by B.id
See a demo on MySQL 8.0.
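A runnable sketch of the ROW_NUMBER() answer above, using Python's sqlite3 as a stand-in for MySQL 8.0; the sample rows are invented for illustration. Each detail row is paired with every control row whose year is not later than its own, and rn = 1 keeps the latest of those.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE detailTbl (id INTEGER, yeard INTEGER, val INTEGER);
    CREATE TABLE controlTbl (yeard INTEGER, limitVal INTEGER);
    -- hypothetical rows: control limits are defined only for 2014 and 2017
    INSERT INTO detailTbl VALUES
        (1, 2013, 5), (2, 2014, 7), (3, 2015, 9), (4, 2016, 4), (5, 2017, 8), (6, 2018, 6);
    INSERT INTO controlTbl VALUES (2014, 100), (2017, 200);
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT D.id, D.yeard, D.val, C.limitVal,
               -- the latest control year that is not after the detail year gets rn = 1
               ROW_NUMBER() OVER (PARTITION BY D.id ORDER BY C.yeard DESC) AS rn
        FROM detailTbl D
        LEFT JOIN controlTbl C ON D.yeard >= C.yeard
    )
    SELECT id, yeard, val, limitVal FROM cte WHERE rn = 1 ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Because it is a LEFT JOIN, a detail year earlier than the first control year still appears, with a NULL limit.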

Why is my LEFT JOIN in SQL not pulling all the dates even though they exist in the other table?

I have a table without dates and want to join it to a table with dates. I am doing a LEFT JOIN on id and bnd_nbr. Each id can have more than one date in the other table, and I want the latest one. I'm not sure how to get all the dates first, so that I can then choose the latest one.
select Reg_Property_id,a.Bnd_nbr,account_balance,abs(account_balance) as Bond_Balance,a.Bnd_regDate
into #Jan2014ValidFin
from #Jan2014Valid aa
left join Pr_analytics..bond a
on aa.Reg_Property_id=a.Prop_id
and aa.bnd_nbr=a.Bnd_nbr
where aa.reg_property_id is not null
Please assist.
Use the ROW_NUMBER() window function to get the most recent date:
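SELECT c.*
FROM (
SELECT a.cols, b.cols,   -- a.cols, b.cols: placeholders for the columns you need
ROW_NUMBER() OVER (PARTITION BY a.col1, a.col2 ORDER BY b.theDate DESC) AS rn   -- partition on the left table's keys so unmatched rows are kept
FROM a
LEFT OUTER JOIN b ON a.col1 = b.col1
AND a.col2 = b.col2
) c
WHERE c.rn = 1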
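A small sketch of this pattern, using Python's sqlite3 and hypothetical tables a and b (col1, col2, theDate) standing in for the placeholders above. Partitioning by the left table's keys means each row of a yields exactly one row, even when it has no match in b. Dates are stored as ISO strings so that text ordering is chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col1 INTEGER, col2 TEXT);
    CREATE TABLE b (col1 INTEGER, col2 TEXT, theDate TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');   -- (3, 'z') has no match in b
    INSERT INTO b VALUES
        (1, 'x', '2014-01-05'), (1, 'x', '2014-03-01'), (2, 'y', '2013-12-31');
""")

rows = conn.execute("""
    SELECT col1, col2, theDate
    FROM (
        SELECT a.col1, a.col2, b.theDate,
               ROW_NUMBER() OVER (
                   PARTITION BY a.col1, a.col2   -- left-side keys, so unmatched rows survive
                   ORDER BY b.theDate DESC       -- newest date first
               ) AS rn
        FROM a
        LEFT OUTER JOIN b ON a.col1 = b.col1 AND a.col2 = b.col2
    ) AS c
    WHERE c.rn = 1
    ORDER BY col1
""").fetchall()

for row in rows:
    print(row)
```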
A simple group by should do the trick:
SELECT
Reg_Property_id -- What table is this from?
,a.Bnd_nbr
,account_balance -- What table is this from?
,abs(account_balance) as Bond_Balance -- What table is this from?
,max(a.Bnd_regDate) as Bnd_regDate
into #Jan2014ValidFin
from #Jan2014Valid aa
left join Pr_analytics..bond a
on aa.Reg_Property_id = a.Prop_id
and aa.bnd_nbr = a.Bnd_nbr
where aa.reg_property_id is not null
group by
Reg_Property_id
,a.Bnd_nbr
,account_balance
,abs(account_balance)
Note that if there are no matching dates (a.Bnd_regDate), you will get NULL.
Note also that if any of the values marked "What table is this from?" are found in #Jan2014Valid, you will need to either aggregate them (MAX, SUM, etc.) or include them in the GROUP BY clause; I can't tell which from the information provided.
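The GROUP BY alternative can be sketched the same way with Python's sqlite3. The table and column names below are simplified, hypothetical versions of #Jan2014Valid and Pr_analytics..bond, and the sketch assumes account_balance lives on the #Jan2014Valid side. Grouping by the left table's columns keeps unmatched rows, and MAX() returns NULL for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jan2014_valid (prop_id INTEGER, bnd_nbr TEXT, account_balance INTEGER);
    CREATE TABLE bond (prop_id INTEGER, bnd_nbr TEXT, reg_date TEXT);
    INSERT INTO jan2014_valid VALUES (1, 'B1', -500), (2, 'B2', 250), (3, 'B3', -75);
    -- bond 'B1' has two registration dates; 'B3' has none
    INSERT INTO bond VALUES
        (1, 'B1', '2013-06-01'), (1, 'B1', '2014-01-15'), (2, 'B2', '2012-11-30');
""")

rows = conn.execute("""
    SELECT v.prop_id,
           v.bnd_nbr,
           v.account_balance,
           ABS(v.account_balance) AS bond_balance,
           MAX(b.reg_date)        AS bnd_regdate   -- latest date; NULL when nothing matches
    FROM jan2014_valid v
    LEFT JOIN bond b
      ON v.prop_id = b.prop_id
     AND v.bnd_nbr = b.bnd_nbr
    WHERE v.prop_id IS NOT NULL
    GROUP BY v.prop_id, v.bnd_nbr, v.account_balance
    ORDER BY v.prop_id
""").fetchall()

for row in rows:
    print(row)
```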

Transpose only certain data in SQL

My data looks like this:
Company  Year        Total  Comment
Comp A   01-01-2000  5,000  Checked
Comp A   01-01-2001  6,000  Checked
Comp B   05-05-2007  3,000  Not checked completely
Comp B   05-05-2008  4,000  Checked
Comp C   18-01-2003  1,500  Not checked completely
Comp C   18-01-2002  3,500  Not checked completely
I've been asked to transpose some of this data so that it looks like this, but I'm not sure it can be done in SQL (Server):
Company  Base Date   Base Date-1  Comment Base Date       Comment Base Date-1
Comp A   01-01-2001  01-01-2000   Checked                 Checked
Comp B   05-05-2008  05-05-2007   Checked                 Not completely checked
Comp C   18-01-2003  18-01-2002   Not completely checked  Not completely checked
I have never built anything like this. Would Excel be a better alternative? How should I tackle this?
Is it possible using SELECT MAX(Base Date) and MIN(Base Date)? And how would I then handle the string columns?
You can use a self join to do this. However, be careful with dates like February 29, which only occur in leap years.
select t1.company, t1.year as basedate, t2.year as basedate_1,
t1.comment as comment_basedate, t2.comment as comment_basedate_1
from t t1
left join t t2 on t1.company = t2.company and dateadd(year, 1, t2.year) = t1.year
Change the left join to an inner join if you only need results where both the date values exist for a company. This solution assumes there can only be one comment per day.
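A runnable sketch of this self join using Python's sqlite3 and the question's own rows. SQLite has no DATEADD, so date(t2.year, '+1 year') stands in for SQL Server's DATEADD(year, 1, t2.year), and the DD-MM-YYYY dates are rewritten as ISO YYYY-MM-DD strings, which SQLite's date() expects. It runs the join both as LEFT and as INNER to show the difference described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (company TEXT, year TEXT, total INTEGER, comment TEXT);
    -- the question's rows, with DD-MM-YYYY dates rewritten as ISO YYYY-MM-DD
    INSERT INTO t VALUES
        ('Comp A', '2000-01-01', 5000, 'Checked'),
        ('Comp A', '2001-01-01', 6000, 'Checked'),
        ('Comp B', '2007-05-05', 3000, 'Not checked completely'),
        ('Comp B', '2008-05-05', 4000, 'Checked'),
        ('Comp C', '2003-01-18', 1500, 'Not checked completely'),
        ('Comp C', '2002-01-18', 3500, 'Not checked completely');
""")


def pair_with_previous_year(join_kind):
    """Pair each row with the same company's row dated exactly one year earlier."""
    if join_kind not in ("LEFT", "INNER"):
        raise ValueError("join_kind must be 'LEFT' or 'INNER'")
    # date(x, '+1 year') plays the role of SQL Server's DATEADD(year, 1, x)
    sql = f"""
        SELECT t1.company,
               t1.year    AS basedate,
               t2.year    AS basedate_1,
               t1.comment AS comment_basedate,
               t2.comment AS comment_basedate_1
        FROM t t1
        {join_kind} JOIN t t2
          ON t1.company = t2.company
         AND date(t2.year, '+1 year') = t1.year
        ORDER BY t1.company, t1.year
    """
    return conn.execute(sql).fetchall()


left_rows = pair_with_previous_year("LEFT")    # earliest year per company gets NULLs
inner_rows = pair_with_previous_year("INNER")  # only companies with both years
for row in inner_rows:
    print(row)
```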
I'd assign a row number to each record, partitioned by company and ordered by year descending, through an analytic function in a common table expression, then use a left self join on the row number + 1 and the company.
This assumes you only want one record per company covering the two most recent years, and that if only one record exists for a company, NULL values are acceptable for the second year. If not, we can change the LEFT JOIN to an INNER JOIN to eliminate those companies.
We use a common table expression (an inline view would work as well) to assign a row number to each record. That value is then available in our self join, so we don't have to worry about different dates and max values. We then use the row number (RN) and company to join the two desired records together. To save some work, we limit one side of the join to RN 1 and the other to RN 2.
WITH CTE AS (
SELECT *, Row_Number() over (Partition by Company Order by Year Desc) RN FROM YourTable)   -- YourTable: placeholder for your table name (TABLE itself is a reserved word)
SELECT A.Company
, A.Year as Base_Date
, B.Year as Base_Date1
, A.comment as Base_Date_Comment
, B.Comment as Base_Date1_Comment
FROM CTE A
LEFT JOIN CTE B
on A.RN+1 = B.RN
and A.Company = B.Company
and B.RN = 2
WHERE A.RN = 1
Note that the RN = 2 limit must be in the join condition because it's an outer join; otherwise we would eliminate the companies without two years (in essence turning the LEFT JOIN into an inner join).
This approach makes all columns of the data available for each row.
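A runnable sketch of the CTE approach using Python's sqlite3. The table name yearly is a hypothetical replacement for the TABLE placeholder; the rows are the question's data (dates in ISO form) plus two invented rows: a third, older year for Comp A, to show that only the two most recent years are used, and a single-year Comp D, to show the NULLs the LEFT JOIN produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yearly (company TEXT, year TEXT, total INTEGER, comment TEXT);
    INSERT INTO yearly VALUES
        ('Comp A', '1999-01-01', 4000, 'Checked'),      -- invented extra year
        ('Comp A', '2000-01-01', 5000, 'Checked'),
        ('Comp A', '2001-01-01', 6000, 'Checked'),
        ('Comp B', '2007-05-05', 3000, 'Not checked completely'),
        ('Comp B', '2008-05-05', 4000, 'Checked'),
        ('Comp C', '2003-01-18', 1500, 'Not checked completely'),
        ('Comp C', '2002-01-18', 3500, 'Not checked completely'),
        ('Comp D', '2010-03-01', 2000, 'Checked');      -- invented single-year company
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY company ORDER BY year DESC) AS rn
        FROM yearly
    )
    SELECT A.company,
           A.year    AS base_date,
           B.year    AS base_date_1,
           A.comment AS comment_base_date,
           B.comment AS comment_base_date_1
    FROM cte A
    LEFT JOIN cte B
      ON A.rn + 1 = B.rn
     AND A.company = B.company
     AND B.rn = 2              -- kept in ON so single-year companies survive
    WHERE A.rn = 1             -- most recent year per company
    ORDER BY A.company
""").fetchall()

for row in rows:
    print(row)
```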
If there are only two rows per company, that's pretty simple. If there are more than two rows, you could do something like this: essentially join all rows, then make sure A represents the earliest row and B represents the latest row.
SELECT A.Company, B.Year AS [Base Date], A.Year AS [Base Date 1],
B.Comment AS [Comment Base Date], A.Comment AS [Comment Base Date 1]   -- B (latest) is the base date, A (earliest) the year before
FROM MyTable A
INNER JOIN MyTable B ON A.Company = B.Company
WHERE A.Year = (SELECT MIN(C.YEAR) FROM MyTable C WHERE C.Company = A.Company)
AND B.Year = (SELECT MAX(C.YEAR) FROM MyTable C WHERE C.Company = B.Company)
There might be a more efficient way to do this with Row_Number or something.

How to include non-matching rows?

This script is working as intended.
select a.Loc, Count(a.PID) as TotalVisit
from AccountCount as a
inner join Data as b
on a.PID = b.PID
where
cast(a.DateTime as date) between cast(b.ADateTime as date) and cast(b.DDateTime as date)
and year(a.DateTime)=2015
and month(a.DateTime)=05
group by a.Loc
order by a.Loc;
However, I need to include a few more PIDs from the Data table that are not in the AccountCount table:
select LocID, PID
from Data
where
cast(ADateTime as date) = cast(DDateTime as date)
and year(ADateTime) = 2015
and month(ADateTime)=05
order by LocID;
In simple terms, I need the union of the first and second scripts. I tried a RIGHT JOIN to the Data table, but it didn't work.
Using the UNION ALL suggested by xQbert, I get this result:
Loc TotalVisit
1st floor 20
2nd floor 5
3rd floor 8
1st floor 2
It needs to be
Loc TotalVisit
1st floor 22
2nd floor 5
3rd floor 8
Please help.
Thank you.
I would think a RIGHT JOIN would work, as long as the ON criteria are set up correctly and the WHERE conditions are moved into the join. Left in the WHERE clause, they turn the outer join back into an inner join: the outer join produces rows with NULLs, and the WHERE clause then filters those rows out.
UNION ALL on its own doesn't aggregate the data. To me the outer join is the right approach here; we just need to understand the data better to make it work correctly. However, with UNION ALL you can simply sum up the results in an outer query. Now that you've given some sample data, I might be able to figure out why the outer join wasn't working.
Using UNION ALL (I'm about getting it working first, then improving it):
Select X.Loc, sum(X.TotalVisit) as TotalVisit
from (SELECT a.Loc as LOC, Count(a.PID) as TotalVisit
from AccountCount as a
inner join Data as b
on a.PID = b.PID
where
cast(a.DateTime as date) between cast(b.ADateTime as date) and cast(b.DDateTime as date)
group by a.Loc
UNION ALL
select LocID as LOC, count(PID)
from Data
where
cast(ADateTime as date) = cast(DDateTime as date)
GROUP BY LocID
) X
GROUP BY X.Loc
ORDER BY X.LOC
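A runnable sketch of the UNION ALL plus outer SUM pattern, using Python's sqlite3 with tiny invented rows (the real AccountCount and Data contents are not shown). SQLite's date() stands in for cast(... as date), and the 2015/05 filters are omitted as in the query above. Without the outer GROUP BY, '1st floor' would appear twice, as in the question; the outer SUM merges the two branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AccountCount (Loc TEXT, PID INTEGER, DateTime TEXT);
    CREATE TABLE Data (LocID TEXT, PID INTEGER, ADateTime TEXT, DDateTime TEXT);
    -- invented sample rows; PID 3 exists only in Data (a same-day stay)
    INSERT INTO AccountCount VALUES
        ('1st floor', 1, '2015-05-02 09:00'),
        ('1st floor', 1, '2015-05-04 10:30'),
        ('2nd floor', 2, '2015-05-04 08:00');
    INSERT INTO Data VALUES
        ('1st floor', 1, '2015-05-01 08:00', '2015-05-10 12:00'),
        ('2nd floor', 2, '2015-05-03 07:00', '2015-05-05 18:00'),
        ('1st floor', 3, '2015-05-07 09:00', '2015-05-07 17:00');
""")

rows = conn.execute("""
    SELECT X.Loc, SUM(X.TotalVisit) AS TotalVisit      -- merge both branches per Loc
    FROM (
        SELECT a.Loc AS Loc, COUNT(a.PID) AS TotalVisit
        FROM AccountCount AS a
        INNER JOIN Data AS b ON a.PID = b.PID
        WHERE date(a.DateTime) BETWEEN date(b.ADateTime) AND date(b.DDateTime)
        GROUP BY a.Loc
        UNION ALL
        SELECT LocID AS Loc, COUNT(PID)
        FROM Data
        WHERE date(ADateTime) = date(DDateTime)
        GROUP BY LocID
    ) AS X
    GROUP BY X.Loc
    ORDER BY X.Loc
""").fetchall()

for row in rows:
    print(row)
```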
This leads me to the following, which I think would work: take the first non-null location from AccountCount.Loc and Data.LocID and use it. Notice there is no WHERE clause:
SELECT Coalesce(A.Loc, B.LocID) as Loc, count(B.PID) as TotalVisit
FROM Data B
LEFT JOIN AccountCount A
on B.PID = A.PID
and (cast(a.DateTime as date) between cast(b.ADateTime as date) and cast(b.DDateTime as date)
OR cast(B.ADateTime as date) = cast(B.DDateTime as date))
GROUP BY Coalesce(A.Loc, B.LocID)
Order by Coalesce(A.Loc, B.LocID)
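The COALESCE version can be run the same way. This sketch reuses the same invented rows as a UNION ALL test, with date() again standing in for cast(... as date). On this toy data it gives the same totals as the UNION ALL version; that is not guaranteed for every data set, so compare both against real data before relying on either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AccountCount (Loc TEXT, PID INTEGER, DateTime TEXT);
    CREATE TABLE Data (LocID TEXT, PID INTEGER, ADateTime TEXT, DDateTime TEXT);
    -- invented sample rows; PID 3 exists only in Data (a same-day stay)
    INSERT INTO AccountCount VALUES
        ('1st floor', 1, '2015-05-02 09:00'),
        ('1st floor', 1, '2015-05-04 10:30'),
        ('2nd floor', 2, '2015-05-04 08:00');
    INSERT INTO Data VALUES
        ('1st floor', 1, '2015-05-01 08:00', '2015-05-10 12:00'),
        ('2nd floor', 2, '2015-05-03 07:00', '2015-05-05 18:00'),
        ('1st floor', 3, '2015-05-07 09:00', '2015-05-07 17:00');
""")

rows = conn.execute("""
    SELECT COALESCE(A.Loc, B.LocID) AS Loc,       -- fall back to Data's location
           COUNT(B.PID) AS TotalVisit
    FROM Data B
    LEFT JOIN AccountCount A
      ON B.PID = A.PID
     AND (date(A.DateTime) BETWEEN date(B.ADateTime) AND date(B.DDateTime)
          OR date(B.ADateTime) = date(B.DDateTime))
    GROUP BY COALESCE(A.Loc, B.LocID)
    ORDER BY COALESCE(A.Loc, B.LocID)
""").fetchall()

for row in rows:
    print(row)
```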

Match two tables based on minimum dates efficiently

I have two tables: one contains quarterly data and the other contains daily data. I would like to join them so that, for each day in the daily data, the quarterly data for that quarter is selected and returned. I am working with Postgres 9.3.
The current query is as follows:
select
a.ID,
a.datadate,
b.*,
case when a.datadate = b.rdq then 1 else 0 end as VALID
from proj_data a, proj_rat b
where a.id = b.id
and b.rdq = (select min(rdq)
from proj_rat c
where a.id = c.id and a.datadate >= c.rdq);
But it is excruciatingly slow and I need to do this for several thousand IDs. Can anyone suggest a more efficient solution?
This eliminates the need for the correlated subquery in the WHERE clause:
select distinct on (a.id, a.datadate)
a.id,
a.datadate,
b.*,
(a.datadate = b.rdq)::integer as VALID
from
proj_data a
inner join
proj_rat b
on b.id = a.id
and b.rdq <= a.datadate
-- one row per (id, datadate): the most recent quarterly record on or before that day
order by a.id, a.datadate, b.rdq desc;
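A sketch of the same "latest quarterly row on or before each day" logic, using Python's sqlite3 with invented rows; the rating column is hypothetical. SQLite has no DISTINCT ON, so ROW_NUMBER() partitioned by (id, datadate) and ordered by rdq DESC is swapped in to keep one row per day. The composite index on proj_rat (id, rdq) matches the join and sort columns and is likely to help the Postgres query too, but measure on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proj_rat (id INTEGER, rdq TEXT, rating TEXT);
    CREATE TABLE proj_data (id INTEGER, datadate TEXT);
    CREATE INDEX proj_rat_id_rdq ON proj_rat (id, rdq);   -- supports the id/rdq lookup
    -- invented quarterly report dates and daily rows
    INSERT INTO proj_rat VALUES
        (1, '2015-01-30', 'A'), (1, '2015-04-29', 'B'), (1, '2015-07-30', 'C');
    INSERT INTO proj_data VALUES
        (1, '2015-01-29'), (1, '2015-01-30'), (1, '2015-05-15'), (1, '2015-08-01');
""")

rows = conn.execute("""
    SELECT id, datadate, rdq, rating, valid
    FROM (
        SELECT a.id, a.datadate, b.rdq, b.rating,
               (a.datadate = b.rdq) AS valid,          -- 1 on the report day itself
               ROW_NUMBER() OVER (
                   PARTITION BY a.id, a.datadate
                   ORDER BY b.rdq DESC                 -- latest report on or before the day
               ) AS rn
        FROM proj_data a
        JOIN proj_rat b ON b.id = a.id AND b.rdq <= a.datadate
    ) AS t
    WHERE rn = 1
    ORDER BY id, datadate
""").fetchall()

for row in rows:
    print(row)
```

Days before the first report date (2015-01-29 here) drop out because of the inner join; use a LEFT JOIN if they should appear with NULLs.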