Match two tables based on minimum dates efficiently - sql

I have two tables: one contains quarterly data and the other contains daily data. I would like to join them so that, for each day in the daily data, the quarterly data for that quarter is selected and returned daily. I am working with Postgres 9.3.
The current query is as follows:
select
a.ID,
a.datadate,
b.*,
case when a.datadate = b.rdq then 1 else 0 end as VALID
from proj_data a, proj_rat b
where a.id = b.id
and b.rdq = (select min(rdq)
from proj_rat c
where a.id = c.id and a.datadate >= c.rdq);
But it is excruciatingly slow and I need to do this for several thousand IDs. Can anyone suggest a more efficient solution?

This eliminates the need for a subquery in the where clause
select
ID,
a.datadate,
b.*,
(a.datadate = b.rdq)::integer as VALID
from
proj_data a
inner join
(
-- keep one row per id: the one with the earliest rdq
select distinct on (id) *
from proj_rat
order by id, rdq
) b using(id)
where a.datadate >= b.rdq;
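A rewrite like this is easiest to trust after running it next to the original on a few rows. The following is only a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module rather than Postgres, the sample rows are made up, and because SQLite has no DISTINCT ON, a GROUP BY with MIN(rdq) stands in for "earliest rdq per id".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE proj_rat  (id INTEGER, rdq TEXT);       -- quarterly rows
    CREATE TABLE proj_data (id INTEGER, datadate TEXT);  -- daily rows
    -- hypothetical sample data, ISO dates so text comparison orders correctly
    INSERT INTO proj_rat VALUES (1, '2020-01-15'), (1, '2020-04-15'), (2, '2020-02-01');
    INSERT INTO proj_data VALUES (1, '2020-01-10'), (1, '2020-01-15'),
                                 (1, '2020-05-01'), (2, '2020-03-01');
""")

# The query from the question: correlated subquery evaluated per daily row.
original = con.execute("""
    SELECT a.id, a.datadate, b.rdq,
           CASE WHEN a.datadate = b.rdq THEN 1 ELSE 0 END AS valid
    FROM proj_data a, proj_rat b
    WHERE a.id = b.id
      AND b.rdq = (SELECT MIN(rdq) FROM proj_rat c
                   WHERE a.id = c.id AND a.datadate >= c.rdq)
    ORDER BY a.id, a.datadate
""").fetchall()

# Rewrite: compute the earliest rdq per id once, then join and filter.
rewritten = con.execute("""
    SELECT a.id, a.datadate, b.rdq,
           CASE WHEN a.datadate = b.rdq THEN 1 ELSE 0 END AS valid
    FROM proj_data a
    JOIN (SELECT id, MIN(rdq) AS rdq FROM proj_rat GROUP BY id) b
      ON b.id = a.id
    WHERE a.datadate >= b.rdq
    ORDER BY a.id, a.datadate
""").fetchall()

print(original == rewritten)  # True
```

Note that both forms pick the earliest rdq for each id. If the goal is the most recent quarter on or before each day, the aggregate would have to be MAX and stay correlated with a.datadate.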

Related

Selecting results from two tables in the FROM clause

Hi, I have two tables, A and B. A has 6 rows and B has 7 rows. Both tables share values in the name column, and all 6 rows of A are present in B on that column.
When I run select * from a,b where a.name = b.name I get 14 rows, but I was expecting an inner join with 6 rows in the result.
Please explain how the query works when there are two tables in the FROM clause.
Table A
Table B
query is
select * from a,b where a.tt = b.tt and a.nename=b.nename;
result is
You've got duplicates in both tables (except for {2, 2017-03-04 03:00:00} which has three copies) which is why you get 14 = (2 * 4) + (2 * 3).
It's very hard to make sense of duplicate data. It's even harder when the data is duplicated on both sides of a join.
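The row count can be reproduced on made-up data. The sketch below assumes hypothetical keys k1, k2 and k3 (the real tables are not shown in the question) and uses SQLite through Python's sqlite3 module: every key appears twice in a, and in b two keys appear twice and one appears three times, which gives 6 and 7 rows as in the question.

```python
import sqlite3

def join_row_count(a_rows, b_rows):
    """Count rows returned by joining a and b on (tt, nename)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (nename TEXT, tt TEXT)")
    con.execute("CREATE TABLE b (nename TEXT, tt TEXT)")
    con.executemany("INSERT INTO a VALUES (?, ?)", a_rows)
    con.executemany("INSERT INTO b VALUES (?, ?)", b_rows)
    (count,) = con.execute(
        "SELECT COUNT(*) FROM a, b WHERE a.tt = b.tt AND a.nename = b.nename"
    ).fetchone()
    con.close()
    return count

# Hypothetical data: 6 rows in a, 7 rows in b.
a_rows = [("k1", "t")] * 2 + [("k2", "t")] * 2 + [("k3", "t")] * 2
b_rows = [("k1", "t")] * 2 + [("k2", "t")] * 2 + [("k3", "t")] * 3

# Each key contributes (copies in a) * (copies in b) rows: 2*2 + 2*2 + 2*3.
print(join_row_count(a_rows, b_rows))  # 14
```

Every matching pair of rows is returned, so a key with m copies on one side and n copies on the other produces m * n result rows.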
You could do something like
WITH fixedA AS (SELECT
*,
row_number() over (partition by nename, tt order by nename) rn
FROM
A),
fixedb AS (SELECT
*,
row_number() over (partition by nename, tt order by nename) rn
FROM
B)
SELECT *
FROM fixedA a full outer join fixedb b
on a.neName = b.neName
and a.tt = b.tt
and a.rn = b.rn
This will, however, leave one B record matched with a NULL A record.
The row_number also seems to do what cellID does, so you could just do
SELECT *
FROM a full outer join b
on a.neName = b.neName
and a.tt = b.tt
and a.cellID = b.cellID
You should use a full outer join on the tables you need the result set from. I would suggest something like this:
select * from a full outer join b on a.tt = b.tt and a.nename=b.nename;
If you're dealing with a bigger data set, joining on a data type like varchar might take a long time because of the string comparisons, so it would be better to join on primary or foreign keys.
https://www.w3schools.com/sql/sql_join_full.asp

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
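The Cartesian-product effect is easy to see on a toy dataset. This is a minimal sketch, assuming simplified tables A, B and C with made-up rows and run in SQLite through Python's sqlite3 module rather than Postgres: one row in A has two cost rows in B and three click rows in C.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, keyword_id INTEGER);
    CREATE TABLE B (a_id INTEGER, cost INTEGER);
    CREATE TABLE C (keyword_id INTEGER, clicks INTEGER);
    -- hypothetical data: true totals are cost = 12 and clicks = 6
    INSERT INTO A VALUES (1, 100);
    INSERT INTO B VALUES (1, 5), (1, 7);
    INSERT INTO C VALUES (100, 1), (100, 2), (100, 3);
""")

# Joining both child tables first yields 2 * 3 = 6 rows for the single A row,
# so each cost is counted 3 times and each click 2 times.
naive = con.execute("""
    SELECT A.id, SUM(B.cost), SUM(C.clicks)
    FROM A
    JOIN B ON B.a_id = A.id
    LEFT JOIN C ON C.keyword_id = A.keyword_id
    GROUP BY A.id
""").fetchall()

# Correlated subqueries aggregate each child table on its own.
fixed = con.execute("""
    SELECT A.id,
           (SELECT SUM(cost) FROM B WHERE B.a_id = A.id) AS cost,
           (SELECT SUM(clicks) FROM C WHERE C.keyword_id = A.keyword_id) AS clicks
    FROM A
""").fetchall()

print(naive)  # [(1, 36, 12)]
print(fixed)  # [(1, 12, 6)]
```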
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
where B.cost > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

SQL Inner Join using Distinct and Order by Desc

Table A.
Table B.
I have two tables. Table A has over 8000 records and continues to grow with time.
Table B has only 5 or so records and grows rarely but does grow sometimes.
I want to query Table A's latest records where the Id in Table A matches Table B. The problem is that I am getting all the rows from Table A; I just need one row per match between Table A and Table B. The Ids are unique: each is created when a new row is inserted into Table B and is never repeated.
Any help is most appreciated.
SELECT a.nshift,
a.loeeworkcellid,
b.loeeconfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
FROM oeeworkcell a
INNER JOIN dbo.oeeconfigworkcell b
ON a.loeeconfigworkcellid = b.loeeconfigworkcellid
ORDER BY a.loeeworkcellid DESC
I am assuming you want only the latest row (as you said) from Table A for each match, but the JOIN is giving you all the rows. You can use ROW_NUMBER() to number the rows, then apply the join and filter with a WHERE clause to keep only the first row per group. You can try something like this:
;WITH CTE
AS
(
SELECT * , ROW_NUMBER() OVER(PARTITION BY loeeconfigworkcellid ORDER BY loeeworkcellid desc) AS Rn
FROM oeeworkcell
)
SELECT a.nshift,
a.loeeworkcellid,
b.loeeconfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
FROM CTE a
INNER JOIN dbo.oeeconfigworkcell b
ON a.loeeconfigworkcellid = b.loeeconfigworkcellid
WHERE
a.Rn = 1
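To see the ROW_NUMBER() filter at work, here is a small sketch on made-up data. The table and column names (workcell, config, config_id) are simplified stand-ins for the real ones, and it runs in SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than SQL Server.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE config   (config_id INTEGER PRIMARY KEY, sshortname TEXT);
    CREATE TABLE workcell (id INTEGER PRIMARY KEY, config_id INTEGER, nshift INTEGER);
    -- hypothetical rows: config 1 has ids 1, 2, 4 and config 2 has ids 3, 5
    INSERT INTO config VALUES (1, 'press'), (2, 'weld');
    INSERT INTO workcell VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 1, 3), (5, 2, 2);
""")

latest = con.execute("""
    WITH ranked AS (
        -- number the rows of each config, newest id first
        SELECT *, ROW_NUMBER() OVER (PARTITION BY config_id ORDER BY id DESC) AS rn
        FROM workcell
    )
    SELECT r.id, r.nshift, c.sshortname
    FROM ranked r
    JOIN config c ON c.config_id = r.config_id
    WHERE r.rn = 1          -- keep only the newest row per config
    ORDER BY r.id DESC
""").fetchall()

print(latest)  # [(5, 2, 'weld'), (4, 3, 'press')]
```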
You need to keep only the row with the highest loeeworkcellid for each loeeconfigworkcellid, for example by comparing against the maximum id in a correlated subquery:
SELECT a.nshift,
a.loeeworkcellid,
b.loeeconfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
FROM oeeworkcell a
INNER JOIN dbo.oeeconfigworkcell b
ON a.loeeconfigworkcellid = b.loeeconfigworkcellid
-- keep only the newest oeeworkcell row per configuration
WHERE a.loeeworkcellid = (SELECT MAX(a2.loeeworkcellid)
FROM oeeworkcell a2
WHERE a2.loeeconfigworkcellid = a.loeeconfigworkcellid)

SQL Query linking tables, returning the earliest history from one table with the associated result

I have two tables which I need to query. Let's call them table A, and table A_HISTORIES.
Each row from Table A, is linked to multiple rows in A_HISTORIES. What I want to do is to be able to link each row from table A with its earliest history from table A_HISTORIES so something like:-
SELECT A.*,
A_HISTORIES.CREATED_DATE
FROM A, A_HISTORIES
WHERE A.ID = A_HISTORIES.A_ID
AND A_HISTORIES.ID = (SELECT max(id) keep (dense_rank first order by CREATED_DATE)
FROM A_HISTORIES)
However, this will only return the row from A/A_HISTORIES that has the earliest CREATED_DATE. Can anyone help me do this per row in A?
Thanks
How about something like this:
SELECT A.*,
A_HISTORIES.CREATED_DATE
FROM A
INNER JOIN A_HISTORIES ON A.ID = A_HISTORIES.A_ID
INNER JOIN (SELECT A_ID, MIN(CREATED_DATE) AS min_created_date
FROM A_HISTORIES
GROUP BY A_ID) min_hist ON A_HISTORIES.A_ID = min_hist.A_ID
AND A_HISTORIES.CREATED_DATE = min_hist.min_created_date
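Since the question asks for the earliest history per row of A, the join-back-to-an-aggregate pattern can be checked on a couple of rows with MIN on the date. This is a sketch only: the rows are made up, dates are stored as ISO text, and it runs in SQLite through Python's sqlite3 module rather than Oracle.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE a_histories (id INTEGER PRIMARY KEY, a_id INTEGER, created_date TEXT);
    -- hypothetical rows: two histories for each row of a
    INSERT INTO a VALUES (1), (2);
    INSERT INTO a_histories VALUES
        (1, 1, '2020-03-01'), (2, 1, '2020-01-15'),
        (3, 2, '2021-06-01'), (4, 2, '2021-07-01');
""")

earliest = con.execute("""
    SELECT a.id, h.created_date
    FROM a
    JOIN a_histories h ON h.a_id = a.id
    -- one row per a_id holding its earliest created_date
    JOIN (SELECT a_id, MIN(created_date) AS min_created_date
          FROM a_histories
          GROUP BY a_id) m
      ON m.a_id = h.a_id AND m.min_created_date = h.created_date
    ORDER BY a.id
""").fetchall()

print(earliest)  # [(1, '2020-01-15'), (2, '2021-06-01')]
```

If two histories of the same row share the earliest date, this pattern returns both of them.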

SQL: Left Outer Join, different GROUP BY, need to replicate records ?

I have information about accounts in two tables (A, B).
The records in A are all unique at the account level (account_id), but in table B, accounts are identified by account_id and month_start_dt, so each account may exist in zero or more months.
The trouble is, when I left outer join A to B so the joined table contains all records from A with the records from B (by account, by month) any account that does not exist in table B for a given month does not have a record for that month.
Desired outcome: If an account does not exist in table B for a given month, create a record for that account in the joined table with month_start_dt and 0 for all variables being selected from B.
As it stands, I can get the join to work where all accounts not appearing in B (not appearing at all, in any month) have 0 values for all variables being selected from B (using nvl(variable, 0) ) but, these accounts only have a single record. They should have one for each month.
Create a temp table with the number of records you want for the non-existing rows and right join it to the result of the first query. Try this:
select tbl.* from ( select * from A left join B on a.col1 = b.col2) tbl right join tmpTable on tbl.col2 = tmpTable.zerocol
I don't see why you need an outer join. This uses Standard SQL's EXCEPT (MINUS in Oracle):
SELECT account_id, month_start_dt, all_variables
FROM B
UNION
(
SELECT account_id, month_start_dt, 0 AS all_variables
FROM A
CROSS JOIN (
SELECT DISTINCT month_start_dt
FROM B
) AS DT1
EXCEPT
SELECT account_id, month_start_dt, 0 AS all_variables
FROM B
);
You could use a tally Calendar table, with months (of several years). See this similar question: How to create a Calender table for 100 years in Sql
And then have:
FROM
A
CROSS JOIN
( SELECT y
, m
FROM Calendar
WHERE ( y = @start_year
AND m >= @start_month
)
OR ( y > @start_year
AND y < @end_year
)
OR ( y = @end_year
AND m <= @end_month
)
) AS C
LEFT JOIN
B
ON B.account_id = A.account_id
AND YEAR(B.month_start_dt) = C.y
AND MONTH(B.month_start_dt) = C.m
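The cross join plus left join idea can be tried end to end on a few rows. The sketch below makes several assumptions: the months come from the distinct month_start_dt values in B instead of a Calendar table, B has a single hypothetical measure column called amount, COALESCE plays the role of nvl, and it runs in SQLite through Python's sqlite3 module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (account_id INTEGER PRIMARY KEY);
    CREATE TABLE B (account_id INTEGER, month_start_dt TEXT, amount INTEGER);
    -- hypothetical rows: account 3 never appears in B
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1, '2020-01-01', 10), (1, '2020-02-01', 20),
                         (2, '2020-02-01', 5);
""")

filled = con.execute("""
    SELECT A.account_id, m.month_start_dt, COALESCE(B.amount, 0) AS amount
    FROM A
    -- every account paired with every month that occurs in B
    CROSS JOIN (SELECT DISTINCT month_start_dt FROM B) m
    -- attach the real row when one exists, otherwise NULL (turned into 0)
    LEFT JOIN B
      ON B.account_id = A.account_id
     AND B.month_start_dt = m.month_start_dt
    ORDER BY A.account_id, m.month_start_dt
""").fetchall()

for row in filled:
    print(row)
```

With 3 accounts and 2 distinct months this returns 6 rows, one per account and month, with 0 wherever B has no record.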