Using INNER JOIN resulting table in another INNER JOIN - sql

I'm not really sure that the title actually fits; maybe my approach is wrong.
I have the following database structure:
TABLE producers
    id
TABLE data
    id
    date
    value
    producer_id (OneToMany)
First things first: for each producer, I want to get the latest date of registered data. The query below does exactly this:
SELECT producers.id AS producer_id, max.date AS max_date
FROM producers
INNER JOIN data ON producers.id = data.producer_id
INNER JOIN (
    SELECT producer_id, MAX(date) AS date
    FROM data
    GROUP BY producer_id
) AS max USING (producer_id, date)
And the resulting table is:
-----------------------------------------------
| producer_id | max_date                      |
-----------------------------------------------
| 5           | 2022-01-01 01:45:00.000 +0000 |
| 7           | 2022-01-01 01:45:00.000 +0000 |
| 14          | 2022-01-01 01:45:00.000 +0000 |
| 15          | 2022-01-01 01:45:00.000 +0000 |
| 17          | 2022-01-01 01:45:00.000 +0000 |
-----------------------------------------------
The next thing I need is to SUM, per producer, all the data records with a date greater than the max_date we got for that producer from the previous query's INNER JOIN. The SUM() will be performed on the value column.
Hopefully that was clear; if not, let me know. I've tried doing another INNER JOIN and using the max table in the WHERE clause, but I got an error telling me that the table was there but couldn't be referenced in that part of the query.
Maybe another INNER JOIN isn't the solution. Here I'm limited by my knowledge of SQL, and I don't really know which keywords to read up on to understand the best approach and how to do it. So any pointer toward the right path would be really helpful.
Thanks in advance.
EDIT: Forgot to specify which column the SUM() will be executed on.
EDIT 2: Just realized that, as asked, the result will always be an empty table because there will never be a record whose date is greater. When I wrote the simplified version of my database I forgot to add a table/join, that's why. But the approach/solution should still be the same, just applied to a different table. Sorry for that again.

The first query in the question can be greatly simplified using distinct on and order by:
select distinct on (p.id)
    p.id, d.date
from producers p
join data d on p.id = d.producer_id
order by p.id, d.date desc;
As for "SUM all the data records per producer WITH date bigger than the max_date" - well, none exists with date bigger than the latest one. Here is a query to do so (even the result will be empty)
select producer_id, sum(value)
from data d inner join -- the query above follows
(
    select distinct on (p.id)
        p.id producer_id, d.date
    from producers p
    join data d on p.id = d.producer_id
    order by p.id, d.date desc
) t using (producer_id)
where d.date > t.date
group by producer_id;

Related

SQL Server: getting sum of values in "calendar" table without joining

Is it possible to get the sum of value from calendar_table into main_table without a join like the one below?
select
date, sum(value)
from
main_table
inner join
calendar_table on start_date <= date and end_date >= date
group by
date
I am trying to avoid a join like this because main_table is a very large table whose rows span very wide start-to-end date ranges, and it is absolutely killing my performance. And I've already indexed both tables.
Sample desired results:
+-----------+-------+
| date      | total |
+-----------+-------+
| 7-24-2010 | 11    |
+-----------+-------+
Sample tables
calendar_table:
+-----------+-------+
| date      | value |
+-----------+-------+
| 7-24-2010 | 5     |
| 7-25-2010 | 6     |
| ...       | ...   |
| 7-23-2020 | 2     |
| 7-24-2020 | 10    |
+-----------+-------+
main_table:
+------------+-----------+
| start_date | end_date  |
+------------+-----------+
| 7-24-2010  | 7-25-2010 |
| 8-1-2011   | 8-5-2011  |
+------------+-----------+
You want the sum in the calendar table. So, I would recommend an "incremental" approach. This starts by unpivoting the data and putting the value as an increment and decrement in the results:
select c.date, c.value as inc
from main_table m join
     calendar_table c
     on m.start_date = c.date
union all
select dateadd(day, 1, c.date), - c.value as inc
from main_table m join
     calendar_table c
     on m.end_date = c.date;
The final step is to aggregate and do a cumulative sum:
select date, sum(inc) as value_on_date,
       sum(sum(inc)) over (order by date) as net_value
from ((select c.date, c.value as inc
       from main_table m join
            calendar_table c
            on m.start_date = c.date
      ) union all
      (select dateadd(day, 1, c.date), - c.value as inc
       from main_table m join
            calendar_table c
            on m.end_date = c.date
      )
     ) c
group by date
order by date;
This processes only two rows of data for each row in the main table. Assuming that your time spans typically cover more than two days per main-table row, the resulting data processed should be much smaller. And smaller data implies a faster query.
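As a rough sanity check on your own data, a sketch like the one below (not part of the original answer; it assumes the table and column names from the question, and that every start_date and end_date exists in calendar_table) compares how many rows each approach has to process:
select
    -- the unpivot produces exactly 2 rows per main_table row
    (select count(*) * 2 from main_table) as rows_unpivot_approach,
    -- the original range join produces one row per covered calendar day
    (select count(*)
     from main_table m
     join calendar_table c
       on c.date between m.start_date and m.end_date) as rows_join_approach;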
Here's a cross-apply example to possibly work from.
select main_table.start_date as date
     , CalendarTable.ValueSum
from main_table
CROSS APPLY (
    SELECT SUM(value) as ValueSum
    FROM calendar_table
    WHERE main_table.start_date <= calendar_table.date
      and main_table.end_date >= calendar_table.date
) as CalendarTable
group by main_table.start_date, CalendarTable.ValueSum
You could try something like this ... but be aware, it is still technically 'joined' to the main table. If you look at an execution plan, you will see that there is a join operation of some kind going on.
select
    m.start_date as date,
    (select sum(t.value)
     from calendar_table t
     where m.start_date <= t.date and m.end_date >= t.date) as total
from
    main_table m
The thing about that query is that main_table is not grouped as part of the results. You could possibly do that grouping outside the select, but I don't know exactly what you are trying to achieve. If you are grouping just to get the SUM, then keeping main_table in the grouping may be superfluous.
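For what it's worth, here is one sketch of doing that grouping outside the select; it assumes you want one total per start_date, which is a guess since the question doesn't say what to group on:
select start_date, sum(range_sum) as total
from (
    select m.start_date,
           (select sum(t.value)
            from calendar_table t
            where m.start_date <= t.date and m.end_date >= t.date) as range_sum
    from main_table m
) x
group by start_date;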
As already mentioned, you must perform a join of some sort in order to get data from more than one table in a query.
You did not provide details of the indexes, which are important for performance. I suggest the following indexes to optimize query performance.
For calendar_table, make sure you have a unique clustered index (or primary key) on date. Alternatively, a unique nonclustered index on date with the value column included.
A composite index on the main_table start_date and end_date columns may also be beneficial.
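A sketch of that DDL, using the table and column names from the question (the index names are made up):
-- unique clustered index (or primary key) on calendar_table.date
CREATE UNIQUE CLUSTERED INDEX cdx_calendar_date ON dbo.calendar_table ([date]);
-- alternative: unique nonclustered index on date with value included
-- CREATE UNIQUE NONCLUSTERED INDEX ix_calendar_date ON dbo.calendar_table ([date]) INCLUDE ([value]);
-- composite index on main_table (start_date, end_date)
CREATE INDEX ix_main_start_end ON dbo.main_table (start_date, end_date);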
Even with optimal indexes, the query will still take some time against a 500M row table (e.g. a couple of minutes) with no additional filter criteria. If you need results in milliseconds, create an indexed view to materialize the join and aggregation results. Be aware the indexed view will add overhead for inserts/deletes on both tables as well as for updates to the value column in order to keep the index consistent with the underlying data.
Below is an indexed view DDL example.
CREATE VIEW dbo.vw_example
WITH SCHEMABINDING
AS
SELECT
date, sum(value) AS value, COUNT_BIG(*) AS countbig
from
dbo.main_table
inner join
dbo.calendar_table on start_date <= date and end_date >= date
group by
date;
GO
CREATE UNIQUE CLUSTERED INDEX cdx ON dbo.vw_example(date);
GO
Depending on your SQL Server edition, the optimizer may be able to use the indexed view automatically so your original query can use the view index without changes. Otherwise, query the view directly and specify a NOEXPAND hint:
SELECT date, value AS total
FROM dbo.vw_example WITH (NOEXPAND);
EDIT:
With the query improvement #GordonLinoff suggested, a non-clustered index on the main_table end_date column will help optimize that query.
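For example (the index name is made up):
CREATE NONCLUSTERED INDEX ix_main_end_date ON dbo.main_table (end_date);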

How to filter out conditions based on a group by in JPA?

I have a table like
| customer | profile | status | date   |
| 1        | 1       | DONE   | mmddyy |
| 1        | 1       | DONE   | mmddyy |
In this case, I want to group by the profile ID, taking the max date. Profiles can be repeated. I've ruled out Java 8 streams as I have many conditions here.
I want to convert the following SQL into JPQL:
select customer, profile, status, max(date)
from tbl
group by profile, customer, status, date, column-k
having count(profile) > 0 and status = 'DONE';
Can someone tell me how to write this query in JPQL, if it is even correct in SQL? If I declare columns in the SELECT they are required in the GROUP BY as well, and then the query results are different.
I am guessing that you want the most recent customer/profile combination that is done.
If so, the correct SQL is:
select t.*
from t
where t.date = (select max(t2.date)
                from t t2
                where t2.customer = t.customer and t2.profile = t.profile
               ) and
      t.status = 'DONE';
I don't know how to convert this to JPQL, but you might as well start with working SQL code.
In your query the date column is not needed in the group by, and status='DONE' should be moved to a where clause:
select customer, profile, status, max(date)
from tbl
where status = 'DONE'
group by profile, customer, status
having count(profile) > 0

How to get value from another row in sql

I have a sql query like the following:
select
     c.customer_leg1
    ,d.mid
    ,c.previous_customer_leg1
    ,c.creation_date
    ,c.end_date
    ,c.cid
from table1 c
JOIN table2 d
    ON c.cid = d.cid
where c.cid = '1234'
which gives the below output:
customer_leg1 | previous_customer_leg1 | creation_date | end_date | cid
4092          | 1888                   | 05/06/17      | 05/07/17 | 735
8915          | 4092                   | 05/06/17      | 05/08/17 | 735
I want to add a new column such that, for each row, if that row's customer_leg1 is found in some other row's previous_customer_leg1, the new column should contain that other row's end_date.
For example: in row 1 of the output above, customer_leg1 is 4092, and 4092 appears in row 2's previous_customer_leg1, so in row 1 this new column should contain 05/08/17. And where customer_leg1 doesn't match any previous_customer_leg1, it should be NULL. I think I could maybe use partition and lag functions for this, but I'm not very clear on those. Any help will be appreciated. Thanks!
Since you are only showing "the gist" of what you want, perhaps "the gist" of one possible solution is like this:
Add to your "really huge" query another join:
select .....
     , c1.end_date as new_column
from table1 c join table2 d on c.cid = d.cid
join table1 c1 on c.cid = c1.cid
    and c.customer_leg1 = c1.previous_customer_leg1
..................
As I asked in the comments: which columns will you use to make sure the "next" row is determined correctly? You cannot guarantee that your output will always come back in the same order. So, assuming that your ordering column is creation_date and that you want this done per partition of c.cid, you can add something like the following to your select statement to derive the new column, without disturbing the rest of the query.
Disclaimer: if the actual partition and order columns are different, this will not work as-is; change the columns accordingly. But the concept of reading a column from the next row and, if it matches, displaying another column from that next row, is shown below.
,CASE
     WHEN lead(c.previous_customer_leg1, 1)
          over (partition BY c.cid ORDER BY c.creation_date)
          = c.customer_leg1
     THEN lead(c.end_date, 1) over (partition BY c.cid ORDER BY c.creation_date)
 END AS new_column
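To show how that expression could slot into the question's query (a sketch only; it keeps the assumption that c.cid is the partition and creation_date is the ordering column):
select
     c.customer_leg1
    ,d.mid
    ,c.previous_customer_leg1
    ,c.creation_date
    ,c.end_date
    ,c.cid
    ,CASE
         WHEN lead(c.previous_customer_leg1, 1)
              over (partition BY c.cid ORDER BY c.creation_date) = c.customer_leg1
         THEN lead(c.end_date, 1) over (partition BY c.cid ORDER BY c.creation_date)
     END AS new_column
from table1 c
JOIN table2 d ON c.cid = d.cid
where c.cid = '1234'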

Adding in missing dates from results in SQL

I have a database that currently looks like this
Date     | valid_entry | profile
1/6/2015 | 1           | 1
3/6/2015 | 2           | 1
3/6/2015 | 2           | 2
5/6/2015 | 4           | 4
I am trying to grab the dates, but I need the query to also display dates that do not exist in the list, such as 2/6/2015.
This is a sample of what i need it to be:
Date     | valid_entry
1/6/2015 | 1
2/6/2015 | 0
3/6/2015 | 2
3/6/2015 | 2
4/6/2015 | 0
5/6/2015 | 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query will only display the dates that exist in the table. Is there a way I can populate the results with the dates that do not exist there?
You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
    select dt::date
    from generate_series(
             (select min(date) from some_table),
             (select max(date) from some_table),
             interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry, 0))
from all_dates ad
    left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want group by date and profile, otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1 because that as well wouldn't generate two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
You have to use a calendar table for this purpose. In this case you can create an in-line table with the dates required, then LEFT JOIN your table to it:
select "date", count(valid_entry)
from (
SELECT '2015-06-01' AS d UNION ALL '2015-06-02' UNION ALL '2015-06-03' UNION ALL
'2015-06-04' UNION ALL '2015-06-05' UNION ALL '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
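To illustrate that note (this sketch is not part of the original answer): with the predicate moved into the WHERE clause, the NULL-extended rows produced for missing dates are filtered out again, so the result behaves like an INNER JOIN:
select t.d AS "date", count(valid_entry)
from (
    SELECT '2015-06-01' AS d UNION ALL SELECT '2015-06-02' UNION ALL SELECT '2015-06-03' UNION ALL
    SELECT '2015-06-04' UNION ALL SELECT '2015-06-05' UNION ALL SELECT '2015-06-06'
) AS t
left join database AS db on t.d = db."date"
where db.profile = 1   -- rows for missing dates have db.profile = NULL, so they fail this predicate
group by t.d;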

Selecting records from subquery found set (postgres)

I have a query on 2 tables (part, price). The simplified version of this query is:
SELECT price.*
FROM price
INNER JOIN part ON (price.code = part.code)
WHERE price.type = '01'
ORDER BY date DESC
That returns several records:
    code     | type |        date         |   price   |     file
-------------+------+---------------------+-----------+---------------
 00065064705 | 01   | 2008-01-07 00:00:00 | 16.400000 | 28SEP2011.zip
 00065064705 | 01   | 2007-02-05 00:00:00 | 15.200000 | 20JUL2011.zip
 54868278900 | 01   | 2006-02-24 00:00:00 | 16.642000 | 28SEP2011.zip
As you can see, code 00065064705 is listed twice. I just need the record with the max date (2008-01-07), along with the code, type, date and price, for each unique code. So basically the top record for each unique code. This is Postgres, so I can't use SELECT TOP or something like that.
I think I should be using this as a subquery inside a main query, but I'm not sure how. Something like:
SELECT *
FROM price
JOIN (insert my original query here) AS price2 ON price.code = price2.code
Any help would be greatly appreciated.
You can use the row_number() window function to do that.
select *
from (SELECT price.*,
             row_number() over (partition by price.code order by price.date desc) as rn
      FROM price
      INNER JOIN part ON (price.code = part.code)
      WHERE price.type = '01') x
where rn = 1
ORDER BY date DESC
(*) Note: I may have prefixed some of the columns incorrectly, as I'm not sure which column is in which table. I'm sure you can fix that.
In Postgres you can use DISTINCT ON:
SELECT DISTINCT ON(code) *
FROM price
INNER JOIN part ON price.code = part.code
WHERE price.type='01'
ORDER BY code, "date" DESC
select distinct on (code)
code, p.type, p.date, p.price, p.file
from
price p
inner join
part using (code)
where p.type='01'
order by code, p.date desc
http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT