Is it possible to speed up this join? - sql

The values table has millions of rows and doing this join on the indexes is quite slow (7s).
Is it possible to speed this up?
select *
from (
select
instance.Name, Date, Value,
instance.Category, instance.CreatedDate
from ValuesTable ValuesTable
join (
select
max(IDX) ID,
CreatedDate,
max(ModelsTable.Category) Category,
max(InstanceTable.Name) Name,
max(ModelsTable.Region) Country
from InstanceTable
join ModelsTable ModelsTable
on ModelsTable.ModelID = InstanceTable.ModelID
where InstanceTable.RunCategory = 'Scenario'
and CreatedDate = GETDATE()
group by InstanceTable.Name, ModelsTable.Category,
InstanceTable.CreatedDate
) instance on instance.ID=ValuesTable.IDX
) a

First, I would simplify the query a bit by getting rid of the first subquery. And clarify it using table aliases:
select i.Name, i.Date, i.Value, i.Category, i.CreatedDate,
instance.id, instance.CreatedDate, instance.Category,
instance.Name, instancy.Country
from ValuesTable i join
(select max(IDX) as ID, CreatedDate,
max(m.Category) as Category,
max(i.Name) as Name,
max(m.Region) as Country
from InstanceTable i join
ModelsTable m
on m.ModelID = i.ModelID
where i.RunCategory = 'Scenario' and
i.CreatedDate = GETDATE()
group by i.Name, m.Category, i.CreatedDate
) instance
on i.ID = instance.IDX;
Then, this is highly unlikely to do anything useful. The problem is:
i.CreatedDate = GETDATE()
GETDATE() is a non-standard function. But in every database that I know of that supports it, the function returns a time as well as a date. You probably intend:
i.CreatedDate = cast(GETDATE() as date)
or:
i.CreatedDate >= cast(GETDATE() as date)
You probably want to convert it to a date for the aggregation as well.
Then, you want indexes on:
InstanceTable(RunCategory, CreatedDate)
ModelsTable(ModelId)
ValuesTable(id)

Related

Delete the records repeated by date, and keep the oldest

I have this query, and it returns the following result, I need to delete the records repeated by date, and keep the oldest, how could I do this?
select
a.EMP_ID, a.EMP_DATE,
from
EMPLOYES a
inner join
TABLE2 b on a.table2ID = b.table2ID and b.ID_TYPE = 'E'
where
a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and month(a.DATE) = 1
and a.ID <> 31
order by
a.DATE;
Additionally, I would like to fill in the missing days of the month ... and put them empty if I don't have that data, can this be done?
I would appreciate if you could guide me to solve this problem
Thank you!
The other answers miss some of the requirement..
Initial step - do this once only. Make a calendar table. This will come in handy for all sorts of things over the time:
DECLARE #Year INT = '2000';
DECLARE #YearCnt INT = 50 ;
DECLARE #StartDate DATE = DATEFROMPARTS(#Year, '01','01')
DECLARE #EndDate DATE = DATEADD(DAY, -1, DATEADD(YEAR, #YearCnt, #StartDate));
;WITH Cal(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM Cal
WHERE n < DATEDIFF(DAY, #StartDate, #EndDate)
),
FnlDt(d, n) AS
(
SELECT DATEADD(DAY, n, #StartDate), n FROM Cal
),
FinalCte AS
(
SELECT
[D] = CONVERT(DATE,d),
[Dy] = DATEPART(DAY, d),
[Mo] = DATENAME(MONTH, d),
[Yr] = DATEPART(YEAR, d),
[DN] = DATENAME(WEEKDAY, d),
[N] = n
FROM FnlDt
)
SELECT * INTO Cal FROM finalCte
ORDER BY [Date]
OPTION (MAXRECURSION 0);
credit: mostly this site
Now we can write some simple query to stick your data (with one small addition) onto it:
--your query, minus the date bits in the WHERE, and with a ROW_NUMBER
WITH yourQuery AS(
SELECT a.emp_id, a.emp_date,
ROW_NUMBER() OVER(PARTITION BY CAST(a.emp_date AS DATE) ORDER BY a.emp_date) rn
FROM EMPLOYES a
INNER JOIN TABLE2 b on a.table2ID = b.table2ID
WHERE a.emp_id = 'VJAHAJHSJHDAJHSJDH' AND a.id <> 31 AND b.id_type = 'E'
)
--your query, left joined onto the cal table so that you get a row for every day even if there is no emp data for that day
SELECT c.d, yq.*
FROM
Cal c
LEFT JOIN yourQuery yq
ON
c.d = CAST(yq.emp_date AS DATE) AND --cut the time off
yq.rn = 1 --keep only the earliest time per day
WHERE
c.d BETWEEN '2021-01-01' AND EOMONTH('2021-01-01')
We add a rownumbering to your table, it restarts every time the date changes and counts up in order of time. We make this into a CTE (or a subquery, CTE is cleaner) then we simply left join it to the calendar table. This means that for any date you don't have data, you still have the calendar date. For any days you do have data, the rownumber rn being a condition of the join means that only the first datetime from each day is present in the results
Note: something is wonky about your question . You said you SELECT a.emp_id and your results show 'VJAHAJHSJHDAJHSJDH' is the emp id, but your where clause says a.id twice, once as a string and once as a number - this can't be right, so I've guessed at fixing it but I suspect you have translated your query into something for SO, perhaps to hide real column names.. Also your SELECT has a dangling comma that is a syntax error.
If you have translated/obscured your real query, make absolutely sure you understand any answer here when translating it back. It's very frustrating when someone is coming back and saying "hi your query doesn't work" then it turns out that they damaged it trying to translate it back to their own db, because they hid the real column names in the question..
FInally, do not use functions on table data in a where clause; it generally kills indexing. Always try and find a way of leaving table data alone. Want all of january? Do like I did, and say table.datecolumn BETWEEN firstofjan AND endofjan etc - SQLserver at least stands a chance of using an index for this, rather than calling a function on every date in the table, every time the query is run
You can use ROW_NUMBER
WITH CTE AS
(
SELECT a.EMP_ID, a.EMP_DATE,
RN = ROW_NUMBER() OVER (PARTITION BY a.EMP_ID, CAST(a.DATE as Date) ORDER BY a.DATE ASC)
from EMPLOYES a INNER JOIN TABLE2 b
on a.table2ID = b.table2ID
and b.ID_TYPE = 'E'
where a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and MONTH(a.DATE) = 1
and a.ID <> 31
)
SELECT * FROM CTE
WHERE RN = 1
Try with an aggregate function MAX or MIN
create table #tmp(dt datetime, val numeric(4,2))
insert into #tmp values ('2021-01-01 10:30:35', 1)
insert into #tmp values ('2021-01-02 10:30:35', 2)
insert into #tmp values ('2021-01-02 11:30:35', 3)
insert into #tmp values ('2021-01-03 10:35:35', 4)
select * from #tmp
select tmp.*
from #tmp tmp
inner join
(select max(dt) as dt, cast(dt as date) as dt_aux from #tmp group by cast(dt as date)) compressed_rows on
tmp.dt = compressed_rows.dt
drop table #tmp
results:

How to use DISTINCT in one column when SELECTING from many columns

I'm trying to select columns from two different views but I only want to use the DISTINCT statement on one specific column. I thought using the GROUP BY statement would work but it's throwing an error.
SELECT DISTINCT
[Act].[ClientId]
, [Ref].[Agency]
, [Act].[FundCode]
, [Act].[VService]
, [Act].[Service]
, [Act].[Attended]
, [Act].[StartDate]
FROM [dbo].[FS_v_CrossReference_ALL] AS [Ref]
INNER JOIN [dbo].[FS_v_Activities] AS [Act] ON [Ref].[VendorId] = [Act].[VendorId]
WHERE [Act].[StartDate] BETWEEN '1/1/2015' AND '12/31/2015'
GROUP BY [Act].[ClientId]
I want to use the DISTINCT statement on [Act].[ClientId]. Is there any way to do this?
Presumably, you want row_number():
SELECT ar.*
FROM (SELECT Act.*, Reg.Agency,
ROW_NUMBER() OVER (PARTITION BY Act.ClientId ORDER BY ACT.StartDate DESC) as seqnum
FROM [dbo].[FS_v_CrossReference_ALL] [Ref] JOIN
[dbo].[FS_v_Activities] Act
ON [Ref].[VendorId] = [Act].[VendorId]
WHERE [Act].[StartDate] >= '2015-01-01' AND
[Act].[StartDate] < '2016-01-01'
) ar
WHERE seqnum = 1;
Particularly note the changes to the date comparisons:
The dates are in standard format (YYYY-MM-DD or YYYYMMDD).
BETWEEN is replaced by two inequalities. This makes the code robust if the date is really a date/time with a time component.

Using a date field for matching SQL Query

I'm having a bit of an issue wrapping my head around the logic of this changing dimension. I would like to associate these two tables below. I need to match the Cost - Period fact table to the cost dimension based on the Id and the effective date.
As you can see - if the month and year field is greater than the effective date of its associated Cost dimension, it should adopt that value. Once a new Effective Date is entered into the dimension, it should use that value for any period greater than said date going forward.
EDIT: I apologize for the lack of detail but the Cost Dimension will actually have a unique Index value and the changing fields to reference for the matching would be Resource, Project, Cost. I tried to match the query you provided with my fields, but I'm getting the incorrect output.
FYI: Naming convention change: EngagementId is Id, Resource is ConsultantId, and Project is ProjectId
I've changed the images below and here is my query
,_cte(HoursWorked, HoursBilled, Month, Year, EngagementId, ConsultantId, ConsultantName, ProjectId, ProjectName, ProjectRetainer, RoleId, Role, Rate, ConsultantRetainer, Salary, amount, EffectiveDate)
as
(
select sum(t.Duration), 0, Month(t.StartDate), Year(t.StartDate), t.EngagementId, c.ConsultantId, c.ConsultantName, c.ProjectId, c.ProjectName, c.ProjectRetainer, c.RoleId, c.Role, c.Rate, c.ConsultantRetainer,
c.Salary, 0, c.EffectiveDate
from timesheet t
left join Engagement c on t.EngagementId = c.EngagementId and Month(c.EffectiveDate) = Month(t.EndDate) and Year(c.EffectiveDate) = Year(t.EndDate)
group by Month(t.StartDate), Year(t.StartDate), t.EngagementId, c.ConsultantName, c.ConsultantId, c.ProjectId, c.ProjectName, c.ProjectRetainer, c.RoleId, c.Role, c.Rate, c.ConsultantRetainer,
c.Salary, c.EffectiveDate
)
select * from _cte where EffectiveDate is not null
union
select _cte.HoursWorked, _cte.HoursBilled, _cte.Month, _cte.Year, _cte.EngagementId, _cte.ConsultantId, _cte.ConsultantName, _cte.ProjectId, _Cte.ProjectName, _cte.ProjectRetainer, _cte.RoleId, _cte.Role, sub.Rate, _cte.ConsultantRetainer,_cte.Salary, _cte.amount, sub.EffectiveDate
from _cte
outer apply (
select top 1 EffectiveDate, Rate
from Engagement e
where e.ConsultantId = _cte.ConsultantId and e.ProjectId = _cte.ProjectId and e.RoleId = _cte.RoleId
and Month(e.EffectiveDate) < _cte.Month and Year(e.EffectiveDate) < _cte.Year
order by EffectiveDate desc
) sub
where _cte.EffectiveDate is null
Example:
I'm struggling with writing the query that goes along with this. At first I attempted to partition by greatest date. However, when I executed the join I got the highest effective date for every single period (even those prior to the effective date).
Is this something that can be accomplished in a query or should I be focusing on incremental updates of the destination table so that any effective date / time period in the past is left alone?
Any tips would be great!
Thanks,
Channing
Try this one:
; with _CTE as(
select p.* , c.EffectiveDate, c.Cost
from period p
left join CostDimension c on p.id = c.id and p.Month = DATEPART(month, c.EffectiveDate) and p.year = DATEPART (year, EffectiveDate)
)
select * from _CTE Where EffectiveDate is not null
Union
select _CTE.id, _CTE.Month, _CTE.Year, sub.EffectiveDate, sub.Cost
from _CTE
outer apply (select top 1 EffectiveDate, Cost
from CostDimension as cd
where cd.Id = _CTE.id and cd.EffectiveDate < DATETIMEFROMPARTS(_CTE.Year, _CTE.Month, 1, 0, 0, 0, 0)
order by EffectiveDate desc
) sub
where _Cte.EffectiveDate is null

How to self-join table in a way that every record is joined with the "previous" record?

I have a MS SQL table that contains stock data with the following columns: Id, Symbol, Date, Open, High, Low, Close.
I would like to self-join the table, so I can get a day-to-day % change for Close.
I must create a query that will join the table with itself in a way that every record contains also the data from the previous session (be aware, that I cannot use yesterday's date).
My idea is to do something like this:
select * from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and
t2.date = (select max(date) from quotes where symbol = t1.symbol and date < t1.date)
However I do not know if that's the correct/fastest way. What should I take into account when thinking about performance? (E.g. will putting UNIQUE index on a (Symbol, Date) pair improve performance?)
There will be around 100,000 new records every year in this table. I am using MS SQL Server 2008
One option is to use a recursive cte (if I'm understanding your requirements correctly):
WITH RNCTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY date) rn
FROM quotes
),
CTE AS (
SELECT symbol, date, rn, cast(0 as decimal(10,2)) perc, closed
FROM RNCTE
WHERE rn = 1
UNION ALL
SELECT r.symbol, r.date, r.rn, cast(c.closed/r.closed as decimal(10,2)) perc, r.closed
FROM CTE c
JOIN RNCTE r on c.symbol = r.symbol AND c.rn+1 = r.rn
)
SELECT * FROM CTE
ORDER BY symbol, date
SQL Fiddle Demo
If you need a running total for each symbol to use as the percentage change, then easy enough to add an additional column for that amount -- wasn't completely sure what your intentions were, so the above just divides the current closed amount by the previous closed amount.
Something like this w'd work in SQLite:
SELECT ..
FROM quotes t1, quotes t2
WHERE t1.symbol = t2.symbol
AND t1.date < t2.date
GROUP BY t2.ID
HAVING t2.date = MIN(t2.date)
Given SQLite is a simplest of a kind, maybe in MSSQL this will also work with minimal changes.
Index on (symbol, date)
SELECT *
FROM quotes q_curr
CROSS APPLY (
SELECT TOP(1) *
FROM quotes
WHERE symbol = q_curr.symbol
AND date < q_curr.date
ORDER BY date DESC
) q_prev
You do something like this:
with OrderedQuotes as
(
select
row_number() over(order by Symbol, Date) RowNum,
ID,
Symbol,
Date,
Open,
High,
Low,
Close
from Quotes
)
select
a.Symbol,
a.Date,
a.Open,
a.High,
a.Low,
a.Close,
a.Date PrevDate,
a.Open PrevOpen,
a.High PrevHigh,
a.Low PrevLow,
a.Close PrevClose,
b.Close-a.Close/a.Close PctChange
from OrderedQuotes a
join OrderedQuotes b on a.Symbol = b.Symbol and a.RowNum = b.RowNum + 1
If you change the last join to a left join you get a row for the first date for each symbol, not sure if you need that.
You can use option with CTE and ROW_NUMBER ranking function
;WITH cte AS
(
SELECT symbol, date, [Open], [High], [Low], [Close],
ROW_NUMBER() OVER(PARTITION BY symbol ORDER BY date) AS Id
FROM quotes
)
SELECT c1.Id, c1.symbol, c1.date, c1.[Open], c1.[High], c1.[Low], c1.[Close],
ISNULL(c2.[Close] / c1.[Close], 0) AS perc
FROM cte c1 LEFT JOIN cte c2 ON c1.symbol = c2.symbol AND c1.Id = c2.Id + 1
ORDER BY c1.symbol, c1.date
For improving performance(avoiding sorting and RID Lookup) use this index
CREATE INDEX ix_symbol$date_quotes ON quotes(symbol, date) INCLUDE([Open], [High], [Low], [Close])
Simple demo on SQLFiddle
What you had is fine. I don't know if translating the sub-query into the join will help. However, you asked for it, so the way to do it might be to join the table to itself once more.
select *
from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and t1.date > t2.date
left outer join quotes t3
on t2.symbol = t3.symbol and t2.date > t3.date
where t3.date is null
You could do something like this:
DECLARE #Today DATETIME
SELECT #Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP))
;WITH today AS
(
SELECT Id ,
Symbol ,
Date ,
[OPEN] ,
High ,
LOW ,
[CLOSE],
DATEADD(DAY, -1, Date) AS yesterday
FROM quotes
WHERE date = #today
)
SELECT *
FROM today
LEFT JOIN quotes yesterday ON today.Symbol = yesterday.Symbol
AND today.yesterday = yesterday.Date
That way you limit your "today" results, if that's an option.
EDIT: The CTEs listed as other questions may work well, but I tend to be hesitant to use ROW_NUMBER when dealing with 100K rows or more. If the previous day may not always be yesterday, I tend to prefer to pull out the check for the previous day in its own query then use it for reference:
DECLARE #Today DATETIME, #PreviousDay DATETIME
SELECT #Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP));
SELECT #PreviousDay = MAX(Date) FROM quotes WHERE Date < #Today;
WITH today AS
(
SELECT Id ,
Symbol ,
Date ,
[OPEN] ,
High ,
LOW ,
[CLOSE]
FROM quotes
WHERE date = #today
)
SELECT *
FROM today
LEFT JOIN quotes AS previousday
ON today.Symbol = previousday.Symbol
AND previousday.Date = #PreviousDay

Group By column throwing off query

I have a query that checks a database to see if a customer has visited multiple times a day. If they have it counts the number of visits, and then tells me what times they visited. The problem is it throws "Tickets.lcustomerid" into the group by clause, causing me to miss 5 records (Customers without barcodes). How can I change the below query to remove "tickets.lcustomerid" from the group by clause... If I remove it I get an error telling me "Tickets.lCustomerID" is not a valid select because it's not part of an aggregate or groupby clause.
The Query that works:
SELECT Customers.sBarcode, CAST(FLOOR(CAST(Tickets.dtCreated AS FLOAT)) AS DATETIME) AS dtCreatedDate, COUNT(Customers.sBarcode) AS [Number of Scans],
MAX(Customers.sLastName) AS LastName
FROM Tickets INNER JOIN
Customers ON Tickets.lCustomerID = Customers.lCustomerID
WHERE (Tickets.dtCreated BETWEEN #startdate AND #enddate) AND (Tickets.dblTotal <= 0)
GROUP BY Customers.sBarcode, CAST(FLOOR(CAST(Tickets.dtCreated AS FLOAT)) AS DATETIME)
HAVING (COUNT(*) > 1)
ORDER BY dtCreatedDate
The Output is:
sBarcode dtcreated Date Number of Scans slastname
1234 1/4/2013 12:00:00 AM 2 Jimbo
1/5/2013 12:00:00 AM 3 Jimbo2
1578 1/6/2013 12:00:00 AM 3 Jimbo3
My current Query with the subquery
SELECT customers.sbarcode,
Max(customers.slastname) AS LastName,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME) AS
dtCreatedDate,
Count(customers.sbarcode) AS
[Number of Scans],
Stuff ((SELECT ', '
+ RIGHT(CONVERT(VARCHAR, dtcreated, 100), 7) AS [text()]
FROM tickets AS sub
WHERE ( lcustomerid = tickets.lcustomerid )
AND ( dtcreated BETWEEN Cast(Floor(Cast(tickets.dtcreated
AS
FLOAT)) AS
DATETIME
)
AND
Cast(Floor(Cast(tickets.dtcreated
AS FLOAT
)) AS
DATETIME
)
+ '23:59:59' )
AND ( dbltotal <= '0' )
FOR xml path('')), 1, 1, '') AS [Times Scanned]
FROM tickets
INNER JOIN customers
ON tickets.lcustomerid = customers.lcustomerid
WHERE ( tickets.dtcreated BETWEEN #startdate AND #enddate )
AND ( tickets.dbltotal <= 0 )
GROUP BY customers.sbarcode,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME),
tickets.lcustomerid
HAVING ( Count(*) > 1 )
ORDER BY dtcreateddate
The Current output (notice the record without a barcode is missing) is:
sBarcode dtcreated Date Number of Scans slastname Times Scanned
1234 1/4/2013 12:00:00 AM 2 Jimbo 12:00PM, 1:00PM
1578 1/6/2013 12:00:00 AM 3 Jimbo3 03:05PM, 1:34PM
UPDATE: Based on our "chat" it seems that customerid is not the unique field but barcode is, even though customer id is the primary key.
Therefore, in order to not GROUP BY customer id in the subquery you need to join to a second customers table in there in order to actually join on barcode.
Try this:
SELECT customers.sbarcode,
Max(customers.slastname) AS LastName,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME) AS
dtCreatedDate,
Count(customers.sbarcode) AS
[Number of Scans],
Stuff ((SELECT ', '
+ RIGHT(CONVERT(VARCHAR, dtcreated, 100), 7) AS [text()]
FROM tickets AS subticket
inner join
customers as subcustomers
on
subcustomers.lcustomerid = subticket.lcustomerid
WHERE ( subcustomers.sbarcode = customers.sbarcode )
AND ( subticket.dtcreated BETWEEN Cast(Floor(Cast(tickets.dtcreated
AS
FLOAT)) AS
DATETIME
)
AND
Cast(Floor(Cast(tickets.dtcreated
AS FLOAT
)) AS
DATETIME
)
+ '23:59:59' )
AND ( dbltotal <= '0' )
FOR xml path('')), 1, 1, '') AS [Times Scanned]
FROM tickets
INNER JOIN customers
ON tickets.lcustomerid = customers.lcustomerid
WHERE ( tickets.dtcreated BETWEEN #startdate AND #enddate )
AND ( tickets.dbltotal <= 0 )
GROUP BY customers.sbarcode,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME)
HAVING ( Count(*) > 1 )
ORDER BY dtcreateddate
I can't directly solve your problem because I don't understand your data model or what you are trying to accomplish with this query. However, I can give you some advice on how to solve the problem yourself.
First do you understand exactly what you are trying to accomplish and how the tables fit together? If so move on to the next step, if not, get this knowledge first, you cannot do complex queries without this understanding.
Next break up what you are trying to accomplish in little steps and make sure you have each covered before moving to the rest. So in your case you seem to be missing some customers. Start with a new query (I'm pretty sure this one has more than one problem). So start with the join and the where clauses.
I suspect you may need to start with customers and left join to tickets (which would move the where conditions to the left joins as they are on tickets). This will get you all the customers whether they have tickets or not. If that isn't what you want, then work with the jon and the where clasues (and use select * while you are trying to figure things out) until you are returning the exact set of customer records you need. The reason why you use select * at this stage is to see what in the data may be causeing the problem you are having. That may tell you how to fix.
Usually I start with a the join and then add in the where clasues one at a time until I know I am getting the right inital set of records. If you have multiple joins, do them one at time to know when you suddenly start have more or less records than you would expect.
Then go into the more complex parts. Add each in one at a time and check the results. If you suddenly go from 10 records to 5 or 15, then you have probably hit a problem. When you work one step at a time and run into a problem, you know exactly what caused the problem making it much easier to find and fix.
Group BY is important to understand thoroughly. You must have every non-aggregated field in the group by or it will not work. Think of this as law like the law of gravity. It is not something you can change. However it can be worked around through the use of derived tables or CTEs. Please read up on those a bit if you don't know what they are, they are very useful techniques when you get into complex stuff and you shoud understand them thoroughly. I suspect you will need to use the derived table approach here to group on only the things you need and then join that derived table to the rest of teh query to get the ontehr fields. I'll show a simple example:
select
t1.table1id
, t1.field1
, t1.field2
, a.field3
, a.MostRecentDate
From table1 t1
JOIN
(select t1.table1id, t2.field3, max (datefield) as MostRecentDate
from table1 t1
JOin Table2 t2 on t1.table1id = t2.table1id
Where t2.field4 = 'test'
group by t1.table1id,t2.field3) a
ON a.table1id = t1.table1id
Hope this approach helps you solve this problem.