How to use DISTINCT in one column when SELECTING from many columns

I'm trying to select columns from two different views but I only want to use the DISTINCT statement on one specific column. I thought using the GROUP BY statement would work but it's throwing an error.
SELECT DISTINCT
[Act].[ClientId]
, [Ref].[Agency]
, [Act].[FundCode]
, [Act].[VService]
, [Act].[Service]
, [Act].[Attended]
, [Act].[StartDate]
FROM [dbo].[FS_v_CrossReference_ALL] AS [Ref]
INNER JOIN [dbo].[FS_v_Activities] AS [Act] ON [Ref].[VendorId] = [Act].[VendorId]
WHERE [Act].[StartDate] BETWEEN '1/1/2015' AND '12/31/2015'
GROUP BY [Act].[ClientId]
I want to use the DISTINCT statement on [Act].[ClientId]. Is there any way to do this?

Presumably, you want row_number():
SELECT ar.*
FROM (SELECT Act.*, Ref.Agency,
ROW_NUMBER() OVER (PARTITION BY Act.ClientId ORDER BY Act.StartDate DESC) as seqnum
FROM [dbo].[FS_v_CrossReference_ALL] [Ref] JOIN
[dbo].[FS_v_Activities] Act
ON [Ref].[VendorId] = [Act].[VendorId]
WHERE [Act].[StartDate] >= '2015-01-01' AND
[Act].[StartDate] < '2016-01-01'
) ar
WHERE seqnum = 1;
Particularly note the changes to the date comparisons:
The dates are in standard format (YYYY-MM-DD or YYYYMMDD).
BETWEEN is replaced by two inequalities. This makes the code robust if the date is really a date/time with a time component.
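To make those two points concrete, here is a small sketch that runs the same pattern from Python with the standard sqlite3 module and an in-memory SQLite database, not SQL Server. The activities table and its rows are made up for illustration, and it assumes a SQLite build with window functions (3.25 or later). SQLite compares these values as text, so the mechanism differs from SQL Server's datetime comparison, but the effect on this sample data is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activities (ClientId INTEGER, Service TEXT, StartDate TEXT);
INSERT INTO activities VALUES
  (1, 'intake',   '2015-03-01 09:00:00'),
  (1, 'followup', '2015-12-31 15:30:00'),
  (2, 'intake',   '2015-06-10 11:00:00'),
  (2, 'followup', '2016-01-01 00:00:00');
""")

# One row per client: the most recent activity inside the half-open 2015 range.
latest_per_client = conn.execute("""
SELECT ClientId, Service, StartDate
FROM (SELECT a.*,
             ROW_NUMBER() OVER (PARTITION BY ClientId ORDER BY StartDate DESC) AS seqnum
      FROM activities a
      WHERE StartDate >= '2015-01-01' AND StartDate < '2016-01-01') ar
WHERE seqnum = 1
ORDER BY ClientId
""").fetchall()

# BETWEEN with a date-only upper bound skips the afternoon of the last day.
between_count = conn.execute("""
SELECT COUNT(*) FROM activities
WHERE StartDate BETWEEN '2015-01-01' AND '2015-12-31'
""").fetchone()[0]

# The half-open range keeps every 2015 row, whatever its time of day.
half_open_count = conn.execute("""
SELECT COUNT(*) FROM activities
WHERE StartDate >= '2015-01-01' AND StartDate < '2016-01-01'
""").fetchone()[0]

print(latest_per_client, between_count, half_open_count)
```

With this sample data, the BETWEEN version misses the 15:30 row on 2015-12-31, while the half-open range includes it.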

Related

Is it possible to speed up this join?

The values table has millions of rows and doing this join on the indexes is quite slow (7s).
Is it possible to speed this up?
select *
from (
select
instance.Name, Date, Value,
instance.Category, instance.CreatedDate
from ValuesTable ValuesTable
join (
select
max(IDX) ID,
CreatedDate,
max(ModelsTable.Category) Category,
max(InstanceTable.Name) Name,
max(ModelsTable.Region) Country
from InstanceTable
join ModelsTable ModelsTable
on ModelsTable.ModelID = InstanceTable.ModelID
where InstanceTable.RunCategory = 'Scenario'
and CreatedDate = GETDATE()
group by InstanceTable.Name, ModelsTable.Category,
InstanceTable.CreatedDate
) instance on instance.ID=ValuesTable.IDX
) a

First, I would simplify the query a bit by getting rid of the outer subquery, and clarify it using table aliases:
select instance.Name, v.Date, v.Value,
instance.Category, instance.CreatedDate
from ValuesTable v join
(select max(IDX) as ID, i.CreatedDate,
max(m.Category) as Category,
max(i.Name) as Name,
max(m.Region) as Country
from InstanceTable i join
ModelsTable m
on m.ModelID = i.ModelID
where i.RunCategory = 'Scenario' and
i.CreatedDate = GETDATE()
group by i.Name, m.Category, i.CreatedDate
) instance
on instance.ID = v.IDX;
Next, the query as written is highly unlikely to return anything useful. The problem is:
i.CreatedDate = GETDATE()
GETDATE() is a non-standard function. But in every database that I know of that supports it, the function returns a time as well as a date. You probably intend:
i.CreatedDate = cast(GETDATE() as date)
or:
i.CreatedDate >= cast(GETDATE() as date)
You probably want to convert it to a date for the aggregation as well.
Then, you want indexes on:
InstanceTable(RunCategory, CreatedDate)
ModelsTable(ModelId)
ValuesTable(IDX)
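The point about the time component can be seen in a small sketch. This one uses Python's standard sqlite3 module with an in-memory SQLite database instead of SQL Server, a made-up two-row InstanceTable whose CreatedDate holds dates only, and a hard-coded string standing in for what GETDATE() would return, so all of those are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InstanceTable (Name TEXT, CreatedDate TEXT);
INSERT INTO InstanceTable VALUES ('run-a', '2024-05-06'), ('run-b', '2024-05-05');
""")

# Hypothetical "current timestamp", standing in for the value GETDATE() returns.
now = "2024-05-06 14:30:00"

# Comparing a date-only column to a full timestamp matches nothing.
exact_match = conn.execute(
    "SELECT Name FROM InstanceTable WHERE CreatedDate = ?", (now,)
).fetchall()

# Truncating the timestamp to a date first matches today's rows.
date_match = conn.execute(
    "SELECT Name FROM InstanceTable WHERE CreatedDate = date(?)", (now,)
).fetchall()

print(exact_match, date_match)
```

SQLite's date() plays the role that cast(... as date) plays in the answer above.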

sql in but with multiple column

I have a final SQL query like this:
UPDATE Booking
set BookingType='Booked'
where BookingType='Defaulter' and BookingId in(SELECT BookingID
FROM ScheduledDues WHERE projectID=@ProjectId and DueFrom <= GETDATE()
GROUP BY BookingID HAVING MAX(DueTill) =0)
Now I also want the inner select to return ScheduledDueID, ordered by ScheduledDueID descending, but I cannot do that because the subquery is used with IN. How can I do it?
SELECT BookingID,Max(DueTill),ScheduledDueID
FROM ScheduledDues WHERE projectID=30 and DueFrom <= GETDATE()
GROUP BY BookingID ,ScheduledDueID order by ScheduledDueID desc

If I understand your question correctly, is this what you are trying to achieve?
SELECT
ScheduledDues.BookingID
, ScheduledDues.DueTill
-- or you could type , InnerTable.MaxDueTill
, ScheduledDues.ScheduledDueID
FROM (
SELECT BookingID, Max(DueTill) AS MaxDueTill
FROM ScheduledDues
WHERE projectID=30 and DueFrom <= GETDATE()
GROUP BY BookingID ) InnerTable
JOIN ScheduledDues
ON InnerTable.BookingID = ScheduledDues.BookingID
AND InnerTable.MaxDueTill = ScheduledDues.DueTill
ORDER BY ScheduledDueID DESC
So for each booking (which can have multiple schedule IDs), you want the schedule ID associated with the maximum DueTill, which gives one schedule ID per booking, and you want the result ordered by that schedule ID. (With ScheduledDueID in the original GROUP BY, you got every schedule ID rather than just the one you were looking for.)
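Here is a rough, runnable sketch of that join-to-the-maximum pattern, using Python's standard sqlite3 module and an in-memory SQLite database in place of SQL Server. The sample rows are invented, DueTill is assumed to be numeric, and the projectID and DueFrom filters are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ScheduledDues (ScheduledDueID INTEGER, BookingID INTEGER, DueTill INTEGER);
INSERT INTO ScheduledDues VALUES (1, 100, 5), (2, 100, 9), (3, 200, 4), (4, 200, 2);
""")

# Find the maximum DueTill per booking, then join back to pick up
# the ScheduledDueID of the row that holds that maximum.
rows = conn.execute("""
SELECT sd.BookingID, sd.DueTill, sd.ScheduledDueID
FROM (SELECT BookingID, MAX(DueTill) AS MaxDueTill
      FROM ScheduledDues
      GROUP BY BookingID) mx
JOIN ScheduledDues sd
  ON sd.BookingID = mx.BookingID
 AND sd.DueTill = mx.MaxDueTill
ORDER BY sd.ScheduledDueID DESC
""").fetchall()

print(rows)
```

Note that if two rows of the same booking tie on the maximum DueTill, this pattern returns both of them.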

SQL Query to Return Count Based on Time

I have an interesting problem and am unsure how to write a query to solve it. Say I have a table named "Cars". It has two columns, CarId (int PK), and ArrivalTime (datetime). As cars enter a space, the arrival time is entered.
What I need to know is this: for each car, how many cars entered the space in the 24-hour period prior to its arrival.
I'd like to write this without using a cursor, but don't know how I can do it. Any SQL gurus out there with an idea?
Oh - I should mention that the SQL Server version being used is 2005.

You could use a query with either a correlated subquery or a join.
Here's an example of a query using a join operation:
SELECT n.CarId
, n.ArrivalTime
, COUNT(p.CarId) AS cnt_previous_arrivals
FROM cars n
LEFT
JOIN cars p
ON p.ArrivalTime >= DATEADD(HOUR,-24,n.ArrivalTime)
AND p.ArrivalTime < n.ArrivalTime
GROUP
BY n.CarId
, n.ArrivalTime
To get an equivalent result with a correlated subquery, one option:
SELECT n.CarId
, n.ArrivalTime
, ( SELECT COUNT(*)
FROM cars p
WHERE p.ArrivalTime >= DATEADD(HOUR,-24,n.ArrivalTime)
AND p.ArrivalTime < n.ArrivalTime
) AS cnt_previous_arrivals
FROM cars n
ORDER
BY n.CarId
, n.ArrivalTime
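As a quick sanity check of the self-join idea, here is a sketch that counts, for each car, the other cars that arrived in the preceding 24 hours. It runs against an in-memory SQLite database through Python's standard sqlite3 module, so datetime(x, '-24 hours') stands in for SQL Server's DATEADD, and the five sample arrivals are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cars (CarId INTEGER PRIMARY KEY, ArrivalTime TEXT);
INSERT INTO Cars VALUES
  (1, '2024-01-01 08:00:00'),
  (2, '2024-01-01 20:00:00'),
  (3, '2024-01-02 07:59:59'),
  (4, '2024-01-02 09:00:00'),
  (5, '2024-01-04 00:00:00');
""")

# For each car n, count cars p whose arrival falls in the window
# [n.ArrivalTime - 24 hours, n.ArrivalTime). The LEFT JOIN keeps cars
# with no prior arrivals, and COUNT(p.CarId) reports 0 for them.
counts = conn.execute("""
SELECT n.CarId, COUNT(p.CarId) AS cnt_previous_arrivals
FROM Cars n
LEFT JOIN Cars p
  ON p.ArrivalTime >= datetime(n.ArrivalTime, '-24 hours')
 AND p.ArrivalTime < n.ArrivalTime
GROUP BY n.CarId
ORDER BY n.CarId
""").fetchall()

print(counts)
```

Car 4 arrives at 09:00 on the second day, so car 1 (08:00 the day before) has just dropped out of its window, while cars 2 and 3 are still counted.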

select el1.car_id,
( select count(*)
from entry_log el2
where el2.datetime between DATEADD(day, -1, el1.datetime)
and el1.datetime
and el2.car_id != el1.car_id
)
from entry_log el1
where el1.car_id = :my_car_id

How to self-join table in a way that every record is joined with the "previous" record?

I have a MS SQL table that contains stock data with the following columns: Id, Symbol, Date, Open, High, Low, Close.
I would like to self-join the table, so I can get a day-to-day % change for Close.
I must create a query that will join the table with itself in a way that every record contains also the data from the previous session (be aware, that I cannot use yesterday's date).
My idea is to do something like this:
select * from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and
t2.date = (select max(date) from quotes where symbol = t1.symbol and date < t1.date)
However I do not know if that's the correct/fastest way. What should I take into account when thinking about performance? (E.g. will putting UNIQUE index on a (Symbol, Date) pair improve performance?)
There will be around 100,000 new records every year in this table. I am using MS SQL Server 2008

One option is to use a recursive cte (if I'm understanding your requirements correctly):
WITH RNCTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY date) rn
FROM quotes
),
CTE AS (
SELECT symbol, date, rn, cast(0 as decimal(10,2)) perc, closed
FROM RNCTE
WHERE rn = 1
UNION ALL
SELECT r.symbol, r.date, r.rn, cast(r.closed/c.closed as decimal(10,2)) perc, r.closed
FROM CTE c
JOIN RNCTE r on c.symbol = r.symbol AND c.rn+1 = r.rn
)
SELECT * FROM CTE
ORDER BY symbol, date
If you need a running total for each symbol to use as the percentage change, then easy enough to add an additional column for that amount -- wasn't completely sure what your intentions were, so the above just divides the current closed amount by the previous closed amount.

Something like this would work in SQLite:
SELECT ..
FROM quotes t1, quotes t2
WHERE t1.symbol = t2.symbol
AND t1.date < t2.date
GROUP BY t2.ID
HAVING t2.date = MIN(t2.date)
Given that SQLite is about as simple as SQL engines get, this may also work in MSSQL with minimal changes.

Create an index on (symbol, date), then use CROSS APPLY to fetch the previous row:
SELECT *
FROM quotes q_curr
CROSS APPLY (
SELECT TOP(1) *
FROM quotes
WHERE symbol = q_curr.symbol
AND date < q_curr.date
ORDER BY date DESC
) q_prev

You can do something like this:
with OrderedQuotes as
(
select
row_number() over(order by Symbol, Date) RowNum,
ID,
Symbol,
Date,
[Open],
High,
Low,
[Close]
from Quotes
)
select
a.Symbol,
a.Date,
a.[Open],
a.High,
a.Low,
a.[Close],
b.Date PrevDate,
b.[Open] PrevOpen,
b.High PrevHigh,
b.Low PrevLow,
b.[Close] PrevClose,
(a.[Close]-b.[Close])/b.[Close] PctChange
from OrderedQuotes a
join OrderedQuotes b on a.Symbol = b.Symbol and a.RowNum = b.RowNum + 1
If you change the last join to a left join, you also get a row for the first date of each symbol (with NULL previous values); I'm not sure whether you need that.
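For anyone who wants to try the row-number self-join quickly, here is a small sketch run through Python's standard sqlite3 module against an in-memory SQLite database (3.25 or later, for window functions) rather than SQL Server. The quotes rows are invented, only the Close column is kept, and the percentage is rounded to two decimals for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quotes (Symbol TEXT, Date TEXT, [Close] REAL);
INSERT INTO quotes VALUES
  ('ABC', '2024-01-02', 100.0),
  ('ABC', '2024-01-03', 110.0),
  ('ABC', '2024-01-05',  99.0),
  ('XYZ', '2024-01-02',  50.0),
  ('XYZ', '2024-01-04',  55.0);
""")

# Number the sessions per symbol, then join each row to the row numbered
# one lower, which is the previous session even when dates have gaps.
changes = conn.execute("""
WITH ordered AS (
  SELECT Symbol, Date, [Close],
         ROW_NUMBER() OVER (PARTITION BY Symbol ORDER BY Date) AS rn
  FROM quotes
)
SELECT cur.Symbol, cur.Date,
       ROUND(100.0 * (cur.[Close] - prev.[Close]) / prev.[Close], 2) AS pct_change
FROM ordered cur
JOIN ordered prev
  ON prev.Symbol = cur.Symbol AND prev.rn = cur.rn - 1
ORDER BY cur.Symbol, cur.Date
""").fetchall()

print(changes)
```

Because this uses an inner join, the first session of each symbol has no previous row and is left out of the result.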

You can use a CTE with the ROW_NUMBER ranking function:
;WITH cte AS
(
SELECT symbol, date, [Open], [High], [Low], [Close],
ROW_NUMBER() OVER(PARTITION BY symbol ORDER BY date) AS Id
FROM quotes
)
SELECT c1.Id, c1.symbol, c1.date, c1.[Open], c1.[High], c1.[Low], c1.[Close],
ISNULL(c2.[Close] / c1.[Close], 0) AS perc
FROM cte c1 LEFT JOIN cte c2 ON c1.symbol = c2.symbol AND c1.Id = c2.Id + 1
ORDER BY c1.symbol, c1.date
To improve performance (avoiding a sort and RID lookups), use this index:
CREATE INDEX ix_symbol$date_quotes ON quotes(symbol, date) INCLUDE([Open], [High], [Low], [Close])

What you had is fine. I don't know if translating the sub-query into the join will help. However, you asked for it, so the way to do it might be to join the table to itself once more.
select *
from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and t1.date > t2.date
left outer join quotes t3
on t1.symbol = t3.symbol and t3.date > t2.date and t3.date < t1.date
where t3.date is null

You could do something like this:
DECLARE @Today DATETIME
SELECT @Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP))
;WITH today AS
(
SELECT Id ,
Symbol ,
Date ,
[OPEN] ,
High ,
LOW ,
[CLOSE],
DATEADD(DAY, -1, Date) AS yesterday
FROM quotes
WHERE date = @Today
)
SELECT *
FROM today
LEFT JOIN quotes yesterday ON today.Symbol = yesterday.Symbol
AND today.yesterday = yesterday.Date
That way you limit your "today" results, if that's an option.
EDIT: The CTEs given in the other answers may work well, but I tend to be hesitant to use ROW_NUMBER when dealing with 100K rows or more. If the previous day may not always be yesterday, I prefer to pull the lookup of the previous day out into its own query and then use the result for reference:
DECLARE @Today DATETIME, @PreviousDay DATETIME
SELECT @Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP));
SELECT @PreviousDay = MAX(Date) FROM quotes WHERE Date < @Today;
WITH today AS
(
SELECT Id ,
Symbol ,
Date ,
[OPEN] ,
High ,
LOW ,
[CLOSE]
FROM quotes
WHERE date = @Today
)
SELECT *
FROM today
LEFT JOIN quotes AS previousday
ON today.Symbol = previousday.Symbol
AND previousday.Date = @PreviousDay

Filling in missing dates DB2 SQL

My initial query looks like this:
select process_date, count(*) batchCount
from T1.log_comments
group by process_date
order by process_date asc;
I need to be able to do some quick analysis for weekends that are missing, but wanted to know if there was a quick way to fill in the missing dates not present in process_date.
I've seen the solution here but am curious if there's any magic hidden in db2 that could do this with only a minor modification to my original query.

Note: This was originally framed from my experience with SQL Server/Oracle; it has since been amended and tested on DB2.
WITH MaxDateQry(MaxDate) AS
(
SELECT MAX(process_date) FROM T1.log_comments
),
MinDateQry(MinDate) AS
(
SELECT MIN(process_date) FROM T1.log_comments
),
DatesData(ProcessDate) AS
(
SELECT MinDate from MinDateQry
UNION ALL
SELECT (ProcessDate + 1 DAY) FROM DatesData WHERE ProcessDate < (SELECT MaxDate FROM MaxDateQry)
)
SELECT a.ProcessDate, b.batchCount
FROM DatesData a LEFT JOIN
(
SELECT process_date, COUNT(*) batchCount
FROM T1.log_comments
GROUP BY process_date
) b
ON a.ProcessDate = b.process_date
ORDER BY a.ProcessDate ASC;
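The same gap-filling idea can be tried out without a DB2 instance. The sketch below uses Python's standard sqlite3 module and an in-memory SQLite database, so date(d, '+1 day') replaces DB2's + 1 DAY, the table is called log_comments without a schema, and the sample rows are made up. It also counts through the left join directly, so missing days show 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log_comments (process_date TEXT);
INSERT INTO log_comments VALUES
  ('2024-03-01'), ('2024-03-01'), ('2024-03-02'), ('2024-03-05');
""")

# Build every date from the earliest to the latest process_date with a
# recursive CTE, then left join the log so days without rows still appear.
filled = conn.execute("""
WITH RECURSIVE dates(d) AS (
  SELECT MIN(process_date) FROM log_comments
  UNION ALL
  SELECT date(d, '+1 day') FROM dates
  WHERE d < (SELECT MAX(process_date) FROM log_comments)
)
SELECT d, COUNT(l.process_date) AS batchCount
FROM dates
LEFT JOIN log_comments l ON l.process_date = d
GROUP BY d
ORDER BY d
""").fetchall()

print(filled)
```

The two days with no log rows, 2024-03-03 and 2024-03-04, come back with a count of 0.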
