Sort many ids in a table in SQL Server - sql

I have been given a task: look at the items table, grab the first and the last item of 2019, and set their active flags to active. The query I wrote can only grab them one store at a time, and it would take days to finish if I have no other choice. Here is my query in SQL Server:
SELECT *
FROM NODES
WHERE [NODE ID] = 5562
AND DATE BETWEEN '2019/01/01' AND '2019/12/30'
Basically I need the first and the last item for the year. The problem is that every node is a specific store with many records, and I would have to run the query over millions of records across many nodes. Is it possible to tell SQL Server, for a given set of nodes, to take the first and last item for 2019, display them to me, and then update their active flag to 'Y'?
Is it possible with a CTE, do I need a CTE at all?
Thank you

If I understood correctly, you could try using a CTE with a windowed function to fetch only the first row from each store after ordering by date in ascending order and the first row from each store after ordering by date in descending order.
For instance:
CREATE TABLE NODES (NodeId int,NodeDate DATETIME2,status NVARCHAR(128))
INSERT INTO NODES(NodeId,NodeDate,Status) VALUES
(1,'2019/01/01','inactive'),
(1,'2019/03/01','inactive'),
(1,'2019/06/01','inactive'),
(1,'2019/09/01','inactive'),
(1,'2019/12/01','inactive'),
(2,'2019/01/01','inactive'),
(2,'2019/03/01','inactive'),
(2,'2019/06/01','inactive'),
(2,'2019/09/01','inactive'),
(2,'2019/12/01','inactive'),
(3,'2019/01/01','inactive'),
(3,'2019/03/01','inactive'),
(3,'2019/06/01','inactive'),
(3,'2019/09/01','inactive'),
(3,'2019/12/01','inactive')
;WITH cte AS
(
    SELECT status,
           ROW_NUMBER() OVER (PARTITION BY NodeId ORDER BY NodeDate ASC) AS FirstDate,
           ROW_NUMBER() OVER (PARTITION BY NodeId ORDER BY NodeDate DESC) AS LastDate
    FROM NODES
    WHERE NodeDate >= '2019/01/01' AND NodeDate < '2020/01/01'
)
UPDATE cte SET status = 'active'
WHERE FirstDate = 1 OR LastDate = 1;
SELECT * FROM NODES;
Please note, however, that this operation can be non-deterministic if multiple rows for the same store share the same date.
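One way to make the choice repeatable is to add a unique column as a tie-breaker in both ORDER BY clauses. Here is a minimal sketch of that idea. It runs on SQLite through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) instead of SQL Server, and NodeKey is a hypothetical unique key that is not part of the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Nodes (NodeKey INTEGER PRIMARY KEY, NodeId INT, NodeDate TEXT, Status TEXT);
INSERT INTO Nodes (NodeKey, NodeId, NodeDate, Status) VALUES
  (1, 1, '2019-01-01', 'inactive'),
  (2, 1, '2019-01-01', 'inactive'),  -- same date as NodeKey 1: a tie
  (3, 1, '2019-06-01', 'inactive'),
  (4, 1, '2019-12-01', 'inactive');
""")

# SQLite cannot update through a CTE, so the CTE only picks the keys to flag.
# NodeKey (an assumed unique column) breaks ties between equal dates.
conn.execute("""
WITH ranked AS (
  SELECT NodeKey,
         ROW_NUMBER() OVER (PARTITION BY NodeId ORDER BY NodeDate ASC,  NodeKey ASC)  AS FirstDate,
         ROW_NUMBER() OVER (PARTITION BY NodeId ORDER BY NodeDate DESC, NodeKey DESC) AS LastDate
  FROM Nodes
  WHERE NodeDate >= '2019-01-01' AND NodeDate < '2020-01-01'
)
UPDATE Nodes SET Status = 'active'
WHERE NodeKey IN (SELECT NodeKey FROM ranked WHERE FirstDate = 1 OR LastDate = 1)
""")

active_keys = [row[0] for row in conn.execute(
    "SELECT NodeKey FROM Nodes WHERE Status = 'active' ORDER BY NodeKey")]
print(active_keys)
```

In SQL Server you can keep the UPDATE cte form and only extend the two ORDER BY clauses with whatever unique column your table has.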
See also:
Get top 1 row of each group

Related

How Can I Retrieve The Earliest Date and Status Per Each Distinct ID

I have been trying to write a query for this case but can't seem to get it right, because I am still receiving duplicates. I am hoping I can get help fixing this issue.
SELECT DISTINCT
s.Client,
s.ID,
s.Thing,
s.Status,
MIN(s.StatusDate) as 'statdate'
FROM
SAMPLE s
WHERE
[]
GROUP BY
s.Client,
s.ID,
s.Thing,
s.Status
My output is as follows:
Client    Id   Thing   Status    Statdate
CompanyA  123  Thing1  Approved  12/9/2019
CompanyA  123  Thing1  Denied    12/6/2019
So although the query is doing what I asked and showing the minimum status date per status, I want only the first status date per ID. I have about 30k rows to filter through, so I need something that will not overload the query and prevent it from running. Any help would be appreciated.
Use window functions:
SELECT s.*
FROM (SELECT s.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY StatusDate) as seqnum
      FROM SAMPLE s
      WHERE []
     ) s
WHERE seqnum = 1;
This returns the first row for each id.
Use whichever of these you feel more comfortable with/understand:
SELECT
*
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY statusdate) as rn
FROM sample
WHERE ...
) x
WHERE rn = 1
That one numbers all rows sequentially in order of StatusDate, restarting the numbering from 1 every time ID changes. If you then collect all the rows numbered 1, you have your set of "first records".
Or you can join back on a MIN:
SELECT
*
FROM
sample s
INNER JOIN
(SELECT ID, MIN(statusDate) as minDate FROM sample WHERE ... GROUP BY ID) mins
ON s.ID = mins.ID and s.StatusDate = mins.MinDate
WHERE
...
This one prepares a list of every ID and its min date, then joins it back to the main table. You thus get back all the data that was lost during the grouping operation. You cannot simultaneously "keep data" and "throw away data" during a group: if you group by more than just ID, you get more groups (as you have found), and if you only group by ID, you lose the other columns. There isn't any way to say "GROUP BY id, AND take the MIN date, AND also take all the other data from the same row as the min date" without doing a "group by id, take min date, then join this data set back to the main dataset to get the other data for that min date". If you try to do it all in a single grouping you'll fail, because you either have to group by more columns or use aggregate functions for the other data in the SELECT, which mixes your data up. Once the groups are formed, the concept of "other data from the same row" is gone.
Be aware that this can return duplicate rows if two records for the same ID share the minimum StatusDate. The ROW_NUMBER form doesn't return duplicate records, but if two records have the same minimum StatusDate then which one you get is arbitrary. To force a specific one, ORDER BY additional columns so you can be sure which row ends up numbered 1.
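To see the difference on a tie, here is a small sketch run on SQLite through Python's sqlite3 module (not SQL Server; window functions need SQLite 3.25 or later). The 'Pending' row and the RowKey tie-breaker column are made up for illustration and are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (RowKey INTEGER PRIMARY KEY, Client TEXT, ID INT,
                     Thing TEXT, Status TEXT, StatusDate TEXT);
INSERT INTO sample VALUES
  (1, 'CompanyA', 123, 'Thing1', 'Denied',   '2019-12-06'),
  (2, 'CompanyA', 123, 'Thing1', 'Pending',  '2019-12-06'),  -- hypothetical tie on the min date
  (3, 'CompanyA', 123, 'Thing1', 'Approved', '2019-12-09');
""")

# Join-back form: every row that carries the minimum date is returned.
join_rows = conn.execute("""
SELECT s.Status
FROM sample s
JOIN (SELECT ID, MIN(StatusDate) AS MinDate FROM sample GROUP BY ID) mins
  ON s.ID = mins.ID AND s.StatusDate = mins.MinDate
ORDER BY s.RowKey
""").fetchall()

# ROW_NUMBER form: exactly one row per ID; RowKey decides which one wins the tie.
rn_rows = conn.execute("""
SELECT Status
FROM (
  SELECT s.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY StatusDate, RowKey) AS rn
  FROM sample s
) x
WHERE rn = 1
""").fetchall()

print(join_rows, rn_rows)
```

With the tie-breaker in place the ROW_NUMBER form always picks the same row, while the join-back form keeps both tied rows.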

How to find the distinct records when a value was changed in a table with daily snap shots

I have a table that has a SNAP_EFF_DT (date the record was inserted into the table) field. All records are inserted on a daily basis to record any changes a specific record may have. I want to pull out only the dates and values when a change took place from a previous date.
I am using Teradata SQL Assistant to query this data. This is what I have so far:
SEL DISTINCT MIN(a.SNAP_EFF_DT) as SNAP_EFF_DT, CLIENT_ID, FAVORITE_COLOR
FROM CUSTOMER_TABLE a
GROUP BY 2,3;
This does give me the first instance of a change to a specific color. However, if a customer first likes blue on 1/1/2019, then changes to green on 2/1/2019, and then changes back to blue on 3/1/2019, I won't get that last change in the results and will assume their current favorite color is green, when in fact it changed back to blue. I would like a query that returns all three changes.
Simply use LAG to compare the current and the previous row's color:
SELECT t.*,
LAG(FAVORITE_COLOR)
OVER (PARTITION BY CLIENT_ID
ORDER BY SNAP_EFF_DT) AS prev_color
FROM CUSTOMER_TABLE AS t
QUALIFY
FAVORITE_COLOR <> prev_color
OR prev_color IS NULL
If your Teradata version doesn't support LAG, switch to:
MIN(FAVORITE_COLOR)
OVER (PARTITION BY CLIENT_ID
ORDER BY SNAP_EFF_DT
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_color
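As a quick check of the LAG approach on the blue, green, blue scenario from the question, here is a sketch run on SQLite through Python's sqlite3 module (SQLite 3.25 or later). SQLite has no QUALIFY, so the filter moves to an outer WHERE over a derived table; the snapshot rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER_TABLE (SNAP_EFF_DT TEXT, CLIENT_ID INT, FAVORITE_COLOR TEXT);
INSERT INTO CUSTOMER_TABLE VALUES
  ('2019-01-01', 1, 'blue'),
  ('2019-01-02', 1, 'blue'),
  ('2019-02-01', 1, 'green'),
  ('2019-02-02', 1, 'green'),
  ('2019-03-01', 1, 'blue');
""")

# Keep a row only when its color differs from the previous snapshot's color,
# or when there is no previous snapshot for the client.
changes = conn.execute("""
SELECT SNAP_EFF_DT, FAVORITE_COLOR
FROM (
  SELECT t.*,
         LAG(FAVORITE_COLOR) OVER (PARTITION BY CLIENT_ID ORDER BY SNAP_EFF_DT) AS prev_color
  FROM CUSTOMER_TABLE AS t
) AS d
WHERE FAVORITE_COLOR <> prev_color OR prev_color IS NULL
ORDER BY SNAP_EFF_DT
""").fetchall()

print(changes)
```

All three changes come back, including the switch back to blue that the MIN/GROUP BY query drops.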
One method uses JOIN:
select ct.*
from CUSTOMER_TABLE ct left join
CUSTOMER_TABLE ctprev
on ctprev.client_id = ct.client_id AND
ctprev.SNAP_EFF_DT = ct.SNAP_EFF_DT - interval '1' day
where ctprev.client_id is null or
(ctprev.FAVORITE_COLOR <> ct.FAVORITE_COLOR or
. . .
);
Note: This assumes that the values are not null, although the logic can be adjusted to handle null values as well.

SQL find nearest date without going over, or return the oldest record

I have a view in SQL Server with prices of items over time. My users will be passing a date variable and I want to return the closest record without going over, or if no such record exists return the oldest record present. For example, with the data below, if the user passes April for item A it will return the March record and for item B it will return the June record.
I've tried a lot of variations with Union All and Order by but keep getting a variety of errors. Is there a way to write this using a Case Statement?
example:
case when min(Month)>Input Date then min(Month)
else max(Month) where Month <= Input Date?
Sincere apologies for attaching sample dataset as an image, I couldn't get it to format right otherwise.
Sample Dataset
You can use SELECT TOP (1) with ORDER BY date DESC, filtered by item and by a date comparison, to get the latest record. ORDER BY sorts the records by date, so you get the latest one from the requested month (if it exists) or otherwise from an earlier month.
Here's a rough outline of a query (without more of your table it's hard to be exact):
WITH CTE AS
(
    SELECT
        ITEM,
        PRICE,
        ACTUAL_DATE,
        -- oldest date for the item: the fallback
        MIN(ACTUAL_DATE) OVER (PARTITION BY ITEM) AS MIN_DATE,
        -- latest date on or before the input date; NULL if there is none
        MAX(CASE WHEN ACTUAL_DATE <= @INPUT_DATE THEN ACTUAL_DATE END) OVER (PARTITION BY ITEM) AS MATCHED_DATE
    FROM [TABLE] -- placeholder for your view
)
SELECT
    CTE.ITEM,
    CTE.PRICE,
    CTE.ACTUAL_DATE AS MOSTLY_MATCHED_DATE
FROM CTE
WHERE CTE.ACTUAL_DATE =
    CASE
        WHEN CTE.MATCHED_DATE IS NOT NULL
        THEN CTE.MATCHED_DATE
        ELSE CTE.MIN_DATE
    END
The idea is that in a Common Table Expression, you use windowed aggregates with PARTITION BY to identify the key dates for each item, record by record, and then you filter to pull either your matched record or, if there is none, your default (oldest) record. Here @INPUT_DATE stands for the date your users pass in.
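The sample dataset in the question was an image, so the following sketch uses invented prices that match its description: passing April should return the March record for item A and the June (oldest) record for item B. It runs on SQLite through Python's sqlite3 module instead of SQL Server, and the table name, column names, and the price_as_of helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Prices (Item TEXT, PriceDate TEXT, Price REAL);
INSERT INTO Prices VALUES
  ('A', '2019-01-01', 10.0),
  ('A', '2019-03-01', 11.0),
  ('A', '2019-05-01', 12.0),
  ('B', '2019-06-01', 20.0),
  ('B', '2019-08-01', 21.0);
""")

def price_as_of(input_date):
    """Per item: the latest row on or before input_date, else the oldest row."""
    return conn.execute("""
    WITH marked AS (
      SELECT Item, PriceDate, Price,
             MIN(PriceDate) OVER (PARTITION BY Item) AS MinDate,
             MAX(CASE WHEN PriceDate <= :d THEN PriceDate END)
               OVER (PARTITION BY Item) AS MatchedDate
      FROM Prices
    )
    SELECT Item, PriceDate, Price
    FROM marked
    WHERE PriceDate = COALESCE(MatchedDate, MinDate)
    ORDER BY Item
    """, {"d": input_date}).fetchall()

result = price_as_of("2019-04-01")
print(result)
```

If an item can have several rows on the same date, this returns all of them, so you may still want a tie-breaker.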

Get a Row if within certain time period of other row

I have a SQL statement that I am currently using to return a number of rows from a database:
SELECT
as1.AssetTagID, as1.TagID, as1.CategoryID,
as1.Description, as1.HomeLocationID, as1.ParentAssetTagID
FROM Assets AS as1
INNER JOIN AssetsReads AS ar ON as1.AssetTagID = ar.AssetTagID
WHERE
(ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
I want a query that gets the row with the oldest DateScanned from this query and also gets another row from the database if there is one within a certain period of time of this row (say 5 seconds, for example). The oldest record would be relatively simple to get by selecting the first record in an ascending sort, but how would I also get the second record if it was within a certain time period of the first?
I know I could do this process with multiple queries, but is there any way to combine this process into one query?
The database that I am using is SQL Server 2008 R2.
Also please note that the DateScanned times are just placeholders and I am taking care of that in the application that will be using this query.
Here is a fairly general way to approach it. Get the oldest scan date using min() as a window function, then use date arithmetic to get any rows you want:
select t.* -- or whatever fields you want
from (SELECT as1.AssetTagID, as1.TagID, as1.CategoryID,
as1.Description, as1.HomeLocationID, as1.ParentAssetTagID,
min(DateScanned) over () as minDateScanned, DateScanned
FROM Assets AS as1
INNER JOIN AssetsReads AS ar ON as1.AssetTagID = ar.AssetTagID
WHERE (ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
) t
where datediff(second, minDateScanned, DateScanned) <= 5;
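A small sketch of the same pattern, run on SQLite through Python's sqlite3 module instead of SQL Server. SQLite has no DATEDIFF, so strftime('%s', ...) stands in for DATEDIFF(second, ...) here; the table is cut down to two columns and the scan times are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AssetsReads (AssetTagID INT, DateScanned TEXT);
INSERT INTO AssetsReads VALUES
  (1, '2019-01-01 10:00:00'),
  (2, '2019-01-01 10:00:03'),
  (3, '2019-01-01 10:00:20');
""")

# MIN(...) OVER () attaches the oldest scan time to every row, so each row
# can be compared with it in the outer WHERE.
rows = conn.execute("""
SELECT AssetTagID
FROM (
  SELECT AssetTagID, DateScanned,
         MIN(DateScanned) OVER () AS MinDateScanned
  FROM AssetsReads
) t
WHERE CAST(strftime('%s', DateScanned) AS INTEGER)
    - CAST(strftime('%s', MinDateScanned) AS INTEGER) <= 5
ORDER BY DateScanned
""").fetchall()

print(rows)
```

The oldest row always qualifies (its difference is 0), and any other row is kept only if it falls inside the 5-second window.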
I am not really sure of SQL Server syntax, but you can do something like this (note that LAG requires SQL Server 2012 or later, so it will not run on 2008 R2):
SELECT * FROM (
    SELECT
        TOP 2
        as1.AssetTagID,
        as1.TagID,
        as1.CategoryID,
        as1.Description,
        as1.HomeLocationID,
        as1.ParentAssetTagID,
        ar.DateScanned,
        LAG(ar.DateScanned) OVER (ORDER BY ar.DateScanned ASC) AS lagging
    FROM
        Assets AS as1
        INNER JOIN AssetsReads AS ar
        ON as1.AssetTagID = ar.AssetTagID
    WHERE (ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
        AND (ar.DateScanned between 'LastScan' AND 'Now')
        AND as1.TagID!='000000000000000000000000'
    ORDER BY
        ar.DateScanned ASC
) t
WHERE
    lagging IS NULL OR DATEDIFF(second, lagging, DateScanned) < 5
I sort the results by DateScanned ascending, so that the oldest scan comes first, and keep just the top 2 rows. I then use the LAG() function on the DateScanned field to get the DateScanned value of the previous row. For the topmost row lagging will be NULL, as it is the first record, but for the second row it will be the DateScanned value of the first row. You can then compare the two values to decide whether to display the second row.
More info on the LAG function: http://blog.sqlauthority.com/2011/11/15/sql-server-introduction-to-lead-and-lag-analytic-functions-introduced-in-sql-server-2012/

report builder - queries returning incorrect results

I have set a reporting project where I would like to get stats for my tables and later integrate this into a webservice. For the following queries though, I am getting incorrect results and I will note where below:
1 - Get the number of new entries for a given day
SELECT COUNT(*) AS RecordsCount,
CAST(FLOOR(CAST(dateadded AS float)) AS datetime) AS collectionDate
FROM TFeed
GROUP BY CAST(FLOOR(CAST(dateadded AS float)) AS datetime)
ORDER BY collectionDate
works fine and I am able to put this in a bar graph successfully.
2 - Get the top 10 searchterms with the highest records per searchterm requested by a given client in the last 10 days
SELECT TOP 10 searchterm, clientId, COUNT(*) AS TermResults
FROM TFeed
WHERE dateadded > GETDATE() - 10
GROUP BY searchterm, clientId
ORDER BY TermResults DESC
does not work
If I do a query in the Database for one of those terms that returns 98 in the report, the result is 984 in the database.
3 - I need to get the number of new records per client for a given day as well.
Also, I was wondering if it is possible to put these queries into one report rather than an individual report for each query. It is not a big deal, but having to cut and paste into one document afterwards is tedious.
Any ideas appreciated
For #2,
WITH tmp as
(
SELECT clientId, searchTerm, COUNT(1) as TermResults,
DENSE_RANK() OVER (partition by clientId
ORDER BY clientId, COUNT(1) DESC) as rnk
FROM TFeed
WHERE dateadded > GETDATE() - 10
GROUP BY clientId, searchterm
)
SELECT *
FROM tmp
WHERE rnk < 11
Use RANK() instead if you want to skip a rank after ties: if, say, term1 and term2 have the same count, they are both ranked 1 and the following term is ranked 3rd instead of 2nd.
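To make the difference between the two ranking functions concrete, here is a small sketch run on SQLite through Python's sqlite3 module (SQLite 3.25 or later). The Counts table and its numbers are invented; it stands in for the already aggregated TermResults per search term.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Counts (searchterm TEXT, TermResults INT);
INSERT INTO Counts VALUES ('term1', 50), ('term2', 50), ('term3', 40);
""")

# term1 and term2 tie on TermResults; compare how each function ranks term3.
ranks = conn.execute("""
SELECT searchterm,
       RANK()       OVER (ORDER BY TermResults DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY TermResults DESC) AS dense_rnk
FROM Counts
ORDER BY TermResults DESC, searchterm
""").fetchall()

print(ranks)
```

With a filter such as rnk < 11, DENSE_RANK can therefore return more than 10 rows per client when counts tie, and RANK can too if the tie sits at the cutoff.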
For #3,
you can define multiple datasets within one report. Then you would just create three charts or tables and associate each one with its respective dataset.