How to get the second row by date and by id, without group by (not sure about row number) - sql

My dataset is about sales; each row corresponds to an invoice. The same customer can have two or more rows on the same day if they bought more than once that day.
As you can see in the image below, the blue square shows that customer 355122 (id_cliente = customer id) bought twice (invoice ids 275831N and 275826N) on the same day (2020-12-19) (penult_data = second-last date). This query is meant to be a support table that I left join to the main table to bring in those results.
First, I created a row number partitioned only by customer id (blue arrow, aux), so that I could join on aux = 2 (which should be the second-last row). But when the customer bought twice on the same day, the second-last invoice does not fall on the second-last date he bought. I need the second-last DATE. He can buy 1, 2, 3, 4, 5 times a day, so I cannot assume any fixed aux value to filter on.
I also created aux2, a row number partitioned by customer and date, but it didn't help. I need something that repeats the index for the same date, so that index = 2 is the second-last date.
I cannot use GROUP BY because I'm also retrieving the salesman id (penult_vend), the store id (penult_empe), and so on from the second-last date.
This is the output of part of the query I'm using (as I said, the support table to left join to the main table). I'm filtering on this customer's id.
Does anybody know a function or method to make this work?
I'm using Google BigQuery.
Thanks

Assuming the column penult_data holds only the date, with no time of day, you can find the second-to-last date and then the last invoice on that date by using the DENSE_RANK() and ROW_NUMBER() functions:
dense_rank() over(partition by id_cliente
                  order by penult_data desc) as rnd,
row_number() over(partition by id_cliente, penult_data
                  order by penult_nf desc) as rnf,
Then, you can use the filtering condition:
where rnd = 2 and rnf = 1
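To see the two rankings working together, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The table name sales, the two extra invoices, and the dates other than 2020-12-19 are made up for illustration, and SQLite's dialect stands in for BigQuery, so treat it as a demonstration of the logic rather than the exact BigQuery query.

```python
import sqlite3

# Hypothetical invoices: customer 355122 bought twice on 2020-12-19.
# Only the two invoices on 2020-12-19 come from the question; the rest are invented.
rows = [
    (355122, "2020-12-21", "275900N"),
    (355122, "2020-12-19", "275831N"),
    (355122, "2020-12-19", "275826N"),
    (355122, "2020-12-10", "275100N"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id_cliente INTEGER, penult_data TEXT, penult_nf TEXT)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

query = """
SELECT id_cliente, penult_data, penult_nf
FROM (
    SELECT s.*,
           -- same rank for every invoice on the same date
           DENSE_RANK() OVER (PARTITION BY id_cliente
                              ORDER BY penult_data DESC) AS rnd,
           -- numbering restarts on each date; 1 = highest invoice id that day
           ROW_NUMBER() OVER (PARTITION BY id_cliente, penult_data
                              ORDER BY penult_nf DESC) AS rnf
    FROM sales AS s
) AS ranked
WHERE rnd = 2 AND rnf = 1
"""
second_last = conn.execute(query).fetchall()
print(second_last)
```

Because DENSE_RANK gives both 2020-12-19 invoices the same rank, rnd = 2 selects the second-last date no matter how many purchases happened on the last date, and rnf = 1 keeps a single invoice from it.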

Related

How Can I Retrieve The Earliest Date and Status Per Each Distinct ID

I have been trying to write a query for this but can't get it right: I am still receiving duplicates. I'm hoping someone can help me fix this.
SELECT DISTINCT
    s.Client,
    s.ID,
    s.Thing,
    s.Status,
    MIN(s.StatusDate) AS statdate
FROM
    SAMPLE s
WHERE
    []
GROUP BY
    s.Client,
    s.ID,
    s.Thing,
    s.Status
My output is as follows
Client Id Thing Status Statdate
CompanyA 123 Thing1 Approved 12/9/2019
CompanyA 123 Thing1 Denied 12/6/2019
So although the query does what I asked and shows the minimum status date per status, I want only the first status date per ID. I have about 30k rows to filter through, so I need something that will not overload the query and keep it from running. Any help would be appreciated.
Use window functions:
SELECT s.*
FROM (SELECT s.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY StatusDate) as seqnum
      FROM SAMPLE s
      WHERE []
     ) s
WHERE seqnum = 1;
This returns the first row for each id.
Use whichever of these you feel more comfortable with/understand:
SELECT
    *
FROM
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY statusdate) as rn
    FROM sample
    WHERE ...
) x
WHERE rn = 1
The way that one works is to number all rows sequentially in order of StatusDate, restarting the numbering from 1 every time ID changes. If you then collect all the rows numbered 1, you have your set of "first records".
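A small runnable sketch of that numbering, using Python's sqlite3 as a stand-in for your database. The two CompanyA rows are the ones from the question (dates rewritten as ISO strings so they sort correctly as text); the CompanyB row is invented to show the numbering restarting for a new ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample "
    "(Client TEXT, ID INTEGER, Thing TEXT, Status TEXT, StatusDate TEXT)"
)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?, ?, ?)",
    [
        ("CompanyA", 123, "Thing1", "Approved", "2019-12-09"),
        ("CompanyA", 123, "Thing1", "Denied", "2019-12-06"),
        ("CompanyB", 456, "Thing2", "Approved", "2019-11-01"),  # hypothetical row
    ],
)

first_rows = conn.execute(
    """
    SELECT Client, ID, Thing, Status, StatusDate
    FROM (
        -- rn restarts at 1 for every ID; 1 = earliest StatusDate
        SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY StatusDate) AS rn
        FROM sample
    ) AS x
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()

for row in first_rows:
    print(row)
```

Each ID comes back exactly once, carrying the Status that belongs to its earliest StatusDate.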
Or you can join back to a MIN:
SELECT
    *
FROM
    sample s
INNER JOIN
    (SELECT ID, MIN(statusDate) as minDate FROM sample WHERE ... GROUP BY ID) mins
    ON s.ID = mins.ID and s.StatusDate = mins.MinDate
WHERE
    ...
This one prepares a list of every ID and its min date, then joins it back to the main table, so you get back all the data that was lost during the grouping operation. You cannot simultaneously "keep data" and "throw away data" during a group: if you group by more than just ID, you get more groups (as you have found); if you group only by ID, you lose the other columns. There is no way to say "GROUP BY id, AND take the MIN date, AND also take all the other data from the same row as the min date" without doing a "group by id, take min date, then join this data set back to the main dataset to get the other data for that min date". If you try to do it all in a single grouping you'll fail, because you either have to group by more columns or use aggregate functions for the other data in the SELECT, which mixes your data up; once the groups are formed, the concept of "other data from the same row" is gone.
Be aware that this can return duplicate rows if two records share the same min date. The ROW_NUMBER form doesn't return duplicate records, but if two records have the same minimum StatusDate, which one you get is arbitrary. To force a specific one, ORDER BY additional columns so you can be sure which row ends up numbered 1.
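The tie behaviour is easy to reproduce. The sketch below (Python with an in-memory SQLite table, invented rows) gives ID 123 two records on the same earliest date: the join-back form returns both, while ROW_NUMBER with Status added as a tiebreaker returns exactly one. The choice of Status as the tiebreaker is only an example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (ID INTEGER, Status TEXT, StatusDate TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        # hypothetical data: two statuses share the earliest date
        (123, "Approved", "2019-12-06"),
        (123, "Denied", "2019-12-06"),
        (123, "Closed", "2019-12-09"),
    ],
)

# Join back to MIN: every row matching the min date is returned.
join_rows = conn.execute(
    """
    SELECT s.ID, s.Status, s.StatusDate
    FROM sample AS s
    INNER JOIN (SELECT ID, MIN(StatusDate) AS MinDate
                FROM sample GROUP BY ID) AS mins
        ON s.ID = mins.ID AND s.StatusDate = mins.MinDate
    ORDER BY s.Status
    """
).fetchall()

# ROW_NUMBER with a tiebreaker: exactly one row, chosen deterministically.
rn_rows = conn.execute(
    """
    SELECT ID, Status, StatusDate
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY ID
                                     ORDER BY StatusDate, Status) AS rn
        FROM sample
    ) AS x
    WHERE rn = 1
    """
).fetchall()

print(join_rows)
print(rn_rows)
```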

I want NAV price as per (Today date minus 1) date

I have two tables. One is NAV, where each product's new price is updated daily. The second is TDK, where item-wise stock is available.
Now I want a summary report by buyer name, showing a total per product along with the latest price from the first table.
I have tried below query...
SELECT dbo.TDK.buyer, dbo.NAV.Product_Name, sum(dbo.TDK.TD_UNITS) as Units, sum(dbo.TDK.TD_AMT) as 'Amount', dbo.NAV.NAValue
FROM dbo.TDK
INNER JOIN dbo.NAV
    ON dbo.TDK.Products = dbo.NAV.Product_Name
group by dbo.TDK.buyer, dbo.NAV.Product_Name, dbo.NAV.NAValue
Important: the common columns in the two tables are:
Table one, NAV, has the column Product_Name
Table two, TDK, has the column Products
If NAV has 4 NAValue records for one product, this query shows 4 lines with the same total.
What do I need?
I want this query to show only one line, with the latest NAValue price.
I want display one more line with Units*NAValue (latest) as "Latest Market Value".
Please guide.
Which field contains the quote date? I am assuming you have a DATETIME field, quoteDate, in the dbo.NAV table, and that you store only the date part (i.e. midnight, time = 00:00:00).
SELECT
    t.buyer,
    n.Product_Name,
    sum(t.TD_UNITS) as Units,
    sum(t.TD_AMT) as 'Amount',
    n.NAValue
FROM dbo.TDK t
INNER JOIN dbo.NAV n
    ON t.Products = n.Product_Name
    AND n.quoteDate > getdate()-2
group by t.buyer, n.Product_Name, n.NAValue, n.QuoteDate
GETDATE() returns the current date and time. Subtracting 2 gives a moment that falls after midnight of the day before yesterday but before midnight of yesterday, so a quote stamped at midnight yesterday passes the filter.
Also, add n.quoteDate to your SELECT and GROUP BY. Even though you don't need it in the output, it helps on a day when the NAV table has bad data with a double record, one stamped at midnight and another at 6 PM.
Your code looks like SQL Server. I think you just want APPLY:
SELECT t.buyer, n.Product_Name, t.TD_UNITS as Units, t.TD_AMT as Amount, n.NAValue
FROM dbo.TDK t CROSS APPLY
     (SELECT TOP (1) n.*
      FROM dbo.NAV n
      WHERE t.Products = n.Product_Name
      ORDER BY ?? DESC  -- however you define "latest"
     ) n;
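For anyone who wants to try the "latest price only" idea end to end, here is a rough sketch in Python with SQLite. SQLite has no CROSS APPLY or TOP, so this swaps in a ROW_NUMBER filter to pick the newest NAV row per product; it is a different technique with the same intent. The quoteDate column, the sample rows, and the LatestMarketValue alias are all assumptions, since the question does not show how "latest" is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# quoteDate is an assumed column; the question does not name the date field.
conn.execute("CREATE TABLE NAV (Product_Name TEXT, NAValue REAL, quoteDate TEXT)")
conn.execute(
    "CREATE TABLE TDK (buyer TEXT, Products TEXT, TD_UNITS INTEGER, TD_AMT REAL)"
)
conn.executemany(
    "INSERT INTO NAV VALUES (?, ?, ?)",
    [
        ("FundA", 10.0, "2020-01-01"),
        ("FundA", 12.0, "2020-01-02"),  # latest price for FundA
        ("FundB", 5.0, "2020-01-02"),
    ],
)
conn.executemany(
    "INSERT INTO TDK VALUES (?, ?, ?, ?)",
    [
        ("Alice", "FundA", 100, 1000.0),
        ("Alice", "FundA", 50, 550.0),
        ("Bob", "FundB", 10, 40.0),
    ],
)

report = conn.execute(
    """
    SELECT t.buyer,
           n.Product_Name,
           SUM(t.TD_UNITS) AS Units,
           SUM(t.TD_AMT) AS Amount,
           n.NAValue,
           SUM(t.TD_UNITS) * n.NAValue AS LatestMarketValue
    FROM TDK AS t
    INNER JOIN (
        -- rn = 1 marks the newest NAV row of each product
        SELECT Product_Name, NAValue,
               ROW_NUMBER() OVER (PARTITION BY Product_Name
                                  ORDER BY quoteDate DESC) AS rn
        FROM NAV
    ) AS n
        ON t.Products = n.Product_Name AND n.rn = 1
    GROUP BY t.buyer, n.Product_Name, n.NAValue
    ORDER BY t.buyer
    """
).fetchall()

for row in report:
    print(row)
```

Since only one NAV row per product survives the rn = 1 filter, the join no longer multiplies the totals, and each buyer/product pair appears on a single line.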

How to find the distinct records when a value was changed in a table with daily snap shots

I have a table that has a SNAP_EFF_DT (date the record was inserted into the table) field. All records are inserted on a daily basis to record any changes a specific record may have. I want to pull out only the dates and values when a change took place from a previous date.
I am using Teradata SQL Assistant to query this data. This is what I have so far:
SEL DISTINCT MIN(SNAP_EFF_DT) as SNAP_EFF_DT, CLIENT_ID, FAVORITE_COLOR
FROM CUSTOMER_TABLE
GROUP BY 2,3;
This does give me the first instance of a change to a specific color. However, if a customer first likes blue on 1/1/2019, then changes to green on 2/1/2019, and then changes back to blue on 3/1/2019, I won't get that last change in the results and would wrongly conclude that their current favorite color is green, when in fact it changed back to blue. I would like a query that returns all 3 changes.
Simply use LAG to compare the current and the previous row's color:
SELECT t.*,
   LAG(FAVORITE_COLOR)
   OVER (PARTITION BY CLIENT_ID
         ORDER BY SNAP_EFF_DT) AS prev_color
FROM CUSTOMER_TABLE AS t
QUALIFY
   FAVORITE_COLOR <> prev_color
   OR prev_color IS NULL
If your Teradata version doesn't support LAG, switch to
   MIN(FAVORITE_COLOR)
   OVER (PARTITION BY CLIENT_ID
         ORDER BY SNAP_EFF_DT
         ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_color
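Here is a quick way to check the LAG logic on the blue, green, blue example from the question, sketched in Python with SQLite. SQLite has no QUALIFY clause, so the filter moves to an outer WHERE over a subquery; the daily snapshot rows are invented, with dates written as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER_TABLE "
    "(SNAP_EFF_DT TEXT, CLIENT_ID INTEGER, FAVORITE_COLOR TEXT)"
)
conn.executemany(
    "INSERT INTO CUSTOMER_TABLE VALUES (?, ?, ?)",
    [
        # hypothetical snapshots: blue -> green -> back to blue
        ("2019-01-01", 1, "blue"),
        ("2019-01-02", 1, "blue"),
        ("2019-02-01", 1, "green"),
        ("2019-02-02", 1, "green"),
        ("2019-03-01", 1, "blue"),
    ],
)

changes = conn.execute(
    """
    SELECT SNAP_EFF_DT, CLIENT_ID, FAVORITE_COLOR
    FROM (
        SELECT t.*,
               LAG(FAVORITE_COLOR) OVER (PARTITION BY CLIENT_ID
                                         ORDER BY SNAP_EFF_DT) AS prev_color
        FROM CUSTOMER_TABLE AS t
    ) AS with_prev
    -- keep the first snapshot and every snapshot whose color differs
    -- from the one before it
    WHERE prev_color IS NULL OR FAVORITE_COLOR <> prev_color
    ORDER BY SNAP_EFF_DT
    """
).fetchall()

for row in changes:
    print(row)
```

All three changes come back, including the return to blue that the MIN/GROUP BY version drops.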
One method uses JOIN:
select ct.*
from CUSTOMER_TABLE ct left join
     CUSTOMER_TABLE ctprev
     on ctprev.client_id = ct.client_id AND
        ctprev.SNAP_EFF_DT = ct.SNAP_EFF_DT - interval '1' day
where ctprev.client_id is null or
      (ctprev.FAVORITE_COLOR <> ct.FAVORITE_COLOR or
       . . .
      );
Note: This assumes that the values are not null, although the logic can be adjusted to handle null values as well.

SQL find nearest date without going over, or return the oldest record

I have a view in SQL Server with prices of items over time. My users will be passing a date variable and I want to return the closest record without going over, or if no such record exists return the oldest record present. For example, with the data below, if the user passes April for item A it will return the March record and for item B it will return the June record.
I've tried a lot of variations with Union All and Order by but keep getting a variety of errors. Is there a way to write this using a Case Statement?
example:
case when min(Month)>Input Date then min(Month)
else max(Month) where Month <= Input Date?
Sincere apologies for attaching sample dataset as an image, I couldn't get it to format right otherwise.
Sample Dataset
You can use SELECT TOP (1) with ORDER BY date DESC, filtering on the item and on the date being no later than the input date, to get the latest matching record. ORDER BY sorts the records by date, so you get the latest one, either from this month (if it exists) or from an earlier month.
Here's a rough outline of a query (without more of your table it's hard to be exact):
WITH CTE AS
(
    SELECT
        ITEM,
        PRICE,
        ACTUAL_DATE,
        -- oldest date on record for the item (the fallback)
        MIN(ACTUAL_DATE) OVER (PARTITION BY ITEM) AS MIN_DATE,
        -- latest date not after the user's date; NULL when there is none
        -- (@INPUT_DATE is a placeholder for the date the user passes)
        MAX(CASE WHEN ACTUAL_DATE <= @INPUT_DATE THEN ACTUAL_DATE END)
            OVER (PARTITION BY ITEM) AS MATCHED_DATE
    FROM TABLE
)
SELECT
    CTE.ITEM,
    CTE.PRICE,
    CTE.ACTUAL_DATE AS MOSTLY_MATCHED_DATE
FROM CTE
WHERE CTE.ACTUAL_DATE =
    CASE
        WHEN CTE.MATCHED_DATE IS NOT NULL
        THEN CTE.MATCHED_DATE
        ELSE CTE.MIN_DATE
    END
The idea is that in a Common Table Expression, you use windowed aggregates with PARTITION BY to compute the key dates for each item on every record, and then filter to keep either the matched record or the default (oldest) record.
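As a sanity check of the "closest without going over, else oldest" rule, here is a sketch in Python with SQLite. The prices table and its rows are invented to mirror the example in the question (April input: item A should return March, item B should return June), and the query is one possible formulation, not necessarily what your view needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item TEXT, price REAL, actual_date TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        # hypothetical data shaped like the question's example
        ("A", 10.0, "2020-01-01"),
        ("A", 11.0, "2020-03-01"),
        ("A", 12.0, "2020-05-01"),
        ("B", 20.0, "2020-06-01"),
        ("B", 21.0, "2020-07-01"),
    ],
)

query = """
WITH cte AS (
    SELECT item, price, actual_date,
           -- oldest date per item, used when nothing matches
           MIN(actual_date) OVER (PARTITION BY item) AS min_date,
           -- latest date on or before the input date, NULL if none
           MAX(CASE WHEN actual_date <= :input_date THEN actual_date END)
               OVER (PARTITION BY item) AS matched_date
    FROM prices
)
SELECT item, price, actual_date
FROM cte
WHERE actual_date = COALESCE(matched_date, min_date)
ORDER BY item
"""
closest = conn.execute(query, {"input_date": "2020-04-01"}).fetchall()

for row in closest:
    print(row)
```

Item A has records before April, so it gets the March one; item B has none, so COALESCE falls back to its oldest record, June.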

analyze range and if true tell me

I want to see if the price of a stock has changed by 5% this week. I have data that captures the price every day. I can get the rows from the last 7 days by doing the following:
select price from data where date(capture_timestamp)>date(current_timestamp)-7;
But then how do I analyze that and see if the price has increased or decreased 5%? Is it possible to do all this with one sql statement? I would like to be able to then insert any results of it into a new table but I just want to focus on it printing out in the shell first.
Thanks.
It seems odd to have only one stock in a table called data. What you need to do is bring the two rows together for last week's and today's values, as in the following query:
select d.price
from data d cross join
     data dprev
where cast(d.capture_timestamp as date) = cast(current_timestamp as date) and
      cast(dprev.capture_timestamp as date) = cast(current_timestamp as date) - 7 and
      d.price > dprev.price * 1.05
If the data table contains the stock ticker, the cross join would be an equijoin.
You may be able to use the following query as a subquery for whatever calculations you want to do. It assumes one record per day; the window is literally the 7 preceding rows, not 7 calendar days.
SELECT ticker, price, capture_ts
,MIN(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS min_prev_7_records
,MAX(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS max_prev_7_records
FROM data
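One possible calculation on top of that subquery, sketched in Python with SQLite: take the most recent row per ticker and flag it when the price is at least 5% above the window minimum or at least 5% below the window maximum. The ticker ABC, the prices, the recency column, and the two flag names are all made up for the illustration, and this "versus window min/max" reading of "changed by 5%" is just one interpretation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (ticker TEXT, price REAL, capture_ts TEXT)")

# Hypothetical series: one record per day for eight days.
dates = [f"2021-01-{d:02d}" for d in range(1, 9)]
prices = [100.0, 101.0, 102.0, 101.0, 103.0, 104.0, 105.0, 106.0]
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("ABC", p, d) for p, d in zip(prices, dates)],
)

latest_flags = conn.execute(
    """
    SELECT ticker, price, min_prev_7_records, max_prev_7_records,
           price >= min_prev_7_records * 1.05 AS rose_5pct,
           price <= max_prev_7_records * 0.95 AS fell_5pct
    FROM (
        SELECT ticker, price, capture_ts,
               MIN(price) OVER (PARTITION BY ticker ORDER BY capture_ts
                                ROWS BETWEEN 7 PRECEDING AND CURRENT ROW)
                   AS min_prev_7_records,
               MAX(price) OVER (PARTITION BY ticker ORDER BY capture_ts
                                ROWS BETWEEN 7 PRECEDING AND CURRENT ROW)
                   AS max_prev_7_records,
               -- 1 = most recent record of the ticker
               ROW_NUMBER() OVER (PARTITION BY ticker
                                  ORDER BY capture_ts DESC) AS recency
        FROM data
    ) AS windowed
    WHERE recency = 1
    """
).fetchall()

print(latest_flags)
```

SQLite returns the two comparisons as 1 or 0; in other databases you may need a CASE expression to get the same flags.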