How to write a query that would leave 1 row out - sql

I have a set of data that looks like the table below. For each debnr, I want to remove one of the rows that has P in the type column. I don't care which one. The two rows with P in the type are identical except for the date. How would I select just one row with P in the type?
debnr docno date type num amount
4 NULL 2013-08-29 07:26:25.000 P 1761 -12
4 NULL 2013-09-12 00:00:00.000 P 1761 -12
4 168371 2013-08-29 00:00:00.000 I 168371 12
5 NULL 2013-10-11 09:24:58.000 P 7287 -24
5 NULL 2013-10-14 00:00:00.000 P 7287 -24
5 170366 2013-10-11 00:00:00.000 I 170366 24
6 NULL 2013-10-24 00:00:00.000 P 4023 -465
6 NULL 2013-10-24 09:42:18.000 P 4023 -465
6 171095 2013-10-24 00:00:00.000 I 171095 465
7 NULL 2013-12-16 00:00:00.000 P 171502 -394.2
7 NULL 2013-12-16 00:00:00.000 P 6601 -394.2
7 171502 2013-10-30 00:00:00.000 I 171502 394.2
How would I get it to look like this?
4 NULL 2013-09-12 00:00:00.000 P 1761 -12
4 168371 2013-08-29 00:00:00.000 I 168371 12
5 NULL 2013-10-14 00:00:00.000 P 7287 -24
5 170366 2013-10-11 00:00:00.000 I 170366 24
6 NULL 2013-10-24 09:42:18.000 P 4023 -465
6 171095 2013-10-24 00:00:00.000 I 171095 465
7 NULL 2013-12-16 00:00:00.000 P 6601 -394.2
7 171502 2013-10-30 00:00:00.000 I 171502 394.2

Shot in the dark:
select
debnr,
docno,
max(date),
type,
num,
amount
from magical_table
group by
debnr,
docno,
type,
num,
amount

You could GROUP BY and use an aggregate, given your sample above. If, however, the amount field weren't identical, for instance, you could use the ROW_NUMBER() function instead to avoid needing an aggregate:
;WITH cte AS (SELECT *
-- partition by type as well, so the numbering restarts at 1 for the P rows
,CASE WHEN TYPE = 'P' THEN ROW_NUMBER() OVER(PARTITION BY debnr, type ORDER BY (SELECT 1))
ELSE 0
END AS RN
FROM Table1)
SELECT *
FROM cte
WHERE RN <= 1
The ORDER BY (SELECT 1) could be changed to any field; it is just one way to get an arbitrary row if you don't want a min/max.
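A minimal sketch of the ROW_NUMBER() idea, run against an in-memory SQLite database (3.25 or later) through Python's sqlite3 module rather than SQL Server. The table name `payments`, the helper `keep_one_p_row`, and the choice to order by date descending and then num are assumptions made for illustration; any ordering would do if the kept row really is arbitrary.

```python
import sqlite3

# A subset of the question's rows; SQLite stands in for SQL Server here.
ROWS = [
    (4, None, "2013-08-29 07:26:25", "P", 1761, -12.0),
    (4, None, "2013-09-12 00:00:00", "P", 1761, -12.0),
    (4, 168371, "2013-08-29 00:00:00", "I", 168371, 12.0),
    (7, None, "2013-12-16 00:00:00", "P", 171502, -394.2),
    (7, None, "2013-12-16 00:00:00", "P", 6601, -394.2),
    (7, 171502, "2013-10-30 00:00:00", "I", 171502, 394.2),
]

# Number the rows per (debnr, type); only P rows get a real number,
# every other row gets 0 and therefore always passes the rn <= 1 filter.
QUERY = """
WITH cte AS (
    SELECT *,
           CASE WHEN type = 'P'
                THEN ROW_NUMBER() OVER (PARTITION BY debnr, type
                                        ORDER BY date DESC, num)
                ELSE 0
           END AS rn
    FROM payments
)
SELECT debnr, docno, date, type, num, amount
FROM cte
WHERE rn <= 1
ORDER BY debnr, type DESC
"""


def keep_one_p_row(rows):
    """Return the rows with exactly one P row kept per debnr."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments "
        "(debnr INT, docno INT, date TEXT, type TEXT, num INT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in keep_one_p_row(ROWS):
    print(row)
```

With this ordering the kept P rows are the 2013-09-12 row for debnr 4 and the num 6601 row for debnr 7, which happens to match the desired output in the question.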

Do you want the rows with type 'I' left ungrouped?
select debnr, docno, max(date), type, num, amount
from magical_table
where type = 'P'
group by debnr, docno, type, num, amount
UNION
select debnr, docno, date, type, num, amount
from magical_table
where type = 'I'

Related

Get max date for each from either of 2 columns

I have a table like below
AID BID CDate
-----------------------------------------------------
1 2 2018-11-01 00:00:00.000
8 1 2018-11-08 00:00:00.000
1 3 2018-11-09 00:00:00.000
7 1 2018-11-15 00:00:00.000
6 1 2018-12-24 00:00:00.000
2 5 2018-11-02 00:00:00.000
2 7 2018-12-15 00:00:00.000
And I am trying to get a result set as follows
ID MaxDate
-------------------
1 2018-12-24 00:00:00.000
2 2018-12-15 00:00:00.000
Each value in the ID columns (AID, BID) should return the max of CDate.
For example, for 1 the max CDate is 2018-12-24 00:00:00.000 (here 1 appears under BID);
for 2 the max date is 2018-12-15 00:00:00.000 (here 2 is under AID).
I tried the following.
1.
select
g.AID,g.BID,
max(g.CDate) as 'LastDate'
from dbo.TT g
inner join
(select AID,BID,max(CDate) as maxdate
from dbo.TT
group by AID,BID)a
on (a.AID=g.AID or a.BID=g.BID)
and a.maxdate=g.CDate
group by g.AID,g.BID
and 2.
SELECT
AID,
CDate
FROM (
SELECT
*,
max_date = MAX(CDate) OVER (PARTITION BY [AID])
FROM dbo.TT
) AS s
WHERE CDate= max_date
Please suggest a 3rd solution.
You can assemble the data in a table expression first; computing the max for each value is then simple. For example:
select
id, max(cdate)
from (
select aid as id, cdate from t
union all
select bid, cdate from t
) x
group by id
You seem to only care about values that are in both columns. If this interpretation is correct, then:
select id, max(cdate)
from ((select aid as id, cdate, 1 as is_a, 0 as is_b
from t
) union all
(select bid as id, cdate, 0 as is_a, 1 as is_b
from t
)
) ab
group by id
having max(is_a) = 1 and max(is_b) = 1;
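Both variants can be tried on the question's sample data with a small sketch like the one below. It uses an in-memory SQLite database through Python's sqlite3 module instead of SQL Server; the table name `t` follows the answers, the helper `run` is hypothetical, and the second query assumes the bid branch is flagged with is_a = 0 and is_b = 1 so that the HAVING clause can match.

```python
import sqlite3

# Sample data from the question: (AID, BID, CDate).
ROWS = [
    (1, 2, "2018-11-01"), (8, 1, "2018-11-08"), (1, 3, "2018-11-09"),
    (7, 1, "2018-11-15"), (6, 1, "2018-12-24"), (2, 5, "2018-11-02"),
    (2, 7, "2018-12-15"),
]

# Stack both ID columns into one column, then take the max date per ID.
ALL_IDS = """
SELECT id, MAX(cdate) AS max_date
FROM (
    SELECT aid AS id, cdate FROM t
    UNION ALL
    SELECT bid AS id, cdate FROM t
) AS x
GROUP BY id
ORDER BY id
"""

# Same idea, but keep only IDs that occur in both AID and BID.
BOTH_COLUMNS = """
SELECT id, MAX(cdate) AS max_date
FROM (
    SELECT aid AS id, cdate, 1 AS is_a, 0 AS is_b FROM t
    UNION ALL
    SELECT bid AS id, cdate, 0 AS is_a, 1 AS is_b FROM t
) AS ab
GROUP BY id
HAVING MAX(is_a) = 1 AND MAX(is_b) = 1
ORDER BY id
"""


def run(query, rows=ROWS):
    """Load the rows into a throwaway table and run one of the queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (aid INT, bid INT, cdate TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(ALL_IDS))
print(run(BOTH_COLUMNS))
```

On this data the second query returns IDs 1, 2 and 7, since 7 also appears in both columns.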

From Change Log Table to Status on a Given Day

I am trying to convert a change log table into a historical status table using BigQuery's Standard SQL.
The part giving me a hang up is how to select the most recent change log that is before the date to join on.
I had not encountered window functions or indexing during my college years, so I would appreciate guidance on how to apply those functions if they're part of the ideal solution.
Change_Logs table
Update Key Tostring
1 2019-01-30 17:57:51.910 PS-5864 To Do
2 2019-02-11 20:59:08.582 PS-5864 In Progress
3 2019-02-12 19:52:18.733 PS-5864 Done
4 2019-01-31 16:52:12.832 PS-4672 To Do
5 2019-02-11 14:11:13.442 PS-4672 In Progress
6 2019-02-12 04:22:33.111 PS-4672 Done
Dates table
Date
1 2019-02-10
2 2019-02-11
3 2019-02-12
4 2019-02-13
Desired Result:
Date Key Status
1 2019-02-10 00:00:00.000 PS-5864 To Do
2 2019-02-10 00:00:00.000 PS-4672 To Do
3 2019-02-11 00:00:00.000 PS-5864 To Do
4 2019-02-11 00:00:00.000 PS-4672 To Do
5 2019-02-12 00:00:00.000 PS-5864 In Progress
6 2019-02-12 00:00:00.000 PS-4672 In Progress
7 2019-02-13 00:00:00.000 PS-5864 Done
8 2019-02-13 00:00:00.000 PS-4672 Done
Below is for BigQuery Standard SQL
#standardSQL
SELECT d.date, key,
ARRAY_AGG(status ORDER BY l.update DESC LIMIT 1)[OFFSET(0)] status
FROM `project.dataset.dates` d
JOIN `project.dataset.change_logs` l
ON DATE_DIFF(d.date, DATE(l.update), DAY) > 0
GROUP BY d.date, key
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.change_logs` AS (
SELECT DATETIME '2019-01-30 17:57:51.910' `update`, 'PS-5864' key, 'To Do' status UNION ALL
SELECT '2019-02-11 20:59:08.582', 'PS-5864', 'In Progress' UNION ALL
SELECT '2019-02-12 19:52:18.733', 'PS-5864', 'Done' UNION ALL
SELECT '2019-01-31 16:52:12.832', 'PS-4672', 'To Do' UNION ALL
SELECT '2019-02-11 14:11:13.442', 'PS-4672', 'In Progress' UNION ALL
SELECT '2019-02-12 04:22:33.111', 'PS-4672', 'Done'
), `project.dataset.dates` AS (
SELECT DATE '2019-02-10' `date` UNION ALL
SELECT '2019-02-11' UNION ALL
SELECT '2019-02-12' UNION ALL
SELECT '2019-02-13'
)
SELECT d.date, key,
ARRAY_AGG(status ORDER BY l.update DESC LIMIT 1)[OFFSET(0)] status
FROM `project.dataset.dates` d
JOIN `project.dataset.change_logs` l
ON DATE_DIFF(d.date, DATE(l.update), DAY) > 0
GROUP BY d.date, key
-- ORDER BY d.date, key
with result
Row date key status
1 2019-02-10 PS-4672 To Do
2 2019-02-10 PS-5864 To Do
3 2019-02-11 PS-4672 To Do
4 2019-02-11 PS-5864 To Do
5 2019-02-12 PS-4672 In Progress
6 2019-02-12 PS-5864 In Progress
7 2019-02-13 PS-4672 Done
8 2019-02-13 PS-5864 Done
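For readers without a BigQuery project at hand, the same "latest change strictly before the date" logic can be sketched in SQLite (3.25 or later) through Python's sqlite3 module. This is not the BigQuery query above: it swaps ARRAY_AGG ... LIMIT 1 for ROW_NUMBER(), and the names `day`, `updated`, `issue_key` and `status_on_each_day` are hypothetical, chosen to avoid keyword clashes.

```python
import sqlite3

# Change log rows from the question: (update timestamp, key, status).
CHANGES = [
    ("2019-01-30 17:57:51.910", "PS-5864", "To Do"),
    ("2019-02-11 20:59:08.582", "PS-5864", "In Progress"),
    ("2019-02-12 19:52:18.733", "PS-5864", "Done"),
    ("2019-01-31 16:52:12.832", "PS-4672", "To Do"),
    ("2019-02-11 14:11:13.442", "PS-4672", "In Progress"),
    ("2019-02-12 04:22:33.111", "PS-4672", "Done"),
]
DATES = ["2019-02-10", "2019-02-11", "2019-02-12", "2019-02-13"]

# Pair every date with all changes made on an earlier day, then keep the
# most recent change per (day, key).
QUERY = """
WITH ranked AS (
    SELECT d.day, l.issue_key, l.status,
           ROW_NUMBER() OVER (PARTITION BY d.day, l.issue_key
                              ORDER BY l.updated DESC) AS rn
    FROM dates AS d
    JOIN change_logs AS l
      ON DATE(l.updated) < d.day
)
SELECT day, issue_key, status
FROM ranked
WHERE rn = 1
ORDER BY day, issue_key
"""


def status_on_each_day(changes, dates):
    """Return (day, key, status) using the last change before each day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE change_logs (updated TEXT, issue_key TEXT, status TEXT)")
    conn.execute("CREATE TABLE dates (day TEXT)")
    conn.executemany("INSERT INTO change_logs VALUES (?, ?, ?)", changes)
    conn.executemany("INSERT INTO dates VALUES (?)", [(d,) for d in dates])
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in status_on_each_day(CHANGES, DATES):
    print(row)
```

As in the BigQuery version, a change made during a given day only shows up in the status of the following day, because the join compares calendar dates with a strict inequality.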
The key idea is to generate the rows with a cross join. Then what you really want is lag(. . . ignore nulls), but that is not supported in BigQuery.
Instead, you can do some array manipulation:
select d.date, cl.key,
array_agg(cl.status ignore nulls order by d.date desc limit 2)[ordinal(2)]
from dates d cross join
(select distinct key from change_logs cl) k left join
change_logs cl
on date(cl.update) = d.date and cl.key = k.key;
EDIT:
The above is not quite correct, because we are missing dates that occur before the specified period. I think the simplest method is to add them and then remove them:
select *
from (select d.date, cl.key,
array_agg(cl.status ignore nulls order by d.date desc limit 2)[ordinal(2)]
from (select d.date
from dates d
union
select distinct date(cl.update)
from change_logs
) d cross join
(select distinct key from change_logs cl) k left join
change_logs cl
on date(cl.update) = d.date and cl.key = k.key
)
where date in (select d.date from dates);

How to return only the first row when multiple rows are returned in SQL

I have the following data:
Id Week1 Week2 Date
-------------------------------------------------------------------------------
C0935336-B424-E911-8117-005056A82772 201906 201904 2019-02-02 00:00:00.000
18D809B1-8725-E911-8117-005056A82772 201907 201904 2019-02-09 00:00:00.000
C95855A0-9428-E911-8117-005056A82772 201908 201905 2019-02-16 00:00:00.000
5ABE80F6-2531-E911-8117-005056A82772 201909 201905 2019-02-23 00:00:00.000
6B520DE4-9445-E911-8118-005056A82772 201910 201906 2019-03-02 00:00:00.000
ADD0A8D0-EE2E-E911-8117-005056A82772 201911 201906 2019-03-09 00:00:00.000
As you can see, Week2 has duplicate entries, and I need to return the first row of each pair of rows returned so that I end up with something similar to this.
Id Week1 Week2 Date
-------------------------------------------------------------------------------
C0935336-B424-E911-8117-005056A82772 201906 201904 2019-02-02 00:00:00.000
C95855A0-9428-E911-8117-005056A82772 201908 201905 2019-02-16 00:00:00.000
6B520DE4-9445-E911-8118-005056A82772 201910 201906 2019-03-02 00:00:00.000
I'm using the following in SQL:
SELECT DISTINCT
ROW_NUMBER() OVER (PARTITION BY Weeks.Week2 ORDER BY Weeks.Week2) AS Row#,
Data.Id, Weeks.Week1, Weeks.Week2, Weeks.Date
FROM
Data
INNER JOIN
Weeks ON Data.WeekN = Weeks.Week1
INNER JOIN
Users ON Data.UserId = Users.UserId
WHERE
Weeks.Week2 IN (SELECT DISTINCT Weeks.Week2
FROM Data
INNER JOIN Weeks ON Data.Week = Weeks.Week1
INNER JOIN Users ON Data.UserId = Users.UserId
WHERE Data.UserId = 1234 AND Weeks.Week1 >= 201907)
ORDER BY
Weeks.Week2
Which introduces a row number for each set of rows returned:
Row# Id Week1 Week2 Date
-----------------------------------------------------------------------------------
1 C0935336-B424-E911-8117-005056A82772 201906 201904 2019-02-02 00:00:00.000
2 18D809B1-8725-E911-8117-005056A82772 201907 201904 2019-02-09 00:00:00.000
1 C95855A0-9428-E911-8117-005056A82772 201908 201905 2019-02-16 00:00:00.000
2 5ABE80F6-2531-E911-8117-005056A82772 201909 201905 2019-02-23 00:00:00.000
1 6B520DE4-9445-E911-8118-005056A82772 201910 201906 2019-03-02 00:00:00.000
2 ADD0A8D0-EE2E-E911-8117-005056A82772 201911 201906 2019-03-09 00:00:00.000
My question is how do I select all the rows where the Row# is 1 ?
As @stickybit mentioned, you can use:
SELECT
Id
, Week1
, Week2
, Date
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY Weeks.Week2 ORDER BY Weeks.Week2) AS Row#
, Data.Id
, Weeks.Week1
, Weeks.Week2
, Weeks.Date
FROM
Data
INNER JOIN Weeks ON Data.WeekN = Weeks.Week1
INNER JOIN Users ON Data.UserId = Users.UserId
WHERE Weeks.Week2 IN
(
SELECT DISTINCT Weeks.Week2
FROM
Data
INNER JOIN Weeks ON Data.Week = Weeks.Week1
INNER JOIN Users ON Data.UserId = Users.UserId
WHERE
Data.UserId = 1234
AND Weeks.Week1 >= 201907
)
) Q
WHERE Row# = 1
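You don't need the outer ORDER BY to number the rows, as the ROW_NUMBER() function takes care of the ordering in its OVER() clause. Note, though, that ordering by Weeks.Week2 inside a partition on Weeks.Week2 leaves every row tied, so which row gets number 1 is arbitrary; order by Weeks.Week1 or Weeks.Date there if you need the earliest row of each pair.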
You also don't need DISTINCT, as the ROW_NUMBER() function will prevent it from having any effect anyway.
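A minimal sketch of the same pattern, using an in-memory SQLite database (3.25 or later) through Python's sqlite3 module instead of SQL Server. The single table `joined` is a hypothetical stand-in for the result of the joins in the question, and ordering the window by week1 is an assumption that makes "first row" mean the earliest week.

```python
import sqlite3

# The six rows from the question, as if the joins had already been applied.
ROWS = [
    ("C0935336-B424-E911-8117-005056A82772", 201906, 201904, "2019-02-02"),
    ("18D809B1-8725-E911-8117-005056A82772", 201907, 201904, "2019-02-09"),
    ("C95855A0-9428-E911-8117-005056A82772", 201908, 201905, "2019-02-16"),
    ("5ABE80F6-2531-E911-8117-005056A82772", 201909, 201905, "2019-02-23"),
    ("6B520DE4-9445-E911-8118-005056A82772", 201910, 201906, "2019-03-02"),
    ("ADD0A8D0-EE2E-E911-8117-005056A82772", 201911, 201906, "2019-03-09"),
]

# Number the rows inside each week2 group by week1, then keep number 1.
QUERY = """
SELECT id, week1, week2, date
FROM (
    SELECT j.*,
           ROW_NUMBER() OVER (PARTITION BY week2 ORDER BY week1) AS rn
    FROM joined AS j
) AS q
WHERE rn = 1
ORDER BY week2
"""


def first_row_per_week2(rows):
    """Return the earliest week1 row for every week2 value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE joined (id TEXT, week1 INT, week2 INT, date TEXT)")
    conn.executemany("INSERT INTO joined VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in first_row_per_week2(ROWS):
    print(row)
```

Because the window is ordered by week1 rather than by the partition column, the row numbered 1 is the same on every run.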

How to get a single result when one column has the same value but a second column has different values

I have this table
Id VendorId ClaimRequestDate
1 5 2017-12-14 00:00:00.000
2 5 2018-02-02 00:00:00.000
7 5 2018-02-07 11:08:25.257
I want my result to show only the latest date for each VendorId, considering only dates later than 2 Feb 2018.
What I've done so far:
SELECT DISTINCT
[Project1].[Id] AS [Id],
[Project1].[VendorId] AS [VendorId],
[Project1].[ClaimRequestDate] AS [ClaimRequestDate]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[VendorId] AS [VendorId],
[Extent1].[ClaimRequestDate] AS [ClaimRequestDate]
FROM [dbo].[Claim] AS [Extent1]
WHERE [Extent1].[ClaimRequestDate] >= '2018-02-02 00:00:00.000'
) AS [Project1]
ORDER BY [Project1].[ClaimRequestDate] DESC
But my result is
Id VendorId ClaimRequestDate
7 5 2018-02-07 11:08:25.257
2 5 2018-02-02 00:00:00.000
Can someone help me with this?
There are three problems in your query. The first is
WHERE [Extent1].[ClaimRequestDate] >= '2018-02-02 00:00:00.000'
The >= should be >. The second is that your query returns all rows dated later than 2018-02-02, so if a VendorId has more than one such row, all of them are returned. You can try this:
SELECT * FROM Claim c
where ClaimRequestDate IN (select MAX(ClaimRequestDate) from claim c1
where c.vendorId =c1.vendorId and c1.Claimrequestdate >'2018.02.02')
The third is that when a VendorId has more than one row with the same MAX(ClaimRequestDate), this query returns all of them. For example, this data
Id VendorId ClaimRequestDate
1 5 2017-12-14 00:00:00.000
2 5 2018-02-02 00:00:00.000
7 5 2018-02-07 11:08:25.257
8 5 2018-02-07 11:08:25.257
returns
Id VendorId ClaimRequestDate
7 5 2018-02-07 11:08:25.257
8 5 2018-02-07 11:08:25.257
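For these reasons I suggest using this query:
-- Style 121 gives a sortable 'yyyy-mm-dd hh:mi:ss.mmm' string; a plain CAST gives
-- text such as 'Feb  7 2018 11:08AM', which does not sort chronologically.
-- The Id is zero-padded (assuming a non-negative int) so that 10 sorts after 9.
SELECT * FROM Claim c
where CONVERT(VARCHAR(23), ClaimRequestDate, 121) + RIGHT('0000000000' + CAST(Id AS VARCHAR(10)), 10) IN (select
MAX(CONVERT(VARCHAR(23), ClaimRequestDate, 121) + RIGHT('0000000000' + CAST(Id AS VARCHAR(10)), 10)) from claim c1
where c.vendorId =c1.vendorId and c1.Claimrequestdate >'2018.02.02'
)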
Try the following SQL:
select aa.* from [Claim] as aa inner join
(
select [VendorId], max([Id]) as maxId from [Claim]
where [ClaimRequestDate] >= '2018-02-03 00:00:00'
group by [VendorId]
) as bb on aa.[Id] = bb.[maxId]
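Another way to get one row per vendor, including when two claims share the same latest timestamp, is ROW_NUMBER() with the Id as a tie-breaker. The sketch below is an assumption-laden illustration rather than either answer's method: it runs on an in-memory SQLite database (3.25 or later) through Python's sqlite3 module, uses hypothetical snake_case column names, and takes the cutoff as a parameter.

```python
import sqlite3

# The question's rows plus the duplicate-timestamp row (Id 8) from the answer.
ROWS = [
    (1, 5, "2017-12-14 00:00:00.000"),
    (2, 5, "2018-02-02 00:00:00.000"),
    (7, 5, "2018-02-07 11:08:25.257"),
    (8, 5, "2018-02-07 11:08:25.257"),
]

# Filter first, then rank each vendor's claims newest first; the id breaks
# ties so that exactly one row per vendor gets rn = 1.
QUERY = """
SELECT id, vendor_id, claim_request_date
FROM (
    SELECT c.*,
           ROW_NUMBER() OVER (PARTITION BY vendor_id
                              ORDER BY claim_request_date DESC, id DESC) AS rn
    FROM claim AS c
    WHERE claim_request_date > ?
) AS q
WHERE rn = 1
ORDER BY vendor_id
"""


def latest_claim_per_vendor(rows, cutoff):
    """Return the newest claim per vendor dated strictly after the cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claim (id INT, vendor_id INT, claim_request_date TEXT)")
    conn.executemany("INSERT INTO claim VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, (cutoff,)).fetchall()
    conn.close()
    return result


print(latest_claim_per_vendor(ROWS, "2018-02-02 00:00:00.000"))
```

The timestamps are stored as ISO-formatted text here, so plain string comparison orders them chronologically; in SQL Server the column would be a real datetime and the same window clause would apply.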

Calculate discount between weeks

I have a table containing product price data, like that:
ProductId RecordDate Price
46 2015-01-17 14:35:05.533 112.00
47 2015-01-17 14:35:05.533 88.00
45 2015-01-17 14:35:05.533 134.00
I have been able to group data by week and product, with this query:
SET DATEFIRST 1;
SELECT DATEADD(WEEK, DATEDIFF(WEEK, 0, [RecordDate]), 0) AS [Week], ProductId, MIN([Price]) AS [MinimumPrice]
FROM [dbo].[ProductPriceHistory]
GROUP BY DATEADD(WEEK, DATEDIFF(WEEK, 0, [RecordDate]), 0), ProductId
ORDER BY ProductId, [Week]
obtaining this result:
Week Product Price
2015-01-12 00:00:00.000 1 99.00
2015-01-19 00:00:00.000 1 98.00
2015-01-26 00:00:00.000 1 95.00
2015-02-02 00:00:00.000 1 95.00
2015-02-09 00:00:00.000 1 95.00
2015-02-16 00:00:00.000 1 95.00
2015-02-23 00:00:00.000 1 80.00
2015-03-02 00:00:00.000 1 97.00
2015-03-09 00:00:00.000 1 85.00
2015-01-12 00:00:00.000 2 232.00
2015-01-19 00:00:00.000 2 233.00
2015-01-26 00:00:00.000 2 194.00
2015-02-02 00:00:00.000 2 194.00
2015-02-09 00:00:00.000 2 199.00
2015-02-16 00:00:00.000 2 199.00
2015-02-23 00:00:00.000 2 199.00
2015-03-02 00:00:00.000 2 214.00
Now for each product I'd like to get the difference between the last two week values, so that I can calculate the discount. I don't know how to write this as a SQL Query!
EDIT:
Expected output would be something like that:
Product Price
1 -12.00
2 15.00
Thank you!
Since you are using SQL Server 2014, you can use the LAG or LEAD window function to do this.
Generate a row number to find the last two weeks for each product.
;WITH cte
AS (SELECT *,
Row_number()OVER(partition BY product ORDER BY weeks DESC)rn
FROM Yourtable)
SELECT product,
price
FROM (SELECT product,
Price=price - Lead(price)OVER(partition BY product ORDER BY rn)
FROM cte a
WHERE a.rn <= 2) A
WHERE price IS NOT NULL
Traditional solution, which can be used before SQL Server 2012:
;WITH cte
AS (SELECT *,
Row_number()OVER(partition BY product
ORDER BY weeks DESC)rn
FROM Yourtable)
SELECT a.Product,
b.Price - a.Price
FROM cte a
LEFT JOIN cte b
ON a.Product = b.Product
AND a.rn = b.rn + 1
WHERE a.rn <= 2
AND b.Product IS NOT NULL
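The window-function approach can be checked against the expected output with a small sketch on an in-memory SQLite database (3.25 or later) through Python's sqlite3 module. It is a variation, not the exact query above: it uses LAG over ascending weeks instead of LEAD over the row number, and the table name `weekly_prices` and helper `last_week_diff` are hypothetical.

```python
import sqlite3

# The last three weekly minimum prices per product from the question.
ROWS = [
    ("2015-02-23", 1, 80.0), ("2015-03-02", 1, 97.0), ("2015-03-09", 1, 85.0),
    ("2015-02-16", 2, 199.0), ("2015-02-23", 2, 199.0), ("2015-03-02", 2, 214.0),
]

# For every week compute price minus the previous week's price, and number
# the weeks newest first so the latest difference can be picked with rn = 1.
QUERY = """
WITH diffs AS (
    SELECT product,
           price - LAG(price) OVER (PARTITION BY product ORDER BY week) AS diff,
           ROW_NUMBER() OVER (PARTITION BY product ORDER BY week DESC) AS rn
    FROM weekly_prices
)
SELECT product, diff
FROM diffs
WHERE rn = 1
ORDER BY product
"""


def last_week_diff(rows):
    """Return (product, latest price minus previous week's price)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weekly_prices (week TEXT, product INT, price REAL)")
    conn.executemany("INSERT INTO weekly_prices VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(last_week_diff(ROWS))  # [(1, -12.0), (2, 15.0)]
```

A product with only one week of data would get a NULL difference here; the answers above filter such rows out, and a `WHERE diff IS NOT NULL` condition would do the same in this sketch.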