SQL deleting rows with duplicate dates conditional upon values in two columns

I have data on approx 1000 individuals, where each individual can have multiple rows, with multiple dates and where the columns indicate the program admitted to and a code number.
I need each row to contain a distinct date, so I need to delete the rows with duplicate dates from my table. Where there are multiple rows with the same date, I need to keep the row that has the lowest code number. Where more than one row has both the same date and the same lowest code, I need to keep the row whose program (PROG) is B. For example:
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-06-02 | 211 | B |
| 1 | 1997-08-19 | 67 | A |
| 1 | 1997-08-19 | 23 | A |
So my desired output would look like this:
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-08-19 | 23 | A |
I'm struggling to come up with a solution to this, so any help greatly appreciated!
Microsoft SQL Server 2012 (X64)

The following works with your test data:
SELECT ID, date, MIN(code) AS code, MAX(prog) AS prog FROM [table]
GROUP BY ID, date
You can then use the results of this query to create or populate a new table, or to delete all records not returned by it.
SQLFiddle http://sqlfiddle.com/#!9/0ebb5/5
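Note that the aggregate query above computes MIN(code) and MAX(prog) independently, so on other data it can return a code/prog pair that does not exist in any single row. A minimal sketch of an alternative that deletes in place, assuming the table is called admissions (a hypothetical name) with the columns from the question: number the rows within each ID and date, lowest code first and program B first among ties, then delete everything except the first row. ROW_NUMBER() and deleting through a CTE should both be available in SQL Server 2012.

```sql
-- Hypothetical table name: admissions(ID, [DATE], CODE, PROG).
WITH ranked AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY ID, [DATE]
               ORDER BY CODE ASC,                               -- lowest code first
                        CASE WHEN PROG = 'B' THEN 0 ELSE 1 END  -- then prefer program B
           ) AS rn
    FROM admissions
)
DELETE FROM ranked   -- deleting from the CTE removes the underlying table rows
WHERE rn > 1;
```

Running the SELECT inside the CTE on its own first is a cheap way to check which rows would get rn > 1 before deleting anything. If two rows are identical in every column, one of them is kept arbitrarily.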

You can use the min() function:
select ID, DATE, min(CODE), max(PROG)
from [table]
group by ID, DATE
I assume that your table has a valid primary key. If it does not, I would recommend making ID part of the primary key (ID alone is not unique in the sample data). Hope this helps.

Related

SQL: How do I combine similar value rows into one, not affecting the rest

Is there a way to merge similar values in the same column and not affect the rest, for example:
I want to sum Amount by Company and ID too.

You cannot get that display directly from SQL: the company name will be repeated on every row. If you want to display the data so that the company name is not repeated on subsequent rows, you have to use Excel or some other presentation-layer tool.
SELECT Company, ID, SUM(Amount)
FROM Table1
GROUP BY Company,ID
+---------+-----+--------+
| Company | ID | Amount |
+---------+-----+--------+
| ABC | 001 | 3 |
| ABC | 002 | 3 |
| DEF | 002 | 10 |
| DEF | 003 | 5 |
+---------+-----+--------+

How to find two consecutive rows sorted by date, containing a specific value?

I have a table with the following structure and data in it:
| ID | Date | Result |
|---- |------------ |-------- |
| 1 | 30/04/2020 | + |
| 1 | 01/05/2020 | - |
| 1 | 05/05/2020 | - |
| 2 | 03/05/2020 | - |
| 2 | 04/05/2020 | + |
| 2 | 05/05/2020 | - |
| 2 | 06/05/2020 | - |
| 3 | 01/05/2020 | - |
| 3 | 02/05/2020 | - |
| 3 | 03/05/2020 | - |
| 3 | 04/05/2020 | - |
I'm trying to write an SQL query (I'm using SQL Server) which returns the date of the first two consecutive negative results for a given ID.
For example, for ID no. 1, the first two consecutive negative results are on 01/05 and 05/05.
The first two consecutive negative results for ID no. 2 are on 05/05 and 06/05.
The first two consecutive negative results for ID no. 3 are on 01/05 and 02/05.
So the query should produce the following result:
| ID | FirstNegativeDate |
|---- |------------------- |
| 1 | 01/05 |
| 2 | 05/05 |
| 3 | 01/05 |
Please note that the dates aren't necessarily one day apart. Sometimes, two consecutive negative tests may be several days apart. But they should still be considered as "consecutive negative tests". In other words, two negative tests are not 'consecutive' only if there is a positive test result in between them.
How can this be done in SQL? I've done some reading and it looks like maybe the PARTITION BY statement is required but I'm not sure how it works.

This is a gaps-and-islands problem, where you want the start of the first island of '-'s that contains at least two rows.
I would recommend lead() and aggregation:
select id, min(date) first_negative_date
from (
    select t.*, lead(result) over(partition by id order by date) lead_result
    from mytable t
) t
where result = '-' and lead_result = '-'
group by id

Use the LEAD or LAG function over a partition by ID, ordered by your Date column.
Then simply check where Result is '-' and the LEAD/LAG column is also '-'.
You will also need to keep only the first match per ID.
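A minimal sketch of the LAG variant of this description, assuming the table is named mytable with columns id, date and result, and that date is stored as a real date type so that ORDER BY is chronological (both are assumptions). LAG() fetches the previous row's result and date within each id; the outer query keeps the pairs where both results are negative and returns the earliest starting date per id.

```sql
-- Assumed table: mytable(id, [date], result), with [date] a DATE column.
SELECT id, MIN(prev_date) AS first_negative_date
FROM (
    SELECT id,
           result,
           LAG(result) OVER (PARTITION BY id ORDER BY [date]) AS prev_result,
           LAG([date]) OVER (PARTITION BY id ORDER BY [date]) AS prev_date
    FROM mytable
) AS t
WHERE result = '-' AND prev_result = '-'   -- this row and the one before are both negative
GROUP BY id;
```

Here prev_date is the first date of each negative pair, so MIN(prev_date) is the start of the earliest pair. An id with no such pair does not appear in the output.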

Returning singular row/value from joined table date based on closest date

I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many which is different to how this relationship is usually represented (Many-to-One).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates for changing the score at different points in time. What I am trying to do is query the Production Table so that the TaskID is looked up in the table and uses the date it was logged to check what score it should return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the below code but unfortunately due to multiple records meeting the criteria, the production data duplicates with two different scores per record:
SELECT p.[RecordID],
       p.[Date],
       p.[EmpID],
       p.[Reference],
       p.[TaskID],
       s.[Score]
FROM ProductionTable AS p
LEFT JOIN StandingDataTable AS s
    ON s.[TaskID] = p.[TaskID]
    AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return the correct and singular/scalar Score value for this record based on the date?

You can use APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable AS p
OUTER APPLY (
    SELECT TOP (1) s.[Score]
    FROM StandingDataTable AS s
    WHERE s.[TaskID] = p.[TaskID]
      AND s.[DateActiveFrom] <= p.[Date]
    ORDER BY s.[DateActiveFrom] DESC
) AS s;
If you want the score matched at record level instead, change the WHERE clause inside the APPLY accordingly.

Pivot SSRS Dataset

I have a dataset which looks like so
ID | PName | Node | Val |
1 | Tag | Name | XBA |
2 | Tag | Desc | Dec1 |
3 | Tag | unit | Int |
6 | Tag | tids | 100 |
7 | Tag | post | AAA |
1 | Tag | Name | XBB |
2 | Tag | Desc | Des9 |
3 | Tag | unit | Float |
7 | Tag | post | BBB |
6 | Tag | tids | 150 |
I would like the result in my report to be
Name | Desc | Unit | Tids | Post |
XBA | Dec1 | int | 100 | AAA |
XBB | Des9 | Float | 150 | BBB |
I have tried using a SSRS Matrix with
Row: PName
Data: Node
Value: Val
The result was simply one row with Name, the next row with Desc, the next with Unit, and so on. The values are not all on the same row, and the second record was missing. This is possibly because there is no grouping on the dataset.
What is a good way of achieving the expected results?

I would not recommend this for a production scenario but if you need to knock out a report quickly or something you can try this. I would just not feel comfortable that the order of the records you get will always be what you expect.
You COULD try to insert the results of the SP into a table (regular table, temp table, table variable...doesn't matter really as long as you can get an identity column added). Assuming that the rows always come out in the correct order (which is probably not a valid assumption 100% of the time) then add an identity column on the table to get a unique row number for each row. From there you should be able to write some math logic to "group" your values together and then pivot out what you want.
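create table #temp (ID int, PName varchar(100), Node varchar(100), Val varchar(100))
insert into #temp exec your_stored_proc  -- placeholder: substitute your stored procedure
alter table #temp add UniqueID int identity
Then use UniqueID to group the records together (perhaps integer division of UniqueID - 1 by 5, since each record spans five rows; a modulo would give the position within a record rather than the record itself) and then pivot.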
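One way the grouping and pivot could look, as a sketch only. It assumes #temp has been filled and given the UniqueID column as above, that the rows really did arrive in order, and that every record starts with a 'Name' row (none of which is guaranteed). Rather than fixed arithmetic on UniqueID, it numbers the records with a running count of 'Name' rows, which tolerates the other nodes arriving in a different order within a record, and then pivots with conditional aggregation. The running SUM ... OVER (ORDER BY ...) requires SQL Server 2012 or later.

```sql
-- Assumes #temp(ID, PName, Node, Val, UniqueID) exists and UniqueID follows arrival order.
WITH numbered AS (
    SELECT Node,
           Val,
           -- running count of 'Name' rows: every row of the same record gets the same number
           SUM(CASE WHEN Node = 'Name' THEN 1 ELSE 0 END)
               OVER (ORDER BY UniqueID ROWS UNBOUNDED PRECEDING) AS RecordNo
    FROM #temp
)
SELECT MAX(CASE WHEN Node = 'Name' THEN Val END) AS [Name],
       MAX(CASE WHEN Node = 'Desc' THEN Val END) AS [Desc],
       MAX(CASE WHEN Node = 'unit' THEN Val END) AS [Unit],
       MAX(CASE WHEN Node = 'tids' THEN Val END) AS [Tids],
       MAX(CASE WHEN Node = 'post' THEN Val END) AS [Post]
FROM numbered
GROUP BY RecordNo
ORDER BY RecordNo;
```

If this runs in the same batch as the statements above, the preceding statement must end with a semicolon before WITH. The result can then be used as the report dataset with one column per node, so no matrix grouping is needed in SSRS.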

Unique string table in SQL and replacing index values with string values during query

I'm working on an old SQL Server database that has several tables that look like the following:
|-------------|-----------|-------|------------|------------|-----|
| MachineName | AlarmName | Event | AlarmValue | SampleTime | ... |
|-------------|-----------|-------|------------|------------|-----|
| 3 | 180 | 8 | 6.780 | 2014-02-24 | |
| 9 | 67 | 8 | 1.45 | 2014-02-25 | |
| ... | | | | | |
|-------------|-----------|-------|------------|------------|-----|
There is a separate table in the database that only contains unique strings, as well as the index for each unique string. The unique string table looks like this:
|----------|--------------------------------|
| Id | String |
|----------|--------------------------------|
| 3 | MyMachine |
| ... | |
| 8 | High CPU Usage |
| ... | |
| 67 | 404 Error |
| ... | |
|----------|--------------------------------|
Thus, when we want to get something out of the database, we get the respective rows out, then lookup each missing string based on the index value.
What I'm hoping to do is to replace all of the string indexes with the actual values in a single query without having to do post-processing on the query result.
However, I can't figure out how to do this in a single query. Do I need to use multiple JOINs? I've only been able to figure out how to replace a single value by doing something like -
SELECT UniqueString.String AS "MachineName" FROM UniqueString
JOIN Alarm ON Alarm.MachineName = UniqueString.Id
Any help would be much appreciated!

Yes, you can do multiple joins to the UniqueString table, but change the order to start with the table you are reporting on, and use a unique alias for each join to the lookup table. Something like:
SELECT MN.String AS 'MachineName', AN.String AS 'AlarmName' FROM Alarm A
JOIN UniqueString MN ON A.MachineName = MN.Id
JOIN UniqueString AN ON A.AlarmName = AN.Id
and so on for any other columns, such as Event.
