subtract every next column value from previous? - sql

I have a dataset, where somehow the next singular data is added on top of the previous data for one row, and that for every column, which means,
row with ID 1 is the original pure data, but row with e.g ID 10 has added the data from the previous 9 datasets on itself...
what I now want is to get the original pure data for every distinct item, which means for every ID, how can I substract all data from lets say ID, 10? I would have to substract those of the previous one, for ID 9 and so on...
I want to do this either in SQL Server or in Rapidminer, I am working with those tools, any idea?
here is a sample:
ID col1 col2 col3
1 12 2 3
2 15 5 5
3 20 8 8
so the real correct data for Item with ID 3 is not 20, 8, 8 it is (20-15),(8-5),(8-5) so its 5,3,3...
subtract the later from its previous for every item except the first..
1 12 2 3

Try it out with lag series operator, it will work for sure! To get this operator you should install the series extension from the RM marketplace.
What this operator does - he copies the selected attributes and pushes every row of the example set for one point, so row with ID 1 gets a copy with ID 2 etc (you can also specify the value for a lag). Afterwards you can substract one value from another with Generate Attributes.

I think lag() is the answer to your question:
select (case when id = 1 then col
else col - lag(col) over (order by id)
end)
However, sample data would clarify the question.

Within RapidMiner there is the Differentiate operator contained in the Series extension (which is not installed by default and needs to be downloaded from the RapidMiner Marketplace). This can be used to calculate differences between attributes in adjacent examples.

Related

Choosing which rows to sum and average in either SSRS or SQL

ROW column 1 column 2
1 A 1
2 A 1
3 A 3
4 A 1
5 A 2
6 B 1
7 B 3
8 B 1
Pic of table
Lets say I have this table as shown above. I want to be able to average SELECTED values from column 2. Am I able to use any function in SSRS that allows me to select which value to use to average? The end goal is to allow the user to interactively choose which value to average.
For example if I would want to use ("Row 1 + Row 2 + Row 4")/3, or (Row 6 + Row 8)/2, how can I go about letting the end user to choose those values to average?
Is there something that I need to do in SQL first to make it easier in SSRS?
The idea is by using report parameter and dataset filter
Add parameter in SSRS to allow user input of multiple values, set the available values for row-1, row-2, and so on
here for your reference how to add the parameter in SSRS
https://learn.microsoft.com/en-us/sql/reporting-services/report-design/add-change-or-delete-a-report-parameter-report-builder-and-ssrs?view=sql-server-ver15#:~:text=To%20add%20or%20edit%20a,or%20accept%20the%20default%20name.
after you add the parameter, let's say you already have a dataset which is SQL query such as:
SELECT *
FROM the_table
Right click on your dataset, on properties, in the filter tab, add a filter for the column ROW IN parameter that you have made earlier
after you add filter on your dataset, on your report, simply use that dataset and put expression AVG(Column 2)

Populate NULL Values based on Array Formula

New user, so apologies in advance for bad formatting.
Essentially what I'm trying to do is be able to populate the staff_hours column where it equals NULL with the one value that IS NOT NULL. As you can see from the screenshot, there will only be one person who staffs an open cl_hole_staffing_no and as a result will have a start_dt (with time) and end_dt (with time) along with staff_hours. 16 people were offered a shift, and the person in row 15 accepted it is what is going on here.
The ideal output would be the staff_hours column is populated with the amount of time of the one person who ended up taking the open job, so 24.00 in this example. How can I write a formula to do this? I was thinking something like an array function in Excel, but am not sure how to do that in SQL.
Your explanation is a bit confusing about what you are really trying to achieve. However I think that what you really want is just to populate the staff_hours column, which can be achieved with the following:
UPDATE
your_table_name
SET
staff_hours = 24
WHERE
staff_hours is NULL;
EDIT
I get it now. You want to operate with the two dates and extract the amount of hours between them. Since you are in sql-server you can actually define a Computed Column in which you can use the values from other columns to compute the value you want.
You will need to create your table again. (The example below contains only the necessary attributes for it to work)
CREATE TABLE your_table_name
( id INT IDENTITY (1,1) NOT NULL
, staff_start_dt DATETIME
, staff_end_dt DATETIME
, staff_hours AS DATEDIFF(hh, staff_start_dt , staff_end_dt)
);
Now every time you insert a record on the table with both staff_start_dt and staff_end_dt, the column staff_hours will automatically compute the number of hours between the two dates.
[pre]
Code (vb):
A B C
1 10 X X
2 11 A Y
3 12 Y Z
4 13 B
5 14 B
6 15 Z
[/pre]
Assuming that the rows in Col A is Named "datarange"
And your criteria is in C1:C3
The following formula will return an array {10,12,15}
=SMALL(COUNTIF(C1:C3,B1:B6)*datarange, ROW(INDEX(A:A,SUMPRODUCT(--(COUNTIF(C1:C3,B1:B6)=0))+1):INDEX(A:A,ROWS(datarange))))
COUNTIF(C1:C3,B1:B6)*datarange returns {10;0;12;0;0;15}
The segment ROW(INDEX(....):INDEX(...)) returns {4;5;6}, indicating the number of non-zero values.
The SMALL() function then returns the 4th smallest, 5th smallest and 6th smallest values.
One disadvantage with this approach is that you get a sorted sub-list. Perhaps that would work for you.

INFORMATICA Using transformation to get desired target from a single flat file (see pictures)

I just started out using Informatica and currently I am figuring out how to get this to a target output (flat file to Microsoft SSIS):
ID Letter Parent_ID
---- ------ ---------
1 A NULL
2 B 1
3 C 1
4 D 2
5 E 2
6 F 3
7 G 3
8 H 4
9 I 4
From (assuming that this is a comma-delimited flat file):
c1,c2,c3,c4
A,B,D,H
A,B,D,I
A,B,E
A,C,F
A,C,G
EDIT: Where c1 c2 c3 and c4 being a header.
EDIT: A more descriptive representation of what I want to acheive:
EDIT: Here is what I have so far (Normalizer for achieving the letter column and Sequence Generator for ID)
Thanks in advance.
I'd go with a two-phased approach. Here's the general idea (not a full, step-by-step solution).
Perform pivot to get all values in separate rows (eg. from "A,B,D,H" do a substring and union the data to get four rows)
Perform sort with distinct and insert into target to get IDs assigned. End of mapping one.
In mapping two add a Sequence to add row numbers
Do the pivot again
Use expression variable to refer previous row and previous RowID (How do I get previous row?)
If current RowID doesn't match previous RowID, this is a top node and has no parent.
If previous row exists and the RowID is matching, previous row is a parent. Perform a lookup to get it's ID from DB and use as Parent_ID. Send update to DB.

What is the best way to reassign ordinal number of a move operation

I have a column in the sql server called "Ordinal" that is used to indicate the display order of the rows. It starts from 0 and skips 10 for the next row. so we have something like this:
Id Ordinal
1 0
2 20
3 10
It skips 10 because we wanted to be able to move item in between items (based on ordinal) without having to reassign ordinal number for the entire table.
As you can imagine eventually, Ordinal number will need to be reassign somehow for a move in between operation either on surrounding rows or for the entire table as the unused ordinal numbers between the target items are all used up.
Is there any algorithm that I can use to effectively reorder the ordinal number for the move operation taken in the consideration like long term maintainability of the table and minimizing update operations of the table?
You can re-number the sequences using a somewhat complicated UPDATE statement:
UPDATE u
SET u.sequence = 10 * (c.num_below-1)
FROM test u
JOIN (
SELECT t.id, count(*) AS num_below
FROM test t
JOIN test tr ON tr.sequence <= t.sequence
GROUP BY t.id
) c ON c.id=u.id
The idea is to obtain a count of items with the sequence lower than that of the current row, multiply the count by ten, and assign it as the new count.
The content of test before the UPDATE:
ID Sequence
__ ________
1 0
2 10
3 20
4 12
The content of test after the UPDATE:
ID Sequence
__ ________
1 0
2 30
3 10
4 20
Now the sequence numbers are evenly spread again, so you can continue inserting in the middle until you run out of new sequence numbers; then you can re-number again.
Demo.
These won't answer your question directly--I just thought I might suggest some other approaches:
One possibility--don't try to do it by hand. Have your software manage the numbers. If they need re-writing, just save them with new numbers.
a second--use a "Linked List" instead. In each record store the index of the next record you want displayed, then have your code load that directly into a linked list.
Yet another simple approach. Let's say you're inserting a new record with an ordinal equal x.
First, check if there's a row having ordinal value equal x. In case there's one, just update all the records having the ordinal value equal or bigger than x increasing them by y. Then, you are safe to insert a new record.
This way you're sure you'll not run update every time and of course, you'll keep the order.

Access SQL how to make an increment in SELECT query

I Have an SQL query giving me X results, I want the query output to have a coulmn called
count making the query somthing like this:
count id section
1 15 7
2 3 2
3 54 1
4 7 4
How can I make this happen?
So in your example, "count" is the derived sequence number? I don't see what pattern is used to determine the count must be 1 for id=15 and 2 for id=3.
count id section
1 15 7
2 3 2
3 54 1
4 7 4
If id contained unique values, and you order by id you could have this:
count id section
1 3 2
2 7 4
3 15 7
4 54 1
Looks to me like mikeY's DSum approach could work. Or you could use a different approach to a ranking query as Allen Browne described at this page
Edit: You could use DCount instead of DSum. I don't know how the speed would compare between the two, but DCount avoids creating a field in the table simply to store a 1 for each row.
DCount("*","YourTableName","id<=" & [id]) AS counter
Whether you go with DCount or DSum, the counter values can include duplicates if the id values are not unique. If id is a primary key, no worries.
I frankly don't understand what it is you want, but if all you want is a sequence number displayed on your form, you can use a control bound to the form's CurrentRecord property. A control with the ControlSource =CurrentRecord will have an always-accurate "record number" that is in sequence, and that will update when the form's Recordsource changes (which may or may not be desirable).
You can then use that number to navigate around the form, if you like.
But this may not be anything like what you're looking for -- I simply can't tell from the question you've posted and the "clarifications" in comments.
The only trick I have seen is if you have a sequential id field, you can create a new field in which the value for each record is 1. Then you do a running sum of that field.
Add to your query
DSum("[New field with 1 in it]","[Table Name]","[ID field]<=" & [ID Field])
as counterthing
That should produce a sequential count in Access which is what I think you want.
HTH.
(Stolen from Rob Mills here:
http://www.access-programmers.co.uk/forums/showthread.php?p=160386)
Alright, I guess this comes close enough to constitute an answer: the following link specifies two approaches: http://www.techrepublic.com/blog/microsoft-office/an-access-query-that-returns-every-nth-record/
The first approach assumes that you have an ID value and uses DCount (similar to #mikeY's solution).
The second approach assumes you're OK creating a VBA function that will run once for EACH record in the recordset, and will need to be manually reset (with some VBA) every time you want to run the count - because it uses a "static" value to run its counter.
As long as you have reasonable numbers (hundreds, not thousands) or records, the second approach looks like the easiest/most powerful to me.
This function can be called from each record if available from a module.
Example: incrementingCounterTimeFlaged(10,[anyField]) should provide your query rows an int incrementing from 0.
'provides incrementing int values 0 to n
'resets to 0 some seconds after first call
Function incrementingCounterTimeFlaged(resetAfterSeconds As Integer,anyfield as variant) As Integer
Static resetAt As Date
Static i As Integer
'if reset date < now() set the flag and return 0
If DateDiff("s", resetAt, Now()) > 0 Then
resetAt = DateAdd("s", resetAfterSeconds, Now())
i = 0
incrementingCounterTimeFlaged = i
'if reset date > now increments and returns
Else
i = i + 1
incrementingCounterTimeFlaged = i
End If
End Function
autoincrement in SQL
SELECT (Select COUNT(*) FROM table A where A.id<=b.id),B.id,B.Section FROM table AS B ORDER BY B.ID Asc
You can use ROW_NUMBER() which is in SQL Server 2008
SELECT ROW_NUMBER() OVER (ORDER By ID DESC) RowNum,
ID,
Section
FROM myTable
Then RowNum displays sequence of row numbers.