OpenRefine - add sequence number, reset for each record - openrefine

I have some records containing multiple rows. I want to give each row within a record a unique ID based on the string in the first row, containing the original ID + _01 _02 _03 and so forth.
Then I would like the counter to reset when the next record with a new string begins, and repeat the above.

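You can use the following GREL expression to generate the ID: value + (row.index - row.record.fromRowIndex + 1). The parentheses keep the arithmetic separate from the string concatenation, so the row's position within its record is computed first and then appended to the value. You can read more about row and record index here: http://kb.refinepro.com/2012/06/creating-row-and-record-index.html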
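To make the reset-per-record logic concrete outside OpenRefine, here is a minimal Python sketch of the same idea. It assumes the rows arrive as a list of ID strings in which a blank entry means "same record as the previous non-blank ID", and that the first row is non-blank. The function name add_sequence_ids and the sample IDs are made up for illustration.

```python
def add_sequence_ids(rows):
    """Return one ID per row: the record's original ID plus _01, _02, ...

    rows is a list of ID strings. A blank entry (None or "") means the row
    belongs to the record started by the most recent non-blank ID.
    """
    result = []
    current_id = None
    counter = 0
    for value in rows:
        if value:            # a non-blank ID starts a new record
            current_id = value
            counter = 0      # reset the counter for the new record
        counter += 1
        result.append(f"{current_id}_{counter:02d}")
    return result


print(add_sequence_ids(["A17", "", "", "B42", ""]))
```

The counter plays the role of row.index - row.record.fromRowIndex + 1: it is the 1-based position of the row inside its record.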

Related

SQL query - How to achieve the subsequent column updation by summing up the value of current row in Single select query (need to avoid while loop)

The logic we are trying to achieve in a single query is as follows.
1. We loop over the row number column. On each iteration we sum the remaining value and the new value. The result is written to the "by summing up" column, and its decimal part is written to the decimal value column.
2. Next, we sum the decimal value column, grouped by row number, and the result is written to the remaining value column of the next row number.
3. Steps 1 and 2 are repeated until we reach the last record.
We achieved this with a WHILE loop, but we are trying to achieve it without one.
Can someone please suggest how to do this?
Please refer the attached image for understanding table

Talend - Count row on tOracleInput

May I ask how to count the rows of tOracleInput and pass the count to tOracleOutput? At the same time, can I sum the values of a column, i.e. SUM(tOracleOutput.OS_BALANCE)?
You could use the tAggregateRow component for this.
Leave the group-by parameters empty and create an output schema that holds the sum and the count. The single row it generates is then fed to tOracleOutput.
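As a rough illustration of what an aggregation with no group-by columns produces, the following Python sketch reduces all input rows to a single output row holding a count and a sum. It is not Talend code. The output field names ROW_COUNT and SUM_OS_BALANCE are hypothetical, and only OS_BALANCE comes from the question.

```python
from decimal import Decimal


def aggregate_all(rows, column="OS_BALANCE"):
    """Collapse all rows into one row with a row count and a column sum."""
    count = 0
    total = Decimal("0")
    for row in rows:
        count += 1
        # Decimal avoids floating-point drift when summing balances
        total += Decimal(str(row[column]))
    return {"ROW_COUNT": count, "SUM_OS_BALANCE": total}


summary = aggregate_all([{"OS_BALANCE": "10.50"}, {"OS_BALANCE": "4.25"}])
print(summary)
```

Because nothing is grouped, the result is always exactly one row, which is what gets written to the output.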

Sequential Numbers Automatically Populate a field

I have an Access database that we use to house Worker's Compensation Accident Information. One of the required fields is an OSHA recordable number that is sequential starting with "01" and the two digit year of the accident (ex. 01-14).
I need to be able to programmatically look into my table, see which numbers have already been used, and find the next number in the sequence. It also needs to reset to 1 at the beginning of a new year.
Example:
table reads
01-14
02-14
03-14
The new number that populates the textbox should be 04-14
Help!
Given that you have a multi-user database, see "insert query with sequential primary key" or "Access VBA: Find max number in column and add 1" for code to get the next sequential number.
You can use Year and the seed number to create the next OSHA number. You can reset the seed number on year change.
DMax is a possibility, but I strongly suggest you do not use it in a multi-user database.
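The "find the highest number used this year and add one" logic can be sketched in Python as follows. This is only an illustration of the arithmetic, not Access code. It assumes every stored number is well formed as NN-YY, and it does nothing about two users asking for a number at the same time, which is the multi-user concern raised above. The function name next_osha_number is made up.

```python
def next_osha_number(existing, year):
    """Return the next NN-YY number for the given four-digit year."""
    suffix = f"{year % 100:02d}"
    # Sequence numbers already used in this year
    used = [int(n.split("-")[0]) for n in existing if n.split("-")[1] == suffix]
    # max(..., default=0) restarts the sequence at 1 when the year has no entries
    next_seq = max(used, default=0) + 1
    return f"{next_seq:02d}-{suffix}"


print(next_osha_number(["01-14", "02-14", "03-14"], 2014))
print(next_osha_number(["01-14", "02-14", "03-14"], 2015))
```

With the example table from the question, the first call yields 04-14, and the second shows the reset for a new year.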
From what you have described, it sounds like the OSHA number should be generated as and when needed instead of being assigned and stored when the table row is created. My suggestion would be to just have the primary key (accident_id) be an autoincremented long integer, the standard practice in Access. And also you need an accident_date column to be a datetime or similar. Or at the very least an accident_year column. Then when you need an OSHA number (say in a report), just have some VBA to generate it using the primary key--accident_id--and the accident_date or accident_year column. You will be taking advantage of the fact that Access autoincremented primary keys are never re-used--even when their rows are deleted, those numbers are never recycled to be used in other rows. So given a long integer primary key, and a date, you can always reproduce the exact same OSHA number, with a simple algorithm something like the following:
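Function osha_number(accident_id As Long) As String
    ' Assumes a table named accidents with accident_id and accident_date columns
    Dim accident_date As Date
    Dim year_first_accident_id As Long
    Dim year_this_accident_num As Long
    ' Get the accident date from the accidents table using the ID
    accident_date = DLookup("accident_date", "accidents", "accident_id = " & accident_id)
    ' Get the ID of the first accident of this year
    year_first_accident_id = DMin("accident_id", "accidents", "Year(accident_date) = " & Year(accident_date))
    year_this_accident_num = accident_id - year_first_accident_id + 1
    osha_number = Format(year_this_accident_num, "00") & "-" & Format(accident_date, "yy")
End Function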

Datatable Compute Method filter on row number

I use a query which fetches, say, 50 records and passes them to a datatable. These records are then displayed in a tabular format. The display is paginated, showing 10 records at a time. The user can move to the next or previous set of records, or move forward or backward by 1 record.
I have to find Min and Max of a column for the set of record currently visible. I am planning to use Compute method but I am not sure if it allows filtering on anything other than the columns in datatable.
Do I have to include row number in my query or is there a better solution (something along the line mentioned below)?
CType(dtLineup.Compute("Min(ArrivalDate)", dt.row(2) to dt.row(12)), Date)
There is nothing like your pseudo code in MSDN on DataColumn.Expression. You could include a row number in your query, as you said, but an alternative is to add a row number column to your data table and use that in the filter expression.
DataColumn col = new DataColumn("rownumber", typeof(int));
col.AutoIncrement = true;
col.AutoIncrementSeed = 1;
datatable.Columns.Add(col);
Another alternative could be to do paging by linq (Skip-Take) and compute the aggregate function over the returned rows. But that may be a major departure of your current application structure.
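The Skip-Take idea can be sketched in a language-neutral way: slice out the visible rows, then aggregate over just that slice. The Python below is an analogue of that approach, not DataTable or LINQ code. The function name page_min_max and the sample data are assumptions, and only the ArrivalDate column name comes from the question.

```python
from datetime import date


def page_min_max(rows, column, first_row, page_size=10):
    """Return (min, max) of a column over the rows currently visible.

    first_row is the 0-based index of the first visible row, so moving
    forward by one record just means first_row + 1.
    """
    page = rows[first_row:first_row + page_size]   # Skip(first_row).Take(page_size)
    if not page:
        return None
    values = [row[column] for row in page]
    return min(values), max(values)


rows = [{"ArrivalDate": date(2024, 1, day)} for day in range(1, 31)]
print(page_min_max(rows, "ArrivalDate", 2))
```

The same window could be expressed as a filter on a row number column, for example rownumber >= 3 AND rownumber <= 12 for the page above.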

SSIS 2008 Row Count Transformation - Row count return 0

This should be rather simple but I don't know why I get Row Count as Zero when I use ROW COUNT transformation in Data Flow Task. I have created a variable(NoOfRecords) with Package scope.
Variable name set to variable NoOfRecords in Row Count Transformation.
Used a Derived column to assign the row count.
The package runs successfully and shows record count 265
But the Derived column shows record count as 0 instead of 265 rows.
After the Row Count, add an Aggregate task and select the count option in the Operation tab of the Aggregate task properties.
Then you can use the row count variable for further operation where it holds the total row count of the input file.
The Row Count variable is only set after all rows have passed.
You're adding the variable to each row as it passes through the Derived Column step, but at that time the variable has not yet been updated (that happens after all rows have passed), so the value 0 is correct.
You might be able to achieve this by using an asynchronous task before your Derived Column (but I'm not sure this will work, it just popped into my mind). Add a Sort or Aggregate step before your Derived Column and try again.
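The timing described above can be mimicked with Python generators. This is only an analogy for a streaming pipeline, not a model of SSIS internals: the counting step writes its variable after the last row has gone through, so a downstream step that reads the variable while rows are still flowing sees the initial value. All names except NoOfRecords are made up.

```python
variables = {"NoOfRecords": 0}


def row_count(rows):
    """Pass rows through unchanged and publish the total only at the end."""
    seen = 0
    for row in rows:
        seen += 1
        yield row
    variables["NoOfRecords"] = seen   # written only after the last row


def derived_column(rows):
    """Attach the current value of the variable to every row."""
    for row in rows:
        yield {**row, "RecordCount": variables["NoOfRecords"]}


source = [{"id": i} for i in range(265)]
output = list(derived_column(row_count(source)))

print(output[0]["RecordCount"])   # value seen while rows were flowing
print(variables["NoOfRecords"])   # value after the flow has finished
```

Every output row carries the stale value, while the variable itself holds the real total once the flow is done.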
I used this in the query as an efficient way of getting the row count:
count(all SnapshotDate) over () as nRowCount
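To see what a windowed count with an empty OVER () clause returns, here is a small Python sketch using the standard sqlite3 module. It assumes the bundled SQLite is version 3.25 or later, which is when window functions were added. The table name Snapshots and the sample dates are invented, and the query uses COUNT(SnapshotDate) rather than the count(all ...) spelling above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Snapshots (SnapshotDate TEXT)")
conn.executemany(
    "INSERT INTO Snapshots VALUES (?)",
    [("2014-01-01",), ("2014-01-02",), ("2014-01-03",)],
)

# OVER () with no partition makes the window the whole result set,
# so every row carries the total row count.
rows = conn.execute(
    "SELECT SnapshotDate, COUNT(SnapshotDate) OVER () AS nRowCount "
    "FROM Snapshots ORDER BY SnapshotDate"
).fetchall()

for row in rows:
    print(row)
```

Each returned row has the same nRowCount, so the total is available in the source query itself rather than from a package variable.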
Here's the successful technique for recording rows that worked in my situation.
The scenario is I want to log the rows migrated between tables. The RowCount doesn't get populated until you exit the DataFlow.
[Control Flow]
1. Data Flow Task
   a. Read origin data - Source control.
   b. Add RowCount transformation. Link a to b.
      Right-click RowCount and map to UserVariable (int64).
   c. Add Destination control for loading table.
   d. Link b to c.
2. Add Execute SQL Task to Control Flow. Right-click, Edit.
   INSERT SQL statement: Insert Into LogTable(rowcount) Values(?)
   Parameter Mapping:
   Variable         Direction  DataType  ParameterName  ParameterSize
   User::RowCount   INPUT      LONG      0              -1
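The same two-step pattern (move the rows first, log the count afterwards through a ? parameter) can be sketched in Python with sqlite3. This is an illustration of the ordering, not SSIS. The Origin and Destination tables and their columns are invented for the example, and only LogTable(rowcount) comes from the steps above.

```python
import sqlite3


def migrate_and_log(conn):
    """Copy rows from Origin to Destination, then log how many were moved."""
    # Step 1: the "data flow" - read the source and load the destination
    rows = conn.execute("SELECT id, name FROM Origin").fetchall()
    conn.executemany("INSERT INTO Destination (id, name) VALUES (?, ?)", rows)
    row_count = len(rows)   # only known once the flow has finished

    # Step 2: the "Execute SQL Task" - a parameterised insert into the log
    conn.execute("INSERT INTO LogTable (rowcount) VALUES (?)", (row_count,))
    conn.commit()
    return row_count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Origin (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE Destination (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE LogTable (rowcount INTEGER)")
conn.executemany("INSERT INTO Origin VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

print(migrate_and_log(conn))
print(conn.execute("SELECT rowcount FROM LogTable").fetchall())
```

The count is read only after the copy has completed, which mirrors using the variable in a later Control Flow task rather than inside the Data Flow.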