How to run a Cucumber scenario multiple times using data coming from a database in Java - cucumber-jvm

Below is my scenario:
Scenario: Delete Customer
Given We declare a new Request
And We have below Path parameters
| userid | |
| magcode | |
And We have below Header parameters
| sharedsecret | |
And We log the Request
When We send Delete request to service "DeleteCustomerWebservice"
Then The response status code should be 200
Here I am deleting a customer only once, but I have to do the same thing multiple times, with the data coming from a database.

I don't think you can use your database as a data table for your scenario; Cucumber isn't designed for this style, I believe. Instead, handle this in a do/while or for loop inside your test code to iterate over all the records of the result set, along the lines of the sketch below.
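
A rough illustration of that loop, assuming the customer keys live in a database table: the JDBC URL, the customers table, its column names and the sendDeleteRequest helper below are all invented for the sketch. In practice the helper would reuse whatever the "DeleteCustomerWebservice" step already does, and the whole loop could sit inside a single step definition such as "When We delete every customer found in the database".

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DeleteCustomersFromDatabase {

    public static void main(String[] args) throws Exception {
        // Placeholder connection details; use whatever your test environment provides
        // (the JDBC driver must be on the classpath).
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/testdb", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT userid, magcode, sharedsecret FROM customers")) {

            while (rs.next()) {
                String userId = rs.getString("userid");
                String magCode = rs.getString("magcode");
                String sharedSecret = rs.getString("sharedsecret");

                // Reuse the same logic the scenario's steps perform for one customer:
                // build the request, set the path and header parameters, send the
                // DELETE to "DeleteCustomerWebservice" and check the status code.
                int status = sendDeleteRequest(userId, magCode, sharedSecret);
                if (status != 200) {
                    throw new AssertionError(
                            "Delete failed for userid " + userId + ": HTTP " + status);
                }
            }
        }
    }

    // Hypothetical wrapper around the existing delete-request step implementation.
    private static int sendDeleteRequest(String userId, String magCode, String sharedSecret) {
        return 200; // stub so the sketch compiles; the real call goes here
    }
}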

Related

Filter data with parameter tableau

I'm trying to filter data with a parameter based on user_id in Tableau. There are 3 columns: user_id, content_team, seconds. Some sample data:
user_id | content_team | seconds
1 | AR | 2030
2 | VR | 1020
1 | AR | 1500
2 | VR | 3000
3 | HOLO | 2230
Right now I have the sum(seconds) of the content_team for all the users combined.
How can I create a parameter that takes a user_id and displays the sum(seconds) of the content_team for that particular user only? Thanks
The use case for using parameters in filters is usually that you have more than one data source that are not joined but do contain a common field. There is little benefit to using parameters when there is only one data source as simple filters work better.
But when you want common filters across multiple sources, can't join them into a common source, but don't want to lose some of the functionality in each sheet, then parameters work.
Say you have multiple data sources at different levels of detail but containing the same filterable field, say user_id. You can then create a parameter built on that field (and recent versions of Tableau can automatically update the contents of the parameter on loading the workbook). The way to filter across the sources is to create a parameter called, say, user_id_parameter, which is updated when the workbook is opened, and then to filter each individual sheet using a calculation such as [user_id] = user_id_parameter, where user_id in each case comes from the relevant data source. Then, when the parameter is changed, each separate data source will be filtered to show only the data with that user_id, even across multiple otherwise unlinked data sources.
This functionality (now that parameters can update on load) vastly simplifies the process of creating "universal" filters that work across multiple data sources in a workbook.
If you only have one data source, the technique still works but has little to no advantage over just filtering on the relevant field (and, don't forget, you can set a filter to apply to all sheets or a range of specified sheets).

SSIS ForEach ADO Enumerator - Performance Issues

This is a best practice / alternative approach question about using an ADO Enumerator ForEach loop.
My data is financial accounts, coming from a source system into a data warehouse.
The current structure of the data is a list of financial transactions, e.g.:
+-----------------------+----------+-----------+------------+------+
| AccountGUID | Increase | Decrease | Date | Tags |
+-----------------------+----------+-----------+------------+------+
| 00000-0000-0000-00000 | 0 | 100.00 | 01-01-2018 | Val1 |
| 00000-0000-0000-00000 | 200.00 | 0 | 03-01-2018 | Val3 |
| 00000-0000-0000-00000 | 400.00 | 0 | 06-01-2018 | Val1 |
| 00000-0000-0000-00000 | 0 | 170.00 | 08-01-2018 | Val1 |
| 00000-0000-0000-00002 | 200.00 | 0 | 04-01-2018 | Val1 |
| 00000-0000-0000-00002 | 0 | 100.00 | 09-01-2018 | Val1 |
+-----------------------+----------+-----------+------------+------+
My SSIS package currently has two ForEach loops:
All Time Balances
End Of Month Balances
All Time Balances
Passes AccountGUID into the loop and selects all transactions for that account. It then orders them by date, earliest transaction first, and assigns each a sequence number.
Once the sequence numbers are assigned, it calculates the running balances based on the Increase and Decrease columns, using the Tags column to work out which balance it is dealing with.
It finishes off by flagging the latest record as Current.
All Time Balances - Work Flow
->Get All Account ID's in Staging table
|-> Write all Account GUID's to object variable
|--> ADO Enumerator ForEach - Loop Account GUID List - Write GUID to variable
|---> (Data Flow) Select all transactions for Account GUID
|----> (Data Flow) Order all transactions by date and assign Sequence number
|-----> (Data Flow) Run each row through a script component transformation to calculate running totals for each record
|------> (Data Flow) Insert balance data into staging table
End Of Month Balances
The second package, End of Month, does something very similar, with the exception of a second loop. The select finds the earliest transactional record and the latest transactional record; using those two dates it works out all the months between them and loops over each of those months.
Inside the date loop it does pretty much the same thing: it works out the balances based on tags and stamps the end-of-month record for each account.
The Issue/Question
All of this currently works fine, but the performance is horrible.
In one database with approximately 8,000 accounts and 500,000 transactions, this process takes upwards of a day to run. This being one of our smaller clients, I tremble at the idea of running it against our heavier databases.
Is there a better approach to doing this, using SQL cursors or some other neat way I have not seen?
OK, so I have managed to take my package execution time from around 3 days down to about 11 minutes all up.
I ran a profiler and standard windows stats while running the loops and found a few interesting things.
Firstly, there was almost no utilization of HDD, CPU, RAM or network during the execution of the packages. This told me what I kind of already knew: it was not running as quickly as it could.
What I did notice was that between each execution of the loop there was a 1 to 2 ms delay before the next instance of the loop started executing.
Eventually I found that every time a new instance of the loop began, SSIS created a new connection to the SQL database; this appears to be SSIS's default behavior. Whenever you create a Source or Destination, you are adding a connection delay to your project.
The Fix:
Now this was an odd fix: you need to go into your connection manager, and (the odd bit) it must be the one in the on-screen window, not in the right-hand project manager window.
If you select the connection that is referenced in the loop, then in the properties window on the right side (in my layout anyway) you will see an option called "RetainSameConnection", which by default is set to false.
By setting this to true, I eliminated the 2 ms delay.
Considerations:
In doing this I created a heap of other issues, which really just highlighted areas of my package that I had not thought out well.
Some things that appeared to be impacted by this change were stored procedures that used temp tables; these seemed to break instantly. I assume that is because of how SQL Server handles temp tables: when the connection is closed and reopened, you can be pretty certain that the temp table is gone, whereas with the same-connection setting, leftover temp tables become an issue again.
I removed all temp tables and replaced them with CTE statements, which appears to have fixed the issue.
The second major issue I found was with tasks that ran in parallel and used the same connection manager. From this I received an error that SQL Server was still trying to run the previous statement, which bombed out my package.
To get around this, I created a duplicate connection manager (All up I made three connection managers for the same database).
Once I had my connections set up, I went into each of my parallel Source and Destinations and assigned them their own connection manager. This appears to have resolved the last error I received.
Conclusion:
There may be more unforeseen issues in doing this, but for now my packages are lightning quick, and this exercise highlighted some faults in my design.

how can I find all dashboards in splunk, with usage information?

I need to locate data that has become stale in our Splunk instance so that I can remove it.
I need a way to find all the dashboards and sort them by usage. From the audit logs I've been able to find all the actively used dashboards, but as my goal is to remove data, I mostly need the dashboards that are not in use.
Any ideas?
You can get a list of all dashboards using | rest /services/data/ui/views | search isDashboard=1. Try combining that with your search for active dashboards to get those that are not active.
| rest /services/data/ui/views | search isDashboard=1 NOT [<your audit search> | fields id | format]

How to handle concurrent requests that delete and create the same rows?

I have a table that looks like the following:
game_stats table:
id | game_id | player_id | stats | (many other cols...)
----------------------
1 | 'game_abc' | 8 | 'R R A B S' | ...
2 | 'game_abc' | 9 | 'S B A S' | ...
A user uploads data for a given game in bulk, submitting both players' data at once. For example:
"game": {
id: 'game_abc',
player_stats: {
8: {
stats: 'R R A B S'
},
9: {
stats: 'S B A S'
}
}
}
Submitting this to my server should result in the first table.
Instead of updating the existing rows when the same data is submitted again (with revisions, for example) what I do in my controller is first delete all existing rows in the game_stats table that have the given game_id:
class GameStatController
  def update
    # Delete every existing row for this game, then re-insert the submitted stats
    GameStat.where("game_id = ?", game_id).destroy_all
    params[:game][:player_stats].each do |player_id, stats|
      GameStat.create!(game_id: game_id, player_id: player_id, stats: stats[:stats])
    end
  end
end
This works fine with a single threaded or single process server. The problem is that I'm running Unicorn, which is a multi-process server. If two requests come in at the same time, I get a race condition:
Request 1: GameStat.where(...).destroy_all
Request 2: GameStat.where(...).destroy_all
Request 1: Save new game_stats
Request 2: Save new game_stats
Result: Multiple game_stat rows with the same data.
I believe somehow locking the rows or table is the way to go to prevent multiple updates at the same time - but I can't figure out how to do it. Combining with a transaction seems the right thing to do, but I don't really understand why.
EDIT
To clarify why I can't figure out how to use locking: I can't lock a single row at a time, since the row is simply deleted and not modified.
ActiveRecord doesn't support table-level locking by default. You'll have to either execute DB-specific SQL or use a gem like Monogamy.
Wrapping up the save statements in a transaction will speed things up if nothing else.
Another alternative is to implement the lock with Redis. Gems like redis-lock are also available. This will probably be less risky as it doesn't touch the DB, and you can set Redis keys to expire.
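
The question is a Rails one, and the redis-lock gem mentioned above does this for you, but the underlying pattern is stack-agnostic: atomically SET a per-game lock key with NX and an expiry, do the delete-and-reinsert while holding it, then release. Purely as an illustration of that pattern, here is a rough sketch in Java using the Jedis client; the key names, timeout and helper method are invented.

import java.util.UUID;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class GameStatsLock {

    public static void main(String[] args) {
        String gameId = "game_abc";                    // game id from the request payload
        String lockKey = "lock:game_stats:" + gameId;  // one lock per game
        String token = UUID.randomUUID().toString();   // identifies this request as the lock holder

        try (Jedis redis = new Jedis("localhost", 6379)) {
            // SET key value NX EX 30: only succeeds if the key does not already exist,
            // and the expiry means a crashed request cannot hold the lock forever.
            String reply = redis.set(lockKey, token, SetParams.setParams().nx().ex(30));
            if (!"OK".equals(reply)) {
                // Another request is already rewriting this game's stats; retry later or give up.
                return;
            }
            try {
                deleteAndReinsertStats(gameId);
            } finally {
                // Release only if we still own the lock. A small Lua script would make this
                // check-and-delete atomic; this simple form is close enough for a sketch.
                if (token.equals(redis.get(lockKey))) {
                    redis.del(lockKey);
                }
            }
        }
    }

    private static void deleteAndReinsertStats(String gameId) {
        // In the question this is the GameStat.where(...).destroy_all followed by the inserts,
        // ideally still wrapped in a database transaction.
    }
}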

Is it important to have an automated acceptance tests to test whether a field saves to a database?

I'm using SpecFlow as the automated acceptance testing framework and NHibernate for persistence. Many of the UI pages for an intranet application that I'm working on are basic data entry pages. Obviously adding a field to one of these pages is considered a "feature", but I can't think of any scenarios for this feature other than:
Given that I enter data X for field Y on Record 1
And I click Save
When I edit Record 1
Then I should see data X for field Y
How common and necessary is it to automate tests like this? Additionally, I'm using NHibernate, so it's not like I'm hand-rolling my own data persistence layer. Once I add a property to my mapping file, there is a high chance that it won't get deleted by mistake. When considering this, isn't a "one-time" manual test enough? I'm eager to hear your suggestions and experience in this matter.
I usually have scenarios like "successful creation of ..." that test the success case (you fill in all required fields, all input is valid, you confirm, and finally it is really saved).
I don't think that you can easily define a separate scenario for one single field, because usually the scenario of successful creation requires several other criteria to be met "at the same time" (e.g. all required fields must be filled).
For example:
Scenario: Successful creation of a customer
Given I am on the customer creation page
When I enter the following customer details
| Name | Address |
| Cust | My addr |
And I save the customer details
Then I have a new customer saved with the following details
| Name | Address |
| Cust | My addr |
Later I can add additional fields to this scenario (e.g. the billing address):
Scenario: Successful creation of a customer
Given I am on the customer creation page
When I enter the following customer details
| Name | Address | Billing address |
| Cust | My addr | Bill me here |
And I save the customer details
Then I have a new customer saved with the following details
| Name | Address | Billing address |
| Cust | My addr | Bill me here |
Of course there can be more scenarios related to the new field (e.g. validations) that you have to define or extend.
I think if you take this approach you can avoid having a lot of "trivial" scenarios. And I would argue that this is the success case of the "create customer" feature, which deserves at least one test.
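
For what it's worth, and tying back to the cucumber-jvm question at the top of this page: table-driven steps like the ones above look much the same in any Gherkin stack. The question here uses SpecFlow (C#), but the following is a rough Java/cucumber-jvm sketch of how such steps could be bound; the class, the method names and the omitted page-object/lookup code are all invented.

import java.util.List;
import java.util.Map;

import io.cucumber.datatable.DataTable;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;

public class CustomerSteps {

    private Map<String, String> enteredCustomer;

    @When("I enter the following customer details")
    public void iEnterTheFollowingCustomerDetails(DataTable table) {
        // asMaps() turns the Gherkin table into one map per row keyed by the header cells,
        // so adding a new column such as "Billing address" needs no change to this binding.
        List<Map<String, String>> rows = table.asMaps();
        enteredCustomer = rows.get(0);
        // Here you would fill in the creation form, e.g. through a page object (not shown).
    }

    @Then("I have a new customer saved with the following details")
    public void iHaveANewCustomerSavedWithTheFollowingDetails(DataTable expected) {
        // In a real binding you would read the customer back from the application or database;
        // this sketch only shows the data-table plumbing.
        Map<String, String> expectedCustomer = expected.asMaps().get(0);
        if (!expectedCustomer.equals(enteredCustomer)) {
            throw new AssertionError("Saved customer does not match expected: " + expectedCustomer);
        }
    }
}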