Get the size of my kube-audit log ingested daily in Azure - azure-log-analytics

I would like to know how I can get the size (in GB) of my kube-audit logs ingested on a daily basis. Is there a KQL query I can run in my Log Analytics workspace to find that out?
The reason I ask is that I would like to calculate the Azure consumption. Thanks.

By using the Usage table, it is possible to review how much data was ingested into a Log Analytics workspace.
The scope spans from solutions to data types (which usually correlate to the destination table, but not always).
Kube-audit is exportable by default only to the AzureDiagnostics table, a table shared among many Azure resources, hence it is impossible to differentiate the source of each record within the total count.
For example, I've been using the following query to review how much data was ingested at the scope of my AzureDiagnostics table in the last 10 days:
Usage
| where TimeGenerated > startofday(ago(10d))
| where DataType == 'AzureDiagnostics'
| summarize IngestedGB = sum(Quantity) / 1000 by bin(TimeGenerated, 1h) // Quantity is reported in MB
| render timechart
In my case all of that data originated from kube-audit logs, but that won't be the case for most users; you can break AzureDiagnostics down by Category to check:
AzureDiagnostics
| where TimeGenerated > startofday(ago(10d))
| summarize count() by bin(TimeGenerated, 1h), Category
| render timechart
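If other resources in your workspace also write to AzureDiagnostics, a rough per-day figure for just the kube-audit share can be taken from the _BilledSize column. This is only a sketch, assuming _BilledSize is populated for these records (it holds the billed size of each record in bytes) and that your AKS cluster emits the logs under the kube-audit category:
AzureDiagnostics
| where TimeGenerated > startofday(ago(10d))
| where Category == "kube-audit"
// _BilledSize is in bytes; divide by 1e9 for GB
| summarize IngestedGB = sum(_BilledSize) / 1e9 by bin(TimeGenerated, 1d)
| render timechart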

Related

How to write a Splunk query to count response codes of multiple endpoints

I'm trying to monitor performance/metrics of my application while an external system goes through a heavy data ingest. Currently, I can easily watch one endpoint using the following:
index=my_index environment=prod service=myservice api/myApi1 USER=user1 earliest=07/19/2021:12:00:00 | stats count by RESPONSECODE
How can I adjust this query to include the additional endpoints I'd like to monitor? Ultimately I'd like a pie chart showing the total numbers of successes and failures across this API for the user.
Thanks all!
Edit: In the above query, api/myApi1 is the field I'm referring to. How can I include additional api/myApi# endpoints properly?
Include additional endpoints by adding them to the base query or by making the base query less specific, for example:
index=my_index environment=prod service=myservice api/myApi1 USER IN (user1, user2, user3) earliest=07/19/2021:12:00:00
| stats count by USER, RESPONSECODE
OR
index=my_index environment=prod service=myservice api/myApi1 USER=* earliest=07/19/2021:12:00:00
| stats count by USER, RESPONSECODE
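To address the edit about the api/myApi# endpoints specifically, a sketch along these lines should work, assuming the endpoint names appear as plain searchable terms in the events (adjust if they are extracted into a field) and that response codes below 400 count as successes:
index=my_index environment=prod service=myservice (api/myApi1 OR api/myApi2 OR api/myApi3) USER=user1 earliest=07/19/2021:12:00:00
| eval outcome=if(tonumber(RESPONSECODE) < 400, "success", "failure")
| stats count by outcome
Rendered as a pie chart, that gives the total successes and failures across the listed endpoints for the user.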

Few hourly events are missing in the splunk dashboard

I need help with the weird issue below.
We have a Splunk query that is built on the logs. Below is the query used:
index=aws_lle_airflow "INFO - Source count for aws-bda-lle.marketing.bq_mktg_campaign:"
| rex field=_raw "INFO - Source count for aws-bda-lle.marketing.bq_mktg_campaign: (?<bqTableRecordCount>[^\"]+)"
| table _time bqTableRecordCount
| sort _time
The problem is that, going by the table inserts, ideally 10 events should be displayed in the dashboard, but only 8 events are shown.
Even though the logs are written, I am not sure why the dashboard is not reflecting them. Could someone help me understand what the issue could be and what needs to be done to resolve it?

Splunk query to get user, saved search name, last time the query ran

From Splunk, I am trying to get the user, the saved search name, and the last time a query ran.
A single Splunk query would be nice.
I am very new to Splunk and I have tried this query:
index=_audit action=search info=granted search=*
| search IsNotNull(savedsearch_name) user!="splunk-system-user"
| table user savedserach_name user search _time
The above query always comes back empty for savedsearch_name.
Splunk's audit log leaves a bit to be desired. For better results, search the internal index.
index=_internal savedsearch_name=* NOT user="splunk-system-user"
| table user savedsearch_name _time
You won't see the search query, however. For that, use REST.
| rest /services/saved/searches | fields title search
Combine them with something like this (there may be other ways):
index=_internal savedsearch_name=* NOT user="splunk-system-user"
| fields user savedsearch_name _time
| join savedsearch_name [| rest /services/saved/searches
| fields title search | rename title as savedsearch_name]
| table user savedsearch_name search _time
Note that you have a typo in your query. "savedserach_name" should be "savedsearch_name".
But I also recommend a free app that has a dedicated search tool for this purpose.
https://splunkbase.splunk.com/app/6449/
Specifically the "user activity" view within that app.
Why it's a complex problem: part of the puzzle is in the audit log's info="granted" event, another part is in the audit log's info="completed" event, and even more of it is over in the introspection index. You need those three stitched together, and the audit log is plagued with parsing problems; autokv compounds the problem by extracting fields from the SPL itself.
That User Activity view will do all of this for you, sidestep some pretty thorny autokv problems in the audit data, and not just give you all of this per search, but also present stats and rollups by user, app, dashboard, and even by the sourcetypes that were actually searched.
It also has a macro called "calculate pain" that scores a "pain" number for each search and then sums up all the "pain" in the by-user, by-app, by-sourcetype rollups, etc., so that admins can try to pick off the worst offenders first.
It's up on Splunkbase and approved for both Cloud and on-prem: https://splunkbase.splunk.com/app/6449/
(And there's a #sideview_ui channel for it in the community Slack.)
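If you would rather hand-roll part of this in plain SPL instead of installing the app, a minimal sketch that stitches just the granted and completed audit events together by search_id (assuming those fields are populated in your environment) might look like:
index=_audit action=search (info=granted OR info=completed) user!="splunk-system-user" savedsearch_name=*
| stats min(_time) AS granted_time max(_time) AS completed_time values(user) AS user BY search_id savedsearch_name
| stats max(completed_time) AS last_run values(user) AS user BY savedsearch_name
| convert ctime(last_run)
That still won't give you the search text itself; the rest call shown earlier remains the way to pull that in.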

SSIS ForEach ADO Enumerator - Performance Issues

This is a best-practice/alternative-approach question about using an ADO Enumerator ForEach loop.
My data is financial accounts, coming from a source system into a data warehouse.
The current structure of the data is a list of financial transactions, e.g.:
+-----------------------+----------+-----------+------------+------+
| AccountGUID           | Increase | Decrease  | Date       | Tags |
+-----------------------+----------+-----------+------------+------+
| 00000-0000-0000-00000 | 0        | 100.00    | 01-01-2018 | Val1 |
| 00000-0000-0000-00000 | 200.00   | 0         | 03-01-2018 | Val3 |
| 00000-0000-0000-00000 | 400.00   | 0         | 06-01-2018 | Val1 |
| 00000-0000-0000-00000 | 0        | 170.00    | 08-01-2018 | Val1 |
| 00000-0000-0000-00002 | 200.00   | 0         | 04-01-2018 | Val1 |
| 00000-0000-0000-00002 | 0        | 100.00    | 09-01-2018 | Val1 |
+-----------------------+----------+-----------+------------+------+
My SSIS package currently has two ForEach loops:
All Time Balances
End Of Month Balances
All Time Balances
It passes AccountGUID into the loop and selects all transactions for that account. It then orders them by date, earliest transaction first, and assigns each a sequence number.
Once the sequence number is assigned, it calculates the running balances based on the Increase and Decrease columns, along with the Tags column to work out which balance it is dealing with (a set-based sketch of this step follows the workflow below).
It finishes off by flagging the latest record as Current.
All Time Balances - Work Flow
->Get All Account ID's in Staging table
|-> Write all Account GUID's to object variable
|--> ADO Enumerator ForEach - Loop Account GUID List - Write GUID to variable
|---> (Data Flow) Select all transactions for Account GUID
|----> (Data Flow) Order all transactions by date and assign Sequence number
|-----> (Data Flow) Run each row through a script component transformation to calculate running totals for each record
|------> (Data Flow) Insert balance data into staging table
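For reference, the sequencing and running-balance step described above can also be expressed set-based with window functions. This is only a sketch, not the package's actual SQL: StagingTransactions is a hypothetical table name, and the columns are taken from the sample data above.
SELECT
    AccountGUID,
    Tags,
    [Date],
    -- sequence number per account/tag, ordered by transaction date
    ROW_NUMBER() OVER (PARTITION BY AccountGUID, Tags ORDER BY [Date]) AS SequenceNumber,
    -- running balance: cumulative increases minus decreases up to this row
    SUM(Increase - Decrease) OVER (
        PARTITION BY AccountGUID, Tags
        ORDER BY [Date]
        ROWS UNBOUNDED PRECEDING) AS RunningBalance,
    -- flag the most recent transaction per account/tag as Current
    CASE WHEN ROW_NUMBER() OVER (PARTITION BY AccountGUID, Tags ORDER BY [Date] DESC) = 1
         THEN 1 ELSE 0 END AS IsCurrent
FROM StagingTransactions;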
End Of Month Balances
The second package, End of Month, does something very similar, with the exception of a second loop. The select finds the earliest transactional record and the latest transactional record; using those two dates it figures out all the months between them and loops over each of those months.
Inside the date loop it does pretty much the same thing: it works out the balances based on tags and stamps the end-of-month record for each account.
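The month enumeration itself is straightforward to sketch in T-SQL as well, again against the hypothetical StagingTransactions table:
-- one row per month-end between the first and last transaction dates
WITH Bounds AS (
    SELECT MIN([Date]) AS FirstDate, MAX([Date]) AS LastDate
    FROM StagingTransactions
),
MonthEnds AS (
    SELECT EOMONTH(FirstDate) AS MonthEnd, LastDate FROM Bounds
    UNION ALL
    SELECT EOMONTH(DATEADD(MONTH, 1, MonthEnd)), LastDate
    FROM MonthEnds
    WHERE MonthEnd < EOMONTH(LastDate)
)
SELECT MonthEnd FROM MonthEnds
OPTION (MAXRECURSION 0);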
The Issue/Question
All of this currently works fine, but the performance is horrible.
In one database with approximately 8,000 accounts and 500,000 transactions, this process takes upwards of a day to run. This being one of our smaller clients, I tremble at the idea of running it against our heavier databases.
Is there a better approach to doing this, using SQL cursors or some other neat way I have not seen?
Ok, so I have managed to take my package execution from around 3 days to about 11 minutes all up.
I ran a profiler and standard Windows stats while running the loops and found a few interesting things.
Firstly, there was almost no utilization of HDD, CPU, RAM or network during the execution of the packages. It told me what I kind of already knew, that it was not running as quickly as it could.
What I did notice was that between each iteration of the loop there was a 1 to 2 ms delay before the next instance of the loop started executing.
Eventually I found that every time a new instance of the loop began, SSIS created a new connection to the SQL database; it appears that this is SSIS's default behavior. Whenever you create a Source or Destination, you are adding a connection delay to your project.
The Fix:
Now this was an odd fix: you need to go into your connection manager, and (the odd bit) it must be via the on-screen designer window, not the project manager pane on the right.
If you select the connection that is referenced in the loop, the properties window on the right side (in my layout anyway) shows an option called "RetainSameConnection", which by default is set to False.
By setting this to True, I eliminated the 2 ms delay.
Considerations:
In doing this I created a heap of other issues, which really just highlighted areas of my package that I had not thought out well.
Some things that appeared to be impacted by this change were stored procedures that used temp tables; these seemed to break instantly. I assume that is because of how SQL handles temp tables: when the connection is closed and reopened, you can be pretty certain the temp table is gone, whereas with the connection retained, colliding with leftover temp tables becomes an issue again.
I removed all temp tables and replaced them with CTEs, which appears to fix the issue.
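As a trivial illustration of the kind of change involved (hypothetical names, not the actual procedures from the package):
-- before: the temp table lives on the connection, so a retained connection can collide with it
SELECT AccountGUID, SUM(Increase - Decrease) AS Balance
INTO #AccountBalance
FROM StagingTransactions
GROUP BY AccountGUID;

SELECT * FROM #AccountBalance;

-- after: a CTE is scoped to the single statement, so nothing lingers on the connection
WITH AccountBalance AS (
    SELECT AccountGUID, SUM(Increase - Decrease) AS Balance
    FROM StagingTransactions
    GROUP BY AccountGUID
)
SELECT * FROM AccountBalance;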
The second major issue I found was with tasks that ran in parallel and used the same connection manager. From this I received an error that SQL was still trying to run the previous statement, which bombed out my package.
To get around this, I created duplicate connection managers (all up I made three connection managers for the same database).
Once I had my connections set up, I went into each of my parallel Sources and Destinations and assigned them their own connection manager. This appears to have resolved the last error I received.
Conclusion:
There may be more unforeseen issues in doing this, but for now my packages are lightning quick, and the exercise highlighted some faults in my design.

How can I find all dashboards in Splunk, with usage information?

I need to locate data that has become stale in our Splunk instance, so that I can remove it.
I need a way to find all the dashboards and sort them by usage. From the audit logs I've been able to find all the actively used ones, but since my goal is to remove data, what I most need are the dashboards not in use.
Any ideas?
You can get a list of all dashboards using | rest /services/data/ui/views | search isDashboard=1. Try combining that with your search for active dashboards to get those that are not active.
| rest /services/data/ui/views | search isDashboard=1 NOT [<your audit search> | fields id | format]
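For the usage side of the equation, one rough sketch (assuming your search head's web access logs land in _internal with sourcetype splunk_web_access and that the dashboard name appears in the request URI; both vary by deployment) is:
index=_internal sourcetype=splunk_web_access
| rex field=uri "/app/(?<app>[^/]+)/(?<dashboard>[^/?\s]+)"
| search dashboard=* dashboard!=search
| stats count AS views latest(_time) AS last_viewed by app dashboard
| sort -last_viewed
Dashboards returned by the rest call but missing from this list (or with a very old last_viewed) are candidates for removal.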