I want to know who accessed Hive tables in the last month. Audit logs are disabled and there is no security layer. Please help!
I am using Firebase Analytics and BigQuery with an average of 50~60 GB of daily data.
For the most recent daily table, a query gives a different result than it did yesterday, even though the query conditions are exactly the same, including the target date.
I just found that there is a 1~2 day gap between the table's creation date and its last modified date.
I assume the difference between the query results is because of this (calculating on a different data volume, maybe).
Does this date gap mean a single daily table needs at least 2 days to be fully loaded from the intraday table?
Thanks in advance.
BigQuery table info
In the documentation we can find the following information:
After you link a project to BigQuery, the first daily export of events creates a corresponding dataset in the associated BigQuery project. Then, each day, raw event data for each linked app populates a new daily table in the associated dataset, and raw event data is streamed into a separate intraday BigQuery table in real-time.
It seems that the intraday table is loaded into the main daily table each day, and if you want to access this data in real-time you'll have to use the separate intraday table.
If this information doesn't help you, please provide some extra information so I can help you more efficiently.
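As a rough sketch of what that means in practice (the project, dataset, and date below are placeholders; the export dataset is normally named analytics_<property_id>, and older Firebase exports used app_events_* instead of events_*), you can compare the finalized daily table with the still-changing intraday table for the same date:

    -- Finalized daily export (may lag behind, per the creation/last-modified gap you saw)
    SELECT COUNT(*) AS event_count
    FROM `my_project.analytics_123456789.events_20201104`;

    -- Real-time rows for the same day that have not been folded into the daily table yet
    SELECT COUNT(*) AS event_count
    FROM `my_project.analytics_123456789.events_intraday_20201104`;

Once the daily table stops being modified (your 1~2 day gap), its counts should stabilize.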
I want to take a backup of my database xyz.
The tables in this backup should contain all records for the last ten days only.
Is it possible? If yes, how can I achieve it?
You could check the answers posted here.
Or if you specify 10 days because that was the date of the LAST backup operation, you can use MySQL Backup's Incremental backup operations.
If you need to capture some of the DB to synchronize it with a different DB, this SQLyog information might be helpful.
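If the goal is literally a dump that only contains the last ten days of rows, one minimal sketch (assuming every table has a datetime column, hypothetically named created_at here) is mysqldump's --where option, which applies the same filter to every dumped table:

    mysqldump -u root -p \
      --where="created_at >= NOW() - INTERVAL 10 DAY" \
      xyz > xyz_last_10_days.sql

This only works if all tables share such a column; otherwise you'd dump the relevant tables individually, each with its own --where condition.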
I want to get the number of users who have used a particular table, or all of the tables, in any DML statements in Teradata.
You will need to have enabled Query Logging with OBJECTS to capture this information in the data dictionary (DBC). Typically this data is moved from DBC to a set of historical tables elsewhere on the system for analysis and audit purposes. Check with your DBA team for how they are managing DBQL within your environment.
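As a rough sketch (assuming DBQL with OBJECTS is enabled and your release exposes the standard DBC.QryLogV and DBC.QryLogObjectsV views; the database and table names are placeholders, and column names should be verified against your Teradata version), something along these lines counts the distinct users that touched a given table:

    SELECT COUNT(DISTINCT q.UserName) AS user_count
    FROM DBC.QryLogV q
    JOIN DBC.QryLogObjectsV o
      ON  q.ProcID  = o.ProcID
      AND q.QueryID = o.QueryID
    WHERE o.ObjectDatabaseName = 'MyDatabase'   -- placeholder
      AND o.ObjectTableName    = 'MyTable'      -- placeholder
      AND o.ObjectType         = 'Tab';

If your DBA team already moves DBQL data into history tables, point the query at those tables instead.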
I have an audit table set up which essentially mirrors one of my tables, along with a date, user, and command type. Here's what it might look like:
AuditID  UserID  Individual    modtype  user  audit_performed
1        1239    Day Meff      INSERT   dbo   2010-11-04 14:50:56.357
2        2334    Dasdf fdlla   INSERT   dbo   2010-11-04 14:51:07.980
3        3324    Dasdf fdla    DELETE   dbo   2010-11-04 14:51:11.130
4        5009    Day Meffasdf  UPDATE   dbo   2010-11-04 14:51:12.777
Since these types of tables can get big pretty quickly, I was thinking of putting in some sort of automatic deletion of the older rows. So, for example, if I have 3 months of history, I could delete the first month while retaining the last two. And all of this must be automatic: I imagine once a certain date is hit, a query activates and deletes the oldest month of audit data. What is the best way to do this?
I'm using SQL Server 2005 by the way.
A SQL agent job should be fine here. You definitely don't need to do this on every single insert with a trigger. I doubt you even need to do it every day. You could schedule a job that runs once a month and clears out anything older than 2 months (so at most you'd have 3 months of data minus 1 day at any given time).
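A minimal sketch of the statement such a monthly job step could run (the table name AuditTable is a placeholder; the column name comes from the sample above):

    DELETE FROM AuditTable
    WHERE audit_performed < DATEADD(MONTH, -2, GETDATE());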
You could use SQL Server Agent. You can schedule a repeating job, such as deleting entries from the current audit table after a certain period. Here is how you would do it.
I would recommend storing the data in another table, say an audit_archive table, and deleting it from the current audit table. That way, in case you want some history you still have it, and your table also doesn't get too big.
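A rough sketch of that archive-then-delete step, using placeholder table names (AuditTable, audit_archive) and the columns from the sample data above:

    BEGIN TRAN;

    -- Copy the old rows into the archive table
    INSERT INTO audit_archive (AuditID, UserID, Individual, modtype, [user], audit_performed)
    SELECT AuditID, UserID, Individual, modtype, [user], audit_performed
    FROM AuditTable
    WHERE audit_performed < DATEADD(MONTH, -2, GETDATE());

    -- Then remove them from the live audit table
    DELETE FROM AuditTable
    WHERE audit_performed < DATEADD(MONTH, -2, GETDATE());

    COMMIT;

Schedule it as a SQL Agent job so it runs without manual intervention.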
You could try a trigger: every time a row is added, it clears anything older than 3 months.
You could also use SQL Server Agent to run a script every day that does the same thing.
Have you looked at using triggers? You could define a trigger to run when you add a row (on INSERT) that deletes any rows that are more than three months old.
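A minimal sketch of such a trigger (the trigger and table names are placeholders; note the earlier point that doing this on every insert adds overhead that a scheduled job avoids):

    CREATE TRIGGER trg_AuditTable_Purge
    ON AuditTable
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Purge anything older than three months whenever a new row arrives
        DELETE FROM AuditTable
        WHERE audit_performed < DATEADD(MONTH, -3, GETDATE());
    END;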
We have a warehouse database that contains a year of data up to now. I want to create a report database that holds the last 3 months of data for reporting purposes, and I want to keep the two databases in sync. Right now, every 10 minutes I execute a package that grabs the most recent rows from the warehouse and adds them to the report db. The problem is that I only get new rows, not updates to existing rows.
I would like to know what are the various ways of solving this scenario.
Thanks
Look into replication, mirroring, or log shipping.
If you are using SQL 2000 or below, replication is your best bet. Since you are doing this every ten minutes, you should definitely look at transactional replication.
If you are using SQL 2005 or greater, you have more options available to you. Database snapshots, log shipping, and mirroring as SQLMenace suggested above. The suitability of these vary depending on your hardware. You will have to do some research to pick the optimal one for your needs.
You should probably read about replication, or ask your DB admin about it.
Is it possible to add columns to this database? You could add a Last_Activity column to the table and write a trigger that updates the date/timestamp on that row to reflect the latest edit. For any new entries, the date/time would reflect the timestamp when the row was added.
This way, when you grab the last three months, you'd be grabbing the last three months' activity, not just the new stuff.
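A rough sketch of that approach (WarehouseTable, its key column Id, and Last_Activity are placeholder names):

    ALTER TABLE WarehouseTable
        ADD Last_Activity DATETIME NOT NULL DEFAULT GETDATE();
    GO

    CREATE TRIGGER trg_WarehouseTable_LastActivity
    ON WarehouseTable
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Stamp the modified rows with the time of the edit
        UPDATE w
        SET Last_Activity = GETDATE()
        FROM WarehouseTable w
        JOIN inserted i ON i.Id = w.Id;
    END;

Your 10-minute package would then filter on Last_Activity instead of only picking up newly inserted rows.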