BigQuery Table Last Query Date - google-bigquery

I heavily use bigQuery and there are now quite a number of intermediate tables. Because teammates can upload their own tables, I do not understand all the tables well.
I want to check if a table have not been used for a long time, then check if it can be deleted manually.
Is there anyone know how to do?
Many thanks

You could use logs if you have access. If you made yourself familiar with how to filter log entries you can find out about your usage quite easily: https://cloud.google.com/logging/docs/quickstart-sdk#explore
There's also the possibility of exporting logs to big query - so you could analyze them using SQL - I guess that's even more convenient.

You can get table specific meta data via the TABLES command.
SELECT *,TIMESTAMP_MILLIS(LAST_MODIFIED_TIME) ACCESS_DATE
FROM [DATASET].__TABLES__
The mentioned code snippet should provide you with the last access date.

Related

How to get table/column usage statistics in Redshift

I want to find which tables/columns in Redshift remain unused in the database in order to do a clean-up.
I have been trying to parse the queries from the stl_query table, but it turns out this is a quite complex task for which I haven't found any library that I can use.
Anyone knows if this is somehow possible?
Thank you!
The column question is a tricky one. For table use information I'd look at stl_scan which records info about every table scan step performed by the system. Each of these is date-stamped so you will know when the table was "used". Just remember that system logging tables are pruned periodically and the data will go back for only a few days. So may need a process to view table use daily to get extended history.
I ponder the column question some more. One thought is that query ids will also be provided in stl_scan and this could help in identifying the columns used in the query text. For every query id that scans table_A search the query text for each column name of the table. Wouldn't be perfect but a start.

I have a database full of many databases and Tables on Hive, now how could I search and find the column of interest?

Long in short that I realized that Hue(Hive/Impala) is not like Microsoft SQL server that you run the following to look for the Table of Interest.
Select * from information_schema.columns where column_name like '%The_Table_of_Interest%'
1st scenario: Imagine that I know what my Database is and I target my attention to the right table by searching through the table and find the column of interest.
2nd scenario: I don't know even what database I need to look for the right table and as a result the column of interest.
I realized that in Hue, there is no option to look for a column. All I can see is Table Search!
Having said that for the two above scenarios there should be a way to find the column of interest.
Scenario 2 is of course difficult to approach, however the 1st one looks a bit easier.
Now, I did my research and came of with running some code in Shell Command Line might be helpful to find the target column. However, that require some further investigation in the layer that I am not quite familiar.(Speaking of Metaset, etc.)
Therefore, here is my question.
Assume we are discussing the 1st scenario, now how can I search and find the columns while you have no knowledge about the tables at all. I can not take guesses and try every tables to find the right one to find the column that I am looking for. What would you suggest, and what is your strategy to approach? Thank you in advance. :)
Good Day H2019
Here are some commands that should help you out to explore the different tables that you have access to:
Find a table or a database
show tables like 'ben*'
Look at the table definition
show create table <table>;
Get table information
describe my_table_01;
Get even more information
describe extended table_name
Get more information in a pretty format
describe formatted table_name;
If you have access to Apache Ranger I also find it useful to look into tables permissions. (And see who's using what)
Apache Atlas if you use it it helpful to see where data comes from.(It keeps data lineage information and may help to give you an understanding of how things work)
Don't forget you can look at HDFS to find databases, tables if they're in /hive/warehouse/. This can also be helpful to understand when things are created.

SQL Server Performance Tuning of large table

I have a table of 755 columns and around holding 2 million records as of now and it will grow.There are many procedures accessing it with other tables join, are running slow. Now it's hard to split/normalize them as everything is already built and customer is not ready to spend much on it. Is there any way to make the query access to that table faster? Please advise.
Will column store index help?
How little are they prepared to spend?
It may be possible to split this table into multiple 1 to 1 joined tables (vertical partitioning), then use a view to present it as one single blob to existing code.
With some luck you may get join elimination happening frequently enough to make it worthwhile.
View will probably require INSTEAD OF triggers to fully replicate existing logic. INSTEAD OF triggers have a number of restrictions e.g. no support for OUTPUT clause, which can prove to be to hard to overcome depending on your specific setup.
You can name your view the same as existing table, which will eliminate the need of fixing code everywhere.
IMO this is the simplest you can do short of a full DB re-factoring exercise.
See: http://aboutsqlserver.com/2010/09/15/vertical-partitioning-as-the-way-to-reduce-io/ and https://logicalread.com/sql-server-optimizer-may-eliminate-foreign-key-joins-mc11/#.WXgEzlERW6I
755 Columns thats a lot. You should try to index the columns that are mostly used in where clause. this might speed up the process
It is fine, dont worry about it, actually how many columns you have it is not important in sql server (But be careful I said 'have'). The main problem is data count and how many column you select in queries. There is a few point firstly you can check.
Do not use * selector and change it if used in everywhere
In the joins, do not use it directly, you can firstly filter it as inner select. (Just try it, I have no idea about your table so I m telling the general rules.)
Try the diminish data count for ex: use history table for old records. This technicque depends on needs of your organization.
Try to use column index and something like that features.
And of course remove dynamic selects in your queries.
I wish one of them will work.

Solution: get report faster from database with big data

I use oracle database. I have many table with data very big (300-500 million record).
I use query statement have join many table together. I set index for table but get report very slow.
Please, help me solution when working with big data.
Thanks.
Do you really need to have all the data at once?
Try creating a table that stores just the information you need for the report, and run a query once a day (or few hours) to update the table. You can also use Sql Server Integration Services, although I have not tried SSIS with Oracle myself.
I agree with the other users, you really need to give more info on the problem.

Strategy for identifying unused tables in SQL Server 2000?

I'm working with a SQL Server 2000 database that likely has a few dozen tables that are no longer accessed. I'd like to clear out the data that we no longer need to be maintaining, but I'm not sure how to identify which tables to remove.
The database is shared by several different applications, so I can't be 100% confident that reviewing these will give me a complete list of the objects that are used.
What I'd like to do, if it's possible, is to get a list of tables that haven't been accessed at all for some period of time. No reads, no writes. How should I approach this?
MSSQL2000 won't give you that kind of information. But a way you can identify what tables ARE used (and then deduce which ones are not) is to use the SQL Profiler, to save all the queries that go to a certain database. Configure the profiler to record the results to a new table, and then check the queries saved there to find all the tables (and views, sps, etc) that are used by your applications.
Another way I think you might check if there's any "writes" is to add a new timestamp column to every table, and a trigger that updates that column every time there's an update or an insert. But keep in mind that if your apps do queries of the type
select * from ...
then they will receive a new column and that might cause you some problems.
Another suggestion for tracking tables that have been written to is to use Red Gate SQL Log Rescue (free). This tool dives into the log of the database and will show you all inserts, updates and deletes. The list is fully searchable, too.
It doesn't meet your criteria for researching reads into the database, but I think the SQL Profiler technique will get you a fair idea as far as that goes.
If you have lastupdate columns you can check for the writes, there is really no easy way to check for reads. You could run profiler, save the trace to a table and check in there
What I usually do is rename the table by prefixing it with an underscrore, when people start to scream I just rename it back
If by not used, you mean your application has no more references to the tables in question and you are using dynamic sql, you could do a search for the table names in your app, if they don't exist blow them away.
I've also outputted all sprocs, functions, etc. to a text file and done a search for the table names. If not found, or found in procedures that will need to be deleted too, blow them away.
It looks like using the Profiler is going to work. Once I've let it run for a while, I should have a good list of used tables. Anyone who doesn't use their tables every day can probably wait for them to be restored from backup. Thanks, folks.
Probably too late to help mogrify, but for anybody doing a search; I would search for all objects using this object in my code, then in SQL Server by running this :
select distinct '[' + object_name(id) + ']'
from syscomments
where text like '%MY_TABLE_NAME%'