Understanding a table's structure/schema in SQL

Is there a practical way to find out a given table's structure/schema, e.g. the column names and some example rows inserted into the table (like the head function in Python), if you only have the table name? I have access to several tables in my current role, but the person who developed them has left my team. I would like to examine the tables more closely via SQL Assistant in Teradata; they often contain hundreds of thousands of rows, so I keep running into CPU exception criteria errors.
I have tried the following SELECT statement, but it also hits the internal CPU exception criteria limits.
SELECT TOP 10 * FROM dbc.table1
Thank you in advance for any tips/advice!

You can use one of these commands to get a table's structure details in Teradata:
SHOW TABLE Database_Name.Table_Name;
or
HELP TABLE Database_Name.Table_Name;
SHOW TABLE returns the full CREATE TABLE DDL, while HELP TABLE lists the columns and their attributes.
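For a quick peek at example rows, Teradata also has a SAMPLE clause in addition to TOP. A minimal sketch, assuming the table is Database_Name.Table_Name (whether either stays under your site's CPU limits depends on the workload rules):
-- return roughly 10 arbitrary rows
SELECT * FROM Database_Name.Table_Name SAMPLE 10;
-- or simply take the first 10 rows the system returns
SELECT TOP 10 * FROM Database_Name.Table_Name;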


How to get table/column usage statistics in Redshift

I want to find which tables/columns in Redshift remain unused in the database in order to do a clean-up.
I have been trying to parse the queries from the stl_query table, but it turns out to be quite a complex task, and I haven't found any library I can use for it.
Does anyone know if this is possible?
Thank you!
The column question is a tricky one. For table usage information I'd look at stl_scan, which records information about every table scan step performed by the system. Each entry is date-stamped, so you will know when the table was "used". Just remember that the system logging tables are pruned periodically and the data only goes back a few days, so you may need a process that records table usage daily to build an extended history.
Pondering the column question some more: stl_scan also records the query id, which could help in identifying the columns used in the query text. For every query id that scans table_A, search the query text for each column name of the table. It wouldn't be perfect, but it's a start.
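A minimal sketch of the table-usage part, assuming the standard stl_scan and svv_table_info system tables are accessible (exact log retention depends on the cluster):
-- last time each table was scanned; NULL means no scan within the retained log window
SELECT ti."schema", ti."table", MAX(s.starttime) AS last_scanned
FROM svv_table_info ti
LEFT JOIN stl_scan s ON s.tbl = ti.table_id
GROUP BY ti."schema", ti."table"
ORDER BY last_scanned;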

Does Tabledata.list() count towards compute usage in BigQuery?

They say there are no stupid questions, but this might be an exception.
I understand that BigQuery, being a columnar database, does a full table scan for any query over a specific column.
I also understand that query results can be cached or a named table can be created with the results of a query.
However I also see tabledata.list() in the documentation, and I'm unsure of how this fits in with query costs. Once a table is created from a query, am I free to access that table without cost through the API?
Let's say, for example, I run a query grouped by UserID, and I then want to present the results of that query to individual users based on that ID. As far as I understand, there are two obvious ways of getting the appropriate row:
I can write another query over the destination table with a WHERE userID=xxx clause
I can use the tabledata.list() endpoint to get all the (potentially paginated) data and get the appropriate row myself in my code
Is it right that situation 1 would incur a query cost while situation 2 would not? Am I understanding this correctly?
The tabledata.list API is free, as it does not use the BigQuery query engine at all, so you are right about both 1 and 2.
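To make the contrast concrete, here is a minimal sketch of option 1 (project, dataset, and table names are hypothetical); option 2 would instead read the same destination table via tabledata.list without starting a query job:
-- Option 1: a query job, billed by the bytes scanned in the destination table
SELECT *
FROM `my_project.my_dataset.user_results`
WHERE UserID = 'xxx';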

How to Combine Multiple Nested SQL tables into one?

First of all, I should preface this by letting you know that I'm a SQL novice - I've never really used SQL Server before and what I'd like to do must be quite rare or challenging because I've been unable to find any relevant answers on StackOverflow or Google.
I'd really, really appreciate your help on this. In the meantime, I am trying to improve my SQL knowledge and find a way to tackle this myself - but let's get straight to the point.
I'm currently in possession of a SQL Server database (which I browse through SQL Server Management Studio) with 4 tables. Everything's in Greek, so there is no point in writing the real names. The point is that each row in Table 1 is associated with multiple rows in Table 2, which in turn is associated with multiple rows in Table 3, which in turn is associated with multiple rows in Table 4.
My task is to perform AI/Machine Learning on this multi-instance multi-label problem, but to do that, I have to make it so there is only 1 table containing all the information of all tables.
SQL Server database structure:
4 Tables
3.75 GB
Table 1:
Holds information about tasks
100 columns
400,000 rows
ID is connected to table 2's Research_ID
Table 2:
Each task has multiple sub-tasks (which is what this table holds)
11 columns
2,500,000 rows
ID is connected to table 3's Task_Group_ID
Table 3:
Each sub-task requires things to be bought or changed or thrown away (held in this table)
8 columns
17,000,000 rows
Material_ID connected to table 4's ID
Table 4:
Each material has a certain cost and stuff (held in this table)
12 columns
3,700 rows
The way I see it, maybe it needs to happen in stages, starting from the bottom and working up to the top.
For each row in table 3, there are many associated rows in table 4; hence, each row in table 3 is inserted in a new table as many times as the number of rows associated with it in table 4.
This means that a lot of the information will be duplicated and the 3.75GB will become much bigger, but that's normal and is what the problem needs.
After this happens for tables 3 and 4, the same thing needs to happen for table 2, and then for table 1. Note that a couple of columns from each table must not be included in the final table; as I understand it, the only thing this changes is that I list the required column names in the SELECT instead of using the asterisk (*). Lastly, remember that I need to actually create a new table, because this needs to occur only once and the result must stay available for months to be read by machine learning programs (WEKA, R, etc.) and programming libraries (Accord.NET, etc.).
The thing is.. how can I combine all these tables into one table that persists?
If I've neglected to share any needed information, please inform me and I shall do so as soon as I see the message.
You use joins to get the information. Technically, you can do something like
SELECT * FROM Table1
JOIN Table2 ON Table1.Table2Id = Table2.ID
JOIN Table3 ON Table2.Table3Id = Table3.ID
Etc. But you end up with repeats (duplicated data and column names) that can mess things up, so you are better off selecting only the columns you require. The joins here are inner joins and will exclude rows without a match, so you might need other join types (e.g. LEFT JOIN). The most information comes from a CROSS JOIN, but it produces the Cartesian product of all the tables involved, so you will likely get back far more than you require.
Here is a link that explains joins in T-SQL: http://www.techonthenet.com/sql_server/joins.php
It is a good place to get started and may answer your question with a little bit of experimentation on your part.
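To make the combined result persist, SQL Server's SELECT ... INTO creates and fills a new table in one statement. A minimal sketch, assuming the key columns described in the question (Research_ID, Task_Group_ID, Material_ID) and using placeholder column names you would replace with the ones you actually need:
SELECT  t1.SomeTaskColumn, t2.SomeSubTaskColumn, t3.SomeLineColumn, t4.SomeMaterialColumn
INTO    dbo.CombinedForML              -- creates the new, persistent table
FROM    Table1 AS t1
JOIN    Table2 AS t2 ON t2.Research_ID   = t1.ID
JOIN    Table3 AS t3 ON t3.Task_Group_ID = t2.ID
JOIN    Table4 AS t4 ON t4.ID            = t3.Material_ID;
Switch JOIN to LEFT JOIN wherever rows without a match still need to be kept.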

Oracle join depth while updating a table

I have a question regarding Oracle.
I know that Oracle only supports the use of aliases down to the first subquery level. This poses a problem when I want to group more than once while updating a table.
Example: I have some server groups and a database containing information about them. I have one table that contains information about the groups, and one table where I store, with a timestamp (to be exact, I actually used DATE), the workload of specific servers within the groups.
For performance reasons, I now have a denormalized field in the server table containing the highest workload the group had within one day.
What I would like to do is something like
update server_group
set last_day_workload=avg(workload1)
from (select max(workload) workload1
from server_performance
where server_performance.server_group_ID_fk=server_group.ID
and time>sysdate-1
group by server_performance.server_group_ID_fk)
Here, ID is the primary key of server_group, and server_group_ID_fk is a foreign key reference from the server_performance table. The solution I am using so far is to write the first join into a temporary table and update from that temporary table in a second statement. Is there a better way to do this?
In this case it isn't a big problem yet, but as the amount of data increases, using a temporary table costs not only time but also a notable amount of RAM.
Thank you for your answers!
If I were you, I would work out the results that I wanted in a select statement, and then use a MERGE statement to do the necessary update.
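A minimal sketch of that approach, reusing the table and column names from the question; the subquery computes the per-group maximum over the last day, so swap in whatever AVG/MAX combination you actually need:
MERGE INTO server_group sg
USING (
    SELECT server_group_ID_fk, MAX(workload) AS max_workload
    FROM   server_performance
    WHERE  time > SYSDATE - 1
    GROUP  BY server_group_ID_fk
) sp
ON (sg.ID = sp.server_group_ID_fk)
WHEN MATCHED THEN UPDATE
    SET sg.last_day_workload = sp.max_workload;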

Improving performance on a large SQL table

I have a 260-column table in SQL Server. When we run "SELECT COUNT(*) FROM table" it takes almost 5-6 to get the count. The table contains close to 90-100 million records across 260 columns, and more than 50% of the columns contain NULL. Apart from that, users can also build dynamic SQL queries against the table from the UI, so searching 90-100 million records takes a long time to return results. Is there a way to improve the find functionality on a SQL table where the filter criteria can be anything? Can anyone suggest the fastest way to get aggregate data on 25 GB of data without the UI hanging or timing out?
Investigate horizontal partitioning. This will really only help query performance if you can force users to put the partitioning key into the predicates.
Try vertical partitioning, where you split one 260-column table into several tables with fewer columns. Put all the values which are commonly required together into one table. Queries will then only reference the table(s) containing the columns they need, which gives you more rows per page, i.e. fewer pages per query.
You have a high fraction of NULLs. Sparse columns may help, but calculate your percentages, as they can hurt if used inappropriately. There's an SO question on this.
Filtered indexes and filtered statistics may be useful if the DB often runs similar queries; a sketch of sparse columns and a filtered index follows below.
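A minimal sketch of the sparse-column and filtered-index ideas, with hypothetical table and column names:
-- mark a mostly-NULL column as SPARSE so NULL values take no storage
ALTER TABLE dbo.BigTable ALTER COLUMN RarelyPopulatedValue int SPARSE NULL;
-- filtered index: only indexes the rows most queries actually touch
CREATE NONCLUSTERED INDEX IX_BigTable_Status
    ON dbo.BigTable (StatusCode, CreatedDate)
    WHERE StatusCode IS NOT NULL;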
As others state in the comments, you need to analyse a few of the queries and see which indexes would help you the most. If your queries do a lot of text searching, you could use the full-text search feature of SQL Server. Here you will find a nice reference with good examples.
Things that come to mind:
[SQL Server 2012+] If you are using SQL Server 2012 or later, you can use the new columnstore indexes.
[SQL Server 2005+] If you are filtering on a text column, you can use Full-Text Search.
If there is some function you apply frequently to a column (SOUNDEX of a column, for example), you could create a PERSISTED computed column so the value does not have to be computed every time (see the sketch below).
Use temp tables (indexed ones are much better) to reduce the number of rows to work on.
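A minimal sketch of the persisted computed column and columnstore suggestions, again with hypothetical names; note that in SQL Server 2012 a nonclustered columnstore index makes the table read-only until it is dropped or disabled:
-- persisted computed column: SOUNDEX is stored once, not recomputed per query
ALTER TABLE dbo.BigTable
    ADD LastName_Soundex AS SOUNDEX(LastName) PERSISTED;
-- columnstore index for fast scans/aggregations over selected columns (SQL Server 2012+)
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_BigTable
    ON dbo.BigTable (OrderDate, Amount, StatusCode);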
@Twelfth's comment is very good:
"I think you need to create an ETL process and start changing this into a fact table with dimensions."
Changing my comment into an answer...
You are moving from a transactional world, where these 90-100 million records are simply recorded, into a data warehousing scenario where you now want to slice, dice, and analyze the information you have. There is no easy solution, but odds are you're hitting the limits of what your current design can scale to.
In a past job, I had several (6) data fields belonging to each record that were pretty much free text and randomly populated depending on where the data was generated (they were search queries, and people were entering what they would basically type into Google). With 6 fields like this, I created a dim_text table that took each entry in any of these 6 fields and replaced it with an integer. This left me with a table of two columns, text_id and text. Any time a user searched for a specific entry in any of these 6 columns, I would first query the dim_text table, which was optimized (indexed) for this sort of lookup, to get the integer matching the term, and then search for all occurrences of that integer across the 6 fields instead. Searching one table highly optimized for free-text lookups and then querying the main table for instances of the integer is far quicker than searching 6 free-text fields directly.
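A minimal sketch of that dim_text idea; all table and column names here are hypothetical:
-- small, well-indexed dimension table mapping free text to an integer
CREATE TABLE dbo.dim_text (
    text_id    int IDENTITY(1,1) PRIMARY KEY,
    text_value nvarchar(400) NOT NULL UNIQUE
);
-- resolve the search term to an integer once...
DECLARE @text_id int =
    (SELECT text_id FROM dbo.dim_text WHERE text_value = N'some search term');
-- ...then search the wide table for that integer across the 6 fields
SELECT *
FROM dbo.BigTable
WHERE @text_id IN (field1_id, field2_id, field3_id, field4_id, field5_id, field6_id);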
I'd also create aggregate tables (reporting tables, if you prefer the term) for your common aggregates. There are quite a few options here that your business setup will determine. For example, if each row is an item on a sales invoice and you need to show sales by date, it may be better to aggregate total sales by invoice and save that to a table; when a user then wants totals by day, an aggregate is run on the aggregate of the invoices to determine the totals by day (so you have 'partially' aggregated the data in advance).
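A minimal sketch of that pre-aggregation idea, again with hypothetical names:
-- aggregate line items up to invoice level once, into a persistent reporting table
SELECT  InvoiceID,
        CAST(SaleDate AS date) AS SaleDate,
        SUM(LineAmount)        AS InvoiceTotal
INTO    dbo.agg_invoice_totals
FROM    dbo.BigTable
GROUP BY InvoiceID, CAST(SaleDate AS date);
-- daily totals then roll up from the much smaller aggregate
SELECT SaleDate, SUM(InvoiceTotal) AS DailyTotal
FROM   dbo.agg_invoice_totals
GROUP BY SaleDate;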
Hope that makes sense...I'm sure I'll need several edits here for clarity in my answer.