Union to already massive table vs. create new table - sql

I have a table that I use for creating reports with PowerPivot in Excel. The table has 30 columns and ~2M rows, and because of this massive number of columns and rows, the Excel file is already ~20MB (and yes, I use every single column in that table).
The business has asked me to add new data to the workbook, which will add three new columns and ~1M rows. I'm worried about what that will do to my Excel file and its size. Unfortunately I cannot use a join, because the existing table is at a different hierarchy level.
I was thinking of modifying the stored procedure behind the existing table to union in the table that contains the new data, instead of pulling the new table into Excel and creating a join within the PowerPivot data model.
What's the "better" approach here? Will the Excel file be smaller if I do the join within Excel, or the union within SQL?

The better approach is: don't feed Excel massive union-style tables just to run pivot queries.
OLAP cubes were invented for exactly that purpose, and Excel has been able to connect to that kind of data source for quite some time.
To build the cube on Microsoft SQL Server, take a look at Analysis Services. The name has changed several times over the years, and other database vendors offer similar support.
The basic concept is the same: pump your data into an efficient, normalized, cross-indexed structure, then use Excel and other tools to analyze that structure.
The drawback is that you will not see the data "online", and in the case of Microsoft SQL Server you need to purchase a special license to be able to use it.

Related

Best Approach for Flat table representing data from multiple sources

We are using SQL Server 2014.
We have a couple dozen tables that we create reports against. What is the best approach, with respect to SQL, for reporting against all these tables? What we need is a single flat table with selected/specific columns from a number of these tables.
Options:
a) Create views across the existing live tables, using relationships, to produce the single representation of our data.
b) Create some process that is scheduled and runs at different times of the day to extract columnar data and insert/update an existing table (then perform reporting against that table).
c) Rely on C#/data access code to collect the data and push it to the reporting tool.
d) Others?
Is there an optimized way to use data warehouse features within SQL to create this single aggregated table?
thx
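A minimal sketch of option a), with made-up table and column names, could be a single flat view over the live tables:
CREATE VIEW dbo.ReportingFlat
AS
SELECT o.OrderId,
       o.OrderDate,
       c.CustomerName,
       p.ProductName,
       o.Quantity,
       o.Quantity * p.UnitPrice AS LineTotal
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId
JOIN dbo.Products AS p ON p.ProductId = o.ProductId;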

SQL - multiple tables vs one big table

I want to move multiple SQLite files to PostgreSQL.
Data contained in these files are monthly time-series (one month in a single *.sqlite file). Each has about 300,000 rows. There are more than 20 of these files.
My dilemma is how to organize the data in the new database:
a) Keep it in multiple tables
or
b) Merge it to one huge table with new column describing the time period (e.g. 04.2016, 05.2016, ...)
The database will be used only to pull data out of it (with the exception of adding data for new month).
My concern is that selecting data from multiple tables (join) would not perform very well and the queries can get quite complicated.
Which structure should I go for - one huge table or multiple smaller tables?
I think I would definitely go for one table - just make sure you use sensible indexes.
If you have the space and the resources, go for one table. As other users have appropriately pointed out, databases can handle millions of rows without a problem... well, it depends on the data that is in them. The row size can make a big difference - for example storing VARCHAR(MAX) or VARBINARY(MAX) columns, and several of them per row.
There is no doubt that writing queries and ETL (extract, transform, load) is significantly easier against a single table, and maintenance is easier too from an archival perspective.
But if you never access the old data and you need the performance in the primary table, some sort of archive might make sense.
There are some BI-related reasons to maintain multiple tables, but it doesn't sound like that is your issue here.
There is no perfect answer; it will depend on your situation.
PostgreSQL is easily able to handle millions of rows in a table.
Go for option b), but...
with new column describing the time period (e.g. 04.2016, 05.2016, ...)
Please don't. Querying the different periods will become an unnecessary pain. Just put the date in one column, put an index on that column, and you will most likely be able to execute fast queries on it.
My concern is that selecting data from multiple tables (join) would not perform very well and the queries can get quite complicated.
Complicated for you to write, or for the database to execute? An example would help us get a picture of your actual requirements.
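As a rough PostgreSQL sketch (the table and columns here are assumptions, not your actual schema), the single-table layout with a real date column and an index could look like this:
CREATE TABLE measurements (
    id bigserial PRIMARY KEY,
    measured_at date NOT NULL,  -- a real date instead of a '04.2016' text label
    value numeric NOT NULL
);
CREATE INDEX measurements_measured_at_idx ON measurements (measured_at);
-- pulling one month is then a plain range filter
SELECT * FROM measurements
WHERE measured_at >= DATE '2016-04-01'
  AND measured_at < DATE '2016-05-01';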

How to dynamically use ARCMAIN or just partially copy tables in Teradata?

I need to copy quite a lot of tables from one Teradata server to another. To solve this problem, I have been advised to use ARCMAIN. A table can be transferred this way:
logon ZZZZ/YYYY,XXXX;
COPY DATA TABLES
(DATABASENAME.TABLENAME11) (FROM(DATABASENAME.TABLENAME1)),
(DATABASENAME.TABLENAME12) (FROM(DATABASENAME.TABLENAME2)),
(DATABASENAME.TABLENAME13) (FROM(DATABASENAME.TABLENAME3)),
RELEASE LOCK,
FILE=NVDSID1;
However, I have some tables with the same names in different databases, and in addition these tables need to be transferred with only their structure and some of their rows (say, WHERE service_quality = 'epic'). Is there a way to copy tables only partially?
Initially, I had figured out quite a different way:
1) Copy the structure of all these tables to a temp database
2) Insert just the required rows into them
3) Copy these tables to the required DB on the other server
But, once again, tables with the same names ruin this solution; they simply can't be placed into the same DB. Is it possible to do these 3 steps in a loop, adding one more step - drop table - to avoid conflicts?
Creating 100 temp databases is a really bad solution, and there are already tables with 30-character names.
Any ideas?
What we have done in our environment is to build a set of databases where we populate tables containing slices of data that are destined for another environment. This is very similar to your approach except we use multiple databases and don't have redundant object names. We then use ARCMAIN to ARCHIVE these tables and COPY them to the destination environment.
If you have multiple tables that share the same name across databases I would suggest you create multiple databases to seed the slices of data unless the table structures are the same and the intent is to merge the slices on the target environment. Then you can merge the data in these seed tables for your archive process.
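A minimal Teradata sketch of that seed approach, with placeholder database and table names, might be:
-- clone only the structure into the seed database, then load the slice
CREATE TABLE SEED_DB.TABLENAME1 AS SRC_DB.TABLENAME1 WITH NO DATA;
INSERT INTO SEED_DB.TABLENAME1
SELECT *
FROM SRC_DB.TABLENAME1
WHERE service_quality = 'epic';
-- ARCMAIN then archives SEED_DB and copies it to the target environment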
Other solutions include using FastExport and FastLoad or Teradata's Data Mover. The latter is likely going to require additional licensing from Teradata if you are not already using it. The former being script driven can be more flexible than ARCMAIN to accommodate the needs of your particular environment.

How to handle many columns / variable schema?

I apologize in advance in case this question has been asked already.
I'm working on revamping a reporting application used within my company. The requirements are:
Support addition of new fields (done through web app) and allow users to select those fields when building reports. Currently there are 300 of these, and right now their values are stored in a single SQL Server table with 300 columns. Users have to be able to select these new fields in report builder. In other words, the schema is dynamic.
Improve report generation performance.
My thought process was to split up these 300 (and potentially more) columns into multiple tables (normalization), but I'm not sure that's the right approach given there doesn't seem to be a logical way of grouping data without ending up with 20+ tables.
Another option would be to store values in rows (key, attribute, attribute-value) then do a pivot, but I'm not sure that would perform well. This option would handle the dynamic schema nicely, but the pivot statements would have to be built programmatically before a user can consume data (views).
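A rough sketch of that second option (all object names are hypothetical, and STRING_AGG assumes SQL Server 2017 or later) could look like this:
-- row-per-attribute storage for the user-defined fields
CREATE TABLE dbo.ReportFieldValue (
    EntityId int NOT NULL,
    FieldName sysname NOT NULL,
    FieldValue nvarchar(400) NULL,
    CONSTRAINT PK_ReportFieldValue PRIMARY KEY (EntityId, FieldName)
);
-- build the pivot column list from the field names actually present,
-- then execute the statement dynamically (e.g. to refresh a reporting view)
DECLARE @cols nvarchar(max), @sql nvarchar(max);
SELECT @cols = STRING_AGG(CAST(QUOTENAME(FieldName) AS nvarchar(max)), ', ')
FROM (SELECT DISTINCT FieldName FROM dbo.ReportFieldValue) AS f;
SET @sql = N'SELECT EntityId, ' + @cols + N'
FROM dbo.ReportFieldValue
PIVOT (MAX(FieldValue) FOR FieldName IN (' + @cols + N')) AS p;';
EXEC sys.sp_executesql @sql;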
Thanks!

Dynamic SQL examples

I have recently learned what dynamic SQL is, and one of its most interesting features to me is that we can use dynamic column names and tables. But I cannot think of useful real-life examples. The only one that came to my mind is a statistical table.
Let's say that we have a table with name, type and created_data. Then we want a result table whose columns are the years from the created_data column, and whose rows show each type and the number of names created in each year. (Sorry for my English.)
What other useful real-life examples are there of using dynamic SQL with columns and tables as parameters? How do you use it?
Thanks for any suggestions and help :)
regards
Gabe
/edit
Thanks for the replies. I am particularly interested in examples that do not involve administrative tasks, database conversion or the like; I am looking for examples where the code in, say, Java would be more complicated than using dynamic SQL in, for example, a stored procedure.
An example of dynamic SQL is to fix a broken schema and make it more usable.
For example if you have hundreds of users and someone originally decided to create a new table for each user, you might want to redesign the database to have only one table. Then you'd need to migrate all the existing data to this new system.
You can query the information schema for table names with a certain naming pattern or containing certain columns, then use dynamic SQL to select all the data from each of those tables and put it into a single table.
INSERT INTO users (name, col1, col2)
SELECT 'foo', col1, col2 FROM user_foo
UNION ALL
SELECT 'bar', col1, col2 FROM user_bar
UNION ALL
...
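The statement above can itself be generated with dynamic SQL. A rough sketch (assuming SQL Server 2017 or later and the hypothetical user_* naming pattern) could be:
DECLARE @sql nvarchar(max);
-- build one SELECT per user_* table and glue them together with UNION ALL
SELECT @sql = STRING_AGG(
    CAST('SELECT ''' + SUBSTRING(t.name, 6, 128) + ''' AS name, col1, col2 FROM ' + QUOTENAME(t.name) AS nvarchar(max)),
    ' UNION ALL ')
FROM sys.tables AS t
WHERE t.name LIKE 'user[_]%';
SET @sql = 'INSERT INTO users (name, col1, col2) ' + @sql + ';';
EXEC sys.sp_executesql @sql;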
Then hopefully after doing this once you will never need to touch dynamic SQL again.
Long ago I worked with an application where users had their own tables in a common database.
Imagine that each user can create their own table in the database from the UI. To get access to the data in these tables, the developer needs to use dynamic SQL.
I once had to write an Excel import where the Excel sheet was not like a CSV file but was laid out like a matrix. So I had to deal with an unknown number of columns for 3 temporary tables (columns, rows, "infield"). The rows were also a short form of tree. Sounds weird, but it was fun to do.
In SQL Server there was no way to handle this without dynamic SQL.
Another example comes from a situation I recently ran into: a MySQL database of about 250 tables, all using the MyISAM engine, with no database design schema, diagram or other explanation at all - well, except the not-so-helpful table and column names.
To plan the conversion to InnoDB and find possible foreign keys, we either had to manually check all queries (and the conditions used in JOIN and WHERE clauses) generated by the web frontend code, or write a script that uses dynamic SQL to check all combinations of columns with compatible data types and compare the data stored in those column combinations (and then manually accept or reject the candidates).
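A sketch of that second approach (the schema name 'mydb' is a placeholder) is to generate the comparison queries from information_schema and then run them; each generated query counts values in one column that have no match in the other, and zero orphans marks a foreign-key candidate:
SELECT a.TABLE_NAME, a.COLUMN_NAME,
       b.TABLE_NAME AS ref_table, b.COLUMN_NAME AS ref_column,
       CONCAT('SELECT COUNT(*) FROM `', a.TABLE_NAME, '` AS x ',
              'LEFT JOIN `', b.TABLE_NAME, '` AS y ',
              'ON x.`', a.COLUMN_NAME, '` = y.`', b.COLUMN_NAME, '` ',
              'WHERE y.`', b.COLUMN_NAME, '` IS NULL') AS check_sql
FROM information_schema.COLUMNS AS a
JOIN information_schema.COLUMNS AS b
  ON a.DATA_TYPE = b.DATA_TYPE
 AND a.TABLE_NAME <> b.TABLE_NAME
WHERE a.TABLE_SCHEMA = 'mydb'
  AND b.TABLE_SCHEMA = 'mydb';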