I have some fairly large denormalized tables with multiple columns that contain comma-separated values.
The CSV values vary in length from column to column; one table has 30 different columns that can contain CSVs! For reporting purposes I need to do a count of the CSV values in each column (essentially different types).
Having never done this before what is my best approach?
Create a new table, populated via a CSV-split method, with a type field and a type table for the different types?
Use the XML approach (XPath with the .nodes() and .value() methods) to split each column on the fly and count as I go?
Or should I create some views that would show me what I want?
Please advise
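For what it's worth, on SQL Server 2016 and later the split-and-count idea can be sketched with STRING_SPLIT; the table and column names below (dbo.Orders, Tags) are hypothetical stand-ins:

```sql
-- Hypothetical table: dbo.Orders(OrderID int, Tags nvarchar(max))
-- Tags holds comma-separated values, e.g. 'red,blue,green'
SELECT s.value  AS TagValue,
       COUNT(*) AS TagCount
FROM dbo.Orders AS o
CROSS APPLY STRING_SPLIT(o.Tags, ',') AS s
GROUP BY s.value
ORDER BY TagCount DESC;
```

On earlier versions the XML .nodes()/.value() approach is the usual substitute, at the cost of a more verbose query per column.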
We have a fixed-length flat file which is stored in a table as a single column (say the table name is flatfile1). We have another table (Metadata) where the file format is stored, with the start and end positions of each field.
Now we want to write a SELECT statement to separate the fields of flatfile1 by reading the positions from Metadata.
We have 50+ such flat files, and we are trying to figure out a reusable approach to writing the SQL. Each of these files has a different number of fields with different lengths.
How could we go about this?
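One reusable approach is to generate the SELECT from the Metadata table itself. A sketch, assuming Metadata(FileName, FieldName, StartPos, EndPos) and flatfile1(line nvarchar(max)); these column names are guesses, and STRING_AGG needs SQL Server 2017+ (older versions can build the same string with the FOR XML PATH trick):

```sql
-- Build a SELECT list of SUBSTRING calls from the Metadata rows
DECLARE @cols nvarchar(max), @sql nvarchar(max);

SELECT @cols = STRING_AGG(
    'SUBSTRING(line, ' + CAST(StartPos AS varchar(10)) + ', '
        + CAST(EndPos - StartPos + 1 AS varchar(10)) + ') AS '
        + QUOTENAME(FieldName), ', ')
    WITHIN GROUP (ORDER BY StartPos)
FROM Metadata
WHERE FileName = 'flatfile1';

SET @sql = 'SELECT ' + @cols + ' FROM flatfile1;';
EXEC sp_executesql @sql;
```

Because the file name is a parameter, the same batch covers all 50+ files; wrap it in a procedure that takes the file/table name if you want one reusable object.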
I have over 100 .csv files which I imported/bulk loaded into SQL Server; all columns are of type nvarchar(255), so I now have over 100 such tables in the database.
Only after doing that did I receive pre-defined SQL creation scripts for the same tables, with the intended column data types, e.g. datetime, decimal, int, etc.
My question: is there a fast way to copy the data from the bulk-loaded tables into the pre-defined ones? A plain INSERT won't work because of the conflicts between the data types. I tried altering tables/columns to change the character lengths of the nvarchar columns, and that works fine/fast. The challenge is with the rest of the columns and with the data conversion; the tables have over 1,000 columns altogether.
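One way to avoid hand-writing a thousand conversions is to generate the INSERT ... SELECT from the target table's own metadata. A sketch for a single pair of tables; stage.T1 and dbo.T1 are hypothetical names, and TRY_CONVERT (SQL Server 2012+) turns unconvertible values into NULL instead of failing the whole insert:

```sql
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- For each target column, emit TRY_CONVERT(<target type>, <source column>)
SELECT @cols = STRING_AGG(
    'TRY_CONVERT(' + DATA_TYPE
        + CASE WHEN CHARACTER_MAXIMUM_LENGTH > 0
               THEN '(' + CAST(CHARACTER_MAXIMUM_LENGTH AS varchar(10)) + ')'
               ELSE '' END
        + ', ' + QUOTENAME(COLUMN_NAME) + ') AS ' + QUOTENAME(COLUMN_NAME),
    ', ') WITHIN GROUP (ORDER BY ORDINAL_POSITION)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'T1';

SET @sql = 'INSERT INTO dbo.T1 SELECT ' + @cols + ' FROM stage.T1;';
EXEC sp_executesql @sql;
```

Caveats: nvarchar(max) reports CHARACTER_MAXIMUM_LENGTH = -1 and decimal needs precision/scale, so a production version needs a couple more CASE branches; looping the same statement over all 100 tables is just a cursor over INFORMATION_SCHEMA.TABLES. Afterwards, count the NULLs TRY_CONVERT produced to find the values that genuinely would not convert.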
I would like to use CSV to describe a database schema. I found examples for a single table, but is there a standardized specification (or ideas, leads, etc.) for describing multiple (and linked) tables?
A CSV file can only hold delimited data (one line per row, fields separated by another delimiter).
So if you store data from different tables in the same CSV, all the data will end up in a single table.
The best way is to create a different CSV per table, or to choose another format (why not SQL?).
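That said, if CSV is a hard requirement, one ad-hoc convention (not a standard) is a single metadata CSV with one row per column and a reference field for foreign keys; all names below are made up:

```csv
table,column,type,nullable,references
customer,id,int,no,
customer,name,nvarchar(100),no,
order,id,int,no,
order,customer_id,int,no,customer.id
```

For a standardized route, the W3C "CSV on the Web" (CSVW) specifications describe multiple linked CSV tables, including foreign-key relationships, though the metadata document itself is JSON rather than CSV.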
I want to compare data in two PowerPivot tables.
Is there a method in PowerPivot to compare two tables of data?
Or alternatively ...
I've created a "key" calculated column (a concatenation of 6 columns using '&'), and I am now creating a calculated column from all the remaining data, about 100 columns.
Is there a method / function that will allow me create that calculated column?
Edit: the reason is to perform data-comparison checks before and after a data migration. Additionally, PowerPivot was dictated as the technology of choice for this solution; using one of the Redgate comparison tools would have been much easier.
The best answer I could find was what I was originally doing.
Create a string concatenation of the 6 key columns as a CompoundKey Column
Create a string concatenation of the 100 (approx) data columns as a CombinedData Column
After initially checking that the two tables had identical numbers of observations, I ordered the data in each table by the CompoundKey and compared table1.CompoundKey to table2.CompoundKey and table1.CombinedData to table2.CombinedData.
This enabled me to find the Keys that were different between the two datasets then additionally to find any rows of data that were different for matching key rows.
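For the record, DAX has no row-wise "concatenate all columns" function, so the CombinedData column is a long '&' chain over explicit column names (a delimiter guards against false matches such as 'ab' & 'c' colliding with 'a' & 'bc'). Once both tables carry the two columns, a per-row mismatch flag can be sketched with LOOKUPVALUE; all table and column names here are hypothetical:

```dax
-- CombinedData: repeat the pattern for all ~100 columns
= [Col1] & "|" & [Col2] & "|" & [Col3]

-- Mismatch flag in Table1, matching on the compound key into Table2
= IF (
    ISBLANK ( LOOKUPVALUE ( Table2[CombinedData], Table2[CompoundKey], [CompoundKey] ) ),
    "missing key",
    IF (
        LOOKUPVALUE ( Table2[CombinedData], Table2[CompoundKey], [CompoundKey] ) = [CombinedData],
        "match",
        "different"
    )
)
```

A calculated column like this avoids the manual ordering step, since LOOKUPVALUE matches on the key rather than on row position.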
I am creating a project in VB.NET in which one of the reports requires that employee names be displayed as column names, with whatever work each employee has done over a stated period appearing in rows below that particular column.
Clearly, the columns will have to be added at runtime. I am using an ODBC data source to populate the grid. Also, since I loop over the employees to find the work each one did individually, the number of rows under one column might be fewer or more than the rows under the next.
Is there a way to create an empty DataTable and then update its contents based on columns, rather than by adding new rows?
Regards
A table consists of rows and columns: the rows hold the data, the columns define it.
So you have no choice but to add at least as many rows as your longest column needs. You can just fill the other columns with empty values. That should give you the view you need.
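The fill-up-with-empty-values approach can be sketched like this in VB.NET; employeeWork is a hypothetical stand-in for whatever your ODBC queries return per employee:

```vb
Imports System.Data
Imports System.Linq
Imports System.Collections.Generic

' One DataColumn per employee, rows padded to the longest work list
Dim employeeWork As New Dictionary(Of String, List(Of String)) From {
    {"Alice", New List(Of String) From {"Task A", "Task B", "Task C"}},
    {"Bob", New List(Of String) From {"Task D"}}
}

Dim table As New DataTable("Report")
For Each name In employeeWork.Keys
    table.Columns.Add(name, GetType(String))
Next

Dim maxRows = employeeWork.Values.Max(Function(w) w.Count)
For i = 0 To maxRows - 1
    Dim row = table.NewRow()
    For Each kvp In employeeWork
        ' Shorter columns get DBNull so the grid shows them blank
        row(kvp.Key) = If(i < kvp.Value.Count, CObj(kvp.Value(i)), DBNull.Value)
    Next
    table.Rows.Add(row)
Next
```

Using DBNull.Value rather than an empty string keeps "no work" distinguishable from a genuinely empty entry when you bind the table to the grid.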
Wouldn't it be better to simply switch the table's orientation?
If most of your columns are names (or perhaps groupings, I don't know),
then you'd have one column for each kind of data you want to display,
and you'd add a row per name with its stats dynamically, which is the more common layout.
I'm only suggesting that because I don't know your full table structure.