I would like to use CSV to describe a database schema. I found examples, but only for a single table. Is there a standardized specification (or ideas, approaches, etc.) for describing multiple (and linked) tables?
A CSV file can only hold delimited data (one line per row, with fields separated by a delimiter).
So if you store data from different tables in the same CSV, it will all end up looking like a single table.
The best approach is to create a separate CSV per table, or to choose another format (why not SQL?).
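If you do go the SQL route, plain DDL already describes multiple linked tables; here is a minimal sketch (the table and column names are purely illustrative):

    -- Two linked tables described directly in SQL DDL (illustrative names only).
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name      VARCHAR(200) NOT NULL
    );

    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        title     VARCHAR(200) NOT NULL,
        -- the foreign key is what a flat CSV cannot express on its own
        author_id INTEGER NOT NULL REFERENCES author(author_id)
    );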
We have a fixed-length flat file which is stored in a table as a single column (say the table name is flatfile1). We have another table (Metadata) where the file format is stored, with the start and end positions of each field.
Now we want to write a SELECT statement that separates the fields in flatfile1 by reading the positions from Metadata.
We have 50+ such flat files and are trying to figure out a reusable approach to writing the SQL. Each file has a different number of fields with different lengths.
How could we go about this?
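One reusable pattern (just a sketch, assuming hypothetical column names such as flatfile1(line_id, line_text) and Metadata(file_name, field_name, start_pos, end_pos); the function is SUBSTR in Oracle and SUBSTRING in SQL Server) is to cross join each line against the metadata rows for that file and cut the fields out by position:

    -- One output row per line per field (unpivoted), driven entirely by Metadata.
    SELECT f.line_id,
           m.field_name,
           SUBSTR(f.line_text, m.start_pos, m.end_pos - m.start_pos + 1) AS field_value
    FROM   flatfile1 f
    CROSS JOIN Metadata m
    WHERE  m.file_name = 'flatfile1'
    ORDER BY f.line_id, m.start_pos;

Because nothing in the query depends on the number of fields, the same statement can be reused for the other files by changing only the source table and the file_name filter.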
I'm using Oracle's DB. I have a table, say T, with the following columns: id, att1, att2, att3. For a large amount of the data, att3 is blank. I've created a CSV file which contains data in the format id,att3; it has a lot of rows. How do I update the existing rows from this file?
Is there any way of doing it via PL/SQL?
Which database engine are you using? The answer might vary depending on that.
But this post here How to update selected rows with values from a CSV file in Postgres? is similar to what you're asking; I'm sure you can adapt it to your needs.
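For Oracle specifically, one common approach (a sketch, not a drop-in solution: it assumes a directory object data_dir has been created and granted, and the names ext_att3 and att3.csv are placeholders) is to expose the CSV as an external table and then MERGE it into T:

    -- External table that reads the CSV directly from disk (placeholder names).
    CREATE TABLE ext_att3 (
      id   NUMBER,
      att3 VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY data_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
      )
      LOCATION ('att3.csv')
    );

    -- Update existing rows in T from the CSV contents.
    MERGE INTO t
    USING ext_att3 s
    ON (t.id = s.id)
    WHEN MATCHED THEN UPDATE SET t.att3 = s.att3;

SQL*Loader into a staging table followed by the same MERGE works just as well if you cannot create directory objects.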
I am attempting to fix the schema of a BigQuery table in which the type of a field is wrong (but the field contains no data). I would like to copy the data from the old schema to the new one using the UI (select * except(bad_column) from ...).
The problem is that:
if I select into a table, then BigQuery drops the REQUIRED mode from the columns and therefore rejects the insert.
Exporting via JSON loses information on dates.
Is there a better solution than creating a new table with all columns nullable/repeated, or manually transforming all of the data?
Update (2018-06-20): BigQuery now supports required fields on query output in standard SQL, and has done so since mid-2017.
Specifically, if you append your query results to a table with a schema that has required fields, that schema will be preserved, and BigQuery will check as results are written that it contains no null values. If you want to write your results to a brand-new table, you can create an empty table with the desired schema and append to that table.
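As a concrete illustration (a sketch only; the dataset, table, and column names are placeholders), NOT NULL in standard SQL DDL corresponds to the REQUIRED mode, so you can create the destination first and then append:

    -- Create the destination with the desired schema, including REQUIRED columns.
    CREATE TABLE mydataset.new_table (
      id           INT64 NOT NULL,
      created_date DATE  NOT NULL,
      good_column  STRING,
      bad_column   DATE              -- the corrected type; the old table has no data here
    );

    -- Append the query results; the destination's REQUIRED modes are preserved,
    -- and BigQuery rejects the write if any NOT NULL column receives a NULL.
    INSERT INTO mydataset.new_table (id, created_date, good_column)
    SELECT id, created_date, good_column
    FROM mydataset.old_table;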
Outdated:
You have several options:
Change your field modes to NULLABLE. Standard SQL returns only nullable fields, and this is intended behavior, so going forward it may be less useful to mark fields as required.
You can use legacy SQL, which will preserve required fields. You can't use except, but you can explicitly select all other fields.
You can export and re-import with the desired schema.
You mention that export via JSON loses date information. Can you clarify? If you're referring to the partition date, then unfortunately I think any of the above solutions will collapse all data into today's partition, unless you explicitly insert into a named partition using the table$yyyymmdd syntax. (Which will work, but may require lots of operations if you have data spread across many dates.)
BigQuery now supports table clones. A table clone is a lightweight, writeable copy of another table.
Copy tables from query in Bigquery
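A minimal sketch of the clone syntax (dataset and table names are placeholders):

    -- The clone shares storage with the source at creation time and can then be
    -- modified independently.
    CREATE TABLE mydataset.table_copy
    CLONE mydataset.source_table;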
So I have some quite large denormalized tables that have multiple columns containing comma-separated values.
The CSV values vary in length from column to column. One table has 30 different columns that can contain CSVs! For reporting purposes I need to do a count of the CSV values for each column (essentially different types).
Having never done this before, what is my best approach?
Create a new table, using a CSV split method to populate it, with a type field and a type table for the different types?
Use the XML approach, using XPath and the .nodes() and .value() methods, to split each column on the fly and perform a count as I go?
Or should I create some views that would show me what I want?
Please advise
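If you are on SQL Server 2016 or later, the built-in STRING_SPLIT function is a third option that avoids the XML gymnastics. A minimal counting sketch (table and column names are placeholders, and one query per CSV column is assumed):

    -- Count occurrences of each individual value held in one comma-separated column.
    SELECT s.value   AS csv_value,
           COUNT(*)  AS occurrences
    FROM   dbo.DenormalizedTable AS t
    CROSS APPLY STRING_SPLIT(t.Column1, ',') AS s
    GROUP BY s.value
    ORDER BY occurrences DESC;

The same pattern can be repeated (or combined with UNION ALL) across the 30 columns, or wrapped in views if you want the counts available for reporting.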
I am creating a table from a CSV file and was wondering where I would find documentation on which column names are acceptable. I know you can't have "/" or spaces in column names.
What is the fastest way to clean a CSV and turn it into a SQL table?
There are a couple of recommendations:
Avoid special characters (except for the underscore)
Use CamelCase if you don't want to use underscores
Set a naming convention for your company that you will enforce
Here are some useful articles:
General Naming Conventions
SQL Server Standards
A fast way would be to create the database and table in SQL (or whichever platform you are using), then create each column with its type and constraints in the same order as the CSV file, and finally upload the CSV file.
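For example, on SQL Server a sketch might look like the following (the table, columns, and file path are placeholders, and FORMAT = 'CSV' requires SQL Server 2017 or later):

    -- Columns declared in the same order as they appear in the CSV.
    CREATE TABLE dbo.ImportedCsv (
        Id        INT            NOT NULL,
        FirstName NVARCHAR(100)  NULL,
        Amount    DECIMAL(10, 2) NULL
    );

    BULK INSERT dbo.ImportedCsv
    FROM 'C:\data\input.csv'
    WITH (
        FORMAT = 'CSV',          -- handles quoted fields
        FIRSTROW = 2,            -- skip the header row
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n'
    );

On PostgreSQL the equivalent load step is COPY ... FROM ... WITH (FORMAT csv, HEADER true).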