Multiple different record types within the same file must be converted to a SQL table

One for the SQL data definition gurus:
I have a mainframe file that contains about 35-100 different record types. Depending on the record type, the layout is redefined, and any column in any record type could become a different length or type. I would rather not split this thing into 35-100 different tables and relate them together. I did find that Postgres has %ROWTYPE with cursor- or table-based records, but in all the examples the data looked the same. How can I set up a table that would handle this, and what SQL queries would be needed to return the data? It doesn't have to be Postgres; that was just the only thing I could find that looked similar to my problem.

I would just make a table with all TEXT datatype fields at first. TEXT is variable-length, so it only takes up the space it needs, and it performs very well. From there, you may find it quicker to move the data into better-formed tables once you know which columns deserve a more specific data type.
It's easier to do it in this order because bulk insert with COPY is very picky; with all-TEXT columns you only have to worry about the number of columns to get the data in.
EDIT: I'm referring to Postgres in this answer; not sure if you wanted an answer for a different DB.
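A minimal Postgres sketch of that approach (table and column names are made up; widen the staging table to match your widest record layout):

```sql
-- Hypothetical staging table: one TEXT column per field in the widest layout.
CREATE TABLE mainframe_staging (
    record_type text,
    col01 text,
    col02 text,
    col03 text
);

-- Bulk load the raw extract; with all-TEXT columns only the column
-- count has to match the file.
COPY mainframe_staging FROM '/path/to/mainframe_extract.csv'
    WITH (FORMAT csv, DELIMITER ',');

-- Later, peel off one record type into a properly typed table.
CREATE TABLE payment_records AS
SELECT col01::date    AS posted_on,
       col02::numeric AS amount,
       col03          AS account_no
FROM mainframe_staging
WHERE record_type = 'PAY';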

Related

How to tag and store log descriptions in a SQL database

I have logs being captured that contain both a log message and a log "tag".
These "tags" describe the logging events as key-value pairs, for example:
CPU_USAGE: 52.3
USER_LOGGED_IN: "steve15"
NUMBER_OF_RETURNED_RESULTS: 125
I want to store these key-value pairs in a table. Each parameter can be a string, float, or integer, so I can't simply put all keys in one column and all values in a second column. What is a good SQL table design to store this kind of data? How is the "tagging" of logs typically done? My ultimate goal is to be able to funnel this information into a dashboard so I can monitor resource usage and bottlenecks on charts.
A constraint is that I would like to be able to add new keys without needing to modify my database schema, so having each parameter as a separate column isn't good.
Various solutions I have thought of are:
Storing each value as a string and adding another column in my "tag" table that dictates the actual type
Use a JSON column
Just altering my table to add a new column every time I think of a new tag to log :(
In SQL, key-value pairs are often stored simply as strings. You can use a view or application code to convert the values back to more appropriate data types.
I notice that you have left out dates/times -- those are a little trickier, because you really want a canonical format such as YYYYMMDD, or perhaps Unix epoch times (the number of seconds since 1970-01-01).
You can extend this by having a separate value column for each type you want to store, plus a type column that says which one is populated.
However, a key-value pair may not be the best approach for this type of data. A common solution is to do the following:
Determine if any columns are "common" enough that you really care about them. This might commonly be a date/time or user or multiple such columns.
Store these columns in separate columns, parsing them when you insert the data (or perhaps using triggers).
Store the rest (or all) as JSON within the row as "extra details".
Or, in Postgres, you can just store the whole thing as JSONB and put indexes on the JSONB column. That gives you both performance and simplicity when inserting the data.
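A sketch of the hybrid design described above, for Postgres (table and column names are invented for illustration):

```sql
-- Common columns pulled out; everything else lives in a JSONB column.
CREATE TABLE log_events (
    id        bigserial PRIMARY KEY,
    logged_at timestamptz NOT NULL DEFAULT now(),
    message   text,
    tags      jsonb NOT NULL DEFAULT '{}'
);

-- A GIN index makes containment queries on the tags fast.
CREATE INDEX log_events_tags_idx ON log_events USING gin (tags);

INSERT INTO log_events (message, tags)
VALUES ('login ok',
        '{"USER_LOGGED_IN": "steve15", "CPU_USAGE": 52.3}');

-- Find events for one user; ->> extracts a value as text.
SELECT logged_at, tags->>'CPU_USAGE' AS cpu
FROM log_events
WHERE tags @> '{"USER_LOGGED_IN": "steve15"}';
```

New keys need no schema change: you just insert them into `tags`.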

In PostgreSQL, efficiently using a table for every row in another table

I am sorry for the lack of proper notation in my question, but I am not too familiar with SQL. Despite searching the internet for a decent number of hours, I couldn't find how to do what I want efficiently, though that may be because I don't know the terminology. Here is the question:
I want to create a table, say Forms, in which each Form row has an ID, some metadata, and a pointer(?) to that row's own table, let's say Form12, which directs me to the Form12 table. I need this because every Form has a different number, name, and type of columns, depending on the user's configuration for that particular Form.
So I thought I could put the table ID of Form12 as a column in the Forms table. Is this approach considered OK, or is there a better way to do it?
Thank you for your time.
Storing the names of tables in a column is generally not a good solution in a relational database. In order to use the information, you need to use dynamic SQL.
I would instead ask why you cannot store the information in a single table or well-defined sets of tables. Postgres has lots of options to help with this:
NULL data values, so columns do not need to be filled in.
Table inheritance, so tables can share columns.
JSON columns to support a flexible set of columns.
Entity-attribute-value (EAV) data models, which allow for lots of flexibility.
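For instance, the JSON option from the list above collapses the per-form tables into a single table (a sketch; names and fields are hypothetical):

```sql
-- One row per form; the user-configured fields live in a JSONB column
-- instead of a separate physical table per form.
CREATE TABLE forms (
    form_id    serial PRIMARY KEY,
    form_name  text NOT NULL,
    created_at timestamptz DEFAULT now(),
    fields     jsonb NOT NULL
);

INSERT INTO forms (form_name, fields)
VALUES ('Form12', '{"age": 34, "city": "Oslo", "newsletter": true}');

-- Query across all forms without dynamic SQL; ? tests key existence.
SELECT form_name, fields->>'city' AS city
FROM forms
WHERE fields ? 'city';
```

This keeps every form queryable with ordinary static SQL, which the table-name-in-a-column design cannot do.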

Query Performance: single column vs multiple column

Which of the two table structures below is better?
OR
In the first query I use the LIKE operator; in the second I use the AND operator.
Does the first table design have any advantages over the second when selecting data?
In what situations should I choose the first table structure over the second?
The first one would be better only if you never need to work with the Type or Currency attributes in any way and you always use the whole text stored as MEASUREMENT_NAME.
If you plan to work with the values of Type or Currency separately, for example using them in WHERE conditions, the second option will always be the better choice.
You could also create a combined structure containing both the whole MEASUREMENT_NAME text and the separated Type and Currency values for filtering purposes. This takes more disk space and is not optimal, but the raw MEASUREMENT_NAME text may in the future contain attributes that are currently unknown to you, which could be a reason to keep it.
If MEASUREMENT_NAME is not something you receive from external sources, but a data structure you designed yourself, and you are looking for a way to store records with a flexible (changing) structure, you are better off storing it as JSON or XML data; Oracle has built-in functions for JSON data.
I also recommend lookup tables for the Type and Currency values, so that the main table contains only an ID as a foreign key.
The second table clearly has advantages over the first. If you have to query Type or Currency from the first table, you would have to use RIGHT, LEFT, or other string functions.
Also, if you set up the keys and constraints appropriately, the second table follows second normal form.
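The difference the answers describe can be seen in the queries themselves (a sketch; the table and column names are assumptions based on the discussion, with the combined column imagined as `'Revenue|Actual|USD'`):

```sql
-- First design: everything packed into one combined column.
-- You are forced into pattern matching, which a plain B-tree
-- index on the column cannot use efficiently:
SELECT *
FROM measurements
WHERE measurement_name LIKE '%|Actual|USD';

-- Second design: separate, typed columns.
-- The filter is sargable and each column can be indexed:
SELECT *
FROM measurements
WHERE type = 'Actual'
  AND currency = 'USD';
```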

SQL lookup in SELECT statement

I've got a SQL Server Express database I need to extract some data from. I have three fields: ID, NAME, DATA. The DATA column contains values like "654;653;526" -- yes, semicolons included. Those numbers relate to the ID field of a second table (two fields: ID and NAME). How can I do a replace or lookup in SQL so that instead of getting "654;653;526" I get the NAME values instead?
See the photo; it might explain this better:
http://i.stack.imgur.com/g1OCj.jpg
Redesign the database, unless this is a third-party database you are supporting. This will never be a good design and should never have been built this way. This is one of those times you bite the bullet and fix it before things get worse, which they will. You need a related table to store the values in. One of the very first rules of database design is to never store more than one piece of information in a field.
And hopefully those aren't your real field names; they are atrocious too. You need more descriptive field names.
If it is a third-party database, you need to look up a split function or create your own. You will want to transform the data into a relational form in a temp table or table variable to use in the join later.
The following may help: How to use GROUP BY to concatenate strings in SQL Server?
This can be done, but it won't be pretty. You could create a scalar-valued function that takes in the string of IDs and returns a string of names.
This denormalized structure is similar to the way values were stored in the quasi-object-relational database known as PICK. Cool database, in many respects ahead of its time, though in other respects, a dinosaur.
If you want to return the multiple names as a delimited string, it's easy to do with a scalar function. If you want to return the multiple rows as a table, your engine has to support table-valued functions.
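On a modern SQL Server (2016+ for STRING_SPLIT, 2017+ for STRING_AGG) the split-and-join approach the answers describe can be sketched like this; the table names `main` and `lookup` are placeholders for the two tables in the question:

```sql
-- main(ID, NAME, DATA)  where DATA holds e.g. '654;653;526'
-- lookup(ID, NAME)      the table the numbers refer to

-- One output row per embedded ID, with its name resolved:
SELECT m.ID,
       m.NAME,
       l.NAME AS resolved_name
FROM main AS m
CROSS APPLY STRING_SPLIT(m.DATA, ';') AS s
JOIN lookup AS l
    ON l.ID = TRY_CAST(s.value AS int);

-- Or re-concatenate the resolved names into one delimited string,
-- mirroring the original DATA format:
SELECT m.ID,
       STRING_AGG(l.NAME, ';') AS names
FROM main AS m
CROSS APPLY STRING_SPLIT(m.DATA, ';') AS s
JOIN lookup AS l
    ON l.ID = TRY_CAST(s.value AS int)
GROUP BY m.ID;
```

On older versions you would substitute a hand-rolled split function, as the answers suggest.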

do i need a separate table for nvarchar(max) descriptions

At a previous company of mine we used a separate table to store long descriptions in a TEXT-type column. I think this was done because of the limitations that came with the TEXT type.
I'm now designing the tables for the application I'm working on, and this question came to mind. I'm leaning towards storing the long description of my items in the same item table, in a varchar(max) column. I understand that I cannot index this column, but that is OK, as I will not be searching on it.
So far I cannot see any reason to separate this column into another table.
Can you please tell me if I am missing something, or if storing my descriptions in the same table in a varchar(max) column is a good approach? Thanks!
Keep the fields in the table where they belong. Since SQL Server 2005 the engine has become a lot smarter about large data types and even variable-length short data types. The old TEXT, NTEXT, and IMAGE types are deprecated; the new MAX-length types are their replacement. Since SQL Server 2005, each partition has three types of underlying allocation units: one for rows, one for LOBs, and one for row overflow. The MAX types are stored in the LOB allocation unit, so in effect the engine is already managing a separate table of large objects for you. The row-overflow unit is for in-row variable-length data that no longer fits in the page after an update, so it is 'overflowed' into a separate unit.
See Table and Index Organization.
It depends on how often you use them, but yes, you may want them in a separate table. Before you make the decision, you'll want to read up on SQL Server file paging, page splits, and the details of how SQL Server stores data.
The short answer is that varchar(max) can definitely cause a decrease in performance where those field lengths change a lot, due to an increase in page splits, which are expensive operations.
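A middle ground between the two answers, sketched for SQL Server (the table is hypothetical): keep the description in the item table, but ask the engine to push the MAX values out of the data pages so that scans of the short columns stay cheap.

```sql
-- Description lives in the item table, as the question proposes.
CREATE TABLE items (
    item_id     int IDENTITY PRIMARY KEY,
    name        nvarchar(200) NOT NULL,
    description nvarchar(max) NULL
);

-- Force all MAX-type values into the LOB allocation unit even when
-- they would fit in-row, leaving only a 16-byte pointer in the page:
EXEC sp_tableoption 'items', 'large value types out of row', 1;
```

This gives the storage behavior of the old separate-table design without a second table or an extra join in the schema.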