Use TOXI Solution in DataBase with Json data 101 - sql

We want to design a new database schema for our company's application.
The program is developed in C# with Nancy on the server side and React, Redux and GraphQL on the client side.
Our company often has to implement sudden changes to handle new business data. So we want to build a solid core for the fundamental data that is not likely to become obsolete, e.g. Article (Code, Description, Qty, Value, Price, CategoryId).
But we often need to add a particular category to an article, or a special implementation for a limited period of time only. We are thinking of implementing a TOXI-like solution to handle those situations.
In our implementation of the TOXI pattern, however, we want to add a third table that defines each tag's data type and definition.
Here is a simple explanatory image:
In the Metadata table we have two columns with JSON data: DataType and DefinedValue.
DataType defines how the program (or possibly a function in the database) must cast the varchar data in articoli_meta.value.
DefinedValue, when not null, specifies that the type must take one of a series of predefined values, e.g. High, Medium, Low, etc.
Those two columns are varchar and contain JSON that follows a predefined standard set by our programming team (possibly with an SQL function to validate those two columns).
I understand that this kind of approach is not a 'pure' relational approach, but we often pass data to the client in JSON format, so the DefinedValue column can easily be queried as a string and passed to the interface as data for a dropdown list.
Any ideas, experience or design tips are appreciated.
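To make the proposal concrete, here is a minimal sketch of the three-table layout using Python's built-in sqlite3 module. The table and column names (article, metadata, article_meta), the JSON layout {"type": ...} and the helper typed_value are assumptions made for illustration, not the actual schema or the team's JSON standard.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (
    id INTEGER PRIMARY KEY,
    code TEXT NOT NULL,
    description TEXT
);
-- One row per tag definition; both JSON columns are stored as text.
CREATE TABLE metadata (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    data_type TEXT NOT NULL,   -- hypothetical layout: {"type": "int"}
    defined_value TEXT         -- e.g. ["High", "Medium", "Low"], or NULL
);
-- Link table: which article carries which tag, with the raw varchar value.
CREATE TABLE article_meta (
    article_id INTEGER NOT NULL REFERENCES article(id),
    metadata_id INTEGER NOT NULL REFERENCES metadata(id),
    value TEXT NOT NULL,
    PRIMARY KEY (article_id, metadata_id)
);
""")

# Assumed mapping from the "type" key to a Python cast.
CASTS = {"int": int, "float": float, "string": str}


def typed_value(raw, data_type_json, defined_value_json):
    """Cast raw text as DataType says and enforce DefinedValue if present."""
    value = CASTS[json.loads(data_type_json)["type"]](raw)
    if defined_value_json is not None:
        allowed = json.loads(defined_value_json)
        if value not in allowed:
            raise ValueError(f"{value!r} is not one of {allowed}")
    return value


conn.execute("INSERT INTO article VALUES (1, 'A-001', 'Sample article')")
conn.execute(
    "INSERT INTO metadata VALUES (1, 'priority', ?, ?)",
    (json.dumps({"type": "string"}), json.dumps(["High", "Medium", "Low"])),
)
conn.execute(
    "INSERT INTO metadata VALUES (2, 'shelf_life_days', ?, NULL)",
    (json.dumps({"type": "int"}),),
)
conn.executemany(
    "INSERT INTO article_meta VALUES (?, ?, ?)",
    [(1, 1, "High"), (1, 2, "30")],
)

rows = conn.execute("""
    SELECT m.name, am.value, m.data_type, m.defined_value
    FROM article_meta AS am
    JOIN metadata AS m ON m.id = am.metadata_id
    WHERE am.article_id = 1
    ORDER BY m.id
""").fetchall()
tags = {name: typed_value(value, dt, dv) for name, value, dt, dv in rows}
print(tags)
```

In this sketch the defined_value column can be sent to the client unchanged as the option list for a dropdown, while the cast and the check against the predefined values happen in one place on the server.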

Related

Storing numbers in character type columns

I'm researching the data types of the fields in SAP's data tables.
I realized that fields that only store numbers sometimes have varchar or char data types, e.g. KUNNR (Customer Number) and BUKRS (Company Code) are character data types.
What is the objective or benefit of defining such a field as varchar/char instead of int when, judging by its name, it only contains numbers?
Edit: SAP has a "numc" data type for numeric text and "INT1/2/4/8" data types for integers, but uses char/varchar for Customer Number and Company Code instead. Please help if you have an idea why they use the char data type in these cases. I'm trying to create a data schema by referencing SAP's data schema.
Pages with details of SAP's tables/fields:
http://www.saptables.net/
https://sapstack.com/
Companies in global supply chains exchange their transaction data through Electronic Data Interchange (EDI). The data is often variable-length XML or something similar. SAP mainly uses varchar so that its application can use this data efficiently. The XML message data for transactions varies from business to business: sometimes a value is a number, sometimes it is characters, but either way it has to be handled and saved to the database.

How can I change a date field from String to Date or DateTime?

I am using Google BigQuery and I have a field named 'AsOfDate' which is set as a string datatype. I have a bunch of data in this field, which I really want to set as DateTime or just Date. Either is fine. I Googled for a solution, and I thought this would be pretty easy to do, but I can't seem to get the data type updated. I don't want to run a simple select statement; I want to permanently change the schema. Has anyone run into this and figured out how to do this kind of thing? If so, please share your insights. Thanks!
To quote directly from the official documentation: 'Changing a column's data type is not supported by the BigQuery web UI, the command-line tool, or the API.'
https://cloud.google.com/bigquery/docs/manually-changing-schemas#changing_a_columns_data_type
There are two ways to manually change a column's data type:
Using a SQL query — Choose this option if you are more concerned about
simplicity and ease of use, and you are less concerned about costs.
Recreating the table — Choose this option if you are more concerned
about costs, and you are less concerned about simplicity and ease of
use.
You could use either of the approaches above along with the PARSE_DATE() function to transform your string into a date field.
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#parse_date
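As a rough sketch of the "SQL query" option, the small Python helper below only builds the statement text; it does not talk to BigQuery. The table name mydataset.mytable, the helper name recreate_with_date_sql and the '%Y-%m-%d' format are assumptions, since the question does not say how the strings in AsOfDate are formatted.

```python
def recreate_with_date_sql(table: str, column: str, fmt: str = "%Y-%m-%d") -> str:
    """Build a statement that overwrites `table` with `column` parsed as DATE."""
    return (
        f"CREATE OR REPLACE TABLE `{table}` AS\n"
        f"SELECT * REPLACE (PARSE_DATE('{fmt}', {column}) AS {column})\n"
        f"FROM `{table}`"
    )


# Hypothetical table name; adjust the format string to match the stored data.
sql = recreate_with_date_sql("mydataset.mytable", "AsOfDate")
print(sql)
```

The resulting statement can be pasted into the query editor. PARSE_DATE raises an error on strings that do not match the format, so it is worth running the SELECT part on its own first.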

How to change mileage representation forms in sql

I would like to change the manner in which the mileage is represented in the database. For example, right now the mileage is represented as 080+0.348; this would mean that this particular feature is at mileage point 80.348 along the roadway corridor. I would like to have the data represented in the database in the latter form, 80.348 and so on. This would save me from having to export the dataset to Excel for the conversion. Is this even possible? The name of the column is NRLG_MILEPOINT.
Much appreciated.
One thing you could try is to pick the string value apart into its component pieces and then recombine them as a number. If your data is in a table called TEST you might do something like the following:
select miles, fraction,
nvl(to_number(miles), 0) + nvl(to_number(fraction), 0) as milepoint
from (select regexp_substr(nrlg_milepoint, '[0-9]*') as miles,
regexp_substr(nrlg_milepoint, '[+-][0-9.]*') as fraction
from test);
SQLFiddle here.
Share and enjoy.
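For anyone who wants to check the idea outside the database, the same split-and-add logic can be sketched in Python with the same two regular expressions. The function name milepoint_to_number is made up for illustration, and Decimal is used here only to avoid binary floating-point rounding.

```python
import re
from decimal import Decimal


def milepoint_to_number(text: str) -> Decimal:
    """Turn a value such as '080+0.348' into Decimal('80.348')."""
    # Leading digits, as in regexp_substr(nrlg_milepoint, '[0-9]*').
    miles = re.match(r"[0-9]*", text).group()
    # Signed remainder, as in regexp_substr(nrlg_milepoint, '[+-][0-9.]*').
    found = re.search(r"[+-][0-9.]*", text)
    fraction = found.group() if found else ""
    # Like nvl(to_number(...), 0): a missing part counts as zero.
    miles_part = Decimal(miles) if miles else Decimal(0)
    fraction_part = Decimal(fraction) if len(fraction) > 1 else Decimal(0)
    return miles_part + fraction_part


print(milepoint_to_number("080+0.348"))
```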
Using the answer provided above, I was able to expand it to get exactly the answer I needed. Thanks a ton to everyone who helped! Here is the query I ended up with:
select distinct nrlg_dept_route, corridor_code_rb, nrlg_county, next_county,
       nvl(to_number(miles), 0) + nvl(to_number(fraction), 0) as milepoint
from (select regexp_substr(nrlg_milepoint, '[0-9]*') as miles,          -- whole miles, e.g. 080
             nrlg_milepoint as nrlg_mile_point,
             nrlg_dept_route as nrlg_dept_route,
             nrlg_county as nrlg_county,
             next_county as next_county,
             corridor_code_rb as corridor_code_rb,
             corridor_code as corridor_code,
             regexp_substr(nrlg_milepoint, '[+-][0-9.]*') as fraction   -- signed remainder, e.g. +0.348
      from corridor_county_intersect, south_van_data_view)
where nrlg_dept_route = corridor_code
order by 1, 5;
There are a variety of ways to do this. Which one depends on your situation, how the data needs to be stored, and how it is being interacted with. Some of these options include:
1. Changing the datatype.
This option would potentially require you to change how the data is currently being stored. The conversion would have to be done by whatever currently writes the data to the schema.
2. Creating another column that stores the data in the correct format.
If you have an existing means of storing the data that would be broken by changing the datatype of NRLG_MILEPOINT, and/or you have a requirement to store the data in that format, you could add another column, say NRLG_MILEAGE_DISPLAY, with a number datatype perhaps, and store the data there. You could make a trigger that updates/inserts NRLG_MILEAGE_DISPLAY appropriately, based on the data in NRLG_MILEPOINT.
3. If you just want the data to be displayed differently in your select statement, you can convert the datatype in your SQL statement. Specifically how you would do this depends on the current datatype of NRLG_MILEPOINT.
Assuming that varchar2 is the type, based on the comments, here is an SQLFIDDLE link displaying a crude example of option 3. Your usage of this may vary depending on the actual datatype of NRLG_MILEPOINT. Regardless of its datatype... I am sure there is a means of converting how it is displayed in your query. You could take this further and create a view if you needed to. As an inline view or as a stored view, you can then use the converted value for doing your join later.

Why do SQLiteStudio (and others) not display a datetime in human-readable format by default?

Today I had to use a SQLite database for the first time and I really wondered about the display of a DATETIME column like 1411111200. Of course, internally it has to be stored as some integer value to be able to do math with it. But who wants to see that in a grid output, which is clearly for human eyes?
I even tried two programs, SQLiteStudio and SQLite Manager, and both don't even have an option to change this (at least I couldn't find it).
Of course with my knowledge about SQL it didn't take long to find out what the values mean - this query displays it like I expected:
select datetime(timestamp, 'unixepoch', 'localtime'), * from MyTable
But that's very uncomfortable when working with a GUI Tool. So why? Just because? Unix nerds? Or did I just get a wrong impression because I accidentally tried the only 2 Tools which are bad?
(I also appreciate comments on which tools to use or where I can find the hidden settings.)
Probably because sqlite doesn't have a first-class date type — how would a GUI tool know which columns are supposed to contain dates?
The question implies that a column of datatype DATETIME can only hold valid datetimes. But that's not true in SQLite: you can put any number or string value and it will be stored and displayed like it is.
To find out what the most "natural" way for a timestamp in SQLite would be, I created a table like this:
CREATE TABLE test ( timestamp DATETIME DEFAULT ( CURRENT_TIMESTAMP ) );
The result is a display in human readable format (2014-09-22 10:56:07)! But in fact it is saved as string, and I cannot imagine any serious software developer who would like that. Any comments?
That original database from the question, having datetimes as unixepoch, is not because of its table definition, but because the inserted data was like that. And that was probably the best possible option how to do it.
So, the answer is, those tools cannot display the datetime in human readable format, because they cannot know how it was encoded. It can be the number of seconds since 1970 or anything else, and it could even be different from row to row. What a mess.
From Wikipedia:
A common criticism is that SQLite's type system lacks the data
integrity mechanism provided by statically typed columns in other
products. [...] However, it can be implemented with constraints
like CHECK(typeof(x)='integer').
From the authors:
[...] most other SQL database engines are statically typed and so some
people feel that the use of manifest typing is a bug in SQLite. But
the authors of SQLite feel very strongly that this is a feature. The
use of manifest typing in SQLite is a deliberate design decision which
has proven in practice to make SQLite more reliable and easier to use,
especially when used in combination with dynamically typed programming
languages such as Tcl and Python.
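The behaviour discussed in this thread can be reproduced with Python's built-in sqlite3 module. This is only a small demonstration; the table names events and strict_events are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A column declared DATETIME accepts integers and arbitrary strings alike.
conn.execute("CREATE TABLE events (ts DATETIME)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [(1411111200,), ("2014-09-22 10:56:07",), ("not a date",)],
)
types = [row[0] for row in conn.execute("SELECT typeof(ts) FROM events ORDER BY rowid")]
print(types)  # ['integer', 'text', 'text']

# Only the query author knows the integer is seconds since 1970 (shown in UTC).
readable = conn.execute("SELECT datetime(1411111200, 'unixepoch')").fetchone()[0]
print(readable)  # 2014-09-19 07:20:00

# A CHECK constraint, as in the Wikipedia quote, rejects non-integer values.
conn.execute("CREATE TABLE strict_events (ts INTEGER CHECK (typeof(ts) = 'integer'))")
try:
    conn.execute("INSERT INTO strict_events VALUES ('not a date')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The three rows in events end up with different storage types in the same column, which is why a GUI tool cannot safely guess how to render them.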

Fuzzy matching Informatica vs SQL

We are currently debating whether to implement pairwise matching functions in SQL to perform fuzzy matching on invoice reference numbers, or go down the route of using Informatica.
Informatica is a great solution (so I've heard); however, I'm not familiar with the software.
Does anybody have experience with its fuzzy matching capabilities and the advantages it may offer over building some logic in SQL?
Thanks
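To illustrate what a hand-built pairwise matching function might look like, here is a minimal sketch using Python's standard difflib module. It is a stand-in for the idea only, not SQL and not Informatica; the function names, the sample reference numbers and the 0.85 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def best_match(ref: str, candidates: list, threshold: float = 0.85):
    """Return the candidate most similar to ref, or None if none is close enough."""
    if not candidates:
        return None
    score, candidate = max((similarity(ref, c), c) for c in candidates)
    return candidate if score >= threshold else None


# Made-up invoice reference numbers for illustration.
known = ["INV-2014-00123", "PO-7781", "INV-2013-00999"]
print(best_match("INV2014-00123", known))
```

Comparing every incoming reference against every known one grows quadratically with the data volume, which is one of the trade-offs to weigh when deciding between hand-written logic and a dedicated tool.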
The Parser transformation can be used in Informatica to do the job. Reference data objects can be created in Informatica and used to search your given string. The reference data objects are of the following types: Pattern Sets, Probabilistic Models, Reference Tables, Regex, Token Sets.
Pattern Sets - A pattern set contains the logic to identify data patterns, e.g. separating out initials from a name.
Probabilistic Models - A probabilistic model identifies tokens by the types of information they contain and by their positions in an input string.
A probabilistic model contains the following columns:
An input column that represents the data on the input port. You populate the column with sample data from the input port. The model uses the sample data as reference data in parsing and labeling operations.
One or more label columns that identify the types of information in each input string.
You add the columns to the model, and you assign labels to the tokens in each string. Use the label columns to indicate the correct position of the tokens in the string.
When you use a probabilistic model in a Parser transformation, the Parser writes each input value to an output port based on the label that matches the value. For example, the Parser writes the string "Franklin Delano Roosevelt" to FIRSTNAME, MIDDLENAME, and LASTNAME output ports.
The Parser transformation can infer a match between the input port data values and the model data values even if the port data is not listed in the model. This means that a probabilistic model does not need to list every token in a data set to correctly label or parse the tokens in the data set.
The transformation uses probabilistic or fuzzy logic to identify tokens that match tokens in the probabilistic model. You update the fuzzy logic rules when you compile the probabilistic model.
Reference Table - This is a database table used for searching.
Here it seems that your data is unstructured and you want to extract meaningful data from it. The Informatica Data Transformation (DT) tool is good if your data follows some pattern. It is used with the UDT transformation inside Informatica PowerCenter. With DT you can create a parser to parse your data, and with a serializer you can write it out in any form you want. Later you can do aggregation and other transformations on that data using Informatica PowerCenter's ETL capabilities.
DT is well known for its ability to parse PDFs, forms and invoices. I hope it serves the purpose.