A way to store array type data into single database table? - sql

I am having a input JSON which I need to feed into a database. We are exploring on whether to normalize or not our database tables.
Following is the structure for the input data (json):
"attachments": [
{
"filename": "abc.pdf",
"url": "https://www.web.com/abc.pdf",
"type": "done"
},
{
"filename": "pqr.pdf",
"url": "https://www.web.com/pqr.pdf",
"type": "done"
},
],
In the above example, attachments could have multiple values (more than 2, upto 8).
We were thinking of creating a different table called DB_ATTACHMENT and keep all the attachments for a worker down there. But the issue is we have somewhat 30+ different attachment type array (phone, address, previous_emp, visas, etc.)
Is there a way to store everything in ONE table (employee)? One I can think of is using a single column (ATTACHMENT) and add all the data in 'delimited-format' and have the logic at target system to parse and extract everything..
Any other better solution?
Thanks..

Is there a way to store everything in ONE table (employee)? One I can
think of is using a single column (ATTACHMENT) and add all the data in
'delimited-format' and have the logic at target system to parse and
extract everything.. Any other better solution?
You can store the data in a single VARCHAR column as JSON, then recover the information in the client decoding this JSON data.
Also, there are already some SQL implementations offering native JSON datatypes. For example:
mariaDB: https://mariadb.com/kb/en/mariadb/column_json/
mySQL: https://dev.mysql.com/doc/refman/5.7/en/json.html

Database systems store your data and offer you SQL to simplify your search requests in case your data is structured.
It depends on you to decide whether you want to store the data structured to benefit from the SQL or leave the search requester with the burden of parsing it.

It very much depends on how you intend to use the data. I'm not totally sure I understand your question, so I am going to rephrase the business domain I think you're working with - please comment if this is not correct.
The system manages 0..n employees.
One employee may have 0..8 attachments.
An attachment belongs to exactly 1 employee.
An attachment may be one of 30 different types.
Each attachment type may have its own schema.
If attachments aren't important in the business domain - they're basically notes, and you don't need to query or reason about them - you could store them as a column on the "employee" table, and parse them when you show them to the end user.
This solution may seem easier - but don't underestimate the conversion logic - you have to support Create, Read, Update and Delete for each attachment.
If attachments are meaningful in the business domain, this very quickly breaks down. If you need to answer questions like "find all employees who have attached abc.pdf", "find employees who do not have a telephone_number attachment", unpacking each employee_attachment makes your query very difficult.
In this case, you almost certainly need to store attachments in one or more separate tables. If the schema for each attachment is, indeed, different, you need to work out how to deal with inheritance in relational database models.
Finally - some database engines support formats like JSON and XML natively. Yours may offer this as a compromise solution.

Related

How do I programmatically build Eloquent queries based on a custom data structure saved in a persistent storage?

I'm building a reporting tool for my Laravel app that will allow users to create reports and save them for later use.
Users will be able to select from a pre-defined list to modify the query, then run the report and save it.
Having never done this before, I was just wondering if it is ok to save the query in the database? This would allow the user to select a saved report and execute the query.
One approach that would be easier / more robust than the suggested approach of saving queries to the database would be build a Controller that constructs the queries based on user input.
You could validate server side that the query parameters match the predefined list of options and then use Eloquent's QueryBuilder to programatically build the queries.
Actual code examples are hard to provide based off of your question however, as it's very broad and doesn't contain any specific examples.
You essentially need to build a converter between your storage mechanism and your data model in PHP. A code example would not add much value because you need to build it based on your needs.
You need to build a data structure (ideally using JSON in this case, since it's powerful enough for this) that defines all the query elements in a way that your business logic is able to read and convert in Eloquent queries.
I have done something similar in the past but for some simple scenarios, like defining variables for queries, instead of actual query elements.
This is how I'd do it, for example:
{
table: 'users',
type: 'SELECT',
fields: ['firstname AS fName', 'lastname AS lName'],
wheres: [
is_admin: false,
is_registered: true
]
}
converts to:
DB::table('users')
->where('is_admin', false)
->where('is_registered', true)
->get(['firstname AS fName', 'lastname AS lName']);
which converts to:
SELECT firstname AS fName, lastname AS lName WHERE is_admin = false AND is_registered = true
Here's an answer about Saving Report Parameters to a db but from a SQL Server Report Service (SSRS) angle.
But it's still a generic enough EAV structure to work for any parameter datatype (strings, ints, dates, etc.).
You might want to skip Eloquent and use mysql stored procedures. Then you only need to save the list of parameters you'd pass to each.
Like the preferred output type (e.g. .pdf, .xlsx, .html), and who to email it to, and who has permission to run it.

DataBase design for store anketing data

my English is not well, so sorry for it.
I want to write the web-app for anketing. I mean, that it must be a site, where user may give answers on different questions. For example, it can be question with text type of answer, or checkbox, or lookup (comboBox).
And my problem is in data base architecture. I read a lot about Entity Attribute Value db pattern, One True Lookup Table I also read. But these patterns has problem (with ms sql) when building a sql-query for data selecting (report).
I hope somebody give me a good suggestion, and tell, what can I do with this proplem.
Thanks!
Almost everything can be represented as a string. Why not store a string in the database e.g. Text Type Answer or "true" "false" or ComboBox Value etc. Then simply convert the value from the database if necessary at runtime or in SQL if writing a query?
I feel Entity Attribute Value pattern is meant more for Entities which can have dynamic fields added etc, not so much for the problem you've posed here.
If necessary you could also add an additional column to the database table to specify the "type" of data being stored. You could then use that column to base your query convert statements on etc.

Difference between JSON and SQL

I'm a newbie at web development, so here's a simple question. I've been doing a few tutorials in Django, setting up an SQL database, which is all good. I have now come across the JSON format, which I am not fully understanding. The definition on Wikipedia is: It is used primarily to transmit data between a server and web application, as an alternative to XML. Does this mean that JSON is a database like SQL? If not, what is the difference between SQL and JSON?
Thank you!
JSON is data markup format. You use it to define what the data is and means. eg: This car is blue, it has 4 seats.
{
"colour": "blue",
"seats": 4
}
SQL is a data manipulation language. You use it to define the operations you want to perform on the data. eg: Find me all the green cars. Change all the red cars to blue cars.
select * from cars where colour = 'green'
update cars set colour='blue' where colour='red'
A SQL database is a database that uses SQL to query the data stored within, in whatever format that might be. Other types of databases are available.
They are 2 completely different things.
SQL is used to communicate with databases, usually to Create, Update and Delete data entries.
JSON provides a standardized object notation/structure to talk to web services.
Why standardized?
Because JSON is relatively easy to process both on the front end (with javascript) and the backend. With no-SQL databases becoming the norm, JSON/JSON-like documents/objects are being used in the database as well.
Absolutely not. JSON is the data format in order to pass the data from the sender to the receiver. SQL is the language used by relational databases in order to define data structures and query the information from them. JSON is not associated with any way to store or retrieve the data.
JSON isn't a database, but there isn't anything stopping you from using JSON in a database. Mongo DB is a database that uses JSON (it's actually BSON behind closed doors) to communicate with the database. If you enjoy using JSON and you understand it, I recommend looking into Mongo!

Removing privacy data from a database?

Say that I needed to share a database with a partner. Obviously I have customer information in that database. Short of going through and identifying every column that contains privacy information and a custom script to 'scrub' the data, is there any tool or script which can scrub the data, but keep the format in tact (for example, if a string is 5 characters, it would stay 5 characters, only scrubbed)?
If not, how would you accomplish something like this, preferably in TSQL?
You may consider only share VIEW, create VIEWs to hide data that you don't want share.
Example:
CREATE VIEW v_customer
AS
SELECT
NAME,
LEFT(CreditCard,5) + '****' As CreditCard -- OR, don't show this column at all
....
FROM customer
Firstly I need to state professional interest I work for IBM which has tools that do exactly this.
Step 1. Ensure you identify all the PII (Personally Identifiable Information). When sharing database information it is typical that the obvious column names like "name" are found but you also need to find the "hidden" data where either the data is embedded in a standard format eg string-name-string and column name is something like "reference code" or is in free format text fields . as you have seen this is not going to be an easy job unless you automate it. The Tool for this is InfoSphere Discovery
Step 2. What context does the "scrubbed" data need to be in. Changing named fields to random characters has problems when testing as users focus on text errors rather than functional failures, therefore change names to real but ficticious. Credit card information often needs to be "valid". by that I mean it needs to have a valid prefix say 49XX but the rest an invalid sequence. Finally you need to ensure that every instance of the change is propogated through the database to maintain consistency. Tool for this is Optim Test Data Management with Data Privacy option.
The two tools integrate to give a full data privacy solution.
Based on the original question, it seems you need the fields to be the same length, but not in a "valid" format? How about:
UPDATE customers
SET email = REPLICATE('z', LEN(email))
-- additional fields as needed
Copy/paste and rename tables/fields as appropriate. I think you're going to have a hard time finding a tool that's less work, unless your schema is very complicated, or my formatting assumptions are incorrect.
I don't have an MSSQL database in front of me right now, but you can also find all of the string-like columns by something like:
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE IN ('...', '...')
I don't remember the exact values you need to compare for, but if you run the query and see what's there, they should be pretty self-explanatory.

Database : best way to model a spreadsheet

I am trying to figure out the best way to model a spreadsheet (from the database point of view), taking into account :
The spreadsheet can contain a variable number of rows.
The spreadsheet can contain a variable number of columns.
Each column can contain one single value, but its type is unknown (integer, date, string).
It has to be easy (and performant) to generate a CSV file containing the data.
I am thinking about something like :
class Cell(models.Model):
column = models.ForeignKey(Column)
row_number = models.IntegerField()
value = models.CharField(max_length=100)
class Column(models.Model):
spreadsheet = models.ForeignKey(Spreadsheet)
name = models.CharField(max_length=100)
type = models.CharField(max_length=100)
class Spreadsheet(models.Model):
name = models.CharField(max_length=100)
creation_date = models.DateField()
Can you think about a better way to model a spreadsheet ? My approach allows to store the data as a String. I am worried about it being too slow to generate the CSV file.
from a relational viewpoint:
Spreadsheet <-->> Cell : RowId, ColumnId, ValueType, Contents
there is no requirement for row and column to be entities, but you can if you like
Databases aren't designed for this. But you can try a couple of different ways.
The naiive way to do it is to do a version of One Table To Rule Them All. That is, create a giant generic table, all types being (n)varchars, that has enough columns to cover any forseeable spreadsheet. Then, you'll need a second table to store metadata about the first, such as what Column1's spreadsheet column name is, what type it stores (so you can cast in and out), etc. Then you'll need triggers to run against inserts that check the data coming in and the metadata to make sure the data isn't corrupt, etc etc etc. As you can see, this way is a complete and utter cluster. I'd run screaming from it.
The second option is to store your data as XML. Most modern databases have XML data types and some support for xpath within queries. You can also use XSDs to provide some kind of data validation, and xslts to transform that data into CSVs. I'm currently doing something similar with configuration files, and its working out okay so far. No word on performance issues yet, but I'm trusting Knuth on that one.
The first option is probably much easier to search and faster to retrieve data from, but the second is probably more stable and definitely easier to program against.
It's times like this I wish Celko had a SO account.
You may want to study EAV (Entity-attribute-value) data models, as they are trying to solve a similar problem.
Entity-Attribute-Value - Wikipedia
The best solution greatly depends of the way the database will be used. Try to find a couple of top use cases you expect and then decide the design. For example if there is no use case to get the value of a certain cell from database (the data is always loaded at row level, or even in group of rows) then is no need to have a 'cell' stored as such.
That is a good question that calls for many answers, depending how you approach it, I'd love to share an opinion with you.
This topic is one the various we searched about at Zenkit, we even wrote an article about, we'd love your opinion on it: https://zenkit.com/en/blog/spreadsheets-vs-databases/