I'm a newbie at web development, so here's a simple question. I've been doing a few tutorials in Django, setting up an SQL database, which is all good. I have now come across the JSON format, which I am not fully understanding. The definition on Wikipedia is: "It is used primarily to transmit data between a server and web application, as an alternative to XML." Does this mean that JSON is a database like SQL? If not, what is the difference between SQL and JSON?
Thank you!
JSON is a data format. You use it to define what the data is and what it means, e.g.: this car is blue, it has 4 seats.
{
    "colour": "blue",
    "seats": 4
}
SQL is a data manipulation language. You use it to define the operations you want to perform on the data, e.g.: find me all the green cars; change all the red cars to blue cars.
select * from cars where colour = 'green'
update cars set colour='blue' where colour='red'
A SQL database is a database that uses SQL to query the data stored within, in whatever format that might be. Other types of databases are available.
They are two completely different things.
SQL is used to communicate with databases, usually to Create, Read, Update and Delete data entries.
JSON provides a standardized object notation/structure to talk to web services.
Why standardized?
Because JSON is relatively easy to process both on the front end (with JavaScript) and on the back end. With NoSQL databases becoming the norm, JSON/JSON-like documents/objects are being used in the database as well.
Absolutely not. JSON is a data format used to pass data from a sender to a receiver. SQL is the language used by relational databases to define data structures and to query information from them. JSON is not tied to any particular way of storing or retrieving data.
JSON isn't a database, but there isn't anything stopping you from using JSON in a database. MongoDB is a database that uses JSON (it's actually BSON behind closed doors) to communicate with the database. If you enjoy using JSON and you understand it, I recommend looking into Mongo!
I have an input JSON document which I need to feed into a database. We are exploring whether or not to normalize our database tables.
Following is the structure for the input data (json):
"attachments": [
{
"filename": "abc.pdf",
"url": "https://www.web.com/abc.pdf",
"type": "done"
},
{
"filename": "pqr.pdf",
"url": "https://www.web.com/pqr.pdf",
"type": "done"
},
],
In the above example, attachments could have multiple values (more than 2, up to 8).
We were thinking of creating a separate table called DB_ATTACHMENT and keeping all the attachments for a worker there. But the issue is that we have some 30+ different attachment-type arrays (phone, address, previous_emp, visas, etc.)
Is there a way to store everything in ONE table (employee)? One option I can think of is using a single column (ATTACHMENT), adding all the data in a 'delimited format', and having logic at the target system parse and extract everything..
Any other better solution?
Thanks..
Is there a way to store everything in ONE table (employee)? One option I can think of is using a single column (ATTACHMENT), adding all the data in a 'delimited format', and having logic at the target system parse and extract everything.. Any other better solution?
You can store the data in a single VARCHAR column as JSON, then recover the information on the client by decoding that JSON data.
Also, there are already some SQL implementations offering native JSON datatypes. For example:
MariaDB: https://mariadb.com/kb/en/mariadb/column_json/
MySQL: https://dev.mysql.com/doc/refman/5.7/en/json.html
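For instance, with a native JSON column the attachments array from the question can be stored and queried directly. A minimal sketch against MySQL 5.7, assuming a hypothetical employee table:

CREATE TABLE employee (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    attachments JSON  -- the whole attachments array stored as one JSON value
);

INSERT INTO employee (id, name, attachments) VALUES
(1, 'Alice', '[{"filename": "abc.pdf", "url": "https://www.web.com/abc.pdf", "type": "done"}]');

-- Pull the first attachment's filename back out of the JSON document
SELECT id, JSON_EXTRACT(attachments, '$[0].filename') AS first_filename
FROM employee;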
Database systems store your data and offer you SQL to simplify your search requests when your data is structured.
It is up to you to decide whether you want to store the data in structured form and benefit from SQL, or leave the burden of parsing it on whoever makes the search requests.
It very much depends on how you intend to use the data. I'm not totally sure I understand your question, so I am going to rephrase the business domain I think you're working with - please comment if this is not correct.
The system manages 0..n employees.
One employee may have 0..8 attachments.
An attachment belongs to exactly 1 employee.
An attachment may be one of 30 different types.
Each attachment type may have its own schema.
If attachments aren't important in the business domain - they're basically notes, and you don't need to query or reason about them - you could store them as a column on the "employee" table, and parse them when you show them to the end user.
This solution may seem easier - but don't underestimate the conversion logic - you have to support Create, Read, Update and Delete for each attachment.
If attachments are meaningful in the business domain, this very quickly breaks down. If you need to answer questions like "find all employees who have attached abc.pdf", "find employees who do not have a telephone_number attachment", unpacking each employee_attachment makes your query very difficult.
In this case, you almost certainly need to store attachments in one or more separate tables. If the schema for each attachment is, indeed, different, you need to work out how to deal with inheritance in relational database models.
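As a rough illustration of the separate-table route (the names here are mine, not from the question), a single generic attachment table with a type column covers the 30+ variants as long as their fields overlap; truly different schemas would need per-type tables or a mix of both:

CREATE TABLE employee (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE employee_attachment (
    attachment_id INT PRIMARY KEY,
    employee_id INT NOT NULL REFERENCES employee (employee_id),
    attachment_type VARCHAR(50) NOT NULL,  -- e.g. 'phone', 'address', 'visa'
    filename VARCHAR(255),
    url VARCHAR(2000),
    status VARCHAR(20)
);

-- "Find all employees who have attached abc.pdf" becomes a simple join
SELECT e.name
FROM employee e
JOIN employee_attachment a ON a.employee_id = e.employee_id
WHERE a.filename = 'abc.pdf';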
Finally - some database engines support formats like JSON and XML natively. Yours may offer this as a compromise solution.
I have only found answers about how to import CSV files into the database, for example as a blob or as a 1:1 representation of the table you are importing into.
What I need is a little different: my team and I are tracking everything we do in a database. A lot of these tasks produce logfiles, benchmark results, etc., which are stored in CSV format. The number of columns is far from consistent, and the data could be completely different from file to file, e.g. it could be a log from Fraps with frame times in it, or a log of CPU temperatures over a period of time, or something completely different.
Long story short, I came up with an idea, but - being far from an SQL pro - I am not sure if it makes sense or if there is a more elegant solution.
Does this make sense to you:
We also need to deal with a lot of data, so please also give me your opinion on whether this is feasible with around 200 files per day, each of which can easily have a couple of thousand rows.
The purpose of all this is that we can generate reports from the stored data and perform analysis on it, e.g. view it on a webpage in a graph or do calculations with it.
I'm limited to MS-SQL in this case, because that's what the current (quite complex) database is and I'm just adding a new schema with that functionality to it.
Currently we just archive the files on a raid and store a link to it in the database. So everyone who wants to do magic with the data needs to download every file he needs and then use R or Excel to create a visualization of the data.
Have you considered a column of XML data type for the file data, as an alternative to the ColumnId -> Data structure? SQL Server provides a special dedicated XML index (over the entire XML structure), so your data can be fully indexed no matter what CSV columns you have. You will have far fewer records in the database to handle (as an entire CSV file becomes a single XML field value). There are good XML query options for searching by values and attributes of the XML type.
For that you will need to translate CSV to XML, but you will have to parse it either way ...
Not that your plan won't work, I am just giving an idea :)
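To make the idea concrete, here is a rough T-SQL sketch (the table and element names are invented for illustration), assuming each CSV file is translated into one XML document:

CREATE TABLE log_file (
    log_id INT IDENTITY PRIMARY KEY,
    file_name NVARCHAR(260),
    data XML  -- the entire translated CSV file
);

-- The dedicated XML index mentioned above
CREATE PRIMARY XML INDEX ix_log_file_data ON log_file (data);

-- If a CSV row "frame,time" became <row frame="1" time="16.6"/>,
-- nodes()/value() can shred it back into a rowset:
SELECT r.c.value('@frame', 'int')  AS frame,
       r.c.value('@time', 'float') AS frame_time
FROM log_file
CROSS APPLY data.nodes('/rows/row') AS r(c)
WHERE file_name = 'fraps_2016-01-01.csv';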
=========================================================
Update with some online info:
An article from Simple Talk: The XML Methods in SQL Server
Microsoft documentation for nodes(), with various use case samples: nodes() Method (xml Data Type)
Microsoft documentation for value(), with various use case samples: value() Method (xml Data Type)
I have some relational data in a SQL Server 2008 database split across 3 tables, which I would like to use to populate some classes that represent them.
The hierarchy is: Products -> Variants -> Options.
I have considered passing back 3 result sets and using LINQ to check if there are any related/child records in the related tables. I've also considered passing back a single de-normalised table containing all of the data from the three tables and reading through the rows, manually figuring out where a product/variant/option begins and ends. Having little to no prior experience with LINQ, I opted for the latter, which sort of worked but required many lines of code for something I had hoped would be pretty straightforward.
Is there an easier way of accomplishing this?
The end goal is to serialize the resulting classes to JSON, for use in a Web Service Application.
I've searched and searched on Google for an answer, but I guess I'm not searching for the right keywords.
After a bit of playing around, I've figured out a way of accomplishing this...
Firstly, create a stored procedure in SQL Server that will return the data as XML. It's relatively easy to generate an XML document containing hierarchical data.
CREATE PROCEDURE usp_test
AS
BEGIN
    SELECT
        1 AS ProductID
        , 'Test' AS ProductDesc
        , (
            SELECT
                1 AS VariantID
                , 'Test' AS VariantDesc
            FOR XML PATH('ProductVariant'), ROOT('ProductVariants'), TYPE
        )
    FOR XML PATH('Product'), ROOT('ArrayOfProduct'), TYPE
END
This gives you an XML document with a parent-child relationship:
<ArrayOfProduct>
    <Product>
        <ProductID>1</ProductID>
        <ProductDesc>Test</ProductDesc>
        <ProductVariants>
            <ProductVariant>
                <VariantID>1</VariantID>
                <VariantDesc>Test</VariantDesc>
            </ProductVariant>
        </ProductVariants>
    </Product>
</ArrayOfProduct>
Next, read the results into the VB.Net application using a SqlDataReader. Declare an empty object to hold the data and deserialize the XML into it using an XmlSerializer.
At this point, the data that once was in SQL tables is now represented as classes in your VB.Net application.
From here, you can then serialize the object into JSON using JavaScriptSerializer.Serialize.
I have an XML feed of a resume. Each part of the resume is broken down into its constituent parts. For example <employment_history>, <education>, <skills>.
I am aware that I could save each section of the XML file into a database column, for example columnID = employment_history | education | skills, and then conduct a free-text search just on those individual columns. However, I would prefer not to do this because it would duplicate data that is already contained within the XML file and may put extra strain on indexing.
Therefore I wondered if it is possible to conduct a free-text search within the <employment_history></employment_history> element of an XML file using SQL Server.
If so an example would be appreciated.
Are you aware that SQL Server supports columns with the data type of "XML"? These can contain an entire XML document.
You can also index these columns and you can use XQuery to perform query and data manipulation tasks on those columns.
See Designing and Implementing Semistructured Storage (Database Engine)
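As a small sketch of the XQuery route (the Resume table and ResumeXml column are hypothetical names), the exist() method can restrict the search to one element:

SELECT ResumeID
FROM Resume
WHERE ResumeXml.exist(
    '/resume/employment_history[contains(string(.), "project manager")]') = 1;

Note that XQuery's contains() is case-sensitive; for genuine free-text matching you can also put a full-text index on the XML column, which indexes element content while ignoring the markup.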
Querying XML by doing string searching with SQL is probably going to run into a lot of trouble.
Instead, I would parse it in whatever language you're using to interact with your database and use XPath (most languages/environments have some kind of built-in or popular 3rd-party library) to query it.
I think you can create a function (UDF) that takes the XML text as a parameter, fetches the data inside the tag, and then applies the filter you want.
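Something along these lines (a sketch only; the table, column, and function names are made up):

CREATE FUNCTION dbo.GetEmploymentHistory (@resume XML)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    -- extract the text content of the <employment_history> element
    RETURN @resume.value('(/resume/employment_history)[1]', 'NVARCHAR(MAX)');
END

SELECT ResumeID
FROM Resume
WHERE dbo.GetEmploymentHistory(ResumeXml) LIKE '%project manager%';

Bear in mind that filtering on a scalar UDF like this cannot use an index, so it will scan every row.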
How do you decide on which side to perform your data manipulation when you can do it either in the code or in the query?
When you need to display a date in a specific format, for example, do you produce the desired format directly in the SQL query, or do you retrieve the date and then format it in code?
What helps you decide: performance, best practice, preference for SQL vs the code language, complexity of the task...?
All things being equal, I prefer to do any manipulation in code. I try to return data as raw as possible so it's usable by a larger base of consumers. If it's very specialized, maybe for a report, then I may do the manipulation on the SQL side.
Another instance where I prefer to do manipulation on the SQL side is if it can be done set-based.
If it's not set-based, and looping would be involved, then I would do the manipulation in code.
Basically, let the database do what it's good at; otherwise do it in code.
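For instance, a set-based change is a single statement the database can optimize, while doing the same work from code means fetching rows and issuing one update per row. A trivial sketch (table and column names are illustrative):

-- Set-based: the database handles every qualifying row at once
UPDATE orders
SET status = 'archived'
WHERE order_date < '2015-01-01';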
Formatting is a UI issue, it is not 'manipulation'.
My answer is the reverse of everyone else's.
If you are going to have to apply the same formatting logic (the same holds true for calculation logic) in more than one place in your application, or in separate applications, I would encapsulate the formatting in a view inside the database and SELECT from the view. You do not need to hide the original data, that can also be available. But by putting the logic into the database view you're making it trivially easy to have consistent formatting across modules and applications.
For instance, a Customer table would have an associated view CustomerEx with a MailingAddress derived column that would format the various parts of the address as required, combining city, state, and zip and compressing out blank lines, etc. My application code SELECTs against the CustomerEx view for addresses. If I extend my data model with, say, an Apt# field or to handle international addresses, I only need to change that single view. I do not need to change, or even recompile, my application.
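A minimal sketch of that view (SQL Server syntax; the real formatting logic would be richer than this):

CREATE VIEW CustomerEx AS
SELECT CustomerID,
       City,
       State,
       Zip,
       -- derived column: one consistently formatted mailing line
       City + ', ' + State + ' ' + Zip AS MailingAddress
FROM Customer;

-- Every module reads the same formatting from one place
SELECT CustomerID, MailingAddress FROM CustomerEx;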
I would never (ever) specify any formatting in the query itself. That is up to the consumer to decide how to format. All data manipulation should be done at the client side, except for bulk operations.
If it is just formatting and will not always need to be the same formatting, I'd do it in the application, which is likely to do this faster.
However, the fastest formatting is the formatting that is done only once, so if it is a standard format that I always want to use (say, displaying American phone numbers as (###)###-####), then I'll store the data in the database in that format (this may still involve application code, but on the insert, not the select). This is especially true if you might need to reformat a million records for a report. If you have several formats, you might consider calculated columns (we have one for full name and one for "lastname, firstname"; our raw data is firstname, middlename, lastname, suffix) or triggers to persist the data. In general, I say store the data the way you need to see it, as long as you can keep it in the appropriate data type for the real manipulations you need to do, such as date math or regular math for money values.
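For example, computed columns in SQL Server can persist such a standard format once, at write time (the column names here are illustrative):

ALTER TABLE person
    ADD full_name  AS (first_name + ' ' + last_name) PERSISTED,
        last_first AS (last_name + ', ' + first_name) PERSISTED;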
About the only thing that I do in a query that could probably be done in code also is converting the datetimes to the user's time zone.
MySQL's CONVERT_TZ() function is easy to use and accurate. I store all of my datetimes in UTC and retrieve them in the user's time zone. Daylight saving rules change, and this is especially important for client applications, since relying on the native library means relying on the user having updated their OS.
Even for server side code, like a web server, I only have to update a few tables to get the latest time zone data instead of updating the OS on the server.
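A minimal example of the approach (assumes MySQL's time zone tables have been loaded, e.g. with the mysql_tzinfo_to_sql utility, so that named zones resolve):

-- store UTC, convert per user at query time
SELECT event_id,
       CONVERT_TZ(created_at, 'UTC', 'America/New_York') AS created_local
FROM event;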
Other than those types of issues, it's probably best to distribute most functions to the application server or client rather than making your database the bottleneck. Application servers are easier to scale than database servers.
If you can write a stored procedure or something that starts with a large dataset, does some inexpensive calculations or simple iteration, and returns a single row or value, then it probably makes sense to do it on the server to save sending large datasets over the wire. So, if the processing is inexpensive, why not have the database return just what you need?
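For example, rather than shipping thousands of rows to the client just to compute an average, let the server reduce them to a single value (the names here are illustrative):

SELECT AVG(frame_time) AS avg_frame_time
FROM benchmark_result
WHERE run_id = 42;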
In the case of the date column, I'd save the full date in the DB, and when I return it, I specify in code how I'd like to show it to the user. This way you can ignore the time part or even change the order of the date parts when you show it in a datagrid, for example: mm/dd/yyyy, dd/mm/yyyy, or only mm/yyyy.