Create master table for status column - sql

I have a table that represents a request sent through the frontend:
coupon_fetching_request
---------------------------------------------------------------
request_id | request_time | requested_by | request_status
Above I have tried to create a table to address the issue. Here request_status is an integer; it can take values such as the following.
1 : request successful
2 : request failed due to incorrect input data
3 : request failed in otp verification
4 : request failed due to internal server error
That table is very simple, and the status is used to let the frontend know what happened to the sent request. I had a discussion with my team, and other developers proposed that we should have a status representation table. On the database side we are not going to need this status, but the team argued that in the future we may need to show simple output from the database listing the status of all requests. According to the YAGNI principle, I don't think it is a good idea.
Currently I have written code to convert the returned request_status value into a descriptive value at the frontend. I tried to convince the team that I could create an enumeration at the business layer to represent the meaning of the status, or that I could add documentation at the frontend and in Java, but I failed to convince them.
The table proposed is as follows
coupon_fetching_request_status
---------------------------------------------------
status_id | status_code | status_description
My question is: is it necessary to create a table for such a simple status in similar cases?
I tried to create a simple example to illustrate the problem. In reality, the table represents a Discount Coupon Code Request, with the status indicating whether the code was successfully fetched.

It really depends on your use case.
To start with: in your main table, you are already storing request_status as an integer, which is a good thing (if you were storing the whole description, like 'request successful', that would not be optimal).
The main question is: will you eventually need to display that data in a human-readable format?
If no, then it is probably useless to create a representation table.
If yes, then having a representation table would be a good thing, instead of adding code in the presentation layer to do the translation; let the data live in the database, and let the frontend take care of presentation only.
Since this table can be easily created when needed, a pragmatic approach would be to hold on until you have a real need for the representation table.
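For illustration, if and when that need arises, the table could be as simple as this sketch (the status_code values here are my own hypothetical labels):

CREATE TABLE coupon_fetching_request_status (
    status_id          SMALLINT     PRIMARY KEY,
    status_code        VARCHAR(30)  NOT NULL UNIQUE,
    status_description VARCHAR(200) NOT NULL
);

INSERT INTO coupon_fetching_request_status
    (status_id, status_code, status_description)
VALUES
    (1, 'SUCCESS',      'Request successful'),
    (2, 'BAD_INPUT',    'Request failed due to incorrect input data'),
    (3, 'OTP_FAILED',   'Request failed in OTP verification'),
    (4, 'SERVER_ERROR', 'Request failed due to internal server error');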

You should create the reference table in the database. You currently have business logic on the application side, interpreting data stored in the database. This seems dangerous.
What does "dangerous" mean? It means that ad-hoc queries on the database might need to re-implement the logic. That is prone to error.
It means that if you add a reporting front end, then the reports have to re-implement the logic. That is prone to error and a maintenance nightmare.
It means that if you have another developer come along, or another module implemented, then the logic might need to be re-implemented. Red flag.
The simplest solution is to have a reference table to define the official meanings of the codes. The application should use this table (via a join) to return the strings. The application should not be defining the meaning of codes stored in the database. YAGNI doesn't apply, because the application is so in need of this information that it implements the logic itself.
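For example, a minimal sketch of how the application could then return descriptive strings with a join:

SELECT r.request_id,
       r.request_time,
       r.requested_by,
       s.status_description
FROM coupon_fetching_request AS r
JOIN coupon_fetching_request_status AS s
    ON s.status_id = r.request_status;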

Related

What is the best method to extract recurring blob data and put it in another table? - SQL

I'm developing a new webpage (in the .NET Framework, if that helps) for the scenario below. Every single day, we get a cab drivers' report.
Date | Blob
-------------------------------------------------------------
15/07 | {"DriverName1":"100kms", "DriverName2":"10kms", "Hash":"Value"...}
16/07 | {"DriverName1":"50kms", "DriverName3":"100kms", "Hash":"Value"}
Notice that the 'Blob' is the actual data received, in JSON format; it contains information about the distance covered by each driver on that particular day.
I have written a service which reads the above table, breaks the blob down further, and puts it into a new table like the one below:
Date  | DriverName  | KmsDriven
-------------------------------------------------------------
15/07 | DriverName1 | 100
15/07 | DriverName2 | 10
16/07 | DriverName3 | 100
16/07 | DriverName1 | 50
By populating this, I can easily do the following queries:
How many drivers drove on a particular day.
How 'DriverName1' did for a particular week, etc.
My questions here are:
Is there anything in the .NET/SQL world to specifically address this, or let me know if I am reinventing the wheel here.
Is this the right way to use the Blob data?
Are there any design patterns to adhere to here?
Is there anything in the .NET/SQL world to specifically address this, or let me know if I am reinventing the wheel here.
Well, there are JSON parsers available, for example Newtonsoft's Json.NET. Or you can use SQL Server's own functions. Once you have extracted individual values from the JSON, you can write them into the corresponding columns in your new table.
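For instance, a minimal sketch of the SQL Server route using OPENJSON (available from SQL Server 2016; the DailyReport table and its column names are assumptions standing in for your source table):

-- Shred each day's JSON blob into one row per driver.
SELECT r.[Date],
       j.[key]                                    AS DriverName,
       CAST(REPLACE(j.[value], 'kms', '') AS int) AS KmsDriven
FROM DailyReport AS r
CROSS APPLY OPENJSON(r.[Blob]) AS j
WHERE j.[key] <> 'Hash';  -- skip the non-driver fields

Since CAST will fail on any value that isn't of the form '<number>kms', TRY_CAST may be the safer choice in practice.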
Is this the right way to use the Blob data?
No. It violates the principle of atomicity, and therefore the first normal form.
Are there any design patterns to adhere to here?
I'm not sure about "patterns", but I don't see why you would need a BLOB in this case.
Assuming the data is uniform (i.e. it always has the same fields), you can just declare the columns you need and write directly to them (as you already proposed).
Otherwise, you may consider using SQL Server's XML data type, which will enable you to extract some of the sections within an XML document, or insert a new section without replacing your whole document.
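For a feel of what that looks like, here is a small self-contained sketch of querying an XML value with the nodes()/value() methods (the document shape is made up for illustration):

DECLARE @doc xml = N'<report date="15/07">
                       <driver name="DriverName1" kms="100" />
                       <driver name="DriverName2" kms="10" />
                     </report>';

-- One row per <driver> element.
SELECT d.value('@name', 'nvarchar(50)') AS DriverName,
       d.value('@kms',  'int')          AS KmsDriven
FROM @doc.nodes('/report/driver') AS t(d);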

How to properly store a JSON object into a Table?

I am working on a scenario where I have invoices available in my Data Lake Store.
Invoice example (extremely simplified):
{
    "business_guid": "b4f16300-8e78-4358-b3d2-b29436eaeba8",
    "ingress_timestamp": 1523053808,
    "client": {
        "name": "Jake",
        "age": 55
    },
    "transactions": [
        {
            "name": "peanut",
            "amount": 100
        },
        {
            "name": "avocado",
            "amount": 2
        }
    ]
}
All invoices are stored in ADLS and can be queried. But it is my desire to provide access to the same data inside an ADL database.
I am not an expert on unstructured data: I have an RDBMS background. Taking that into consideration, I can only think of 2 possible scenarios:
2/3 tables - invoice, client (could be removed) and transaction. In this scenario, I would have to create an invoice ID to be able to build relationships between those tables.
1 table - client info could be folded into the invoice data. But transactions could (maybe) be defined as an SQL.ARRAY<SQL.MAP<string, object>>.
I have mainly 3 questions:
What is the correct way of doing so? Solution 1 seems much better structured.
If I go with solution 1, how do I properly create an ID (probably GUID)? Is it acceptable to require ID creation when working with ADL?
Is there another solution I am missing here?
Thanks in advance!
This type of question is a bit like asking whether you prefer your sauce on the pasta or next to the pasta :). The answer is: it depends.
To answer your 3 questions more seriously:
#1 has the benefit of being normalized, which works well if you want to operate on the data separately (e.g., just clients, just invoices, just transactions), want the benefits of normalization, need the right indexing, and are not limited by the row-size limits (e.g., your array of maps needs to fit into a row). So I would recommend that approach unless your transaction data is always small, you always access the data together, and you mainly search on the column data.
U-SQL per se has no understanding of the hierarchy of the JSON document. Thus, you would have to write an extractor that turns your JSON into rows in a way that either gives you the correlation of the parent to the child (normally done by stepwise downward navigation with CROSS APPLY) and uses the key value of the parent data item as the foreign key, or has the extractor generate the key (as an int or GUID).
There are some sample JSON extractors on the U-SQL GitHub site (start at http://usql.io) that can get you started with the JSON-to-rowset conversion. Note that you will probably want to optimize the extraction at some point to be JSON-reader based, so you can process larger documents without loading them into memory.
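For reference, here is a rough sketch of what using one of those sample extractors might look like; the assembly names, file paths, and the JSONPath-style constructor argument are assumptions drawn from the samples, so check the repository for the exact signatures:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// One row per element of the "transactions" array.
@transactions =
    EXTRACT name   string,
            amount int
    FROM "/invoices/{*}.json"
    USING new JsonExtractor("transactions[*]");

// Correlating these rows back to their parent invoice still requires
// extending the extractor (or generating a key), as described above.
OUTPUT @transactions
TO "/output/transactions.csv"
USING Outputters.Csv();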

database schema for http transactions

I have a script that makes an HTTP call to a webservice, captures the response, and parses it.
For every transaction, I would like to save the following pieces of data in a relational DB.
HTTP request time
HTTP request headers
HTTP response time
HTTP response code
HTTP response headers
HTTP response content
I am having a tough time visualizing a schema for this.
My initial thoughts were to create 2 tables.
Table 'Transactions':
1. transaction id (not null, not unique)
2. timestamp (not null)
3. type (response or request) (not null)
4. headers (null)
5. content (null)
6. response code (null)
'transaction id' will be some sort of checksum derived from combining the timestamp with the header text.
The reason I compute this transaction id is to have a unique id that can distinguish two transactions, but that at the same time can be used to link a request with a response.
What will this table be used for?
The script will run every 5 minutes, and log all this into the DB. Plus, every time it runs, the script will check the last time a successful transaction was made. Also, at the end of the day, the script generates a summary of all the transactions made that day and emails it.
Any ideas on how I can improve this design? What kind of normalization and/or optimization techniques should I apply to this schema? Should I split this up into 2 or more tables?
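For illustration, here is roughly what such a split might look like as a sketch (table and column names are hypothetical):

CREATE TABLE http_request (
    request_id   BIGINT    PRIMARY KEY,  -- surrogate key instead of a derived checksum
    request_time TIMESTAMP NOT NULL,
    headers      TEXT
);

CREATE TABLE http_response (
    response_id   BIGINT    PRIMARY KEY,
    request_id    BIGINT    NOT NULL REFERENCES http_request (request_id),
    response_time TIMESTAMP NOT NULL,
    response_code INT,
    headers       TEXT,
    content       TEXT
);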
I decided to use a NoSQL approach to this, and it has worked; I used MongoDB. The flexibility it offers with document structure, and not having to have a fixed number of attributes, really helped.
Probably not the best solution to the problem, but I was able to optimize the performance using compound indexes.

What is a simple way to tell if a row of data has been changed?

If I have a row of data like:
1, 2, 3
I can create a checksum value that is the sum of all of the columns, 1 + 2 + 3 = 6. We can store this value with the row in the 4th column:
1, 2, 3, 6
I can then write a program to check whether any of the values in the columns have changed accidentally, by testing whether the sum of the columns still matches the checksum value.
Now, I'd like to take this a step further. Let's say I have a table of values that anyone has read/write access to, where the last column of data is the sum of the previous columns as described earlier.
1, 2, 3, 6
Let's say someone wants to be sneaky and changes the value in the third column:
1, 2, 9, 6
The checksum is easy to reproduce so the sneaky individual can just change the checksum value to 1 + 2 + 9 = 12 so that this row appears not to be tampered with.
1, 2, 9, 12
Now my question is: how can I make a more sophisticated checksum value so that a sneaky individual can't make this type of change without invalidating the checksum? Perhaps I could create a black-box exe that, given the first three values of the row, produces a checksum that is a little more sophisticated, like:
a^2 + b^2 + c^2
But while this logic is unknown to a sneaky user, he/she could still input the values into the black-box exe and get a valid checksum back.
Any ideas on how I can make sure all rows in a table are untampered with? The method I'm trying to avoid is saving a copy of the table every time it is legitimately modified by the program I am creating. This is possible, but seems like a very inelegant solution. There has to be a better way, right?
Using basic math, your checksum can be defeated, because it already has trivial collisions:
a^2 + b^2 + c^2
a=0, b=0, c=2 → checksum 4
a=2, b=0, c=0 → checksum 4
If you want to give users a set of "read-only" data, consider using materialized views. A materialized view computes the calculation ahead of time (i.e., your valid data) and serves that to the users, while your program can make modifications in the background.
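A small sketch of that idea, assuming PostgreSQL (materialized-view syntax varies by engine; the table and role names are hypothetical):

-- Precompute the "valid" data, checksum included.
CREATE MATERIALIZED VIEW verified_rows AS
SELECT id, a, b, c, a + b + c AS checksum
FROM raw_rows;

-- Users may read the view but never touch the underlying table.
GRANT SELECT ON verified_rows TO report_user;

-- The maintaining program refreshes the view after legitimate changes.
REFRESH MATERIALIZED VIEW verified_rows;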
Furthermore, this is the reason privileges exist: if you only supply accounts that cannot modify the database (for instance, read-only access), this mitigates the issue of someone tampering with data. Also, you cannot fully prevent a malicious user from tampering with data; you can only make them jump through several hoops in the hope that they get bored or are blocked temporarily.
There is no silver bullet for security; what you can do is adopt a defense-in-depth mindset that would include the following features:
Extensive Logging,
Demarcation of responsibilities,
Job rotation,
Patch management,
Auditing of logs (goes together with logging, but someone actually has to read them),
Implement a HIPS (host intrusion prevention system),
Deny outside connections to the database
The list can go on quite extensively.
You seem to be asking, "how can I give a program a different set of security permissions from the user running it?" The way to do this is to make sure the program runs in a different security context from the user. Ways of doing this vary by platform.
If you have multiple machines, then a client-server architecture can help. You expose a controlled API through the server, and the server holds the security credentials for the database. Then your user can't make arbitrary requests.
If you're the administrator of the client machine and the user isn't, then you may be able to have separate processes doing something similar, e.g. a daemon in Unix. I think DCOM on Windows lets you do something like this.
Another approach is to expose your API through stored procedures, and only grant access to these, rather than direct access to the table.
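A minimal T-SQL sketch of that approach (the object and role names are hypothetical):

CREATE PROCEDURE dbo.SubmitScore
    @PlayerId int,
    @Score    int
AS
BEGIN
    -- Validation and business rules live here, out of the user's reach.
    INSERT INTO dbo.HighScores (PlayerId, Score, SubmittedAt)
    VALUES (@PlayerId, @Score, SYSUTCDATETIME());
END;
GO

-- Grant users the procedure only, never the table itself.
DENY SELECT, INSERT, UPDATE, DELETE ON dbo.HighScores TO GameUsers;
GRANT EXECUTE ON dbo.SubmitScore TO GameUsers;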
Having controlled access to a limited API may not be enough. Consider, for example, a table that stores High Scores in a game. It doesn't matter that it can only be accessed through a ClaimHighScore API, if the user can enter arbitrary values. The solution for this in games is usually complicated. The only approach I've heard of that works is to define the API in terms of a seed value that gave the initial game state, and then a set of inputs with timestamps. The server then has to essentially simulate the game to verify the score.
Users should not have unconstrained write access to tables. Better would be to create sprocs for common CRUD operations. This would let you control which fields they can modify, and if you insist you could update a CRC() checksum or other validation.
This would be a big project, so it may not be practical right now - but it's how things should be done.
Although your question is about malicious changes to a database, using a MOD 11 check digit can also find inaccurate or misplaced values.
The following MySQL statement illustrates this:
SELECT id, col1, col2, col3, col4, checknum,
       9 - MOD((col1*5 + col2*4 + col3*3 + col4*2), 9) AS Test
FROM `modtest`
HAVING checknum = Test;

SQL server string manipulation in a view... Or in XSLT

I have been passed a piece of work that I can do either in my application or perhaps in SQL:
I have to get a date out of a string that may look like this:
1234567-DSP-01/01-VER-01/01
or like this:
1234567-VER-01/01-DSP-01/01
but may look like this:
00 12345 DISCH 01/01-VER-01/01 XXX X XXXXX
Yay. If it is a "DSP", then I want that date; if a "DISCH", then that date.
I am pulling the data out in a SQL Server view and would be happy to have the view transform the data for me. My application could do it, but that would add processor time. I could also see whether the data could be manipulated before it is entered into the DB, I suppose.
Thank you for your time.
An option would be to check for the presence of DSP or DISCH then substring out the date as necessary.
For example (I don't have SQL Server today so I can't verify the syntax, sorry):
select
    date = case
               when charindex('DSP', date_attribute) > 0
                   then substring(date_attribute, beg, len)
               when charindex('DISCH', date_attribute) > 0
                   then substring(date_attribute, beg, len)
               else 'unknown'
           end
from myTable
(beg and len are placeholders to fill in; note that SUBSTRING takes a start position and a length rather than an end position.)
don't store multiple items in the same column!
store the date in its own column when inserting the row!
add a new nullable column for the date
write an update that pulls the date out and sets the new column
alter the column to be not nullable
fix your save routine to pull the date out and insert it for you (see the sketch below)
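A sketch of those steps in T-SQL, reusing the CHARINDEX/SUBSTRING idea from above (the column names and offsets are assumptions):

-- 1. Add a new nullable column for the date.
ALTER TABLE myTable ADD extracted_date varchar(5) NULL;

-- 2. Backfill it by pulling the date out of the string.
UPDATE myTable
SET extracted_date = CASE
                         WHEN CHARINDEX('DSP', date_attribute) > 0
                             THEN SUBSTRING(date_attribute, CHARINDEX('DSP', date_attribute) + 4, 5)
                         WHEN CHARINDEX('DISCH', date_attribute) > 0
                             THEN SUBSTRING(date_attribute, CHARINDEX('DISCH', date_attribute) + 6, 5)
                     END;

-- 3. Once every row has a value, the column can be made NOT NULL.
ALTER TABLE myTable ALTER COLUMN extracted_date varchar(5) NOT NULL;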
If you do it in the view, you're adding processing time on SQL Server, which in general is a more expensive resource than an app, web, or some other type of client.
I'd recommend you try to format the data when you insert it, or handle it in the application tier. Scaling an app tier horizontally is so much easier than scaling your SQL Server.
Edit
I mean that the database server's physical resources are usually more expensive than a properly designed application server's physical resources. This is because it is very easy to scale an application horizontally, while it is, in my opinion, an order of magnitude more expensive to scale a DB server horizontally, especially if you're dealing with a transactional database and need to manage merging.
I am not saying it is not possible, just that scaling a database server horizontally is a much more difficult task; hence it's more expensive. The only reason I pointed this out is that the OP raised a concern about using CPU cycles on the app server vs. the database server. Most applications I have worked with have been data-centric applications which processed through GBs of data to get a user an answer. We initially put everything on the database server because it was easier than doing it in classic ASP and VB6 at the time. Over time the DB server became more and more loaded, until scaling vertically was no longer an option.
Database servers are also designed for retrieving and joining data. You should leave the formatting of the data to the application and business rules (in general, of course).