Database schema for HTTP transactions - SQL

I have a script that makes an HTTP call to a web service, captures the response, and parses it.
For every transaction, I would like to save the following pieces of data in a relational DB:
HTTP request time
HTTP request headers
HTTP response time
HTTP response code
HTTP response headers
HTTP response content
I am having a tough time visualizing a schema for this.
My initial thoughts were to create 2 tables.
Table 'Transactions':
1. transaction id (not null, not unique)
2. timestamp (not null)
3. type (response or request) (not null)
4. headers (null)
5. content (null)
6. response code (null)
'transaction id' will be some sort of checksum derived from combining the timestamp with the header text.
The reason I compute this transaction id is to have an id that can distinguish two transactions, but that can at the same time be used to link a request with its response.
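For concreteness, a minimal sketch of that proposed single-table design in SQLite, driven from Python (the table, column, and index names are illustrative assumptions):

```python
import hashlib
import sqlite3

conn = sqlite3.connect("transactions.db")

# One row per request and one per response, linked by transaction_id.
conn.executescript("""
CREATE TABLE IF NOT EXISTS transactions (
    transaction_id TEXT NOT NULL,      -- checksum linking request and response
    timestamp      TEXT NOT NULL,
    type           TEXT NOT NULL CHECK (type IN ('request', 'response')),
    headers        TEXT,
    content        TEXT,
    response_code  INTEGER             -- NULL on request rows
);
CREATE INDEX IF NOT EXISTS idx_transactions_id ON transactions (transaction_id);
""")

def transaction_id(timestamp: str, headers: str) -> str:
    # The checksum described above: timestamp combined with the header text.
    return hashlib.sha256((timestamp + headers).encode()).hexdigest()
```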
What will this table be used for?
The script will run every 5 minutes, and log all this into the DB. Plus, every time it runs, the script will check the last time a successful transaction was made. Also, at the end of the day, the script generates a summary of all the transactions made that day and emails it.
Any ideas on how I can improve this design? What kind of normalization and/or optimization techniques should I apply to this schema? Should I split it up into two or more tables?

I decided to take a NoSQL approach to this, and it has worked. I used MongoDB. The flexibility it offers with document structure, and not having to have a fixed number of attributes, really helped.
Probably not the best solution to the problem, but I was able to optimize the performance using compound indexes.
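As a rough sketch of what that can look like with pymongo (the database, collection, and field names, and this particular compound index, are assumptions):

```python
from datetime import datetime, timezone

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
txns = client["monitoring"]["transactions"]

# Each document carries only the attributes the transaction actually had.
txns.insert_one({
    "timestamp": datetime.now(timezone.utc),
    "type": "response",
    "status_code": 200,
    "headers": {"Content-Type": "application/json"},
    "content": "...",
})

# Compound index covering the common query: transactions of a given type
# within a time range (e.g. "last successful transaction today").
txns.create_index([("type", ASCENDING), ("timestamp", ASCENDING)])
```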

Is there any time difference between successful and failed requests?

I am curious about one thing.
For example, let's say I am going to update my email on a website. My userId is 1.
My request body for a successful request is:
currentEmail: example1@gmail.com,
newEmail: example2@gmail.com,
userId: 1
I will get a successful response for the above. But the request below will fail, because there isn't any user with a userId of 2:
currentEmail: example1@gmail.com,
newEmail: example2@gmail.com,
userId: 2
Will there be any meaningful time difference between these two, given that the second request won't trigger any database write?
And let's also say I try to find a user by userId:
GET api/findUser/{userId}
If there isn't any user with that userId, will there be any time difference between the successful and failed requests?
If you're "curious" - go ahead and measure it, the real results will depend on the API implementation, DB implementation, caching on DB and ORM levels, etc.
For example if in 1st case the API just calls SQL Update statement - the execution time should be similar, however if the API builds a "user" DTO first - the attempt to amend non-existing user will be faster.
In the latter case my expectation is that attempt to get info for the user which doesn't exist will be faster, however it also "depends".
So you need to inspect the associated code execution footprint, query plans and maybe even load test the database separate from the API.
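A minimal measurement harness in Python (the endpoint, payloads, and sample size are assumptions; a serious benchmark would also add warm-up runs and look at percentiles, not just the median):

```python
import statistics
import time

import requests

URL = "https://example.com/api/updateEmail"  # hypothetical endpoint

def measure(payload: dict, n: int = 100) -> float:
    """Median round-trip time in milliseconds over n identical requests."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        requests.post(URL, json=payload, timeout=10)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

existing = measure({"currentEmail": "example1@gmail.com",
                    "newEmail": "example2@gmail.com", "userId": 1})
missing = measure({"currentEmail": "example1@gmail.com",
                   "newEmail": "example2@gmail.com", "userId": 2})
print(f"existing user: {existing:.1f} ms, missing user: {missing:.1f} ms")
```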

Create master table for status column

I have a table that represents a request sent through the frontend:
coupon_fetching_request
---------------------------------------------------------------
request_id | request_time | requested_by | request_status
Above, I tried to create a table to address the issue.
Here request_status is an integer. It can take values such as the following:
1 : request successful
2 : request failed due to incorrect input data
3 : request failed in otp verification
4 : request failed due to internal server error
The table is very simple, and the status is used to let the frontend know what happened to the sent request. I had a discussion with my team, and other developers proposed that we should have a status representation table. On the database side we are not going to need this status, but the team was saying that in the future we may need to produce simple output from the database showing the status of all requests. Per the YAGNI principle, I don't think it is a good idea.
Currently I have code that converts the returned request_status value to a descriptive value on the frontend. I tried to convince the team that I could create an enumeration at the business layer to represent the meaning of the status, or add documentation on the frontend and in Java, but failed to convince them.
The proposed table is as follows:
coupon_fetching_request_status
---------------------------------------------------
status_id | status_code | status_description
My question is: is it necessary to create a table for such a simple status in cases like this?
I tried to create a simple example to illustrate the problem. In reality, the table represents a discount coupon code request, with the status indicating whether the code was successfully fetched.
It really depends on your use case.
To start with: in your main table, you are already storing request_status as an integer, which is a good thing (if you were storing the whole description, like 'request successful', that would not be optimal).
The main question is: will you eventually need to display that data in a human-readable format?
If no, then it is probably useless to create a representation table.
If yes, then having a representation table would be a good thing, instead of adding code in the presentation layer to do the translation; let the data live in the database, and the frontend take care of presentation only.
Since this table can be easily created when needed, a pragmatic approach would be to hold on until you have a real need for the representation table.
You should create the reference table in the database. You currently have business logic on the application side, interpreting data stored in the database. This seems dangerous.
What does "dangerous" mean? It means that ad-hoc queries on the database might need to re-implement the logic. That is prone to error.
It means that if you add a reporting front end, then the reports have to re-implement the logic. That is prone to error and a maintenance nightmare.
It means that if you have another developer come along, or another module implemented, then the logic might need to be re-implemented. Red flag.
The simplest solution is to have a reference table that defines the official meanings of the codes. The application should use this table (via a join) to return the strings. The application should not be defining the meaning of codes stored in the database. YAGNI doesn't apply, because the application is already so in need of this information that it implements the logic itself.
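A minimal sketch of that reference table and join, in SQLite via Python (the status_code mnemonics and sample rows are assumptions; the table names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coupon_fetching_request_status (
    status_id          INTEGER PRIMARY KEY,
    status_code        TEXT NOT NULL,
    status_description TEXT NOT NULL
);
INSERT INTO coupon_fetching_request_status VALUES
    (1, 'SUCCESS',        'request successful'),
    (2, 'BAD_INPUT',      'request failed due to incorrect input data'),
    (3, 'OTP_FAILED',     'request failed in OTP verification'),
    (4, 'INTERNAL_ERROR', 'request failed due to internal server error');

CREATE TABLE coupon_fetching_request (
    request_id     INTEGER PRIMARY KEY,
    request_time   TEXT NOT NULL,
    requested_by   TEXT NOT NULL,
    request_status INTEGER NOT NULL
        REFERENCES coupon_fetching_request_status (status_id)
);
INSERT INTO coupon_fetching_request VALUES
    (100, '2020-01-01 12:00:00', 'alice', 1),
    (101, '2020-01-01 12:05:00', 'bob',   3);
""")

# Any consumer (the app, a report, an ad-hoc query) gets the same
# official meaning without re-implementing the mapping.
for row in conn.execute("""
    SELECT r.request_id, s.status_description
    FROM coupon_fetching_request AS r
    JOIN coupon_fetching_request_status AS s
      ON s.status_id = r.request_status
"""):
    print(row)
```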

CoreData/SQL vs NSMutableDictionary for fast retrieval of in-memory tables

I have a list where each entry has multiple "columns".
E.g., a list of queuedUpRequestsToDownloadData which can grow to 500 entries. Each request has fields like the URLString to use, the Content-Type of the request, priority, buffer requirements, whether it has started, how many retries have completed, taskID, etc.
It is possible I may have duplicate requests come in, or a new request that changes the priority of one of the existing requests in the list. So I need to be able to do things like:
Are there any requests in the list with a matching URL, and if yes, what is their priority?
Get the next priority X task that has not started?
Which is the task that corresponds to this task-ID?
Is there a speed benefit if I store the list as a SQL table/Core Data entity instead of an NSMutableDictionary?
I will also try to create some tests to benchmark it, but if someone has some benchmarks handy that would help a lot.
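For scale, the dictionary approach amounts to keeping one small index per lookup. A sketch in Python rather than Objective-C for brevity (field names follow the question; the heap-with-lazy-deletion scheme is an assumption):

```python
import heapq
from dataclasses import dataclass

@dataclass
class Request:
    task_id: int
    url: str
    priority: int
    started: bool = False
    retries: int = 0

by_url: dict[str, Request] = {}      # "is there a request with this URL?"
by_task_id: dict[int, Request] = {}  # "which task corresponds to this task-ID?"
pending: list[tuple[int, int]] = []  # (priority, task_id) heap of unstarted work

def add(req: Request) -> None:
    by_url[req.url] = req
    by_task_id[req.task_id] = req
    heapq.heappush(pending, (req.priority, req.task_id))

def next_unstarted() -> Request | None:
    # Pop until we find a task that hasn't started yet (lazy deletion).
    while pending:
        _, task_id = heapq.heappop(pending)
        req = by_task_id.get(task_id)
        if req and not req.started:
            return req
    return None
```

At around 500 entries, all three lookups stay O(1) or O(log n) in memory, so this will very likely beat a round trip through SQLite or Core Data; the planned benchmark should confirm it.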

WCF stateful service

Or, at least I think the correct term is stateful. I have a WCF service returning lots of data to me. So much data, in fact, that I'm exceeding maxReceivedMessageSize, and the program crashes.
I've come to realize that I need to split the calls to the DB. Instead of retrieving 5000 rows, I need to get rows 1-200, remember the id of row number 200, get the next 200 rows starting from that id, and so on.
Does anyone know how to do this? Is stateful (as in 'the opposite of stateless') the correct way to go? And how would I proceed? Could someone point me to an example?
You do not need a stateful service in your scenario. Stateful services are better avoided, especially when you would have to hold 5000 rows of state there.
The client should specify how much data it needs. It could be a method GetRows(index, amount), where index is the start index and amount is the number of rows to get, beginning from that start index.
The client can also ask the service about the data's state, and the service just sends it. For example, with these 5000 rows you could have a service method GetRowsState(index, amount) that returns the last-updated time for those rows; when the time you receive differs from what the client has, call GetRows again to refresh the client's data.
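A minimal sketch of both paging styles (SQLite via Python for brevity; it assumes a hypothetical rows(id, payload) table; the keyset variant matches the "remember the id of row 200" idea from the question):

```python
import sqlite3

conn = sqlite3.connect("data.db")  # assumes a table: rows(id, payload)

def get_rows(index: int, amount: int) -> list:
    # Offset paging: simple, but the database still scans past `index` rows.
    return conn.execute(
        "SELECT id, payload FROM rows ORDER BY id LIMIT ? OFFSET ?",
        (amount, index),
    ).fetchall()

def get_rows_after(last_id: int, amount: int) -> list:
    # Keyset paging: remember the id of the last row received and resume
    # from there; cost stays flat no matter how deep you page.
    return conn.execute(
        "SELECT id, payload FROM rows WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, amount),
    ).fetchall()
```

The client holds the paging state (the last id it saw), so the service itself stays stateless.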

How to get the next 1000 records the fastest way

I'm using Azure Table Storage.
Let's say I have a partition in my table with 10,000 records, and I would like to get records number 1000 to 1999. The next time, I would like to get records number 4000 to 4999, etc.
What is the fastest way of doing that?
All I have found so far are two options, neither of which I like very much:
1. Run a query which returns all 10,000 records, and filter out what I want after I get all 10,000 records.
2. Run a query which returns 1000 records at a time, and use a continuation token to get the next 1000 records.
Is it possible to get a continuation token without downloading all the corresponding records? It would be great if I could get continuation token 1, then get continuation token 2, and with CT2 get records 2000 to 2999.
Theoretically you should be able to use continuation tokens without downloading the actual data for the first 1000 records by closing the connection after the first request, and I mean closing it at the TCP level, before you read all the data. Then open a new connection and use the continuation token there. Two WebRequests will not do it, since the HTTP implementation will likely use keep-alive, which means all your data is going to be read in the background even though you don't read it in your code. Actually, you can configure your HTTP requests to not use keep-alive.
However, another option naturally exists if you know the RowKey and can search on that, but I assume you don't know which row keys will be in each batch of 1000 entities.
Last, I would ask why you have this problem in the first place, and what your access pattern is. If inserts are common and getting these records is rare, I wouldn't bother making it more efficient. If this is essentially a paging problem, I would probably get all the data on the first request and cache it (in the cloud). If inserts are rare but you need to run this query often, I would consider making the insertion of data use one partition for every 1000 entities and rebalance as needed (due to sorting) as entities are inserted.
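For reference, the standard continuation-token flow (option 2) looks roughly like this with the current azure-data-tables Python package; this is a hedged sketch, and note that it does not get around the stated problem: to reach records 4000 to 4999 you still have to walk, and download, the intervening pages.

```python
from azure.data.tables import TableClient

conn_str = "<storage-connection-string>"  # placeholder
client = TableClient.from_connection_string(conn_str, table_name="mytable")

# Each page is capped at 1000 entities (the service maximum per request).
pager = client.query_entities("PartitionKey eq 'p1'",
                              results_per_page=1000).by_page()
first_page = list(next(pager))    # entities 0-999 are downloaded here
token = pager.continuation_token  # opaque marker pointing at entity 1000

# Later, even from another process: resume directly at the token without
# re-downloading the first page.
resumed = client.query_entities("PartitionKey eq 'p1'",
                                results_per_page=1000)
second_page = list(next(resumed.by_page(continuation_token=token)))
```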