WCF stateful service - wcf

Or - at least I think the correct term is stateful. I got a wcf service, listing lots of data back to me. So much data in fact, that I'm exceeding maxrecievedmessagesize - and the program crashes.
I've come to realize that I need to split the calls to the db. Instead of retrieving 5000 rows, I need to get row 1 - 200, remember the id of row number 200, get the next 200 rows from the id of row number 200 and so on.
Does anyone know how to do this? Is stateful (as in 'opposite of stateless') the correct way to go? And how would I proceed...? Could someone point me to an example?

You do not need stateful service in your scenario. Stateful services is better to avoid, especially when you want to save 5000 rows there.
Client should specified how much data it needs. So it could be method GetRows(index, amount), where index is start index for getting rows and amount of rows to get beginning from starting index.
Also client should ask about data state from service, and service just sends data state. For example when you have these 5000 rows, you could have method on service GetRowsState(index, amount) and the same story it's just saying you the last updated time for your rows, when time you have received is higher or other then on client, then once more to GetRows from server to update client data state.

Related

Yammer API - Paging

I am trying to gather a range of messages through the rest API, and am aware that you can only retrieve 20 results at a time. I have tried incrementing a page variable, but this has no affect, and I am just getting the same results each time no matter the page number (https://www.yammer.com/api/v1/messages.json?page=6). I have proceeded to use the newer_than and older_than parameters to page through the results, and it works to some extent, but it appears to be excluding records. I am using the following approach below:
Since just setting a newer_than only results in the 20 most recent records as long as they are newer than the id that is sent in the newer_than parameter, I am also setting a dynamic older_than parameter.
Send request with only a newer than parameter. This returns the 20 most recent records. (eg. ww.yammer.com/api/v1/messages.json?newer_than=235560157)
Extract the ID of the 20th id in the JSON, and using this to populate the older_than parameter. The result is 20 different records. (eg.ww.yammer.com/api/v1/messages.json?newer_than=235560157&older_than=405598096)
Repeat step 2 until no results are returned since the newer_than and older_than parameters will eventually overlap.
The problem is that the set of records that is returned with this method is less than the number of records that is returned for messages from the data export API. I am working under the assumption that newer message IDs are always generated with a value greater than any older messages.
Could I possibly be misunderstanding how paging through results is supposed to be implemented with the REST API?
Any help would be much appreciated!
Thanks in advance!
First of all, the page parameter works only for the search API.
Secondly, the way you are trying to fetch messages will not return any comments on the messages or will return top 2 comments on any message based on the "extended" parameter. By default it returns 2 comments on every message. To get all the comments on the message you will have to get it individually message wise.
That must be causing the difference in the number of messages in the two methods mentioned.
I agree with Farhann - The rest API endpoint returns only top two comments for any message by default. To get all the comments for a post, you have to make a separate request.
With the use of the Data Export API, all the comments along with the message (public and private) are also exported which increases the count of the number of the messages. While, the API call returns only recent 2 comments on any message by default.
The data export includes private messages. Private messages will not be returned by that API call.
Check if the messages you are not seeing are private messages.

database schema for http transactions

I have a script that makes a http call to a webservice, captures the response and parses it.
For every transaction, I would like to save the following pieces of data in a relational DB.
HTTP request time
HTTP request headers
HTTP response time
HTTP response code
HTTP response headers
HTTP response content
I am having a tough time visualizing a schema for this.
My initial thoughts were to create 2 tables.
Table 'Transactions':
1. transaction id (not null, not unique)
2. timestamp (not null)
3. type (response or request) (not null)
3. headers (null)
4. content (null)
5. response code (null)
'transaction id' will be some sort of checksum derived from combining the timestamp with the header text.
The reason why i compute this transaction id is to have a unique id that can distinguish 2 transactions, but at the same time used to link a request with a response.
What will this table be used for?
The script will run every 5 minutes, and log all this into the DB. Plus, every time it runs, the script will check the last time a successful transaction was made. Also, at the end of the day, the script generates a summary of all the transactions made that day and emails it.
Any ideas of how i can improve on this design? What kinda normalization and/or optimization techniques i should apply to this schema? Should i split this up into 2 or more tables?
I decided to use a NoSQL approach to this, and it has worked. Used MongDB. The flexibility it offers with document structure and not having to have a fixed number of attributes really helped.
Probably not the best solution to the problem, but i was able to optimize the performance using compound indexes.

How much time is going to take a process - by WCF Service

I’m here with another question this time.
I have an application which builds to move data from one database to another. It also deals with validation & comparison between the databases. When we start moving the data from source to destination it takes a while as it always deals with thousands of records. We use WCF service and SQL server # server side and WPF # client side to handle this.
Now I have a requirement to notify user with the time it is going to take based on the source database no: records (eventually that is what im going to create in the destination database) right before user starts this movement process.
Now my real question, which is the best way we can do this and get an estimated time out of it?
Thanks and appreciated your helps.
If your estimates are going to be updated during the upload process, you can take the time already spent, delete on number of processed records, and multiply by number of remaining records. This will give you an updating average remaining time:
TimeSpan spent = DateTime.Now - startTime;
TimeSpan remaining = (spent / numberOfProcessedRecords) * numberOfRemainingRecords;

how to get next 1000 records the fastest way

I'm using Azure Table Storage.
Let's say i have a Partition in my Table with 10,000 records, and I would like to get records number 1000 to 1999. And next time i would like to get records number 4000 to 4999 etc.
What is the fastest way of doing that?
All I can find till now are two options, which I don't like very much:
1. run a query which returns all 10,000 records, and filter out what I want when I get all 10,000 records.
2. Run a query whichs returns 1000 records at a time, and use a continuation token to get the next 1000 records.
Is it possible to get a continuation token without downloading all corresponding records? It would be great if i can get Continuation Token 1, than get Continuation token 2, and with CT2 get records 2000 to 2999.
Theoretically you should be able to use continuation tokens without downloading the actual data for the first 1000 recors by closing the connection you have after the first request. And I mean closing it at TCP level. And before you read all data. Then open a new connection and use continuation token there. Two WebRequests will not do it since the HTTP implementation will likely use keep alive wchich means all your data is going to be read in the background even though you don't read it in your code. Actually you can configure your HTTP requests to not use keep alive.
However, another way is naturally if you know the RowKey and can search on that but I assume you don't know which row keys will be in each 1000 entity batch.
Last I would ask why you have this problem in the first place. And what your access pattern is. If inserts are common and getting these records is rare I wouldn't bother making it more efficient. if this is like a paging problem i would probably get all data on the first request and cache it (in the cloud). if inserts are rare but you need to run this query often I would consider making the insertion of data have one partion for every 1000 entities and rebalance as needed (due to sorting) as entities are inserted.

SQL connection lifetime

I am working on an API to query a database server (Oracle in my case) to retrieve massive amount of data. (This is actually a layer on top of JDBC.)
The API I created tries to limit as much as possible the loading of every queried information into memory. I mean that I prefer to iterate over the result set and process the returned row one by one instead of loading every rows in memory and process them later.
But I am wondering if this is the best practice since it has some issues:
The result set is kept during the whole processing, if the processing is as long as retrieving the data, it means that my result set will be open twice as long
Doing another query inside my processing loop means opening another result set while I am already using one, it may not be a good idea to start opening too much result sets simultaneously.
On the other side, it has some advantages:
I never have more than one row of data in memory for a result set, since my queries tend to return around 100k rows, it may be worth it.
Since my framework is heavily based on functionnal programming concepts, I never rely on multiple rows being in memory at the same time.
Starting the processing on the first rows returned while the database engine is still returning other rows is a great performance boost.
In response to Gandalf, I add some more information:
I will always have to process the entire result set
I am not doing any aggregation of rows
I am integrating with a master data management application and retrieving data in order to either validate them or export them using many different formats (to the ERP, to the web platform, etc.)
There is no universal answer. I personally implemented both solutions dozens of times.
This depends of what matters more for you: memory or network traffic.
If you have a fast network connection (LAN) and a poor client machine, then fetch data row by row from the server.
If you work over the Internet, then batch fetching will help you.
You can set prefetch count or your database layer properties and find a golden mean.
Rule of thumb is: fetch everything that you can keep without noticing it
if you need more detailed analysis, there are six factors involved:
Row generation responce time / rate(how soon Oracle generates first row / last row)
Row delivery response time / rate (how soon can you get first row / last row)
Row processing response time / rate (how soon can you show first row / last row)
One of them will be the bottleneck.
As a rule, rate and responce time are antagonists.
With prefetching, you can control the row delivery response time and row delivery rate: higher prefetch count will increase rate but decrease response time, lower prefetch count will do the opposite.
Choose which one is more important to you.
You can also do the following: create separate threads for fetching and processing.
Select just ehough rows to keep user amused in low prefetch mode (with high response time), then switch into high prefetch mode.
It will fetch the rows in the background and you can process them in the background too, while the user browses over the first rows.