I have a streamed service. The message returned from the operation has a stream as its only body member, which streams a file from the file system. Is there a way to record, from the server, how long it takes the client to consume that file?
One way to go is to return not just the stream from the server, but a data structure that contains the file size as well.
On the client, you can use a timer and compare the bytes already read against the elapsed time and the full file size.
See this example: http://www.codeproject.com/Articles/20364/Progress-Indication-while-Uploading-Downloading-Fi
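A minimal sketch of that idea, assuming a WCF message contract (the contract, property and method names here are hypothetical): the file length travels as a message header, the stream stays the only body member, and the client times its own read loop.

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.ServiceModel;

    // Hypothetical response contract: the length rides in a header so the
    // stream can remain the single body member required for streamed transfers.
    [MessageContract]
    public class FileDownloadResponse
    {
        [MessageHeader]
        public long FileLength { get; set; }

        [MessageBodyMember]
        public Stream FileStream { get; set; }
    }

    // Client side: time the read loop and report progress against FileLength.
    class DownloadTimer
    {
        static void Consume(FileDownloadResponse response)
        {
            var watch = Stopwatch.StartNew();
            var buffer = new byte[64 * 1024];
            long totalRead = 0;
            int read;
            while ((read = response.FileStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                totalRead += read;
                Console.WriteLine($"{totalRead}/{response.FileLength} bytes after {watch.Elapsed}");
            }
            watch.Stop(); // watch.Elapsed is how long the client took to consume the file
        }
    }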
I have to implement a Kafka consumer which reads data from a topic and writes it to a file based on the account id (there will be close to a million) present in the payload. Assume there will be around 3K events per second. Is it OK to open and close a file for each message read?
Or should I consider a different approach?
I am assuming the following:
Each account id will be unique and will have its own unique file.
It is okay to have a little lag in the data in the file, i.e. the data in the file will be near real time.
The data read per event is not huge.
Solution:
Kafka Consumer reads the data and writes to a database, preferably a NoSQL db.
A separate single thread periodically reads the database for newly inserted records and groups them by accountId.
It then iterates over the accountIds; for each accountId it opens the file, writes the data in one go, closes the file and moves on to the next accountId.
Advantages:
Your consumer will not be blocked by file handling, as the two operations are decoupled.
Even if file handling fails, the data is still present in the DB to reprocess.
If your account ids repeat, then windowing is a better fit. You can aggregate all events of, say, one minute with a window, then group the events by key and process all accountIds at once.
This way, you will not have to open a file multiple times.
It is not okay to open a file for every single message; you should buffer a fixed number of messages, then write them to a file when you reach that limit, as sketched below.
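A rough sketch of that buffering idea with the Confluent.Kafka .NET client (topic name, flush threshold, output paths and the account-id extraction are assumptions, and error handling is omitted):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using Confluent.Kafka;

    // Buffer messages in memory, then flush them grouped by accountId so each
    // file is opened once per batch instead of once per message.
    class BufferedFileWriter
    {
        const int FlushThreshold = 5000;

        static void Main()
        {
            var config = new ConsumerConfig
            {
                BootstrapServers = "localhost:9092",
                GroupId = "account-file-writer",
                EnableAutoCommit = false
            };

            var buffer = new List<(string AccountId, string Payload)>();

            using var consumer = new ConsumerBuilder<string, string>(config).Build();
            consumer.Subscribe("events");

            while (true)
            {
                var result = consumer.Consume(TimeSpan.FromSeconds(1));
                if (result != null)
                    buffer.Add((ExtractAccountId(result.Message.Value), result.Message.Value));

                if (buffer.Count >= FlushThreshold)
                {
                    foreach (var group in buffer.GroupBy(e => e.AccountId))
                    {
                        // One open/append/close per accountId per batch.
                        File.AppendAllLines(Path.Combine("/data", group.Key + ".log"),
                                            group.Select(e => e.Payload));
                    }
                    consumer.Commit();   // commit only after the batch is safely on disk
                    buffer.Clear();
                }
            }
        }

        // Assumption: the payload carries the account id in some parseable form.
        static string ExtractAccountId(string payload) => payload.Split(',')[0];
    }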
You can use the HDFS Kafka Connector provided by Confluent to manage this.
If it is configured with the FieldPartitioner and pointed at the local filesystem with store.url=file:///tmp, for example, it will create one directory per unique accountId field in your topic. The flush.size configuration then determines how many messages end up in a single file.
Hadoop does not need to be installed, as the HDFS libraries are included in the Kafka Connect classpath and they support local filesystems.
You would start it like this after creating two property files
bin/connect-standalone worker.properties hdfs-local-connect.properties
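A rough outline of the hdfs-local-connect.properties file (property names follow the Confluent HDFS connector documentation as I recall them, so verify them against your connector version; topic and field names are placeholders):

    name=local-file-sink
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    topics=accounts
    hdfs.url=file:///tmp
    store.url=file:///tmp
    flush.size=1000
    partitioner.class=io.confluent.connect.hdfs.partitioner.FieldPartitioner
    partition.field.name=accountId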
I am currently enhancing a product to support web delivery of large file-content.
I would like to store it in the database, and whether I choose FILESTREAM or a plain BLOB, the following question still holds.
My WCF method will return a stream, meaning that the file stream will remain open while the content is read by the client. If the connection is slow, then the stream could be open for some time.
Question: Connection pooling assumes that connections are held exclusively and only for a short period of time. Am I correct in assuming that, given a connection pool of finite size, there could be a contention problem if slow network connections are used to download files?
Under this assumption, I really want to use FILESTREAM, and open the file directly from the file-system, rather than the SQL connection. However, if the database is remote, I will have no choice but to pull the content from the SQL connection (until I have a local cache of the file anyway).
I realise I have other options, such as buffering the stream on the server, but that has its own implications. At this time I only want to discuss the issues around returning a stream obtained from a DB connection.
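For context, this is roughly the pattern under discussion (table, column and member names are invented); it shows why the pooled connection stays checked out for the whole duration of the client's read:

    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    public class FileRepository
    {
        private readonly string connectionString;

        public FileRepository(string connectionString) => this.connectionString = connectionString;

        public Stream GetFileContent(int fileId)
        {
            var connection = new SqlConnection(connectionString);
            connection.Open();

            var command = new SqlCommand("SELECT Content FROM dbo.Files WHERE Id = @id", connection);
            command.Parameters.AddWithValue("@id", fileId);

            // SequentialAccess streams the BLOB instead of buffering it;
            // CloseConnection returns the connection to the pool only when the reader is closed.
            var reader = command.ExecuteReader(
                CommandBehavior.SequentialAccess | CommandBehavior.CloseConnection);
            reader.Read();

            // In a real service you would wrap this stream so that disposing it also closes
            // the reader -- until then, the connection is unavailable to the pool, which is
            // exactly where slow clients cause contention.
            return reader.GetStream(0);
        }
    }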
I have a service with a method that sends a file from the client to the service. I notice that when I run the client and the service on the same machine and the file I want to send is also on the local machine, everything works very fast.
However, if the client and the service are on the same machine but the file is on another computer, the transfer is very slow.
If I copy the file from one computer to the other, the copy is fast, so the problem does not seem to be the bandwidth.
I have tried both the TCP and basicHttp bindings, but the results are the same.
The problem also occurs when the client is on another computer.
Thanks.
EDIT: If I open Task Manager on the computer that runs the client, the Network tab shows network usage of about 0.5%. Why?
WCF is not the optimal method for transmitting large files, because WCF has a lot of layers whose overhead adds up and delays the transmission. Moreover, you may not have written the WCF service to continuously read chunks of bytes and write them to the response. You might be doing a File.ReadAll and then returning the whole string, which causes a large synchronous read on the server, a lot of memory allocation, and then the writing of that large string into the WCF buffer, which in turn writes into the IIS buffer, and so on.
The best way to transmit large files is by using HttpHandlers. You can just use Response.TransmitFile to transfer the file and IIS will transmit it in the most optimal way. Otherwise you can always read 8 KB at a time, write each chunk to the Response stream and call Flush after every write.
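As a rough illustration of such a handler (path resolution, content type and the lack of input validation are all simplifications):

    using System.Web;

    // Streams a file straight from IIS; TransmitFile avoids buffering the
    // whole file in managed memory.
    public class FileDownloadHandler : IHttpHandler
    {
        public bool IsReusable => true;

        public void ProcessRequest(HttpContext context)
        {
            // Placeholder lookup -- validate the requested name in real code.
            string path = context.Server.MapPath(
                "~/App_Data/" + context.Request.QueryString["file"]);

            context.Response.ContentType = "application/octet-stream";
            context.Response.TransmitFile(path);

            // Alternative: open a FileStream, read ~8 KB chunks, write each chunk to
            // context.Response.OutputStream and call context.Response.Flush() after every write.
        }
    }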
If you cannot go for HttpHandler for any weird reason, can you show me the WCF code?
Another thing: you might be expecting performance that is simply not possible when IIS is in the picture. First you should measure how long it takes IIS to transmit the file if you just host the file directly on a website and download it with WebClient.
Also, how are you downloading? Via a browser, or via client-side code? Client-side code can be suboptimal as well if you are trying to transmit the whole file in one shot and hold it in a string. For example, WebClient.DownloadString would be the worst approach.
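For that baseline measurement, something like this is enough (the URL and target path are placeholders; DownloadFile is used instead of DownloadString so the client does not hold the whole payload as one string):

    using System;
    using System.Diagnostics;
    using System.Net;

    class Baseline
    {
        static void Main()
        {
            var watch = Stopwatch.StartNew();
            using (var client = new WebClient())
            {
                // Time a plain static download served by IIS, outside of WCF.
                client.DownloadFile("http://server/files/big.bin", @"C:\temp\big.bin");
            }
            watch.Stop();
            Console.WriteLine($"IIS static download took {watch.Elapsed}");
        }
    }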
I have a machine which creates a new log file at the beginning of the day (12 am) and updates that log file whenever there are changes, until the end of the day.
How do I import the data in real time (every 30 seconds, every minute, or whenever there is a change) into my SQL Server database?
Will SQL Server 2008 be able to access the active log file? If not, would it be easier to let the machine create a new log file for every update? If so, how do I import so many log files with different names in real time? (I must be able to scale the solution up to multiple machines.)
Thanks a lot.
You can log each new line with a reversed time stamp.
Since you need to log only when the file changes, you can implement an in-memory queue which reads from the file and stores the data.
Then implement a producer-consumer model in which one thread reads the file and loads the data into the queue, and the consumer logs it to the database.
A Windows service can then keep reading from the queue and logging to SQL Server.
(Since it is a producer-consumer setup, there will not be any busy waiting when the queue is empty.)
Somehow you will also have to notify the producer thread whenever a log entry is written. This can be done through sockets or some other means if you have access to the code that does the logging.
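A minimal sketch of that producer-consumer setup (the file-reading and database-writing helpers are placeholders):

    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;

    // The producer pushes new log lines into a BlockingCollection,
    // the consumer drains it into SQL Server.
    class LogShipper
    {
        static readonly BlockingCollection<string> Queue = new BlockingCollection<string>();

        static void Main()
        {
            var producer = Task.Run(() =>
            {
                while (true)
                {
                    foreach (var line in ReadNewLogLines())
                        Queue.Add(line);
                    Thread.Sleep(1000); // placeholder: in practice, block on a file/socket notification
                }
            });

            var consumer = Task.Run(() =>
            {
                // GetConsumingEnumerable blocks while the queue is empty -- no busy waiting.
                foreach (var line in Queue.GetConsumingEnumerable())
                    WriteToDatabase(line);
            });

            Task.WaitAll(producer, consumer);
        }

        static string[] ReadNewLogLines() => Array.Empty<string>();       // placeholder
        static void WriteToDatabase(string line) { /* placeholder INSERT */ }
    }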
If you have no control over the application producing the file then you have little option but to poll the file. Write an application that regularly polls the file and writes the deltas to the database. The application will need to record a high water mark that it has last read to.
Another wrinkle is that if the application does not close the file between writes then the last accessed time stamp might not be updated, so checking the age of the file may not be reliable. In this case you need to implement something like this process:
Open the log file
Seek to your last recorded EOF position
Try reading
If successful, process the new data until you get to the new EOF.
Update your persistent EOF position
Close the file
You will need to make sure that the number of bytes read aligns with your file seek position. If the log file is Unicode then it may not have a 1:1 mapping between bytes and characters. You may need to read chunks of the file in binary mode and do the translation to characters from the buffer.
Once you have the log file entries parsed, you can just insert the data, or use SqlBulkCopy for larger data volumes, as sketched below.
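Put together, one pass of that poll-and-catch-up loop might look roughly like this (paths, table name and parsing are placeholders, and the saved offset is reduced to a side file):

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;
    using System.Text;

    // Remember the last byte offset, read anything new, parse it, bulk-insert it.
    class LogTailer
    {
        const string LogPath = @"C:\logs\machine.log";
        const string OffsetPath = @"C:\logs\machine.log.offset";
        const string ConnectionString = "Server=.;Database=Logs;Integrated Security=true";

        static void Main()
        {
            long offset = File.Exists(OffsetPath) ? long.Parse(File.ReadAllText(OffsetPath)) : 0;

            // Open in binary mode, share with the writer, seek to the high-water mark.
            using var stream = new FileStream(LogPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
            stream.Seek(offset, SeekOrigin.Begin);

            using var reader = new StreamReader(stream, Encoding.UTF8);
            var table = new DataTable();
            table.Columns.Add("LoggedAt", typeof(DateTime));
            table.Columns.Add("Line", typeof(string));

            string line;
            while ((line = reader.ReadLine()) != null)
                table.Rows.Add(DateTime.UtcNow, line);   // placeholder parsing

            if (table.Rows.Count > 0)
            {
                using var bulk = new SqlBulkCopy(ConnectionString) { DestinationTableName = "dbo.LogEntries" };
                bulk.WriteToServer(table);
            }

            // Persist the new EOF position; note this is a byte offset, not a character count.
            File.WriteAllText(OffsetPath, stream.Position.ToString());
        }
    }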
If you can relax your latency constraints and the log file is small enough then you could possibly just implement a process that copies the log file to a staging area and reloads the whole thing periodically.
How about an SSIS package being called by an SQL Server Scheduled Job?
Can I somehow subscribe for notifications about Azure's blob object changes?
My purpose is to delegate file uploads to the client using SAS and later (after the upload is complete) update the database. It looks like I need to continuously check the blob's state, but that is quite a resource-consuming process.
You can't be notified by the Blob Storage about a change made to a blob, but as you point out, you can monitor it, requesting the ETag on a scheduled basis to see if it's done.
That being said, the cost to monitor a blob (or even a whole container) can be close to negligible if correctly implemented. Pinging the Blob Storage once per second costs you roughly $2.5 / month. Then, by using some heuristic you can probably lower this cost to $0.25 (one check per 10s on average). At this point, it's not really interesting to try to optimize more.
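A minimal polling sketch with the Azure.Storage.Blobs SDK (the connection string, container and blob names are placeholders):

    using System;
    using System.Threading;
    using Azure;
    using Azure.Storage.Blobs;

    // Fetch the ETag on a schedule and react when it changes,
    // i.e. when the client-side upload has created or modified the blob.
    class BlobWatcher
    {
        static void Main()
        {
            var blob = new BlobClient("<connection-string>", "uploads", "myfile.dat");
            ETag? lastSeen = null;

            while (true)
            {
                if (blob.Exists().Value)
                {
                    var properties = blob.GetProperties().Value;   // one storage request per poll
                    if (lastSeen != null && properties.ETag != lastSeen)
                        Console.WriteLine("Blob changed -- update the database here.");
                    lastSeen = properties.ETag;
                }
                Thread.Sleep(TimeSpan.FromSeconds(10));            // ~1 check per 10 s, as in the cost estimate above
            }
        }
    }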
You can now do this using Azure Functions:
Create a blob trigger by specifying your storage account connection string and your container/{name}.
In outputs, select the place where you want your notification to go.
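In C#, such a function looks roughly like this (the container name and connection setting name are assumptions):

    using System.IO;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Extensions.Logging;

    // Fires when a blob lands in the container -- the point at which
    // you would update your database.
    public static class BlobUploadedFunction
    {
        [FunctionName("BlobUploaded")]
        public static void Run(
            [BlobTrigger("uploads/{name}", Connection = "StorageConnection")] Stream blob,
            string name,
            ILogger log)
        {
            log.LogInformation($"Blob {name} uploaded ({blob.Length} bytes) -- update the database here.");
        }
    }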
Another option to consider is to have the client notify you when it's done uploading.
I created a file change monitor for monitoring blobs - full details at http://ben.onfabrik.com/posts/monitoring-files-in-azure-blob-storage