Is it possible to use streams in Restler 3 (PHP)?

We are using Luracast Restler 3 for our project. Some of our API calls return very large amounts of data; a response might be a .CSV file of 100,000 lines or more, and of course it takes even more memory once rendered as JSON or XML.
We run out of memory in Restler's formatters if the result is returned as a single array.
Is it possible to output these responses as streams instead, or is there another suggested solution? Paging is not possible due to limitations of our clients' software.

Streaming is currently not supported by Restler; we are planning to add streaming support in a future version of Restler.

Related

How does file reading (streaming) really work in Mule?

I am trying to understand how streaming works with respect to Mule 4.4.
I am reading a large file and am using 'Repeatable file store stream' as the streaming strategy, with
'In memory size' = 128 KB.
The file is 24 MB, and for the sake of argument let's say 1,000 records are equivalent to 128 KB,
so about 1,000 records will be kept in memory and the rest will be written to the file store by Mule.
Here's the flow:
At stage #1 we read the file.
At stage #2 we log the payload, so I am assuming that initially 128 KB worth of data is logged, and internally Mule moves the rest of the data from the file store into memory so that it can also be written to the log.
Question: does the heap memory grow from 128 KB to 24 MB here?
I am assuming it does not, but I need confirmation.
At stage #3 we use a transform script to create a JSON payload.
So what happens here:
Is the whole JSON payload (say 24 MB) now in memory?
What has happened to the stream?
I am really struggling to understand how streaming is beneficial if the data ends up in memory during the transformation.
Thanks
It really depends on how each component works, but logging usually means loading the full payload into memory. Having said that, logging 'big' payloads is considered bad practice and you should avoid it in the first place; even logging a few KB is really not a good idea. Logs are not intended to be used that way. Logging, like any computational operation, has a cost in processing and resource usage. I have seen people cause out-of-memory errors or performance issues several times because of excessive logging.
The case of the Transform component is different. In some cases it is able to benefit from streaming, depending on the format used and the script. Sequential access to the records is required for streaming to work: if you use indexed access into the 24 MB payload (for example payload[14203]), it will probably load the entire payload into memory. Also, referencing the payload more than once in a step may fail, because streamed records are consumed as they are read, so it is not possible to use them twice.
Streaming for DataWeave needs to be enabled (it is not the default) by using the property streaming=true.
You can find more details in the documentation for DataWeave Streaming and Mule Streaming.
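This is not Mule or DataWeave itself, but a minimal plain-Java sketch of the same principle using Jackson's streaming generator: records are read, transformed and written one at a time, so the heap only ever holds the current record plus small buffers rather than the whole 24 MB. The file names and the id/value field layout are made up for illustration.

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;

public class StreamingTransform {
    public static void main(String[] args) throws Exception {
        JsonFactory factory = new JsonFactory();
        // Hypothetical input/output files; the CSV layout (id,value) is assumed.
        try (BufferedReader in = new BufferedReader(new FileReader("records.csv"));
             JsonGenerator out = factory.createGenerator(new FileWriter("records.json"))) {
            out.writeStartArray();
            String line;
            while ((line = in.readLine()) != null) {      // only one record in memory at a time
                String[] fields = line.split(",");
                out.writeStartObject();
                out.writeStringField("id", fields[0]);
                out.writeStringField("value", fields[1]);
                out.writeEndObject();                     // record is written out, then discarded
            }
            out.writeEndArray();
        }
    }
}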

Event Hub, Stream Analytics and Data Lake pipeline questions

After reading this article I decided to take a shot at building a data ingestion pipeline. Everything works well: I was able to send data to Event Hub, which is ingested by Stream Analytics and sent to Data Lake. But I have a few questions about some things that seem odd to me. I would appreciate it if someone more experienced than me could answer them.
Here is the SQL inside my Stream Analytics job:
SELECT *
INTO [my-data-lake]
FROM [my-event-hub]
Now, for the questions:
Should I store 100% of my data in a single file, try to split it into multiple files, or aim for one file per object? Stream Analytics is storing all the data inside a single file, as one huge JSON array. I tried setting {date} and {time} as variables in the path, but it is still one huge file every day.
Is there a way to force Stream Analytics to write every entry from Event Hub to its own file? Or maybe to limit the size of each file?
Is there a way to set the name of the file from Stream Analytics? If so, is there a way to overwrite a file if the name already exists?
I also noticed the file is available as soon as it is created and is written in real time, so I can see data truncation in it when I download/display the file. Also, until it is finished it is not valid JSON. What happens if I query a Data Lake file (through U-SQL) while it is still being written? Is it smart enough to ignore the last entry, or to understand it as an incomplete array of objects?
Is it better to store the JSON data as an array, or with each object on a new line?
Maybe I am taking a bad approach to my problem, but I have a huge dataset in Google Datastore (Google's NoSQL solution). I only have access to the Datastore, with an account with limited permissions, and I need to get this data into a Data Lake. So I made an application that streams the data from Datastore to Event Hub, which is ingested by Stream Analytics, which in turn writes the files into the Data Lake. It is my first time using these three technologies, but this seems to be the best solution, and my go-to alternative to ETL chaos.
I am sorry for asking so many questions. I hope someone can help me out.
Thanks in advance.
I am only going to answer the file aspects:
It is normally better to produce larger files for later processing than many very small files. Given that you are using JSON, I would suggest limiting the files to a size that your JSON extractor will be able to manage without running out of memory (if you decide to use a DOM-based parser).
I will leave that to an ASA expert.
ditto.
The answer here depends on how ASA writes the JSON. Clients can append to files, and U-SQL should only see the data in a file that has been added in sealed extents. So if ASA makes sure that extent boundaries align with the end of a JSON document, you should only ever see valid JSON documents; if it does not, your query may fail.
That depends on how you plan to process the data. Note that if you write it as part of an array, you will have to wait until the array is "closed", or your JSON parser will most likely fail. For parallelization and more flexibility, I would probably go with one JSON document per line.
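A minimal sketch of why one document per line is convenient, assuming a hypothetical events.ndjson file with a deviceId field (both names made up): each line is a complete JSON document, so it can be parsed independently, processed in parallel, or skipped if the last line is still being written. Jackson is used here purely for illustration.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class NdjsonReader {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        try (Stream<String> lines = Files.lines(Paths.get("events.ndjson"))) {
            lines.forEach(line -> {
                try {
                    JsonNode event = mapper.readTree(line);   // every line stands on its own
                    System.out.println(event.get("deviceId"));
                } catch (Exception e) {
                    // an incomplete trailing line (still being written) can simply be skipped
                }
            });
        }
    }
}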

Maximum amount of data that can be stored in dojo dstore

I want to store all the data from my DB in a dstore, so:
what is the maximum number of items, or the maximum size of data, that can be stored in a Dojo dstore?
This is a very vague question, since you don't even mention which type of store specifically. With in-memory stores it's usually advisable to keep totals down to a couple of thousand items, though modern browsers can certainly scale higher.
However, the entire point of server-based stores like Request and Rest is that not all items need to be stored on the client side at once. If you have hundreds of thousands of data items and the server providing the data supports filtering/sorting/paging arguments in some way, whether RESTful in the way that Request and Rest expect or otherwise, a server-based store (i.e., one that queries the server for each fetch or fetchRange call, passing arguments based on any preceding sort and filter calls) is a good idea.
You can get an idea of the kinds of server interactions that the Rest store expects here (although this documentation was written for implementations of older store APIs, dstore/Request and dstore/Rest still expect the same kind of behavior, but are slightly more configurable).
You can also see an example of configuring and using dstore/Rest with one particular server-side framework, the Django Rest Framework, here.
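To make the "server does the paging" idea concrete, here is a rough, self-contained Java sketch of a backend that returns one page of items plus a total count. The /items path and the start/count query parameters are assumptions for illustration only; dstore/Request and dstore/Rest's actual range and sort mechanisms (query parameters or HTTP Range headers) are configurable and described in the documentation linked above.

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PagedItemsServer {
    public static void main(String[] args) throws Exception {
        // Fake data source standing in for the real database behind the store.
        List<String> all = IntStream.range(0, 100_000)
                .mapToObj(i -> "\"item-" + i + "\"")
                .collect(Collectors.toList());

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/items", exchange -> {
            // Assumed paging parameters; only the requested slice is ever sent to the client.
            String query = exchange.getRequestURI().getQuery();
            int start = Math.max(0, param(query, "start", 0));
            int count = Math.max(0, param(query, "count", 25));
            List<String> page = all.subList(Math.min(start, all.size()),
                    Math.min(start + count, all.size()));
            String body = "[" + String.join(",", page) + "]";
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.getResponseHeaders().add("Content-Range",
                    "items " + start + "-" + (start + page.size() - 1) + "/" + all.size());
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(bytes);
            }
        });
        server.start();
    }

    private static int param(String query, String name, int fallback) {
        if (query == null) return fallback;
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals(name)) return Integer.parseInt(kv[1]);
        }
        return fallback;
    }
}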

Best solution for storing / accessing large Integer arrays for a web application

I have a Web Application (Java backend) that processes a large amount of raw data that is uploaded from a hardware platform containing a number of sensors.
Currently the raw data is uploaded, decompressed, and stored as a 'text' field in a PostgreSQL database, which allows the users to log in and generate various graphs/charts of the data (using a JS charting library client-side).
Example string...
[45,23,45,32,56,75,34....]
The arrays will typically contain ~300,000 values, but this could be up to 1,000,000 depending on how long the sensors are recording, so the size of the string being stored could be a few hundred kilobytes.
This currently works fine, as there are only ~200 uploads per day, but as I look at the scalability of the application and the ability to back up the data, I am looking at alternatives for storing it.
DynamoDB looked like a great option, as I could carry on storing the upload details in my SQL table and just save a URL endpoint to be called to retrieve the arrays... but then I noticed the item size is limited to 64 KB.
As I am sure there are a million and one ways to do this, I would like to put it out to the SO community to hear what others would recommend, either as web services or stored locally, considering performance, scalability, maintainability, etc.
Thanks in advance!
UPDATE:
Just to clarify: the data shown above is just the 'Y' values. As the data is time-sampled, the X values are taken from the position in the array, so I don't think storing the values as tuples would have any benefit.
If you are looking to store such strings, you probably want to use S3 (one object containing the array string); in this case you will have "backup" out of the box by enabling bucket versioning.
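A minimal sketch of that approach with the AWS SDK for Java v2; the bucket and key names and the sample payload are made up for illustration, and versioning would be enabled on the bucket itself.

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class SensorArrayStore {
    public static void main(String[] args) {
        // Hypothetical bucket and key; the key would be what you save in the SQL upload record.
        String bucket = "sensor-uploads";
        String key = "upload-2024-01-15-device-42.json";
        String arrayJson = "[45,23,45,32,56,75,34]"; // the decompressed Y-value array

        try (S3Client s3 = S3Client.create()) {
            s3.putObject(PutObjectRequest.builder().bucket(bucket).key(key).build(),
                    RequestBody.fromString(arrayJson));

            // Later, the application fetches the array back by key for charting.
            String stored = s3.getObjectAsBytes(
                    GetObjectRequest.builder().bucket(bucket).key(key).build()).asUtf8String();
            System.out.println(stored.length() + " characters retrieved");
        }
    }
}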
You can try a combination of Couchbase and Elasticsearch. Couchbase is a very fast document-oriented NoSQL database: several thousand insert operations per second is normal for CB, the item size is limited to 20 MB, and "get" performance is in the tens of thousands of operations per second. There is one disadvantage: you can only query data by id (there are "views", but I think they would be too difficult to adapt to the plotting). Elasticsearch can compensate for this deficiency, since it can perform any query very fast. Data in both Couchbase and Elasticsearch is stored as JSON documents.
I have just come across Google Cloud Datastore, which allows me to store single String items of up to 1 MB (un-indexed); it seems like a good alternative to Dynamo.
Maybe you should use Redis or SSDB; both are designed to store large lists (arrays) of data. The difference between these two databases is that Redis is memory-only (with disk for backup), while SSDB is disk-based and uses memory as a cache.
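For the Redis option, a small sketch with the Jedis client: each upload becomes one Redis list, appended in sample order and read back whole or as a slice for charting. The key name and host are assumptions for illustration.

import java.util.List;
import redis.clients.jedis.Jedis;

public class UploadListStore {
    public static void main(String[] args) {
        // Hypothetical key name: one Redis list per upload.
        String key = "upload:42:y-values";
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.rpush(key, "45", "23", "45", "32", "56", "75", "34"); // append samples in order
            List<String> all = jedis.lrange(key, 0, -1);                // fetch the whole series
            List<String> window = jedis.lrange(key, 0, 999);            // or just a slice for a chart
            System.out.println(all.size() + " samples, first window of " + window.size());
        }
    }
}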

How to transfer large files using WCF

I need to transfer large Excel files over a WCF service. Our project requires generating some reports for the clients, and we use Excel to generate the reports.
Right now the project uses net.tcp binding, but we are considering switching over to http binding.
I read another post on SO about transferring large images, and the answers all suggested using streaming. However, I'm wondering what the best approach would be considering it's an Excel file. The file sizes can sometimes approach ~10 MB.
Yes, streaming will work over TCP or HTTP, and you should use it. Streaming removes the need to hold large in-memory buffers containing the entire file at once, which will increase the scalability of your application.
As JP says, streaming is a good option, and I would generally recommend going for that. The book "Essential Windows Communication Foundation" suggests that if you need reliable messaging, digital signatures, or resuming after failure, another option is to manually chunk the data into smaller messages and then reconstitute them on the server.
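This is not WCF-specific code, just a minimal Java illustration of the chunk-and-reassemble idea from that suggestion: the file is read in fixed-size pieces that could each travel as an individual message, and the receiver appends them back in sequence. The 64 KB chunk size and file names are arbitrary.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedTransfer {
    private static final int CHUNK_SIZE = 64 * 1024; // arbitrary per-message size

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("report.xlsx");
             OutputStream out = new FileOutputStream("report-reassembled.xlsx")) {
            byte[] chunk = new byte[CHUNK_SIZE];
            int read;
            int sequence = 0;
            while ((read = in.read(chunk)) != -1) {
                // In a real protocol each piece would be sent as its own message with a
                // sequence number; here the "receiver" simply appends the pieces in order.
                out.write(chunk, 0, read);
                sequence++;
            }
            System.out.println("Transferred " + sequence + " chunks");
        }
    }
}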