Is it possible to somehow store the metadata fetched from the database? I need to run this process many times and it takes too long. It would be great to be able to load this metadata from a local object and refresh it only when needed.
This is done with caching; please see this docs page: http://javalite.io/caching
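For example, this is roughly what that page describes: annotate a model with @Cached and configure a cache manager in activejdbc.properties as shown in the docs. The model below is just an illustration, not code from the question.

import org.javalite.activejdbc.LazyList;
import org.javalite.activejdbc.Model;
import org.javalite.activejdbc.annotations.Cached;

// Query results for this model are cached after the first hit and purged
// automatically whenever the underlying table is modified.
@Cached
public class Book extends Model {

    public static LazyList<Book> byAuthor(String author) {
        // The first call goes to the database; repeated identical calls are served from the cache.
        return Book.where("author = ?", author);
    }
}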
I have a .NET Core 2.0 API. I want to store the total number of requests and also the number of requests made per user. I can't store this information in the database, because the API connects to many different databases dynamically depending on what the user needs. Also, I can't just use logging, because I want to retrieve these numbers through a request to the API.
The only thing I can think of is using a custom JSON file and continually updating it from middleware. But this seems cumbersome, and I feel like there's got to be an easier way to store small amounts of persistent data. Maybe there's a NuGet package someone can recommend?
I assume you could use an in-memory cache, depending on how long you want this data to be stored.
Otherwise, as suggested, your only choices are a file or a database.
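For illustration, here is a minimal, language-agnostic sketch of the counter-plus-file idea from the question (shown in Java; every name here is hypothetical): count requests in memory and periodically flush the totals to a small file, writing through a temp file so readers never see a half-written state.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.StringJoiner;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical in-process counter store, flushed to a small JSON file
// from request middleware or on a timer.
public class RequestCounters {
    private final Map<String, LongAdder> perUser = new ConcurrentHashMap<>();
    private final Path file;

    public RequestCounters(Path file) {
        this.file = file;
    }

    public void record(String userId) {
        perUser.computeIfAbsent(userId, k -> new LongAdder()).increment();
    }

    // Write to a temp file and move it over the old one so readers never see a partial write.
    public synchronized void flush() throws IOException {
        StringJoiner json = new StringJoiner(",\n  ", "{\n  ", "\n}\n");
        perUser.forEach((user, count) -> json.add("\"" + user + "\": " + count.sum()));
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.writeString(tmp, json.toString());
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}

The same pattern translates directly to ASP.NET Core middleware plus a hosted background service that does the periodic flush.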
After reading this article I decided to take a shot at building a data ingestion pipeline. Everything works well. I was able to send data to Event Hub, which is ingested by Stream Analytics and sent to Data Lake. But I have a few questions about some things that seem odd to me. I would appreciate it if someone more experienced than me could answer them.
Here is the SQL inside my Stream Analytics job:
SELECT
*
INTO
[my-data-lake]
FROM
[my-event-hub]
Now, for the questions:
Should I store 100% of my data in a single file, try to split it into multiple files, or try to achieve one file per object? Stream Analytics is storing all the data inside a single file, as a huge JSON array. I tried setting {date} and {time} as variables, but it is still one huge file every day.
Is there a way to force Stream Analytics to write every entry from Event Hub to its own file? Or maybe to limit the size of the files?
Is there a way to set the name of the file from Stream Analytics? If so, is there a way to overwrite a file if the name already exists?
I also noticed the file is available as soon as it is created and is written in real time, to the point that I can see truncated data when I download/display the file. Also, before it finishes, it is not valid JSON. What happens if I query a Data Lake file (through U-SQL) while it is being written? Is it smart enough to ignore the last entry, or does it treat it as an incomplete array of objects?
Is it better to store the JSON data as an array or with each object on a new line?
Maybe I am taking a bad approach to my problem, but I have a huge dataset in Google Datastore (Google's NoSQL solution). I only have access to the Datastore, with an account with limited permissions. I need to store this data in a Data Lake, so I made an application that streams the data from Datastore to Event Hub, which is ingested by Stream Analytics, which writes the files into the Data Lake. It is my first time using these three technologies, but this seems to be the best solution. It is my go-to alternative to ETL chaos.
I am sorry for asking so many questions. I hope someone can help me out.
Thanks in advance.
I am only going to answer the file aspect:
It is normally better to produce larger files for later processing than many very small files. Given that you are using JSON, I would suggest limiting the files to a size that your JSON extractor will be able to manage without running out of memory (if you decide to use a DOM-based parser).
I will leave that to an ASA expert.
ditto.
The answer here depends on how ASA writes the JSON. Clients can append to files, and U-SQL should only see the data in a file that has been added in sealed extents. So if ASA makes sure that extents align with the end of a JSON document, you should only be seeing valid JSON documents. If it does not, then you may fail.
That depends on how you plan to process the data. Note that if you write it as part of an array, you will have to wait until the array is "closed", or your JSON parser will most likely fail. For parallelization and to be more "flexible", I would probably go with one JSON document per line.
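To illustrate that last point, here is a minimal sketch of writing one JSON document per line; the helper class is hypothetical and just uses Jackson. A reader can then process the file line by line, in parallel if needed, and a truncated final line only loses one record instead of making a whole array unparseable.

import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Map;

// Writes events as line-delimited JSON ("one document per line") instead of one big array.
public class NdjsonWriter {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void append(Path file, List<Map<String, Object>> events) throws IOException {
        StringBuilder batch = new StringBuilder();
        for (Map<String, Object> event : events) {
            batch.append(MAPPER.writeValueAsString(event)).append('\n'); // one JSON object per line
        }
        Files.writeString(file, batch.toString(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}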
I am a beginner with Ignite, so I have some questions, one of which is as follows: when I query the cache, does it check whether the data is in memory or not? If it is not, will it then query the database? If not, how can I achieve this behavior?
Please help me if you know. Thanks.
Queries work over in-memory data only. You can either use key access (operations like get(), getAll(), etc.) and utilize automatic read-through from the persistence store, or manually preload the data before running queries. For information on how to effectively load a large data set into the cache, see this page: https://apacheignite.readme.io/docs/data-loading
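A minimal sketch of both options, assuming a cache named "demo" and a dummy CacheStore standing in for your real database-backed store:

import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

import javax.cache.configuration.FactoryBuilder;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;

public class IgniteLoadingSketch {

    // Hypothetical store: in a real setup this would issue JDBC queries against your database.
    public static class DummyStore extends CacheStoreAdapter<Long, String> {
        @Override public String load(Long key) { return "value-from-db-" + key; }
        @Override public void write(javax.cache.Cache.Entry<? extends Long, ? extends String> e) { /* no-op */ }
        @Override public void delete(Object key) { /* no-op */ }
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Option 1: read-through for key-based access; get()/getAll() fall back to the store on a miss.
            CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("demo");
            cfg.setReadThrough(true);
            cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(DummyStore.class));
            IgniteCache<Long, String> cache = ignite.getOrCreateCache(cfg);
            System.out.println(cache.get(42L)); // not in memory -> loaded via the store

            // Option 2: preload before querying, because SQL/scan queries only see in-memory data.
            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("demo")) {
                Map<Long, String> rows = LongStream.range(0, 1000).boxed()
                        .collect(Collectors.toMap(i -> i, i -> "row-" + i)); // stand-in for a DB read
                streamer.addData(rows);
            }
        }
    }
}

Closing the streamer flushes any buffered entries, so queries you run afterwards see the full preloaded data set.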
I've got a stored procedure that loads some data (about 59k items) and takes 30 seconds. This SP must be called when the application starts. I was wondering if there's a reasonable way to invalidate the Redis cache entry via SQL... any suggestions?
Thanks
Don't do it from your SQL; do the invalidation and (re)loading of Redis from your application.
The loading of this data into your application should be done by a separate component/service/module/part of your application. That part should have full responsibility for handling this data, including (re)loading it into the app, invalidating it and reloading it into Redis, and so on. You should see your Redis server as an extension of your application's cached data, not of your SQL Server data. That's why you should not tie your relational database to your Redis. If you change how you save this data into Redis, that should not affect the SQL part, only the application, and really only the part of your application specialized in this.
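As a minimal sketch of that separation, assuming the Jedis client (the key name, TTL and loading call are all hypothetical):

import redis.clients.jedis.Jedis;

// The application, not SQL Server, owns the cache lifecycle.
public class ItemCacheRefresher {
    private static final String CACHE_KEY = "items:all"; // hypothetical key name

    // Eager refresh: call the stored procedure again and overwrite the cached entry.
    public void refresh(Jedis jedis) {
        String payload = loadItemsAsJson();    // placeholder for the JDBC call to the stored procedure
        jedis.setex(CACHE_KEY, 3600, payload); // overwrite + TTL as a safety net
    }

    // Lazy invalidation: drop the entry and let the next read repopulate it.
    public void invalidate(Jedis jedis) {
        jedis.del(CACHE_KEY);
    }

    private String loadItemsAsJson() {
        return "[]"; // stand-in for serializing the ~59k rows
    }
}

Whatever triggers the refresh (application startup, a scheduled job, a message) lives in the application, so SQL Server never needs to know Redis exists.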
I am building a news site. Currently, I use MySQL as the main data store and Redis to maintain the list of articles for a user's home page feed. When a user clicks an article on the home page, I connect to MySQL to get the main content of the article, the comments, and related data.
Is it best practice to store all article data in Redis? I mean, instead of connecting to MySQL to get the whole content of an article, I would store the main content of articles in Redis so that performance can be improved.
This is opinion-based, so here's my opinion. Redis is primarily meant to be used as a cache. You need to decide what to cache, and whether caching is actually necessary. That depends on the scale of your app. If the articles change a lot and you do not have a huge user/visitor base, I do not think Redis is necessary at all. Remember you cannot search for things there; you can't run SELECT articles WHERE author='foo' in Redis.
If, on the other hand, you are seeing a massive increase in DB load due to too many users, you could pre-render the HTML for all the articles and put that into Redis. That would save the DB and the web server some load, but only if you already know which articles you want to display.
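A minimal cache-aside sketch of that idea, assuming the Jedis client; the key pattern, the TTL and the rendering call are placeholders, not part of your setup:

import redis.clients.jedis.Jedis;

// Cache-aside for pre-rendered article HTML: try Redis first, fall back to MySQL on a miss.
public class ArticleHtmlCache {
    private final Jedis jedis;

    public ArticleHtmlCache(Jedis jedis) {
        this.jedis = jedis;
    }

    public String getArticleHtml(long articleId) {
        String key = "article:html:" + articleId;
        String html = jedis.get(key);          // 1. try the cache
        if (html == null) {
            html = renderFromMySql(articleId); // 2. miss: query MySQL and render
            jedis.setex(key, 600, html);       // 3. cache for 10 minutes
        }
        return html;
    }

    private String renderFromMySql(long articleId) {
        return "<article>...</article>"; // placeholder for the MySQL query + template rendering
    }
}

If an article is edited, the same component can simply delete the key so the next read re-renders it.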
That depends on the role Redis is supposed to take in your case.
If it serves as a cache, you could try to store more data in Redis where possible, as long as the development overhead is small and the process doesn't introduce new sources of errors.
If you want Redis to be the primary source for your data, which doesn't sound like your case, you could also decide to move everything away from MySQL. With low-volume and rarely changing data, that might be worth a shot. But remember to back up the database and sync it to disk after changes.