Is there a way to migrate existing media files into the new structure after changing
shopware:
    cdn:
        strategy: id
to
shopware:
    cdn:
        strategy: plain
and are there any required steps for new uploads to be stored the “plain” way? As far as I can tell, my newly uploaded files are not affected by the config change.
Additionally: are there any drawbacks of using the plain strategy?
My reasoning behind the change is to speed up rsync of ~40 GB of files when they are stored on a single level as public/media/<filename.xy> instead of the "default" nested approach. Would that even gain me any speed?
As far as I know there is no readily available method to migrate existing files when changing the strategy.
The whole idea of the id strategy is to make lookups faster, so the drawback of the plain strategy would be a performance loss once a huge number of files sits in a single directory.
While rsync probably doesn't benefit from the id strategy, as it has to traverse the directories anyway, I couldn't find any reports of the number of directories affecting its speed in a significant way.
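If you do want to migrate by hand, here is a minimal sketch of the idea (assuming the standard public/media layout, globally unique filenames, and a full backup beforehand; as far as I know Shopware ships no command for this): move every file up to the top level and remove the empty hash directories afterwards. Thumbnails, if they live in a separate folder, would need the same treatment, and I would test this on a copy first.

    # flatten_media.py -- rough sketch, not an official Shopware tool
    import shutil
    from pathlib import Path

    MEDIA_ROOT = Path("public/media")  # adjust to your installation

    for path in list(MEDIA_ROOT.rglob("*")):
        if not path.is_file():
            continue
        target = MEDIA_ROOT / path.name
        if path == target:
            continue  # already at the top level
        if target.exists():
            raise SystemExit(f"Name collision for {path.name}, aborting")
        shutil.move(str(path), str(target))

    # afterwards, remove the now-empty nested directories, e.g.:
    #   find public/media -type d -empty -delete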
I am currently using ArangoDB to store all the data I'm using for my application, including images. Now I want to migrate to S3 to store the image files and transfer the files I currently have in my ArangoDB.
I am aware that the images are stored in the file system, but I am not sure how to actually transfer them to S3.
Thank you for your help
The location of the data files is implementation-specific, as it can be changed at install and startup. On Linux, the default directory is /var/lib/arangodb3.
But in my experience, backing up the raw storage files is not a good idea. I have found it very difficult to restore or access data with this method. Instead, I recommend using one of these two "official" methods:
Hot backups (enterprise-edition only)
JSON export (using arangoexport/arangoimport)
Snapshot-style "Hot backups" are really great - truly the preferred method. They have everything you would need (speed, reliability, portability, etc.), with only a few case-dependent limitations. The real downside is that it's only available in the enterprise editions (including Oasis).
JSON export is the "thrifty" backup option - I would forget about arangorestore (it does horrible things to your _id/_key values, and takes forever to do so). The good news about JSON export is that it's EXTREMELY portable. Almost ANY code-base (and even most good DB's) can work with it, so you're never locked into a single product or workflow, or even a specific version of ArangoDB (making up/down-grades much easier).
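To make that concrete for the image transfer, here is a minimal sketch assuming the images sit base64-encoded inside the documents, that you have exported the collection to JSONL with arangoexport, and that the attribute names (image, filename) and the bucket name are placeholders for your actual schema:

    # upload_images.py -- rough sketch, adjust attribute names to your schema
    import base64
    import json

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-image-bucket"  # placeholder bucket name

    # e.g. produced by: arangoexport --type jsonl --collection images --output-directory export
    with open("export/images.jsonl", encoding="utf-8") as dump:
        for line in dump:
            doc = json.loads(line)
            data = base64.b64decode(doc["image"])
            s3.put_object(Bucket=BUCKET, Key=doc["filename"], Body=data)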
Does anybody have experience with large synonym files for the SynonymFilterFactory? We want to write down functional requirements for a new project (grouping search results by facets with hierarchical synonyms) without having our own experience to draw on.
How much will the indexing time per document increase? What is a common file size for synonym files, and what size should such a file not exceed?
I think you'll be pleasantly surprised, Solr can handle some decent sized lists: https://issues.apache.org/jira/browse/LUCENE-3233
That said, the only way to know if your particular use case will behave according to your particular requirements is to test it.
One thing though: if you're using configsets stored in ZooKeeper (SolrCloud), the max file size in the default ZK config is 1 MB. If your synonym file exceeds that, you'll need to chop it up, not store it in ZK, or change the jute.maxbuffer setting in your ZK config.
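For reference, the synonym file format itself is plain text with one rule per line: comma-separated groups are treated as equivalent terms, and => defines a one-way mapping. A tiny made-up example:

    # equivalent terms
    couch, sofa, divan
    # one-way mapping
    i-pod, i pod => ipod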
Most of my day is spent writing SQL queries to perform small tasks, mainly to get information from the database and manipulate it somehow for data visualization and for building reports for others.
At the end of the day I try to keep a nice folder scheme to help me reuse code and so on, but it's becoming harder to handle so many files and keep track of everything I've done so far.
I don't want to have huge SQL files, because I might want to reuse only parts of them. In the end it's hard to avoid a war zone on my desktop and in these folders. It's also a mess to handle so many folders and code files.
For version control we're using a Git server, but there is plenty of code that is not in production that we would like to keep track of and reuse somehow.
We're using IPython Notebook, RStudio and SSMS to build our code, and I wonder if there are more efficient ways to work.
There must be an efficient way to do this out there. What do you use to keep track of your (SQL) code and, more importantly, to reuse it?
Thanks in advance,
Rafael
I just use a folder system. I keep the shell scripts, so to speak, as the first files (the generic code to do X), whereas the specific code where I take X and apply dates and other conditions goes in the bottom half of the folder.
What do you use to keep track of your (SQL) codes? and more importantly reuse it.
For ease of reuse, I have all my running SQL code backed up on an SQL server through routine INFORMATION_SCHEMA dumps. For all development code that I need to reuse with others, I have a Git server that gets automatic updates throughout the day. For reuse on my laptop itself, I have a local backup through Time Machine.
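As a rough illustration of such a dump (sketched in Python with pyodbc; it assumes SQL Server, and the connection string and output folder are placeholders), this pulls every routine definition out of INFORMATION_SCHEMA and writes one .sql file per routine:

    # dump_routines.py -- rough sketch of an INFORMATION_SCHEMA dump
    from pathlib import Path

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"  # placeholder
    )
    out_dir = Path("sql_backup")
    out_dir.mkdir(exist_ok=True)

    cursor = conn.cursor()
    # note: ROUTINE_DEFINITION is capped at 4000 characters in SQL Server;
    # use OBJECT_DEFINITION() instead if you have very long routines
    cursor.execute(
        "SELECT ROUTINE_NAME, ROUTINE_DEFINITION "
        "FROM INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_DEFINITION IS NOT NULL"
    )
    for name, definition in cursor.fetchall():
        # one file per stored procedure/function: easy to index, grep and version
        (out_dir / f"{name}.sql").write_text(definition, encoding="utf-8")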
As for directory or folder structure, all code starts as project based and eventually I migrate the best and most useful code to a personal folder structure that is topic based (date arithmetic, indexing, etc.). No matter how they are stored, all these folders are indexed using local and remote indexing features so I can search and retrieve them with just a few keystrokes when needed. Ultimately what's needed for optimum reuse is ease of retrieval. The quicker I can retrieve, the more reuse I get.
Lastly, it's not just SQL code, but all the supporting documents that led to that code solution. Sometimes this collection may include code from other languages, code from other servers, emails, text documents, images, workflows, etc. Keeping them all together enhances the value of reuse.
The Project
I've been asked to work on an interesting project -- what amounts to a basic Web CMS -- that uses HTML/CSS/jQuery with PHP. However, one requirement is that there won't be a database to house the data (they want flat files for the documents/pages -- preferably in JSON format).
In a very basic sense, it'll be used to generate HTML pages via a very "non-techie" interface. Each installation would only have around 20 pages, but a few may get up to 100. It has to be fairly easy to drop onto a PHP capable server and run, with very little setup needed.
What's Out There
There are tons of CMS options and quite a few flat file versions. But an OSS or other existing CMS is not an option; they need a simple proprietary system.
Initial Thoughts
So flat files it is... but I'd really like to get some feedback on the drawbacks, and whether it is worth the effort to try to convince them to use something like MySQL (SQLite or CouchDB are out, since none of the servers can be configured to run them at the present time).
Of course the document files are pretty straightforward, but we're also talking about login info for 1 or 2 admins per installation, a few lists, as well as configs/settings (which also can easily be stored in a file with protection).
The Dilemma
If there are benefits to using MySQL rather than JSON-formatted files and some arrays in a simple project like this -- beyond my own pre-conceived notions :) -- I'll be sure to argue for them.
But honestly I can't see any that outweigh their need to not have a database system.
I'd appreciate your insight and opinions.
If you can't cite a specific need for relational table design, then you're good with flat files. Build as specified. The moment you can cite a specific need, let them know; upgrading isn't that hard if your perception is timely (that is, if you aren't in the position of having to normalize data that should have been integrated earlier).
It's a shame you can't use CouchDB, this seems like the perfect application for it. Keep in mind that using flat-files severely constrains your architecture and, especially, scalability.
What's the best case scenario for your CMS app? It's successful and people want to use it more? If you're using flat-files it'll be harder to service and improve your system (e.g. make it more robust, and add new features for future versions) and performance will not scale well. So "success" in this case is at best short-lived, as success translates into more and more work for less and less gains in feature-set and performance.
Then again, if the CMS is designed right, switching from flat files to an RDBMS should be as simple as using a different data-access file.
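To sketch what that could look like (in Python here just for brevity; the actual project would do the same thing in PHP, and all names are made up): the CMS only ever talks to one small storage interface, so the flat-file implementation can later be swapped for a database-backed one.

    # pagestore.py -- sketch of a swappable data-access layer
    import json
    from pathlib import Path


    class PageStore:
        """Interface the rest of the CMS depends on."""
        def load(self, slug: str) -> dict: ...
        def save(self, slug: str, page: dict) -> None: ...


    class FlatFilePageStore(PageStore):
        def __init__(self, root: str = "data/pages"):
            self.root = Path(root)
            self.root.mkdir(parents=True, exist_ok=True)

        def load(self, slug: str) -> dict:
            return json.loads((self.root / f"{slug}.json").read_text(encoding="utf-8"))

        def save(self, slug: str, page: dict) -> None:
            (self.root / f"{slug}.json").write_text(
                json.dumps(page, indent=2), encoding="utf-8"
            )

    # a later MySQLPageStore(PageStore) would implement the same two methods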
Will this be installed on any shared hosting sites? For this to work somewhat safely, a mechanism like suEXEC needs to be set up properly, as the web server will need write permissions to various directories.
What would be cool with a simple site that was fed via JSON and jQuery is that the site wouldn't need to reload on each click -- just the relevant data would change. You could then use hashes in the location bar to keep track of where you were (e.g. http://localhost/#about).
The problem is that if they are editing the raw JSON file, they can mess it up pretty quickly. I think your admin tools would have to generate the JSON files based on the input so that you can ensure nothing breaks. The admin tools would be more involved than the site (though isn't that always the case with dynamic sites?).
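For illustration, a generated page file could look something like this (the field names are made up, not any kind of spec):

    {
      "slug": "about",
      "title": "About Us",
      "updated": "2010-06-01T12:00:00Z",
      "blocks": [
        { "type": "html", "content": "<p>Welcome to our company...</p>" }
      ]
    }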
What are the predicted data sizes for the CMS?
A large reason for using an RDBMS is quick, specific access to large amounts of data. The data format might not be large, but if there is a lot of data, an RDBMS might be better in the long run.
Then again, if the CMS is designed right, switching from flat files to an RDBMS should be as simple as using a different data-access file.
While an RDBMS may be necessary for a very large CMS, a small one could run off flat files very well. A lot of CMS products out there fall down in that regard, I think, by throwing an RDBMS into the mix when there's no real need.
However, if you are using flat files, there are security issues which others have highlighted. Another issue I've come across is hosting providers using the disable_functions directive in php.ini to disable file I/O functions like fopen() and friends. If you're hosting your CMS on a box you control, you won't have this problem but if you're using a third-party provider, check first.
As the original poster, I wasn't signed in, so I'm following up to the answers so far in an answer (sorry if this is bad form).
There may be instances where this is on a shared host.
Though the JSON files can technically be edited, this won't be the case. The admin interface will be robust enough to do all of the creating/editing of pages.
The size for each install will be relatively small -- 1-2 admins, 10-100 pages. A few lists of common items may run longer (snippets of copy, for example).
Security will be a big issue -- any other options/suggestions on this specifically?
Well, isn't the real problem that they are distrustful of any database system? Isn't the problem more in their thinking than in the technology? Maybe they are afraid of a database because it sounds complex to them. In that case, if you just present them with a very simple CMS (like CMS Made Simple, which I've heard is really simple, with a very fast learning process), and they see that everything is easy, then maybe they just won't care what's behind it, whether it's a database or whatever!
They might listen to arguments like easier maintenance, lower maintenance costs, and a much better handover to another webmaster than with proprietary solutions (so they are not dependent on you), etc.
I was thinking of an efficient way to add quarantining abilities to my antivirus application:
copy the file into a specified directory and change its extension to none (*.).
save the file's binary code in an XML database.
Which way is better?
However, I have no idea how I will recompile the binary code once the user wants to restore the file.
One way to do this is to encrypt the binary file using an encryption engine and move it into a quarantine folder. You could create a random password, encrypt the file with that password, and store the password somewhere (that password could also be encrypted with a master key). That is probably the easiest way of quarantining. To unquarantine, just write the complete opposite of the quarantining code: enumerate the files into a list and filter it, and when the user clicks on an item and presses unquarantine, call the unquarantine function with the file path as the argument.
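A minimal sketch of that approach, assuming Python and the cryptography package (paths and key handling are placeholders; in a real product each per-file key would itself be encrypted with a master key):

    # quarantine.py -- sketch of encrypt-to-quarantine / decrypt-to-restore
    from pathlib import Path

    from cryptography.fernet import Fernet

    QUARANTINE = Path("quarantine")
    QUARANTINE.mkdir(exist_ok=True)


    def quarantine(path: str, key: bytes) -> Path:
        src = Path(path)
        token = Fernet(key).encrypt(src.read_bytes())  # encrypted blob, no longer executable
        dest = QUARANTINE / (src.name + ".quar")
        dest.write_bytes(token)
        src.unlink()                                   # remove the original file
        return dest


    def restore(quar_path: Path, key: bytes, original_path: str) -> None:
        data = Fernet(key).decrypt(quar_path.read_bytes())
        Path(original_path).write_bytes(data)
        quar_path.unlink()


    key = Fernet.generate_key()  # in practice: per-file key, stored encrypted with a master key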
If I had to do this (and again, I wouldn't want to be in this situation in the first place, per my comment), I would use an in-process database engine with native support for encryption and large-format binary data. I think sql compact or sqlite both fit this.
I would not use xml, because it's plain-text and the binary data could be easily extracted, and I would not just change the extension, because the file could still easily be executed. Neither are much of a quarantine.
Note that the renaming option is probably the most "efficient" of what I've seen discussed so far, but when dealing with security software correctness should always be your first concern over efficiency. There are times when you can compromise correctness for performance (3D game rendering software does this all the time, to great effect), but security software is not in this category.
What you can do is optimize later. For example, anti-virus engines use heuristics (rules of thumb that will only hold most of the time) to make their software faster, they do this in a way that favors false positives that must then be more-closely checked rather than potentially missing a threat. This only works because the code that more-closely checks each item was written and battle-tested first.