Julia: how stable are serialize() / deserialize() - serialization

I am considering the use of serialize() and deserialize() for all of my data i/o due to their convenience. I do not, however, want to be stuck with unreadable files on a Julia update.
How stable are serialize() and deserialize()? Should they work between updates of 0.3? Can I expect safe behavior if I stick to basic types like arrays of Float64?
Thank you.

If you want to store data you might depend on being able to read in the future, you should not use a format that will incorporate breaking changes if/when someone finds it useful to change it. As far as I understand, the default serialization format is meant for network communication, so it is designed for maximum performance.
There is also the HDF5.jl package that uses a documented format and a common library that has wrappers for different languages.
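As a rough, hedged sketch of both routes (the file names and sample array are made up; h5write/h5read come from HDF5.jl, and on Julia 0.7 and later serialize/deserialize first need "using Serialization"):

A = rand(Float64, 100, 3)

# Built-in serializer: convenient and fast, but the on-disk format may change between releases.
open("data.jls", "w") do io
    serialize(io, A)
end
A2 = open(deserialize, "data.jls")

# HDF5.jl: documented, language-neutral format that other tools can read.
using HDF5
h5write("data.h5", "mygroup/A", A)
A3 = h5read("data.h5", "mygroup/A")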

I believe the official answer here is, "people will try not to break the serialization format, but you shouldn't depend upon it."

Related

Reassign an interface or allow GC to do its work on temporary variables

I'm very new to Go and am currently porting a PHP program.
I understand that Go is not a dynamically-typed language and I like that about it. It seems very structured and easy to keep track of everything.
But I've been coming across situations that seem to be a little ... ugly. Is there a better way of performing this sort of process:
plyr := builder.matchDetails.plyr[i]
plyrDetails := strings.Split(plyr, ",")
details := map[string]interface{}{
    "position": plyrDetails[0], "id": plyrDetails[1],
    "xStart": plyrDetails[2], "zStart": plyrDetails[3],
}
EDIT:
Is there a better way to achieve a map containing the strings from plyr than to create two additional variables, to be destroyed straight afterwards? Or is this the correct way?
tl;dr:
If possible, choose a different format and let a library do the string parsing/generation for you
Use structs rather than maps for anything you use a few times, for more compiler checks
The common way of using encoding/json accomplishes both of those.
Meanwhile, don't sweat perf too much because you'll probably vastly improve the old app's speed regardless; there's no indication speed of parsing or GC is a problem yet; and the syntactical differences you mentioned in the first rev. of the post don't necessarily actually relate to GC.
So, I understand you may be porting piece-for-piece, and that may limit what you can change now.
But if/when you can change things, a really clean solution would be to use the encoding/json package and a struct: the json package will parse input/generate output from structs without any manual string manipulation on your part, and using a struct gives you compile-time checking rather than only the runtime checking you get with a map. Lots of Go apps (and others) use JSON to expose their services.
An intermediate step could be to introduce struct types for any internal structure you use at least a few times, rather than maps, so even without updating the parsing, at least the internals of the app get the benefits of compile-time checking. structs are also what things like the gorm object/relational mapper expect to deal with. They happen to use less memory than maps, and be quicker (and more concise syntactically) to access, but those aren't even necessarily the most important considerations here.
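For instance, a minimal sketch of that approach (the struct, JSON keys, and sample input are assumptions for illustration, not your real schema):

package main

import (
    "encoding/json"
    "fmt"
)

// PlayerDetails replaces the map[string]interface{}; field access is now
// checked at compile time instead of at runtime.
type PlayerDetails struct {
    Position string `json:"position"`
    ID       string `json:"id"`
    XStart   string `json:"xStart"`
    ZStart   string `json:"zStart"`
}

func main() {
    input := []byte(`{"position":"GK","id":"17","xStart":"0.5","zStart":"0.1"}`)

    var d PlayerDetails
    if err := json.Unmarshal(input, &d); err != nil {
        panic(err)
    }
    fmt.Println(d.Position, d.XStart) // no string splitting, no type assertions
}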
On the performance of what you have now, and particularly whether different syntax would make it faster: don't sweat that, for a bunch of reasons: the port's likely to be faster than the PHP was whatever you do; we don't yet have any indication that parsing or GC is actually slow or your bottleneck; and the syntactical differences you talked about in the first revision of your question may not relate to GC much or at all. More/fewer var names in your code may not correspond to more/fewer heap allocations, 'cause often Go can allocate on the stack, briefly discussed under 'escape analysis' in Dave Cheney's Gocon Tokyo slides. And as peterSO said, we seem to be looking at allocations of smallish references, not, say, copying all of the string bytes from the request each time.
Go is NOT PHP. Write Go programs in Go. Write PHP programs in PHP.
"Interface values are represented as a two-word pair giving a pointer to information about the type stored in the interface and a pointer to the associated data." (Go Data Structures: Interfaces)
Reusing Go interface variables to "increase performance" makes no sense.

Is SQL Injection/XSS attack possible with preg_replace?

I have done some research on how injection/XSS attacks work. It seems like hackers simply make use of USER INPUT fields to inject code.
However, suppose I restrict every USER INPUT field to alphanumerics (a-zA-Z0-9) with preg_replace, and let's assume that I use the soon-to-be-deprecated mysql_* functions instead of PDO or mysqli.
Would hackers still be able to inject/hack my website?
Thanks!
Short version: Don't do it.
Long version:
Suppose you have
SELECT * FROM my_table WHERE id = $user_input
Even with the input restricted to alphanumerics, some injected values (such as CURRENT_TIMESTAMP) are still possible, though the "attack" would be limited to the point of probably being harmless. The solution here could be to restrict the input to [0-9].
Inside quoted strings ("$user_input"), the problem shouldn't even exist, since quote characters are not alphanumeric.
However:
You have to make sure you implement your escape function correctly.
It is incredibly annoying for the end user. For instance, if this was a text field, why aren't white spaces allowed? What about á? What if I want to quote someone with ""? Write a math expression with < (or even write something apparently harmless such as i <3 u)?
So now you have:
A homebrew solution, which has to be checked for correctness (and may have bugs, as any other function). Bugs in this function are potential security issues;
A solution which is unfamiliar to other programmers, who have to get used to it. Code without the usual escape functions is usually wrong code, so it's massively surprising;
A solution that's fragile. What if someone else modifies your code and forgets to add the validation? What if you forget the validation?
You are focusing on solving a problem that's already been solved. Why waste time doing something that takes time to develop and is hard to maintain when others have already developed proper solutions that take close to no effort to use?
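For example, a minimal sketch of the already-solved route with PDO and a prepared statement (the connection details and table are placeholders, not from the question):

$pdo = new PDO('mysql:host=localhost;dbname=test', $db_user, $db_pass);
$stmt = $pdo->prepare('SELECT * FROM my_table WHERE id = :id');
$stmt->execute([':id' => $_GET['id']]);   // the value is bound, never interpolated into the SQL
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);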
Finally, don't use deprecated APIs. Things are deprecated for a reason. Deprecated can mean stuff like "we'll drop support at any minute" or "this has severe issues but we can't fix it for some reason".
Deprecated APIs are supposed to be used only by legacy applications whose developers did not have enough time or resources to migrate. When starting from scratch, use the supported APIs.

Is this API too simple?

There are a multitude of key-value stores available. Currently you need to choose one and stick with it. I believe an independent open API, not made by a key-value store vendor, would make switching between stores much easier.
Therefore I'm building a datastore abstraction layer (like ODBC, but focused on simpler key-value stores) so that someone can build an app once and change key-value stores if necessary. Is this API too simple?
get(Key)
set(Key, Value)
exists(Key)
delete(Key)
As all the APIs I have seen so far seem to add so much more, I was wondering how many additional methods are really necessary.
I have received some replies saying that set(null) could be used to delete an item, and that if get returns null then the item doesn't exist. This is bad for two reasons. Firstly, it is not good to mix return values and statuses, and secondly, not all languages have the concept of null. See:
Do all programming languages have a clear concept of NIL, null, or undefined?
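To make that concrete, here is a hypothetical sketch (Go-style, purely illustrative) of a signature that reports a missing key out of band instead of overloading null:

type KeyValueStore interface {
    Get(key string) (value []byte, found bool, err error) // found == false means "no such key"
    Set(key string, value []byte) error
    Exists(key string) (bool, error)
    Delete(key string) error
}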
I do want to be able to perform many types of operation on the data, but as I understand it everything can be built up on top of a key value store. Is this correct? And should I provide these value added functions too? e.g: like mapreduce, or indexes
Internally we already have a basic version of this in Erlang and Ruby and it has saved us a lot of time, and also enabled us to test performance for specific use cases of different key-value stores.
Do only what is absolutely necessary. Instead of asking if it is too simple, ask if it is too much, even if it only has one method.
Your API lacks some useful functions like "hasKey" and "clear". You might want to look at, say, Python's hack at it, http://docs.python.org/tutorial/datastructures.html#dictionaries, and pick and choose additional functions.
Everyone is saying, "simple is good" and that's true until "simple is too simple."
If all you are doing is getting, setting, and deleting keys, this is fine.
There is no such thing as "too simple" for an API. The simpler the better! If it solves the need the way it is, then leave it.
The delete method is unnecessary. You can just pass null to set.
Edited to add:
I'm only kidding! I would keep delete, and probably add Count, Contains, and maybe an enumerator (or two).
When creating an API, you need to ask yourself: what does my API provide the user? If your API is so simplistic that it is faster and easier for your client to write their own app, then your API has failed. Ask yourself: does my functionality give them specific benefits? If the answer is no, it is too simplistic and generic.
I am all for simplifying an interface to its bare minimum but without having more details about the requirements of the system, it is tough to tell if this interface is sufficient. Sure looks concise enough though.
Don't forget to document the semantics for "key non-existent", as it isn't clear from reading your API definition above. Updated: I see you have added the exists method: is this necessary? You could use the get method and define a NIL of some sort, no?
Maybe worth thinking about: how about considering "freshness" of a value? i.e. an associated "last-modified" timestamp? Of course, it depends on your system requirements.
What about access control? Is it within scope of the API definition?
What about iterating through the keys? If there is a possibility of a large set, you might want to include some pagination semantics.
As mentioned, the simpler the better, but a simple iterator or key-listing method could be of use. I always end up needing to iterate through the set. A "size()" method too, if not taken care of by the iterator. It obviously depends on your usage, though.
It's not too simple, it's beautiful. If "exists(key)" is just a convenient shorthand for "get(Key) != null", you should consider removing it. I guess that depends on how large or complex the value you get() is.

How to decode SAP text from STXL.CLUSTD?

I know! The "proper" way to read STXL.CLUSTD is through an SAP ABAP function. But I'm sorry, we are suffering badly from performance problems. We have already made our decision to go directly to the database (Oracle), and we have no plan to revert it yet since everything has gone so much better so far.
However, we've come across this issue: the text in the STXL.CLUSTD field is stored in an incomprehensible format. We cannot find any information about its encoding via Google. Can anybody give me a hint on how to decode text from STXL.CLUSTD?
Thanks
Short version: You don't. Use the function module READ_TEXT.
Long version: You're looking at a so-called cluster table. See http://help.sap.com/saphelp_47x200/helpdata/en/fc/eb3bf8358411d1829f0000e829fbfe/frameset.htm for the details. The data you see is an internal representation of the text, somehow related to the way the ABAP kernel handles the data internally. This data does not make any sense without the metadata. If you change the original structure in an incompatible way, the data can no longer be read. Oh, and did I mention that the data does not contain a reference to the metadata? When reading the contents of these tables, even in ABAP, you need to know the original source data structure, otherwise you're doomed. Without the metadata and the knowledge of how the kernel handles these data types at runtime, you'll have a hard time deciphering the contents.
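For reference, a minimal sketch of the supported route via READ_TEXT (the text object, ID, and name below are placeholders; the values that apply to your texts can be looked up, e.g. in table STXH):

DATA: lt_lines TYPE STANDARD TABLE OF tline,
      lv_name  TYPE tdobname VALUE '0000004711'.   " placeholder text name

CALL FUNCTION 'READ_TEXT'
  EXPORTING
    id        = 'ST'          " placeholder text ID
    language  = sy-langu
    name      = lv_name
    object    = 'TEXT'        " placeholder text object
  TABLES
    lines     = lt_lines
  EXCEPTIONS
    not_found = 1
    OTHERS    = 2.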
Personal opinion: Direct access to the database below the SAP R/3 system is a really bad idea since this not only bypasses all safety measures, but it also makes you very vulnerable to all structural changes of the database. The only real reason for accessing the database directly is not lack of performance, but lack of (ABAP) knowledge, and that should be curable :-)
You can definitely read clusters and pools without running any ABAP code or invoking RFCs, BAPIs, etc. It is a very good approach, highly performant, and easy to use.
I don't like people flogging their products on Stack Overflow, but the information that you must use ABAP to access SAP data has been outdated for over 7 years now.
Thanks,
Bill MacLean
I just noticed this thread and I work for Simplement. Snow_FFFF is correct (BTW, that user is not me, and AFAIK is not anyone in our company). The Data Liberator product has been de-clustering and de-pooling tables (and doing many other things) for our customers since 2009.

Languages other than SQL in postgres

I've been using PostgreSQL a little bit lately, and one of the things that I think is cool is that you can use languages other than SQL for scripting functions and whatnot. But when is this actually useful?
For example, the documentation says that the main use for PL/Perl is that it's pretty good at text manipulation. But isn't that more of something that should be programmed into the application?
Secondly, is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea on a production system.
PS. Bonus points if someone can make PL/LOLCODE seem useful.
@Mike: this kind of thinking makes me nervous. I've heard "this should be infinitely portable" too many times, but when the question is asked, "do you actually foresee that there will be any porting?", the answer is: no.
Sticking to the lowest common denominator can really hurt performance, as can the introduction of abstraction layers (ORMs, PHP PDO, etc.). My opinion is:
Evaluate realistically if there is a need to support multiple RDBMS's. For example if you are writing an open source web application, chances are that you need to support MySQL and PostgreSQL at least (if not MSSQL and Oracle)
After the evaluation, make the most of the platform you decided upon
And BTW: you are mixing relational with non-relational databases (CouchDB is not an RDBMS comparable with Oracle, for example), further exemplifying the point that the perceived need for portability is often greatly overestimated.
"isn't that [text manipulation] more of something that should be programmed into the application?"
Usually, yes. The generally accepted "three-tier" application design for databases says that your logic should be in the middle tier, between the client and the database. However, sometimes you need some logic in a trigger or need to index on a function, requiring that some code be placed into the database. In that case all the usual "which language should I use?" questions come up.
If you only need a little logic, the most-portable language should probably be used (pl/pgSQL). If you need to do some serious programming though, you might be better off using a more expressive language (maybe pl/ruby). This will always be a judgment call.
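As a small illustration of the "index on a function" case mentioned above (the table and column are made up):

CREATE INDEX users_email_lower_idx ON users (lower(email));
-- queries of this shape can then use the index instead of a sequential scan:
SELECT * FROM users WHERE lower(email) = lower('Someone@Example.com');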
"is there any valid reason to use an untrusted language?"
As above, yes. Again, putting direct file access (for example) into your middle tier is best when possible, but if you need to fire things off based on triggers (that might need access to data not available directly to your middle tier), then you need untrusted languages. It's not ideal, and should generally be avoided. And you definitely need to guard access to it.
These days, any "unique" or "cool" feature in a DBMS makes me incredibly nervous. I break out in a rash and have to stop work until the itching goes away.
I just hate to be locked in to a platform unnecessarily. Suppose you build a big chunk of your system in PL/Perl inside the database. Or in C# within SQL Server, or PL/SQL within Oracle, there are plenty of examples*.
Now you suddenly discover that your chosen platform doesn't scale. Or isn't fast enough. Or something. Worse, there's a new kid on the database block (something like MonetDB, CouchDB, or Cache, say, but much cooler) that would solve all your problems (even if your only problem, like mine, is having an uncool database platform). And you can't switch to it without recoding half your application.
(*Admittedly, the paid-for products are to some extent seeking to lock you in by persuading you to use their unique features, which is not an accusation that can directly be levelled at the free providers, but the effect is the same).
So that's a rant on the first part of the question. Heart-felt, though.
"is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea"
My goodness, yes it does! A sort of "Perl injection attack"? Almost worth doing it just to see what happens, I'd have thought.
For philosophical reasons outlined above I think I'll pass on the PL/LOLCODE challenge. Although I was somewhat amazed to discover it was a link to something extant.
From my perspective, I guess the answer is 'it depends'.
There is an argument that manipulation of the data belongs in the database layer, so that the business logic does not need to be overly concerned about how the manipulation happens, it just knows that it has.
Another very good reason to process data on the db layer is if the volume of data being crunched means that network bandwidth will become an issue. I once had to categorise very large amounts of data. Processing this in the application layer was severely restricted by the time required to transfer all the data across the network for processing.
I then wrote a binning algorithm in PL/pgSQL and it worked much faster.
Regarding untrusted languages, I heard a podcast from Josh Berkus (a Postgres advocate) who discussed an application of PostgreSQL that brought in data from MySQL as part of its processing, so that the communication itself was handled by the Postgres server. I don't remember the full details; I think it was on the FLOSS Weekly podcast, which is quite an interesting discussion of the history of PostgreSQL and some of the uses it is put to.
The untrusted versions of the procedural languages allow you to access I/O on the system.
This can come in handy if you need a trigger or something to send an email or connect to a socket server to send a popup notification. There are tons of uses for this type of thing, and because of PostgreSQL's isolation levels you can safely do things like this.
You can put checkpoints in the function so if the transaction fails the email or whatever won't go out. The nice thing about doing this is it removes the logic from the client and puts it on the server.
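A hedged sketch of that kind of helper (the function name, file path, and log-file idea are made up for illustration); a trigger function could call it after a row change:

CREATE OR REPLACE FUNCTION notify_out(msg text) RETURNS void AS $$
    my ($msg) = @_;
    # file I/O is only allowed because plperlu is the untrusted variant
    open(my $fh, '>>', '/tmp/notifications.log') or die $!;
    print $fh "$msg\n";
    close $fh;
$$ LANGUAGE plperlu;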
I think most additional languages are offered so that if you develop in that language on a regular basis, you can feel comfortable writing db functions, triggers, etc. The usefulness of these features is to provide a control over data as close to the data as possible.
An example of a useful stored procedure I recently wrote in an external language, which would not have been possible in PL/pgSQL, is a version of 'df' that allowed SQL table generators to pick the tablespace with the most free space available at runtime.
I used plperlu, and it was relatively simple, although I had to be careful with data typing.
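Something in the spirit of what's described, as a hypothetical sketch rather than the original function (it assumes POSIX "df -P" output and simply returns the mount point with the most free 1K blocks):

CREATE OR REPLACE FUNCTION most_free_mount() RETURNS text AS $$
    my @lines = `df -P`;            # shelling out requires the untrusted plperlu
    shift @lines;                   # drop the header row
    my ($best, $free) = ('', -1);
    for my $l (@lines) {
        # columns: filesystem, 1K-blocks, used, available, capacity, mount point
        my @f = split /\s+/, $l;
        ($best, $free) = ($f[5], $f[3]) if $f[3] > $free;
    }
    return $best;
$$ LANGUAGE plperlu;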