Is there any object-storage standardized protocol for storing data+metadata?

When it comes to querying databases you can rely on SQL, and when exchanging emails you can rely on SMTP.
They are both standards across vendors. We could generalize this to many other "standardized services".
This allows the backing provider of the service to be switched easily. I can change a local MySQL to a managed one, or I can swap one SMTP server for another just by changing the application's configuration, without changing the code.
Question:
When it comes to "storing binaries", say for example a company that needs to store photos sent by users, PDFs sent by providers, and phone-call recordings from the quality department...
...when it's about "storing binaries over the net", is there any industry standard in the form of "key + value + metadata", where the value is the binary, the metadata is a free-form object (possibly with info about creation time, who the creator is, the context of the creation, etc.), and the ID is a function of the content (for example a SHA-1 or whatever), so that deduplication is ensured by design?
When I say industry standard, I mean that I don't care whether "behind" the API I connect to, the "real" storage is implemented by Redis, MySQL, a plain file system, or AWS S3... just something "standard", supported by multiple parties, that I connect to and can "easily change what's behind" by configuration, switching from one storage provider to another with the same simplicity with which we change our SMTP server, not caring whether there's a sendmail or a postfix behind it.
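To make the scheme concrete, here is a minimal sketch of the kind of store I mean (Java, all class and method names hypothetical; I use SHA-256 here since SHA-1 is no longer collision-resistant):

import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

public class ContentAddressedStore {
    private final Map<String, byte[]> blobs = new HashMap<>();
    private final Map<String, Map<String, String>> metadata = new HashMap<>();

    // Store a binary plus free-form metadata; the returned ID is derived
    // from the content, so identical payloads map to the same key.
    public String put(byte[] content, Map<String, String> meta) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(content)) {
            hex.append(String.format("%02x", b));
        }
        String key = hex.toString();
        blobs.put(key, content);      // the binary value
        metadata.put(key, meta);      // creation time, creator, context, etc.
        return key;                   // content-derived ID, deduplicated by design
    }

    public byte[] get(String key) {
        return blobs.get(key);
    }

    public Map<String, String> getMetadata(String key) {
        return metadata.get(key);
    }
}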

Related

Implementing a RMW operation in Redis

I would like to maintain comma-separated lists of entries of the form <ip>:<app>, indexed by an account ID. There would be one such list for each user, indexed by their account ID, with the number of users in the millions. This is mainly to track which server in a cluster a user using a certain application is connected to.
Since all servers are written in Java, with Redisson I'm currently doing:
RSet<String> set = client.getSet(accountKey);
and then I can modify the set using the typical Java container APIs supported by Redisson. I basically need three types of updates to these comma-separated lists:
Client connects to a new application = append
Client reconnects with existing application to new endpoint = modify
Client disconnects = remove
A new connection would require a change to a field like:
1.1.1.1:foo,2.2.2.2:bar -> 1.1.1.1:foo,2.2.2.2:bar,3.3.3.3:baz
A reconnect would require an update like:
1.1.1.1:foo,2.2.2.2:bar -> 3.3.3.3:foo,2.2.2.2:bar
A disconnect would require an update like:
1.1.1.1:foo,2.2.2.2:bar -> 2.2.2.2:bar
As mentioned the fields would be keyed by the account ID of the user.
My question is the following: without using Redisson, how can I implement this "directly" on top of Redis commands? The goal is to allow rewriting certain components in a language other than Java. The cluster handles close to a million requests per second.
I'm actually quite curious how Redisson implements an RSet under the hood, and I haven't had time to dig into it. I guess one option would be to use Lua, but I've never used it with Redis. Any ideas how to efficiently implement these operations on top of Redis in a manner that is easily supported by multiple languages, i.e. not relying on a specific library?
Having actually thought about the problem properly, it can be solved directly with HSET, where <app> is the field name, the value is the endpoint IP, and the key is the user's account ID.
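As an illustration (Jedis shown here only as an example Java client; the underlying HSET/HDEL commands are the same in every language, and the class and key names are hypothetical), the three updates each map to a single hash command:

import redis.clients.jedis.Jedis;

public class ConnectionTracker {
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Client connects to a new application: add a field to the user's hash
    public void connect(String accountId, String app, String ip) {
        jedis.hset(accountId, app, ip);    // HSET <account> <app> <ip>
    }

    // Client reconnects to a new endpoint: overwrite the same field
    public void reconnect(String accountId, String app, String newIp) {
        jedis.hset(accountId, app, newIp); // same HSET, new value
    }

    // Client disconnects: remove the field
    public void disconnect(String accountId, String app) {
        jedis.hdel(accountId, app);        // HDEL <account> <app>
    }
}

Each update is a single atomic Redis command, so no Lua script is needed for these three cases.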

What information is logged by IdentityModel when ShowPII is set to true?

IdentityModelEventSource has a property called ShowPII; when it is enabled, Personally Identifiable Information is added to the logs (in relation to security). This value is used to decide whether to log some OAuth2-sensitive data.
I am trying to understand what kind of Personally Identifiable Information will be logged:
Client ID? (aka Client Key, Consumer Key)
Client Secret? (aka Consumer Secret)
JSON Web Tokens? (aka JWTs)
Access Tokens?
Refresh Tokens?
Kerberos Tickets?
PKCE Values?
Authorization Codes?
I know it cannot get access to usernames and passwords, because those are only exchanged directly with the IdP.
But I need to know whether I need to find a way to lock down my log files, because they will contain data that constitutes a security vulnerability.
These are the possible log messages of IdentityModel: LogMessages.cs
About
"I am trying to understand what kind of Personally Identifiable Information will be logged"
I won't copy-paste log messages from there (especially as they can change at any moment). You can check them yourself and decide what should be considered PII.
But here's an interesting example:
"IDX10615: Encryption failed. No support for: Algorithm: '{0}', SecurityKey: '{1}'."
and this is how it's used:
throw LogHelper.LogExceptionMessage(new SecurityTokenEncryptionFailedException(LogHelper.FormatInvariant(TokenLogMessages.IDX10615, encryptingCredentials.Enc, encryptingCredentials.Key)));
If you follow the trail, you'll find out that encryptingCredentials.Key will be logged if ShowPII = true and won't be logged if ShowPII = false.
Of course, depending on your use case, this particular message may never appear in your logs. And not all messages are so outrageously leaky. But you never know:
your use case may change
you may be mistaken about the set of messages IdentityModel can emit for your use case
IdentityModel code may change, and you may forget to check whether the set of messages is still secure
So, about
"if I need to find a way to lock down my log files"
Yes, you definitely need to.
Or better yet - don't use ShowPII = true in production for monitoring; use it only in a development environment for debugging purposes.
Looking at the source, it appears that when ShowPII is off, the library does two things:
Replace all parameters passed to library-specific exceptions with their data type names
For all system exceptions, replace the inner message with the exception type name
In this context, "library-specific" means an exception that derives from Exception and whose full type name starts with "Microsoft.IdentityModel." (the library defines a few)
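To illustrate the pattern (this is a hypothetical Java sketch of the behavior described above, not the library's actual C# code):

// Hypothetical sketch, not Microsoft.IdentityModel's real implementation:
// log raw values when showPII is true, otherwise replace each argument
// with its data type name so PII never reaches the log.
public class LogRedactor {
    public static boolean showPII = false;

    public static String formatMessage(String format, Object... args) {
        if (showPII) {
            return String.format(format, args);  // raw values logged
        }
        Object[] sanitized = new Object[args.length];
        for (int i = 0; i < args.length; i++) {
            // only the type name survives redaction
            sanitized[i] = args[i] == null ? "null" : args[i].getClass().getName();
        }
        return String.format(format, sanitized);
    }
}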
Depending on your use case you'd see a variety of parameters that can be logged with custom exceptions. A quick search for FormatInvariant yields quite a few for your consideration.
Again, depending on how you use it, you might get a better idea of what the error messages are by looking through the relevant LogMessages.cs file for your specific namespace.
P.S.: on a side note, it appears that the default ShowPII setting is GDPR-compliant

Gemfire region with data expiration

Regarding this document, "entry-time-to-live-expiration" means "how long the region's entries can remain in the cache without being accessed or updated. The default is no expiration of this type." However, when I use Spring Cache and a client region with the following configuration, I find that the setting does not work well with respect to being accessed. Furthermore, regarding this document -> XML TTL tab, it says "Configures a replica region to invalidate entries that have not been modified for 15 seconds." So I am confused about whether TTL works for entries that are merely accessed.
<gfe:client-region id="Customer2" name="Customer2" destroy="false" load-factor="0.5" statistics="true" cache-ref="client-cache">
    <gfe:entry-ttl action="DESTROY" timeout="60"/>
    <gfe:eviction threshold="5"/>
</gfe:client-region>
So, the documentation you might want to refer to is here and here. Perhaps relevant to your situation is...
"Requests for entries that have expired on the consumers will be forwarded to the producer."
Based on your configuration, given you did not set either a ClientRegionShortcut or DataPolicy, your Client Region, "Customer2", defaults to ClientRegionShortcut.LOCAL, which sets a DataPolicy of "NORMAL". DataPolicy.NORMAL states...
"Allows the contents in this cache to differ from other caches. Data that this region is interested in is stored in local memory."
And for the shortcut of "LOCAL"...
"A LOCAL region only has local state and never sends operations to a server. ..."
However, it does not mean the client Region cannot receive data (of interest) from the Server. It simply implies operations are not distributed to the Server. It may be expiring the entry and then repopulating it from the Server (producer).
Of course, I am speculating and have not tested these ideas. You might try setting the Expiration Action to "LOCAL_DESTROY" and/or changing your distribution properties through different ClientRegionShortcuts.
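For example, trying LOCAL_DESTROY would be a one-attribute change to the entry-ttl line in your configuration above (again, untested speculation on my part, keeping your other settings as-is):

    <gfe:entry-ttl action="LOCAL_DESTROY" timeout="60"/>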
Post back if you are still having problems. I too echo what #hubbardr is asking.
Cheers!

Validation of sender identity in websites

I came to think about this question a few days ago when I designed an HTML form that submits data via PHP to an SQL database. I solved my problem, but I am asking a computer-theoretical question here, which might help me (or others) in the future.
I want to protect myself from SQL injection, and I thought that instead of validating the data with PHP on the server side, I could have JavaScript validate the data on the client side (I am much more fluent in JS than in PHP) and then send it.
However, a sophisticated user might inspect the JavaScript (or the HTTP request) and then alter it to send his own vicious request to the server.
My question:
Is it theoretically possible to do a computation on the client side, where the code is visible to the user, and have the result sent in some way that ensures the data was sent by my program and not by altered code?
Can this be done with an RSA scheme with public and private keys?
"I want to protect myself from SQL injection, and I thought that instead of validating the data"
Don't validate data to protect yourself from SQL Injection. Validate data to make sure it is in the format you want.
Escape data to protect yourself from SQL Injection (and do that escaping via prepared statements and parameterized queries).
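For example, a minimal sketch with a parameterized query (JDBC/Java shown purely for illustration, with hypothetical connection details and table/column names; PHP's PDO provides the same mechanism):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SafeInsert {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "password");
        // The '?' placeholders keep user input out of the SQL text entirely,
        // so user-supplied quotes can never change the query's structure.
        PreparedStatement stmt = conn.prepareStatement(
                "INSERT INTO comments (author, body) VALUES (?, ?)");
        stmt.setString(1, args[0]);  // bound as data, never parsed as SQL
        stmt.setString(2, args[1]);
        stmt.executeUpdate();
        stmt.close();
        conn.close();
    }
}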
"Is it theoretically possible to do a computation on the client side, where the code is visible to the user, and have the result sent in some way that ensures the data was sent by my program and not by altered code?"
No. The client side code can be bypassed entirely. In this arena, it is useful only to quickly tell the user that their data would be rejected if it was submitted to the server.
"Can this be done with an RSA scheme with public and private keys?"
No. You have to give one of the keys to the client. It can then be extracted and used independently of your code.

How to efficiently deploy content types to a Content Type Hub

I have set up a Content Type Hub and tested that syndication is working correctly by creating a test content type and watching it be published to the client site.
Then I deployed the content types I am actually interested in publishing to the hub (by way of a feature) along with the site columns they depend on.
I get the error
Content type '...' cannot be published to this site because feature '...' is not enabled.
I want to deploy content types with features for upgradability and ease of porting between dev, qual, and prod environments, but I am left not understanding what the benefit of the Hub is.
If I have to activate the deploying feature, the content types will already be on the site before publishing takes place. If I have to manually create the content types on the Hub site with the web UI (yuck!), I have the issue of trying to keep three landscapes manually synchronized.
Is there a way to efficiently manage content type deployment to the Hub while still using the Hub to publish the content types?
The advantage of using the Content Type Hub, is that it allows you to use and reuse your Content Types over multiple site collections and Web Applications throughout your farm.
Because all of your site collections are now using instances of the same syndicated content types, if in the future you need to add/remove/rename columns within the content types, this is done as easily as updating the content type and resubscribing (then allowing SharePoint to run its timer jobs, and double-checking that the changes propagated, because you're a careful SharePoint administrator).
I am not sure which error you are receiving; there simply isn't enough context in your post. However, I think you may be slightly confused about how syndicated content types are published. First, you turn on the content type syndication hub publishing feature on the site collection that holds all of the content types you are going to reuse throughout your farm. Next, you configure the Managed Metadata Service so that SharePoint loads each of your content types "into memory", more or less.
After this step, you get to choose which site collections you want to subscribe to the syndication hub. To do this, you need to turn on the content type publishing site collection feature. Note: If you use blank templates for your sites you may receive a feature error like you've described, due to a "flaw" with blank templates. See my post at: http://www.thesharepointblog.net/Lists/Posts/Post.aspx?ID=109
Only AFTER you've turned on the subscribing feature, AND the Content Type Hub timer job has run, AND the subscribing timer job has run, will your site collection see the available content types.
As for manually creating content types on the hub site, the only OOB way of doing this is to use the UI. Personally, I wrote a utility that does everything I just described for me, from creating the initial content types, to creating the syndication hub, to publishing them to all of the site collections, and, most time-consumingly, associating them with all of the lists and libraries on the subscribing site collections. I had intended for my employing company to sell it, but as they don't seem interested, I could open-source it if there is enough interest.
Hope this was helpful.
This looks like a shortcoming of the hub, indeed.
I've witnessed it before.
If you've deployed your content type to the hub, please check whether the Inherits attribute of the ContentType element is set to TRUE. Otherwise it won't work in a hub.
<ContentType ID="xxxxx"
             Name="xxxx"
             Group="xxxx"
             Description="xxxx"
             Inherits="TRUE"
             Version="0">
</ContentType>
Don't forget that you can actually synchronize the content types BETWEEN farms -- this is especially valuable when you are developing on a separate farm and don't want to hassle with the PnP Framework for managing your content types. In some cases, the Content Type may already exist on the production farm and you need a copy of it on dev and/or test.