Smart Contracts - Anti-virus service - bitcoin

I am trying to understand Smart Contracts through an anti-virus service idea. I hope it matches what is expected of a Smart Contract and Blockchain 2.0.
Problem
Suppose I have received a 25 MB file from someone through some medium. I would like to have it scanned for viruses before I actually open it for use. How can I be sure that it is not infected?
Traditional Solution
Perhaps there is a cloud-based service for virus scanning - Virus Scanning as a Service (VSaaS). This provider has subscription plans based on file type, volume and frequency of scans. I could ask this provider to scan my file and receive a binary response: TRUE (clean) or FALSE (infected). But I have no choice but to trust the provider on two things:
Did the provider scan at all? Or, more generally, was any 'work' done at all? Maybe there is a tiny script that returns TRUE or FALSE with 50% probability, regardless of how many files are sent and of what type.
Was the provider's scan conclusive? Maybe the provider does not have all of the virus signatures needed to scan successfully, so it may respond with TRUE (clean) when the file is actually infected with the latest strain of a virus.
Smart Contract
Can a distributed application (or, Smart Contract) resolve this problem? I know that the underlying blockchain provides a publicly available ledger that can verify:
If I have sufficient coins (say, BTC) to cover the scan service request.
If a scan request (the transaction in this case) was made, to whom, when, etc.
What I do not know is how to answer the questions I have about the traditional solution. That is, what is the mechanism to confirm that a virus scan actually happened in its entirety? Should multiple 'full nodes' scan the same file? If so, am I not just spreading my uncertainty across multiple full nodes (in effect, different cloud providers in a conventional setup)?
How would an Ethereum platform address this requirement?

Related

Hashgraph, what's it and how does it work?

Does anyone know what a hashgraph is and how it differs from a blockchain? If you could explain to me how it works, I would be pleased!
Thank you in advance!
Hedera Hashgraph is very different from a regular blockchain (which, as its name suggests, is a chain of blocks): it is a DAG (Directed Acyclic Graph).
You can read more about what a DAG is in the link provided. Beyond that, Hedera differs in that it is trying to solve the problems below, which any DLT must overcome to become Scalable, Secure and Fast without losing Trust:
(1) Performance,
(2) Security,
(3) Governance,
(4) Stability,
(5) Regulatory Compliance.
How it solves all of this simultaneously without losing Trust is something you can read more about in their whitepaper, which explains it beautifully, or in this article from Rejolut.
One primary problem with any public blockchain is the TPS, or transactions per second, it can handle, and Hedera Hashgraph is miles ahead of the others here, being able to clock 100,000 TPS. That makes it a valuable and preferred option, and it offers services like the Crypto Service, File Service, Smart Contract Service and Consensus Service.
You can have a look at some of the explorers like Kabuto and Dragonglass to get an idea of the transaction speed and other details.
If you want to play around with Hedera Hashgraph, their docs are very well written, or you can use the console playground to work with the Smart Contract and Consensus services directly, and maybe use fairorder to handle account management for you to begin with.
Hedera does not use a blockchain. It uses a DAG (Directed Acyclic Graph): it keeps records of the communication between all the nodes in the DAG and timestamps all of these messages. Sending messages between nodes is called gossiping. Its source code is not open source.
How does the gossip protocol work?
In hashgraph, when sending information between nodes, "Alice" will choose another member at random, such as "Bob". Then Alice will tell Bob all of the information she knows so far. Alice then repeats this with a different random member. Bob repeatedly does the same, and all other members do the same. In this way, if a single member becomes aware of new information, it will spread exponentially fast through the community, until every member is aware of it.
The synchronization of information between two members through the gossip protocol is called a gossip sync. Upon completion of a gossip sync, each participating member commemorates the gossip sync with an event. An event is stored in memory as a data structure composed of a timestamp, an array of zero or more transactions, two parent hashes, and a cryptographic signature. The two parent hashes are the hash of the last event created by the self-parent prior to the gossip sync and the hash of the last event created by the other-parent prior to the gossip sync.
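For illustration, here is a minimal Python sketch of that event structure and a single gossip sync. The hashing, the placeholder "signature", and the Node bookkeeping are simplifications for the sake of the example, not Hedera's actual implementation:

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass
class Event:
    """One gossip event: timestamp, transactions, two parent hashes, signature."""
    timestamp: float
    transactions: list
    self_parent_hash: str   # hash of this node's previous event
    other_parent_hash: str  # hash of the peer's latest event
    signature: str          # placeholder; a real node signs with its private key

    def hash(self) -> str:
        payload = json.dumps(
            [self.timestamp, self.transactions,
             self.self_parent_hash, self.other_parent_hash],
            sort_keys=True,
        )
        return hashlib.sha384(payload.encode()).hexdigest()


class Node:
    def __init__(self, name: str):
        self.name = name
        self.events = []  # this node's view of the hashgraph

    def latest_hash(self) -> str:
        return self.events[-1].hash() if self.events else ""

    def gossip_sync(self, other: "Node", transactions: list) -> Event:
        """Alice (self) tells Bob (other) everything she knows, then the sync
        is commemorated with a new event carrying the two parent hashes."""
        known = {e.hash() for e in other.events}
        other.events.extend(e for e in self.events if e.hash() not in known)
        event = Event(
            timestamp=time.time(),
            transactions=transactions,
            self_parent_hash=self.latest_hash(),
            other_parent_hash=other.latest_hash(),
            signature=f"signed-by-{self.name}",  # stand-in for a real signature
        )
        self.events.append(event)
        other.events.append(event)
        return event


alice, bob = Node("Alice"), Node("Bob")
alice.gossip_sync(bob, transactions=["tx1"])
```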
Gossiping is the basis of the hashgraph consensus algorithm, which is asynchronous Byzantine fault tolerant (aBFT). This means there is no specific time at which all the nodes reach consensus; in other words, the aBFT consensus mechanism has no block time (there is no blockchain here, and no block reward).
Since gossiping is the fastest way of spreading information, it takes 3-5 seconds for the entire network to reach consensus.
Hedera claims that the network achieves 10,000 HBAR cryptocurrency transactions per second (I personally do not believe those claims).
You can also write smart contracts on Hedera. But you cannot execute 10,000 smart contract operations per second, because Hedera uses the EVM and the EVM is very slow.
Hedera has no slashing. If validators misbehave, they will not lose any of their crypto holdings, because Hedera claims that the hashgraph gives validators less opportunity to misbehave. Validators receive 90% of the network fees.

How to prevent fake data from being sent to a Blockchain

I am developing a blockchain for IoT applications, where there are a number of gateways (miners) spread throughout the city and several nodes (sensors) connected to each of them. Each gateway can be added by an end user, so this is an untrusted environment. How can I make sure that there isn't fake data being sent to the chain by one of the miners?
I have looked up some consensus protocols but find that none fit this specific problem, since there is no value being exchanged.
Every miner sends a ping to a master server and receives from it the list of miners on the network. Then they connect to each other over p2p.
Any ideas on how I could solve this?
Blockchain can be used in both cases, permissionless or permissioned. If you want to prevent anyone from broadcasting data, then you have to authenticate the nodes before they can join the network. If, even after authenticating the nodes, there is a chance that an authenticated node sends "fake data", then a trust mechanism must be implemented: nodes verify the trustworthiness of the data's source and decide whether the node is trusted and whether to accept the data.
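As a concrete sketch of that idea (authenticate gateways once, then only accept data that carries a valid signature from an enrolled key), assuming the Python cryptography package; the in-memory key registry and the trust decision are illustrative, not a specific protocol:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Registry of gateways that were authenticated when they joined the network.
# In practice this would be populated out-of-band (enrollment, certificates, ...).
authorized_keys = {}


def enroll_gateway(gateway_id: str):
    """Authenticate a gateway once and remember its public key."""
    private_key = Ed25519PrivateKey.generate()   # stays on the gateway
    authorized_keys[gateway_id] = private_key.public_key()
    return private_key


def accept_reading(gateway_id: str, payload: bytes, signature: bytes) -> bool:
    """Accept data only if it comes from an enrolled gateway with a valid signature."""
    public_key = authorized_keys.get(gateway_id)
    if public_key is None:
        return False          # unknown gateway: reject outright
    try:
        public_key.verify(signature, payload)
    except InvalidSignature:
        return False          # tampered or spoofed data
    return True               # authenticated; a trust/reputation score could go here


# Usage
key = enroll_gateway("gateway-42")
reading = b'{"sensor": "temp-1", "value": 21.5}'
assert accept_reading("gateway-42", reading, key.sign(reading))
```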
In order to prevent spamming or fake data from being posted, it has to be added as a consensus rule to the protocol. Otherwise, it requires another layer that validates data based on off-chain data (but that doesn't prevent data from being stored in blocks). Blockchain is for achieving distributed consensus in a permissionless system. Restricting who can participate is not a permissionless system; it would be a centralized system, because someone has to determine who is allowed to participate.
The answer to the query lies in Blockchain Oracles.
Oracles to-date are centralized services, meaning any smart contract using such services has a single point of failure, which nullifies any benefits gained from the decentralized nature of smart contracts.
To fill this gap, Chainlink was developed as the first decentralized oracle that can provide external data to smart contracts. As a result, the security and determinism of smart contracts can be combined with the knowledge and breadth of real-world external events. Chainlink will provide a smart contract with access to any external API needed.
According to Chainlink, here and here:
Blockchains and smart contracts cannot access data from outside of their network. In order to know what to do, a smart contract often needs access to information from the outside world that is relevant to the contractual agreement, in the form of electronic data, also referred to as oracles. These oracles are services that send and verify real world occurrences and submit this information to smart contracts, triggering state changes on the blockchain.
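To make that mechanism concrete, here is a minimal sketch of the basic oracle pattern the quote describes: an off-chain service observes a real-world value and pushes it to a contract. The endpoint URL and the submit_onchain stub are placeholders, and a single relayer like this is exactly the centralized single point of failure that a decentralized oracle network such as Chainlink is meant to replace:

```python
import json
import time
import urllib.request

PRICE_URL = "https://api.example.com/price?pair=ETH-USD"  # placeholder endpoint


def fetch_price() -> float:
    """Read the external value that the contract cannot reach on its own."""
    with urllib.request.urlopen(PRICE_URL, timeout=10) as resp:
        return float(json.load(resp)["price"])


def submit_onchain(value: float) -> None:
    """Stub for the transaction that triggers the contract's state change.
    With a library like web3.py this would be a signed contract function call."""
    print(f"submitting {value} to the oracle contract")


def run_oracle(poll_seconds: int = 60) -> None:
    """Poll the external API and relay each observation to the chain."""
    while True:
        submit_onchain(fetch_price())
        time.sleep(poll_seconds)
```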

API Traffic Shaping/Throttling Strategies For Tenant Isolation

I'll start my question by providing some context about what we're doing and the problems we're facing.
We are currently building a SaaS (hosted on Amazon AWS) that consists of several microservices that sit behind an API gateway (we're using Kong).
The gateway handles authentication (through consumers with API keys) and exposes the APIs of these microservices that I mentioned, all of which are stateless (there are no sessions, cookies or similar).
Each service is deployed using ECS services (one or more docker containers per service running on one or more EC2 machines) and load balanced using the Amazon Application Load Balancer (ALB).
All tenants (clients) share the same environment, that is, the very same machines and resources. Given our business model, we expect to have few but "big" tenants (at first).
Most of the requests to these services translate into heavy resource usage (mainly CPU) for the duration of the request. The time needed to serve one request is in the range of 2-10 seconds (not milliseconds like traditional "web-like" applications). This means we serve relatively few requests per minute, where each one of them takes a while to process (background or batch processing is not an option).
Right now, we don't have a strategy to limit or throttle the number of requests that a tenant can make in a given period of time. Taking into account the last two considerations from above, it's easy to see this is a problem, since it's almost trivial for a tenant to make more requests than we can handle, causing a degradation in the quality of service (even for other tenants, because of the shared resources approach).
We're thinking of strategies to limit/throttle or in general prepare the system to "isolate" tenants, so one tenant can not degrade the performance for others by making more requests than we can handle:
Rate limiting: Define a maximum requests/minute that a tenant can make. If more requests arrive, drop them. Kong even has a plugin for it. Sadly, we use a "pay-per-request" pricing model and the business does not allow us to use this strategy, because we want to serve as many requests as possible in order to get paid for them. If excess requests take more time for a tenant, that's fine.
Tenant isolation: Create an isolated environment for each tenant. This one has been discarded too, as it makes maintenance harder and leads to lower resource usage and higher costs.
Auto-scaling: Bring up more machines to absorb bursts. In our experience, Amazon ECS is not very fast at doing this and by the time these new machines are ready it's possibly too late.
Request "throttling": Using algorithms like Leaky Bucket or Token Bucket at the API gateway level to ensure that requests hit the services at a rate we know we can handle.
Right now, we're inclined to take option 4. We want to implement the request throttling (traffic shaping) in such a way that all requests made within a previously agreed rate with the tenant (enforced by contract) would be passed along to the services without delay. Since we know in advance how many requests per minute each tenant is going to make (estimated at least), we can size our infrastructure accordingly (plus a safety margin).
If a burst arrives, the excess requests would be queued (up to a limit) and then released at a fixed rate (using the leaky bucket or similar algorithm). This would ensure that a tenant can not impact the performance of other tenants, since requests will hit the services at a predefined rate. Ideally, the allowed request rate would be "dynamic" in such a way that a tenant can use some of the "requests per minute" of other tenants that are not using them (within safety limits). I believe this is called the "Dynamic Rate Leaky Bucket" algorithm. The goal is to maximize resource usage.
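A minimal in-process sketch of the leaky bucket behaviour described above; per-tenant buckets, the Redis-backed shared state and the gateway integration are left out, and forward_to_service is a stand-in:

```python
import threading
import time
from collections import deque


def forward_to_service(request):
    """Stand-in for proxying the request to the backend service."""
    print("forwarding", request)


class LeakyBucket:
    """Queue incoming requests and release them to the services at a fixed rate.
    Requests beyond `capacity` overflow and are rejected (or could be delayed further)."""

    def __init__(self, rate_per_minute: int, capacity: int):
        self.interval = 60.0 / rate_per_minute  # seconds between releases
        self.capacity = capacity
        self.queue = deque()
        self.lock = threading.Lock()
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, request) -> bool:
        """Accept a request into the bucket; False means the bucket overflowed."""
        with self.lock:
            if len(self.queue) >= self.capacity:
                return False
            self.queue.append(request)
            return True

    def _drain(self):
        """Forward at most one queued request per interval."""
        while True:
            with self.lock:
                request = self.queue.popleft() if self.queue else None
            if request is not None:
                forward_to_service(request)
            time.sleep(self.interval)


# One bucket per tenant, sized from the contracted requests/minute plus a margin.
bucket = LeakyBucket(rate_per_minute=30, capacity=100)
bucket.submit({"tenant": "acme", "path": "/analyze"})
time.sleep(5)  # give the drain thread a moment in this demo
```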
My questions are:
Is the proposed strategy a viable one? Do you know of any other viable strategies for this use case?
Is there an open-source, commercial or SaaS service that can provide these traffic shaping capabilities? As far as I know, Kong and Tyk do not support anything like this, so... is there any other API gateway that does?
In case Kong does not support this, how hard would it be to implement something like what I've described as a plugin? We have to take into account that it would need some shared state (using Redis, for example) as we're using multiple Kong instances (for load balancing and high availability).
Thank you very much,
Mikel.
Managing a request queue on the gateway side is indeed a tricky thing, and probably the main reason it is not implemented in these gateways is that it is really hard to do right. You need to handle all the distributed-system cases, and in addition it is hard to make it "safe", because "slow" clients quickly consume machine resources.
Such a pattern is usually offloaded to client libraries: when the client hits the rate-limit status code, it uses something like an exponential backoff technique to retry requests. It is way easier to scale and implement.
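A minimal sketch of that client-side retry, assuming a requests-style response object and HTTP 429 as the rate-limit status code:

```python
import random
import time


def call_with_backoff(send_request, max_attempts: int = 6):
    """Retry while the gateway answers 429, waiting exponentially longer each time."""
    for attempt in range(max_attempts):
        response = send_request()
        if response.status_code != 429:           # not rate limited: done
            return response
        delay = (2 ** attempt) + random.random()  # exponential backoff with jitter
        time.sleep(delay)
    raise RuntimeError("gave up after repeated rate-limit responses")
```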
Can't say for Kong, but Tyk, in this case, provides two basic numbers you can control: quota, the maximum number of requests a client can make in a given period of time, and rate limits, a safety protection. You can set a rate limit 1) per "policy", e.g. for a group of consumers (for example, if you have multiple tiers of your service with different allowed usage/rate limits), 2) per individual key, 3) globally for the API (works together with key rate limits). So, for example, you can set some moderate client rate limits, and cap the total limit with the global API setting.
If you want a fully dynamic scheme that re-calculates limits based on cluster load, it should be possible. You will need to write and run this scheduler somewhere; from time to time it will perform a re-calculation based on current total usage (which Tyk calculates for you, and which you get from Redis), and it will talk to the Tyk API, iterating through all keys (or policies) and dynamically updating their rate limits.
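A rough sketch of such a scheduler; the Gateway API endpoint, the x-tyk-authorization header and the rate/per session fields are written from memory and should be verified against the Tyk docs for your version, and the rebalancing policy is just a toy:

```python
import requests

TYK_API = "http://localhost:8080/tyk"  # Tyk Gateway API (assumed default port)
TYK_SECRET = "change-me"               # gateway secret from tyk.conf


def update_key_rate(key_id, rate, per_seconds=60):
    """Rewrite a key's rate limit through the Gateway API.
    Verify endpoint, header and field names against the Tyk docs."""
    session = requests.get(
        f"{TYK_API}/keys/{key_id}",
        headers={"x-tyk-authorization": TYK_SECRET},
    ).json()
    session["rate"] = rate
    session["per"] = per_seconds
    requests.put(
        f"{TYK_API}/keys/{key_id}",
        headers={"x-tyk-authorization": TYK_SECRET},
        json=session,
    )


def rebalance(keys, cluster_capacity_per_minute, usage):
    """Toy policy: hand idle capacity to the busiest tenants, keeping a safety floor."""
    total = sum(usage.values()) or 1
    for key in keys:
        share = usage.get(key, 0) / total
        update_key_rate(key, rate=max(10, int(cluster_capacity_per_minute * share)))
```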
Hope it makes sense :)

Can ImageResizer be exploited if presets are not enabled?

My project uses the Presets plugin with the flag onlyAllowPresets=true.
The reason for this is to close a potential vulnerability where a script might request an image thousands of times, resizing with 1px increment or something like that.
My question is: Is this a real vulnerability? Or does ImageResizer have some kind of protection built-in?
I kind of want to set the onlyAllowPresets to false, because it's a pain in the butt to deal with all the presets in such a large project.
I only know of one instance where this kind of attack was performed. If you're that valuable of a target, I'd suggest using a firewall (or CloudFlare) that offers DDOS protection.
An attack that targets cache-misses can certainly eat a lot of CPU, but it doesn't cause paging and destroy your disk queue length (bitmaps are locked to physical ram in the default pipeline). Cached images are still typically served with a reasonable response time, so impact is usually limited.
That said, run a test, fake an attack, and see what happens under your network/storage/cpu conditions. We're always looking to improve attack handling, so feedback from more environments is great.
Most applications or CMSes will have multiple endpoints that are storage- or CPU-intensive (often a wildcard search). Not to say that this is good - it's not - but the most cost-effective layer to handle this is often the firewall or CDN. And today, most CMSes include some (often poor) form of dynamic image processing, so remember to test or disable that as well.
Request signing
If your image URLs originate from server-side code, then there's a clean solution: sign the URLs before spitting them out, and validate during the Config.Current.Pipeline.Rewrite event. We'd planned to have a plugin for this shipping in v4, but it was delayed - and we've only had ~3 requests for the functionality in the last 5 years.
The sketch for signing would be:
Sort querystring by key
Concatenate path and pairs
HMACSHA256 the result with a secret key
Append to end of querystring.
For verification:
Parse the query,
Remove the hmac
Sort query and concatenate path as before
HMACSHA256 the result and compare to the value we removed.
Raise an exception if it's wrong.
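A minimal sketch of those two procedures in Python (ImageResizer itself is .NET, so this only shows the shape of the scheme; the hmac parameter name and the dict-based query handling are illustrative):

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET_KEY = b"replace-with-a-real-secret"


def sign_url(path: str, query: dict) -> str:
    """Sort the querystring, concatenate with the path, HMAC it, append the tag."""
    message = path + "?" + urlencode(sorted(query.items()))
    tag = hmac.new(SECRET_KEY, message.encode(), hashlib.sha256).hexdigest()
    return message + "&hmac=" + tag


def verify_url(path: str, query: dict) -> bool:
    """Remove the hmac, rebuild the message the same way, compare in constant time."""
    query = dict(query)
    tag = query.pop("hmac", "")
    message = path + "?" + urlencode(sorted(query.items()))
    expected = hmac.new(SECRET_KEY, message.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch")   # raise if it's wrong
    return True


signed = sign_url("/images/photo.jpg", {"width": "300", "mode": "crop"})
```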
Our planned implementation would permit 'whitelisted' variations - certain values that a signature would permit to be modified by the client - say, for breakpoint-based width values. This would be done by replacing the targeted key/value pairs with a serialized whitelist policy prior to signing. For validation, pairs targeted by a policy would be removed prior to signature verification, and policy enforcement would happen if the signature was otherwise a match.
Perhaps you could add more detail about your workflow and what is possible?

LDAP Server side sorting - really a good idea?

I'm toying with using server-side sorting in my OpenLDAP server. However, as I also get to write the client code, I can see that all it buys me in this case is one line of sorting code at the client. And as the client is one of presently 4, soon to be 16, Tomcats (maybe hundreds if the usage balloons), sorting at the client actually makes more sense to me. I'm wondering whether SSS is really considered much of an idea. My search results in this case aren't large, dozens rather than hundreds. Just wondering whether it might be more of a weapon than a tool.
In OpenLDAP it is bundled with VLV (Virtual List View), which I will need some day, so it is already installed. So it's really a programming question, not just a configuration question, hence SO not SF.
Server-side sorting is intended for use by clients that are unable or unwilling to sort results themselves; this might be useful in hand-held clients with limited memory and CPU mojo.
The advantages of server-side sorting include, but are not limited to:
the server can enforce a time limit on the processing of the sorting
clients can specify an ordering rule for the server to use
professional-quality servers can be configured to reject requests with sort controls attached if the client connection is not secure
the server can enforce resource limits, for example, the aforementioned time limit, or administration limits
the server can enforce access restrictions on the attributes and on the sort request control itself; this may not be that effective if the client can retrieve the attributes anyway
the server may indicate it is too busy to perform the sort or simply unwilling to perform the sort
professional-quality servers can be configured to reject search requests for all clients except for clients with the necessary mojo (privilege, bind DN, IP address, or whatever)
The disadvantages include, but are not limited to:
servers can be overwhelmed by sorting large result sets from multiple clients if the server software is unable to cap the number of sorts to process simultaneously
client-side APIs have to support the server-side sort request control and response
it might be easier to configure clients to sort by their own 'ordering rules'; although these can be added to professional-quality, extensible servers
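For reference, requesting the control from a client looks roughly like this with python-ldap; the SSSRequestControl import and the ordering-rule syntax should be double-checked against your client library (the clients here are Tomcats, so this is only illustrative):

```python
import ldap
from ldap.controls.sss import SSSRequestControl

conn = ldap.initialize("ldap://ldap.example.com")
conn.simple_bind_s("cn=reader,dc=example,dc=com", "secret")

# Ask the server to sort by cn; non-critical, so the search still succeeds
# if the server refuses or does not support the control.
sort_control = SSSRequestControl(criticality=False, ordering_rules=["cn"])

msgid = conn.search_ext(
    "ou=people,dc=example,dc=com",
    ldap.SCOPE_SUBTREE,
    "(objectClass=person)",
    ["cn", "mail"],
    serverctrls=[sort_control],
)
rtype, rdata, rmsgid, serverctrls = conn.result3(msgid)
```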
To answer my own question, and not to detract from Terry's answer, use of the Virtual List View requires a Server Side Sort control.