Is blockchain a decentralised database? - bitcoin

I understand bitcoin uses blockchain technology to maintain a decentralised ledger of all transactions. I have also read many posts alluding to future applications of blockchain technology, none of which have been very clear to me.
Is blockchain technology simply a decentralised database with consensus validation of the data? If that is the case, surely the database would grow too large to be effectively decentralised?
To help me understand, can anyone point me to a clear example of a non-bitcoin blockchain application?

Yes, it's true that the blockchain database grows over time, which is what is called "blockchain bloat". Currently Bitcoin's blockchain grows by roughly less than 100 MB a day. Today (2016) the Bitcoin blockchain takes up about 60-100 GB of space, which took about 6 years to accumulate. It is indeed growing faster, but growth is also limited by the block size cap of 1 MB per block (every 10 minutes). Some proposed solutions have been:
SPV (Simplified Payment Verification) clients: this is how your phone avoids downloading the entire blockchain; it retrieves the data it needs from full nodes that store the whole chain.
Lightning Network: an off-chain payment-channel layer that lets Bitcoin scale beyond the 1 MB block size cap.
Those are just some of the solutions for Bitcoin that I know of. As for altcoin-related solutions: NXT/Ardor has implemented pruned data. Because NXT/Ardor lets you upload arbitrary data and messages onto its blockchain, the bloat is much more apparent in that scenario. The NXT/Ardor blockchain can delete the data itself after two weeks and keep only the hash of the data on the blockchain, which takes just a few KB. Nodes can also retain all of the blockchain data by turning pruning off, which marks a node as an archival node; other nodes can replicate from it and become archival nodes themselves.
From my understanding, NXT/Ardor is one of the few blockchains with a production-ready decentralised data storage system, marketplace, stock exchange, and messaging system built into the blockchain.
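For intuition, here is a minimal sketch in plain Python (not NXT/Ardor code) of the idea behind pruned data: the full payload can be thrown away after the retention window as long as its hash stays on-chain, because anyone who still holds the original data can prove it matches.

```python
import hashlib

def prune(payload: bytes) -> str:
    """Keep only the SHA-256 digest of an arbitrary payload: a few dozen
    bytes on-chain instead of the full data."""
    return hashlib.sha256(payload).hexdigest()

def verify(payload: bytes, on_chain_digest: str) -> bool:
    """An archival node (or anyone holding the original data) can still
    show that the payload matches the digest kept on-chain."""
    return hashlib.sha256(payload).hexdigest() == on_chain_digest

digest = prune(b"some arbitrary uploaded document")
assert verify(b"some arbitrary uploaded document", digest)
```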

Blockchain is not just a decentralised database; it is much more than that. While the original Bitcoin blockchain allowed only value to be transferred, along with limited data with every transaction, several new blockchains developed in the past 2-3 years have much more advanced native scripting and programming capabilities.
Apart from the Bitcoin blockchain, I would say that there are a few other major blockchains, such as Ethereum, Ripple, R3's Corda, and Hyperledger. Although Ethereum has a cryptocurrency called Ether, it is really a platform built around a Turing-complete EVM (Ethereum Virtual Machine). Using Ethereum, you can create smart contracts that themselves run in a decentralised manner. As a developer, it opens up completely new avenues and changes your perspective on writing programs. While Ripple is mainly geared towards payments, Corda and Hyperledger are built as private/permissioned blockchains to address issues such as scalability, privacy, and identity. The target markets for Hyperledger and Corda are mostly banks and other financial institutions.
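To give a flavour of what this looks like from the developer's side, here is a minimal sketch using the third-party web3.py library to read state from an Ethereum node; the endpoint and address are placeholders, and this only reads chain data rather than deploying a smart contract.

```python
from web3 import Web3  # pip install web3

# Placeholder endpoint: point this at your own node or a hosted provider.
w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))

# Blocks and transactions are ordinary queryable chain state.
latest = w3.eth.get_block("latest")
print("block number:", latest.number)
print("transactions in block:", len(latest.transactions))

# Account balances are chain state too, keyed by address.
addr = "0x0000000000000000000000000000000000000000"  # placeholder address
print("balance (wei):", w3.eth.get_balance(addr))
```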
As for non-Bitcoin applications of blockchain, you can certainly look at companies like ConsenSys (multiple different use cases on blockchain), Digix Global (gold tokens on the blockchain), Everledger (tracking of diamonds on the blockchain), Otonomos (company registration on the blockchain), and OT Docs (trade finance and document versioning on the blockchain), amongst others.

Blockchain is:
Name for a data structure,
Name for an algorithm,
Name for a suite of technologies,
An umbrella term for purely distributed peer-to-peer systems with a common application area,
A peer-to-peer-based operating system with its own unique rule set that utilizes hashing to provide unique data transactions on a distributed ledger.

Blockchain is much more than a "database". Yes, the blocks on the chain store data, but it is more like a service. There are many applications of blockchain. Read about them: here. If you want to see the code of a blockchain application, try this one: here.

Blockchain is a combination of a P2P network, a decentralised database, and asymmetric cryptography.
P2P network means you can transfer data between two different network nodes without any middleman; decentralised database means every node of the network holds a replica of the database; and asymmetric cryptography means you can use digital signatures to validate the authenticity and integrity of a message.
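To illustrate the asymmetric-cryptography part, a minimal sketch with the third-party ecdsa package and the secp256k1 curve that Bitcoin uses (the library choice is mine, purely for illustration): the private key signs, and anyone holding the public key can check both authenticity and integrity.

```python
from ecdsa import SigningKey, SECP256k1, BadSignatureError  # pip install ecdsa

private_key = SigningKey.generate(curve=SECP256k1)
public_key = private_key.get_verifying_key()

message = b"send 1 coin to Alice"
signature = private_key.sign(message)

# Verification succeeds only for the exact message that was signed.
print(public_key.verify(signature, message))  # True

try:
    public_key.verify(signature, b"send 100 coins to Mallory")
except BadSignatureError:
    print("tampered message rejected")
```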

Related

Analyse huge amount of blockchain data

I am trying to go over all the transaction data from every block on the Bitcoin blockchain from the previous 4 years. With almost 2k transactions per block, it will take a lot of queries per block.
I have a full node running locally and I tried two ways:
Python with RPC: This is very slow and keeps losing connection after some time (httpx.ReadTimeout)
Python with os.popen commands: Doesn't have the connection problem, but still very slow.
Would there be any other way? Any recommendation on how to analyze bulk data from the blockchain? The methods listed above are unfeasible given the time it would take.
EDIT: The problem isn't memory, but the time the bitcoin node takes to answer the queries.
Hey, there are different ways to fetch Bitcoin blockchain data:
Network level, using P2P messages (this method doesn't require setting up a node)
Parsing the .blk files that your node synchronizes
Querying the RPC application interface
P2P messages and .blk files are raw encoded, so you will need to decode blocks and transactions yourself.
The RPC interface abstracts away the raw decoding, but it's slower (because it decodes).
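If you stay on the RPC route, batching many calls into one HTTP request (which Bitcoin Core's JSON-RPC interface accepts) removes most of the per-call overhead. A rough sketch with requests, assuming a local node; the credentials, heights and batch size are placeholders:

```python
import requests  # pip install requests

RPC_URL = "http://127.0.0.1:8332"
AUTH = ("rpcuser", "rpcpassword")  # placeholders: use your node's credentials

def rpc_batch(calls):
    """Send a list of (method, params) pairs as a single JSON-RPC batch."""
    payload = [{"jsonrpc": "1.0", "id": i, "method": m, "params": p}
               for i, (m, p) in enumerate(calls)]
    resp = requests.post(RPC_URL, json=payload, auth=AUTH, timeout=120)
    resp.raise_for_status()
    results = sorted(resp.json(), key=lambda item: item["id"])
    return [item["result"] for item in results]

# 100 block hashes in one round trip, then the fully decoded blocks
# (verbosity 2 includes every transaction) in a second round trip.
heights = range(700000, 700100)          # placeholder height range
hashes = rpc_batch([("getblockhash", [h]) for h in heights])
blocks = rpc_batch([("getblock", [h, 2]) for h in hashes])
print(sum(len(b["tx"]) for b in blocks), "transactions")
```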
We wrote a paper with Matthieu Latapy giving instructions for collecting the whole Bitcoin blockchain and indexing it so that parsing is efficient.
Step-by-step procedure
Full paper
Repository
Website

Best technology for building race simulation application

I am trying to do something new, something I have never done before. I am looking for advice, or to be pointed in the right direction on how to choose a technology. I am trying to build a race simulation app that will have thousands of IoT devices streaming data into a central platform. I understand that I can use some sort of IoT hub with cloud providers, but what technology do I choose for storing the data?
An example is an online indoor biking app. There are apps where you can connect your indoor bike online and have a simulated race. For my project I am trying to build something similar. Do I use a NoSQL database in this scenario? What technology will let an application like this scale better, since there could be millions of devices around the world in a "simulated" race? I am not worried about the front end and things like that, but about the backend, the IoT hub, storing the data, and presenting it in real time.
At this point it is important to understand what kind of data your IoT devices will stream, and at what rate. That will have a significant impact on the answer to your question.
That is, if it's just location information and some other small data sent, let's say, once a second, then even if you're talking about tens of thousands of devices this is not a big load of information, and any standard database, like MySQL, will be able to deal with it. You will of course need multi-threaded server(s) capable of handling many requests in parallel.
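For the simple-telemetry case, a rough sketch of what the write path could look like, using PyMySQL and a made-up readings table; the schema, table and column names are all assumptions for illustration:

```python
import pymysql  # pip install pymysql

# Hypothetical schema:
#   CREATE TABLE readings (
#     device_id   VARCHAR(64),
#     recorded_at DATETIME,
#     speed_kmh   FLOAT,
#     power_w     FLOAT,
#     INDEX (device_id, recorded_at)
#   );

conn = pymysql.connect(host="127.0.0.1", user="race", password="secret",
                       database="telemetry")  # placeholder credentials

def store_batch(rows):
    """Insert a batch of (device_id, recorded_at, speed_kmh, power_w) tuples.
    Batching the once-a-second readings keeps the write rate manageable."""
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO readings (device_id, recorded_at, speed_kmh, power_w) "
            "VALUES (%s, %s, %s, %s)",
            rows,
        )
    conn.commit()

store_batch([("bike-0001", "2024-01-01 12:00:00", 34.2, 250.0)])
```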
If your IoT devices will stream HD video, then you're looking at a completely different solution, with a much stronger server capable of handling a lot of streams in parallel, with significant bandwidth requirements from your hosting company, as well as storage space for all the videos. In this case you will store the streams as files (if you need them later on), and you won't need any special database either.
In any case, once you reach millions of users, you'll be able to scale most modern databases and servers, for example with MySQL's replication capability. Take a look at how Wikipedia relies on MySQL: https://www.mysql.com/why-mysql/case-studies/mysql-cs-wikipedia.html
So I wouldn't worry about the database at this stage, but I would make sure that the design of the system matches the type of data and the rate at which it is streamed.
Hope this gives you a pointer.

Does a cryptocurrency wallet blockchain need to be fully synched before it can be mined?

I have a mining pool that is going to take several days to sync the blockchain. I'm wondering if I can have miners mining on it before the blockchain syncs, or if I have to wait before blocks will be generated. I suspect that I do, but perhaps this lovely site will erase that suspicion entirely :)
Are you talking about the Bitcoin network or a private blockchain network?
Yes, you have to sync the entire blockchain wallet.
As we know, mining is the process of adding transactions to the large distributed public ledger of existing transactions, and the cryptocurrency wallet's blockchain keeps syncing automatically.
Mining and syncing of blocks work simultaneously. Can you please explain what exactly you want to achieve?
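Assuming the pool runs against a Bitcoin-Core-compatible node (an assumption, since the coin isn't stated), you can at least ask the node whether it has left initial block download before pointing miners at it. A rough sketch over the getblockchaininfo RPC, with placeholder credentials:

```python
import requests  # pip install requests

def is_synced(url="http://127.0.0.1:8332", auth=("rpcuser", "rpcpassword")):
    """Return True once the node reports it is out of initial block download."""
    payload = {"jsonrpc": "1.0", "id": 0,
               "method": "getblockchaininfo", "params": []}
    info = requests.post(url, json=payload, auth=auth, timeout=30).json()["result"]
    print("verification progress:", info["verificationprogress"])
    return not info["initialblockdownload"]

if is_synced():
    print("node is synced; safe to start handing out work to miners")
```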

Protocol for remote logging of temperature, gas/electricity consumption

So, I'm managing a series of rented holiday homes, which all have dynamic IP, ADSL Internet connections.
We've wanted to keep track of a few types of data, e.g. per-room electricity usage, hot water temperature, thermostat setting, gas usage, network bandwidth usage, etc etc, and keep these centrally so we can perform analytics and graph them in real-time.
I'm comfortable building the hardware required to log these variables every 1-5 seconds and get them into e.g. a Raspberry Pi, but I'm wondering what kind of framework would be suitable for transferring and storing the data on the server side.
My initial thought was something like SNMP, but a) this doesn't seem designed for non-network uses, b) it's not very secure, and c) I'm looking for something agent-to-server (so I don't have to know the IP of the agent, and it'll also traverse NAT, so I can have multiple devices logging different things on the same network.)
My second thought was something using a REST API, but making potentially hundreds of API calls per second via different TCP connections seems a bit wasteful.
I came across Cubism, but this seems to have the same disadvantages as some sort of REST API; there's a lot of redundant data transmitted on every connection if I were to send the data every 5 seconds per sensor.
Names like AMQP and MQTT come up, though none of these seem particularly suited (natively) to travelling over the public Internet without configuring VPNs etc.
Thoughts?
[This doesn't seem like a particularly niche problem, now I think about it - weather logging, share price, etc etc... although this is probably a smaller interval]
I have a geospatial/environmental monitoring background and can tell you something about two major standards which are used today in environmental/infrastructure (electricity and water supply network) monitoring sensor networks.
Proprietary one: Most sensors simply store time-series measurements in their own local data format. A server process polls every sensor from time to time to gather the time-series data (in most cases via a simple GPRS uplink), transforms it into an exchange format, and then stores it in a centralized database where you can work with the data. One of the industry-leading companies is Kisters AG, with its exchange format ZRXP. So this is simply storing time-series data in an ASCII format (i.e. ZRXP) and importing it into a database after calling the sensor over whatever connection is available.
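In code, that pull model boils down to a loop like the sketch below; the sensor endpoints, the JSON payload shape, and the store callback are all placeholders standing in for whatever format (e.g. ZRXP) and database you actually use:

```python
import time
import requests  # pip install requests

SENSORS = ["http://sensor-01.example/timeseries",   # placeholder endpoints
           "http://sensor-02.example/timeseries"]

def collect_once(store):
    """Poll each sensor, normalise its readings, and hand them to the store."""
    for url in SENSORS:
        raw = requests.get(url, timeout=10).json()   # assumed JSON payload
        rows = [(raw["sensor_id"], r["timestamp"], r["value"])
                for r in raw["readings"]]
        store(rows)   # e.g. INSERT into a central time-series table

def run(store, interval_s=300):
    """Gather data from every sensor on a fixed schedule."""
    while True:
        collect_once(store)
        time.sleep(interval_s)
```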
Open geospatial standard: Sensor Observation Service (SOS) and SensorML, which I think fit your needs better, because these are web service specifications, whereas the proprietary approach above is a complete system solution built by one vendor. There is a nearly ready-to-use Java reference implementation of SOS provided by 52 north, which should run easily on a Pi. Although the SOS specification has a very strong geospatial background, that does not mean it can't be adapted for your purpose, I think. At least SensorML should give you some ideas.

Application Level Replication Technologies

I am building out a solution that will be deployed in multiple data centers in multiple regions around the world, with each data center having a replicated copy of data actively updated in each region. I will have a combination of multiple databases and file systems in each data center, the state of which must be kept consistent (within a data center). These multiple repositories will be fronted by a SOA service tier.
I can tolerate some latency in the replication, and need to allow for regions to be off-line, and then catch up later.
Given the multiple back-end repositories of data, I can't easily rely on independent replication solutions for each one to maintain a consistent state. I am thus led to implementing replication at the application layer, by replicating the SOA requests in some manner. I'll need to make sure that replication loops don't occur, and that last-writer conditions are sorted out correctly.
In your experience, what is the best pattern for solving this problem, and are there good products (free or otherwise) that should be investigated?
Lotus/Domino is your answer. I've been working with it for ten years and it's exactly what you need. It may not be trendy (a perception that I would challenge), but it's powerful, adaptable and very secure. The latest version, R8, is the best yet.
You should definitely consider IBM Lotus Domino. A Lotus Notes database can replicate between sites on a predefined schedule. Replication in Notes/Domino is definitely a very powerful feature and enables full replication of data between sites. Even if a server is unavailable, the next time it connects it will simply replicate and get back in sync.
As far as the SOA service tier goes, you could then use Domino Designer to write a web service. Since Notes/Domino 7.5.x (I believe), Domino has been able to provision and consume web services.
As others have advised, I will also recommend Lotus Notes/Domino. 8.5 is a really powerful application development platform.
You don't give enough specifics to be certain of your needs, but I think you should check out SQL Server merge replication. It allows asynchronous replication of multiple databases with full conflict resolution. You will need to designate a global master, and all the other databases will replicate with that one, but all the database instances are fully functional (read/write), so you can schedule replication at whatever intervals suit you. If any region goes offline it can catch up later with no issues; if the master goes offline, everyone will work independently until replication can resume.
I would be interested to know of other solutions this flexible (apart from Lotus Notes/Domino of course which is not very trendy these days).
I think that your answer is going to have to be based on a pub/sub architecture. I am assuming that you have reliable messaging between your data centers so that you can rely on published updates being received eventually. If all of your access to the data repositories is via service you can add an event notification to the orchestration of each of your update services that notifies all interested data centers of the event. Ideally the master database is the only one that sends out these updates. If the master database is the only one sending the updates you can exclude routing the notifications to the node that generated them in the first place thus avoiding update loops.
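As a sketch of that loop-avoidance idea (the names and the send callback are invented for illustration; in practice it would hand off to your message broker): every update event carries the ID of the data centre that originated it, and the publisher never routes the event back to that origin.

```python
DATA_CENTERS = {"us-east", "eu-west", "ap-south"}  # placeholder site IDs

def publish_update(event: dict, send):
    """Fan an update out to every data centre except the one that originated
    it, which is what prevents replication loops."""
    origin = event["origin"]          # set once, by the node that took the write
    for dc in DATA_CENTERS - {origin}:
        send(dc, event)               # e.g. enqueue on a reliable message queue

def handle_service_write(dc_id: str, payload: dict, send):
    """Wrap a local SOA write so that it also emits a replication event."""
    event = {"origin": dc_id, "payload": payload}
    # ... apply the write to the local databases/filesystems here ...
    publish_update(event, send)

# A write taken in eu-west is replayed only to us-east and ap-south.
handle_service_write("eu-west", {"doc": "order-42", "version": 7},
                     send=lambda dc, ev: print("->", dc, ev["payload"]["doc"]))
```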