- Amazon CloudWatch
- AWS CodeDeploy
- AWS CodeBuild
- AWS CodeCommit
- Amazon Route 53
- Amazon Kinesis
- ...
Pricing example
5 million requests * 3.50 USD/million = 17.50 USD
3 KB * 5 million = 15 million KB ≈ 14.3 GB (1 GB = 1024 * 1024 KB)
14.3 GB * 0.09 USD/GB = 1.29 USD
17.50 USD + 1.29 USD = 18.79 USD
0 requests => ~0 USD
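The same arithmetic can be wrapped in a small function to try other traffic levels. This is a sketch: `estimateCost` is a hypothetical name, and the default prices are simply the ones used in the example above, not current list prices for any service.

```javascript
// Hypothetical helper: default prices are the figures used in the example
// (3.50 USD per million requests, 0.09 USD per GB transferred).
function estimateCost(requests, kbPerRequest, pricePerMillion = 3.5, pricePerGB = 0.09) {
  const requestCost = (requests / 1e6) * pricePerMillion;
  // Binary units, as in the example: 1 GB = 1024 * 1024 KB
  const transferredGB = (requests * kbPerRequest) / (1024 * 1024);
  const transferCost = transferredGB * pricePerGB;
  return requestCost + transferCost;
}

console.log(estimateCost(5e6, 3).toFixed(2)); // "18.79"
console.log(estimateCost(0, 3).toFixed(2));   // "0.00"
```

Because every term is proportional to the number of requests, the cost drops to zero when there is no traffic: that is the "turn fixed costs into variable costs" point of the next slide.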
Scalability Summary
- Scale capacity on demand
- Turn fixed costs into variable costs
- Always available
- Rock-solid reliability
- Cost-effective
- Reduce time to market
Monolith
Problems
- Difficult to scale
- Long development cycles (dev, build, test)
- Too many modules (an operations nightmare: who owns what?)
- Legacy architecture (hard to maintain and evolve)
- Adding features is difficult for developers
- Feature delivery can be really slow
It means...
- Lack of agility
- Lack of innovation
- Frustrated customers
From Services to Microservices
Composing and collecting APIs creates a new business model
in which an application is decoupled into many tiny services
What is a microservice?
service-oriented architecture composed of loosely coupled elements that have bounded contexts
Adrian Cockcroft @ Netflix
Create many tiny services
Instead of one big application that does everything
Do one thing, and do it well
Microservices achievements
- We can scale the development team
- Easier deployments (many more deployments)
- Reduce infrastructure costs
Delivery/Deployment...
hundreds of teams x
thousands of services x
different environments x
deployment time = ?
If we take 1 hour to manually build/test/deploy
100 teams x
1000 services x
3 environments x
1 hour
= 300,000 hours of deployments
= 12,500 days (24 hours each)
= 37,500 business days (8 hours each)
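The estimate above is just a product of four factors. As a quick sanity check, here is a sketch with a hypothetical `manualDeploymentEffort` helper that reproduces it, assuming 24-hour calendar days and 8-hour business days.

```javascript
// Hypothetical helper reproducing the back-of-the-envelope estimate:
// the factors are simply multiplied, as on the slide.
function manualDeploymentEffort(teams, services, environments, hoursPerDeploy) {
  const hours = teams * services * environments * hoursPerDeploy;
  return {
    hours,
    days: hours / 24,        // calendar days, working around the clock
    businessDays: hours / 8, // 8-hour working days
  };
}

console.log(manualDeploymentEffort(100, 1000, 3, 1));
// { hours: 300000, days: 12500, businessDays: 37500 }
```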
Continuous delivery/deployment
Delivery automation
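If we have 300,000 deployments per year,
our delivery pipeline must deploy roughly every
105 seconds around the clock (about every 25 seconds counting only 260 business days of 8 hours)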
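The interval comes from dividing the time available in a year by the number of deployments. A sketch; the 260 working days of 8 hours are an assumption, not a figure from the slides.

```javascript
const SECONDS_PER_YEAR = 365 * 24 * 3600;        // 31,536,000 s, pipeline always on
const BUSINESS_SECONDS_PER_YEAR = 260 * 8 * 3600; // assumption: 260 working days of 8 hours

// Average time available per deployment for a given yearly volume.
function deployInterval(deploysPerYear, secondsAvailable) {
  return secondsAvailable / deploysPerYear;
}

console.log(deployInterval(300000, SECONDS_PER_YEAR).toFixed(1));          // "105.1"
console.log(deployInterval(300000, BUSINESS_SECONDS_PER_YEAR).toFixed(1)); // "25.0"
```

Either way, no human can keep up with that pace: the pipeline has to be automated.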
AWS CodeBuild/CodePipeline
Measurement is the key to science
Now we have tiny services and
we can deploy them easily
How can we save resources and deploy more applications in a consolidated environment?
Instance resources
More applications?
resource contention
In computer science, resource contention is a conflict over access to a shared resource such as random access memory, disk storage, cache memory, internal buses or external network devices.
Lightweight, isolated environment for applications
Caps: limits on resources (CPU, memory, networking)
It looks like a kind of virtual machine, but it is not!
If with virtual machines we can run 15-20 instances per physical host, with containers we can spawn hundreds if not thousands of applications per physical host
Distributed architectures
Clusters for web applications
Microservices
More instances, more resources
API gateway for applications
We get a kind of supercomputer on which we can run our applications
Kubernetes & Docker Swarm
Orchestrators for Containers
Export control with Cloud & On-Prem federation
Many isolated but interconnected clusters for applications
Big Data
Big data is a collection of data sets
- so large and complex
- that they are difficult to process with traditional tools
Big Data Challenges
- Capture
- Storage
- Search
- Share
- Transfer
- Analysis
- Visualization
Common Scenario
- Data arrives at a high rate
- Data are typically unstructured or partially structured
- Data are stored without any kind of aggregation
Data Growth
- Scalable storage
- In 2010 Facebook claimed that they had 21 PB of storage
- 30 PB in 2011
- 100 PB in 2012
- 500 PB in 2013
Massively Parallel Processing
By 2012, anybody could process exabytes (10^18 bytes) of data in a reasonable amount of time
Reasonable Cost (Cloud Computing)
The New York Times used 100 Amazon EC2 instances and a Hadoop application to
process 4 TB of raw images into 11 million finished PDFs in the space of 24 hours,
at a computation cost of about $240
Big Data examples
- Huge amounts of data (petabytes of information)
- Many different, non-aggregated data sources
- ...
Petabytes of information
We cannot build a terabyte-sized index of our data (it does not fit in memory)
This means we hit a physical limit of traditional technologies (relational databases)
Different, non-aggregated data sources
We cannot join unstructured data coming from different data sources
How can we join information across our service-oriented infrastructure?
We can solve these computing problems by introducing a new way to analyze datasets
Map
A mapping is a transformation from one value to another value
const multiplyBy3 = x => x * 3;
[1, 2, 3].map(multiplyBy3); // => [3, 6, 9]
For example, if you start with the number 2 and you multiply it by 3, you have mapped it to 6.
Reduce
This operator combines the mapped values into the final result set
const sum = (acc, x) => acc + x;
[1, 2, 3]
  .map(multiplyBy3) // [3, 6, 9]
  .reduce(sum); // => 18
Extend to parallel computing
MapReduce is a highly scalable data processing model
It is all about brute force
Hadoop is a framework for distributed MapReduce
Thanks to the Hadoop ecosystem, we can use higher-level tools to analyze, visualize, etc. our data
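Distributed MapReduce adds a shuffle step between map and reduce that groups intermediate values by key, so each phase can run on many machines. The sketch below simulates this in a single process with word count, the usual introductory example; it is not Hadoop's API, and `mapChunk`, `shuffle` and `reduceGroups` are hypothetical names.

```javascript
// Single-process simulation of distributed MapReduce (word count).
// Each "chunk" stands for the slice of data a worker node would receive.

// Map phase: each worker turns its chunk into (key, value) pairs.
function mapChunk(lines) {
  const pairs = [];
  for (const line of lines) {
    for (const word of line.toLowerCase().split(/\s+/).filter(Boolean)) {
      pairs.push([word, 1]);
    }
  }
  return pairs;
}

// Shuffle phase: group all values by key across every worker's output.
function shuffle(mappedOutputs) {
  const groups = new Map();
  for (const pairs of mappedOutputs) {
    for (const [key, value] of pairs) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  return groups;
}

// Reduce phase: combine the values of each key into the final result.
function reduceGroups(groups) {
  const result = {};
  for (const [key, values] of groups) {
    result[key] = values.reduce((acc, v) => acc + v, 0);
  }
  return result;
}

const chunks = [["big data is big"], ["data is everywhere"]];
const counts = reduceGroups(shuffle(chunks.map(mapChunk)));
console.log(counts); // { big: 2, data: 2, is: 2, everywhere: 1 }
```

The map calls are independent of each other, and so are the per-key reductions: that independence is what lets a framework spread the brute-force work over a cluster.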
Apache Hive
Run SQL-like statements (HiveQL) over our distributed data
Well-known SQL statements
SELECT * FROM tickets t WHERE t.weekday = 12;
SELECT t.weekday, COUNT(*) FROM tickets t GROUP BY t.weekday;
SELECT * FROM tickets t JOIN users u ON (t.owner_id = u.id) WHERE u.id = 12;
etc...
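To connect Hive back to map/reduce: a grouped count can be expressed as a map step followed by a reduce step. The sketch below only illustrates the idea on made-up `tickets` rows; Hive's real query planner is far more involved.

```javascript
// Hypothetical ticket rows; only the fields used by the query are included.
const tickets = [
  { id: 1, owner_id: 12, weekday: 1 },
  { id: 2, owner_id: 7, weekday: 3 },
  { id: 3, owner_id: 12, weekday: 1 },
];

// Conceptually, SELECT t.weekday, COUNT(*) FROM tickets t GROUP BY t.weekday
// maps each row to (weekday, 1) and reduces the values of each key by summing.
const mapped = tickets.map(t => [t.weekday, 1]);
const countsByWeekday = mapped.reduce((acc, [weekday, one]) => {
  acc.set(weekday, (acc.get(weekday) || 0) + one);
  return acc;
}, new Map());

console.log(countsByWeekday); // Map(2) { 1 => 2, 3 => 1 }
```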
Blockchain, not Bitcoin
Bitcoin is just one application of a blockchain, but it helps us focus on the different aspects of blockchains
Ledger
User | Balance
Walter | 12.3
Giovanni | 1.7
Michele | 1.9
Paola | 19.3
Michela | 21.5
Who maintains the ledger?
Central authorities in general: banks, etc.
Why do we want to drop central authorities?
Transparency of transactions and tracking
...
So drop central authorities
Drop the central authorities
Walter
User | Balance
Walter | 12.3
Giovanni | 1.7
Michele | 1.9
Paola | 19.3
Michela | 21.5
Paola
User | Balance
Walter | 12.3
Giovanni | 1.7
Michele | 1.9
Paola | 19.3
Michela | 21.5
Giovanni
User | Balance
Walter | 12.3
Giovanni | 1.7
Michele | 1.9
Paola | 19.3
Michela | 21.5
Can we spot how many problems we have now?
- Privacy problems
- Fraudulent transactions
- Stale (out-of-date) ledgers
- ...
Privacy problem
Walter
User | Balance
123432 | 12.3
143524 | 1.7
534653 | 1.9
356346 | 19.3
498647 | 21.5
Paola
User | Balance
123432 | 12.3
143524 | 1.7
534653 | 1.9
356346 | 19.3
498647 | 21.5
Giovanni
User | Balance
123432 | 12.3
143524 | 1.7
534653 | 1.9
356346 | 19.3
498647 | 21.5
Otherwise, everybody would know my balance
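The numeric IDs in the tables above behave like pseudonymous addresses: balances stay public, but they are not directly tied to a name. A minimal sketch, assuming an address is derived by hashing a freshly generated public key; `createAccount` and the truncation to 40 hex characters are hypothetical choices, and Ed25519 keys are used only because Node.js supports them out of the box.

```javascript
const crypto = require("crypto");

// Simplified address: hex SHA-256 of the DER-encoded public key, truncated to 40 chars.
// Real Bitcoin addresses use extra hashing (RIPEMD-160) and Base58Check encoding.
function createAccount() {
  const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");
  const der = publicKey.export({ type: "spki", format: "der" });
  const address = crypto.createHash("sha256").update(der).digest("hex").slice(0, 40);
  return { address, publicKey, privateKey };
}

const walter = createAccount();
console.log(walter.address); // random 40-hex-character identifier, different on every run
```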
How can we solve the other problems if everybody has complete access to the ledger?
The way blockchain provides security is:
TRUST NOBODY
Blockchain relies on mathematical and cryptographic challenges to achieve this and protect itself
How to maintain the DISTRIBUTED ledger?
A blockchain does not store a computed ledger, but transactions
A transaction wraps a piece of information
Transactions
- From: 12058872352
- To: 39856829385
- Amount: 2
The ledger is just a concatenation of transactions
Transactions
- From: 12058872352
- To: 39856829385
- Amount: 2
Transactions
- From: 39857312434
- To: 19385738532
- Amount: 1.5
Transactions
- From: 98357389332
- To: 39573847838
- Amount: 0.2
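Since the ledger is only a list of transactions, balances are derived by replaying them, which is the same reduce operation used for big data. A sketch; `computeBalances` is a hypothetical name, and starting balances are assumed to be zero because the slides do not give them.

```javascript
// The three transactions listed above, as plain objects.
const transactions = [
  { from: "12058872352", to: "39856829385", amount: 2 },
  { from: "39857312434", to: "19385738532", amount: 1.5 },
  { from: "98357389332", to: "39573847838", amount: 0.2 },
];

// Replay the transactions to compute each address's net balance change
// (assumption: every address starts from zero).
function computeBalances(txs) {
  return txs.reduce((balances, { from, to, amount }) => {
    balances[from] = (balances[from] || 0) - amount;
    balances[to] = (balances[to] || 0) + amount;
    return balances;
  }, {});
}

console.log(computeBalances(transactions));
```

Every node that replays the same transactions in the same order computes the same balances, so there is no need to store (or trust) a precomputed ledger.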
Why a blockchain and not a transaction chain?
Because transactions are grouped into blocks
(of transactions, of course)
Block
- Transaction 1
- Transaction 2
- Transaction 3
Blocks are linked together to create the chain
Here is the blockchain
Block
- Transaction 1
- Transaction 2
- Transaction 3
Block
- Transaction 4
- Transaction 5
- Transaction 6
Block
- Transaction 7
- Transaction 8
- Transaction 9
How can we prevent the data of a block from being modified?
How can we prevent anybody from sending fraudulent transactions?
With asymmetric cryptography
Digital Signature
Public Key + Private Key
- Only the private key can sign a transaction/block
- Everybody can verify a signature of a transaction/block with the public key
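A minimal sketch of signing and verifying a transaction with Node's built-in `crypto` module. Ed25519 is used here for brevity (Bitcoin itself uses ECDSA over secp256k1), and serializing the transaction with `JSON.stringify` is an assumption of this example.

```javascript
const crypto = require("crypto");

// Each participant owns a key pair.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

const tx = JSON.stringify({ from: "12058872352", to: "39856829385", amount: 2 });

// Only the holder of the private key can produce this signature...
const signature = crypto.sign(null, Buffer.from(tx), privateKey);

// ...but anybody with the public key can check it.
const isValid = crypto.verify(null, Buffer.from(tx), publicKey, signature);

// Changing any field (here, the amount) invalidates the signature.
const forged = JSON.stringify({ from: "12058872352", to: "39856829385", amount: 200 });
const isForgedValid = crypto.verify(null, Buffer.from(forged), publicKey, signature);

console.log(isValid, isForgedValid); // true false
```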
Digital Signature
Block
- Transaction 1
- Transaction 2
- Transaction 3
- footprint (hash)
- Signature
We cannot tamper with a block (its transactions) without it being detected
How to link blocks together?
Every block also stores the hash of the previous block
Block
- hash: 1234567
- previous_hash: 46754736
- signature: 13564309683405
Block
- hash: 3456789
- previous_hash: 1234567
- signature: 23895732989535
Block
- hash: 94827523
- previous_hash: 3456789
- signature: 13985723985793
So nobody can rewire the chain without breaking every following link
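A minimal sketch of the hash links between blocks. Signatures are left out to keep it short, and `createBlock`, `isChainValid` and the `"0"` placeholder used as the first block's previous hash are hypothetical choices of this example.

```javascript
const crypto = require("crypto");

const sha256 = (text) => crypto.createHash("sha256").update(text).digest("hex");

// A block's hash covers its transactions and the previous block's hash.
function createBlock(transactions, previousHash) {
  const hash = sha256(JSON.stringify({ transactions, previousHash }));
  return { transactions, previousHash, hash };
}

// Recompute every hash and check every link to the previous block.
function isChainValid(chain) {
  return chain.every((block, i) => {
    const expected = sha256(JSON.stringify({ transactions: block.transactions, previousHash: block.previousHash }));
    const linked = i === 0 || block.previousHash === chain[i - 1].hash;
    return block.hash === expected && linked;
  });
}

const first = createBlock(["Transaction 1", "Transaction 2", "Transaction 3"], "0");
const second = createBlock(["Transaction 4", "Transaction 5", "Transaction 6"], first.hash);
const third = createBlock(["Transaction 7", "Transaction 8", "Transaction 9"], second.hash);
const chain = [first, second, third];

console.log(isChainValid(chain)); // true

// Tampering with an old block no longer matches its stored hash.
first.transactions[0] = "Forged transaction";
console.log(isChainValid(chain)); // false
```

Recomputing the tampered block's hash does not help: the next block still points to the old hash, so the attacker would have to rebuild every block that follows.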
Who introduces new blocks in a distributed network?
Everybody tries to create blocks
The first one that creates a new block in the network is the winner
To win a reward and create a new block for the chain, everyone in the network has to solve a complex mathematical problem
Solve a mathematical problem
Find an input whose hash starts with `n` zeros
sha256(1404) = 000ebec3ecebd727ce4f020441268ad34ac468cdb1c76ced442380b5ac842b7d
The difficulty is tuned so that the whole network needs about 10 minutes on average to solve it (Bitcoin)
So everybody tries to "mine" a new block...
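A minimal proof-of-work sketch: try nonces until the hash starts with `difficulty` hexadecimal zeros. This is a simplified illustration; Bitcoin actually compares the double SHA-256 of the block header against a numeric target, and the block string used here is made up.

```javascript
const crypto = require("crypto");

const sha256 = (text) => crypto.createHash("sha256").update(text).digest("hex");

// Brute-force search for a nonce such that sha256(blockData + nonce)
// starts with `difficulty` hexadecimal zeros.
function mine(blockData, difficulty) {
  const target = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    const hash = sha256(blockData + nonce);
    if (hash.startsWith(target)) {
      return { nonce, hash };
    }
  }
}

const blockData = "previous_hash=3456789;Transaction 7,Transaction 8";
const { nonce, hash } = mine(blockData, 4);
console.log(nonce, hash); // hash starts with "0000"; about 65,000 tries on average
```

Each extra zero multiplies the expected work by 16, while checking a solution always costs a single hash: finding is expensive, verifying is cheap.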
What happens if two actors compute a valid block at the same time?
This is a real problem, because it can lead to a serious issue:
Double Spending Problem
The longest chain always wins
Double spending on the latest block
Wait for 6 blocks on top of yours to get a strong confirmation of your transactions
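A minimal sketch of the "longest chain wins" rule and of counting confirmations. Blocks are plain labels here, and `selectChain` and `confirmations` are hypothetical names; Bitcoin really selects the chain with the most cumulative proof of work, which usually coincides with the longest one.

```javascript
// Fork choice: among the competing chains a node knows about, keep the longest.
function selectChain(chains) {
  return chains.reduce((best, chain) => (chain.length > best.length ? chain : best));
}

// Confirmations of the block at `index`: the block itself plus every block on top of it.
function confirmations(chain, index) {
  return chain.length - index;
}

// Two miners found a block at the same height: the chains fork after "B2".
const forkA = ["B0", "B1", "B2", "B3a"];
const forkB = ["B0", "B1", "B2", "B3b", "B4b"];

const winner = selectChain([forkA, forkB]);
console.log(winner[winner.length - 1]); // B4b (block B3a becomes an orphan)
console.log(confirmations(winner, 3));  // 2
```

A transaction that only appeared in the orphaned block is no longer in the winning chain, which is why waiting for several confirmations makes a double spend increasingly unlikely.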
Orphans and blockchain
The chain must keep growing continuously to provide stability
Empty blocks and blocks full of transactions are equivalent: the point is to minimize the double spending problem
But if we have to wait for 6 blocks on top of our transaction to be reasonably sure it is confirmed, we have to wait about 60 minutes (about 10 minutes per block)
Typically, we wait 2 business days to move money between bank accounts!
In a private blockchain we want all these features, but we are not interested in rewards or proof of work in general; instead, we want to insert blocks faster
Proof of Stake
Proof of stake is a different way to validate transactions and achieve distributed consensus.
The creator of a new block is chosen in a deterministic way, depending on its wealth, also called its stake
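A minimal, hypothetical sketch of stake-weighted selection: the previous block hash seeds the choice, and the chance of being picked is proportional to stake, so every node with the same data picks the same creator. The stakes reuse the ledger balances from the earlier slides; real proof-of-stake protocols add randomness beacons, penalties (slashing) and other protections not shown here.

```javascript
const crypto = require("crypto");

// Pick the next block creator with probability proportional to stake.
// The seed (here, the previous block hash) makes the choice deterministic.
function selectValidator(stakes, seed) {
  // Sort by name so every node iterates validators in the same order.
  const entries = Object.entries(stakes).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  const totalStake = entries.reduce((sum, [, stake]) => sum + stake, 0);
  // Turn the first 6 bytes of sha256(seed) into a number in [0, totalStake).
  const digest = crypto.createHash("sha256").update(seed).digest();
  const point = (digest.readUIntBE(0, 6) / 2 ** 48) * totalStake;
  let cumulative = 0;
  for (const [validator, stake] of entries) {
    cumulative += stake;
    if (point < cumulative) return validator;
  }
  return entries[entries.length - 1][0]; // guard against floating-point rounding
}

const stakes = { Walter: 12.3, Giovanni: 1.7, Michele: 1.9, Paola: 19.3, Michela: 21.5 };
console.log(selectValidator(stakes, "94827523")); // same result on every node
```

No puzzle has to be solved, so a new block can be produced as soon as the selected creator is ready: this is the speed a private blockchain is looking for.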
walter.dalmut @ corley.it