CCC review W8
Elasticsearch (I hate this database)
“Big data” challenges and architectures
The four Vs of big data: Volume, Velocity (how frequently new data arrives in the system), Variety (complexity of the data schema), Veracity (how far the data can be trusted; more data from more diverse, more unstructured sources means lower veracity)
Non-relational, NoSQL: SQL databases do a great job on consistency, but do not work well for big data
Data models for distributed systems:
key-value store (fast, structured), BigTable DBMS, document-oriented DBMS (stores data as structured documents such as JSON, e.g. Elasticsearch)
Database cluster: why we also need the DB to run as a cluster
Distribute the computing load over multiple computers, which improves availability
Store multiple copies of the data
A shard is a partition of the database (one part of the data); copies of the same shard can be stored on different nodes, and we can choose how many replicas to keep
Federated DB architecture: different DBs with different tables run on many nodes, but there is only one entry node
Elasticsearch cluster architecture:
Two node roles, master and data; one node can have more than one role.
More than one node can have the master role, but only one node in the cluster is the elected master at any time (the other nodes with the master role are master-eligible)
Every node can act as a coordinating node (coordinating query execution)
Indexes have shards and replicas; the cluster status is green (all primary and replica shards allocated), yellow (all primaries allocated but some replicas not, typically because there are not enough nodes to distribute shards and replicas) or red (some primary shard not allocated)
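The status colours above can be sketched as a small function. This is only an illustration of the rule, not how Elasticsearch computes it internally; `ShardCopy` and `cluster_status` are hypothetical names, and the "red" case (an unallocated primary) is added here for completeness.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    """One copy of a shard (hypothetical model, not an Elasticsearch type)."""
    index: str
    shard_id: int
    primary: bool    # True for the primary copy, False for a replica
    assigned: bool   # True if the copy is allocated to some node


def cluster_status(copies):
    """Return 'green', 'yellow' or 'red' for a list of ShardCopy objects."""
    # Any unallocated primary means some data cannot be served at all.
    if any(c.primary and not c.assigned for c in copies):
        return "red"
    # All primaries are fine, but at least one replica has nowhere to live.
    if any(not c.primary and not c.assigned for c in copies):
        return "yellow"
    return "green"


# A single-node cluster asked to keep one replica per shard:
# the replica cannot sit on the same node as its primary, so it stays unassigned.
single_node = [
    ShardCopy("logs", 0, primary=True, assigned=True),
    ShardCopy("logs", 0, primary=False, assigned=False),
]
print(cluster_status(single_node))
```

With only one node, the replica has no second node to go to, which is the usual reason a small test cluster shows yellow.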
Consistency (clients receive the same answer from every node of the cluster), availability (clients receive a response from at least one node), partition tolerance (the cluster keeps operating when one or more nodes are offline or cannot reach the rest)
Brewer's CAP theorem: you can only achieve two of the three at the same time
To achieve two of them:
Consistency and availability: two-phase commit
it works great when the cluster is co-located, but not so well when it is distributed
Availability and partition tolerance: Multi-Version Concurrency Control (MVCC)
Consistency and partition tolerance: Paxos; in this algorithm every node is either a proposer or an acceptor
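For the two-phase commit option in the list above, a minimal in-memory sketch may help. The `Participant` class and `two_phase_commit` function are hypothetical names for illustration; a real implementation also needs durable logs, timeouts and recovery, all of which are left out here.

```python
class Participant:
    """A node taking part in the transaction (hypothetical, in-memory only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: the node votes yes (prepared) or no (aborted).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit everywhere or roll back everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


nodes = [Participant("a"), Participant("b"), Participant("c", can_commit=False)]
print(two_phase_commit(nodes), [p.state for p in nodes])
```

Because the coordinator waits for every vote, a node that cannot be reached blocks the whole transaction, which is why this approach suits a co-located cluster better than a widely distributed one.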
Why document-oriented DBs for big data?
Sharding: the horizontal partitioning of a DB; database rows or documents are split into subsets that are stored on different nodes
Replication: storing the same rows or documents on different nodes to achieve fault tolerance
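Sharding and replication as described above can be sketched in a few lines. This is a toy under stated assumptions: `shard_for` and `place_shards` are hypothetical helpers, MD5 is used only because it is in the standard library, and the round-robin placement is not Elasticsearch's actual routing or allocation logic.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number by hashing (illustrative only)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def place_shards(num_shards: int, num_replicas: int, nodes: list) -> dict:
    """Place each shard's primary and replicas on distinct nodes, round-robin."""
    copies = num_replicas + 1  # the primary plus its replicas
    if copies > len(nodes):
        raise ValueError("not enough nodes to keep every copy on a separate node")
    placement = {}
    for shard in range(num_shards):
        # Consecutive nodes starting at a shard-specific offset are all distinct
        # as long as copies <= len(nodes).
        placement[shard] = [nodes[(shard + k) % len(nodes)] for k in range(copies)]
    return placement


layout = place_shards(num_shards=3, num_replicas=1, nodes=["n1", "n2", "n3"])
print(layout)
print(shard_for("doc-42", 3))
```

The same document id always hashes to the same shard, and losing any single node still leaves one copy of every shard on another node.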
Finally, Elasticsearch now
Pros: full-text search, retrieval of time-based data, storing unstructured data
Cons: bad at linked data
Concepts:
Index: comparable to a DB in a relational DBMS
Document: a data item of an index
Data stream: a set of indexes that follow the same naming pattern
Shard: a horizontal partition of an index
Replica: a copy of a shard
Node: an instance of ES
Cluster: multiple nodes that cooperate to manage the same index
Components:
ES
Filebeat: listens for updates in files and loads the updates into indexes
Metricbeat: monitors the status of the system
Logstash: transforms data coming from data sources
Kibana: front-end user interface
Scaling in ES can be either horizontal (adding more nodes to the cluster) or vertical (provisioning a more powerful node)
CCC review: system test
Test automatically, early, comprehensively, incrementally (write a little, test a little)
What do we test? Expected conditions, boundary conditions, error conditions
Test taxonomy: unit tests, integration tests, system tests (end-to-end tests)
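As a small illustration of the three kinds of conditions, here is a unit test sketch using Python's standard `unittest` module. The function under test, `average`, is a made-up example and not something from the course material.

```python
import unittest


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


class AverageTest(unittest.TestCase):
    def test_expected_condition(self):
        # Typical input: the mean of 1, 2, 3 is 2.
        self.assertEqual(average([1, 2, 3]), 2)

    def test_boundary_condition(self):
        # Smallest valid input: a single value is its own mean.
        self.assertEqual(average([5]), 5)

    def test_error_condition(self):
        # Invalid input: an empty list must be rejected.
        with self.assertRaises(ValueError):
            average([])
```

Saved in a file, this can be run with `python -m unittest`, which makes it easy to run automatically and often.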
Test iterations:
First iteration: functions, routes, end-to-end tests
Second iteration: parameters, database, functions that read from the DB
Third iteration: side-effect functions, wipe test DBs, test each case before the wipe, implement transactions, test queries
Fourth iteration: refactoring, unit tests
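The third iteration (side-effect functions, wiping the test DB, transactions, testing queries) can be sketched with an in-memory SQLite database. SQLite is an assumption made here to keep the example self-contained; the `users` table and the helpers `add_user` and `count_users` are hypothetical.

```python
import sqlite3
import unittest


def add_user(conn, name):
    """Side-effect function: inserts a row inside a transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def count_users(conn):
    """Query under test: number of rows in the users table."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


class UserDbTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory DB per test, so every case starts from a known state.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT NOT NULL UNIQUE)")

    def tearDown(self):
        # Closing an in-memory DB discards it: this is the wipe.
        self.conn.close()

    def test_insert_is_visible_to_query(self):
        add_user(self.conn, "alice")
        self.assertEqual(count_users(self.conn), 1)

    def test_failed_insert_leaves_data_unchanged(self):
        add_user(self.conn, "alice")
        with self.assertRaises(sqlite3.IntegrityError):
            add_user(self.conn, "alice")  # violates the UNIQUE constraint
        self.assertEqual(count_users(self.conn), 1)

    def test_each_test_starts_empty(self):
        self.assertEqual(count_users(self.conn), 0)
```

Each test makes its assertions before `tearDown` wipes the database, which matches the "test each case before the wipe" step.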
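Jin Hongyu's rehab diary, day 5
See, the pack of explosives is heavy, but it doesn't sound all that bad; there is just too much stuffed into the bag. There is a reason a person catches that look at a single glance. This morning Jin Hongyu was watching Scent of a Woman, where Al Pacino asks: do you know why the car is so heavy? Cuz you got the weight of the world on your shoulders.
Also, the other half-line that was put to him: If you are tangled up, just tango on. I don't much like the Chinese translation of that line, but if the road ahead has already started to turn into a mess, then just keep walking. If we are tangled up, can we tango on?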
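Jin Hongyu's rehab diary, day 4
Today Jin Hongyu wants to book an eleven o'clock moon, under that tree between Peter Hall and MSD. Not on the direct path, though, but on the one that seems to take a big detour. A few years ago Jin Hongyu said that nobody thinks like a rational agent all the time, because that path really is a bit of a long way round, but the short path has windows.
So to get from Building A at Shanghai University back to the dorm, you should take the North Ring, but Jin Hongyu likes to loop around by Panchi Pond. Maybe it's because today is a day off. Who says a rehab centre can't have days off? People who check themselves into rehab all get their weekends. But if the moon does shine under the tree today, Jin Hongyu would also like to see the story about a person who turns into spaghetti after killing themselves.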
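Jin Hongyu's rehab diary, day 3
Remember to bring the student card!!!!
Lately Jin Hongyu hasn't had any time to look at the Workshop questions. But still, when the teacher asked today which question everyone wanted to hear about, the Pareto distribution in Question 8 really stood out. Someone told the teacher "maybe No. 8" without even reading the question. But someone is behind on too many lectures and so didn't quite get that question. So not being able to solve it comes from the missed classes; catch up on them and the solution will come. Probably.
The other huge heartbreak really is this: being disliked by everyone. But everything you do comes at a price, right?
Ken-san said this person is actually rather funny. I didn't quite know what to say, so Jin Hongyu shut up.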
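Jin Hongyu's rehab diary, day 2
The easiest thing Jin Hongyu has done these past few weeks is probably writing code.
Some people actually hate the engineer's mindset that takes up half their brain, that life in which a problem has to be broken down step by step into functions before any work can start. But writing functions really is so easy.
Some people may really need a push, someone to tell them what to do, when, and where. At many of the important decisions in life, Jin Hongyu ended up choosing not to decide. But people who can't make decisions get disliked, don't they? That seems to have become an ever clearer picture in that life. Everyone nearby slowly starts to be worn out by a person who stays put, and then Jin Hongyu can no longer see the picture of other people, can no longer see what life will play next. And then plays someone who hasn't left the state of Iowa in more than forty years.
Oh, the window next to Peter Hall is broken, but still nobody can get out.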
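Jin Hongyu's rehab diary, day 1
Going through rehab really can throw a person into a huge panic,
especially after two straight hours without any progress, with nothing to do but stare at the wall.
He has truly been transported back into that 17-year-old kid on the fourth floor of the Shanghai Library.
At a loss, and unable to open his mouth.
Missing Grandpa.
So there is simply no way to excuse it with "it's all in the past", because however you think about it, it will not pass. So really, Jin Hongyu just never realized, never thought in that direction at all. Just, how could it be like this? Had the realization come a little earlier, then, even if nothing could be done, there would have been plenty of chances to finish saying what needed to be said. There would have been no need to do so many meaningless things. At least it would not be the way it is now.
Happy New Year!
Happy New Year!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!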