
<<A computer science PhD from a certain Clayton University, chaotic communist, intermittent cynic>>

Elasticsearch (I hate this database)

“Big data” challenges and architectures

The four Vs of big data: Volume, Velocity (how frequently new data arrives in the system), Variety (complexity of the data schemas), Veracity (with more data, more diverse sources and more unstructured data, the data's quality and trustworthiness become harder to guarantee)

Non-relational, NoSQL: SQL databases do a great job on consistency, but they do not work well for big data

Data models for distributed systems:

key-value stores (fast, structured), BigTable-style DBMS, document-oriented DBMS (store data as structured documents, e.g. Elasticsearch)

Database clusters: why run a DB as a cluster?

Distribute the computing load over multiple computers, which improves availability

Store multiple copies of the data

A shard is a partition of your database (one part of the whole). Copies of the same shard can be stored on different nodes, and we can choose how many replicas to keep.
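This is a minimal sketch of routing a document to a shard. It assumes hash-modulo routing: in simplified form, Elasticsearch hashes a document's `_routing` value (its `_id` by default) and takes the result modulo the number of primary shards. Here `zlib.crc32` stands in for Elasticsearch's Murmur3 hash. `route`, `partition` and `NUM_PRIMARIES` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PRIMARIES = 3  # hypothetical: fixed when the index is created

def route(doc_id: str, num_primaries: int = NUM_PRIMARIES) -> int:
    """Hypothetical router: map a document id to a primary shard number."""
    # crc32 is only a stand-in stable hash; Elasticsearch itself uses Murmur3.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primaries

def partition(doc_ids, num_primaries: int = NUM_PRIMARIES):
    """Group document ids by the shard they would be stored in."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[route(doc_id, num_primaries)].append(doc_id)
    return dict(shards)

if __name__ == "__main__":
    print(partition([f"doc-{i}" for i in range(10)]))
```

The shard number depends on the primary count. Changing that count would send existing ids to different shards, which is why the number of primary shards is fixed when an index is created.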

Federated DB architecture: different DBs with different tables run on many nodes, but there is only one entry node.
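This toy sketch illustrates the federated idea. It assumes each member database owns whole tables and that the single entry node only routes queries to the owner. `MemberDB`, `EntryNode`, the table names and the in-memory "databases" are all hypothetical.

```python
class MemberDB:
    """Hypothetical member database; holds its own tables in memory."""

    def __init__(self, name, tables):
        self.name = name
        self.tables = tables  # table name -> list of row dicts

    def select(self, table, predicate):
        return [row for row in self.tables[table] if predicate(row)]

class EntryNode:
    """Hypothetical single entry point: knows which member owns which table."""

    def __init__(self, members):
        self.catalog = {}
        for member in members:
            for table in member.tables:
                self.catalog[table] = member

    def select(self, table, predicate=lambda row: True):
        if table not in self.catalog:
            raise KeyError(f"no member database owns table {table!r}")
        # Forward the query to the owning member; clients never contact it directly.
        return self.catalog[table].select(table, predicate)

users_db = MemberDB("db-a", {"users": [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]})
orders_db = MemberDB("db-b", {"orders": [{"id": 10, "user_id": 1}]})
entry = EntryNode([users_db, orders_db])
print(entry.select("orders", lambda r: r["user_id"] == 1))
```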

Elasticsearch cluster architecture:

Two node types: master nodes and data nodes; one node can have more than one role.

More than one node can have the master role, but only one node in the cluster acts as the master at any given time (the other nodes with the master role are master-eligible: candidates that can be elected if the current master fails).
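This is a deliberately simplified sketch of the "one elected master among master-eligible nodes" rule. It rests on two assumptions: an election succeeds only when a strict majority of master-eligible nodes is present, and ties are broken by the lowest node name. Elasticsearch's real election protocol is more involved. `Node` and `elect_master` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical node description."""
    name: str
    roles: set = field(default_factory=set)  # e.g. {"master", "data"}
    alive: bool = True

def elect_master(nodes):
    """Toy election: return the master's name, or None if there is no quorum."""
    eligible = [n for n in nodes if "master" in n.roles]
    alive = [n for n in eligible if n.alive]
    # Require a strict majority of master-eligible nodes, so the two halves
    # of a split cluster cannot each elect their own master.
    if len(alive) * 2 <= len(eligible):
        return None
    return min(n.name for n in alive)  # assumed tie-break: lowest name wins

cluster = [
    Node("node-1", {"master", "data"}),
    Node("node-2", {"master", "data"}),
    Node("node-3", {"master"}),
    Node("node-4", {"data"}),  # data-only: never a candidate
]
print(elect_master(cluster))  # node-1
cluster[0].alive = False
print(elect_master(cluster))  # node-2
```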

Every node can act as a coordinating node (coordinating query execution).
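This minimal sketch shows what "coordinating query execution" can look like. The coordinating node sends the query to every shard, each shard returns its local top hits, and the coordinator merges them into a global top-k. The scoring (a count of matching terms) and the names `search_shard` and `coordinate` are hypothetical; Elasticsearch scores with BM25 by default.

```python
import heapq

def search_shard(shard_docs, terms, k):
    """Per-shard step: score local docs and keep only the local top-k."""
    scored = []
    for doc_id, text in shard_docs.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)  # toy scoring, not BM25
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)

def coordinate(shards, query, k=3):
    """Hypothetical coordinator: scatter to all shards, gather, merge the top-k."""
    terms = query.lower().split()
    partial = []
    for shard_docs in shards:  # a real cluster would query shards in parallel
        partial.extend(search_shard(shard_docs, terms, k))
    return [doc_id for score, doc_id in heapq.nlargest(k, partial)]

shards = [
    {"a": "big data big cluster", "b": "sql consistency"},
    {"c": "big data", "d": "document store"},
]
print(coordinate(shards, "big data", k=2))
```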

Indexes can have shards and replicas. Elasticsearch also reports a health status: green (everything is allocated) or yellow (all primary shards are allocated, but there are not enough nodes to allocate every replica).
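This sketch shows why a cluster turns yellow. It assumes the allocation rule that no two copies of the same shard are placed on the same node, so each shard needs `1 + replicas` distinct nodes. `cluster_health` is a hypothetical helper, not an Elasticsearch API; the real status comes from the `_cluster/health` endpoint, which also has a red state for unallocated primaries.

```python
def cluster_health(num_data_nodes: int, primaries: int, replicas: int) -> str:
    """Hypothetical helper: 'green', 'yellow' or 'red' for one index."""
    if num_data_nodes == 0:
        return "red"  # primaries cannot be allocated anywhere
    copies_per_shard = 1 + replicas
    # Each copy of a shard must sit on a different node, so at most
    # num_data_nodes copies of any single shard can be allocated.
    allocated = min(copies_per_shard, num_data_nodes)
    unassigned_replicas = (copies_per_shard - allocated) * primaries
    return "green" if unassigned_replicas == 0 else "yellow"

print(cluster_health(1, primaries=3, replicas=1))  # yellow: no second node for the replicas
print(cluster_health(2, primaries=3, replicas=1))  # green
```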

Consistency (a client receives the same answer from every node of the cluster), availability (a client receives a response from at least one node), partition tolerance (the cluster keeps operating when one or more nodes are cut off or offline)

Brewer's CAP theorem: a distributed system can achieve only two of the three.

Ways to achieve each pair:

Consistency and availability: two-phase commit

It works great when the cluster is co-located, but not for a geographically distributed one.
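This is a minimal in-memory sketch of two-phase commit. The coordinator first asks every participant to prepare (vote). Only if all vote yes does it tell them to commit; any "no" aborts everywhere. `Participant` and `two_phase_commit` are hypothetical names. Real implementations also need durable logs and timeouts, since a stalled coordinator blocks every participant. That blocking is one reason 2PC suits co-located clusters better than widely distributed ones.

```python
class Participant:
    """Hypothetical participant holding a tiny key-value store."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulate a node that votes "no"
        self.staged = None
        self.data = {}

    def prepare(self, key, value):
        """Phase 1: stage the write and vote yes/no."""
        if not self.can_commit:
            return False
        self.staged = (key, value)
        return True

    def commit(self):
        """Phase 2 (all voted yes): make the staged write visible."""
        key, value = self.staged
        self.data[key] = value
        self.staged = None

    def abort(self):
        """Phase 2 (someone voted no): discard any staged write."""
        self.staged = None

def two_phase_commit(participants, key, value):
    """Hypothetical coordinator: commit everywhere or nowhere."""
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False

nodes = [Participant("n1"), Participant("n2")]
print(two_phase_commit(nodes, "x", 1), [p.data for p in nodes])
```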

Availability and partition tolerance: Multi-Version Concurrency Control (MVCC)
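This is a minimal sketch of the general MVCC technique, not Elasticsearch's internal implementation. Each write appends a new version stamped with an increasing logical timestamp instead of overwriting. A reader sees the newest version no later than its snapshot, so reads never block writes. `MVCCStore` is a hypothetical name.

```python
class MVCCStore:
    """Hypothetical multi-version key-value store."""

    def __init__(self):
        self._ts = 0          # logical clock, bumped on every write
        self._versions = {}   # key -> list of (commit_ts, value), oldest first

    def write(self, key, value):
        """Append a new version instead of overwriting the old one."""
        self._ts += 1
        self._versions.setdefault(key, []).append((self._ts, value))
        return self._ts

    def snapshot(self):
        """A snapshot is the clock value at the moment a reader starts."""
        return self._ts

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

store = MVCCStore()
store.write("balance", 100)
reader = store.snapshot()          # an older reader starts here
store.write("balance", 80)         # a concurrent writer does not block it
print(store.read("balance", reader))            # 100
print(store.read("balance", store.snapshot()))  # 80
```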

Consistency and partition tolerance: Paxos. In this algorithm each node plays the role of a proposer or an acceptor (a node may play both).
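This is a compact single-decree Paxos sketch: one value gets chosen, and all messages are delivered in-process. The only fault-tolerance aspect modelled is that a proposer needs replies from just a majority of acceptors. In phase 1, the proposer sends prepare(n); each acceptor promises to ignore lower-numbered proposals and reports any value it has already accepted. In phase 2, with a majority of promises, the proposer asks acceptors to accept (n, v). Here v must be the value of the highest-numbered proposal already accepted, if there is one. `Acceptor` and `propose` are hypothetical names.

```python
class Acceptor:
    """Hypothetical Paxos acceptor."""

    def __init__(self):
        self.promised = -1      # highest proposal number promised so far
        self.accepted = None    # (number, value) last accepted, if any

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2b: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """Hypothetical proposer: run one round, return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in replies if ok]
    if len(granted) < majority:
        return None
    # Safety rule: adopt the value of the highest-numbered accepted proposal.
    previous = [acc for acc in granted if acc is not None]
    if previous:
        value = max(previous)[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None

acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, n=1, value="A"))  # A
print(propose(acceptors, n=2, value="B"))  # A: the earlier choice is preserved
```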

Why a document-oriented DB for big data?

Sharding: partitioning a DB horizontally; rows or documents are split into subsets that are stored on different nodes

Replication: storing copies of the same rows or documents on different nodes to achieve fault tolerance
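This small sketch shows replication for fault tolerance. It assumes each shard has one primary copy plus `replicas` copies placed on distinct nodes, and that when a node fails, the first surviving copy of each shard serves as its primary. Node and shard names, `place_copies` and `read_after_failure` are hypothetical.

```python
def place_copies(shard_ids, nodes, replicas):
    """Hypothetical placement: 1 + replicas copies per shard on distinct nodes."""
    if replicas + 1 > len(nodes):
        raise ValueError("not enough nodes to keep every copy on a different node")
    placement = {}
    for i, shard in enumerate(shard_ids):
        # Copies of shard i go to consecutive nodes, so they never share a node.
        placement[shard] = [nodes[(i + j) % len(nodes)] for j in range(replicas + 1)]
    return placement  # the first node in each list holds the primary

def read_after_failure(placement, failed_node):
    """For each shard, pick a surviving copy to serve reads (None if all lost)."""
    result = {}
    for shard, holders in placement.items():
        survivors = [n for n in holders if n != failed_node]
        result[shard] = survivors[0] if survivors else None
    return result

layout = place_copies(["s0", "s1", "s2"], ["n1", "n2", "n3"], replicas=1)
print(layout)                            # {'s0': ['n1', 'n2'], 's1': ['n2', 'n3'], 's2': ['n3', 'n1']}
print(read_after_failure(layout, "n2"))  # {'s0': 'n1', 's1': 'n3', 's2': 'n3'}
```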

Finally, Elasticsearch itself

Pros: full-text search, retrieval of time-based data, storing unstructured data

Cons: bad at linked (relational) data
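Full-text search is the first pro in that list. This minimal sketch shows the data structure behind it, an inverted index that maps each term to the documents containing it. The lowercase/non-word tokenizer and the AND-query semantics are simplifying assumptions; Elasticsearch builds its indexes with Lucene and configurable analyzers. `tokenize`, `build_index` and `search` are hypothetical names.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Toy analyzer: lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Elasticsearch is good at full-text search",
    2: "SQL databases are good at consistency",
    3: "Search logs and time-based data",
}
idx = build_index(docs)
print(sorted(search(idx, "search")))       # [1, 3]
print(sorted(search(idx, "good at SQL")))  # [2]
```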

Concepts:

Index: comparable to a database in a relational DBMS
Document: a data item in an index
Data stream: a set of indexes that follow the same naming pattern
Shard: a horizontal partition of an index
Replica: a copy of a shard
Node: an instance of ES
Cluster: multiple nodes that cooperate to manage the same indexes

Components:

ES: the database itself
Filebeat: listens for updates in files and loads the updates into indexes
Metricbeat: monitors the status of systems
Logstash: collects data from data sources and forwards it
Kibana: the front-end user interface
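This toy sketch shows the core idea behind a file shipper like Filebeat. It remembers how far into a file it has read, and on each poll it returns only the complete lines appended since then, which are then appended to an in-memory "index". `FileTailer`, the log file and the `index` list are hypothetical; Filebeat itself is configured rather than programmed, and ships to Elasticsearch or Logstash.

```python
import os
import tempfile

class FileTailer:
    """Hypothetical tailer: remembers a byte offset between polls."""

    def __init__(self, path):
        self.path = path
        self.offset = 0

    def poll(self):
        """Return complete lines appended since the last poll."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
        # Only consume complete lines; a partially written last line waits.
        end = chunk.rfind(b"\n") + 1
        self.offset += end
        return [line.decode("utf-8") for line in chunk[:end].splitlines()]

index = []  # stand-in for an Elasticsearch index
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "app.log")
    with open(log, "w") as f:
        f.write("started\n")
    tailer = FileTailer(log)
    index.extend(tailer.poll())
    with open(log, "a") as f:
        f.write("request ok\npartial")
    index.extend(tailer.poll())
print(index)  # ['started', 'request ok']
```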

Scaling in ES can be either horizontal (adding more nodes to a cluster) or vertical (provisioning a more powerful node).
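This sketch illustrates horizontal scaling with a fixed number of shards. Adding nodes does not re-hash documents; it only spreads the same shards more thinly. The round-robin `allocate` helper is hypothetical, and Elasticsearch's real shard allocator weighs many more factors.

```python
def allocate(num_shards, nodes):
    """Hypothetical round-robin allocation; returns shards per node."""
    counts = {node: 0 for node in nodes}
    for shard in range(num_shards):
        counts[nodes[shard % len(nodes)]] += 1
    return counts

NUM_SHARDS = 6  # e.g. 6 primaries, no replicas; fixed when the index is created
for n in (1, 2, 3, 6):
    nodes = [f"node-{i}" for i in range(1, n + 1)]
    busiest = max(allocate(NUM_SHARDS, nodes).values())
    print(n, "nodes ->", busiest, "shards on the busiest node")
```

Vertical scaling would instead keep one node and make it bigger, so all six shards still compete for the same machine.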

Testing principles: test automatically, early, comprehensively and incrementally (write a little, test a little)

What do we test? Expected conditions, boundary conditions, error conditions
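This example illustrates the three kinds of conditions with Python's `unittest`, using one test per condition type. The function under test, `paginate`, is hypothetical.

```python
import unittest

def paginate(items, page, size):
    """Hypothetical function under test: return the items on a 1-based page."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    return items[start:start + size]

class PaginateTest(unittest.TestCase):
    def test_expected(self):
        # Expected condition: an ordinary page in the middle of the data.
        self.assertEqual(paginate(list(range(10)), 2, 3), [3, 4, 5])

    def test_boundaries(self):
        # Boundary conditions: a partial last page and a page past the end.
        self.assertEqual(paginate(list(range(10)), 4, 3), [9])
        self.assertEqual(paginate(list(range(10)), 5, 3), [])

    def test_errors(self):
        # Error conditions: invalid input must fail loudly, not silently.
        with self.assertRaises(ValueError):
            paginate([1, 2], 0, 3)
        with self.assertRaises(ValueError):
            paginate([1, 2], 1, 0)

if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PaginateTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```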

Test taxonomy: unit tests, integration tests, system tests (end-to-end tests)

Test iterations:

First iteration: functions, routes, end-to-end tests

Second iteration: parameters, database, functions that read from the DB

Third iteration: functions with side effects, wipe the test DBs, test each case before wiping, implement transactions, test queries

Fourth iteration: refactoring, unit tests
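This sketch applies the third iteration's ideas with the standard-library `sqlite3` module. Each test gets a freshly wiped in-memory database, a function with side effects writes inside a transaction, and a query is tested against known data. The `users` table, `add_user` and `count_users` are hypothetical.

```python
import sqlite3
import unittest

def add_user(conn, name):
    """Hypothetical side-effecting function: insert a user in a transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

def count_users(conn):
    """Hypothetical read function: query the database."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

class UserDBTest(unittest.TestCase):
    def setUp(self):
        # Wipe: every test starts from a brand-new in-memory database.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT NOT NULL UNIQUE)")

    def tearDown(self):
        self.conn.close()

    def test_insert_then_query(self):
        add_user(self.conn, "ann")
        self.assertEqual(count_users(self.conn), 1)

    def test_failed_insert_is_rolled_back(self):
        add_user(self.conn, "ann")
        with self.assertRaises(sqlite3.IntegrityError):
            add_user(self.conn, "ann")  # violates the UNIQUE constraint
        self.assertEqual(count_users(self.conn), 1)

if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserDBTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```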

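You see, the explosives pack is heavy, but it doesn't sound that bad; there's just too much stuffed into the bag. There's a reason people catch that look at first glance. This morning 金泓宇 was watching Scent of a Woman, and Al Pacino asks: do you know why the car is so heavy? Cuz you got the weight of the world on your shoulder.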

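Also, the other half of the line he was asked about: If you are tangled up, just tango on. I don't really like the Chinese translation of this line, but if the road ahead has already started to get messy, then just keep walking. If we are tangled up, can we tango on?
Will you come along? Let's go together.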

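Today 金泓宇 wanted to book an eleven o'clock moon, under that tree between Peter Hall and MSD. Not by the direct path, but by the one that seems to loop all the way around. A few years ago 金泓宇 said that nobody thinks like a rational person all the time, because that loop really is a bit far, but the shorter path has windows.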

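So to get from Building A at Shanghai University back to the dorm, you should take the North Ring, but 金泓宇 likes to loop around past Panchi Pond. Maybe because today is a day off. Who says rehab doesn't get days off? People who check themselves into rehab voluntarily all get their weekends. But if the moon shines under the tree tonight, 金泓宇 also wants to watch the story about a person who turns into spaghetti after killing himself.
will never be out of my mind.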

Remember to bring the student card!!!!

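Lately 金泓宇 hasn't had time to even look at the workshop questions. But still, today when the teacher asked which question everyone wanted to go over, the Pareto distribution in question 8 stood out. Some people even told the teacher "maybe No. 8" without reading the question. But some people have missed too many lectures, so they didn't quite understand that question. So not being able to solve it comes from missing classes; once the lectures are caught up, the solution will come. Probably.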

Another huge sadness really lies in being disliked by everyone. But everything comes at a price, right?

Ken-san said this person is kind of funny. I didn't really know what to say, so 金泓宇 shut up.
will never be out of my mind.

The easiest thing 金泓宇 has done these past few weeks is probably writing code.

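Some people actually hate the half of their brain that thinks like an engineer: the life that has to break every problem down, step by step, into functions before any work can start. But writing functions is really so easy.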

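Some people may really need a push: to be told what to do, when, and where. Faced with many of the important decisions in life, 金泓宇 ended up choosing not to decide. But people who can't make decisions end up being disliked, don't they? That seems to have become an ever clearer picture of that life. Everyone nearby slowly starts to be worn out by a person who stays in place, and then 金泓宇 can no longer see other people's picture, can no longer see what life will play next. And then plays someone who hasn't left Iowa in more than forty years.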

Oh, the window next to Peter Hall is broken, but still nobody can get out.
Why is it only half broken?

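People really can fall into a huge panic while getting clean

especially after two straight hours without any progress, with nothing to do but face the wall

his soul truly crossed back into that 17-year-old kid on the fourth floor of the Shanghai Library

at a loss, and unable to open his mouth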

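So there's no way to excuse it by saying it's all in the past, because no matter how you think about it, it won't pass. So the truth is that 金泓宇 never realized it, never even thought in that direction. Like, how did it come to this? If only it had been realized a little earlier, then, even if nothing could be done, there would have been plenty of chances to finish saying what needed to be said. There would have been no need to do so many meaningless things. At least it wouldn't be like it is now.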

Happy New Year!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

It's 2022!