- Lots of system design practice today.
- NoSQL vs SQL.
- NoSQL is useful when you always need all information about that row. SQL better for select x, NoSQL better for select *.
- Don’t need NULLs. Just skip keys in the nosql json. In this way, the schema is basically flexible per entry.
- Good for analytics.
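- NoSQL can be more expensive for updates (changing one field may mean rewriting the whole blob), and many NoSQL stores don’t guarantee full ACID transactions.
- NoSQL is more expensive for column-wise reads. While you can grab a whole blob easily, getting only one col across the whole db is slow.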
- JSON nesting is nowhere near as structured as the relationships in SQL (FK, many-many, etc).
- Joins are weird. Not easy.
- Overall: if your data is always read/written in blocks, not individual cols, nosql is good.
- Key-value is just key-value. Document nosql can have structure and nesting within (JSON usually).
- Examples:
- Document nosql dbs: mongo, couch.
- Key-value nosql dbs: dynamo, redis.
- Wide col nosql dbs: cassandra.
- Graph nosql dbs: neo4j.
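Quick sketch for myself of the "select * vs select x" point above, using a plain dict as a stand-in for a document store. The data and function names are made up for illustration; a real store like mongo has its own API.

```python
# Hypothetical data: each document is a dict, and absent fields are simply omitted.
docs = {
    "u1": {"name": "Ada", "email": "ada@example.com", "city": "London"},
    "u2": {"name": "Bob"},                 # no email, no city: keys skipped, no NULLs
    "u3": {"name": "Cy", "city": "Paris"},
}

def get_document(doc_id):
    """'select *' for one entry: a single key lookup returns the whole blob."""
    return docs[doc_id]

def get_field_everywhere(field):
    """'select x' across the store: every document has to be opened and inspected."""
    return {doc_id: doc[field] for doc_id, doc in docs.items() if field in doc}

print(get_document("u2"))
print(get_field_everywhere("city"))
```

The first call touches one entry; the second has to walk all of them, which is the slow path the notes warn about.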
- 10m call with Jack to set up the onsite with citadel, a few notes:
- Will’s team is interested, but there are a few others as well (equities…). Interview might be at one site, but fulltime might be at the other.
- You don’t need to have expertise in finance, but you need to show interest. The projects of last year are great examples.
- ACID is a set of properties for db transactions: Atomicity, Consistency, Isolation, Durability. Financial institutions absolutely require this. Something like a metrics server wouldn’t.
- Availability is another important metric, but isn’t part of ACID.
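A minimal sketch of the atomicity part of ACID, using Python's built-in sqlite3 module with an in-memory db. The accounts table and the transfer function are my own invention for the example; the point is only that a failed transfer leaves no half-applied update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(src, dst, amount):
    """Credit dst, then debit src, as one transaction. Returns True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit is undone as well.
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))

print(transfer("alice", "bob", 150))  # overdraft attempt
print(balances())                     # unchanged: the credit to bob was rolled back
```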
- Can shard by ID, or by location, or whatever is the most convenient/divisive/common entry point.
- JOINs get expensive if they’re cross-shard.
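Rough sketch of sharding by ID, assuming a fixed shard count and a hash of the key (both my assumptions, not the only way to do it). The helper at the end shows why cross-shard work is costly: one query can fan out to several shards.

```python
import hashlib

NUM_SHARDS = 4  # assumed fixed; changing it would remap most keys

shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    """Stable hash of the key, reduced to a shard index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)

def shards_touched(keys):
    """How many shards a multi-key operation (like a join) would have to contact."""
    return {shard_for(k) for k in keys}

put("user:42", {"name": "Ada"})
print(get("user:42"))
print(shards_touched(["user:1", "user:2", "user:3"]))
```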
- Master-slave architecture. Master gets all the writes, slaves update to match master every X sec/min/hours/days, slaves get all the reads.
- If master goes down, one of the remaining slaves becomes master.
- Still dumb that its formal name is master/slave in 2020.
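Toy model of the setup above, to remind myself what the periodic sync implies. The class names, round-robin reads, and "promote the first replica" failover rule are all simplifications I made up.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0

    def write(self, key, value):
        self.primary.data[key] = value          # all writes go to the primary

    def read(self, key):
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1                         # spread reads across replicas
        return replica.data.get(key)

    def sync(self):
        for replica in self.replicas:           # the "every X sec/min" catch-up
            replica.data = dict(self.primary.data)

    def fail_over(self):
        self.primary = self.replicas.pop(0)     # promote one of the replicas

cluster = Cluster(Node("primary"), [Node("replica-1"), Node("replica-2")])
cluster.write("x", 1)
stale = cluster.read("x")        # None: replicas have not synced yet
cluster.sync()
fresh = cluster.read("x")        # 1
cluster.write("x", 2)            # lands on the primary only
cluster.fail_over()              # primary dies before the next sync
after_failover = cluster.primary.data["x"]   # 1: the unsynced write is lost
```

So reads can be stale between syncs, and anything written since the last sync can disappear on failover.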
- Remember generators hand back a value at each yield and pause there, before the function has finished executing. Could be 3 lines straight with yield 1, yield 2, yield 3, but it’s usually a loop within the function. Each yield returns partial progress. Then you can iterate over the values as they come, or call next() on the generator object (my_gen = my_gen_func(), then next(my_gen)) to walk through manually.
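Small example of the generator note above. count_up_to is just a name I picked.

```python
def count_up_to(limit):
    """Yield 1, 2, ..., limit one value at a time."""
    n = 1
    while n <= limit:
        yield n      # execution pauses here until the next value is requested
        n += 1

gen = count_up_to(3)   # calling the function builds a generator object; nothing runs yet
first = next(gen)      # runs up to the first yield -> 1
rest = list(gen)       # iterating consumes whatever is left -> [2, 3]

print(first, rest)
```

Calling next() on an exhausted generator raises StopIteration, which is how a for loop knows to stop.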
- Confirmed and planned the logistics for the amazon onsite on monday.
- Need to print the NDA before then.
- Consistency is very important. Do you want mutexes on everything read/write such that the data returned is always exact? This is much slower and much bigger. How much data can afford to be out-of-date? How much time must pass before data is considered obsolete?
- If I scale a microservice horizontally, the machines need to talk to each other. RPC. If I scale vertically, it’s all local still. IPC. RPC is slower, and more complicated to manage. But it’s usually cheaper, and it’s boundless, and it’s more resilient to failure.
- Remember: event-driven/pubsub/messagequeue/bus designs are worse with consistency, but better with availability. You’re distributing events and aggregating results. If you need higher fidelity responses, go with more of a request/response (timeout) design. Slower, but the data you get back is more consistent.
- Event-driven is easier to test, play, rollback, because it’s structured and sequenced. Kinda like react/redux.
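Sketch of why an event log is easy to replay and roll back: state is just a fold over the ordered events, so stopping early gives an earlier state. The event shapes here are invented for the example.

```python
from functools import reduce

def apply_event(balance, event):
    """Pure reducer, redux-style: (state, event) -> new state."""
    kind, amount = event
    if kind == "deposit":
        return balance + amount
    if kind == "withdraw":
        return balance - amount
    raise ValueError(f"unknown event kind: {kind}")

def replay(events, upto=None):
    """Rebuild state from the first `upto` events (all of them if None)."""
    return reduce(apply_event, events[:upto], 0)

events = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]
print(replay(events))          # current state
print(replay(events, upto=2))  # "rollback" to the state after two events
```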
- In-mem solutions use snapshots to get persistence, but they’re obviously at a certain frequency and so you open risk for data loss.
- Postgres has nosql features as well, it’s just used less commonly. You can store docs just like mongo.
- Amazon 1hr onsite coaching call, general to all onsites.
- Basic fundamentals. The one thing I usually have to remember is: Ask questions to clarify the problem. This especially applies to design questions, but can also apply to coding questions.
- Amazon 30m final call, specific to me.
- Schedule: (each 1hr)
- 2 technical: Problem solving, DS&A.
- 1 nontechnical with hiring manager. Customers, communication, dealing with adversity.
- 1 lunch. Culture. Ask questions. Relax.
- 1 technical: Logical and maintainable code.
- 1 technical: System design. Scalability. Operational performance.
- Each will have 2 leadership principle questions (behavioral) as well.
- General:
- Ask clarifying questions. Think out loud.
- Start with an easy solution. Brute force, whatever. Then expand. Take bite-sized bites.
- The interviewer is your coworker! If you start to panic, look at them like a collaborator.
-
Always consider priority. A bank withdrawal is much more time-sensitive than an email notification. This could affect the ordering in the message queue, the rate limiter thresholds on various nodes, the load balancing, etc.
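One way the ordering could look in a message queue, sketched with heapq. The priority numbers and message kinds are hypothetical; lower number means more urgent, and a counter keeps arrival order within the same priority.

```python
import heapq
import itertools

PRIORITY = {"bank_withdrawal": 0, "email_notification": 5}  # made-up levels
_arrival = itertools.count()
queue = []

def enqueue(kind, payload):
    # The arrival counter breaks ties, so equal-priority messages stay FIFO.
    heapq.heappush(queue, (PRIORITY[kind], next(_arrival), kind, payload))

def dequeue():
    _, _, kind, payload = heapq.heappop(queue)
    return kind, payload

enqueue("email_notification", "a")
enqueue("bank_withdrawal", "w1")
enqueue("email_notification", "b")
enqueue("bank_withdrawal", "w2")

order = [dequeue()[1] for _ in range(4)]
print(order)  # ['w1', 'w2', 'a', 'b']
```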
-
When realtime accurate data is cumbersome, you can trend or interpolate data, based on history, to give an estimate. This is true for something like the viewcount on popular youtube channels. Would take too much to show exactly, but is easy to show ballpark.
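A ballpark estimator along those lines, assuming simple linear extrapolation from the last exact count. I have no idea what youtube actually does; the function names and numbers are mine.

```python
def rate_from_history(samples):
    """Average views per second between the oldest and newest (time, count) samples."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    return (c1 - c0) / (t1 - t0)

def estimate_views(samples, now):
    """Last exact count plus the historical rate times the time elapsed since then."""
    last_time, last_count = samples[-1]
    return last_count + int(rate_from_history(samples) * (now - last_time))

history = [(0, 1000), (60, 1600)]        # (seconds, exact view count)
print(rate_from_history(history))        # 10.0 views/sec
print(estimate_views(history, now=90))   # 1900
```

The exact count only has to be recomputed occasionally; everything in between is an estimate.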
-
Thrashing – when a cache is inserting/evicting entries so quickly that they rarely get reused before being thrown out.
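To see thrashing concretely, here is a small LRU cache sketch (my own toy, built on OrderedDict) hit with a cyclic access pattern. When the working set is one larger than the cache, every access evicts the entry that is needed next.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = f"value-for-{key}"           # stand-in for a slow backing-store load
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry
        return value

def hit_rate(capacity, keys):
    cache = LRUCache(capacity)
    for key in keys:
        cache.get(key)
    return cache.hits / len(keys)

pattern = [0, 1, 2, 3] * 10          # working set of 4 keys, accessed in a cycle
print(hit_rate(3, pattern))          # 0.0: constant insert/evict, no reuse
print(hit_rate(4, pattern))          # 0.9: only the first pass misses
```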
-
Going through this process of interview prep has been interesting. I feel like I’ve consumed what must equate to at least a handful of semesters in fundamental CS courses. This accessibility of information makes me wonder about the future of education. It’s been maybe 6 weeks and $0, whereas the university equivalent would have been a year and $30k. Will our grandchildren get in-person degrees? I’m not so sure this model is viable in the future.
-
Ordered more dry-erase markers for whiteboard practice at home. Arrive tomorrow.
-
Netflix splits every piece of content into small atoms, stored with different permutations of codecs and resolutions. That’s why viewing is seamless when your internet quality changes, or you select a new one; it will simply then fetch the right chunk instead of the whole movie again. The amount of chunks it preloads into the future is based on heuristics, by how much people typically jump during this movie. If a lot -> preload only a little. If a little -> preload a lot.
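Sketch of the two decisions described above: which rendition of the next chunk to fetch, and how many chunks to preload. The bitrate table, the jump-rate input, and both formulas are assumptions I made up to illustrate the idea, not how Netflix does it.

```python
RENDITIONS_KBPS = {"240p": 400, "480p": 1000, "720p": 2500, "1080p": 5000}  # made-up numbers

def pick_rendition(bandwidth_kbps):
    """Highest-bitrate rendition that fits the measured bandwidth, else the lowest one."""
    affordable = [(rate, name) for name, rate in RENDITIONS_KBPS.items() if rate <= bandwidth_kbps]
    if not affordable:
        return "240p"
    return max(affordable)[1]

def chunks_to_preload(jump_rate, max_chunks=10, min_chunks=1):
    """jump_rate: assumed fraction (0..1) of viewers who seek around in this title."""
    return max(min_chunks, round(max_chunks * (1 - jump_rate)))

# Bandwidth changes between chunks; only the next chunk's rendition changes.
print([pick_rendition(b) for b in [6000, 3000, 500]])
print(chunks_to_preload(0.9), chunks_to_preload(0.1))
```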
-
Did a practice system design from scratch for instagram/twitter.
-
It looks like neither grubhub nor yelp do group carts anymore? I remember loving this feature. You’d just send the link to anyone, they’d look through the menu and add, then be responsible for the finances individually when the order was sent.
-
You could store images in a db as blobs (binary large objects), but it’s almost always still best to put them on a file system elsewhere (CDN, s3, etc) and keep only a URL to the file in your db.
-
A client-server relationship is always initiated by the client. Request -> response. The usual protocol for this on the web is HTTP.
-
A peer-peer relationship can be initiated by either side. You can use a general socket (TCP) or a protocol specific to your use case. For chat (whatsapp, fb messenger, etc) a common protocol is XMPP – extensible messaging and presence protocol.
-
It’s ok to have gut instincts. It’s even ok if they’re wrong. State them, and then state that you’re going to analyze them now. Do the verification out loud. A raw, transparent thought process is worth a lot.
- Designing google search.
- First, just think about keeping a big dictionary. As people add sites to the internet, it adds words and phrases to the dictionary. You have a rough mapping of what is out there.
- Then think of it like a gigantic hash table, or an indexed db. Look for the word (or words) in the search, and return the direct results.
- That might not be all though – look for misspellings, common associations, and other relatives. This is where association algorithms come in. They’re usually based on prior results. There are many other keys to consider: location, how recent, etc. There are also blacklists for known spam, risks, more.
- Then, you have a list of possible options. How do you rank them? What do you show at the top of the list? This is another ranking algorithm. Again, usually based on prior results.
- How do you manage the size of the dictionary? MapReduce. Basically shard the hash table by characters (since those are the keys that the user is typing), farm the query out to all the appropriate nodes for their more-efficient subsearches, then combine the data back.
- ~1 in every 7 strings typed into the google search engine has never been typed before!
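Toy version of the steps above: a word -> pages dictionary, sharded by first character, with a scatter/gather lookup in the spirit of the MapReduce idea. No ranking, spelling, or spam handling. The shard count, page data, and function names are invented.

```python
from collections import defaultdict

INDEX_SHARDS = 3  # assumed shard count

def index_shard_for(word):
    return ord(word[0]) % INDEX_SHARDS      # shard by the first character typed

def build_index(pages):
    """pages: {url: text}. Returns one word -> set-of-urls dict per shard."""
    shards = [defaultdict(set) for _ in range(INDEX_SHARDS)]
    for url, text in pages.items():
        for word in text.lower().split():
            shards[index_shard_for(word)][word].add(url)
    return shards

def search(shards, query):
    words = query.lower().split()
    # "map": each word is looked up only on the shard that owns it
    partials = [shards[index_shard_for(w)].get(w, set()) for w in words]
    # "reduce": combine the partial results, keeping pages that contain every word
    return set.intersection(*partials) if partials else set()

pages = {
    "a.example": "cheap flights to paris",
    "b.example": "paris travel guide",
    "c.example": "cheap hotels",
}
index = build_index(pages)
print(search(index, "cheap paris"))
```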
- Write-ahead logging is when the action is logged before it is performed. Think of it in the context of a database write operation – this handles the atomicity and durability principles (of ACID).
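Minimal write-ahead log sketch: append the intended change to a log file and fsync it before touching the in-memory state, then rebuild state on startup by replaying the log. TinyStore is a made-up class; real databases also deal with torn writes, checkpoints, and log truncation.

```python
import json
import os
import tempfile

class TinyStore:
    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        """On startup, re-apply every logged write in order."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def set(self, key, value):
        # 1) Log first, and force it to disk.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2) Only then perform the action.
        self.data[key] = value

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wal.log")
    store = TinyStore(path)
    store.set("balance", 100)
    store.set("balance", 70)
    recovered = TinyStore(path)   # simulate a restart after losing memory
    print(recovered.data)         # {'balance': 70}
```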
- Third day of jeop goat.
- Replication types. Consider a db mirror, or a master-slave.
- Snapshot replication. Completely copy the full data over, like a backup or a pg_dump.
- Transactional replication. Replay the transaction on each slave.
- There are more, but don’t worry a ton about them.
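The two replication types above, side by side on plain dicts. The transaction tuple format is something I made up for the sketch.

```python
import copy

def snapshot_replicate(primary_data):
    """Snapshot: copy everything, regardless of how little changed."""
    return copy.deepcopy(primary_data)

def transactional_replicate(replica_data, transactions):
    """Transactional: replay each logged operation on the replica, in order."""
    for op, key, value in transactions:
        if op == "set":
            replica_data[key] = value
        elif op == "delete":
            replica_data.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return replica_data

primary = {"a": 1, "b": 2}
replica = snapshot_replicate(primary)           # full copy to start from

log = [("set", "c", 3), ("delete", "a", None)]  # what happened on the primary since
primary = transactional_replicate(primary, log)
replica = transactional_replicate(replica, log)
print(replica == primary)  # True
```

Snapshot cost grows with the size of the data; transactional cost grows with the number of changes.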