-
- Consolidated my Amazon LP document into the general questions they might ask you, since that has a superset of all the general categories for behavioral interview questions.
- Conf championships! Gonna be an all-red superbowl, niners v chiefs.
- Disposed/packed more kitchen/bathroom supplies.
- Practice problems.
- Drew a system design diagram for autotest on the whiteboard, now that I’ve gotten much better at scalable system design.
- Most of the primary nodes are present, there’s just not much redundancy. Lots of single points of failure.
- Consistency > availability (although it doesn’t really matter at low scale, mostly 1-1 nodes and a couple requests / second).
- Task manager / gateway = dispatch server. Did have a heartbeat for all downstream nodes, although it was http. Did not really serve as a heavy-burden load balancer, most test cells had a single control node – in use, or not in use.
- Master-slave for RDB should have been in place years ago so that vehicle/component teams could query their data at will.
- All the usual frills. Analytics, resource monitor, logs, app monitor, hydra/sync for parallel and rpc sync.
- Redis cache for control node. Faster read, metadata, hydra state, etc.
- Took picture on phone.
- Remember that every db (disk or mem, rdb or nosql, etc) has locks for write.
- Lowlevel, it’s a message queue with an event loop. It’s called the write ahead log, WAL.
- Timeouts, so that one can’t deadlock a row.
- Pessimistic = true, natural lock (“i’m using this”). Optimistic = check if change when writing back, like VCS merge conflicts. Favors smaller faster changes.
- Behavioral.
- Wrote a couple examples for each of the 4 LPs Amazon said they’d check during the senior interview.
- Plugged my ehd into the ubuntu laptop for the first time. It could mount the FAT partition but not the ExFAT, obviously for NT.
- FAT = file allocation table.
- Ansible inventory. I had forgotten this word the other day. Playbooks run, inventory is the list of endpoint nodes.
- Went back over sx-setuptools and autotest.
- Watched countdown. Ending sucked, and was illogical, but overall the movie was better than I thought it would be.
- Europix uses a ton of mem. Was basically 4GB of ram and 4GB of swap, yikes. Even compared to other streaming sites, for all (movies, tv, and sports), which are about half that.
- If you have n + (n-1) + (n-2) + … (1), instead of n * (n-1) * (n-2) * … (1), which is factorial, it’s called the triangular number or the binomial coefficient. It equals (n^2 + n)/2, but you can just take the highest order term and approximate it as n^2. It’s much lower in complexity than n! factorial.
-
- Practice problems.
- https://leetcode.com/problems/minimum-path-sum. Did BFS first. Works, but not as fast. Wrote a secondary recursive solution, with both backtracking and DP. Faster.
- https://leetcode.com/problems/4sum. Wrote a beautiful generic solution to the ksum problem. Recurse down to the 1sum case, not the 2sum! The exit condition is just the target being in the remaining nums!
- https://leetcode.com/problems/3sum-closest. Similar.
- https://leetcode.com/problems/next-permutation/. Dumb question. Too specific for an interview, and doesn’t show much problemsolving skill.
- https://leetcode.com/problems/swap-nodes-in-pairs/. Recursive linked list swapping. Just define to a temp variable, perform the swap, the recurse to the next one. Could totally be solved iteratively.
- https://leetcode.com/problems/reverse-nodes-in-k-group/. Same thing, but instead of k=2 (swap every pair), swap every k. If you’re swapping 2 nodes, each node needs 2 connections for each neighbor, so 4 connections total. Since the root head has no left, you’re doing 3 connections on each iteration/recursion.
- https://leetcode.com/problems/sudoku-solver/. A literal solver for a given sudoku board. Just recurses, populating the board with a new (valid) number and passing it through, then exiting in the base case where there are no valid plays. Backtracking. Not DP. Takes about 40 lines.
- Remember {} is the set literal in python.
- Jack borrowed the BMW for a 7am ride with Teej.

- Run, pull, sauna, meditate, stretch.
- Watched the truth about emanuel, the 2013 movie which servant allegedly ripped off. It was decent until the ending. The similarity is the fake baby, but not much else. Overall I think Servant is distinct enough to survive.
- UFC 246, mcgregor cerrone.
- Got kabuli naan, my favorite.
- Dimensional is the company that Rob partners with for data / portfolio suggestions (regarding my folks’ retirement).
-
- Remember the tool pycallgraph, useful in tandem with cprofile for getting lots of profile information about your module.
- NBC (Comcast) added another streaming service, Peacock.
- Only 5 companies have ever hit the trillion cap. Four comma club!
- Even in a workplace environment, put yourself in your colleagues’ shoes. See things from your customer’s perspective. Really helps.
- Placed Amazon fresh order last night.
- Finished servant season 1. Good, but left lots of questions and loose ends. Need season 2.
- New Eminem album.
- I love the integrations between the google calendar and gmail. Autocreation of flights, hotels, events, etc – fantastic.
- Really interesting:
- Got an unsolicited call from facebook today. Recruiter for senior positions in infra/scalability. Bay area based. Set up a coding interview for next week.
- Practice problems.
- https://leetcode.com/problems/combination-sum-iii. Backtracking. Define the recursive case, define some fail-fast mechanisms as exit conditions. Throw loops in if you need to check multiple starting points.
- https://leetcode.com/problems/climbing-stairs. Recursion, with dp for speed.
- https://leetcode.com/problems/plus-one. Reverse to LSB first, the iterate toward more significant digits, incrementing or carrying the one.
- https://leetcode.com/problems/merge-sorted-array/.
- https://leetcode.com/problems/subsets. Generate all subsets. Just iterate, and add the current number to all previous solutions. Exponential.
- https://leetcode.com/problems/word-search/. Seeing if a word exists in a 2d grid. Cool problem. I love graphs.
- https://leetcode.com/problems/unique-paths/. Recursion with DP and backtracking. Sum left and up. Just think about the recursive case, you’ll get it!
- https://leetcode.com/problems/unique-paths-ii. Same, but with obstacles. A bit harder. Lots more corner cases to consider.
- https://leetcode.com/problems/sqrtx. Iterative, or binary search across range(0, maxint).
- https://leetcode.com/problems/count-and-say. Bottom-up iteration!
- https://leetcode.com/problems/valid-sudoku/. Writing a sudoku validator. Not hard.
- https://leetcode.com/problems/first-missing-positive/. Didn’t solve this one, looked it up though. Very complex problem, but simple solution.
- The neuroscience behind mindfulness is clear. Breathe, smile, stand tall. All work.
- Python itertools.combinations/permutations.
- Google coding interview #2 today. Went well.
- Lunch with briley at wing ferno.
- Put new (2021) registration on the BMW.
- When doing DFS or BFS, you can keep track of the depth/level. Just store each node as (val, depth) instead of (val), then increment it within the (for neighbor in neighbors: queue.append()).
-
- Citadel onsite confirmed, next Thursday. Choice of Chicago or New York. Jack said both offices were great. Chicago is a shorter flight, so we’ll go there. Fulltime can change after, if need be. Fly out midday wed.
- Very quick (2 question) interviewer feedback form for Amazon.
- California would be the 5th largest country economy, after Us China Japan Germany.
- Lots more system design videos. I really do love this stuff.
- Alphabet hit 1 trillion market cap today.
- Probabilistic solutions for hash collisions.
- When counting or checking existence, the natural solution is a hash table.
- The solutions here: use multiple hash functions, then compare. This makes you robust to collisions.
- Count min sketch algorithm.
- Used for counting how many times an item appears in an array.
- Iterate over the array, use multiple different hash functions on each item, then calc the min of all the integer counts produced by them.
- Bloom filter.
- Used to check if a record exists already (think “username taken”…).
- As each record is created, run it through multiple hash functions and put entries in a bit array.
- As a new one comes it, run it through all hash functions. If there’s already an entry for all, then you’re 90% sure it’s already taken.
- If one output doesn’t exist, then you’re 100% sure it is not already used, because you would have put an entry for it when the record was created.
- These typically use efficient hash functions and memory-efficient data structures.
- Allows you to do something like pass a bloom filter clientside so that a user can get immediate feedback in the browser if a name is taken. Just have to pass the K hash functions and a bit array of the possible outputs.
- Rate limiter design.
- Just think like github’s “remaining-requests” to the api. Keep a db which has a mapping of user: tokens. When they’re all used, reject requests.
- You’ll usually need to store timestamps too, so you know how the usage has been spread over your particular window (last minute/hour/whatever). Just like a queue: [ts1, ts2, ts3…]. You can count them as desired. Sliding window.
- Disney finally got back to me, they’ll schedule a phone interview for ASAP. Reminder the comp package is much less than the others: ~190k base, no equity, contract 18mo, half benefits, some disney perks like movie screenings. I’ll see how the phone goes, then balance with my current burden of the others before committing to an onsite interview. They’re glendale, which is a bit more convenient than the remotes.
- Nosql databases can be indexed for speed as well. It’s usually a compound index of multiple keys mushed together. Remember that an index is just something you can query quickly because it’s sorted.
- Sam Harris is so good: https://www.youtube.com/watch?v=LRm_H158qc0. Be grateful for things that haven’t happened. Easy way to stay in the moment, impervious to negative emotion.
- Downloaded Waking Up. Will read after jim simons’ bio.
- To create the floodfill feature (the paint bucket in mspaint), just use a graph traversal (either BFS or DFS) where the neighbors are all 4 nodes in the cardinal directions. If the value is the same, update to new color.
- Practice problems.
- Focused mostly on DS&A today. Tried a new website for diversity, geeksforgeeks.
- Good summary of everything: https://www.geeksforgeeks.org/must-do-coding-questions-for-companies-like-amazon-microsoft-adobe/.
- In general, geeksforgeeks has pretty good tutorials/descriptions. The text is better, the IDE/leaderboard/etc is worse.
- Poison bottles and rats to test. Each iteration, have the rat drink half the bottles. Reduces it to binary search basically, logn.
- Binary trees. Find max width, height, sum of columns, etc. Just do -1 for each left traversal, +1 for each right traversal. Keep in a global hash table or something. Then you can run whatever stats you want on it.
- Ended up switching back to leetcode. Geeksforgeeks has a bad interface. It requires you to supply your own input/output reader/write, rather than making it class or function based with prefilled templates.
- https://leetcode.com/problems/continuous-subarray-sum/. This one wasn’t easy. Find modulo of cumsum, check to see if two are the same, which means the cumsum between them is divisible by the target.
- https://leetcode.com/problems/baseball-game. Easy.
- https://leetcode.com/problems/top-k-frequent-words. Easy. Remember you can heap for nlogk instead of nlogn.
- https://leetcode.com/problems/spiral-matrix. Walk 2d grid in spiral. Just define the 4 directions, loop through them. Take whole row, take first or last item in all rows, and reverse as necessary.
- https://leetcode.com/problems/median-of-two-sorted-arrays/. Didn’t do it, just reread. This is a great article: https://medium.com/@hazemu/finding-the-median-of-2-sorted-arrays-in-logarithmic-time-1d3f2ecbeb46. The problem is way out of scope for an interview.
- https://leetcode.com/problems/longest-palindromic-substring. Did the “expand around all centers” way.
- https://leetcode.com/problems/trapping-rain-water/. Trapped rain water, iterate once from left and once from right. Calculate max and deltas, then compare and take min. That’s the trapped water.
- https://leetcode.com/problems/rotate-image/. Rotate image. If you’re allowed to use extra space, just fill in the new matrix. Write a couple down to see the pattern. New value is just (j, n-i-1). To do the rotation in place, go onion layer by onion layer and do 1, 2, 3, 4 = 1, 2, 3, 4.
- https://leetcode.com/problems/jump-game. Did DFS. There’s a much cleaner solution, basically start at the end and check if you can reach this position, then iteration back through the whole range.
- https://leetcode.com/problems/merge-intervals. Good example of making assumptions and testing them out. Did it vocally. Would have clarified with interviewer if live, rather than breaking/fixing with testcases.
- Applying locks, mutexes, semaphores to a concurrent program in order to avoid race conditions: synchronization.
- Priority inversion: when a high priority tasks is waiting for a low priority task to finish. Priority inheritance: when the low priority tasks becomes high priority bc the next job in the queue is high priority, to make sure nothing takes it away.
- In python2, xrange was more memory efficient than range. It wouldn’t create the full list, it would lazily generate them as you need. In python3, range is xrange.
- Python’s heapq.heapify will take a list and make a heap in linear time (in place too!). There’s also nlargest, nsmallest.
- pop(0) for first element, pop() for last.
- collections.Counter(arr).mostcommon(k) is very useful. It doesn’t have a nested sort, after count it’s arbitrary. To enforce alphabetical or something on the second level, just write it yourself.
-
- Letsencrypt sent out deprecation notices for ACMEv1. Starting June 1, only ACMEv2 standards will be accepted as sufficient certs. I’ll have to upgrade the
jrcs/letsencrypt-nginx-proxy-companion
(running certbot) before then.
- Smoked the 15lb dry-brined prime brisket. Garlic, mustard, paprika, pepper. Post oak and cherry. Honey.
- Ran out of 18″ aluminum foil. For a big cut like brisket, it definitely leaks with 12″. Ordered more.
- Plaid connects apps like venmo, robinhood to accounts like citi, boa. Visa just bought plaid for 5.3b.
- I get python weekly and founder weekly, but I’m subscribed to nosql and never get that newsletter?
- Airtable makes a kinda cool product. It’s a database/spreadsheet combined. Standard backend but offers a full frontend gui like google sheets. This exists in many stacks, but it’s usually 2 products.
- Kadane’s algorithm for max some of contiguous subarray.
- Basically at index i, consider all the subarrays up to index i, starting from every index before it. Get the max sum of those. Go to the next index. It’s just the previous answer plus the number at the new index, or the previous answer itself.
- local_maximum[i]= max(local_maximum[i-1], array[i] + local_maximum[i-1])
- Simple recursion problem.
- Rewrote some graph traversal algorithms. Raw BFS/DFS easy. For pre-order and post-order traversal (types of DFS), use recursion. Just add the node to `visited` before or after the recursive call. For level-order (BFS), use iteration.
- For python remember the distinction between the subprocess and multiprocessing modules. Subprocess is for calling other programs, running something else from the shell, etc. Multiproc is for adding concurrency to your python program.
- Boston is ~200 miles northeast of Manhattan.
- If you’re alive, you can breathe. If you can break, you can speak đ
- Threading is better for IO-bound tasks. Multiprocessing better for CPU-bound tasks.
- Read quite a bit about py3’s async capability. https://realpython.com/async-io-python/.
- Slightly different than the parallelism/concurrency of threading/multiprocessing. Its async features (like asyncio.sleep()) are self-aware of when they’re idling, efficiently giving the event loop to another coroutine during that time.
- Think of a standard web service, able to handle multiple requests in parallel. Asyncio is that capability, but for your program. It’s very different than multiprocessing, which has a given workload to do and wants to split it among many workers to make it complete faster.
- async def foo(): tells python to create a coroutine. Within that function, say await bar(), instruct the event loop to go work on something else until bar() returns.
- Websockets.
- These build on that exact same async infrastructure. Before py3’s asyncio, there was eventlet and gevent. Very similar principles.
- You basically have a route that’s decorated slightly different than a REST route, and it keeps a websocket connected to the client. Every time a new message appears, it will perform the logic in that route. Example: flask-socketio. Just @socketio instead of @app.route.
- You can also go low-level yourself and define an explicit `async def my_coroutine` with `await websocket.recv()`. The websockets module comes with a very basic server that can persist the connection. https://websockets.readthedocs.io/en/stable/intro.html.
- No matter what, this manifests itself eventually as a data structure in python. Think of it like a list, in memory, that an async service is continually appending to under the hood. Do whatever you want with it, it’s just a regular old list to your python app.
- Capsaicin is spread by birds. You might think seeds from chili plants don’t get spread around by animal scat because they don’t want to eat them, and you’d be right; with the exception of birds, who don’t have teeth and just swallow the seeds.
- Watched Shimmer Lake. Dwight!
- Worked again with javier, jack, and tara for the next rounds.
- Scheduled google for friday.
- More general prep. Checked linked profiles of Monday’s Amazon interviews. Watched about 10 system design videos, each ~30min.
- Lots from this channel for system design: https://www.youtube.com/channel/UCn1XnDWhsLS5URXTi5wtFTA.
- Locks.
- Even if the transaction isnât a single db statement (like a deposit) but has multiple steps (like a transfer), you can still lock. A transfer is a withdrawal + deposit + commission, for example. If ANY of those 3 steps fails, have it undo the whole chain.
- You can queue transactions such that if A has a lock while B tries, B will try again in a few seconds when the lock is freed. This is called retry logic. You can write it with TRY statements in SQL!
- The full queue of transactions to write is called a write-ahead log. Like a task manager built into the database.
- Most locks will only lock the scope that’s being changed (only that row). Some locks will disable write but allow read during the change. Some locks will disable both.
- Pessimistic lock = the standard one youâre thinking of. Lock a row, change some data, release the lock. The safest way. Better for joint sets of data, many conflicts.
- Optimistic lock = check the timestamp / version / hash at the start of the transaction, and then again when youâre ready to write back. Only write back if theyâre the some. Itâs less restrictive, and allows multiple people to read, but it basically favors fasters changes. Slow changes get clogged. Better for disjoint sets of data, few conflicts.
- Locks can be distributed, just like the databases themselves. You can have a cluster of redis nodes (direct duplicates for backup), each storing the mutexes for writes to the primary cluster. microservicenodes -> lockmanager -> caches, just like client -> loadbalancer -> microservicenodes.
- Locks have timeouts to ensure some dead node doesnât halt the system.
- Put timeouts on everything! Responses, db locks, everything.
- A single char usually takes up 1 byte, so a string of len 4 is the same as an int for a 32bit system. A full sentence, maybe 100bytes. A paragraph, maybe ~1KB. A large doc, maybe 1MB. Compression helps too.
- CAP = consistency availability and partition tolerance.
- P is how the system reacts when nodes canât talk to each other. Examples: two shards drop a connection, master fails the replication to the slave….
- The whole point is that in the presence of partitions (distributed systems), you can only choose one of (consistency, availability).
- Master-slave is AP. You might have inconsistency, where a write replication might take longer than the next read.
- Many equal nodes is CP. They might handle different tables (vertical partitioning), but they back each other up. This is a distributed datastore. Banking. REST. Locks on everything.
- Cassandra has something called “replication factor” which is how many other nodes to split the backup of a node on.
- Distributed transactions.
- Have the microservice itself (or a separate orchestrator/coordinator) keep a record of all planned queries/writes in the chain of the request. It needs to regard all of them as a single transaction. If any of the individual reads/writes to various other dbs fails, it needs to rollback the entire thing. You might hear this called “3-phase-commit” (can commit, precommit, do commit).
- Hash tables under the hood. You might think the keys are just assigned to registers, 0-2^32 or something. Then you hash the input and simply lookup by key. This would work, but if anything changed (like the # of total memory locations), you’d have to remap your table to the new modulo.
- Instead, just make the keys random numbers. Then during lookup, pick the first key that’s greater than your hash output. If you reach the end, just take the first element. Then, since you’re finding the closest neighbor instead of the exact key, it’s robust to remapping.
- While 50k requests/sec to a website is very large, to be considered âvery largeâ in the world of queries/sec you must go even higher. Large apps are over 1 million queries/sec.
- Caches.
- A global cache can be as big as terabytes for large websites.
- Cache latency should be <10ms.
- LRU is a very common cache eviction policy.
- Under the hood, a cache is just a message queue with an event loop that has a threadpool at its disposal (and access to ram).
- Fault tolerance in a cache is usually just regular snapshots to disk.
- Since caches are typically event-driven, you can also take the logs and just replay all the actions.
-
- Watched I Think You Should Leave with Tim Robinson. Some good sketches, some average.
- LSU national champs.
- Bought, trimmed, brined a $15 prime brisket.
- Since I have the tax/house loot sitting in my BOA, I’m temporarily eligible for preferred rewards. Not gonna get another card or anything, but there are some small perks like no ATM fees at non-BOA ATMs, etc.
- Paid the $53 parking ticket for the garage next to tower 12 last friday.
- HDFS = hadoop distributed file system. Part of the Apache Hadoop suite.
- Did a cloud product comparison review between the gigantic stacks of AWS vs Apache vs Azure.
- While I was working on the Netflix takehome assignment, the position was closed (over the weekend). Dunno if that meant filled or headcount-cancelled. The hiring manager emailed personally and said that she liked me and wanted to personally forward to another hiring manager. I looked through all the open engineering reqs in Los Gatos / Los Angeles and asked for the python runtime group.
- I really do like 23andMe.
- Forget privacy, murder charges, general data paranoia. Accountability is good. More data is good. And this data is about as fundamental as it gets.
- Leads to connections. Leads to drugs. Leads to wellbeing.
- It’s a fantastic business model. You’re paying them to give them data, which is their actual product.
- Average US credit score in 2019 was >700. Higher than I thought it would be.
- Google and FB building little towns of housing complexes near their headquarters to offer affordable housing (+ convenience, – work/life balance). https://medium.com/cxo-magazine/google-and-facebook-are-building-the-ultimate-perk-housing-3ec8ba3c4f6b.
- Current vegas money lines, niners and chiefs both favorites at -320.
- The most career interceptions is 81. There used to be a lot more. The most for active players is 35, Richard Sherman.
- Features like search autocompletion should have a response time <100ms.
- A trie (pronounced try) is a specific type of tree, where the nodes are characters.
- For the english alphabet with 26 letters, it’s just a tree with:
- Root node = null.
- Each node can have up to 26 child nodes, one for each next letter.
- Traversing down any path spells a word, (the path being the return string).
- With these properties, you can look up a word quickly. O(l+n), where l is the length of the prefix that you’re checking against, and n is the number of nodes underneath that trie subset (because you have to traverse all of them). Then you’d usually do a sort, for klogk more, where k is the length of the valid words you found.
- This is used for stuff like autocompletion / look-ahead text.
- Jeddah Tower will open this year and surpass Dubai’s Burj Khalifa as the tallest building, but then the Dubai Creek Tower should open the year after that and take the crown back. It costs over $1b. It will be over 1km tall.
- Remember to consider a timescale. If a particular computation is expensive, but the user doesnât need accuracy of the data down to a resolution of seconds/minutes, just compute it and cache it once an hour or something!
- Jeop GOAT day 4. Ken won!
- FTP = file transfer protocol.
- I didn’t know LinkedIn was the original creator of Kafka; Apache is simply the steward nowadays.
- Wanted to subscribe to highscalability’s RSS feed, but gmail doesn’t offer native support for RSS (anymore). This is crazy. I don’t want to have to create an account with a standard newsreader and use it as a middleman. I’ll just check the blog manually, but it’s a shame to have to do so in 2020.
- Bought a new food processor.
- Amazon called, basically said that hiring was a consensus, but they were considering bumping up from II to senior. They wanted to get more data points to confirm, so they requested a second, shorter, onsite interview with more senior developers.
- 2hrs instead of 6.
- Same format, half behavior and half technical for each hour. 1 would be another problemsolving question and 1 would be another system design question.
- The behavioral halves would be split among a specific 4 of the LPs: think big, invent and simplify, be right, hire the best.
- Overall, I feel good about this? It’s more work, but it’s the chance for an early promotion. 2hrs for a stressful onsite is easier than working for a promotion internally for a year. In the worst case, I hope there’s no offer retraction for the lower level if the second onsite goes poorly, for some strange reason. In general, I’m happy to see that the interviewers recognized some good principles and suggested higher leveling without my solicitation – the exact opposite of SpaceX :). Moving forward with gratitude, no matter what happens.
- Emailed Citadel and Google to let them know so that we can accelerate those tracks to stay in sync.
-
- 6 more leetcode problems.
- Remember that sets are mutable in python, and therefore are not hashable.
- If you need to solve a problem with nested lists/sets, and you need to check if an inner list is already in the outer list, it’s slow. It’s much faster in a hash table (set or dict).
- To solve, make the outer and set and the inner a tuple. You can leave the inner a list for whenever you’re modifying it, but then you can convert to tuple when checking/writing to the outer set.
- Amazon onsites. 6 1-hr interviews. Notes in drive. Enjoyed it.
- Jack mentioned that Citadel is still putting together the onsite panel, so it’ll likely be next week not this week.
- ZYME up 4% and TSLA up 11% today (to $530, w t f). ZYME had a -15% dip right after open, lasted for about 30min. Not sure what caused it, some afterhours behavior, who knows.
- Collisions. Two writes at the same time. Databases, both relational and in-mem, have locking mechanisms. Under the hood, there is a mutex that will keep transactions from overwriting each other (if itâs an ACID database). If you have two REST calls trying to book an airbnb, the db will confirm one as first and return the write transaction; the other will fail. The service should then tell that to the user.
- Remember if your DB doesn’t have native support for a date/datetime object (like redis), just convert to an integer via seconds-since-epoch.