• Petty released the sbsc winnings, $700 for 1st place, $50 for 10th. End of season party this saturday.
    • Path-planning is an np-complete problem. The traveling salesman is an np-complete problem. Dijkstra’s algorithm alone is solvable in polynomial time.
      • Traveling salesman is just visit every city in the country once in the shortest path. In other terms: visit every node in a graph once, where the edges can have different weights (like distances between cities).
    • ! to run shell commands from ipython.
    • Spoke with the google recruiter. Gave her my feedback from the other day, and combined with the interviewer’s fb, decided we’ll do a 2nd phone interview. This is good – I wanted another datapoint. We’ll do it next tuesday.
    • With py requests, can pass it in directly with ?key=value or via params={key: value}
    • You should typically use the rel links for next and prev when dealing with a paginated API, rather than just looping over an integer range and building the URL yourself. Sometimes the url will change.
    • Dentist and haircut.
    • Interesting report on the (primarily) medical state of psychedelics, and the effect on market investments we’ll see in years to come: https://www.reddit.com/r/investing/comments/emkhwt/2020_psychedelic_industry_insights_report/.
    • Went through a bit of highscalability. It really is a great blog.
    • Jitter can be a good thing. Sometimes it’s best to intentionally add a bit of entropy in. Helps avoid thundering herds.
    • Spent most of today on the Netflix take-home assignment. Private github repo.
    • breakpoint() is only available in py3.7 and beyond, it’s still import pdb; pdb.set_trace() for all vers before then.
    • Lots of system design practice today.
    • NoSQL vs SQL.
      • NoSQL is useful when you always need all information about that row. SQL better for select x, NoSQL better for select *.
      • Don’t need NULLs. Just skip keys in the nosql json. In this way, the schema is basically flexible per entry.
      • Good for analytics.
      • NoSQL is more expensive for updates, and doesn’t guarantee ACID.
      • NoSQL is more expensive for reads. While you can grab a whole blob easily, getting only one col across the whole db is slow.
      • JSON nesting is nowhere near as structured as the the relationships (FK, many-many, etc).
      • Joins are weird. Not easy.
      • Overall: if your data is always read/written in blocks, not individual cols, nosql is good.
      • Key-value is just key-value. Document nosql can have structure and nesting within (JSON usually).
      • Examples:
        • Document nosql dbs: mongo, couch.
        • Key-value nosql dbs: dynamo, redis.
        • Wide col nosql dbs: cassandra.
        • Graph nosql dbs: neo4j.
    • 10m call with Jack to set up the onsite with citadel, a few notes:
      • Will’s team is interested, but there are a few others as well (equities…). Interview might be at one site, but fulltime might be at the other.
      • You don’t need to have expertise in finance, but you need to show interest. The projects of last year are great examples.
    • ACID is a db transaction property for transactions. Atomicity, Consistency, Isolation, Durability. Financial institutions absolutely require this. Something like a metrics server wouldn’t.
      • Availability is another important metric, but isn’t part of ACID.
    • Can shard by ID, or by location, or whatever is the most convenient/divisive/common entry point.
    • JOINs get expensive if they’re cross-shard.
    • Master-slave architecture. Master gets all the writes, slaves update to match master every X sec/min/hours/days, slaves get all the reads.
      • If master goes down, one of the remaining slaves becomes master.
      • Still dumb that its formal name is master/slave in 2020.
    • Remember generators just return before they’re finished executing. Could be 3 lines straight with yield 1, yield 2, yield 3, but it’s usually a loop within the function. Just returns partial progress. Then you can iterate over the returns as they come, or call next(my_gen_func) to walk through manually.
    • Confirmed and planned the logistics for the amazon onsite on monday.
      • Need to print the NDA before then.
    • Consistency is very important. Do you want mutexes on everything read/write such that the data returned is always exact? This is much slower and much bigger. How much data can afford to be out-of-date? How much time must pass before data is considered obsolete?
    • If I scale a microservice horizontally, the machines need to talk to each other. RPC. If I scale vertically, it’s all local still. IPC. RPC is slower, and more complicated to manage. But it’s usually cheaper, and it’s boundless, and it’s more resilient to failure.
    • Remember: event-driven/pubsub/messagequeue/bus designs are worse with consistency, but better with availability. You’re distributing events and aggregating results. If you need higher fidelity responses, go with more of a request/response (timeout) design. Slower, but more integrity in consistent data.
    • Event-driven is easier to test, play, rollback, because it’s structured and sequenced. Kinda like react/redux.
    • In-mem solutions use snapshots to get persistence, but they’re obviously at a certain frequency and so you open risk for data lass.
    • Postgres has nosql features as well, it’s just used less commonly. You can store docs just like mongo.
    • Amazon 1hr onsite coaching call, general to all onsites.
      • Basic fundamentals. The one thing I usually have to remember is: Ask questions to clarify the problem. This especially applies to design questions, but can also apply to coding questions.
    • Amazon 30m final call, specific to me.
      • Schedule: (each 1hr)
        • 2 technical: Problem solving, DS&A.
        • 1 nontechnical with hiring manager. Customers, communication, dealing with adversity.
        • 1 lunch. Culture. Ask questions. Relax.
        • 1 technical: Logical and maintainable code.
        • 1 technical: System design. Scalability. Operational performance.
      • Each will have 2 leadership principle questions (behavioral) as well.
      • General:
        • Ask clarifying questions. Think out loud.
        • Start with an easy solution. Brute force, whatever. Then expand. Take bite-sized bites.
        • The interviewer is your coworker! If you start to panic, look at them like a collaborator.
    • Always consider priority. A bank withdrawal is much more time-sensitive than an email notification. This could affect the ordering in the message queue, the rate limiter thresholds on various notes, the load balancing, etc.
    • When realtime accurate data is cumbersome, you can trend or interpolate data, based on history, to give an estimate. This is true for something like the viewcount on popular youtube channels. Would take too much to show exactly, but is easy to show ballpark.
    • Thrashing – when a cache is inserting/deleting too quickly.
    • Going through this process of interview prep has been interesting. I feel like I’ve consumed what must equate to at least a handful of semesters in fundamental CS courses. This accessibility of information makes me wonder about the future of education. It’s been maybe 6 weeks and $0, whereas the university equivalent would have been a year and $30k. Will our grandchildren get in-person degrees? I’m not so sure this model is viable in the future.
    • Ordered more dry-erase markers for whiteboard practice at home. Arrive tomorrow.
    • Netflix splits every piece of content into small atoms, stored with different permutations of codecs and resolutions. That’s why viewing is seamless when your internet quality changes, or you select a new one; it will simply then fetch the right chunk instead of the whole movie again. The amount of chunks it preloads into the future is based on heuristics, by how much people typically jump during this movie. If a lot -> preload only a little. If a little -> preload a lot.
    • Did a practice system design from scratch for instagram/twitter.
    • It looks like neither grubhub nor yelp do group carts anymore? I remember loving this feature. You’d just send the link to anyone, they’d look through then menu and add, then be responsible for the finances individually when the order was sent.
    • You could store images in a db as blobs (binary large objects), but it’s almost always still best to put them on a file system elsewhere (CDN, s3, etc) and keep only a URL to the file in your db.
    • A client-server relationship is always initiated by the client. Request -> response. The protocol for this is http.
    • A peer-peer relationship can be initiated by either side. You can use a general socket (TCP) or a protocol specific to your use case. For chat (whatsapp, fb messenger, etc) the protocol is XMPP – extensible messaging and presence protocol.
    • It’s ok to have gut instincts. It’s even ok if they’re wrong. State them, and then state that you’re going to analyze them now. Do the verification out loud. A raw, transparent thought process is worth a lot.
    • Designing google search.
      • First, just think about keeping a big dictionary. As people add sites to the internet, it adds words and phrases to the dictionary. You have a rough mapping of what is out there.
      • Then think of it like a gigantic hash table, or an indexed db. Look for the word (or words) in the search, and return the direct results.
      • That might not be all though – look for misspellings, common associations, and other relatives. This is where association algorithms come in. They’re usually based on prior results. There are many other keys to consider: location, how recent, etc. There are also blacklists for known spam, risks, more.
      • Then, you have a list of possible options. How do you rank them? What do you show at the top of the list? This is another ranking algorithm. Again, usually based on prior results.
      • How do you manage the size of the dictionary? MapReduce. Basically shard the hash table by characters (since those are they keys that the user is typing), farm out to the all the appropriate nodes to their more-efficient subsearches, then combine the data back.
    • ~1 in every 7 strings typed into the google search engine has never been typed before!
    • Write-ahead logging is when the action is logged before it is performed. Think of it in the context of a database write operation – this handles the atomicity and durability principles (of ACID).
    • Third day of jeop goat.
    • Replication types. Consider a db mirror, or a master-slave.
      • Snapshot replications. Vompletely copy the full data over, like a backup or a pgdump.
      • Transactional replication. Replay the transaction on each slave.
      • There are more, but don’t worry a ton about them.
    • Think about if a system needs to be read-heavy or write-heavy. Something like twitter is much heavier on read operations.
    • An index is placed on a specific column in a table so that you can perform lookups on that column more quickly. It’s not the primary key, although you can think of the PK as an index if you know the ID, because that’s the search val. More generally, it allows an indexed column to be sorted efficiently on the backend, so that lookup is binary search (logn) instead of n. The downside: this extra data takes space.
      • Think of it like a dictionary (an actual physical book with word). Indexing it would mean adding pages for each letter in the appropriate place, with little tabs that jut out. You can find your section much faster, but they take a little space.
    • PUT will update the data in that spot or create it if it isn’t there. It’s idempotent. POST just passes data to the service. The service can do whatever it wants with it. It’s not idempotent. It’s more generic.
    • Citadel interview got a callback for onsites. Call tomorrow for scheduling.
    • YAML is a superset of JSON. Can do more, but is bigger. Looks better, too.
    • Testing pyramid: unit -> integrated -> e2e.
    • Netflix phone interview.
      • Great convo. Took a one-pager of notes. Just phone, no coding. 1hr.
      • Discussed the INTERNET SHIELD, market viability in the future of streaming, attrition, culture.
      • Her team directly aligns with my automation tools / devops / autotest / sx-setuptools experience.
      • Next steps: take-home assignment, build an app. I said I’d submit by Saturday.
    • DHCP for dynamic IP allocation.
    • Perf is a main linux profiler.
    • “Computers perform tasks. Machines should be solving problems.”
    • Practice problems:
    • Remember runbooks for ops/support. All stacks/apps should have a list of common tasks or inquiries, each with an explicit set of steps to respond with.
    • Slight tweaks to my resume. I still like my abridged resume more, although nobody else does. It’s just so concise: 20 bullet points to summarize my life so far.
    • TSLA is absolutely skyrocketing. Hit ~500 today. Its market cap is now 85b, which the highest for any american car company in history. I bought 2 shares on margin just to have fun with it. The emotional roller coaster is worth it, regardless of final profit/loss (since I jumped in ON a surge woohoooooo).
    • Remember margin trading on RH is interest-free up to 1k, then 5% after that.
    • Watched a lot of this guy’s channel on system design: https://www.youtube.com/channel/UCRPMAqdtSgd0Ipeef7iFsKw. I like em.
    • Athletes say a lot of dumb stuff on social media. It seems easiest to critique them for poor decisionmaking/tact, but in reality it’s probably closer to: giving a huge stage to people on a career path where stage presence is not a required skill.
    • Jeop GOAT day twoooo.
    • Google phone interview.
      • Was 45m. The guy was cold, didn’t really reply to anything or change tone. Accent was tough over the phone.
      • Didn’t introduce himself or the team, jumped straight into a coding question on the google doc.
      • We did a 2 subsequence comparison question. I solved it, but still had a bad taste in my mouth.
      • We didn’t discuss previous experience, scaling services, managing projects, writing apps.
      • Big turnoff for Google, when my interactions with all other folks from Amazon/Netflix/BMW/Citadel/GitLab/Disney have been very positive.
    • There’s an online shazaam service called acrcloud. https://www.acrcloud.com/identify-songs-music-recognition-online#record-div.
    • That old ACB call order which expired would not have broken even if it were filled. Coo.
    • Dis is still about the same plateau after the d+ surge, when I sold. Coo.
    • Was digging this article until they starting talking about “electromagnetic radiation” affecting your sleep, suggesting that you ground yourself before bed. Then on to light therapy and all sorts of good stuff: https://medium.com/swlh/sleep-like-a-pro-whats-the-deal-with-deep-sleep-and-how-to-get-more-46dad5da233d.
    • Interesting discussion in a drug from KRTX against alzheimers: https://www.reddit.com/r/investing/comments/elclxc/im_a_physician_and_im_long_krtx_dd_inside/. This sector is crazy, I like it. Being in the industry seems to provide a ton of inside info, more than relative to being inside others even.
    • Ricky Gervais’ opening monologue for the golden globes was great.
    • Practice problems:
    • Convertible arbitrage: taking a long position in a convertible security and a short position in the underlying stock. It’s a hedge tactic.
    • Coding interview with citadel. 1hr. Phone and coderpad. Prepped. Took a lot of notes during. 2 behavioral questions, 2 coding questions. Overall impression: good in both directions.
      • Said “see ya later” to hang up without even thinking twice about it, like I always do. Had some regret later; might have been interpreted as overconfidence or something. Need to pay attention to details, even ones this small.
    • Created a doc with all the resources I’ve used over the past month for interviews.
    • Generalized definition of a greedy algorithm/problem: start small and build the problem up, taking the (new) local optimum each time.
    • Went back over everything in gdrive/Notes/Career/*. Some fantastic refresher content in there. Mostly technical software, some workplace environment, some interview logistics. Being diligent about documentation my whole life has paid off.
    • Quicksort/mergesort/heapsort nlogn. Heapsort constant mem, others linear. Bubble slower time, const mem. Radix/bucket can be faster than the main 3 if n is small.
    • Anything hashable can be used as keys. All the immutable python built-ins qualify, including tuples (whereas lists don’t).
    • QR = quantitative researcher.
    • CDS = credit default swap. Kinda like an insurance policy on credit card debt, can get it backed by another investor for a price.
    • Supercontest.
      • Changed the home page (the root route /) to the leaderboard instead of your weekly picks.
    • Scheduled haircut for friday, before onsites.
    • First day of jeopardy GOAT tournament.
    • Explored http://highscalability.com/.
    • Looked up a BUNCH of resources on system design. I feel pretty good about behavioral and algorithm/datastructure questions now – system design is the last piece. Took a bunch of notes, did a few examples. Will focus on more in-depth ones over the next few days for on-sites, but I feel ready enough for the brevity of phones.
    • Went over 27 common “tell us about a time…” questions and wrote responses to each. I then tried a ~60s spontaneous response to each. There are definite overlaps. A good core of 10 responses can apply/stretch to all 30.
      • I did generics, and then I did concretes for Amazon’s leadership principles. They’re obviously the same set/class of responses, just need to have them in the toolbox ready to connect.
    • Pubsub research. Not great in consistency. Latency means timing-related transactions are not great. Subscribers might receive messages at different times. Better for event-driven architectures like games.
    • JSON serialization/deserialization gets less efficient as the size gets larger and larger. This is where you might pick another data transport for your API, like protobuf. You can have an endpoint return a protobuf object instead of a json object. Very similar.
      • Typically, use json when talking to a browser. Use protobuf when talking to a service.
    • Practice problems.
      1. https://www.hackerrank.com/challenges/max-array-sum/problem. DP. Iterate through the array and notice patterns. What info do you need to hold on to? What are the possible answers? Keep track of the max by index and just walk through.
      2. https://www.hackerrank.com/challenges/abbr/problem. Easy to get most the cases, but hard to get all. Remember for DP you can sometimes build a 2D matrix of 1s and 0s and traverse it (like longest common substring/subsequence).
      3. https://www.hackerrank.com/challenges/candies/problem. An interesting one. Iterate through an array, once forward, once backward, and count the rising edges. Match them by index and take the max.
      4. https://www.hackerrank.com/challenges/decibinary-numbers/problem. Didn’t do it. Wrapped my head around it, but didn’t implement.
      5. https://www.hackerrank.com/challenges/min-max-riddle/problem. A cool problem. What mapping do you need to solve this? Can find the INVERSE of that mapping?. Swapping keys/values in a dict is easy.
      6. https://www.hackerrank.com/challenges/castle-on-the-grid/problem. Used a graph. Build an adjacency matrix of the 4 cardinal directions from each node and remove the ones that are off the grid or on an X. Then it’s simply a shortest-path problem. Use BFS (queue).
      7. https://www.hackerrank.com/challenges/poisonous-plants/problem. Got the easy solution.
      8. https://www.hackerrank.com/challenges/ctci-bfs-shortest-reach/problem. When using BFS for shortest path, remember to slice the list for each neighbor, so you’re not overwriting the root path. And just logically think through everything. Created visited set() and queue deque(). while queue. Get path. Get node. If node == end, exit. If node in visited, continue. Else, lookup the next nodes in the adjacency matrix, add to path, then append to queue.
      9. https://www.hackerrank.com/challenges/find-the-nearest-clone/problem. Getting good at the graph problems. It was a less-experienced topic for me before this.
      10. https://www.hackerrank.com/challenges/ctci-connected-cell-in-a-grid/problem. Could do DFS or recursion or iteration.
      11. https://www.hackerrank.com/challenges/matrix/problem. Fun problem. There’s actually a bug in a unittest in this one.
      • Done with the hackerrank kit. I’ve probably done 100 practice problems not in total across the platforms.
    • Confirmed the phone interviews this week.
    • Dijkstra’s algorithm is used to find the shortest path between two nodes in a graph with weighted edges. If you assigned no weights to edges (such that they’re all the same, 1), it reduces to plain old BFS!
    • BFS is exponential (based on how many neighbors each node has) in time and memory. It’s great for small problems, but solving something with a state space like a rubik’s cube is huge.
    • Sets and lists are unhashable in python. For most problems, you can just convert the (i, j) coordinates to a string or something similar.
    • collections.deque is faster. This whole package is great for performance. defaultdict is awesome. Counter is also really useful for counting stuff (like frequency of char in str, or others).
    • When using BFS for shortest path, put PATHS (lists of nodes) instead of nodes into the queue.
    • Append the new nodes to the right of a queue or a stack. But in a queue, pop the node off the left to search next. In a stack, pop the node off the right.
    • Run, pull, sauna, meditate.
    • Checked the snail mail. Probably 100 items. Everything was complete garbage except for 1 dmv registration. The signal to noise ratio is laughable (and wasteful of paper / people resources).
    • Renewed the BMW registration. $150 total. The DMV charges a 2.1% fee for online processing. Fuck them. It should be a discount for online logistics, which have significantly reduced overhead.
      • Even if it’s Elavon, you need to front that.
    • For number systems, the exponents always increase from 0 to n-1. This is known.
      • For binary, the base is 2, so the digits in the number (the constants before the base-exponents) can only be those 2:
        • [0, 1]
      • For decimal, the base is 10, so the digits in the number (the constants before the base-exponents) can only be those 10:
        • [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
      • I had thought about the base/exponent before, but never the limited range of digit values.
    • For previous-project questions, use STAR: Situation, Task, Action, Result.
    • My Amazon interview is actually at the Santa Monica location, which is more convenient! The hiring managers can pass offers between sites after, if need be.
    • Took a ton of “give an example of …” notes. Moved gdrive docs around, cleaned.
    • Database partitioning. Horizontal partitioning is when you keep the table the same and split groups of rows across different machines. Each is usually called a shard. Vertical partitioning is when you split columns off into a new table. Both improve performance, making it distributed (but more complex).
    • Collections.deque is pronounced “deck” – it stands for double ended queue.
    • Of course threads in a process have their own stacks but share heap. Separate processes get their own stack and heap.
    • Disjoint sets. Another useful data structure. Usually implemented as trees. Two operations: find() and union().