• Monday

    • Peter Thiel’s Stanford talk.
      • Competition is for losers.
      • Be the big fish in a small market, then grow the market (rather than enter as a small fish in a big market).
    • 3 on restaurants/drugstores. Temp benefits with DoorDash + Lyft. 200 after 500 in first 3mo.
    • Read a bit about fund fees, different bonds, windfalls, tax, more while making fried chicken (remember cornstarch + baking powder for extra crunch).
    • Finished AWS re:Invent.
      • Deep dive on Amazon EKS.
        • Vanilla k8s, other than the occasional security patch. They manage nodes, OS, kubelet, CRI, AMI (plus the full standard k8s control plane). Upgrading the cluster k8s version is turnkey (then test).
        • AWS Key Management Service, AWS Certificate Manager, AWS Secrets Manager. No need for Vault or others.
        • Maps IAM roles to service accounts, then runs pods as that service account (not 1-1 mapped SA <-> NS). RBAC and cluster separation for different tenants, of course.
        • No ssh, access nodes through AWS Systems Manager.
        • They scale the cluster level, even up to tens of thousands of nodes. Also intelligent scaling of stuff like maxRequests. They have a custom next-gen autoscaler called Karpenter (although HPA/VPA/CA are available). https://github.com/aws/karpenter.
        • Supports OpenTelemetry, Fluent Bit, Prometheus, and many others for observability. Flux for GitOps (although Argo is better).
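
A minimal sketch of the IAM-role-to-service-account note above, assuming the standard `eks.amazonaws.com/role-arn` annotation. The names, namespace, image, and role ARN are hypothetical placeholders, and the manifests are built as plain dicts rather than applied to a cluster.

```python
import json


def service_account_manifest(name: str, namespace: str, role_arn: str) -> dict:
    """Build a ServiceAccount manifest annotated with the IAM role to assume."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Annotation that associates the service account with an IAM role.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


def pod_manifest(name: str, namespace: str, service_account: str, image: str) -> dict:
    """Build a Pod manifest that runs as the given service account."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "serviceAccountName": service_account,
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    sa = service_account_manifest(
        "reports-reader",
        "team-a",
        "arn:aws:iam::111122223333:role/reports-reader",
    )
    pod = pod_manifest("reports", "team-a", "reports-reader", "example/reports:1.0")
    # kubectl accepts JSON as well as YAML, so json.dumps is enough here.
    print(json.dumps(sa, indent=2))
    print(json.dumps(pod, indent=2))
```

Several service accounts can live in one namespace, each with its own role, which matches the "not 1-1 mapped SA <-> NS" note.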
      • Kubernetes at AWS: Strategy, road map, and vision.
        • Combining with AWS Outposts, remember you can run EKS on-prem.
        • Supports all the usuals. Helm, Spinnaker, Istio.
        • Bottlerocket is the AWS OS built specifically for containers.
      • Up-level your container image security with the latest from Amazon ECR.
        • Managed container registry. Integrates with all the expected AWS services: EKS, ECS, EC2, Fargate. Currently ~8B images downloaded per week.
        • Scalable, storage-wise. Runs security too, scanning all images (Amazon Inspector). This runs continuously, not just on push. Not just OS, but all layers of images. And can sign/verify images.
      • Delivering code and architectures through AWS Proton and Git.
        • Proton handles IaC (Terraform), pipelines (Jenkins), and observability (Prometheus). Offers templates for devs to start, apply, verify against, and deploy.
        • Some template verification offerings as well.
      • Amazon Builders’ Library: Operational excellence at Amazon.
        • Sending “Ops Win” emails, good milestones with large audiences to encourage ops culture.
        • Retros and postmortems, regularly.
        • Go over prepared content together, but also have free time to look at graphs as a team. Open your Datadog dashboards and ask for insights.
      • Best practices for securing your software delivery lifecycle.
        • Security testing (in your CI pipelines) must include static and dynamic security analysis.
        • AWS CodeArtifact, Amazon CodeGuru, AWS CodeBuild, AWS Systems Manager Parameter Store, AWS CodeDeploy, and Amazon ECR (aforementioned container scans).
        • Integrate with CodeReview interface, and use ML to automatically suggest changes.
      • How to reuse patterns when developing infrastructure as code.
        • AWS Cloud Development Kit and AWS CloudFormation. Competitor to Terraform.
        • Create reusable/sharable components/modules.
      • Automating cross-account CI/CD pipelines.
        • Not much.
      • Using feature flags to avoid downtime during migrations (LaunchDarkly).
        • New features, debug logging, heavy loadpaths, subjective canaries, switching databases, more.
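
A rough sketch of the "switching databases" use case above: a percentage rollout flag that routes a stable subset of users to the new database. This is not the LaunchDarkly SDK; `bucket`, `is_enabled`, `choose_database`, and the flag key `use-new-db` are hypothetical names for illustration.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int) -> bool:
    """Return True when the user's bucket falls inside the rollout percentage."""
    return bucket(flag_key, user_id) < rollout_percent


def choose_database(user_id: str, rollout_percent: int) -> str:
    """Pick the database for this user based on the hypothetical flag."""
    if is_enabled("use-new-db", user_id, rollout_percent):
        return "new_db"
    return "old_db"


if __name__ == "__main__":
    # Raise the percentage gradually; set it back to 0 to roll back without a deploy.
    for percent in (0, 10, 50, 100):
        routed = sum(choose_database(f"user-{i}", percent) == "new_db" for i in range(1000))
        print(f"{percent}% rollout -> {routed} of 1000 users on new_db")
```

Hashing the user ID keeps each user on the same side of the flag between requests, and a user enabled at a low percentage stays enabled as the percentage grows.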
      • Slack is the digital HQ for AWS developers and DevOps teams (Slack).
        • Incident management, organized involvement, anyone can spectate. Much agreed.
        • Standups, PR activity, workflows, emoji responses, everything else. All known, but all good.
      • On AWS, details matter: Why full-stack observability wins (Splunk).
        • Extract with OpenTelemetry. Primarily traces, metrics, logs. SDKs for languages, collectors, UI, infra.
        • Processors/filters to exclude secrets.
        • Investigate path: Errors/latency at the mesh -> zoom into node/app/service -> traces and find bottleneck -> check logs for that.
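
A plain-Python sketch of the "processors/filters to exclude secrets" idea above: scrub sensitive attribute values before telemetry leaves the process. It does not use the OpenTelemetry SDK or collector configuration; the key pattern and the `redact_attributes` helper are assumptions for illustration.

```python
import re

# Hypothetical pattern for attribute keys that likely hold secrets.
SENSITIVE_KEY = re.compile(
    r"(password|secret|token|api[_-]?key|authorization)", re.IGNORECASE
)

REDACTED = "[REDACTED]"


def redact_attributes(attributes: dict) -> dict:
    """Return a copy of the attributes with sensitive values replaced."""
    return {
        key: (REDACTED if SENSITIVE_KEY.search(key) else value)
        for key, value in attributes.items()
    }


if __name__ == "__main__":
    span_attributes = {
        "http.method": "GET",
        "http.request.header.authorization": "Bearer abc123",
        "db.password": "hunter2",
    }
    print(redact_attributes(span_attributes))
```

Filtering on keys is the simplest approach; values that embed secrets (for example, a URL with a token in the query string) would need value-level matching as well.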
      • Intentional and empathetic observability (Datadog).
        • Don’t start with a ticket titled “Create dashboard”. Start with a metric of concern and focus on it, deliberately for outcomes.
          • Kinda agree. Anomaly detection and generics are much better at insights now than they used to be. Specific ones aren’t always better, and are often biased.
        • Put dashboards in pager description, etc. All the expected. Need a consistent reaction, especially with new hires.