• Puppeteer is a popular headless-browser automation library used for webscraping, like selenium.
    • HN/MD.
    • Completely reorganized all of my files in Google drive. This took a couple hours.
    • Watched deliverance. It was hyped as one of the most disturbing movies of all time. It was average. The movie was 2 hours long and crawled. There were essentially 3 scenes: rape, river peril, and cliff climbing. It could have been 60 minutes shorter. It was neither shocking nor provocative. The banjo duel was the best part.
    • Ordered a single person tent and sleeping pad (totaling $125, but they’re good quality and reusable) for lightning in a bottle.
    • I get a couple recruitment calls a day. Considering taking my number off linkedin.
      • They’re starting this new tactic of “we’re looking for the best of the best for <X> role, can you recommend one of your colleagues who is looking for an amazing new opportunity?” They manipulate the tone by flattering your altruistic side, seeming informal, and taking pressure off the real focus before circling back to recruiting you.
    • Supercontest.
      • Played around with multiple docker-compose, splitting the dev/prod envs into separate files rather than separate services within the same file. I prefer it the way I had it.
      • Had to recalibrate my consideration of the reverse proxy. I’m not adding an nginx container to then forward traffic to multiple other containers running nginx as the server frontend for other app containers. I’m simply taking the existing nginx and certbot containers that serve supercontest, and I’m swapping them for nginx-proxy and letsencrypt-nginx-proxy-companion so that they can be the webserver for multiple service domains (running in separate other containers).
        • Those two containers are therefore abstracted outside of the supercontest project. They probably won’t be in a compose file. They’ll just be a command to run prior to starting the supercontest containers, with the sc compose file just having the app and the db.
        • This is a wonderful compartmentalization, in both technical and general terms. Service administration has advanced wondrously in the past decade.
    • Created a repo for my rc files and uninstalled dejadup: https://github.com/brianmahlstedt/config.
    • Supercontest
      • Remember, you have to docker-compose down before running the tests on the host. Otherwise, the app will not be able to start (socket already in use).
      • Added tests for http auth and the graphql endpoint. CSRF is disabled in the python tests, so I gave a full client example in the readme.
      • Stopped using relays and connectionfields, opting instead to just use graphene’s List type. You lose some automatic pagination and other cursor capabilities, but querying becomes a lot simpler – you don’t have to specify edges and nodes every time. As it grows, switching back would be very easy.
      • Deployed to prod and closed #41 after merge. The `make deploy` convenience target works.
      • Graphiql in prod was missing the csrf token. I ensured SC_DEV worked and the csrf protection was initialized.
      • Changed the makefile recipes for ansible to use `--key-file` at the command line instead of requiring that ssh-agent and ssh-add be run beforehand.
      • Added csrf_protect back to the app in dev mode, then exempted the graphql view. This is just wrapping it with the func (instead of the decorator), like login_required(csrf_protect.exempt(GraphQLView.as_view())).
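        • A minimal sketch of that wrapping pattern with plain stand-in functions (the real `login_required` and `csrf_protect.exempt` come from the Flask extensions; these stand-ins just show the decorator-vs-wrap equivalence):

```python
# Stand-ins for Flask-User's login_required and Flask-WTF's csrf_protect.exempt,
# to show that decorating and wrapping are the same operation.
def login_required(view):
    def wrapped(*args, **kwargs):
        # ...a real implementation would check the session here...
        return view(*args, **kwargs)
    return wrapped

def csrf_exempt(view):
    # extensions typically just flag the view so the before-request hook skips it
    view.csrf_exempt = True
    return view

def graphql_view():
    return "data"

# what app.add_url_rule(..., view_func=...) would receive:
protected = login_required(csrf_exempt(graphql_view))
```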
      • Right now, my graphql resolvers just return everything. I don’t add nodes and fields for people to filter on certain values. I can later, but right now I just expect clients to handle it all.
      • Emailed the group with the graphiql link and the python query example.
    • docker top can be used to check processes in the container.
    • You can check the environment of an already-running process with `cat /proc/<pid>/environ`. Super useful.
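      • The entries in that file are NUL-separated rather than newline-separated; a small parsing sketch (the demo blob stands in for a real /proc read):

```python
# /proc/<pid>/environ is a NUL-separated blob of KEY=VALUE entries.
def parse_environ(blob: bytes) -> dict:
    return dict(
        entry.split(b"=", 1)
        for entry in blob.split(b"\0")
        if b"=" in entry
    )

# on Linux you would read it like:
# with open(f"/proc/{pid}/environ", "rb") as f:
#     env = parse_environ(f.read())

# offline demo blob:
env = parse_environ(b"PATH=/usr/bin\0HOME=/root\0")
```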
    • Removed the ps1 stuff from tmux (it was exiting immediately on startup) and removed the ssh-agent stuff from .bash_profile (I explicitly call out the ssh key in my makefile calls to ansible).
    • Nginx reverse proxy.
      • Multiple websites in different containers on the same machine.
      • First create the umbrella network on the host: docker network create nginx-proxy
      • Then start the reverse proxy container which uses this network.
      • Then start the actual service containers with this network as well. These must have 3 things in their docker-compose yamls:
        • Expose port 80 on the service
        • Add the nginx-proxy network
        • Add the VIRTUAL_HOST env var for the domain.
      • There’s a companion container to handle letsencrypt as well.
        • Add a few more volumes to share certs between the containers.
        • Add the LETSENCRYPT_HOST (and _EMAIL) env vars for the domain.
      • Didn’t finish this, but will resume tomorrow.
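      • The three requirements above would look roughly like this in the service’s compose file (service name, image, domains, and email are placeholders):

```yaml
version: '3'
services:
  app:
    image: myapp:latest                  # hypothetical service image
    expose:
      - "80"                             # expose port 80 on the service
    environment:
      - VIRTUAL_HOST=example.com         # nginx-proxy routes by this domain
      - LETSENCRYPT_HOST=example.com     # companion: domain to certify
      - LETSENCRYPT_EMAIL=me@example.com # companion: cert contact email
networks:
  default:
    external:
      name: nginx-proxy                  # join the umbrella network
```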
    • Generalized my digitalocean project name and droplet name to MyProject and MyDroplet, since they host multiple services. Added the A and NS records for my second domain (bmahlstedt.com), so digitalocean’s nameservers direct to both my domains instead of godaddy. The registrar updated it within like 60 seconds, much faster than last time.
      • Since I haven’t finished the reverse proxy yet, https://bmahlstedt.com points to the supercontest application lol. This makes sense, as I’m not VIRTUAL_HOST routing the traffic by domain yet.
        • This could be used easily to point multiple domains at the exact same service/site.
      • It’s also not ssl trusted and red, which makes sense since I haven’t certified this domain yet.
      • Bazel.
        • WORKSPACE file defines the root. File(s) named BUILD within that root define the rules, pointing at the input sources and defining the outputs. You can have multiple BUILD files; each defines a “package” for bazel. They can depend on each other (you need to add “visibility” in the build file), and each can have multiple targets.
        • bazel build //path-to-package:target-name
        • Say you have a .cc file that prints hello world. Building that target with cc_binary would add it to <workspace_root>/bazel-bin/main/hello-world, which you can then call whenever you want.
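        • For that target, the BUILD file would be roughly (standard cc_binary rule; the paths assume the Bazel tutorial layout):

```python
# <workspace_root>/main/BUILD (Starlark)
cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
)
```

Then `bazel build //main:hello-world` produces bazel-bin/main/hello-world.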
        • bazel-bin, bazel-genfiles, bazel-out, bazel-* are all just symlinks (in your workspace root) to ~/.cache/bazel.
        • You can query dependencies of your targets: bazel query --output graph --nohost_deps --noimplicit_deps 'deps(//main:hello-world)'
        • Installed graphviz and xdot, common viewers for many things (including bazel dependency graphs).
          • http://www.webgraphviz.com/ is an awesome browser viewer, just copy the text output from the command line. Or, pipe it to xdot at the command line.
        • The value here is the entire tree. Everything is a file, and the entire dependency graph is known. Therefore, building outputs (binaries, whatever) can be optimized. When outputs need to be rebuilt, only the inputs that have changed need to be rebuilt.
        • For a language like python that isn’t built (compiled) manually, but rather interpreted, this has a lot less value. There are four standard python targets: py_binary, py_library, py_test, py_runtime.
        • Looked up some more python/bazel suggestions, watched https://www.youtube.com/watch?v=9mhmGcR6CPo.
        • Ultimately, not using this for supercontest or any of my other projects. Simple GNUmake and sx-setuptools are wonderful.
        • There is value in a monorepo setting, but the hardest part is getting the dependency resolution down to the file level instead of the python package level.
          • Doing this fully becomes impossible, because third-party packages will be vendored and you can’t specify all of those down to the file level.
          • If third-party packages started defining as bazel packages instead of python packages, we could get somewhere.
        • This is all an attempt to define a language-agnostic packaging standard that ultimately just defines file inputs and file outputs.
        • Bazel users absolutely love the word hermetic. It means airtight, people.
      • Remember, compiling is just translating to a lower-level language (like assembly, bytecode, machine code…).
      • Some nix reminders.
        • inode is a data structure. It stores metadata like owner, perms, parent dir, last modified, etc. It does not store filename or the actual data in the file.
        • Hard links are basically copies. They contain the data. Can only hard link files, not dirs. Same inode. Must be on same filesystem.
        • Soft links (symlinks) are basically shortcuts. They do not contain the data. Can soft link dirs or files. Different inodes. Can cross filesystems.
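        • A quick demonstration of those inode rules (Linux; uses a tempdir so everything stays on one filesystem):

```python
# hard links share the target's inode; symlinks get their own.
import os
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "data.txt")
with open(target, "w") as f:
    f.write("hello")

hard = os.path.join(d, "hard.txt")
soft = os.path.join(d, "soft.txt")
os.link(target, hard)     # hard link: contains the data, same inode
os.symlink(target, soft)  # soft link: just a pointer, different inode

same_inode = os.stat(target).st_ino == os.stat(hard).st_ino  # True
own_inode = os.lstat(soft).st_ino != os.stat(target).st_ino  # True
```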
      • To nest bullets in github markdown, leave the hyphen and just put 4 spaces in front of it.
      • $PS1 is a linux variable that defines the custom shell prompt. It’s different within tmux vs outside, hence the lack of color. Tried the top 5 solutions to fix this, none worked. Messed with a ton of bashrc and tmux.conf.
      • Nginx can directly serve multiple websites (domains) from the same machine. If you are running your services in a container, then you can also use nginx on the host as a reverse proxy to forward traffic to the appropriate containers (where nginx again can be the server for the app-specific request).
      • Bought bmahlstedt.com for $21 (2yr contract) through GoDaddy, same as southbaysupercontest.
      • If a website tells you to disable your adblocker, you can often just set style="display:none;" on the banner and then change the background color back to white or increase brightness.
      • GraphQL.
        • There are a few places in my application where I translate an email to an ID, an ID to picks, picks to scores, etc. GraphQL should be able to help quite well with this over-fetching that REST is vulnerable to.
        • Was created at FB in 2012, earlier than I thought.
        • graphene and graphene-sqlalchemy are two python packages to aid in use with graphql models. flask-graphene is the extension to add the /graphql view. gql is the client.
        • Added the graphql view, with the query schema wrapped around my existing user/pick/matchup models.
        • Created the environment variable SC_DEV and set it to 1 in docker-compose for app_dev. This skips csrf protection and enables graphiql in the browser.
        • Wrapped the view_func with login_required() for add_user_rule, rather than decorating it like a normal route. You now need to login to hit the graphql endpoint, even programmatically.
        • In graphiql, ctrl-space will autocomplete with an option dropdown. ctrl-enter will execute the query.
        • You can then query from the command line with curl at /graphql?query=<>
        • You can then query from python with gql.
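        • A sketch of both query shapes with just the stdlib (the endpoint and field names are made up for illustration):

```python
# build the curl-style GET url and the programmatic POST body for a
# graphql query; example.com and the users/email fields are hypothetical
import json
from urllib.parse import urlencode

query = "{ users { email } }"

# curl "https://example.com/graphql?query=..."
get_url = "https://example.com/graphql?" + urlencode({"query": query})

# what requests sends for json={"query": query}
post_body = json.dumps({"query": query})
```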
        • Since the app has direct access to the database, sqlalchemy is fine to perform internal app queries. To go through graphql for the app itself would be weird and inefficient: python -> http through view -> python.
        • I am intentionally not adding mutations. This is a read-only interface for users to mess with the db.
        • Graphiql is an extremely useful interface for users to query the db. I had to do some fancy stuff to extend csrf/auth to the graphql endpoint, but I was successful.
        • Added two tests. One verifies that you can auth with the app via basic requests + csrf token (rather than with selenium). The second verifies that the graphql endpoint can return data programmatically. This was simply achieved with json={'query': query}, where query is a docstring with the same content you’d enter into graphiql. Didn’t end up needing gql (because I couldn’t really use it without hacking my auth mechanism for csrf in).
        • Ended up enabling graphiql for production, since it’s protected by auth anyway.
      • Github offers an API to query their data with graphql: https://developer.github.com/v4/.
      • Medium obviously collaborates with freecodecamp.org and codeburst.io.
      • Alexa (the web-traffic company, not the amazon assistant) monitors internet traffic. They rank the most popular sites: https://www.alexa.com/topsites. In the US the top 24 are: google, youtube, facebook, amazon, wikipedia, reddit, yahoo, twitter, linkedin, instagram, ebay, microsoftonline, netflix, twitch, instructure, pornhub, imgur, live, craigslist, espn, chase, paypal, bing, cnn.
      • JWT = json web tokens.
      • Extremely useful for programmatically repeating a manual browser request (like a login): open chrome devtools, perform an action, then go to the network tab, right click the request, copy as curl, then convert to python requests with https://curl.trillworks.com/.
      • It totally depends on the service, but selenium should be able to login for all because it’s closest to a real user. For direct auth with requests, the server can expect whatever it wants. Some require certain cookies (which you can get with a naked request then session.cookies.get_dict()). Supercontest requires a csrf_token to be passed with your credentials, that’s it. Make a request, save the csrf token from the response, then hit /user/sign-in with your creds and the csrf token.
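        • A sketch of that flow (the url, form fields, and sample html are assumptions; the regex demo runs offline against a snippet like Flask-WTF renders):

```python
# scrape the csrf_token out of a rendered login form, then (commented)
# post it back with the credentials
import re

def extract_csrf(html: str) -> str:
    """Pull the csrf_token value out of a login form."""
    match = re.search(r'name="csrf_token"[^>]*value="([^"]+)"', html)
    if not match:
        raise ValueError("no csrf_token found")
    return match.group(1)

sample = '<input id="csrf_token" name="csrf_token" type="hidden" value="abc123">'
token = extract_csrf(sample)

# the live flow with requests would be roughly:
# session = requests.Session()
# page = session.get("https://example.com/user/sign-in")
# session.post("https://example.com/user/sign-in",
#              data={"email": user, "password": pw,
#                    "csrf_token": extract_csrf(page.text)})
```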
        • MD. Articles on ML and React component testing.
        • Jasmine.
          • Ahhhh, played with jasmine a bit more. “$ is not defined” and such is because I was using jasmine-node. jQuery requires the browser’s DOM api, which node does not have. You have to use the html runner for jasmine (so you can add it as a usual script, from a cdn or otherwise).
          • Here is a library that allows you to load html/css/other fixtures into the jasmine context: https://www.npmjs.com/package/jasmine-jquery.
          • You could still just run tests with a headless browser to exercise the frontend, which I’m already doing. It’s just a bit easier to unittest js functions directly with jasmine.
          • Opting to not pursue this further for now, coverage is already sufficient.
        • Installed pkg-config zlib1g-dev openjdk-8-jdk bazel
        • ipython was busted because of an incompatible version of prompt-toolkit (probably from the jupyter install yesterday), so I reinstalled that package.
          • Remember at the ipython prompt you can type ! before a command to execute something in the system shell (useful for pip installations and such).
        • Lightning in a bottle schedule was released. Didn’t really look at it much, just a quick skim.
        • Webscraping.
          1. (preferable) Use an exposed API, with structure that is designed for programmatic requests of data.
          2. HTTP with form assistance. Use a library like mechanize or robobrowser to handle all the browser defaults for the form fields you don’t care about (everything except un/pw). This handles session.
          3. Pure HTTP. Use devtools to watch the network activity of your manual request. Copy all the form data. Use a requests session to persist returned cookies and other things across subsequent requests.
          4. Selenium. Simulate the frontend and perform your usual data entry / clicking.
        • Pharma.
          • Logged into new.andanet.com and searched for some drug prices, then tried to fetch the data programmatically and serve it in an aggregate marketplace across a few vendors.
          • Mechanize doesn’t support py3 and robobrowser is newer, so I used that. Both robobrowser and pure requests would not return the login form. “Your current web browser session has been closed due to inactivity. Please login to create a new session.” This is likely because neither uses an actual browser, which is an interesting deterrent to programmatic interaction. Let me try with Selenium next.
          • Selenium was able to successfully log in. The next problem is the 2FA. It emails a verification code. I could automate this with Selenium as well, fetching the code from email and clicking the second submit, but then we’d need creds to the email account as well. Before trying that, I want to see if there’s an easy way to emulate a trusted device.
          • There are many parameters a server could use to determine a valid session. It’s more than just cookies. Could be MAC, could be timing-related, could be much more. It’s a hidden decision by the service that you couldn’t figure out without brute forcing. The client can’t know.
          • Instead, I’ll try changing the email to mine and actually following the 2FA flow programmatically.
          • That worked!
          • webdriverElement.click() can fail when the element is covered by other elements, display is off, etc. To force the submission, use webdriver.execute_script('arguments[0].click();', element).
          • It looks like I only needed to confirm 2FA once. Even if I close python and the session, selenium is saving something somewhere that is deemed by the anda service as the same trusted device. Could be IP, who knows. This is great news! Could be different for other vendors though.
          • Successfully webscraped through 2FA.
          • Moved this logic from a script into an app. flask-table. Just a search bar and an updating table.
          • Demo marketplace, with only 1 vendor, is complete.
          • Used query params (?q=<>) instead of variable args in the route. Just request.args.get(). Very simple and clean.
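            • What request.args.get("q") boils down to, sketched with the stdlib (the url and param are hypothetical):

```python
# parse ?q=<> from a url the way flask's request.args does
from urllib.parse import urlparse, parse_qs

url = "https://example.com/search?q=ibuprofen"  # hypothetical route
args = parse_qs(urlparse(url).query)
q = args.get("q", [""])[0]
```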
          • Updated link action based on the value of another (input) element, with just HTML. Easy.
        • Github shows file details now: i.e. what percent python, what percent css, etc.
        • ctrl-alt-shift-r will start/stop screen recording on ubuntu. Amazing. It saves to .webm files, which chrome can open and play.
          • The ribs yesterday ended up VERY spicy. The ghost pepper sauce (bhut jolokia) is 150k scoville heat units. I used a lot. Also made sweet potato pancakes properly today.
          • Game of thrones was good, but the resolution was too simple. No warging, no myths, no allegiance swapping, no surprises, bad battle tactics, no legendary explanations, no major deaths, etc. The night king isn’t apparently the primary antagonist. We’re gonna end politically with a human vs human war.
          • Python at Netflix: https://medium.com/netflix-techblog/python-at-netflix-bba45dae649e. Their CDN for all media is called Open Connect, but a lot of their service infrastructure is Python (load balancing, ML, automation, monitoring, alerting, infosec, marketing, metrics).
            • AWS for hosting, spinnaker for deployment, atlas for monitoring, genie for job execution, winston to pull it all together.
            • BLESS = their SSH certificate authority.
          • Jupyter
            • I’ve never really found a need for Jupyter notebooks, but they’re becoming more essential for data scientists.
            • Features
              • You can run code, visualize output, execute and tinker all in the browser, and more.
              • You can basically create parametrized notebooks as templates to allow inputs and output configuration, across many languages (not just python!).
              • You can add notes, diagrams, and more surrounding the code.
              • There’s an API for data in and out of notebooks, for integration with actual services.
            • I installed it and played around (via pip into my sys python).
              • You can edit local files in the browser, run them, open a generic terminal. Useful stuff.
              • You can create a “notebook” which allows you to enter code and markdown together, as before (for reports, for datasharing, for notes, for more!). This saves as extension `.ipynb`.
          • Jasmine
            • Comparison to python unittesting
              • describe = class (like `unittest.TestCase`)
              • it = unittest (like `def test_x()` – these are also called “specs”)
              • expect = assert (there are a ton)
              • beforeEach beforeAll afterEach afterAll = setup setupClass teardown teardownClass
              • self = this (persists across tests)
              • spies = mocks
            • `jasmine server` allows you to iterate your tests in the browser. `jasmine ci` uses selenium to run for builds.
            • While jasmine focuses on being a test framework (for describing the tests, like unittest), karma is a very popular js testrunner (for execution of the tests, like nose).
          • Supercontest
            • Separated all python and js tests into distinct make recipes. The python suite still enters with tox, while eslint and jasmine run for js.
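              • A hypothetical sketch of the split recipes (target names and paths are guesses):

```make
# separate entry points for the python and js suites
test: test-py test-js

test-py:
	tox

test-js:
	npx eslint static/
	npx jasmine
```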
            • Globally installed nodejs and npm (although not using for anything right now).
            • Added the jasmine infra and a basic test. My js isn’t very complicated so I was intentionally light here. I could organize some of the event callbacks into separate functions to be tested, but it’s fine in scope for now.
            • Closed #20 after lots of testing! Pylint, pytests, eslint, jasmine.
            • Modified my app dockerfile to use joyzoursky/python-chromedriver directly instead of pulling the python base image and installing chrome/chromedriver/selenium in the dockerfile.
          • A king cobra bite can kill an elephant!
          • Bought and prepped 2 spare rib racks yesterday. Smoked them today for game of thrones. Getting much better at trimming, especially with focus to both the tips and the st louis cut.
          • Shopped for motorcycles at a few shops with jcriss yesterday, he brought a grommmm.
          • WAF = web application firewall.
          • BitMitigate is a cool service that provides ddos protection, waf, cdn, and more.
          • To get the thumbnail from a youtube video, just enter the ID into: https://img.youtube.com/vi/<>/maxresdefault.jpg
          • Supercontest.
            • Added pytest-cov. 45% overall. Not bad! Most of the core functionality is covered.
            • JS lint/tests.
              • Activate your python venv, then pip install nodeenv, then run nodeenv -p. This creates a virtualenv for node modules integrated with your current virtualenv for python modules. Gives you the `npm <>` command.
              • ./node_modules instead of ./venv. Has a bin and all the usual. This should be gitignored. The package-lock.json should not.
              • `npx` can be used instead of `npm` at the command line. This will fetch packages as needed into tmpdirs and run them.
              • Instead of coupling it with python/tox, you can use `nave` to create a pure node venv, and then add make targets to execute whatever test commands (instead of tox).
              • You can also run through npm itself, by defining a lint task in package.json (setup.py) and `npm run lint` (pip run lint). I’ve never liked coupling the package manager with the admin commands. An external infra like make is better for this.
              • Ran npm init to create a skeleton package (required for style guide).
              • Ran eslint --init to create an eslintrc.json.
              • Ran a full eslint evaluation! 113 errors in my 4 simple files. Extending Google style guide. Common fixes I performed:
                • Single instead of double quotes.
                • Const instead of let for vars that are never reassigned.
                • No double space before inline comment, like Python.
                • Regular line indents are all 2 spaces. Indents for line continuation are 4 spaces.
                • No space between function and ().
                • Trailing comma after the last item in sequences.
              • jasmine init, then added the spec yml to vcs.
            • Creed from The Office is performing at Saint Rocke on June 29: https://www.ticketweb.com/event/creed-bratton-saint-rocke-tickets/9166055?pl=saintrocke. It’s not comedy, it’s music: https://www.youtube.com/watch?v=Vt6kcF-PIsE.
            • HN. Some mild interest this week.
            • Removed the js blocks from vimrc. Didn’t like the folding.
            • JS speed is important, obviously, because of clientside implications. Hence the preference for 2 spaces instead of 4. JS typically isn’t compiled or compressed, so this small percentage makes a little difference. Also, with tons of callbacks, 4 spaces can visually get way too indented.
            • PharmaDB
              • Talked a little with Art about aggregating prices into a central marketplace, like Kayak for drugs.
              • Currently, technicians spend a lot of time shopping around for the lowest prices. Pharmacies (or their parents) have contracts with vendors which change the prices. These are reflected in the account, and work automatically with the creds to log in to the vendor site.
              • Some common vendors: AmerisourceBergen, Cardinal, HD Smith, Anda, Parmed.
              • Medicaid hosts a database with (I assume) average prices: https://data.medicaid.gov/Drug-Pricing-and-Payment/National-Pharmacy-Pricing-Database-xls/uima-szn8.
              • Gonna try to get a demo account (can only see prices, not order) to write an app that fetches all this information and aggregates it in a simple marketplace platform.
            • Sugar caramelizes at about 340F. If you’re doing low and slow ~200, it’s fine. I prefer a bit of brown sugar in the texas crutch. And don’t worry about bbq sauce on ribs if you’re cooking low. Do not use sugar/bbq sauce for hot and fast.
            • Prepared the rib recipe for sunday. Going to focus on the rib tips this time instead of the st louis cut (still doing full spare racks). That’s cartilage in the tips, not bone!
            • Lots of info came for lightning in a bottle. Skimmed most of it.
            • Pivotal, the company Rachel came from, makes RabbitMQ and Jasmine. They’re owned by Dell. I knew neither of these facts.
            • Csslint is also a thing.
            • Finished the suspiria remake (2018). I watch a lot of horror movies, and I loved how different this one is. That ENDING whattttt.
            • If you have chrome devtools open on a tab, you have a LOT of data. If you login to a site, the network tab shows you the form data of the request, which literally contains your plaintext password. This can be recorded (for that tab only). After that, it’s obviously stored in a cookie so that subsequent requests don’t contain the password. Still, given all that information, another client could copy and imitate that request.
            • Sharks and Warriors playoff games.
            • Remember, TDD = Test Driven Development and BDD = Behavior Driven Development.