#NoDeployFriday: helpful or harmful?

“But it was just two lines of code!” Credit: Wikimedia Commons.

A fun tweet, no?

Well, maybe not.

Should there be particular times in which production deploys are forbidden? Or is #NoDeployFriday a relic of a time before comprehensive integration tests and continuous deployment?

You may face a similar dilemma in your team. Who’s right and who’s wrong? Is not deploying on a Friday a sensible risk-averse strategy or is it a harmful culture that prevents us from building better and more resilient systems?

Ring ring

I’m sure that engineers who have had the pleasure of being on call have had their weekend ruined by a Friday change that’s blown up. I’ve been there too. That robot phone call strikes during a family outing or in the middle of the night, stating that the application is down. After scrambling to the computer to check rapidly filling logs, it becomes apparent that a subtle edge case and an uncaught exception have killed things. Badly.

However, on investigation, it is discovered that there were no tests written for the scenario that caused the failure, presumably because it wasn’t thought possible. After a variety of day-ruining calls to other engineers to work out the best way to revert the change and fix the mess, everything’s back online again. Phew.

A Five Whys session happens on the following Monday.

“Let’s just stop deploying on Friday. That way everything’ll be stable going into the weekend, and we’ll be around during the week after all of our deploys.”

Everyone nods. If it hasn’t hit production by Thursday afternoon, it’s waiting until Monday morning. But is this behavior doing more harm than good?

As we all know, interactions on Twitter are often strongly opinionated. Although the logic behind forbidding Friday deploys may seem reasonable, others are quick to point out that it’s a band-aid to underlying fragility in a platform, caused by poor tests and bad deployment processes.

Some go as far as to suggest that the pleasure of worry-free deployment is better than the weekend itself.

Another user suggests that feature flags are probably the solution to this problem.

This user suggests that risky deploys shouldn’t be an issue with the processes and tooling that we have available today.

Who decides?

What these exchanges highlight is that, as a community of engineers, we can be strongly opinionated and not necessarily agree. Who’d have thought? It perhaps also shows that the full picture of #NoDeployFriday contains nuances that don’t translate too well to Twitter. Is it correct that we should all be practicing continuous delivery, or else we’re “doing it wrong”?

One aspect is the psychology involved in the decision. The aversion to Friday deployments stems from a fear of making mistakes due to the time of the week (tiredness, rushing) and also the potential that those mistakes might cause harm while most staff are getting two days of rest. After all, the Friday commit that contains a potential production issue could end up bothering a whole host of people over the weekend: the on-call engineer, the other engineers that get contacted to solve the problem, and perhaps even specialist infrastructure engineers who mend corrupted data caused by the change. If it blows up badly, then others in the business may potentially need to be involved for client communications and damage limitation.

Taking the stance of the idealist, we could reason that in a perfect world with perfect code, perfect test coverage and perfect QA, no change should ever go live that causes a problem. But we are humans, and humans will always make mistakes. There’s always going to be some bizarre edge case that doesn’t get picked up during development. That’s just life. So #NoDeployFriday makes sense, at least theoretically. However, it’s a blunt instrument. I would argue that we should consider changes on a case-by-case basis: our default stance should be to deploy them whenever, even on Fridays, while being able to isolate the few that should wait until Monday instead.

There are some considerations that we can work with. I’ve grouped them into the following categories:

  1. Understanding the blast radius of a change
  2. The maturity of the deployment pipeline
  3. The ability to automatically detect errors
  4. The time it takes to fix problems

Let’s have a look at these in turn.

Understanding the blast radius

Something vital is always missed when differences of opinion butt heads online about Friday deploys: the nature of the change itself. No two changes to a codebase are equal. Some commits make small changes to the UI and nothing else. Some refactor hundreds of classes with no changes in the functionality of the program. Some alter database schemas and make breaking changes to how a real-time data ingest works. Some may restart one instance whereas others may trigger a rolling restart of a global fleet of different services.

Engineers should be able to look at their code and have a good idea of the blast radius of their change. How much of the code and application estate is affected? What could fail if this new code fails? Is it just a button click that will throw an error, or will all new writes get dropped on the floor? Is the change in one isolated service or have many services and dependencies changed in lockstep?

I can’t see why anyone would be averse to shipping changes with small blast radii and straightforward deployment at any time of the week. Yet I would expect major changes to a platform, especially those related to storage infrastructure, to be considered more carefully, perhaps being done at the time when the fewest users are online. Even better, such large-scale changes should run in parallel in production so that they can be tested and measured with real system load without anyone ever knowing.
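As a rough illustration of running a change in parallel, here is a minimal Python sketch: the existing code path always serves the request, while the new path runs on the same input and any disagreement is only recorded. The names `shadow_call`, `old_impl` and `new_impl` are hypothetical, and a real system would likely run the new path asynchronously and sample traffic rather than mirror every request.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger("shadow")


def shadow_call(old_impl: Callable[[Any], Any],
                new_impl: Callable[[Any], Any],
                request: Any,
                mismatches: List[tuple]) -> Any:
    """Serve from old_impl; run new_impl alongside and record disagreements."""
    result = old_impl(request)
    try:
        candidate = new_impl(request)
        if candidate != result:
            # The user never sees this; we only note the difference.
            mismatches.append((request, result, candidate))
    except Exception as exc:
        # A failure in the new path must never affect the response.
        mismatches.append((request, result, exc))
        logger.warning("shadow path failed for %r: %s", request, exc)
    return result
```

An empty `mismatches` list after a period of real traffic is one piece of evidence that the new path is safe to switch on.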

Good local decisions are key here. Does each engineer understand the blast radius of their changes in the production environment and not just on their development environment? If not, why not? Could there be better documentation, training and visibility into how code changes impact production?

Tiny blast radius? Ship it on Friday.

Gigantic blast radius? Wait until Monday.

The maturity of the deployment pipeline

One way of reducing risk is by continually investing in the deployment pipeline. If getting the latest version of the application live still involves specialist knowledge of which scripts to run and which files to copy where, then it’s time to automate, automate, automate. The quality of tools in this area has improved greatly over the last few years. We’ve been using Jenkins Pipeline and Concourse a lot, which allow the build, test and deploy pipeline to be defined as code.

The process of fully automating your deployment is interesting. It lets you step back and try to abstract what should be going on from the moment that a pull request is raised through to applications being pushed out into production. Defining these steps in code, such as in the tools mentioned previously, also lets you generalize your step definitions and reuse them across all of your applications. It also does wonders at highlighting some of the wild or lazy decisions you’ve made in the past and have been putting up with since.
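To make the idea of reusable step definitions concrete, here is a small Python sketch. It is not Jenkins Pipeline or Concourse syntax; `run_pipeline`, `STANDARD_STEPS` and the three step functions are hypothetical stand-ins, and real steps would call out to build tools, test runners and deployment APIs.

```python
from typing import Callable, List, Tuple

# A step is a name plus a function that returns True on success.
Step = Tuple[str, Callable[[str], bool]]


def run_pipeline(app: str, steps: List[Step]) -> List[str]:
    """Run each step in order for one app and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, step in steps:
        if not step(app):
            break
        completed.append(name)
    return completed


# Placeholder step bodies, assumed purely for illustration.
def build(app: str) -> bool:
    return True


def run_tests(app: str) -> bool:
    return app != "flaky-service"  # pretend one app has failing tests


def deploy(app: str) -> bool:
    return True


# One shared definition, reused by every application.
STANDARD_STEPS: List[Step] = [
    ("build", build),
    ("test", run_tests),
    ("deploy", deploy),
]
```

Calling `run_pipeline("web", STANDARD_STEPS)` runs all three stages, whereas the hypothetical `"flaky-service"` stops after the build and never reaches production.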

For every engineer that has read the previous two paragraphs and reacted in a way such as “But of course! We’ve been doing that for years!”, I can guarantee you that there are nine others picturing their application infrastructure and grimacing at the amount of work that it would take to move their system to a modern deployment pipeline. This entails taking advantage of the latest tools that not only perform continuous integration, but also allow continuous deployment by publishing artifacts and allowing engineers to press a button to deploy them into production (or even automatically, if you’re feeling brave).

Investing in the deployment pipeline needs buy-in, and it needs proper staffing: it’s definitely not a side-project. Having a team dedicated to improving internal tooling can work well here. If they don’t already know the pressing issues – and they probably will – they can gather information on the biggest frustrations around the deployment process, then prioritize them and work with teams on fixing them. Slowly but surely, things will improve: code will move to production faster and with fewer problems. More people will be able to learn best practice and make improvements themselves. And as things improve, practices begin to spread, and that new project will get done the right way from the start, rather than copying old bad habits ad infinitum.

The journey between a pull request being merged and the commits going live should be automated to the point that you don’t need to think about it. Not only does this help isolate real problems in QA, since the changed code is the only variable, it also makes the job of writing code much more fun. The power to deploy to production becomes decentralized, increasing individual autonomy and responsibility, which in turn breeds more considered decisions about when and how to roll out new code.

Solid deployment pipeline? Deploy on Friday.

Copying scripts around manually? Wait until Monday.

The ability to detect errors

Deployment to production doesn’t stop once the code has gone live. If something goes wrong, we need to know, and preferably we should be told rather than needing to hunt out this information ourselves. This involves the application logs being automatically scanned for errors, the explicit tracking of key metrics (such as messages processed per second, or error rates), and an alerting system that lets engineers know when there are critical issues or particular metrics that have shown a trend in the wrong direction.

Production is always a different beast to development, and engineers should be able to view the health of the parts of the system they care about, and also be able to compose dashboards that allow them to view trends over time. These dashboards should allow questions to be answered about each subsequent change: has it made the system faster, or slower? Are we seeing more or fewer timeouts? Are we CPU bound or I/O bound?

Tracking of metrics and errors should also feed into an alerting system. Teams should be able to identify which signals from telemetry mean that something bad has happened, and route automated alerts through to a pager system. We happen to use PagerDuty for our teams and top-level major incident rota.

A focus on measurement of production system metrics means that engineers can see if something has changed as the result of each deployment, whether that change is for the better or for worse, and in the absolute worst case, the system will automatically let somebody know if something has broken.
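A sketch of what such a check might look like, in Python: compare the error rate in a window before the deploy with a window after it, and page somebody if it crosses an absolute ceiling or has grown sharply. The `Window` type, the `should_page` function and both thresholds are assumptions made for illustration; real values depend on the system, and the paging itself would go through whatever alerting tool is in use.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request and error counts observed over a fixed period."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def should_page(before: Window, after: Window,
                max_rate: float = 0.05,
                max_increase: float = 2.0) -> bool:
    """Return True if the post-deploy error rate looks bad enough to page.

    max_rate is an absolute ceiling; max_increase is the allowed growth
    factor relative to the pre-deploy window. Both are illustrative.
    """
    if after.error_rate > max_rate:
        return True
    if before.error_rate > 0 and after.error_rate >= before.error_rate * max_increase:
        return True
    return False
```

With these example thresholds, a move from 5 to 6 errors per 1,000 requests stays quiet, while a jump from 5 to 20 triggers a page.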

Good monitoring, alerts and on-call rota? Deploy on Friday.

Scanning through the logs manually via ssh? Wait until Monday.

The time it takes to fix problems

Finally, a key consideration is the time that it will take to fix problems, which is somewhat related to the blast radius of a change. Even if you have a slick deployment pipeline, some changes to your system may be tricky to fix quickly. Reverting a change to the data ingest and to the schema of a search index might involve an arduous reindex as well as fixing the actual line of code. The average time to deploy, check, fix and redeploy a CSS change may be a matter of minutes, whereas a bad change to underlying storage might require days of work.

For all of the deployment pipeline work that can increase the reliability of making changes at a macro level, no two changes are equal, so we need to consider them individually. If something goes wrong, can we quickly make this right?

Totally fixable with one revert commit? Deploy on Friday.

Potentially a massive headache if it goes wrong? Wait until Monday.

Picking your battles

So where am I on #NoDeployFriday? My answer is that it depends on each individual deploy. For changes with a small blast radius that are straightforward to revert if needed, I’m all for deploying at any time of the day or week. For big, major version changes that need their effects carefully monitored in the production system, I’d strongly advise waiting until Monday instead.
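The four considerations can be written down as a simple checklist. The Python sketch below is only one way to combine them: the `Change` fields and `friday_verdict` are hypothetical names, and treating a single failed check as "wait" is my own simplifying assumption rather than a hard rule.

```python
from dataclasses import dataclass


@dataclass
class Change:
    small_blast_radius: bool      # little breaks if this change fails
    automated_pipeline: bool      # no manual scripts or file copying
    monitored_and_alerting: bool  # metrics, alerts and an on-call rota exist
    quick_to_revert: bool         # one revert commit undoes it


def friday_verdict(change: Change) -> str:
    """Recommend deploying only if every consideration is satisfied."""
    checks = [
        change.small_blast_radius,
        change.automated_pipeline,
        change.monitored_and_alerting,
        change.quick_to_revert,
    ]
    return "Deploy on Friday" if all(checks) else "Wait until Monday"
```

A CSS tweak on a well-monitored service with an automated pipeline passes every check; a search index schema change fails on blast radius and revert time, so it waits.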

Fundamentally, whether or not you deploy on Friday is up to you. If you’re struggling with a creaking and fragile system, then maybe don’t until the necessary work has been put in to improve how changes go into production. But do that work – don’t put it off. Banning Friday deploys as a band-aid to cover temporary underinvestment in infrastructure is fine. That’s sensible damage limitation for the good of the business. However, using it to cover permanent underinvestment is bad.

If you’re not entirely certain of the potential effects of a change, then delay until Monday. But work out what could be in place next time to make those effects clearer, and invest in the surrounding infrastructure to make it so. As with life, there are always nuances in the details of each of our decisions. It’s not black or white, wrong or right: as long as we’re doing our best for the business, our applications and each other, whilst improving our systems as we go along, then we’re doing just fine.

Happy deploying.

Coding Bootcamps: a Glimpse at the Future of Education?


Education is big business.

It is projected that over 20 million students will be enrolled in degree-granting institutions in the US in fall 2020. That’s 20 million people willing to invest multiple years of their lives and to incur an average of nearly $30,000 of debt in order to earn a degree, typically to maximize their chances of starting a career in the discipline of their choice. That is a lot of pressure to take on in the hope of a stable future.

The barrier to entry for our top institutions isn’t just financial. Getting a place is hard work. It is a journey that begins in the early years of a child’s life, requiring persistent effort from them, their teachers and their families. Throughout the high school years they must sustain a high GPA, discover their interests, and apply for the best colleges. Securing a place at a prestigious university is a big deal. The recent US college admissions scandal revealed that the family of a Chinese student paid $6.5 million to help her secure her place at Stanford University.

But let’s step back a second. Why is a place at a prestigious institution worth risking prosecution for a bribe of millions of dollars? Derek Thompson writes for The Atlantic that “Ivy League and equivalent institutions provide more than world-class instruction. They confer a lifetime of assistance from prodigiously connected alumni and a message to all future employees that you’re a rarified talent.” It’s no wonder that many will try to secure a place by any means possible.

However, a 2002 economics paper revealed that the salary increase from going to the most selective schools is “indistinguishable from zero”, although the “payoff to attending an elite college appears to be greater for students from more disadvantaged family backgrounds”. In short, good universities will give your job prospects a lift compared with non-attendance, especially if you are not already in an elite class, but they won’t necessarily make you any better off than if you had attended an institution further down the rankings table.

So let’s step back even further. Colleges, and the education system more broadly, are believed to have originated circa 387 BC. Imagine yourself at Plato’s Academy, set amongst the olive groves of ancient Athens. Here, a selective cohort gravitated around a rare resource: the knowledge that was taught through lectures and discourse. The rarity and exclusivity of knowledge as a resource is a theme that can be observed through history: the Library of Alexandria formed as a central repository of irreplaceable documents in ancient Egypt. Ancient Athenian academic culture gave birth to the formal education system, with the most talented students being taught by the most renowned philosophers: think Socrates, Aristotle and Pythagoras.

In Europe during the High Middle Ages, familiar formal degree titles such as bachelor’s, master’s and doctorates were developed. Teaching at Oxford dates back to the 11th century, with Cambridge following in the 13th. The tradition of an institution awarding formal degree titles continues today. The most exclusive schools offer the most exclusive degrees, which carry the most prestige. Being admitted to an exclusive school is about gaining access to the knowledge and wisdom that it contains. One tries to gain access to the best possible resources in the hope of a better future.

However, we could argue that the greatest repository of human knowledge is now no longer a particular school, library or degree program. It is the Internet. Armed with a computer and an Internet connection, a self-motivated individual could, in theory, teach themselves anything that they want to know. At the time of writing, the English version of Wikipedia alone has over 5 million articles. Compare that to the library of just 320 volumes that were bequeathed by John Harvard for the founding of the eponymous university in 1636.

However, with no formal syllabus or curriculum, the self-learner would find themselves overwhelmed by the amount of information online. Where would they start? What is essential information and what is merely supplementary? What is a trusted source, and what is incorrect? Autodidactism – the act of self-education without teachers or institutions – is a rare skill possessed by the most self-motivated and often the most naturally gifted. We could not expect everyone on the planet to learn unaided.

The process of finding the most valuable information, and ensuring a steady progression through concepts of increasing complexity, is a key that traditional education institutions still hold: decades of teaching and proven feedback loops, informed by the latest research, keep syllabuses up to date. Teachers can also observe and interact with learners in order to gauge their understanding of the material and help them progress if they are stuck.

Clearly, our schools and colleges offer a learning experience far greater than self-learning via the Internet, regardless of the ease of access to the material. They also reward those that graduate with formal qualifications or accreditations which have become the societal norm for acceptance into graduate schemes and prestigious jobs.

However, is it possible to imagine a future where the Internet could be used as a delivery mechanism for world-class education, regardless of the student’s location or background? Can we lower the barrier to entry to education through technology and educate the world in a democratized manner?

Tutoring turned nonprofit

Salman Khan, founder of Khan Academy. Credit: Darth Viral on Flickr/CC BY 2.0.

In 2003, Salman Khan, a financial analyst, began tutoring his cousin in mathematics over the Internet using Yahoo! Doodle Notepad, a tool for remote participants to draw pictures together. With his friends and relatives also reaching out for his help, he started a YouTube channel. The videos proved extremely popular, enough so for him to quit his job and build Khan Academy, a nonprofit organization that offers free online courses for anyone in the world with an Internet connection.

At present, the site offers a comprehensive curriculum of STEM subjects up to high school level and also features some arts and humanities courses. Google, Comcast and Bank of America are some of the supporters that have donated over $10 million each to the cause, with a total of $53 million being donated in 2017, the latest reporting period. The success stories detail how Khan Academy has allowed high school students to make breakthroughs in understanding mathematics and how adults have used it to refresh their skills after being away from subjects for decades.

Nonprofits such as Khan Academy have harnessed the distribution power of the Internet in order to deliver education to millions around the world. However, being nonprofit organizations, these courses are not pitched as a replacement for traditional education. They are instead a supplement delivered alongside the traditional education system. They do not guarantee any particular outcome or accreditation as a result of completing the courses. However, the website does state that “students who complete 60% of their grade-level math on Khan Academy experience 1.8 times their expected growth” on the NWEA MAP Test, which measures a student’s academic progress at school.

People can take or leave Khan Academy: it has no ulterior motive. The material is there for the learner if they want it, and the organization does not rely on those that take the courses for its income, so it has no incentive other than its own will to provide good material. Yet not everyone can have Khan Academy’s runaway success. Salman Khan was in the right place at the right time with what the world wanted. He had first-mover advantage.

For those who didn’t ride Khan Academy’s wave, but still want to make a living offering educational resources, the alternative route is through paid models, where the student is also the customer. But is this model tainted by the need to provide results?

Who even needs a degree?

Given that most students attend university in order to maximize their chances of landing a good career, is it possible that they could achieve this without going to university at all? Whereas some fields, such as medicine, still require lengthy formal education, other fields are beginning to relax the constraints on employment. Fueled by the high demand for talent, technology companies are beginning to look at other ways in which they can fill entry-level positions apart from targeting those that are graduating from computer science programs.

There has been an explosion in the number of so-called “coding bootcamps” over the last decade. The premise is simple: students pay to enroll and are given a crash course in computer programming using the most common languages and tools in the industry. Courses typically last 6 months to a year, and those that perform well have a good chance of landing a well-paid job at a technology company, without having to have spent the time and money required to go through a university degree. The cost of enrolling in a coding bootcamp is still high, however: according to Course Report, bootcamp tuition fees can range from $9,000 to $21,000. This is still not as expensive and time consuming as college, given that many offer their tuition online, but it is still expensive enough for it to be a considered life decision for many to enroll.

One of the highest-profile coding bootcamps is Lambda School, a San Francisco-based bootcamp offering courses in computer science combined with either full-stack Web, Android or iOS development. Students can also take courses in Data Science or UX Design. These are all highly in-demand skills in the technology industry. Entry-level positions in these roles in tech-centric US cities such as San Francisco and New York City can net six-figure starting salaries. This bootcamp model seems to be working: Lambda School announced in January that it raised $30 million in Series B funding from venture capital firms such as Google Ventures and renowned startup incubator Y Combinator, amongst others, giving it a post-money valuation of around $150 million.

In addition to having no physical campus with courses being delivered online, Lambda School is offering an extremely attractive financial arrangement for those that enroll: as an alternative to paying the $20,000 tuition fee up front, students can opt to enter an income share agreement. This allows them to defer payment until they are earning over $50,000 annually for two years. At that point, 17% of their salary is paid back to Lambda School until the debt is paid off. If students are unable to find a job with this level of compensation after five years, they owe nothing. For students from disadvantaged backgrounds, or those wanting to make a complete career switch, this offer is seriously enticing.

Via their Twitter accounts, current and past Lambda School students share their stories of success. There are numerous tweets highlighting graduates that have landed full time programming jobs, with some earning around $100,000 a year. “I make more than both of my parents combined,” writes one anonymous user in a screenshot from their Slack account. “Also my family situation requires me home a lot, which was a huge motivator for the switch to tech anyway.”

According to the self-reported outcomes webpage, graduates have gone on to work for well-known technology companies such as Stripe and Uber. Of their first four cohorts for the Full Stack Web Programming bootcamp, 100%, 88%, 71% and 75% have gone on to find jobs, respectively. At the time of writing, Course Report gives Lambda School an average rating of 4.91 from 65 reviews, and SwitchUp reports a similar average from 143 reviews.

But are students happy?

However, searching online for the experiences of students doesn’t come up all roses. Clusters of reviews on Reddit offer a different picture of the student experience. “I’m going to say this course is not worth it and I don’t recommend it,” says user SpecialistManner. “It’s a scam with a business plan. It’s basically a MOOC [massive open online course] without the organization, a Slack channel, and 8,000x the brogrammer snark,” reports another user. Various posts highlight concerns such as unprofessional staff, the disorganization of Slack communications, and an inability to cater for the learning styles of individuals.

“They have still neglected to report their hiring stats to CIRR since forever,” writes a participant on Reddit. The CIRR, or the Council on Integrity in Results Reporting by its full name, provides a standardized system for reporting student outcomes for courses. This omission has since been rectified by Lambda School, and the graduate outcomes for January – June 2018 are available. It states that 71 students graduated in the reported time period, with 51.4% graduating on time, and 78.5% graduating within 150% of the length of the program. After 180 days, 85.9% of students had achieved full-time employment, with a median salary of $60,000. 37.8% of students had landed jobs paying $80,000 or more. Lambda School isn’t alone in reporting similar success rates. Hack Reactor Austin reports 94.5% of 50 students in full-time employment earning a median pay of $76,500 and Codesmith NYC reports 91.2% of 31 students employed at a median pay of $112,500.

It is worth noting that most, if not all, schools and colleges have their fair share of bad reviews. According to StudentCrowd, the University of Oxford and the University of Cambridge, currently ranked as world number one and two in the Times Higher Education university rankings, have review scores of 4.4 (20 respondents) and 4.17 (33 respondents) respectively. “As a PhD graduate from Cambridge University, I can say that this is the most racist place on earth. All they want is internal student’s money!” writes user rtert. However prestigious the institution, there will always be people experiencing the full spectrum of human behavior, good and bad.

As Lambda School is the poster child for online coding bootcamps, the burning heat of the media spotlight holds it to a certain amount of scrutiny. However, it is worth remembering that the school is a for-profit private institution that does not need to conform to the same kinds of governmental or state regulation that traditional schools are held to. In fact, in 2014, the Californian Bureau for Private Postsecondary Education sent cease and desist letters to a number of coding bootcamps as they were deemed to be operating without approval in states that have authorization requirements for private education.

Regardless of state authorization requirements, the most worrying aspect of the explosion in online coding bootcamps is that their content is unregulated by an external body, meaning that bad actors or courses of poor quality can sell students a dream which never comes true. At worst, students could find themselves out of pocket and out of luck as the market begins to rapidly expand and other bootcamps try to get in on the action. “False hope and exaggeration is something these camps thrive on,” writes an anonymous source. “I have become increasingly agitated by articles that completely contradict my own experience and the experiences of my fellow students.” With many people considering investing serious amounts of money in these programs, how can they be sure that it’ll be worthwhile?

What does the future of education hold?

We cannot deny that coding bootcamps are offering real opportunities to people. We also cannot deny that websites like Khan Academy and Udemy are having success in designing and delivering online curriculums. The futurist can ponder whether there could be a universal education that is delivered to students worldwide via the Internet, rather than via the physical classroom. Could we imagine a future where students can learn at their own pace, in self-selected areas of interest, and have software and AI monitor their progress and suggest additional materials and guidance if they are getting stuck? After all, plenty of technology work is now fully remote.

In his tweet storm, Naval Ravikant, CEO of AngelList, paints a compelling picture of technology-based education. “A generation of autodidacts, educated by the Internet and leveraged by technology, will eventually starve the industrial-education system,” he writes. “Eventually, the tide of the Internet and rational, self-interested employers will create and accept efficient credentialing… and wash away our obsolete industrial-education system.” It is a bold view, highlighting the gap between a traditional degree and the skills that an employer wants. Yet one could argue that the sole purpose of university is not to push all students into employment. Employment is just one of the outcomes of a journey of self-discovery, alongside the joyous opportunity to explore a subject deeply and the ability to mix with a vast cohort of like-minded individuals.

Perhaps instead we could see a greater diversity in the institutions in which students are able to get their education. Coding bootcamps have some parallels to vocational education whereas a university education is more abstract and academic. Both are valid routes to employment. Students of all ages needing daily support from teachers, either because of their individual needs or their personal preference, may always have a place in traditional school systems. However, one could imagine an alternative self-led education system for the autodidact, taught primarily online.

Such education could reach people who may not have access to quality schools in their local area. Technology could connect the world’s best teachers with the students who need them most. Whether this can begin to replace traditional education at the K-12 level remains to be seen, as school is also implicit childcare for busy and working parents. But as Ravikant says, “The best teachers are on the Internet. The best books are on the Internet. The best peers are on the Internet.” We should be working on more ways to connect these teachers, books and peers together for the good of society.