Site Reliability Engineer Interview Questions
SRE-related questions
- What is the tech stack?
- I'm looking at the question from an operations perspective. Are they using a hodgepodge of languages or is the development flow opinionated? How many different technologies does the team have to support?
- What is the infrastructure stack?
- Depending on what they say, we'll be talking about this for a while and will probably create a lot of other questions.
- What does your metrics & monitoring setup look like? How do you debug issues with the system?
- This may be a controversial one, but if the title is "SRE" I ask why the title is "SRE" and not something else (same for "DevOps"). I'm looking to see if they're being thoughtful about what the term means and how they are defining "resilience" for their systems.
- Walk me through the experience a developer has on-boarding to your development environment. How long do you think it takes?
- Walk me through the experience a developer has deploying with your pipeline. What would you say are the biggest pain points?
- How would you rate test coverage and do you continue to measure that? What about test coverage is important to the team?
- Do you have blue-green deployments? Do you have canaries?
- How do engineers share their work with product teammates in the QA phase? How many environments do you have?
- These questions are incredibly important to me. They can surface red flags to discuss with your interviewer, show how receptive they are to your opinions, and give you an idea of what you might be working on for them.
- How would you describe the relationship between the operations team, IT, and the rest of the engineering team?
- How do you handle app security? How do you encourage developers to think about the security of their services?
- Do you have to be GDPR compliant? Did that process go smoothly for you?
- This may not lead anywhere, but I'm looking for a discussion about what their data auditing procedures look like, and how easy it is to answer security questions about their data quickly.
- What does your on-call setup look like?
- How many times a month are you on-call?
- When you are on-call, how many times during that period are you getting paged?
- Would you say when you get paged, alerts are actionable?
- Are developers on-call for their services?
- How do you on-board people to on-call?
- How much time is spent on the team in "reactive" rather than "proactive" mode?
- Are most things in the infrastructure stack self-service? Like, what's the process of setting up a new service with data stores?
- Which level of Dickerson's hierarchy of site reliability do you think needs the most work in your stack?
- Overall, how would you rate developer productivity?
- Do you have any open source projects? If not, are you interested in open sourcing anything?
Standard interview questions
- Does your engineering team have a values statement? What's in it?
- What do you do to foster an environment of learning?
- What does success in this role look like? What sorts of projects or accomplishments could you see being completed 3 months, 6 months, and 1 year out?
- Is the working environment collaborative during work or do people mostly keep to themselves? How so? Is the office open? (I would also ask during on-sites that they show you where you'd be sitting. If you're sensitive to lots of noise while working this could be very important.)
- How much of the team is distributed? What is your "work from home" or "work from X" policy? Is it flexible or set?
- What is the thing you are most excited about working on or launching in the next year?
- What do you like best about working here?
Wait there are some more
What is an SRE?
Having spent the last 2 years working as a DevOps engineer, I've often felt that DevOps and SRE were two slightly different implementations of the same ideas. The first felt like a set of general principles, whereas the second is a clear and detailed model (pre-dating DevOps) with a set of rules and guidelines. Google developed the SRE model and explained it in the SRE book. The underlying ideas are simple but powerful:
- Develop tools and systems that reduce toil and repetitive work for engineers
- Automate everything, or as much as possible (deployments, maintenance, tests, scaling, mitigation)
- Monitor everything
- Think scalable from the start
- Build resilient-enough architectures
- Handle change and risk through SLAs, SLOs and SLIs
- Learn from outages
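To make the SLO item above a little more concrete, here is a minimal sketch of the arithmetic behind an error budget. It assumes an availability-style SLO expressed as a fraction (for example 0.999) and an SLI measured as good events over total events. The function names `error_budget_minutes` and `budget_remaining` are hypothetical, chosen for illustration, and are not taken from the SRE book or any library.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability implied by an availability SLO.

    `slo` is a fraction in (0, 1], e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent).

    The SLI here is good_events / total_events; the budget is the number
    of bad events the SLO tolerates over the same set of events.
    """
    if total_events == 0:
        return 1.0  # nothing measured yet, so nothing spent
    allowed_bad = (1.0 - slo) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # A 100% SLO leaves no budget: any failure overspends it.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    print(round(error_budget_minutes(0.999), 1))
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

Under these assumptions, a team that has spent most of its budget would slow down risky changes, and a team with plenty left can afford to ship faster. That trade-off is what "handling change and risk" through SLOs amounts to.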
If you haven't yet read the SRE book, I strongly urge you to do so. There's even a free online version available. If you don't have the time, have a look at the interview with Ben Treynor (Google VP Engineering), "What is 'Site Reliability Engineering'?", for a general introduction.
According to the SRE book, an SRE should spend at most half of their time on "ops" work, and the rest doing development.
Google places a 50% cap on the aggregate "ops" work for all SREs—tickets, on-call, manual tasks, etc. [...] An SRE team must spend the remaining 50% of its time actually doing development. Source
Some skills are thus paramount to an SRE:
- coding / software development
- system administration and automation
- scalable system design
- system troubleshooting
Consequently, each of these areas of expertise can be (and often is) the subject of an interview.