Caring. Connecting. Growing Together.
As a Site Reliability Engineer (SRE), you will play a key role in ensuring the reliability, scalability, and performance of our systems, applications, and infrastructure. You will bridge the gap between development and operations, applying software engineering practices to solve operational challenges. Your focus will be on automating processes, improving system reliability, and ensuring seamless delivery of services to our customers.
You'll enjoy the flexibility to work remotely from anywhere within the U.S. as you take on some tough challenges. For all hires in the Minneapolis or Washington, D.C. area, you will be required to work in the office a minimum of four days per week.
Primary Responsibilities:
- Build, maintain, and operate the AWS hosted platform
- Work closely with dev teams to identify and measure SLOs, SLAs and SLIs
- Contribute to the development of platform services, including architecture, provisioning, configuration, deployment, and support
- Integrate the platform with centralized logging, metrics dashboards, instrumentation, and incident monitoring and management
- Participate in the on-call rotation for incident resolution for the platform and/or any dependent components
- React to production deficiencies by continuously implementing automation, self-healing, and real-time monitoring for production systems
- Maintain operational tooling and frameworks
- Perform root cause analysis and deliver resolution for tools and automation failures
- Build/integrate/administer systems and tools that enable engineering teams to observe their applications in production with autonomy (Dashboards, APMs)
- Automate alerts for metrics on performance, cost, vulnerabilities, risk, and compliance violations
- Conduct postmortems after production issues
You'll be rewarded and recognized for your performance in an environment that will challenge you, give you clear direction on what it takes to succeed in your role, and provide development for other roles you may be interested in.
Required Qualifications:
- 3+ years of experience in software engineering
- 2+ years of scripting experience in Python or PowerShell
- 2+ years of experience with Linux system administration and shell scripting
- 2+ years of experience with networking fundamentals, including VPN setup, routing, security groups, and cross-cloud connectivity
- 1+ years of experience with AWS services, including EC2, VPC, IAM, Lambda, S3, and CloudWatch
- 1+ years of experience with Infrastructure-as-Code (Terraform, AWS CloudFormation, CDK)
- If you are offered this position, you will be required to provide extensive personal information to obtain and maintain a suitability or determination of eligibility for a Confidential/Secret or Top Secret security clearance as a condition of your employment
- United States Citizenship
Preferred Qualifications:
- Bachelor's degree in Information Technology, Computer Science or related field
- 1+ years of experience with CI/CD pipeline basics using Git and GitLab
- 1+ years of experience monitoring and alerting with CloudWatch and Dynatrace
- 1+ years of experience with containerized workloads (ECS, EKS, etc.)
- Experience with security and compliance frameworks: FedRAMP Moderate, NIST 800-171
- Demonstrated use of AI-driven anomaly detection in CloudWatch for proactive issue resolution
- Experience automating patching and scaling using predictive models, and supporting infrastructure for AI-based applications
Pay is based on several factors including but not limited to local labor markets, education, work experience, certifications, etc. In addition to your salary, we offer benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase, and 401k contribution (all benefits are subject to eligibility requirements). The salary for this role will range from $71,200 to $127,200 annually based on full-time employment. We comply with all minimum wage laws as applicable.