Senior Datacenter Resiliency Architect

Company : TieTalent
Location : Santa Clara, CA, 95053
Posted Date : 15 September 2025
Job Type : Other
Category : Architecture
Occupation : Architect
Job Details
We are seeking a Senior Datacenter Resiliency (RAS) Architect to support the development and validation of GPU hardware and software resiliency features. You will be a key member of a team of innovators, challenging the status quo and pushing beyond boundaries, with impact on the industry’s leading Datacenter GPUs and SOCs powering AI and HPC products.
What you’ll be doing
- Architect hardware and software resiliency features to improve system Reliability, Availability, Serviceability (RAS), and performance in the Datacenter.
- Model and analyze RAS metrics (e.g., Failures in Time for permanent and transient errors, Availability from GPU to Rack to Datacenter); use models to identify gaps and drive RAS improvements.
- Collaborate with architects, unit designers, and software engineers to ensure alignment of verification requirements.
- Develop and implement comprehensive architecture verification test plans for resiliency features.
- Execute the architecture test plan by developing test content and by enabling, running, and debugging tests on architecture models; support test debug on RTL, emulation, and silicon.
- Run simulations to analyze Architectural Vulnerability Factor and liveness of on-die memory, flip-flops, and latches.
- Develop CUDA software diagnostics kernels to run on clusters of NVIDIA GPUs to identify hardware issues.
- Develop and automate fault models to simulate various fault types (e.g., transient faults, stuck-at faults) in gate-level netlists, RTL, architectural models, silicon, and other environments.
What we need to see
- Master’s or PhD in Computer Engineering, Electrical Engineering, or closely related field, or equivalent experience.
- At least 5 years of relevant experience.
- Familiarity with GPU and networking architectures, computer architecture basics (caches, coherence, buses, DMA), and machine learning/deep learning concepts.
- Strong knowledge and experience in GPU hardware architecture or RAS features, or both.
- Proficiency in developing architecture models.
- Scripting and automation with Python or similar; proficiency in C/C++.
- Excellent interpersonal skills and ability to collaborate with on-site and remote teams; strong debugging and analytical skills; self-driven and results oriented.
- Experience with resiliency and datacenter RAS, or with Verilog/SystemVerilog RTL simulation and debugging, is a plus, as is the ability to set up test benches and integrate components.
- Programming with CUDA is a plus.
Company/role notes
NVIDIA’s work spans high-performance computing and AI computing. Roles involve building resilient, high-availability computing platforms for AI, HPC, and data center workloads. NVIDIA is an equal opportunity employer; we do not discriminate on the basis of protected characteristics.