Jobs USA

Senior Datacenter Resiliency Architect

Company : TieTalent

Location : Santa Clara, CA, 95053

Posted Date : 18 November 2025

Job Type : Other

Category : Architecture

Occupation : Architect

Job Details

We are seeking a Senior Datacenter Resiliency (RAS) Architect to support the development and validation of GPU hardware and software resiliency features. You will be a key member of a team of innovators, challenging the status quo and pushing beyond boundaries, with impact on the industry’s leading datacenter GPUs and SoCs powering AI and HPC products.

What you’ll be doing

  • Architect hardware and software resiliency features to improve system Reliability, Availability, Serviceability (RAS), and performance in the Datacenter.
  • Model and analyze RAS metrics (e.g., Failures in Time for permanent and transient errors, Availability from GPU to Rack to Datacenter); use models to identify gaps and drive RAS improvements.
  • Collaborate with architects, unit designers, and software engineers to ensure alignment of verification requirements.
  • Develop and implement comprehensive architecture verification test plans for resiliency features.
  • Execute the architecture test plan by developing test content and by enabling, running, and debugging tests on architecture models; support test debug on RTL, emulation, and silicon.
  • Run simulations to analyze Architectural Vulnerability Factor and liveness of on-die memory, flip-flops, and latches.
  • Develop CUDA software diagnostics kernels to run on clusters of NVIDIA GPUs to identify hardware issues.
  • Develop and automate fault models to simulate various fault types (e.g., transient faults, stuck-at faults) in gate-level netlists, RTL, architectural models, silicon, and other environments.

What we need to see

  • Master’s or PhD in Computer Engineering, Electrical Engineering, or closely related field, or equivalent experience.
  • At least 5 years of relevant experience.
  • Familiarity with GPU and networking architectures, computer architecture basics (caches, coherence, buses, DMA), and machine learning/deep learning concepts.
  • Strong knowledge and experience in GPU hardware architecture or RAS features, or both.
  • Proficiency in developing architecture models.
  • Scripting and automation with Python or similar; proficiency in C/C++.
  • Excellent interpersonal skills and ability to collaborate with on-site and remote teams; strong debugging and analytical skills; self-driven and results-oriented.
  • Experience with resiliency and datacenter RAS, or with Verilog/SystemVerilog RTL simulation and debugging; the ability to set up test benches and integrate components is a plus.
  • Programming with CUDA is a plus.

Company/role notes

NVIDIA’s work spans high-performance computing and AI computing. Roles involve building resilient, high-availability computing platforms for AI, HPC, and data center workloads. NVIDIA is an equal opportunity employer and does not discriminate on the basis of protected characteristics.
