

Top 10 Questions: Hiring AI Infra Experts​

Are you looking to bring on board an AI Infrastructure Practitioner, an AI Consulting Practice Leader, or an AI Service Provider?

Discover what to ask them and what to look for in the current climate of GPU shortages, high costs, and volatility!

Quick overview of the questions!​

  • Understanding GPU Utilization in AI: “How do different AI tasks like training and inferencing affect GPU requirements and cost optimization?”
  • Procurement Strategy Amid Market Volatility: “What strategies do you recommend for GPU procurement in volatile markets?”
  • Deciding Between Cloud and On-Premise GPUs: “What factors to consider when choosing between cloud-based & on-prem GPUs?”​
  • Tackling Deployment Challenges: “What are common GPU deployment challenges and how can they be overcome?”​
  • Cost Management in GPU Deployment: “How should we manage the costs of GPU deployment?”​
  • Future-Proofing GPU Investments:  “What approaches can future-proof GPU investments?”​
  • Criteria for Choosing a GPU Vendor: “What factors are key when choosing a GPU vendor?”​
  • Adapting to AI Trends and GPU Needs: “How should organizations adapt to changing AI trends and GPU needs?”​
  • Strategies for Managing GPU Shortages:  “What strategies ensure continuous operation during GPU shortages?”​
  • Vision for Future AI Infrastructure: “What trends will shape AI infrastructure in the next five years?”​

Read on to understand each area of questioning and a potential response:

Understanding of high-level nuances of GPU Utilization in AI  

Q: Given that different AI tasks like training and inferencing have varying GPU requirements, could you explain – with an example – how these differences affect resource allocation and cost optimization in AI applications?

Potential Response: ​

Training AI models requires robust GPUs because of its intensive computational needs, whereas inferencing can often use less powerful GPUs, since it involves applying pre-trained models, which is generally less computationally intensive. Allocating resources according to the specific AI task can significantly cut costs.

For example, a tech startup specializing in real-time video processing might find that while training their models requires high-end GPUs like NVIDIA’s A100, their inferencing could be efficiently handled by more cost-effective T4 GPUs, drastically reducing operational costs.
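The arithmetic behind that kind of split can be sketched in a few lines. This is only an illustration: the hourly rates below are made-up placeholders rather than real quotes for any GPU, and the names `GpuTier`, `monthly_cost` and `inference_savings` are hypothetical. The sketch also assumes you have already measured how many cost-effective GPUs are needed to match the throughput of the high-end ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuTier:
    name: str
    hourly_rate_usd: float  # placeholder value, not a real price


def monthly_cost(tier: GpuTier, gpu_count: int, hours_per_month: float) -> float:
    """Cost of running `gpu_count` GPUs of one tier for the given hours."""
    return tier.hourly_rate_usd * gpu_count * hours_per_month


# Hypothetical rates, chosen only to keep the arithmetic easy to follow.
HIGH_END = GpuTier("high-end training GPU", 3.00)
COST_EFFECTIVE = GpuTier("cost-effective inference GPU", 0.50)


def inference_savings(high_end_count: int, cost_effective_count: int,
                      hours_per_month: float) -> float:
    """Monthly savings from serving inference on the cheaper tier.

    The two counts should deliver comparable throughput; a cheaper GPU
    usually means more of them are needed for the same load.
    """
    on_high_end = monthly_cost(HIGH_END, high_end_count, hours_per_month)
    on_cheaper = monthly_cost(COST_EFFECTIVE, cost_effective_count, hours_per_month)
    return on_high_end - on_cheaper


if __name__ == "__main__":
    # 4 high-end GPUs replaced by 8 cheaper ones, running around the clock.
    print(inference_savings(4, 8, 720))
```

A candidate who can walk through this kind of calculation with your own workload figures, rather than generic claims, is showing the understanding this question is meant to probe.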

GPU Procurement Strategy amid High Volatility

Q: Considering the volatility in the GPU market, with frequent shortages and high costs, what strategies would you recommend for effectively procuring GPUs under these conditions?

Potential Response: ​

Organizations should evaluate their actual computational needs versus market availability, consider leasing options, and possibly secure GPUs through long-term contracts to mitigate risks associated with market volatility and high costs.​

Take the case of a data analytics firm during the 2020 GPU shortage; by entering a long-term lease agreement with a hardware provider, they secured a steady supply of GPUs at a fixed cost, avoiding the price spikes from market scarcity.​

Deciding between Cloud and On-Premise GPUs​  

Q: With companies facing the decision between cloud-based and on-premise GPU solutions, each offering distinct benefits and challenges, what critical factors should they consider to make the best choice?

Potential Response: ​

Key factors include cost, scalability, control over hardware, and data security concerns. Cloud GPUs offer flexibility and scalability without the upfront investment, whereas on-premise solutions provide more control and can be cost-effective over the long term.​

A healthcare organization handling sensitive patient data might opt for on-premise GPUs to comply with strict data privacy regulations, despite the higher initial cost, providing them better control over their data security compared to cloud solutions.
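On the cost side of this decision, a simple break-even estimate is often the first thing to ask for. The sketch below assumes flat monthly costs and ignores factors such as hardware depreciation, discounts and changing utilization; the function name and all figures are hypothetical.

```python
import math
from typing import Optional


def break_even_months(on_prem_upfront_usd: float,
                      on_prem_monthly_usd: float,
                      cloud_monthly_usd: float) -> Optional[int]:
    """First whole month at which cumulative on-prem cost is no higher
    than cumulative cloud cost, or None if on-prem never catches up."""
    monthly_saving = cloud_monthly_usd - on_prem_monthly_usd
    if monthly_saving <= 0:
        # On-prem running costs are at least as high as cloud costs,
        # so the upfront investment is never recovered.
        return None
    return math.ceil(on_prem_upfront_usd / monthly_saving)


if __name__ == "__main__":
    # Hypothetical figures: 120,000 upfront, 2,000/month to run on-prem,
    # versus 7,000/month for equivalent cloud capacity.
    print(break_even_months(120_000, 2_000, 7_000))
```

If the break-even point lies beyond the expected useful life of the hardware, the cost argument for on-premise weakens, and factors such as control and data security have to carry the decision.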

Appreciation of common hurdles in GPU Deployment  

Q: Deploying GPU infrastructure presents several challenges that can impact project success. Could you share some common hurdles organizations face during deployment and how to overcome them?

Potential Response: ​

Common deployment challenges include supply chain delays, the technical expertise required for installation, and compatibility issues with existing systems. Overcoming these involves strategic planning, vendor support, and ensuring staff are trained in the latest GPU technologies.

Consider a gaming company that faced deployment delays due to improper compatibility checks with existing infrastructure. They overcame this by implementing a pre-deployment validation process that tests new GPUs with their systems before full-scale deployment.​
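A pre-deployment validation step of this kind can start as a simple checklist in code. The sketch below is an assumption about what such a check might cover (slots, power headroom, rack space); the source does not describe the company's actual process, and `HostSpec`, `GpuSpec` and `validate_deployment` are hypothetical names. A real process would also test drivers, firmware, cooling and the software stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    free_pcie_x16_slots: int
    psu_headroom_watts: int
    rack_units_free: int


@dataclass(frozen=True)
class GpuSpec:
    board_power_watts: int
    slots_needed: int
    rack_units_needed: int


def validate_deployment(host: HostSpec, gpu: GpuSpec, count: int) -> list[str]:
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if gpu.slots_needed * count > host.free_pcie_x16_slots:
        problems.append("not enough free PCIe x16 slots")
    if gpu.board_power_watts * count > host.psu_headroom_watts:
        problems.append("insufficient power supply headroom")
    if gpu.rack_units_needed * count > host.rack_units_free:
        problems.append("not enough free rack space")
    return problems


if __name__ == "__main__":
    host = HostSpec(free_pcie_x16_slots=2, psu_headroom_watts=600, rack_units_free=4)
    gpu = GpuSpec(board_power_watts=300, slots_needed=1, rack_units_needed=2)
    print(validate_deployment(host, gpu, 3))
```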

Cost management issues in GPU deployment 

Q: Cost management is crucial in GPU deployment, given the significant upfront and ongoing expenses. How should companies manage these costs effectively to maintain budget health and project feasibility?

Potential Response: ​

Effective cost management requires evaluating total cost of ownership, including energy costs and maintenance. Organizations should also explore volume discounts, consider energy-efficient GPUs, and calculate the return on investment for each deployment scenario.​

A multinational corporation implemented a tiered approach to GPU deployment, using high-performance GPUs for critical data science tasks while opting for mid-range GPUs for routine tasks, effectively balancing performance needs with cost efficiencies.​
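Total cost of ownership and return on investment can be estimated with very little code. The sketch below is a simplified model under stated assumptions: it counts only purchase price, energy and maintenance, and it treats utilization as a constant fraction of full power draw. The function names and every figure are hypothetical.

```python
def total_cost_of_ownership(purchase_usd: float,
                            power_kw: float,
                            utilization: float,
                            usd_per_kwh: float,
                            annual_maintenance_usd: float,
                            years: int) -> float:
    """Purchase price plus energy and maintenance over the given years."""
    hours = years * 365 * 24
    energy_usd = power_kw * utilization * hours * usd_per_kwh
    return purchase_usd + energy_usd + annual_maintenance_usd * years


def return_on_investment(total_benefit_usd: float, total_cost_usd: float) -> float:
    """ROI as a fraction of cost, e.g. 0.5 means a 50% return."""
    return (total_benefit_usd - total_cost_usd) / total_cost_usd


if __name__ == "__main__":
    # Hypothetical scenario: one GPU server over three years.
    tco = total_cost_of_ownership(
        purchase_usd=10_000,
        power_kw=0.5,
        utilization=1.0,
        usd_per_kwh=0.10,
        annual_maintenance_usd=500,
        years=3,
    )
    print(round(tco, 2))
    print(return_on_investment(15_000, 10_000))
```

Running the same functions for each deployment scenario, for example a high-performance tier and a mid-range tier, gives a like-for-like basis for the comparison described above.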

Future-proofing GPU Investments

Q: As AI technologies evolve rapidly, the need for adaptable and scalable GPU resources becomes more critical. What approaches would you suggest for organizations looking to future-proof their GPU investments?

Potential Response: ​

To future-proof GPU investments, companies should focus on scalability and flexibility in their GPU setups, stay updated with the latest technology trends, and choose hardware that can support upcoming advancements in AI and ML.​

Consider an AI research lab that I know of, which frequently updates its GPU inventory to keep up with cutting-edge developments, ensuring their infrastructure is not only scalable but also adaptable to new AI models and algorithms that emerge.​

Selecting the Right GPU Vendors

Q: Selecting the right GPU vendor is vital for ensuring the reliability and performance of AI applications. What criteria do you consider most important when evaluating potential GPU vendors?

Potential Response: ​

Important criteria include technological leadership, comprehensive support services, proven reliability, and favourable cost structures. Additionally, assessing the vendor’s stability and commitment to future technologies is crucial.​

A visual effects studio selected a GPU vendor not only based on the raw performance of the hardware but also the vendor’s track record of consistent driver updates and technical support, crucial for their time-sensitive rendering tasks.​

GPU Investments: Keeping Pace (& making peace!) with AI Trends

Q: With current trends in AI technology potentially altering the GPU landscape, how should organizations adapt their strategies to meet these changing requirements?

Potential Response: ​

Organizations need to remain agile, of course, continually reassessing their GPU needs as AI technologies evolve. Keeping abreast of technological advancements and being prepared to pivot GPU strategies quickly are key to adapting successfully. Clearly, though, such pivots can’t be made frequently.

A financial services firm revisited its GPU strategy to incorporate quantum ML algorithms, which required different computational capabilities, thus ensuring their infrastructure remains relevant as new AI paradigms emerge.​

Articulating Strategies for Managing GPU Shortages

Q: GPU shortages pose significant operational challenges. What sourcing strategies would you recommend during these shortages to ensure continuous operations?

Potential Response: ​

During shortages, organizations should diversify their supplier base, consider alternative technologies, secure inventory in advance, and establish strong relationships with multiple vendors to improve access to necessary GPU resources.​

During the global chip shortage, a cloud service provider collaborated with several smaller semiconductor manufacturers to create a diversified supply chain, reducing their dependency on mainstream GPU suppliers.​

Where is AI Infra heading?

Q: Considering the rapid evolution of AI infrastructure, what key trends do you think will shape up over the next five years, and how should organizations prepare?

Potential Response: ​

I envision AI infrastructure increasingly incorporating specialized processing units to handle diverse AI tasks more efficiently. Organizations should prepare for a shift towards more heterogeneous computing environments that leverage a mix of traditional GPUs and specialized accelerators tailored to specific AI applications.​

For example, we can already see large clients experimenting with AI-specific chips like Google’s TPU, as well as FPGAs, which offer efficient processing for particular types of neural networks, signalling a shift towards more specialized components in AI data centres.

Ready to scale your AI operations? Contact us to build a world-class AI Infrastructure.