Employment Type: Permanent/Full-Time
Location: Poland (Remote)
Job Description:
- Deep understanding of the full lifecycle of an ML solution (incl. related ML tooling/infrastructure).
- Solid understanding of advanced analytics, machine learning and deep learning workflows, and industry experience with relevant frameworks (e.g., Pandas, PyTorch)
- Strong experience in Python programming.
- Further programming languages are a plus (e.g., R, JavaScript)
- 5+ years of hands-on experience with cloud technologies and providers (preferably AWS)
- Experience in software engineering and cloud DevOps
- Join the Federated Open Science team as an Analytics/ML Engineer and help us push towards a privacy-preserving future by researching, designing, and implementing analytical code and ML models that fit the federated paradigm.
Key responsibilities:
- Ease the adoption of federated computing software (e.g. NVIDIA FLARE, DataSHIELD)
- Create quick-start and data-scientist-oriented how-to guides that explain, step by step, tasks such as converting centralized code to federated code
- Contribute to the development and testing of a secure and privacy-preserving federated computing platform and participate in technical design and implementation of the user workflows in a federated setting (incl. at network level)
- Assist in developing and applying federated analysis scripts / notebooks for RWD/clinical analysis (e.g. descriptive cohort statistics, time series analysis)
- Facilitate the testing and validation of models to determine viability for federated deployment
- Actively collaborate with PDD/PHC/RWD data scientists to design and assist in the creation of a statistical analysis plan according to a scientific protocol in a federated setting
- Stay up to date with the latest advances in federated computing (FA/FL)
- Communicate results and findings to the team and stakeholders in a clear and concise manner
- Contribute to methodological research in FA/FL to unlock novel analytical/ML approaches (e.g. privacy-preserving functions) in collaboration with data scientists and researchers
- Review code developed by others and provide feedback to ensure best practices (e.g. disclosure, style guidelines, checking code in, accuracy, testability and efficiency)
- Contribute to existing documentation or educational content and adapt content based on platform updates and user feedback
Requirements:
- Passionate about the intersection of healthcare, analytics/AI, software engineering, and data engineering
- Sound knowledge of and experience with data science workflows, and expertise in multi-modal datasets (among RWD, clinical, imaging, biomarker, omics)
- Experience with ML model deployment and lifecycle management, in particular building, testing, measuring, and deploying machine learning models in production
- Practical understanding of different anonymization and privacy-preserving techniques (nice-to-have)
- Familiarity with machine learning concepts (e.g. cross-validation, experiment tracking, statistical tests) and toolkits (e.g. scikit-learn, XGBoost)
- Sound scripting / programming skills: Python, R, Pandas, NumPy, PyTorch, TensorFlow, Matplotlib, Jupyter, Git, Bash/Linux basics
- Good knowledge of software development best practices including testing, continuous integration, and DevOps tools
- Beneficial technical skills: NVIDIA FLARE, Apheris, DataSHIELD, AWS administration, Docker, Kubernetes, Kubeflow, MONAI