Data Scientist, NLP (Remote)
We are looking for a Data Scientist to help revolutionize the healthcare industry with AI. This is a critical role in which the right candidate will work on a wide range of healthcare problems with an unparalleled amount of data.
You'll join a team focused on deep medical document understanding: extracting meaning, intent, and structure from unstructured medical and administrative records. Our mission is to build intelligent systems that can reliably interpret complex, messy, and high-stakes healthcare documentation at scale.
This role is a unique blend of applied machine learning, NLP, and product thinking. You'll collaborate closely with cross-functional teams to:
Design and develop models to extract entities, detect intents, and understand document structure
Tackle challenges like long-context reasoning, layout-aware NLP, and ambiguous inputs
Evaluate model performance where ground truth is partial, uncertain, or evolving
Shape the roadmap and success metrics for replacing legacy document processing systems with smarter, scalable solutions
We operate in a high-trust, high-ownership environment where experimentation and shipping value quickly are key. If you're excited by building systems that make healthcare data more usable, accurate, and safe, please reach out.
Qualifications
3+ years of experience with data science and machine learning in an industry setting, particularly in designing and building NLP models.
Proficiency with Python
Experience with recent language models (e.g., transformers and LLMs)
Proficiency with standard data analysis toolkits such as SQL, NumPy, and pandas
Proficiency with deep learning frameworks like PyTorch (preferred) or TensorFlow
Industry experience shepherding ML/AI projects from ideation to delivery
Demonstrated ability to influence company KPIs with AI
Demonstrated ability to navigate ambiguity
Bonus Experience
Experience with document layout analysis (using vision, NLP, or both).
Experience with Spark/PySpark
Experience with Databricks
Experience in the healthcare industry
Responsibilities
Play a key role in the success of our products by developing models for document understanding tasks.
Perform error analysis, data cleaning, and other related tasks to improve models.
Collaborate with your team by making recommendations for the development roadmap of a capability.
Work with other data scientists and engineers to optimize machine learning models and integrate them into end-to-end pipelines.
Understand product use cases and define key performance metrics for models according to business requirements.
Set up systems for long-term improvement of models and data quality (e.g., active learning or continuous learning systems).
After 3 Months, You Will
Have a strong grasp of the technologies on which our platform is built.
Be fully integrated into ongoing model development efforts with your team.
After 1 Year, You Will
Independently read the literature and conduct research to develop models for new and existing products.
Own models internally, communicating with product managers, customer success managers, and engineers to make each model and the product built around it succeed.
Be a subject matter expert on Datavant's models and a resource other teams can turn to for information and recommendations.