Position Title: Computational Science Analyst II
The Brain Architecture Project at the Mitra Lab, Cold Spring Harbor Laboratory is seeking a data engineer/analyst with expertise in big image data and a background in machine vision to work on a petabyte-scale data set of histological brain image volumes in the Mouse Brain Architecture Project. The position provides an opportunity to apply and further develop big-data analytics skills in a cutting-edge neuroscience project to map and understand neural circuit architecture in whole brains. The successful candidate should be comfortable working in a Linux environment with distributed/networked computation, and should be able to help maintain and grow a large storage and compute cluster.
To understand how brains work, we need to understand their circuit connectivity. After a century of research this knowledge remains incomplete, so multiple research groups are now targeting this gap to fundamentally advance our understanding of brain function. The Brain Architecture Project is a collaborative effort aimed at creating an integrated resource of knowledge about nervous system architecture in multiple species. The project is currently focused on analyzing a large data set of mouse brains (>1,000 brains, each with >100 gigavoxels of image data) to obtain a comprehensive circuit map of the whole brain. The candidate will participate in an exciting fundamental neuroscience project while advancing machine vision/learning methods in a big-data environment, using techniques competitive in both academic and industrial settings.
Responsibilities:
• Build efficient, flexible, extensible, and scalable solutions for system administration and big-data handling.
• Develop and translate image-processing algorithms into working prototype code.
• Create algorithms and heuristics to extract information from large data sets and implement them in software and scripts.
• Maintain and enhance the data pipeline (image handling, cluster) for scalability and reliability.
• Mine and organize both structured and unstructured data sets.
• Design, implement, and support a platform that can provide ad-hoc access to large image datasets.
• Develop interactive dashboards, reports, and analysis templates.
Qualifications:
• An MS or PhD in Computer Science, Machine Vision, Artificial Intelligence, Machine Learning, or a related technical field (Data Science, Mathematics/Statistics, physical science, or engineering) is strongly desired.
• Strong Linux and software development skills are required, together with experience coding in C/C++, Python, and related languages. MATLAB experience is desirable.
• Database engineering and coding skills are required, including big-data technologies (e.g., MySQL, NoSQL stores).
• Experience with software stacks/frameworks for distributed processing of big data (e.g., Spark, SGE) is required.
• Experience building, maintaining, and interacting with large, scalable, high-performance computing systems is required.
• GPU coding experience is desirable.