
#import "@preview/bloated-neurips:0.5.1": (
  botrule,
  midrule,
  neurips2024,
  paragraph,
  toprule,
  url,
)
#import "./logo.typ": LaTeX, LaTeXe, TeX

#let affls = (
  ucsb: (
    // department: "AI Center",
    institution: "University of California, Santa Barbara",
    country: "United States",
  ),
)

#let authors = (
  (
    name: "Youwen Wu",
    affl: "ucsb",
    email: "youwen@ucsb.edu",
    equal: true,
  ),
)

#show: neurips2024.with(
  title: [Towards More Accessible Scientific Infrastructure: A Neural Vision Pipeline to Interface with Experiments],
  authors: (authors, affls),
  keywords: ("Machine Learning", "NeurIPS"),
  abstract: [
    Scientific instruments are often designed to be operated by humans. As
    such, they are outfitted with analog dials and controls that are difficult
    for machines to interpret. To improve the accessibility of experimental
    equipment in fundamental disciplines such as quantum physics, we seek a
    systematic approach to converting the outputs of existing _analog systems_
    into _digital data_ without invasively augmenting them with sensors. In
    this paper, we explore the state of the art in computer vision and its
    applications to analyzing experimental instruments through a purely
    vision-based approach. We train a convolutional neural network to localize
    visual fiducials and construct a pipeline that applies perspective-warp
    corrections to normalize images of measurements. Finally, we design
    _Dendrite_, an end-to-end vision pipeline that can obtain detailed digital
    readings from a video stream of an analog instrument.
  ],
  bibliography: bibliography("main.bib"),
  bibliography-opts: (title: none, full: true),  // Only for example paper.
  appendix: [
    #include "appendix.typ"
    #include "checklist.typ"
  ],
  accepted: true,
)

= Introduction

Online resources have become increasingly prevalent in scientific pedagogy.
Around the world, students use virtual labs that simulate physical phenomena.
However, access to real-world hardware that produces real results is still
lacking. Experimental instruments are expensive, and their cost is difficult for
many schools and institutions to justify. One solution to this problem is to
provide shared equipment that can be accessed and controlled over the internet,
allowing equipment located in a single place to be used from anywhere in the
world.

One way to build such systems is to augment existing devices so that they can be
controlled over the internet. However, many scientific instruments are designed
with human operation in mind and contain many analog dials, readouts, and
controls. We therefore seek a way to non-invasively digitize these devices.
Here, _non-invasively_ means that we make no irreversible or drastic changes to
the hardware. To _digitize_ a device means to obtain all of its relevant outputs
as digital data that computers can process, and to operate its relevant controls
over digital protocols (such as those used on the internet). In this paper, we
focus primarily on obtaining the outputs.

We propose a system built around an end-to-end vision pipeline that scans
readouts and translates them into data. The data can then be streamed to
virtual simulations that respond in the same way as the real-world equipment.

== Requirements

Our end-to-end pipeline will include a component that locates the desired
instrument in the image and determines the corrections needed to transform the
image to a viewpoint from which the instrument is directly visible. This
component may be a neural-network-based model that identifies a key fiducial,
from which we can derive the perspective transform needed to bring the image to
a normalized state (here, _normalized_ refers to a flattened 2D image that can
be easily analyzed by computer vision).
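As an illustration of the normalization step, the following sketch estimates a perspective transform (a 3-by-3 homography) from four point correspondences. It assumes, hypothetically, that the fiducial detector returns the image coordinates of the four corners of a planar target whose coordinates in the normalized frame are known. The helper names `solve_homography` and `apply_homography` are illustrative only, and fixing the bottom-right entry of the homography to 1 and solving the resulting 8-by-8 linear system directly is one standard formulation, not necessarily the one used in Dendrite.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def _solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix so that row operations act on a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Return the homography H (with H[2][2] = 1) that maps src[i] to dst[i]."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        # v = (h3 x + h4 y + h5) / (h6 x + h7 y + 1), rearranged the same way.
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = _solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map a point through H, dividing by the projective coordinate w."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)
```

For example, passing the four detected corners as `src` and the corners of an upright rectangle as `dst` yields a transform that maps the tilted target onto that rectangle. To warp a full image, a library routine such as OpenCV's `cv2.getPerspectiveTransform` together with `cv2.warpPerspective` performs the same estimation and resamples every pixel; the dependency-free version above only makes the underlying linear system explicit.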

We then use that data to map out all of the other points of interest on the
instrument. Finally, we can run specialized models on individual readouts, such
as dials, to determine their readings.
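Once a dial has been located in the normalized image, one simple way to turn a detected needle into a number is to interpolate linearly between the angles of the scale's end marks. The sketch below assumes a linear scale, a needle center and tip already estimated by some upstream detector, and angles measured in degrees clockwise from 12 o'clock; `needle_angle` and `dial_reading` are hypothetical helpers rather than a model described in this paper.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def needle_angle(center: Point, tip: Point) -> float:
    """Angle of the needle in degrees, clockwise from 12 o'clock.

    Uses image coordinates, in which y grows downward.
    """
    dx = tip[0] - center[0]
    dy = tip[1] - center[1]
    # atan2(dx, -dy) is 0 when the tip is straight up and 90 when it is to the right.
    return math.degrees(math.atan2(dx, -dy)) % 360.0


def dial_reading(
    angle: float,
    min_angle: float,
    max_angle: float,
    min_value: float,
    max_value: float,
) -> float:
    """Linearly interpolate a reading from a needle angle.

    All angles are in degrees clockwise from 12 o'clock; the scale runs
    clockwise from min_angle to max_angle and may cross 12 o'clock.
    """
    sweep = (max_angle - min_angle) % 360.0
    if sweep == 0.0:
        raise ValueError("min_angle and max_angle must differ")
    offset = (angle - min_angle) % 360.0
    # A needle in the dead zone outside the printed scale snaps to the nearer end.
    if offset > sweep:
        offset = sweep if offset - sweep < 360.0 - offset else 0.0
    return min_value + (offset / sweep) * (max_value - min_value)
```

For example, a gauge whose scale runs clockwise from 225 degrees to 135 degrees (a 270-degree sweep) over the range 0 to 100 reads 50 when the needle points straight up. Non-linear scales, such as logarithmic ones, would need a per-dial calibration table instead of a single interpolation.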

= The state of the art

We first