alexandria/2024/documents/by-name/vision-whitepaper/main.typ

#import "@preview/bloated-neurips:0.5.1": (
  botrule,
  midrule,
  neurips2024,
  paragraph,
  toprule,
  url,
)
#import "./logo.typ": LaTeX, LaTeXe, TeX

#let affls = (
  ucsb: (
    // department: "AI Center",
    institution: "University of California, Santa Barbara",
    country: "United States",
  ),
)

#let authors = (
  (
    name: "Youwen Wu",
    affl: "ucsb",
    email: "youwen@ucsb.edu",
    equal: true,
  ),
)

#show: neurips2024.with(
  title: [Towards More Accessible Scientific Infrastructure: A Neural Vision Pipeline to Interface with Experiments],
  authors: (authors, affls),
  keywords: ("Machine Learning", "NeurIPS"),
  abstract: [
    Scientific instruments are often designed to be operated by humans. As
    such, they are outfitted with analog dials and controls that are difficult
    for machines to read. To ameliorate the inaccessibility of experimental
    equipment in fundamental disciplines such as quantum physics, we seek a
    systematic approach to converting the output of existing _analog systems_
    into _digital data_ without invasively augmenting them with sensors. In
    this paper, we explore the state of the art in computer vision and its
    applications in analyzing experimental instruments through a purely
    vision-based approach. We train a convolutional neural network to
    triangulate visual fiducials and construct a pipeline that applies
    perspective-warp corrections to normalize images of measurements. We end
    by designing _Dendrite_, an end-to-end vision pipeline that can obtain
    detailed digital readings from a video stream of an analog instrument.
  ],
  bibliography: bibliography("main.bib"),
  bibliography-opts: (title: none, full: true),  // Only for example paper.
  appendix: [
    #include "appendix.typ"
    #include "checklist.typ"
  ],
  accepted: true,
)

= Introduction

Online resources have become increasingly prevalent in scientific pedagogy.
Around the world, students use virtual labs that simulate physical phenomena.
What is still lacking, however, is access to real-world hardware that produces
real results. Experimental instruments are expensive and difficult to justify
for many schools and institutions. One solution to this problem is to provide
shared equipment that can be accessed and controlled over the internet, so
that equipment located in a single place can be used from anywhere in the
world.

One way to build these systems is to augment existing devices with the
capability to be controlled over the internet. However, many scientific
instruments are designed with human operation in mind and contain many analog
dials, readouts, and controls. We seek a way to non-invasively digitize these
devices. Here, _non-invasively_ means that we make no irreversible or drastic
changes to the hardware. _Digitize_ means obtaining all relevant outputs as
digital data that can be processed by computers, and being able to operate
the relevant controls digitally (for example, over the internet). In this
paper, we focus primarily on obtaining the outputs.

We propose a system that uses an end-to-end vision pipeline to scan readouts
and translate them into data. The data can then be streamed to virtual
simulations that react as the real-life equipment does.

== Requirements

The first component of our end-to-end pipeline locates the desired instrument
in the image and determines the corrections needed to transform the image to a
head-on view of the instrument. This may be a neural-network-based model that
identifies a key fiducial, from which we can derive the perspective transform
needed to bring the image to a normalized state (here, _normalized_ refers to
a flattened 2D image that can be easily analyzed by computer vision).
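
As a minimal sketch of this normalization step, assume the fiducial detector
reports the four corners of a square marker in pixel coordinates. The
perspective transform can then be recovered by solving a small linear system.
The function names (`homography`, `warp_point`), the corner ordering, and the
sample coordinates are illustrative assumptions, not part of the pipeline
described in this paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners: the system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Return the 3x3 matrix mapping four src corners onto four dst corners.

    The bottom-right entry is fixed to 1, which leaves eight unknowns and
    therefore needs exactly four point correspondences.
    """
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h: List[List[float]], p: Point) -> Point:
    """Apply the homography to one point, including the projective divide."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )


# Hypothetical detected corners (clockwise from top-left) and the flattened
# 200 x 200 pixel square we want them to land on.
corners = [(120.0, 80.0), (410.0, 95.0), (430.0, 370.0), (100.0, 340.0)]
target = [(0.0, 0.0), (200.0, 0.0), (200.0, 200.0), (0.0, 200.0)]
H = homography(corners, target)
```

Once `H` is known, any other pixel in the camera frame can be mapped into the
normalized image with `warp_point(H, pixel)`. In practice, a library routine
for perspective warping would likely replace this hand-written solver.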

From the position of the fiducial, we then extrapolate the locations of all
points of interest on the instrument. From there, we can run specialized
models on readouts such as dials to determine their readings.
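
To illustrate the last step, the sketch below converts a needle position into
a reading. It assumes that some upstream model has already located the dial
center and the needle tip in the normalized image, and that the scale is
linear between two known end angles. The function `dial_reading`, its
parameters, and the example gauge are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def dial_reading(
    center: Point,
    tip: Point,
    angle_min_deg: float,
    angle_max_deg: float,
    value_min: float,
    value_max: float,
) -> float:
    """Map a needle position to a reading on a linear dial scale.

    Angles are measured clockwise from the 12 o'clock direction, in image
    coordinates (y grows downward), and lie in (-180, 180].
    """
    if angle_max_deg <= angle_min_deg:
        raise ValueError("angle_max_deg must be greater than angle_min_deg")
    dx = tip[0] - center[0]
    dy = tip[1] - center[1]
    if dx == 0 and dy == 0:
        raise ValueError("needle tip coincides with the dial center")
    # atan2(dx, -dy) is 0 when the needle points up and 90 when it points right.
    angle = math.degrees(math.atan2(dx, -dy))
    fraction = (angle - angle_min_deg) / (angle_max_deg - angle_min_deg)
    # Clamp so a needle resting slightly past an end stop stays on the scale.
    fraction = min(1.0, max(0.0, fraction))
    return value_min + fraction * (value_max - value_min)


# Hypothetical gauge sweeping 270 degrees, from -135 to +135, marked 0 to 10.
# A needle pointing straight up sits at the midpoint of the scale.
reading = dial_reading((100.0, 100.0), (100.0, 50.0), -135.0, 135.0, 0.0, 10.0)
print(reading)  # 5.0
```

A nonlinear scale would need a lookup table of calibrated angles instead of
the single linear interpolation used here.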

= The state of the art

We first
udpate 2024-11-20 14:51:30 -08:00