#import "@preview/bloated-neurips:0.5.1": (
  botrule,
  midrule,
  neurips2024,
  paragraph,
  toprule,
  url,
)

#import "./logo.typ": LaTeX, LaTeXe, TeX

#let affls = (
  ucsb: (
    // department: "AI Center",
    institution: "University of California, Santa Barbara",
    country: "United States",
  ),
)

#let authors = (
  (
    name: "Youwen Wu",
    affl: "ucsb",
    email: "youwen@ucsb.edu",
    equal: true,
  ),
)

#show: neurips2024.with(
  title: [Towards More Accessible Scientific Infrastructure: A Neural Vision Pipeline to Interface with Experiments],
  authors: (authors, affls),
  keywords: ("Machine Learning", "NeurIPS"),
  abstract: [
    Scientific instruments are often designed to be operated by humans. As
    such, they are outfitted with analog dials and controls that are
    difficult for machines to interpret. To ameliorate the inaccessibility
    of experimental equipment in fundamental disciplines such as quantum
    physics, we seek a systematic approach to processing existing _analog
    systems_ into _digital data_ without invasively augmenting them with
    sensors. In this paper, we explore the state of the art in computer
    vision and its applications in analyzing experimental instruments
    through a purely vision-based approach. We train a convolutional
    neural network to triangulate visual fiducials and construct a
    pipeline that applies perspective-warp corrections to normalize images
    of measurements. We end by designing _Dendrite_, an end-to-end vision
    pipeline that can obtain detailed digital readings from a video stream
    of an analog instrument.
  ],
  bibliography: bibliography("main.bib"),
  bibliography-opts: (title: none, full: true), // Only for example paper.
  appendix: [
    #include "appendix.typ"
    #include "checklist.typ"
  ],
  accepted: true,
)

= Introduction

Online resources have become increasingly prevalent in scientific
pedagogy. Around the world, students use virtual labs that simulate
physical phenomena. Still lacking, however, is access to real-world
hardware that produces real results. Experimental instruments are
expensive and difficult to justify for many schools and institutions. One
solution to this problem is to provide shared equipment that is
accessible and controlled over the internet. This allows equipment
located in a single place to be used from anywhere in the world.

One way to build these systems is to augment existing devices with the
capability to be controlled over the internet. However, many scientific
instruments are designed with human operation in mind and contain many
analog dials, readouts, and controls. We seek a way to non-invasively
digitize these devices. Here, _non-invasively_ means that we make no
irreversible or drastic changes to the hardware. _Digitize_ refers to
obtaining all relevant outputs as digital data that computers can
process, and to operating the relevant controls over digital protocols
(such as the internet). In this paper, we focus primarily on obtaining
the outputs.

We propose a system that uses an end-to-end vision pipeline to scan
readouts and translate them into data. The data can then be streamed to
virtual simulations, which will react exactly as the real-life equipment
does.

== Requirements

Our end-to-end pipeline consists of a component that locates the desired
instrument in the image and determines the corrections needed to
transform the image to a head-on point of view. This may be a
neural-network-based model that identifies a key fiducial from which we
can extrapolate the perspective transforms needed to bring the image to a
normalized state (here, _normalized_ refers to a flattened 2D image that
can be easily analyzed by computer vision).

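As a sketch of this normalization step, the perspective correction from four detected fiducial corners can be computed with a standard direct linear solve. The helper names, the example corner coordinates, and the fixed 640×480 output rectangle below are illustrative assumptions, not the pipeline's actual implementation; a library routine such as OpenCV's `getPerspectiveTransform` plays the same role.

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 perspective transform mapping src onto dst.

    src, dst: four (x, y) point pairs, e.g. detected fiducial corners
    and the corners of the normalized output rectangle. Hypothetical
    helper for illustration only.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence yields two linear equations in the eight
        # unknown homography parameters (the last entry is fixed to 1).
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pt):
    # Apply the homography in homogeneous coordinates.
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Map a skewed quadrilateral (detected corners) onto a flat 640x480 view.
corners = [(12.0, 9.0), (610.0, 31.0), (589.0, 455.0), (5.0, 470.0)]
target = [(0.0, 0.0), (640.0, 0.0), (640.0, 480.0), (0.0, 480.0)]
H = homography(corners, target)
```

The resulting matrix `H` would then be handed to a warping routine (e.g. OpenCV's `warpPerspective`) to produce the flattened 2D image.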
We then extrapolate from that data to map out the various points of
interest. From there, we can run specialized models on readouts such as
dials to determine their readings.

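Once such a model has extracted a needle angle from the normalized image, converting it to a reading reduces to linear interpolation over the gauge's calibrated sweep. The sweep limits and value range below are hypothetical calibration constants for an example gauge, not values from the paper:

```python
def dial_reading(needle_angle_deg, min_angle=-135.0, max_angle=135.0,
                 min_value=0.0, max_value=100.0):
    """Interpolate a dial value from a detected needle angle.

    The +/-135 degree sweep and 0-100 value range are hypothetical
    calibration constants; a real instrument supplies its own.
    """
    frac = (needle_angle_deg - min_angle) / (max_angle - min_angle)
    frac = min(max(frac, 0.0), 1.0)  # clamp to the dial face
    return min_value + frac * (max_value - min_value)
```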
= The state of the art

We first