Setting up a machine vision camera for object detection
Cameras are everywhere these days, and with good reason.
It’s never been so easy or cheap to incorporate one into an IoT device or to leverage image-based data in a user experience, and image inference can unlock product and service possibilities that would’ve been unimaginable a few years ago.
For developers, one of the biggest challenges of bringing image inference into a project is often identifying and setting up the right camera. For smartphone apps, this is less of an issue—if you’re developing for an iPhone, you know what camera you’ll be using, and there’s a robust ecosystem of software tools for doing so. But for IoT devices, augmented reality, or any other non-smartphone project that depends on object detection and tracking, the options are practically endless. This article will help you get started.
One complicating factor is the lag between hardware and software. If you’re writing code for a project that uses image inference, you may find yourself needing good-quality image-based data now, but the camera(s) that the product eventually uses won’t be ready for months. In this case, it’s worth investing a bit of money and time into setting up a camera that’s designed specifically for machine vision.
A machine vision camera (like the Allied Vision Alvium 1800 U-120c we use in the example below) has several advantages over a webcam or smartphone camera. They’re far more configurable, for one thing: where a webcam might have a few settings, machine vision cameras can have over 100, offering granular control over every conceivable imaging sensor parameter. They provide high-resolution data with minimal fuss, and won’t try to automatically adjust color balance or image compression, as web- and smartphone cameras often do. They also come with a GPIO interface that enables things like image capture synchronization for multi-camera setups, and precisely timed lighting.
Last but not least, machine vision cameras come with technical support, which can be critical if you’re in the middle of a week-long sprint and need to solve a problem immediately. We like Allied Vision in particular for their friendly, knowledgeable staff and their tendency to respond within just a few hours.
Moreover, machine vision cameras aren’t terribly expensive or complicated to use. Working with the technology team at Smart, I’ve set up and run several such cameras on AR and IoT projects, using just a few hundred dollars worth of hardware. To show you how straightforward this process is, I’ve listed out the details of such a system, including hardware, drivers, and operating environment. Once we’ve completed the setup process, I’ll also go through a simple exercise for having the system perform a quick YOLO (“You Only Look Once”) inference on an image.
Setup
For this example, we’ll be using the following hardware and software:

Camera: Alvium 1800 U-120c from Allied Vision
Lens: 8mm f/1.8 C-Mount from Edmund Optics
Processor: UDOO V8 SBC (or any x64 Ubuntu 20.04 machine with a GPU)

Environment:
Jupyter Notebook
Matplotlib
numpy
PyTorch (used to run YOLOv5)
Vimba Python – a camera interface library for Python, provided by Allied Vision
The setup process looks like this:
1. Download the Vimba SDK from Allied Vision and install it; in this setup it lives in /opt/Vimba_6_0. The SDK includes Vimba Viewer, a handy utility for getting your camera up and running, with some manual controls if you want to test things out really quickly.
2. Create a virtual environment; you can use virtualenv or conda. We use pyenv to manage Python versions on our development machines, so the Python we will be using is located at:

~/.pyenv/versions/3.8.0/bin/python

If Python is set up differently on your machine, substitute the appropriate path in the command below.

$ ~/.pyenv/versions/3.8.0/bin/python -m venv ~/.venvs/python380

This command runs the venv module and creates a virtual environment in:

~/.venvs/python380
3. Activate the environment:

$ source ~/.venvs/python380/bin/activate

There are other tools for working with virtual environments; we prefer this method on Linux machines, especially when working with Docker. On macOS and Windows, Miniconda3 can be a good tool for this purpose.
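With the environment active, this is also a good time to install the rest of the packages from the environment list above. The exact package names here are my assumption (opencv-python is the package that provides the cv2 module used later):

$ pip install jupyter matplotlib numpy opencv-python torch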
4. Now install the Vimba Python module into your virtual environment:
$ pip install /opt/Vimba_6_0/VimbaPython/Source
5. Check that the library is installed:

$ python
>>> import vimba
>>>

6. The import succeeds with no errors, so Vimba installed successfully and we are ready to go.
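The exercise below is easiest to follow interactively; since Jupyter Notebook is on our environment list, you can launch it from the activated virtual environment:

$ jupyter notebook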
Exercise
1. Start by importing the Vimba library that we just installed.
from vimba import *
2. We will need numpy too, since image data is stored using numpy arrays.
import numpy as np
3. Next comes the opencv library, for image manipulation.
import cv2
4. Last, import PyTorch so you can run a quick inference test at the end.
import torch
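Since the inference test at the end benefits from a GPU, it’s worth a quick check that PyTorch can actually see one; if this returns False, inference will simply run on the CPU:

torch.cuda.is_available()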
Next, let’s investigate which of the camera’s features are available. The Vimba library allows us to list all of the camera’s settings; keep in mind that these can differ depending on which camera model you have.
Vimba is written in a very Pythonic way, so all operations on the camera are done through context managers. This means that you, as a user, don’t have to fuss with opening and closing the camera manually every time you use it.
5. Start by getting an instance of the Vimba module and listing the attached cameras:
with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
6. Still inside that context, open the first camera and loop over all of its features, reading each value where one is available:

with cams[0] as cam:
    for feature in cam.get_all_features():
        try:
            value = feature.get()
        except (AttributeError, VimbaFeatureError):
            value = None
7. Inside the loop, print each feature’s name and, where available, its unit and value.

        print(f"Feature name: {feature.get_name()}")
        print(f"Display name: {feature.get_display_name()}")
        if value is not None:
            if feature.get_unit() != '':
                print(f"Unit: {feature.get_unit()}", end=' ')
            print(f"value={value}")
        else:
            print("Not set")
        print("--------------------------------------------")
Here are just a few lines of the output:
Found 1 camera(s)
--------------------------------------------
Feature name: AcquisitionFrameCount
Display name: Acquisition Frame Count
Not set
--------------------------------------------
Feature name: AcquisitionFrameRate
Display name: Acquisition Frame Rate
Unit: Hz value=40.9975471496582
--------------------------------------------
Feature name: AcquisitionFrameRateEnable
Display name: Acquisition Frame Rate Enable
value=False
.
.
.
8. Now list the camera’s available pixel formats, and use Vimba’s intersect_pixel_formats helper to find which of them OpenCV can work with:

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        formats = cam.get_pixel_formats()
        opencv_formats = intersect_pixel_formats(formats, OPENCV_PIXEL_FORMATS)
        print("Available formats:")
        for i, format in enumerate(formats):
            print(i, format)
        print("\nOpencv compatible formats:")
        for i, format in enumerate(opencv_formats):
            print(i, format)
Available formats:
0 Mono8
1 Mono10
2 Mono10p
3 Mono12
4 Mono12p
5 BayerGR8
6 BayerGR10
7 BayerGR12
8 BayerGR10p
9 BayerGR12p
10 Rgb8
11 Bgr8
12 YCbCr411_8_CbYYCrYY
13 YCbCr422_8_CbYCrY
14 YCbCr8_CbYCr
Opencv compatible formats:
0 Mono8
1 Bgr8
9. The color format Bgr8 (index 1 in the OpenCV-compatible list) is what we want, so set the camera to use it:

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        cam.set_pixel_format(opencv_formats[1])
10. Next, set a short exposure time:

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        # Set exposure to 2000us, i.e. 2 milliseconds
        exposure_time = cam.ExposureTime
        exposure_time.set(2000)
        print(f"Exposure changed to: {(exposure_time.get()/1000):.0f} ms")
11. Now acquire a single frame, convert it from BGR to RGB, and display it:

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        frame = cam.get_frame().as_opencv_image()
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        show_image_with_histogram(rgb, 5)
12. At 2 ms the image comes out underexposed, so let’s reset the camera with a slightly longer exposure…
with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        # Set exposure to 12000us, i.e. 12 milliseconds
        exposure_time = cam.ExposureTime
        exposure_time.set(12000)
        print(f"Exposure changed to: {(exposure_time.get()/1000):.0f} ms")
13. …and then run the image capture again.
with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        # Acquire a single frame from the camera in an OpenCV-compatible
        # format, i.e. a properly shaped numpy array.
        frame = cam.get_frame().as_opencv_image()
        # Convert the image from BGR to RGB for display.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        show_image_with_histogram(rgb, 5)
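14. Finally, the promised YOLO inference. Here’s a minimal sketch that loads a small pretrained YOLOv5 model from PyTorch Hub (the exact model choice is my assumption; the first run downloads the weights, so it needs an internet connection) and runs it on the frame we just captured:

# Load a small pretrained YOLOv5 model from PyTorch Hub.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Run inference on the RGB frame captured above.
results = model(rgb)

# Print a summary of the detections, then the raw detections as rows of
# [xmin, ymin, xmax, ymax, confidence, class].
results.print()
print(results.xyxy[0])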
The possibilities of machine vision and image inference are enormous, and as you can see, the setup is surprisingly straightforward. So whether you’re a veteran developer or someone just getting started in tech, this is a capability that’s accessible to you.
A relatively inexpensive camera and some readily available libraries can let you pull usable data from just about anything on earth that you can point a camera at. The bigger challenge is what comes next: turning all that data into a product or user experience that’s genuinely useful.