Machine learning is the process of using computers to detect patterns in massive data sets and then make predictions based on what the computer learns from those patterns. This makes machine learning a specific and narrow type of artificial intelligence. Full artificial intelligence involves machines that can perform tasks we associate with the minds of intelligent humans and animals, such as perceiving, learning, and problem-solving.
All machine learning is based on algorithms. In general, algorithms are specific sets of instructions that a computer uses to solve problems. In machine learning, algorithms are rules for analyzing data using statistics. Machine learning systems use these rules to identify relationships between data inputs and desired outputs, usually predictions. To get started, scientists give a machine learning system a set of training data. The system applies its algorithms to this data to practice analyzing inputs similar to those it will receive in the future.
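As a minimal sketch of what "identifying a relationship between inputs and desired outputs" can mean, the example below fits a straight line to a handful of training pairs and then uses the fitted line to predict an output for a new input. The numbers and the function names `fit_line` and `predict` are invented for illustration; real systems use far more data and far richer models.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to the training pairs."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    """Apply the learned relationship to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Made-up training data: each input is paired with its desired output.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
print(predict(model, 5.0))  # prediction for an input not seen in training
```

The statistical rule here (least squares) is fixed in advance; what the system "learns" from the training data is the slope and intercept.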
One of the areas where machine learning holds great promise is in the detection of cancer using computed tomography (CT) imaging. First, researchers assemble as many CT images as possible to use as training data. Some of these images show tissue with cancer cells and some show healthy tissue. Researchers also gather information on what to look for in an image to identify cancer. For example, this could include what the boundaries of cancerous tumors look like. Next, they create rules about the relationship between the data in the images and what doctors know about identifying cancer. They then feed these rules and the training data to the machine learning system. The system uses the rules and training data to learn how to recognize cancerous tissue. Finally, the system receives the CT images of a new patient. Using what it’s learned, the system decides which images show signs of cancer, faster than any human could. Doctors could use the system’s predictions to help decide whether a patient has cancer and how to treat it.
The way training data is set up divides machine learning systems into two broad types: supervised and unsupervised. If the training data is labeled, the system is supervised. Labeled data tells the system what the data is. For example, CT images could be labeled to indicate cancerous lesions or tumors alongside healthy tissue. In essence, this means that the machine learning system learns by example. Labeling can be time-consuming given the large amounts of data that training data sets require.
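To make "learning by example" concrete, here is a small sketch of one simple supervised technique, a nearest-centroid classifier. It assumes each CT image has already been reduced to two numeric features; those features, their values, and the function names `train_centroids` and `classify` are all hypothetical and not taken from any real cancer-detection system.

```python
from collections import defaultdict
from math import dist

def train_centroids(examples):
    """Average the feature vectors that share a label (nearest-centroid training)."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is closest to the new feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical labeled training data: (feature vector, label supplied by a human).
labeled_scans = [
    ((0.9, 0.8), "cancerous"),
    ((0.8, 0.9), "cancerous"),
    ((0.2, 0.1), "healthy"),
    ((0.1, 0.3), "healthy"),
]

centroids = train_centroids(labeled_scans)
print(classify(centroids, (0.7, 0.7)))
```

The labels do all the teaching here: without them, the system would have no way to know which group of scans represents which diagnosis.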
If the training data is not labeled, the machine learning system is unsupervised. In the cancer scan example, an unsupervised machine learning system would receive a large number of CT scans and information about tumor types and then be left on its own to learn what to look for to recognize cancer. This frees humans from the need to label the data used in the training process. The downside of unsupervised learning is that the results may be less accurate because the system has no explicit labels to learn from.
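For contrast, the sketch below groups unlabeled values on its own using k-means clustering with two clusters, one common unsupervised technique. It assumes each scan has been reduced to a single made-up measurement; the data and the function name `two_means` are illustrative only.

```python
def two_means(values, iterations=10):
    """Split unlabeled 1-D values into two clusters (k-means with k = 2).

    Assumes the input contains at least two distinct values.
    """
    low, high = min(values), max(values)  # deterministic starting centers
    for _ in range(iterations):
        # Assign every value to whichever center is closer.
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the average of the values assigned to it.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high

# Hypothetical unlabeled measurements, one per scan. No labels are provided.
measurements = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

centers = two_means(measurements)
print(centers)
```

The algorithm finds two groups, but nothing in the data says which group, if either, corresponds to cancerous tissue. A person still has to interpret the clusters, which is one reason results can be less accurate than with labeled data.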
Some machine learning systems can improve their abilities based on feedback they receive on their predictions. These are called reinforcement machine learning systems. For example, the system could be told the results of other tests that doctors performed to find out whether or not patients have cancer. The system could then fine-tune its algorithms to produce more accurate predictions in the future.
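The feedback loop described above can be sketched with a perceptron-style update, in which the system adjusts its rule only when feedback shows that a prediction was wrong. This is a deliberately simple error-correction scheme chosen for illustration, not a full reinforcement learning algorithm, and the features, feedback values, and function names are hypothetical.

```python
def predict(weights, bias, features):
    """Predict 1 (cancer suspected) or 0 (not suspected) from a weighted sum."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def update_from_feedback(weights, bias, features, actual, rate=0.1):
    """Nudge the weights toward the confirmed result; no change if the prediction was right."""
    error = actual - predict(weights, bias, features)
    new_weights = [w + rate * error * x for w, x in zip(weights, features)]
    return new_weights, bias + rate * error

# Hypothetical feedback: (features of a scan, result later confirmed by other tests).
feedback = [
    ([0.9, 0.8], 1),
    ([0.2, 0.1], 0),
    ([0.8, 0.9], 1),
    ([0.1, 0.3], 0),
]

weights, bias = [0.0, 0.0], 0.0
for _ in range(20):  # revisit the feedback several times
    for features, actual in feedback:
        weights, bias = update_from_feedback(weights, bias, features, actual)

print([predict(weights, bias, features) for features, _ in feedback])
```

Each wrong prediction shifts the weights slightly, so over repeated rounds of feedback the predictions come to agree with the confirmed results.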
- The newest DOE supercomputer, Summit at Oak Ridge National Laboratory, has an architecture particularly well suited to artificial intelligence applications.
- Machine learning allows scientists to analyze amounts of data that were previously inaccessible.
- DOE-funded researchers have used machine learning to develop new cancer screening methods, better understand the properties of water, and run experiments autonomously.
- Physics-based machine learning uses deep neural networks that can be trained to incorporate specific laws of physics to solve supervised learning tasks and scientific problems.
- Machine learning algorithms are not a silver bullet. The development of machine learning systems is susceptible to human error and bias, and requires the same careful design as software engineering.
DOE Office of Science: Contributions to Machine Learning
The Department of Energy Office of Science supports machine learning research through its Advanced Scientific Computing Research (ASCR) program. ASCR has a portfolio of data management, data analytics, computer technology, and related research that all contribute to machine learning and artificial intelligence. As part of this portfolio, DOE owns some of the best performing supercomputers in the world.
The DOE Office of Science as a whole is committed to using machine learning to support scientific research. Science depends on big data, and Office of Science user facilities like particle accelerators and x-ray light sources generate mountains of it. Through machine learning, researchers are identifying patterns in the data from these facilities that are difficult or impossible for humans to detect, at speeds hundreds to thousands of times faster than traditional data analysis techniques.