An NYU Tandon researcher is developing a mathematical system for determining influence among geographic events, one that makes small data sets as effective as big data for determining spatial dependencies.
Predicting and explaining the spread of human migration due to climate change, COVID-19, agricultural trends, and socioeconomic problems in neighboring regions depends on data: the more complex the model, the more data is needed to understand such spatially distributed phenomena. However, reliable data is often expensive and difficult to obtain, or too sparse to allow accurate predictions.
Maurizio Porfiri, Institute Professor of mechanical and aerospace, biomedical, and civil and urban engineering and a member of the Center for Urban Science and Progress (CUSP) at the NYU Tandon School of Engineering, has devised a new solution based on network and information theory. By applying to spatial processes the mathematical techniques normally used for time series, it makes “small data” act big.
The study, “An information-theoretic approach to study spatial dependencies in small datasets,” published in Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, explains how observers can draw robust inferences about influence from a small sample of attributes in a limited number of locations, including interpolation to intermediate areas or to distant regions that share similar key attributes.
“Most of the time the data sets are poor,” Porfiri explained. “Therefore, we took a very basic approach, applying information theory to explore whether influence in the temporal sense could be extended to space, which allows us to work with a very small data set, between 25 and 50 observations,” he said. “We take one snapshot of the data and draw connections, not on the basis of cause and effect but on the basis of the interaction between individual points, to see if there is some form of underlying, collective response in the system.”
The method, developed by Porfiri and Manuel Ruiz Marín of the Department of Quantitative Methods, Law and Modern Languages at the Technical University of Cartagena, Spain, involves two steps:
- Consolidating a data set into a small range of admissible symbols, similar to the way a machine learning system can identify a face from limited pixel data: a chin, cheekbones, a forehead, and so on.
- Applying an information-theory principle to create a non-parametric test (one that assumes no underlying model for the interaction between locations) to draw associations between events and to determine whether uncertainty at one location is reduced by knowledge about another location.
Porfiri explained that since a non-parametric approach posits no underlying structure for the influences between nodes, it provides flexibility in how nodes can be associated and even in how the concept of a neighbor is defined.
“Because we abstract this concept of a neighbor, we can define it in terms of any quality you like, for example, ideology. Ideologically, California can be a neighbor of New York, even though they are not geographically adjacent. They may share similar values.”
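To make the two steps concrete, here is a minimal Python sketch of the general idea; it is not the authors' published procedure. It assumes rank-based symbols, a plug-in mutual information estimate as the measure of reduced uncertainty, and a simple permutation test for significance. The function names (`symbolize`, `mutual_information`, `spatial_dependence_test`) and the `neighbor_of` list are hypothetical, introduced only for illustration. Because `neighbor_of` is just a list of indices, it can encode geographic adjacency or any other notion of neighbor, such as ideological similarity.

```python
import math
import random
from collections import Counter


def symbolize(values, n_symbols=3):
    """Map each value to a symbol 0..n_symbols-1 using equal-frequency rank bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    symbols = [0] * len(values)
    for rank, i in enumerate(order):
        symbols[i] = rank * n_symbols // len(values)
    return symbols


def mutual_information(xs, ys):
    """Plug-in estimate of the mutual information (in bits) of two symbol lists."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def spatial_dependence_test(values, neighbor_of, n_symbols=3, n_perm=999, seed=0):
    """Permutation test of spatial dependence (assumed design, not the paper's).

    neighbor_of[i] is the index of the location treated as the neighbor of i.
    Returns the observed mutual information and a permutation p-value.
    """
    rng = random.Random(seed)
    symbols = symbolize(values, n_symbols)

    def statistic(sym):
        # How much does a neighbor's symbol tell us about a location's symbol?
        return mutual_information(sym, [sym[j] for j in neighbor_of])

    observed = statistic(symbols)
    shuffled = symbols[:]
    exceed = 0
    for _ in range(n_perm):
        # Shuffling symbols across locations destroys any spatial structure.
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Made-up example: 30 locations along a line with a noisy spatial trend.
noise = random.Random(1)
values = [i + noise.uniform(-2, 2) for i in range(30)]
neighbor_of = [(i + 1) % 30 for i in range(30)]  # each location's next neighbor
mi, p_value = spatial_dependence_test(values, neighbor_of)
print(f"mutual information = {mi:.3f} bits, p-value = {p_value:.3f}")
```

Swapping in a different `neighbor_of` list, for instance one that pairs each state with its most ideologically similar state, tests a different notion of proximity without changing the rest of the code.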
The team validated the system against two case studies, population migration in Bangladesh due to sea level rise and motor vehicle deaths in the United States, to derive a statistically principled view of the mechanisms behind significant socioeconomic problems.
“In the first case, we wanted to see if migration between locations could be predicted from geographic distance or from the severity of flooding in a given district: whether knowing which district is close to another, or knowing the level of flooding, would help predict the size of the migration,” said Ruiz Marín.
For the second case, they looked at the spatial distribution of alcohol-related car accidents in 1980, 1994, and 2009, comparing states with high rates of such accidents with adjacent states and with states that have similar legislative ideologies on drinking and driving.
“We found a stronger link between states that share borders than between states that share legislative ideologies related to alcohol consumption and driving.”
Next, Porfiri and Ruiz Marín plan to extend their method to the analysis of spatio-temporal processes such as gun violence in the United States, a major research project recently funded by the National Science Foundation’s LEAP HI program, or epileptic seizures in the brain. Their work could help in understanding when and where gun violence may occur or seizures may begin.
Reference: “An information-theoretic approach to study spatial dependencies in small datasets” by Maurizio Porfiri and Manuel Ruiz Marín, 21 October 2020, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
DOI: 10.1098/rspa.2020.0113
The research was supported by the National Science Foundation and by the Groups of Excellence of the Region of Murcia, Fundación Séneca, Science and Technology Agency.