publications
Publications by category in reverse chronological order.
2025
- The Aleph: Decoding DNS PTR Records With Large Language Models. Kedar Thiagarajan, Esteban Carisimo, and Fabián E. Bustamante. In ACM CoNEXT, Dec 2025.
Geolocating network devices is essential for various research areas. Yet, despite notable advancements, it remains one of the most challenging problems for experimentalists. One approach that has proved effective is leveraging the geolocation hints in PTR records associated with network devices. We argue that Large Language Models (LLMs), rather than humans, are better equipped to identify the patterns in DNS PTR records that tools like Hoiho rely on. We introduce an approach that leverages LLMs to classify PTR records, generate regular expressions for these classes, and produce hint-to-location mappings. We present preliminary results showing the applicability of LLMs as a scalable approach to leveraging PTR records for infrastructure geolocation.
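To make the idea of regular expressions plus a hint-to-location mapping concrete, here is a minimal sketch. In the paper an LLM produces these artifacts; below, the naming convention, the pattern, the `example.net` domain, and the `HINT_TO_LOCATION` table are all hand-written, hypothetical stand-ins rather than output from the authors' system.

```python
import re

# Hypothetical naming convention: <interface>.<router>.<3-letter hint><digits>.example.net
PATTERN = re.compile(
    r"^[a-z0-9-]+\.[a-z0-9-]+\.(?P<hint>[a-z]{3})\d+\.example\.net$"
)

# Hypothetical hint-to-location mapping for that convention.
HINT_TO_LOCATION = {
    "nyc": "New York, US",
    "lon": "London, GB",
}


def geolocate(ptr_record):
    """Return a location for a PTR hostname, or None if no hint is recognized."""
    # Normalize case and drop the trailing dot of a fully qualified name.
    hostname = ptr_record.lower().rstrip(".")
    match = PATTERN.match(hostname)
    if match is None:
        return None
    return HINT_TO_LOCATION.get(match.group("hint"))


if __name__ == "__main__":
    for record in ("ae-1.cr2.nyc01.example.net.", "host.example.org"):
        print(record, "->", geolocate(record))
```

A record that matches the pattern but carries an unmapped hint also yields `None`, which keeps "pattern not recognized" and "location unknown" from producing a wrong answer.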
2024
- DarkSim: A similarity-based time-series analytic framework for darknet traffic. Max Gao, Ricky P. K. Mok, Esteban Carisimo, and 3 more authors. In ACM IMC, Nov 2024.
Network Telescopes, often referred to as darknets, capture unsolicited traffic directed toward advertised but unused IP spaces, enabling researchers and operators to monitor malicious, Internet-wide network phenomena such as vulnerability scanning, botnet propagation, and DoS backscatter. Detecting these events, however, has become increasingly challenging due to the growing traffic volumes that telescopes receive. To address this, we introduce DarkSim, a novel analytic framework that utilizes Dynamic Time Warping to measure similarities within the high-dimensional time series of network traffic. DarkSim combines traditional raw packet processing with statistical approaches, identifying traffic anomalies and enabling rapid time-to-insight. We evaluate our framework against DarkGLASSO, an existing method based on the Graphical LASSO algorithm, using data from the UCSD Network Telescope. Based on our manually classified detections, DarkSim showcased perfect precision and an overlap of up to 91% of DarkGLASSO’s detections in contrast to DarkGLASSO’s maximum of 73.3% precision and detection overlap of 37.5% with the former. We further demonstrate DarkSim’s capability to detect two real-world events in our case studies: (1) an increase in scanning activities surrounding CVE public disclosures, and (2) shifts in country- and network-level scanning patterns that indicate aggressive scanning. DarkSim provides a detailed and interpretable analysis framework for time-series anomalies, representing a new contribution to network security analytics.
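For readers unfamiliar with Dynamic Time Warping, the sketch below shows the textbook dynamic-programming formulation on two short series. It is not DarkSim's implementation: the absolute-difference cost, the function name `dtw_distance`, and the sample packet counts are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with an absolute-difference step cost."""
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also matches the previous b sample
                cost[i][j - 1],      # b[j-1] also matches the previous a sample
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


# Hypothetical packets-per-interval counts for three traffic sources.
burst = [0, 1, 5, 1, 0, 0]
shifted_burst = [0, 0, 1, 5, 1, 0]
flat = [3, 3, 3, 3, 3, 3]

if __name__ == "__main__":
    print(dtw_distance(burst, shifted_burst))  # same shape, shifted in time
    print(dtw_distance(burst, flat))           # different shape
```

The time-shifted burst aligns with the original at zero cost, whereas a point-by-point comparison would report a large difference; that tolerance to shifts is the usual reason for choosing DTW over a plain Euclidean distance.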
- Of Choices and Control - A Comparative Analysis of Government Hosting. Rashna Kumar, Esteban Carisimo, Lukas De Angelis Rivas, and 4 more authors. In ACM IMC, Nov 2024.
We present the first large-scale analysis of the adoption of third-party serving infrastructures in government digital services. Drawing from data collected across 61 countries spanning every continent and region, capturing over 82% of the world’s Internet population, we examine the preferred hosting models for public-facing government sites and associated resources. Leveraging this dataset, we analyze government hosting strategies, cross-border dependencies, and the level of centralization in government web services. Among other findings, we show that governments predominantly rely on third-party infrastructure for data delivery, although this varies significantly, with even neighboring countries showing contrasting patterns. Despite a preference for third-party hosting solutions, most government URLs in our study are served from domestic servers, although again with significant regional variation. Looking at servers located overseas, while the majority are found in North America and Western Europe, we note some interesting bilateral relationships (e.g., with 79% of Mexico’s government URLs being served from the US, and 26% of China’s government URLs from Japan). This research contributes to understanding the evolving landscape of serving infrastructures in the government sector, and the choices governments make between leveraging third-party solutions and maintaining control over users’ access to their services and information.
- Ten years of the Venezuelan crisis - An Internet perspective. Esteban Carisimo, Rashna Kumar, Caleb J. Wang, and 2 more authors. In ACM SIGCOMM, Aug 2024.
The Venezuelan crisis, unfolding over the past decade, has garnered international attention due to its impact on various sectors of civil society. While studies have extensively covered the crisis’s effects on public health, energy, and water management, this paper delves into a previously unexplored area: the impact on Venezuela’s Internet infrastructure. Amidst Venezuela’s multifaceted challenges, understanding the repercussions on this critical aspect of modern society becomes imperative for the country’s recovery. Leveraging measurements from various sources, we present a comprehensive view of the changes undergone by the Venezuelan network in the past decade. Our study reveals the significant impact of the crisis captured by different signals, including bandwidth stagnation, limited growth of network infrastructure, and high latency compared to the Latin American average. Beyond offering a new perspective on the Venezuelan crisis, our study can help inform attempts at devising strategies for its recovery.
- Network topology facilitates internet traffic control in autocracies. Eda Keremoğlu, Nils B Weidmann, Alexander Gamero-Garrido, and 3 more authors. PNAS Nexus, Feb 2024.
Recent years have seen an increase in governmental interference in digital communication. Most research on this topic has focused on the application level, studying how content is manipulated or removed on websites, blogs or social media. However, in order for governments to obtain and maintain control of digital data flows, they need to secure access to the network infrastructure at the level of Internet service providers. In this paper, we study how the network topology of the Internet varies across different political environments, distinguishing between control at the level of individual Internet users (access) and at a higher level in the hierarchy of network carriers (transit). Using a novel method to estimate the structure of the Internet from network measurements, we show that in autocratic countries, state-owned (rather than privately-owned) providers have a markedly higher degree of control over transit networks. We also show that state-owned Internet providers often provide Internet access abroad, with a clear focus on other autocratic countries. Together, these results suggest that in autocracies, the network infrastructure is organized in a way that is more susceptible to the monitoring and manipulation of Internet data flows by state-owned providers both domestically and abroad.
- A hop away from everywhere: A view of the intercontinental long-haul infrastructure. Esteban Carisimo, Caleb J. Wang, Mia Weaver, and 2 more authors. In Proc. ACM Meas. Anal. Comput. Syst., Feb 2024.
We present a longitudinal study of intercontinental long-haul links (LHLs) – links with latencies significantly higher than that of all other links in a traceroute path. Our study is motivated by the recognition of these LHLs as a network-layer manifestation of critical transoceanic undersea cables. We present a methodology and associated processing system for identifying long-haul links in traceroute measurements. We apply this system to a large corpus of traceroute data and report on multiple aspects of long-haul connectivity including country-level prevalence, routers as international gateways, preferred long-haul destinations, and the evolution of these characteristics over a 7-year period. We identify 85,620 layer-3 links (out of 2.7M links in a large traceroute dataset) that satisfy our definition of intercontinental long haul, with many of them terminating in a relatively small number of nodes. An analysis of connected components shows a clearly dominant component with a relative size that remains stable despite a significant growth of the long-haul infrastructure.
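The definition of a long-haul link as one whose latency stands far above every other link in the path can be illustrated with a small sketch. This is not the paper's methodology: estimating link latency from the RTT difference between consecutive hops, and the `factor` and `min_ms` thresholds, are assumptions chosen only for illustration, and the addresses come from documentation ranges.

```python
def find_long_haul_links(hops, factor=3.0, min_ms=50.0):
    """Flag links whose latency dominates all other links in one traceroute path.

    hops: list of (address, rtt_ms) tuples in path order.
    factor, min_ms: hypothetical thresholds, not values from the paper.
    """
    # Estimate each link's latency as the RTT increase between consecutive hops.
    links = []
    for (src, rtt_src), (dst, rtt_dst) in zip(hops, hops[1:]):
        links.append((src, dst, max(rtt_dst - rtt_src, 0.0)))

    flagged = []
    for index, (src, dst, latency) in enumerate(links):
        others = [lat for j, (_, _, lat) in enumerate(links) if j != index]
        largest_other = max(others, default=0.0)
        # Require both a large absolute latency and a clear gap to every other link.
        if latency >= min_ms and latency > factor * largest_other:
            flagged.append((src, dst, latency))
    return flagged


# Hypothetical traceroute with one large latency jump in the middle.
path = [
    ("192.0.2.1", 1.0),
    ("192.0.2.2", 3.0),
    ("198.51.100.1", 95.0),
    ("198.51.100.2", 97.0),
    ("203.0.113.9", 101.0),
]

if __name__ == "__main__":
    print(find_long_haul_links(path))
```

Real traceroute data is noisier than this example (asymmetric return paths, unresponsive hops), so hop-to-hop RTT differences are only a rough proxy for link latency.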
2023
- Destination Unreachable: Characterizing Internet Outages and Shutdowns. Zachary Bischof, Kennedy Pitcher, Esteban Carisimo, and 6 more authors. In ACM SIGCOMM, Feb 2023.
In this paper, we provide the first comprehensive longitudinal analysis of government-ordered Internet shutdowns and spontaneous outages (i.e., disruptions not ordered by the government). We describe the available tools, data sources and methods to identify and analyze Internet shutdowns. We then merge manually curated datasets on known government-ordered shutdowns and large-scale Internet outages, further augmenting them with data on real-world events, macroeconomic and sociopolitical indicators, and network operator statistics. Our analysis confirms previous findings on the economic and political profiles of countries with government-ordered shutdowns. Extending this analysis, we find that countries with national-scale spontaneous outages often have profiles similar to countries with shutdowns, differing from countries that experience neither. However, we find that government-ordered shutdowns are many times more likely to occur on days of mobilization, coinciding with elections, protests, and coups. Our study also characterizes the temporal properties of Internet shutdowns and finds that they differ significantly in terms of duration, recurrence interval, and start times when compared to spontaneous outages.
- A hop away from everywhere: A view of the intercontinental long-haul infrastructure. Esteban Carisimo, Mia Weaver, Paul Barford, and 1 more author. In arXiv, Feb 2023.
Over the past two decades, a desire to reduce transit cost, improve control over routing and performance, and enhance the quality of experience for users has yielded a more densely connected, flat network with fewer hops between sources and destinations. The shortening of paths in terms of the number of hops or links has also meant, for what is in the end an infrastructure-bound network, the lengthening of many of these links. In this paper, we focus on an important aspect of the evolving logical connectivity of the Internet that has received little attention to date: intercontinental long-haul links. We develop a methodology and associated processing system for identifying long-haul links in traceroute measurements. We apply this system to a large corpus of traceroute data and report on multiple aspects of long-haul connectivity including country-level prevalence, routers as international gateways, preferred long-haul destinations, and the evolution of these characteristics over a 7-year period. We identify over 9K layer-3 links that satisfy our definition of intercontinental long haul, with many of them terminating in a relatively small number of nodes. An analysis of connected components shows a clearly dominant one with a relative size that remains stable despite a significant growth of the long-haul infrastructure.
- as2org+: Enriching AS-to-Organization Mappings with PeeringDB. Augusto Arturi, Esteban Carisimo, and Fabián E. Bustamante. In Passive and Active Measurement, Feb 2023.
An organization-level topology of the Internet is a valuable resource with uses that range from the study of organizations’ footprints and Internet centralization trends, to analysis of the dynamics of the Internet’s corporate structures as a result of (de)mergers and acquisitions. Current approaches to infer this topology rely exclusively on WHOIS databases and are thus impacted by their limitations, including errors and outdated data. We argue that a collaborative, operator-oriented database such as PeeringDB can bring a perspective complementary to the legally-bounded information available in WHOIS records. We present as2org+, a new framework that leverages self-reported information available on PeeringDB to boost the state-of-the-art WHOIS-based methodologies. We discuss the challenges and opportunities of using PeeringDB records for AS-to-organization mappings, present the design of as2org+, and demonstrate its value by identifying companies operating in multiple continents and mergers and acquisitions over a five-year period.
2022
- Jitterbug: A new framework for jitter-based congestion inference. Esteban Carisimo, Ricky K. P. Mok, David D. Clark, and 1 more author. In Passive and Active Measurement, Feb 2022.
We investigate a novel approach to the use of jitter to infer network congestion using data collected by probes in access networks. We discovered a set of features in the jitter and jitter dispersion time series (the latter a jitter-derived time series we define in this paper) that are characteristic of periods of congestion. We leverage these concepts to create a jitter-based congestion inference framework that we call Jitterbug. We apply Jitterbug’s capabilities to a wide range of traffic scenarios and discover that Jitterbug can correctly identify both recurrent and one-off congestion events. We validate Jitterbug inferences against state-of-the-art autocorrelation-based inferences of recurrent congestion. We find that the two approaches have strong congruity in their inferences, but Jitterbug holds promise for detecting one-off as well as recurrent congestion. We identify several future directions for this research including leveraging ML/AI techniques to optimize performance and accuracy of this approach in operational settings.
- Quantifying Nations’ Exposure to Traffic Observation and Selective Tampering. Alexander Gamero-Garrido, Esteban Carisimo, Shuai Hao, and 3 more authors. In Passive and Active Measurement, Feb 2022.
Almost all popular Internet services are hosted in a select set of countries, forcing other nations to rely on international connectivity to access them. We identify nations where traffic towards a large portion of the country is serviced by a small number of Autonomous Systems, and, therefore, may be exposed to observation or selective tampering by these ASes. We introduce the Country-level Transit Influence (CTI) metric to quantify the significance of a given AS on the international transit service of a particular country. By studying the CTI values for the top ASes in each country, we find that 34 nations have transit ecosystems that render them particularly exposed, where a single AS is privy to traffic destined to over 40% of their IP addresses. In the nations where we are able to validate our findings with in-country operators, our top-five ASes are 90% accurate on average. In the countries we examine, CTI reveals two classes of networks that frequently play a particularly prominent role: submarine cable operators and state-owned ASes.
2021
- Identifying ASes of State-Owned Internet Operators. Esteban Carisimo, Alexander Gamero-Garrido, Alex C. Snoeren, and 1 more author. In ACM Internet Measurement Conference (IMC), Feb 2021.
In this paper we present and apply a methodology to accurately identify state-owned Internet operators worldwide and their Autonomous System Numbers (ASNs). Obtaining an accurate dataset of ASNs of state-owned Internet operators enables studies where state ownership is an important dimension, including research related to Internet censorship and surveillance, cyber-warfare and international relations, ICT development and digital divide, critical infrastructure protection, and public policy. Our approach is based on a multi-stage, in-depth manual analysis of datasets that are highly diverse in nature. We find that each of these datasets contributes in different ways to the classification process and we identify limitations and shortcomings of these data sources. We obtain the first dataset of this type, make it available to the research community together with the several lessons we learned in the process, and perform a preliminary analysis based on our data. We find that 53% (i.e., 123) of the world’s countries are majority owners of Internet operators, highlighting that this is a widespread phenomenon. We also find and document the existence of subsidiaries of state-owned operators active in foreign countries, an aspect that touches every continent and particularly affects Africa. We hope that this work and the associated dataset will inspire and enable a broad set of Internet measurement studies and interdisciplinary research.
2020
- A First Look at the Latin American IXPs. Esteban Carisimo, Julián M. Del Fiore, Diego Dujovne, and 2 more authors. SIGCOMM Comput. Commun. Rev., Mar 2020.
We investigated Internet eXchange Points (IXPs) deployed across Latin America. We discovered that many Latin American states have been actively involved in the development of their IXPs. We further found a correlation between the success of a national IXP and the absence of local monopolistic ASes that concentrate the country’s IPv4 address space. In particular, three IXPs have been able to gain local traction: IX.br-SP, CABASE-BUE and PIT Chile-SCL. We further compared these larger IXPs with others outside Latin America. We found that, in developing regions, IXPs have grown at a similar pace in recent years and are mainly populated by regional ASes. The latter point clearly contrasts with more internationally renowned European IXPs, whose members span multiple regions.
2019
- Studying the evolution of content providers in IPv4 and IPv6 internet cores. Esteban Carisimo, Carlos Selmo, J. Ignacio Alvarez-Hamelin, and 1 more author. Computer Communications, Mar 2019.
There is recent evidence that the core of the Internet, which was formerly dominated by large transit providers, has been reshaped after the transition to a multimedia-oriented network, first by general-purpose CDNs and now by private CDNs. In this work we use k-cores, an element of graph theory, to define which ASes compose the core of the Internet and to track the evolution of the core since 1999. Specifically, we investigate whether large players in the Internet content and CDN ecosystem belong to the core and, if so, since when. In addition, we examine differences between the IPv4 and IPv6 cores. We further investigate regional differences in the evolution of large content providers. Finally, we show that the core of the Internet has incorporated an increasing number of content ASes in recent years. To enable reproducibility of this work, we provide a website to allow interactive analysis of our datasets to detect, for example, “up and coming” ASes using customized queries.
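As a reminder of what a k-core is, the sketch below computes core numbers with the standard peeling procedure: repeatedly remove a node of minimum remaining degree. The toy graph and its labels (ASNs reserved for documentation) are hypothetical, and the code illustrates the graph-theoretic definition only, not the paper's pipeline.

```python
def core_numbers(adjacency):
    """Return {node: core number} for an undirected graph given as adjacency sets."""
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel a node with the smallest remaining degree (simple O(n^2) selection).
        node = min(remaining, key=lambda n: degree[n])
        # The core level never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbor in adjacency[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


# Hypothetical AS-level graph: four fully meshed ASes, one AS attached to two
# of them, and one stub AS with a single link.
graph = {
    "AS64500": {"AS64501", "AS64502", "AS64503", "AS64504"},
    "AS64501": {"AS64500", "AS64502", "AS64503", "AS64504"},
    "AS64502": {"AS64500", "AS64501", "AS64503"},
    "AS64503": {"AS64500", "AS64501", "AS64502"},
    "AS64504": {"AS64500", "AS64501", "AS64505"},
    "AS64505": {"AS64504"},
}

cores = core_numbers(graph)
top_core = {asn for asn, k in cores.items() if k == max(cores.values())}

if __name__ == "__main__":
    print(cores)
    print(sorted(top_core))
```

Under this definition, the "core" is the set of nodes with the highest core number; in the toy graph that is the four fully meshed ASes.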
2016
- Hidden Internet Topologies Info: Truth or Myth? Sofía Silva Berenguer, Esteban Carisimo, J. Ignacio Alvarez-Hamelin, and 1 more author. In Proceedings of the 2016 Workshop on Fostering Latin-American Research in Data Communication Networks, Mar 2016.
Mapping projects usually obtain information from several routing data collectors or vantage points. The accuracy of the resulting maps depends on the number and location of these collectors, which are usually located near the backbone or in large developed regions, such as those served by ARIN or RIPE NCC. Because vantage points are scarce in Latin America, these maps do not accurately reflect the current state of the network in this region. For this reason, in this work we have added data from some local sources and measured how much information was missing without them.