2  Accessing and Managing VN Financial Data

This chapter provides a guide to organizing, accessing, and managing financial data specifically tailored for the Vietnamese market. While global financial databases such as CRSP and Compustat serve as standard resources for developed markets, emerging markets like Vietnam require a different approach due to unique data sources, market structures, and regulatory environments. Understanding these nuances is essential for conducting rigorous empirical research on Vietnamese equities, bonds, and macroeconomic indicators.

Vietnam’s financial market has experienced remarkable growth since the establishment of the Ho Chi Minh City Stock Exchange (HOSE) in 2000 and the Hanoi Stock Exchange (HNX) in 2005. Today, the market comprises over 1,600 listed companies across three trading venues: HOSE for large-cap stocks, HNX for mid-cap stocks, and UPCoM (Unlisted Public Company Market) for smaller companies transitioning to formal listing. This diversity creates both opportunities and challenges for financial researchers seeking comprehensive coverage of the Vietnamese equity universe.

The Vietnamese market presents several distinctive characteristics that researchers must account for. Foreign ownership limits (typically 49% for most sectors, with exceptions for banking and certain strategic industries), trading band restrictions (currently \(\pm\)7% for HOSE and \(\pm\)10% for HNX), and the T+2 settlement cycle all influence market microstructure and return dynamics. Additionally, the market operates in Vietnamese Dong (VND), requiring careful attention to currency effects when comparing results with international studies.

We begin by loading the essential Python packages that facilitate data acquisition and management throughout this chapter.

import pandas as pd
import numpy as np
import requests
from datetime import datetime, timedelta
import json
import sqlite3

We also define the date range for our data collection, which spans from the early days of the Vietnamese stock market to the present. This extended timeframe allows us to capture the market’s evolution through various economic cycles, including the 2008 global financial crisis, the 2011-2012 domestic banking crisis, and the COVID-19 pandemic period.

start_date = "2000-07-28"  # First trading session on HOSE
end_date = "2024-12-31"
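To make the trading band restrictions described earlier concrete, the following is a minimal sketch of how limit-hit days might be flagged in a daily price panel. The column names (`symbol`, `exchange`, `close`, `ref_price`), the helper `flag_limit_hits`, the tolerance, and the example prices are all illustrative assumptions. The band values are the ones quoted above and should be checked against the rules in force for the sample period.

```python
import pandas as pd

# Daily price bands quoted in the text; bands have changed over time, so
# treat these values as assumptions for the period you study.
PRICE_BANDS = {"HOSE": 0.07, "HNX": 0.10}


def flag_limit_hits(prices: pd.DataFrame, tolerance: float = 0.005) -> pd.DataFrame:
    """Flag days on which the close sits at the ceiling or floor price.

    `prices` is assumed to hold the columns symbol, exchange, close, ref_price,
    where ref_price is the exchange reference price for that day.
    """
    out = prices.copy()
    band = out["exchange"].map(PRICE_BANDS)
    change = out["close"] / out["ref_price"] - 1
    # Tick-size rounding can keep limit prices slightly inside the band,
    # hence the small tolerance.
    out["hit_ceiling"] = change >= band - tolerance
    out["hit_floor"] = change <= -(band - tolerance)
    return out


# Made-up prices for illustration only
example = pd.DataFrame({
    "symbol": ["AAA", "AAA", "BBB"],
    "exchange": ["HOSE", "HOSE", "HNX"],
    "ref_price": [10000.0, 20000.0, 10000.0],
    "close": [10700.0, 20200.0, 9000.0],
})
flagged = flag_limit_hits(example)
```

Returns on flagged days are censored by the band, which is one reason daily return distributions in Vietnam can differ from those in markets without price limits.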

2.1 Overview of Vietnamese Financial Data Sources

Before diving into the technical implementation, it is valuable to understand the landscape of financial data providers serving the Vietnamese market. Unlike developed markets where a few dominant providers (Bloomberg, Refinitiv, FactSet) offer comprehensive coverage, Vietnamese financial data has historically been fragmented across multiple sources, each with distinct strengths and limitations.

The primary sources of Vietnamese financial data include official exchange feeds from HOSE and HNX, which provide real-time and historical trading data. The State Securities Commission of Vietnam (SSC) publishes regulatory filings, corporate announcements, and market statistics. Commercial data vendors such as FiinGroup, StoxPlus (now part of FiinGroup), and VNDirect offer curated datasets with varying levels of coverage and data quality. Additionally, the State Bank of Vietnam (SBV) and the General Statistics Office (GSO) provide macroeconomic indicators essential for asset pricing research.

For academic researchers, this fragmentation traditionally involved difficult trade-offs between cost, coverage, data quality, and ease of access. Commercial providers like FiinGroup offer clean, standardized data but require subscription fees that may be prohibitive for individual researchers and smaller institutions. Open-source alternatives provide free access but often require substantial data cleaning and validation efforts. Manually collecting data from government websites is time-consuming and prone to inconsistencies.

Fortunately, this landscape has improved significantly with the emergence of Datacore as a unified data platform for Vietnamese financial markets. In our experience working with Vietnamese financial data across multiple research projects, Datacore has proven to be the most practical solution for academic research. The platform consolidates data from multiple sources, including stock prices, corporate fundamentals, market indices, macroeconomic indicators, and alternative data, into a single, accessible interface with a well-documented API.

What distinguishes Datacore from traditional commercial providers like FiinGroup extends beyond mere data aggregation. While FiinGroup has long been the institutional incumbent, several factors make Datacore particularly attractive for rigorous empirical research:

  1. API-First Architecture: Datacore was built from the ground up for programmatic access, making it seamlessly integrable with Python, R, and other research workflows. FiinGroup’s data access, by contrast, often requires manual downloads or cumbersome Excel-based interfaces that impede reproducibility.

  2. Cost Efficiency: Academic researchers frequently operate under budget constraints. Datacore offers competitive pricing structures that make comprehensive market coverage accessible without the substantial subscription fees associated with legacy providers.

  3. Corporate Action Handling: One persistent challenge with Vietnamese data is accurate adjustment for stock splits, bonus shares, and rights issues. Datacore implements transparent adjustment methodologies with clear documentation, whereas legacy providers often apply adjustments inconsistently or without adequate explanation.

  4. Update Frequency: Datacore maintains near real-time data updates with clear timestamps, enabling event study research and timely portfolio rebalancing. Traditional providers often suffer from publication lags that can compromise research requiring current data.

  5. Coverage Breadth: Beyond standard price and fundamental data, Datacore integrates alternative data and macroeconomic indicators into a unified schema. This eliminates the need to merge datasets from multiple sources, a process that introduces potential errors and consumes valuable research time.

Throughout this chapter, we leverage Datacore as our primary data source. By centralizing our data acquisition through a single platform, we benefit from consistent data formats, reliable corporate action adjustments, and comprehensive market coverage spanning HOSE, HNX, and UPCoM. The code examples that follow demonstrate how straightforward Vietnamese financial research becomes when data access friction is minimized.

The following table summarizes the key data sources for Vietnamese financial research:

Table 2.1: Vietnamese Financial Data Sources
| Data Source | Coverage | Access Type | Key Strengths | Limitations |
|---|---|---|---|---|
| Datacore | Prices, fundamentals, indices, macro, derivatives | API | Unified platform, programmatic access, comprehensive coverage, transparent methodology | Newer platform |
| FiinGroup | Full market coverage | Commercial | Established reputation, institutional adoption | High cost, manual access, limited API |
| HOSE/HNX websites | Official exchange data | Free (manual) | Authoritative, real-time | No API, manual collection required |
| GSO (gso.gov.vn) | Macroeconomic indicators | Free (manual) | Official government statistics | Infrequent updates, no API |
| SBV (sbv.gov.vn) | Monetary policy, rates | Free (manual) | Central bank data | Manual download only |
| CafeF/VnExpress | News, announcements | Free | Market sentiment, events | Unstructured, requires NLP processing |

2.2 Stock Market Data

The stock listing DataFrame, stored later in this chapter as common_stocks, contains essential security identifiers: the ticker symbol, the company name in both Vietnamese and English, the exchange listing, the industry classification according to the Vietnam Standard Industrial Classification (VSIC), and flags indicating special status such as foreign ownership restrictions or trading suspensions.

2.2.1 Historical Price Data

2.2.2 Fundamental Data and Financial Statements

Beyond price data, fundamental analysis requires access to corporate financial statements including balance sheets, income statements, and cash flow statements. Vietnamese publicly listed companies are required to publish quarterly and annual financial reports according to Vietnamese Accounting Standards (VAS), which differ in certain respects from International Financial Reporting Standards (IFRS). Understanding these differences is important when comparing Vietnamese firms with international peers or applying models developed using US or European data.

Key differences between VAS and IFRS that affect financial analysis include:

  1. Revenue recognition: VAS allows more flexibility in timing of revenue recognition compared to IFRS 15
  2. Financial instruments: VAS has less comprehensive guidance on fair value measurement
  3. Lease accounting: VAS does not require operating lease capitalization as under IFRS 16
  4. Goodwill: VAS requires amortization while IFRS requires impairment testing only

2.2.3 Corporate Actions and Events

Accurate treatment of corporate actions is essential for computing correct returns and maintaining data integrity. Vietnamese companies frequently engage in corporate actions including cash dividends, stock dividends (bonus shares), rights issues, and stock splits.
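To illustrate why these events matter for returns, here is a minimal sketch of backward price adjustment. It assumes the common theoretical ex-price convention, (cum-price minus cash dividend plus rights ratio times subscription price) divided by (1 plus stock ratio plus rights ratio). The function names, column names, and the example event are hypothetical, and a data provider's actual methodology may differ.

```python
import pandas as pd


def theoretical_ex_price(cum_price, cash_div=0.0, stock_ratio=0.0,
                         rights_ratio=0.0, rights_price=0.0):
    """Theoretical ex-date price per share after a corporate action.

    stock_ratio and rights_ratio are new shares per existing share.
    """
    return (cum_price - cash_div + rights_ratio * rights_price) / (
        1 + stock_ratio + rights_ratio
    )


def adjust_prices(prices: pd.DataFrame, actions: pd.DataFrame) -> pd.DataFrame:
    """Back-adjust closes so returns are continuous across ex-dates.

    prices: columns date, close (a single symbol).
    actions: columns ex_date, cash_div, stock_ratio, rights_ratio, rights_price.
    """
    out = prices.sort_values("date").reset_index(drop=True)
    out["adj_factor"] = 1.0
    for action in actions.itertuples(index=False):
        before = out["date"] < action.ex_date
        if not before.any():
            continue
        cum_price = out.loc[before, "close"].iloc[-1]  # last close before ex-date
        ex_price = theoretical_ex_price(
            cum_price, action.cash_div, action.stock_ratio,
            action.rights_ratio, action.rights_price,
        )
        # Scale every price before the ex-date by the same factor
        out.loc[before, "adj_factor"] *= ex_price / cum_price
    out["adj_close"] = out["close"] * out["adj_factor"]
    return out


prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "close": [30000.0, 30000.0, 25000.0],
})
# Hypothetical 20% stock dividend (1 new share per 5 held), ex-date 4 January
actions = pd.DataFrame({
    "ex_date": pd.to_datetime(["2024-01-04"]),
    "cash_div": [0.0], "stock_ratio": [0.2],
    "rights_ratio": [0.0], "rights_price": [0.0],
})
adjusted = adjust_prices(prices, actions)
```

In this example the raw close falls from 30,000 to 25,000 on the ex-date, which looks like a loss of about 17%, while the adjusted series is flat because the drop only reflects the larger share count.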

2.3 Market Indices and Benchmarks

Constructing appropriate benchmarks is fundamental to performance evaluation and factor model estimation. The Vietnamese market features several indices that serve different purposes in financial research.

Table 2.2: Vietnamese Market Indices
| Index | Exchange | Description | Use Case |
|---|---|---|---|
| VN-Index | HOSE | All HOSE-listed stocks | Broad market benchmark |
| VN30-Index | HOSE | 30 largest, most liquid | Investable benchmark |
| HNX-Index | HNX | All HNX-listed stocks | Mid-cap benchmark |
| HNX30-Index | HNX | 30 largest HNX stocks | HNX large-cap |
| VNAllShare | Combined | HOSE + HNX | Total market |
| VN100 | Combined | Top 100 stocks | Large/mid-cap |

The VN-Index, which tracks all stocks listed on HOSE, is the most widely followed benchmark and serves as the primary gauge of overall market performance. The HNX-Index covers stocks on the Hanoi exchange, while the VN30-Index tracks the thirty largest and most liquid stocks on HOSE.

For asset pricing research, the VN30-Index is particularly valuable as it represents the investable universe for institutional investors and serves as the underlying for Vietnam’s most liquid derivatives contracts. The constituent stocks are reviewed semi-annually based on market capitalization, liquidity, and free-float requirements.

# Retrieve VN-Index historical data

2.3.1 Index Constituent Data

For factor model construction and portfolio analysis, access to index constituent lists and their weights is essential. While official constituent data requires subscription to exchange data feeds, we can approximate index membership using market capitalization and liquidity filters.
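The approximation described above might look like the following sketch. The function name, the column names (`market_cap`, `avg_daily_turnover`), and the thresholds are assumptions chosen for illustration. Official index rules also involve free-float and capping adjustments that are not reproduced here.

```python
import pandas as pd


def approximate_constituents(snapshot: pd.DataFrame, exchange: str = "HOSE",
                             n: int = 30, min_turnover: float = 0.0) -> pd.DataFrame:
    """Pick the n largest stocks on an exchange that pass a liquidity screen.

    snapshot: one row per stock with columns symbol, exchange, market_cap,
    avg_daily_turnover (all names are assumptions for this sketch).
    """
    eligible = snapshot[
        (snapshot["exchange"] == exchange)
        & (snapshot["avg_daily_turnover"] >= min_turnover)
    ]
    top = eligible.nlargest(n, "market_cap").copy()
    # Plain capitalization weights, without free-float or cap adjustments
    top["weight"] = top["market_cap"] / top["market_cap"].sum()
    return top[["symbol", "market_cap", "weight"]].reset_index(drop=True)


# Made-up snapshot for illustration only
snapshot = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC", "DDD"],
    "exchange": ["HOSE", "HOSE", "HOSE", "HNX"],
    "market_cap": [500.0, 300.0, 200.0, 900.0],
    "avg_daily_turnover": [50.0, 1.0, 20.0, 80.0],
})
members = approximate_constituents(snapshot, n=2, min_turnover=10.0)
```

Here BBB is excluded by the liquidity screen despite its size, and DDD is excluded because it trades on a different exchange. Applying the filter at each review date gives a time series of approximate membership.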

2.4 Macroeconomic Data from Vietnamese Sources

Asset pricing models often incorporate macroeconomic variables as predictors of expected returns or as state variables in conditional models. For the Vietnamese market, relevant macroeconomic data comes primarily from two sources: the General Statistics Office (GSO) and the State Bank of Vietnam (SBV).

2.4.1 Key Macroeconomic Indicators

The following macroeconomic variables are particularly relevant for Vietnamese financial research:

  1. Consumer Price Index (CPI): Essential for computing real returns and inflation-adjusted valuations. Vietnam experienced periods of high inflation, particularly during 2008 and 2011, when year-over-year CPI inflation peaked above 20%.

  2. Industrial Production Index (IPI): Proxy for economic activity and business cycle conditions.

  3. Money Supply (M2): Indicator of monetary policy stance and liquidity conditions.

  4. Credit Growth: Bank lending growth, a key driver of economic activity in Vietnam’s bank-dominated financial system.

  5. USD/VND Exchange Rate: Critical for international investors and companies with foreign currency exposure.

  6. Foreign Direct Investment (FDI): Indicator of international capital flows and economic confidence.

  7. Trade Balance: Export and import dynamics affecting corporate earnings.
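As a worked example of the first item, real returns can be obtained from nominal returns and a CPI index through the Fisher relation. The numbers and column names below are hypothetical placeholders, not actual Vietnamese data.

```python
import pandas as pd

# Hypothetical monthly data: nominal returns (decimal) and a CPI index level
data = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-31"]),
    "ret_nominal": [0.020, -0.010, 0.030],
    "cpi_index": [100.0, 101.0, 101.5],
})

# Month-over-month inflation from the index level (first month is undefined)
data["inflation"] = data["cpi_index"].pct_change()

# Exact Fisher relation: (1 + real) = (1 + nominal) / (1 + inflation)
data["ret_real"] = (1 + data["ret_nominal"]) / (1 + data["inflation"]) - 1
```

The simpler approximation, nominal return minus inflation, is close when both numbers are small but drifts in high-inflation months, so the exact form is the safer default for samples that include 2008 or 2011.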

Unfortunately, unlike the US Federal Reserve’s FRED database, Vietnamese macroeconomic data is not available through standardized APIs. Researchers must typically download data manually from GSO and SBV websites or use web scraping techniques.

# Structure for Vietnamese macroeconomic data
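One possible structure is a tidy monthly table with one row per month and one column per indicator. The sketch below assumes the manually downloaded figures have been saved as a CSV with the columns shown. The file layout, the column names, and the values are placeholders for illustration, not actual GSO or SBV figures.

```python
import io
import pandas as pd

# Stand-in for a manually downloaded file; names and values are placeholders.
raw_csv = io.StringIO(
    "month,cpi_yoy,credit_growth,refinancing_rate\n"
    "2024-01,3.0,12.0,4.5\n"
    "2024-02,,12.5,4.5\n"
    "2024-03,3.2,13.0,4.5\n"
)

macro_monthly = pd.read_csv(raw_csv)

# Use month-end dates so the table aligns with monthly return data
macro_monthly["date"] = (
    pd.to_datetime(macro_monthly["month"], format="%Y-%m") + pd.offsets.MonthEnd(0)
)
macro_monthly = (
    macro_monthly.drop(columns="month")
    .sort_values("date")
    .loc[:, ["date", "cpi_yoy", "credit_growth", "refinancing_rate"]]
    .reset_index(drop=True)
)

# Report gaps instead of silently filling them
missing_counts = macro_monthly.isna().sum()
```

Whatever date convention is chosen here has to match the one used for the return and factor tables, since later joins compare the date columns directly.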

2.4.2 Risk-Free Rate Approximation

Determining an appropriate risk-free rate for Vietnam presents challenges not encountered in developed markets. Unlike the US Treasury market, Vietnam’s government bond market is relatively illiquid with limited secondary trading. Several alternatives exist:

  1. SBV Refinancing Rate: The policy rate set by the State Bank of Vietnam. Not directly investable but reflects monetary policy stance.

  2. Government Bond Yields: One-year or longer-term government bond yields from auction results. More investable but less liquid than US Treasuries.

  3. Interbank Rates: Overnight or term interbank lending rates. Reflect short-term funding costs but include credit risk.

  4. Adjusted US Rate: US Treasury rate plus expected VND depreciation, following uncovered interest rate parity.

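def calculate_risk_free_rate(macro_data, method="refinancing"):
    """
    Calculate risk-free rate proxy for Vietnamese market.

    Parameters
    ----------
    macro_data : pd.DataFrame
        DataFrame with a 'date' column and annualized rates in percent
    method : str
        Method for risk-free rate: 'refinancing', 'bond', or 'adjusted_us'

    Returns
    -------
    pd.DataFrame
        DataFrame with date and monthly risk-free rate (decimal)
    """
    # Map each method to the column holding its annualized rate in percent.
    # 'bond_yield_1y' is an assumed column name; adapt it to your data.
    rate_columns = {"refinancing": "refinancing_rate", "bond": "bond_yield_1y"}

    if method in rate_columns:
        column = rate_columns[method]
        rf = macro_data[["date", column]].copy()
        # Simple (non-compounded) conversion from annual percent to monthly decimal
        rf["rf_monthly"] = rf[column] / 12 / 100
    elif method == "adjusted_us":
        # US rate + expected VND depreciation
        # Requires additional data on US rates and exchange rate expectations
        raise NotImplementedError(
            "adjusted_us requires US rates and exchange rate expectations"
        )
    else:
        raise ValueError(f"Unknown method: {method}")

    return rf[["date", "rf_monthly"]]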
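The adjusted US rate alternative listed above can be sketched separately. Under uncovered interest rate parity, the VND rate satisfies (1 + r_VND) = (1 + r_USD)(1 + expected VND depreciation). The function name and the example inputs are hypothetical, and how expected depreciation is measured is left open. This sketch also converts to monthly rates geometrically, which differs slightly from dividing the annual rate by 12.

```python
import pandas as pd


def adjusted_us_rate(us_rate_annual: pd.Series,
                     expected_depreciation_annual: pd.Series) -> pd.Series:
    """Monthly VND risk-free proxy implied by uncovered interest rate parity.

    Both inputs are annualized decimals (0.05 for 5%). Expected depreciation
    is the anticipated rise in USD/VND over the year.
    """
    annual_vnd = (1 + us_rate_annual) * (1 + expected_depreciation_annual) - 1
    # Geometric conversion from annual to monthly
    return (1 + annual_vnd) ** (1 / 12) - 1


# Hypothetical inputs: 5% US rate, 2% expected VND depreciation
rf_monthly = adjusted_us_rate(pd.Series([0.05]), pd.Series([0.02]))
```

With these inputs the implied annual VND rate is 7.1%, slightly above the 7% obtained by simply adding the two numbers.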

2.5 Setting Up a Database for Vietnamese Financial Data

Managing financial data across multiple sources and formats requires a systematic approach to data storage. We recommend using SQLite as the primary database engine for several reasons: it requires no server setup, stores the entire database in a single portable file, supports standard SQL queries, and integrates seamlessly with Python through the built-in sqlite3 module.

2.5.1 Database Schema Design

Our database schema is designed to support efficient queries for common research tasks while maintaining data integrity. We create separate tables for different data types with appropriate relationships.

import os
import sqlite3

# Create data directory if it doesn't exist
os.makedirs("data", exist_ok=True)

# Initialize SQLite database connection
tidy_finance_python = sqlite3.connect(
    "data/tidy_finance_python.sqlite"
)
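A minimal sketch of what explicit table definitions with relationships could look like follows. The column lists are assumptions based on the fields mentioned in this chapter rather than a prescribed schema, and the example uses an in-memory database so that it does not touch the project file.

```python
import sqlite3

# In-memory database so the sketch does not touch the project file
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

con.executescript("""
CREATE TABLE IF NOT EXISTS stock_master (
    symbol   TEXT PRIMARY KEY,
    name     TEXT,
    exchange TEXT,
    industry TEXT
);

CREATE TABLE IF NOT EXISTS stock_prices_daily (
    date   TEXT NOT NULL,   -- ISO 8601 dates sort correctly as text
    symbol TEXT NOT NULL REFERENCES stock_master(symbol),
    close  REAL,
    volume INTEGER,
    PRIMARY KEY (symbol, date)
);

CREATE INDEX IF NOT EXISTS idx_prices_date ON stock_prices_daily (date);
""")

# List the tables that now exist
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
con.close()
```

One design point to keep in mind: pandas' to_sql() with if_exists="replace" drops and recreates the table, which discards keys and indexes defined this way, whereas if_exists="append" writes into the existing definition.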

2.5.2 Storing Data

With the database schema established, we can store our collected data using pandas’ to_sql() method.

# Store stock listing data
common_stocks.to_sql(
    name="stock_master",
    con=tidy_finance_python,
    if_exists="replace",
    index=False
)

# Store stock price data
stock_prices.to_sql(
    name="stock_prices_daily",
    con=tidy_finance_python,
    if_exists="replace",
    index=False
)

# Store market indices
vn_index.to_sql(
    name="market_indices",
    con=tidy_finance_python,
    if_exists="replace",
    index=False
)

# Store factor returns
factors_vietnam.to_sql(
    name="factors_monthly",
    con=tidy_finance_python,
    if_exists="replace",
    index=False
)

2.6 Querying and Updating the Database

Once data is stored in the database, retrieval is straightforward using SQL queries. The pandas read_sql_query() function executes a SQL statement and returns the results as a DataFrame.

# Query stock prices for specific symbols and date range
query = """
SELECT date, symbol, close, volume
FROM stock_prices_daily
WHERE symbol IN ('VNM', 'VIC', 'FPT', 'VHM', 'VCB')
  AND date >= '2020-01-01'
ORDER BY symbol, date
"""

selected_stocks = pd.read_sql_query(
    sql=query,
    con=tidy_finance_python,
    parse_dates=["date"]
)
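When the symbols or dates vary between runs, passing them as parameters avoids editing the SQL string by hand. The sketch below uses the `params` argument of read_sql_query() with sqlite3's `?` placeholders. The helper `get_prices` and the small in-memory table with made-up prices are illustrative only.

```python
import sqlite3
import pandas as pd

# Small in-memory stand-in for the stock_prices_daily table (made-up values)
con = sqlite3.connect(":memory:")
pd.DataFrame({
    "date": ["2019-12-31", "2020-01-02", "2020-01-02"],
    "symbol": ["VNM", "VNM", "FPT"],
    "close": [100.0, 101.0, 50.0],
    "volume": [1000, 1200, 800],
}).to_sql("stock_prices_daily", con, index=False)


def get_prices(con, symbols, start_date):
    """Fetch prices for a list of symbols from start_date onward."""
    placeholders = ", ".join("?" for _ in symbols)  # one ? per symbol
    query = f"""
        SELECT date, symbol, close, volume
        FROM stock_prices_daily
        WHERE symbol IN ({placeholders})
          AND date >= ?
        ORDER BY symbol, date
    """
    return pd.read_sql_query(
        query, con, params=[*symbols, start_date], parse_dates=["date"]
    )


result = get_prices(con, ["VNM", "FPT"], "2020-01-01")
con.close()
```

Only the placeholders are interpolated into the SQL text. The values themselves are bound by the database driver, which also protects against malformed input.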

# Query factor data merged with market returns
query_factors = """
SELECT f.date, f.mkt_rf, f.smb, f.hml, f.rf,
       m.cpi_yoy, m.credit_growth
FROM factors_monthly f
LEFT JOIN macro_monthly m ON f.date = m.date
WHERE f.date >= '2015-01-01'
ORDER BY f.date
"""

factor_data = pd.read_sql_query(
    sql=query_factors,
    con=tidy_finance_python,
    parse_dates=["date"]
)
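Updating a table incrementally, rather than replacing it on every run, might be sketched as follows. The helper `append_new_rows` is hypothetical, the prices are made up, and the approach assumes dates are stored as ISO 8601 text so that string comparison matches chronological order.

```python
import sqlite3
import pandas as pd

# In-memory table with two existing rows (made-up values)
con = sqlite3.connect(":memory:")
existing = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-03"],
    "symbol": ["VNM", "VNM"],
    "close": [100.0, 101.0],
})
existing.to_sql("stock_prices_daily", con, index=False)


def append_new_rows(con, new_data: pd.DataFrame,
                    table: str = "stock_prices_daily") -> int:
    """Append only rows dated after the latest date stored for each symbol."""
    latest = pd.read_sql_query(
        f"SELECT symbol, MAX(date) AS last_date FROM {table} GROUP BY symbol", con
    )
    merged = new_data.merge(latest, on="symbol", how="left")
    # Symbols not yet stored have no last_date; an empty string sorts before any date
    is_new = merged["date"] > merged["last_date"].fillna("")
    to_add = merged.loc[is_new, new_data.columns]
    to_add.to_sql(table, con, if_exists="append", index=False)
    return len(to_add)


# A fresh download that overlaps with stored data and adds a new symbol
download = pd.DataFrame({
    "date": ["2024-01-03", "2024-01-04", "2024-01-04"],
    "symbol": ["VNM", "VNM", "FPT"],
    "close": [101.0, 102.0, 50.0],
})
n_added = append_new_rows(con, download)
n_total = con.execute("SELECT COUNT(*) FROM stock_prices_daily").fetchone()[0]
con.close()
```

This rule never rewrites history, so it will miss retroactive corrections such as restated prices after a corporate action. A periodic full refresh is a reasonable complement.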

2.6.1 Database Maintenance

Regular database maintenance ensures optimal performance and data integrity.

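# Commit pending changes first: VACUUM cannot run inside an open transaction
tidy_finance_python.commit()

# Optimize database
tidy_finance_python.execute("VACUUM")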

# Check database integrity
integrity_check = pd.read_sql_query(
    "PRAGMA integrity_check",
    tidy_finance_python
)
print(f"Integrity check: {integrity_check.iloc[0, 0]}")

# Get database statistics (row counts for the main tables)
table_stats = pd.read_sql_query("""
    SELECT (SELECT COUNT(*) FROM stock_prices_daily) AS price_rows,
           (SELECT COUNT(*) FROM stock_master) AS stock_count,
           (SELECT COUNT(*) FROM factors_monthly) AS factor_months
""", tidy_finance_python)

print(table_stats)

# Close connection when done
tidy_finance_python.close()

2.7 Alternative Data Sources for Vietnamese Markets

Beyond traditional price and fundamental data, researchers increasingly incorporate alternative data sources to gain unique insights into market dynamics.

2.7.1 Foreign Investor Flow Data

Foreign investor flow data is particularly valuable given the significant role of foreign capital in Vietnamese equity markets. The State Securities Commission publishes daily foreign ownership statistics by security.
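As a minimal sketch of how such statistics might be turned into research variables, the example below derives net foreign buying, a size-scaled version, and a cumulative series. The column names and values are hypothetical and do not reflect the layout of any particular published file.

```python
import pandas as pd

# Hypothetical daily foreign trading values (for example, in VND billions)
flows = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "symbol": ["VNM"] * 3,
    "foreign_buy_value": [12.0, 5.0, 8.0],
    "foreign_sell_value": [7.0, 9.0, 8.0],
    "total_value": [100.0, 80.0, 50.0],
})

# Net buying by foreign investors
flows["net_foreign_value"] = flows["foreign_buy_value"] - flows["foreign_sell_value"]
# Scale by total trading value so stocks of different size are comparable
flows["net_foreign_ratio"] = flows["net_foreign_value"] / flows["total_value"]
# Cumulative net buying per stock
flows["cum_net_foreign"] = flows.groupby("symbol")["net_foreign_value"].cumsum()
```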

2.7.2 News and Sentiment Data

Media sentiment from Vietnamese financial news sources offers another research avenue. Major outlets such as CafeF, VnExpress Finance, and Vietstock publish real-time news that can be analyzed for market sentiment.

2.8 Key Takeaways

  1. Market Structure Understanding: The Vietnamese financial market operates across three exchanges (HOSE, HNX, UPCoM) with distinct characteristics including foreign ownership limits, trading band restrictions, and a T+2 settlement cycle. Researchers must account for these institutional features in empirical analysis.

  2. Macroeconomic Data Challenges: Unlike developed markets with standardized APIs (e.g., FRED), Vietnamese macroeconomic data requires manual collection from government sources (GSO, SBV). Researchers should plan for this additional data gathering effort and implement systematic data management practices.

  3. Database-Centric Workflow: SQLite provides an efficient and portable database solution for managing Vietnamese financial data across research projects. The structured database approach enables reproducible research workflows, efficient queries, and easy data sharing among collaborators.

  4. Data Quality Imperative: Data quality validation is especially important for emerging market data. Implementing systematic checks for missing values, extreme returns, duplicate entries, and cross-source validation helps ensure research reliability and reproducibility.

  5. Alternative Data Opportunities: Foreign investor flows, corporate announcements, and media sentiment provide unique research opportunities in the Vietnamese market that can complement traditional price and fundamental analysis. These data sources can reveal insights about market dynamics not captured in standard datasets.

  6. Continuous Maintenance: Financial databases require ongoing maintenance including incremental updates, integrity checks, and optimization. Establishing systematic update procedures ensures data currency and database performance over time.