4 Market Microstructure in Vietnam

Note

In this chapter, we examine how the institutional design of Vietnamese equity markets, such as trading sessions, price limits, order types, and investor composition, shapes observed prices, returns, and liquidity. We quantify microstructure frictions and demonstrate why ignoring these frictions leads to biased inference in asset pricing tests.

Market microstructure is the study of how trading rules, order handling mechanisms, and market design affect price formation, transaction costs, and liquidity. The field, pioneered by Kyle (1985), Glosten and Milgrom (1985), and Hasbrouck (2007), provides the analytical toolkit for understanding why observed prices may deviate from fundamental values and for how long.

In developed markets with continuous electronic trading, designated market makers, and minimal regulatory constraints on price movement, microstructure frictions are typically second-order concerns for researchers working at monthly or lower frequencies. In Vietnam’s equity markets, this is emphatically not the case. Daily price limits, thin trading, a predominantly retail investor base, discrete tick sizes, and the absence of formal market-making arrangements generate frictions that propagate into monthly returns, distort factor loadings, and bias portfolio-level inference. Any serious empirical analysis of Vietnamese equities must therefore begin with a careful assessment of market microstructure.

This chapter provides that assessment. We first describe the institutional architecture of Vietnamese equity trading. We then develop diagnostics for the most consequential frictions, such as price limit hits, zero-return days, illiquidity, and non-synchronous trading, and quantify their severity in the cross-section of listed firms. Finally, we derive practical guidance for adjusting portfolio construction and asset pricing tests.

4.1 What Is Market Microstructure?

The textbook assumption of frictionless markets implies that prices continuously and costlessly incorporate information. Under this assumption, the observed return on any asset at any frequency equals the “true” return dictated by fundamentals. Market microstructure relaxes this assumption by recognizing that prices are generated by a specific trading process with real costs, constraints, and imperfections.

The canonical framework of Kyle (1985) models a market with three types of participants:

an informed trader who knows the asset’s fundamental value,
noise traders who trade for liquidity reasons, and
a market maker who sets prices to break even in expectation.

The key insight is that the market maker cannot distinguish informed from uninformed order flow, so prices adjust gradually to information, creating a wedge between the transaction price and the fundamental value. The size of this wedge (the bid-ask spread) and the speed of price adjustment (market depth) are the core objects of microstructure theory.
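As a rough illustration of this mechanism, the sketch below simulates the single-period version of the Kyle (1985) model. The parameter values (`p0`, `sigma_v`, `sigma_u`) are arbitrary assumptions chosen for illustration. The price impact coefficient $\lambda = \sigma_v / (2\sigma_u)$ and the informed trader's intensity $\sigma_u / \sigma_v$ are the standard single-period equilibrium values; the simulation only checks that a regression of fundamental value on total order flow recovers $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500_000
p0, sigma_v, sigma_u = 100.0, 2.0, 1_000.0  # illustrative assumptions

v = rng.normal(p0, sigma_v, n)    # fundamental value, known to the informed trader
u = rng.normal(0.0, sigma_u, n)   # noise-trader order flow

beta_kyle = sigma_u / sigma_v     # informed trading intensity in equilibrium
lam = sigma_v / (2 * sigma_u)     # price impact (inverse of market depth)

x = beta_kyle * (v - p0)          # informed order
y = x + u                         # total order flow seen by the market maker
price = p0 + lam * y              # break-even pricing rule

# The market maker's rule is the regression of v on y
lam_hat = np.cov(v, y)[0, 1] / np.var(y, ddof=1)
revealed_share = price.var() / v.var()

print(f"lambda (theory) = {lam:.6f}, lambda (regression) = {lam_hat:.6f}")
print(f"share of value variance impounded in price = {revealed_share:.3f}")
```

In this setting the price moves only part of the way toward the fundamental value, which is the wedge described above: the less noise trading there is to hide behind, the larger the price impact of a given order.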

Glosten and Milgrom (1985) extend this framework to a sequential trade setting and show that the bid-ask spread has two components: an adverse selection component (compensation for trading against informed traders) and an order processing component (compensation for the mechanical costs of trading). Huang and Stoll (1997) further decompose the spread into realized spread and price impact components. These decompositions are important because they reveal different sources of trading costs and have different implications for market quality.

For empirical asset pricing, the key question is: at what frequency and under what conditions do microstructure effects become negligible? In highly liquid markets, Bali, Engle, and Murray (2016) argue that monthly returns are largely free of microstructure contamination. In Vietnam, as we demonstrate below, this is not the case. Microstructure effects persist at monthly and even quarterly frequencies for a substantial fraction of listed firms.

4.1.1 The Microstructure-Asset Pricing Interface

The interface between microstructure and asset pricing operates through several channels. First, illiquidity itself may be a priced risk factor. Amihud (2002) shows that expected illiquidity is positively related to expected stock returns, implying a liquidity premium. Pástor and Stambaugh (2003) develop an equilibrium model in which liquidity risk (i.e., the covariance of a stock’s liquidity with market liquidity) commands a risk premium. Second, microstructure noise in prices biases estimated betas, factor loadings, and test statistics. Scholes and Williams (1977) first identified this bias in the context of non-synchronous trading, and Dimson (1979) proposed an aggregated-coefficients estimator to correct it. Third, price limits and other regulatory constraints censor the return distribution, creating truncation bias in volatility estimates, return moments, and extreme-value statistics (Kim and Rhee 1997).

Table 4.1 summarizes these channels and their empirical consequences.

Table 4.1: Channels Through Which Microstructure Affects Asset Pricing

Channel	Mechanism	Empirical Consequence
Illiquidity premium	Compensation for bearing transaction costs and inventory risk	Cross-sectional return predictability by liquidity measures
Non-synchronous trading	Infrequent trading creates stale prices	Downward-biased betas, attenuated correlations, and spurious lead-lag
Price limits	Regulatory censoring of daily returns	Truncated return distributions, volatility spillover, and artificial autocorrelation
Discrete tick sizes	Prices constrained to a grid	Bid-ask bounce, return discreteness, biased volatility
Investor composition	Retail-dominated order flow	Noise trading, herding, sentiment-driven pricing

4.2 Trading Architecture in Vietnam

Vietnam operates two stock exchanges: the Ho Chi Minh Stock Exchange (HOSE), established in 2000, and the Hanoi Stock Exchange (HNX), established in 2005. HOSE lists larger firms and accounts for the majority of market capitalization and trading volume. HNX lists smaller firms and also operates the Unlisted Public Company Market (UPCoM) for firms that have not yet met full listing requirements. All three venues operate electronic limit order book systems without designated market makers.

4.2.1 Exchange Characteristics

Table 4.2 presents the key structural differences between HOSE, HNX, and UPCoM. These differences have direct implications for liquidity, price discovery, and the severity of microstructure frictions.

Table 4.2: Exchange Comparison: HOSE, HNX, and UPCoM

Feature	HOSE	HNX	UPCoM
Established	2000	2005	2009
Listing tier	Large-cap	Mid/small-cap	Pre-listing
Daily price limit	$\pm$ 7%	$\pm$ 10%	$\pm$ 15%
Tick size regime	Tiered by price	Tiered by price	100 VND
Trading lot	100 shares	100 shares	100 shares
Short selling	Limited	Not available	Not available
Foreign ownership cap	Industry-specific	Industry-specific	Industry-specific

The heterogeneous price limit bands across exchanges create a natural experiment for studying limit effects. HOSE’s tighter $\pm$ 7% band means that large-cap stocks are more frequently constrained than mid-cap stocks on HNX, conditional on the same information shock. UPCoM’s wider $\pm$ 15% band provides the least constrained environment, though its stocks are also the least liquid.

4.2.2 Trading Sessions

Each exchange operates a structured trading day with distinct sessions. Understanding session structure is essential because price formation mechanisms differ across sessions, and certain sessions are disproportionately important for benchmark pricing (Table 4.3).

Table 4.3: Trading Session Structure on HOSE

Session	Time	Mechanism	Price Discovery Role
Pre-opening	08:30–09:00	Order entry only, no matching	Reveals pre-open demand/supply
Opening auction (ATO)	09:00–09:15	Batch auction, single price	Sets opening price from accumulated orders
Continuous trading (Morning)	09:15–11:30	Continuous limit order matching	Primary price discovery
Lunch break	11:30–13:00	No trading	—
Continuous trading (Afternoon)	13:00–14:30	Continuous limit order matching	Primary price discovery
Closing auction (ATC)	14:30–14:45	Batch auction, single price	Sets closing price (benchmark)
Post-closing	14:45–15:00	Put-through (negotiated) trades	Block and negotiated transactions

The closing auction (ATC) deserves particular attention. The ATC price is the official closing price used for index calculation, NAV computation, and margin requirements. Because it is determined by a single-price batch auction, it can be manipulated by strategically timed orders, a phenomenon documented in numerous emerging markets (Comerton-Forde and Tang 2009; Hillion and Suominen 2004). Researchers using daily closing prices should be aware that ATC prices may not reflect the continuous-session equilibrium, particularly for less liquid stocks where a single large order can move the closing price.

4.2.3 Order Types and Matching Rules

Vietnamese exchanges support a limited set of order types compared to developed markets (Table 4.4).

Table 4.4: Available Order Types

Order Type	Description	Availability
Limit order (LO)	Specifies price and quantity	All sessions
Market order (ATO/ATC)	Matches at auction price	Auction sessions only
Market-to-limit (MTL)	Converts to limit at best available	HNX only

The absence of iceberg orders, stop orders, and hidden orders means that the full limit order book is visible to all participants. While this enhances pre-trade transparency, it also means that large institutional orders face significant information leakage risk, which may deter institutional participation and reduce market depth.

Orders are matched on a strict price-time priority basis during continuous sessions. During auction sessions, a single clearing price is determined that maximizes executed volume. If multiple prices satisfy this criterion, the price closest to the previous closing price is selected.
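The auction rule just described can be sketched in a few lines. The function below is a simplified illustration, not the exchanges' actual matching engine: it considers only limit orders (ATO/ATC market orders are ignored), uses the submitted limit prices as candidate clearing prices, and breaks any remaining tie in favor of the lower price, which is an arbitrary assumption. The function name and the order format are hypothetical.

```python
def auction_clearing_price(buys, sells, prev_close):
    """Return (clearing_price, executed_volume) for a single-price auction.

    buys, sells: lists of (limit_price, quantity) tuples.
    prev_close:  previous closing price, used only to break ties.
    """
    candidates = sorted({p for p, _ in buys} | {p for p, _ in sells})
    best = None
    for p in candidates:
        demand = sum(q for bp, q in buys if bp >= p)   # buyers willing to pay p
        supply = sum(q for sp, q in sells if sp <= p)  # sellers willing to accept p
        executed = min(demand, supply)
        # Maximize executed volume first, then minimize distance to prev_close
        key = (-executed, abs(p - prev_close))
        if best is None or key < best[0]:
            best = (key, p, executed)
    if best is None or best[2] == 0:
        return None, 0
    return best[1], best[2]


# Hypothetical order book, prices in VND
buys = [(10_200, 300), (10_100, 200), (10_000, 500)]
sells = [(9_900, 200), (10_000, 300), (10_100, 400)]

print(auction_clearing_price(buys, sells, prev_close=10_000))
print(auction_clearing_price(buys, sells, prev_close=10_100))
```

In this example two prices (10,000 and 10,100) both maximize executed volume at 500 shares, so the previous close decides which one becomes the auction price.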

4.2.4 Tick Size Structure

Tick sizes on HOSE are tiered by price level, which creates discontinuities in the bid-ask spread as a percentage of price (Table 4.5).

Table 4.5: Tick Size Schedule on HOSE

Price Range (VND)	Tick Size (VND)	Minimum Spread as % of Midpoint
< 10,000	10	0.10% at 10,000
10,000–49,900	50	0.10% at 50,000
≥ 50,000	100	0.20% at 50,000

The jump from a 50 VND tick to a 100 VND tick at the 50,000 VND boundary means that the minimum percentage spread doubles discontinuously. This creates a “tick size cliff” that can affect the cross-sectional distribution of bid-ask spreads and, consequently, the measurement of illiquidity (Vo and Doan 2023). Bessembinder (2003) documents similar effects in other markets with tiered tick structures.
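The arithmetic behind the tick size cliff can be checked directly. The helper functions below are hypothetical and simply encode the schedule in Table 4.5, treating the quoted price as an approximation to the midpoint.

```python
def hose_tick_size(price_vnd):
    """Tick size in VND implied by the schedule in Table 4.5."""
    if price_vnd < 10_000:
        return 10
    if price_vnd < 50_000:
        return 50
    return 100


def min_spread_pct(price_vnd):
    """Smallest possible quoted spread (one tick) as a percentage of price."""
    return 100 * hose_tick_size(price_vnd) / price_vnd


for price in (9_990, 10_000, 49_900, 50_000, 100_000):
    print(f"{price:>7,} VND: tick = {hose_tick_size(price):>3}, "
          f"min spread = {min_spread_pct(price):.3f}%")
```

Comparing the rows for 49,900 and 50,000 VND shows the discontinuity: two stocks with almost identical prices face minimum relative spreads that differ by a factor of about two.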

4.2.5 Investor Composition

The Vietnamese equity market is predominantly driven by retail investors. While foreign institutional investors account for a meaningful share of market capitalization (particularly in blue-chip stocks subject to foreign ownership limits), daily trading volume is overwhelmingly generated by domestic retail accounts.

This retail dominance has several consequences for microstructure. First, retail investors tend to submit smaller orders and trade more frequently, generating high message-to-trade ratios but limited depth at each price level. Second, retail order flow is more susceptible to herding and sentiment, which can amplify momentum and generate excess volatility (Barber et al. 2009; Kaniel et al. 2012). Third, the limited institutional presence means that sophisticated liquidity provision is scarce, particularly in mid- and small-cap stocks.

4.3 Price Limits and Their Consequences

Vietnam enforces daily price limits on all listed equities. A stock’s price cannot move beyond a fixed percentage of the previous day’s closing price within a single trading day. The limit bands are $\pm$ 7% on HOSE, $\pm$ 10% on HNX, and $\pm$ 15% on UPCoM.

4.3.1 Theoretical Framework

Price limits were introduced with the stated goal of reducing volatility and preventing panic-driven price dislocations. However, the academic literature presents a more nuanced picture. The “magnet effect” hypothesis (Subrahmanyam 1994) predicts that price limits actually accelerate price movement toward the limit as traders rush to execute before the limit is hit. The “delayed price discovery” hypothesis (Fama and French 1989) argues that limits merely postpone inevitable price adjustments, creating volatility spillover into subsequent days.

Formally, let $P_t^*$ denote the equilibrium price on day $t$ and $P_{t-1}^c$ the previous closing price. The observed return is:

\[ r_t^{obs} = \begin{cases} \bar{L} & \text{if } r_t^* \geq \bar{L} \\ r_t^* & \text{if } \underline{L} < r_t^* < \bar{L} \\ \underline{L} & \text{if } r_t^* \leq \underline{L} \end{cases} \tag{4.1}\]

where $r_t^* = \ln(P_t^* / P_{t-1}^c)$ is the latent (unconstrained) return, $\bar{L}$ is the upper limit, and $\underline{L}$ is the lower limit. The observed return $r_t^{obs}$ is a censored version of the true return. This censoring has several consequences:

Truncated moments: The observed variance $\text{Var}(r_t^{obs}) < \text{Var}(r_t^*)$ because extreme returns are clipped. This biases downward any volatility-based risk measure.
Artificial autocorrelation: When $r_t^{obs} = \bar{L}$ and $r_{t+1}^{obs} > 0$ (continued adjustment the next day), the return series exhibits positive autocorrelation that is purely mechanical, not informational.
Volatility spillover: Define excess volatility on day $t+1$ as $\sigma_{t+1}^2 - E[\sigma_{t+1}^2 | \text{no limit hit on day } t]$. Kim and Rhee (1997) and Chu and Qiu (2019) document significant positive spillover, where days following limit hits exhibit abnormally high volatility.
Biased extreme value statistics: Measures such as Value-at-Risk, Expected Shortfall, and maximum drawdown are mechanically bounded by the limit, understating true tail risk.
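A small simulation makes the first and last of these consequences concrete. It is only a sketch: latent returns are assumed to be i.i.d. normal with a 4% daily standard deviation (an arbitrary choice that makes limit hits common), and the HOSE band of 7% is applied by clipping, as in Equation 4.1.

```python
import numpy as np

rng = np.random.default_rng(0)
limit = 0.07                                  # HOSE band
latent = rng.normal(0.0, 0.04, 250_000)       # assumed latent returns
observed = np.clip(latent, -limit, limit)     # censoring as in Equation 4.1

var_ratio = observed.var() / latent.var()
hit_share = np.mean(np.abs(observed) >= limit)
q_latent = np.quantile(latent, 0.01)          # 1% quantile (a VaR-style measure)
q_observed = np.quantile(observed, 0.01)

print(f"share of limit-hit days:         {hit_share:.3f}")
print(f"Var(observed) / Var(latent):     {var_ratio:.3f}")
print(f"1% quantile, latent vs observed: {q_latent:.4f} vs {q_observed:.4f}")
```

Under these assumptions the observed variance falls short of the latent variance, and the 1% quantile of observed returns cannot be more negative than the lower limit, whatever the true tail looks like.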

4.3.2 Detecting Price Limit Hits

We now implement a diagnostic to detect price limit hits in the daily data.

import pandas as pd
import numpy as np
import sqlite3
import matplotlib.pyplot as plt

# Load daily price data
tidy_finance = sqlite3.connect(database="data/tidy_finance_python.sqlite")

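# Assume prices_daily contains: symbol, date, close, exchange, plus the
# value (VND traded), turnover, and mktcap columns used later in this chapter
prices_daily = pd.read_sql_query(
    sql="""
        SELECT symbol, date, close, exchange, value, turnover, mktcap
        FROM prices_daily
    """,
    con=tidy_finance,
    parse_dates=["date"]
).dropna(subset=["symbol", "date", "close"])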

# Define limit bands by exchange
limit_bands = {"HOSE": 0.07, "HNX": 0.10, "UPCoM": 0.15}

prices_daily = prices_daily.sort_values(["symbol", "date"])
prices_daily["prev_close"] = prices_daily.groupby("symbol")["close"].shift(1)
prices_daily["ret"] = prices_daily["close"] / prices_daily["prev_close"] - 1
prices_daily["limit_band"] = prices_daily["exchange"].map(limit_bands)

# A limit hit occurs when the return is within 0.1% of the theoretical limit
tolerance = 0.001
prices_daily["upper_hit"] = (
    prices_daily["ret"] >= prices_daily["limit_band"] - tolerance
)
prices_daily["lower_hit"] = (
    prices_daily["ret"] <= -prices_daily["limit_band"] + tolerance
)
prices_daily["limit_hit"] = (
    prices_daily["upper_hit"] | prices_daily["lower_hit"]
)

4.3.3 Frequency of Limit Hits

prices_daily["year_month"] = prices_daily["date"].dt.to_period("M")

limit_hit_monthly = (
    prices_daily
    .groupby(["year_month", "exchange"])
    .agg(
        total_obs=("limit_hit", "count"),
        limit_hits=("limit_hit", "sum")
    )
    .reset_index()
)
limit_hit_monthly["hit_rate"] = (
    limit_hit_monthly["limit_hits"] / limit_hit_monthly["total_obs"]
)
limit_hit_monthly["date"] = limit_hit_monthly["year_month"].dt.to_timestamp()

fig, ax = plt.subplots(figsize=(8, 4))

for exchange, color in zip(
    ["HOSE", "HNX", "UPCoM"], ["#2C73D2", "#FF6B6B", "#5DCEAF"]
):
    subset = limit_hit_monthly[limit_hit_monthly["exchange"] == exchange]
    ax.plot(
        subset["date"], subset["hit_rate"] * 100,
        label=exchange, color=color, linewidth=1.2
    )

ax.set_ylabel("Limit Hit Rate (%)")
ax.set_xlabel("")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()

Figure 4.1

4.3.4 Volatility Spillover Test

Following Kim and Rhee (1997), we test whether days following a limit hit exhibit abnormally high volatility. Define the dummy variable $D_t = 1$ if a limit hit occurred on day $t$, and estimate:

\[ \sigma_{t+1}^2 = \alpha + \beta D_t + \gamma \sigma_t^2 + \varepsilon_{t+1} \tag{4.2}\]

where $\sigma_t^2$ is proxied by the squared daily return $r_t^2$. A positive and significant $\beta$ indicates volatility spillover attributable to the price limit.

import statsmodels.api as sm

# Panel-level volatility spillover test
prices_daily["sq_ret"] = prices_daily["ret"] ** 2
prices_daily["sq_ret_lead"] = prices_daily.groupby("symbol")["sq_ret"].shift(-1)
prices_daily["limit_hit_int"] = prices_daily["limit_hit"].astype(int)

spillover_data = prices_daily.dropna(subset=["sq_ret_lead", "sq_ret"])

X = sm.add_constant(spillover_data[["limit_hit_int", "sq_ret"]])
y = spillover_data["sq_ret_lead"]

model = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": spillover_data["symbol"]})

spillover_results = pd.DataFrame({
    "Coefficient": model.params,
    "Std. Error": model.bse,
    "t-stat": model.tvalues,
    "p-value": model.pvalues
}).round(6)

print(spillover_results)

Tip

A significant positive coefficient on the limit hit dummy indicates that Vietnamese price limits do not eliminate volatility; they merely redistribute it across days. This has direct implications for risk management: daily VaR measures computed from censored returns understate true risk exposure.

4.3.5 Return Autocorrelation Induced by Price Limits

Price limits mechanically induce positive autocorrelation in returns. To quantify this, we compute the first-order autocorrelation coefficient separately for stocks that hit limits frequently versus those that do not.

Table 4.6: Return Autocorrelation by Price Limit Hit Frequency

# Classify stocks by limit hit frequency
stock_limit_freq = (
    prices_daily
    .groupby("symbol")
    .agg(
        hit_rate=("limit_hit", "mean"),
        n_obs=("ret", "count")
    )
    .query("n_obs >= 250")  # At least 1 year of data
)

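# Rank first so that ties (e.g., many stocks with a zero hit rate) do not
# produce duplicate bin edges, which would make pd.qcut raise an error
stock_limit_freq["limit_group"] = pd.qcut(
    stock_limit_freq["hit_rate"].rank(method="first"), q=3,
    labels=["Low", "Medium", "High"]
)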

# Compute autocorrelation by group
def compute_autocorr(group_symbols):
    subset = prices_daily[prices_daily["symbol"].isin(group_symbols)].copy()
    subset["ret_lag"] = subset.groupby("symbol")["ret"].shift(1)
    return subset[["ret", "ret_lag"]].dropna().corr().iloc[0, 1]

autocorr_results = []
for group in ["Low", "Medium", "High"]:
    symbols = stock_limit_freq[stock_limit_freq["limit_group"] == group].index
    ac = compute_autocorr(symbols)
    n_stocks = len(symbols)
    avg_hit_rate = stock_limit_freq.loc[symbols, "hit_rate"].mean()
    autocorr_results.append({
        "Group": group,
        "N Stocks": n_stocks,
        "Avg Limit Hit Rate (%)": round(avg_hit_rate * 100, 2),
        "AR(1)": round(ac, 4)
    })

pd.DataFrame(autocorr_results).style.hide(axis="index")

The expected pattern is a monotonically increasing autocorrelation from the Low to High limit-hit group, confirming that the observed serial dependence in returns is at least partly an artifact of price censoring rather than genuine return predictability.
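The mechanical nature of this autocorrelation can be illustrated with a stylized simulation. The assumptions are ours: daily information shocks are i.i.d. normal with a 4% standard deviation, and any part of a price adjustment blocked by the 7% limit is carried over in full to the next day. Real markets will deviate from this carry-over rule, so the numbers are indicative only.

```python
import numpy as np

rng = np.random.default_rng(1)
limit, n_days = 0.07, 200_000
shocks = rng.normal(0.0, 0.04, n_days)   # serially uncorrelated by construction

observed = np.empty(n_days)
carry = 0.0                              # adjustment blocked by the limit so far
for t in range(n_days):
    desired = shocks[t] + carry
    observed[t] = min(max(desired, -limit), limit)
    carry = desired - observed[t]        # unrealized part spills into day t+1


def ar1(x):
    """First-order autocorrelation coefficient."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]


ar1_latent = ar1(shocks)
ar1_observed = ar1(observed)
print(f"AR(1) of latent shocks:    {ar1_latent:.4f}")
print(f"AR(1) of observed returns: {ar1_observed:.4f}")
```

Although the underlying shocks have no serial dependence, the observed series shows positive first-order autocorrelation, which is the pattern the High limit-hit group is expected to display.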

4.4 Liquidity, Thin Trading, and Zero Returns

Liquidity (i.e., the ability to trade quickly at low cost without moving the price) is a first-order concern in Vietnamese equities. A substantial fraction of listed firms, particularly on HNX and UPCoM, experience chronic illiquidity characterized by infrequent trading, wide bid-ask spreads, and frequent zero-return days.

4.4.1 Measuring Liquidity

The academic literature has developed numerous liquidity measures, each capturing a different dimension of market quality. Table 4.7 summarizes the measures most applicable to Vietnamese data, given typical data availability.

Table 4.7: Liquidity Measures for Vietnamese Equities

Measure	Formula	Interpretation	Data Required
Turnover ratio	$\text{TO}_{i,t} = \frac{\text{Volume}_{i,t}}{\text{Shares Outstanding}_{i}}$	Trading intensity relative to shares outstanding	Volume, shares outstanding
Amihud illiquidity	$\text{ILLIQ}_{i,t} = \frac{1}{D} \sum_{d=1}^{D} \frac{|r_{i,d}|}{V_{i,d}}$	Price impact per unit of currency volume	Daily returns, daily traded value
Zero-return proportion	$\text{ZR}_{i,t} = \frac{\#\{d : r_{i,d} = 0\}}{D}$	Frequency of non-trading or stale pricing	Daily returns
Roll spread	$\hat{S}_i = 2\sqrt{-\text{Cov}(r_{i,d}, r_{i,d-1})}$	Effective bid-ask spread estimate	Daily returns
Bid-ask spread	$\text{BA}_{i,d} = \frac{\text{Ask}_{i,d} - \text{Bid}_{i,d}}{(\text{Ask}_{i,d} + \text{Bid}_{i,d})/2}$	Direct transaction cost	Quote data

The Amihud illiquidity ratio (Amihud 2002) is particularly useful because it requires only daily return and volume data. It captures the price impact of trading (i.e., the absolute return per unit of currency volume) and has been shown to correlate well with more sophisticated microstructure-based measures such as the effective spread (Goyenko and Ukhov 2009).
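A worked example with made-up numbers may help fix the units. The four daily returns and traded values below are purely hypothetical; traded value is scaled to billions of VND, so the ratio reads as the average absolute return per billion VND traded.

```python
import numpy as np

# Hypothetical four-day window for one stock
ret = np.array([0.01, -0.02, 0.0, 0.03])        # daily returns
value_vnd = np.array([1e9, 2e9, 0.5e9, 3e9])    # daily traded value in VND

# Amihud ratio: mean of |return| / traded value (billions of VND)
amihud = np.mean(np.abs(ret) / (value_vnd / 1e9))

# Zero-return proportion over the same window
zero_return_share = np.mean(ret == 0)

print(f"Amihud illiquidity: {amihud:.4f} per billion VND")
print(f"Zero-return share:  {zero_return_share:.2f}")
```

Each non-zero day here contributes 0.01 per billion VND, so the window average is 0.0075: on average, one billion VND of trading is associated with an absolute price move of 0.75%.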

4.4.2 Computing Liquidity Diagnostics

# Compute standard liquidity measures at the stock-month level
prices_daily["abs_ret"] = prices_daily["ret"].abs()
prices_daily["zero_return"] = (prices_daily["ret"] == 0).astype(int)
prices_daily["year_month"] = prices_daily["date"].dt.to_period("M")

# Assume volume is in shares and value is in VND
# Amihud: average |ret| / value (in billions VND)
prices_daily["amihud_daily"] = np.where(
    prices_daily["value"] > 0,
    prices_daily["abs_ret"] / (prices_daily["value"] / 1e9),
    np.nan
)

liquidity_monthly = (
    prices_daily
    .groupby(["symbol", "year_month"])
    .agg(
        zero_return_share=("zero_return", "mean"),
        avg_turnover=("turnover", "mean"),
        amihud=("amihud_daily", "mean"),
        trading_days=("ret", "count"),
        avg_daily_value=("value", "mean")
    )
    .reset_index()
)

# Flag severely illiquid stock-months
liquidity_monthly["illiquid_flag"] = (
    (liquidity_monthly["zero_return_share"] > 0.5) |
    (liquidity_monthly["trading_days"] < 10) |
    (liquidity_monthly["avg_daily_value"] < 1e8)  # < 100M VND/day
)

4.4.3 Cross-Sectional Distribution of Liquidity

Table 4.8: Cross-Sectional Distribution of Liquidity Measures (Latest Full Year)

latest_year = liquidity_monthly["year_month"].dt.year.max()
annual_liq = (
    liquidity_monthly[liquidity_monthly["year_month"].dt.year == latest_year]
    .groupby("symbol")
    .agg(
        zero_return_share=("zero_return_share", "mean"),
        avg_turnover=("avg_turnover", "mean"),
        amihud=("amihud", "mean"),
        avg_daily_value_m=("avg_daily_value", lambda x: x.mean() / 1e6)
    )
)

summary_stats = annual_liq.describe(percentiles=[0.10, 0.25, 0.50, 0.75, 0.90]).T
summary_stats = summary_stats[
    ["mean", "std", "10%", "25%", "50%", "75%", "90%"]
].round(4)
summary_stats.columns = ["Mean", "Std", "P10", "P25", "Median", "P75", "P90"]
summary_stats.index = [
    "Zero-Return Share",
    "Avg Daily Turnover",
    "Amihud Illiquidity",
    "Avg Daily Value (M VND)"
]
summary_stats

4.4.4 Liquidity Distribution Across Exchanges

# Merge exchange info
stock_exchange = (
    prices_daily[["symbol", "exchange"]]
    .drop_duplicates("symbol")
)
annual_liq_exch = annual_liq.merge(
    stock_exchange, left_index=True, right_on="symbol"
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Zero-return share
for exchange, color in zip(
    ["HOSE", "HNX", "UPCoM"], ["#2C73D2", "#FF6B6B", "#5DCEAF"]
):
    subset = annual_liq_exch[annual_liq_exch["exchange"] == exchange]
    axes[0].hist(
        subset["zero_return_share"], bins=30, alpha=0.6,
        color=color, label=exchange, density=True
    )
axes[0].set_xlabel("Zero-Return Share")
axes[0].set_ylabel("Density")
axes[0].legend(frameon=False)
axes[0].spines["top"].set_visible(False)
axes[0].spines["right"].set_visible(False)

# Amihud (log scale)
for exchange, color in zip(
    ["HOSE", "HNX", "UPCoM"], ["#2C73D2", "#FF6B6B", "#5DCEAF"]
):
    subset = annual_liq_exch[annual_liq_exch["exchange"] == exchange]
    amihud_log = np.log(subset["amihud"].clip(lower=1e-10))
    axes[1].hist(
        amihud_log, bins=30, alpha=0.6,
        color=color, label=exchange, density=True
    )
axes[1].set_xlabel("Log Amihud Illiquidity")
axes[1].set_ylabel("Density")
axes[1].legend(frameon=False)
axes[1].spines["top"].set_visible(False)
axes[1].spines["right"].set_visible(False)

plt.tight_layout()
plt.show()

Figure 4.2

The distributions typically reveal a bimodal pattern: HOSE stocks cluster at low illiquidity values, while HNX and especially UPCoM stocks exhibit a long right tail of extreme illiquidity. This heterogeneity implies that a single liquidity filter or treatment is insufficient for the entire cross-section.

4.4.5 Time Variation in Aggregate Liquidity

Market-wide liquidity is not constant. It deteriorates during crises, policy uncertainty, and periods of capital outflow, and improves during bull markets and periods of foreign inflow. The time variation in aggregate liquidity is itself a risk factor (Pástor and Stambaugh 2003).

agg_liquidity = (
    liquidity_monthly
    .groupby("year_month")
    .agg(
        median_amihud=("amihud", "median"),
        median_zero_ret=("zero_return_share", "median"),
        total_value=("avg_daily_value", "sum")
    )
    .reset_index()
)
agg_liquidity["date"] = agg_liquidity["year_month"].dt.to_timestamp()

fig, ax1 = plt.subplots(figsize=(8, 4))

ax1.plot(
    agg_liquidity["date"],
    np.log(agg_liquidity["median_amihud"].clip(lower=1e-10)),
    color="#2C73D2", linewidth=1.2
)
ax1.set_ylabel("Log Median Amihud", color="#2C73D2")
ax1.tick_params(axis="y", labelcolor="#2C73D2")

ax2 = ax1.twinx()
ax2.fill_between(
    agg_liquidity["date"],
    agg_liquidity["median_zero_ret"] * 100,
    alpha=0.3, color="#FF6B6B"
)
ax2.set_ylabel("Median Zero-Return Share (%)", color="#FF6B6B")
ax2.tick_params(axis="y", labelcolor="#FF6B6B")

ax1.spines["top"].set_visible(False)
plt.tight_layout()
plt.show()

Figure 4.3

Practical Recommendation

Before any asset pricing analysis, apply the following liquidity filter: exclude stock-months where the zero-return share exceeds 50%, where fewer than 10 trading days are observed, or where average daily trading value falls below a threshold (e.g., 100 million VND). Document the filter explicitly, and report sensitivity of results to alternative thresholds.

4.5 Bid-Ask Spread Estimation

In the absence of comprehensive quote data, the effective bid-ask spread can be estimated from transaction data using the method of Roll (1984). The Roll estimator exploits the fact that if the bid-ask bounce is the sole source of negative serial covariance in returns, then:

\[ \hat{S}_{\text{Roll}} = 2\sqrt{-\text{Cov}(\Delta p_t, \Delta p_{t-1})} \tag{4.3}\]

where $\Delta p_t = p_t - p_{t-1}$ is the price change. When the autocovariance is positive (which occurs when information-driven serial correlation dominates the bid-ask bounce), the Roll estimator is undefined. Hasbrouck (2009) proposes a Bayesian variant that handles this case by imposing a prior on the spread.
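To see why the estimator works, the sketch below simulates the setting Roll (1984) assumes: an efficient price that follows a random walk, and trades that occur at the efficient price plus or minus half the spread with equal probability. The spread of 200 VND, the starting price, and the volatility of the efficient price are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, true_spread = 200_000, 200.0                  # spread in VND (assumed)

# Efficient price: random walk with small i.i.d. innovations
efficient = 50_000 + np.cumsum(rng.normal(0.0, 5.0, n))

# Each trade is buyer- or seller-initiated with equal probability
side = rng.choice([-1.0, 1.0], size=n)
trade_price = efficient + 0.5 * true_spread * side

dp = np.diff(trade_price)
autocov = np.cov(dp[1:], dp[:-1])[0, 1]          # first-order autocovariance

# Equation 4.3, defined only when the autocovariance is negative
roll_spread = 2 * np.sqrt(-autocov) if autocov < 0 else np.nan

print(f"autocovariance: {autocov:.1f}")
print(f"Roll estimate:  {roll_spread:.1f} VND (true spread {true_spread:.0f} VND)")
```

In this idealized setting the bid-ask bounce is the only source of serial covariance, so the estimate lands close to the true spread. With real data, limit-induced positive autocorrelation works against the bounce, which is one reason many stock-months yield an undefined estimate.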

# Compute Roll spread estimate at the stock-month level
prices_daily["dprice"] = prices_daily.groupby("symbol")["close"].diff()
prices_daily["dprice_lag"] = prices_daily.groupby("symbol")["dprice"].shift(1)

roll_cov = (
    prices_daily
    .groupby(["symbol", "year_month"])
    .apply(
        lambda g: g[["dprice", "dprice_lag"]].dropna().cov().iloc[0, 1],
        include_groups=False
    )
    .reset_index(name="autocovariance")
)

# Roll spread is defined only when autocovariance is negative
roll_cov["roll_spread"] = np.where(
    roll_cov["autocovariance"] < 0,
    2 * np.sqrt(-roll_cov["autocovariance"]),
    np.nan
)

# As a percentage of price
roll_cov = roll_cov.merge(
    prices_daily.groupby(["symbol", "year_month"])["close"].mean()
    .reset_index(name="avg_price"),
    on=["symbol", "year_month"]
)
roll_cov["roll_spread_pct"] = roll_cov["roll_spread"] / roll_cov["avg_price"] * 100

Table 4.9: Distribution of Roll Spread Estimates (% of Price)

roll_summary = (
    roll_cov
    .dropna(subset=["roll_spread_pct"])
    .groupby("year_month")["roll_spread_pct"]
    .describe(percentiles=[0.25, 0.50, 0.75])
    .reset_index()
)

# Show latest year summary
latest_year_roll = roll_cov[
    roll_cov["year_month"].dt.year == roll_cov["year_month"].dt.year.max()
]
print(
    latest_year_roll["roll_spread_pct"]
    .dropna()
    .describe(percentiles=[0.10, 0.25, 0.50, 0.75, 0.90])
    .round(3)
)

4.6 Non-Synchronous Trading Bias

When stocks do not trade at the same frequency or at the same times, observed returns are misaligned. This non-synchronous trading bias, first formalized by Scholes and Williams (1977) and Lo and MacKinlay (1990), is one of the most consequential microstructure effects for asset pricing in thin markets.

4.6.1 The Problem

Suppose the true (unobserved) return process for stock $i$ follows a single-factor model:

\[ r_{i,t}^* = \alpha_i + \beta_i r_{m,t}^* + \varepsilon_{i,t} \tag{4.4}\]

where $r_{m,t}^*$ is the true market return and $\beta_i$ is the true beta. If stock $i$ last traded $k$ days before the end of day $t$, the observed return incorporates information only up to day $t - k$. Scholes and Williams (1977) show that the OLS estimate of beta from regressing observed returns on observed market returns is:

\[ \hat{\beta}_i^{OLS} = \beta_i \cdot \pi_i \tag{4.5}\]

where $\pi_i$ is the probability that stock $i$ trades on any given day. For a stock that trades on only 50% of days, the OLS beta is biased downward by 50%. This bias is severe in Vietnam, where many small-cap stocks trade on fewer than half of all trading days.
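The attenuation in Equation 4.5 can be reproduced in a stylized simulation. The setup is an assumption made for illustration: the stock trades on each day independently with probability `pi`, the observed return on a trading day accumulates all true returns since the previous trade, and the observed return on a non-trading day is zero (a stale price).

```python
import numpy as np

rng = np.random.default_rng(11)
n_days, beta, pi = 200_000, 1.0, 0.5             # illustrative values

rm = rng.normal(0.0, 0.01, n_days)               # market return
r_true = beta * rm + rng.normal(0.0, 0.02, n_days)
trades = rng.random(n_days) < pi                 # True on days the stock trades

r_obs = np.zeros(n_days)
pending = 0.0                                    # true return not yet reflected in price
for t in range(n_days):
    pending += r_true[t]
    if trades[t]:
        r_obs[t] = pending                       # price catches up on a trading day
        pending = 0.0

beta_ols = np.cov(r_obs, rm)[0, 1] / np.var(rm, ddof=1)
print(f"true beta = {beta:.2f}, pi = {pi:.2f}, OLS beta on observed returns = {beta_ols:.3f}")
```

With `pi = 0.5` the contemporaneous OLS beta comes out near half the true beta, because the observed return reflects the same-day market move only on days when the stock actually trades.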

4.6.2 Quantifying the Bias

# Compute trading frequency: proportion of market days with nonzero volume
market_days = prices_daily.groupby("year_month")["date"].nunique()
trading_freq = (
    prices_daily[prices_daily["value"] > 0]
    .groupby(["symbol", "year_month"])["date"]
    .nunique()
    .reset_index(name="days_traded")
)
trading_freq = trading_freq.merge(
    market_days.reset_index().rename(columns={"date": "market_days"}),
    on="year_month"
)
trading_freq["trade_prob"] = trading_freq["days_traded"] / trading_freq["market_days"]

# Annual average
annual_trade_freq = (
    trading_freq
    .groupby("symbol")["trade_prob"]
    .mean()
    .reset_index(name="avg_trade_prob")
)

fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(
    annual_trade_freq["avg_trade_prob"], bins=50,
    color="#2C73D2", edgecolor="white", alpha=0.8
)
ax.axvline(
    annual_trade_freq["avg_trade_prob"].median(),
    color="#FF6B6B", linestyle="--", linewidth=1.5,
    label=f"Median = {annual_trade_freq['avg_trade_prob'].median():.2f}"
)
ax.set_xlabel("Average Trading Probability (Fraction of Market Days)")
ax.set_ylabel("Number of Stocks")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()

Figure 4.4

4.6.3 The Dimson Beta Correction

Dimson (1979) proposes a simple correction: include lagged and leading market returns in the beta regression:

\[ r_{i,t} = \alpha_i + \sum_{k=-K}^{K} \beta_{i,k} \, r_{m,t-k} + \varepsilon_{i,t} \tag{4.6}\]

The Dimson-corrected beta is $\hat{\beta}_i^{Dimson} = \sum_{k=-K}^{K} \hat{\beta}_{i,k}$. Typically $K = 1$ or $K = 2$ is sufficient. The summed coefficients capture the full response of the stock’s observed return to market information, regardless of when the stock actually trades.
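The effect of summing lead and lag coefficients can be checked in a stylized thin-trading simulation. The assumptions are ours: the stock trades each day with probability 0.5, and its observed return on a trading day accumulates all true returns since the last trade. The helper `summed_beta` is hypothetical and uses plain least squares.

```python
import numpy as np

rng = np.random.default_rng(3)
n_days, beta, pi = 200_000, 1.0, 0.5

rm = rng.normal(0.0, 0.01, n_days)
r_true = beta * rm + rng.normal(0.0, 0.02, n_days)
trades = rng.random(n_days) < pi

# Observed returns: zero on non-trading days, cumulative catch-up on trading days
r_obs = np.zeros(n_days)
pending = 0.0
for t in range(n_days):
    pending += r_true[t]
    if trades[t]:
        r_obs[t] = pending
        pending = 0.0


def summed_beta(y, x, n_lags, n_leads):
    """Sum of OLS slopes on x[t-n_lags], ..., x[t], ..., x[t+n_leads]."""
    t_idx = np.arange(n_lags, len(x) - n_leads)
    cols = [x[t_idx - k] for k in range(-n_leads, n_lags + 1)]
    X = np.column_stack([np.ones(len(t_idx))] + cols)
    coef, *_ = np.linalg.lstsq(X, y[t_idx], rcond=None)
    return coef[1:].sum()


beta_ols = summed_beta(r_obs, rm, 0, 0)
beta_dimson_k1 = summed_beta(r_obs, rm, 1, 1)
beta_dimson_k2 = summed_beta(r_obs, rm, 2, 2)
print(f"OLS: {beta_ols:.3f}  Dimson K=1: {beta_dimson_k1:.3f}  Dimson K=2: {beta_dimson_k2:.3f}")
```

In this setup each additional lag recovers more of the true beta, but for a stock that trades on only half of all days the correction with $K = 1$ is still incomplete. For very thinly traded stocks it may therefore be worth checking how the summed beta changes as $K$ increases.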

# Estimate Dimson betas with K=1 lag and lead
# Value-weighted market return, using only rows with both ret and mktcap
valid = prices_daily.dropna(subset=["ret", "mktcap"])
market_ret = (
    valid
    .assign(weighted_ret=valid["ret"] * valid["mktcap"])
    .groupby("date")[["weighted_ret", "mktcap"]]
    .sum()
    .assign(rm=lambda d: d["weighted_ret"] / d["mktcap"])
    .reset_index()[["date", "rm"]]
    .sort_values("date")
)

# Lag and lead on the market calendar, so that gaps in a stock's own
# history do not misalign rm_lag1 and rm_lead1
market_ret["rm_lag1"] = market_ret["rm"].shift(1)
market_ret["rm_lead1"] = market_ret["rm"].shift(-1)

prices_daily = prices_daily.merge(market_ret, on="date", how="left")

def estimate_dimson_beta(group):
    """Estimate OLS and Dimson(K=1) betas for a single stock."""
    g = group.dropna(subset=["ret", "rm", "rm_lag1", "rm_lead1"])
    if len(g) < 60:
        return pd.Series({"beta_ols": np.nan, "beta_dimson": np.nan, "n_obs": len(g)})

    # OLS beta
    X_ols = sm.add_constant(g["rm"])
    ols_model = sm.OLS(g["ret"], X_ols).fit()
    beta_ols = ols_model.params["rm"]

    # Dimson beta
    X_dim = sm.add_constant(g[["rm_lag1", "rm", "rm_lead1"]])
    dim_model = sm.OLS(g["ret"], X_dim).fit()
    beta_dimson = dim_model.params[["rm_lag1", "rm", "rm_lead1"]].sum()

    return pd.Series({
        "beta_ols": beta_ols,
        "beta_dimson": beta_dimson,
        "n_obs": len(g)
    })

beta_comparison = (
    prices_daily
    .groupby("symbol")
    .apply(estimate_dimson_beta, include_groups=False)
    .reset_index()
)

beta_valid = beta_comparison.dropna()

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(
    beta_valid["beta_ols"], beta_valid["beta_dimson"],
    alpha=0.3, s=10, color="#2C73D2"
)
lims = [
    min(ax.get_xlim()[0], ax.get_ylim()[0]),
    max(ax.get_xlim()[1], ax.get_ylim()[1])
]
ax.plot(lims, lims, "--", color="gray", linewidth=1)
ax.set_xlabel("OLS Beta")
ax.set_ylabel("Dimson Beta (K=1)")
ax.set_aspect("equal")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()

Figure 4.5

The scatter plot should reveal a systematic pattern: Dimson betas exceed OLS betas for most stocks, with the discrepancy largest for thinly traded stocks. Points above the 45-degree line indicate stocks whose OLS betas are biased downward by non-synchronous trading.
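The mechanism behind this pattern can be illustrated on simulated data. The following is a minimal sketch under stated assumptions, not the chapter's estimation code: the function names, the 50% trading probability, and the return volatilities are all hypothetical, returns are treated as log returns so that they add across days, and a stock's observed return on a trading day is assumed to equal its accumulated true return since the last trade.

```python
import numpy as np

def simulate_thin_trading(beta=1.0, trade_prob=0.5, n_days=20_000, seed=0):
    """Simulate a market return and a stock's observed return under random non-trading."""
    rng = np.random.default_rng(seed)
    rm = rng.normal(0.0, 0.01, n_days)                    # market return
    true_ret = beta * rm + rng.normal(0.0, 0.01, n_days)  # latent stock return
    trades = rng.random(n_days) < trade_prob              # True on days the stock trades

    observed = np.zeros(n_days)
    pending = 0.0  # true return accumulated since the last trade
    for t in range(n_days):
        pending += true_ret[t]
        if trades[t]:
            observed[t] = pending  # price catches up on a trading day
            pending = 0.0
    return rm, observed

def ols_slopes(y, X):
    """Return OLS slope coefficients (intercept excluded)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

rm, observed = simulate_thin_trading()

# Contemporaneous-only regression
beta_ols = ols_slopes(observed, rm)[0]

# Dimson regression with one lag and one lead of the market return
y = observed[1:-1]
X = np.column_stack([rm[:-2], rm[1:-1], rm[2:]])
beta_dimson = ols_slopes(y, X).sum()

print(f"OLS beta: {beta_ols:.2f}, Dimson (K=1) beta: {beta_dimson:.2f}, true beta: 1.00")
```

In this setup the contemporaneous slope is attenuated roughly in proportion to the trading probability, and a single lag recovers only part of the shortfall. This suggests that a larger $K$ may be needed for very thinly traded stocks.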

Table 4.10: Beta Bias by Trading Frequency Tercile

beta_with_freq = beta_valid.merge(annual_trade_freq, on="symbol")
beta_with_freq["freq_tercile"] = pd.qcut(
    beta_with_freq["avg_trade_prob"], q=3,
    labels=["Low (Thin)", "Medium", "High (Liquid)"]
)

beta_bias_summary = (
    beta_with_freq
    .groupby("freq_tercile", observed=True)
    .agg(
        n_stocks=("symbol", "count"),
        avg_trade_prob=("avg_trade_prob", "mean"),
        mean_beta_ols=("beta_ols", "mean"),
        mean_beta_dimson=("beta_dimson", "mean"),
        median_beta_ols=("beta_ols", "median"),
        median_beta_dimson=("beta_dimson", "median")
    )
    .round(3)
)

beta_bias_summary["bias_pct"] = (
    (beta_bias_summary["mean_beta_dimson"] - beta_bias_summary["mean_beta_ols"])
    / beta_bias_summary["mean_beta_dimson"] * 100
).round(1)

beta_bias_summary

Warning

For the thinnest-traded tercile, OLS beta underestimates true systematic risk by 20-40% on average. Using uncorrected betas for cost of equity estimation or factor model tests will produce systematically incorrect results for these stocks.

4.6.4 The Scholes-Williams Estimator

An alternative correction, proposed by Scholes and Williams (1977), estimates beta as:

\[ \hat{\beta}_i^{SW} = \frac{\hat{\beta}_{i,-1} + \hat{\beta}_{i,0} + \hat{\beta}_{i,+1}}{1 + 2\hat{\rho}_m} \tag{4.7}\]

where $\hat{\beta}_{i,k}$ is the slope from regressing $r_{i,t}$ on $r_{m,t-k}$ alone, and $\hat{\rho}_m$ is the first-order autocorrelation of the market return. The Scholes-Williams estimator is consistent under the assumption that non-trading is the sole source of serial cross-correlation, while the Dimson estimator is more robust to additional sources of lead-lag structure.
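As a purely illustrative calculation with made-up numbers (not estimates from the Vietnamese data), suppose the three univariate slopes are 0.25, 0.50, and 0.02, and the market autocorrelation is 0.05. Equation 4.7 then gives:

\[ \hat{\beta}_i^{SW} = \frac{0.25 + 0.50 + 0.02}{1 + 2(0.05)} = \frac{0.77}{1.10} = 0.70 \]

The contemporaneous slope alone (0.50) would understate the corrected beta. The denominator scales the summed slopes down because, when the market return is positively autocorrelated, each univariate slope partly picks up the effect of the adjacent days' market returns.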

def estimate_sw_beta(group):
    """Estimate the Scholes-Williams beta for a single stock."""
    g = group.dropna(subset=["ret", "rm", "rm_lag1", "rm_lead1"])
    if len(g) < 60:
        return np.nan

    # Three separate univariate regressions on the lagged, contemporaneous,
    # and leading market return
    beta_lag = sm.OLS(g["ret"], sm.add_constant(g["rm_lag1"])).fit().params.iloc[1]
    beta_0 = sm.OLS(g["ret"], sm.add_constant(g["rm"])).fit().params.iloc[1]
    beta_lead = sm.OLS(g["ret"], sm.add_constant(g["rm_lead1"])).fit().params.iloc[1]

    # First-order autocorrelation of the market return over the stock's sample
    rho_m = g["rm"].autocorr(lag=1)

    return (beta_lag + beta_0 + beta_lead) / (1 + 2 * rho_m)

# Align on symbol rather than on row position
beta_sw = (
    prices_daily
    .groupby("symbol")
    .apply(estimate_sw_beta, include_groups=False)
)
beta_comparison["beta_sw"] = beta_comparison["symbol"].map(beta_sw)

4.7 Implications for Portfolio Construction

The microstructure frictions documented above have direct consequences for portfolio construction, particularly for strategies that involve rebalancing across the full cross-section of listed firms.

4.7.1 Equal-Weighted vs. Value-Weighted Returns

Equal-weighted portfolio returns give the same weight to each stock, including illiquid small-cap stocks that may contribute stale or noisy prices. Value-weighted returns tilt toward large, liquid stocks and are less susceptible to microstructure contamination.

monthly_returns = (
    prices_daily
    .groupby(["symbol", "year_month"])
    .agg(
        monthly_ret=("ret", lambda x: (1 + x).prod() - 1),
        last_mktcap=("mktcap", "last")
    )
    .reset_index()
)
monthly_returns["date"] = monthly_returns["year_month"].dt.to_timestamp()

# Equal-weighted
ew_ret = monthly_returns.groupby("date")["monthly_ret"].mean().reset_index(name="ew")

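# Value-weighted, using the previous month-end market cap as weights so that
# returns are not weighted by their own end-of-period outcome
# (the one-row shift assumes consecutive months for each stock)
monthly_returns = monthly_returns.sort_values(["symbol", "year_month"])
monthly_returns["lag_mktcap"] = (
    monthly_returns.groupby("symbol")["last_mktcap"].shift(1)
)

def vw_return(group):
    valid = group.dropna(subset=["monthly_ret", "lag_mktcap"])
    if valid.empty:
        return np.nan
    w = valid["lag_mktcap"] / valid["lag_mktcap"].sum()
    return (w * valid["monthly_ret"]).sum()

vw_ret = (
    monthly_returns.groupby("date")
    .apply(vw_return, include_groups=False)
    .reset_index(name="vw")
)

# Drop months without lagged weights so both series cover the same dates
port_comp = ew_ret.merge(vw_ret, on="date").dropna()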

fig, ax = plt.subplots(figsize=(8, 4))
for col, label, color in [
    ("ew", "Equal-Weighted", "#FF6B6B"),
    ("vw", "Value-Weighted", "#2C73D2")
]:
    cum_ret = (1 + port_comp[col]).cumprod()
    ax.plot(port_comp["date"], cum_ret, label=label, color=color, linewidth=1.2)

ax.set_ylabel("Cumulative Return (Growth of 1 VND)")
ax.set_xlabel("")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()

Figure 4.6

A persistent divergence between equal-weighted and value-weighted cumulative returns is consistent with microstructure effects, although it can also reflect a genuine size premium. To the extent that the divergence is driven by illiquid stocks, the equal-weighted portfolio overstates attainable returns because it implicitly assumes costless trading in those stocks.

4.7.2 Recommended Liquidity Filters

Based on the diagnostics developed in this chapter, we recommend the following pre-analysis filters:

Tip

Always report results with and without liquidity filters. If results are qualitatively different, the baseline findings may be driven by microstructure artifacts rather than genuine economic effects.

4.7.3 Monthly vs. Daily Frequency

For most asset pricing applications, monthly return aggregation is preferable to daily analysis in Vietnam because:

Monthly returns smooth out intraday noise, bid-ask bounce, and price limit effects.
Stocks that trade infrequently within a month still produce a meaningful monthly return.
Factor portfolio sorts are conventionally conducted at monthly frequency.
Statistical tests have better size properties when microstructure noise is reduced.

However, monthly aggregation does not eliminate all biases. Stocks with zero returns for an entire month still contribute stale observations. The Dimson and Scholes-Williams corrections should still be applied at monthly frequency for beta estimation.

4.8 Implications for Asset Pricing Tests

4.8.1 Factor Model Estimation

Standard factor model estimation assumes that returns are observed synchronously and without censoring. In Vietnam, both assumptions are violated. Table 4.11 summarizes the practical consequences.

Table 4.11: Standard Assumptions and Their Violations

Assumption	Violation in Vietnam	Consequence
Synchronous observation	Thin trading	Biased betas, attenuated R²
Uncensored returns	Price limits	Truncated distributions, biased moments
Continuous prices	Discrete ticks	Return discreteness, bid-ask bounce
No transaction costs	Wide spreads	Overstated portfolio returns

4.8.2 Adjusted Testing Procedure

We recommend the following adjustments to standard asset pricing tests when applied to Vietnamese data:

Beta estimation: Use Dimson ($K \ge 1$) or Scholes-Williams betas, not OLS betas.
Factor construction: When forming size and value portfolios, apply liquidity filters before sorting. Consider excluding the smallest quintile of stocks by market capitalization, which is most affected by thin trading.
Return aggregation: Use monthly frequency. If daily analysis is necessary, include lagged market returns in the time-series regression.
Robust inference: In panel regressions, cluster standard errors by stock to account for persistent microstructure-induced serial correlation. In time-series regressions, use Newey-West HAC standard errors with a lag length long enough to span the expected dependence.
Price limit adjustment: For volatility analysis or risk measurement, consider the Chu and Qiu (2019) approach of modeling the latent (uncensored) return distribution using a censored regression:

\[ r_{i,t}^* \sim N(\mu_i, \sigma_i^2), \quad r_{i,t}^{obs} = \max(\underline{L}, \min(\bar{L}, r_{i,t}^*)) \tag{4.8}\]

Estimate $\mu_i$ and $\sigma_i^2$ via maximum likelihood for the censored normal: interior observations contribute the normal density, and observations at a limit contribute the probability that the latent return lies at or beyond that limit.

from scipy.optimize import minimize
from scipy.stats import norm

def censored_normal_nll(params, returns, lower, upper):
    """Negative log-likelihood of a normal censored at the price limits."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize over log(sigma) to keep sigma positive

    # Interior observations contribute the normal density
    interior = (returns > lower) & (returns < upper)
    ll_interior = norm.logpdf(returns[interior], mu, sigma)

    # Observations at the lower limit contribute P(r* <= lower)
    ll_lower = norm.logcdf(lower, mu, sigma)
    n_lower = (returns <= lower).sum()

    # Observations at the upper limit contribute P(r* >= upper)
    ll_upper = norm.logsf(upper, mu, sigma)
    n_upper = (returns >= upper).sum()

    return -(ll_interior.sum() + n_lower * ll_lower + n_upper * ll_upper)

def estimate_true_volatility(returns, limit_band):
    """Estimate latent volatility correcting for price limit censoring."""
    returns = returns.dropna()
    result = minimize(
        censored_normal_nll,
        x0=[returns.mean(), np.log(returns.std())],
        args=(returns.values, -limit_band, limit_band),
        method="Nelder-Mead"
    )
    if not result.success:
        return np.nan  # optimizer did not converge
    return np.exp(result.x[1])
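The censoring correction in Equation 4.8 can be sanity-checked on simulated data where the latent volatility is known. The sketch below is self-contained and illustrative only: the function name `censored_nll`, the latent volatility of 4%, the $\pm$ 7% band, and the sample size are assumptions chosen for the example, not values taken from the data.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def censored_nll(params, r, lower, upper):
    """Negative log-likelihood of a normal censored at [lower, upper]."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    at_lower = r <= lower
    at_upper = r >= upper
    interior = ~(at_lower | at_upper)
    ll = norm.logpdf(r[interior], mu, sigma).sum()
    ll += at_lower.sum() * norm.logcdf(lower, mu, sigma)  # P(latent <= lower)
    ll += at_upper.sum() * norm.logsf(upper, mu, sigma)   # P(latent >= upper)
    return -ll

# Simulate latent returns and clip them at a hypothetical +/- 7% limit
rng = np.random.default_rng(42)
true_sigma = 0.04
band = 0.07
latent = rng.normal(0.0, true_sigma, 5000)
observed = np.clip(latent, -band, band)

result = minimize(
    censored_nll,
    x0=[observed.mean(), np.log(observed.std())],
    args=(observed, -band, band),
    method="Nelder-Mead",
)
sigma_hat = np.exp(result.x[1])

print(f"Observed (censored) std: {observed.std():.4f}")
print(f"Censoring-corrected std: {sigma_hat:.4f}")
print(f"True latent std:         {true_sigma:.4f}")
```

In this simulation the sample standard deviation of the clipped returns should fall below the latent volatility, while the likelihood-based estimate should land close to it. With real data, observed returns at the limit may differ slightly from the nominal band because of tick rounding, so a small tolerance around the limits may be needed.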

Sensitivity reporting: Always report key results under alternative specifications: with and without liquidity filters, using OLS vs. Dimson betas, at daily vs. monthly frequency, and using observed vs. censoring-corrected volatility.
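The Newey-West recommendation in the list above can be sketched with a hand-rolled Bartlett-kernel estimator. This is an illustrative implementation without small-sample corrections, run on simulated data with autocorrelated regressors and errors; the function name `newey_west_se`, the AR(1) coefficients, and the lag length of 10 are assumptions. In practice, statsmodels provides the same type of estimator through `fit(cov_type="HAC", cov_kwds={"maxlags": ...})`.

```python
import numpy as np

def newey_west_se(y, X, lags):
    """OLS coefficients with Newey-West (Bartlett kernel) standard errors."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    Xu = X * u[:, None]            # score contributions x_t * u_t
    S = Xu.T @ Xu                  # lag-0 term (White/HC0 "meat")
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)  # Bartlett weight
        G = Xu[l:].T @ Xu[:-l]      # lag-l autocovariance of the scores
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ S @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

def ar1(phi, n, rng):
    """Simulate a zero-mean AR(1) series."""
    e = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

rng = np.random.default_rng(1)
n = 2000
x = ar1(0.8, n, rng)
u = ar1(0.8, n, rng)               # serially correlated errors
y = 1.0 + 0.5 * x + u

beta, se_nw = newey_west_se(y, x, lags=10)
_, se_white = newey_west_se(y, x, lags=0)  # lags=0 reduces to HC0

print(f"Slope: {beta[1]:.3f}")
print(f"HC0 SE: {se_white[1]:.4f}, Newey-West SE (10 lags): {se_nw[1]:.4f}")
```

When both the regressor and the error are persistent, the heteroskedasticity-only standard error tends to be too small, and the HAC standard error widens to reflect the serial dependence.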

4.9 Summary

This chapter has established that Vietnamese equity markets exhibit microstructure characteristics that materially affect observed prices, returns, and risk measures. The key findings are:

Price limits censor daily returns, inducing positive autocorrelation, volatility spillover, and truncated distributions. The $\pm$ 7% band on HOSE is particularly restrictive for volatile stocks.
Thin trading and zero returns afflict a substantial fraction of listed firms. Trading probabilities below 50% are common on HNX and UPCoM, generating non-synchronous trading bias that attenuates OLS beta estimates by 20-40%.
Illiquidity varies dramatically across the cross-section, with Amihud ratios spanning several orders of magnitude. Value-weighted portfolio returns are less contaminated than equal-weighted returns.
The Dimson and Scholes-Williams beta corrections effectively address non-synchronous trading bias and should be used as the default beta estimator for Vietnamese equities.
Liquidity filters should be applied before any asset pricing analysis, and results should be reported with and without these filters as a robustness check.

Ignoring these frictions does not merely add noise to empirical results; it systematically biases estimates in predictable directions. The diagnostics and corrections presented in this chapter provide the foundation for credible empirical asset pricing in Vietnam.

Amihud, Yakov. 2002. “Illiquidity and Stock Returns: Cross-Section and Time-Series Effects.” Journal of Financial Markets 5 (1): 31–56.

Bali, Turan G, Robert F Engle, and Scott Murray. 2016. Empirical Asset Pricing: The Cross Section of Stock Returns. John Wiley & Sons.

Barber, Brad M, Yi-Tsung Lee, Yu-Jane Liu, and Terrance Odean. 2009. “Just How Much Do Individual Investors Lose by Trading?” The Review of Financial Studies 22 (2): 609–32.

Bessembinder, Hendrik. 2003. “Trade Execution Costs and Market Quality After Decimalization.” Journal of Financial and Quantitative Analysis 38 (4): 747–77.

Chu, Xiaojun, and Jianying Qiu. 2019. “Forecasting Volatility with Price Limit Hits—Evidence from Chinese Stock Market.” Emerging Markets Finance and Trade 55 (5): 1034–50.

Comerton-Forde, Carole, and Kar Mei Tang. 2009. “Anonymity, Liquidity and Fragmentation.” Journal of Financial Markets 12 (3): 337–67.

Dimson, Elroy. 1979. “Risk Measurement When Shares Are Subject to Infrequent Trading.” Journal of Financial Economics 7 (2): 197–226.

Fama, Eugene F., and Kenneth R. French. 1989. “Business Conditions and Expected Returns on Stocks and Bonds.” Journal of Financial Economics 25 (1): 23–49. https://doi.org/10.1016/0304-405X(89)90095-0.

Glosten, Lawrence R, and Paul R Milgrom. 1985. “Bid, Ask and Transaction Prices in a Specialist Market with Heterogeneously Informed Traders.” Journal of Financial Economics 14 (1): 71–100.

Goyenko, Ruslan Y, Craig W Holden, and Charles A Trzcinka. 2009. “Do Liquidity Measures Measure Liquidity?” Journal of Financial Economics 92 (2): 153–81.

Goyenko, Ruslan Y, and Andrey D Ukhov. 2009. “Stock and Bond Market Liquidity: A Long-Run Empirical Analysis.” Journal of Financial and Quantitative Analysis 44 (1): 189–212.

Hasbrouck, Joel. 2007. Empirical Market Microstructure: The Institutions, Economics, and Econometrics of Securities Trading. Oxford University Press.

———. 2009. “Trading Costs and Returns for US Equities: Estimating Effective Costs from Daily Data.” The Journal of Finance 64 (3): 1445–77.

Hillion, Pierre, and Matti Suominen. 2004. “The Manipulation of Closing Prices.” Journal of Financial Markets 7 (4): 351–75.

Huang, Roger D, and Hans R Stoll. 1997. “The Components of the Bid-Ask Spread: A General Approach.” The Review of Financial Studies 10 (4): 995–1034.

Kaniel, Ron, Shuming Liu, Gideon Saar, and Sheridan Titman. 2012. “Individual Investor Trading and Return Patterns Around Earnings Announcements.” The Journal of Finance 67 (2): 639–80.

Kim, Kenneth A, and S Ghon Rhee. 1997. “Price Limit Performance: Evidence from the Tokyo Stock Exchange.” The Journal of Finance 52 (2): 885–901.

Kyle, Albert S. 1985. “Continuous Auctions and Insider Trading.” Econometrica 53 (6): 1315–35.

Lo, Andrew W, and A Craig MacKinlay. 1990. “An Econometric Analysis of Nonsynchronous Trading.” Journal of Econometrics 45 (1-2): 181–211.

Pástor, Ľuboš, and Robert F Stambaugh. 2003. “Liquidity Risk and Expected Stock Returns.” Journal of Political Economy 111 (3): 642–85.

Roll, Richard. 1984. “A Simple Implicit Measure of the Effective Bid-Ask Spread in an Efficient Market.” The Journal of Finance 39 (4): 1127–39.

Scholes, Myron, and Joseph Williams. 1977. “Estimating Betas from Nonsynchronous Data.” Journal of Financial Economics 5 (3): 309–27.

Subrahmanyam, Avanidhar. 1994. “Circuit Breakers and Market Volatility: A Theoretical Perspective.” The Journal of Finance 49 (1): 237–54.

Vo, Duc Hong, and Bao Doan. 2023. “Minimum Tick Size, Market Quality and Costs of Trade Execution in Vietnam.” PLOS ONE 18 (5): e0285821.

# Market Microstructure in Vietnam ::: callout-note In this chapter, we examine how the institutional design of Vietnamese equity markets, such as trading sessions, price limits, order types, and investor composition, shapes observed prices, returns, and liquidity. We quantify microstructure frictions and demonstrate why ignoring these frictions leads to biased inference in asset pricing tests. ::: Market microstructure is the study of how trading rules, order handling mechanisms, and market design affect price formation, transaction costs, and liquidity. The field, pioneered by @kyle1985continuous, @glosten1985bid, and @hasbrouck2007empirical, provides the analytical toolkit for understanding why observed prices may deviate from fundamental values and for how long. In developed markets with continuous electronic trading, designated market makers, and minimal regulatory constraints on price movement, microstructure frictions are typically second-order concerns for researchers working at monthly or lower frequencies. In Vietnam's equity markets, this is emphatically not the case. Daily price limits, thin trading, a predominantly retail investor base, discrete tick sizes, and the absence of formal market-making arrangements generate frictions that propagate into monthly returns, distort factor loadings, and bias portfolio-level inference. Any serious empirical analysis of Vietnamese equities must therefore begin with a careful assessment of market microstructure. This chapter provides that assessment. We first describe the institutional architecture of Vietnamese equity trading. We then develop diagnostics for the most consequential frictions, such as price limit hits, zero-return days, illiquidity, and non-synchronous trading, and quantify their severity in the cross-section of listed firms. Finally, we derive practical guidance for adjusting portfolio construction and asset pricing tests. ## What Is Market Microstructure? 
The textbook assumption of frictionless markets implies that prices continuously and costlessly incorporate information. Under this assumption, the observed return on any asset at any frequency equals the "true" return dictated by fundamentals. Market microstructure relaxes this assumption by recognizing that prices are generated by a specific trading process with real costs, constraints, and imperfections. The canonical framework of @kyle1985continuous models a market with three types of participants: 1. an informed trader who knows the asset's fundamental value, 2. noise traders who trade for liquidity reasons, and 3. a market maker who sets prices to break even in expectation. The key insight is that the market maker cannot distinguish informed from uninformed order flow, so prices adjust gradually to information, creating a wedge between the transaction price and the fundamental value. The size of this wedge (the bid-ask spread) and the speed of price adjustment (market depth) are the core objects of microstructure theory. @glosten1985bid extend this framework to a sequential trade setting and show that the bid-ask spread has two components: an adverse selection component (compensation for trading against informed traders) and an order processing component (compensation for the mechanical costs of trading). @huang1997components further decompose the spread into realized spread and price impact components. These decompositions are important because they reveal different sources of trading costs and have different implications for market quality. For empirical asset pricing, the key question is: at what frequency and under what conditions do microstructure effects become negligible? In highly liquid markets, @bali2016empirical argue that monthly returns are largely free of microstructure contamination. In Vietnam, as we demonstrate below, this is not the case. 
Microstructure effects persist at monthly and even quarterly frequencies for a substantial fraction of listed firms. ### The Microstructure-Asset Pricing Interface The interface between microstructure and asset pricing operates through several channels. First, illiquidity itself may be a priced risk factor. @amihud2002illiquidity shows that expected illiquidity is positively related to expected stock returns, implying a liquidity premium. @pastor2003liquidity develop an equilibrium model in which liquidity risk (i.e., the covariance of a stock's liquidity with market liquidity) commands a risk premium. Second, microstructure noise in prices biases estimated betas, factor loadings, and test statistics. @scholes1977estimating first identified this bias in the context of non-synchronous trading, and @dimson1979risk proposed an aggregated-coefficients estimator to correct it. Third, price limits and other regulatory constraints censor the return distribution, creating truncation bias in volatility estimates, return moments, and extreme-value statistics [@kim1997price]. @tbl-microstructure-interface summarizes these channels and their empirical consequences. 
| Channel | Mechanism | Empirical Consequence | |--------------------|------------------|----------------------------------| | Illiquidity premium | Compensation for bearing transaction costs and inventory risk | Cross-sectional return predictability by liquidity measures | | Non-synchronous trading | Infrequent trading creates stale prices | Downward-biased betas, attenuated correlations, and spurious lead-lag | | Price limits | Regulatory censoring of daily returns | Truncated return distributions, volatility spillover, and artificial autocorrelation | | Discrete tick sizes | Prices constrained to a grid | Bid-ask bounce, return discreteness, biased volatility | | Investor composition | Retail-dominated order flow | Noise trading, herding, sentiment-driven pricing | : Channels Through Which Microstructure Affects Asset Pricing {#tbl-microstructure-interface} ## Trading Architecture in Vietnam Vietnam operates two stock exchanges: the Ho Chi Minh Stock Exchange (HOSE), established in 2000, and the Hanoi Stock Exchange (HNX), established in 2005. HOSE lists larger firms and accounts for the majority of market capitalization and trading volume. HNX lists smaller firms and also operates the Unlisted Public Company Market (UPCoM) for firms that have not yet met full listing requirements. All three venues operate electronic limit order book systems without designated market makers. ### Exchange Characteristics @tbl-exchange-comparison presents the key structural differences between HOSE, HNX, and UPCoM. These differences have direct implications for liquidity, price discovery, and the severity of microstructure frictions. 
| Feature | HOSE | HNX | UPCoM | |-------------------|------------------|------------------|------------------| | Established | 2000 | 2005 | 2009 | | Listing tier | Large-cap | Mid/small-cap | Pre-listing | | Daily price limit | $\pm$ 7% | $\pm$ 10% | $\pm$ 15% | | Tick size regime | Tiered by price | Tiered by price | 100 VND | | Trading lot | 100 shares | 100 shares | 100 shares | | Short selling | Limited | Not available | Not available | | Foreign ownership cap | Industry-specific | Industry-specific | Industry-specific | : Exchange Comparison: HOSE, HNX, and UPCoM {#tbl-exchange-comparison} The heterogeneous price limit bands across exchanges create a natural experiment for studying limit effects. HOSE's tighter $\pm$ 7% band means that large-cap stocks are more frequently constrained than mid-cap stocks on HNX, conditional on the same information shock. UPCoM's wider $\pm$ 15% band provides the least constrained environment, though its stocks are also the least liquid. ### Trading Sessions Each exchange operates a structured trading day with distinct sessions. Understanding session structure is essential because price formation mechanisms differ across sessions, and certain sessions are disproportionately important for benchmark pricing (@tbl-trading-sessions). 
| Session | Time | Mechanism | Price Discovery Role | |---|---|---|---| | Pre-opening | 08:30–09:00 | Order entry only, no matching | Reveals pre-open demand/supply | | Opening auction (ATO) | 09:00–09:15 | Batch auction, single price | Sets opening price from accumulated orders | | Continuous trading (Morning) | 09:15–11:30 | Continuous limit order matching | Primary price discovery | | Lunch break | 11:30–13:00 | No trading | — | | Continuous trading (Afternoon) | 13:00–14:30 | Continuous limit order matching | Primary price discovery | | Closing auction (ATC) | 14:30–14:45 | Batch auction, single price | Sets closing price (benchmark) | | Post-closing | 14:45–15:00 | Put-through (negotiated) trades | Block and negotiated transactions | : Trading Session Structure on HOSE {#tbl-trading-sessions} The closing auction (ATC) deserves particular attention. The ATC price is the official closing price used for index calculation, NAV computation, and margin requirements. Because it is determined by a single-bid auction, it can be manipulated by strategically timed orders, a phenomenon documented in numerous emerging markets [@comerton2009anonymity; @hillion2004manipulation]. Researchers using daily closing prices should be aware that ATC prices may not reflect the continuous-session equilibrium, particularly for less liquid stocks where a single large order can move the closing price. ### Order Types and Matching Rules Vietnamese exchanges support a limited set of order types compared to developed markets (@tbl-order-types). 
| Order Type | Description | Availability | |-----------------------|-------------------------|-------------------------| | Limit order (LO) | Specifies price and quantity | All sessions | | Market order (ATO/ATC) | Matches at auction price | Auction sessions only | | Market-to-limit (MTL) | Converts to limit at best available | HNX only | : Available Order Types {#tbl-order-types} The absence of iceberg orders, stop orders, and hidden orders means that the full limit order book is visible to all participants. While this enhances pre-trade transparency, it also means that large institutional orders face significant information leakage risk, which may deter institutional participation and reduce market depth. Orders are matched on a strict price-time priority basis during continuous sessions. During auction sessions, a single clearing price is determined that maximizes executed volume. If multiple prices satisfy this criterion, the price closest to the previous closing price is selected. ### Tick Size Structure Tick sizes on HOSE are tiered by price level, which creates discontinuities in the bid-ask spread as a percentage of price (@tbl-tick-sizes). | Price Range (VND) | Tick Size (VND) | Minimum Spread as % of Midpoint | |---|---|---| | < 10,000 | 10 | 0.10% at 10,000 | | 10,000–49,900 | 50 | 0.10% at 50,000 | | ≥ 50,000 | 100 | 0.20% at 50,000 | : Tick Size Schedule on HOSE {#tbl-tick-sizes} The jump from a 50 VND tick to a 100 VND tick at the 50,000 VND boundary means that the minimum percentage spread doubles discontinuously. This creates a "tick size cliff" that can affect the cross-sectional distribution of bid-ask spreads and, consequently, the measurement of illiquidity [@vo2023minimum]. @bessembinder2003trade document similar effects in other markets with tiered tick structures. ### Investor Composition The Vietnamese equity market is predominantly driven by retail investors. 
While foreign institutional investors account for a meaningful share of market capitalization (particularly in blue-chip stocks subject to foreign ownership limits), daily trading volume is overwhelmingly generated by domestic retail accounts. ```{python} #| label: fig-investor-composition #| include: false #| fig-cap: "Approximate Investor Composition by Trading Value on HOSE" import matplotlib.pyplot as plt import numpy as np categories = [ "Domestic Retail", "Domestic Institutional", "Foreign Institutional", "Foreign Retail" ] shares = [85, 5, 8, 2] colors = ["#2C73D2", "#0089BA", "#00B7C7", "#5DCEAF"] fig, ax = plt.subplots(figsize=(6.5, 4)) bars = ax.barh(categories, shares, color=colors, edgecolor="white", height=0.6) for bar, share in zip(bars, shares): ax.text( bar.get_width() + 1, bar.get_y() + bar.get_height() / 2, f"{share}%", va="center", fontsize=10 ) ax.set_xlabel("Share of Trading Value (%)") ax.set_xlim(0, 100) ax.spines["top"].set_visible(False) ax.spines["right"].set_visible(False) plt.tight_layout() plt.show() ``` This retail dominance has several consequences for microstructure. First, retail investors tend to submit smaller orders and trade more frequently, generating high message-to-trade ratios but limited depth at each price level. Second, retail order flow is more susceptible to herding and sentiment, which can amplify momentum and generate excess volatility [@barber2009just; @kaniel2012individual]. Third, the limited institutional presence means that sophisticated liquidity provision is scarce, particularly in mid- and small-cap stocks. ## Price Limits and Their Consequences Vietnam enforces daily price limits on all listed equities. A stock's price cannot move beyond a fixed percentage of the previous day's closing price within a single trading day. The limit bands are $\pm$ 7% on HOSE, $\pm$ 10% on HNX, and $\pm$ 15% on UPCoM. 
### Theoretical Framework

Price limits were introduced with the stated goal of reducing volatility and preventing panic-driven price dislocations. However, the academic literature presents a more nuanced picture. The "magnet effect" hypothesis [@subrahmanyam1994circuit] predicts that price limits actually accelerate price movement toward the limit as traders rush to execute before the limit is hit. The "delayed price discovery" hypothesis [@Fama1989] argues that limits merely postpone inevitable price adjustments, creating volatility spillover into subsequent days.

Formally, let $P_t^*$ denote the equilibrium price on day $t$ and $P_{t-1}^c$ the previous closing price. The observed return is:

$$
r_t^{obs} = \begin{cases}
\bar{L} & \text{if } r_t^* \geq \bar{L} \\
r_t^* & \text{if } \underline{L} < r_t^* < \bar{L} \\
\underline{L} & \text{if } r_t^* \leq \underline{L}
\end{cases}
$$ {#eq-price-limit}

where $r_t^* = \ln(P_t^* / P_{t-1}^c)$ is the latent (unconstrained) return, $\bar{L}$ is the upper limit, and $\underline{L}$ is the lower limit. The observed return $r_t^{obs}$ is a censored version of the true return. This censoring has several consequences:

1.  **Truncated moments**: The observed variance satisfies $\text{Var}(r_t^{obs}) < \text{Var}(r_t^*)$ because extreme returns are clipped. This biases downward any volatility-based risk measure.
2.  **Artificial autocorrelation**: When $r_t^{obs} = \bar{L}$ and $r_{t+1}^{obs} > 0$ (continued adjustment the next day), the return series exhibits positive autocorrelation that is purely mechanical, not informational.
3.  **Volatility spillover**: Define excess volatility on day $t+1$ as $\sigma_{t+1}^2 - E[\sigma_{t+1}^2 | \text{no limit hit on day } t]$. @kim1997price and @chu2019forecasting document significant positive spillover, where days following limit hits exhibit abnormally high volatility.
4.  **Biased extreme value statistics**: Measures such as Value-at-Risk, Expected Shortfall, and maximum drawdown are mechanically bounded by the limit, understating true tail risk.

### Detecting Price Limit Hits

We now implement a diagnostic to detect price limit hits in the daily data.

```{python}
#| label: detect-limit-hits

import pandas as pd
import numpy as np
import sqlite3
import matplotlib.pyplot as plt  # used by the figure cells in this chapter

# Load daily price data
tidy_finance = sqlite3.connect(database="data/tidy_finance_python.sqlite")

# The cells below assume prices_daily contains: symbol, date, close, exchange.
# Only symbol, date, and close are queried here; add `exchange` to the SELECT
# if the table provides it.
prices_daily = pd.read_sql_query(
    sql="""
        SELECT symbol, date, close
        FROM prices_daily
    """,
    con=tidy_finance,
    parse_dates=["date"]
).dropna()
```

```{python}
#| eval: false
# Define limit bands by exchange
limit_bands = {"HOSE": 0.07, "HNX": 0.10, "UPCoM": 0.15}

prices_daily = prices_daily.sort_values(["symbol", "date"])
prices_daily["prev_close"] = prices_daily.groupby("symbol")["close"].shift(1)
prices_daily["ret"] = prices_daily["close"] / prices_daily["prev_close"] - 1
prices_daily["limit_band"] = prices_daily["exchange"].map(limit_bands)

# A limit hit occurs when the return is within 0.1 percentage points
# of the theoretical limit
tolerance = 0.001
prices_daily["upper_hit"] = (
    prices_daily["ret"] >= prices_daily["limit_band"] - tolerance
)
prices_daily["lower_hit"] = (
    prices_daily["ret"] <= -prices_daily["limit_band"] + tolerance
)
prices_daily["limit_hit"] = (
    prices_daily["upper_hit"] | prices_daily["lower_hit"]
)
```

### Frequency of Limit Hits

```{python}
#| eval: false
#| label: fig-limit-hit-frequency
#| fig-cap: "Monthly Frequency of Price Limit Hits by Exchange"

prices_daily["year_month"] = prices_daily["date"].dt.to_period("M")

limit_hit_monthly = (
    prices_daily
    .groupby(["year_month", "exchange"])
    .agg(
        total_obs=("limit_hit", "count"),
        limit_hits=("limit_hit", "sum")
    )
    .reset_index()
)
limit_hit_monthly["hit_rate"] = (
    limit_hit_monthly["limit_hits"] / limit_hit_monthly["total_obs"]
)
limit_hit_monthly["date"] = limit_hit_monthly["year_month"].dt.to_timestamp()

fig, ax = plt.subplots(figsize=(8, 4))
for exchange, color in zip(
    ["HOSE", "HNX", "UPCoM"],
    ["#2C73D2", "#FF6B6B", "#5DCEAF"]
):
    subset = limit_hit_monthly[limit_hit_monthly["exchange"] == exchange]
    ax.plot(
        subset["date"], subset["hit_rate"] * 100,
        label=exchange, color=color, linewidth=1.2
    )

ax.set_ylabel("Limit Hit Rate (%)")
ax.set_xlabel("")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()
```

### Volatility Spillover Test

Following @kim1997price, we test whether days following a limit hit exhibit abnormally high volatility. Define the dummy variable $D_t = 1$ if a limit hit occurred on day $t$, and estimate:

$$
\sigma_{t+1}^2 = \alpha + \beta D_t + \gamma \sigma_t^2 + \varepsilon_{t+1}
$$ {#eq-vol-spillover}

where $\sigma_t^2$ is proxied by the squared daily return. A positive and significant $\beta$ indicates volatility spillover attributable to the price limit.

```{python}
#| eval: false
#| label: volatility-spillover-test

import statsmodels.api as sm

# Panel-level volatility spillover test
prices_daily["sq_ret"] = prices_daily["ret"] ** 2
prices_daily["sq_ret_lead"] = prices_daily.groupby("symbol")["sq_ret"].shift(-1)
prices_daily["limit_hit_int"] = prices_daily["limit_hit"].astype(int)

spillover_data = prices_daily.dropna(subset=["sq_ret_lead", "sq_ret"])

X = sm.add_constant(spillover_data[["limit_hit_int", "sq_ret"]])
y = spillover_data["sq_ret_lead"]

# Cluster standard errors by stock
model = sm.OLS(y, X).fit(
    cov_type="cluster",
    cov_kwds={"groups": spillover_data["symbol"]}
)

spillover_results = pd.DataFrame({
    "Coefficient": model.params,
    "Std. Error": model.bse,
    "t-stat": model.tvalues,
    "p-value": model.pvalues
}).round(6)

print(spillover_results)
```

::: callout-tip
A significant positive coefficient on the limit hit dummy confirms that Vietnamese price limits do not eliminate volatility; they merely redistribute it across days.
This has direct implications for risk management: daily VaR measures computed from censored returns understate true risk exposure.
:::

### Return Autocorrelation Induced by Price Limits

Price limits mechanically induce positive autocorrelation in returns. To quantify this, we compute the first-order autocorrelation coefficient separately for stocks that hit limits frequently versus those that do not.

```{python}
#| eval: false
#| label: tbl-autocorrelation-by-limit
#| tbl-cap: "Return Autocorrelation by Price Limit Hit Frequency"

# Classify stocks by limit hit frequency
stock_limit_freq = (
    prices_daily
    .groupby("symbol")
    .agg(
        hit_rate=("limit_hit", "mean"),
        n_obs=("ret", "count")
    )
    .query("n_obs >= 250")  # At least 1 year of data
)

# Rank before binning: many stocks can share a hit rate of exactly zero,
# and tied bin edges would otherwise make pd.qcut raise an error
stock_limit_freq["limit_group"] = pd.qcut(
    stock_limit_freq["hit_rate"].rank(method="first"),
    q=3,
    labels=["Low", "Medium", "High"]
)

# Compute pooled first-order autocorrelation by group
def compute_autocorr(group_symbols):
    subset = prices_daily[prices_daily["symbol"].isin(group_symbols)].copy()
    subset["ret_lag"] = subset.groupby("symbol")["ret"].shift(1)
    return subset[["ret", "ret_lag"]].dropna().corr().iloc[0, 1]

autocorr_results = []
for group in ["Low", "Medium", "High"]:
    symbols = stock_limit_freq[stock_limit_freq["limit_group"] == group].index
    ac = compute_autocorr(symbols)
    n_stocks = len(symbols)
    avg_hit_rate = stock_limit_freq.loc[symbols, "hit_rate"].mean()
    autocorr_results.append({
        "Group": group,
        "N Stocks": n_stocks,
        "Avg Limit Hit Rate (%)": round(avg_hit_rate * 100, 2),
        "AR(1)": round(ac, 4)
    })

pd.DataFrame(autocorr_results).style.hide(axis="index")
```

The expected pattern is a monotonically increasing autocorrelation from the Low to High limit-hit group, confirming that the observed serial dependence in returns is at least partly an artifact of price censoring rather than genuine return predictability.
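
The censoring mechanism in @eq-price-limit can be illustrated on simulated data. The following is a minimal sketch under stylized assumptions, not the chapter's estimation: latent returns are drawn i.i.d. normal, clipped at a hypothetical ±7% band, and any clipped portion is carried over to the next day (the "delayed price discovery" channel). The function names, the volatility of 4% per day, and the carry-over rule are all illustrative assumptions.

```python
import numpy as np

def simulate_censored_returns(n_days=50_000, sigma=0.04, limit=0.07, seed=0):
    """Simulate latent returns and their price-limit-censored counterpart.

    Assumption (illustrative): whatever part of the desired price move is
    cut off by the limit is added to the next day's desired move.
    """
    rng = np.random.default_rng(seed)
    latent = rng.normal(0.0, sigma, n_days)
    observed = np.empty(n_days)
    carryover = 0.0
    for t in range(n_days):
        desired = latent[t] + carryover
        # Clip the desired move to the band [-limit, +limit]
        observed[t] = max(-limit, min(limit, desired))
        carryover = desired - observed[t]
    return latent, observed

def ar1(x):
    """First-order autocorrelation coefficient."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]

latent, observed = simulate_censored_returns()

print(f"Std of latent returns:   {latent.std():.4f}")
print(f"Std of observed returns: {observed.std():.4f}")
print(f"AR(1) of latent returns:   {ar1(latent):.4f}")
print(f"AR(1) of observed returns: {ar1(observed):.4f}")
```

In this setup the observed series has a smaller standard deviation than the latent series and a positive first-order autocorrelation, even though the latent returns are serially independent by construction.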
## Liquidity, Thin Trading, and Zero Returns

Liquidity (i.e., the ability to trade quickly at low cost without moving the price) is a first-order concern in Vietnamese equities. A substantial fraction of listed firms, particularly on HNX and UPCoM, experience chronic illiquidity characterized by infrequent trading, wide bid-ask spreads, and frequent zero-return days.

### Measuring Liquidity

The academic literature has developed numerous liquidity measures, each capturing a different dimension of market quality. @tbl-liquidity-measures summarizes the measures most applicable to Vietnamese data, given typical data availability.

| Measure | Formula | Interpretation | Data Required |
|------------------|------------------|------------------|------------------|
| Turnover ratio | $\text{TO}_{i,t} = \frac{\text{Volume}_{i,t}}{\text{Shares Outstanding}_{i}}$ | Trading intensity relative to float | Volume, shares outstanding |
| Amihud illiquidity | $\text{ILLIQ}_{i,t} = \frac{1}{D} \sum_{d=1}^{D} \frac{\lvert r_{i,d} \rvert}{V_{i,d}}$ | Price impact per unit of volume | Daily returns, daily volume |
| Zero-return proportion | $\text{ZR}_{i,t} = \frac{\#\{d : r_{i,d} = 0\}}{D}$ | Frequency of non-trading or stale pricing | Daily returns |
| Roll spread | $\hat{S}_i = 2\sqrt{-\text{Cov}(r_{i,d}, r_{i,d-1})}$ | Effective bid-ask spread estimate | Daily returns |
| Bid-ask spread | $\text{BA}_{i,d} = \frac{\text{Ask}_{i,d} - \text{Bid}_{i,d}}{(\text{Ask}_{i,d} + \text{Bid}_{i,d})/2}$ | Direct transaction cost | Quote data |

: Liquidity Measures for Vietnamese Equities {#tbl-liquidity-measures}

The Amihud illiquidity ratio [@amihud2002illiquidity] is particularly useful because it requires only daily return and volume data.
It captures the price impact of trading (i.e., the return per unit of currency volume) and has been shown to correlate well with more sophisticated microstructure-based measures such as the effective spread [see @goyenko2009liquidity and @goyenko2009stock for a comprehensive comparison].

### Computing Liquidity Diagnostics

```{python}
#| label: compute-liquidity
#| eval: false

# Compute standard liquidity measures at the stock-month level
prices_daily["abs_ret"] = prices_daily["ret"].abs()
prices_daily["zero_return"] = (prices_daily["ret"] == 0).astype(int)
prices_daily["year_month"] = prices_daily["date"].dt.to_period("M")

# Assume volume is in shares and value is in VND
# Amihud: average |ret| / value (in billions VND)
prices_daily["amihud_daily"] = np.where(
    prices_daily["value"] > 0,
    prices_daily["abs_ret"] / (prices_daily["value"] / 1e9),
    np.nan
)

liquidity_monthly = (
    prices_daily
    .groupby(["symbol", "year_month"])
    .agg(
        zero_return_share=("zero_return", "mean"),
        avg_turnover=("turnover", "mean"),
        amihud=("amihud_daily", "mean"),
        trading_days=("ret", "count"),
        avg_daily_value=("value", "mean")
    )
    .reset_index()
)

# Flag severely illiquid stock-months
# (thresholds match the recommended filters later in this chapter)
liquidity_monthly["illiquid_flag"] = (
    (liquidity_monthly["zero_return_share"] > 0.5) |
    (liquidity_monthly["trading_days"] < 15) |
    (liquidity_monthly["avg_daily_value"] < 1e8)  # < 100M VND/day
)
```

### Cross-Sectional Distribution of Liquidity

```{python}
#| label: tbl-liquidity-distribution
#| eval: false
#| tbl-cap: "Cross-Sectional Distribution of Liquidity Measures (Latest Full Year)"

latest_year = liquidity_monthly["year_month"].dt.year.max()

annual_liq = (
    liquidity_monthly[liquidity_monthly["year_month"].dt.year == latest_year]
    .groupby("symbol")
    .agg(
        zero_return_share=("zero_return_share", "mean"),
        avg_turnover=("avg_turnover", "mean"),
        amihud=("amihud", "mean"),
        avg_daily_value_m=("avg_daily_value", lambda x: x.mean() / 1e6)
    )
)

summary_stats = annual_liq.describe(percentiles=[0.10, 0.25, 0.50, 0.75, 0.90]).T
summary_stats = summary_stats[
    ["mean", "std", "10%", "25%", "50%", "75%", "90%"]
].round(4)
summary_stats.columns = ["Mean", "Std", "P10", "P25", "Median", "P75", "P90"]
summary_stats.index = [
    "Zero-Return Share",
    "Avg Daily Turnover",
    "Amihud Illiquidity",
    "Avg Daily Value (M VND)"
]
summary_stats
```

### Liquidity Distribution Across Exchanges

```{python}
#| label: fig-liquidity-by-exchange
#| eval: false
#| fig-cap: "Distribution of Zero-Return Share and Amihud Illiquidity by Exchange"

# Merge exchange info
stock_exchange = (
    prices_daily[["symbol", "exchange"]]
    .drop_duplicates("symbol")
)
annual_liq_exch = annual_liq.merge(
    stock_exchange, left_index=True, right_on="symbol"
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Zero-return share
for exchange, color in zip(
    ["HOSE", "HNX", "UPCoM"],
    ["#2C73D2", "#FF6B6B", "#5DCEAF"]
):
    subset = annual_liq_exch[annual_liq_exch["exchange"] == exchange]
    axes[0].hist(
        subset["zero_return_share"], bins=30, alpha=0.6,
        color=color, label=exchange, density=True
    )
axes[0].set_xlabel("Zero-Return Share")
axes[0].set_ylabel("Density")
axes[0].legend(frameon=False)
axes[0].spines["top"].set_visible(False)
axes[0].spines["right"].set_visible(False)

# Amihud (log scale)
for exchange, color in zip(
    ["HOSE", "HNX", "UPCoM"],
    ["#2C73D2", "#FF6B6B", "#5DCEAF"]
):
    subset = annual_liq_exch[annual_liq_exch["exchange"] == exchange]
    amihud_log = np.log(subset["amihud"].clip(lower=1e-10))
    axes[1].hist(
        amihud_log, bins=30, alpha=0.6,
        color=color, label=exchange, density=True
    )
axes[1].set_xlabel("Log Amihud Illiquidity")
axes[1].set_ylabel("Density")
axes[1].legend(frameon=False)
axes[1].spines["top"].set_visible(False)
axes[1].spines["right"].set_visible(False)

plt.tight_layout()
plt.show()
```

The distributions typically reveal a bimodal pattern: HOSE stocks cluster at low illiquidity values, while HNX and especially UPCoM stocks exhibit a long right tail of extreme illiquidity.
This heterogeneity implies that a single liquidity filter or treatment is insufficient for the entire cross-section.

### Time Variation in Aggregate Liquidity

Market-wide liquidity is not constant. It deteriorates during crises, policy uncertainty, and periods of capital outflow, and improves during bull markets and periods of foreign inflow. The time variation in aggregate liquidity is itself a risk factor [@pastor2003liquidity].

```{python}
#| label: fig-aggregate-liquidity
#| eval: false
#| fig-cap: "Time Series of Aggregate Liquidity (Market-Wide Amihud Illiquidity)"

agg_liquidity = (
    liquidity_monthly
    .groupby("year_month")
    .agg(
        median_amihud=("amihud", "median"),
        median_zero_ret=("zero_return_share", "median"),
        total_value=("avg_daily_value", "sum")
    )
    .reset_index()
)
agg_liquidity["date"] = agg_liquidity["year_month"].dt.to_timestamp()

fig, ax1 = plt.subplots(figsize=(8, 4))

ax1.plot(
    agg_liquidity["date"],
    np.log(agg_liquidity["median_amihud"].clip(lower=1e-10)),
    color="#2C73D2", linewidth=1.2
)
ax1.set_ylabel("Log Median Amihud", color="#2C73D2")
ax1.tick_params(axis="y", labelcolor="#2C73D2")

ax2 = ax1.twinx()
ax2.fill_between(
    agg_liquidity["date"],
    agg_liquidity["median_zero_ret"] * 100,
    alpha=0.3, color="#FF6B6B"
)
ax2.set_ylabel("Median Zero-Return Share (%)", color="#FF6B6B")
ax2.tick_params(axis="y", labelcolor="#FF6B6B")

ax1.spines["top"].set_visible(False)
plt.tight_layout()
plt.show()
```

::: callout-important
## Practical Recommendation

Before any asset pricing analysis, apply the following liquidity filter: exclude stock-months where the zero-return share exceeds 50%, where fewer than 15 trading days are observed, or where average daily trading value falls below a threshold (e.g., 100 million VND). Document the filter explicitly, and report sensitivity of results to alternative thresholds.
:::

## Bid-Ask Spread Estimation

In the absence of comprehensive quote data, the effective bid-ask spread can be estimated from transaction data using the method of @roll1984simple. The Roll estimator exploits the fact that if the bid-ask bounce is the sole source of negative serial covariance in returns, then:

$$
\hat{S}_{\text{Roll}} = 2\sqrt{-\text{Cov}(\Delta p_t, \Delta p_{t-1})}
$$ {#eq-roll}

where $\Delta p_t = p_t - p_{t-1}$ is the price change. When the autocovariance is positive (which occurs when information-driven serial correlation dominates the bid-ask bounce), the Roll estimator is undefined. @hasbrouck2009trading proposes a Bayesian variant that handles this case by imposing a prior on the spread.

```{python}
#| label: compute-roll-spread
#| eval: false

# Compute Roll spread estimate at the stock-month level
prices_daily["dprice"] = prices_daily.groupby("symbol")["close"].diff()
prices_daily["dprice_lag"] = prices_daily.groupby("symbol")["dprice"].shift(1)

roll_cov = (
    prices_daily
    .groupby(["symbol", "year_month"])
    .apply(
        lambda g: g[["dprice", "dprice_lag"]].dropna().cov().iloc[0, 1],
        include_groups=False
    )
    .reset_index(name="autocovariance")
)

# Roll spread is defined only when autocovariance is negative
roll_cov["roll_spread"] = np.where(
    roll_cov["autocovariance"] < 0,
    2 * np.sqrt(-roll_cov["autocovariance"]),
    np.nan
)

# As a percentage of price
roll_cov = roll_cov.merge(
    prices_daily.groupby(["symbol", "year_month"])["close"].mean()
    .reset_index(name="avg_price"),
    on=["symbol", "year_month"]
)
roll_cov["roll_spread_pct"] = roll_cov["roll_spread"] / roll_cov["avg_price"] * 100
```

```{python}
#| label: tbl-roll-spread-summary
#| eval: false
#| tbl-cap: "Distribution of Roll Spread Estimates (% of Price)"

roll_summary = (
    roll_cov
    .dropna(subset=["roll_spread_pct"])
    .groupby("year_month")["roll_spread_pct"]
    .describe(percentiles=[0.25, 0.50, 0.75])
    .reset_index()
)

# Show latest year summary
latest_year_roll = roll_cov[
    roll_cov["year_month"].dt.year == roll_cov["year_month"].dt.year.max()
]
print(
    latest_year_roll["roll_spread_pct"]
    .dropna()
    .describe(percentiles=[0.10, 0.25, 0.50, 0.75, 0.90])
    .round(3)
)
```

## Non-Synchronous Trading Bias

When stocks do not trade at the same frequency or at the same times, observed returns are misaligned. This non-synchronous trading bias, first formalized by @scholes1977estimating and @lo1990econometric, is one of the most consequential microstructure effects for asset pricing in thin markets.

### The Problem

Suppose the true (unobserved) return process for stock $i$ follows a single-factor model:

$$
r_{i,t}^* = \alpha_i + \beta_i r_{m,t}^* + \varepsilon_{i,t}
$$ {#eq-true-factor}

where $r_{m,t}^*$ is the true market return and $\beta_i$ is the true beta. If stock $i$ last traded $k$ days before the end of day $t$, the observed return incorporates information only up to day $t - k$. @scholes1977estimating show that the OLS estimate of beta from regressing observed returns on observed market returns is:

$$
\hat{\beta}_i^{OLS} = \beta_i \cdot \pi_i
$$ {#eq-beta-bias}

where $\pi_i$ is the probability that stock $i$ trades on any given day. For a stock that trades on only 50% of days, the OLS beta is biased downward by 50%. This bias is severe in Vietnam, where many small-cap stocks trade on fewer than half of all trading days.
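
The attenuation in @eq-beta-bias can be checked on simulated data. The sketch below assumes a stylized non-trading mechanism in the spirit of @lo1990econometric: the stock trades each day with probability `trade_prob`, its observed return is zero on non-trading days, and on a trading day it equals the accumulated true return since the last trade. The market return is assumed to be observed without friction. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_nontrading(beta=1.2, trade_prob=0.5, n_days=200_000, seed=1):
    """Simulate market returns and a thinly traded stock's observed returns."""
    rng = np.random.default_rng(seed)
    rm = rng.normal(0.0, 0.01, n_days)        # market return (assumed frictionless)
    eps = rng.normal(0.0, 0.02, n_days)       # idiosyncratic return
    true_ret = beta * rm + eps                # latent single-factor return
    trades = rng.random(n_days) < trade_prob  # True on days the stock trades

    # The observed log price is the cumulative true return as of the most
    # recent trading day; it stays flat until the next trade.
    cum_true = np.cumsum(true_ret)
    last_trade = np.maximum.accumulate(np.where(trades, np.arange(n_days), -1))
    obs_price = np.where(last_trade >= 0, cum_true[np.maximum(last_trade, 0)], 0.0)
    obs_ret = np.diff(obs_price, prepend=0.0)
    return rm, obs_ret

def ols_slope(y, x):
    """Slope from a univariate OLS regression of y on x."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

rm, obs_ret = simulate_nontrading()
beta_hat = ols_slope(obs_ret, rm)
print(f"True beta: 1.20, pi: 0.50, OLS beta on observed returns: {beta_hat:.3f}")
```

With a true beta of 1.2 and a trading probability of 0.5, the OLS slope on observed returns should land close to the product $\beta_i \cdot \pi_i = 0.6$.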
### Quantifying the Bias

```{python}
#| label: fig-trading-frequency
#| eval: false
#| fig-cap: "Distribution of Daily Trading Frequency Across Listed Stocks"

# Compute trading frequency: proportion of market days with nonzero volume
market_days = prices_daily.groupby("year_month")["date"].nunique()

trading_freq = (
    prices_daily[prices_daily["value"] > 0]
    .groupby(["symbol", "year_month"])["date"]
    .nunique()
    .reset_index(name="days_traded")
)
trading_freq = trading_freq.merge(
    market_days.reset_index().rename(columns={"date": "market_days"}),
    on="year_month"
)
trading_freq["trade_prob"] = trading_freq["days_traded"] / trading_freq["market_days"]

# Average over all available months for each stock
annual_trade_freq = (
    trading_freq
    .groupby("symbol")["trade_prob"]
    .mean()
    .reset_index(name="avg_trade_prob")
)

fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(
    annual_trade_freq["avg_trade_prob"], bins=50,
    color="#2C73D2", edgecolor="white", alpha=0.8
)
ax.axvline(
    annual_trade_freq["avg_trade_prob"].median(),
    color="#FF6B6B", linestyle="--", linewidth=1.5,
    label=f"Median = {annual_trade_freq['avg_trade_prob'].median():.2f}"
)
ax.set_xlabel("Average Trading Probability (Fraction of Market Days)")
ax.set_ylabel("Number of Stocks")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()
```

### The Dimson Beta Correction

@dimson1979risk proposes a simple correction: include lagged and leading market returns in the beta regression:

$$
r_{i,t} = \alpha_i + \sum_{k=-K}^{K} \beta_{i,k} \, r_{m,t-k} + \varepsilon_{i,t}
$$ {#eq-dimson}

The Dimson-corrected beta is $\hat{\beta}_i^{Dimson} = \sum_{k=-K}^{K} \hat{\beta}_{i,k}$. Typically $K = 1$ or $K = 2$ is sufficient. The summed coefficients capture the full response of the stock's observed return to market information, regardless of when the stock actually trades.
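
To see why summing the coefficients in @eq-dimson helps, consider a toy example. The sketch below is an illustration under assumed parameters, not the chapter's estimation code: the hypothetical helper `dimson_beta` fits the regression with NumPy least squares, and the simulated stock responds to market news with a weight of 0.6 on the same day and 0.4 one day later, so its total market exposure is 1.0.

```python
import numpy as np

def dimson_beta(r, rm, K=1):
    """Sum of slopes from regressing r_t on rm_{t-K}, ..., rm_{t+K}."""
    r = np.asarray(r, dtype=float)
    rm = np.asarray(rm, dtype=float)
    # Keep only days for which all K lags and K leads are available
    t = np.arange(K, len(r) - K)
    # Design matrix: constant, then rm_{t-k} for k = -K, ..., K
    columns = [np.ones(len(t))] + [rm[t - k] for k in range(-K, K + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, r[t], rcond=None)
    return coef[1:].sum()

rng = np.random.default_rng(2)
n_days = 100_000
rm = rng.normal(0.0, 0.01, n_days)
rm_lag = np.concatenate([[0.0], rm[:-1]])  # previous day's market return

# Hypothetical delayed response: 0.6 contemporaneous + 0.4 lagged = 1.0 in total
r = 0.6 * rm + 0.4 * rm_lag + rng.normal(0.0, 0.02, n_days)

beta_contemporaneous = dimson_beta(r, rm, K=0)  # plain OLS beta
beta_dimson = dimson_beta(r, rm, K=1)           # Dimson beta with one lag and lead
print(f"OLS beta (K=0):    {beta_contemporaneous:.3f}")
print(f"Dimson beta (K=1): {beta_dimson:.3f}")
```

The contemporaneous regression recovers only the same-day weight, while the Dimson sum recovers roughly the full exposure. Setting `K=0` reduces the helper to ordinary OLS, which makes it convenient for side-by-side comparisons.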
```{python}
#| label: dimson-beta-estimation
#| eval: false

# Estimate Dimson betas with K=1 lag and lead

# Value-weighted market return, one observation per trading day
market_ret = (
    prices_daily
    .groupby("date")
    .apply(
        lambda g: np.average(
            g["ret"].dropna(),
            weights=g["mktcap"].loc[g["ret"].dropna().index]
        ) if g["ret"].dropna().shape[0] > 0 else np.nan,
        include_groups=False
    )
    .reset_index(name="rm")
)

# Build the lag and lead on the market calendar before merging, so that
# gaps in a stock's own history do not misalign the market returns
market_ret = market_ret.sort_values("date")
market_ret["rm_lag1"] = market_ret["rm"].shift(1)
market_ret["rm_lead1"] = market_ret["rm"].shift(-1)

prices_daily = prices_daily.merge(market_ret, on="date", how="left")

def estimate_dimson_beta(group):
    """Estimate OLS and Dimson(K=1) betas for a single stock."""
    g = group.dropna(subset=["ret", "rm", "rm_lag1", "rm_lead1"])
    if len(g) < 60:
        return pd.Series({"beta_ols": np.nan, "beta_dimson": np.nan, "n_obs": len(g)})

    # OLS beta
    X_ols = sm.add_constant(g["rm"])
    ols_model = sm.OLS(g["ret"], X_ols).fit()
    beta_ols = ols_model.params["rm"]

    # Dimson beta: sum of the lagged, contemporaneous, and lead slopes
    X_dim = sm.add_constant(g[["rm_lag1", "rm", "rm_lead1"]])
    dim_model = sm.OLS(g["ret"], X_dim).fit()
    beta_dimson = dim_model.params[["rm_lag1", "rm", "rm_lead1"]].sum()

    return pd.Series({
        "beta_ols": beta_ols,
        "beta_dimson": beta_dimson,
        "n_obs": len(g)
    })

beta_comparison = (
    prices_daily
    .groupby("symbol")
    .apply(estimate_dimson_beta, include_groups=False)
    .reset_index()
)
```

```{python}
#| label: fig-beta-comparison
#| eval: false
#| fig-cap: "OLS Beta vs. Dimson-Corrected Beta"

beta_valid = beta_comparison.dropna()

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(
    beta_valid["beta_ols"], beta_valid["beta_dimson"],
    alpha=0.3, s=10, color="#2C73D2"
)
lims = [
    min(ax.get_xlim()[0], ax.get_ylim()[0]),
    max(ax.get_xlim()[1], ax.get_ylim()[1])
]
ax.plot(lims, lims, "--", color="gray", linewidth=1)
ax.set_xlabel("OLS Beta")
ax.set_ylabel("Dimson Beta (K=1)")
ax.set_aspect("equal")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()
```

The scatter plot should reveal a systematic pattern: Dimson betas exceed OLS betas for most stocks, with the discrepancy largest for thinly traded stocks. Points above the 45-degree line indicate stocks whose OLS betas are biased downward by non-synchronous trading.

```{python}
#| label: tbl-beta-bias-by-liquidity
#| eval: false
#| tbl-cap: "Beta Bias by Trading Frequency Tercile"

beta_with_freq = beta_valid.merge(annual_trade_freq, on="symbol")
beta_with_freq["freq_tercile"] = pd.qcut(
    beta_with_freq["avg_trade_prob"],
    q=3,
    labels=["Low (Thin)", "Medium", "High (Liquid)"]
)

beta_bias_summary = (
    beta_with_freq
    .groupby("freq_tercile", observed=True)
    .agg(
        n_stocks=("symbol", "count"),
        avg_trade_prob=("avg_trade_prob", "mean"),
        mean_beta_ols=("beta_ols", "mean"),
        mean_beta_dimson=("beta_dimson", "mean"),
        median_beta_ols=("beta_ols", "median"),
        median_beta_dimson=("beta_dimson", "median")
    )
    .round(3)
)
# Gap between Dimson and OLS betas, as a percentage of the Dimson beta
beta_bias_summary["bias_pct"] = (
    (beta_bias_summary["mean_beta_dimson"] - beta_bias_summary["mean_beta_ols"])
    / beta_bias_summary["mean_beta_dimson"] * 100
).round(1)

beta_bias_summary
```

::: callout-warning
For the thinnest-traded tercile, OLS beta underestimates true systematic risk by 20-40% on average. Using uncorrected betas for cost of equity estimation or factor model tests will produce systematically incorrect results for these stocks.
:::

### The Scholes-Williams Estimator

An alternative correction, proposed by @scholes1977estimating, estimates beta as:

$$
\hat{\beta}_i^{SW} = \frac{\hat{\beta}_{i,-1} + \hat{\beta}_{i,0} + \hat{\beta}_{i,+1}}{1 + 2\hat{\rho}_m}
$$ {#eq-sw}

where $\hat{\beta}_{i,k}$ is the slope from regressing $r_{i,t}$ on $r_{m,t-k}$ alone, and $\hat{\rho}_m$ is the first-order autocorrelation of the market return. The Scholes-Williams estimator is consistent under the assumption that non-trading is the sole source of serial cross-correlation, while the Dimson estimator is more robust to additional sources of lead-lag structure.

```{python}
#| label: scholes-williams-beta
#| eval: false

def estimate_sw_beta(group):
    """Estimate Scholes-Williams beta."""
    g = group.dropna(subset=["ret", "rm", "rm_lag1", "rm_lead1"])
    if len(g) < 60:
        return np.nan

    # Separate univariate regressions on the lagged, contemporaneous,
    # and lead market return
    beta_lag = sm.OLS(g["ret"], sm.add_constant(g["rm_lag1"])).fit().params.iloc[1]
    beta_0 = sm.OLS(g["ret"], sm.add_constant(g["rm"])).fit().params.iloc[1]
    beta_lead = sm.OLS(g["ret"], sm.add_constant(g["rm_lead1"])).fit().params.iloc[1]

    # Market autocorrelation
    rho_m = g["rm"].autocorr(lag=1)

    beta_sw = (beta_lag + beta_0 + beta_lead) / (1 + 2 * rho_m)
    return beta_sw

beta_comparison["beta_sw"] = (
    prices_daily
    .groupby("symbol")
    .apply(estimate_sw_beta, include_groups=False)
    .values
)
```

## Implications for Portfolio Construction

The microstructure frictions documented above have direct consequences for portfolio construction, particularly for strategies that involve rebalancing across the full cross-section of listed firms.

### Equal-Weighted vs. Value-Weighted Returns

Equal-weighted portfolio returns give the same weight to each stock, including illiquid small-cap stocks that may contribute stale or noisy prices. Value-weighted returns tilt toward large, liquid stocks and are less susceptible to microstructure contamination.
```{python}
#| label: fig-ew-vs-vw
#| eval: false
#| fig-cap: "Cumulative Returns: Equal-Weighted vs. Value-Weighted Market Portfolio"

monthly_returns = (
    prices_daily
    .groupby(["symbol", "year_month"])
    .agg(
        monthly_ret=("ret", lambda x: (1 + x).prod() - 1),
        last_mktcap=("mktcap", "last")
    )
    .reset_index()
)
monthly_returns["date"] = monthly_returns["year_month"].dt.to_timestamp()

# Weight by market cap at the end of the stock's previous available month,
# so that the weights are known before the return is realized
monthly_returns["lag_mktcap"] = (
    monthly_returns.groupby("symbol")["last_mktcap"].shift(1)
)
monthly_returns = monthly_returns.dropna(subset=["monthly_ret", "lag_mktcap"])

# Equal-weighted
ew_ret = monthly_returns.groupby("date")["monthly_ret"].mean().reset_index(name="ew")

# Value-weighted
def vw_return(group):
    w = group["lag_mktcap"] / group["lag_mktcap"].sum()
    return (w * group["monthly_ret"]).sum()

vw_ret = (
    monthly_returns.groupby("date")
    .apply(vw_return, include_groups=False)
    .reset_index(name="vw")
)

port_comp = ew_ret.merge(vw_ret, on="date")

fig, ax = plt.subplots(figsize=(8, 4))
for col, label, color in [
    ("ew", "Equal-Weighted", "#FF6B6B"),
    ("vw", "Value-Weighted", "#2C73D2")
]:
    cum_ret = (1 + port_comp[col]).cumprod()
    ax.plot(port_comp["date"], cum_ret, label=label, color=color, linewidth=1.2)

ax.set_ylabel("Cumulative Return (Growth of 1 VND)")
ax.set_xlabel("")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()
```

A persistent divergence between equal-weighted and value-weighted cumulative returns is consistent with microstructure effects: the equal-weighted portfolio overstates attainable returns because it implicitly assumes costless trading in illiquid stocks.
### Recommended Liquidity Filters

Based on the diagnostics developed in this chapter, we recommend the following pre-analysis filters:

```{python}
#| label: tbl-recommended-filters
#| include: false
#| tbl-cap: "Recommended Liquidity Filters for Asset Pricing Research"

filters = pd.DataFrame({
    "Filter": [
        "Minimum trading days per month",
        "Maximum zero-return share per month",
        "Minimum average daily value (VND)",
        "Minimum listing age",
        "Exclude UPCoM stocks"
    ],
    "Threshold": [
        "≥ 15 days",
        "≤ 50%",
        "≥ 100 million",
        "≥ 6 months",
        "Optional but recommended"
    ],
    "Rationale": [
        "Ensures sufficient price observations for return computation",
        "Removes stocks with predominantly stale prices",
        "Ensures rebalancing is approximately feasible",
        "Avoids IPO effects and thin early trading",
        "UPCoM stocks have minimal disclosure and extreme illiquidity"
    ]
})
filters.style.hide(axis="index")
```

::: callout-tip
Always report results with and without liquidity filters. If results are qualitatively different, the baseline findings may be driven by microstructure artifacts rather than genuine economic effects.
:::

### Monthly vs. Daily Frequency

For most asset pricing applications, monthly return aggregation is preferable to daily analysis in Vietnam because:

1.  Monthly returns smooth out intraday noise, bid-ask bounce, and price limit effects.
2.  Stocks that trade infrequently within a month still produce a meaningful monthly return.
3.  Factor portfolio sorts are conventionally conducted at monthly frequency.
4.  Statistical tests have better size properties when microstructure noise is reduced.

However, monthly aggregation does not eliminate all biases. Stocks with zero returns for an entire month still contribute stale observations. The Dimson and Scholes-Williams corrections should still be applied at monthly frequency for beta estimation.
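
The stock-month filters recommended in this section can be wrapped in a small helper so that results are easy to re-run under alternative thresholds. The sketch below is one possible implementation, not a prescribed one: the function name `apply_liquidity_filters` is hypothetical, the column names follow the `liquidity_monthly` table built earlier in the chapter, and the toy data are invented purely for illustration.

```python
import pandas as pd

def apply_liquidity_filters(liq, min_days=15, max_zero_share=0.5,
                            min_daily_value=1e8):
    """Keep stock-months that pass all three liquidity screens.

    Assumes the columns `trading_days`, `zero_return_share`, and
    `avg_daily_value` (in VND), as in the chapter's `liquidity_monthly`.
    """
    keep = (
        (liq["trading_days"] >= min_days)
        & (liq["zero_return_share"] <= max_zero_share)
        & (liq["avg_daily_value"] >= min_daily_value)
    )
    return liq.loc[keep].copy()

# Invented toy stock-months, for illustration only
toy = pd.DataFrame({
    "symbol": ["A", "B", "C", "D"],
    "trading_days": [20, 12, 18, 19],
    "zero_return_share": [0.10, 0.20, 0.60, 0.30],
    "avg_daily_value": [5e9, 2e8, 3e8, 5e7],
})

# Sensitivity check: vary the minimum number of trading days
for min_days in (10, 15, 20):
    kept = apply_liquidity_filters(toy, min_days=min_days)
    print(f"min_days={min_days}: kept {list(kept['symbol'])}")
```

Passing the thresholds as arguments keeps the baseline specification and the sensitivity checks in one place, which makes it harder for the two to drift apart.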
## Implications for Asset Pricing Tests

### Factor Model Estimation

Standard factor model estimation assumes that returns are observed synchronously and without censoring. In Vietnam, both assumptions are violated. The practical consequences are summarized in @tbl-assumption-violations.

| Assumption | Violation in Vietnam | Consequence |
|-------------------|--------------------------------|---------------------|
| Synchronous observation | Thin trading | Biased betas, attenuated R² |
| Uncensored returns | Price limits | Truncated distributions, biased moments |
| Continuous trading | Discrete ticks | Return discreteness, bid-ask bounce |
| No transaction costs | Wide spreads | Overstated portfolio returns |

: Standard Assumptions and Their Violations {#tbl-assumption-violations}

### Adjusted Testing Procedure

We recommend the following adjustments to standard asset pricing tests when applied to Vietnamese data:

1.  **Beta estimation**: Use Dimson ($K \ge 1$) or Scholes-Williams betas, not OLS betas.

2.  **Factor construction**: When forming size and value portfolios, apply liquidity filters before sorting. Consider excluding the smallest quintile of stocks by market capitalization, which is most affected by thin trading.

3.  **Return aggregation**: Use monthly frequency. If daily analysis is necessary, include lagged market returns in the time-series regression.

4.  **Robust inference**: In panel regressions, cluster standard errors by stock to account for persistent microstructure-induced serial correlation. In time-series regressions, use Newey-West HAC standard errors with sufficient lags.

5.  **Price limit adjustment**: For volatility analysis or risk measurement, consider the @chu2019forecasting approach of modeling the latent (uncensored) return distribution. Because returns at the limit are clipped rather than dropped, the observed return is a censored version of the latent return:

    $$
    r_{i,t}^* \sim N(\mu_i, \sigma_i^2), \quad r_{i,t}^{obs} = \max(\underline{L}, \min(\bar{L}, r_{i,t}^*))
    $$ {#eq-truncated}

    Estimate $\mu_i$ and $\sigma_i^2$ via maximum likelihood for the censored normal.
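
Written out, the log-likelihood implied by @eq-truncated has three parts, one for interior observations and one for each limit. The expression below is a reconstruction from the model as stated (dropping the stock subscript $i$), with $\phi$ and $\Phi$ denoting the standard normal density and distribution function, and $n_{\underline{L}}$ and $n_{\bar{L}}$ denoting the number of days at the lower and upper limit; these two counts are notation introduced here for convenience.

$$
\ell(\mu, \sigma) = \sum_{t:\, \underline{L} < r_t^{obs} < \bar{L}} \ln\left[\frac{1}{\sigma}\,\phi\left(\frac{r_t^{obs} - \mu}{\sigma}\right)\right] + n_{\underline{L}} \ln \Phi\left(\frac{\underline{L} - \mu}{\sigma}\right) + n_{\bar{L}} \ln\left[1 - \Phi\left(\frac{\bar{L} - \mu}{\sigma}\right)\right]
$$

Interior days contribute the usual normal density, while each limit day contributes only the probability that the latent return lies at or beyond the limit. The code that follows minimizes the negative of this expression.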
```{python}
#| label: truncated-volatility

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def truncated_normal_nll(params, returns, lower, upper):
    """Negative log-likelihood of a normal censored at `lower` and `upper`."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize log(sigma) so that sigma stays positive

    # Interior observations
    interior = (returns > lower) & (returns < upper)
    ll_interior = norm.logpdf(returns[interior], mu, sigma)

    # Lower censored
    ll_lower = norm.logcdf(lower, mu, sigma)
    n_lower = (returns <= lower).sum()

    # Upper censored (log survival function, numerically stable in the tail)
    ll_upper = norm.logsf(upper, mu, sigma)
    n_upper = (returns >= upper).sum()

    nll = -(ll_interior.sum() + n_lower * ll_lower + n_upper * ll_upper)
    return nll

def estimate_true_volatility(returns, limit_band):
    """Estimate latent volatility correcting for price limit censoring.

    `returns` is expected to be a pandas Series of daily returns.
    """
    result = minimize(
        truncated_normal_nll,
        x0=[returns.mean(), np.log(returns.std())],
        args=(returns.values, -limit_band, limit_band),
        method="Nelder-Mead"
    )
    mu, log_sigma = result.x
    return np.exp(log_sigma)
```

6.  **Sensitivity reporting**: Always report key results under alternative specifications: with and without liquidity filters, using OLS vs. Dimson betas, at daily vs. monthly frequency, and using observed vs. censoring-corrected volatility.

## Summary

This chapter has established that Vietnamese equity markets exhibit microstructure characteristics that materially affect observed prices, returns, and risk measures. The key findings are:

1.  **Price limits** censor daily returns, inducing positive autocorrelation, volatility spillover, and truncated distributions. The $\pm 7\%$ band on HOSE is particularly restrictive for volatile stocks.

2.  **Thin trading and zero returns** afflict a substantial fraction of listed firms. Trading probabilities below 50% are common on HNX and UPCoM, generating non-synchronous trading bias that attenuates OLS beta estimates by 20-40%.

3.  **Illiquidity** varies dramatically across the cross-section, with Amihud ratios spanning several orders of magnitude. Value-weighted portfolio returns are less contaminated than equal-weighted returns.

4.  **The Dimson and Scholes-Williams beta corrections** effectively address non-synchronous trading bias and should be used as the default beta estimator for Vietnamese equities.

5.  **Liquidity filters** should be applied before any asset pricing analysis, and results should be reported with and without these filters as a robustness check.

Ignoring these frictions does not merely add noise to empirical results; it systematically biases estimates in predictable directions. The diagnostics and corrections presented in this chapter provide the foundation for credible empirical asset pricing in Vietnam.