import pandas as pd
import numpy as np
import sqlite3

# Load daily price data
tidy_finance = sqlite3.connect(database="data/tidy_finance_python.sqlite")

# Assume prices_daily contains: symbol, date, close, exchange.
# Later sections additionally assume value (VND traded), turnover, and mktcap
# columns; add them to the SELECT if the table provides them.
prices_daily = pd.read_sql_query(
    sql="""
        SELECT symbol, date, close, exchange
        FROM prices_daily
    """,
    con=tidy_finance,
    parse_dates=["date"]
).dropna()

4 Market Microstructure in Vietnam
In this chapter, we examine how the institutional design of Vietnamese equity markets, such as trading sessions, price limits, order types, and investor composition, shapes observed prices, returns, and liquidity. We quantify microstructure frictions and demonstrate why ignoring these frictions leads to biased inference in asset pricing tests.
Market microstructure is the study of how trading rules, order handling mechanisms, and market design affect price formation, transaction costs, and liquidity. The field, pioneered by Kyle (1985) and Glosten and Milgrom (1985) and surveyed in Hasbrouck (2007), provides the analytical toolkit for understanding why observed prices may deviate from fundamental values and for how long.
In developed markets with continuous electronic trading, designated market makers, and minimal regulatory constraints on price movement, microstructure frictions are typically second-order concerns for researchers working at monthly or lower frequencies. In Vietnam’s equity markets, this is emphatically not the case. Daily price limits, thin trading, a predominantly retail investor base, discrete tick sizes, and the absence of formal market-making arrangements generate frictions that propagate into monthly returns, distort factor loadings, and bias portfolio-level inference. Any serious empirical analysis of Vietnamese equities must therefore begin with a careful assessment of market microstructure.
This chapter provides that assessment. We first describe the institutional architecture of Vietnamese equity trading. We then develop diagnostics for the most consequential frictions, such as price limit hits, zero-return days, illiquidity, and non-synchronous trading, and quantify their severity in the cross-section of listed firms. Finally, we derive practical guidance for adjusting portfolio construction and asset pricing tests.
4.1 What Is Market Microstructure?
The textbook assumption of frictionless markets implies that prices continuously and costlessly incorporate information. Under this assumption, the observed return on any asset at any frequency equals the “true” return dictated by fundamentals. Market microstructure relaxes this assumption by recognizing that prices are generated by a specific trading process with real costs, constraints, and imperfections.
The canonical framework of Kyle (1985) models a market with three types of participants:
- an informed trader who knows the asset’s fundamental value,
- noise traders who trade for liquidity reasons, and
- a market maker who sets prices to break even in expectation.
The key insight is that the market maker cannot distinguish informed from uninformed order flow, so prices adjust gradually to information, creating a wedge between the transaction price and the fundamental value. The size of this wedge (the bid-ask spread) and the speed of price adjustment (market depth) are the core objects of microstructure theory.
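For concreteness, the standard single-period version of the Kyle model can be written as follows. The notation is introduced here for illustration only: \(v \sim N(p_0, \sigma_v^2)\) is the fundamental value, \(u \sim N(0, \sigma_u^2)\) is noise-trader demand, \(x\) is the informed trader's order, and \(p\) is the price set by the market maker.

\[ x = \beta\,(v - p_0), \qquad p = p_0 + \lambda\,(x + u), \qquad \beta = \frac{\sigma_u}{\sigma_v}, \qquad \lambda = \frac{1}{2}\,\frac{\sigma_v}{\sigma_u} \]

The coefficient \(\lambda\) measures price impact per unit of net order flow, and its reciprocal \(1/\lambda\) is market depth. Depth rises with the amount of noise trading and falls with the informed trader's informational advantage.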
Glosten and Milgrom (1985) extend this framework to a sequential trade setting and show that the bid-ask spread has two components: an adverse selection component (compensation for trading against informed traders) and an order processing component (compensation for the mechanical costs of trading). Huang and Stoll (1997) further decompose the spread into realized spread and price impact components. These decompositions are important because they reveal different sources of trading costs and have different implications for market quality.
For empirical asset pricing, the key question is: at what frequency and under what conditions do microstructure effects become negligible? In highly liquid markets, Bali, Engle, and Murray (2016) argue that monthly returns are largely free of microstructure contamination. In Vietnam, as we demonstrate below, this is not the case. Microstructure effects persist at monthly and even quarterly frequencies for a substantial fraction of listed firms.
4.1.1 The Microstructure-Asset Pricing Interface
The interface between microstructure and asset pricing operates through several channels. First, illiquidity itself may be a priced risk factor. Amihud (2002) shows that expected illiquidity is positively related to expected stock returns, implying a liquidity premium. Pástor and Stambaugh (2003) develop an equilibrium model in which liquidity risk (i.e., the covariance of a stock’s liquidity with market liquidity) commands a risk premium. Second, microstructure noise in prices biases estimated betas, factor loadings, and test statistics. Scholes and Williams (1977) first identified this bias in the context of non-synchronous trading, and Dimson (1979) proposed an aggregated-coefficients estimator to correct it. Third, price limits and other regulatory constraints censor the return distribution, creating truncation bias in volatility estimates, return moments, and extreme-value statistics (Kim and Rhee 1997).
Table 4.1 summarizes these channels and their empirical consequences.
| Channel | Mechanism | Empirical Consequence |
|---|---|---|
| Illiquidity premium | Compensation for bearing transaction costs and inventory risk | Cross-sectional return predictability by liquidity measures |
| Non-synchronous trading | Infrequent trading creates stale prices | Downward-biased betas, attenuated correlations, and spurious lead-lag |
| Price limits | Regulatory censoring of daily returns | Truncated return distributions, volatility spillover, and artificial autocorrelation |
| Discrete tick sizes | Prices constrained to a grid | Bid-ask bounce, return discreteness, biased volatility |
| Investor composition | Retail-dominated order flow | Noise trading, herding, sentiment-driven pricing |
4.2 Trading Architecture in Vietnam
Vietnam operates two stock exchanges: the Ho Chi Minh Stock Exchange (HOSE), established in 2000, and the Hanoi Stock Exchange (HNX), established in 2005. HOSE lists larger firms and accounts for the majority of market capitalization and trading volume. HNX lists smaller firms and also operates the Unlisted Public Company Market (UPCoM) for firms that have not yet met full listing requirements. All three venues operate electronic limit order book systems without designated market makers.
4.2.1 Exchange Characteristics
Table 4.2 presents the key structural differences between HOSE, HNX, and UPCoM. These differences have direct implications for liquidity, price discovery, and the severity of microstructure frictions.
| Feature | HOSE | HNX | UPCoM |
|---|---|---|---|
| Established | 2000 | 2005 | 2009 |
| Listing tier | Large-cap | Mid/small-cap | Pre-listing |
| Daily price limit | \(\pm\) 7% | \(\pm\) 10% | \(\pm\) 15% |
| Tick size regime | Tiered by price | Tiered by price | 100 VND |
| Trading lot | 100 shares | 100 shares | 100 shares |
| Short selling | Limited | Not available | Not available |
| Foreign ownership cap | Industry-specific | Industry-specific | Industry-specific |
The heterogeneous price limit bands across exchanges create a natural experiment for studying limit effects. HOSE’s tighter \(\pm\) 7% band means that large-cap stocks are more frequently constrained than mid-cap stocks on HNX, conditional on the same information shock. UPCoM’s wider \(\pm\) 15% band provides the least constrained environment, though its stocks are also the least liquid.
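As a rough illustration of how band width changes the chance of being constrained by the same shock, the sketch below computes the probability that a latent daily return exceeds each band in absolute value. It assumes, purely for illustration, that the latent return is normal with mean zero and a hypothetical 3% daily volatility; the function name `limit_hit_probability` is not part of any library.

```python
import math


def limit_hit_probability(daily_vol: float, band: float) -> float:
    """P(|r*| >= band) when the latent return r* ~ N(0, daily_vol^2)."""
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(band / (daily_vol * math.sqrt(2.0)))


bands = {"HOSE": 0.07, "HNX": 0.10, "UPCoM": 0.15}
daily_vol = 0.03  # hypothetical latent daily volatility (assumption)

hit_probs = {
    exchange: limit_hit_probability(daily_vol, band)
    for exchange, band in bands.items()
}
for exchange, prob in hit_probs.items():
    print(f"{exchange}: {prob:.4%}")
```

Under these assumptions the tighter HOSE band binds far more often than the HNX or UPCoM bands for an identical shock distribution. Real return distributions are fat-tailed, so the normal benchmark likely understates hit rates on every exchange.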
4.2.2 Trading Sessions
Each exchange operates a structured trading day with distinct sessions. Understanding session structure is essential because price formation mechanisms differ across sessions, and certain sessions are disproportionately important for benchmark pricing (Table 4.3).
| Session | Time | Mechanism | Price Discovery Role |
|---|---|---|---|
| Pre-opening | 08:30–09:00 | Order entry only, no matching | Reveals pre-open demand/supply |
| Opening auction (ATO) | 09:00–09:15 | Batch auction, single price | Sets opening price from accumulated orders |
| Continuous trading (Morning) | 09:15–11:30 | Continuous limit order matching | Primary price discovery |
| Lunch break | 11:30–13:00 | No trading | — |
| Continuous trading (Afternoon) | 13:00–14:30 | Continuous limit order matching | Primary price discovery |
| Closing auction (ATC) | 14:30–14:45 | Batch auction, single price | Sets closing price (benchmark) |
| Post-closing | 14:45–15:00 | Put-through (negotiated) trades | Block and negotiated transactions |
The closing auction (ATC) deserves particular attention. The ATC price is the official closing price used for index calculation, NAV computation, and margin requirements. Because it is determined by a single-price batch auction, it can be manipulated by strategically timed orders, a phenomenon documented in numerous emerging markets (Comerton-Forde and Tang 2009; Hillion and Suominen 2004). Researchers using daily closing prices should be aware that ATC prices may not reflect the continuous-session equilibrium, particularly for less liquid stocks where a single large order can move the closing price.
4.2.3 Order Types and Matching Rules
Vietnamese exchanges support a limited set of order types compared to developed markets (Table 4.4).
| Order Type | Description | Availability |
|---|---|---|
| Limit order (LO) | Specifies price and quantity | All sessions |
| Market order (ATO/ATC) | Matches at auction price | Auction sessions only |
| Market-to-limit (MTL) | Converts to limit at best available | HNX only |
The absence of iceberg orders, stop orders, and hidden orders means that the full limit order book is visible to all participants. While this enhances pre-trade transparency, it also means that large institutional orders face significant information leakage risk, which may deter institutional participation and reduce market depth.
Orders are matched on a strict price-time priority basis during continuous sessions. During auction sessions, a single clearing price is determined that maximizes executed volume. If multiple prices satisfy this criterion, the price closest to the previous closing price is selected.
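The auction rule described above can be sketched in a few lines. This is a simplified illustration, not the exchanges' matching engine: it considers only prices at which orders were submitted, ignores ATO/ATC market orders, and applies only the two criteria named in the text (maximum executed volume, then proximity to the previous close). The function name and the order data are hypothetical.

```python
def clearing_price(buys, sells, prev_close):
    """Return (price, volume) for a single-price batch auction.

    buys, sells: lists of (limit_price, quantity) tuples.
    Picks the price that maximizes executed volume; ties are broken by
    distance to prev_close (remaining ties go to the lower price).
    """
    candidates = sorted({p for p, _ in buys} | {p for p, _ in sells})
    best = None
    for price in candidates:
        demand = sum(q for limit, q in buys if limit >= price)
        supply = sum(q for limit, q in sells if limit <= price)
        volume = min(demand, supply)
        key = (-volume, abs(price - prev_close))
        if volume > 0 and (best is None or key < best[0]):
            best = (key, price, volume)
    return None if best is None else (best[1], best[2])


# Hypothetical order book (prices in VND, quantities in shares)
buys = [(10_200, 500), (10_100, 300), (10_000, 400)]
sells = [(9_900, 200), (10_000, 300), (10_100, 400), (10_200, 600)]
result = clearing_price(buys, sells, prev_close=10_000)
print(result)

# Tie on volume: the price closer to the previous close wins
tie_result = clearing_price([(10_100, 100)], [(9_900, 100)], prev_close=9_950)
print(tie_result)
```

In the first example, 10,100 VND executes 800 shares, more than any other candidate price. In the second, both candidate prices execute 100 shares, so the one closer to the previous close is chosen.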
4.2.4 Tick Size Structure
Tick sizes on HOSE are tiered by price level, which creates discontinuities in the bid-ask spread as a percentage of price (Table 4.5).
| Price Range (VND) | Tick Size (VND) | Minimum Spread as % of Midpoint |
|---|---|---|
| < 10,000 | 10 | 0.10% at 10,000 |
| 10,000–49,950 | 50 | 0.10% at 50,000 |
| ≥ 50,000 | 100 | 0.20% at 50,000 |
The jump from a 50 VND tick to a 100 VND tick at the 50,000 VND boundary means that the minimum percentage spread doubles discontinuously. This creates a “tick size cliff” that can affect the cross-sectional distribution of bid-ask spreads and, consequently, the measurement of illiquidity (Vo and Doan 2023). Bessembinder (2003) documents similar effects in other markets with tiered tick structures.
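The cliff can be made concrete with a small helper that maps a price to its tick size and to the minimum relative spread (one tick divided by the price). The tier boundaries follow Table 4.5; the function names are illustrative, and the sketch ignores any special rules for instruments other than ordinary shares.

```python
def hose_tick_size(price: float) -> int:
    """Tick size in VND for a given price, following the tiers in Table 4.5."""
    if price < 10_000:
        return 10
    if price < 50_000:
        return 50
    return 100


def min_relative_spread(price: float) -> float:
    """Smallest possible quoted spread (one tick) as a fraction of price."""
    return hose_tick_size(price) / price


just_below = min_relative_spread(49_950)
at_boundary = min_relative_spread(50_000)
print(f"49,950 VND: {just_below:.4%}")
print(f"50,000 VND: {at_boundary:.4%}")
print(f"ratio: {at_boundary / just_below:.3f}")
```

A 50 VND price increase across the boundary roughly doubles the minimum relative spread, which is why spread-based illiquidity measures can jump at 50,000 VND for reasons unrelated to the underlying liquidity of the stock.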
4.2.5 Investor Composition
The Vietnamese equity market is predominantly driven by retail investors. While foreign institutional investors account for a meaningful share of market capitalization (particularly in blue-chip stocks subject to foreign ownership limits), daily trading volume is overwhelmingly generated by domestic retail accounts.
This retail dominance has several consequences for microstructure. First, retail investors tend to submit smaller orders and trade more frequently, generating high message-to-trade ratios but limited depth at each price level. Second, retail order flow is more susceptible to herding and sentiment, which can amplify momentum and generate excess volatility (Barber et al. 2009; Kaniel et al. 2012). Third, the limited institutional presence means that sophisticated liquidity provision is scarce, particularly in mid- and small-cap stocks.
4.3 Price Limits and Their Consequences
Vietnam enforces daily price limits on all listed equities. A stock’s price cannot move beyond a fixed percentage of the previous day’s closing price within a single trading day. The limit bands are \(\pm\) 7% on HOSE, \(\pm\) 10% on HNX, and \(\pm\) 15% on UPCoM.
4.3.1 Theoretical Framework
Price limits were introduced with the stated goal of reducing volatility and preventing panic-driven price dislocations. However, the academic literature presents a more nuanced picture. The “magnet effect” hypothesis (Subrahmanyam 1994) predicts that price limits actually accelerate price movement toward the limit as traders rush to execute before the limit is hit. The “delayed price discovery” hypothesis (Fama and French 1989) argues that limits merely postpone inevitable price adjustments, creating volatility spillover into subsequent days.
Formally, let \(P_t^*\) denote the equilibrium price on day \(t\) and \(P_{t-1}^c\) the previous closing price. The observed return is:
\[ r_t^{obs} = \begin{cases} \bar{L} & \text{if } r_t^* \geq \bar{L} \\ r_t^* & \text{if } \underline{L} < r_t^* < \bar{L} \\ \underline{L} & \text{if } r_t^* \leq \underline{L} \end{cases} \tag{4.1}\]
where \(r_t^* = \ln(P_t^* / P_{t-1}^c)\) is the latent (unconstrained) return, \(\bar{L}\) is the upper limit, and \(\underline{L}\) is the lower limit. The observed return \(r_t^{obs}\) is a censored version of the true return. This censoring has several consequences:
- Truncated moments: The observed variance \(\text{Var}(r_t^{obs}) < \text{Var}(r_t^*)\) because extreme returns are clipped. This biases downward any volatility-based risk measure.
- Artificial autocorrelation: When \(r_t^{obs} = \bar{L}\) and \(r_{t+1}^{obs} > 0\) (continued adjustment the next day), the return series exhibits positive autocorrelation that is purely mechanical, not informational.
- Volatility spillover: Define excess volatility on day \(t+1\) as \(\sigma_{t+1}^2 - E[\sigma_{t+1}^2 | \text{no limit hit on day } t]\). Kim and Rhee (1997) and Chu and Qiu (2019) document significant positive spillover, where days following limit hits exhibit abnormally high volatility.
- Biased extreme value statistics: Measures such as Value-at-Risk, Expected Shortfall, and maximum drawdown are mechanically bounded by the limit, understating true tail risk.
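A small simulation can illustrate the first two consequences. The sketch below assumes a latent log price that follows a random walk with a hypothetical 4% daily volatility, and an observed price that moves toward the latent price each day but by no more than the limit (treated here as a bound on log returns for simplicity). Unrealized adjustment carries over to the next day. None of these parameter values come from Vietnamese data.

```python
import numpy as np

rng = np.random.default_rng(0)
limit = 0.07          # symmetric daily limit, applied to log returns
n_days = 100_000
latent = rng.normal(0.0, 0.04, n_days)   # latent daily returns (assumed)
latent_log_price = np.cumsum(latent)

observed = np.empty(n_days)
obs_log_price = 0.0
for t in range(n_days):
    # The observed price closes the gap to the latent price, up to the limit
    gap = latent_log_price[t] - obs_log_price
    observed[t] = min(max(gap, -limit), limit)
    obs_log_price += observed[t]

var_ratio = observed.var() / latent.var()
ar1_observed = np.corrcoef(observed[1:], observed[:-1])[0, 1]
ar1_latent = np.corrcoef(latent[1:], latent[:-1])[0, 1]
hit_share = np.mean(np.abs(observed) >= limit - 1e-12)

print(f"Variance ratio (observed / latent): {var_ratio:.3f}")
print(f"AR(1) latent:   {ar1_latent:.4f}")
print(f"AR(1) observed: {ar1_observed:.4f}")
print(f"Share of limit-hit days: {hit_share:.2%}")
```

In this setup the latent returns are serially uncorrelated by construction, so any positive first-order autocorrelation in the observed series is produced entirely by the censoring and carry-over mechanism.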
4.3.2 Detecting Price Limit Hits
We now implement a diagnostic to detect price limit hits in the daily data.
# Define limit bands by exchange
limit_bands = {"HOSE": 0.07, "HNX": 0.10, "UPCoM": 0.15}

prices_daily = prices_daily.sort_values(["symbol", "date"])
prices_daily["prev_close"] = prices_daily.groupby("symbol")["close"].shift(1)
prices_daily["ret"] = prices_daily["close"] / prices_daily["prev_close"] - 1
prices_daily["limit_band"] = prices_daily["exchange"].map(limit_bands)

# A limit hit occurs when the return is within 0.1% of the theoretical limit
tolerance = 0.001
prices_daily["upper_hit"] = (
    prices_daily["ret"] >= prices_daily["limit_band"] - tolerance
)
prices_daily["lower_hit"] = (
    prices_daily["ret"] <= -prices_daily["limit_band"] + tolerance
)
prices_daily["limit_hit"] = (
    prices_daily["upper_hit"] | prices_daily["lower_hit"]
)

4.3.3 Frequency of Limit Hits
import matplotlib.pyplot as plt

# Monthly limit-hit rate by exchange
prices_daily["year_month"] = prices_daily["date"].dt.to_period("M")
limit_hit_monthly = (
    prices_daily
    .groupby(["year_month", "exchange"])
    .agg(
        total_obs=("limit_hit", "count"),
        limit_hits=("limit_hit", "sum")
    )
    .reset_index()
)
limit_hit_monthly["hit_rate"] = (
    limit_hit_monthly["limit_hits"] / limit_hit_monthly["total_obs"]
)
limit_hit_monthly["date"] = limit_hit_monthly["year_month"].dt.to_timestamp()

fig, ax = plt.subplots(figsize=(8, 4))
for exchange, color in zip(
    ["HOSE", "HNX", "UPCoM"], ["#2C73D2", "#FF6B6B", "#5DCEAF"]
):
    subset = limit_hit_monthly[limit_hit_monthly["exchange"] == exchange]
    ax.plot(
        subset["date"], subset["hit_rate"] * 100,
        label=exchange, color=color, linewidth=1.2
    )
ax.set_ylabel("Limit Hit Rate (%)")
ax.set_xlabel("")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()

4.3.4 Volatility Spillover Test
Following Kim and Rhee (1997), we test whether days following a limit hit exhibit abnormally high volatility. Define the dummy variable \(D_t = 1\) if a limit hit occurred on day \(t\), and estimate:
\[ \sigma_{t+1}^2 = \alpha + \beta D_t + \gamma \sigma_t^2 + \varepsilon_{t+1} \tag{4.2}\]
where \(\sigma_t^2\) is the squared return. A positive and significant \(\beta\) indicates volatility spillover attributable to the price limit.
import statsmodels.api as sm

# Panel-level volatility spillover test
prices_daily["sq_ret"] = prices_daily["ret"] ** 2
prices_daily["sq_ret_lead"] = prices_daily.groupby("symbol")["sq_ret"].shift(-1)
prices_daily["limit_hit_int"] = prices_daily["limit_hit"].astype(int)

spillover_data = prices_daily.dropna(subset=["sq_ret_lead", "sq_ret"])
X = sm.add_constant(spillover_data[["limit_hit_int", "sq_ret"]])
y = spillover_data["sq_ret_lead"]

# Standard errors clustered by stock
model = sm.OLS(y, X).fit(
    cov_type="cluster", cov_kwds={"groups": spillover_data["symbol"]}
)
spillover_results = pd.DataFrame({
    "Coefficient": model.params,
    "Std. Error": model.bse,
    "t-stat": model.tvalues,
    "p-value": model.pvalues
}).round(6)
print(spillover_results)

A significantly positive coefficient on the limit-hit dummy would indicate that Vietnamese price limits do not eliminate volatility; they merely redistribute it across days. This has direct implications for risk management: daily VaR measures computed from censored returns understate true risk exposure.
4.3.5 Return Autocorrelation Induced by Price Limits
Price limits mechanically induce positive autocorrelation in returns. To quantify this, we compute the first-order autocorrelation coefficient separately for stocks that hit limits frequently versus those that do not.
# Classify stocks by limit hit frequency
stock_limit_freq = (
    prices_daily
    .groupby("symbol")
    .agg(
        hit_rate=("limit_hit", "mean"),
        n_obs=("ret", "count")
    )
    .query("n_obs >= 250")  # At least 1 year of data
)
# Rank before binning so that many identical hit rates (e.g., zero)
# do not produce duplicate bin edges in qcut
stock_limit_freq["limit_group"] = pd.qcut(
    stock_limit_freq["hit_rate"].rank(method="first"), q=3,
    labels=["Low", "Medium", "High"]
)

# Compute pooled first-order autocorrelation by group
def compute_autocorr(group_symbols):
    subset = prices_daily[prices_daily["symbol"].isin(group_symbols)].copy()
    subset["ret_lag"] = subset.groupby("symbol")["ret"].shift(1)
    return subset[["ret", "ret_lag"]].dropna().corr().iloc[0, 1]

autocorr_results = []
for group in ["Low", "Medium", "High"]:
    symbols = stock_limit_freq[stock_limit_freq["limit_group"] == group].index
    ac = compute_autocorr(symbols)
    n_stocks = len(symbols)
    avg_hit_rate = stock_limit_freq.loc[symbols, "hit_rate"].mean()
    autocorr_results.append({
        "Group": group,
        "N Stocks": n_stocks,
        "Avg Limit Hit Rate (%)": round(avg_hit_rate * 100, 2),
        "AR(1)": round(ac, 4)
    })
pd.DataFrame(autocorr_results).style.hide(axis="index")

The expected pattern is a monotonic increase in autocorrelation from the Low to the High limit-hit group, which would indicate that the observed serial dependence in returns is at least partly an artifact of price censoring rather than genuine return predictability.
4.4 Liquidity, Thin Trading, and Zero Returns
Liquidity (i.e., the ability to trade quickly at low cost without moving the price) is a first-order concern in Vietnamese equities. A substantial fraction of listed firms, particularly on HNX and UPCoM, experience chronic illiquidity characterized by infrequent trading, wide bid-ask spreads, and frequent zero-return days.
4.4.1 Measuring Liquidity
The academic literature has developed numerous liquidity measures, each capturing a different dimension of market quality. Table 4.6 summarizes the measures most applicable to Vietnamese data, given typical data availability.
| Measure | Formula | Interpretation | Data Required |
|---|---|---|---|
| Turnover ratio | \(\text{TO}_{i,t} = \frac{\text{Volume}_{i,t}}{\text{Shares Outstanding}_{i}}\) | Trading intensity relative to float | Volume, shares outstanding |
| Amihud illiquidity | \(\text{ILLIQ}_{i,t} = \frac{1}{D} \sum_{d=1}^{D} \frac{|r_{i,d}|}{V_{i,d}}\) | Price impact per unit of volume | Daily returns, daily volume |
| Zero-return proportion | \(\text{ZR}_{i,t} = \frac{\#\{d : r_{i,d} = 0\}}{D}\) | Frequency of non-trading or stale pricing | Daily returns |
| Roll spread | \(\hat{S}_i = 2\sqrt{-\text{Cov}(r_{i,d}, r_{i,d-1})}\) | Effective bid-ask spread estimate | Daily returns |
| Bid-ask spread | \(\text{BA}_{i,d} = \frac{\text{Ask}_{i,d} - \text{Bid}_{i,d}}{(\text{Ask}_{i,d} + \text{Bid}_{i,d})/2}\) | Direct transaction cost | Quote data |
The Amihud illiquidity ratio (Amihud 2002) is particularly useful because it requires only daily return and volume data. It captures the price impact of trading (i.e., the absolute return per unit of currency volume) and has been shown to correlate well with more sophisticated microstructure-based measures such as the effective spread (Goyenko and Ukhov 2009).
4.4.2 Computing Liquidity Diagnostics
# Compute standard liquidity measures at the stock-month level
prices_daily["abs_ret"] = prices_daily["ret"].abs()
prices_daily["zero_return"] = (prices_daily["ret"] == 0).astype(int)
prices_daily["year_month"] = prices_daily["date"].dt.to_period("M")

# Assume volume is in shares and value is in VND
# Amihud: average |ret| / value (in billions VND)
prices_daily["amihud_daily"] = np.where(
    prices_daily["value"] > 0,
    prices_daily["abs_ret"] / (prices_daily["value"] / 1e9),
    np.nan
)

liquidity_monthly = (
    prices_daily
    .groupby(["symbol", "year_month"])
    .agg(
        zero_return_share=("zero_return", "mean"),
        avg_turnover=("turnover", "mean"),
        amihud=("amihud_daily", "mean"),
        trading_days=("ret", "count"),
        avg_daily_value=("value", "mean")
    )
    .reset_index()
)

# Flag severely illiquid stock-months
liquidity_monthly["illiquid_flag"] = (
    (liquidity_monthly["zero_return_share"] > 0.5) |
    (liquidity_monthly["trading_days"] < 10) |
    (liquidity_monthly["avg_daily_value"] < 1e8)  # < 100M VND/day
)

4.4.3 Cross-Sectional Distribution of Liquidity
# Stock-level averages over the most recent calendar year
latest_year = liquidity_monthly["year_month"].dt.year.max()
annual_liq = (
    liquidity_monthly[liquidity_monthly["year_month"].dt.year == latest_year]
    .groupby("symbol")
    .agg(
        zero_return_share=("zero_return_share", "mean"),
        avg_turnover=("avg_turnover", "mean"),
        amihud=("amihud", "mean"),
        avg_daily_value_m=("avg_daily_value", lambda x: x.mean() / 1e6)
    )
)

summary_stats = annual_liq.describe(percentiles=[0.10, 0.25, 0.50, 0.75, 0.90]).T
summary_stats = summary_stats[
    ["mean", "std", "10%", "25%", "50%", "75%", "90%"]
].round(4)
summary_stats.columns = ["Mean", "Std", "P10", "P25", "Median", "P75", "P90"]
summary_stats.index = [
    "Zero-Return Share",
    "Avg Daily Turnover",
    "Amihud Illiquidity",
    "Avg Daily Value (M VND)"
]
summary_stats

4.4.4 Liquidity Distribution Across Exchanges
# Merge exchange info
stock_exchange = (
    prices_daily[["symbol", "exchange"]]
    .drop_duplicates("symbol")
)
annual_liq_exch = annual_liq.merge(
    stock_exchange, left_index=True, right_on="symbol"
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Zero-return share
for exchange, color in zip(
    ["HOSE", "HNX", "UPCoM"], ["#2C73D2", "#FF6B6B", "#5DCEAF"]
):
    subset = annual_liq_exch[annual_liq_exch["exchange"] == exchange]
    axes[0].hist(
        subset["zero_return_share"].dropna(), bins=30, alpha=0.6,
        color=color, label=exchange, density=True
    )
axes[0].set_xlabel("Zero-Return Share")
axes[0].set_ylabel("Density")
axes[0].legend(frameon=False)
axes[0].spines["top"].set_visible(False)
axes[0].spines["right"].set_visible(False)

# Amihud (log scale); drop missing values, which hist() cannot bin
for exchange, color in zip(
    ["HOSE", "HNX", "UPCoM"], ["#2C73D2", "#FF6B6B", "#5DCEAF"]
):
    subset = annual_liq_exch[annual_liq_exch["exchange"] == exchange]
    amihud_log = np.log(subset["amihud"].dropna().clip(lower=1e-10))
    axes[1].hist(
        amihud_log, bins=30, alpha=0.6,
        color=color, label=exchange, density=True
    )
axes[1].set_xlabel("Log Amihud Illiquidity")
axes[1].set_ylabel("Density")
axes[1].legend(frameon=False)
axes[1].spines["top"].set_visible(False)
axes[1].spines["right"].set_visible(False)
plt.tight_layout()
plt.show()

The distributions typically reveal a bimodal pattern: HOSE stocks cluster at low illiquidity values, while HNX and especially UPCoM stocks exhibit a long right tail of extreme illiquidity. This heterogeneity implies that a single liquidity filter or treatment is insufficient for the entire cross-section.
4.4.5 Time Variation in Aggregate Liquidity
Market-wide liquidity is not constant. It deteriorates during crises, policy uncertainty, and periods of capital outflow, and improves during bull markets and periods of foreign inflow. The time variation in aggregate liquidity is itself a risk factor (Pástor and Stambaugh 2003).
agg_liquidity = (
    liquidity_monthly
    .groupby("year_month")
    .agg(
        median_amihud=("amihud", "median"),
        median_zero_ret=("zero_return_share", "median"),
        total_value=("avg_daily_value", "sum")
    )
    .reset_index()
)
agg_liquidity["date"] = agg_liquidity["year_month"].dt.to_timestamp()

fig, ax1 = plt.subplots(figsize=(8, 4))
ax1.plot(
    agg_liquidity["date"],
    np.log(agg_liquidity["median_amihud"].clip(lower=1e-10)),
    color="#2C73D2", linewidth=1.2
)
ax1.set_ylabel("Log Median Amihud", color="#2C73D2")
ax1.tick_params(axis="y", labelcolor="#2C73D2")

ax2 = ax1.twinx()
ax2.fill_between(
    agg_liquidity["date"],
    agg_liquidity["median_zero_ret"] * 100,
    alpha=0.3, color="#FF6B6B"
)
ax2.set_ylabel("Median Zero-Return Share (%)", color="#FF6B6B")
ax2.tick_params(axis="y", labelcolor="#FF6B6B")
ax1.spines["top"].set_visible(False)
plt.tight_layout()
plt.show()

Before any asset pricing analysis, apply the following liquidity filter: exclude stock-months where the zero-return share exceeds 50%, where fewer than 15 trading days are observed (stricter than the 10-day cutoff used in `illiquid_flag` above), or where average daily trading value falls below a threshold (e.g., 100 million VND). Document the filter explicitly, and report sensitivity of results to alternative thresholds.
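One way to organize the sensitivity check is to wrap the filter in a function whose thresholds are parameters and then report the share of stock-months retained under each setting. The sketch below uses a hypothetical four-row table with the same column names as `liquidity_monthly`; the function name `apply_liquidity_filter`, the symbols, and the values are illustrative assumptions.

```python
import pandas as pd


def apply_liquidity_filter(
    df: pd.DataFrame,
    max_zero_share: float = 0.5,
    min_days: int = 15,
    min_value: float = 1e8,
) -> pd.DataFrame:
    """Keep stock-months that pass all three liquidity screens."""
    keep = (
        (df["zero_return_share"] <= max_zero_share)
        & (df["trading_days"] >= min_days)
        & (df["avg_daily_value"] >= min_value)
    )
    return df[keep]


# Hypothetical stock-months (not real data)
toy = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC", "DDD"],
    "zero_return_share": [0.05, 0.60, 0.20, 0.10],
    "trading_days": [21, 20, 8, 19],
    "avg_daily_value": [5e9, 2e8, 3e8, 5e7],
})

# Share of stock-months retained under alternative value thresholds
sensitivity = {
    min_value: len(apply_liquidity_filter(toy, min_value=min_value)) / len(toy)
    for min_value in (5e7, 1e8, 5e8)
}
print(sensitivity)
```

Reporting the retained share alongside the main results for each threshold makes it easy to see whether a finding depends on the exact cutoff.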
4.5 Bid-Ask Spread Estimation
In the absence of comprehensive quote data, the effective bid-ask spread can be estimated from transaction data using the method of Roll (1984). The Roll estimator exploits the fact that if the bid-ask bounce is the sole source of negative serial covariance in returns, then:
\[ \hat{S}_{\text{Roll}} = 2\sqrt{-\text{Cov}(\Delta p_t, \Delta p_{t-1})} \tag{4.3}\]
where \(\Delta p_t = p_t - p_{t-1}\) is the price change. When the autocovariance is positive (which occurs when information-driven serial correlation dominates the bid-ask bounce), the Roll estimator is undefined. Hasbrouck (2009) proposes a Bayesian variant that handles this case by imposing a prior on the spread.
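To see why the autocovariance identifies the spread, the following simulation constructs transaction prices as an efficient price (a random walk) plus or minus half a known spread, depending on whether each trade is buyer- or seller-initiated. All parameter values are hypothetical. Under these assumptions the theoretical autocovariance of price changes is \(-S^2/4\), so the Roll estimate should be close to the true spread.

```python
import numpy as np

rng = np.random.default_rng(42)
n_trades = 200_000
true_spread = 100.0   # hypothetical spread in VND

# Efficient price: random walk with serially uncorrelated innovations
efficient = 20_000 + np.cumsum(rng.normal(0.0, 50.0, n_trades))
# Trade direction: +1 (at the ask) or -1 (at the bid), equally likely
direction = rng.choice([-1.0, 1.0], size=n_trades)
price = efficient + direction * true_spread / 2

dprice = np.diff(price)
autocov = np.cov(dprice[1:], dprice[:-1])[0, 1]

# The estimator is defined only for negative autocovariance
roll_estimate = 2 * np.sqrt(-autocov) if autocov < 0 else np.nan

print(f"Autocovariance: {autocov:.1f} (theory: {-true_spread**2 / 4:.1f})")
print(f"Roll estimate:  {roll_estimate:.1f} VND (true spread: {true_spread:.1f})")
```

With short samples, such as the roughly 20 daily observations in a stock-month, the sample autocovariance is noisy and frequently positive, which is why many stock-month estimates are undefined in practice.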
# Compute Roll spread estimate at the stock-month level
prices_daily["dprice"] = prices_daily.groupby("symbol")["close"].diff()
prices_daily["dprice_lag"] = prices_daily.groupby("symbol")["dprice"].shift(1)

roll_cov = (
    prices_daily
    .groupby(["symbol", "year_month"])
    .apply(
        lambda g: g[["dprice", "dprice_lag"]].dropna().cov().iloc[0, 1],
        include_groups=False
    )
    .reset_index(name="autocovariance")
)

# Roll spread is defined only when autocovariance is negative.
# Clipping avoids taking the square root of a negative number in the
# branch that np.where discards.
roll_cov["roll_spread"] = np.where(
    roll_cov["autocovariance"] < 0,
    2 * np.sqrt((-roll_cov["autocovariance"]).clip(lower=0)),
    np.nan
)

# As a percentage of price
roll_cov = roll_cov.merge(
    prices_daily.groupby(["symbol", "year_month"])["close"].mean()
    .reset_index(name="avg_price"),
    on=["symbol", "year_month"]
)
roll_cov["roll_spread_pct"] = roll_cov["roll_spread"] / roll_cov["avg_price"] * 100

# Monthly cross-sectional distribution of the Roll spread
roll_summary = (
    roll_cov
    .dropna(subset=["roll_spread_pct"])
    .groupby("year_month")["roll_spread_pct"]
    .describe(percentiles=[0.25, 0.50, 0.75])
    .reset_index()
)

# Show latest year summary
latest_year_roll = roll_cov[
    roll_cov["year_month"].dt.year == roll_cov["year_month"].dt.year.max()
]
print(
    latest_year_roll["roll_spread_pct"]
    .dropna()
    .describe(percentiles=[0.10, 0.25, 0.50, 0.75, 0.90])
    .round(3)
)

4.6 Non-Synchronous Trading Bias
When stocks do not trade at the same frequency or at the same times, observed returns are misaligned. This non-synchronous trading bias, first formalized by Scholes and Williams (1977) and Lo and MacKinlay (1990), is one of the most consequential microstructure effects for asset pricing in thin markets.
4.6.1 The Problem
Suppose the true (unobserved) return process for stock \(i\) follows a single-factor model:
\[ r_{i,t}^* = \alpha_i + \beta_i r_{m,t}^* + \varepsilon_{i,t} \tag{4.4}\]
where \(r_{m,t}^*\) is the true market return and \(\beta_i\) is the true beta. If stock \(i\) last traded \(k\) days before the end of day \(t\), the observed return incorporates information only up to day \(t - k\). Scholes and Williams (1977) show that the OLS estimate of beta from regressing observed returns on observed market returns is:
\[ \hat{\beta}_i^{OLS} = \beta_i \cdot \pi_i \tag{4.5}\]
where \(\pi_i\) is the probability that stock \(i\) trades on any given day. For a stock that trades on only 50% of days, the OLS beta is biased downward by 50%. This bias is severe in Vietnam, where many small-cap stocks trade on fewer than half of all trading days.
4.6.2 Quantifying the Bias
# Compute trading frequency: proportion of market days with nonzero volume
market_days = prices_daily.groupby("year_month")["date"].nunique()
trading_freq = (
    prices_daily[prices_daily["value"] > 0]
    .groupby(["symbol", "year_month"])["date"]
    .nunique()
    .reset_index(name="days_traded")
)
trading_freq = trading_freq.merge(
    market_days.reset_index().rename(columns={"date": "market_days"}),
    on="year_month"
)
trading_freq["trade_prob"] = trading_freq["days_traded"] / trading_freq["market_days"]

# Average across all available months for each stock
annual_trade_freq = (
    trading_freq
    .groupby("symbol")["trade_prob"]
    .mean()
    .reset_index(name="avg_trade_prob")
)

fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(
    annual_trade_freq["avg_trade_prob"], bins=50,
    color="#2C73D2", edgecolor="white", alpha=0.8
)
ax.axvline(
    annual_trade_freq["avg_trade_prob"].median(),
    color="#FF6B6B", linestyle="--", linewidth=1.5,
    label=f"Median = {annual_trade_freq['avg_trade_prob'].median():.2f}"
)
ax.set_xlabel("Average Trading Probability (Fraction of Market Days)")
ax.set_ylabel("Number of Stocks")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()

4.6.3 The Dimson Beta Correction
Dimson (1979) proposes a simple correction: include lagged and leading market returns in the beta regression:
\[ r_{i,t} = \alpha_i + \sum_{k=-K}^{K} \beta_{i,k} \, r_{m,t-k} + \varepsilon_{i,t} \tag{4.6}\]
The Dimson-corrected beta is \(\hat{\beta}_i^{Dimson} = \sum_{k=-K}^{K} \hat{\beta}_{i,k}\). Typically \(K = 1\) or \(K = 2\) is sufficient. The summed coefficients capture the full response of the stock’s observed return to market information, regardless of when the stock actually trades.
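The mechanics can be checked on simulated data where the true beta is known. The sketch below assumes a stylized non-trading process that is not taken from the chapter's data: the stock trades on each day independently with probability 0.5, and on a trading day the observed return equals the accumulated true return since the last trade (zero otherwise). Market returns are serially uncorrelated. The helper `dimson_beta` is illustrative and uses plain least squares.

```python
import numpy as np

rng = np.random.default_rng(7)
n_days, beta_true, trade_prob = 100_000, 1.2, 0.5

rm = rng.normal(0.0, 0.01, n_days)                        # market return
r_true = beta_true * rm + rng.normal(0.0, 0.02, n_days)   # latent stock return
trades = rng.random(n_days) < trade_prob                  # trading indicator

# Observed return: accumulated latent return, released only on trading days
r_obs = np.zeros(n_days)
pending = 0.0
for t in range(n_days):
    pending += r_true[t]
    if trades[t]:
        r_obs[t] = pending
        pending = 0.0


def dimson_beta(r: np.ndarray, rm: np.ndarray, K: int) -> float:
    """Sum of OLS slopes on rm_{t-k} for k = -K..K (K = 0 gives plain OLS)."""
    idx = np.arange(K, len(r) - K)
    columns = [np.ones(len(idx))] + [rm[idx - k] for k in range(-K, K + 1)]
    coefs, *_ = np.linalg.lstsq(np.column_stack(columns), r[idx], rcond=None)
    return coefs[1:].sum()


beta_ols = dimson_beta(r_obs, rm, K=0)
beta_dimson_1 = dimson_beta(r_obs, rm, K=1)
beta_dimson_3 = dimson_beta(r_obs, rm, K=3)

print(f"True beta:        {beta_true:.2f}")
print(f"OLS beta:         {beta_ols:.3f}")
print(f"Dimson beta, K=1: {beta_dimson_1:.3f}")
print(f"Dimson beta, K=3: {beta_dimson_3:.3f}")
```

In this stylized setup the OLS beta converges to \(\pi_i \beta_i\), and the Dimson beta with \(K\) lags converges to \(\beta_i \left(1 - (1 - \pi_i)^{K+1}\right)\). With a trading probability as low as 0.5, \(K = 1\) removes only part of the bias, which suggests choosing \(K\) with the stock's trading frequency in mind.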
# Estimate Dimson betas with K=1 lag and lead
# Merge market return
market_ret = (
prices_daily
.groupby("date")
.apply(
lambda g: np.average(g["ret"].dropna(), weights=g["mktcap"].loc[g["ret"].dropna().index])
if g["ret"].dropna().shape[0] > 0 else np.nan,
include_groups=False
)
.reset_index(name="rm")
)
prices_daily = prices_daily.merge(market_ret, on="date", how="left")
prices_daily["rm_lag1"] = prices_daily.groupby("symbol")["rm"].shift(1)
prices_daily["rm_lead1"] = prices_daily.groupby("symbol")["rm"].shift(-1)
import statsmodels.api as sm
def estimate_dimson_beta(group):
    """Estimate OLS and Dimson (K=1) betas for a single stock."""
    g = group.dropna(subset=["ret", "rm", "rm_lag1", "rm_lead1"])
    # Require at least 60 complete daily observations
    if len(g) < 60:
        return pd.Series({"beta_ols": np.nan, "beta_dimson": np.nan, "n_obs": len(g)})
    # OLS beta: contemporaneous market return only
    X_ols = sm.add_constant(g["rm"])
    ols_model = sm.OLS(g["ret"], X_ols).fit()
    beta_ols = ols_model.params["rm"]
    # Dimson beta: sum of the lagged, contemporaneous, and leading slopes
    X_dim = sm.add_constant(g[["rm_lag1", "rm", "rm_lead1"]])
    dim_model = sm.OLS(g["ret"], X_dim).fit()
    beta_dimson = dim_model.params[["rm_lag1", "rm", "rm_lead1"]].sum()
    return pd.Series({
        "beta_ols": beta_ols,
        "beta_dimson": beta_dimson,
        "n_obs": len(g)
    })
beta_comparison = (
    prices_daily
    .groupby("symbol")
    .apply(estimate_dimson_beta, include_groups=False)
    .reset_index()
)
# Keep only stocks for which both betas could be estimated
beta_valid = beta_comparison.dropna(subset=["beta_ols", "beta_dimson"])
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(
beta_valid["beta_ols"], beta_valid["beta_dimson"],
alpha=0.3, s=10, color="#2C73D2"
)
lims = [
min(ax.get_xlim()[0], ax.get_ylim()[0]),
max(ax.get_xlim()[1], ax.get_ylim()[1])
]
ax.plot(lims, lims, "--", color="gray", linewidth=1)
ax.set_xlabel("OLS Beta")
ax.set_ylabel("Dimson Beta (K=1)")
ax.set_aspect("equal")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()
The scatter plot should reveal a systematic pattern: Dimson betas exceed OLS betas for most stocks, with the discrepancy largest for thinly traded stocks. Points above the 45-degree line indicate stocks whose OLS betas are biased downward by non-synchronous trading.
beta_with_freq = beta_valid.merge(annual_trade_freq, on="symbol")
beta_with_freq["freq_tercile"] = pd.qcut(
beta_with_freq["avg_trade_prob"], q=3,
labels=["Low (Thin)", "Medium", "High (Liquid)"]
)
beta_bias_summary = (
beta_with_freq
.groupby("freq_tercile", observed=True)
.agg(
n_stocks=("symbol", "count"),
avg_trade_prob=("avg_trade_prob", "mean"),
mean_beta_ols=("beta_ols", "mean"),
mean_beta_dimson=("beta_dimson", "mean"),
median_beta_ols=("beta_ols", "median"),
median_beta_dimson=("beta_dimson", "median")
)
.round(3)
)
beta_bias_summary["bias_pct"] = (
(beta_bias_summary["mean_beta_dimson"] - beta_bias_summary["mean_beta_ols"])
/ beta_bias_summary["mean_beta_dimson"] * 100
).round(1)
beta_bias_summary
For the thinnest-traded tercile, OLS beta underestimates true systematic risk by 20-40% on average. Using uncorrected betas for cost of equity estimation or factor model tests will produce systematically incorrect results for these stocks.
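The attenuation mechanism can be checked on synthetic data where the true beta is known. The sketch below is an illustration under stated assumptions, not the chapter's estimation code: it assumes log returns (so that untraded returns accumulate additively until the next trade), i.i.d. market returns, and a stock that trades each day with a fixed probability `trade_prob`. All names and parameter values are hypothetical.

```python
import numpy as np

# Hypothetical simulation parameters (not estimated from Vietnamese data)
rng = np.random.default_rng(0)
n_days, true_beta, trade_prob = 20_000, 1.0, 0.6

rm = rng.normal(0.0, 0.01, n_days)                       # market log return
true_ret = true_beta * rm + rng.normal(0.0, 0.01, n_days)  # latent stock log return
trades = rng.random(n_days) < trade_prob                 # True on days the stock trades

# Observed return: zero on non-trading days; on trading days, the latent
# return accumulated since the previous trade
obs_ret = np.zeros(n_days)
pending = 0.0
for t in range(n_days):
    pending += true_ret[t]
    if trades[t]:
        obs_ret[t] = pending
        pending = 0.0

def ols_slopes(y, X):
    """Return OLS slope coefficients (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

# Align day t with the market return on days t-1, t, and t+1
y = obs_ret[1:-1]
beta_ols = ols_slopes(y, rm[1:-1])[0]
beta_dimson = ols_slopes(y, np.column_stack([rm[:-2], rm[1:-1], rm[2:]])).sum()

print(f"true beta = {true_beta:.2f}, OLS = {beta_ols:.2f}, Dimson (K=1) = {beta_dimson:.2f}")
```

Under these assumptions the OLS slope is roughly `trade_prob` times the true beta, and the K=1 Dimson sum recovers part of the shortfall. Returns delayed by more than one day are only captured with a larger K.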
4.6.4 The Scholes-Williams Estimator
An alternative correction, proposed by Scholes and Williams (1977), estimates beta as:
\[ \hat{\beta}_i^{SW} = \frac{\hat{\beta}_{i,-1} + \hat{\beta}_{i,0} + \hat{\beta}_{i,+1}}{1 + 2\hat{\rho}_m} \tag{4.7}\]
where \(\hat{\beta}_{i,k}\) is the slope from regressing \(r_{i,t}\) on \(r_{m,t-k}\) alone, and \(\hat{\rho}_m\) is the first-order autocorrelation of the market return. The Scholes-Williams estimator is consistent under the assumption that non-trading is the sole source of serial cross-correlation, while the Dimson estimator is more robust to additional sources of lead-lag structure.
def estimate_sw_beta(group):
    """Estimate the Scholes-Williams beta for a single stock."""
    g = group.dropna(subset=["ret", "rm", "rm_lag1", "rm_lead1"])
    if len(g) < 60:
        return np.nan
    # Separate univariate regressions on lagged, contemporaneous, and leading market returns
    beta_lag = sm.OLS(g["ret"], sm.add_constant(g["rm_lag1"])).fit().params.iloc[1]
    beta_0 = sm.OLS(g["ret"], sm.add_constant(g["rm"])).fit().params.iloc[1]
    beta_lead = sm.OLS(g["ret"], sm.add_constant(g["rm_lead1"])).fit().params.iloc[1]
    # First-order autocorrelation of the market return, using the explicit lag
    # column so that gaps in the stock's own dates do not misalign the pairs
    rho_m = g["rm"].corr(g["rm_lag1"])
    return (beta_lag + beta_0 + beta_lead) / (1 + 2 * rho_m)
# Merge on symbol rather than relying on the positional order of the groups
beta_sw = (
    prices_daily
    .groupby("symbol")
    .apply(estimate_sw_beta, include_groups=False)
    .rename("beta_sw")
    .reset_index()
)
beta_comparison = beta_comparison.merge(beta_sw, on="symbol", how="left")
4.7 Implications for Portfolio Construction
The microstructure frictions documented above have direct consequences for portfolio construction, particularly for strategies that involve rebalancing across the full cross-section of listed firms.
4.7.1 Equal-Weighted vs. Value-Weighted Returns
Equal-weighted portfolio returns give the same weight to each stock, including illiquid small-cap stocks that may contribute stale or noisy prices. Value-weighted returns tilt toward large, liquid stocks and are less susceptible to microstructure contamination.
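monthly_returns = (
    prices_daily
    .groupby(["symbol", "year_month"])
    .agg(
        monthly_ret=("ret", lambda x: (1 + x).prod() - 1),
        last_mktcap=("mktcap", "last")
    )
    .reset_index()
    .sort_values(["symbol", "year_month"])
)
monthly_returns["date"] = monthly_returns["year_month"].dt.to_timestamp()
# Weight by the previous month-end market cap so that weights are known before
# the return is earned (shift(1) takes the stock's previous available month)
monthly_returns["lag_mktcap"] = monthly_returns.groupby("symbol")["last_mktcap"].shift(1)
# Equal-weighted
ew_ret = monthly_returns.groupby("date")["monthly_ret"].mean().reset_index(name="ew")
# Value-weighted
def vw_return(group):
    g = group.dropna(subset=["monthly_ret", "lag_mktcap"])
    if g.empty:
        return np.nan
    return np.average(g["monthly_ret"], weights=g["lag_mktcap"])
vw_ret = (
    monthly_returns.groupby("date")
    .apply(vw_return, include_groups=False)
    .reset_index(name="vw")
)
# Drop months without a value-weighted return (the first month has no lagged weights)
port_comp = ew_ret.merge(vw_ret, on="date").dropna()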
fig, ax = plt.subplots(figsize=(8, 4))
for col, label, color in [
    ("ew", "Equal-Weighted", "#FF6B6B"),
    ("vw", "Value-Weighted", "#2C73D2")
]:
    cum_ret = (1 + port_comp[col]).cumprod()
    ax.plot(port_comp["date"], cum_ret, label=label, color=color, linewidth=1.2)
ax.set_ylabel("Cumulative Return (Growth of 1 VND)")
ax.set_xlabel("")
ax.legend(frameon=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.show()
A persistent divergence between equal-weighted and value-weighted cumulative returns is a hallmark of microstructure effects: the equal-weighted portfolio overstates attainable returns because it implicitly assumes costless trading in illiquid stocks.
4.7.2 Recommended Liquidity Filters
Based on the diagnostics developed in this chapter, we recommend the following pre-analysis filters:
Always report results with and without liquidity filters. If results are qualitatively different, the baseline findings may be driven by microstructure artifacts rather than genuine economic effects.
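The with-and-without comparison can be organized as a small helper. The sketch below is illustrative only: the column names `avg_trade_prob`, `mktcap`, and `beta_ols` follow those used earlier in the chapter, while the function names, the toy data, and the threshold values passed in the usage lines are hypothetical placeholders, not recommended values.

```python
import pandas as pd

def apply_liquidity_filters(stocks, min_trade_prob, min_mktcap):
    """Keep stocks meeting both caller-supplied (hypothetical) thresholds."""
    keep = (stocks["avg_trade_prob"] >= min_trade_prob) & (stocks["mktcap"] >= min_mktcap)
    return stocks.loc[keep]

def with_and_without_filters(stocks, statistic, **thresholds):
    """Evaluate `statistic` on the full and the filtered sample, side by side."""
    filtered = apply_liquidity_filters(stocks, **thresholds)
    return pd.DataFrame(
        {
            "n_stocks": [len(stocks), len(filtered)],
            "statistic": [statistic(stocks), statistic(filtered)],
        },
        index=["unfiltered", "filtered"],
    )

# Toy cross-section with made-up values, for illustration only
toy = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC", "DDD"],
    "avg_trade_prob": [0.95, 0.40, 0.85, 0.20],
    "mktcap": [5000.0, 80.0, 1200.0, 30.0],
    "beta_ols": [1.05, 0.35, 0.90, 0.15],
})

report = with_and_without_filters(
    toy,
    statistic=lambda d: d["beta_ols"].mean(),
    min_trade_prob=0.5,   # placeholder threshold
    min_mktcap=100.0,     # placeholder threshold
)
print(report)
```

Passing the statistic as a function keeps the filter logic separate from the analysis, so the same comparison can be repeated for any result reported in the chapter.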
4.7.3 Monthly vs. Daily Frequency
For most asset pricing applications, monthly return aggregation is preferable to daily analysis in Vietnam because:
- Monthly returns smooth out intraday noise, bid-ask bounce, and price limit effects.
- Stocks that trade infrequently within a month still produce a meaningful monthly return.
- Factor portfolio sorts are conventionally conducted at monthly frequency.
- Statistical tests have better size properties when microstructure noise is reduced.
However, monthly aggregation does not eliminate all biases. Stocks with zero returns for an entire month still contribute stale observations. The Dimson and Scholes-Williams corrections should still be applied at monthly frequency for beta estimation.
4.8 Implications for Asset Pricing Tests
4.8.1 Factor Model Estimation
Standard factor model estimation assumes that returns are observed synchronously and without censoring. In Vietnam, both assumptions are violated. The practical consequences are summarized in Table 4.11.
| Assumption | Violation in Vietnam | Consequence |
|---|---|---|
| Synchronous observation | Thin trading | Biased betas, attenuated R² |
| Uncensored returns | Price limits | Truncated distributions, biased moments |
| Continuous trading | Discrete ticks | Return discreteness, bid-ask bounce |
| No transaction costs | Wide spreads | Overstated portfolio returns |
4.8.2 Adjusted Testing Procedure
We recommend the following adjustments to standard asset pricing tests when applied to Vietnamese data:
- Beta estimation: Use Dimson (\(K \ge 1\)) or Scholes-Williams betas, not OLS betas.
- Factor construction: When forming size and value portfolios, apply liquidity filters before sorting. Consider excluding the smallest quintile of stocks by market capitalization, which is most affected by thin trading.
- Return aggregation: Use monthly frequency. If daily analysis is necessary, include lagged market returns in the time-series regression.
- Robust inference: In panel regressions, cluster standard errors by stock to account for persistent microstructure-induced serial correlation. In time-series regressions, use Newey-West HAC standard errors with sufficient lags.
- Price limit adjustment: For volatility analysis or risk measurement, consider the Chu and Qiu (2019) approach of modeling the latent (uncensored) return distribution, treating returns that hit a price limit as censored observations:
\[ r_{i,t}^* \sim N(\mu_i, \sigma_i^2), \quad r_{i,t}^{obs} = \max(\underline{L}, \min(\bar{L}, r_{i,t}^*)) \tag{4.8}\]
Here \(\underline{L}\) and \(\bar{L}\) are the lower and upper daily price limits. Estimate \(\mu_i\) and \(\sigma_i^2\) via maximum likelihood for the censored normal: interior returns contribute the normal density, and limit hits contribute the probability mass beyond the limit.
from scipy.optimize import minimize
from scipy.stats import norm
def truncated_normal_nll(params, returns, lower, upper):
    """Negative log-likelihood of a normal censored at the price limits."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize log(sigma) so that sigma stays positive
    # Interior observations contribute the normal density
    interior = (returns > lower) & (returns < upper)
    ll = norm.logpdf(returns[interior], mu, sigma).sum()
    # Limit hits contribute the probability mass beyond the limit
    n_lower = (returns <= lower).sum()
    n_upper = (returns >= upper).sum()
    if n_lower > 0:
        ll += n_lower * norm.logcdf(lower, mu, sigma)
    if n_upper > 0:
        # logsf is the numerically stable form of log(1 - cdf)
        ll += n_upper * norm.logsf(upper, mu, sigma)
    return -ll
def estimate_true_volatility(returns, limit_band):
    """Estimate latent volatility correcting for price limit censoring."""
    returns = returns.dropna()
    result = minimize(
        truncated_normal_nll,
        x0=[returns.mean(), np.log(returns.std())],
        args=(returns.values, -limit_band, limit_band),
        method="Nelder-Mead"
    )
    mu, log_sigma = result.x
    return np.exp(log_sigma)
- Sensitivity reporting: Always report key results under alternative specifications: with and without liquidity filters, using OLS vs. Dimson betas, at daily vs. monthly frequency, and using observed vs. censoring-corrected volatility.
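The price limit adjustment can be sanity-checked on simulated data where the latent volatility is known. The following is a minimal sketch under stated assumptions: latent returns are i.i.d. normal, observed returns are clipped exactly at a symmetric band, and the function name `censored_normal_nll` and all parameter values are hypothetical. It restates the censored likelihood so that the block runs on its own.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def censored_normal_nll(params, returns, lower, upper):
    """Negative log-likelihood of a normal censored at `lower` and `upper`."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    interior = (returns > lower) & (returns < upper)
    ll = norm.logpdf(returns[interior], mu, sigma).sum()
    n_lower = (returns <= lower).sum()
    n_upper = (returns >= upper).sum()
    if n_lower > 0:
        ll += n_lower * norm.logcdf(lower, mu, sigma)
    if n_upper > 0:
        ll += n_upper * norm.logsf(upper, mu, sigma)
    return -ll

# Hypothetical setup: a volatile stock (5% daily sigma) under a +/-7% band
rng = np.random.default_rng(42)
true_sigma, limit = 0.05, 0.07
latent = rng.normal(0.0, true_sigma, 5000)
observed = np.clip(latent, -limit, limit)  # limit hits are recorded at the band

fit = minimize(
    censored_normal_nll,
    x0=[observed.mean(), np.log(observed.std())],
    args=(observed, -limit, limit),
    method="Nelder-Mead",
)
sigma_naive = observed.std()    # ignores censoring
sigma_mle = np.exp(fit.x[1])    # censoring-corrected

print(f"share of limit hits: {(np.abs(observed) >= limit).mean():.3f}")
print(f"true sigma = {true_sigma:.4f}, naive = {sigma_naive:.4f}, MLE = {sigma_mle:.4f}")
```

In this setup the sample standard deviation of observed returns understates the latent volatility, while the censored-likelihood estimate lands close to the true value. With real data, returns at the limit may not equal the band exactly because of tick-size rounding, so the limit-hit flags would need to come from the exchange's reference prices rather than from a comparison with a fixed percentage.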
4.9 Summary
This chapter has established that Vietnamese equity markets exhibit microstructure characteristics that materially affect observed prices, returns, and risk measures. The key findings are:
- Price limits censor daily returns, inducing positive autocorrelation, volatility spillover, and truncated distributions. The \(\pm 7\%\) band on HOSE is particularly restrictive for volatile stocks.
- Thin trading and zero returns afflict a substantial fraction of listed firms. Trading probabilities below 50% are common on HNX and UPCoM, generating non-synchronous trading bias that attenuates OLS beta estimates by 20-40%.
- Illiquidity varies dramatically across the cross-section, with Amihud ratios spanning several orders of magnitude. Value-weighted portfolio returns are less contaminated than equal-weighted returns.
- The Dimson and Scholes-Williams beta corrections effectively address non-synchronous trading bias and should be used as the default beta estimators for Vietnamese equities.
- Liquidity filters should be applied before any asset pricing analysis, and results should be reported with and without these filters as a robustness check.
Ignoring these frictions does not merely add noise to empirical results; it systematically biases estimates in predictable directions. The diagnostics and corrections presented in this chapter provide the foundation for credible empirical asset pricing in Vietnam.